TL;DR
Laravel's queue system can silently drop failed jobs under certain conditions. Implement proper monitoring, set appropriate retry limits, and always have a dead letter queue strategy.
The problem: silent failures
Last month, I was debugging a client's order processing system. They had a classic setup: user places order, job gets dispatched to process payment and send confirmation. Queue workers running via Supervisor. Horizon dashboard showing everything green.
Except customers weren't getting their confirmation emails. Some orders weren't being processed at all. And the monitoring said everything was fine.
Here's what was happening: their jobs were failing, but not in a way that Laravel would catch.
How jobs can fail silently
Laravel's queue system is robust, but it has some assumptions baked in that can bite you:
1. Timeout deaths
When a job exceeds its timeout, the worker process kills it. But depending on your configuration, this might not trigger the failed() method on your job:
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Support\Facades\Log;

// This job will be killed after 60 seconds
class ProcessOrder implements ShouldQueue
{
    public $timeout = 60;

    public function handle()
    {
        // If this takes > 60 seconds, the job dies
        // But failed() might never be called
        $this->processPayment();
    }

    public function failed(Throwable $e)
    {
        // This might not be called on timeout!
        Log::error('Order failed', ['order' => $this->order->id]);
    }
}
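One configuration-level mitigation, sketched here under assumptions rather than taken from the client's setup: recent Laravel versions let a job declare $failOnTimeout, so a timeout marks the job as failed (which should also give failed() a chance to run) instead of leaving it to be picked up again later. The connection's retry_after in config/queue.php should also stay longer than the job's timeout, so a job that is still running isn't handed to a second worker. The redis connection name and the 90-second value are illustrative assumptions.

```php
<?php
// Job class (fragment)
use Illuminate\Contracts\Queue\ShouldQueue;

class ProcessOrder implements ShouldQueue
{
    public $timeout = 60;

    // Mark the job as failed when it times out.
    // Assumes a Laravel version that supports this property.
    public $failOnTimeout = true;
}
```

```php
<?php
// config/queue.php (fragment)
// Connection name and value are assumptions for illustration.
return [
    'connections' => [
        'redis' => [
            'driver' => 'redis',
            // Seconds before a reserved job is made available again.
            // Keep this above the longest job $timeout (60 here).
            'retry_after' => 90,
        ],
    ],
];
```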
2. Memory limit kills
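Same issue with memory. If a job pushes PHP past its memory limit, the process dies with a fatal error instead of throwing an exception, so neither a try/catch in your job nor its failed() handler gets a chance to run at that point.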
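To at least leave a trace when this happens, one option is a shutdown callback that inspects the last error. This is a minimal sketch in plain PHP, not Laravel's own mechanism; isFatalError is a hypothetical helper name. A shutdown callback still runs after a memory-limit fatal error, but not if the operating system kills the process outright (for example with SIGKILL).

```php
<?php
declare(strict_types=1);

/**
 * Returns true when the array from error_get_last() describes a fatal error.
 * (Hypothetical helper, named for this example.)
 */
function isFatalError(?array $error): bool
{
    $fatalTypes = [E_ERROR, E_PARSE, E_CORE_ERROR, E_COMPILE_ERROR];

    return $error !== null
        && in_array($error['type'] ?? null, $fatalTypes, true);
}

// Runs when the PHP process ends, including after a fatal error.
register_shutdown_function(function (): void {
    $error = error_get_last();

    if (isFatalError($error)) {
        // Write to PHP's error log; swap in your own alerting here.
        error_log('Worker died with fatal error: ' . $error['message']);
    }
});
```

This only tells you that a worker died, not which order it was handling, which is why the application-level tracking described later still matters.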
3. The "processed" lie
Horizon and other monitoring tools typically track:
- Pending: Jobs in the queue
- Completed: Jobs that finished without throwing an exception
- Failed: Jobs that threw an exception and exceeded retry attempts
But "completed" doesn't mean "successful." A job can complete with partial work done, with silent errors logged, or with business logic failures that don't throw exceptions.
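One way to narrow that gap is to turn business-level failures into exceptions, so the queue records them as failures rather than completions. The sketch below is plain PHP under assumptions: PaymentResult and ensurePaid are hypothetical names standing in for whatever your payment gateway returns.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical result object returned by a payment gateway call.
 */
final class PaymentResult
{
    public bool $successful;
    public string $message;

    public function __construct(bool $successful, string $message = '')
    {
        $this->successful = $successful;
        $this->message = $message;
    }
}

/**
 * Throws when the payment did not go through, so the calling job
 * fails loudly instead of finishing "successfully".
 */
function ensurePaid(PaymentResult $result): void
{
    if (!$result->successful) {
        throw new RuntimeException('Payment failed: ' . $result->message);
    }
}
```

Calling ensurePaid() inside handle() means a declined card goes through the normal retry and failed() path instead of showing up as a completed job.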
The solution: trust but verify
1. Implement job-level verification
Don't rely on the queue system to tell you if a job succeeded. Track it yourself:
class ProcessOrder implements ShouldQueue
{
    public function handle()
    {
        $order = $this->order;

        // Mark as processing
        $order->update(['processing_status' => 'processing']);

        try {
            $this->processPayment();
            $this->sendConfirmation();

            // Mark as completed
            $order->update([
                'processing_status' => 'completed',
                'processed_at' => now(),
            ]);
        } catch (Throwable $e) {
            $order->update(['processing_status' => 'failed']);

            throw $e;
        }
    }
}
2. Add a watchdog query
Run a scheduled command that looks for orphaned jobs:
// In your scheduler
$schedule->command('orders:check-stuck')->everyFiveMinutes();

// The command
$stuckOrders = Order::where('processing_status', 'processing')
    ->where('updated_at', '<', now()->subMinutes(10))
    ->get();

foreach ($stuckOrders as $order) {
    Log::warning('Stuck order detected', ['order' => $order->id]);
    // Alert, retry, or manual intervention
}
3. Set sensible retry limits
Don't let jobs retry forever. Set explicit limits and handle the failure case:
class ProcessOrder implements ShouldQueue
{
    public $tries = 3;

    // Delays before each retry: 1min, then 5min.
    // The 15min value is only used if $tries is raised to 4 or more.
    public $backoff = [60, 300, 900];

    public function failed(Throwable $e)
    {
        // This WILL be called after all retries exhausted
        $this->order->update(['processing_status' => 'failed_permanently']);

        // Notify someone
        Notification::send(
            User::admins()->get(),
            new OrderFailedNotification($this->order, $e)
        );
    }
}
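To see how $tries and $backoff interact, here is a small worked sketch in plain PHP. It assumes Laravel's documented behaviour: the n-th retry uses the n-th backoff value, and the last value repeats when the array is shorter than the number of retries. retryDelays is a hypothetical helper for illustration, not a Laravel function.

```php
<?php
declare(strict_types=1);

/**
 * Returns the delay in seconds applied before each retry.
 *
 * @param int   $tries   total attempts, including the first run
 * @param int[] $backoff configured delays in seconds
 * @return int[]
 */
function retryDelays(int $tries, array $backoff): array
{
    if ($backoff === []) {
        throw new InvalidArgumentException('Backoff must not be empty.');
    }

    $backoff = array_values($backoff);
    $last = $backoff[count($backoff) - 1];
    $delays = [];

    // A job with N tries is retried at most N - 1 times.
    for ($retry = 1; $retry < $tries; $retry++) {
        $delays[] = $backoff[$retry - 1] ?? $last;
    }

    return $delays;
}

// retryDelays(3, [60, 300, 900]) returns [60, 300]
```

With three tries there are only two retries, so a third backoff value comes into play only when a fourth attempt is allowed.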
The bigger picture
Queue monitoring isn't just about watching Horizon. It's about having multiple layers of verification:
- Application-level tracking: Track job state in your database
- Watchdog processes: Actively look for stuck or orphaned work
- Business metrics: Monitor outcomes, not just job counts
- Alerting: Get notified when things go wrong, not whenever you next happen to check the dashboard
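As a sketch of the business-metrics layer, the check below compares orders placed with orders that reached the completed state over some window. It is plain PHP under assumptions: the function names and the 95% threshold are illustrative, and the two counts would come from your own queries.

```php
<?php
declare(strict_types=1);

/**
 * Share of placed orders that reached the 'completed' state (0.0 to 1.0).
 * Returns 1.0 when no orders were placed, so an idle period does not alert.
 */
function completionRate(int $placed, int $completed): float
{
    return $placed === 0 ? 1.0 : $completed / $placed;
}

/**
 * True when the completion rate drops below the given minimum.
 * The default of 0.95 is an arbitrary example value.
 */
function shouldAlert(int $placed, int $completed, float $minRate = 0.95): bool
{
    return completionRate($placed, $completed) < $minRate;
}
```

A check like this would have flagged the missing confirmations described at the start, even while the job counts looked healthy.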
Takeaway
The queue is a tool, not a guarantee. Trust the system to do its job, but verify that your business logic actually completed successfully. The few minutes spent implementing proper tracking will save you hours of debugging silent failures.