TL;DR
A 47-second query was crippling production during peak hours. The culprit: a missing composite index combined with an implicit type conversion. The fix took 30 seconds to deploy. Finding it took 3 hours of methodical investigation. Here's the full process.
The alert
It was 2:47 PM on a Thursday when the Slack alert came in: "Database CPU at 98%, response times degraded." The monitoring dashboard confirmed it: average response times had jumped from 200ms to 15 seconds and were still climbing.
This is the kind of problem that makes your stomach drop. Production is on fire, users are experiencing failures, and somewhere in your application is a query doing something catastrophically wrong.
Step 1: Identify the offending query
First priority: find what's causing the damage. On MySQL, the process list is your starting point:
SHOW FULL PROCESSLIST;
Several connections showed the same query running for 40+ seconds:
SELECT orders.*, customers.name, customers.email
FROM orders
JOIN customers ON orders.customer_id = customers.id
WHERE orders.status = 'pending'
AND orders.region_code = '12'
AND orders.created_at > '2024-01-01'
ORDER BY orders.created_at DESC
LIMIT 50;
On paper, this looks innocent. A simple join, some filters, a sort, a limit. But it was taking 47 seconds to execute.
Step 2: Understand the data volume
Before diving into EXPLAIN, understand what you're dealing with:
SELECT COUNT(*) FROM orders; -- 12.4 million rows
SELECT COUNT(*) FROM customers; -- 890,000 rows
SELECT COUNT(*) FROM orders WHERE status = 'pending'; -- 2.1 million rows
With 12.4 million orders and 2.1 million pending orders, this wasn't a small dataset. But even so, 47 seconds is inexcusable for a query like this.
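A side note on those counts: exact COUNT(*) queries against a 12-million-row InnoDB table add load to a server that is already struggling. If a rough figure is enough during an incident, the optimizer's own estimate is much cheaper; a sketch (table_rows is an estimate and can be off by a fair margin):
-- Approximate row count from InnoDB statistics, no full scan
SELECT table_rows
FROM information_schema.tables
WHERE table_schema = DATABASE()
AND table_name = 'orders';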
Step 3: EXPLAIN analysis
Time to see what MySQL was actually doing:
EXPLAIN SELECT orders.*, customers.name, customers.email
FROM orders
JOIN customers ON orders.customer_id = customers.id
WHERE orders.status = 'pending'
AND orders.region_code = '12'
AND orders.created_at > '2024-01-01'
ORDER BY orders.created_at DESC
LIMIT 50;
The results were illuminating:
+----+-------------+-----------+------+---------------+---------+---------+--------------------+----------+-----------------------------+
| id | select_type | table     | type | possible_keys | key     | key_len | ref                | rows     | Extra                       |
+----+-------------+-----------+------+---------------+---------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | orders    | ALL  | customer_id   | NULL    | NULL    | NULL               | 12400000 | Using where; Using filesort |
|  1 | SIMPLE      | customers | ref  | PRIMARY       | PRIMARY | 4       | orders.customer_id |        1 | NULL                        |
+----+-------------+-----------+------+---------------+---------+---------+--------------------+----------+-----------------------------+
Red flags everywhere:
- type: ALL - A full table scan on 12.4 million rows
- key: NULL - No index being used for the orders table
- Using filesort - MySQL has to sort the result itself (potentially spilling to disk) instead of reading rows in index order
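When the tabular output raises more questions than it answers, EXPLAIN FORMAT=JSON exposes the optimizer's cost estimates and states explicitly whether the sort can be satisfied by an index:
EXPLAIN FORMAT=JSON
SELECT orders.*, customers.name, customers.email
FROM orders
JOIN customers ON orders.customer_id = customers.id
WHERE orders.status = 'pending'
AND orders.region_code = '12'
AND orders.created_at > '2024-01-01'
ORDER BY orders.created_at DESC
LIMIT 50;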
Step 4: Investigate the index situation
Let's check what indexes exist:
SHOW INDEX FROM orders;
The table had these indexes:
- PRIMARY on (id)
- customer_id on (customer_id)
- status on (status)
- created_at on (created_at)
Individual indexes existed, but no composite index matched the query's WHERE clause. For a query like this, MySQL will generally pick a single index per table (index merge aside), and none of these single-column indexes was selective enough on its own.
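A quick way to sanity-check that reasoning is to force one of the single-column indexes and compare the plan. This is just a sketch; FORCE INDEX only overrides the optimizer's choice, it doesn't make the index any more selective:
-- Does forcing the status index actually help, or was the optimizer right to skip it?
EXPLAIN SELECT orders.*, customers.name, customers.email
FROM orders FORCE INDEX (status)
JOIN customers ON orders.customer_id = customers.id
WHERE orders.status = 'pending'
AND orders.region_code = '12'
AND orders.created_at > '2024-01-01'
ORDER BY orders.created_at DESC
LIMIT 50;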
Step 5: The hidden problem
But wait: why wasn't MySQL using at least the status index? That alone should have cut 12.4 million rows down to 2.1 million. The table definition held the answer:
SHOW CREATE TABLE orders;
The region_code column was defined as INT, but the query was comparing it to a string:
WHERE orders.region_code = '12' -- String comparison on INT column
This forced MySQL to perform an implicit type conversion on every row, preventing index usage entirely. This is a classic performance killer that's easy to miss.
The hidden cost of type conversion
When the two sides of a comparison have different types, MySQL has to convert one of them. If the conversion lands on the column rather than on the constant, it happens for every row, and the index can't be used because it stores values in the column's original type, not the converted ones.
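A cheap way to catch this class of bug is to compare the declared column types against what the application actually sends; a sketch using information_schema:
-- Declared types of the columns used in the WHERE clause
SELECT column_name, data_type, column_type
FROM information_schema.columns
WHERE table_schema = DATABASE()
AND table_name = 'orders'
AND column_name IN ('status', 'region_code', 'created_at');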
Step 6: The fix
Two changes were needed:
1. Fix the type mismatch
In the application code:
// Before: String comparison
$orders = Order::where('region_code', '12')->get();
// After: Integer comparison
$orders = Order::where('region_code', 12)->get();
2. Add a composite index
CREATE INDEX idx_orders_status_region_created
ON orders (status, region_code, created_at DESC);
The index order matters. We put status first because it's used in an equality comparison, then region_code (also equality), then created_at for the range query and sorting.
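One operational note: adding an index to a 12-million-row production table is not free. On InnoDB the build can usually run online, and it's worth requesting that explicitly so the statement fails fast instead of blocking writes if an online build isn't possible; a sketch:
-- Ask for an online, in-place index build; errors out immediately if that's not supported
ALTER TABLE orders
ADD INDEX idx_orders_status_region_created (status, region_code, created_at DESC),
ALGORITHM=INPLACE, LOCK=NONE;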
Step 7: Verify the fix
EXPLAIN SELECT orders.*, customers.name, customers.email
FROM orders
JOIN customers ON orders.customer_id = customers.id
WHERE orders.status = 'pending'
AND orders.region_code = 12 -- Now an integer
AND orders.created_at > '2024-01-01'
ORDER BY orders.created_at DESC
LIMIT 50;
New results:
+----+-------------+-----------+-------+----------------------------------+----------------------------------+---------+--------------------+------+-------------+
| id | select_type | table     | type  | possible_keys                    | key                              | key_len | ref                | rows | Extra       |
+----+-------------+-----------+-------+----------------------------------+----------------------------------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | orders    | range | idx_orders_status_region_created | idx_orders_status_region_created | 12      | NULL               |  847 | Using where |
|  1 | SIMPLE      | customers | ref   | PRIMARY                          | PRIMARY                          | 4       | orders.customer_id |    1 | NULL        |
+----+-------------+-----------+-------+----------------------------------+----------------------------------+---------+--------------------+------+-------------+
The improvements:
- type: range instead of ALL - Using an index range scan
- rows: 847 instead of 12,400,000 - Only scanning matching rows
- No filesort - The index handles ordering
Query time: 47 seconds to 23 milliseconds.
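On MySQL 8.0.18 and newer, EXPLAIN ANALYZE goes one step further than EXPLAIN: it actually executes the query and reports measured row counts and per-step timings, which is a more reliable confirmation than eyeballing wall-clock time:
EXPLAIN ANALYZE
SELECT orders.*, customers.name, customers.email
FROM orders
JOIN customers ON orders.customer_id = customers.id
WHERE orders.status = 'pending'
AND orders.region_code = 12
AND orders.created_at > '2024-01-01'
ORDER BY orders.created_at DESC
LIMIT 50;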
Lessons learned
1. Type consistency matters
Always make sure the values you compare against match your column types. This is easy to miss, especially when values come from user input (which usually arrives as strings) or from configuration.
// Laravel: Cast route parameters
Route::get('/orders/{region}', function (int $region) {
return Order::where('region_code', $region)->get();
});
2. Composite indexes beat multiple single-column indexes
MySQL can only use one index per table (with some exceptions for index merging). A composite index that matches your common query patterns will almost always outperform multiple single-column indexes.
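The detail to remember is the leftmost-prefix rule: the new composite index can also serve queries that filter on status alone, or on status plus region_code, but not on region_code by itself. A quick way to convince yourself, using hypothetical follow-up queries rather than anything from the incident:
-- Can use idx_orders_status_region_created (status is the leftmost prefix)
EXPLAIN SELECT * FROM orders WHERE status = 'pending';
-- Cannot use it: region_code is not a leftmost prefix of the index
EXPLAIN SELECT * FROM orders WHERE region_code = 12;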
3. EXPLAIN before you deploy
Make EXPLAIN analysis part of your code review process. Any new query that touches large tables should be analyzed before it hits production.
// Enable query logging in development
DB::enableQueryLog();
// Run your operations
$orders = Order::where(...)->get();
// Check what ran
dd(DB::getQueryLog());
4. Monitor slow queries continuously
Enable MySQL's slow query log:
# my.cnf
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 1 # Log queries taking longer than 1 second
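If you can't restart the server to pick up my.cnf changes, the same settings can be flipped at runtime; note that SET GLOBAL lasts only until the next restart (MySQL 8.0's SET PERSIST survives one), and a changed long_query_time applies to new connections:
-- Enable the slow query log without a restart
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;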
The full investigation toolkit
For future reference, here's my go-to toolkit for query debugging:
-- Current running queries
SHOW FULL PROCESSLIST;
-- Query execution plan
EXPLAIN SELECT ...;
EXPLAIN ANALYZE SELECT ...; -- MySQL 8.0.18+
-- Table indexes
SHOW INDEX FROM table_name;
-- Table statistics
SHOW TABLE STATUS LIKE 'table_name';
-- Index usage statistics (MySQL 8.0+)
SELECT * FROM sys.schema_index_statistics
WHERE table_name = 'orders';
-- Unused indexes
SELECT * FROM sys.schema_unused_indexes;
Prevention is better than cure
After this incident, we implemented several preventive measures:
- Query analysis in CI - EXPLAIN runs automatically for new queries
- Slow query alerts - Immediate notification for queries over 5 seconds (see the digest query below)
- Type casting standards - Explicit casting in all database interactions
- Index review sessions - Monthly review of query patterns vs. existing indexes
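For the slow query alerts, the slow log is the source of truth, but performance_schema's statement digests are handy for pulling the worst offenders on demand; a sketch, assuming the performance schema is enabled (it is by default on modern MySQL):
-- Top 10 statement patterns by average execution time (timer columns are in picoseconds)
SELECT digest_text,
count_star AS executions,
avg_timer_wait / 1e12 AS avg_seconds
FROM performance_schema.events_statements_summary_by_digest
ORDER BY avg_timer_wait DESC
LIMIT 10;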
The 47-second query was a painful lesson, but it led to a more robust approach to database performance across the entire application.
Struggling with slow database queries? I've optimized databases handling billions of rows. Let's talk.