SQL Performance Tuning: From Slow Queries to Index Decisions

Learn how to analyze slow SQL queries, read execution plans, and design effective indexes. Includes a full worked example and common pitfalls.

Introduction

Slow SQL queries are a fact of life in production. A query that takes 93 ms today might suddenly degrade to 15 seconds tomorrow—not because the SQL changed, but because data distribution shifted or the optimizer chose a bad plan. The root cause is almost always the same: the database is reading far more rows than necessary.

This article walks through a complete performance tuning workflow: identifying a slow query, reading its execution plan, diagnosing the missing index, designing the right composite index, and verifying the improvement. You'll learn the principles that apply to any relational database, with concrete examples you can adapt to your own systems.

database server rack with blinking LEDs

Why Queries Go Slow

A query is slow when the database must scan a large portion of a table to return a small result set. The classic symptom: the application requests 20 rows, but the database reads 1.2 million rows to find them. This mismatch happens when:

No suitable index exists for the WHERE and ORDER BY clauses.
The optimizer chooses a full table scan because it estimates (incorrectly) that scanning is cheaper than using an index.
Data skew causes the optimizer's row count estimates to be wildly off.

In the first case, the fix is to add an index. In the second, you may need to update statistics or hint the query. In the third, you might need to rewrite the query or use a different join strategy.

The Execution Plan: Your First Diagnostic Tool

Before adding any index, you must understand what the database is actually doing. The execution plan shows the access methods, join order, and row estimates for each step.

Key Columns in EXPLAIN (MySQL)

Column	What to Look For
`type`	Aim for `const`, `eq_ref`, `ref`, or `range`. Avoid `index` and `ALL` (full table scan).
`key`	Which index is used. `NULL` means no index.
`rows`	Estimated rows scanned. Lower is better.
`Extra`	`Using filesort` or `Using temporary` are red flags. `Using index` (covering index) is ideal.

Example Plan (PostgreSQL)

EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM benefit_cert_task 
WHERE region_code = '330101' 
  AND biz_status = 'pending' 
  AND risk_level = 'high' 
  AND insurance_type = 'pension' 
  AND cert_deadline BETWEEN '2026-06-01' AND '2026-06-08' 
ORDER BY cert_deadline ASC, updated_at DESC 
LIMIT 20;

Output (before index):

Limit  (cost=0.00..12345.67 rows=20 width=200)
  ->  Gather Merge  (cost=0.00..12345.67 rows=20 width=200)
        Workers Planned: 2
        ->  Sort  (cost=... rows=... width=...)
              Sort Key: cert_deadline, updated_at DESC
              ->  Parallel Seq Scan on benefit_cert_task  (cost=0.00..12000.00 rows=1200000 width=200)
                    Filter: ((region_code = '330101') AND ...)
                    Rows Removed by Filter: 399333

Notice the Parallel Seq Scan on a 1.2M-row table, and the Rows Removed by Filter—the database scanned nearly 400K rows just to discard them. The execution time was 93 ms.

Index Design Principles

A well-designed index can turn a full table scan into a handful of index lookups. Follow these rules:

Leftmost prefix: For a composite index (a, b, c), queries must reference a first to use the index.
Equality before range: Place columns with = conditions before columns with range conditions (>, <, BETWEEN, LIKE 'abc%').
Cover the sort: Include ORDER BY columns at the end of the index to avoid an extra sort (Using filesort).
Cover the query: Use INCLUDE (PostgreSQL) or include all selected columns in a covering index to avoid table lookups.

Worked Example: From 93ms to 8ms

Scenario

A benefits certification system has a task table with 1.2 million rows. The application needs the top 20 high-risk pending tasks for a specific region and insurance type, sorted by deadline.

Before Optimization

Existing index: only a primary key on task_id.

Execution time: 93 ms (parallel seq scan).

Diagnosis

The execution plan shows a full table scan. The WHERE clause has four equality conditions (region_code, biz_status, risk_level, insurance_type) and one range condition (cert_deadline). The ORDER BY uses cert_deadline ASC, updated_at DESC.

Recommended Index

CREATE INDEX idx_filter_sort ON benefit_cert_task 
  (region_code, biz_status, risk_level, insurance_type, cert_deadline, updated_at DESC)
INCLUDE (other_returned_columns);

Key points:

Equality columns come first.
Range column (cert_deadline) follows.
Sort column (updated_at) is included with DESC to match the query.
INCLUDE columns avoid a separate table lookup.

After Optimization

Execution plan:

Limit  (cost=0.00..10.00 rows=20 width=200)
  ->  Index Scan using idx_filter_sort on benefit_cert_task  (cost=0.00..10.00 rows=20 width=200)
        Index Cond: ((region_code = '330101') AND ...)

Execution time: 8 ms (index scan).

The improvement is 11.4×, and the buffer reads dropped from 27,112 to 2,031, confirming the access path changed, not just caching.

Common Pitfalls

Adding too many single-column indexes: A composite index is almost always better than multiple single-column indexes. The database can only use one index per table per query (except index merge, which is rare).
Ignoring data skew: If one value (e.g., a power user) has far more rows than others, the optimizer may choose a full scan even with an index. Consider partitioning or query rewriting.
Forgetting to update statistics: After adding an index, run ANALYZE (PostgreSQL) or OPTIMIZE TABLE (MySQL) to give the optimizer accurate information.
Using functions on indexed columns: WHERE YEAR(create_time) = 2026 will never use an index on create_time. Rewrite as WHERE create_time >= '2026-01-01' AND create_time < '2027-01-01'.
Not verifying with production data: Test the index with realistic data volumes and distributions. A 1,000-row test table will never reveal a bad plan.

FAQ

How do I find slow queries in production?

Enable the slow query log (slow_query_log in MySQL, log_min_duration_statement in PostgreSQL). Set a threshold (e.g., 1 second) and periodically review the log. Tools like pt-query-digest (Percona Toolkit) can summarize repeated slow queries.

Should I always add an index when a query is slow?

No. First, check if the query can be rewritten (e.g., avoid functions, use EXISTS instead of IN). Second, verify that the index will be used by checking the execution plan. Third, consider the write overhead—each new index slows down INSERT/UPDATE/DELETE. For read-heavy tables, indexes are cheap; for write-heavy tables, be selective.

What is a covering index?

A covering index contains all columns needed by the query, so the database never has to look up the actual table row. In MySQL, you see Using index in Extra. In PostgreSQL, use INCLUDE to add non-key columns without bloating the B-tree.

How do I handle `Using filesort`?

Using filesort means MySQL is sorting rows in memory or on disk after fetching them. To eliminate it, add the sort columns to the index in the same order as the ORDER BY clause. If the sort direction differs (ASC vs DESC), some databases (PostgreSQL, MySQL 8.0+) support mixed-direction indexes.

What if the optimizer still chooses a bad plan after adding an index?

Update table statistics (ANALYZE). If that doesn't help, you can force an index using hints like FORCE INDEX (MySQL) or SET enable_seqscan = off (PostgreSQL) for testing. The real fix is often to rewrite the query or adjust the index to better match the query's predicates.

Try our SQL formatter to keep your queries readable and maintainable.

页面加载失败