Query Optimization Challenge

StopwatchEvery few months at tilllate we play the query optimization game. At this game I use the slow query log to find out those queries the most load on the servers.

With the queries I found I then either: optimize the query or cache the results to avoid the query.

I prefer the former because caching means data duplication. Which is not very DRY.

Optimization impossible?

There’s one type of query that’s particularly tough to optimize. It’s the “Top-N” query. Example: On the homepage of tilllate we’d like to show the 10 most recent photo reports.

All photo reports are being stored in a table photoreports. Indexes properly set.

mysql> SELECT * FROM photoreports LIMIT 10000,5;
+-----+------------+---------------------+------------+
| id  | date       | name                | country_id |
+-----+------------+---------------------+------------+
| 3376| 2006-04-21 | Einweihungsapéro Wer| 1          |
| 1878| 2007-05-24 | Le Ho Fada          | 30         |
| 1879| 2007-05-25 | Le Ho Fada          | 30         |
| 1928| 2007-05-27 | All You Can Drink! V| 3          |
| 9803| 2004-02-21 | Go Tonight          | 3          |
+-----+------------+---------------------+------------+

5 rows in set (0.02 sec)

mysql> SHOW INDEXES FROM photoreports;
+------------+-------------+-------------+
| Key_name   | Column_name | Cardinality |
+------------+-------------+-------------+
| PRIMARY    | id          | 415230      |
| country_id | country_id  | 240         |
| date       | date        | 7414        |
+------------+-------------+-------------+
3 rows in set (0.00 sec)

To get the 10 most recent photoreports we use the following query:

SELECT date, name
FROM photoreports
WHERE country_id=3
ORDER BY date DESC
LIMIT 10

Explaining the select

But, oh boy, EXPLAIN SELECT shows this query is not efficient at all:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: photoreports
         type: ref
possible_keys: country_id
          key: country_id
      key_len: 5
          ref: const
         rows: 7551
        Extra: Using where; Using filesort

Using filesort. Disk access! Even though all indexes are being properly set. But since MySQL can only use one index per table it will use the one on the country_id and leave no index left for the ordering.

Is there a way to avoid the ugly filesort? – I don’t know. Any ideas are being appreciated.


Update: 11.11.2007

Oops. The solution is actually so simple I did not even consider it *blush*. Event though I wrote about it a year ago.

Using a combined Index the WHERE-clause is using the `country_id` part of the key, the ORDER BY-Clause is using the `date` part.

mysql> alter table photoreports add index (country_id, date);
Query OK, 415230 rows affected (9.92 sec)
Records: 415230  Duplicates: 0  Warnings: 0

mysql> EXPLAIN SELECT date, name FROM photoreports
WHERE country_id=3 ORDER BY date DESC LIMIT 10 G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: photoreports
         type: ref
possible_keys: country_id,country_id_2
          key: country_id_2
      key_len: 5
          ref: const
         rows: 7551
        Extra: Using where
1 row in set (0.01 sec)

I think I should call this blog entry: Never look too far for the solution. OR: Performance optimization is something I should leave up to the techies. The CTO’s job is to keep the techies motivated. 🙂

This entry was posted in Database. Bookmark the permalink.

3 Responses to Query Optimization Challenge

  1. Nitek says:

    Why not using a multi-column-index over country_id AND date?
    A photoreport-table doesn’t sound like its changing too often for an additional index..

  2. Martin says:

    Hi, did you try a multi-column index?

    alter table photoreports add index (country_id, date);

  3. Thanks guys… I just updated the entry… 🙂