This article is also available as a TechRepublic download, which includes the code listings in a more manageable text file format.
Any database programmer will tell you that in high-traffic database-driven applications, a single badly-designed SQL query can significantly impact the overall performance of your application. Not only does such a query consume more database time than it ideally should, but it can have an exponential effect on the performance of other application components.
Optimizing query performance is as much a black art as a science, as heavily dependent on the developer’s intuition as on hard statistical performance data. Fortunately, databases likes MySQLcome with some tools to aid the process, and this article discusses three of them briefly: using indexes, analyzing queries with EXPLAIN, and adjusting MySQL’s internal configuration.
#1: Using indexes
MySQL allows you to index database tables, making it possible to quickly seek to records without performing a full table scan first and thus significantly speeding up query execution. You can have up to 16 indexes per table, and MySQL also supports multi-column indexes and full-text search indexes.
Adding an index to a table is as simple as calling the CREATE INDEX command and specifying the field(s) to index. Listing A shows you an example:
Listing A
mysql> CREATE INDEX idx_username ON users(username);
Query OK, 1 row affected (0.15 sec)
Records: 1 Duplicates: 0 Warnings: 0
Here, indexing the username field of the users table ensures that SELECT queries which reference this field in their WHERE or HAVING clause will run a little faster than in the pre-indexed state. You can check that the index was created (Listing B) with the SHOW INDEXcommand:
Listing B
mysql> SHOW INDEX FROM users;
————–+————-+———–+————-+———-+——–+——+————+———+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
————–+————-+———–+————-+———-+——–+——+————+———+
| users | 1 | idx_username | 1 | username | A | NULL | NULL | NULL | YES | BTREE | |
————–+————-+———–+————-+———-+——–+——+————+———+
1 row in set (0.00 sec)
It’s important to note that indexes are a double-edged sword. Indexing every field of a table is usually unnecessary, and is quite likely to slow things down significantly when inserting or updating data because of the additional work MySQL has to do to rebuild the index each time. On the other hand, avoiding indexes altogether isn’t such a great idea either, because while this will speed up INSERTs, it will cause SELECT operations to slow down. There is thus always a trade-off to be made, and it’s wise to consider what the primary function of the table will be (data retrieval or data edit) when designing the indexing system.
#2: Optimizing query performance
When analyzing query performance, it’s also useful to consider the EXPLAIN keyword. This keyword, when placed in front of a SELECT query, describes how MySQL intends to execute the query and the number of rows it will need to process to successfully deliver a result set. To illustrate, consider the following simple example (Listing C):
Listing C
mysql> EXPLAIN SELECT city.name, city.district FROM city, country WHERE city.countrycode = country.code AND country.code = ‘IND’;
+—-+————-+———+——-+—————+———+———+——-+——+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+———+——-+—————+———+———+——-+——+————-+
| 1 | SIMPLE | country | const | PRIMARY | PRIMARY | 3 | const | 1 | Using index |
| 1 | SIMPLE | city | ALL | NULL | NULL | NULL | NULL | 4079 | Using where |
+—-+————-+———+——-+—————+———+———+——-+——+————-+
2 rows in set (0.00 sec)
Here, the query is structured as a join between two tables and the EXPLAIN keyword describes how MySQL will process the join. It should be clear the current design will require MySQL to process only one record in the country table (which is indexed) but all 4079 records in the citytable (which isn’t). This then suggests scope for improvement using other optimization tricks – for example, adding an index to the city table as follows (Listing D):
Listing D
mysql> CREATE INDEX idx_ccode ON city(countrycode);
Query OK, 4079 rows affected (0.15 sec)
Records: 4079 Duplicates: 0 Warnings: 0
And now, when you re-run the query with EXPLAIN, you’ll see a noticeable improvement (Listing E):
Listing E
mysql> EXPLAIN SELECT city.name, city.district FROM city, country WHERE city.countrycode = country.code AND country.code = ‘IND’;
+—-+————-+———+——-+—————+———–+———+——-+——+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+———+——-+—————+———–+———+——-+——+————-+
| 1 | SIMPLE | country | const | PRIMARY | PRIMARY | 3 | const | 1 | Using index |
| 1 | SIMPLE | city | ref | idx_ccode | idx_ccode | 3 | const | 333 | Using where |
+—-+————-+———+——-+—————+———–+———+——-+——+————-+
2 rows in set (0.01 sec)
As this illustrates, MySQL now only needs to scan 333 records in the city table to produce a result set — a reduction of almost 90 percent! Naturally, this translates into faster query execution time and more efficient usage of database resources.
#3: Adjusting internal variables
MySQL is so open that it’s fairly easy to further fine-tune its default settings to obtain greater performance and stability. Some of the key variables that should be optimized are listed below.
- Altering Index Buffer Size (key_buffer)
This variable controls the size of the buffer used when handling table indices (both read and write operations). The MySQL manual recommends that this variable be increased "to as much as you can afford" to ensure you get the best performance on indexed tables, and recommends a value equivalent to about 25 percent of the total system memory. This is one of the more important MySQL configuration variables and if you’re interested in optimizing and improving performance, trying different values for the key_buffer_size variable is a good place to start. - Altering Table Buffer Size (read_buffer_size)
When a query requires a table to be scanned sequentially, MySQL allocates a memory buffer to this query. The read_buffer_size variable controls the size of this buffer. If you find that sequential scans are proceeding slowly, you can improve performance by increasing this value, and hence the size of the memory buffer. - Setting The Number Of Maximum Open Tables (table_cache)
This variable controls the maximum number of tables MySQL can have open at any one time, and thus controls the server’s ability to respond to incoming requests. This variable is closely related to the max_connections variables — increasing this value allows MySQL to keep a larger number of tables open, just as increasing max_connections increases the number of allowed connections. Consider altering this value if you have a high-volume server which receives queries on multiple different databases and tables. - Deciding A Time Limit For Long Queries (long_query_time)
MySQL comes with a so-called "slow query log", which automatically logs all queries that do not end within a particular time limit. This log is useful to track inefficient or misbehaving queries, and to find targets for optimization algorithms. The long_query_time variable controls this maximum time limit, in seconds.
The previous discussion should give you some insight into three tools you can use to analyze and optimize your SQL queries, and help you squeeze better performance out of your application. Go on and try them out — and happy optimizing!
http://www.techrepublic.com/article/three-easy-ways-to-optimize-your-mysql-queries/6137168