Loading MyISAM Tables Into Memory
Page last updated on 2011 / 16 / 08
Adding indexes to MyISAM tables usually helps (and sometimes hinders) performance by allowing MySQL to sift through smaller amounts of data. Whether or not indexes exist, data typically still has to be read from storage, usually a hard disk drive. Disks are the main bottleneck in database applications today, though faster solid state drives are easing this problem.
For regularly accessed indexes, consider the LOAD INDEX INTO CACHE statement (MySQL Manual), which loads the indexes of a particular table from disk into memory. Although the statement's syntax lets you name particular indexes of a table, MySQL currently preloads all of the table's indexes into memory regardless.
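A minimal sketch of issuing the statement from PHP, assuming the mysqli extension and an already-open connection (the scripts later in this article use the older mysql_* functions instead). LOAD INDEX INTO CACHE returns one row per table with Table, Op, Msg_type and Msg_text columns, so the sketch treats the preload as successful only if every row reports OK. The helper names `preload_index()` and `all_preloads_ok()` are hypothetical.

```php
<?php
// Hypothetical helper: true only if every row returned by
// LOAD INDEX INTO CACHE reports Msg_text = 'OK'.
function all_preloads_ok(array $rows): bool
{
    if (count($rows) === 0) {
        return false; // no result rows means nothing was confirmed
    }
    foreach ($rows as $row) {
        if (($row['Msg_text'] ?? '') !== 'OK') {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: preload every index of a plain (unqualified) table name
// into the default key cache. Assumes $db is an open mysqli connection.
function preload_index(mysqli $db, string $table): bool
{
    // Quote the identifier with backticks, doubling any embedded backtick
    $result = $db->query('LOAD INDEX INTO CACHE `' . str_replace('`', '``', $table) . '`');
    if (!$result instanceof mysqli_result) {
        return false; // query failed (or mysqli error reporting threw instead)
    }
    $rows = $result->fetch_all(MYSQLI_ASSOC);
    $result->free();
    return all_preloads_ok($rows);
}
```

Usage would look like `preload_index($db, 'md5values16')`; splitting out `all_preloads_ok()` keeps the result check testable without a server.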
This strategy makes use of the key buffer (MySQL Manual), which keeps the most frequently accessed index blocks in memory. Normally the key buffer fills up gradually as queries touch the index; LOAD INDEX INTO CACHE preloads the whole index, so lookups are fast from the start. For the preload to be effective, make sure the key_buffer_size variable in your my.cnf file is large enough to hold the index.
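One way to check the "large enough" condition is to compare the table's index size with key_buffer_size. For MyISAM tables, the Index_length column of `SHOW TABLE STATUS LIKE 'md5values16'` gives the size of the index file in bytes. The sketch below assumes you have already read that number; the function names and the 10% headroom are illustrative assumptions, not MySQL recommendations.

```php
<?php
// Hypothetical helper: does an index of $indexLength bytes fit into a key
// buffer of $keyBufferSize bytes while leaving $headroom (0.10 = 10%) spare?
function index_fits(int $indexLength, int $keyBufferSize, float $headroom = 0.10): bool
{
    return $indexLength * (1 + $headroom) <= $keyBufferSize;
}

// Hypothetical helper: smallest key_buffer_size, rounded up to a whole MiB,
// that holds the index plus the assumed headroom.
function suggested_key_buffer_size(int $indexLength, float $headroom = 0.10): int
{
    $mib = 1024 * 1024;
    return (int) (ceil($indexLength * (1 + $headroom) / $mib) * $mib);
}

// Example with a hypothetical Index_length of 20,000,000 bytes
echo suggested_key_buffer_size(20000000), "\n";
```

Under this rule of thumb, a hypothetical 20,000,000-byte index would need a key buffer of at least 21MB (22020096 bytes), which would fit comfortably within a 64MB buffer.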
For an example of this in action, see the storing MD5 values article, which puts the index into memory; an adaptation of it is below so you can try it for yourself.
For this benchmark, the single table `md5values16` is sufficient:
```sql
CREATE TABLE IF NOT EXISTS `md5values16` (
  `hash` BINARY(16) NOT NULL,
  UNIQUE KEY `hash` (`hash`)
) ENGINE=MyISAM;
```
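BINARY(16) is enough because an MD5 digest is 128 bits (16 bytes). MySQL's MD5() returns the digest as a 32-character hex string, and UNHEX() packs that back into 16 bytes. PHP's md5() with its second argument set to true returns the same raw 16 bytes, so the relationship can be checked like this (the helper name is hypothetical):

```php
<?php
// Hypothetical helper: the 16 raw bytes that UNHEX(MD5('$value')) stores in MySQL.
function md5_binary16(string $value): string
{
    return md5($value, true); // raw binary digest instead of 32 hex characters
}

$raw = md5_binary16('1');
echo strlen($raw), ' bytes, hex ', bin2hex($raw), "\n";
// prints: 16 bytes, hex c4ca4238a0b923820dcc509a6f75849b
```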
The following script populates the table with 1,000,000 values. It should only take a few seconds to run.
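```php
// Fill the table with 1,000,000 values, 10,000 rows per INSERT
// Legacy mysql_* API (removed in PHP 7.0); assumes mysql_connect() and mysql_select_db() were already called
for ($i = 1; $i < 1000000; $i += 10000) {
    echo $i, ' ';
    $array = array();
    for ($j = $i; $j < $i + 10000; $j++) {
        $array[] = '(UNHEX(MD5(\'' . $j . '\')))'; // append one row tuple
    }
    mysql_query('INSERT INTO md5values16 VALUES ' . implode(',', $array)) or die(mysql_error());
}
```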
Now for the benchmark script. Note that with MyISAM tables the operating system's file cache also plays a role, which can sometimes lead to unexpected results. I populated the table immediately after restarting the MySQL server, so that the data had not yet been 'touched'.
```php
// Benchmark tables
// Legacy mysql_* API (removed in PHP 7.0); assumes an open connection to the database holding md5values16
function microtime_float()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

function perform_benchmark()
{
    for ($pass = 1; $pass <= 5; $pass++) {
        $time_start = microtime_float();
        // 50,000 random lookups per pass; only the first 100,000 of the 1,000,000 keys are probed
        for ($i = 0; $i < 50000; $i++) {
            $rand = rand(1, 100000);
            mysql_query('SELECT SQL_NO_CACHE hash FROM md5values16 WHERE hash = UNHEX(MD5(\'' . $rand . '\'))');
        }
        echo "Pass $pass took\t" . (microtime_float() - $time_start) . " seconds\n";
    }
}

mysql_query('SET @@global.key_buffer_size = 32768;') or die(mysql_error());
echo "Performing 5 passes with tiny key buffer\n";
perform_benchmark();

mysql_query('SET @@global.key_buffer_size = 8388608;') or die(mysql_error());
echo "Performing 5 passes with 8MB key buffer\n";
perform_benchmark();

// MySQL ignores a size of 0 for the default key cache, but resizing it to 64MB
// rebuilds the cache empty before LOAD INDEX INTO CACHE fills it
mysql_query('SET @@global.key_buffer_size = 0;') or die(mysql_error());
mysql_query('SET @@global.key_buffer_size = 67108864;') or die(mysql_error());
mysql_query('LOAD INDEX INTO CACHE md5values16;') or die(mysql_error());
echo "Performing 5 passes with whole index in key buffer (64MB)\n";
perform_benchmark();
```
The script produced the following output:

```
Performing 5 passes with tiny key buffer
Pass 1 took 9.7664041519165 seconds
Pass 2 took 9.0273330211639 seconds
Pass 3 took 9.7535579204559 seconds
Pass 4 took 9.2537310123444 seconds
Pass 5 took 9.0727047920227 seconds
Performing 5 passes with 8MB key buffer
Pass 1 took 9.414717912674 seconds
Pass 2 took 8.8966379165649 seconds
Pass 3 took 8.0092940330505 seconds
Pass 4 took 7.690486907959 seconds
Pass 5 took 7.0581459999084 seconds
Performing 5 passes with whole index in key buffer (64MB)
Pass 1 took 5.8024401664734 seconds
Pass 2 took 5.6130590438843 seconds
Pass 3 took 5.5719788074493 seconds
Pass 4 took 5.6170449256897 seconds
Pass 5 took 5.328222990036 seconds
```
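To put the three configurations side by side, here is a small PHP calculation of the mean pass time and the implied time per query (each pass runs 50,000 lookups), using the pass times reported in the output. The `mean()` helper is hypothetical; the figures are copied from the results.

```php
<?php
// Pass times (seconds) copied from the benchmark output.
$results = [
    'tiny key buffer'    => [9.7664041519165, 9.0273330211639, 9.7535579204559, 9.2537310123444, 9.0727047920227],
    '8MB key buffer'     => [9.414717912674, 8.8966379165649, 8.0092940330505, 7.690486907959, 7.0581459999084],
    'whole index (64MB)' => [5.8024401664734, 5.6130590438843, 5.5719788074493, 5.6170449256897, 5.328222990036],
];
$queriesPerPass = 50000;

// Hypothetical helper: arithmetic mean of the pass times.
function mean(array $values): float
{
    return array_sum($values) / count($values);
}

foreach ($results as $label => $passes) {
    $avg = mean($passes);
    printf("%-20s mean %.2f s, about %.0f microseconds per query\n",
        $label, $avg, $avg / $queriesPerPass * 1e6);
}
```

This works out to mean pass times of about 9.37, 8.21 and 5.59 seconds (roughly 187, 164 and 112 microseconds per query), so preloading the whole index cut the average pass time by around 40% compared with the tiny key buffer. Each per-query figure still includes the MD5 conversion and the round trip to the server.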
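This is perhaps not the best example, as a fair share of every pass is spent on work other than the index lookup itself, such as converting the MD5 values and sending each query to the server. Even so, the benefit of reading the index from memory rather than from disk is clear: with the whole index preloaded, every pass took between 5.3 and 5.8 seconds, compared with 9.0 to 9.8 seconds with the tiny key buffer. The 8MB runs also became steadily faster (from 9.4 to 7.1 seconds) as the key buffer warmed up.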
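Timings only show the key buffer's effect indirectly. MySQL also exposes counters for it through `SHOW GLOBAL STATUS LIKE 'Key_read%'`: Key_read_requests counts requests to read a key block from the cache, and Key_reads counts physical reads of key blocks from disk into the cache (which, as noted above, may still be served by the operating system's cache). The MySQL manual suggests the Key_reads/Key_read_requests ratio should normally stay below 0.01. The sketch below computes that ratio between two snapshots taken before and after a pass; the function name and the sample figures are illustrative, not measured results.

```php
<?php
// Hypothetical helper: fraction of key block read requests that missed the
// key buffer, using deltas of the Key_reads / Key_read_requests counters
// taken from SHOW GLOBAL STATUS before and after a benchmark pass.
function key_cache_miss_rate(array $before, array $after): float
{
    $requests = $after['Key_read_requests'] - $before['Key_read_requests'];
    $reads = $after['Key_reads'] - $before['Key_reads'];
    return $requests > 0 ? $reads / $requests : 0.0; // no requests means no misses
}

// Illustrative numbers only, not measured results.
$before = ['Key_read_requests' => 1000, 'Key_reads' => 400];
$after  = ['Key_read_requests' => 151000, 'Key_reads' => 30400];
printf("miss rate: %.2f\n", key_cache_miss_rate($before, $after));
// prints: miss rate: 0.20
```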