Tip: Storing MD5 Values

Page last updated on 2011 / 16 / 08

Use 16 bytes, not 32

A common occurrence I have noticed in MySQL apps is that MD5 values are stored as 32 byte values rather than 16. Just to 're-hash', an MD5 value is a 16 byte hexadecimal value, typically used as a unique fixed-length signature of a string, useful for identifying unique strings or one-way encryption of passwords.

When represented as a non-binary string an MD5 value is 32 bytes and often stored as a fixed-length 32 byte CHAR column in a database. Each of the 32 characters range from 0 to F, essentially 16 values. These characters can be 'crunched' from 2 bytes to 1 by merging 2 hex characters into one binary character, reducing the hash size to 16 bytes. e.g. 16x16 = 256 values = 1 byte.

MySQL has some handy functions to convert between binary and text representations of an MD5, namely HEX() and UNHEX() which consecutively turn binary to hexadecimal values and vice versa.

Essentially, it is more advantageous to use a BINARY (16) column than a CHAR (32) column for storing MD5 values.

Why, you ask? Because the field size is smaller, searches upon it will be quicker, as shown in this benchmark below.

Testing 32 byte CHAR versus 16 byte BINARY column

Step 1: Create 2 tables, one storing 32 byte MD5 hex values, the other storing 16 byte binary values.

  1. CREATE TABLE IF NOT EXISTS `md5values16`
  2. (
  3. `hash` BINARY(16) NOT NULL,
  4. UNIQUE KEY `hash` (`hash`)
  5. )
  6. ENGINE=myisam;
  7.  
  8. CREATE TABLE IF NOT EXISTS `md5values32`
  9. (
  10. `hash` CHAR(32) NOT NULL,
  11. UNIQUE KEY `hash` (`hash`)
  12. )
  13. ENGINE=myisam;
  14.  

Step 2: Populate the tables with data

  1. <?php
  2.  
  3. // Filling the tables with 1 million values
  4.  
  5. mysql_connect('localhost','root','root');
  6. mysql_select_db('test');
  7.  
  8. for($i = 1;$i < 1000000;$i+= 10000)
  9. {
  10. echo $i,' ';
  11. $array = array(array(),array());
  12. for($j = $i;$j < $i + 10000;$j++)
  13. {
  14. $array[0][] = '(MD5(\''.$j.'\'))';
  15. $array[1][] = '(UNHEX(MD5(\''.$j.'\')))';
  16. }
  17. mysql_query('INSERT INTO md5values32 VALUES '.implode(',',$array[0])) or die('2. '.mysql_error());
  18. mysql_query('INSERT INTO md5values16 VALUES '.implode(',',$array[1])) or die('1. '.mysql_error());
  19. }
  20.  
  21. ?>
  22.  

Step 3: Add a UNIQUE index to the tables and load the indexes into memory

  1. ALTER TABLE `md5values16` ADD UNIQUE (`hash`);
  2. ALTER TABLE `md5values32` ADD UNIQUE (`hash`);
  3. LOAD INDEX INTO CACHE `md5values16`;
  4. LOAD INDEX INTO CACHE `md5values32`;
  5.  

Step 4: Benchmark the tables for comparison, using a random 100,000 queries

  1. <?php
  2.  
  3. // Benchmark tables
  4. mysql_connect('localhost','root','root');
  5. mysql_select_db('test');
  6.  
  7.  
  8. function microtime_float()
  9. {
  10. list($usec, $sec) = explode(" ", microtime());
  11. return ((float)$usec + (float)$sec);
  12. }
  13.  
  14. $time_start = microtime_float();
  15. for($i = 0;$i < 100000;$i++)
  16. {
  17. $rand = rand(1,1000000);
  18. mysql_query('SELECT hash FROM md5values32 WHERE hash = MD5(\''.$i.'\')');
  19. }
  20. echo (microtime_float() - $time_start)." seconds for 16 byte column\n";
  21.  
  22. $time_start = microtime_float();
  23. for($i = 0;$i < 100000;$i++)
  24. {
  25. $rand = rand(1,1000000);
  26. mysql_query('SELECT hash FROM md5values16 WHERE hash = UNHEX(MD5(\''.$i.'\'))');
  27. }
  28. echo (microtime_float() - $time_start)." seconds for 32 byte column\n";
  29.  
  30. ?>
  31.  

The Result

1st Pass
12.945791006088 seconds for 16 byte column
19.781244039536 seconds for 32 byte column

2nd Pass
12.882713079453 seconds for 16 byte column
20.835686922073 seconds for 32 byte column

3rd Pass
12.871006011963 seconds for 16 byte column
20.780813932419 seconds for 32 byte column


Previous Article
MySQL Partitioning and Its Uses
Next Article
Loading MyISAM Tables Into Memory




Tweet