A Quick and Efficient URL Shortener Script, Using PHP & MySQL
Page last updated on 2011 / 04 / 09URL shorteners have sprung up everywhere on the web, partly (I assume) because they're quick to make although the primary driver has been social networking, ala Twitter and Facebook, which both use URL shorteners. Their aim is to reduce an ugly and long URL like www.innvo.com/?loadpage=TRUE&PHPSESSID=886bb73b3156b0aa24aac99d2de0b238 to something more palatable like innvo.com/xyz ... with emphasis on keeping the URL as short as possible.
Although in this example I am going to use a 6 character domain, pretty short you may think, there are some services that are only 4 characters long, which pretty much makes them champions of URL shortening... or does it?
In this post I will share code that creates a PHP and MySQL URL shortener, with attempted emphasis on scale.
The code requires available of URL rewriting such as [url=http://httpd.apache.org/docs/1.3/howto/htaccess.html].htaccess[/url], and a version of MySQL that supports partitioning. The former is quite important but doesn't necessarily have to be mod_rewrite and .htaccess on apache, the main thing is that requests to your domain get redirected to the script I have posted below.
Also, MySQL partitioning is not essential for this but may help in scaling for larger datasets.
Take note that this is a very simple URL shortener with no special bells and whistles attached.
OK, to start off with... let's create the .htaccess file.
Save the following as ".htaccess" in the root folder of your domain (the uppermost publicly visible web folder where your index/home page usually sits)
RewriteEngine On RewriteRule !^index.php /index.php [L]
And now for the SQL, execute this list of queries on your chosen MySQL database
CREATE TABLE IF NOT EXISTS `counter` ( `id` INT(10) UNSIGNED NOT NULL ) ENGINE=myisam; INSERT INTO `counter` (`id`) VALUES (0); CREATE TABLE IF NOT EXISTS `id_to_url` ( `id` INT(10) UNSIGNED NOT NULL, `url` TEXT NOT NULL, UNIQUE KEY `id` (`id`) ) ENGINE=myisam; /*!50100 PARTITION BY KEY (id) PARTITIONS 64 */; CREATE TABLE IF NOT EXISTS `urls_to_hash` ( `hash` BINARY(16) NOT NULL, `id` INT(10) UNSIGNED NOT NULL, UNIQUE KEY `hash` (`hash`) ) ENGINE=myisam; /*!50100 PARTITION BY KEY (`hash`) PARTITIONS 64 */;
Here is the PHP, you must save this as index.php in your root web folder as that is where the .htaccess file above is currently redirecting to.
<?php // List of characters we use to shorten URLs, 73 in total, all URL safe define('INDEX','0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$-_.+!*\'(),'); define('BASE',73); $rindex = array_flip(array_slice(preg_split("''",INDEX),1,-1)); // Connect to the MySQL database, ensure credentials are correct function dbconnect() { mysql_connect('localhost','username','password') or die('Invalid MySQL login details'); mysql_select_db('database') or die('Invalid database selected'); } // Convert a base 10 number to a base 73 number function dec10to73($num) { $out = ''; for($t = floor(log10($num) / log10(BASE)); $t >= 0; $t--) { $a = floor($num / pow(BASE,$t)); $out = $out.substr(INDEX,$a,1); $num = $num-($a*pow(BASE,$t)); } return $out; } // Convert a base 73 number to a base 10 number function dec73to10($dec73) { GLOBAL $rindex; $len = strlen($dec73); $out = $rindex[$dec73[$len-1]]; $k = 1; for($j = $len-2;$j >= 0;$j--) { if(!isset($rindex[$dec73[$j]])) { header('HTTP/1.1 400 Bad Request'); die('The requested URI does not conform to this URL shortener\'s syntax'); } $out = bcadd($out,bcmul($rindex[$dec73[$j]],bcpow(73,$k)),0); $k++; } return $out; } if(strlen($_SERVER['REQUEST_URI']) > 1) // Request for a shortened URL { dbconnect(); if(!$longurl = mysql_fetch_array(mysql_query('SELECT url FROM id_to_url WHERE id = '.dec73to10(substr($_SERVER['REQUEST_URI'],1))),MYSQL_ASSOC)) { header('HTTP/1.1 404 Not Found'); die('The shortened URL you have requested does not exist in our database'); } header('Location: '.$longurl['url']); exit(0); } elseif(isset($_POST['url']) && trim($_POST['url'])) // Request to add a URL to the database (or check if it exists), return short version for display { dbconnect(); ($_POST['url'] = stripslashes($_POST['url'])) && $md5 = md5($_POST['url'],TRUE); if(@!$p = parse_url($_POST['url']) || !isset($p['host']) || preg_match("'.?3hr.in$'",$p['host'])) echo 'Sorry, the URL you requested to shorten was not valid.'; else { mysql_query('LOCK TABLES urls_to_hash'); mysql_query('INSERT IGNORE urls_to_hash (hash,id) SELECT ''.mysql_real_escape_string($md5).'',id+1 FROM counter') or die(mysql_error()); if(mysql_affected_rows()) { mysql_query('INSERT IGNORE id_to_url (id,url) SELECT id+1,''.mysql_real_escape_string($_POST['url']).'' FROM counter') or die(mysql_error()); mysql_query('UPDATE counter SET id = id + 1') or die(mysql_error()); } mysql_query('UNLOCK TABLES urls_to_hash'); $shorturl = dec10to73(array_shift(mysql_fetch_array(mysql_query('SELECT id FROM urls_to_hash WHERE hash = ''.mysql_real_escape_string($md5).'''),MYSQL_ASSOC))); // Fetch numerical ID of URL and convert it from base 10 to 72 echo '<strong>Your shortened URL: <a href="http://3hr.in/'.$shorturl.'">http://3hr.in/'.$shorturl.'</a></strong>'; } } ?> <form method="post">URL: <input type="text" name="url" /><input type="submit" value="Shorten URL" /></form>
Give it a try! If you encounter an SQL error when attempting to create the tables, you may want to reduce the number of partitions from 64, otherwise you would have to have access to MySQL's server settings which is a slightly different topic and shouldn't be an issue here.
Here's the good and bad of the way this URL shortener works, starting with the good:
It's Small
It's less than 100 lines of code, so is simple and can be adapted easily.
No gimmicks
...that other URL shorteners have... that don't have much to do with getting a short URL
Shortest URLs in the world!
It's an empty database so the first ~5000 URL's will only have 1-2 character URLs. Popular URL shorteners already are into 4-5 characters + the domain
Scalability
The database should be very fast at retrieving URLs for several reasons:
A 16 byte unique hash is created for each URL by using MD5, and stored in the `urls_to_hash` table. The MD5 is the UNIQUE key of that table. Usually MD5 strings are stored as 32 byte strings, twice as long. Half the column size means it will work faster.
This MD5 is the field used to check if a URL already exists in the database, and if so, to fetch the associated ID. It is much faster than doing a plain text search of a URL, which are also variable in length.
The `id_to_url` table takes an ID and fetches the corresponding URL. Since the ID is the PRIMARY KEY, searches here are very quick.
Whenever a new URL is added, the `counter` table is used to get the next incremental value for the ID value to be inserted.
While INSERTs are going on, the table is locked so there is no race conditions.
Partitioning
I have added 64 partitions to the `urls_to_hash` table and the `id_to_url` table. Partitions are pretty much "tables within tables", the MySQL documentation goes into great detail about the advantages of using them.
In short, a typical query on either table will mean MySQL only has to search 1/64th of the table, and in our case, only 1/64th of the index.
So when there's over a 1,000,000 URL's in the URL shortener database, MySQL only has to sift through ~15,000 to find the right record.
Currently, up to 1024 partitions can be used per table in MySQL.
And the downsides....
No gimmicks
I've just included the most basic form possible to add URL's. If any other pages/css/javascript need to be added to the server, the .htaccess file will have to be changed as all requests are currently redirected to index.php (suggestion, put other files in a separate folder)
Limits
I have used an INT field in the `counter` table and therefore the subsequent `id` fields in the URL tables, meaning there is a limit of 2^32 URLs that can be stored before unique 4 byte numbers run out. The workaround is to have BIGINT columns instead, which can go up to 2^64 (8 bytes). If you find yourself in this situation, you have a very popular URL shortener, and more links than the best search engines of today have in their indexes.
There's lots of URL shorteners out there
True. On the bright side, new ones have the shortest URLs.
Comments welcome. There is a working (carbon copy) of this script situated on 3hr.in (6 characters)
Tweet