Creating a Directory Tree with PHP & MySQL (Part 1)
Page last updated on 2011 / 04 / 09Hierarchy structures are commonplace for websites and relational databases, a popular example would be the classic "web directory", where items are stored in a directory tree and users click into the category structure to find items of interest to them.
A classic problem of directory design is how to store the relationship between categories. Having pairs of id/parentid in a table is a simple solution, and effective enough for directories with a shallow depth, but what about for larger structures, such as the DMOZ directory dump?

Some reading material on SQL trees may be of interest to you if you're considering building a large hierarchy.
On the PHP side of things, Kevin van Zonneveld has created a nice little explodeTree function to represent data in a multi-dimensional array.
To prevent this tutorial becoming huge, I'd recommend reading the first link in particular to get to grips with the whys and why nots of using tree structures in SQL. Some more bedtime reading about nested sets and data hierarchy can be found at code project.
An Example of Tree Structures in Action
The following script will generate a tree structure and store it in MySQL. It's quite lengthy, but it should give you a good visualisation of how to take advantage of tree structures and is easy enough to adapt for your own needs.
This first part of a 2 part post gives you the script enabling you to create and store directory structures. The notes at the foot of the page should guide you through the script in the event you run into problems or are interesting in tinkering with it for your own requirements. The 2nd post gives a basic GUI and admin functions for viewing and altering the tree structure.
If I understand the lingo correctly (it can get a little heavy) this script is a 'preorder tree traversal' which can bring about a new way of looking at and extrapolating your data.
- This method avoids recursion, you can fetch breadcrumbs for a category thats 14 levels, 20 levels or even 50 levels deep using one query. In this particular script both the parent and child categories are fetched in one query.
- All subcategories are methodically encapsulated within their parent nodes, and each node can give you a calculation of how many subcategories there are without any further querying
- Generally, for static and/or large tree structures, this structuring of your categories is advantageous for speed and ease of querying.
- The cost is that updates to the tree structure can be expensive, i.e. removing or adding (and sometimes editing) a node in the middle of the tree, which requires altering all records to the 'right side' of the tree to avoid having gaps or collisions in the tree structure. In general, you want to avoid having to update a large tree often, or save the updates for one particular moment where the whole tree can be rebuilt with its updates.
Save and run the following script to generate some example data. It is assumed you have MySQL installed and already have created a database called 'test'. Download this 300K list of categories (1.7MB) as sample data to work with
buildtree.php
<?php // buildtree.php mysql_connect('localhost','root','root') or die('Cant connect to MySQL'); mysql_select_db('test') or die('Cant connect to MySQL'); /* Using the dmozregional.txt file on innvo.com, otherwise you can pass a different file or even an array Contains around 314,000 categories Will take about 40 seconds to process, ensure you have enough memory (around 200MB in this case) and roughly 10x the size of your input file in general cases */ $tree = new generate_tree(fopen('dmozregional.txt','r')); $tree->to_mysql_data(); class generate_tree { var $cats = array(); var $thiscat = array(); var $depths = array(); var $lftrgt = array(); var $depth = 0; var $inc = 0; var $catid = 0; var $fp1,$fp2; /* Run generate_tree once if your dataset is small or memory is not an issue. */ public function generate_tree(&$linearcats) { $this->depth = 0; $this->cats = array(); echo "Step 1: Gathering Data: ".date('H:i:s')."\n"; if(is_array($linearcats)) // Run through array { foreach($linearcats as $cat) { $this->cats[$cat] = array(); // Adding to 2 dimension list of categories with $cat as the key array_shift($linearcats); } } elseif(is_resource($linearcats)) { while(!feof($linearcats)) { if(!trim($cat = trim(fgets($linearcats)))) continue; $this->cats[$cat] = array(); // Adding to 2 dimension list of categories with $cat as the key } } if(!is_resource($this->fp1)) // 1st Pass, open files { $this->fp1 = fopen('/tmp/tree.txt','w'); $this->fp2 = fopen('/tmp/top.txt','w'); } echo "Step 2: Tree Structure: ".date('H:i:s')."\n"; $this->cats = $this->explodeTree($this->cats); echo "Step 3: Pre-Order Tree Traversal: ".date('H:i:s')."\n"; $this->mptt($this->cats); } /******************************** Function explodeTree with thanks to Kevin van Zonneveld info: http://kevin.vanzonneveld.net/techblog/article/convert_anything_to_tree_structures_in_php/ Altered from original version but serves largely the same purpose Example input data is same/similar as example data in the above URL ********************************/ public function explodeTree($array) { if(!is_array($array) || !count($array)) return array(); $returnArr = array(); foreach ($array as $key => $val) { // Get parent parents and the current leaf $parents = preg_split("'/'",$key,-1,PREG_SPLIT_NO_EMPTY); $leaf = array_pop($parents); // Build parent structure // Might be slow for really deep and large structures $parentArr = &$returnArr; foreach($parents as $parent) { if(!isset($parentArr[$parent])) $parentArr[$parent] = array(); elseif(!is_array($parentArr[$parent])) $parentArr[$parent] = array(); $parentArr = &$parentArr[$parent]; } // Add the final part to the structure if(empty($parentArr[$leaf])) $parentArr[$leaf] = $val; elseif(is_array($parentArr[$leaf])) $parentArr[$leaf][] = $val; } return $returnArr; } /******************************** Function mptt (modified pre-order tree traversal) Used to recursively walk through the array and provide lft and rgt values (and category depth) ********************************/ public function mptt(&$cats) { foreach($cats as $catname => $array) { $this->depths[$this->depth] = 0; $this->thiscat[$this->depth] = $catname; // Marking this depth of categories as this category $imp = implode('/',$this->thiscat); // Full category path $this->lftrgt[$imp] = array(++$this->inc,++$this->catid); // Assign lft if(count($array)) { ++$this->depth; // Deeper $this->mptt($array); // Reiterate } fwrite($this->fp1,$this->lftrgt[$imp][0]."\t".(++$this->inc)."\t".count($this->thiscat)."\t".$this->lftrgt[$imp][1]."\t".strtoupper(md5($imp))."\t".$catname."\n"); // $lft $rgt $depth $id $hash $name if($this->depth == 0) fwrite($this->fp2,$this->lftrgt[$imp][1]."\n"); unset($this->lftrgt[$imp]); } --$this->depth; // Shallower array_pop($this->thiscat); // Pop this category depth as we're moving up from here } /******************************** Function mysql_data Creates schema, deletes any old data and creates indexes if not already made. Deletes temporary file data that MySQL uses to load data into tables ********************************/ public function to_mysql_data() { echo "Step 4: MySQL Schema: ".date('H:i:s')."\n"; // Creating DB Schema and populating with data. Deleting any old data if present mysql_query('CREATE TABLE IF NOT EXISTS category_tree ( lft mediumint(8) unsigned NOT NULL, rgt mediumint(8) unsigned NOT NULL, depth tinyint(3) unsigned NOT NULL, id mediumint(8) unsigned NOT NULL, hash binary(16) NOT NULL, name varbinary(255) NOT NULL) ENGINE=InnoDB;') or die(mysql_error()); mysql_query('TRUNCATE TABLE category_tree') or die(mysql_error()); mysql_query('LOAD DATA INFILE \'/tmp/tree.txt\' INTO TABLE category_tree FIELDS TERMINATED BY \'\t\' (lft,rgt,depth,id,@hash,name) SET hash = UNHEX(@hash)') or die(mysql_error()); mysql_query('INSERT INTO category_tree (lft,rgt,depth,id) SELECT 0,MAX(rgt)+1,0,0 FROM category_tree') or die(mysql_error()); mysql_query('CREATE TABLE IF NOT EXISTS category_top (id MEDIUMINT(8) unsigned NOT NULL,PRIMARY KEY (id)) ENGINE=MyISAM') or die(mysql_error()); mysql_query('TRUNCATE TABLE category_top') or die(mysql_error()); mysql_query('LOAD DATA INFILE \'/tmp/top.txt\' INTO TABLE category_top FIELDS TERMINATED BY \'\t\'') or die(mysql_error()); // Delete temporary data unlink('/tmp/tree.txt'); unlink('/tmp/top.txt'); echo "Step 5: MySQL Indexes: ".date('H:i:s')."\n"; // Adding indexes for SELECT speed. Script will die appropriately here if you have already created the indexes mysql_query('ALTER TABLE category_tree ADD PRIMARY KEY (lft,rgt,depth);') or die('I1 '.mysql_error()); mysql_query('ALTER TABLE category_tree ADD UNIQUE (id);') or die('I2 '.mysql_error()); mysql_query('ALTER TABLE category_tree ADD UNIQUE (hash);') or die('I3 '.mysql_error()); } } ?>
Script Overview
- Lines 4-5: Connect to MySQL or terminate script
- Line 13: Generate the tree structure. You can call this more than once if you have a large dataset. One method would be to call generate_tree() for every top level category you have so that tree sizes are limited to them.
- Line 14: $tree->to_mysql_data() is called at the end to write data to MySQL, after all the following events have occurred
- Lines 28-61: Function generate_tree, invoked when the class is initiated.
- - Lines 33-49: Loads the data you pass into an array, with values situated in the array keys. You can pass an array to this function or a file resource. For files, ensure there is one record per line.
- - Lines 52-55: On first invocation, a couple of temporary files are created for storing data deriving from the tree structure
- - Line 58: Calls explodeTree() which converts the array to a multi-dimensional tree array
- - Line 60: mptt() to create the lft, rgt and depth values in order to use our tree in MySQL. Data is written to temporary files that are then used by MySQL to populate the tables.
Part 2 of this tutorial looks at the tree structure in action, and how to perform basic modifications of the tree.
Tweet