XMLParser?
XMLParser was designed by me (Adam A Flynn) after spending a huge amount of time messing with PHP's XML extension because a client needed something that worked in both PHP 4 and PHP 5. The result of my initial tinkering was a piece of horribly hacked code which did the job of 1 line in SimpleXML. I decided that before embarking on another XML project, I would write a parser that could work like SimpleXML and work with both PHP 4 and PHP 5. This is the result.
This parser comes in 2 flavours. One is for PHP 4 and one is for PHP 5. Both flavours are accessed through the exact same interface, so you can write code to use this parsing system and, as long as you include the right flavour of the parser file (see version_compare() for how to figure out if the PHP version is pre or post 5.0), everything should work perfectly under both PHP versions.
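A minimal sketch of that version check. The file names parser_php4.php and parser_php5.php are assumptions made for illustration, and sketch_parser_file() is a hypothetical helper, not part of the parser; substitute whatever names your two parser files actually have.

```php
<?php
// Hypothetical file names; use whatever names your two parser files have.
function sketch_parser_file($phpVersion)
{
    // version_compare() with the '<' operator returns true when the
    // first version is lower than the second.
    if (version_compare($phpVersion, '5.0.0', '<')) {
        return 'parser_php4.php';
    }
    return 'parser_php5.php';
}

// PHP_VERSION holds the version of the running interpreter.
$parserFile = sketch_parser_file(PHP_VERSION);
// require_once $parserFile;  // uncomment once the file names match yours
echo $parserFile;
```

Keeping the check in one place means the rest of your code never needs to know which flavour was loaded.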
Using XMLParser
Because I want to draw the parallels between my parser and SimpleXML (since it was designed, more or less, to mimic SimpleXML but with PHP 4 functionality), I'm going to use the same series of examples and the same XML document that the SimpleXML documentation pages on php.net use.
Our example XML document (example.xml)
<?xml version='1.0' standalone='yes' ?>
<title>PHP: Behind the Parser</title>
So, this language. It's like, a programming language. Or is it a scripting language? All is revealed in this thrilling horror spoof of a documentary.
Getting Started
To get started with the parser, you first need to load the XML document you are working with. For our purposes, we're going to call the document example.xml. Like SimpleXML, the XMLParser constructor takes the actual XML document itself, not just the name of the file which contains it. This means that we need to make the file_get_contents() call before we initialise the XMLParser object.
Once we call XMLParser and fill it with some data, we need to tell it to work its magic and actually do some parsing. Doing that is as simple as calling the Parse method.
Setting up the parser
//Get the XML document loaded into a variable
$xml = file_get_contents('example.xml');
//Set up the parser object
$parser = new XMLParser($xml);
//Work the magic...
$parser->Parse();
The XML parser has an error-handling function which should trigger a PHP error if there are any issues parsing the XML document. This function is called trigger_error, and is a method of the XMLParser class. Feel free to change it to display errors however you would like; the arguments should be straightforward. Assuming, however, we didn't get any errors, we can press forward.
The object structure of XMLParser is really quite straightforward, but it takes some getting used to. The document root is contained in the document member of the parser. This means, in the above example, $parser->document would be the root tag, regardless of the tag's name. From there, each child tag encountered is assigned to an array named for the tag's name. So, $parser->document->movie[0] would be the way to access the first movie tag. $parser->document->movie is an array, not an XMLTag object. Therefore, in most cases, trying to access the first movie object through $parser->document->movie would be incorrect.
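The array-per-tag-name layout described above can be sketched with a small stand-in class. SketchTag and its addChild() method are hypothetical and written only for illustration; they are not the XMLTag class that ships with the parser, and the root name movies is an assumption.

```php
<?php
// Hypothetical stand-in for XMLTag; not the class shipped with the parser.
#[\AllowDynamicProperties]
class SketchTag
{
    var $tagName;
    var $tagData = '';

    function __construct($name, $data = '')
    {
        $this->tagName = $name;
        $this->tagData = $data;
    }

    // Append the child to an array member named after the child's tag.
    function addChild($child)
    {
        $name = $child->tagName;
        if (!isset($this->{$name})) {
            $this->{$name} = array();
        }
        $this->{$name}[] = $child;
    }
}

// Build a tiny tree: a root tag holding two <movie> tags.
$document = new SketchTag('movies');

$first = new SketchTag('movie');
$first->addChild(new SketchTag('plot', 'First plot'));
$document->addChild($first);

$second = new SketchTag('movie');
$second->addChild(new SketchTag('plot', 'Second plot'));
$document->addChild($second);

echo count($document->movie), "\n";                 // prints 2
echo $document->movie[0]->plot[0]->tagData, "\n";   // prints First plot
```

Because every tag name maps to an array, even a tag that occurs only once is reached through index 0.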
Working with XMLParser
Now, I said I was going to try to closely follow the SimpleXML documentation, and I'm not breaking that promise. Example 2 on the SimpleXML documentation involves echoing the plot of the first movie, and example 3 involves echoing the plot of the movie for each movie. I'm going to combine example 2 and 3 into one example below. This example assumes that you've already loaded and parsed the XML document (like in the above example).
//Echo the plot of the first <movie>
echo $parser->document->movie[0]->plot[0]->tagData;
//Echo the plot of each <movie>
foreach($parser->document->movie as $movie) {
    echo $movie->plot[0]->tagData;
}
Okay, so the syntax isn't quite as pretty as it is in SimpleXML. PHP 4 compatibility won't let me use __toString() to make the code easier to work with, and, after all, one of the primary goals of this parser is to be PHP 4 compatible. If we were only deploying on PHP 5 servers, you'd probably be reading the SimpleXML documentation, not this makeshift document for my XML parser, right? If you feel the desire to implement a __toString() method in the PHP 5 version of the parser to make outputting the tagData member happen behind the scenes when you call echo on the object, go right ahead, but, for these examples, I'm going to assume that you have to do things the long way.
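For anyone who does want that shortcut on PHP 5, here is a minimal sketch of the idea. SketchStringTag is a hypothetical stand-in reduced to a single member, not the real XMLTag class, so treat it as a starting point for your own change.

```php
<?php
// Hypothetical stand-in for the PHP 5 XMLTag class, reduced to one member.
class SketchStringTag
{
    public $tagData = '';

    public function __construct($data)
    {
        $this->tagData = $data;
    }

    // PHP 5 calls this when the object is echoed or cast to a string.
    public function __toString()
    {
        return $this->tagData;
    }
}

$plot = new SketchStringTag('All is revealed.');
echo $plot;            // the short way, via __toString()
echo "\n";
echo $plot->tagData;   // the long way used in the rest of this document
```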
So, to break the plot example down, what we did was navigate our way through the document tree to the plot object that we wanted, then output its tagData member. What is tagData, you might ask? tagData is the value that PHP's XML parser's character_data_handler is given. Because it looks like PHP parses XML documents line by line, the value is concatenated, so you can have multiple lines' worth of character data. I've also put a trim() call in before the data is sent to tagData. This prevents spaces and other whitespace characters from throwing things off, so the whitespace from the start and end of each character data line will be stripped. If that causes bugs for you, the trim() call is the cause.
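The trim-and-concatenate behaviour can be tried in isolation with PHP's XML extension. This is only a sketch: SketchCollector and its onCharacterData() method are hypothetical names that mimic the behaviour described above, not the parser's own handler.

```php
<?php
// Hypothetical collector; it mimics the trim-and-concatenate behaviour
// described above and is not the parser's own handler.
class SketchCollector
{
    var $tagData = '';

    function onCharacterData($handle, $data)
    {
        // Strip whitespace from both ends of each chunk, then append it.
        $this->tagData .= trim($data);
    }
}

$xml = "<plot>\n  So, this language.\n  It's like, a programming language.\n</plot>";

$collector = new SketchCollector();
$handle = xml_parser_create();
xml_set_character_data_handler($handle, array($collector, 'onCharacterData'));
xml_parse($handle, $xml, true);

// How the text is split into chunks depends on the underlying XML library,
// so the whitespace between the two sentences can vary.
echo $collector->tagData;
```

Whatever the chunking, the collected value never starts or ends with whitespace, which is the effect the trim() call is there for.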
Reserved Names
Finally, before I get into attributes (yes, of course this parser handles attributes), a quick note on why I used the name tagData instead of something shorter. In version 1, I used data as the name of the member to hold character data, as it was shorter and easier to work with. However, as I started to work with the class myself, I noticed that I used a tag called <data> a few times; I also got a few e-mails from people who had used the class and ran into issues when they had a tag called <data>. This becomes an issue since the parser will try to add an element to the data array (or create one) on top of the already defined member, which results in a PHP error and a document tree that isn't quite right. So, to fix this problem, I decided to rename all of the members used internally in the XMLTag class in version 1.1. They were all prefixed with tag, since I can't see any reason why someone would name an XML tag <tagdata> or <tagattrs>. If, by some odd chance, you feel the need to use one of the reserved names as the name of an XML tag, either a) don't, or b) rename the member in XMLTag and run a find and replace to rename it everywhere in the class. The reserved names for tags are the names of the XMLTag members, which all begin with tag: tagData and tagAttrs, plus the members described under Other Members of XMLTag.
Attributes
Attributes are very simple to work with. Every XMLTag object has an associative array member called tagAttrs. In this member, the keys represent the attribute name and the values represent the attribute values. I don't think I need to go into much more depth than that, but I'll toss in an example mirroring the SimpleXML example for attributes.
//For each of the <rating> tags of the first <movie>, display them
foreach($parser->document->movie[0]->rating as $rating)
{
    //If the rating is in stars...
    if($rating->tagAttrs['type'] == 'stars')
        echo $rating->tagData.' stars';
    //If the rating is in thumbs...
    if($rating->tagAttrs['type'] == 'thumbs')
        echo $rating->tagData.' thumbs up';
}
As you can see, the attributes are accessed from tagAttrs, and, once again, the character data is accessed from tagData. Pretty simple, eh?
Setting and Comparing Values
Because we aren't using any of PHP 5's hip new OO features to make this parser easy to work with (once again, for PHP 4 compatibility), setting and comparing values is much easier. In SimpleXML, you need to type cast things before you are allowed to work with them like strings. In XMLParser, by contrast, you don't need to type cast. Just use = or == like normal and things will work fine. Just make sure you are working with tagData, tagAttrs, or one of the other XMLTag members. If you aren't, you're trying to perform string operations on an object and will get errors out of PHP.
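For comparison, this is what the SimpleXML side of that contrast looks like on PHP 5 and later. The document here is a trimmed-down stand-in written for this sketch, and its movies root name is an assumption.

```php
<?php
// A trimmed-down stand-in document; the <movies> root name is an assumption.
$xml = "<movies><movie><title>PHP: Behind the Parser</title></movie></movies>";

$movies = simplexml_load_string($xml);
$title  = $movies->movie[0]->title;

// SimpleXML hands back an object, so cast it before treating it as a string.
var_dump(is_object($title));                                 // bool(true)
var_dump((string) $title === 'PHP: Behind the Parser');      // bool(true)
```

With XMLParser the equivalent value is already a plain string in tagData, so no cast is involved.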
Other Members of XMLTag
Depending on how much attention you paid to the sources, you probably noticed a few other members exist in the XMLTag class. These members are described below:
|tagChildren||This member is an array of references to all of the direct child tags of the given object, in order of occurrence in the XML document. It is simply an alternative to accessing the child tags by their names, and is used when names are arbitrary or unknown.|
|tagParents||This member contains the number of parents this object has before the document root. This number, currently, is only used to determine how many tabs are required to nicely format the XML output.|
|tagName||This member contains the name of the current tag. Again, it is only used internally for the proper output of the XML document.|
Outputting The XML Document
As if just parsing XML documents in both PHP 4 and PHP 5 wasn't good enough, there is also functionality in this system to output the XML document. For the most part, I only used this functionality to check that the system was properly parsing the entire XML tree without having to resort to lots of difficult-to-read var_dump() statements. However, it could also be used to modify XML data (through the document tree) and output it again, or to create a whole new document tree from the ground up and get the XML for that. To access this functionality, simply output the return value of the GenerateXML() method on the XMLParser object. So, in the above examples, it would be something as simple as the below.
Output the XML Document
echo $parser->GenerateXML();
This code will start the XML generation at the root tag. You can call the GetXML() method from any XMLTag object if you wish to start the generation of XML from "deeper" in the document tree.