Intranet Journal
The online resource for intranet professionals
XML Basics Part III: An Example of Well-Formed and Valid XML
2/10/04
If you've been following along with Part I and Part II of XML Basics, you're ready for Part III, where you put the pieces together and experiment hands-on with XML.
As mentioned earlier, XML is hierarchical, so the proper nesting of elements is crucial. In the example we are going to walk through, I am going to build a pretend movie catalog using XML. To understand the structure and see what we'll be creating, take a peek at the following illustration:
From this diagram, you can see that <MovieCatalog> is the root element and will appear only once. In addition, for each movie, elements such as <length> and <title> will appear only once. Within the document, however, there will be multiple <movie> tags, each with its own set of child tags (<length>, <genre>, <actors>, etc.). The <actors> tag will not have any text content of its own; it will simply contain one or more <actor> child tags.
With this hierarchy in mind, we are now ready to begin creating an XML document. Using Notepad or any other text editor, create a blank new file called "MovieCatalog.xml" and save it to your hard drive. Once we add substance to the XML file, we will be using Internet Explorer to view the results.
Type the following into your XML file:
<?xml version="1.0"?>
This is the XML declaration discussed in Part II of this series, and XML files should begin with it. Then, type the following, which constitutes the meat of our file.
<MovieCatalog>
  <movie>
    <title>The Matrix</title>
    <length>136</length>
    <genre>Sci-Fi and Fantasy</genre>
    <actors>
      <actor>Keanu Reeves</actor>
      <actor>Laurence Fishburne</actor>
      <actor>Carrie-Anne Moss</actor>
    </actors>
    <datereleased>1999</datereleased>
    <director>Wachowski Brothers</director>
    <format>DVD</format>
  </movie>
  <movie>
    <title>Titanic</title>
    <length>194</length>
    <genre>Drama</genre>
    <actors>
      <actor>Leonardo DiCaprio</actor>
      <actor>Kate Winslet</actor>
    </actors>
    <datereleased>1997</datereleased>
    <director>James Cameron</director>
    <format>DVD</format>
  </movie>
  <movie>
    <title>The Sixth Sense</title>
    <length>106</length>
    <genre>Thriller</genre>
    <actors>
      <actor>Bruce Willis</actor>
      <actor>Haley Joel Osment</actor>
    </actors>
    <datereleased>1999</datereleased>
    <director>M. Night Shyamalan</director>
    <format>VHS</format>
  </movie>
</MovieCatalog>
Once you save the file, you can open MovieCatalog.xml in Internet Explorer. If you typed everything correctly, you should see the following:
Internet Explorer automatically parses the file, and you can easily see the hierarchy as well as expand and collapse elements based on it. In its simplest state (everything collapsed), this document would look as follows:
Part of the power of XML is that you can use this structure to select just the elements and content you want to display and work with. Once you work with some XSL, this becomes clearer. You can choose your formatting based on individual elements, output medium (Web, e-mail, etc.), or a combination of both.
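This example follows the rules of elements that were explained in Part II of this article series. An element can also carry an attribute, written as a name="value" pair inside its opening tag. Here, the first movie is given a serial attribute: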
<movie serial="123XYZ">
  <title>The Matrix</title>
  <length>136</length>
  <genre>Sci-Fi and Fantasy</genre>
  <actors>
    <actor>Keanu Reeves</actor>
    <actor>Laurence Fishburne</actor>
    <actor>Carrie-Anne Moss</actor>
  </actors>
  <datereleased>1999</datereleased>
  <director>Wachowski Brothers</director>
  <format>DVD</format>
</movie>
In reality the difference between attributes and elements and what you can do with the information they contain is negligible. The XML community continues to debate whether attributes are truly necessary. Therefore, which one you choose often becomes a personal and design decision.
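To make the comparison concrete, here is a sketch of the same movie with the serial number stored in a child element instead of an attribute. The <serial> element name is an assumption made purely for illustration; it is not used elsewhere in this series and is not declared in the DTD shown later in this article.

```xml
<movie>
  <!-- Hypothetical <serial> element holding the value the attribute carried -->
  <serial>123XYZ</serial>
  <title>The Matrix</title>
  <length>136</length>
  <format>DVD</format>
</movie>
```

Both forms are well-formed and carry the same information. The choice mostly affects how you refer to the value later, for example when formatting the catalog.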
Well-Formed and Valid XML Explained
All XML documents must be well-formed. If they are not, the XML parser will report an error. For example, if I misspell the closing tag </movie> and then try to open the file in Internet Explorer, I get the following error:
What the parser is saying is that the XML file is not well-formed. A more official definition of well-formed is:
Well-formed XML documents comply with the basic syntax and structural rules of the XML specification.
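As a small illustration of the kind of mistake described above, the following trimmed-down file is deliberately not well-formed. The misspelling </moive> is just one assumed example of a typo; any closing tag that does not match its opening tag has the same effect.

```xml
<?xml version="1.0"?>
<!-- Deliberately NOT well-formed: </moive> does not match the opening <movie> tag -->
<MovieCatalog>
  <movie>
    <title>The Matrix</title>
  </moive>
</MovieCatalog>
```

Correcting the closing tag to </movie> makes the file well-formed again.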
Valid is not the same as well-formed. The definition of a Valid XML document is:
Valid XML documents are well-formed documents that comply with the rules defined in a Document Type Definition (DTD).
Thus, an XML document can be well-formed but not valid. However, the reverse is not true. An XML document cannot be valid without also being well-formed.
Document Type Definitions (DTD)
A DTD is used to separate the XML data description from the individual XML documents and applications. Thus, multiple XML applications can share a single description of the data. The DTD defines the rules about how the document should be structured, what elements should be included, and what kind of data may be included.
The rules within a DTD create a well-defined and standardized space for XML documents to exist. The DTD is the cornerstone required to realize one of the key benefits of XML: system interoperability. It makes data exchange between diverse applications and business partners more reliable, since everyone is playing by the same set of rules (the DTD).
Using the MovieCatalog example we created earlier, let's create a simple DTD example. While a DTD can be included internally with an XML document, for reasons stated above, it makes more sense to create an external DTD so multiple XML files can reference and use it.
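For comparison, here is a rough sketch of what an internal DTD looks like: the declarations sit inside square brackets in the DOCTYPE statement of the XML file itself. The content model is cut down to a single <title> child to keep the sketch short; that simplification is an assumption made for this illustration and is not the DTD used in the rest of this article.

```xml
<?xml version="1.0"?>
<!-- Internal DTD: the rules travel inside the document they describe -->
<!DOCTYPE MovieCatalog [
  <!ELEMENT MovieCatalog (movie+)>
  <!ELEMENT movie (title)>
  <!ELEMENT title (#PCDATA)>
]>
<MovieCatalog>
  <movie>
    <title>The Matrix</title>
  </movie>
</MovieCatalog>
```

Because the rules live in this one file, no other XML file can reuse them, which is the main argument for the external DTD.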
In this example, we will create a file called MovieCatalog.dtd. It will be linked from within the MovieCatalog.xml file with the following DOCTYPE statement (discussed in more detail in Part II of this article series):
<!DOCTYPE MovieCatalog SYSTEM "MovieCatalog.dtd">
This DOCTYPE statement would typically be inserted as the second line within MovieCatalog.xml directly after the <?xml version="1.0"?> statement that begins the file.
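Put together, the top of MovieCatalog.xml would look roughly like the sketch below. The catalog is trimmed to a single movie with only a <title> to keep it short.

```xml
<?xml version="1.0"?>
<!-- Links this document to the external DTD saved in the same folder -->
<!DOCTYPE MovieCatalog SYSTEM "MovieCatalog.dtd">
<MovieCatalog>
  <movie>
    <title>The Matrix</title>
  </movie>
</MovieCatalog>
```

The name after DOCTYPE must match the root element, here MovieCatalog. The quoted file name is resolved relative to the XML file, so this sketch assumes both files are saved in the same folder.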
The following is the DTD code for our example:
<!-- Movie Catalog DTD -->
<!ELEMENT MovieCatalog (movie+) >
<!ELEMENT movie (title+, length?, genre?, actors?, datereleased?, director*, format?) >
<!ELEMENT title (#PCDATA) >
<!ELEMENT length (#PCDATA) >
<!ELEMENT genre (#PCDATA) >
<!ELEMENT actors (actor*) >
<!ELEMENT actor (#PCDATA) >
<!ELEMENT datereleased (#PCDATA) >
<!ELEMENT director (#PCDATA) >
<!ELEMENT format (#PCDATA) >
<!-- End of DTD for MovieCatalog.xml -->
In this DTD there are two different types of ELEMENT statements. The first type of ELEMENT statement (the declarations for MovieCatalog, movie, and actors on lines 2, 3, and 7 of this code) declares an XML element type name and its permissible sub-elements ("children"). This syntax is used for elements that have no text content of their own and simply hold other elements. Let me explain what the details of this statement mean, section by section.
<!ELEMENT movie (title+, length?, genre?, actors?, datereleased?, director*, format?) >
The element being defined above is <movie></movie>. It contains the child elements title, length, genre, actors, datereleased, director, and format, and these child elements must appear in that order. The character following each child element's name specifies how many times that child may appear and whether it is required. These characters are defined as follows:
| Character | Meaning |
| --- | --- |
| + | Must appear at least once. May appear more than once. |
| ? | May appear 0 or 1 time. If it appears, it appears exactly once. |
| * | May appear 0 or more times. It may be omitted, or it may appear many times. |
The second type of ELEMENT statement you see in this file (all the other lines besides 2, 3, and 7) defines the element and the type of content it can contain. In this case, they all contain content of the type PCDATA, or parsed character data. It is the most common type of content.
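The following sketch shows how these rules separate valid from merely well-formed content. It assumes the DTD above is saved as MovieCatalog.dtd next to the file. All three <movie> elements are well-formed, but only the first fits the content model, so the document as a whole is well-formed but not valid.

```xml
<?xml version="1.0"?>
<!DOCTYPE MovieCatalog SYSTEM "MovieCatalog.dtd">
<MovieCatalog>
  <!-- Fits the content model: title is present, every optional child is omitted -->
  <movie>
    <title>The Matrix</title>
  </movie>
  <!-- Breaks the content model: length appears before title -->
  <movie>
    <length>194</length>
    <title>Titanic</title>
  </movie>
  <!-- Breaks the content model: title+ requires at least one title -->
  <movie>
    <director>M. Night Shyamalan</director>
  </movie>
</MovieCatalog>
```

Removing the second and third <movie> elements, or correcting them to match the declared order, would make the document valid.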
If you wish to experiment with validating your XML files against a DTD while you are using Internet Explorer, you can download a free XML Validator tool from Microsoft. This tool lets you specify a path to your XML file and then when you click the "validate" button, it will validate the file against the specified DTD and give you output (success or error messages). It is a fun way to really learn what your DTD does and how it relates to the actual XML document.
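One experiment worth trying is the serial attribute from the earlier <movie serial="123XYZ"> example. The DTD above declares only elements, so a validating parser should report that attribute as an error. The line below sketches one possible way to declare it. Treating the attribute as optional free-form text is an assumption, since the article does not say what rules a serial number should follow.

```xml
<!-- Assumed addition to MovieCatalog.dtd: an optional serial attribute on movie -->
<!ATTLIST movie serial CDATA #IMPLIED >
```

Here CDATA means the value is plain character data, and #IMPLIED means the attribute may be left off. Writing #REQUIRED instead would make every <movie> carry a serial.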
Congratulations, you now have all the basic information and tools you need to create well-formed and valid XML documents. The last remaining foundational piece is learning how to use some simple XSL to format XML files for your desired output. In the next and final article in this series I will illustrate using XSL to format the movie catalog for the web.