XML eXtensible Markup Language

What is XML?

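XML is a meta-markup language: it defines a syntax for creating markup languages rather than a fixed set of tags. Documents whose structure has been specified and validated using XML can be imported easily into different applications. At first glance an XML document looks similar to the HTML source code of a web page. This is because both descend from SGML: HTML 4 has its syntax defined in an SGML DTD, and its XML-based variant XHTML in an XML DTD (DTDs are explained later).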

An XML document is not restricted to human-readable pages. It may contain instructions for a drilling machine, user data for a credit card transaction or the operation of a video recorder. XML specifies only the structure of the document, not how it is handled by the receiving application.

Two very important terms are applied to XML documents: well-formed and valid.

A well-formed document is one which conforms to the basic XML syntax rules (for example, every opening tag has a matching closing tag and elements are properly nested), so that it can be parsed by an application without any syntax errors. It does not mean that the document contains any useful or understandable information.
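
As a rough illustration of the well-formedness check described above, the sketch below uses the parser in Python's standard-library `xml.etree.ElementTree` module, which raises `ParseError` for input that is not well-formed. The helper name `is_well_formed` and the two sample strings are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML without syntax errors."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples: the first closes every element it opens,
# the second closes <box> with the wrong tag name.
good = '<box size="large"><penguin name="tux"/></box>'
bad = '<box size="large"><penguin name="tux"/></crate>'

print(is_well_formed(good))
print(is_well_formed(bad))
```

This parser does not validate against a DTD, so a `True` result says only that the syntax is sound, not that the content makes sense.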

A valid document is a well-formed document which also conforms to a DTD (Document Type Definition). A DTD specifies a structure for the document and the sections it may contain. If the document contains sections or attributes other than those allowed for in the DTD, then it is not a valid document. It is entirely possible for a valid XML document to contain inappropriate values or other information which would cause the application processing it to generate errors.

Example XML document

A company wants to send a consignment to a customer. The computer generates the following XML document and sends it to the despatch department.


<order number="123456" customerid="fred56" >
  <box size="Large" >
     <box label="book and disk" size="small" color="white">
        <book title="snow crash"/>
        <floppydisk size="3.5 Inch">
          <software title="sound card drivers"/>
          <software title="Network card driver"/>
        </floppydisk>
     </box>
     <penguin name="tux" size="30 cm"/>
  </box>
  <invoice/>
</order>

order1.xml XML Syntax

This describes the following:
  1. A large box containing
    1. A smaller box with a label on it, containing
      1. A book (Snow Crash)
      2. A floppy disk (with Sound card and network card drivers on it)
    2. A 30cm high penguin called Tux.
  2. Invoice for order number 123456

This document therefore specifies the items to be packed and the order in which they go into the box. It would be a simple job to write a program to convert this into a set of work instructions for the staff in the packing department.
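
Such a conversion program might look roughly like the following sketch. It parses the order with Python's standard-library `xml.etree.ElementTree` and walks the elements in document order. The function names `describe` and `packing_steps` and the wording of the instructions are assumptions made for this illustration, not part of any real despatch system.

```python
import xml.etree.ElementTree as ET

# The order document from above, embedded as a string.
ORDER_XML = """\
<order number="123456" customerid="fred56">
  <box size="Large">
    <box label="book and disk" size="small" color="white">
      <book title="snow crash"/>
      <floppydisk size="3.5 Inch">
        <software title="sound card drivers"/>
        <software title="Network card driver"/>
      </floppydisk>
    </box>
    <penguin name="tux" size="30 cm"/>
  </box>
  <invoice/>
</order>
"""


def describe(element):
    """Return the tag name followed by its attributes, if it has any."""
    attrs = ", ".join(f"{k}={v}" for k, v in element.attrib.items())
    return f"{element.tag} ({attrs})" if attrs else element.tag


def packing_steps(parent, depth=0):
    """Return one instruction per element, in document order."""
    steps = []
    for child in parent:
        if depth == 0:
            # Direct children of <order> are the top-level things to prepare.
            steps.append(f"Prepare {describe(child)}")
        else:
            indent = "  " * depth
            steps.append(f"{indent}Put {describe(child)} into the {parent.tag}")
        # Items nested inside this one are handled one level deeper.
        steps.extend(packing_steps(child, depth + 1))
    return steps


for line in packing_steps(ET.fromstring(ORDER_XML)):
    print(line)
```

The recursion mirrors the nesting of the XML, so the indentation of each instruction shows which container an item belongs to.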

Another program could be written to convert the XML document into an email and send it to the customer (fred56), stating:


Dear Fred56,
Thank you for your order 123456 which has just been despatched to you.
Your order is packed into (1) Large box with a separate invoice.
It contains the following:
  Book Title(Snow Crash)
  Floppy disk containing 
    Software(Sound card drivers)
    Software(Network card drivers)
  Penguin name(Tux)
	
Please come back again soon.
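
A minimal sketch of such an email generator is given below, again using Python's standard-library `xml.etree.ElementTree`. The function names `contents` and `build_email` are hypothetical, the layout of the item list only approximates the message above, and actually sending the mail is left out.

```python
import xml.etree.ElementTree as ET

# The order document from above, embedded as a string.
ORDER_XML = """\
<order number="123456" customerid="fred56">
  <box size="Large">
    <box label="book and disk" size="small" color="white">
      <book title="snow crash"/>
      <floppydisk size="3.5 Inch">
        <software title="sound card drivers"/>
        <software title="Network card driver"/>
      </floppydisk>
    </box>
    <penguin name="tux" size="30 cm"/>
  </box>
  <invoice/>
</order>
"""


def contents(element, depth=1):
    """List the items inside element; nested boxes are treated as packaging."""
    lines = []
    for child in element:
        if child.tag == "box":
            # Do not list the inner box itself, only what is inside it.
            lines.extend(contents(child, depth))
            continue
        label = child.get("title") or child.get("name")
        indent = "  " * depth
        lines.append(f"{indent}{child.tag} {label}" if label else f"{indent}{child.tag}")
        lines.extend(contents(child, depth + 1))
    return lines


def build_email(order):
    """Build the text of the despatch email from the <order> element."""
    outer = order.find("box")
    has_invoice = order.find("invoice") is not None
    lines = [
        f"Dear {order.get('customerid')},",
        f"Thank you for your order {order.get('number')} "
        "which has just been despatched to you.",
        f"Your order is packed into (1) {outer.get('size')} box"
        + (" with a separate invoice." if has_invoice else "."),
        "It contains the following:",
    ]
    lines.extend(contents(outer))
    return "\n".join(lines)


print(build_email(ET.fromstring(ORDER_XML)))
```

The same XML document drives both the packing instructions and the customer email, which illustrates that the document describes structure only and leaves its handling to the receiving program.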

The example above was well-formed XML. The example below is equally well-formed but evidently nonsensical, and ought to be rejected as invalid (you cannot pack a 30 cm penguin into a floppy disk).


<floppydisk size="3.5 Inch">
  <penguin name="tux" size="30 cm"/>
</floppydisk>

invalid1.xml XML Syntax

To prevent situations like this, the DTD comes into play. Using a DTD you can specify which items can be contained within others, as in the following tiny DTD.


<?xml encoding="UTF-8"?>
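<!ELEMENT box (penguin | floppydisk)*>
<!ELEMENT penguin EMPTY>
<!ELEMENT floppydisk (software)*>
<!ELEMENT software EMPTY>

box1.dtd DTD Syntax

It specifies:
  1. A box may contain any number of penguins and floppy disks, in any order.
  2. A penguin contains nothing.
  3. A floppy disk may contain any number of software items.
  4. A software item contains nothing.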

Using this DTD it is now possible to check that the following XML document is correct (it is). Various programs can be used to validate XML, a good example being XML Spy.


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE box SYSTEM "box1.dtd">
<box>
  <penguin/>
  <floppydisk>
    <software/>
  </floppydisk>
</box>

box1.xml XML Syntax

The DTD (box1.dtd) doesn't allow attributes (name, size etc.) to be put into the elements in the XML document. A more useful DTD, which also declares the attributes each element may carry, is listed below.


<?xml encoding="UTF-8"?>
<!ELEMENT box (penguin | floppydisk)*>
<!ATTLIST box
  size   (small | large)  #REQUIRED
  label  CDATA            #IMPLIED
  color  (brown | white)  #IMPLIED>
<!ELEMENT penguin EMPTY>
<!ATTLIST penguin
  size   (15cm | 30cm)    #REQUIRED
  name   (tux | barbie)   #IMPLIED>
<!ELEMENT floppydisk (software)*>
<!ATTLIST floppydisk
  size   (1440 | 1200 | 720 | 360)  #REQUIRED>
<!ELEMENT software EMPTY>
<!ATTLIST software
  title  CDATA            #REQUIRED>

box2.dtd DTD Syntax

The following XML file is valid according to this DTD.


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE box SYSTEM "box2.dtd">
<box size="small" label="Penguin and disk inside">
  <penguin size="15cm"/>
  <floppydisk size="1440">
    <software title="Electronic war and peace"/>
  </floppydisk>
</box>

box2.xml XML Syntax
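
To make the rules in box2.dtd concrete, the sketch below re-implements them by hand in Python. It is not a DTD validator: the tables `ALLOWED_CHILDREN` and `ATTRIBUTES` and the function `check` are hypothetical names, and the rules are copied manually from box2.dtd rather than read from it. The standard-library `xml.etree.ElementTree` parser used here does not validate against DTDs, so the XML declaration and DOCTYPE line are left out of the samples. A set of allowed child names is enough only because both content models in box2.dtd are simple "any number, any order" choices.

```python
import xml.etree.ElementTree as ET

# Hand-copied from box2.dtd: which child elements each element may contain.
ALLOWED_CHILDREN = {
    "box": {"penguin", "floppydisk"},
    "penguin": set(),
    "floppydisk": {"software"},
    "software": set(),
}

# Hand-copied from box2.dtd: attribute -> (allowed values, required).
# None as the allowed values stands for CDATA (any text).
ATTRIBUTES = {
    "box": {
        "size": ({"small", "large"}, True),
        "label": (None, False),
        "color": ({"brown", "white"}, False),
    },
    "penguin": {
        "size": ({"15cm", "30cm"}, True),
        "name": ({"tux", "barbie"}, False),
    },
    "floppydisk": {"size": ({"1440", "1200", "720", "360"}, True)},
    "software": {"title": (None, True)},
}


def check(element, problems=None):
    """Collect rule violations for element and everything below it."""
    if problems is None:
        problems = []
    tag = element.tag
    if tag not in ALLOWED_CHILDREN:
        problems.append(f"unknown element <{tag}>")
        return problems
    rules = ATTRIBUTES[tag]
    for name, value in element.attrib.items():
        if name not in rules:
            problems.append(f"<{tag}> has undeclared attribute {name}")
        elif rules[name][0] is not None and value not in rules[name][0]:
            problems.append(f"<{tag}> {name}={value!r} is not an allowed value")
    for name, (_, required) in rules.items():
        if required and name not in element.attrib:
            problems.append(f"<{tag}> is missing required attribute {name}")
    for child in element:
        if child.tag not in ALLOWED_CHILDREN[tag]:
            problems.append(f"<{child.tag}> is not allowed inside <{tag}>")
        check(child, problems)
    return problems


# The body of box2.xml, and a variant with the penguin inside the floppy disk.
GOOD = """<box size="small" label="Penguin and disk inside">
  <penguin size="15cm"/>
  <floppydisk size="1440">
    <software title="Electronic war and peace"/>
  </floppydisk>
</box>"""
BAD = """<box size="small">
  <floppydisk size="1440">
    <penguin name="tux" size="30 cm"/>
  </floppydisk>
</box>"""

print(check(ET.fromstring(GOOD)))
for problem in check(ET.fromstring(BAD)):
    print(problem)
```

Both samples are well-formed, but only the first satisfies the rules. A real validating parser would read the DTD itself instead of relying on tables copied by hand.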