XML

XML is a markup language similar to HTML. Unlike HTML, though, XML is a general-purpose markup language with no predefined tags, so you define your own. XML's primary purpose is to share data between different systems, similar to JSON (see chapter 11).

XML declaration

Similar to HTML, where you start a document with <!DOCTYPE html>, you can start an XML document with a prolog. The prolog is optional, but if it exists, it must come first in the document.

In the prolog you declare what version of XML you're using, the encoding (UTF-8 is the default and most often used), and whether the document relies on information from external sources (standalone="no" if it does, standalone="yes" if it does not). E.g.:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

It is good practice to specify the encoding used to avoid errors.
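The rule that the declaration must come first can be checked with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the <person/> test document are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: bytes) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


declaration = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'

# The declaration is the very first thing in the document: accepted.
print(is_well_formed(declaration + b"\n<person/>"))  # True

# No declaration at all: also accepted, because it is optional.
print(is_well_formed(b"<person/>"))  # True

# Something (here a newline) precedes the declaration: rejected.
print(is_well_formed(b"\n" + declaration + b"\n<person/>"))  # False
```

Bytes are used as input here so that the parser itself reads the encoding named in the declaration.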

Syntax

Like HTML, XML consists of opening and closing tags. Each pair of tags delimits an element, and the data value sits between the tags. If an XML document lacks a closing tag, it is not well-formed, and a parser will reject it.

You decide for yourself what elements and values should be present in the document, but the document needs to conform to some structural rules. These are usually defined by a Document Type Definition (DTD) or an XML Schema Definition (XSD). DTD and XSD will be presented later on. A document that contains a tag not defined by its DTD or XSD is invalid.

An XML body could look like:

<person>
    <name>Name Nameson</name>
    <height>173</height>
    <age>23</age>
</person>
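To see what a program gets out of such a document, here is a minimal sketch that reads the example with Python's standard xml.etree.ElementTree module. Converting height and age to int is a choice made for this illustration; XML itself only stores text.

```python
import xml.etree.ElementTree as ET

document = """
<person>
    <name>Name Nameson</name>
    <height>173</height>
    <age>23</age>
</person>
"""

person = ET.fromstring(document)

# Every value arrives as text; turning it into a number is up to the reader.
name = person.find("name").text
height = int(person.find("height").text)
age = int(person.find("age").text)
print(name, height, age)  # Name Nameson 173 23

# Removing a closing tag makes the document not well-formed.
broken = document.replace("</name>", "")
try:
    ET.fromstring(broken)
    broken_is_well_formed = True
except ET.ParseError:
    broken_is_well_formed = False
print(broken_is_well_formed)  # False
```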

Nesting

XML documents must contain exactly one root element, which is the parent of all other elements. In the example below, <person> is the root element.

<person>
    <name>Name Nameson</name>
    <height>173</height>
    <age>23</age>
</person>

In cases where we want to define several persons in the same document, we could create a new root element and name it <persons>:

<persons>

  <person>
    <name>Name Nameson</name>
    <height>173</height>
    <age>23</age>
  </person>

  <person>
    <name>Another Nameson</name>
    <height>181</height>
    <age>25</age>
  </person>

</persons>

Elements in XML must be properly nested, meaning that you can't write:

<person> <name> Name Nameson </person> </name>

The closing </name> tag must come before the parent element's closing tag.
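The nesting rules above can be tried out with a parser. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name parses is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the text is accepted as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


document = """
<persons>
  <person>
    <name>Name Nameson</name>
    <height>173</height>
    <age>23</age>
  </person>
  <person>
    <name>Another Nameson</name>
    <height>181</height>
    <age>25</age>
  </person>
</persons>
"""

# The root element is the parent; each <person> is one of its children.
root = ET.fromstring(document)
names = [person.find("name").text for person in root.findall("person")]
print(root.tag, names)  # persons ['Name Nameson', 'Another Nameson']

# Two elements at the top level: there is no single root, so this is rejected.
print(parses("<person/><person/>"))  # False

# Closing tags in the wrong order: rejected as well.
print(parses("<person> <name> Name Nameson </person> </name>"))  # False
```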

Case sensitivity

Tags in XML are case sensitive, meaning that the tag <Name> is different from the tag <name>. Opening and closing tags must be written with the same case.
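A short sketch of what case sensitivity means in practice, using Python's standard xml.etree.ElementTree module. The document containing both <Name> and <name>, with the values A and B, is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# <Name> and <name> are two different elements that can sit side by side.
root = ET.fromstring("<person><Name>A</Name><name>B</name></person>")
upper = root.find("Name").text
lower = root.find("name").text
print(upper, lower)  # A B

# An opening <Name> cannot be closed by </name>.
try:
    ET.fromstring("<Name>Name Nameson</name>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False
print(mismatch_accepted)  # False
```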

Entity references

Some characters in XML have a special meaning.

Placing a character like "<" inside an XML element will generate an error because the parser interprets it as the start of a new element. To avoid this error, replace the "<" character with the entity reference &lt;.

There are 5 predefined entity references in XML:

Code      Character   Meaning
&lt;      <           Less than
&gt;      >           Greater than
&amp;     &           Ampersand
&apos;    '           Apostrophe
&quot;    "           Quotation mark
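When XML is generated by a program, the replacement is usually done by a library function. The sketch below uses escape and unescape from Python's standard xml.sax.saxutils module; the sample text and the <note> element are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = 'if a < b & c > d: say "hi"'

# escape() handles &, < and > by default; quotes need an extra mapping.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)  # if a &lt; b &amp; c &gt; d: say &quot;hi&quot;

# The parser turns the entity references back into the original characters.
note = ET.fromstring("<note>" + escaped + "</note>")
print(note.text == raw)  # True

# unescape() does the same conversion without parsing a whole document.
restored = unescape(escaped, {"&quot;": '"', "&apos;": "'"})
print(restored == raw)  # True

# Without escaping, the "<" breaks the document.
try:
    ET.fromstring("<note>" + raw + "</note>")
    unescaped_accepted = True
except ET.ParseError:
    unescaped_accepted = False
print(unescaped_accepted)  # False
```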

Comments

Similar to HTML, comments in XML are written:

<!-- Comment -->

Comments can be used when you want to give additional information to humans that are reading the document and need to understand the content.
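Comments are meant for human readers, and a program reading the data does not normally see them. As a small sketch: Python's standard xml.etree.ElementTree parser, with its default settings, leaves comments out of the parsed tree. The comment text about centimetres is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

document = """
<person>
    <!-- Height is given in centimetres -->
    <height>173</height>
</person>
"""

person = ET.fromstring(document)

# Only the element is left; the comment was skipped by the default parser.
children = [child.tag for child in person]
print(children)  # ['height']
```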

How JSON might look as XML

The following JSON:

{
    "person" : {
        "name" : "Name Nameson",
        "height" : 173,
        "age" : 23
    }
}

would be translated to XML as:

<person>
    <name>Name Nameson</name>
    <height>173</height>
    <age>23</age>
</person>
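The translation can also be done by a program. The following is a minimal sketch in Python using the standard json and xml.etree.ElementTree modules. The function name to_element is hypothetical, and the sketch only handles nested objects with plain values, not JSON arrays or keys that are not valid XML names.

```python
import json
import xml.etree.ElementTree as ET


def to_element(tag, value):
    """Build an element named `tag`: dicts become child elements, other values become text."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        for child_tag, child_value in value.items():
            element.append(to_element(child_tag, child_value))
    else:
        element.text = str(value)
    return element


data = json.loads('{"person": {"name": "Name Nameson", "height": 173, "age": 23}}')

# The single top-level key becomes the root element.
(root_tag, root_value), = data.items()
person = to_element(root_tag, root_value)

xml_text = ET.tostring(person, encoding="unicode")
print(xml_text)
# <person><name>Name Nameson</name><height>173</height><age>23</age></person>
```

Note that the numbers 173 and 23 become plain text in the XML: the distinction between JSON strings and numbers is lost unless a schema describes the types.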

Document Type Definition

Document Type Definition (DTD) and XML Schema Definition (XSD) are used to describe precisely which elements an XML document may contain and how they are arranged. An XML document that refers to one of them must follow the defined grammatical rules to be valid.

The DTD is defined in the prolog, after the XML header (xml version and encoding) and before the body (the elements with values).

The syntax for DTD is:

<!DOCTYPE element[
     declaration1
     declaration2
]>
  • The element tells the parser to parse the document from the specified root element.

  • The square brackets [ ] enclose an optional list of declarations, called the internal subset.

For the example:

<persons>

  <person>

    <name>Name Nameson</name>
    <height>173</height>
    <age>23</age>

  </person>

</persons>

The element in <!DOCTYPE element []> is persons.

The declarations, like declaration1, are written as: <!ELEMENT element-name (#PCDATA)>. An element that contains child elements lists them instead, so before declaring the child elements you declare their parent (<person> in this case) as such: <!ELEMENT person (name, height, age)>. The root element <persons> is declared the same way, with person+ meaning one or more <person> elements. For the XML example above, we will get:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE persons [
  <!ELEMENT persons (person+)>
  <!ELEMENT person (name, height, age)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT height (#PCDATA)>
  <!ELEMENT age (#PCDATA)>
]>

<persons>

  <person>

    <name>Name Nameson</name>
    <height>173</height>
    <age>23</age>

  </person>

</persons>

PCDATA means "parsed character data", meaning that the value is parsed by the XML parser, so markup and entity references inside it are interpreted. The alternative is CDATA, meaning that there is no markup and the parser should treat the content as regular text.
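A DTD only has an effect when a validating parser is used. As a sketch of the difference between well-formed and valid: Python's standard xml.etree.ElementTree module is built on a non-validating parser, so it accepts the document below even though <height> is missing. The function follows_person_rule is a hypothetical, hand-written stand-in for the single rule <!ELEMENT person (name, height, age)>; a real validator, available in third-party libraries, would check every declaration.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<!DOCTYPE persons [
  <!ELEMENT persons (person+)>
  <!ELEMENT person (name, height, age)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT height (#PCDATA)>
  <!ELEMENT age (#PCDATA)>
]>
<persons>
  <person>
    <name>Name Nameson</name>
    <age>23</age>
  </person>
</persons>
"""

# No error is raised: the document is well-formed, and the DTD is not enforced.
root = ET.fromstring(document)


def follows_person_rule(person):
    """Check by hand that the children are exactly name, height, age, in that order."""
    return [child.tag for child in person] == ["name", "height", "age"]


results = [follows_person_rule(person) for person in root.findall("person")]
print(results)  # [False]
```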

XML Schema Definition

Instead of using DTD, you can use XML Schema Definition (XSD). Like DTD, it is used to describe and validate the structure and content of an XML document.

To use XSD, you first need to declare a schema. The schema is itself an XML document, usually kept in a separate .xsd file, with <xs:schema> as its root element:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

XSD distinguishes between three kinds of types: simple, complex and global types.

A simple type contains only text, with no child elements. Some predefined simple types are: xs:int, xs:boolean, xs:string and xs:date.

<xs:element name="company" type="xs:string" />

A complex type is a container for other element definitions. This allows you to specify which child elements an element can contain and to provide some structure within your XML document. E.g.:

<xs:element name="Address">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="name" type="xs:string" />
            <xs:element name="company" type="xs:string" />
            <xs:element name="phone" type="xs:int" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

Here, the Address element consists of child elements. Its complex type is a container for other <xs:element> definitions, which lets you build a simple hierarchy of elements in the XML document.

Global types define a single named type in the schema, which can be referenced by any other definition. This is useful if, for example, you want to describe a person and a company once and reuse that description for several addresses of the company. In such a case, you can define a general type as below. Because it is referenced through type="AddressType", it is declared as a named xs:complexType directly under xs:schema rather than as an element:

<xs:complexType name="AddressType">
    <xs:sequence>
        <xs:element name="name" type="xs:string" />
        <xs:element name="company" type="xs:string" />
    </xs:sequence>
</xs:complexType>

When in use, it would look like:

<xs:element name="Address1">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="address" type="AddressType" />
            <xs:element name="phone1" type="xs:int" />
        </xs:sequence>
    </xs:complexType>
</xs:element>
<xs:element name="Address2">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="address" type="AddressType" />
            <xs:element name="phone2" type="xs:int" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

Here, instead of having to define the name and the company twice (once for Address1 and once for Address2), we now have a single definition. This makes maintenance simpler: if, for example, you want to add a "Postcode" element to the address, you only have to add it in one place.
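Because a schema is itself an XML document, it can be read like any other. The sketch below uses Python's standard xml.etree.ElementTree module to list the global types in a shortened schema and the elements that reuse AddressType. It assumes the shared type is declared as a named xs:complexType directly under xs:schema, and it only inspects the schema; the standard library does not validate documents against an XSD.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced tags as {namespace-uri}local-name.
XS = "{http://www.w3.org/2001/XMLSchema}"

schema = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="name" type="xs:string" />
      <xs:element name="company" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Address1">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="address" type="AddressType" />
        <xs:element name="phone1" type="xs:int" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

root = ET.fromstring(schema)

# Global types are the named complex types directly under <xs:schema>.
global_types = [t.get("name") for t in root.findall(XS + "complexType")]
print(global_types)  # ['AddressType']

# Elements anywhere in the schema that refer to the shared type.
users = [e.get("name") for e in root.iter(XS + "element")
         if e.get("type") == "AddressType"]
print(users)  # ['address']
```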

Displaying XML documents

XML is usually used to store and transport data, but you can also display XML data. If you don't define how, the browser displays the raw XML. You can style the displayed data with CSS by adding the following line to the XML document, after the XML declaration and before the root element:

<?xml-stylesheet type="text/css" href="stylesheet.css"?>

You can then use regular CSS, as you would with HTML.
