XML
XML is a markup language similar to HTML. Unlike HTML, though, XML is a general-purpose markup language, meaning that you define your own tags. XML's primary purpose is to share data across different systems, similar to JSON (see chapter 11).
XML declaration
Similar to HTML, where you start a document with <!DOCTYPE html>, an XML document can start with a prolog. The prolog is optional, but if it exists, it must come first in the document.
In the prolog you declare which version of XML you're using, the encoding (UTF-8 is the default and the most commonly used), and whether the document relies on information from external sources (the standalone attribute; if it does, the value should be "no"). E.g.:
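A minimal sketch of such a declaration, assuming XML version 1.0, UTF-8 encoding, and a document that may rely on external declarations:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
```

Only the version attribute is required once the declaration is present; encoding and standalone may be left out.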
It is good practice to specify the encoding used to avoid errors.
Syntax
Like HTML, XML consists of opening and closing tags. Each pair of tags delimits an element. Between the tags is the element's content, such as a data value. If an element in the XML document lacks its closing tag, the document is not well-formed, and thus not correct.
You decide for yourself which elements and values should be present in the document, but the document may need to conform to a set of grammatical rules. These are usually defined by a Document Type Definition (DTD) or an XML Schema Definition (XSD), both of which are presented later on. A document that contains a tag not declared in its DTD or XSD is invalid.
An XML body could look like:
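As a minimal sketch, assuming a person described by a name, a height and an age (the element names match the ones used later in this chapter; the values are made up):

```xml
<person>
  <name>Ola Nordmann</name>
  <height>180</height>
  <age>30</age>
</person>
```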
Nesting
An XML document must contain exactly one root element, which is the parent of all other elements. In the example below, <person> is the root element.
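A sketch of a complete document with a single root element; the values are illustrative:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- <person> is the root element: every other element sits inside it -->
<person>
  <name>Ola Nordmann</name>
  <height>180</height>
  <age>30</age>
</person>
```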
In cases where we want to define several persons in the same document, we could create a new root element and name it <persons>:
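One way this could look, assuming two persons with made-up values:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<persons>
  <person>
    <name>Ola Nordmann</name>
    <height>180</height>
    <age>30</age>
  </person>
  <person>
    <name>Kari Nordmann</name>
    <height>170</height>
    <age>28</age>
  </person>
</persons>
```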
Elements in XML must be properly nested, meaning that you can't write:
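The two fragments in this sketch contrast an improperly nested element with its corrected form; the element names and values are assumptions chosen for illustration:

```xml
<!-- Not well-formed: <person> is closed before its child <name> -->
<person>
  <name>Ola Nordmann
</person>
</name>

<!-- Well-formed: <name> is closed before <person> -->
<person>
  <name>Ola Nordmann</name>
</person>
```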
The closing tag </name> must come before the closing tag of its parent element.
Case sensitivity
Tags in XML are case sensitive, meaning that the tag <Name> is different from the tag <name>. Opening and closing tags must be written with the same case.
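A short sketch of the difference, shown as two separate fragments with a made-up value:

```xml
<!-- Not well-formed: the closing tag does not match the case of <Name> -->
<Name>Ola Nordmann</name>

<!-- Well-formed: both tags use the same case -->
<name>Ola Nordmann</name>
```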
Entity references
Some characters in XML have a special meaning.
Placing a character like "<" inside an XML element will generate an error because the parser interprets it as the start of a new element. To avoid this error, replace the "<" character with an entity reference:
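As a sketch, assuming a hypothetical <message> element whose text contains a comparison:

```xml
<!-- Not well-formed: the bare "<" is read as the start of a tag -->
<message>salary < 1000</message>

<!-- Well-formed: "<" is written as the entity reference &lt; -->
<message>salary &lt; 1000</message>
```

A parser reading the second fragment hands the text "salary < 1000" to the application.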
There are 5 pre-defined entity references in XML:
Code      Character   Meaning
&lt;      <           Less than
&gt;      >           Greater than
&amp;     &           Ampersand
&apos;    '           Apostrophe
&quot;    "           Quotation mark
Comments
Similar to HTML, comments in XML are written:
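A minimal sketch; the <person> element and the note about centimetres are assumptions made for this example:

```xml
<!-- This is a comment -->
<person>
  <!-- Height is given in centimetres -->
  <height>180</height>
</person>
```

Note that two consecutive hyphens are not allowed inside the body of a comment.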
Comments can be used when you want to give additional information to humans that are reading the document and need to understand the content.
How JSON might look as XML
A JSON object can be translated to XML by turning each key into an element and each value into that element's content.
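As a sketch of such a translation, assume the small JSON object quoted in the comment at the top of the block. Both the JSON input and the chosen element names are illustrative assumptions, and other mappings are possible:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed JSON input:
     {"person": {"name": "Ola Nordmann", "height": 180, "age": 30}}
-->
<person>
  <name>Ola Nordmann</name>
  <height>180</height>
  <age>30</age>
</person>
```

XML element content is plain text, so the distinction JSON makes between the number 180 and the string "180" is lost unless a schema adds it back.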
Document Type Definition
Document Type Definition (DTD) and XML Schema Definition (XSD) are used to describe the structure of an XML document precisely. This means that the XML document must follow the defined grammatical rules.
The DTD is defined in the prolog, after the XML header (xml version and encoding) and before the body (the elements with values).
The syntax for DTD is:
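The general shape, written as a template rather than a valid document: element, declaration1 and declaration2 are placeholders to be replaced with real names and declarations.

```xml
<!DOCTYPE element [
  declaration1
  declaration2
]>
```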
The element names the root element and tells the parser to parse the document from that root element.
The square brackets [ ] enclose an optional list of declarations called the internal subset.
For the example:
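A sketch of an example document, assuming a <persons> root element that holds a single <person> with made-up values:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<persons>
  <person>
    <name>Ola Nordmann</name>
    <height>180</height>
    <age>30</age>
  </person>
</persons>
```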
The element in <!DOCTYPE element []> is persons.
The declarations, such as declaration1, are written as <!ELEMENT element-name (#PCDATA)>. Before declaring the individual elements, you should declare which child elements the parent element (<person> in this case) contains, as such: <!ELEMENT person (name, height, age)>. For the XML example above, we will get:
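A sketch of the resulting document with its DTD in the prolog. The declaration for <persons> and the "+" (one or more) are assumptions, since the text only spells out the declaration for <person>:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE persons [
  <!-- persons holds one or more person elements (assumed) -->
  <!ELEMENT persons (person+)>
  <!ELEMENT person (name, height, age)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT height (#PCDATA)>
  <!ELEMENT age (#PCDATA)>
]>
<persons>
  <person>
    <name>Ola Nordmann</name>
    <height>180</height>
    <age>30</age>
  </person>
</persons>
```

With this DTD, a <person> that lists its children in a different order, or leaves one out, is well-formed but not valid.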
PCDATA means "parsed character data", meaning that the text is parsed by the XML parser, which recognises markup and entity references in it. The alternative is CDATA, meaning there is no markup, and the parser should treat the section as regular text.
XML Schema Definition
Instead of using a DTD, you can use an XML Schema Definition (XSD). Like a DTD, it is used to describe and validate the structure and content of an XML document.
To use XSD, you first need to declare a schema, which is itself written as an XML document:
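A minimal sketch of an empty schema. The xs prefix is a common convention rather than a requirement; the namespace URI is the one defined for XML Schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- element and type definitions go here -->
</xs:schema>
```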
XSD distinguishes three kinds of types: simple, complex and global types.
A simple type is used only in the context of text: an element of a simple type contains a text value and no child elements. Some predefined simple types are xs:int, xs:boolean, xs:string and xs:date.
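A sketch of element declarations that use these predefined simple types; the element names are hypothetical:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:string"/>
  <xs:element name="age" type="xs:int"/>
  <xs:element name="married" type="xs:boolean"/>
  <xs:element name="birthday" type="xs:date"/>
</xs:schema>
```

Under this schema, <age>30</age> is valid while <age>thirty</age> is not.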
Complex type is a container for other element definitions. This allows you to specify which child elements an element can contain and to provide some structure within your XML document. E.g.:
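A sketch of a complex type, assuming an Address element with the two child elements name and company:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <!-- xs:sequence: the children must appear in this order -->
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="company" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```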
Here, the Address element consists of child elements. It is a container for other <xs:element> definitions, which allows you to build a simple hierarchy of elements in the XML document.
A global type defines a single named type in the document, which all other elements can reference. For example, you may want to describe a person and a company where the company has several addresses. In such a case, you can define a general type as below:
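A sketch of such a global type. The name AddressType is an assumption; what makes the type global is that it is named and declared directly under xs:schema:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="company" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```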
When in use, it would look like:
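A sketch of two elements that share one global type, assuming the type is called AddressType:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The shared global type, defined once -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="company" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Both elements reference the same type by name -->
  <xs:element name="Address1" type="AddressType"/>
  <xs:element name="Address2" type="AddressType"/>
</xs:schema>
```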
Here, instead of having to define the name and the company twice (once for Address1 and once for Address2), we now have a single definition. This makes maintenance simpler: for example, if you want to add "Postcode" elements to the address, you only have to add them in one place.
Displaying XML documents
XML is usually used to store and transport data, but you can also display XML data. If you don't define how, the browser displays the raw XML. You can, however, style the displayed data using CSS by linking a stylesheet from the XML document. You do this by writing:
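A sketch of a document that links a stylesheet through the xml-stylesheet processing instruction. The file name style.css and the document content are assumptions:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="style.css"?>
<persons>
  <person>
    <name>Ola Nordmann</name>
    <height>180</height>
    <age>30</age>
  </person>
</persons>
```

The selectors in the stylesheet then target the XML element names, such as person or name, instead of HTML tags.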
You can then use regular CSS, as you would with HTML.