XML, an acronym for Extensible Markup Language, is the new universal meta-language of the IT industry. XML is an application profile, or restricted form, of SGML, the Standard Generalized Markup Language [ISO 8879]. One dictionary defines it as: "A metalanguage written in SGML that allows one to design a markup language, used to allow for the easy interchange of documents on the World Wide Web". By construction, XML documents are conforming SGML documents.
XML was originally designed for the World Wide Web, but it is now used much more broadly, as a means to exchange information between software applications. XML is both machine-readable and human-readable, although it is not primarily intended for human consumption.
The XML specification is published as a W3C (World Wide Web Consortium) Recommendation. There is also an online annotated XML specification.
XML is a 'meta-language', which means it is designed as a recipe for constructing application-specific languages that describe particular kinds of information.
The number of XML applications is already huge and still growing daily. XML is the foundation of a large family of XML-based standards.
XML-based languages use 'tags' (with 'attributes') to delimit information elements; the tags mark up the information. Any information element that is not empty begins with a start-tag, <elementname>, and ends with an end-tag, </elementname>. Start-tags can also carry attributes, written inside the start-tag as attributename="...", for example <elementname attributename="...">. This means that pieces of information can either be marked up with element tags or stored in attributes attached to element tags. Whether to use elements or attributes depends on the type of information. A distinctive feature of elements is that they can be nested, i.e. an element can contain one or more other elements. Empty elements, i.e. elements without content, can optionally be abbreviated as <elementname/>.
The XML specification requires an XML document to be well-formed, which means, among other things, that:
- It contains one or more elements.
- There is exactly one element, called the root, or document element, no part of which appears in the content of any other element.
- For all other elements, if the start-tag is in the content of another element, the end-tag is in the content of the same element. More simply stated, the elements, delimited by start- and end-tags, nest properly within each other.
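A quick way to see these rules in action is to hand a document to a conforming parser, which must reject input that is not well-formed. The sketch below assumes Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` is hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for missing root, junk after the root, mismatched tags, etc.
        return False

# One root element, properly nested children: well-formed.
print(is_well_formed("<a><b></b></a>"))   # True
# Two top-level elements, so there is no single root: not well-formed.
print(is_well_formed("<a></a><b></b>"))   # False
# Overlapping tags: <b> is still open when its parent <a> closes.
print(is_well_formed("<a><b></a></b>"))   # False
```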
A simple XML example:
<author id='ED'>
<firstname>Eric</firstname>
<middlename/>
<lastname>Dortmans</lastname>
</author>
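To show how an application sees this structure, the sketch below parses the example above with Python's standard `xml.etree.ElementTree` module (an assumed choice of library; any XML parser exposes similar information). It reads the root element, the `id` attribute, the nested child elements and the empty `middlename` element.

```python
import xml.etree.ElementTree as ET

AUTHOR_XML = """<author id='ED'>
<firstname>Eric</firstname>
<middlename/>
<lastname>Dortmans</lastname>
</author>"""

root = ET.fromstring(AUTHOR_XML)       # <author> is the root element

print(root.tag)                        # author
print(root.get("id"))                  # ED   (information stored in an attribute)
print(root.find("firstname").text)     # Eric (information stored in an element)
print(root.find("middlename").text)    # None (empty element, no content)
print([child.tag for child in root])   # ['firstname', 'middlename', 'lastname']
```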
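Characters that XML reserves for its own syntax (<, >, &, ", ') must be escaped when they occur in normal information content, so that they do not confuse parsers. XML predefines the entity references &lt;, &gt;, &amp;, &quot; and &apos; for this purpose. The < and & characters must always be escaped in character data, while > and the quote characters only need escaping in certain contexts, such as a quote inside an attribute value delimited by the same quote.
XML supports many character encodings to suit European, Middle Eastern, and Asian languages. Every XML processor must accept UTF-8 and UTF-16; other encodings are identified by an encoding declaration such as <?xml version="1.0" encoding="ISO-8859-1"?>.
The grammar of an XML document can be constrained and validated using a DTD (Document Type Definition). A more powerful way of defining and constraining an XML-based language, however, is offered by the XML Schema language.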
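In practice, libraries handle this escaping for you. The sketch below assumes Python's standard `xml.sax.saxutils` helpers: `escape()` replaces &, < and > in text content, and `quoteattr()` produces a quoted attribute value that is safe to place in a start-tag. The sample strings are made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr

text = "Tom & Jerry <cartoon>"
# escape() replaces &, < and > with &amp;, &lt; and &gt;
print(escape(text))   # Tom &amp; Jerry &lt;cartoon&gt;

# quoteattr() also deals with quote characters and wraps the value in
# quotes; a value containing only double quotes gets single-quote delimiters.
title = 'He said "hi"'
print(f"<note title={quoteattr(title)}/>")   # <note title='He said "hi"'/>
```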
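The sketch below, again assuming Python's standard `xml.etree.ElementTree`, parses the same small document once as ISO-8859-1 bytes with an encoding declaration and once as UTF-8 bytes without one. The `<city>` element is a made-up example. Passing bytes rather than an already-decoded string lets the parser apply the declared (or default) encoding itself.

```python
import xml.etree.ElementTree as ET

# ISO-8859-1 bytes; the XML declaration tells the parser how to decode them.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><city>Zürich</city>'.encode("iso-8859-1")

# UTF-8 bytes without a declaration; UTF-8 is the default.
utf8_doc = "<city>Zürich</city>".encode("utf-8")

# Both parse to the same character content.
print(ET.fromstring(latin1_doc).text)
print(ET.fromstring(utf8_doc).text)
```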
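In a DTD, an element's content model is essentially a regular expression over the names of its child elements. For the author example above, a hypothetical declaration <!ELEMENT author (firstname, middlename?, lastname)> would require a first name, an optional middle name and a last name, in that order. Python's standard library does not validate against DTDs, so the sketch below only imitates that single rule with a regular expression; the helper `matches_author_model` is hypothetical, and a real project would use a validating parser or an XML Schema processor instead.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written stand-in for the hypothetical DTD rule
#   <!ELEMENT author (firstname, middlename?, lastname)>
# The child element names are joined with spaces and matched as a regex.
AUTHOR_MODEL = re.compile(r"firstname (middlename )?lastname")

def matches_author_model(xml_text: str) -> bool:
    root = ET.fromstring(xml_text)
    if root.tag != "author":
        return False
    children = " ".join(child.tag for child in root)
    return AUTHOR_MODEL.fullmatch(children) is not None

print(matches_author_model("<author><firstname/><lastname/></author>"))   # True
print(matches_author_model("<author><lastname/><firstname/></author>"))   # False
```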
The XML grammar allows various alternative representations of the same logical information. The Canonical XML specification describes a method for generating a canonical physical representation, the canonical form, of an XML document. If two documents have the same canonical form, the two documents are logically equivalent. XML canonicalization is particularly useful for applications, such as security applications, that need to test whether the information content of a document or document subset has been changed.
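As an illustration, Python 3.8 and later ship `xml.etree.ElementTree.canonicalize()`, which implements Canonical XML 2.0 (C14N 2.0); that version may differ in details from other Canonical XML versions. In the sketch below, two made-up documents differ in attribute order, quote style and empty-element syntax, yet produce the same canonical form.

```python
import xml.etree.ElementTree as ET  # canonicalize() requires Python 3.8+

# Physically different, logically equivalent documents.
doc_a = '<author id="ED" lang="nl"><middlename/></author>'
doc_b = "<author lang='nl' id='ED'><middlename></middlename></author>"

# Canonicalization sorts attributes, normalizes quoting and writes
# empty elements as a start-tag/end-tag pair.
canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)

print(canon_a == canon_b)   # True
print(canon_a)
```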
The textual form of XML may be good for readability and ease of use, but it is not ideal for applications constrained by bandwidth, processing power, or storage. If that is your problem, then Binary XML may be the solution.
XML can be processed using XSLT or with general-purpose programming languages such as Java, C/C++, C#, VB, Python, Perl, and PHP, supported by dedicated XML processing libraries and frameworks. XML is currently supported in all major programming languages.
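As one example of such library support, the sketch below uses Python's standard `xml.etree.ElementTree` to build the author example from this text programmatically and serialize it. The extra `note` element is a hypothetical addition that shows the library escaping reserved characters on output, so the result stays well-formed.

```python
import xml.etree.ElementTree as ET

# Build the <author> example programmatically.
author = ET.Element("author", id="ED")
ET.SubElement(author, "firstname").text = "Eric"
ET.SubElement(author, "middlename")                 # empty element
ET.SubElement(author, "lastname").text = "Dortmans"

# Hypothetical extra element: reserved characters are escaped on output.
note = ET.SubElement(author, "note")
note.text = "Writes about <XML> & SGML"

xml_text = ET.tostring(author, encoding="unicode")
print(xml_text)
```

Parsing `xml_text` back yields the original `note` text, with the entity references turned back into < and &.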
More information can be found in numerous on-line XML resources.