Introduction To XML

What is XML?

  • XML stands for eXtensible Markup Language
  • You have probably used HTML tags and elements. HTML provides a fixed set of elements, and we can use only those elements.
  • XML on the other hand allows us to create our own tags and elements
  • Since we create our own tags they can be descriptive making the document more readable
  • HTML is designed to display your data in a web browser
  • XML is designed to represent your data rather than its display. The display of data is taken care of by other means, such as CSS, XSL or custom applications
  • HTML pages are meant to be displayed in web browsers
  • XML data can be used by any application, including a web browser, that understands how to interpret it
  • Since the data is separated from its display, changes to the data can be made without touching the display mechanism
  • XML is derived from SGML (Standard Generalized Markup Language), a standard for defining markup languages
  • HTML is also a markup language; its early versions were defined as applications of SGML
  • XML is a standard of the World Wide Web Consortium (W3C); SGML is an international standard published by ISO (ISO 8879)
  • XML made its first public appearance in 1996
  • The first official specification of XML was published in 1998

A Simple XML document

Consider the following file, named myfirstxml.xml, which represents a simple XML document. Try comparing it with HTML. XML files are just plain text files, usually saved with a .xml extension.


<?xml version="1.0"?>
<!DOCTYPE mylibrary SYSTEM "mylibrary.dtd">
<mylibrary>
  <book book_no="100">
    <author>Author 1</author>
    <title>Title 1</title>
    <photo src="photo1.gif" />
  </book>
  <book book_no="200">
    <author>Author 2</author>
    <title>Title 2</title>
    <photo src="photo2.gif" />
  </book>
</mylibrary>
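To see how an application might use such a document, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. It assumes a well-formed version of the library document, with a <mylibrary> root element wrapping the books and a closing </book> tag for each book. The function name list_books is just illustrative.

```python
import xml.etree.ElementTree as ET

# The sample library document, assumed here to have a <mylibrary> root
# element and closing </book> tags so that it is well-formed.
LIBRARY_XML = """<?xml version="1.0"?>
<!DOCTYPE mylibrary SYSTEM "mylibrary.dtd">
<mylibrary>
  <book book_no="100">
    <author>Author 1</author>
    <title>Title 1</title>
    <photo src="photo1.gif" />
  </book>
  <book book_no="200">
    <author>Author 2</author>
    <title>Title 2</title>
    <photo src="photo2.gif" />
  </book>
</mylibrary>"""


def list_books(xml_text):
    """Return (book_no, author, title, photo_src) tuples for every <book>."""
    # The standard library parser does not download the external DTD.
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):  # direct <book> children of the root
        books.append((
            book.get("book_no"),
            book.findtext("author"),
            book.findtext("title"),
            book.find("photo").get("src"),
        ))
    return books


for row in list_books(LIBRARY_XML):
    print(row)
```

The program only reads the data. How the books are displayed is left entirely to the application, which is the separation of data and display described earlier.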



Common XML Terms

  • Processing instructions

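<?xml version="1.0"?>

Processing instructions are special instructions for the application that processes the document. They are enclosed between <? and ?>. The line above is the XML declaration, which uses the same <? ?> syntax. Strictly speaking, the XML specification treats it as a declaration rather than a processing instruction. If present, it must appear at the very start of the file, and the name xml must be written in lowercase. A fuller XML declaration looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>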

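  • Version: specifies the XML version used by the document. This is normally 1.0
  • Encoding: optional. Specifies the character encoding of the document, for example UTF-8. Without an encoding declaration (or other external information), the document is expected to be in UTF-8 or UTF-16
  • Standalone: optional. Specifies whether the document depends on markup declarations in an external document, such as an external DTD. If your document relies on such declarations, set it to "no" (the default when the attribute is omitted).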
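Applications rarely need to read the XML declaration themselves, but Python's standard library expat binding exposes it through the XmlDeclHandler callback. The sketch below collects the three values. read_xml_declaration is an illustrative name. Expat reports standalone as 1 ("yes"), 0 ("no") or -1 (omitted), and it reports encoding as None when the declaration does not specify one.

```python
import xml.parsers.expat


def read_xml_declaration(data):
    """Return the version, encoding and standalone values of the XML declaration.

    standalone is reported by expat as 1 ("yes"), 0 ("no") or -1 (omitted);
    encoding is None when the declaration does not specify one.
    """
    result = {}

    def on_xml_decl(version, encoding, standalone):
        result["version"] = version
        result["encoding"] = encoding
        result["standalone"] = standalone

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)  # True marks this as the final (and only) chunk
    return result


doc = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><book book_no="100"/>'
print(read_xml_declaration(doc))
```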
  • Document Type Declaration

<!DOCTYPE mylibrary SYSTEM "mylibrary.dtd">

If your XML document is based on a DTD, you must declare that DTD here. The name after DOCTYPE ("mylibrary") must match the name of the document's root element for the document to be valid. It need not be the same as the DTD file name.
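For a document to be valid against its DTD, the name given in the DOCTYPE must match the name of the root element. The Python sketch below uses the standard library's low-level expat parser and only compares these two names. Expat does not load the DTD or validate against it. The helper name doctype_matches_root is illustrative.

```python
import xml.parsers.expat


def doctype_matches_root(data):
    """Return (doctype_name, root_name, matches) for an XML document.

    expat is a non-validating parser, so this only compares the two names;
    it does not load or check the DTD itself.
    """
    names = {"doctype": None, "root": None}

    def on_doctype(doctype_name, system_id, public_id, has_internal_subset):
        names["doctype"] = doctype_name

    def on_start(tag, attrs):
        # Only the first start tag belongs to the root element.
        if names["root"] is None:
            names["root"] = tag

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(data, True)
    return names["doctype"], names["root"], names["doctype"] == names["root"]


good = b'<!DOCTYPE mylibrary SYSTEM "mylibrary.dtd"><mylibrary><book book_no="100"/></mylibrary>'
bad = b'<!DOCTYPE mylibrary SYSTEM "mylibrary.dtd"><book book_no="100"/>'
print(doctype_matches_root(good))
print(doctype_matches_root(bad))
```

The second document is still well-formed, so the parser accepts it. Only a validating parser would reject it for the name mismatch.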

  • Tag


Tags mark up a particular piece of data and are enclosed between < and >. A start tag such as <author> and its matching end tag </author>, together with the content between them, form an element.

  • Element


<author>Author 1</author>

<photo src="photo1.gif" />

An element usually consists of a start tag, an end tag, and the content between them, as in the first example. An element with no content can also be written in a shorter form, as in the second example. Instead of a pair of <photo ...> and </photo> tags, we use the single empty-element tag <photo ... />.
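Both ways of writing an empty element produce the same element once parsed. Here is a quick check in Python using the standard xml.etree.ElementTree module:

```python
import xml.etree.ElementTree as ET

# The same empty element, written with an explicit end tag and in the
# abbreviated empty-element form.
long_form = ET.fromstring('<photo src="photo1.gif"></photo>')
short_form = ET.fromstring('<photo src="photo1.gif" />')

for elem in (long_form, short_form):
    # tag name, attributes, text content and number of child elements
    print(elem.tag, elem.attrib, elem.text, len(elem))
```

Both lines print the same values, so the choice between the two forms is purely a matter of notation.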

  • Attribute


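Attributes are name="value" pairs written inside a start tag (or an empty-element tag) that provide extra information about an element. In <book book_no="100">, book_no is an attribute with the value 100.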
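In Python's standard xml.etree.ElementTree module, for example, an element's attributes are available as a dictionary. Attribute values are always text, so book_no comes back as the string "100" rather than a number. The isbn attribute below is hypothetical and is used only to show what happens when an attribute is missing.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring('<book book_no="100"><author>Author 1</author></book>')

# Attributes are exposed as a dictionary of name -> value strings.
print(book.attrib)                      # {'book_no': '100'}

book_no = book.get("book_no")
print(book_no, type(book_no).__name__)  # 100 str

# get() returns None (or a supplied default) for a missing attribute.
print(book.get("isbn"))                 # None
print(book.get("isbn", "n/a"))          # n/a
```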

  • Root


Every XML document must have exactly one element at the top of the hierarchy, called the root element, which contains all the other elements.

  • Tree




An XML document can be viewed as an inverted tree, with the root element at the top and all other elements branching out below it at various levels.

  • Node


Each point in the tree, whether it starts a branch or sits at a leaf, is called a node.

  • Parent


A parent element is an element that contains other elements (its sub-elements).

  • Child


Child elements are the elements directly beneath a parent element. In the sample document, <author> and <title> are children of <book>.
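The tree terms above can be explored with a short Python sketch using the standard xml.etree.ElementTree module. print_tree and parent_of are illustrative names. The sketch prints the element tree with one level of indentation per depth. It also builds a child-to-parent lookup table, because ElementTree elements do not keep a reference to their parent. Note that ElementTree models only elements as Element objects. Text is stored on element attributes rather than as separate nodes.

```python
import xml.etree.ElementTree as ET

library = ET.fromstring(
    '<mylibrary>'
    '<book book_no="100"><author>Author 1</author><title>Title 1</title></book>'
    '<book book_no="200"><author>Author 2</author><title>Title 2</title></book>'
    '</mylibrary>'
)


def print_tree(elem, depth=0):
    """Print each element indented by its depth below the root."""
    print("  " * depth + elem.tag)
    for child in elem:  # iterating an element yields its direct children
        print_tree(child, depth + 1)


print_tree(library)

# Build a child -> parent lookup by walking every element once.
parent_of = {child: parent for parent in library.iter() for child in parent}

first_author = library.find("book/author")
print(parent_of[first_author].get("book_no"))  # 100
print(library in parent_of)                    # False: the root has no parent
```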

Basic Rules of XML Grammar

  • XML is case sensitive, so a start tag and its end tag must use exactly the same case
    <mytag> is not same as <MYTAG> or <MyTag>
  • All start tags must have corresponding end tags
    <mytag>Some Data
    <my_other_tag>Some other data</my_other_tag>
    The XML above is not well-formed because <mytag> does not have a corresponding end tag </mytag>
  • Empty elements must be closed too, either with an end tag or, more commonly, in the abbreviated form
    <photo src="mypicture.gif" />
  • All tags must be nested properly
    <mytag>some data<my_other_tag>Some other data</mytag></my_other_tag>
    The XML above is not well-formed because the tags overlap: <my_other_tag> opens inside <mytag> but is closed after it. The correct nesting would be
    <mytag>some data<my_other_tag>Some other data</my_other_tag></mytag>
  • All attribute values must be enclosed in quotation marks
    <book book_no=100> is not well-formed. The correct usage is
    <book book_no="100">
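A parser enforces these rules. The minimal sketch below uses Python's standard xml.etree.ElementTree module to feed examples like the ones above to the parser and report whether each was accepted. is_well_formed is an illustrative helper, not a library function. The parser checks only well-formedness here. It does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


examples = {
    "case mismatch": "<mytag>data</MYTAG>",
    "missing end tag": "<mytag>Some Data<my_other_tag>Some other data</my_other_tag>",
    "empty element not closed": '<photo src="mypicture.gif">',
    "bad nesting": "<mytag>some data<my_other_tag>Some other data</mytag></my_other_tag>",
    "unquoted attribute": "<book book_no=100></book>",
    "all rules followed": (
        '<mytag>some data<my_other_tag book_no="100">Some other data'
        '</my_other_tag><photo src="mypicture.gif" /></mytag>'
    ),
}

for label, text in examples.items():
    print(f"{label}: {is_well_formed(text)}")
```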

What is a DTD ?

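  • DTD stands for Document Type Definition. The <!DOCTYPE ...> line that refers to it is the document type declaration
  • It defines the structure, or rules, for XML documents based on it: which elements and attributes may appear, and how they may be nested
  • A DTD is written using its own syntax of markup declarations (<!ELEMENT ...>, <!ATTLIST ...> and so on). The XML specification describes that syntax using Extended Backus-Naur Form (EBNF)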
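The following Python sketch shows what DTD declarations look like and how a parser sees them. The DTD for mylibrary here is hypothetical because the actual mylibrary.dtd is not shown. It is placed in the document's internal subset, between [ and ] in the DOCTYPE, so that no separate file is needed. Python's standard expat parser is non-validating. It reports each <!ELEMENT> and <!ATTLIST> declaration but does not check the document against them. list_declarations is an illustrative name.

```python
import xml.parsers.expat

# A small, hypothetical DTD for the library document, placed in the
# internal subset so the example does not need a separate mylibrary.dtd.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE mylibrary [
  <!ELEMENT mylibrary (book*)>
  <!ELEMENT book (author, title, photo)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT photo EMPTY>
  <!ATTLIST book book_no CDATA #REQUIRED>
  <!ATTLIST photo src CDATA #REQUIRED>
]>
<mylibrary>
  <book book_no="100">
    <author>Author 1</author>
    <title>Title 1</title>
    <photo src="photo1.gif" />
  </book>
</mylibrary>"""


def list_declarations(data):
    """Collect element and attribute declarations reported by expat.

    expat is non-validating: it reports the declarations but does not
    check the document against them.
    """
    elements, attributes = [], []

    def on_element_decl(name, model):
        elements.append(name)

    def on_attlist_decl(elname, attname, attr_type, default, required):
        attributes.append((elname, attname, attr_type, bool(required)))

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.AttlistDeclHandler = on_attlist_decl
    parser.Parse(data, True)
    return elements, attributes


elements, attributes = list_declarations(DOC)
print(elements)
print(attributes)
```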

Bipin Joshi is an independent software consultant and trainer by profession, specializing in Microsoft web development technologies. Having embraced the Yoga way of life, he is also a yoga mentor, meditation teacher, and spiritual guide to his students. He is a prolific author and writes regularly about software development and yoga on his websites. He has been programming, meditating, writing, and teaching for over 27 years.

Posted On : 18 December 2000

Tags : XML