4. Introduction to XML
XML stands for Extensible Markup Language.
XML represents textual information in a
standard format.
XML was designed to transport and store
information.
It is used to represent documents containing
structured information in a reliable way.
5. What do markup languages do?
Markup languages consist of a set of
markup
conventions used for encoding texts.
A markup language specifies:
What markup is allowed
What markup is required
How the markup is distinguished from text
What the markup means
6. Features of XML
• XML files are text files, which can be managed
by any text editor.
• XML is very simple, because it has fewer than
10 syntax rules.
• XML tags are not predefined. You must define
your own tags.
• XML is designed to be self-descriptive.
7. Advantages of XML
It can represent common computer science data
structures: records, lists and trees.
Its self-documenting format describes structure and
field names as well as specific values.
The strict syntax and parsing requirements keep the
necessary parsing algorithms simple,
efficient, and consistent.
XML is heavily used as a format for document storage
and processing, both online and offline.
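The claim that XML can represent records, lists and trees can be sketched in Python with the standard library's xml.etree.ElementTree module. The helper name to_xml, the sample book data, and the choice of <item> for list entries are assumptions made for illustration only; they are one possible mapping, not a standard one.

```python
import xml.etree.ElementTree as ET

def to_xml(tag, value):
    """Convert nested dicts, lists and scalars into an Element tree."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        # Record: one child element per field name.
        for field, field_value in value.items():
            element.append(to_xml(field, field_value))
    elif isinstance(value, list):
        # List: repeated <item> children (hypothetical naming choice).
        for entry in value:
            element.append(to_xml("item", entry))
    else:
        # Scalar: stored as text, since XML values are strings.
        element.text = str(value)
    return element

# Illustrative data: a record containing a nested record and a list.
book = {
    "title": "Roughing It",
    "author": {"firstname": "Mark", "lastname": "Twain"},
    "tags": ["travel", "humor"],
}

xml_text = ET.tostring(to_xml("book", book), encoding="unicode")
print(xml_text)
```

The nesting of elements is what makes the result a tree: the field names become tags, so the output describes its own structure.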
8. How is XML different from HTML?
HTML and XML have different sets of goals.
HTML was designed to display data and hence
focuses on the ‘look’ of the data.
XML was designed to describe and carry data and
hence focuses on ‘what data is’.
XML does not do anything by itself; it only describes and carries data.
HTML and XML are complementary to each
other.
9. XML Syntax Rules
Now let's take a look at some of the
important rules of XML syntax.
All XML elements must have a closing tag:
<p>This is a paragraph.</p>
<br />
XML tags are case sensitive:
the tag <Letter> is different from the tag <letter>.
10. XML Syntax Rules
XML elements must be properly nested:
<b><i>This text is bold and italic</i></b>
XML documents must have a root element:
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
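The four syntax rules (closing tags, case sensitivity, proper nesting, a single root element) can be checked with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample strings are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One sample per rule; only the first follows all of them.
samples = {
    "closed and nested": "<root><child><subchild>text</subchild></child></root>",
    "missing closing tag": "<p>This is a paragraph.",
    "case mismatch": "<Letter>Dear reader</letter>",
    "improper nesting": "<b><i>bold and italic</b></i>",
    "two root elements": "<root/><root/>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first sample is accepted; each of the others breaks exactly one rule and is rejected by the parser.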
11. XML Elements
Element is the technical term for a
textual unit, viewed as a structural
component.
Different types of elements are given
different names.
Element names carry no built-in meaning;
their meaning is application dependent.
XML elements are extensible.
12. Authoring XML Elements
An XML element is made up of a start tag, an end tag,
and data in between.
Example:
<director> Matthew Dunn </director>
Example of another element with the same value:
<actor> Matthew Dunn </actor>
XML tags are case-sensitive, so these are three different tags:
<CITY> <City> <city>
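A short Python sketch, using the standard xml.etree.ElementTree module, shows how a parser sees the elements above: the tag name and the data between the tags are separate pieces of information. The wrapper element <places> is a hypothetical addition, needed only so that the three city tags have a single root.

```python
import xml.etree.ElementTree as ET

director = ET.fromstring("<director> Matthew Dunn </director>")
actor = ET.fromstring("<actor> Matthew Dunn </actor>")

# Same data, different element names: the tag says what the value is.
print(director.tag, repr(director.text))
print(actor.tag, repr(actor.text))

# Tags differing only in case are distinct element names.
places = ET.fromstring("<places><CITY/><City/><city/></places>")
tags = [child.tag for child in places]
print(tags)
```

Note that the surrounding spaces are part of the element's text; an application that does not want them has to strip them itself.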
13. Authoring XML Attributes (cont’d)
An attribute is a name-value pair separated by an
equals sign (=).
Example:
<City ZIP="94608"> Emeryville </City>
Attributes are used to attach additional, secondary
information to an element.
14. XML Attributes
XML elements can have attributes in
name/value pairs as in HTML.
Attribute values must always be in quotes.
Either single or double quotes are valid,
though double quotes are most common.
Attributes are always contained within
the start tag of an element.
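The quoting rules can be tried out with a parser. This is a minimal sketch using Python's standard xml.etree.ElementTree, reusing the City example with straight quotes; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Double-quoted and single-quoted attribute values are both valid.
double_quoted = ET.fromstring('<City ZIP="94608"> Emeryville </City>')
single_quoted = ET.fromstring("<City ZIP='94608'> Emeryville </City>")

print(double_quoted.attrib)       # attributes as a name -> value mapping
print(double_quoted.get("ZIP"))   # look up one attribute by name

# An unquoted attribute value is rejected by the parser.
try:
    ET.fromstring("<City ZIP=94608> Emeryville </City>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
print("unquoted accepted:", unquoted_accepted)
```

The attribute value is returned as a string, even though it looks like a number.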
15. What is an XML DTD?
DTD stands for Document Type Definition.
A DTD is a formal model for defining the role
of each element.
It formally defines the relationship between
the various elements that form the document.
The purpose of a Document Type Definition is
to define the legal building blocks of an XML
document.
16. Document Type Definitions (DTDs 1)
XML document types can be specified using a DTD
A DTD does not constrain data types:
all values are represented as strings in XML.
DTD definition syntax
<!ELEMENT element (subelements-specification) >
<!ATTLIST element (attributes) >
… more details later
Valid XML documents refer to a DTD (or other
Schema)
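To make the <!ELEMENT ...> and <!ATTLIST ...> syntax concrete, the sketch below uses Python's standard xml.parsers.expat module to list the declarations found in an internal DTD. The note document, its element names, and the helper list_declarations are assumptions invented for illustration. The expat parser is non-validating: it reads the declarations but does not check the document against them.

```python
import xml.parsers.expat

# Hypothetical document with an internal DTD subset.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST note lang CDATA #IMPLIED>
]>
<note lang="en"><to>Reader</to><body>Hello</body></note>"""

def list_declarations(document):
    """Collect element and attribute declarations from the DTD."""
    elements, attributes = [], []
    parser = xml.parsers.expat.ParserCreate()
    # Called once per <!ELEMENT ...> declaration.
    parser.ElementDeclHandler = lambda name, model: elements.append(name)
    # Called once per attribute declared in an <!ATTLIST ...>.
    parser.AttlistDeclHandler = (
        lambda elname, attname, type_, default, required:
            attributes.append((elname, attname, type_))
    )
    parser.Parse(document, True)
    return elements, attributes

elements, attributes = list_declarations(DOC)
print(elements)
print(attributes)
```

The attribute type reported here is CDATA, which matches the point that a DTD treats values as strings rather than as typed data.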
17. Document Type Definitions (DTDs 2)
In each declaration below, test is the name of the root element.
External Public DTD Declaration (the application should know the DTD)
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE test PUBLIC "-//Webster//DTD test V1.0//EN"
<test> "test" is a document element </test>
Internal DTD Declaration (the DTD is defined inside the XML document)
<!DOCTYPE test [
<!ELEMENT test EMPTY> ]>
<test/>
External DTD Declaration referring to a file or a URL (the DTD is defined in the file test.dtd)
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE test SYSTEM "test.dtd">
<test> "test" is a document element </test>
18. XML Validation
There are two types of XML documents:
“Well formed” XML
An XML document that conforms to the
syntax of XML is called ‘well formed’.
“Valid” XML
A well-formed XML document that also conforms to a
DTD is called ‘valid’.
19. What is a well-formed XML document?
Well-formed documents follow basic syntax rules, e.g.:
an XML declaration, if present, is on the first line
there is a single document root
all tags use proper delimiters
all elements have start and end tags
(an empty element can be minimized: <br/> instead of <br></br>)
all elements are properly nested:
<author> <firstname>Mark</firstname>
<lastname>Twain</lastname> </author>
special characters are used appropriately (e.g. < and & must be escaped in text)
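The last rule, about special characters, is the one most often broken when XML is assembled from plain strings. The sketch below uses Python's standard xml.sax.saxutils.escape to escape text before placing it inside an element; the <note> element and the sample text are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "AT&T: 3 < 5"  # contains two characters that are special in XML

# Inserting the text as-is produces a document that is not well formed.
unsafe = "<note>" + raw + "</note>"
try:
    ET.fromstring(unsafe)
    unsafe_ok = True
except ET.ParseError:
    unsafe_ok = False

# Escaping turns & into &amp; and < into &lt; before insertion.
escaped = escape(raw)
safe = "<note>" + escaped + "</note>"
parsed_text = ET.fromstring(safe).text

print(unsafe_ok)
print(escaped)
print(parsed_text)
```

After parsing, the application gets the original text back, so escaping changes only the serialized form and not the data.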
21. Viewing XML Files
Raw XML files can be viewed in all major browsers.
Don't expect XML files to be displayed as HTML pages.
Viewing XML Files
Look at this XML file: note.xml
Viewing an Invalid XML File
Look at this XML file: note_error.xml
23. The future of XML
Given the direction in which
it is growing and the level of support
it has received, XML appears to
be the
future of Web publishing.
24. Summary
XML has a wide range of applications
XML is just a formalism (meta-language), unlike HTML
The W3C framework includes
General purpose (accessory, transducing, ..) languages such as XML
Schema, XSLT, XPath, XQuery, XLink, RDF, …
Useful languages for contents (vector graphics, multimedia animation,
formulas)
Other organizations
Define domain-specific vocabularies
Define alternative XML-based general purpose languages
XML is mostly used “behind the scenes”, but increasingly
directly for web content (mostly via XSLT)