2. What is XML?
XML stands for EXtensible Markup Language
XML is a markup language much like HTML
XML was designed to describe data
XML tags are not predefined. You must define your
own tags
XML can use a Document Type Definition (DTD) or an
XML Schema to describe the data
XML with a DTD or XML Schema is designed to be self-descriptive
XML is a W3C Recommendation
3. XML was designed to describe
data and to focus on what data
is.
HTML was designed to display
data and to focus on how data
looks.
4. XML is a W3C
Recommendation
The Extensible Markup Language
(XML) became a W3C
Recommendation on 10 February
1998.
5. The Main Difference Between XML and HTML
XML was designed to carry data.
XML is not a replacement for HTML.
XML and HTML were designed with different goals:
XML was designed to describe data and to focus on
what data is.
HTML was designed to display data and to focus on how
data looks.
HTML is about displaying information, while XML is
about describing information.
6. XML Does not DO Anything
XML was not designed to DO anything.
Maybe it is a little hard to understand, but XML does
not DO anything. XML was created to structure, store,
and send information.
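The following example is a note to Tove from Jani,
stored as XML:
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>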
8. The note has a header and a
message body. It also has sender
and receiver information. But still,
this XML document does not DO
anything. It is just pure
information wrapped in XML tags.
Someone must write a piece of
software to send, receive or
display it.
9. XML is Free and Extensible
XML tags are not predefined. You must "invent"
your own tags.
The tags used to mark up HTML documents and the
structure of HTML documents are predefined. The
author of HTML documents can only use tags that are
defined in the HTML standard (like <p>, <h1>, etc.).
XML allows the author to define their own tags and their
own document structure.
The tags in the example above (like <to> and <from>)
are not defined in any XML standard. These tags are
"invented" by the author of the XML document.
10. XML is a Complement to HTML
XML is not a replacement for HTML.
It is important to understand that XML is not a
replacement for HTML. In future Web development it is
most likely that XML will be used to describe the data,
while HTML will be used to format and display the same
data.
My best description of XML is this: XML is a cross-platform,
software- and hardware-independent
tool for transmitting information.
11. XML in Future Web Development
XML is going to be everywhere.
We have been participating in XML development since
its creation. It has been amazing to see how quickly the
XML standard has been developed and how quickly a
large number of software vendors have adopted the
standard.
We strongly believe that XML will be as important to the
future of the Web as HTML has been to the foundation
of the Web and that XML will be the most common tool
for all data manipulation and data transmission.
14. With XML, your data is stored outside your HTML.
When HTML is used to display data, the data is stored
inside your HTML. With XML, data can be stored in
separate XML files. This way you can concentrate on
using HTML for data layout and display, and be sure
that changes in the underlying data will not require any
changes to your HTML.
XML data can also be stored inside HTML pages as
"Data Islands" (a feature of older versions of Internet
Explorer). You can still concentrate on using HTML
only for formatting and displaying the data.
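A minimal Python sketch of this separation: the note data stays in its own XML (a string here, standing in for a separate file such as note.xml), and a hypothetical render_note_html function builds the HTML layout from it, so a change to the data never requires editing the HTML by hand.

```python
import xml.etree.ElementTree as ET
from html import escape

# The data. In practice this would live in its own file (note.xml is a hypothetical name).
NOTE_XML = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def render_note_html(xml_text: str) -> str:
    """Build the HTML layout from the XML data; the layout never hard-codes the data."""
    note = ET.fromstring(xml_text)

    def field(tag: str) -> str:
        # Text of one child element, escaped so it is safe to place inside HTML.
        return escape(note.findtext(tag, default=""), quote=False)

    return (
        '<div class="note">'
        f"<h1>{field('heading')}</h1>"
        f"<p>To: {field('to')}</p>"
        f"<p>From: {field('from')}</p>"
        f"<p>{field('body')}</p>"
        "</div>"
    )


print(render_note_html(NOTE_XML))
```

If the note changes, only NOTE_XML (or the file it stands for) changes; render_note_html stays the same.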
16. With XML, data can be exchanged between
incompatible systems.
In the real world, computer systems and databases
contain data in incompatible formats. One of the most
time-consuming challenges for developers has been to
exchange data between such systems over the
Internet.
Converting the data to XML can greatly reduce this
complexity and create data that can be read by many
different types of applications.
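A minimal Python sketch of such a conversion, assuming the source system can export comma-separated values with a header row (the CSV data and the csv_to_xml helper are hypothetical). Each row becomes a note element whose children are named after the columns, so the column names must also be legal XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical export from a legacy system: comma-separated values with a header row.
LEGACY_CSV = """to,from,heading,body
Tove,Jani,Reminder,Don't forget me this weekend!
Jani,Tove,Re: Reminder,I will not!
"""


def csv_to_xml(csv_text: str, root_tag: str = "messages", row_tag: str = "note") -> str:
    """Turn each CSV row into an element whose children are named after the columns."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            ET.SubElement(record, column).text = value
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


print(csv_to_xml(LEGACY_CSV))
```

Any XML-aware application can now read the result without knowing anything about the original CSV layout.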
18. With XML, financial information can be exchanged
over the Internet.
Expect to see a lot about XML and B2B (Business To
Business) in the near future.
XML is going to be the main language for exchanging
financial information between businesses over the
Internet. A lot of interesting B2B applications are under
development.
20. With XML, plain text files can be used to share
data.
Since XML data is stored in plain text format, XML
provides a software- and hardware-independent way of
sharing data.
This makes it much easier to create data that different
applications can work with. It also makes it easier to
expand or upgrade a system to new operating systems,
servers, applications, and new browsers.
22. With XML, plain text files can
be used to store data.
XML can also be used to store
data in files or in databases.
Applications can be written to
store and retrieve information
from the store, and generic
applications can be used to display
the data.
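A minimal Python sketch of storing a note in a plain XML file and retrieving it again, using the standard-library ElementTree module. The helpers save_note and load_note and the file name note.xml are illustrative assumptions; a temporary folder is used so the example leaves nothing behind.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def save_note(path: str, to: str, sender: str, body: str) -> None:
    """Store one note as a small XML file with an XML declaration."""
    note = ET.Element("note")
    ET.SubElement(note, "to").text = to
    ET.SubElement(note, "from").text = sender
    ET.SubElement(note, "body").text = body
    ET.ElementTree(note).write(path, encoding="utf-8", xml_declaration=True)


def load_note(path: str) -> dict:
    """Retrieve the note again as a plain dictionary of child-element texts."""
    note = ET.parse(path).getroot()
    return {child.tag: child.text for child in note}


with tempfile.TemporaryDirectory() as folder:
    note_path = os.path.join(folder, "note.xml")
    save_note(note_path, "Tove", "Jani", "Don't forget me this weekend!")
    print(load_note(note_path))
```

Because the file is plain text, any other program (or a text editor) can open the same note.xml.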
24. With XML, your data is available to more users.
Since XML is independent of hardware, software and
application, you can make your data available to clients
other than standard HTML browsers.
Other clients and applications can access your XML files
as data sources, like they are accessing databases.
Your data can be made available to all kinds of "reading
machines" (agents), and it is easier to make your data
available for blind people, or people with other
disabilities.
26. XML is the mother of WAP and
WML.
The Wireless Markup Language
(WML), used to mark up Internet
applications for handheld devices
like mobile phones, is written in
XML.
28. The syntax rules of XML are
very simple and very strict.
The rules are very easy to
learn, and very easy to use.
Because of this, creating
software that can read and
manipulate XML is very easy.
29. An Example XML Document
XML documents use a self-describing and simple
syntax.
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The first line in the document, the XML declaration,
defines the XML version and the character encoding
used in the document. In this case the document
conforms to the 1.0 specification of XML and uses the
ISO-8859-1 (Latin-1/West European) character set.
30. The next line describes the root element of the
document (as if it were saying: "this document is a
note"):
<note>
The next 4 lines describe 4 child elements of the
root (to, from, heading, and body):
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
And finally the last line defines the end of the root
element:
</note>
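To see the root/child structure the way a program sees it, here is a short Python sketch using the standard-library ElementTree parser on the note above. The document is given as bytes so the parser can honour the ISO-8859-1 encoding declaration.

```python
import xml.etree.ElementTree as ET

# The note document from above, as bytes so the encoding declaration is respected.
NOTE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE)
print("root element:", root.tag)
# Iterating over an element yields its direct child elements, in document order.
for child in root:
    print(f"  child <{child.tag}>: {child.text}")
```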
31. All XML Elements Must Have a Closing Tag
With XML, it is illegal to omit the closing tag.
In HTML some elements do not have to have a closing
tag. The following code is legal in HTML:
<p>This is a paragraph
<p>This is another paragraph
In XML all elements must have a closing tag,
like this:
<p>This is a paragraph</p>
<p>This is another paragraph</p>
32. XML Tags are Case Sensitive
Unlike HTML, XML tags are case sensitive.
With XML, the tag <Letter> is different from the tag
<letter>.
Opening and closing tags must therefore be written
with the same case:
<Message>This is incorrect</message>
<message>This is correct</message>
33. XML Elements Must be Properly Nested
Improper nesting of tags makes no sense to XML.
In HTML some elements can be improperly nested
within each other like this:
<b><i>This text is bold and italic</b></i>
In XML all elements must be properly nested within
each other like this:
<b><i>This text is bold and italic</i></b>
34. XML Documents Must Have a Root Element
All XML documents must contain a single tag pair
to define a root element.
All other elements must be within this root element.
All elements can have sub elements (child elements).
Sub elements must be correctly nested within their
parent element:
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
36. With XML, it is illegal to omit quotation marks
around attribute values.
XML elements can have attributes in name/value pairs
just like in HTML. In XML the attribute value must
always be quoted. Study the two XML documents
below. The first one is incorrect, the second is correct:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note date=12/11/2002>
<to>Tove</to>
<from>Jani</from>
</note>
37. <?xml version="1.0" encoding="ISO-8859-1"?>
<note date="12/11/2002">
<to>Tove</to>
<from>Jani</from>
</note>
The error in the first document is that the date attribute
in the note element is not quoted. This is correct:
date="12/11/2002". This is incorrect:
date=12/11/2002.
38. With XML, White Space is Preserved
With XML, the white space in your document is
not truncated.
This is unlike HTML. With HTML, a sentence like this:
Hello        my name is Tove,
will be displayed like this:
Hello my name is Tove,
because HTML reduces multiple, consecutive white
space characters to a single white space.
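A quick Python check of this, using the standard-library ElementTree parser (the greeting element is made up for the example): the parser reports the spaces unchanged, and collapsing them the way HTML display does is a separate step the application would have to take itself.

```python
import xml.etree.ElementTree as ET

# A sentence with several consecutive spaces inside an XML element.
greeting = ET.fromstring("<greeting>Hello        my name is Tove,</greeting>")

# The parser hands the text to the application unchanged...
print(repr(greeting.text))
# ...while collapsing runs of white space, as an HTML browser would, is a separate step.
print(" ".join(greeting.text.split()))
```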
39. Comments in XML
The syntax for writing comments in XML is similar to that of
HTML.
<!-- This is a comment -->
There is Nothing Special About XML
There is nothing special about XML. It is just plain text with
the addition of some XML tags enclosed in angle brackets.
Software that can handle plain text can also handle XML. In a
simple text editor, the XML tags will be visible and will not be
handled specially.
In an XML-aware application, however, the XML tags can be
handled specially. The tags may or may not be visible, or
have a functional meaning, depending on the nature of the
application.
42. XML documents can be extended to carry more information.
Look at the following XML NOTE example:
<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>
Let's imagine that we created an application that extracted
the <to>, <from>, and <body> elements from the XML
document to produce this output:
MESSAGE To: Tove
From: Jani
Don't forget me this weekend!
45. Imagine that the author of the XML document added
some extra information to it:
<note>
<date>2002-08-01</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Should the application break or crash?
No. The application should still be able to find the <to>,
<from>, and <body> elements in the XML document
and produce the same output.
XML documents are Extensible
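The application described above could look like this minimal Python sketch (the function name format_message is hypothetical). Because it looks up <to>, <from>, and <body> by name rather than by position, the extra <date> and <heading> elements do not change its output.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note>"""

EXTENDED = """<note>
  <date>2002-08-01</date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def format_message(xml_text: str) -> str:
    """Find <to>, <from> and <body> by name; unknown extra elements are simply ignored."""
    note = ET.fromstring(xml_text)
    return "\n".join([
        f"MESSAGE To: {note.findtext('to')}",
        f"From: {note.findtext('from')}",
        note.findtext("body"),
    ])


print(format_message(EXTENDED))
```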
47. Elements are related as parents and
children.
To understand XML terminology, you have to
know how relationships between XML elements
are named, and how element content is
described.
Imagine that this is a description of a book:
My First XML
Introduction to XML
What is HTML
What is XML
XML Syntax
Elements must have a closing tag
Elements must be properly nested
48. Imagine that this XML document describes the
book:
<book>
<title>My First XML</title>
<prod id="33-657" media="paper"></prod>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
<chapter>XML Syntax
<para>Elements must have a closing tag</para>
<para>Elements must be properly nested</para>
</chapter>
</book>
49. Book is the root element. Title,
prod, and chapter are child
elements of book. Book is the
parent element of title, prod,
and chapter. Title, prod, and
chapter are siblings (or sister
elements) because they have the
same parent.
51. Elements can have different content types.
An XML element is everything from (including) the
element's start tag to (including) the element's end tag.
An element can have element content, mixed content,
simple content, or empty content. An element can
also have attributes.
In the example above, book has element content,
because it contains other elements. Chapter has mixed
content because it contains both text and other
elements. Para has simple content (or text content)
because it contains only text. Prod has empty content,
because it carries no information.
In the example above only the prod element has
attributes. The attribute named id has the value
"33-657". The attribute named media has the value
"paper".
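The four content types can be told apart mechanically. A minimal Python sketch with a hypothetical content_type helper, applied to the book document above; it assumes that text consisting only of white space (the line breaks between tags) does not count as content.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
<title>My First XML</title>
<prod id="33-657" media="paper"></prod>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
</book>"""


def content_type(elem: ET.Element) -> str:
    """Classify an element as having element, mixed, simple, or empty content."""
    has_children = len(elem) > 0
    # Text directly inside the element: before the first child, and after each child (tail).
    texts = [elem.text or ""] + [child.tail or "" for child in elem]
    has_text = any(t.strip() for t in texts)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


book = ET.fromstring(BOOK)
for elem in [book, book.find("title"), book.find("prod"),
             book.find("chapter"), book.find("chapter/para")]:
    print(f"{elem.tag}: {content_type(elem)}")
```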
53. XML elements must follow these naming rules:
Names can contain letters, numbers, and other
characters
Names must not start with a number or punctuation
character
Names starting with the letters xml (or XML, or
Xml, etc.) are reserved and should not be used
Names cannot contain spaces
Take care when you "invent" element names and follow
these simple rules:
Any name can be used, no words are reserved, but the
idea is to make names descriptive. Names with an
underscore separator are nice.
Examples: <first_name>, <last_name>.
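54. Avoid "-" and "." in names. For example, if you name something
"first-name," it could be a mess if your software tries to subtract
name from first. Or if you name something "first.name," your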
software may think that "name" is a property of the object "first."
Element names can be as long as you like, but don't exaggerate.
Names should be short and simple, like this: <book_title> not like
this: <the_title_of_the_book>.
XML documents often have a corresponding database, in which fields
exist corresponding to elements in the XML document. A good
practice is to use the naming rules of your database for the elements
in the XML documents.
Non-English letters like éòá are perfectly legal in XML element
names, but watch out for problems if your software vendor doesn't
support them.
The ":" should not be used in element names because it is reserved
to be used for something called namespaces (more later).
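A rough Python sketch of these naming rules and style tips. check_element_name is a hypothetical helper, and it deliberately simplifies the official XML name rules to ASCII letters, digits, "_", "-" and "." (the specification allows many more characters, including the non-English letters mentioned above).

```python
import re

# Simplified, ASCII-only reading of the rules: start with a letter or underscore,
# then letters, digits, "_", "-" or "."; no spaces and no ":".
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def check_element_name(name: str) -> list:
    """Return a list of problems with a proposed element name (empty list = looks fine)."""
    problems = []
    if not _NAME.fullmatch(name):
        problems.append("must start with a letter or underscore and contain no spaces or ':'")
    if name.lower().startswith("xml"):
        problems.append("names starting with 'xml' are reserved")
    if "-" in name or "." in name:
        problems.append("'-' and '.' are legal but may confuse software (style advice)")
    return problems


for candidate in ["first_name", "1st_name", "first name", "XMLdata", "first-name"]:
    print(candidate, check_element_name(candidate))
```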
56. XML elements can have attributes.
From HTML you will remember this: <IMG
SRC="computer.gif">. The SRC attribute
provides additional information about the IMG
element.
In HTML (and in XML) attributes provide
additional information about elements:
<img src="computer.gif">
<a href="demo.asp">
Attributes often provide information that is not a part
of the data. In the example below, the file type is
irrelevant to the data, but important to the software
that wants to manipulate the element:
<file type="gif">computer.gif</file>
57. Quote Styles, "female" or 'female'?
Attribute values must always be enclosed in
quotes, but either single or double quotes can
be used. For a person's sex, the person tag
can be written like this:
<person sex="female">
or like this:
<person sex='female'>
Note: If the attribute value itself contains double
quotes it is necessary to use single quotes (or the
&quot; entity reference), like in this example:
<gangster name='George "Shotgun" Ziegler'>
58. Note: If the attribute value itself
contains single quotes it is
necessary to use double quotes
(or the &apos; entity reference),
like in this example:
<gangster name="George 'Shotgun' Ziegler">
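A short Python check, with the standard-library ElementTree parser, that both quote styles yield the same attribute value and that the other quote character can appear freely inside the value:

```python
import xml.etree.ElementTree as ET

# Single and double quotes give the same attribute value.
double = ET.fromstring('<person sex="female"/>')
single = ET.fromstring("<person sex='female'/>")
print(double.get("sex") == single.get("sex"))

# The "other" quote character can appear inside the value unescaped.
g1 = ET.fromstring("""<gangster name='George "Shotgun" Ziegler'/>""")
g2 = ET.fromstring('''<gangster name="George 'Shotgun' Ziegler"/>''')
print(g1.get("name"))
print(g2.get("name"))
```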
59. Use of Elements vs. Attributes
Data can be stored in child
elements or in attributes.
Take a look at these examples:
<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
<person> <sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
60. In the first example sex is an attribute.
In the last, sex is a child element. Both
examples provide the same
information.
There are no rules about when to use
attributes, and when to use child
elements. My experience is that
attributes are handy in HTML, but in
XML you should try to avoid them. Use
child elements if the information feels
like data.
61. I like to store data in child elements.
The following three XML documents contain
exactly the same information:
A date attribute is used in the first example:
<note date="12/11/2002">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
62. A date element is used in the
second example:
<note>
<date>12/11/2002</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
63. An expanded date element is used
in the third: (THIS IS MY
FAVORITE):
<note>
<date>
<day>12</day>
<month>11</month>
<year>2002</year>
</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
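To make "exactly the same information" concrete, this Python sketch (note_date is a hypothetical helper) reads the date from each of the three layouts, shortened to the date and one child. It assumes the day/month/year order suggested by the third example (day 12, month 11); note that only the expanded form needs no string splitting.

```python
import datetime
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = '<note date="12/11/2002"><to>Tove</to></note>'
AS_ELEMENT = "<note><date>12/11/2002</date><to>Tove</to></note>"
AS_EXPANDED = ("<note><date><day>12</day><month>11</month><year>2002</year></date>"
               "<to>Tove</to></note>")


def note_date(xml_text: str) -> datetime.date:
    """Read the note's date from any of the three layouts (day/month/year assumed)."""
    note = ET.fromstring(xml_text)
    date_elem = note.find("date")
    if date_elem is None:
        text = note.get("date")       # first layout: a date attribute
    elif len(date_elem) == 0:
        text = date_elem.text         # second layout: one date element holding text
    else:                             # third layout: day, month and year already separate
        return datetime.date(int(date_elem.findtext("year")),
                             int(date_elem.findtext("month")),
                             int(date_elem.findtext("day")))
    day, month, year = (int(part) for part in text.split("/"))
    return datetime.date(year, month, day)


for layout in (AS_ATTRIBUTE, AS_ELEMENT, AS_EXPANDED):
    print(note_date(layout))
```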
64. Avoid using attributes?
Should you avoid using attributes?
Some of the problems with using attributes are:
attributes cannot contain multiple values (child elements
can)
attributes are not easily expandable (for future changes)
attributes cannot describe structures (child elements can)
attributes are more difficult to manipulate by program
code
attribute values are not easy to test against a Document
Type Definition (DTD) - which is used to define the legal
elements of an XML document
65. If you use attributes as containers for data,
you end up with documents that are difficult to
read and maintain. Try to use elements to
describe data. Use attributes only to provide
information that is not relevant to the data.
Don't end up like this (this is not how XML
should be used):
<note day="12" month="11" year="2002"
to="Tove" from="Jani" heading="Reminder"
body="Don't forget me this weekend!">
</note>
67. Rules always have exceptions.
My rule about attributes has one exception:
Sometimes I assign ID references to elements. These
ID references can be used to access XML elements in
much the same way as the NAME or ID attributes in
HTML. This example demonstrates this:
<messages>
<note id="p501">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note id="p502">
<to>Jani</to>
<from>Tove</from>
<heading>Re: Reminder</heading>
<body>I will not!</body>
</note>
</messages>
68. The ID in these examples is just a
counter, or a unique identifier, to
identify the different notes in the
XML file, and not a part of the
note data.
What I am trying to say here is
that metadata (data about data)
should be stored as attributes,
and that data itself should be
stored as elements.
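A minimal Python sketch of using these IDs, with the limited XPath support in the standard-library ElementTree module. note_by_id is a hypothetical helper and assumes the id contains no quote characters. The id (metadata) only locates the note; the note's data is still read from its child elements.

```python
import xml.etree.ElementTree as ET

MESSAGES = """<messages>
  <note id="p501">
    <to>Tove</to><from>Jani</from><heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  <note id="p502">
    <to>Jani</to><from>Tove</from><heading>Re: Reminder</heading>
    <body>I will not!</body>
  </note>
</messages>"""

messages = ET.fromstring(MESSAGES)


def note_by_id(note_id: str):
    """Return the <note> child whose id attribute matches, or None if there is none."""
    return messages.find(f"note[@id='{note_id}']")


reply = note_by_id("p502")
print(reply.findtext("heading"), "-", reply.findtext("body"))
```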
70. Well Formed XML Documents
A "Well Formed" XML document has
correct XML syntax.
A "Well Formed" XML document is a document
that conforms to the XML syntax rules that
were described in the previous chapters:
XML documents must have a root element
XML elements must have a closing tag
XML tags are case sensitive
XML elements must be properly nested
XML attribute values must always be quoted
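A minimal Python sketch of a well-formedness check: the standard-library ElementTree parser refuses any document that breaks one of these rules. is_well_formed is a hypothetical helper name, and a non-validating parser like this one checks syntax only, not conformance to a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One correct document and one violation of each rule in the list above.
examples = {
    "correct": '<note date="12/11/2002"><to>Tove</to></note>',
    "no root element": "<to>Tove</to><from>Jani</from>",
    "missing closing tag": "<note><p>This is a paragraph</note>",
    "case mismatch": "<Message>This is incorrect</message>",
    "improper nesting": "<b><i>bold and italic</b></i>",
    "unquoted attribute": "<note date=12/11/2002></note>",
}
for label, text in examples.items():
    print(f"{label}: {is_well_formed(text)}")
```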
73. A "Valid" XML document also
conforms to a DTD.
A "Valid" XML document is a "Well
Formed" XML document, which also
conforms to the rules of a Document
Type Definition (DTD):
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note SYSTEM "InternalNote.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
74. XML DTD
A DTD defines the legal
elements of an XML document.
The purpose of a DTD is to define
the legal building blocks of an XML
document. It defines the
document structure with a list of
legal elements.
76. Internal DOCTYPE declaration
If the DTD is included in your XML
source file, it should be wrapped
in a DOCTYPE definition with the
following syntax:
<!DOCTYPE root-element
[element-declarations]>
77. <?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
78. The DTD above is interpreted like this:
!DOCTYPE note (in line 2) defines that this is a document of
the type note.
!ELEMENT note (in line 3) defines the note element as
having four child elements: "to,from,heading,body".
!ELEMENT to (in line 4) defines the to element to be of the
type "#PCDATA".
!ELEMENT from (in line 5) defines the from element to be of
the type "#PCDATA".
and so on.....
79. External DOCTYPE declaration
If the DTD is external to your XML
source file, it should be wrapped
in a DOCTYPE definition with the
following syntax:
<!DOCTYPE root-element SYSTEM
"filename">
80. <?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
81. And this is a copy of the file "note.dtd" containing the DTD:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
82. Why use a DTD?
With a DTD, each of your XML files can
carry a description of its own format
with it.
With a DTD, independent groups of
people can agree to use a common
DTD for interchanging data.
Your application can use a standard
DTD to verify that the data you receive
from the outside world is valid.
You can also use a DTD to verify your
own data.
84. The building blocks of XML
documents
Seen from a DTD point of view, all
XML documents (and HTML
documents) are made up of the
following simple building blocks:
Elements
Attributes
Entities
PCDATA
CDATA
85. Elements
Elements are the main building
blocks of both XML and HTML
documents.
Examples of HTML elements are "body"
and "table". Examples of XML elements
could be "note" and "message".
Elements can contain text, other
elements, or be empty. Examples of
empty HTML elements are "hr", "br"
and "img".
Examples:
<body>body text in between</body>
<message>some message in between</message>
86. Attributes
Attributes provide extra information
about elements.
Attributes are always placed inside the
starting tag of an element. Attributes
always come in name/value pairs. The
following "img" element has additional
information about a source file:
<img src="computer.gif" />
The name of the element is "img". The name of the
attribute is "src". The value of the attribute is
"computer.gif". Since the element itself is empty it
is closed with "/>".
87. Entities
Entities are variables used to define
common text. Entity references are
references to entities.
Most of you will know the HTML entity
reference: "&nbsp;". This "non-breaking
space" entity is used in HTML
to insert an extra space in a document.
Entities are expanded when a
document is parsed by an XML parser.
88. The following entities are
predefined in XML:
Entity reference   Character
&lt;               <
&gt;               >
&amp;              &
&quot;             "
&apos;             '
89. PCDATA
PCDATA means parsed character data.
Think of character data as the text found between the
start tag and the end tag of an XML element.
PCDATA is text that will be parsed by a parser. Tags
inside the text will be treated as markup and entities
will be expanded.
CDATA
CDATA also means character data.
CDATA is text that will NOT be parsed by a parser. Tags
inside the text will NOT be treated as markup and
entities will not be expanded.
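A short Python illustration with the standard-library ElementTree parser: entity references in ordinary element text (parsed character data) are expanded, while the same characters inside a CDATA section, written <![CDATA[ ... ]]>, are passed through untouched. The element name expr is made up for the example.

```python
import xml.etree.ElementTree as ET

# Parsed character data: the parser expands the predefined entity references.
pcdata = ET.fromstring("<expr>a &lt; b &amp;&amp; c &gt; d</expr>")
print(pcdata.text)  # a < b && c > d

# A CDATA section: the parser does not look for markup or entities inside it.
cdata = ET.fromstring("<expr><![CDATA[a &lt; b && <c>]]></expr>")
print(cdata.text)   # a &lt; b && <c>
```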
91. Declaring an Element
In the DTD, XML elements are
declared with an element
declaration. An element
declaration has the following
syntax:
<!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>
92. Empty elements
Empty elements are declared with
the category keyword EMPTY:
<!ELEMENT element-name EMPTY>
Example: <!ELEMENT br EMPTY>
XML example: <br />
93. Elements with only character
data
Elements with only character data
are declared with #PCDATA inside
parentheses:
<!ELEMENT element-name (#PCDATA)>
Example: <!ELEMENT from (#PCDATA)>
94. Elements with any contents
Elements declared with the
category keyword ANY can
contain any combination of
parsable data:
<!ELEMENT element-name ANY>
Example: <!ELEMENT note ANY>
95. Elements with children
(sequences)
Elements with one or more children
are defined with the name of the
children elements inside parentheses:
<!ELEMENT element-name (child-element-name)>
or
<!ELEMENT element-name (child-element-name,child-element-name,.....)>
Example: <!ELEMENT note (to,from,heading,body)>
96. When children are declared in a
sequence separated by commas, the
children must appear in the same
sequence in the document. In a full
declaration, the children must also be
declared, and the children can also
have children. The full declaration of
the "note" element will be:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
97. Declaring only one occurrence
of the same element
<!ELEMENT element-name (child-name)>
Example: <!ELEMENT note (message)>
The example declaration above declares that
the child element message must occur once,
and only once, inside the "note" element.
98. Declaring minimum one
occurrence of the same
element
<!ELEMENT element-name (child-name+)>
Example: <!ELEMENT note (message+)>
The + sign in the example above declares that
the child element message must occur one or
more times inside the "note" element.
99. Declaring zero or more
occurrences of the same
element
<!ELEMENT element-name (child-name*)>
Example: <!ELEMENT note (message*)>
The * sign in the example above declares that
the child element message can occur zero or
more times inside the "note" element.
100. Declaring zero or one
occurrences of the same
element
<!ELEMENT element-name (child-name?)>
Example: <!ELEMENT note (message?)>
The ? sign in the example above declares that
the child element message can occur zero or
one times inside the "note" element.
101. Declaring either/or content
Example: <!ELEMENT note (to,from,header,(message|body))>
The example above declares that the "note"
element must contain a "to" element, a "from"
element, a "header" element, and either a
"message" or a "body" element.
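The sequence, +, *, ? and | notations behave much like a regular expression over the list of child element names. The Python sketch below makes that analogy concrete with two hypothetical helpers, model_to_regex and children_match. It handles element-only content models, ignores #PCDATA and the children's own declarations, and is not a DTD validator; real validation is the job of a validating parser (for example the one in the third-party lxml library).

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model: str):
    """Translate an element-only content model such as "(to,from,(message|body))"
    into a compiled regular expression over space-terminated child names."""
    model = model.replace(" ", "")
    # Each child name becomes a token like "(?:to )"; the trailing space separates names.
    pattern = re.sub(r"[A-Za-z_][\w.-]*",
                     lambda m: "(?:" + re.escape(m.group(0)) + " )", model)
    # A comma means "followed by", which in a regular expression is plain concatenation.
    # Parentheses, |, +, * and ? keep their meaning unchanged.
    return re.compile(pattern.replace(",", ""))


def children_match(xml_text: str, model: str) -> bool:
    """True if the root element's children appear as the content model requires."""
    names = "".join(child.tag + " " for child in ET.fromstring(xml_text))
    return model_to_regex(model).fullmatch(names) is not None


print(children_match("<note><to/><from/><heading/><body/></note>", "(to,from,heading,body)"))
print(children_match("<note><from/><to/><heading/><body/></note>", "(to,from,heading,body)"))
```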
102. Declaring mixed content
Example: <!ELEMENT note (#PCDATA|to|from|header|message)*>
The example above declares that the "note"
element can contain zero or more occurrences of
parsed character data, "to", "from", "header",
or "message" elements.
104. Declaring Attributes
An attribute declaration has the
following syntax:
<!ATTLIST element-name attribute-name attribute-type default-value>
DTD example: <!ATTLIST payment type CDATA "check">
XML example: <payment type="check" />
105. The attribute-type can have the
following values:
Value          Explanation
CDATA          The value is character data
(en1|en2|..)   The value must be one from an enumerated list
ID             The value is a unique id
IDREF          The value is the id of another element
IDREFS         The value is a list of other ids
NMTOKEN        The value is a valid XML name
NMTOKENS       The value is a list of valid XML names
ENTITY         The value is an entity
ENTITIES       The value is a list of entities
NOTATION       The value is a name of a notation
xml:           The value is a predefined xml value
106. The default-value can have the following values:
Value          Explanation
value          The default value of the attribute
#REQUIRED      The attribute value must be included in the element
#IMPLIED       The attribute does not have to be included
#FIXED value   The attribute value is fixed
107. Specifying a Default attribute
value
DTD: <!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
Valid XML: <square width="100" />
In the example above, the "square" element is
defined to be an empty element with a "width"
attribute of type CDATA. If no width is specified,
it has a default value of 0.
110. #FIXED
Syntax: <!ATTLIST element-name attribute-name attribute-type #FIXED "value">
Example
DTD: <!ATTLIST sender company CDATA #FIXED "Microsoft">
Valid XML: <sender company="Microsoft" />
Invalid XML: <sender company="abc" />
Use the #FIXED keyword when you want an
attribute to have a fixed value without
allowing the author to change it. If an
author includes another value, a validating
XML parser will return an error.
111. Enumerated attribute values
Syntax: <!ATTLIST element-name attribute-name (en1|en2|..) default-value>
DTD example: <!ATTLIST payment type (check|cash) "cash">
XML example: <payment type="check" /> or <payment type="cash" />
Use enumerated attribute values when you want the
attribute values to be one of a fixed set of legal values.
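A validating parser applies these attribute declarations for you. To show what that involves, here is a Python sketch with a hypothetical AttributeRule class (one ATTLIST line) and a hypothetical check_attributes function covering plain defaults, #REQUIRED, #FIXED and enumerated values; it does not check attribute types such as ID or NMTOKEN.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import xml.etree.ElementTree as ET


@dataclass
class AttributeRule:
    """Hypothetical in-memory form of one <!ATTLIST ...> declaration."""
    name: str
    default: Optional[str] = None               # plain default value, e.g. "cash"
    required: bool = False                      # #REQUIRED
    fixed: Optional[str] = None                 # #FIXED "value"
    choices: Optional[Tuple[str, ...]] = None   # enumerated (en1|en2|..)


def check_attributes(elem: ET.Element, rules) -> list:
    """Fill in missing defaults and return a list of error messages (empty = valid)."""
    errors = []
    for rule in rules:
        value = elem.get(rule.name)
        if value is None:
            if rule.required:
                errors.append(f"{rule.name} is required")
            elif rule.fixed is not None:
                elem.set(rule.name, rule.fixed)
            elif rule.default is not None:
                elem.set(rule.name, rule.default)
            continue  # a missing #IMPLIED attribute is simply left out
        if rule.fixed is not None and value != rule.fixed:
            errors.append(f"{rule.name} must be {rule.fixed!r}")
        if rule.choices is not None and value not in rule.choices:
            errors.append(f"{rule.name} must be one of {rule.choices}")
    return errors


# <!ATTLIST payment type (check|cash) "cash">
payment_rules = [AttributeRule("type", default="cash", choices=("check", "cash"))]
payment = ET.fromstring("<payment />")
print(check_attributes(payment, payment_rules), payment.get("type"))
```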
113. Entities are variables used to define shortcuts
to common text.
- Entity references are references to entities.
- Entities can be declared internally or externally.
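Python's standard-library ElementTree parser expands internal entities declared in the document's own DTD. A short sketch; the entity names sender and greeting are illustrative, not taken from the tutorial.

```python
import xml.etree.ElementTree as ET

# Two internal entities declared in the document's internal DTD subset.
DOC = """<!DOCTYPE note [
  <!ENTITY sender "Jani">
  <!ENTITY greeting "Don't forget me this weekend!">
]>
<note><from>&sender;</from><body>&greeting;</body></note>"""

note = ET.fromstring(DOC)
# The entity references have been replaced by the declared text.
print(note.findtext("from"))
print(note.findtext("body"))
```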
116. DTD Summary
This tutorial has taught you how to describe
the structure of an XML document.
You have learned how to use a DTD to define
the legal elements of an XML document, and
how the DTD can be declared inside your XML
document, or as an external reference.
You have learned how to declare the legal
elements, attributes, entities, and CDATA
sections for XML documents.
You have also seen how to validate an XML
document against a DTD.
118. What is an XML Schema?
The purpose of an XML Schema is to define the legal
building blocks of an XML document, just like a DTD.
An XML Schema:
defines elements that can appear in a document
defines attributes that can appear in a document
defines which elements are child elements
defines the order of child elements
defines the number of child elements
defines whether an element is empty or can include
text
defines data types for elements and attributes
defines default and fixed values for elements and
attributes
119. XML Schemas are the Successors
of DTDs
We think that very soon XML Schemas
will be used in most Web applications
as a replacement for DTDs. Here are
some reasons:
XML Schemas are extensible to future
additions
XML Schemas are richer and more
powerful than DTDs
XML Schemas are written in XML
XML Schemas support data types
XML Schemas support namespaces
121. XML Schemas Support Data Types
One of the greatest strengths of XML Schemas
is the support for data types.
With support for data types:
It is easier to describe allowable document
content
It is easier to validate the correctness of data
It is easier to work with data from a database
It is easier to define data facets (restrictions
on data)
It is easier to define data patterns (data
formats)
It is easier to convert data between different
data types
122. XML Schemas use XML Syntax
Another great strength of XML Schemas is
that they are written in XML.
Some benefits of XML Schemas being
written in XML:
You don't have to learn a new language
You can use your XML editor to edit your
Schema files
You can use your XML parser to parse your
Schema files
You can manipulate your Schema with the XML
DOM
You can transform your Schema with XSLT
123. XML Schemas Secure Data Communication
When sending data from a sender to a receiver, it is
essential that both parties have the same "expectations"
about the content.
With XML Schemas, the sender can describe the data in
a way that the receiver will understand.
A date like "03-11-2004" will, in some countries, be
interpreted as 3 November and in other countries as
11 March.
However, an XML element with a data type like this:
<date type="date">2004-03-11</date>
ensures a mutual understanding of the content,
because the XML data type "date" requires the format
"YYYY-MM-DD".
124. XML Schemas are Extensible
XML Schemas are extensible, because
they are written in XML.
With an extensible Schema definition you
can:
Reuse your Schema in other Schemas
Create your own data types derived from
the standard types
Reference multiple schemas in the same
document
125. Well-Formed is not Enough
A well-formed XML document is a document that conforms to
the XML syntax rules, like:
it should begin with an XML declaration (recommended,
though not required in XML 1.0)
it must have one unique root element
start-tags must have matching end-tags
element names are case sensitive
all elements must be closed
all elements must be properly nested
all attribute values must be quoted
special characters such as < and & must be written as
entity references
Even if documents are well-formed they can still contain
errors, and those errors can have serious consequences.
Think of the following situation: you order 5 gross of laser
printers, instead of 5 laser printers. With XML Schemas, most
of these errors can be caught by your validating software.
126. A Simple XML Document
Look at this simple XML document
called "note.xml":
<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
127. A DTD File
The following example is a DTD file called "note.dtd"
that defines the elements of the XML document above
("note.xml"):
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
The first line defines the note element to have four
child elements: "to, from, heading, body".
Lines 2-5 define the to, from, heading, and body
elements to be of type "#PCDATA".
128. An XML Schema
The following example is an XML
Schema file called "note.xsd" that
defines the elements of the XML
document above ("note.xml"):
130. A Reference to a DTD
This XML document has a
reference to a DTD:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "http://www.w3schools.com/dtd/note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
131. A Reference to an XML Schema
This XML document has a reference to
an XML Schema:
<?xml version="1.0"?>
<note
xmlns="http://www.w3schools.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
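The xmlns attribute puts every element of this document into the http://www.w3schools.com namespace, and xsi:schemaLocation is just an attribute that tells a validating processor where to find note.xsd. A short sketch with Python's ElementTree (which reads the hint but does not fetch or validate against the schema) shows both effects:

```python
import xml.etree.ElementTree as ET

NOTE = """<?xml version="1.0"?>
<note xmlns="http://www.w3schools.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

XSI = "http://www.w3.org/2001/XMLSchema-instance"
root = ET.fromstring(NOTE)

# The default namespace becomes part of each element's name
print(root.tag)  # {http://www.w3schools.com}note

# schemaLocation is an ordinary namespaced attribute: "namespace-URI location"
print(root.get(f"{{{XSI}}}schemaLocation"))  # http://www.w3schools.com note.xsd

# Searching for child elements requires the namespace as well
ns = {"n": "http://www.w3schools.com"}
print(root.find("n:to", ns).text)  # Tove
```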
133. The <schema> Element
The <schema> element is the
root element of every XML
Schema:
<?xml version="1.0"?>
<xs:schema>
...
...
</xs:schema>
134. The <schema> element may contain
some attributes. A schema declaration
often looks something like this:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3schools.com"
xmlns="http://www.w3schools.com"
elementFormDefault="qualified">
...
...
</xs:schema>
136. What is a Simple Element?
A simple element is an XML element that
can contain only text. It cannot contain any
other elements or attributes.
However, the "only text" restriction is quite
misleading. The text can be of many
different types. It can be one of the types
included in the XML Schema definition
(boolean, string, date, etc.), or it can be a
custom type that you can define yourself.
You can also add restrictions (facets) to a
data type in order to limit its content, or
you can require the data to match a
specific pattern.
137. Defining a Simple Element
The syntax for defining a simple element is:
<xs:element name="xxx" type="yyy"/>
where xxx is the name of the element and yyy
is the data type of the element. XML Schema
has a lot of built-in data types. The most
common types are:
xs:string
xs:decimal
xs:integer
xs:boolean
xs:date
xs:time
138. Example
Here are some XML elements:
<lastname>Refsnes</lastname>
<age>36</age>
<dateborn>1970-03-27</dateborn>
And here are the corresponding
simple element definitions:
<xs:element name="lastname" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="dateborn" type="xs:date"/>
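To make the type declarations concrete, here is a simplified sketch of what a validator checks for these three elements. The lexical rules are reduced versions of the real xs:string, xs:integer and xs:date rules (whitespace handling, timezones and negative years are ignored), and the helper names are hypothetical.

```python
import re
from datetime import date

def check_integer(text):
    # xs:integer (simplified): optional sign followed by decimal digits
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None

def check_date(text):
    # xs:date (simplified): YYYY-MM-DD and a real calendar date
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text) is None:
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True

CHECKS = {
    "xs:string": lambda text: True,  # any text is accepted as xs:string here
    "xs:integer": check_integer,
    "xs:date": check_date,
}

# Element name -> declared type, mirroring the three xs:element declarations
DECLARATIONS = {"lastname": "xs:string", "age": "xs:integer", "dateborn": "xs:date"}

def is_valid(name, text):
    return CHECKS[DECLARATIONS[name]](text)

print(is_valid("age", "36"))              # True
print(is_valid("age", "thirty-six"))      # False
print(is_valid("dateborn", "1970-03-27")) # True
print(is_valid("dateborn", "1970-02-30")) # False: no such day
```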
139. Default and Fixed Values for Simple Elements
Simple elements may have a default value OR a fixed
value specified.
A default value is automatically assigned to the element
when no other value is specified, that is, when the element
appears with empty content.
In the following example the default value is "red":
<xs:element name="color" type="xs:string" default="red"/>
A fixed value is also automatically
assigned to the element, and you cannot specify
another value.
In the following example the fixed value is "red":
<xs:element name="color" type="xs:string" fixed="red"/>
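In XML Schema, an element's default or fixed value is supplied when the element is present but empty (for example <color/>); a fixed value additionally forbids any other content. This sketch models that rule with a hypothetical effective_value helper. It compares plain strings, whereas a real processor compares values in the declared type.

```python
def effective_value(content, default=None, fixed=None):
    """Value a schema-aware processor would report for a simple element.

    content is the element's text ("" for an empty element such as <color/>).
    """
    if fixed is not None:
        if content == "":
            return fixed  # empty element takes the fixed value
        if content != fixed:
            raise ValueError(f"value must be {fixed!r}, got {content!r}")
        return content
    if content == "" and default is not None:
        return default  # empty element takes the default value
    return content

print(effective_value("", default="red"))      # red
print(effective_value("blue", default="red"))  # blue
print(effective_value("", fixed="red"))        # red
```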
141. What is an Attribute?
Simple elements cannot have
attributes. If an element has
attributes, it is considered to be of
a complex type. But the attribute
itself is always declared as a
simple type.
142. How to Define an Attribute?
The syntax for defining an attribute is:
<xs:attribute name="xxx" type="yyy"/>
where xxx is the name of the attribute and yyy
specifies the data type of the attribute. XML
Schema has a lot of built-in data types. The
most common types are:
xs:string
xs:decimal
xs:integer
xs:boolean
xs:date
xs:time
143. Example
Here is an XML element with an
attribute:
<lastname lang="EN">Smith</lastname>
And here is the corresponding
attribute definition:
<xs:attribute name="lang" type="xs:string"/>
144. Default and Fixed Values for Attributes
Attributes may have a default value OR a fixed value
specified.
A default value is automatically assigned to the
attribute when no other value is specified.
In the following example the default value is "EN":
<xs:attribute name="lang" type="xs:string" default="EN"/>
A fixed value is also automatically
assigned to the attribute, and you cannot specify
another value.
In the following example the fixed value is "EN":
<xs:attribute name="lang" type="xs:string" fixed="EN"/>
145. Optional and Required Attributes
Attributes are optional by default.
To specify that the attribute is
required, use the "use" attribute:
<xs:attribute name="lang" type="xs:string" use="required"/>
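For attributes, a default or fixed value is supplied when the attribute is absent, and use="required" turns an absent attribute into an error. The hypothetical resolve_attribute helper below sketches how those three rules combine for the lang attribute; it compares strings only and is not a schema processor.

```python
import xml.etree.ElementTree as ET

def resolve_attribute(attrs, name, default=None, fixed=None, required=False):
    """Apply the default / fixed / use="required" rules for one declared attribute.

    attrs is the dict of attributes actually present on the element.
    Returns the effective value, or None if the attribute is optional and absent.
    """
    if name not in attrs:
        if required:
            raise ValueError(f"missing required attribute {name!r}")
        return fixed if fixed is not None else default
    value = attrs[name]
    if fixed is not None and value != fixed:
        raise ValueError(f"attribute {name!r} must be {fixed!r}, got {value!r}")
    return value

plain = ET.fromstring("<lastname>Smith</lastname>").attrib
tagged = ET.fromstring('<lastname lang="NO">Smith</lastname>').attrib

print(resolve_attribute(plain, "lang", default="EN"))   # EN
print(resolve_attribute(tagged, "lang", default="EN"))  # NO
print(resolve_attribute(plain, "lang"))                 # None (optional, no default)
```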
146. Restrictions on Content
When an XML element or attribute has a data
type defined, it puts restrictions on the
element's or attribute's content.
If an XML element is of type "xs:date" and
contains a string like "Hello World", the
element will not validate.
With XML Schemas, you can also add your own
restrictions to your XML elements and
attributes. These restrictions are called facets.
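XML Schema defines facets such as enumeration, pattern, minInclusive and maxInclusive. The sketch below checks a value against a few of them; the satisfies_facets helper and the example limits are hypothetical. Note that XML Schema patterns use their own regular-expression dialect, which is implicitly anchored; re.fullmatch imitates the anchoring but not every difference in syntax.

```python
import re

def satisfies_facets(text, *, enumeration=None, pattern=None,
                     min_inclusive=None, max_inclusive=None):
    """Check a value against a few XML Schema-style facets (simplified sketch).

    min_inclusive / max_inclusive assume an integer-valued type.
    """
    if enumeration is not None and text not in enumeration:
        return False
    if pattern is not None and re.fullmatch(pattern, text) is None:
        return False
    if min_inclusive is not None or max_inclusive is not None:
        try:
            number = int(text)
        except ValueError:
            return False  # e.g. "Hello World" is not a number at all
        if min_inclusive is not None and number < min_inclusive:
            return False
        if max_inclusive is not None and number > max_inclusive:
            return False
    return True

print(satisfies_facets("36", min_inclusive=0, max_inclusive=120))   # True
print(satisfies_facets("150", min_inclusive=0, max_inclusive=120))  # False
print(satisfies_facets("EN", enumeration=["EN", "NO"]))             # True
print(satisfies_facets("english", pattern=r"[A-Z]{2}"))             # False
```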
148. Mozilla Firefox
As of version 1.0.2, Firefox has support for XML and
XSLT (and CSS).
Mozilla
Mozilla includes Expat for XML parsing and has support
for displaying XML + CSS. Mozilla also has some support for
Namespaces.
Mozilla is available with an XSLT implementation.
Netscape
As of version 8, Netscape uses the Mozilla engine, and
therefore it has the same XML / XSLT support as
Mozilla.
Opera
As of version 9, Opera has support for XML and XSLT
(and CSS). Version 8 supports only XML + CSS.
Internet Explorer
As of version 6, Internet Explorer supports XML,
Namespaces, CSS, XSLT, and XPath.
152. XML Data Embedded in HTML
An XML data island is XML data embedded into
an HTML page. Data islands are an Internet Explorer feature.
Here is how it works; assume we have the
following XML document ("note.xml"):
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
153. Then, in an HTML document, you
can embed the XML file above
with the <xml> tag. The id
attribute of the <xml> tag defines
an ID for the data island, and the
src attribute points to the XML file
to embed:
156. The HTML file looks like this:
<html>
<body>
<xml id="cdcat" src="cd_catalog.xml"></xml>
<table border="1" datasrc="#cdcat">
  <tr>
    <td><span datafld="ARTIST"></span></td>
    <td><span datafld="TITLE"></span></td>
  </tr>
</table>
</body>
</html>
157. Example explained:
The datasrc attribute of the <table> tag binds the HTML
table element to the XML data island. The datasrc
attribute refers to the id attribute of the data island.
<td> tags cannot be bound to data, so we are using
<span> tags. The <span> tag allows the datafld
attribute to refer to the XML element to be displayed. In
this case, it is datafld="ARTIST" for the <ARTIST>
element and datafld="TITLE" for the <TITLE> element
in the XML file. As the XML is read, additional rows are
created for each <CD> element.
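Since datasrc/datafld binding only works in Internet Explorer, the same result (one table row per CD element, with ARTIST and TITLE cells) can also be produced outside the browser. This is a sketch of that alternative, not part of the data-island technique: the CATALOG root name and the sample entries are hypothetical placeholders, and only CD, ARTIST and TITLE come from the text above.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical stand-in for cd_catalog.xml (placeholder root name and data)
CATALOG = """<CATALOG>
  <CD><TITLE>First Title</TITLE><ARTIST>First Artist</ARTIST></CD>
  <CD><TITLE>Second Title</TITLE><ARTIST>Second Artist</ARTIST></CD>
</CATALOG>"""

def catalog_to_table(xml_text):
    root = ET.fromstring(xml_text)
    rows = []
    for cd in root.iter("CD"):  # one row per <CD>, like the datasrc binding
        artist = escape(cd.findtext("ARTIST", default=""))
        title = escape(cd.findtext("TITLE", default=""))
        rows.append(f"<tr><td>{artist}</td><td>{title}</td></tr>")
    return '<table border="1">' + "".join(rows) + "</table>"

print(catalog_to_table(CATALOG))
```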
159. Name Conflicts
Since element names in XML are not
predefined, a name conflict will occur
when documents that use the same element
names for different things are combined.
This XML document carries information
in a table:
<table>
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>
This XML document carries
information about a table (a piece of
furniture):
161. Solving Name Conflicts Using a Prefix
This XML document carries information in a table:
<h:table>
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>
This XML document carries information about a piece of furniture:
<f:table>
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>
Now there will be no name conflict because the two
documents use a different name for their <table> element
(<h:table> and <f:table>).
By using a prefix, we have created two different types of
<table> elements.
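A namespace-aware parser such as Python's ElementTree only accepts a prefix once it is bound to a namespace URI with an xmlns:prefix attribute; an unbound prefix is an error. In this sketch the wrapper element and both URIs are illustrative assumptions, since the text above does not declare them. After parsing, each table carries its namespace URI, so the two names really are different.

```python
import xml.etree.ElementTree as ET

# Illustrative namespace URIs (assumptions, not given in the text above)
H = "http://www.w3.org/TR/html4/"
F = "http://www.example.com/furniture"

DOC = f"""<root xmlns:h="{H}" xmlns:f="{F}">
  <h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
  <f:table><f:name>African Coffee Table</f:name>
    <f:width>80</f:width><f:length>120</f:length></f:table>
</root>"""

root = ET.fromstring(DOC)
html_table, furniture_table = list(root)
print(html_table.tag)       # {http://www.w3.org/TR/html4/}table
print(furniture_table.tag)  # {http://www.example.com/furniture}table
print(root.find("f:table/f:name", {"f": F}).text)      # African Coffee Table
print(root.find("h:table/h:tr/h:td", {"h": H}).text)   # Apples
```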
162. XML PARSERS:
Parsers follow two basic approaches:
SAX (Simple API for XML) and DOM
(Document Object Model).
163. SAX
Sequential, event-based
Cannot move back and forth between
elements
Harder to use for complex
structures
Saves memory space
Better choice for quick, less
intensive parsing and processing
Processes the document incrementally, so it can be used for
larger files
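With SAX, the parser calls handler methods as it reads, and the program must keep whatever state it needs, because earlier elements cannot be revisited. This sketch uses Python's xml.sax to count elements and collect the text of <to> while the note streams past. The ElementCounter class is illustrative.

```python
import xml.sax

class ElementCounter(xml.sax.ContentHandler):
    """Counts elements and collects <to> text as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self.in_to = False
        self.recipients = []

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1
        if name == "to":
            self.in_to = True
            self.recipients.append("")

    def characters(self, content):
        # characters() may be called several times for one text node
        if self.in_to:
            self.recipients[-1] += content

    def endElement(self, name):
        if name == "to":
            self.in_to = False

NOTE = b"""<?xml version="1.0"?>
<note><to>Tove</to><from>Jani</from>
<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"""

handler = ElementCounter()
xml.sax.parseString(NOTE, handler)
print(handler.counts)      # {'note': 1, 'to': 1, 'from': 1, 'heading': 1, 'body': 1}
print(handler.recipients)  # ['Tove']
```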
164. DOM
Builds an in-memory tree representation
Lets you move back and forth, up
and down
Easy to use, with a clean
interface
Memory intensive for larger XML
documents
Better choice for complex XML
structures
Best suited to smaller files, because it is
memory intensive
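By contrast, a DOM parser loads the whole document into a tree first, so the program can step to siblings and parents freely. A small sketch with Python's xml.dom.minidom (the note is written without whitespace between tags so that no extra text nodes appear):

```python
from xml.dom import minidom

NOTE = ("<note><to>Tove</to><from>Jani</from><heading>Reminder</heading>"
        "<body>Don't forget me this weekend!</body></note>")

doc = minidom.parseString(NOTE)  # whole document is now an in-memory tree
note = doc.documentElement

body = note.getElementsByTagName("body")[0]
heading = body.previousSibling   # move "back" to the preceding sibling
print(heading.tagName, heading.firstChild.data)  # heading Reminder
print(body.parentNode.tagName)                   # note: move "up" to the parent
```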