2. Chapter Objectives -1
Discuss markup languages
List and explain the drawbacks of HTML
Discuss the architecture of XML documents
List the benefits of XML
Discuss parsers
2 Core XML / Chapter 1 / Slide 2 of 35
3. Chapter Objectives -2
Build a complete XML Document:
Character Data
Comments
Processing Instructions
Entities
General Entities
Parameter Entities
The DOCTYPE Declarations
4. History of Markup
Documents were recorded using paper and pen
Typesetters formatted the documents
Typesetters used tools to format a document
5. Markup Language
A markup language defines the rules that help to add meaning to the content and structure of documents.
Markup languages are classified as:
Stylistic markup: determines the presentation of the document
Structural markup: defines the structure of the document
Semantic markup: describes the meaning of the content of the document
6. SGML
Generalized Markup Language (GML) was a system for formatting documents.
GML was refined and standardized as Standard Generalized Markup Language (SGML).
SGML is the ancestor of many markup languages, including HTML and XML.
7. Features of SGML
It is a language for describing markup languages, which allows authors to create their own tags that relate to their content.
It needs a separate file, the Document Type Definition (DTD), that contains the rules of the language and is used to interpret documents.
An SGML application is a markup language derived from SGML.
8. HTML
HTML is the most famous markup language derived
from SGML.
It was created to mark up technical papers so that
they could be transferred across different platforms
for the scientific community.
It is now also used by non-scientific users who are concerned with the presentation of their documents.
9. Drawbacks of HTML
Fixed tag set
Tags describe presentation, not the meaning of the content
It is flat
Clogging
HTML is not international
Data interchange is difficult
Does not have a robust linking mechanism
HTML is not reusable
10. HTML and XML code Examples
HTML code
<UL>
<LI> TOM CRUISE
<UL>
<LI> CLIENT ID : 100
<LI> COMPANY : XYZ Corp.
<LI> Email : tom@usa.net
<LI> Phone : 3336767
<LI> Street Address: 25th St.
<LI> City : Toronto
<LI> State : Toronto
<LI> Zip : 20056
</UL>
</UL>
XML code
<Details>
<CONTACT>
<PERSON_NAME>TOM CRUISE</PERSON_NAME>
<ID> 100 </ID>
<Company>XYZ Corp. </Company>
<Email> tom@usa.net</Email>
<Phone> 3336767 </Phone>
<Street> 25th St. </Street>
<City> Toronto </City>
<State> Toronto </State>
<ZIP> 20056 </ZIP>
</CONTACT>
</Details>
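Because every value in the XML version sits in a named element, a program can pick out a field by name instead of scraping list text. A minimal sketch, assuming Python 3 and the standard-library xml.etree.ElementTree module; the helper contact_fields is hypothetical:

```python
import xml.etree.ElementTree as ET

# XML version of the contact record from the slide.
CONTACT_XML = """<Details>
<CONTACT>
<PERSON_NAME>TOM CRUISE</PERSON_NAME>
<ID> 100 </ID>
<Company>XYZ Corp. </Company>
<Email> tom@usa.net</Email>
<Phone> 3336767 </Phone>
<Street> 25th St. </Street>
<City> Toronto </City>
<State> Toronto </State>
<ZIP> 20056 </ZIP>
</CONTACT>
</Details>"""


def contact_fields(xml_text: str) -> dict:
    """Hypothetical helper: map each child of <CONTACT> to its stripped text."""
    root = ET.fromstring(xml_text)       # root is <Details>
    contact = root.find("CONTACT")
    return {child.tag: (child.text or "").strip() for child in contact}


fields = contact_fields(CONTACT_XML)
print(fields["Email"])  # tom@usa.net
print(fields["ZIP"])    # 20056
```

The spaces inside elements such as <ID> 100 </ID> are part of the data, which is why the sketch strips them.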
11. XML -1
XML stands for Extensible Markup Language.
It overcomes many of the drawbacks of HTML.
It allows users to define their own set of tags, and also makes it possible for others (people or programs) to understand them.
It is more flexible than HTML.
It inherits the features of SGML and combines them with the features of HTML.
It is a simplified subset of SGML.
12. XML -2
XML is a metalanguage: it is used to describe other languages.
The data contained in an XML file can be displayed in different ways.
It can also be passed to other applications for further processing.
Style sheets, such as XSLT style sheets, can transform the same structured data into different HTML views, so the data can be displayed in different browsers.
13. XML Architecture - 1
XML supports a three-tier architecture for handling and manipulating data.
XML data can be generated from existing databases using a scalable three-tier model.
XML tags represent the logical structure of data, which different applications can interpret and use in various ways.
The middle tier accesses multiple databases and translates the data into XML.
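The middle tier described above can be sketched in Python with the standard sqlite3 and xml.etree.ElementTree modules. This is an illustration under stated assumptions: the in-memory contacts table stands in for an existing database, and rows_to_xml and the CONTACTS element name are hypothetical:

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn: sqlite3.Connection) -> str:
    """Hypothetical middle tier: read rows from the data tier, return XML text."""
    root = ET.Element("CONTACTS")
    for person_id, name, city in conn.execute(
        "SELECT id, name, city FROM contacts ORDER BY id"
    ):
        # Keyword arguments to SubElement become attributes (ID="100").
        contact = ET.SubElement(root, "CONTACT", ID=str(person_id))
        ET.SubElement(contact, "PERSON_NAME").text = name
        ET.SubElement(contact, "City").text = city
    return ET.tostring(root, encoding="unicode")


# Hypothetical data tier: an in-memory SQLite table in place of a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER, name TEXT, city TEXT)")
conn.execute("INSERT INTO contacts VALUES (100, 'TOM CRUISE', 'Toronto')")

xml_out = rows_to_xml(conn)
print(xml_out)
# <CONTACTS><CONTACT ID="100"><PERSON_NAME>TOM CRUISE</PERSON_NAME><City>Toronto</City></CONTACT></CONTACTS>
```

The presentation tier never sees the database; it only receives the XML that the middle tier produces.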
15. XML – A Universal data format
HTML is a single markup language, whereas XML can be used to define a whole family of markup languages.
Any type of data can be easily defined in XML.
XML is popular because it supports a wide range of applications and is easy to use.
XML has a structured data format, which allows it to store complex data.
16. Benefits of XML
The three-tier architecture offers easier scalability and better security.
The benefits of XML are classified into the
following:
Business benefits
Technological benefits
17. Business Benefits
Information sharing:
Allows businesses to define data formats in XML
Provides tools to read, write and transform data between
XML and other formats
XML inside a single application:
Powerful, flexible and extensible language
Content Delivery:
Supports different users and channels, like digital TV,
phone, web and multimedia kiosks
18. Technological Benefits
Separation of data and presentation
Extensibility
Semantic information
Re-use of data
19. XML Document Structure
An XML document is composed of a set of “entities” identified by unique names.
All documents begin with a root, or document, entity.
An entity works like an alias: a short reference to it stands for a longer piece of content.
Documents are logically composed of declarations,
elements, comments, character references, and
processing instructions.
20. Well formed and Valid Documents
An XML document is well formed if it satisfies the minimum set of requirements defined in the XML 1.0 specification.
These requirements ensure that markup is used correctly; for example, every start tag must have a matching end tag, and elements must be properly nested.
A valid XML document is a well-formed XML document that also conforms to the rules of a Document Type Definition (DTD).
A DTD defines the rules that the markup in an XML document must follow.
21. Parsers - 1
Parsers help the computer interpret an XML
file.
Figure: an XML document (<?xml version="1.0"?> <nxn> </nxn>) is written in an editor, parsed by the parser, and the parsed document is viewed in the browser.
There are two types of parsers:
Non-validating parser
Validating parser
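Python's standard-library parsers are built on expat, which is a non-validating parser: it checks well-formedness but ignores the rules in a DTD. The sketch below, assuming Python 3 and a hypothetical BOOK/TITLE/AUTHOR document, shows a document that breaks its own DTD still being accepted, while a tag that is never closed is rejected:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the DTD requires BOOK to contain a TITLE, but the body
# holds an AUTHOR instead, so it is well formed but NOT valid.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE BOOK [
<!ELEMENT BOOK (TITLE)>
<!ELEMENT TITLE (#PCDATA)>
]>
<BOOK><AUTHOR>Unknown</AUTHOR></BOOK>"""

NOT_WELL_FORMED = "<BOOK><TITLE>XML</BOOK>"  # TITLE is never closed


def parses(xml_text: str) -> bool:
    """True if the non-validating (expat-based) parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses(INVALID_BUT_WELL_FORMED))  # True: DTD rules are not checked
print(parses(NOT_WELL_FORMED))          # False: well-formedness error
```

Checking validity needs a validating parser; the Python standard library does not provide one, although third-party libraries such as lxml can validate against a DTD.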
22. Parsers - 2
Parsers load the XML file and other related files (such as a DTD file) to check whether the XML document is well formed and valid, and build a data tree from it.
23. Data versus Markup
In <NAME> Tom Cruise </NAME>, the tags <NAME> and </NAME> are markup; the text Tom Cruise between them is data.
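A parser keeps the two apart: the markup becomes the element's name and the data becomes its text. A short sketch with Python's xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

element = ET.fromstring("<NAME> Tom Cruise </NAME>")

# The element name comes from the markup ...
print(element.tag)         # NAME
# ... while the text between the tags is the data, spaces included.
print(repr(element.text))  # ' Tom Cruise '
```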
24. Creating an XML Document
To create an XML document:
State an XML declaration
Create a root element
Create the XML code
Verify the document
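The four steps can be sketched in Python with xml.etree.ElementTree, assuming Python 3.8 or later (for the xml_declaration argument). The function names build_book_document and verify and the title Core XML are hypothetical; verifying here means checking that the output is well formed by parsing it again:

```python
import xml.etree.ElementTree as ET


def build_book_document(title: str) -> bytes:
    """Hypothetical helper: build a small BOOK document."""
    # Create the root element and the XML code it contains.
    root = ET.Element("BOOK")
    ET.SubElement(root, "TITLE").text = title
    # xml_declaration=True (Python 3.8+) writes the XML declaration first.
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def verify(document: bytes) -> bool:
    """Hypothetical helper: the parser raises ParseError if not well formed."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


doc = build_book_document("Core XML")
print(doc.decode("utf-8"))
print(verify(doc))        # True
print(verify(b"<BOOK>"))  # False: the root element is never closed
```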
25. Stating an XML Declaration
Syntax
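<?xml version="1.0" encoding="UTF-8" standalone="no"?>
The 'encoding' and 'standalone' attributes are optional; only the version number is mandatory. When present, they must appear in the order version, encoding, standalone.
'encoding': specifies the character encoding used by the author
'standalone': indicates whether the document depends on external markup declarations ("no") or not ("yes")
If the XML declaration is omitted, the document is treated as XML 1.0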
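The ordering rule and the need for plain quotes can be checked with Python's expat-based parser. A minimal sketch, assuming Python 3 and a hypothetical empty <BOOK/> root; only the first declaration is accepted:

```python
import xml.etree.ElementTree as ET

CANDIDATES = {
    # version, encoding, standalone: the order the specification requires.
    "correct order": b'<?xml version="1.0" encoding="UTF-8" standalone="no"?><BOOK/>',
    # standalone before encoding: rejected as a malformed declaration.
    "wrong order": b'<?xml version="1.0" standalone="no" encoding="UTF-8"?><BOOK/>',
    # Curly (typographic) quotes are not XML attribute delimiters.
    "curly quotes": '<?xml version=\u201c1.0\u201d?><BOOK/>'.encode("utf-8"),
}


def accepted(document: bytes) -> bool:
    """True if the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


for label, document in CANDIDATES.items():
    print(f"{label}: {'accepted' if accepted(document) else 'rejected'}")
```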
26. Creating a Root Element
There can only be one root element
It describes the function of the document
Every XML document must have a root
element
Example
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<BOOK>
</BOOK>
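A parser enforces the single-root rule. In this sketch (Python 3, xml.etree.ElementTree; root_check is a hypothetical helper), a second root element and a missing root are both reported as errors:

```python
import xml.etree.ElementTree as ET


def root_check(xml_text: str) -> str:
    """Hypothetical helper: report the root element or the parser's error."""
    try:
        return "ok: root is " + ET.fromstring(xml_text).tag
    except ET.ParseError as err:
        return "error: " + str(err)


print(root_check('<?xml version="1.0"?><BOOK></BOOK>'))               # ok: root is BOOK
print(root_check('<?xml version="1.0"?><BOOK></BOOK><BOOK></BOOK>'))  # error: junk after document element ...
print(root_check('<?xml version="1.0"?>'))                            # error: no element found ...
```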
27. Creating the XML Code -1
It is the process of creating our own elements
and attributes as required by our application.
Elements are the basic units of XML content.
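Tags mark where an element starts and ends, and label the content enclosed between the start tag and the end tag.
Parts of an element: in <TITLE> Aptech Ltd </TITLE>, <TITLE> is the start (opening) tag, Aptech Ltd is the content, and </TITLE> is the end tag; together they form the element.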
28. Creating the XML Code -2
The following rules govern elements:
At least one element is required
XML tags are case sensitive
End the tags correctly
Nest tags properly
Use legal tag names
Length of markup names
Define valid attributes
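Several of these rules can be tried out with a parser. This sketch, assuming Python 3 and xml.etree.ElementTree, feeds one well-formed example and one violation per rule (all example documents are hypothetical) to the parser; only the first is accepted:

```python
import xml.etree.ElementTree as ET

# Hypothetical examples: one well-formed document, then one violation per rule.
EXAMPLES = {
    "well formed":        "<BOOK><TITLE>Core XML</TITLE></BOOK>",
    "case mismatch":      "<BOOK><Title>Core XML</TITLE></BOOK>",
    "missing end tag":    "<BOOK><TITLE>Core XML</BOOK>",
    "improper nesting":   "<BOOK><TITLE><B>Core</TITLE></B></BOOK>",
    "illegal tag name":   "<BOOK><1TITLE>Core XML</1TITLE></BOOK>",
    "unquoted attribute": "<BOOK><TITLE lang=en>Core XML</TITLE></BOOK>",
}


def is_well_formed(xml_text: str) -> bool:
    """True if the parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


for rule, text in EXAMPLES.items():
    print(f"{rule:20} {is_well_formed(text)}")
```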
29. Verify the document
The document must follow the XML rules; otherwise, the browser or any other XML reader will report an error instead of reading it
30. Comments
Comments hold information for the reader's understanding and are ignored by the processor.
Syntax
<!-- Write the comment here -->
The string -- must not appear inside a comment.
Example
<!-- don't show these
<NAME>KATE WINSLET</NAME>
<NAME>NICOLE KIDMAN</NAME>
-->
<NAME>ARNOLD</NAME>
<NAME>TOM CRUISE</NAME>
In this example, KATE WINSLET and NICOLE KIDMAN are inside the comment and are ignored; only ARNOLD and TOM CRUISE are displayed.
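The comment example can be checked with Python's xml.etree.ElementTree, which drops comments by default. The names are wrapped in a hypothetical <NAMES> root so the fragment becomes a complete document:

```python
import xml.etree.ElementTree as ET

# <NAMES> is a hypothetical root added so the fragment is a complete document.
DOC = """<NAMES>
<!-- don't show these
<NAME>KATE WINSLET</NAME>
<NAME>NICOLE KIDMAN</NAME>
-->
<NAME>ARNOLD</NAME>
<NAME>TOM CRUISE</NAME>
</NAMES>"""

root = ET.fromstring(DOC)
names = [el.text for el in root.findall("NAME")]
print(names)  # ['ARNOLD', 'TOM CRUISE']
```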
31. Processing Instruction
A processing instruction is a piece of information meant for the application that uses the XML document.
The parser passes these instructions directly to the application.
The XML declaration looks like a processing instruction, but the XML specification defines it separately.
<?xml-stylesheet type="text/xsl"?>
Here, xml-stylesheet is the name of the application (the target), and type="text/xsl" is the instruction information.
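Applications receive a processing instruction as a target name plus instruction data. A sketch using Python's xml.dom.minidom, which keeps processing instructions that appear before the root element; the href value books.xsl is a hypothetical file name:

```python
from xml.dom import minidom

# Hypothetical document: books.xsl is an assumed style sheet file name.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="books.xsl"?>
<BOOK/>"""

document = minidom.parseString(DOC)
instructions = [
    node for node in document.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
for pi in instructions:
    # target = name of the application, data = instruction information
    print(pi.target, "|", pi.data)
```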
32. Character Data
The text between the start and end tags is
defined as ‘character data’.
Character data may consist of any legal (Unicode) characters.
Character data is classified into:
PCDATA
CDATA
33. PCDATA
It stands for parsed character data.
PCDATA is text that will be parsed by a Parser.
Tags inside the text will be treated as markup and
entities will be expanded.
Predefined entities:
Entity name   Character
&lt;          <
&gt;          >
&amp;         &
&quot;        "
&apos;        '
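Python's xml.etree.ElementTree shows the predefined entities being expanded while the text is parsed. A minimal sketch with a hypothetical <RULE> element; the last line shows that a raw < in parsed character data is an error:

```python
import xml.etree.ElementTree as ET

# Hypothetical element whose text uses all five predefined entities.
rule = ET.fromstring(
    "<RULE>a &lt; b &amp;&amp; c &gt; d, say &quot;yes&quot; or &apos;no&apos;</RULE>"
)
print(rule.text)  # a < b && c > d, say "yes" or 'no'

# On output, ElementTree escapes &, < and > in text content again.
print(ET.tostring(rule, encoding="unicode"))


def parses(xml_text: str) -> bool:
    """True if the parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses("<RULE>a < b</RULE>"))  # False: a raw < starts markup
```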
34. CDATA
CDATA stands for character data.
Text inside a CDATA section is not parsed by the parser.
CDATA sections make it convenient to include large blocks of text that contain special characters such as < and &.
The character string ]]> is not allowed within a CDATA section, as it signals the end of the section.
Example
<SAMPLE>
<![CDATA[<DOCUMENT>
<NAME>TOM CRUISE</NAME>
<EMAIL>tom@usa.com</EMAIL>
</DOCUMENT>]]>
</SAMPLE>
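Parsing the CDATA example with Python's xml.etree.ElementTree shows that the tags inside the CDATA section come back as ordinary text rather than as child elements:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<SAMPLE>
<![CDATA[<DOCUMENT>
<NAME>TOM CRUISE</NAME>
<EMAIL>tom@usa.com</EMAIL>
</DOCUMENT>]]>
</SAMPLE>"""

sample = ET.fromstring(SAMPLE)
print(len(sample))          # 0: the CDATA section produced no child elements
print(sample.text.strip())  # the tags come back as plain text
```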
35. Entities
Entities are used to avoid typing long pieces of text
repeatedly within a document.
There are two categories of entities:
General entities
Syntax
<!ENTITY ADDRESS "text that is to be represented by an entity">
Parameter entities (used only within the DTD)
Syntax
<!ENTITY % ADDRESS "text that is to be represented by an entity">
36. Examples of Entities
An example of a parameter entity
Entity declaration:
< CLIENT = "&APTECH;" PRODUCT = "&PRODUCT_ID;" QUANTITY = "15">
Syntax: %PARAMETER_ENTITY_NAME;
Example: %address;
An example of a general entity
Entity declaration:
<!ENTITY full_address " My Address 12 Tenth Ave. Suite 12 Paris, France">
Syntax: &ENTITY_NAME;
Example: &address;
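General entity expansion can be seen with Python's xml.etree.ElementTree, whose expat parser expands internal entities declared in the DOCTYPE. The LETTER and TO elements and the try_parse helper are hypothetical, and the sketch covers general entities only, because parameter entities are used inside the DTD itself:

```python
import xml.etree.ElementTree as ET

# Hypothetical LETTER document declaring the general entity from the slide.
LETTER = """<!DOCTYPE LETTER [
<!ENTITY full_address "My Address 12 Tenth Ave. Suite 12 Paris, France">
]>
<LETTER><TO>&full_address;</TO></LETTER>"""


def try_parse(xml_text: str):
    """Hypothetical helper: return the root element, or None if not well formed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None


letter = try_parse(LETTER)
print(letter.find("TO").text)  # My Address 12 Tenth Ave. Suite 12 Paris, France

# Referencing an entity that was never declared is a well-formedness error.
print(try_parse("<LETTER><TO>&address;</TO></LETTER>"))  # None
```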
37. The DOCTYPE declarations
The <!DOCTYPE [..]> declaration follows the XML
declaration in an XML document.
Syntax
<?xml version="1.0"?>
<!DOCTYPE myDoc [
...declare the entities here...
]>
<myDoc>
...body of the document...
</myDoc>
Example
<!DOCTYPE CUSTOMERS [
<!ENTITY firstFloor "15 Downing St Floor 1">
<!ENTITY secondFloor "15 Downing St Floor 2">
<!ENTITY thirdFloor "15 Downing St Floor 3">
]>
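The DOCTYPE example can be completed into a full document and parsed with Python's xml.etree.ElementTree. The CUSTOMER and ADDRESS elements are hypothetical additions used to reference the declared entities; the second part shows that a DOCTYPE placed before the XML declaration is rejected:

```python
import xml.etree.ElementTree as ET

# XML declaration first, then DOCTYPE, then the root element.
# CUSTOMER and ADDRESS are hypothetical elements that use the entities.
CUSTOMERS_DOC = """<?xml version="1.0"?>
<!DOCTYPE CUSTOMERS [
<!ENTITY firstFloor "15 Downing St Floor 1">
<!ENTITY secondFloor "15 Downing St Floor 2">
<!ENTITY thirdFloor "15 Downing St Floor 3">
]>
<CUSTOMERS>
<CUSTOMER><ADDRESS>&firstFloor;</ADDRESS></CUSTOMER>
<CUSTOMER><ADDRESS>&thirdFloor;</ADDRESS></CUSTOMER>
</CUSTOMERS>"""

customers = ET.fromstring(CUSTOMERS_DOC)
addresses = [a.text for a in customers.iter("ADDRESS")]
print(addresses)  # ['15 Downing St Floor 1', '15 Downing St Floor 3']

# A DOCTYPE placed before the XML declaration is rejected by the parser.
MISPLACED = """<!DOCTYPE CUSTOMERS []>
<?xml version="1.0"?>
<CUSTOMERS/>"""
try:
    ET.fromstring(MISPLACED)
    misplaced_ok = True
except ET.ParseError:
    misplaced_ok = False
print(misplaced_ok)  # False
```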
38. Attributes
An attribute gives information about an
element.
Attributes are embedded in the element start
tag.
An attribute consists of an attribute name and an attribute value, which must be enclosed in quotes.
Example
<TV count="8">SONY</TV>
<LAPTOP count="10">IBM</LAPTOP>
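Attributes can be read with Python's xml.etree.ElementTree; attribute values always come back as strings, so numeric values must be converted explicitly. A sketch wrapping the two example elements in a hypothetical <STOCK> root:

```python
import xml.etree.ElementTree as ET

# <STOCK> is a hypothetical root wrapping the two example elements.
STOCK = """<STOCK>
<TV count="8">SONY</TV>
<LAPTOP count="10">IBM</LAPTOP>
</STOCK>"""

stock = ET.fromstring(STOCK)
for item in stock:
    # get() returns the attribute value as a string, e.g. "8".
    print(item.tag, item.text, int(item.get("count")))

total = sum(int(item.get("count")) for item in stock)
print(total)  # 18
```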
39. Summary-1
A markup language defines a set of rules that adds
meaning to the content and structure of documents
XML is extensible, which means that we can define our
own set of tags, and make it possible for other parties
(people or programs) to know and understand these tags.
This makes XML much more flexible than HTML
XML inherits features from SGML and includes the
features of HTML. XML can be generated from existing
databases using a scalable three-tier model. XML-based
data does not contain information about how data should
be displayed
An XML document is composed of a set of “entities”
identified by unique names
40. Summary-2
A well-formed document is one that conforms to the basic
rules of XML; a valid document is a well-formed
document that conforms to the rules of a DTD (Document
Type Definition)
The parser helps the computer to interpret an XML file
Steps involved in the building of an XML document are:
Stating an XML declaration
Creating a root element
Creating the XML code
Verifying the document
Character data is classified into PCDATA and CDATA
41. Summary-3
Entities are used to avoid typing long pieces of text repeatedly
in a document. The two types of entities are:
General entities
Parameter entities
The <!DOCTYPE […]> declaration follows the XML
declaration in an XML document.
An attribute gives information about an element