Introduction to XML
Shannon Davis | ssdavis@wustl.edu
• Self-describing document: “Hi. I am a book.”
<book> </book>
What Is XML?
• Simplicity
• Open standard
• Extensibility
• Interoperability
• Separates content from presentation
Why Use XML?
SGML is the common ancestor: HTML and XML both derive from it, and XHTML is HTML rewritten to follow XML's rules.
HTML: <H1><I>I am Born</I></H1><BR>
XML: <head type="chapter" n="01">I am Born</head>
XHTML: <h1><i>I am Born</i></h1><br />
History of Markup Languages
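every XML document must declare itself
as an XML document (strictly, XML 1.0 makes the declaration
optional, but it is strongly recommended and, when present,
must be the very first thing in the file)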
<?xml version="1.0"?>
<?xml version="1.0" encoding="utf-8"?>
Basic Rules of XML
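To see what the encoding part of the declaration does in practice, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The choice of Python, the <name> element, and the sample text are assumptions made for illustration; none of them come from the slides.

```python
import xml.etree.ElementTree as ET

# A document stored as ISO-8859-1 bytes: 0xEB is "ë" in that encoding.
# The <name> element and its content are made up for this example.
latin1_doc = b'<?xml version="1.0" encoding="iso-8859-1"?><name>Bront\xeb</name>'

# The parser reads the declaration and decodes the bytes accordingly.
name = ET.fromstring(latin1_doc).text
print(name)
```

Without the encoding in the declaration, the parser would assume UTF-8 and reject the lone 0xEB byte.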
every XML document must have a root
element that wraps the entire
document
<TEI></TEI>
or:
<modsCollection></modsCollection>
Basic Rules of XML
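The root-element rule can be checked with any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the <mods/> children are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

one_root = "<modsCollection><mods/><mods/></modsCollection>"
two_roots = "<mods/><mods/>"  # nothing wraps the two elements

# A single root element parses, and the parser hands it back to us.
root = ET.fromstring(one_root)
print(root.tag, len(root))

# Two top-level elements are rejected as not well-formed.
try:
    ET.fromstring(two_roots)
    two_roots_ok = True
except ET.ParseError as err:
    two_roots_ok = False
    print("rejected:", err)
```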
every XML tag that opens must close
<div1></div1>
<head></head>
<name></name>
• The only exception is self-closing (empty-element) tags:
<pb/>
<milestone/>
<link/>
Basic Rules of XML
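A self-closing tag is shorthand for an open/close pair with nothing in between. The sketch below, which assumes Python's standard xml.etree.ElementTree module, suggests that a parser treats the two forms the same way and rejects a tag that is never closed.

```python
import xml.etree.ElementTree as ET

explicit = ET.fromstring("<pb></pb>")
self_closing = ET.fromstring("<pb/>")

# Both forms produce an empty element, so they serialize identically.
same = ET.tostring(explicit) == ET.tostring(self_closing)
print(same)

# <head> is opened but never closed, so the parser refuses the document.
try:
    ET.fromstring("<div1><head>Chapter</div1>")
    unclosed_ok = True
except ET.ParseError:
    unclosed_ok = False
print(unclosed_ok)
```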
Basic Rules of XML
tags are case-sensitive, and tag-pairs
must match
<title></title>
not:
<title></TITLE>
or:
<Title></TITLE>
Basic Rules of XML
all tags must nest correctly
<title><persName>Dr. Strangelove</persName>,
<subtitle> or, How I learned to stop worrying and
love the bomb.</subtitle></title>
not:
<title><persName>Dr. Strangelove
</persName>,<subtitle> or, How I
learned to stop worrying and love the
bomb.</title></subtitle>
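The case and nesting rules above can be tried out with a small helper. This is a sketch that assumes Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical name introduced here, not something from the slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Tag pairs must match exactly, including case.
case_ok = is_well_formed("<title></title>")
case_bad = is_well_formed("<title></TITLE>")

# The element opened last must be closed first.
nest_ok = is_well_formed("<title><subtitle>or, How I</subtitle></title>")
nest_bad = is_well_formed("<title><subtitle>or, How I</title></subtitle>")

print(case_ok, case_bad, nest_ok, nest_bad)
```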
Basic Rules of XML
Well-formed XML
The following is NOT a well-formed document. Why?
<?xml version="1.0"?>
<BOOK>
<TITLE>The Adventures of Huckleberry Finn
<AUTHOR>Mark Twain</TITLE></AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>298</PAGES>
<PRICE>$5.49</price>
</BOOK>
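One way to work through a document like this is to let a parser point at the first problem. The sketch below assumes Python's standard xml.etree.ElementTree module and reads the position attribute that its ParseError carries.

```python
import xml.etree.ElementTree as ET

book = """<?xml version="1.0"?>
<BOOK>
<TITLE>The Adventures of Huckleberry Finn
<AUTHOR>Mark Twain</TITLE></AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>298</PAGES>
<PRICE>$5.49</price>
</BOOK>"""

try:
    ET.fromstring(book)
    first_error = None
except ET.ParseError as err:
    # position is a (line, column) tuple for the first problem found
    first_error = err.position
    print("line %d, column %d: %s" % (first_error[0], first_error[1], err))
```

A parser stops at the first error it meets, so a document with several problems needs a fix-and-rerun cycle until it parses cleanly.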
Review: Basic Rules of XML
• an XML document must have an XML declaration:
<?xml version="1.0"?>
• every XML document must have a root element that wraps the
entire document:
• every XML tag that opens must close; the only exception
is self-closing tags
• tags are case-sensitive and tags must match
• all tags must nest correctly
Exercise 1
Using what you’ve learned about well-formed XML,
create an XML file describing a text.
1. Open Wordpad or Notepad
2. Open springtime.txt from student_files
3. Use any tags you like to mark up the text to
create a well-formed XML document.
Key Concepts of XML
XML applications
Dublin Core: broad metadata standard that supports various
purposes and business models
MathML: Mathematical Markup Language
GedML: Genealogical Markup Language
ParlML: Parliamentary Markup Language
RETS: Real Estate Transaction Standard
TEI: Text Encoding Initiative
For more examples, see:
List of XML Markup Languages.
Key Concepts of XML
Valid XML
an XML application’s tag set is enforced through
an XML schema
OR
a DTD (document type definition)
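Real validation against an XML schema or DTD needs a validating parser, which Python's standard library does not include. As a deliberately simplified stand-in, the sketch below only checks element names against a hypothetical allowed set; ALLOWED_TAGS and unknown_tags are invented for illustration, and the check ignores everything else a schema or DTD would enforce (order, nesting, attributes).

```python
import xml.etree.ElementTree as ET

# Hypothetical tag set for a tiny "book" application (not a real standard).
ALLOWED_TAGS = {"book", "title", "author"}

def unknown_tags(text):
    """Return the sorted element names in the document that are not allowed."""
    root = ET.fromstring(text)  # raises ParseError if not well-formed
    return sorted({el.tag for el in root.iter()} - ALLOWED_TAGS)

ok = unknown_tags("<book><title>Emma</title><author>Jane Austen</author></book>")
bad = unknown_tags("<book><title>Emma</title><price>5.49</price></book>")
print(ok, bad)
```

Both sample documents are well-formed; only the first stays within the agreed tag set, which is the distinction between well-formed and valid.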
Structure of an XML document
• the prolog
• The XML declaration
<?xml version="1.0"?>
• other declarations (e.g., DTD, entities)
<!DOCTYPE COLL SYSTEM "red.textclass.dtd">
<!ENTITY TEI "Text Encoding Initiative">
• the document element
• defined by root element
<TEI></TEI>
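Putting the pieces together, the sketch below builds a small document with a prolog (XML declaration plus a DOCTYPE that declares one entity) followed by the document element, then parses it with Python's standard xml.etree.ElementTree module. Declaring the entity inside the DOCTYPE's internal subset and the <title> element are assumptions made so the example is self-contained.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE TEI [
  <!ENTITY TEI "Text Encoding Initiative">
]>
<TEI><title>About the &TEI;</title></TEI>"""

# Everything before <TEI> is the prolog; <TEI> is the document element.
root = ET.fromstring(doc)

# The parser replaces the entity reference with its declared text.
title = root.find("title").text
print(root.tag, "|", title)
```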
Building Blocks of XML
• elements and attributes
• general entities
• XML data
Building Blocks of XML
elements and attributes
<front>
CONTENTS
PAGE
<chapter>SPRINGTIME</chapter> <pageNo>1</pageNo>
SOME NAMES OF CHARACTERS IN FICTION 15
THOMAS HEARNE, 1678–1735 29
RECOLLECTIONS 51
</front>
Building Blocks of XML
elements and attributes
<text type="essay">
Governesses used to tell us that the seasons of the
year each consist of three months, and of these
<month type="third">March</month>, April, and May
make the springtime.</text>
<element attribute="value">content</element>
Attribute values must always be
in single or double quotes
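Once attributes are quoted correctly, software can read them back by name. A short sketch, assuming Python's standard xml.etree.ElementTree module and a shortened version of the springtime text:

```python
import xml.etree.ElementTree as ET

text = ET.fromstring(
    '<text type="essay">of these <month type="third">March</month>, '
    "April, and May make the springtime.</text>"
)

text_type = text.get("type")        # attribute on the outer element
month = text.find("month")
month_attrs = month.attrib          # all attributes as a dictionary
missing = month.get("n")            # absent attributes come back as None

# Single quotes work as well as double quotes.
single = ET.fromstring("<month type='third'/>").get("type")

# Leaving the quotes out is not well-formed.
try:
    ET.fromstring("<month type=third>March</month>")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False

print(text_type, month_attrs, missing, single, unquoted_ok)
```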
Review: Basic Rules of XML
• an XML document must have an XML declaration
• every XML document must have a root element that wraps
the entire document:
• every XML tag that opens must close; the only exception
is self-closing tags
• tags are case-sensitive and tags must match
• all tags must nest correctly
• attribute values must always be in single or double
quotation marks
Exercise 1, cont.
Using the text you marked up earlier, add attributes and
values to the elements.
Ex: BY <author type="knight">SIR FRANCIS
DARWIN</author>
Building Blocks of XML
general entities
• used as a placeholder for non-ASCII data, such as
special characters, non-Roman alphabets, and
non-text media
• to be used in the document element, entities must
be declared in the prolog
(except for the five predefined entities, &lt; &gt; &amp; &apos; &quot;,
and numeric character references such as &#233;)
Building Blocks of XML
general entities
• within the document element (anywhere after the
prolog) an entity takes the standard syntax of
starting with & and ending with ;
• ampersands (&) and angle brackets (< >) are
reserved characters in XML: & and < must always be
encoded as entities (&amp; and &lt;), and > is
conventionally encoded as &gt;
instead of:
<measure type="weight"> > 50lbs</measure>
write:
<measure type="weight">&gt; 50lbs</measure>
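Encoding reserved characters by hand is error-prone, so it is usually left to a library. The sketch below assumes Python's standard xml.sax.saxutils.escape function and the xml.etree.ElementTree parser; the sample strings are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "> 50lbs & rising"

# escape() replaces &, < and > with &amp;, &lt; and &gt;
escaped = escape(raw)
doc = '<measure type="weight">' + escaped + "</measure>"

# Parsing turns the entities back into the original characters.
round_trip = ET.fromstring(doc).text
print(escaped, "|", round_trip)

# A bare ampersand in content is rejected by the parser.
try:
    ET.fromstring("<measure>Fish & chips</measure>")
    bare_ampersand_ok = True
except ET.ParseError:
    bare_ampersand_ok = False
print(bare_ampersand_ok)
```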
Review: Basic Rules of XML
• an XML document must have an XML declaration
• every XML document must have a root element that wraps
the entire document:
• every XML tag that opens must close; the only exception
is self-closing tags
• tags are case-sensitive and tags must match
• all tags must nest correctly
• attribute values must always be in single or double
quotation marks
• ampersands (&) and left angle brackets (<) are
reserved characters in XML and must be
encoded as entities; right angle brackets (>) usually are too
Building Blocks of XML
data
CDATA (character data)
• text data that the XML parser does not interpret as markup
PCDATA (parsed character data)
• text data parsed by XML parser
NDATA (notation data)
• all other media types referenced in the
XML document
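The difference between parsed character data and a CDATA section shows up as soon as the text contains something that looks like a tag. A minimal sketch, assuming Python's standard xml.etree.ElementTree module and a made-up <note> element:

```python
import xml.etree.ElementTree as ET

# Parsed character data: <b> is recognized as a child element.
parsed = ET.fromstring("<note>Use <b>bold</b> sparingly</note>")

# CDATA section: the same characters are kept as plain text.
cdata = ET.fromstring("<note><![CDATA[Use <b>bold</b> sparingly]]></note>")

parsed_children = len(parsed)
cdata_children = len(cdata)
cdata_text = cdata.text
print(parsed_children, cdata_children, cdata_text)
```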
Review: Key Concepts of XML
• Well-formed XML
• Follows the basic rules; no content model is needed
• Valid XML: well-formed XML that also conforms to
• an XML schema, or
• a DTD (document type definition)
Review: Structure of XML document
• the prolog
• The XML declaration
<?xml version="1.0"?>
• other declarations (e.g., DTD, entities)
• the document element
• defined by root element (e.g., <TEI>)
Review: Building Blocks of XML
• elements and attributes
• general entities
• XML data
WU site wide license @ http://sl.wustl.edu/catalog/index.php
• Easy to use, with robust functionality for editing,
project management, and validation of structured markup
sources.
• Supports output to multiple target formats, including PDF,
TXT, HTML, and XML
Software: oXygen XML Editor
• Multiplatform availability: Windows, Mac
• Multilanguage support: English, German, French, Italian,
and Japanese
• Unicode support
• Spell checking supporting English, German and French
• Easy error tracking
• Content completion
• Built-in templates
oXygen Features
• Preview transformation results as XHTML or XML or in
your browser
• Import data from a database, Excel, HTML or text file
• XML project manager
• Manual and automatic validation of XML documents
against XML Schemas and DTDs
• Batch validate selected files in project
oXygen Features

Introduction to XML

  • 1. Introduction to XML Shannon Davis | ssdavis@wustl.edu
  • 2. • Self-describing document “Hi. I am a book.” What Is XML?
  • 3. • Self-describing document What Is XML? <book> </book>
  • 4. • Simplicity • Open standard • Extensibility • Interoperability • Separates content from presentation Why Use XML?
  • 7. XHTML XML HTML SGML History of Mark Up Languages
  • 8. XHTML XML HTML SGML <H1><I>I am Born</I></H1><BR> History of Mark Up Languages
  • 9. XHTML XML HTML SGML <head type=“chapter” n=“01”>I am Born</head> History of Mark Up Languages
  • 10. XHTML XML HTML SGML <h1><i>I am Born</i></h1><br /> History of Mark Up Languages
  • 11. every XML document must declare itself as an XML document <?xml version="1.0"?> <?xml version="1.0"? Encoding=“utf-8”?> Basic Rules of XML
  • 12. every XML document must have a root element that wraps the entire document <TEI></TEI> or: <modsCollection></modsCollection> Basic Rules of XML
  • 13. every XML tag that opens must close <div1></div1> <head></head> <name></name> • The only exception to this are self-closing tags: <pb/> <milestone/> <link/> Basic Rules of XML
  • 14. Basic Rules of XML tags are case-sensitive, and tag-pairs must match <title></title> not: <title></TITLE> or: <Title></TITLE>
  • 15. Basic Rules of XML all tags must nest correctly <title><persName>Dr. Strangelove</persName>, <subtitle> or, How I learned to stop worrying and love the bomb.</subtitle></title> not: <title><persName>Dr. Strangelove </persName>,<subtitle> or, How I learned to stop worrying and love the bomb.</title></subtitle>
  • 16. Basic Rules of XML Well-formed XML The following is NOT a well-formed document. Why? <?xml version="1.0"?> <BOOK> <TITLE>The Adventures of Huckleberry Finn <AUTHOR>Mark Twain</TITLE></AUTHOR> <BINDING>mass market paperback</BINDING> <PAGES>298</PAGES> <PRICE>$5.49</price> </BOOK>
  • 17. Review: Basic Rules of XML • an XML document must have an XML declaration: <?xml version="1.0"?> • every XML document must have a root element that wraps the entire document: • every XML tag that opens must close: the only exception to this are self-closing tags • tags are case-sensitive and tags must match • all tags must nest correctly
  • 18. Exercise 1 Using what you’ve learned about well formed XML, create an XML file describing a text. 1. Open Wordpad or Notepad 2. Open springtime.txt from student_files 3. Use any tags you like to mark up the text to create a well formed XML document.
  • 19. Key Concepts of XML XML applications Dublin Core –broad metadata standard that supports various purposes and business models MathML—Math Markup Language GedML—Genealogical Markup Language ParlML—Parliamentary Markup Language RETS—Real Estate Transaction Language TEI—Text Encoding Initiative For more examples, see: List of XML Markup Languages.
  • 20. Key Concepts of XML Valid XML an XML application’s tag set is enforced through an XML schema OR a DTD (document type definition)
  • 21. Structure of an XML document • the prolog • The XML declaration <?xml version="1.0"?> • other declarations (i.e., DTD, entities) <!DOCTYPE COLL SYSTEM “red.textclass.dtd"> <!ENTITY TEI "Text Encoding Initiative"> • the document element • defined by root element <TEI></TEI>
  • 22. Building Blocks of XML • elements and attributes • general entities • XML data
  • 23. Building Blocks of XML elements and attributes <front> CONTENTS PAGE <chapter>SPRINGTIME</chapter> <pageNo>1</pageNo> SOME NAMES OF CHARACTERS IN FICTION 15 THOMAS HEARNE, 1678–1735 29 RECOLLECTIONS 51 </front>
  • 24. Building Blocks of XML elements and attributes <text type=“essay”> Governesses used to tell us that the seasons of the year each consist of three months, and of these <month type=“third”>March</month>, April, and May make the springtime.</text> <element attribute="value“>content</element> Attribute values must always be in single or double quotes
  • 25. Review: Basic Rules of XML • an XML document must have an XML declaration • every XML document must have a root element that wraps the entire document: • every XML tag that opens must close: the only exception to this are self-closing tags • tags are case-sensitive and tags must match • all tags must nest correctly • attribute values must always be in single or double quotation marks
  • 26. Exercise 1, cont. Using the text you marked up earlier, add attributes and values to the elements. Ex: BY <author type=“knight”>SIR FRANCIS DARWIN</author>
  • 27. Building Blocks of XML general entities • used as a placeholder for non-ASCII data, such as special characters, non-Roman alphabets, and non-text media • to be used in the document element, entities must be declared in prolog (except for XML Unicode entities)
  • 28. Building Blocks of XML general entities • within the document element (anywhere after the prolog) an entity takes the standard syntax of starting with & and ending with ; • ampersands (&) and angle brackets (<>) are reserved characters in XML and must be encoded as entities <measure type="weight"> > 50lbs</measure> <measure type="weight">&gt; 50lbs</measure>
  • 29. Review: Basic Rules of XML • an XML document must have an XML declaration • every XML document must have a root element that wraps the entire document • every XML tag that opens must close; the only exception is self-closing tags • tags are case-sensitive and tag pairs must match • all tags must nest correctly • attribute values must always be in single or double quotation marks • ampersands (&) and angle brackets (<>) are reserved characters in XML and must be encoded as entities
  • 30. Building Blocks of XML data CDATA (character data) • text data ignored by XML parser PCDATA (parsed character data) • text data parsed by XML parser NDATA (notation data) • all other media types referenced in the XML document
  • 31. Review: Key Concepts of XML • Well-formed XML • Follows the basic rules--no content model • Valid XML • an XML schema • a DTD (document type definition)
  • 32. Review: Structure of XML document • the prolog • the XML declaration <?xml version="1.0"?> • other declarations (e.g., DTD, entities) • the document element • defined by the root element (e.g., <TEI>)
  • 33. Review: Building Blocks of XML • elements and attributes • general entities • XML data
  • 34. WU site-wide license @ http://sl.wustl.edu/catalog/index.php • Easy to use and provides robust functionality for editing, project management, and validation of structured mark-up sources. • Supports output to multiple target formats, including PDF, TXT, HTML and XML Software: oXygen XML Editor
  • 35. • Multiplatform availability: Windows, Mac • Multilanguage support: English, German, French, Italian, and Japanese • Unicode support • Spell checking supporting English, German and French • Easy error tracking • Content completion • Built in templates oXygen Features
  • 36. • Preview transformation results as XHTML or XML or in your browser • Import data from a database, Excel, HTML or text file • XML project manager • Manual and automatic validation of XML documents against XML Schema schemas, and DTDs • Batch validate selected files in project oXygen Features

Editor's Notes

  1. What is XML? It is a self-describing document
  2. The tags used describe what the document is about Example: This book declares itself to be a book
  3. Simplicity - Information coded in XML is just plain text. It's easy to read and understand, plus it can be processed easily by computers. Open standard - XML is non-proprietary. Because it's stored as plain text files, there's no special software required to access data. The data can theoretically be around forever because it's not software dependent. The WordPerfect document you have stored on your floppy disk could be readable now if it was in XML. Extensibility - XML stands for Extensible Markup Language. This means that there is no fixed set of tags. New tags can be created as they are needed. Interoperability - XML is a W3C standard, endorsed by software industry market leaders. [Is everyone here familiar with the W3C? The W3C develops open specifications (de facto standards) to enhance the interoperability of web-related products.] This means XML can be used by many different systems, transformed from one metadata standard to another (e.g., Dublin Core to TEI), and shared by institutions. Separates content from presentation - XML tags describe meaning, not presentation. XML is used to transport data, whereas HTML is used to format and display data.
  4. It can output to multiple formats. The look and feel of an XML document can be controlled by XSL style sheets, allowing the look of a document (or of a complete Web site) to be changed without touching the content of the document.
  5. More importantly, this includes yet unknown formats. There are many other benefits, such as: It supports multilingual documents and Unicode (Red Brush, True Crimes projects) It facilitates the comparison and aggregation of data (standard formats allow different data sets to be aggregated and compared) You can embed multiple data types (can wrap image/video in a METS wrapper) And rapid adoption by industry (XML is used by scholarly databases, the publishing industry, software vendors, and many others)
  6. SGML was first developed in the 80s out of the idea that markup should focus on the structure of a text, not its presentation. It was not widely accepted because of its overwhelming number of tags and because it was not modifiable.
  7. From SGML, HTML was developed and became the language of the web. HTML indicates how a text should be displayed as well as its structure. The h1 and i tags tell the browser how to display the chapter heading: as large, italic text.
  8. XML was also developed from SGML and introduced flexibility. It was not intended to address display issues, but only to indicate a document's structure. It is not a markup language, strictly speaking, but a set of rules for creating a markup language. No display information is given, only the structure of the document. The benefit of describing a document's structure: if you have content identified as author name, book title, chapter heading, etc., you can limit your search to just that content.
  9. XHTML was developed around 2000. The W3C recommendations for HTML have been based on XML rather than SGML since then. XHTML documents must be well-formed XML documents. It is a somewhat stricter version of HTML, allowing for more rigorous and robust documents, while using the tags used by HTML. It allows the billions of web pages in HTML to retroactively become a subset of XML, without abandoning HTML altogether. Differences in XHTML: all tags must be closed, all attribute values must be quoted, and all tag and attribute names must be lowercase. To summarize the relationships: HTML, XML and XHTML are all subsets of SGML, which predated the web by over a decade. XML is not an extension of HTML, but represents an attempt to revive the original ideas of SGML, while adding the benefits of HTML (web-friendliness and simplicity) and extensibility.
  10. The basic rules for creating a markup language in XML are relatively simple: Every XML document must declare itself as an XML document, with <?xml version="1.0"?>, in the first line of the document. At the very least, it must identify the version of XML, but it can include other information such as the type of character encoding used. (UTF-8 can represent any character in the Unicode standard and is used for Chinese, Japanese, etc.) The initial fixed character string also allows parsers to test the type of character encoding present. (An XML parser is a processor that reads an XML document and determines the structure and properties of the data.)
  11. Every XML document must have a single root element that contains all the other elements comprising the document. It is the top element and all other elements are hierarchically subordinate to it. Its start tag begins the document and its end tag is the last to occur in the document.
  12. Every XML tag that opens must close. The only exception is self-closing tags, for use with tags that don't enclose data but mark a point in a document, like the break tag in HTML and TEI.
  13. Tags are case sensitive and must match
  14. All tags must nest correctly: when an element's start tag is inside another element, its end tag must also be inside that element. Elements that have subordinate elements, like the root element, are called "container elements" and are also referred to as "parents". The parent element here is <title>. To continue the parent-child analogy, subordinate elements are called "children", and elements at the same level within a given container element are called "siblings". The children of <title> are <subtitle> and <persName>. <subtitle> and <persName> are siblings.
  15. <TITLE> does not close properly (it overlaps <AUTHOR>). <PRICE> is closed with a lowercase </price>, so the cases are mixed.
  16. There are a few additions to these rules (we'll add some as we go along) but basically, this is all you need to create well-formed XML. Well-formedness is a key concept of XML. It means that the document follows these minimal requirements for an XML document. If you follow these rules, you can use ANY tags you want, because XML is extensible, and this will be well-formed XML.
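The parent, child and sibling relationships described in this note can be inspected with a parser. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree module (not a tool used in the original workshop), applied to the correctly nested <title> example from the slides:

```python
import xml.etree.ElementTree as ET

# The correctly nested example from the slides, written on one line.
doc = (
    "<title><persName>Dr. Strangelove</persName>, "
    "<subtitle>or, How I learned to stop worrying and love the bomb.</subtitle>"
    "</title>"
)

title = ET.fromstring(doc)                 # <title> is the parent (container) element
children = [child.tag for child in title]  # its direct children are siblings of each other

print(title.tag)
print(children)
```

Iterating over an element yields only its direct children, which is why <persName> and <subtitle> appear side by side as siblings.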
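Well-formedness can also be checked outside an XML editor. The sketch below assumes Python's standard-library xml.etree.ElementTree (an assumption for illustration; the workshop itself uses oXygen) and a hypothetical helper named is_well_formed. It is run against the broken <BOOK> exercise from the slides and against a corrected version:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, parser message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

# The exercise document: <TITLE> and <AUTHOR> overlap, and <PRICE> is closed as </price>.
broken = """<?xml version="1.0"?>
<BOOK>
<TITLE>The Adventures of Huckleberry Finn
<AUTHOR>Mark Twain</TITLE></AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>298</PAGES>
<PRICE>$5.49</price>
</BOOK>"""

# The same document with both problems repaired.
fixed = """<?xml version="1.0"?>
<BOOK>
<TITLE>The Adventures of Huckleberry Finn</TITLE>
<AUTHOR>Mark Twain</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>298</PAGES>
<PRICE>$5.49</PRICE>
</BOOK>"""

print(is_well_formed(broken))
print(is_well_formed(fixed))
```

The parser stops at the first problem it meets, so the broken document has to be fixed and re-checked one error at a time. This checks well-formedness only, not validity against a DTD or schema.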
  17. Put basic rules slide back up and give 5 minutes Show finished XML file in oXygen and show it is well formed
  18. Now that we have some basic rules of XML, we'll move on to some key concepts of XML. As we saw from this exercise, the benefit of XML is that it allows you to be as precise (or obscure) as you want to be. The drawback is that it allows you to be as precise and obscure as you want to be. You could tag every word, but you have to consider what your target audience will be using this information for. Will scholars really care that "and" is marked as a conjunction? XML mediates the problem of large and unwieldy tag sets (SGML) on the one hand, and overly simple tag sets that get overloaded and used for purposes they weren't intended for (HTML) on the other, by allowing for a middle ground: the XML application. Not to be confused with software applications, an XML application is a set of XML tags and rules for their use agreed on by a given community or consortium for use in their common subject area or purpose. Some examples of XML applications include: [see slide] These are more or less successful, more and less accepted standards, and more are appearing all the time. Having a set group of tags and attributes means standardization and interoperability. Looking at a set of tags will also help you decide which information to mark up. For example, the TEI standard does not have a <word> tag. It would be overkill to mark every word with an element.
  19. If well-formed XML can be created using any tags, how is a community's agreement on a certain set of tags and rules for them (an XML application) enforced? An XML application's tag set is enforced in one of two ways: an XML schema or a DTD (document type definition). Both serve essentially the same function, in that they list the tags the community or group has agreed on (and what they mean) and the rules for where and how they can be used in a document. DTD (Document Type Definition): defines the elements in your document and the attributes they can have, and the ordering and nesting of elements; it is declared in a doctype declaration after the XML declaration. Schema: essentially does the same by defining elements and attributes, but is itself an XML document. To enforce the rules of an XML application in an XML document, the document calls on a DTD or schema file (either installed locally on the computer or at a location on the web), and an XML parser will read the rules of the DTD or schema and validate the XML. By definition, an XML document that validates against a schema or DTD is well-formed, but we only say it is valid if it also conforms to the rules of a schema or DTD. Example of DTD: http://www.tei-c.org/Vault/P4/Lite/DTD/teixlite.dtd Example of schema: http://www.tei-c.org/Vault/P5/1.7.0/xml/tei/custom/schema/relaxng/tei_all.rng
  20. There is also a basic structure to an XML document that must be followed. At its most basic, an XML document is made of two parts: the prolog and the document element. The minimal requirement for the prolog is the XML declaration, which you've already seen, which must be at the top of any XML document: <?xml version="1.0"?> The prolog is also where a reference to a DTD or schema will be located. Sometimes the reference to the DTD will substitute for the XML declaration. The reference to a DTD or schema in the prolog is unfortunately called a document type declaration, not to be confused with the file it refers to, which is the document type definition (DTD). The prolog is also the place where any entities are declared. The document element is the root element that wraps the whole document. The TEI entity would be used if you had to use "Text Encoding Initiative" a lot in your doc but wanted to use a placeholder.
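To make the two-part structure concrete, here is a small sketch that declares the TEI entity in the prolog and references it in the document element. It assumes Python's standard-library xml.etree.ElementTree as the parser; the root element name note and the sentence inside it are hypothetical, chosen only for illustration:

```python
import xml.etree.ElementTree as ET

# Prolog: XML declaration plus a document type declaration that declares one entity.
# Document element: a hypothetical <note> root that references the entity as &TEI;.
doc = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY TEI "Text Encoding Initiative">
]>
<note>The &TEI; is mentioned often in this document.</note>"""

note = ET.fromstring(doc)

# The parser replaces the placeholder with the declared text.
print(note.text)
```

Here the entity is declared in the document's own internal subset, so no external DTD file is needed.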
  21. With the prolog and document element, we have the basic outline of an XML document—what we need now are the basic building blocks Fundamentally, you have tags on the one hand and the content you want to encode on the other. To this point I’ve talked about XML “tags,” and you’ll often hear people refer to them. This generally includes any markup at the level of the root element and above. But “tags” is loose jargon, not really official terminology. More formally, what is meant by XML “tags” are XML elements and attributes. For content that can be represented in ASCII text, there is no problem. But for anything else—special characters, or any non-text data you want to refer to (such as jpgs, movie or audio files)—you need a placeholder in the XML to refer to the content outside the XML file. That role is filled in XML by entities. There are also 3 types of data that go in an XML document that we will discuss later.
  22. XML tags describe the content they contain (they say what the content is), and the way they do this is with XML elements. From our example before, I used the root element <text> to denote that the resource we are encoding is a text. The example we're working with came from a book with a table of contents, so we might want to denote that as <front> matter and describe the chapters and page numbers.
  23. But a general description is often not precise enough, and requires further clarification or elaboration. There may be several different kinds of texts in this one resource: poems, narratives, etc. We can expand on the <text> element by including attributes. Elements and attributes are broadly analogous to nouns and adjectives in grammar: an element describes a general conceptual category, while an attribute gives further information describing the content. The element name always appears first inside the angle brackets (<>) that identify text as code, while attributes follow it. Attributes, in addition, must have an assigned value. Here, the attribute is "type" and the value of the attribute is "third." Note that you don't have closing tags for the attribute: an attribute is part of the element, so only the element closes. All attribute values must be in double or single quotation marks. This is one of the basic rules of XML. Double quotes are probably the most common, but if the value contains double quotes, as in someone's nickname, the attribute value can be placed in single quotes.
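The element and attribute example from the slides can be read back with a parser to see which part is the element name, which is the attribute, and which is the value. This is a sketch assuming Python's standard-library xml.etree.ElementTree. The <person> element and its nickname attribute are hypothetical, added only to show a value wrapped in single quotes:

```python
import xml.etree.ElementTree as ET

doc = (
    '<text type="essay">Governesses used to tell us that the seasons of the year '
    'each consist of three months, and of these '
    '<month type="third">March</month>, April, and May make the springtime.</text>'
)

text = ET.fromstring(doc)
month = text.find("month")       # first <month> child of <text>

print(text.tag, text.attrib)     # element name and its attributes
print(month.tag, month.get("type"), month.text)

# Hypothetical example: the value itself contains double quotes,
# so the attribute value is delimited with single quotes instead.
person = ET.fromstring("<person nickname='\"Red\"'>Example Person</person>")
print(person.get("nickname"))
```

Note that the attribute has no closing tag of its own: it lives inside the start tag, and only the element closes.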
  24. Give a few minutes for exercise and add attributes to my file Show well formedness in oxygen
  25. As mentioned above, general entities function as placeholders for any data that is beyond the ASCII character set, including special characters, characters from non-Roman alphabets, and multimedia formats. But since it serves as a placeholder, it can also serve as a way to store repeated text (either in the XML prolog, or in a separate file) to reference in your XML file, as a way to save on re-typing. With the exception of a few pre-defined XML entities, and Unicode XML entities, entities must be declared in the prolog to be referenced in the document element (where your content goes).
  26. Once an entity is declared, it can be referenced in the document, and takes the syntax of being introduced by an ampersand (&) and closed with a semicolon (;). Just like HTML, ampersands and angle brackets have to be encoded as entities in order to have well-formed XML. These are reserved characters that have other meaning in XML. The first example leaves the greater-than sign unencoded; the second has it encoded as the entity &gt;. (Strictly speaking, the XML specification requires escaping only & and <, and > only in the sequence ]]>, but encoding > as well is the common, safe practice.) Since gt is for greater than, you can probably guess what the entity for less than is.
  27. This is one of the basic rules of XML, so we'll add it to our list.
  28. CDATA is not parsed by the XML parser. Some text, like JavaScript, contains a lot of illegal characters like angle brackets and ampersands. If for some reason you had a snippet of JavaScript code in your XML, you would want to put it in a CDATA section so it is ignored by the XML parser. CDATA sections start with <![CDATA[ and end with ]]>. PCDATA is everything else in your document: all of it will be parsed (elements, attributes, text, etc.). NDATA marks unparsed entities, which refer to images and other media files. Example: <!ENTITY pic SYSTEM "http://www.w3schools.com/picture.jpg" NDATA JPEG>
  29. Open Fanny Lewald file in oXygen and Window > Reset Layout if necessary. Point out Project window, Outline window, completion of elements and attributes, content completion. Show error window. Show creating a new file from a template. Show Tools > Compare Files with two Lewald files. Show Project and Add files, Find/Replace across project files. Well-formedness exercise: open Sleepy Hollow in oXygen and check for well-formedness (title type needs to be in quotes, h2 not nested properly, P different cases, no end blockquote). Show transformation to HTML if time. Use file-to-transform.xml and xml2html.xsl.
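Encoding reserved characters by hand is error-prone, so it is usually left to a library. A minimal sketch, assuming Python's standard-library xml.sax.saxutils.escape and xml.etree.ElementTree (neither is part of the original workshop), applied to the <measure> example:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "> 50lbs"
encoded = escape(raw)   # replaces &, < and > with &amp;, &lt; and &gt;

# Build the element from the slide around the encoded content.
doc = '<measure type="weight">' + encoded + "</measure>"
measure = ET.fromstring(doc)

print(encoded)          # the text as it is stored in the XML file
print(measure.text)     # the text as the parser hands it back
```

The entity exists only in the stored file: once parsed, the content is the original character again.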
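The effect of a CDATA section can be seen by parsing a small document. This sketch assumes Python's standard-library xml.etree.ElementTree; the <script> element and the one-line JavaScript snippet are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical JavaScript snippet full of reserved characters.
snippet = "if (a < b && c > d) { run(); }"

# Wrapped in a CDATA section, the snippet is passed through without being parsed as markup.
doc = "<script><![CDATA[" + snippet + "]]></script>"
script = ET.fromstring(doc)
print(script.text)

# Without the CDATA wrapper, the same text is not well-formed XML.
try:
    ET.fromstring("<script>" + snippet + "</script>")
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False
print(parsed_without_cdata)
```

The alternative to CDATA is to encode each reserved character as an entity, which quickly becomes unreadable for code.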