- The document is a presentation about XML given by Mr. Viraf Karai to staff of Sila Solutions Group in Seattle, WA on February 27, 2009 and March 6, 2009.
- The presentation covers topics such as the history of XML, XML syntax and semantics, well-formed and valid XML, DTDs, XML schemas, and Relax NG.
- Examples are provided of well-formed XML documents and how they can be defined and validated using DTDs, XML schemas, and Relax NG.
Xml Demystified
1. XML Demystified
Presented by Mr. Viraf Karai
to staff of Sila Solutions Group
Seattle WA
Fri. Feb 27, 2009 and Fri. Mar 6, 2009
2. Agenda (session # 1)
What is XML
History of XML
XML syntax & semantics
Well formed XML
Valid XML
DTDs
XML schemas
Relax NG
Structure of XML docs
XML node types
Where XML is being used today
Advantages & disadvantages of XML
XML vocabularies
XML authoring tools
3. What is XML
Stands for eXtensible Markup Language
It is a World Wide Web Consortium std.
A markup language like HTML – except you can
build your own tags
Meant for machine consumption, yet still human-readable
Hierarchical by nature
Widespread use since late '90s
4. History of XML
SGML has been around since the '80s. SGML was the int'l std for data markup.
Discussions began in 1996. Focus on a new, simple markup language.
Spec by Tim Bray et al is only 26 pages. SGML ≈ 500 pages
Approved as W3C standard (spec 1.0) in Feb. 1998.
W3C spec (v 1.1) in Feb. 2004. Mainly deals with Unicode enhancements (v4.0)
5. XML syntax & semantics
Generally speaking an XML document
is well-formed if it clears syntax checks
is valid if it is well-formed and clears semantic checks
(specified by a grammar such as a DTD or schema)
A conforming parser must reject a document that fails the syntax
checks; a validating parser also reports failed semantic checks
Yes, XML is very fussy!
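A minimal sketch of the syntax (well-formedness) check, using the JAXP DOM parser that ships with the JDK. The class name WellFormedCheck and the sample strings are made up for illustration; no grammar is involved, so this tests syntax only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: answers "is this text well-formed XML?"
class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow parse problems instead of letting the default
            // handler print them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;                       // parsed without a syntax error
        } catch (SAXException e) {
            return false;                      // not well-formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>"));   // properly nested
        System.out.println(isWellFormed("<a><b></a>"));    // <b> is never closed
    }
}
```

The parser stops at the first well-formedness error, which is the "fussy" behaviour the slide refers to.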
6. Well formed XML
Element names
must start with a letter or underscore (_)
may contain any number of letters, digits, underscores,
hyphens or periods
no embedded spaces
are case sensitive
Elements must be closed (empty elements may use the short form, e.g. <br/>)
7. Well formed XML (cont'd)
Element nesting order must be obeyed
Encase attrs in single (') or double (") quotes
Escape special characters (in text and attrs)
& → &amp;
< → &lt;
" → &quot; # if used in a double quoted attr
' → &apos; # if used in a single quoted attr
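In practice the escaping above is rarely done by hand. The sketch below assumes the JDK's built-in DOM and Transformer implementations; the element name note, the attribute title and the class name EscapeDemo are invented for the example. The serializer is left to escape the special characters when the document is written out.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo: raw text goes in, escaped markup comes out.
class EscapeDemo {

    static String serialize(String text, String title) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element note = doc.createElement("note");
        note.setAttribute("title", title);   // raw, unescaped value
        note.setTextContent(text);           // raw, unescaped text
        doc.appendChild(note);

        Transformer writer = TransformerFactory.newInstance().newTransformer();
        writer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        writer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The printed markup carries &amp; and &lt; in place of & and <
        System.out.println(serialize("Fish & chips < 5", "menu"));
    }
}
```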
9. Valid XML
Valid XML docs are enforced by well-known frameworks and APIs, e.g.
Spring, Hibernate, Apache SOAP, Java EE, in
config files and RPC msgs
Three basic models for constraints
Document Type Definition (DTD)
XML schemas (XSD)
Relax NG
Constraint models define the structure of an XML
document and are usually referenced by URI
10. DTDs (extn: .dtd)
Introduced as part of the XML 1.0 spec
Oldest constraint model
First used in SGML
Still used in HTML 4.x spec
Non-XML syntax
Simple to use but inflexible
Poor validation capabilities
11. XML doc and its DTD
XML document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people_list SYSTEM "example.dtd">
<people_list>
  <person>
    <name>Fred Bloggs</name>
    <birthdate>27/11/2008</birthdate>
    <gender>Male</gender>
  </person>
</people_list>
DTD (example.dtd):
<!ELEMENT people_list (person*)>
<!ELEMENT person (name, birthdate?, gender?, socialsecuritynumber?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT birthdate (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<!ELEMENT socialsecuritynumber (#PCDATA)>
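A hedged sketch of validating against this DTD from Java. To keep the example self-contained it embeds the slide's declarations as an internal DTD subset instead of reading example.dtd from disk; the class name DtdValidation and the invalid sample are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: collects DTD validity errors for a people_list body.
class DtdValidation {

    // The slide's DTD, inlined as an internal subset.
    static final String DOCTYPE =
          "<!DOCTYPE people_list [\n"
        + "<!ELEMENT people_list (person*)>\n"
        + "<!ELEMENT person (name, birthdate?, gender?, socialsecuritynumber?)>\n"
        + "<!ELEMENT name (#PCDATA)>\n"
        + "<!ELEMENT birthdate (#PCDATA)>\n"
        + "<!ELEMENT gender (#PCDATA)>\n"
        + "<!ELEMENT socialsecuritynumber (#PCDATA)>\n"
        + "]>\n";

    static List<String> validate(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);            // turn on DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            // Validity errors are recoverable: record them and carry on.
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            // Well-formedness errors are fatal.
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String ok = "<people_list><person><name>Fred Bloggs</name>"
                  + "<gender>Male</gender></person></people_list>";
        String bad = "<people_list><person><gender>Male</gender></person></people_list>";
        System.out.println(validate(ok));    // no validity errors expected
        System.out.println(validate(bad));   // <name> is required but missing
    }
}
```

Note that the DTD can say birthdate holds text, but it cannot check that 27/11/2008 is a date; that limit is what the slide calls poor validation capabilities.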
12. XML schemas (XSD extn: .xsd)
W3C's successor to DTDs
Extremely powerful semantic validation
Vastly more flexible compared to DTDs
Mind-boggling support for rich data types
Complex and difficult to author by hand
Many XML gurus unhappy with complexity
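To illustrate the rich data types, here is a small sketch using the javax.xml.validation API from the JDK. The schema, the person element and the class name XsdValidation are invented for this example and are not taken from the talk; the schema types birthdate as xs:date, so a value a DTD would accept as plain text is rejected.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: validates a <person> document against an inline XSD.
class XsdValidation {

    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='person'><xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='birthdate' type='xs:date'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the first validity error is thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // ISO date: matches xs:date
        System.out.println(isValid(
            "<person><name>Fred Bloggs</name><birthdate>2008-11-27</birthdate></person>"));
        // dd/mm/yyyy: plain text to a DTD, but not a valid xs:date
        System.out.println(isValid(
            "<person><name>Fred Bloggs</name><birthdate>27/11/2008</birthdate></person>"));
    }
}
```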
14. Relax NG (extn: .rng / .rnc)
Stands for REgular LAnguage for XML Next
Generation
Not a W3C standard; developed at OASIS
Offers alternative to XSD complexity
Based on Murata Makoto's RELAX and James
Clark's TREX.
Mostly satisfies the Pareto principle
XML (.rng) and non-XML compact (.rnc) syntaxes
16. Sample XML doc with DTD decl
DTD decl: the <!DOCTYPE ...> line below
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN"
    "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
  <!-- Axis2 Web Service, but to Spring, it's just another bean
       that has dependencies -->
  <bean id="springAwareService" class="spring.SpringAwareService">
    <property name="myBean" ref="myBean"/>
  </bean>
  <!-- just another bean / interface with a wired implementation,
       that's injected by Spring into the Web Service -->
  <bean id="myBean" class="spring.MyBeanImpl">
    <property name="val" value="Spring, emerge thyself" />
  </bean>
</beans>
19. Structure of XML docs (cont'd)
<?xml version="1.0" encoding="UTF-8"?>
<nutrition>
  <?page skip?>
  <daily-values>
    <total-fat units="g">65</total-fat>
    <saturated-fat units="g">20</saturated-fat>
    <cholesterol units="mg">300</cholesterol>
    <sodium units="mg">2400</sodium>
    <carb units="g">300</carb> <!-- this is a comment -->
    <fiber units="g">25</fiber>
    <protein units="g">50</protein>
    <notes><![CDATA[Daily values for an adult <male> ]]></notes>
  </daily-values>
</nutrition>
Parts labelled on the slide:
XML declaration: <?xml version="1.0" encoding="UTF-8"?>
Doc element: <nutrition>
Procs'ng Instr: <?page skip?>
Element: <daily-values>
End Element: e.g. </total-fat>
Text: e.g. 20
XML comment: <!-- this is a comment -->
Attribute: e.g. units="g"
Text (CDATA): Daily values for an adult <male>
End of CDATA: ]]>
End Doc element: </nutrition>
20. Hierarchy of previous example
nutrition
  ?page (PI)
  daily-values
    total-fat       units   65
    saturated-fat   units   20
    cholesterol     units   300
    sodium          units   2400
21. XML Node Types
Document
Element
Attribute
Text
Comment
Document Fragment
Entity
Entity reference
Processing instruction
22. XML Nodes: Document
Represents the entire document
Conceptual root of the document tree
Provides primary access to doc's data
Other node types must have a parent Document
Used in DOM for full traversal
Used in SAX to signal start of a document
23. XML Nodes: Element
Represents an element in an XML doc
May have 0 or more attributes
May have 0 or more children (other elements, text,
comments, CDATA, etc.)
No children → leaf elements
All elements must be closed, e.g.
<color>pink</color>
An empty element can be written <color></color> or <color/>
24. XML Nodes: Attributes
Only associated with elements – optional
Specify metadata about an element
Shown as name/value pairs, e.g. Num='6'
Each attribute value must be quoted with (') or (")
Elements may have any number of attrs
Each attr name must be unique within an element
25. XML Nodes: Text
Represents textual content of elements
Text nodes are leaves
Some chars such as '<' and '&' must be escaped
when authoring text
Use CDATA when such chars occur frequently.
Escaping impairs readability.
Int'l chars may freely be inserted in text
26. XML Nodes: Comments
Mainly exist for human readability
Can span multiple lines
Char sequence '--' illegal inside comments
Ignored by most parsers
Can't have nested comments
27. XML Nodes: Entity references
Used to substitute for a single char that is also a
markup delimiter in XML
Using these prevents a literal char from being
mistaken for a markup delimiter
Predefined entity references:
< → &lt;
& → &amp;
> → &gt;
" → &quot;
' → &apos;
28. XML Nodes: Processing Instrs
Provides info to app processing document
E.g. how to process or render the doc
XML declaration looks like a PI, but isn't
PIs consist of a target and data, e.g. <?sort alpha
ascending?>
target: sort, data: alpha ascending
29. Where XML is being used today
Config files
Expr & query languages (XPath, XQuery)
Transform languages (XSLT, XSL-FO)
Java project build files (Ant, Maven)
Web services (SOAP, REST) and XML-RPC
Document storage (OpenOffice, MSOffice)
Doc publishing (S1000D)
Devices (Sony PRS505)
IDEs (IntelliJ, Eclipse)
Frameworks (Spring, Hibernate, JPA)
DBMS storage (SQLSrv, DB2, Oracle support XML)
Web, PDAs, smart phones
Vector graphics (SVG)
Log, data files
30. Advantages of XML
Full blown W3C standard
I18N support – UTF-8 and UTF-16
Expressive – define CS data structures
Semantic vald'n using DTD, XSD, Relax NG
Parsers on all platforms (Java, .NET, LAMP)
Widely in use in industry and academia
Several vocabularies e.g. MathML, CML
Rigid syntax → predictability in parsing
31. Disadvantages of XML
Verbose syntax – not compact
Can't easily represent binary data
DOM and SAX APIs are complex
Difficult to diff similar XML files
Overheads in data transmission
Different data model compared to RDBMS
Storing XML in RDBMS is unnatural
32. XML vocabularies
S1000D (aerospace)
SVG (graphics)
MathML
Food XML
Legal XML
Manufacturing XML
Healthcare XML
Human XML
Address XML
Finance XML
Physics XML
News XML
Astronomy XML
33. XML authoring tools
XMLSpy ($)
Arbortext ($)
Oxygen ($)
XML Copy Editor
EditiX ($)
Stylus Studio
35. Agenda (session # 2)
Sample PO (fixed fmt & XML)
Typical Java enterprise tech
The XML family
Parsing XML
DOM, SAX, StAX parsing
XML parsing with Groovy
Compare DOM, SAX, StAX
XML usage in Java5 and beyond
XMLEncoder sample code
Reading, writing XML property files
Sample log4j.xml config file
XML to build Java projects
The JAX family
XML in Java enterprise development
XML and databases
XML and web services
References
39. The XML family
Has a number of members
XHTML – HTML that's well-formed XML
XSLT – transform XML to XML/HTML/Text
XSL-FO – transform XML to PDF/PS/RTF
XPath – XML expression lang (used in XSLT, XQuery
and code e.g. Groovy, Java, Python)
XQuery – extract, manipulate data in XML document
All of the above are W3C standards
All have full-blown I18N support (UTF)
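As a rough sketch of XSLT (with XPath doing the selecting), the example below runs a tiny stylesheet through the JDK's javax.xml.transform API. The stylesheet, the class name XsltDemo and the sample data are assumptions for illustration; the input reuses the people_list vocabulary from the DTD slide and the output is plain text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: XML in, text out, via an inline XSLT 1.0 stylesheet.
class XsltDemo {

    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/people_list'>"
        // 'person' and 'name' are XPath expressions relative to people_list
        + "<xsl:for-each select='person'>"
        + "<xsl:value-of select='name'/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<people_list>"
                   + "<person><name>Fred</name></person>"
                   + "<person><name>Ann</name></person>"
                   + "</people_list>";
        System.out.println(transform(xml));   // prints Fred;Ann;
    }
}
```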
40. Parsing XML
Splendid support for parsing XML in Java, Groovy,
C, C++, Ruby, C# (.net), Python, Haskell, Scala,
Lisp, Erlang, PERL, etc.
Parsers available from handheld devices to massive
supercomputers
Most widely used techniques for parsing:
DOM – build a tree representing the XML document
SAX – fire events during parsing (push model)
StAX – cursor and iterator based (pull model)
41. Parsing XML (cont'd)
Above techniques are low-level
DOM parsing heavily used in CAS Toolbox project
Other (less common) parsing techniques:
(a) JAXB (used in Toolbox Metadata Loader)
(b) XML Beans
(c) Commons Digester
(a) and (b) bind XML to Java objects (OO). Digester
applies rules keyed to the XML structure
42. DOM parsing
Full support for DOM in Java 5 and beyond
Builds an in-memory tree of the XML doc
Can't access the tree until it is fully built
Random access to tree once built
Impractical for huge XML files – the tree can
exhaust available memory
Foolproof way to build small & medium XML docs
– build tree, then write out
43. DOM parsing (cont'd)
// needs javax.xml.parsers.*, org.w3c.dom.Document, java.io.File
DocumentBuilderFactory dbfactory =
    DocumentBuilderFactory.newInstance();
dbfactory.setNamespaceAware(true);
DocumentBuilder domparser =
    dbfactory.newDocumentBuilder();
// parse the XML and create the DOM
Document doc = domparser.parse(new File("data.xml"));
// to create a new DOM from scratch -
// Document doc = domparser.newDocument();
// Use DOM once you have the Doc handle
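What "use DOM once you have the Doc handle" can look like, as a minimal sketch: the document is parsed from a string instead of data.xml so the example runs on its own, and the tree is walked recursively. The class name DomWalk and the helper collect are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: list element names in document order.
class DomWalk {

    static List<String> elementNames(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> names = new ArrayList<>();
        collect(doc.getDocumentElement(), names);   // start at the doc element
        return names;
    }

    // Depth-first walk; text, comment and PI nodes are visited but not recorded.
    private static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), names);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<nutrition><daily-values>"
                   + "<sodium units='mg'>2400</sodium>"
                   + "</daily-values></nutrition>";
        System.out.println(elementNames(xml));  // [nutrition, daily-values, sodium]
    }
}
```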
44. SAX parsing
Brainchild of David Megginson
Solid interface-centric design
Full support for SAX in Java 5 and beyond
Very low memory footprint. Ideal for processing
massive XML documents
No random access – housekeeping reqd
Fires synchronous events which should be
handled by your code
Read-only API – can't use it to build XML
45. SAX parsing (cont'd)
SAXParserFactory spfactory =
    SAXParserFactory.newInstance();
spfactory.setNamespaceAware(true);
SAXParser saxparser =
    spfactory.newSAXParser();
// write your handler for processing
// events and handling errors
// (MyHandler is your own subclass of DefaultHandler)
DefaultHandler handler = new MyHandler();
// parse the XML and report events and
// errors (if any) to the handler
saxparser.parse(new File("data.xml"),
    handler);
46. SAX parsing (cont'd)
XML document:
<?xml version="1.0" encoding="utf-8"?>
<CarRental>
  <customerName>John Doe</customerName>
  <date>2009-02-28</date>
  <model>Oldsmobile Alero</model>
</CarRental>
Events fired by the parser:
Start Document
Start Element "CarRental"
Start Element "customerName"
Character Data "John Doe"
End Element "customerName"
Start Element "date"
Character Data "2009-02-28"
End Element "date"
Start Element "model"
Character Data "Oldsmobile Alero"
End Element "model"
End Element "CarRental"
End Document
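A sketch of a handler that records the event trace listed above. The class name SaxTrace is made up; the handler extends DefaultHandler and skips whitespace-only character data, which a real parser also reports between elements. A parser may deliver one run of text in several characters() calls, so production code usually buffers text until the end tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: turn SAX callbacks into a readable event list.
class SaxTrace {

    static List<String> trace(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        List<String> events = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { events.add("Start Document"); }
            @Override public void endDocument() { events.add("End Document"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                events.add("Start Element " + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                events.add("End Element " + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {          // ignore indentation and newlines
                    events.add("Character Data " + text);
                }
            }
        };
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<CarRental><customerName>John Doe</customerName>"
                   + "<date>2009-02-28</date></CarRental>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```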
47. StAX parsing
StAX → Streaming API for XML
Fully integrated in Java 6
JSR 173 sponsored by BEA (part of Oracle)
A pull model to parse XML (SAX → push)
Pull model → app asks parser for events
Unlike SAX, no interfaces to implement
Unlike SAX, you can read and write XML
Offers cursor API and event iterator APIs
48. StAX parsing (cont'd)
XMLInputFactory xmlif = XMLInputFactory.newInstance();
// XMLEventAllocatorImpl is specific to Sun's StAX implementation
xmlif.setEventAllocator(new XMLEventAllocatorImpl());
XMLEventAllocator allocator = xmlif.getEventAllocator();
XMLStreamReader xmlr =
    xmlif.createXMLStreamReader(filename,
        new FileInputStream(filename));
// Walk the document with the cursor API
while (xmlr.hasNext())
{
    int eventType = xmlr.next();
    // Get all "Book" elements as XMLEvent objects
    if (eventType == XMLStreamConstants.START_ELEMENT &&
        xmlr.getLocalName().equals("Book"))
    {
        StartElement event =
            allocator.allocate(xmlr).asStartElement();
    }
}
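A self-contained variant of the cursor loop, as a sketch: it uses only the standard javax.xml.stream API (no event allocator) and reads from a string. The Book element comes from the slide; the title attribute, the Catalog and Magazine elements and the class name StaxTitles are assumptions added for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo: pull "title" attributes from every Book element.
class StaxTitles {

    static List<String> bookTitles(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        // Pull model: the application asks for the next event when it is ready.
        while (reader.hasNext()) {
            int eventType = reader.next();
            if (eventType == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals("Book")) {
                // null namespace: match the attribute by local name only
                titles.add(reader.getAttributeValue(null, "title"));
            }
        }
        reader.close();
        return titles;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Catalog><Book title='A'/><Magazine title='M'/>"
                   + "<Book title='B'/></Catalog>";
        System.out.println(bookTitles(xml));   // [A, B]
    }
}
```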
49. XML parsing with Groovy
person.xml:
<?xml version="1.0"?>
<!-- person.xml -->
<person id="100">
  <firstname>Jane</firstname>
  <lastname>Wells</lastname>
  <address type="home">
    <street>343 Evans Ave</street>
    <city>Denver</city>
    <state>CO</state>
    <zip>80020</zip>
  </address>
</person>
Groovy:
def file = new File("person.xml")
person = new XmlSlurper().parse(file)
println person.firstname
===> Jane
println person.address.city
===> Denver
println person.address.@type
===> home
50. Online Xpath demo
XPath allows users to randomly access portions of an XML document
A DOM representing the XML doc must be built first
XPath operates against the DOM tree
XPath is typically used in XSLT but can be used standalone or in your code
XPath does a lot of the heavy lifting in XSLT scripts
http://www.orbeon.com/ops/sandboxtransformations/xpath/
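A minimal sketch of standalone XPath in Java, using javax.xml.xpath from the JDK against a DOM tree. The sample data is the person.xml document from the Groovy slide; the class name XPathDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo: evaluate an XPath expression to a string.
class XPathDemo {

    static String evaluate(String xml, String expression) throws Exception {
        // Build the DOM first; XPath then runs against the tree.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<person id='100'><firstname>Jane</firstname>"
                   + "<address type='home'><city>Denver</city></address></person>";
        System.out.println(evaluate(xml, "/person/address/city"));   // Denver
        System.out.println(evaluate(xml, "/person/address/@type"));  // home
    }
}
```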
51. Comparing DOM, SAX and StAX
Feature            DOM              SAX                StAX
API type           In memory tree   Streaming – push   Streaming – pull
Ease of use        High             Medium             High
XPath capability   Yes              No                 No
CPU & mem o'head   Varies           Good               Good
Full navigation    Yes              No                 No
Read XML           Yes              Yes                Yes
Write XML          Yes              No                 Yes
CRUD               Yes              No                 No
52. XML usage in Java5 and beyond
DOM and SAX parsers, XSLT and XPath all
standard in Java 5 (StAX avail in Java 6)
Log4j not part of Java 5, but is ubiquitous. Preferred
config file format is XML.
Property files can also be defined in XML
XMLEncoder and XMLDecoder persistence
mechanism for Java Beans
XMLSignature – a W3C recommendation
53. XMLEncoder sample code
// Serialize orderBean to disk
XMLEncoder encoder = new XMLEncoder(new
    FileOutputStream("serializedBeans/orderBean.xml"));
encoder.writeObject(orderBean);
encoder.close();
// Now read the serialized object back into a Java bean
XMLDecoder decoder = new XMLDecoder(new
    FileInputStream("serializedBeans/orderBean.xml"));
OrderBean restoredBean = (OrderBean) decoder.readObject();
decoder.close();
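A runnable round trip along the same lines, offered as a sketch: it swaps the slide's OrderBean and disk files for a plain list of strings and an in-memory buffer so that it needs nothing beyond the JDK. The class name XmlEncoderRoundTrip and the sample values are invented.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo: object -> XML bytes -> object.
class XmlEncoderRoundTrip {

    static byte[] encode(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the encoder flushes the XML and writes the closing tag.
        try (XMLEncoder encoder = new XMLEncoder(buffer)) {
            encoder.writeObject(value);
        }
        return buffer.toByteArray();
    }

    static Object decode(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> order = new ArrayList<>(Arrays.asList("widget", "gadget"));
        byte[] xml = encode(order);
        System.out.println(new String(xml, "UTF-8"));   // the XML form of the list
        System.out.println(decode(xml));                // the restored list
    }
}
```

For your own bean classes, XMLEncoder relies on a public class with a public no-argument constructor and getter/setter pairs.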
54. Reading XML property files
Java:
import java.util.*;
import java.io.*;
public class LoadSampleXML
{
    public static void main(String args[])
        throws Exception
    {
        Properties prop = new Properties();
        FileInputStream fis =
            new FileInputStream("props.xml");
        prop.loadFromXML(fis);
        prop.list(System.out);
        System.out.println("\nfavStar property: " +
            prop.getProperty("favStar"));
    }
}
props.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<!-- props.xml -->
<properties>
  <comment>This is an XML props file</comment>
  <entry key="fruit">mango</entry>
  <entry key="favGroup">Led Zeppelin</entry>
  <entry key="favStar">Jack Nicholson</entry>
</properties>
55. Writing XML property files
Java:
import java.util.*;
import java.io.*;
public class StoreXML {
    public static void main(String args[]) throws
        Exception {
        Properties prop = new Properties();
        prop.setProperty("one-two", "buckle my shoe");
        prop.setProperty("three-four", "shut the door");
        prop.setProperty("five-six", "pick up sticks");
        prop.setProperty("seven-eight", "lay them straight");
        prop.setProperty("nine-ten", "a big, fat hen");
        FileOutputStream fos =
            new FileOutputStream("rhyme.xml");
        prop.storeToXML(fos, "Rhyme");
        fos.close();
    }
}
rhyme.xml (output):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<!-- rhyme.xml -->
<properties>
  <comment>Rhyme</comment>
  <entry key="seven-eight">lay them straight</entry>
  <entry key="five-six">pick up sticks</entry>
  <entry key="nine-ten">a big, fat hen</entry>
  <entry key="three-four">shut the door</entry>
  <entry key="one-two">buckle my shoe</entry>
</properties>
57. XML to build Java projects
Ant – standard way to build Java for many years
Maven – smart way to build projects and manage
complex dependencies
Ant is procedural, whereas Maven is declarative
Ant files are typically build.xml
Maven files are typically pom.xml
Maven more complex but worth learning
Ant and Maven build files very readable
58. Java API for XML (JAX)
Set of packages comprising
Java API for XML Processing (JAXP)
Java API for XML-based RPC (JAX-RPC)
Java API for XML Registries (JAXR)
Java Architecture for XML Binding (JAXB)
Implemented in Java SE and Java EE3
59. Java API for XML Parsing (JAXP)
Consists of APIs to parse, search and transform
XML files
JAXP was standard issue with Java 5
JAXP components
DOM parser
SAX parser
XPath lookups
XSLT
Implementations hidden from end users
60. Java API for XML Registries (JAXR)
Enables lookup of XML registries
A registry is infrastructure that enables building,
deployment and discovery of web services (WS)
JAXR supports UDDI and ebXML
Consists of JAXR client and provider
61. Java API for XML RPC (JAXRPC)
V2.0 → JAX-WS
JavaEE, .Net, LAMP interoperability
SOAP, REST support
Client → proxy
Proxy → JAX-RPC RS (runtime system)
Method → SOAP msg
SOAP msg transmitted over HTTP
62. JAVA API for XML Binding (JAXB)
Part of Java 6
Maps between XSD and Java (XSD ↔ Java)
Is object-oriented
More high-level
xjc := XSD → Java
schemagen := Java → XSD
63. JAXB (continued)
lola.xml:
<?xml version="1.0"?>
<person xmlns="http://www.example.com/person">
  <firstName>Lola</firstName>
  <lastName>Boone</lastName>
</person>
// Unmarshalling to Java from XML
Unmarshaller unmarshaller =
    DataBindingFactory.newUnmarshaller();
Person person = (Person)
    unmarshaller.unmarshal(new File("lola.xml"));
System.out.println(person.getFirstName());
// Marshalling to XML from Java
Person person = new Person();
person.setFirstName("Lola");
person.setLastName("Boone");
Marshaller marshaller =
    DataBindingFactory.newMarshaller();
marshaller.marshal(person, new
    FileWriter("lola.xml"));
64. XML in Java enterprise dvlp
Spring – wildly popular dependency injection f'work
Hibernate – the most widely used object →
relational mapping framework
iBATIS – a popular object → query mapping f'work
SOAP – web service configuration and messages
JPA – relatively new persistence spec (part of EJB3)
App servers and web containers (JBoss, Tomcat,
Jetty)
Struts, JSF, AJAX – Java web frameworks
65. XML in databases
Enterprise RDBMS's support XML – Oracle,
SQLServer, DB2 and Sybase
Oracle has had XMLType since v 9.x
XML DOMs can be persisted in Oracle (and
queried). Syntax somewhat clunky, but queries from
SQL*Plus possible.
Consider other options if your RDBMS doesn't
persist natively (write to CLOB)
66. Native XML databases
A popular fad in the early 21st century.
Not too many use cases for this – possibly document
publishing, web publishing
MarkLogic most popular native XML db
Open source XML dbs include Xindice (no support
for XQuery) & eXist (perf issues)
Tamino – first native XML db – now abandoned
67. XML and web services
Two kinds of web services – SOAP, REST
SOAP is still the dominant way to do webservices,
but REST is gaining popularity
WSDL is an XML based language to describe web
services and how to access them.
Sample WSDL:
http://geocoder.us/dist/eg/clients/GeoCoder.wsdl
Besides WSDL, SOAP messages are also specified in XML
(demo GeoCoder example)
68. References
http://www.w3schools.com
http://www.xml.com
http://www.xml.org
http://www.zvon.org
https://jaxp.dev.java.net/1.4/
http://www.saxproject.org
http://projects.apache.org/indexes/category.html#xml
Professional XML by Bill Evjen et al.
Pro XML Development with Java Technology by Ajay Vohra et al
Java and XML by Brett McLaughlin et al