2. Document Model
• Features of XML
– lets us create our own markup language
– lets us define the elements and attributes that best fit the
information we want to encapsulate
• What’s still missing is
– a way to define the language in a formal way
– to restrict the vocabulary of elements and attributes to a
manageable set
– to control the grammar of elements
• The process of formally defining a language in XML is
called document modeling
• Two ways to model a document
– DTD (describe a document's structure with declarative
rules)
– XML Schema
3. DTD Overview
• DTD syntax is inherited from SGML
• DTDs are not XML documents
• DTDs cannot be parsed by an XML parser as ordinary XML
• DTDs cannot be manipulated with XML tools (e.g., searched,
transformed into a different representation)
• A DTD describes the structure of a document
• It has a non-extensible content model
• The only text content type is #PCDATA
• Attribute types are also non-extensible
• There are no user-defined types
4. Contd..
• Under a DTD, <quantity> 5 </quantity> and
<quantity> HELLO </quantity> are both valid
• We would like to restrict quantity to numeric values only and
expect the parser to detect the type violation
• With XML Schema, element quantity’s data can indeed be
described as numeric.
• When the preceding markup examples are validated
against an XML Schema that specifies element quantity’s
data must be numeric, 5 conforms and HELLO fails.
• An XML document that conforms to a schema document is
schema valid and a document that does not conform is
invalid.
• DTDs and schemas coexist
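A minimal sketch of a schema that enforces the numeric rule described above. The slide says only "numeric", so the choice of xs:integer is an assumption; xs:decimal would also admit fractional quantities.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- quantity must hold an integer: "5" validates, "HELLO" does not -->
  <xs:element name="quantity" type="xs:integer"/>
</xs:schema>
```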
5. XML Schema
• Schema is an alternative modeling language
• Schema technology is still evolving
• Major schema models: XDR and XSD
• The XML schema defines
– the shape or structure of the XML document,
– rules for data content
– semantics such as
• what fields an element can contain,
• which sub elements it can contain and
• how many items can be present.
– the type and values that can be placed in each element or
attribute.
– XML data constraints (facets) include rules such as min
and max length.
6. Some Observations
• Schema document uses XML syntax
• Schemas are XML documents
• Schema documents conform to DTDs
• Schemas are valid documents
• Schema processor provides additional information
to application
7. DTD vs XSD
• DTD has a simple syntax for content definition
• DTD has limitations when using XML for a variety
of complex purposes
• W3C recommended "XML Schema" as a schema
definition language to replace DTD.
• XML schema, commonly known as an XML
Schema Definition (XSD), describes what a given
XML document can contain.
8. Contd..
• Example XML :
<employees>
<employee id="101">
<name> Tom </name>
<department> CSA </department>
<salary> 35000 </salary>
<email> tom.peter@gmail.com</email>
</employee>
…...
</employees>
DTD:
<!ELEMENT employees (employee)*>
<!ELEMENT employee (name, department, salary, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT department (#PCDATA)>
<!ELEMENT salary (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ATTLIST employee id CDATA #REQUIRED>
12. Pros and Cons in DTD
• Disadvantages in DTD
– Not written in XML
– Lacks strong typing capabilities
– Cannot validate the content to data types
• XSD addresses each of these disadvantages.
14. XSD
• The XSD structure starts with the root element
named “schema”
<xs:schema></xs:schema>
• The schema declaration looks like :
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.example.org/employee"
elementFormDefault="qualified">
...
..
</xs:schema>
15. Attributes of XSD
• xmlns:xs="http://www.w3.org/2001/XMLSchema"
– the elements and data types used in the schema come from
this namespace
– the prefix for these elements and data types is "xs"
• targetNamespace="http://www.example.org/employee"
– the elements defined by this schema belong to the namespace
"http://www.example.org/employee"
• elementFormDefault="qualified"
– any elements used by the XML document which were
declared in this schema must be namespace qualified
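A sketch of an instance document that satisfies these settings. It assumes the schema declares an employee element with name and department children; the slide elides the schema body, so those declarations are hypothetical.

```xml
<?xml version="1.0"?>
<!-- The default namespace matches the schema's targetNamespace, so employee
     and its child elements are all namespace qualified. -->
<employee xmlns="http://www.example.org/employee">
  <name>Tom</name>
  <department>CSA</department>
</employee>
```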
16. Element Declaration
• The elements of the xml document are defined
with the schema element declaration.
• The elements can be either simple or complex.
• Simple element
– contains only text
– cannot contain any other element or attribute.
– eg: <name> Tom </name>
• Complex element
– contains other elements in it.
– the elements can have attributes also
18. Simple Type
• Syntax –
– <xs:element name="xxx" type="yyy"/>
– ‘xxx’ is the name of the element and ‘yyy’ is the
data type of the element.
19. Data Types
• There are many data types in XSD. Data types
are classified into
– XSD Strings
– XSD Numeric
– XSD Date
20. XSD Strings
• A string data type contains characters such as
letters, digits and special characters, including
line feeds, carriage returns and tabs
Data Type          Description
string             A string
Name               A string which contains a valid XML name
normalizedString   A string that does not contain line feeds, carriage
                   returns, or tabs
21. XSD Numeric
• These data types contain numbers, which may
be whole numbers or decimal numbers.
Data Type          Description
integer            Contains an integer value
decimal            Contains a decimal value
positiveInteger    Contains an integer value greater than zero
22. XSD Date
• These data types contain date and time values.
• The format of a date is “YYYY-MM-DD”
• All components are mandatory
• The format of a time is “hh:mm:ss”
Data Type   Description
date        Defines the date value (YYYY-MM-DD)
time        Defines the time value (hh:mm:ss)
dateTime    Defines both date and time (YYYY-MM-DDThh:mm:ss)
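A short sketch of the three date and time types in use. The element names (joinDate, shiftStart, lastLogin) are hypothetical and chosen only for illustration.

```xml
<!-- Schema declarations -->
<xs:element name="joinDate" type="xs:date"/>
<xs:element name="shiftStart" type="xs:time"/>
<xs:element name="lastLogin" type="xs:dateTime"/>

<!-- Matching instance values -->
<joinDate>2010-04-15</joinDate>
<shiftStart>09:30:00</shiftStart>
<lastLogin>2010-04-15T09:30:00</lastLogin>
```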
23. Simple Type
• Example:
<name> Johan </name>
<age> 28 </age>
<dob> 1985-07-27 </dob>
• DTD for the above
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT dob (#PCDATA)>
• XSD for the above
<xs:element name="name" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="dob" type="xs:date"/>
24. Simple Type - Default / Fixed Value
• Simple elements can have default or fixed values that are
specified in the schema definition
• With default, the schema's value is used when the XML document
gives no value for the element; otherwise the value given in the
XML document is used.
• With fixed, the element always has the value given in the schema
definition, and the XML document cannot supply a different
one.
• Example:
<xs:element name="salary" type="xs:integer"
default="20000"/>
<xs:element name="color" type="xs:string"
fixed="yellow"/>
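A sketch of how instance elements behave against the two declarations above. The instance values 35000 and red are illustrative.

```xml
<!-- Empty element: a validating processor treats the value as the default, 20000 -->
<salary/>

<!-- An explicit value is used instead of the default -->
<salary>35000</salary>

<!-- Valid: matches the fixed value -->
<color>yellow</color>

<!-- Invalid: any value other than "yellow" is rejected -->
<color>red</color>
```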
25. Attribute
• Attributes are properties that describe an XML
element
• Attributes are themselves of a simple type.
• A simple element cannot have attributes.
• An element with an attribute becomes a complex
type
• Attributes also have data types, default and fixed
values
• Example:
<employee id="101">Tom </employee>
The Schema definition of the "id" attribute :
<xs:attribute name="id" type="xs:integer"/>
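Default and fixed values on attributes follow the same idea as on simple elements. A small sketch, in which the attribute names lang and country and their values are hypothetical:

```xml
<!-- If lang is omitted in the instance, its value is taken to be "EN" -->
<xs:attribute name="lang" type="xs:string" default="EN"/>

<!-- If country appears in the instance, its value must be "IN" -->
<xs:attribute name="country" type="xs:string" fixed="IN"/>
```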
26. Contd..
• Required and Optional in attributes
– By default the attributes are optional
– To make it mandatory add an attribute named “use”.
<xs:attribute name="id" type="xs:integer" use="required"/>
• Restrictions
– Restrictions are conditions that are applied to the value
of an element or attribute.
– A restriction confines the value to a defined
range.
– For example, the age should be within 18 to 58. This
restriction can be given when defining the XML
Schema of the “age” element.
27. Restrictions
Restriction    Description
enumeration    Defines a list of acceptable values
length         Defines the exact number of characters or list items allowed.
               Must be equal to or greater than zero
maxExclusive   Defines the upper limit for numeric values (the value must be
               less than this value)
maxInclusive   Defines the upper limit for numeric values (the value must be
               less than or equal to this value)
maxLength      Defines the maximum number of characters or list items allowed.
               Must be equal to or greater than zero
minExclusive   Defines the lower limit for numeric values (the value must be
               greater than this value)
minInclusive   Defines the lower limit for numeric values (the value must be
               greater than or equal to this value)
minLength      Defines the minimum number of characters or list items allowed.
               Must be equal to or greater than zero
pattern        Defines the exact sequence of characters that is acceptable
whiteSpace     Defines how white space (line feeds, tabs, spaces, and carriage
               returns) is handled
totalDigits    Defines the maximum number of digits allowed. Must be greater
               than zero
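A sketch of two facets from the table, enumeration and the length limits. The department value CSA comes from the earlier employee example; HR and Finance, and the limits 1 and 50, are assumptions made for illustration.

```xml
<!-- enumeration: department must be one of the listed values -->
<xs:element name="department">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="CSA"/>
      <xs:enumeration value="HR"/>
      <xs:enumeration value="Finance"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

<!-- minLength / maxLength: name must be 1 to 50 characters long -->
<xs:element name="name">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="50"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```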
28. Simple Type - Example
• Simple element
– restriction for a simple element “age”.
<xs:element name="age">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="18"/>
<xs:maxInclusive value="58"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
30. Contd..
• Using range of data
<xs:element name="status">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:pattern value="[0-9]"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
• The element “status” accepts a single digit
from 0 to 9.
31. Contd..
• Using OR "|"
<xs:element name="flag">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="true|false"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
• The element “flag” can have either the value
“true” or “false”
32. Contd..
• Restriction
<xs:element name="productId">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="[a-z]{2}[0-9]{4}"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
• The element “productId” must have exactly 6
characters, of which the first 2 are lowercase
letters and the remaining 4 are digits
<productId>cs1234</productId> – valid data value
<productId>CS123</productId> – invalid data value
33. Complex Elements
• Complex elements contain other elements and
attributes within them
<employee id="101">
<name> Johan </name>
<age> 28 </age>
<salary> 35000 </salary>
</employee>
• Kinds of complex element
– empty elements
– elements that contain only sub elements
– elements that contain only text
– elements that contain both text and other elements
34. Complex : Empty Element
<employee id="101"/>
• This element “employee” does not have any elements
inside it but does have an attribute named “id”
• This makes the element a complex element
• The schema for this is represented as
<xs:element name="employee">
<xs:complexType>
<xs:attribute name="id" type="xs:positiveInteger"/>
</xs:complexType>
</xs:element>
35. Complex Elements
• Elements that contain elements
<employee>
<name> Tom </name>
<age> 28 </age>
</employee>
Here the complex element contains sub elements
• Schema for the above :
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
</xs:element>
36. Contd..
• Mixed type element
– contains sub elements, attributes and text in it
<xs:element name="MarkedUpDesc">
<xs:complexType mixed="true">
<xs:sequence>
<xs:element name="Bold" type="xs:string" />
<xs:element name="Italic" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
<MarkedUpDesc>
This is an <Bold>Example</Bold> of <Italic>Mixed</Italic>
Content. Note there are elements mixed in with the elements
data.
</MarkedUpDesc>
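One kind of complex element, the element that contains only text together with attributes, has no example in these slides. A minimal sketch using simpleContent, in which the currency attribute is hypothetical:

```xml
<!-- salary holds an integer as its text and also carries an attribute -->
<xs:element name="salary">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:integer">
        <xs:attribute name="currency" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

<!-- Matching instance -->
<salary currency="USD">35000</salary>
```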
38. Order Indicators
• Sequence indicator
– ensures that all the sub elements are present
– the sub elements must appear in the same order as given in the XSD
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Correct:
<employee>
<name> Tom </name>
<age> 28 </age>
</employee>
Incorrect (age appears before name):
<employee>
<age> 28 </age>
<name> Tom </name>
</employee>
39. Contd..
• All indicator
– ensures that all the sub elements are present
– the sub elements may appear in any order
<xs:element name="employee">
<xs:complexType>
<xs:all>
<xs:element name="name" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
</xs:all>
</xs:complexType>
</xs:element>
Correct:
<employee>
<age> 28 </age>
<name> Tom </name>
</employee>
Correct:
<employee>
<name> Tom </name>
<age> 28 </age>
</employee>
40. Contd..
• Choice indicator
– defines that exactly one of the child elements must occur
within the element
<xs:element name="employee">
<xs:complexType>
<xs:choice>
<xs:element name="name" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
</xs:choice>
</xs:complexType>
</xs:element>
Correct:
<employee>
<age> 28 </age>
</employee>
Correct:
<employee>
<name> Tom </name>
</employee>
Incorrect (both child elements appear):
<employee>
<name> Tom </name>
<age> 28 </age>
</employee>
41. Occurrence Indicators
• Defines the number of times an element can
occur
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="childname" type="xs:string"
minOccurs="0" maxOccurs="5"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Correct:
<employee>
<name> Tom </name>
</employee>
Correct:
<employee>
<name> Tom </name>
<childname>A</childname>
</employee>
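When minOccurs and maxOccurs are omitted, both default to 1. For a list with no upper limit, maxOccurs accepts the value "unbounded". A sketch, with the employee element kept as a plain string for brevity (an assumption for illustration):

```xml
<xs:element name="employees">
  <xs:complexType>
    <xs:sequence>
      <!-- zero or more employee elements, with no upper limit -->
      <xs:element name="employee" type="xs:string"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```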
42. Group
• Defines a group of elements
• may contain one or more sequence, choice
and/or all elements
• can occur within complexType, sequence,
choice, and restriction
43. Examples: group and sequence
<xsd:group name="personalinfo">
<xsd:sequence>
<xsd:element name="firstname" type="xsd:string"/>
<xsd:element name="lastname" type="xsd:string"/>
</xsd:sequence>
</xsd:group>
<xsd:complexType name="person">
<xsd:group ref="personalinfo"/>
<xsd:attribute name="citizenship" type="xsd:string"/>
<!-- other elements -->
</xsd:complexType>
<xsd:sequence minOccurs="min" maxOccurs="max">
- - -
</xsd:sequence>
45. Associating XML with XSD
• Define an XSD for an XML file which
contains employees' information such as name,
department, salary and email.
• The XML file can contain the details of many
employees
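One possible schema for this exercise, sketched against the employees example shown earlier. The type choices (xs:integer for salary, xs:positiveInteger for id) are assumptions, and no target namespace is used to keep the sketch short.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="employees">
    <xs:complexType>
      <xs:sequence>
        <!-- any number of employee records -->
        <xs:element name="employee" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="department" type="xs:string"/>
              <xs:element name="salary" type="xs:integer"/>
              <xs:element name="email" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="id" type="xs:positiveInteger" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An XML file can then point to the schema from its root element. The file name employees.xsd is hypothetical:

```xml
<employees xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="employees.xsd">
  <employee id="101">
    <name>Tom</name>
    <department>CSA</department>
    <salary>35000</salary>
    <email>tom.peter@gmail.com</email>
  </employee>
</employees>
```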
48. Dividing the XML Schema
• The previous XML Schema is very simple
• But as a schema grows, the nested style becomes difficult
to read and maintain.
• Avoid this by dividing the XML Schema:
– define the elements and attributes first and then
– make use of them using the “ref” attribute.
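A sketch of the divided style for an employees schema. The element names follow the earlier employee example; the type choices are assumptions.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Simple elements and attributes, defined once at the top level -->
  <xs:element name="name" type="xs:string"/>
  <xs:element name="department" type="xs:string"/>
  <xs:element name="salary" type="xs:integer"/>
  <xs:element name="email" type="xs:string"/>
  <xs:attribute name="id" type="xs:positiveInteger"/>

  <!-- Complex elements refer to the definitions above -->
  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="department"/>
        <xs:element ref="salary"/>
        <xs:element ref="email"/>
      </xs:sequence>
      <xs:attribute ref="id" use="required"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="employees">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="employee" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

One consequence of this style is that every top-level element is global, so any of them may appear as the root of a valid document.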
52. Using Named Types
• define named types, which enable you to reuse element
definitions
• done by giving names to the simpleType and
complexType elements
• elements then point to them through their type
attribute.
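A sketch of named types. The type names ageType and employeeType and the second element manager are hypothetical; the age range 18 to 58 is taken from the earlier restriction example.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- A named simple type: an integer between 18 and 58 -->
  <xs:simpleType name="ageType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="18"/>
      <xs:maxInclusive value="58"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- A named complex type that uses the simple type above -->
  <xs:complexType name="employeeType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="age" type="ageType"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:positiveInteger" use="required"/>
  </xs:complexType>

  <!-- Two elements reuse the same definition through the type attribute -->
  <xs:element name="employee" type="employeeType"/>
  <xs:element name="manager" type="employeeType"/>
</xs:schema>
```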