2. What is XML?
XML stands for eXtensible Markup Language.
XML is a markup language much like HTML.
XML was designed to describe data and focus on what
data is.
3. eXtensible Markup Language
Helps information systems share structured data.
A metalanguage that gives meaning to data that other
applications can use.
Application and platform independent.
Allows various types of data.
Extensible to accommodate new tags and processing methods.
Allows user-defined tags.
4. Advantages of using XML
Simpler version of Standard Generalized Markup
Language (SGML).
Easy to understand and read.
Supported by a large number of platforms.
Used across open standards.
5. Components of an XML Document
1. Elements: <hello>
2. Attributes: <item id="33905">
3. Entities: &lt; (<)
4. Advanced Components
1. CData Sections
2. Processing Instructions
10. Declaration:
First line in document.
Provides information to the parser.
Recommended but optional.
Contains up to three name-value pairs:
Version (required whenever the declaration is present).
Encoding (defaults to UTF-8).
Standalone (rare).
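The three name-value pairs can be inspected with a short sketch. It assumes Python's standard xml.parsers.expat module; the helper name read_declaration and the sample <note/> document are illustrative only.

```python
import xml.parsers.expat

def read_declaration(xml_text):
    """Return the declaration values the parser reports (empty if none)."""
    seen = {}

    def on_declaration(version, encoding, standalone):
        seen["version"] = version
        seen["encoding"] = encoding
        # expat reports standalone as 1 (yes), 0 (no), or -1 (not given)
        seen["standalone"] = standalone

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_text, True)
    return seen

decl = read_declaration(
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><note/>'
)
no_decl = read_declaration("<note/>")  # the declaration is optional
print(decl)
print(no_decl)
```

The second call shows that a document without a declaration still parses; the handler is simply never invoked.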
11. Tags:
Text between < and >.
Have start tag and end tag.
Tags and data stored together.
Data is self-descriptive and easy to understand.
13. Elements:
Basic building blocks of XML file.
Text between a start tag and an end
tag is considered the value of the
element.
Documents contain exactly one root
element.
Can contain nested elements.
14. Attributes:
Provide additional information about
the elements.
Name-value pairs:
- Values are enclosed in single or
double quotes.
- Attribute names are unique within
the same element.
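Both attribute rules can be checked with a minimal sketch, assuming Python's standard xml.etree.ElementTree module. The item element echoes the earlier components slide; the unit attribute is a made-up addition for illustration.

```python
import xml.etree.ElementTree as ET

# Attribute values may use single or double quotes; both parse the same way.
double_quoted = ET.fromstring('<item id="33905" unit="kg"/>')
single_quoted = ET.fromstring("<item id='33905' unit='kg'/>")

# Attributes are exposed as a plain name -> value mapping.
print(double_quoted.attrib)
print(double_quoted.get("id"))

# Repeating an attribute name within one element is not well-formed.
try:
    ET.fromstring('<item id="1" id="2"/>')
    duplicate_ok = True
except ET.ParseError:
    duplicate_ok = False
print(duplicate_ok)
```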
15. Comments:
Can appear almost anywhere in the document, but not inside a tag.
- Start with <!--
- End with -->
Contents inside a comment are not parsed.
16. More in XML:
1. Schemas
2. Parsers
3. Editors
4. Standards
17. 1. Schemas:
Describe the structure and content of an XML
document.
Define a shared vocabulary for applications.
Can be expressed using XML schema languages
such as:
-Document Type Definition (DTD).
-XML Schema (W3C).
19. 2. Parsers:
Read and process the content of an XML
document.
Include push and pull parsers:
-Pull parsers: the application requests each event from the parser.
-Push parsers: the parser controls the flow and sends events to the application.
Free XML parsers available, including tools from
IBM.
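The difference between the two styles can be sketched with Python's standard library, assuming xml.sax as a push-style parser and xml.etree.ElementTree.XMLPullParser as a pull-style one. The TagCollector class and the sample note document are illustrative only.

```python
import xml.sax
import xml.etree.ElementTree as ET

DOC = "<note><to>Tove</to><from>Jani</from></note>"

# Push style: the parser drives and calls our handler for each event.
class TagCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)

collector = TagCollector()
xml.sax.parseString(DOC.encode("utf-8"), collector)
push_tags = collector.tags

# Pull style: the application feeds data and asks for events when ready.
pull = ET.XMLPullParser(events=("start",))
pull.feed(DOC)
pull.close()
pull_tags = [elem.tag for _event, elem in pull.read_events()]

print(push_tags)
print(pull_tags)
```

Both approaches see the same start tags; what differs is which side decides when the next event is delivered.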
20. 3. Editors:
Text and graphical editors facilitate the editing
of XML code.
Benefits of using editors:
-Reduce coding effort.
-Provide tools to perform common tasks.
22. 4. Standards:
Various types of standards:
- Core standards form the basis of what is expressed
in an XML document.
- Processing standards relate to XML processing by
developers.
- Key vocabularies (applications).
XML standards influencers include the W3C, ISO and
OASIS.
23. XML Rules:
1. Must Have a Closing Tag.
In HTML, some elements do not
have to have a closing tag:
<p>This is a paragraph
<p>This is another paragraph
In XML, it is illegal to omit the
closing tag.
<p>This is a paragraph</p>
<p>This is another paragraph</p>
2. XML Tags are Case Sensitive.
XML tags are case sensitive. The tag
<Letter> is different from the tag
<letter>.
<Message>This is incorrect</message>
<message>This is correct</message>
"Opening and closing tags"
are often referred to as "Start and
end tags". Use whatever you prefer.
It is exactly the same thing.
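Rules 1 and 2 can be checked with a small sketch, assuming Python's standard xml.etree.ElementTree module. The helper is_well_formed and the wrapping <doc> element are illustrative names, not part of XML itself.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

checks = {
    # <p> is never closed, so the parser rejects the document.
    "missing closing tag": is_well_formed("<doc><p>This is a paragraph</doc>"),
    # <Message> and </message> are different names in XML.
    "mismatched case": is_well_formed("<Message>This is incorrect</message>"),
    "matching tags": is_well_formed("<message>This is correct</message>"),
}
for label, ok in checks.items():
    print(label, ok)
```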
24. XML Rules:
3. Elements Must be Properly
Nested:
In HTML, you might see improperly
nested elements:
<b><i>This text is bold and
italic</b></i>
In XML, all elements must be
properly nested within each other:
<b><i>This text is bold and
italic</i></b>
4. XML Documents Must Have a Root
Element:
XML documents must contain one
element that is the parent of all
other elements. This element is called
the root element.
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
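Rules 3 and 4 can be sketched the same way, again assuming Python's standard xml.etree.ElementTree module. The helper parse_or_none is a hypothetical name used only for this illustration.

```python
import xml.etree.ElementTree as ET

def parse_or_none(xml_text):
    """Return the root element, or None if the text is not well-formed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None

# Tags closed in the wrong order are rejected.
badly_nested = parse_or_none("<b><i>This text is bold and italic</b></i>")

# Two top-level elements mean there is no single root, so this is rejected too.
two_roots = parse_or_none("<child>one</child><child>two</child>")

# A single root with properly nested children parses fine.
tree = parse_or_none("<root><child><subchild>.....</subchild></child></root>")

# Walk down from the root to show the parent/child hierarchy.
path = [tree.tag, tree[0].tag, tree[0][0].tag]
print(badly_nested, two_roots, path)
```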
25. XML Rules:
XML Attribute Values Must be
Quoted:
XML elements can have attributes
in name/value pairs.
Wrong:
<note date=12/11/2007>
<to>Tove</to>
<from>Jani</from>
</note>
Right:
<note date="12/11/2007">
<to>Tove</to>
<from>Jani</from>
</note>
26. XML Rules:
5. Entity References
Some characters have a special meaning in XML.
A character like "<" inside an XML element will
generate an error because the parser interprets it as
the start of a new element.
Wrong:
<message>if salary < 1000 then</message>
Right:
<message>if salary &lt; 1000 then</message>
27. Characters with a special meaning in XML
Entity    Character    Meaning
&lt;      <            Less than
&gt;      >            Greater than
&amp;     &            Ampersand
&apos;    '            Apostrophe
&quot;    "            Quotation mark
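Entity references rarely need to be written by hand. As a minimal sketch, Python's standard xml.sax.saxutils module can escape text before it is placed in a document; the salary message comes from the slide above, while the second sample string is made up for illustration.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

raw = "if salary < 1000 then"
escaped = escape(raw)  # "<" becomes "&lt;"

# The escaped text is now safe to embed, and the parser restores the "<".
message = ET.fromstring(f"<message>{escaped}</message>")
print(escaped)
print(message.text)

# escape() handles &, < and > by default; quotes need an explicit mapping.
quoted = escape('Tom & "Jerry"', {'"': "&quot;", "'": "&apos;"})
print(quoted)

# unescape() reverses the default replacements.
round_trip = unescape(escaped)
```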
28. XML Elements are eXtensible
XML elements can be extended to carry more information.
<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>
Later, the author adds some extra information to it:
<note>
<date>2008-01-10</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Should an application that reads only <to>, <from>, and <body> break or crash?
No. One of the beauties of XML is that it can be
extended without breaking applications.
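This behaviour can be sketched with a hypothetical application that looks up only the elements it knows about. The example assumes Python's standard xml.etree.ElementTree module; the function name summarize is illustrative.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>"""

EXTENDED = """<note>
<date>2008-01-10</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

def summarize(note_xml):
    """Hypothetical application: reads only the elements it knows about."""
    note = ET.fromstring(note_xml)
    return {name: note.findtext(name) for name in ("to", "from", "body")}

before = summarize(ORIGINAL)
after = summarize(EXTENDED)
print(before == after)
```

Because the lookup is by element name rather than by position, the added <date> and <heading> elements are simply ignored.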
29. Examples: 1- book store
<bookstore>
<book category="CHILDREN">
<title>Harry Potter</title>
<author>J. K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
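A minimal sketch of reading this document, assuming Python's standard xml.etree.ElementTree module. The variable names and the price total are illustrative and not part of the example itself.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
<book category="CHILDREN">
<title>Harry Potter</title>
<author>J. K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)

# Combine an attribute (category) with element values (title, price).
catalog = [
    (book.get("category"), book.findtext("title"), float(book.findtext("price")))
    for book in root.findall("book")
]
total = sum(price for _category, _title, price in catalog)

for category, title, price in catalog:
    print(f"{category}: {title} ({price:.2f})")
print(f"Total: {total:.2f}")
```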
30. Why XML for Bioinformatics?
Biology is a complex discipline.
Wide variety of data resources and repositories.
Biological data is represented in multiple formats (FASTA,
AGP, GFF, ...).
No standard protocol:
1-to interrogate biological data stores.
2-for genomics, proteomics, and cheminformatics.
3-to exchange biological data.
This makes data difficult to use and exchange.
31. XML in Bioinformatics
1- BSML (Visual Genomics).
2- BioML (ProteoMetrics).
3- CML (chemical information: atomic, crystallographic
information, structures, ...).
4- Gene Ontology Consortium.
32. The Bioinformatic Sequence Markup Language
(BSML)
-The DTD is aimed at representing DNA, RNA, and protein
sequences and their graphic properties.
-The structure of the information is similar to
the one used in the sequence databases.
(http://www.ebi.ac.uk/embl.html)
(http://www.visualgenomics.com/products/index.html)
(http://www.ncbi.nlm.nih.gov; http://www.ddbj.nig.ac.jp)
34. The BIOpolymer Markup Language
(BioML)
- Takes a different approach from BSML.
- BioML goal (Fenyo, 1999): “BioML was designed to mimic the
hierarchical structure of a living organism.”
- Data integration, e.g., nucleotide and protein sequences.