Krextor – An Extensible XML→RDF Extraction Framework
1. Sem. Markup and RDF Krextor Framework Applications Examples Related Conclusion
Scripting for the Semantic Web, 5th Workshop
Christoph Lange
Jacobs University, Bremen, Germany
KWARC – Knowledge Adaptation and Reasoning for Content
May 31, 2009
Ch. Lange (Jacobs University) Krextor – An Extensible XML→RDF Extraction Framework May 31, 2009 1/15
Overview
Want XML applications to contribute to the Semantic Web?
1 Define a schema→ontology mapping for your XML language
2 Extract RDF from XML
Krextor:
Specify XML→ontology mappings (as extraction rules)
Perform extraction (XSLT-based implementation)
http://kwarc.info/projects/krextor/
XML vs. RDF
Two slices of the infamous Layer Cake:
RDF
XML
Doesn’t tell much about the role of XML:
1 XML only for encoding higher-layer formalisms like RDF or OWL?
2 or XML as a metalanguage in its own right?
In case (2), we need a semantics for XML-based languages!
XML languages
Advantages of using XML for knowledge representation (and not
just RDF):
1 Sequential order out of the box
2 Style languages (CSS, XSL)
Given any domain, . . .
can define an XML schema for a domain-specific language
concise syntax for domain experts
no need to think in triples (compare OWL XML vs. RDF/XML)
What about the semantics?
<workshop xml:id="SFSW09"
          conference="#ESWC09"
          number="5"
          date="2009-05-31">
  <title short="SFSW">Scripting for the Semantic Web</title>
</workshop>
Usual approach: human-readable specification, then hard-code
Semantic approaches: RDFa, Microformats
Open questions:
1 How to give the above language a direct RDF-based semantics?
2 How to implement the XML→RDF translation?
Making an XML language semantic
We are focused on practical implementation, not on a formal
semantics bridging XML and RDF.
We want to benefit from existing XML and RDF tools.
Our approach:
1 provide rules that translate XML to RDF
2 if needed, supply an ontology as vocabulary for the extracted RDF
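The rule-based approach above can be sketched in a few lines. This is not Krextor's implementation (which is XSLT-based); it is a minimal Python illustration applied to the workshop example from the previous slide. The ontology namespace, the base URI, and the property names (partOf, number, date, title) are assumptions made up for this sketch.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ONT = "http://example.org/ontology#"   # hypothetical ontology namespace
BASE = "http://example.org/events"     # hypothetical document URI

# Declarative rules: XML names on the left, ontology terms on the right.
CLASS_RULES = {"workshop": ONT + "Workshop"}
ATTRIBUTE_RULES = {
    "conference": (ONT + "partOf", "uri"),      # value is a URI reference
    "number": (ONT + "number", "literal"),
    "date": (ONT + "date", "literal"),
}
CHILD_RULES = {"title": ONT + "title"}


def extract(xml_text, base=BASE):
    """Return (subject, predicate, (kind, value)) triples for mapped elements."""
    triples = []
    for elem in ET.fromstring(xml_text).iter():
        cls = CLASS_RULES.get(elem.tag)
        if cls is None or XML_ID not in elem.attrib:
            continue  # no rule for this element, or nothing to name it by
        subject = base + "#" + elem.attrib[XML_ID]
        triples.append((subject, RDF_TYPE, ("uri", cls)))
        for name, (prop, kind) in ATTRIBUTE_RULES.items():
            if name in elem.attrib:
                value = elem.attrib[name]
                if kind == "uri":
                    value = urljoin(base, value)  # resolve "#ESWC09" against base
                triples.append((subject, prop, (kind, value)))
        for child in elem:
            prop = CHILD_RULES.get(child.tag)
            if prop is not None:
                triples.append((subject, prop, ("literal", (child.text or "").strip())))
    return triples


SAMPLE = """<workshop xml:id="SFSW09" conference="#ESWC09"
                     number="5" date="2009-05-31">
  <title short="SFSW">Scripting for the Semantic Web</title>
</workshop>"""

if __name__ == "__main__":
    for triple in extract(SAMPLE):
        print(triple)
```

The rules are kept as plain data so that supporting another XML language means supplying new dictionaries, not new traversal code.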
Krextor’s History
1 Origin: OMDoc (Open Mathematical Documents; XML schema and ontology), to be managed in a semantic wiki
2 Hard-coded Java implementation: too inflexible to maintain
3 More lightweight approach: XSLT coded from scratch (OMDoc→RXR→Java)
4 Needed support for other languages
5 Created Krextor, a generic XSLT-based framework
6 . . . and provided some more translations (“extraction modules”)
http://kwarc.info/projects/krextor/
The Framework
[Figure: input formats (OMDoc+RDFa, OMDoc/OWL+RDFa, XHTML+RDFa, OpenMath, my XML (+RDFa?), my Microformat) → generic representation → output formats (RDF/XML, RXR, Turtle, Java callback, your format)]
Collection of XSLT stylesheets, Java wrapper, Shell frontend
Output targeted at machines, not humans
Adding Input and Output Modules
Input module (for a new XML language):
very simple declarative mappings (element class)
otherwise pattern-match XML structure, then call a predefined template: create resource, add property, etc.
several ways of generating URIs for XML elements: xml:id, auto-generated, custom
Output module (for a new RDF serialization):
implement low-level “triple generation template”
or post-process output of an existing module
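Two of the ideas above, pluggable URI generation and an output module built around a single low-level triple hook, can be sketched as follows. This is a Python illustration only, not Krextor's XSLT templates; all function and class names here are hypothetical, and the auto-generated URI scheme (tag name plus document-order position) is an assumption chosen for brevity.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def uri_from_xml_id(elem, base, position):
    """Use the element's xml:id as fragment identifier; None if it has none."""
    xml_id = elem.get(XML_ID)
    return f"{base}#{xml_id}" if xml_id else None


def uri_auto_generated(elem, base, position):
    """Fallback: build a fragment from tag name and document-order position."""
    return f"{base}#{elem.tag}-{position}"


def make_uri(elem, base, position,
             strategies=(uri_from_xml_id, uri_auto_generated)):
    """Try each strategy in order; a custom one can be put first."""
    for strategy in strategies:
        uri = strategy(elem, base, position)
        if uri is not None:
            return uri
    raise ValueError("no URI generation strategy produced a URI")


def escape_literal(text):
    """Escape backslash, quote, and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


class NTriplesOutput:
    """Output module: everything is funnelled through one triple() hook."""

    def __init__(self):
        self.lines = []

    def triple(self, subject, predicate, obj, literal=False):
        o = f'"{escape_literal(obj)}"' if literal else f"<{obj}>"
        self.lines.append(f"<{subject}> <{predicate}> {o} .")


if __name__ == "__main__":
    root = ET.fromstring('<doc><sec xml:id="intro"/><sec/></doc>')
    out = NTriplesOutput()
    for position, elem in enumerate(root.iter()):
        uri = make_uri(elem, "http://example.org/d", position)
        out.triple(uri, "http://example.org/ontology#tag", elem.tag, literal=True)
    print("\n".join(out.lines))
```

Because the input side only ever calls triple(), a different serialization can be supported by swapping in another class with the same method.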
Our own applications
Semantic wiki: SWiM (http://swim.kwarc.info)
mathematical documents (OMDoc, OpenMath)
extract RDF outline from documents
use it for navigation, querying, problem-solving assistance
Documented ontologies:
write ontologies in OMDoc
(better documentability → poster session)
Krextor translates to OWL
Example: hCalendar Microformat (1)
Input:
<div class="vevent">
<a class="url" href="http://www.eswc2009.org">ESWC</a>
starts on <span class="dtstart">2009-05-31</span>.</div>
Desired output:
<http://www.eswc2009.org>
    a <http://www.w3.org/2002/12/cal/ical#Vevent> ;
    <http://www.w3.org/2002/12/cal/ical#dtstart>
        "2009-05-31"^^<http://www.w3.org/2001/XMLSchema#date> .
Example: hCalendar Microformat (2)
Usage: krextor hcalendar..turtle infile.xhtml
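For comparison, the same input-to-output mapping can be written by hand. The sketch below is plain Python using only the standard library; it is not the Krextor extraction module and covers only the two hCalendar properties used in the example (url and dtstart). Skipping events without a url is an assumption of this sketch, as is treating every dtstart value as an xsd:date.

```python
import xml.etree.ElementTree as ET

ICAL = "http://www.w3.org/2002/12/cal/ical#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def has_class(elem, name):
    """True if the space-separated class attribute contains the given name."""
    return name in elem.get("class", "").split()


def hcalendar_to_turtle(xhtml):
    """Emit one Turtle block per vevent that carries a url."""
    events = []
    for elem in ET.fromstring(xhtml).iter():
        if not has_class(elem, "vevent"):
            continue
        url = next((e.get("href") for e in elem.iter()
                    if has_class(e, "url") and e.get("href")), None)
        start = next(((e.text or "").strip() for e in elem.iter()
                      if has_class(e, "dtstart")), None)
        if url is None:
            continue  # assumption of this sketch: the url names the event
        lines = [f"<{url}>", f"    a <{ICAL}Vevent>"]
        if start:
            lines[-1] += " ;"
            lines.append(f"    <{ICAL}dtstart>")
            lines.append(f'        "{start}"^^<{XSD}date>')
        lines[-1] += " ."
        events.append("\n".join(lines))
    return "\n\n".join(events)


SAMPLE = """<div class="vevent">
<a class="url" href="http://www.eswc2009.org">ESWC</a>
starts on <span class="dtstart">2009-05-31</span>.</div>"""

if __name__ == "__main__":
    print(hcalendar_to_turtle(SAMPLE))
```

The sketch hard-codes both the input pattern and the Turtle syntax, which is the coupling that separate input and output modules are meant to avoid.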
Related Work
Swignition: extensive support for “standard” semantics (RDFa, microformats, GRDDL), but harder to add a new input language
XSDL: declarative XML→OWL-DL mapping. Not (?) implemented; would make a nice frontend to Krextor
XSPARQL: combines SPARQL and XQuery, breaks boundaries between XML and RDF. Currently geared towards one-time queries rather than complete translations.
Conclusion
Krextor supports many XML→RDF conversion tasks
Easy to extend, easy to integrate into applications
Possible integration into engineering workflows:
Ontology engineering: First design the ontology, then a convenient XML syntax for domain-specific knowledge
Language engineering: Specify the semantics while engineering the schema