GRDDL
The Why, What, How, and Where




                            Chimezie Ogbuji
                            Cleveland Clinic Foundation
GRDDL: The Acronym
 Gleaning
 Resource
 Descriptions (from)
 Dialects (of)
 Language



   Rather long and intimidating
GRDDL: By Deconstruction

   WordNet definition of "glean":
    ◦ (gather, as of natural products)
    ◦ Synonyms: reap, harvest.
   Resource Description Framework (RDF)
    ◦ Logical assertions
   Dialects of Language
    ◦ XML document families (XHTML, for instance)
GRDDL: By Analogy
           GRDDL can be thought of
           as a protocol for sowing
           semantics in web content
           for later harvest.
The Why
   Vast amount of latent semantics in markup
        <span>Chimezie Ogbuji</span>
   Web content today is primarily built for
    human consumption
   Text indexing will only get you so far for
    document retrieval
   If machines are meant to harvest RDF from
    documents, reproducible protocols are
    needed
The Why (Cont.)
 Microformats, eRDF, and RDFa
     Specific to a particular family of
      documents
     XHTML and HTML
 If the goal is machine consumption, the
  bar needs to be raised beyond XHTML
The Why (Cont.)
 It seems easy to forget that XHTML is
  indeed an XML dialect
     You would think the (X) would make
      that obvious
 What was needed was a standard way to
  harvest RDF that is applicable to all XML
  dialects
The What
   Faithful rendition
   Transformations
   GRDDL result
   Source documents
   GRDDL-aware Agents
Faithful Rendition
“By specifying a GRDDL transformation, the author of a document
  states that the transformation will provide a faithful rendition in
  RDF of information (or some portion of the information)
  expressed through the XML dialect used in the source document.”

 Licenses an author-certified interpretation of
  an XML document
 A powerful paradigm for messaging
    See David Booth's “RDF and SOA”
        http://www.w3.org/2007/01/wos-papers/booth
GRDDL Transformations
   Functions that take an XML document and
    return an RDF graph
   Transformations can be written in any
    language
   The “reference” transformation language is
    XSLT
        “[XSLT1] is the format most widely supported by GRDDL-
         aware agents as of this writing […] is specifically designed to
         express XML to XML transformations and has some good
         safety characteristics”
Other Transformation Languages
   “.. technically Javascript, C, or virtually any
    other programming language may be used to
    express transformations for GRDDL”
   However, these transformations need to be
    deterministic in order to ensure the result is
    a faithful rendition
   Hence, they must be functions
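
A minimal sketch of what "transformation as a function" means, written in Python rather than XSLT purely for illustration. The `person` element, the `NAME` property URI, and the example document are assumptions, not part of GRDDL; the point is only that the same XML input always yields the same set of triples.

```python
import xml.etree.ElementTree as ET

# Hypothetical property URI, used only for this illustration.
NAME = "http://example.org/vocab#name"

def transform(xml_text, base_uri):
    """Pure function: no clock, network, or random input, so the
    same XML document always produces the same set of triples."""
    root = ET.fromstring(xml_text)
    triples = set()
    for person in root.iter("person"):
        triples.add((base_uri, NAME, (person.text or "").strip()))
    return frozenset(triples)

doc = "<people><person>Chimezie Ogbuji</person></people>"
result = transform(doc, "http://example.org/doc")
```

Calling `transform` twice on `doc` gives equal results, which is the property a faithful rendition relies on.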
GRDDL Result
   The result of applying the transformation is
    an RDF serialization
   The RDF graph that corresponds to the
    serialization is a GRDDL result of the
    original document
   The “reference” result format is RDF/XML
   Other formats can be used (Turtle, N3, etc.)
GRDDL Source Documents
   The class of documents for which GRDDL
    defines a way to extract a result graph:
      XML Documents
      XML Namespace Documents
      Valid XHTML
      XHTML Profiles
GRDDL: XML Documents
   GRDDL Namespace (grddl prefix)
              http://www.w3.org/2003/g/data-view#


   transformation attribute
    <?xml version="1.0" encoding="UTF-8"?>
    <root
     xmlns:grddl="http://www.w3.org/2003/g/data-view#"
     grddl:transformation=".. path to transform ..">
    … XML content ..
    </root>
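
A rough sketch of how an agent might read the transformation attribute, using only the Python standard library. It assumes the attribute value is a whitespace-separated list of URI references resolved against the document's base URI; the function name, sample document, and example.org URIs are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

def document_transformations(xml_text, base_uri):
    """Return absolute URIs named by grddl:transformation on the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced attributes as "{namespace}localname".
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    return [urljoin(base_uri, ref) for ref in value.split()]

# Hypothetical source document.
sample = """<root xmlns:grddl="http://www.w3.org/2003/g/data-view#"
      grddl:transformation="extract.xsl http://example.org/other.xsl">
  <item/>
</root>"""

links = document_transformations(sample, "http://example.org/docs/record.xml")
```

Here `links` holds the relative reference resolved against the document location, followed by the absolute one unchanged.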
Namespace Documents
“Transformations can be associated not only with individual
   documents but also with whole dialects that share an XML
   namespace”

   A GRDDL source document lives at the
    location of the namespace URI of the root
    element (the namespace document)
   The GRDDL result of the namespace
    document has a statement of the form:
            ?nsDoc grddl:namespaceTransformation ?txDoc
•   txDoc is the location of a transformation
    applicable to such XML documents
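
A sketch of the namespace-document lookup under simplifying assumptions: triples are plain Python tuples rather than a real RDF graph, and the GRDDL result of the namespace document is passed in already gleaned. The function names and the example.org URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

NS_TRANSFORMATION = "http://www.w3.org/2003/g/data-view#namespaceTransformation"

def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    tag = ET.fromstring(xml_text).tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None

def namespace_transformations(ns_uri, ns_doc_triples):
    """Pick out ?txDoc from (?nsDoc, grddl:namespaceTransformation, ?txDoc)."""
    return sorted(o for (s, p, o) in ns_doc_triples
                  if s == ns_uri and p == NS_TRANSFORMATION)

# Hypothetical source document and namespace-document result.
order = '<po:order xmlns:po="http://example.org/po"/>'
ns_result = {("http://example.org/po", NS_TRANSFORMATION,
              "http://example.org/po2rdf.xsl")}

ns_uri = root_namespace(order)
tx_docs = namespace_transformations(ns_uri, ns_result)
```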
Valid XHTML Documents
    <html xmlns="http://www.w3.org/1999/xhtml">
     <head
      profile="http://www.w3.org/2003/g/data-view">
      <title>Some Document</title>
        <link rel="transformation"
              href=".. path to transformation .." />
        ...
     </head>
    …
    </html>
   Refers to the GRDDL XHTML profile
      Licenses the interpretation of
       rel="transformation" links
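
A sketch of the XHTML case with the Python standard library. It only honours rel="transformation" links when the head's profile attribute lists the GRDDL profile, and it only looks at link elements inside head, as on the slide. The function name and sample page are assumptions; a real page with named entities would need a more forgiving parser.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML = "{http://www.w3.org/1999/xhtml}"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

def xhtml_transformations(xhtml_text, base_uri):
    """Return transformation URIs, or [] if the GRDDL profile is absent."""
    root = ET.fromstring(xhtml_text)
    head = root.find(XHTML + "head")
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []  # without the profile, the links are not licensed
    found = []
    for link in head.iter(XHTML + "link"):
        if "transformation" in link.get("rel", "").split() and link.get("href"):
            found.append(urljoin(base_uri, link.get("href")))
    return found

# Hypothetical page.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation" href="glean.xsl" />
  </head>
  <body/>
</html>"""

links = xhtml_transformations(page, "http://example.org/page.html")
```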
XHTML Profiles
“Adding a GRDDL profileTransformation assertion to a profile
  document is much like adding a namespaceTransformation
  assertion to a namespace document”

   A GRDDL source document lives at the
    location of the profile URI of an XHTML
    document
   The GRDDL result of the profile document
    has a statement of the form:
            ?profileDoc grddl:profileTransformation ?txDoc
•   txDoc is the location of a transformation
    applicable to such XML documents
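
A sketch of the profile lookup, again with triples as plain tuples. It assumes the head's profile attribute is a whitespace-separated list of profile URIs and that each profile document's GRDDL result is already available in a dictionary; `profile_transformations`, `profile_results`, and the example.org URIs are hypothetical.

```python
PROFILE_TRANSFORMATION = "http://www.w3.org/2003/g/data-view#profileTransformation"

def profile_transformations(profile_attr, profile_results):
    """Collect ?txDoc for every profile URI listed in the profile attribute.

    profile_results maps a profile URI to the triples gleaned from
    that profile document.
    """
    found = []
    for profile_uri in profile_attr.split():
        for (s, p, o) in sorted(profile_results.get(profile_uri, ())):
            if s == profile_uri and p == PROFILE_TRANSFORMATION:
                found.append(o)
    return found

# Hypothetical profile document result.
results = {
    "http://example.org/profile": {
        ("http://example.org/profile", PROFILE_TRANSFORMATION,
         "http://example.org/profile2rdf.xsl"),
    }
}

tx_docs = profile_transformations(
    "http://example.org/profile http://example.org/unknown", results)
```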
The How
   GRDDL builds on existing XML & RDF
    standards
   An implementation mostly needs to
    orchestrate:
       Parsing of data representations
       Resolving representations from web locations
       The necessary XML processing to peek into and
        harvest RDF from the various sources
       The highly recursive nature of GRDDL 
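
The recursion can be sketched as follows. This is not GRDDL.py's code: the in-memory `web` dictionary stands in for HTTP, `transforms` maps transformation URIs to Python callables, the `direct`, `indirect`, and `content` keys are invented, and predicates are abbreviated as prefixed strings. It shows only that namespace and profile documents are themselves source documents that must be gleaned first.

```python
NS_TX = "grddl:namespaceTransformation"
PROFILE_TX = "grddl:profileTransformation"

def glean(uri, web, transforms, seen=frozenset()):
    """Return the set of triples gleaned from the document at `uri`."""
    if uri in seen or uri not in web:  # guard against cycles and dead links
        return set()
    doc = web[uri]
    tx_uris = list(doc.get("direct", []))
    for parent in doc.get("indirect", []):
        # Recursive step: glean the namespace or profile document, then
        # read the transformations it assigns to documents like this one.
        for (s, p, o) in sorted(glean(parent, web, transforms, seen | {uri})):
            if s == parent and p in (NS_TX, PROFILE_TX):
                tx_uris.append(o)
    result = set()
    for tx in tx_uris:
        if tx in transforms:
            result |= transforms[tx](doc["content"])
    return result

# Hypothetical transformations and documents.
transforms = {
    "http://example.org/ns2rdf": lambda content: {
        ("http://example.org/ns", NS_TX, "http://example.org/po2rdf")},
    "http://example.org/po2rdf": lambda content: {
        ("http://example.org/order1", "ex:total", content)},
}
web = {
    "http://example.org/ns": {
        "content": "", "direct": ["http://example.org/ns2rdf"]},
    "http://example.org/order1": {
        "content": "42", "indirect": ["http://example.org/ns"]},
}

triples = glean("http://example.org/order1", web, transforms)
```

The `seen` set keeps a namespace document that points back at itself from looping forever.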
Technological Overlap
Anatomy of a GRDDL Implementation: GRDDL.py
   A reference implementation from scratch
   650 LOC
        RDFLib, 4Suite-XML, and Python control logic
   A layered approach
        Core module that handles transformations
        One module per source type stacked on top of the
         core
        A top layer that orchestrates the recursion and
         identification of which ‘class’ a source document
         belongs to
GRDDL.py Core
Component Stack
The Where
   GRDDL services online:
        http://triplr.org/ (Stuff in, triples out)
        http://www.w3.org/2007/08/grddl/ (W3C GRDDL
         Service)
   Primary GRDDL implementations:
        Redland
        GRDDL.py
        Virtuoso
        GRDDL Reader for Jena
   RDFa is the most common GRDDL source
    content format in the wild
Hidden Value Proposition
   Supports separation of concerns:
      XML for messaging, data collection,
       structural validation
      RDF for expressive assertions, inference,
       etc.
   A way to invest in data richness and
    accessibility
GRDDL Use Cases
   Embedding scheduling assertions on
    personal pages
   Using GRDDL for extracting RDF from XML
    medical record documents
      Cleveland Clinic use case (clinical
       research)
   Aggregating web-based product reviews
   Embedding web service descriptions
   Adding semantic assertions to XML schemas
   Embedding semantic assertions in wikis
