DOMEO ANNOTATION TOOLKIT
AND TEXT MINING


CREATING, VISUALISING, CURATING AND SHARING
TEXT MINING RESULTS

Paolo Ciccarese, PhD
paolo.ciccarese@gmail.com


January 30th 2012, W3C Scientific Discourse Call
- The Domeo Annotation Toolkit is a collection of software
  components for creating and sharing annotations of web
  documents and their fragments
- It can export and exchange all annotations in the
  Annotation Ontology (AO) RDF format
- The Domeo client is the user interface for producing
  manual and semi-automatic annotations of HTML documents
  directly in the browser

http://annotationframework.org/
ANNOTATION ONTOLOGY
- OWL vocabulary for representing and sharing annotations
  and semantic annotations of digital resources and their
  fragments. AO:
  - Is orthogonal to the domain(s) of interest
  - Supports stand-off annotation
  - Offers tools for identifying fragments
  - Is designed with extension points
  - Defines basic annotation containers
  - Supports versioning
  - Tracks provenance

http://purl.org/ao/home
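As a rough illustration of stand-off annotation with provenance and versioning, the Python sketch below keeps the annotation outside the document and points at a fragment by its surrounding text. The class and field names are hypothetical and are not AO's RDF terms; revising an annotation yields a new version instead of overwriting the old one.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class TextFragment:
    """Identifies a fragment of a document without modifying the document."""
    prefix: str
    exact: str
    postfix: str


@dataclass
class StandoffAnnotation:
    # Hypothetical field names, not AO property names.
    document_uri: str          # the annotated resource
    fragment: TextFragment     # which part of it is annotated
    topic_uri: str             # any domain term, e.g. an ontology concept
    created_by: str            # provenance: who
    created_on: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    version: int = 1           # versioning

    def revise(self, new_topic_uri: str, editor: str) -> "StandoffAnnotation":
        """Return a new version; the previous version stays untouched."""
        return StandoffAnnotation(self.document_uri, self.fragment, new_topic_uri,
                                  editor, version=self.version + 1)
```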
DOMEO AND TEXT MINING SERVICES
- Domeo can trigger text mining algorithms when they
  are available as web services
- Software connectors have to be developed to
  translate the results into a suitable format
- The results are displayed in the web documents
- Users can record their feedback/judgment through
  customizable user interfaces
NCBO ANNOTATOR
http://www.bioontology.org/annotator-service
- Web service that annotates textual metadata (e.g., a
  journal abstract) with relevant ontology concepts
- The ontologies of interest can be preselected through
  one of its many parameters
DOMEO AND THE NCBO ANNOTATOR
http://www.bioontology.org/annotator-service
- Domeo supports automatic and manual annotation with
  terms from selected ontologies managed by BioPortal
RUNNING NCBO ANNOTATOR
Additional text mining services will be listed here
NCBO ANNOTATOR RESULTS IN DOMEO
List of recognized entities
RESULTS CURATION
Customizable
CUMULATIVE RESULTS CURATION
- One item only
- All instances with the same text match
- All instances, independently of the text match
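The three curation scopes can be sketched as a filter over the recognized matches. This is a minimal Python sketch, not Domeo's implementation: `Match`, `Scope` and `curate` are hypothetical names, and reading "same text match" as "same ontology term and same matched text" is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Scope(Enum):
    ONE_ITEM = "one item only"
    SAME_TEXT = "all instances with the same text match"
    SAME_TERM = "all instances independently of the text match"


@dataclass
class Match:
    match_id: int
    text: str                       # text matched in the document
    term_uri: str                   # ontology term proposed by the service
    judgment: Optional[str] = None  # curator feedback, e.g. "correct"


def curate(matches: List[Match], target: Match, judgment: str, scope: Scope) -> int:
    """Record a judgment on `target` and, depending on `scope`, on related matches.
    Returns the number of matches updated."""
    def in_scope(m: Match) -> bool:
        if scope is Scope.ONE_ITEM:
            return m.match_id == target.match_id
        if scope is Scope.SAME_TEXT:
            return m.term_uri == target.term_uri and m.text == target.text
        return m.term_uri == target.term_uri  # Scope.SAME_TERM

    updated = 0
    for m in matches:
        if in_scope(m):
            m.judgment = judgment
            updated += 1
    return updated
```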
SERIALIZATION IN AO/RDF
SOFTWARE CONNECTORS
At the current stage:
- For each text mining service we have to write a
  specific connector, which typically translates offset
  and range into prefix and postfix
- ...and keep it up to date!
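The connector's core job can be sketched in Python: turn a service's offset/range match into a prefix, the exact matched text and a postfix, and later re-find that text by its context. The function names and the default 20-character context window are assumptions for illustration, not values taken from Domeo.

```python
from typing import NamedTuple, Optional


class PrefixPostfix(NamedTuple):
    prefix: str
    exact: str
    postfix: str


def to_prefix_postfix(text: str, offset: int, length: int, context: int = 20) -> PrefixPostfix:
    """Convert an offset/range match into prefix, exact text and postfix.
    `context` (an assumed parameter) is how many characters to keep on each side."""
    if offset < 0 or length < 0 or offset + length > len(text):
        raise ValueError("match lies outside the text")
    start, end = offset, offset + length
    return PrefixPostfix(
        prefix=text[max(0, start - context):start],
        exact=text[start:end],
        postfix=text[end:end + context],
    )


def locate(text: str, sel: PrefixPostfix) -> Optional[int]:
    """Return the offset of the exact text in `text`, found via its context, or None."""
    pos = text.find(sel.prefix + sel.exact + sel.postfix)
    return None if pos == -1 else pos + len(sel.prefix)
```

Because `locate` searches for the context rather than trusting the offset, it still finds the match when markup shifts the text (for example, a leading `<b>` tag); longer context helps when the same term occurs several times.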
UIMA, CLEREZZA AND AO
OSS-BASED INFRASTRUCTURE FOR TEXT MINING OVER
ONTOLOGIES

Tommaso Teofili and Paolo Ciccarese
tommaso@apache.org
APACHE UIMA
- Architectural framework for UIM (Unstructured
  Information Management)
- OASIS standard
- Build, deploy and run text mining pipelines
- Scales to large volumes of data
- NLP/TM algorithms are wrapped as Analysis Engines (AEs)

http://uima.apache.org/
UIMA TYPES
- The annotation domain is defined in type systems
- Types and features are simply declared
- Existing type systems can be
  imported/exported/enhanced
- Eases data exchange between AEs
- Two “main” types:
  - TOP
  - Annotation
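To make the two main types concrete, here is a Python analogy (not the UIMA Java API): `Top` stands in for `uima.cas.TOP` and `Annotation` for `uima.tcas.Annotation`, whose `begin` and `end` offsets delimit the covered text. The ordering helper mimics UIMA's default annotation index (begin ascending, end descending) and ignores type priorities.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Top:
    """Stand-in for uima.cas.TOP, the root of the UIMA type hierarchy."""


@dataclass
class Annotation(Top):
    """Stand-in for uima.tcas.Annotation: a [begin, end) span over the document text."""
    begin: int
    end: int

    def covered_text(self, document: str) -> str:
        return document[self.begin:self.end]


def index_order(annotations: Iterable[Annotation]) -> List[Annotation]:
    # Begin ascending, then end descending, so enclosing spans precede nested ones.
    return sorted(annotations, key=lambda a: (a.begin, -a.end))
```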
APACHE CLEREZZA
- Service platform for linked data
- OSGi-based
- RDF API
- RESTful web service framework
- Triple-store independent
- Integrated with Apache UIMA

http://incubator.apache.org/clerezza/
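Triple-store independence amounts to writing application code against an interface rather than a particular store. The Python sketch below illustrates that idea only; it is not Clerezza's Java API, and `TripleStore`, `InMemoryStore` and `labels_of` are hypothetical names.

```python
from typing import Iterator, Optional, Protocol, Set, Tuple

Triple = Tuple[str, str, str]
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class TripleStore(Protocol):
    """The only thing application code depends on."""
    def add(self, triple: Triple) -> None: ...
    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]: ...


class InMemoryStore:
    """One interchangeable backend; a database- or SPARQL-backed class could replace it."""
    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        # None acts as a wildcard for that position.
        for t in self._triples:
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
                yield t


def labels_of(store: TripleStore, subject: str) -> list:
    """Written against the TripleStore protocol, not a concrete backend."""
    return sorted(o for _, _, o in store.match(s=subject, p=RDFS_LABEL))
```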
UIMA/CLEREZZA CONVENTION
- Developers can create custom types / type systems
- They need to manage URIs
- Integration of services vs. ontology sharing
- ClerezzaTypeSystem
  - ClerezzaBaseAnnotation
    - uri
  - ClerezzaBaseEntity
    - uri
    - label (rdfs:label)
    - references (annotations referring to this entity)
  - Service-specific annotation and entity types are
    defined by subclassing the above
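The convention can be mirrored in Python classes to show what subclassing the base types buys: every annotation and entity carries a URI, and entities keep `rdfs:label` and back-references to the annotations that mention them. In UIMA these types are declared in an XML type system descriptor, not written as classes; `GeneMention` and `Gene` are hypothetical service-specific types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClerezzaBaseAnnotation:
    uri: str  # every annotation gets a URI, so it can be linked in RDF


@dataclass
class ClerezzaBaseEntity:
    uri: str
    label: str  # exported as rdfs:label
    references: List[ClerezzaBaseAnnotation] = field(default_factory=list)


# Hypothetical service-specific types, defined by subclassing the base types.
@dataclass
class GeneMention(ClerezzaBaseAnnotation):
    begin: int = 0
    end: int = 0


@dataclass
class Gene(ClerezzaBaseEntity):
    symbol: str = ""
```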
CLEREZZABASEANNOTATION DESCRIPTOR
CLEREZZABASEENTITY DESCRIPTOR
BEFORE
AFTER (URI FIELD INHERITED)
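The BEFORE/AFTER comparison concerns a service type that no longer declares its own `uri` feature but inherits it. A simplified Python model of declared types and feature inheritance shows the effect. The type `ServiceEntityAnnotation` is hypothetical, features such as `sofa` are omitted, and placing `ClerezzaBaseAnnotation` under `uima.tcas.Annotation` is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# name -> (supertype, features declared on this type itself); simplified.
TYPE_SYSTEM: Dict[str, Tuple[Optional[str], List[str]]] = {
    "uima.cas.TOP": (None, []),
    "uima.tcas.Annotation": ("uima.cas.TOP", ["begin", "end"]),
    "ClerezzaBaseAnnotation": ("uima.tcas.Annotation", ["uri"]),
    # Hypothetical service type: it declares only its own feature, not uri.
    "ServiceEntityAnnotation": ("ClerezzaBaseAnnotation", ["entityLabel"]),
}


def all_features(type_name: str) -> List[str]:
    """Features visible on a type: inherited ones first, then its own."""
    chain: List[str] = []
    current: Optional[str] = type_name
    while current is not None:
        chain.append(current)
        current = TYPE_SYSTEM[current][0]
    features: List[str] = []
    for name in reversed(chain):
        features.extend(TYPE_SYSTEM[name][1])
    return features
```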
CONVERSION STRATEGIES
- UIMA annotations are stored inside the CAS (Common
  Analysis Structure)
- Services “talk” to each other via web services + RDF
- CAS-to-RDF mapping via Clerezza
- Pluggable mapping strategies:
  - Clerezza Default
  - Annotation Ontology
  - …
CONVERSION STRATEGIES
Mapping strategies can be changed via XML or the Eclipse plug-in,
or directly in the descriptor:

<nameValuePair>
  <name>mappingStrategy</name>
  <value><string>ao</string></value>
</nameValuePair>
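For reference, this Python sketch reads such a string parameter using only the standard library's `xml.etree.ElementTree`. It compares local element names so that it works with or without the UIMA resource-specifier namespace; the wrapping `<configurationParameterSettings>` element and the helper names are assumptions for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional

FRAGMENT = """
<configurationParameterSettings>
  <nameValuePair>
    <name>mappingStrategy</name>
    <value><string>ao</string></value>
  </nameValuePair>
</configurationParameterSettings>
"""


def _local(tag: str) -> str:
    """Strip an ElementTree namespace, e.g. '{http://...}name' -> 'name'."""
    return tag.rsplit("}", 1)[-1]


def _child(elem: ET.Element, name: str) -> Optional[ET.Element]:
    return next((c for c in elem if _local(c.tag) == name), None)


def read_string_parameter(xml_text: str, name: str, default: Optional[str] = None) -> Optional[str]:
    """Return the <string> value of the nameValuePair called `name`, or `default`."""
    root = ET.fromstring(xml_text.strip())
    for pair in root.iter():
        if _local(pair.tag) != "nameValuePair":
            continue
        name_el, value_el = _child(pair, "name"), _child(pair, "value")
        if name_el is None or value_el is None or name_el.text != name:
            continue
        string_el = _child(value_el, "string")
        if string_el is not None:
            return string_el.text
    return default
```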
CLEREZZA WEB SERVICES EXAMPLE
LOOKING AHEAD
DOMEO TOOLKIT V. 2

Paolo Ciccarese, PhD
DOMEO ANNOTATION TOOLKIT V.2
- Domeo Annotation Toolkit v.2 is planned for the end
  of the first quarter of 2012
- It will involve a major refactoring to improve
  modularity and make writing plug-ins easier
- It will include various new features and will be the
  first step towards a federated architecture
- It will be open source!
DOMEO FEDERATION
- We currently have two instances of the Domeo
  Toolkit, and the number of instances is going to
  increase
- We need to define a clean architecture that
  supports communication between instances, or
  nodes
- Instances should be able to access each other's
  annotations in multiple ways
DOMEO FEDERATION
[Diagram "Annotation Flow": Domeo Nodes 1 to 4, each with a Web Client, connected through a Web Service, SPARQL queries and a Triplestore]
Example: node 3 (DT3) retrieves annotations from node 1 (DT1) through a web service
and from node 2 (DT2) through a SPARQL query against its triplestore
SOFTWARE ANNOTATION ACCESS
Nodes can access the annotations of other nodes through:
- Web services
  - Annotations by user
  - Annotations by group
  - Annotations by document
  - Annotations by corpus
  - …
- SPARQL queries, when a SPARQL endpoint is available
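The two access paths could look like this from a client's side. The web-service URL layout (`/annotations/{scope}?id=...`) and the `ex:annotates` predicate are hypothetical; the SPARQL request follows the SPARQL 1.1 Protocol convention of passing the URL-encoded query in a `query` parameter on a GET. The sketch only builds `urllib` request objects and sends nothing.

```python
from urllib.parse import urlencode
from urllib.request import Request

SCOPES = {"user", "group", "document", "corpus"}


def web_service_url(node_base: str, scope: str, value: str) -> str:
    """Hypothetical URL layout for 'annotations by user/group/document/corpus'."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    query = urlencode({"id": value})
    return f"{node_base.rstrip('/')}/annotations/{scope}?{query}"


def annotations_on_document(document_uri: str) -> str:
    # ex:annotates is a placeholder; use the property of the vocabulary actually stored.
    return ("PREFIX ex: <http://example.org/vocab#>\n"
            f"SELECT ?annotation WHERE {{ ?annotation ex:annotates <{document_uri}> . }}")


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build (without sending) a SPARQL 1.1 Protocol 'query via GET' request."""
    return Request(endpoint + "?" + urlencode({"query": query}),
                   headers={"Accept": "application/sparql-results+json"})
```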
USER ANNOTATION ACCESS
Users can export their own annotations in AO RDF:
- Annotations by document
- Annotations by corpus
- All of their annotations
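A per-user export filtered by document, by corpus, or not at all might be sketched as below. N-Triples is used only because it is compact to write by hand; the `http://example.org/vocab#` predicates stand in for AO terms, and giving each annotation a single corpus is a simplification.

```python
from typing import Iterable, List, NamedTuple, Optional

VOCAB = "http://example.org/vocab#"  # placeholder for AO terms


class Ann(NamedTuple):
    uri: str
    document: str
    corpus: str   # simplification: one corpus per annotation
    topic: str


def export_ntriples(annotations: Iterable[Ann], document: Optional[str] = None,
                    corpus: Optional[str] = None) -> str:
    """Export by document, by corpus, or everything when no filter is given."""
    lines: List[str] = []
    for a in annotations:
        if document is not None and a.document != document:
            continue
        if corpus is not None and a.corpus != corpus:
            continue
        lines.append(f"<{a.uri}> <{VOCAB}annotates> <{a.document}> .")
        lines.append(f"<{a.uri}> <{VOCAB}hasTopic> <{a.topic}> .")
    return "\n".join(lines)
```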
CURRENT DOMEO ARCHITECTURE
[Diagram: Domeo Web Client; Annotation Web Services (annotation requests, AO-RDF); Domeo with MySQL storage and a User Annotation Export UI; a Text Mining Connector that reaches the NCBO Annotator through the NCBO Web Service]
DOMEO NODE ARCHITECTURE
> ACCESSING EXTERNAL ANNOTATION
[Diagram: a Domeo v.2 Node (Domeo Web Client, Annotation Web Services, Triple Store Connector, MySQL, User Annotation Export UI, Text Mining Connector to the NCBO Web Service and NCBO Annotator) exchanging AO-RDF with other Domeo nodes and querying an External Triplestore via SPARQL; callouts 1 and 2]
DOMEO NODE ARCHITECTURE
> ADDING A SPARQL ENDPOINT
[Diagram: a Domeo v.2 Node (Domeo Web Client, Annotation Web Services, Triple Store Connector, MySQL, User Annotation Export UI, Text Mining Connector to the NCBO Web Service and NCBO Annotator) exchanging AO-RDF with other Domeo nodes and an External Triplestore, now with its own Triplestore exposed through a SPARQL endpoint]
DOMEO NODE ARCHITECTURE
> TEXT MINING ALGORITHMS INTEGRATION
[Diagram: a Domeo v.2 Node (Domeo Web Client, Annotation Web Services, Triple Store Connector with a SPARQL Triplestore, MySQL, User Annotation Export UI) linked to other Domeo nodes and an External Triplestore, with three text mining paths: a Text Mining Connector to the NCBO Web Service and NCBO Annotator; a Clerezza Connector to a Clerezza Web Service running a UIMA Algorithm; a Text Mining Connector to a Text Mining Library Manager hosting a Text Mining Algorithm; callouts 1 to 4]
DOMEO AND TEXT MINING
IN SUMMARY
- Run algorithms within Domeo by
  - making the algorithms available through web services
  - integrating the algorithms, as libraries, within the
    Domeo architecture
- Run algorithms separately and then
  - load the results into a Domeo node through web
    services
  - store the results directly in a triplestore
  - store the results directly in the database
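The "run separately, then load through web services" path might look like the following Python sketch: a toy whole-word dictionary matcher stands in for a real text mining algorithm, and its offset/range results are packaged as N-Triples in a POST request. The `/annotations` endpoint, the placeholder vocabulary and the content type choice are assumptions; the request is built but not sent.

```python
import re
from typing import Dict, List, Tuple
from urllib.request import Request

Hit = Tuple[int, int, str]  # (offset, length, term URI)
VOCAB = "http://example.org/vocab#"  # placeholder vocabulary, not AO


def dictionary_annotator(text: str, lexicon: Dict[str, str]) -> List[Hit]:
    """Toy stand-in for an external algorithm: whole-word, case-sensitive lookup."""
    hits = []
    for surface, term_uri in lexicon.items():
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            hits.append((m.start(), len(surface), term_uri))
    return sorted(hits)


def upload_request(node_base: str, doc_uri: str, hits: List[Hit]) -> Request:
    """Package results as N-Triples in a POST to a hypothetical node endpoint."""
    lines = []
    for i, (offset, length, term) in enumerate(hits):
        ann = f"{doc_uri}#ann{i}"
        lines += [
            f"<{ann}> <{VOCAB}annotates> <{doc_uri}> .",
            f'<{ann}> <{VOCAB}offset> "{offset}" .',
            f'<{ann}> <{VOCAB}range> "{length}" .',
            f"<{ann}> <{VOCAB}hasTopic> <{term}> .",
        ]
    return Request(node_base.rstrip("/") + "/annotations",
                   data="\n".join(lines).encode("utf-8"),
                   headers={"Content-Type": "application/n-triples"},
                   method="POST")
```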
W3C COMMUNITY GROUP
OPEN ANNOTATION
- Annotation Ontology (AO) and Open Annotation
  Collaboration (OAC) are merging
- Unified model for representing and sharing
  annotations in RDF

http://www.w3.org/community/openannotation/
THANK YOU!
If you are interested in using, or contributing to,
the Domeo Annotation Toolkit, follow our website
http://annotationframework.org or contact
paolo.ciccarese -at- gmail.com
