DOMEO ANNOTATION TOOLKIT
AND TEXT MINING


CREATING, VISUALISING, CURATING AND SHARING
TEXT MINING RESULTS

Paolo Ciccarese, PhD
paolo.ciccarese@gmail.com


January 30th 2012, W3C Scientific Discourse Call
- Domeo Annotation Toolkit is a collection of software components for creating and sharing annotations of web documents and their fragments
- It can export and exchange all annotations in Annotation Ontology (AO) RDF format
- The Domeo client is the user interface for producing manual and semi-automatic annotations of HTML documents directly in the browser

http://annotationframework.org/
ANNOTATION ONTOLOGY
- OWL vocabulary for representing and sharing annotations and semantic annotations of digital resources and their fragments. AO:
  - is orthogonal to the domain(s) of interest
  - supports stand-off annotation
  - offers tools for identifying fragments
  - is designed with extension points
  - defines basic annotation containers
  - supports versioning
  - tracks provenance

http://purl.org/ao/home
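To make the AO features above concrete, here is a minimal Python sketch of a stand-off annotation: the document is referenced by URI and never modified, a prefix/postfix selector identifies the fragment, and provenance and versioning are carried as fields. The class and field names are illustrative assumptions loosely modelled on AO concepts, not AO's actual schema; the example URIs are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PrefixPostfixSelector:
    """Identifies a text fragment by its exact text plus surrounding context."""
    exact: str
    prefix: str = ""
    postfix: str = ""


@dataclass(frozen=True)
class Annotation:
    # Stand-off: the annotated document is never modified; only its URI is kept.
    annotated_resource: str
    selector: PrefixPostfixSelector
    body: str                   # e.g. an ontology term URI; the model is domain-agnostic
    created_by: str             # provenance: who made this version
    created_on: datetime        # provenance: when it was made
    version: int = 1
    previous_version: Optional["Annotation"] = None

    def revise(self, new_body: str, editor: str) -> "Annotation":
        """Versioning: return a new version linked to this one instead of overwriting it."""
        return replace(
            self,
            body=new_body,
            created_by=editor,
            created_on=datetime.now(timezone.utc),
            version=self.version + 1,
            previous_version=self,
        )


# Hypothetical example URIs, for illustration only.
selector = PrefixPostfixSelector(exact="APP", prefix="mutations in ", postfix=" cause")
first = Annotation(
    annotated_resource="http://example.org/doc1",
    selector=selector,
    body="http://example.org/terms/APP",
    created_by="http://example.org/users/alice",
    created_on=datetime(2012, 1, 30, tzinfo=timezone.utc),
)
second = first.revise("http://example.org/terms/APP_gene", "http://example.org/users/bob")
```

Keeping old versions reachable through `previous_version` is one simple way to get versioning and provenance without mutating earlier annotations.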
DOMEO AND TEXT MINING SERVICES
- Domeo can trigger text mining algorithms when they are available as web services
- Software connectors have to be developed to translate the results into a suitable format
- The results are displayed in the web documents
- Users can record their feedback/judgment through customizable user interfaces
NCBO ANNOTATOR
- Web service that annotates textual metadata (e.g. journal abstracts) with relevant ontology concepts
- The ontologies of interest can be preselected as one of its many parameters

http://www.bioontology.org/annotator-service
DOMEO AND THE NCBO ANNOTATOR
- Domeo supports automatic and manual annotation with terms from selected ontologies managed by BioPortal

http://www.bioontology.org/annotator-service
RUNNING NCBO ANNOTATOR
[Screenshot] Callout: "Additional text mining services will be listed here"
NCBO ANNOTATOR RESULTS IN DOMEO
[Screenshot] Callout: "List of recognized entities"
RESULTS CURATION
[Screenshot] Callout: "Customizable"
CUMULATIVE RESULTS CURATION
A curation judgment can be applied to:
- one item only
- all instances with the same text match
- all instances, regardless of the text match
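The three curation scopes can be sketched as a filter over the recognized matches. This is a minimal illustration under one assumed interpretation: "same text match" means same concept and same matched text, while "regardless of the text match" means every match of the same concept. The `Match` type and `curate` function are hypothetical, not Domeo's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Scope(Enum):
    ONE_ITEM = "one item only"
    SAME_TEXT = "all instances with the same text match"
    ALL_INSTANCES = "all instances, regardless of the text match"


@dataclass
class Match:
    match_id: int
    term_uri: str                   # ontology concept recognized by the service
    matched_text: str               # document text that produced the match
    judgment: Optional[str] = None  # curator feedback, e.g. "correct" / "incorrect"


def curate(matches: List[Match], target: Match, judgment: str, scope: Scope) -> int:
    """Record `judgment` on the matches selected by `scope`; return how many changed."""
    if scope is Scope.ONE_ITEM:
        selected = [m for m in matches if m.match_id == target.match_id]
    elif scope is Scope.SAME_TEXT:
        selected = [m for m in matches
                    if m.term_uri == target.term_uri
                    and m.matched_text == target.matched_text]
    else:  # Scope.ALL_INSTANCES: same concept, whatever text it was found in
        selected = [m for m in matches if m.term_uri == target.term_uri]
    for m in selected:
        m.judgment = judgment
    return len(selected)
```

For example, judging a "cancer" match with `Scope.SAME_TEXT` updates every "cancer" match of that concept, while `Scope.ALL_INSTANCES` would also update a "tumour" match mapped to the same concept.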
SERIALIZATION IN AO/RDF
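As a rough idea of what an AO/RDF serialization can look like, the sketch below writes one annotation and its prefix/postfix selector as N-Triples using plain string formatting (no RDF library). The namespace IRIs and term names (`ao:Annotation`, `ao:annotatesResource`, `ao:hasTopic`, `ao:context`, `aos:PrefixPostfixSelector`, `aos:exact`/`prefix`/`postfix`, `pav:createdBy`) are assumptions to be checked against the AO specification; the function itself is hypothetical.

```python
AO = "http://purl.org/ao/"             # assumed AO core namespace
AOS = "http://purl.org/ao/selectors/"  # assumed AO selectors namespace
PAV = "http://purl.org/pav/"           # assumed provenance (PAV) namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    # Escape backslashes, quotes and line breaks as N-Triples requires.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def annotation_to_ntriples(ann_id: str, selector_id: str, document: str, topic: str,
                           exact: str, prefix: str, postfix: str, creator: str) -> str:
    """Serialize one annotation plus its selector as N-Triples (one triple per line)."""
    triples = [
        (iri(ann_id), iri(RDF_TYPE), iri(AO + "Annotation")),
        (iri(ann_id), iri(AO + "annotatesResource"), iri(document)),
        (iri(ann_id), iri(AO + "hasTopic"), iri(topic)),
        (iri(ann_id), iri(AO + "context"), iri(selector_id)),
        (iri(ann_id), iri(PAV + "createdBy"), iri(creator)),
        (iri(selector_id), iri(RDF_TYPE), iri(AOS + "PrefixPostfixSelector")),
        (iri(selector_id), iri(AOS + "exact"), literal(exact)),
        (iri(selector_id), iri(AOS + "prefix"), literal(prefix)),
        (iri(selector_id), iri(AOS + "postfix"), literal(postfix)),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples) + "\n"


example = annotation_to_ntriples(
    ann_id="http://example.org/ann/1", selector_id="http://example.org/sel/1",
    document="http://example.org/doc1", topic="http://example.org/terms/APP",
    exact="APP", prefix="mutations in ", postfix=" cause",
    creator="http://example.org/users/alice",
)
```

N-Triples is used here only because it needs no library; an RDF toolkit would normally produce RDF/XML or Turtle from the same triples.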
SOFTWARE CONNECTORS
At the current stage:
- For each text mining service we have to write a specific connector, which typically translates offset and range into prefix and postfix
- And keep it up to date!
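The core of such a connector, turning an (offset, range) match into prefix/exact/postfix, can be sketched in a few lines. This assumes a 0-based character offset into exactly the text Domeo displays and a range counted in characters. Real services differ on these conventions (0- or 1-based, inclusive or exclusive ends, characters or bytes), which is part of why each connector has to be kept up to date. The function name and the `context` window size are hypothetical.

```python
from typing import NamedTuple


class PrefixPostfix(NamedTuple):
    prefix: str
    exact: str
    postfix: str


def offset_range_to_prefix_postfix(text: str, offset: int, length: int,
                                   context: int = 20) -> PrefixPostfix:
    """Turn a match given as (offset, range) into prefix/exact/postfix.

    Assumes a 0-based character offset into exactly the text Domeo shows and a
    range counted in characters; `context` caps how much text is kept on each side.
    """
    end = offset + length
    if offset < 0 or length < 0 or end > len(text):
        raise ValueError("match falls outside the text")
    return PrefixPostfix(
        prefix=text[max(0, offset - context):offset],
        exact=text[offset:end],
        postfix=text[end:end + context],
    )
```

For instance, on `"Mutations in APP cause early-onset Alzheimer disease."` with offset 13, range 3 and `context=5`, the result is `("s in ", "APP", " caus")`.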
UIMA, CLEREZZA AND AO
OSS-BASED INFRASTRUCTURE FOR TEXT MINING OVER
ONTOLOGIES

Tommaso Teofili and Paolo Ciccarese
tommaso@apache.org
APACHE UIMA
- Architectural framework for UIM (Unstructured Information Management)
- OASIS standard
- Build, deploy and run text mining pipelines
- Scaling capabilities for large volumes of data
- NLP/TM algorithms wrapped as Analysis Engines (AEs)

http://uima.apache.org/
UIMA TYPES
- The annotation domain is defined in type systems
- Types and features are only declared
- Existing type systems can be imported, exported and enhanced
- Type systems ease data exchange between AEs
- Two "main" types:
  - TOP
  - Annotation
APACHE CLEREZZA
- Service platform for linked data
- OSGi-based
- RDF API
- RESTful web service framework
- Triplestore-independent
- Integrated with Apache UIMA

http://incubator.apache.org/clerezza/
UIMA/CLEREZZA CONVENTION
- Developers can create custom types/type systems
- URIs need to be managed
- Integration of services vs. ontology sharing
- ClerezzaTypeSystem
  - ClerezzaBaseAnnotation
    - uri
  - ClerezzaBaseEntity
    - uri
    - label (rdfs:label)
    - references (annotations referring to this entity)
  - Service-specific annotation and entity types are defined by subclassing the above
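The inheritance idea behind ClerezzaTypeSystem can be mirrored with Python dataclasses: base types carry `uri` (and, for entities, `label` and `references`), and service-specific types only add their own fields. This is an illustrative analogy, not the actual UIMA descriptor. In particular, that ClerezzaBaseAnnotation extends UIMA's Annotation (with begin/end offsets) while ClerezzaBaseEntity extends TOP is an assumption, and `OntologyTermMention`/`OntologyTermEntity` are hypothetical service-specific types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Top:
    """Stand-in for UIMA's root type TOP."""


@dataclass
class Annotation(Top):
    """Stand-in for UIMA's Annotation type: a span of the document text."""
    begin: int
    end: int


@dataclass
class ClerezzaBaseAnnotation(Annotation):
    uri: str


@dataclass
class ClerezzaBaseEntity(Top):
    uri: str
    label: str                                    # maps to rdfs:label
    references: List[ClerezzaBaseAnnotation] = field(default_factory=list)


# Hypothetical service-specific types: they add fields and inherit `uri`.
@dataclass
class OntologyTermMention(ClerezzaBaseAnnotation):
    ontology: str = ""


@dataclass
class OntologyTermEntity(ClerezzaBaseEntity):
    ontology: str = ""


mention = OntologyTermMention(begin=13, end=16, uri="urn:example:mention:1", ontology="GO")
entity = OntologyTermEntity(uri="urn:example:term:APP", label="APP", ontology="GO")
entity.references.append(mention)  # the entity records the annotations referring to it
```

Because `uri` lives in the base types, any code that maps annotations to RDF can rely on it being present on every service-specific subtype.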
CLEREZZABASEANNOTATION DESCRIPTOR
CLEREZZABASEENTITY DESCRIPTOR
BEFORE
AFTER (URI FIELD INHERITED)
CONVERSION STRATEGIES
- UIMA annotations are stored inside the CAS (Common Analysis Structure)
- Services “talk” via web services + RDF
- CAS-to-RDF mapping via Clerezza
- Pluggable mapping strategies
  - Clerezza Default
  - AnnotationOntology
  - …
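Pluggable mapping strategies are essentially a registry keyed by a configuration name, which matches the `mappingStrategy` parameter set to `ao` in the descriptor. The Python sketch below shows that pattern only. The strategy names other than `ao`, the simplified span tuples, the `ex:` vocabulary, and the AO terms (`ao:context`, `aos:OffsetRangeSelector`, `aos:offset`, `aos:range`) are assumptions for illustration, not Clerezza's API.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
# Hypothetical, simplified view of a CAS annotation: (uri, begin, end).
Span = Tuple[str, int, int]
MappingStrategy = Callable[[str, List[Span]], List[Triple]]

STRATEGIES: Dict[str, MappingStrategy] = {}


def mapping_strategy(name: str) -> Callable[[MappingStrategy], MappingStrategy]:
    """Register a CAS-to-RDF mapping under the name used in the configuration."""
    def register(fn: MappingStrategy) -> MappingStrategy:
        STRATEGIES[name] = fn
        return fn
    return register


@mapping_strategy("default")
def default_mapping(doc_uri: str, spans: List[Span]) -> List[Triple]:
    # Hypothetical "ex:" vocabulary: offsets attached directly to the annotation.
    triples: List[Triple] = []
    for uri, begin, end in spans:
        triples += [(uri, "ex:begin", str(begin)),
                    (uri, "ex:end", str(end)),
                    (uri, "ex:document", doc_uri)]
    return triples


@mapping_strategy("ao")
def ao_mapping(doc_uri: str, spans: List[Span]) -> List[Triple]:
    # AO-style: the annotation points to a separate selector node (term names assumed).
    triples: List[Triple] = []
    for uri, begin, end in spans:
        sel = uri + "#selector"
        triples += [(uri, "rdf:type", "ao:Annotation"),
                    (uri, "ao:annotatesResource", doc_uri),
                    (uri, "ao:context", sel),
                    (sel, "rdf:type", "aos:OffsetRangeSelector"),
                    (sel, "aos:offset", str(begin)),
                    (sel, "aos:range", str(end - begin))]
    return triples


def map_cas(doc_uri: str, spans: List[Span], strategy: str = "default") -> List[Triple]:
    """Pick the mapping by name, mirroring the mappingStrategy configuration parameter."""
    try:
        fn = STRATEGIES[strategy]
    except KeyError:
        raise ValueError(f"unknown mappingStrategy: {strategy!r}") from None
    return fn(doc_uri, spans)
```

Adding a new output format then means registering one more function, without touching the code that reads the CAS.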
CONVERSION STRATEGIES
Change mapping strategies via XML/Eclipse plugin
[Screenshot]
Or directly in the descriptor:

  <configurationParameterSettings>
    <nameValuePair>
      <name>mappingStrategy</name>
      <value><string>ao</string></value>
    </nameValuePair>
  </configurationParameterSettings>
CLEREZZA WEB SERVICES EXAMPLE
LOOKING AHEAD
DOMEO TOOLKIT V. 2

Paolo Ciccarese, PhD
DOMEO ANNOTATION TOOLKIT V.2
- Domeo Annotation Toolkit v.2 is planned for the end of the first quarter of 2012
- It will consist of a major refactoring to improve modularity and make writing plug-ins easier
- It will include various new features and will be the first step towards a federated architecture
- It will be open source!
DOMEO FEDERATION
- We currently have two instances of the Domeo Toolkit, and the number of instances is going to increase
- We need to define a clean architecture that supports communication between instances, or nodes
- Instances should be able to access each other's annotations in multiple ways
DOMEO FEDERATION
[Diagram: Annotation Flow. Domeo Nodes 1 to 4, each with a Web Client, exchanging annotations through a Web Service and through SPARQL queries against a Triplestore.]
Example: DT3 retrieves annotations from DT1 through a web service and from DT2 through a SPARQL query against its triplestore.
SOFTWARE ANNOTATION ACCESS
Nodes can access the annotations of other nodes:
- Through web services
  - Annotations by user
  - Annotations by group
  - Annotations by document
  - Annotations by corpus
  - …
- Through SPARQL queries, when a SPARQL endpoint is available
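For the SPARQL route, a node could build a query for the annotations on one document, optionally restricted to one user, and send it to another node's endpoint. The sketch below only builds the query string. The AO and PAV namespaces and terms (`ao:Annotation`, `ao:annotatesResource`, `ao:hasTopic`, `pav:createdBy`) are assumptions to verify against the AO specification, and `annotations_query` is a hypothetical helper. IRIs are validated rather than blindly interpolated so a crafted value cannot change the query.

```python
from typing import Optional

PREFIXES = """PREFIX ao: <http://purl.org/ao/>
PREFIX pav: <http://purl.org/pav/>
"""


def _iri(value: str) -> str:
    # Reject characters that SPARQL does not allow inside an IRI reference.
    if any(c in '<>"{}|^`\\' or ord(c) <= 0x20 for c in value):
        raise ValueError(f"not a safe IRI: {value!r}")
    return f"<{value}>"


def annotations_query(document: str, user: Optional[str] = None, limit: int = 100) -> str:
    """Build a SPARQL SELECT for the annotations on `document`, optionally by one user."""
    lines = [
        PREFIXES + "SELECT ?annotation ?topic WHERE {",
        "  ?annotation a ao:Annotation ;",
        f"              ao:annotatesResource {_iri(document)} ;",
        "              ao:hasTopic ?topic .",
    ]
    if user is not None:
        lines.append(f"  ?annotation pav:createdBy {_iri(user)} .")
    lines.append("}")
    lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)
```

The other access paths listed above (by group, by corpus) would add further triple patterns in the same way, depending on how the node models groups and corpora.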
USER ANNOTATION ACCESS
Users can export their own annotations in AO RDF:
- Annotations by document
- Annotations by corpus
- All of their annotations
CURRENT DOMEO ARCHITECTURE
[Diagram. Components: Domeo Web Client; Annotation Web Services (AO-RDF); Domeo with MySQL; User Annotation Export UI; Text Mining Connector; NCBO Web Service; NCBO Annotator. Label: "Annotation Request".]
DOMEO NODE ARCHITECTURE
> ACCESSING EXTERNAL ANNOTATION
[Diagram, Domeo v.2 Node. Components: Other Domeo Node (AO-RDF); Domeo Web Client; Annotation Web Services (AO-RDF); External Triplestore queried via SPARQL through a Triple Store Connector (AO-RDF); MySQL; User Annotation Export UI; Text Mining Connector; NCBO Web Service; NCBO Annotator. Numbered callouts 1 and 2.]
DOMEO NODE ARCHITECTURE
> ADDING A SPARQL ENDPOINT
[Diagram, Domeo v.2 Node: the same components as in the previous slide (Other Domeo Node, Domeo Web Client, Annotation Web Services, External Triplestore, Triple Store Connector, MySQL, User Annotation Export UI, Text Mining Connector, NCBO Web Service, NCBO Annotator), plus a SPARQL Triplestore.]
DOMEO NODE ARCHITECTURE
> TEXT MINING ALGORITHMS INTEGRATION
[Diagram, Domeo v.2 Node: the components of the previous slide, with three text mining stacks: Text Mining Connector, NCBO Web Service, NCBO Annotator; Clerezza Connector, Clerezza Web Service, UIMA Algorithm; Text Mining Connector, Text Mining Library Manager, Text Mining Algorithm. Numbered callouts 1 to 4.]
DOMEO AND TEXT MINING
IN SUMMARY
- Run algorithms within Domeo by:
  - making the algorithms available through web services
  - integrating the algorithms, as libraries, within the Domeo architecture
- Run algorithms separately and then:
  - load the results into a Domeo node through web services
  - store the results directly in a triplestore
  - store the results directly in the database
W3C COMMUNITY GROUP
OPEN ANNOTATION
- Annotation Ontology (AO) and Open Annotation Collaboration (OAC) are merging
- Unified model for representing and sharing annotations in RDF

http://www.w3.org/community/openannotation/
THANK YOU!
If you are interested in using (or contributing to)
the Domeo Annotation Toolkit, follow our website
http://annotationframework.org or contact
paolo.ciccarese -at- gmail.com

Boost PC performance: How more available memory can improve productivity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Domeo, Text Mining, UIMA and Clerezza

  • 1. DOMEO ANNOTATION TOOLKIT AND TEXT MINING: CREATING, VISUALISING, CURATING AND SHARING TEXT MINING RESULTS
      Paolo Ciccarese, PhD (paolo.ciccarese@gmail.com)
      January 30th 2012, W3C Scientific Discourse Call
  • 2.
      - The Domeo Annotation Toolkit is a collection of software components for creating and sharing annotations of web documents and their fragments
      - It can export and exchange all annotations in Annotation Ontology (AO) RDF format
      - The Domeo client is the user interface for producing manual and semi-automatic annotations of HTML documents directly in the browser
      http://annotationframework.org/
  • 3. ANNOTATION ONTOLOGY
      OWL vocabulary for representing and sharing annotations and semantic annotations of digital resources and their fragments. It:
      - Is orthogonal to the domain(s) of interest
      - Supports stand-off annotation
      - Offers tools for identifying fragments
      - Is designed with extension points
      - Defines basic annotation containers
      - Supports versioning
      - Tracks provenance
      http://purl.org/ao/home
  • 4. DOMEO AND TEXT MINING SERVICES
      - Domeo can trigger text mining algorithms when they are available as web services
      - Software connectors have to be developed to translate the results into a suitable format
      - The results are displayed in the web documents
      - Users can record their feedback/judgment through customizable user interfaces
  • 5. NCBO ANNOTATOR
      - Web service that annotates textual metadata (e.g. a journal abstract) with relevant ontology concepts
      - The ontologies of interest can be preselected, as one of its many parameters
      http://www.bioontology.org/annotator-service
  • 6. DOMEO AND THE NCBO ANNOTATOR
      - Domeo supports automatic and manual annotation with terms from selected ontologies managed by BioPortal
      http://www.bioontology.org/annotator-service
  • 7. RUNNING NCBO ANNOTATOR
      - Additional text mining services will be listed here
  • 8. NCBO ANNOTATOR RESULTS IN DOMEO
      - List of recognized entities
  • 9. RESULTS CURATION
      - Customizable
  • 10. CUMULATIVE RESULTS CURATION
      - One item only
      - All instances with the same text match
      - All instances, independently of the text match
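The three curation scopes above can be sketched as one function. This is a minimal Python sketch, not Domeo's code: the `MinedAnnotation` model, the scope names and the example URIs are hypothetical, and we assume "same text match" means the same ontology term with identical matched text, while the third scope covers every match of the term whatever text was matched.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MinedAnnotation:
    """One entity recognized by a text mining service (hypothetical model)."""
    ann_id: int
    matched_text: str            # exact text the service matched in the document
    term_uri: str                # ontology term the match was mapped to
    judgment: str = "unreviewed"


def curate(annotations: List[MinedAnnotation], target: MinedAnnotation,
           judgment: str, scope: str) -> int:
    """Record `judgment` on `target` and, depending on `scope`, on related items.

    Scope names are hypothetical:
      "one"       - only the selected item
      "same_text" - every item with the same term and the same matched text
      "same_term" - every item with the same term, whatever text was matched
    Returns the number of annotations updated.
    """
    if scope == "one":
        selected = [target]
    elif scope == "same_text":
        selected = [a for a in annotations
                    if a.term_uri == target.term_uri
                    and a.matched_text == target.matched_text]
    elif scope == "same_term":
        selected = [a for a in annotations if a.term_uri == target.term_uri]
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    for a in selected:
        a.judgment = judgment
    return len(selected)


if __name__ == "__main__":
    anns = [
        MinedAnnotation(1, "cancer", "http://example.org/neoplasm"),
        MinedAnnotation(2, "cancer", "http://example.org/neoplasm"),
        MinedAnnotation(3, "tumor", "http://example.org/neoplasm"),
    ]
    print(curate(anns, anns[0], "rejected", "same_text"))  # prints 2
```

Matching here is exact and case-sensitive; a real curation UI might normalize case or whitespace before comparing matched text.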
  • 12. SOFTWARE CONNECTORS
      At the current stage:
      - For each text mining service we have to write a specific connector, which normally translates offset and range into prefix and postfix
      - And keep it up to date!
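A minimal sketch of the core of such a connector, assuming the service reports 0-based, end-exclusive character offsets (a service using 1-based or inclusive positions would need normalizing first, which is one reason each connector ends up service specific). The function names and the `context` length are hypothetical. The second function shows why a prefix/exact/postfix triple suits stand-off annotation: it can be re-anchored after the text shifts.

```python
from typing import Dict, Optional, Tuple


def offsets_to_selector(text: str, start: int, end: int,
                        context: int = 32) -> Dict[str, str]:
    """Turn a character range [start, end) into a prefix/exact/postfix selector.

    `context` (a hypothetical default) is how many characters of surrounding
    text are kept on each side. Assumes 0-based, end-exclusive offsets.
    """
    if not 0 <= start <= end <= len(text):
        raise ValueError(f"range [{start}, {end}) is outside the document text")
    return {
        "prefix": text[max(0, start - context):start],
        "exact": text[start:end],
        "postfix": text[end:end + context],
    }


def locate(text: str, selector: Dict[str, str]) -> Optional[Tuple[int, int]]:
    """Re-anchor a selector in (possibly edited) text; None if it no longer matches.

    Only the first occurrence is returned, so a longer context reduces ambiguity.
    """
    needle = selector["prefix"] + selector["exact"] + selector["postfix"]
    i = text.find(needle)
    if i < 0:
        return None
    start = i + len(selector["prefix"])
    return start, start + len(selector["exact"])


if __name__ == "__main__":
    doc = "Aspirin inhibits COX-1 and COX-2 enzymes."
    sel = offsets_to_selector(doc, 17, 22, context=5)
    print(sel)                          # {'prefix': 'bits ', 'exact': 'COX-1', 'postfix': ' and '}
    print(locate("Note: " + doc, sel))  # (23, 28)
```

After text is prepended, the raw offsets (17, 22) would point at the wrong characters, while the selector still resolves to `COX-1`.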
  • 13. UIMA, CLEREZZA AND AO: OSS-BASED INFRASTRUCTURE FOR TEXT MINING OVER ONTOLOGIES
      Tommaso Teofili and Paolo Ciccarese
      tommaso@apache.org
  • 14. APACHE UIMA
      - Architectural framework for UIM (Unstructured Information Management)
      - OASIS standard
      - Build, deploy and run text mining pipelines
      - Scaling capabilities for large volumes of data
      - NLP/TM algorithms wrapped as Analysis Engines (AEs)
      http://uima.apache.org/
  • 15. UIMA TYPES
      - The annotation domain is defined in type systems
      - Types and features are just declared
      - Existing type systems can be imported, exported and enhanced
      - Type systems ease data exchange between AEs
      - Two "main" types: TOP and Annotation
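Because a UIMA type system only declares types and features, it can be pictured as plain data. The sketch below is Python, not the UIMA Java API (real type systems are XML descriptors); it mirrors part of UIMA's built-in hierarchy, in which `uima.tcas.Annotation` extends `uima.cas.AnnotationBase`, which extends `uima.cas.TOP`, and adds one hypothetical domain type, `org.example.OntologyTermMatch`.

```python
from typing import Dict, Iterator, Optional

# Each type declares only its supertype and its own features (name -> range type).
TYPE_SYSTEM: Dict[str, dict] = {
    "uima.cas.TOP": {"supertype": None, "features": {}},
    "uima.cas.AnnotationBase": {"supertype": "uima.cas.TOP",
                                "features": {"sofa": "uima.cas.Sofa"}},
    "uima.tcas.Annotation": {"supertype": "uima.cas.AnnotationBase",
                             "features": {"begin": "uima.cas.Integer",
                                          "end": "uima.cas.Integer"}},
    # Hypothetical type that a text mining service might declare:
    "org.example.OntologyTermMatch": {"supertype": "uima.tcas.Annotation",
                                      "features": {"termUri": "uima.cas.String"}},
}


def ancestors(type_name: str) -> Iterator[str]:
    """Yield the type itself, then each supertype up to uima.cas.TOP."""
    current: Optional[str] = type_name
    while current is not None:
        yield current
        current = TYPE_SYSTEM[current]["supertype"]


def all_features(type_name: str) -> Dict[str, str]:
    """Features declared on the type plus every inherited feature."""
    merged: Dict[str, str] = {}
    for t in ancestors(type_name):
        for name, range_type in TYPE_SYSTEM[t]["features"].items():
            merged.setdefault(name, range_type)
    return merged


def is_annotation(type_name: str) -> bool:
    """True if instances of the type cover a span of text (begin/end)."""
    return "uima.tcas.Annotation" in ancestors(type_name)
```

Any engine that knows only the built-in types can still read `begin`/`end` from the hypothetical type, which is what makes shared type systems useful for exchanging data between AEs.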
  • 16. APACHE CLEREZZA
      - Service platform for linked data
      - OSGi-based
      - RDF API
      - RESTful web service framework
      - Triplestore-independent
      - Integrated with Apache UIMA
      http://incubator.apache.org/clerezza/
  • 17. UIMA/CLEREZZA CONVENTION
      - Developers can create custom types and type systems
      - URIs need to be managed
      - Integration of services vs. ontology sharing
      - ClerezzaTypeSystem:
          - ClerezzaBaseAnnotation: uri
          - ClerezzaBaseEntity: uri, label (rdfs:label), references (annotations referring to this entity)
      - Service-specific annotation and entity types are defined by subclassing the above
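The convention can be pictured with Python dataclasses standing in for the UIMA types. This is an illustrative mirror, not Clerezza or UIMA code: `OntologyTerm`, `TermMatch` and `link` are hypothetical service-specific additions, and storing `references` as annotation URIs is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClerezzaBaseEntity:
    uri: str
    label: str                                           # exported as rdfs:label
    references: List[str] = field(default_factory=list)  # URIs of annotations referring to this entity


@dataclass
class ClerezzaBaseAnnotation:
    uri: str


# Hypothetical service-specific types: they inherit `uri` (and, for entities,
# `label` and `references`) instead of each service managing URIs its own way.
@dataclass
class OntologyTerm(ClerezzaBaseEntity):
    ontology: str = ""


@dataclass
class TermMatch(ClerezzaBaseAnnotation):
    begin: int = 0
    end: int = 0
    term: Optional[OntologyTerm] = None


def link(annotation: TermMatch, term: OntologyTerm) -> None:
    """Point the annotation at the entity and record the back-reference once."""
    annotation.term = term
    if annotation.uri not in term.references:
        term.references.append(annotation.uri)
```

A service type such as `TermMatch` never declares `uri` itself; it gets the field from its base type.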
  • 21. AFTER (URI FIELD INHERITED)
  • 22. CONVERSION STRATEGIES
      - UIMA annotations are stored inside the CAS (Common Analysis Structure)
      - Services "talk" via web services + RDF
      - CAS-to-RDF mapping via Clerezza
      - Pluggable mapping strategies:
          - Clerezza default
          - Annotation Ontology
          - …
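Pluggable mapping strategies amount to a registry keyed by name. A minimal Python sketch, assuming annotations arrive as plain dicts rather than CAS objects: the strategy names `clerezza` and `ao`, the `http://example.org/uima#` namespace and the triples each strategy emits are hypothetical, and the AO terms should be checked against the Annotation Ontology specification.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]          # objects kept as plain strings for brevity
Mapper = Callable[[dict], List[Triple]]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
AO = "http://purl.org/ao/"             # Annotation Ontology namespace
EX = "http://example.org/uima#"        # hypothetical namespace for the default mapping

STRATEGIES: Dict[str, Mapper] = {}


def register(name: str) -> Callable[[Mapper], Mapper]:
    """Decorator that makes a mapping strategy selectable by name."""
    def wrap(fn: Mapper) -> Mapper:
        STRATEGIES[name] = fn
        return fn
    return wrap


@register("clerezza")
def clerezza_default(ann: dict) -> List[Triple]:
    # Hypothetical: mirror the UIMA type and span more or less directly.
    return [(ann["uri"], RDF_TYPE, EX + ann["type"]),
            (ann["uri"], EX + "begin", str(ann["begin"])),
            (ann["uri"], EX + "end", str(ann["end"]))]


@register("ao")
def annotation_ontology(ann: dict) -> List[Triple]:
    # Loosely modeled on AO; verify the property names against the AO spec.
    return [(ann["uri"], RDF_TYPE, AO + "Annotation"),
            (ann["uri"], AO + "hasTopic", ann["term"])]


def cas_to_rdf(annotations: Iterable[dict],
               mapping_strategy: str = "clerezza") -> List[Triple]:
    """Map every annotation with the strategy named in the configuration."""
    try:
        mapper = STRATEGIES[mapping_strategy]
    except KeyError:
        raise ValueError(f"no mapping strategy named {mapping_strategy!r}") from None
    return [t for ann in annotations for t in mapper(ann)]
```

Adding a new output model then means registering one more function, without touching the code that walks the annotations.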
  • 23. CONVERSION STRATEGIES
      Change mapping strategies via the XML/Eclipse plugin, or directly in the descriptor:
          <nameValuePair>
            <name>mappingStrategy</name>
            <value><string>ao</string></value>
          </nameValuePair>
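A component outside the UIMA runtime could read the configured value from such a descriptor with Python's standard `xml.etree.ElementTree`. The sketch assumes a trimmed descriptor in the `http://uima.apache.org/resourceSpecifier` namespace (real descriptors contain many more elements), and `read_string_parameter` is a hypothetical helper; inside UIMA itself the framework resolves configuration parameters for you.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Trimmed, illustrative descriptor: real UIMA descriptors carry many more elements.
SAMPLE_DESCRIPTOR = """<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <analysisEngineMetaData>
    <configurationParameterSettings>
      <nameValuePair>
        <name>mappingStrategy</name>
        <value><string>ao</string></value>
      </nameValuePair>
    </configurationParameterSettings>
  </analysisEngineMetaData>
</analysisEngineDescription>"""


def _local(tag: str) -> str:
    """Drop the namespace part of an ElementTree tag: '{ns}name' -> 'name'."""
    return tag.rsplit("}", 1)[-1]


def read_string_parameter(descriptor_xml: str, name: str) -> Optional[str]:
    """Return the <string> value of the named nameValuePair, or None if absent."""
    root = ET.fromstring(descriptor_xml)
    for pair in root.iter():
        if _local(pair.tag) != "nameValuePair":
            continue
        children = {_local(child.tag): child for child in pair}
        name_elem = children.get("name")
        # Compare with `is None`: childless Elements are falsy.
        if name_elem is None or (name_elem.text or "").strip() != name:
            continue
        value_elem = children.get("value")
        if value_elem is None:
            return None
        for typed in value_elem:
            if _local(typed.tag) == "string":
                return (typed.text or "").strip()
        return None
    return None
```

Stripping namespaces keeps the lookup working whether or not the descriptor declares the UIMA namespace.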
  • 25. LOOKING AHEAD DOMEO TOOLKIT V. 2 Paolo Ciccarese, PhD
  • 26. DOMEO ANNOTATION TOOLKIT V.2
      - Domeo Annotation Toolkit v.2 is planned for release by the end of the first quarter of 2012
      - It will consist of a major refactoring to improve modularity and make writing plug-ins easier
      - It will include various new features and will be the first step towards a federated architecture
      - It will be open source!
  • 27. DOMEO FEDERATION
      - We currently have two instances of the Domeo Toolkit, and the number of instances is going to increase
      - We need to define a clean architecture that supports communication between instances, or nodes
      - Instances should be able to access each other's annotations in multiple ways
  • 28. DOMEO FEDERATION
      [Diagram: annotation flow between four Domeo nodes (Nodes 1-4), each with a Domeo web client; nodes exchange annotations via web services and via SPARQL against a triplestore]
      Example: DT3 retrieves annotations from DT1 through a web service and from DT2 through a SPARQL query against its triplestore
  • 29. SOFTWARE ANNOTATION ACCESS
      Nodes can access the annotations of other nodes through:
      - Web services: annotation by user, by group, by document, by corpus, …
      - SPARQL queries, when a SPARQL endpoint is available
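For the SPARQL route, a node could build queries like the sketch below. It assumes annotations are linked to documents with `ao:annotatesResource` and to their creators with `pav:createdBy` (verify both against the AO specification), and the helper names are hypothetical. `query_url` follows the SPARQL 1.1 Protocol convention of passing the query in the `query` parameter of a GET request.

```python
from typing import Optional
from urllib.parse import urlencode

AO = "http://purl.org/ao/"      # Annotation Ontology namespace
PAV = "http://purl.org/pav/"    # provenance vocabulary assumed alongside AO

_FORBIDDEN = set('<>"{}|^`\\ \n\t')


def _iri(value: str) -> str:
    """Wrap a value as a SPARQL IRI, rejecting characters that could break the query."""
    if not value or _FORBIDDEN & set(value):
        raise ValueError(f"not a safe IRI: {value!r}")
    return f"<{value}>"


def annotations_query(document: Optional[str] = None,
                      user: Optional[str] = None, limit: int = 100) -> str:
    """Build a SELECT for annotations on a document and/or created by a user."""
    target = _iri(document) if document is not None else "?document"
    patterns = [f"?annotation ao:annotatesResource {target} ."]
    if user is not None:
        patterns.append(f"?annotation pav:createdBy {_iri(user)} .")
    body = "\n  ".join(patterns)
    return (f"PREFIX ao: <{AO}>\nPREFIX pav: <{PAV}>\n"
            f"SELECT DISTINCT ?annotation WHERE {{\n  {body}\n}}\nLIMIT {int(limit)}")


def query_url(endpoint: str, query: str) -> str:
    """GET URL for a SPARQL 1.1 Protocol endpoint."""
    return f"{endpoint}?{urlencode({'query': query})}"
```

Rejecting unsafe characters instead of concatenating raw input keeps a user-supplied document URI from injecting extra graph patterns.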
  • 30. USER ANNOTATION ACCESS
      Users can export their own annotations in AO RDF:
      - By document
      - By corpus
      - All of their annotations
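An export by document could emit N-Triples along the lines of the sketch below. The AO class and property names (`ao:Annotation`, `ao:annotatesResource`, `ao:hasTopic`, `ao:context`, and the `aos:` prefix/exact/postfix selector terms) are written as we understand them and should be verified against the AO specification; the dict-based input and `export_document` are hypothetical.

```python
from typing import Iterable, List

AO = "http://purl.org/ao/"              # AO core (assumed terms)
AOS = "http://purl.org/ao/selectors/"   # AO selectors (assumed terms)
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def _lit(text: str) -> str:
    """N-Triples string literal; backslash is escaped first so later escapes survive."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def export_document(annotations: Iterable[dict], document: str) -> str:
    """Serialize the annotations on one document as N-Triples."""
    lines: List[str] = []
    on_doc = (a for a in annotations if a["document"] == document)
    for n, ann in enumerate(on_doc):
        subj, sel = f"<{ann['uri']}>", f"_:sel{n}"   # blank node for the selector
        lines += [
            f"{subj} <{RDF_TYPE}> <{AO}Annotation> .",
            f"{subj} <{AO}annotatesResource> <{document}> .",
            f"{subj} <{AO}hasTopic> <{ann['term']}> .",
            f"{subj} <{AO}context> {sel} .",
            f"{sel} <{RDF_TYPE}> <{AOS}PrefixPostfixSelector> .",
            f"{sel} <{AOS}prefix> {_lit(ann['prefix'])} .",
            f"{sel} <{AOS}exact> {_lit(ann['exact'])} .",
            f"{sel} <{AOS}postfix> {_lit(ann['postfix'])} .",
        ]
    return "\n".join(lines) + ("\n" if lines else "")
```

Exporting "by corpus" or "all annotations" would reuse the same serialization with a different filter.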
  • 31. CURRENT DOMEO ARCHITECTURE
      [Diagram components: Domeo web client; user; annotation requests; annotation web services (AO-RDF); MySQL; annotation export; text mining UI; connector; NCBO web service; NCBO Annotator]
  • 32. DOMEO NODE ARCHITECTURE > ACCESSING EXTERNAL ANNOTATION
      [Diagram: as in slide 31, now a Domeo v.2 node, plus a triple store connector linking it to other external Domeo nodes (AO-RDF) and to external triplestores (SPARQL); steps 1-2]
  • 33. DOMEO NODE ARCHITECTURE > ADDING A SPARQL ENDPOINT
      [Diagram: as in slide 32, plus a local triplestore exposing a SPARQL endpoint on the Domeo v.2 node]
  • 34. DOMEO NODE ARCHITECTURE > TEXT MINING ALGORITHMS INTEGRATION
      [Diagram: as in slide 33, plus a Clerezza connector reaching a Clerezza web service (UIMA algorithm) and a text mining connector reaching a text mining library manager (text mining algorithm), alongside the NCBO connector; steps 1-4]
  • 35. DOMEO AND TEXT MINING IN SUMMARY
      - Run algorithms within Domeo:
          - by making the algorithms available through web services
          - by integrating the algorithms, as libraries, within the Domeo architecture
      - Run algorithms separately and then:
          - load the results into a Domeo node through web services
          - store the results directly in the triplestore (or a triplestore)
          - store the results directly in the database
  • 36. W3C COMMUNITY GROUP: OPEN ANNOTATION
      - Annotation Ontology (AO) and Open Annotation Collaboration (OAC) are merging
      - Unified model for representing and sharing annotation in RDF
      http://www.w3.org/community/openannotation/
  • 37. THANK YOU!
      If you are interested in using, or contributing to, the Domeo Annotation Toolkit, follow our website http://annotationframework.org or contact paolo.ciccarese -at- gmail.com