Argo: a platform for interoperable
and customisable text mining
Sophia Ananiadou
National Centre for Text Mining
School of Computer Science
The University of Manchester
Overview
• Sharing tools, resources and text mining workflows
• Challenges
• Interoperable infrastructure for processing and annotation
Open AIRE-COAR Conference, Ananiadou
NaCTeM
• 1st publicly funded national text mining centre
• Location: Manchester Institute of Biotechnology
• Phase I - Biology (2004-2008)
• Phase II - Biology, Medicine, Social Sciences (2008-2011)
• Phase III - Biology, Medicine, Humanities, Social Sciences; fully sustainable centre (2011-)
www.nactem.ac.uk
Challenges
Language Technology
• Languages: English, French, German, Spanish, Portuguese, Italian, Polish, …, Chinese, Hindi, Urdu, Japanese, Korean, …
• Tasks: Translation, Information Extraction, Semantic Search, Question Answering, Sentiment Analysis, Summarization, Knowledge Discovery, …
• Domains: Finance/Business, Health, Biology, Social Sciences, Humanities, …
• Text Types: Newswire; Scientific Literature; Full papers/abstracts; Twitter; Patents; Clinical records, EMR; Textbooks, monographs; Online forums; …
• Technology: Sentence Splitter, Paragraph Splitter, NP Chunkers, C-parser, D-parser, Semantic parser, NE recognizers, Relation recognizers, …
Diversity of Languages
Diversity of Contexts
Diversity of Applications
TM Workflows
TM Modules
Shared!
Metadata
• Languages: English, French, German, Spanish, Portuguese, Italian, Polish, …, Chinese, Hindi, Urdu, Japanese, Korean, …
• Tasks: Translation, Information Extraction, Semantic Search, Question Answering, Sentiment Analysis, Summarization, Knowledge Discovery, …
• Text Types: Newswire; Scientific Literature; Full papers/abstracts; Twitter; Patents; Clinical records, EMR; Textbooks, monographs; Online forums; …
• Domains: Finance/Business, Health, Biology, Social Sciences, Humanities, …
Language Technology
Linguistic Resources
Knowledge Resources
Resource-Rich
• Big Data, Big Text
• Cloud Computing, Crowd Sourcing
• Big Ontology
OPEN SCIENCE
Requirements from TM infrastructure
• Modularity of TM modules
• Interoperability among TM modules and resources
• Generic across different languages, domains, and text types
– Adaptability
Interoperability and Adaptability
• Modules: Module, Module, Module
• Resources: Dictionaries, Ontologies
• Adaptation: Rule Writing, (Annotated) Text
Interoperability and Adaptability in Resource-rich TM INFRASTRUCTURES!
• Components: Dependency Parser, POS Tagger, Named Entity
• Languages: English, French, German, Japanese, Greek
• Text Types
• Domains
Example: extracting proteins, annotations
• Corpora: GENIA, PennBioIE, AIMed, GENETAG
• Incompatibility: type definitions, texts
• Problem: Inconsistency
The problem with incompatibility
• Difficult to evaluate NERs
• Which NER (NER A or NER B) is best for my task?
• Two corpora (Corpus C, Corpus D) give opposite answers:
– On one corpus, A: 93%, B: 36%. A is better than B.
– On the other, A: 63%, B: 90%. B is better than A.
• Why are the results so different across corpora and NERs?
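One way such contradictory rankings can arise is that corpora follow different annotation conventions. The following is a minimal Python sketch, not taken from the slides: the sentence, the spans, and the `prf1` helper are all made up for illustration. It scores the same NER output by exact span match against two gold standards that disagree on entity boundaries.

```python
def prf1(predicted, gold):
    """Exact-match precision, recall and F1 over sets of (start, end, type) spans."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical sentence used only for character offsets.
text = "IL-2 receptor binds the p53 protein"

# Convention 1 annotates the bare name; convention 2 includes the head noun.
gold_convention_1 = {(0, 4, "Protein"), (24, 27, "Protein")}    # "IL-2", "p53"
gold_convention_2 = {(0, 13, "Protein"), (24, 35, "Protein")}   # "IL-2 receptor", "p53 protein"

# Output of a hypothetical NER that follows convention 1.
ner_output = {(0, 4, "Protein"), (24, 27, "Protein")}

score_1 = prf1(ner_output, gold_convention_1)
score_2 = prf1(ner_output, gold_convention_2)
```

Under exact matching, the same output is perfect against one convention and scores zero against the other, even though it found both entities in each case. Real evaluations often add relaxed (overlap) matching to soften this effect.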
Text mining workflows
• A pipeline that executes particular tools and resources in order
• Example: semantic search
• Various versions (language- or domain-specific) of basic components needed for different applications and tasks
• Different workflows can be created, compared and evaluated by the ability to seamlessly “mix and match” various versions of components
• Components shown: PoS Tagger, Dictionary Lookup, NE Extraction, Chunking, Parsing, Semantic Query
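The idea of a workflow as components executed in order can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the toy `tokenize` and `dictionary_lookup` components, the term list, and the dictionary-based document record are all hypothetical and are not how any particular platform represents its data.

```python
from functools import reduce

def tokenize(doc):
    """Toy tokenizer: split the text on whitespace."""
    return {**doc, "tokens": doc["text"].split()}

def dictionary_lookup(doc, dictionary=frozenset({"p53", "IL-2"})):
    """Toy dictionary lookup: keep tokens found in a (hypothetical) term list."""
    return {**doc, "entities": [t for t in doc["tokens"] if t in dictionary]}

def run_pipeline(components, text):
    """Apply each component in order, passing the growing annotation record along."""
    return reduce(lambda doc, component: component(doc), components, {"text": text})

workflow = [tokenize, dictionary_lookup]
result = run_pipeline(workflow, "IL-2 activates p53")
```

Because every component takes and returns the same kind of record, swapping in another version of a component amounts to replacing one entry of the `workflow` list.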
Text mining workflows
Interoperability
Common Data Representation and Types
Kano, Y., Miwa, M., Cohen, K. B., Hunter, L., Ananiadou, S. and Tsujii, J. (2011). U-Compare: a modular NLP workflow construction and evaluation system. IBM Journal of Research and Development.
Common Type System
• A common type system is required for complete interoperability
• A single common type system is almost impossible to impose on all developers
• Solution: Maintain local type systems and bridge them via a sharable type system
• Diagram: Local Type System A and Local Type System B are each bridged to the U-Compare Sharable Type System
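Bridging can be pictured as a mapping from each local type system into the shared one. The sketch below is a simplified Python illustration; the local type names, the shared type names, and the `to_shared` helper are hypothetical and are not the actual U-Compare types.

```python
# Hypothetical shared types; only the bridging idea comes from the slide.
SHARED_TYPES = {"Sentence", "Token", "NamedEntity"}

# One bridge per local type system, mapping local names to shared names.
BRIDGE_A = {"sent": "Sentence", "tok": "Token", "protein": "NamedEntity"}
BRIDGE_B = {"S": "Sentence", "W": "Token", "GENE_OR_PROTEIN": "NamedEntity"}

def to_shared(annotations, bridge):
    """Map (start, end, local_type) annotations to shared types; drop unmapped ones."""
    return [(start, end, bridge[t]) for start, end, t in annotations if t in bridge]

from_a = to_shared([(0, 4, "protein"), (0, 18, "sent")], BRIDGE_A)
from_b = to_shared([(0, 4, "GENE_OR_PROTEIN"), (0, 18, "S"), (0, 18, "PARA")], BRIDGE_B)
```

Once both outputs are expressed in the shared types they can be compared or passed to the same downstream component, while each tool keeps its own local types internally.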
U-Compare Type System
Syntactic Level
Document Level
Semantic Level
U-Compare: Evaluate and Compare TM Workflows
• Library: Sentence Splitter A, Sentence Splitter B, POS tagger A, POS tagger B, NER
• Workflows A, B and C combine different sentence splitters and POS taggers with the NER
• Each workflow is evaluated: F-Score A, F-Score B, F-Score C
• Sentence detectors (SD): UIMA SD, OpenNLP SD, GENIA SD
• Tokenizers: UIMA Tokenizer, OpenNLP Tokenizer, GENIA Tagger as Tokenizer
• Taggers: GENIA Tagger, Stepp Tagger, OpenNLP Tagger
• NERs: ABNER, MedT-NER, GENIA Tagger as NER
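The "mix and match" comparison can be sketched as enumerating one component per stage and ranking the resulting workflows by F-score. The Python below is a rough illustration: the component names are those on the slide, but grouping them into four stages, assuming every combination is compatible, and the `rank_by_f_score` helper are assumptions made here.

```python
from itertools import product

# Component names as listed on the slide; the grouping into stages is an assumption.
stages = {
    "sentence_detector": ["UIMA SD", "OpenNLP SD", "GENIA SD"],
    "tokenizer": ["UIMA Tokenizer", "OpenNLP Tokenizer", "GENIA Tagger as Tokenizer"],
    "pos_tagger": ["GENIA Tagger", "Stepp Tagger", "OpenNLP Tagger"],
    "ner": ["ABNER", "MedT-NER", "GENIA Tagger as NER"],
}

# One candidate workflow per combination of one component from each stage.
workflows = [dict(zip(stages, combo)) for combo in product(*stages.values())]

def rank_by_f_score(candidates, evaluate):
    """Sort workflows by the F-score returned by a caller-supplied `evaluate`."""
    return sorted(candidates, key=evaluate, reverse=True)
```

Here `evaluate` stands for whatever runs a workflow on a gold standard corpus and returns its F-score; it is left to the caller because the slides do not specify it.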
Argo
• Web-based application
• Interactive creation of workflows
• Cloud and high-performance computing
• Integrated TM/NLP processing system
• GUI for workflow creation
• Library of ready-to-use processing components
• Statistics, visualizations, developer APIs
• Supports UIMA
• http://argo.nactem.ac.uk
Rak, R., Rowley, A., Black, W.J. and Ananiadou, S. (2012). Argo: an integrative, interactive, text mining-based workbench supporting curation. Database: The Journal of Biological Databases and Curation.
Argo users and features (diagram)
• Users: Workflow Designer, Annotator/Curator, Developers
• Features: Structured Data, Remote Processing, Workflow Diagramming, Manual Editing, Processing Components, UIMA Compliance
Processing Components
• Approaching 100 components (U-Compare)
– Additional 50 will be added soon
• META-NET
• Developed or co-developed by NaCTeM
– Planned: Make the library open to others to contribute
• Generic Listener component
– Developers can plug in their own locally run UIMA component to a workflow in Argo
Remote Processing
• Single machine execution
– In-house high-performance machines
• Distributed processing
– HTCondor
– VMware vCloud (EBI) EUPMC
– Planned: EC2, Azure, …
Workflows
• Users create workflows as block diagrams
• Workflows can be shared among users
– Read only
– Planned: Read & write
– Planned: downloadable workflows
• Workflows can be deployed as web services
– Plain text (input only), XMI, RDF, BioC
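A block-diagram workflow can be thought of as a directed acyclic graph in which each block runs after the blocks that feed it. The Python sketch below uses the standard library's `graphlib` (Python 3.9+) to derive a valid execution order. The block names and the graph layout are hypothetical and do not reflect Argo's internal representation.

```python
from graphlib import TopologicalSorter

# Each block maps to the set of blocks whose output it consumes (hypothetical layout).
workflow = {
    "reader": set(),
    "sentence_splitter": {"reader"},
    "pos_tagger": {"sentence_splitter"},
    "ner_a": {"pos_tagger"},
    "ner_b": {"pos_tagger"},
    "evaluator": {"ner_a", "ner_b"},
}

# A valid execution order runs every block after all of its inputs.
order = list(TopologicalSorter(workflow).static_order())
```

The two NER branches have no dependency on each other, so they can appear in either order, or run in parallel on a distributed back end.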
Workflows view
Workflow Editor
Sample Use Cases
1 Recognition of chemical entities (chemical NER)
2 Semi-automatic curation of metabolic pathways
3 Evaluation of inter-annotator agreement
4 Information extraction as a Web service
Use Case 1: Chemical NER
• Supplies gold standard corpus
• Removes gold annotations so that they can be created automatically
• Combinations of syntactic and semantic components create annotations
• Compares and reports precision, recall and F1 of the different branches against the gold standard corpus
Chemical Entity Recogniser
• Chemical model evaluated at BioCreative IV
CHEMDNER challenge
• The challenge
– Data: 10,000 manually annotated PubMed abstracts
– Task: automatically recognise names of chemical entities in text
Chemical Entity Recogniser
• Our solution
– Ranked unique mentions: ranked 1st out of 18 groups
– All mentions: ranked 3rd out of 19 groups
Subtask                   Precision %   Recall %   F-score %
Ranked unique mentions    91            85         88
All mentions              93            81         87
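The reported F-scores are consistent with the balanced F-score (F1), the harmonic mean of precision and recall. The short Python check below assumes that the slide's F-score is F1 and that the precision and recall figures are the rounded percentages shown; the `f_score` helper is introduced here only for illustration.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall percentages as reported on the slide.
ranked_unique = f_score(91, 85)
all_mentions = f_score(93, 81)
```

Rounded to whole percentages, the two values come out at 88 and 87, matching the F-score column.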
Use Case 2: Semi-automatic Curation – Metabolic Pathways
• Search for relevant documents
• Manual correction of automatic annotations
• NER for chemicals, genes, process indicators
• Linking to ontologies: CTD, ChEBI, UniProt
• Save results in various formats, e.g., RDF for querying and incorporation into databases
Manual Annotation Editor
• Create new annotations by selecting text
• Create, modify or delete annotations
• Edit details of annotations
• Open a graphical interface to link annotations to ontologies
Filtering and converting annotations
Manual Annotation Editor: linking to ontologies
• Automatic pre-selection can be modified by the user
• Details show ontology entry webpage
Use Case 3: Information extraction as a Web service
• Web service-enabled reader
• Web service-enabled writer
Language Universal
• Reusable modules
• Generic TM modules: Competence
• Annotated Text, corpora: Performance
• Standards of Data Representation and Types for Resources: Competence
• Dictionaries, Thesauri, Ontologies: Performance

Source: OpenAIRE-COAR conference 2014: Argo - a platform for interoperable and customisable text analytics, by Sophia Ananiadou - University of Manchester

08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Último (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

OpenAIRE-COAR conference 2014: Argo - a platform for interoperable and customisable text analytics, by Sophia Ananiadou - University of Manchester

  • 1. Argo: a platform for interoperable and customisable text mining. Sophia Ananiadou, National Centre for Text Mining, School of Computer Science, The University of Manchester
  • 2. Overview • Sharing tools, resources and text mining workflows • Challenges • Interoperable infrastructure for processing and annotation
  • 3. NaCTeM • 1st publicly funded national text mining centre • Location: Manchester Institute of Biotechnology • Phase I - Biology (2004-2008) • Phase II - Biology, Medicine, Social Sciences (2008-2011) • Phase III - Biology, Medicine, Humanities, Social Sciences; fully sustainable centre (2011-) • www.nactem.ac.uk
  • 4. Challenges (Language Technology) • Languages: English, French, German, Spanish, Portuguese, Italian, Polish, …, Chinese, Hindi, Urdu, Japanese, Korean, … • Tasks: Translation, Information Extraction, Semantic Search, Question Answering, Sentiment Analysis, Summarization, Knowledge Discovery, … • Domains: Finance/Business, Health, Biology, Social Sciences, Humanities, … • Text types: Newswire, Scientific Literature, Full papers/abstracts, Twitter, Patents, Clinical records (EMR), Textbooks, monographs, Online forums, … • Technology: Sentence Splitter, Paragraph Splitter, NP Chunkers, C-parser, D-parser, Semantic parser, NE recognizers, Relation recognizers, … • Diversity of Languages, Diversity of Contexts, Diversity of Applications • TM Workflows and TM Modules: Shared!
  • 5. Metadata • Languages: English, French, German, Spanish, Portuguese, Italian, Polish, …, Chinese, Hindi, Urdu, Japanese, Korean, … • Tasks: Translation, Information Extraction, Semantic Search, Question Answering, Sentiment Analysis, Summarization, Knowledge Discovery, … • Text types: Newswire, Scientific Literature, Full papers/abstracts, Twitter, Patents, Clinical records (EMR), Textbooks, monographs, Online forums, … • Domains: Finance/Business, Health, Biology, Social Sciences, Humanities, … • Resource-rich: Language Technology, Linguistic Resources, Knowledge Resources • Big Data, Big Text, Big Ontology, Cloud Computing, Crowd Sourcing • OPEN SCIENCE
  • 6. Requirements from TM infrastructure • Modularity of TM modules • Interoperability among TM modules and resources • Generic across different languages, domains, and text types – Adaptability
  • 7. Module Interoperability and Adaptability • Interoperability and adaptability in resource-rich TM infrastructures • Diagram labels: modules (POS Tagger, Named Entity, Dependency Parser); resources (dictionaries, ontologies, (annotated) text); adaptation (rule writing); languages (English, French, German, Japanese, Greek), text types, domains
  • 8. Example: extracting proteins, annotations • Corpora: GENIA, PennBioIE, AIMed, GENETAG • Incompatibility: type definitions, texts • Problem: inconsistency
  • 9. The problem with incompatibility • Difficult to evaluate NERs: which NER is best for my task? • Two NERs (A, B) evaluated on two corpora (C, D): on one corpus A scores 93% and B 36% (A is better than B); on the other A scores 63% and B 90% (B is better than A) • Why do results differ so much across corpora and NERs?
  • 10. Text mining workflows • A pipeline that executes particular tools and resources in order • Example: semantic search • Various versions (language- or domain-specific) of basic components needed for different applications and tasks • Different workflows can be created, compared and evaluated by the ability to seamlessly “mix and match” various versions of components • Diagram components: PoS Tagger, Dictionary Lookup, NE Extraction, Chunking, Parsing, Semantic Query
  • 11. Text mining workflows • Interoperability • Common data representation and types • Reference: Kano, Y., Miwa, M., Cohen, K. B., Hunter, L., Ananiadou, S. and Tsujii, J. (2011). U-Compare: a modular NLP workflow construction and evaluation system. IBM Journal of Research and Development.
  • 12. Common Type System • A common type system is required for complete interoperability • A single common type is almost impossible to impose on all developers • Solution: maintain local type systems and bridge them via a sharable type system (Local Type System A and Local Type System B are each bridged to the U-Compare Sharable Type System)
  • 13. U-Compare Type System • Syntactic Level • Document Level • Semantic Level
  • 14. U-Compare: Evaluate and Compare TM Workflows • Components from a library (Sentence Splitter A, Sentence Splitter B, POS tagger A, POS tagger B, NER) are combined into Workflow A, Workflow B and Workflow C, each scored by its own F-score • Example components: UIMA SD, OpenNLP SD, GENIA SD; UIMA Tokenizer, OpenNLP Tokenizer, GENIA Tagger as Tokenizer; GENIA Tagger, Stepp Tagger, OpenNLP Tagger; ABNER, MedT-NER, GENIA Tagger as NER
  • 15. • Web-based application • Interactive creation of workflows • Cloud and high-performance computing • Integrated TM/NLP processing system • GUI for workflow creation • Library of ready-to-use processing components • Statistics, visualizations, developer APIs • Supports UIMA • http://argo.nactem.ac.uk • Reference: Rak, R., Rowley, A., Black, W.J. and Ananiadou, S. (2012). Argo: an integrative, interactive, text mining-based workbench supporting curation. Database: The Journal of Biological Databases and Curation.
  • 17. Processing Components • Approaching 100 components (U-Compare) – Additional 50 will be added soon • META-NET • Developed or co-developed by NaCTeM – Planned: make the library open to others to contribute • Generic Listener component – Developers can plug their own locally run UIMA component into a workflow in Argo
  • 18. Remote Processing • Single machine execution – In-house high-performance machines • Distributed processing – HTCondor – VMware vCloud (EBI) EUPMC – Planned: EC2, Azure, …
  • 19. Workflows • Users create workflows as block diagrams • Workflows can be shared among users – Read only – Planned: read & write – Planned: downloadable workflows • Workflows can be deployed as web services – Plain text (input only), XMI, RDF, BioC
  • 20. Workflows view
  • 22. Sample Use Cases • (1) Recognition of chemical entities (chemical NER) • (2) Semi-automatic curation of metabolic pathways • (3) Evaluation of inter-annotator agreement • (4) Information extraction as a Web service
  • 23. Use Case 1: Chemical NER • Supplies gold standard corpus • Removes gold annotations so that they can be created automatically • Combinations of syntactic and semantic components create annotations • Compares and reports precision, recall and F1 of the different branches against the gold standard corpus
  • 24. Chemical Entity Recogniser • Chemical model evaluated at BioCreative IV CHEMDNER challenge • The challenge – Data: 10,000 manually annotated PubMed abstracts – Automatically recognise names of chemical entities in text
  • 25. Chemical Entity Recogniser • Our solution – Ranked unique mentions: ranked 1st out of 18 groups – All mentions: ranked 3rd out of 19 groups • Results (precision / recall / F-score, %): ranked unique mentions 91 / 85 / 88; all mentions 93 / 81 / 87
  • 26. Use Case 2: Semi-automatic Curation – Metabolic Pathways • Search for relevant documents • Manual correction of automatic annotations • NER for chemicals, genes, process indicators • Linking to ontologies: CTD, ChEBI, UniProt • Save results in various formats, e.g., RDF for querying and incorporation into databases
  • 27. Manual Annotation Editor • Create new annotations by selecting text • Create, modify or delete annotations • Edit details of annotations • Open a graphical interface to link annotations to ontologies
  • 28. Filtering and converting annotations
  • 29. Manual Annotation Editor: linking to ontologies • Automatic pre-selection can be modified by the user • Details show ontology entry webpage
  • 30. Use Case 3: Information extraction as a Web service • Web service-enabled reader • Web service-enabled writer
  • 31. Language Universal • Reusable modules • Generic TM modules: Competence • Annotated text, corpora: Performance • Standards of Data Representation and Types for Resources: Competence • Dictionaries, Thesauri, Ontologies: Performance
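
Slides 9, 14 and 23 describe comparing NER components or workflow branches by precision, recall and F-score against a gold standard corpus. The following is a minimal Python sketch of that kind of comparison, not Argo's or U-Compare's actual implementation. It assumes exact-match scoring over (start, end, label) spans; the names `f1_score`, `evaluate`, `gold` and `branches`, and all span values, are hypothetical and chosen only for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale, e.g. 0-1 or %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted, gold):
    """Exact-match precision, recall and F1 for two sets of annotation spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical gold standard: (start, end, label) spans for chemical mentions.
gold = {(0, 7, "CHEM"), (12, 19, "CHEM"), (25, 30, "CHEM")}

# Hypothetical outputs of two workflow branches run on the same text.
branches = {
    "NER A": {(0, 7, "CHEM"), (12, 19, "CHEM"), (40, 44, "CHEM")},
    "NER B": {(0, 7, "CHEM")},
}

for name, predicted in branches.items():
    p, r, f = evaluate(predicted, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")

# Consistency check on the percentages reported on slide 25.
print(round(f1_score(91, 85)))  # ranked unique mentions: 88
print(round(f1_score(93, 81)))  # all mentions: 87
```

Taking the F-score as the harmonic mean of precision and recall, the percentages reported on slide 25 are self-consistent: 91 and 85 give 88 after rounding, and 93 and 81 give 87. Exact span matching is only one possible criterion; a different matching rule, or a corpus with different type definitions, would change the scores, which is the incompatibility problem raised on slides 8 and 9.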