Entity Enrichment and Consolidation
in ARCOMEM
Elena Demidova1,
including slides by: Stefan Dietze1, Diana Maynard2, Thomas Risse1, Wim
Peters2, Katerina Doka3, Yannis Stavrakas3
1 L3S Research Center, Hannover, Germany
2 University of Sheffield, UK
3 IMIS, RC ATHENA, Athens, Greece
The ARCOMEM approach
• Make use of the Social Web
– Huge source of user generated content
– Wide range of articulation methods
From simple "I like it" buttons to complete articles
– Represents the diversity of opinions of the public
• User activities often triggered by
– Events and related entities
(e.g. Sport Events, Celebrations,
Crises, News Articles, Persons,
Locations)
– Topics (e.g. Global Warming,
Financial Crisis, Swine Flu)
=> A semantic-aware and socially-driven
preservation model is a natural way to go
Slide 2
ARCOMEM architecture
Slide 3
[Figure: ARCOMEM architecture diagram with four levels: Crawler, Online Processing, Offline Processing and Cross Crawl Analysis. Components shown include Crawler Cockpit, Intelligent Crawl Definition, Queue Management, Application-Aware Helper, Resource Selection & Prioritization, Resource Fetching, URLs, GATE Online Analysis, Social Web Analysis, Relevance Analysis & Prioritization, ARCOMEM Storage, GATE Offline Analysis, Image/Video Analysis, Extracted Social Web Information, Enrichment, Consolidation, Named Entity Evolution Recognition, Twitter Dynamics, WARC Export (WARC files) and Applications (Broadcaster Application, Parliament Application).]
The ARCOMEM system architecture comprises four processing
levels: the crawler level, the online processing level, the offline
processing level and the cross-crawl analysis level.
Slide 4
ETOE offline processing chain
The processing chain depicted here describes all components involved in
the offline processing of Web objects.
The extraction components for text
Aim
• Extraction of Entities, Topics, Events and Opinions (ETOEs) from
– Web pages
– Social Web (Twitter, YouTube, Facebook, …)
Challenges
• Entity recognition from degraded input sources (tweets, etc.)
• Advancing state-of-the-art NLP and text mining
• Dynamics detection: evolution of terms/entities
• Semantic representation of Web objects and entities
• Appropriate RDF schemas for ETOEs and Web objects
• Exploiting (Linked Open) Web data to enrich extracted ETOEs
• Entity classification (into events, locations, topics, etc.) & consolidation
Slide 5
ETOE extraction with GATE: an example
Slide 6
[Figure: screenshot of ETOE annotations in GATE; one highlighted annotation is labelled "candidate multi-word term".]
Data consolidation & integration problem
Data extracted by different components or during different processing
cycles is not aligned
=> consolidation, disambiguation & correlation are required.
Slide 7
<Location>Greece</Location>
<Person>Venizelos</Person>
<Location>Griechenland</Location>
<Organisation>Greek Parliament</Organisation>
?
Data clustering & enrichment
Enrichment of entities with references to related Linked Data resources,
particularly in reference datasets (DBpedia, Freebase, …)
=> use enrichments for correlation/clustering/consolidation
Slide 8
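A minimal sketch of how enrichments can drive consolidation, assuming each extracted entity has already been enriched with zero or more Linked Data URIs: entities that share a URI are merged with union-find, so "Greece" and "Griechenland" from slide 7 end up in one cluster once both resolve to the same DBpedia resource. The mapping below is hypothetical sample data, and union-find is a generic technique chosen for illustration, not necessarily what ARCOMEM implements.

```python
from collections import defaultdict


def cluster_by_enrichment(enrichments: dict[str, set[str]]) -> list[set[str]]:
    """Group entity mentions that share at least one enrichment URI.

    `enrichments` maps an entity mention (e.g. 'Location:Greece') to the set
    of Linked Data URIs it was enriched with.
    """
    parent = {entity: entity for entity in enrichments}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    # Merge every mention with the first mention seen for the same URI.
    first_owner: dict[str, str] = {}
    for entity, uris in enrichments.items():
        for uri in uris:
            if uri in first_owner:
                union(entity, first_owner[uri])
            else:
                first_owner[uri] = entity

    clusters: dict[str, set[str]] = defaultdict(set)
    for entity in enrichments:
        clusters[find(entity)].add(entity)
    return list(clusters.values())


# Hypothetical enrichment results for the slide 7 annotations.
sample = {
    "Location:Greece": {"http://dbpedia.org/resource/Greece"},
    "Location:Griechenland": {"http://dbpedia.org/resource/Greece"},
    "Person:Venizelos": set(),  # no enrichment in this made-up sample
    "Organisation:Greek Parliament": {"http://dbpedia.org/resource/Hellenic_Parliament"},
}

for cluster in cluster_by_enrichment(sample):
    print(sorted(cluster))
```

Counting the clusters this returns over a full crawl is one way to arrive at figures like the "1013 clusters" reported later in the deck.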
Enrichment with DBpedia & Freebase
• DBpedia and Freebase are particularly well suited because of their vast size,
the availability of disambiguation techniques that can exploit the many
multilingual labels attached to individual data items in both datasets, and
their high degree of inter-connectedness, which allows a wealth of related
information to be retrieved for a particular item.
• For DBpedia, we use the DBpedia Spotlight service, which performs
approximate string matching with a confidence level adjustable in the
interval [0,1]. Based on experiments, we set the confidence to 0.6.
• For Freebase, we use structured queries, taking into
account entity types extracted by GATE.
Slide 9
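The slide does not show how Spotlight is called, so the following is a hedged sketch: it builds an `annotate` request with the confidence of 0.6 mentioned above and extracts (surface form, URI) pairs from the JSON response. The endpoint URL is today's public DBpedia Spotlight service and is an assumption; ARCOMEM in 2012 may have used a different deployment. Only `annotate()` needs network access.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; not necessarily the deployment ARCOMEM used.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text: str, confidence: float = 0.6) -> urllib.request.Request:
    """Build a Spotlight annotate request that asks for JSON output."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        f"{SPOTLIGHT_URL}?{query}", headers={"Accept": "application/json"}
    )


def parse_resources(payload: str) -> list[tuple[str, str]]:
    """Return (surface form, DBpedia URI) pairs from a Spotlight JSON response."""
    data = json.loads(payload)
    # The "Resources" key is absent when nothing passes the confidence threshold.
    return [(r["@surfaceForm"], r["@URI"]) for r in data.get("Resources", [])]


def annotate(text: str, confidence: float = 0.6) -> list[tuple[str, str]]:
    """Call the live service (requires network access)."""
    with urllib.request.urlopen(build_request(text, confidence), timeout=10) as resp:
        return parse_resources(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Made-up response fragment in the shape Spotlight returns.
    canned = (
        '{"Resources": [{"@surfaceForm": "Trichet", '
        '"@URI": "http://dbpedia.org/resource/Jean-Claude_Trichet"}]}'
    )
    print(parse_resources(canned))
```

Keeping request construction and response parsing separate makes it possible to test the parsing offline with stored responses.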
<Event>Trichet warns of systemic debt crisis</Event>
<Person>Jean Claude Trichet</Person> <Organisation>ECB</Organisation>
Enrichment for clustering & correlation: example
Slide 10
<Enrichment>http://dbpedia.org/resource/Jean-Claude_Trichet</Enrichment>
<Enrichment>http://dbpedia.org/resource/ECB</Enrichment>
Enrichment for clustering & correlation: example
Slide 11
=> dbpprop:office dbpedia:President_of_the_European_Central_Bank
dbpedia:Governor_of_the_Banque_de_France
=> dcterms:subject category:Living_people
category:Karlspreis_recipients
category:Alumni_of_the_École_Nationale_d'Administration
category:People_from_Lyon…
Enrichment for clustering & correlation: example
Slide 12
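To make the example more concrete, the sketch below builds a SPARQL query for the two properties shown above (`dbpprop:office`, in the `http://dbpedia.org/property/` namespace, and `dcterms:subject`). It then intersects the facts of two enrichments to find shared values that could support correlation. The public endpoint URL and the intersection step are assumptions for illustration, and the properties actually returned depend on the DBpedia release.

```python
import json
import urllib.parse

# Assumed public endpoint.
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"

# Doubled braces are literal braces for str.format.
PROPERTIES_QUERY = """\
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?p ?o WHERE {{
  VALUES ?p {{ dbpprop:office dcterms:subject }}
  <{uri}> ?p ?o .
}}"""


def build_query_url(resource_uri: str) -> str:
    """GET URL asking the endpoint for SPARQL JSON results."""
    query = PROPERTIES_QUERY.format(uri=resource_uri)
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return f"{DBPEDIA_SPARQL}?{params}"


def parse_bindings(payload: str) -> set[tuple[str, str]]:
    """Turn a SPARQL JSON result into a set of (property, value) facts."""
    data = json.loads(payload)
    return {(b["p"]["value"], b["o"]["value"]) for b in data["results"]["bindings"]}


def shared_facts(a: set[tuple[str, str]], b: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Facts two enrichments have in common (e.g. the same dcterms:subject category)."""
    return a & b


print(build_query_url("http://dbpedia.org/resource/Jean-Claude_Trichet"))
```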
ARCOMEM entities and enrichments - graph
Slide 13
• Nodes: entities/events (blue), DBpedia enrichments (green), Freebase enrichments (orange)
• 1013 clusters of correlated entities/events
• 1013 clusters of correlated entities/events => cluster expansion by considering related enrichments
ARCOMEM entities and enrichments - graph
Slide 14
Clustering of entities via enrichment relatedness
Discovery of “related” entities by finding related enrichments:
(a) Retrieving possible paths between two enrichments (e.g. via RelFinder
http://www.visualdataweb.org/relfinder.php)
(b) Computing a relatedness measure (considering variables such as the shortest
path, the number of paths, the relationship types, the number of edges directly
connected to each enrichment…)
(c) Clustering the enrichments (entities) whose relatedness is above a certain threshold
Slide 15
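Steps (a) to (c) can be sketched on a small in-memory graph that stands in for the Linked Data RelFinder would explore. The slide lists the variables but not the formula, so the score here is hypothetical: the inverse of the shortest path length plus a small bonus for each additional path. Relationship types and node degree are ignored, and the edges and threshold are made up for illustration.

```python
from collections import deque
from itertools import combinations

# Undirected adjacency sets standing in for the Linked Data graph.
Graph = dict[str, set[str]]


def undirected(edges: list[tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from (u, v) pairs."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def simple_paths(graph: Graph, src: str, dst: str, max_len: int) -> list[list[str]]:
    """Step (a): all cycle-free paths from src to dst with at most max_len edges."""
    paths: list[list[str]] = []
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            paths.append(path)
            continue
        if len(path) - 1 >= max_len:
            continue  # path already has max_len edges
        for nxt in graph.get(node, ()):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return paths


def relatedness(graph: Graph, a: str, b: str, max_len: int = 3) -> float:
    """Step (b), hypothetical score: 1 / shortest path length,
    plus 0.1 for every additional path found."""
    paths = simple_paths(graph, a, b, max_len)
    if not paths:
        return 0.0
    shortest = min(len(p) - 1 for p in paths)
    return 1.0 / shortest + 0.1 * (len(paths) - 1)


def cluster(graph: Graph, nodes: list[str], threshold: float, max_len: int = 3) -> list[set[str]]:
    """Step (c): connected components of the 'relatedness >= threshold' graph."""
    related: dict[str, set[str]] = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if relatedness(graph, a, b, max_len) >= threshold:
            related[a].add(b)
            related[b].add(a)
    seen: set[str] = set()
    clusters: list[set[str]] = []
    for start in nodes:
        if start in seen:
            continue
        component: set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            current = queue.popleft()
            component.add(current)
            for nxt in related[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        clusters.append(component)
    return clusters


# Hypothetical edges, loosely based on the Trichet example.
graph = undirected([
    ("dbpedia:Jean-Claude_Trichet", "dbpedia:President_of_the_European_Central_Bank"),
    ("dbpedia:President_of_the_European_Central_Bank", "dbpedia:ECB"),
    ("dbpedia:Jean-Claude_Trichet", "category:People_from_Lyon"),
    ("dbpedia:Greece", "dbpedia:Hellenic_Parliament"),
])
nodes = ["dbpedia:Jean-Claude_Trichet", "dbpedia:ECB", "dbpedia:Greece", "dbpedia:Hellenic_Parliament"]
print(cluster(graph, nodes, threshold=0.4))
```

Enumerating all simple paths grows quickly with `max_len`, which is why the sketch caps path length, much as path-finding tools typically limit search depth.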
RDF schema for the Knowledge Base
Slide 16
• Relationships between ARCOMEM entities (ETOEs, etc.) and enrichments
• RDF schema: http://www.gate.ac.uk/ns/ontologies/arcomem-data-model.rdf
Enrichment evaluation results
• Manual evaluation of 240 enrichment-entity pairs
• Available scores: 1 (correct), 0 (incorrect), 0.5 (vague or ambiguous relationship)
Slide 17
Entity type          Average score   Average score   Average score
                     DBpedia         Freebase        Total
arco:Event           0.71            –               0.71
arco:Location        0.81            0.94            0.88
arco:Money           0.67            –               0.67
arco:Organization    0.93            1               0.97
arco:Person          0.9             0.89            0.89
arco:Time            0.74            –               0.74
Total                0.79            0.94            0.87
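The averages in the table come from manual scores of 1, 0 or 0.5 per enrichment-entity pair. A minimal sketch of that aggregation follows, with made-up judgements because the individual scores are not published here. As a side check, the per-source "Total" values appear to match unweighted means of the per-type averages: (0.71 + 0.81 + 0.67 + 0.93 + 0.9 + 0.74) / 6 ≈ 0.79 for DBpedia and (0.94 + 1 + 0.89) / 3 ≈ 0.94 for Freebase. This is an inference from the numbers, not something the slide states.

```python
from collections import defaultdict
from statistics import mean

# Scores allowed in the manual evaluation (slide 17).
ALLOWED = {0.0, 0.5, 1.0}


def average_scores(judgements: list[tuple[str, str, float]]) -> dict[tuple[str, str], float]:
    """Mean score per (entity type, enrichment source), rounded to 2 decimals."""
    by_key: dict[tuple[str, str], list[float]] = defaultdict(list)
    for entity_type, source, score in judgements:
        if score not in ALLOWED:
            raise ValueError(f"unexpected score {score}")
        by_key[(entity_type, source)].append(score)
    return {key: round(mean(scores), 2) for key, scores in by_key.items()}


# Per-type averages copied from the table above.
dbpedia_rows = [0.71, 0.81, 0.67, 0.93, 0.9, 0.74]
freebase_rows = [0.94, 1.0, 0.89]
print(round(mean(dbpedia_rows), 2), round(mean(freebase_rows), 2))
```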
Further reading
• S. Dietze, D. Maynard, E. Demidova, T. Risse, W. Peters, K. Doka and Y. Stavrakas.
Entity Extraction and Consolidation for Social Web Content Preservation. SDA,
volume 912 of CEUR Workshop Proceedings, pages 18-29. CEUR-WS.org, 2012.
• B. P. Nunes, R. Kawase, S. Dietze, D. Taibi, M. A. Casanova and W. Nejdl.
Can entities be friends? Web of Linked Entities (WOLE2012), Workshop at the
11th International Semantic Web Conference (ISWC2012), Boston, US, 2012.
• B. P. Nunes, S. Dietze, M. A. Casanova, R. Kawase, B. Fetahu and W. Nejdl.
Combining a co-occurrence-based and a semantic measure for entity linking.
ESWC 2013, 10th Extended Semantic Web Conference, 2013.
• C. Bizer, T. Heath and T. Berners-Lee. Linked Data - The Story So Far.
Special Issue on Linked Data, International Journal on Semantic Web and
Information Systems (IJSWIS), 2009.
Slide 18
THANK YOU
CONTACT DETAILS
Dr. Elena Demidova
L3S Research Center
+49 511 762 17732
demidova@L3S.de
www.arcomem.eu
