OGD: Part 5 – Semantic Issues
Juan Pane: jpane@pol.una.py
Lorenzino Vaccari: lorenzino.vaccari@gmail.com

Juan Pane, Lorenzino Vaccari

http://dati.trentino.it/

08/10/2013
Outline
• Overview
• Issues of opening data
• Entity centric Semantic layer
• Importing pipeline
• Importing tool

Linked Open Data
• Available
• Structured
• Open formats
• Dereferenceable
• Linked

"The best data is open data" vs. "All data must be perfect"

Lack of explicit semantics
The real meaning of the data stayed in the developers' minds when the data was created.

http://goo.gl/npEHKr (Thanks to Moaz Reyad)

Lack of explicit semantics
Can lead to things like:

http://goo.gl/npEHKr (Thanks to Moaz Reyad)

Semantic heterogeneity
Difference in the meaning of local data

Issues when Opening Trentino Data
• Each department has authority over only part of the data.
• Datasets were originally created for internal use only.
• Datasets were created for a specific need.
• Datasets were created with custom formats:
  • For structure (with some exceptions)
  • For data
• Lack of reuse -> duplication.
• Lack of programmers.
• We cannot (always) TELL them what to do or how to do it.
• Data changes.

• Available
• Structured
• Open formats
• Dereferenceable
• Linked

Data Catalog
Entity Centric Semantic Layer
Entity centric: Added value
• Aggregated data
• Accurate data, manually curated
• Unique identifiers, distributed perspectives
• Re-think identifiers
• Semantified values

[Figure: two entity descriptions, E1 and E2, with the attribute values below]
name: Ignacio P. F.; Juan Pane
nationality: Italian
born in: Paraguay
lives in: Trento
date of birth: 1980
affiliation: Univ. Trento; PF-UNA
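The aggregation of distributed perspectives on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the `merge_descriptions` helper, the way the attribute values are split between `e1` and `e2`, and the choice to keep every distinct value per attribute are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_descriptions(*descriptions):
    """Aggregate partial descriptions into one attribute -> list-of-values map.

    Hypothetical helper: keeps every distinct value, in first-seen order.
    """
    merged = defaultdict(list)
    for description in descriptions:
        for attribute, value in description.items():
            if value not in merged[attribute]:
                merged[attribute].append(value)
    return dict(merged)


# Assumed split of the slide's attribute values between the two descriptions.
e1 = {"name": "Ignacio P. F.", "born in": "Paraguay", "affiliation": "Univ. Trento"}
e2 = {"name": "Juan Pane", "lives in": "Trento", "affiliation": "PF-UNA"}

aggregated = merge_descriptions(e1, e2)
print(aggregated["name"])  # both names survive as alternative values
```

Keeping all values, rather than picking a winner, reflects the "distributed perspectives" bullet: each source contributes what it knows, and curation can happen later.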
Entities
• Real world: something that has a distinct, separate existence, although it need not be a material (physical) existence. It has a set of properties, which evolve over time.
• Mental: a personal (local) model, created and maintained by a person, that references and describes a real-world entity.
• Digital: captures the semantics of real-world entities, as provided by people.
Entity Centric Semantic Layer:
• Address the integration problems due to semantic heterogeneity:
  • Different formats
  • Different identifiers
  • Implicit semantics
  • Homonyms, synonyms, aliases
  • Partial knowledge
  • Knowledge evolution
http://www.webfoundation.org/2011/11/5-staropen-data-initiatives/

Entity-based Integration
• Focus on entities as first class citizens
• Entities are objects important enough in our everyday life to be referred to by name
• Each entity has its own metadata (e.g. name, latitude, longitude, …)
• Each entity is related to many other entities (e.g. Einstein was born in Ulm, his affiliation was Charles University, Ulm is a city in Germany)
• There are relatively “few” commonsense entity types (person, …, event)
• There are many domain-specific entity types (bus stops, cycling paths, …)
• All components have explicit semantics: schema, entities, attributes, values
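The points above can be illustrated with a small data structure in which entities carry a type, plain attributes, and links to other entities. This is only a sketch under assumptions: the `Entity` class, its field names, and the type labels ("person", "city", "country", "organization") are hypothetical and not taken from the tool described in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An entity with a type, plain attributes, and links to other entities."""
    name: str
    etype: str
    attributes: dict = field(default_factory=dict)
    relations: dict = field(default_factory=dict)  # relation name -> Entity


germany = Entity("Germany", "country")
ulm = Entity("Ulm", "city", relations={"is a city in": germany})
charles_university = Entity("Charles University", "organization")
einstein = Entity(
    "Einstein",
    "person",
    relations={"born in": ulm, "affiliation": charles_university},
)

# Follow links between entities instead of comparing free-text values.
birthplace = einstein.relations["born in"]
country = birthplace.relations["is a city in"]
print(birthplace.name, country.name)
```

Because "born in" points at an entity rather than at the string "Ulm", anything known about Ulm (such as its country) is reachable from Einstein without string matching.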

Importing pipeline, Macro Steps
1. Domain analysis
   Study the needed entity types and adapt the knowledge base accordingly. First-time bootstrapping.
2. Import entities
   Semi-automatic tool.
   • Domain experts are expensive.
   • Human attention is a scarce resource.
   • Incremental enrichment and aggregation of entities.
Open Data Peculiarities
• All data comes from a CKAN repository (DCAT).
• Process one data file at a time.
• Each data file can be represented as a table.
• Each row in the table represents a (partial) entity.
• The format of the values might not be enforced in the data files.
• Not all data is relevant.
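The "one file, one table, one row per partial entity" view can be sketched with Python's standard `csv` module. The inline file content, the `relevant` column set, and the rule of dropping empty values are assumptions for illustration; a real file would come from the catalog.

```python
import csv
import io

# Hypothetical file content; a real file would be downloaded from the catalog.
raw = io.StringIO(
    "nome,provincia,funivie\n"
    "Andalo (1047),Provincia di Trento,3\n"
    "Canazei (1450),Trento Prov.,2\n"
)

relevant = {"nome", "provincia"}  # assumed: not every column is relevant

# Each row becomes a partial entity holding only relevant, non-empty values.
partial_entities = [
    {column: value for column, value in row.items() if column in relevant and value}
    for row in csv.DictReader(raw)
]
print(partial_entities)
```

Values are kept as raw strings here on purpose: since the files do not enforce formats, interpretation is left to a later validation step.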

• Available
• Structured
• Open formats
• Dereferenceable
• Linked

Data Catalog
Entity centric Importing tool
Importing tool process

1. Source Selection
Import one data file at a time

2. Schema Matching
Select a target entity type, then establish correspondences between the input columns and the output attributes.

LocalitaTuristica
Input columns and sample values:
• nome: Andalo (1047); Canazei (1450)
• provincia: Provincia di Trento; Trento Prov.
• descrizione: Sorge su un'ampia sella prativa al centro...; Situato all'estremità settentrionale della...
• lat, long: 654463, 712857, 511504, 147444
• funivie: 3; 2

Target attributes:
• Nome
• Provincia
• Quota
• Coordinate
• Descrizione
• popolazione
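One simple way to propose column-to-attribute correspondences is name similarity. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; it is an assumed stand-in technique, not the matching service used by the tool, and the `match_schema` function and its 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def match_schema(columns, attributes, threshold=0.8):
    """Map each input column to its most similar target attribute, or None."""
    mapping = {}
    for column in columns:
        scored = [
            (SequenceMatcher(None, column.lower(), attribute.lower()).ratio(), attribute)
            for attribute in attributes
        ]
        best_score, best_attribute = max(scored)
        # Below the threshold, leave the column unmatched for a person to decide.
        mapping[column] = best_attribute if best_score >= threshold else None
    return mapping


columns = ["nome", "provincia", "descrizione", "lat", "long", "funivie"]
attributes = ["Nome", "Provincia", "Quota", "Coordinate", "Descrizione", "popolazione"]

mapping = match_schema(columns, attributes)
print(mapping)
```

Columns such as `lat` and `long` have no lexical counterpart among the target attributes, so a purely name-based matcher leaves them unmatched; that is where a semi-automatic tool would ask the user.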
3. Data Validation
Applies format and structure validation, plus any automatic transformations needed to bring the input data into the expected format.
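A validation step of this kind might combine format checks with small automatic transformations. The sketch below assumes, for illustration only, that a value such as "Andalo (1047)" packs a name together with an elevation in parentheses; both helper functions are hypothetical.

```python
import re

# Assumed pattern: a name followed by a number in parentheses, e.g. "Andalo (1047)".
NAME_WITH_ELEVATION = re.compile(r"^\s*(?P<name>.+?)\s*\((?P<elevation>\d+)\)\s*$")


def split_name_and_elevation(value):
    """Return (name, elevation) if value looks like 'Name (1234)', else (value, None)."""
    match = NAME_WITH_ELEVATION.match(value)
    if match is None:
        return value.strip(), None
    return match.group("name"), int(match.group("elevation"))


def is_integer(value):
    """Format check: the value must consist of digits only."""
    return value.strip().isdigit()


print(split_name_and_elevation("Andalo (1047)"))
print(is_integer("654463"), is_integer("n/a"))
```

Values that fail a check are not silently repaired here; they are flagged so that a person can decide what to do with them.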

4. Semantic Enrichment (1/2)
Entity disambiguation: Transform text references into links to existing entities.
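As a rough illustration of turning text references into links, the sketch below looks a string up in a table of known aliases. The entity identifiers, the alias table, and the `disambiguate` function are invented for this example; a real entity store would be far larger and would use context as well as names.

```python
# Hypothetical entity store: identifier -> known names and aliases (lower case).
ENTITY_ALIASES = {
    "entity:provincia-di-trento": {"provincia di trento", "trento prov."},
    "entity:andalo": {"andalo"},
}


def disambiguate(text):
    """Return the identifiers of all entities that have the text as an alias."""
    key = " ".join(text.lower().split())  # normalise case and whitespace
    return sorted(eid for eid, aliases in ENTITY_ALIASES.items() if key in aliases)


for reference in ["Provincia di Trento", "Trento Prov.", "Bolzano"]:
    print(reference, "->", disambiguate(reference))
```

Two different spellings resolve to the same identifier, while an unknown reference returns an empty list and stays plain text.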

4. Semantic Enrichment (2/2)
Natural Language Processing: Extract concepts and entity references from
free-text.
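Real natural language processing involves much more than this, but the basic idea of finding concept and entity references in free text can be sketched as a dictionary (gazetteer) lookup. The gazetteer entries, the identifiers, and the sample sentence are all hypothetical.

```python
import re

# Hypothetical gazetteer: surface form -> entity or concept identifier.
GAZETTEER = {
    "Canazei": "entity:canazei",
    "Val di Fassa": "entity:val-di-fassa",
    "cable car": "concept:cable-car",
}


def extract_references(text):
    """Return (surface form, identifier, start offset) for each gazetteer hit."""
    hits = []
    for surface, identifier in GAZETTEER.items():
        pattern = re.compile(r"\b" + re.escape(surface) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            hits.append((match.group(0), identifier, match.start()))
    return sorted(hits, key=lambda hit: hit[2])  # order by position in the text


text = "Canazei lies at the northern end of Val di Fassa and has a cable car."
print(extract_references(text))
```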

5. Reconciliation
Run Identity Management Algorithms to identify each row as a new or existing
entity.
Result:
• No match
• Match
• Multiple matches

Action:
• Use ID
• New ID
• Ignore row
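The relation between results and actions can be sketched as a small decision function. The slide lists the results and the actions but not which action follows which result, so the mapping below is an assumption, as is the extra `ASK_USER` outcome for ambiguous rows.

```python
from enum import Enum


class Action(Enum):
    USE_ID = "use existing ID"
    NEW_ID = "create new ID"
    IGNORE_ROW = "ignore row"      # available to the user, never suggested here
    ASK_USER = "ask the user"      # assumption: not listed on the slide


def reconcile(candidate_ids):
    """Suggest an action from the identifiers of matching existing entities."""
    if not candidate_ids:
        return Action.NEW_ID       # no match: the row describes a new entity
    if len(candidate_ids) == 1:
        return Action.USE_ID       # one match: reuse the existing identifier
    return Action.ASK_USER         # multiple matches: defer to a person


print(reconcile([]), reconcile(["e42"]), reconcile(["e42", "e77"]))
```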

6. Exporting
At this point:
• We know what to export.
• All values for target attributes conform to the expected format.
• All text has been semantified (NLP).
• All textual references to entities have been converted to links.
• Each row has an identifier.

[Figure: v0, i, i+1]
7. Publishing
Put the semantified entities back into CKAN so that the entities can be Open Data and can be found in the same catalog as the original data.
• Developers can find the data files of the cleaned, aggregated entities.
• They can also interact with the entities via the Entitypedia APIs.
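Publishing a file of cleaned entities back to the catalog could look roughly like the sketch below, assuming the catalog exposes CKAN's Action API (version 3) and its `resource_create` action. The site URL, API key, dataset identifier, and file URL are placeholders, and the request is only built, not sent.

```python
import json
import urllib.request


def build_resource_request(site_url, api_key, dataset_id, name, file_url):
    """Build (but do not send) a CKAN Action API request that adds a resource."""
    payload = {"package_id": dataset_id, "name": name, "url": file_url}
    return urllib.request.Request(
        site_url.rstrip("/") + "/api/3/action/resource_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


# All argument values below are placeholders.
request = build_resource_request(
    "http://ckan.example.org/",
    "my-api-key",
    "localita-turistiche",
    "Semantified entities",
    "http://example.org/entities.csv",
)
print(request.full_url)
# urllib.request.urlopen(request) would send it; left out of this sketch.
```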

8. Visualization
Search and Navigation
Semantic Layer: Services
Tool for aiding the “semantification” of the datasets in the catalog, based on:
• Schema matching services
• Identity Management services
• Entity Matching services
• Global Unique Identifier services
• Semantic search and indexing services
• Natural Language Processing
• Entity store

Our Goal
[Figure: TN, UK, ES, BE]

http://www.shabra.com/wp-content/uploads/2011/03/lets-work-together.jpg
Gracias! Grazie! Merci! Thanks! Kiitos! Dank u! Gràcies! Gratias! Danke! ευχαριστώ

We thank in particular CLEI 2013, Autonomous Province of Trento, TrentoRise association,
Universidad Nacional de Asuncion, and University of Trento


Open Government Data Tutorial at CLEI 2013. Part 5 Semantic Issues

  • 1. OGD: Part 5 – Semantic Issues Juan Pane: jpane@pol.una.py Lorenzino Vaccari: lorenzino.vaccari@gmail.com 1 Juan Pane, Lorenzino Vaccari http://dati.trentino.it/ 08/10/2013
  • 2. Outline
      • Overview
      • Issues of opening data
      • Entity-centric semantic layer
      • Importing pipeline
      • Importing tool
  • 3. Linked Open Data levels: Available, Structured, Open formats, Dereferenceable, Linked. "The best data is open data" vs. "All data must be perfect"
  • 4. Lack of explicit semantics: the real meaning of the data was kept in the developer's mind when the data was created. http://goo.gl/npEHKr (Thanks to Moaz Reyad)
  • 5. Lack of explicit semantics can lead to things like: http://goo.gl/npEHKr (Thanks to Moaz Reyad)
  • 6. Semantic heterogeneity: difference in the meaning of local data
  • 7. Issues when Opening Trentino Data
      • Each department has authority over only part of the data.
      • Datasets were originally created for internal use only.
      • Datasets were created for a specific need.
      • Datasets were created with custom formats:
          • for structure (some exceptions)
          • for data
      • Lack of reuse -> duplication.
      • Lack of programmers.
      • We cannot (always) TELL them what to do or how to do it.
      • Data changes.
  • 9. Entity centric: Added value
      • Aggregated data
      • Accurate data, manually curated
      • Unique identifiers, distributed perspectives
      • Re-think identifiers
      • Semantified values
      • Figure: two entity records, E1 and E2, with the attributes name, nationality, born in, lives in, date of birth and affiliation (values shown: Juan Pane, Ignacio P. F., italian, Paraguay, Trento, 1980, Univ. Trento, PF-UNA)
  • 10. Entities
      • Real world: something that has a distinct, separate existence, although it need not be a material (physical) existence. It has a set of properties, which evolve over time.
      • Mental: a personal (local) model, created and maintained by a person, that references and describes a real world entity.
      • Digital: captures the semantics of real world entities, as provided by people.
  • 11. Entity Centric Semantic Layer: addresses the integration problems due to semantic heterogeneity:
      • Different formats
      • Different identifiers
      • Implicit semantics
      • Homonyms, synonyms, aliases
      • Partial knowledge
      • Knowledge evolution
      • http://www.webfoundation.org/2011/11/5-staropen-data-initiatives/
  • 12. Entity-based Integration
      • Focus on entities as first class citizens
      • Entities are objects that are important enough in our everyday life to be referred to by name
      • Each entity has its own metadata (e.g. name, latitude, longitude, …)
      • Each entity is related to many other entities (e.g. Einstein was born in Ulm, his affiliation was Charles University, Ulm is a city in Germany)
      • There are relatively “few” commonsense entity types (person, …, event)
      • There are many domain specific entities (bus stops, cycling paths, ..)
      • All components have explicit semantics: schema, entities, attributes, values
  • 13. Importing pipeline, Macro Steps
      • 1. Domain analysis
          • Study the needed entity types, adapt the knowledge base accordingly.
          • First time bootstrapping.
      • 2. Import entities
          • Semi-automatic tool.
          • Domain experts are expensive.
          • Human attention is a scarce resource.
          • Incremental enrichment and aggregation of entities.
  • 14. Open Data Peculiarities
      • All data comes from a CKAN repository (DCAT).
      • Process one data file at a time.
      • Each data file can be represented as a table.
      • Each row in the table represents a (partial) entity.
      • The format of the values might not be enforced in the data files.
      • Not all data is relevant.
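The points on slide 14 can be sketched as follows: one data file is read as a table, and each row becomes a partial entity restricted to the columns considered relevant. This is only an illustration; the sample file content, the function name `rows_as_partial_entities` and the choice of relevant columns are assumptions, not part of the tool described in the slides.

```python
import csv
import io

# Hypothetical sample standing in for one data file taken from the catalog.
SAMPLE = """nome,provincia,funivie
Andalo (1047),Provincia di Trento,3
Canazei (1450),Trento Prov.,2
"""


def rows_as_partial_entities(text, relevant_columns):
    """Read one tabular file; return one dict (partial entity) per row."""
    reader = csv.DictReader(io.StringIO(text))
    entities = []
    for row in reader:
        # Keep only relevant, non-empty values: not all data is relevant,
        # and a row may describe an entity only partially.
        entity = {col: row[col] for col in relevant_columns if row.get(col)}
        entities.append(entity)
    return entities


partial = rows_as_partial_entities(SAMPLE, ["nome", "provincia"])
```

Values stay as raw strings at this point, since the data files do not enforce any format.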
  • 15. Figure: the levels Available, Structured, Open formats, Dereferenceable, Linked, annotated with Data Catalog and Entity-centric importing tool
  • 16. Importing tool process
  • 17. 1. Source Selection: import one data file at a time
  • 18. 2. Schema Matching: select a target type of entity -> correspondences between the input columns and the output attributes
      • Input table LocalitaTuristica (columns: nome | provincia | descrizione | lat | long | funivie)
          • Andalo (1047) | Provincia di Trento | Sorge su un'ampia sella prativa al centro... | 654463 | 712857 | 3
          • Canazei (1450) | Trento Prov. | Situato all'estremità settentrionale della... | 511504 | 147444 | 2
      • Output attributes: Nome, Provincia, Quota, Coordinate, Descrizione, popolazione
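A minimal sketch of how correspondences between input columns and output attributes might be proposed, using the column and attribute names from slide 18. The name-similarity heuristic (Python's `difflib.SequenceMatcher`), the function name and the 0.8 threshold are assumptions made for illustration; the slides do not say which matching technique the tool uses.

```python
from difflib import SequenceMatcher


def propose_correspondences(columns, attributes, threshold=0.8):
    """Propose, for each input column, the most similar target attribute.

    Columns with no sufficiently similar attribute map to None so that a
    person can decide (the process is described as semi-automatic).
    """
    proposals = {}
    for column in columns:
        best_attr, best_score = None, 0.0
        for attr in attributes:
            score = SequenceMatcher(None, column.lower(), attr.lower()).ratio()
            if score > best_score:
                best_attr, best_score = attr, score
        proposals[column] = best_attr if best_score >= threshold else None
    return proposals


columns = ["nome", "provincia", "descrizione", "lat", "long", "funivie"]
attributes = ["Nome", "Provincia", "Quota", "Coordinate", "Descrizione", "popolazione"]
proposals = propose_correspondences(columns, attributes)
```

With these inputs, only the three columns whose names equal an attribute name up to case receive a proposal. Columns such as `lat` and `long`, which presumably belong with `Coordinate`, are left for manual matching.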
  • 19. 3. Data Validation: applies format and structure validation, and any automatic transformations needed to bring the input data into the expected format.
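As an illustration of format validation combined with an automatic transformation, the sketch below checks one input row and splits a value such as "Andalo (1047)" into a name and a number. Reading the number in parentheses as the value of the Quota attribute is an assumption, as are the function name, the regular expression and the error messages.

```python
import re

# Assumed input convention: "Name (number)", e.g. "Andalo (1047)".
NAME_WITH_NUMBER = re.compile(r"^\s*(?P<name>.+?)\s*\((?P<number>\d+)\)\s*$")


def validate_row(row):
    """Return (cleaned_values, errors) for one input row given as a dict."""
    cleaned, errors = {}, []

    match = NAME_WITH_NUMBER.match(row.get("nome", ""))
    if match:
        # Automatic transformation: one input cell feeds two attributes.
        cleaned["Nome"] = match.group("name")
        cleaned["Quota"] = int(match.group("number"))
    else:
        errors.append("nome: expected 'Name (number)'")

    for column in ("lat", "long"):
        value = row.get(column, "").strip()
        if value.isdigit():
            cleaned[column] = int(value)
        else:
            errors.append(f"{column}: not an integer: {value!r}")

    return cleaned, errors


ok_row, ok_errors = validate_row(
    {"nome": "Andalo (1047)", "lat": "654463", "long": "712857"}
)
bad_row, bad_errors = validate_row({"nome": "Andalo", "lat": "x", "long": "1"})
```

Rows that produce errors can be shown to a person instead of being silently dropped, in line with the semi-automatic nature of the tool.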
  • 20. 4. Semantic Enrichment (1/2): Entity disambiguation: transform text references into links to existing entities.
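A very small sketch of the idea behind entity disambiguation: two different textual references from the schema matching example ("Provincia di Trento" and "Trento Prov.") are turned into a link to the same entity. The alias table, the identifier `entity:trento-province` and the function name are hypothetical; a real system would use a richer entity store and matching logic.

```python
# Hypothetical alias table: normalised text reference -> entity identifier.
ALIASES = {
    "provincia di trento": "entity:trento-province",
    "trento prov.": "entity:trento-province",
}


def disambiguate(text):
    """Return the identifier of the referenced entity, or None if unknown."""
    # Normalise case and whitespace before looking the reference up.
    normalised = " ".join(text.lower().split())
    return ALIASES.get(normalised)


link_a = disambiguate("Provincia di Trento")
link_b = disambiguate("  Trento   Prov. ")
unknown = disambiguate("Bolzano")
```

References that cannot be resolved return `None` and stay as plain text until someone resolves them.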
  • 21. 4. Semantic Enrichment (2/2): Natural Language Processing: extract concepts and entity references from free text.
  • 22. 5. Reconciliation: run Identity Management Algorithms to identify each row as a new or existing entity.
      • Result: No Match, Match, Multiple Matches
      • Action: Use ID, New ID, Ignore Row
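The reconciliation step can be sketched as below. The slide lists the possible results and actions but not how they pair up, so the pairing used here (no match -> new ID, match -> use ID, multiple matches -> ignore row) is an assumption. Matching on a normalised name is also a stand-in for the identity management algorithms, which the slides do not detail.

```python
import uuid


def reconcile(row_name, store):
    """Classify one row against known entities.

    `store` maps entity IDs to names. Returns (result, action, entity_id).
    """
    key = row_name.strip().lower()
    candidates = [eid for eid, name in store.items() if name.strip().lower() == key]

    if not candidates:
        # Assumed pairing: an unseen entity receives a fresh identifier.
        return "no match", "new ID", str(uuid.uuid4())
    if len(candidates) == 1:
        return "match", "use ID", candidates[0]
    # Assumed pairing: ambiguous rows are skipped rather than guessed.
    return "multiple matches", "ignore row", None


store = {"e1": "Andalo", "e2": "Canazei", "e3": "canazei"}
matched = reconcile("Andalo", store)
ambiguous = reconcile("Canazei", store)
created = reconcile("Moena", store)
```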
  • 23. 6. Exporting: at this point:
      • We know what to export.
      • All values for target attributes conform to the expected format.
      • All text has been semantified (NLP).
      • All textual references to entities have been converted to links.
      • Each row has an identifier.
  • 24. 7. Publishing: put the semantified entities back into CKAN so that the entities can be Open Data and can be found in the same catalog as the original data.
      • Developers can find the data files of the cleaned, aggregated entities
      • But they can also interact with the entities via the Entitypedia APIs
      • 8. Visualization: Search and Navigation
  • 25. Semantic Layer: Services. Tool for aiding the “semantification” of the datasets in the catalog, based on:
      • Schema matching services
      • Identity Management services
      • Entity Matching services
      • Global Unique Identifier services
      • Semantic search and indexing services
      • Natural Language Processing
      • Entity store
  • 26. Our Goal: TN, UK, ES, BE
  • 27. http://www.shabra.com/wp-content/uploads/2011/03/lets-work-together.jpg
  • 28. Gracias! Grazie! Merci! Thanks! Kiitos! Dank u! Gràcies! Gratias! Danke! ευχαριστώ We thank in particular CLEI 2013, Autonomous Province of Trento, TrentoRise association, Universidad Nacional de Asuncion, and University of Trento