Linked Data Quality Assessment
– daQ and Luzzu
Jeremy Debattista
University of Bonn
Presentation at the Ontology Engineering
Group (UPM)
…who am I?
• B.Sc (Hons) in Computer Science – University of
Malta
– Thesis: Collaborative Editing and Expert Finding
• M.App Sc in Computer Science – DERI, National
University of Ireland, Galway
– Thesis: Ontology-based rules for User-Controlled
Support in Ubiquitous Environments
• PhD Candidate – University of Bonn
… my PhD – the big picture
• Work related to Data Quality (in LD)
– representing quality metadata (daQ)
– assessing data quality (Luzzu)
– identifying new metrics from standard
vocabularies (like PROV-O)
… the need for Quality Metadata
• Convincing data consumers to use our
published data
• Filtering datasets
• Poor Quality Perspective – Big Data Veracity
… the daQ vocabulary
• Metadata as Named Graphs
• Usage of abstract class concept
• Metric assessment as Observations
• Preserving Provenance information
… daQ on the Web
http://purl.org/eis/vocab/daq
… daQ Applications
• daQ validator – Validates quality metric
schemas extending the daQ (will be online
soon)
– e.g. checking that each dimension is in exactly one
category…
• Luzzu – next slides
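The validator check mentioned above (each dimension is in exactly one category) can be sketched in a few lines. This is only an illustration, not the daQ validator itself: the function name, the `(category, dimension)` pair representation and the sample category and dimension names are all assumptions made for the example.

```python
from collections import defaultdict

def dimensions_with_wrong_category_count(memberships, dimensions):
    """Return {dimension: sorted categories} for every dimension that is
    not a member of exactly one category (hypothetical helper)."""
    categories_of = defaultdict(set)
    for category, dimension in memberships:
        categories_of[dimension].add(category)
    return {
        d: sorted(categories_of[d])
        for d in dimensions
        if len(categories_of[d]) != 1
    }

# Illustrative schema: names are made up for the example.
memberships = [
    ("Accessibility", "Availability"),
    ("Accessibility", "Interlinking"),
    ("Intrinsic", "Conciseness"),
    ("Representational", "Conciseness"),  # deliberately wrong: two categories
]
dimensions = ["Availability", "Interlinking", "Conciseness", "Orphan"]

violations = dimensions_with_wrong_category_count(memberships, dimensions)
print(violations)
```

Here "Conciseness" is reported because it sits in two categories, and "Orphan" because it sits in none.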
… Luzzu – QA Framework
• A comprehensive QA framework
– assesses LD quality using user-provided metrics (we
have a number of LOD metrics already) in a scalable
manner
– provides queryable metadata (daQ)
– provides quality reports which can be used for cleaning
• Java-based, with Maven integration
• http://eis-bonn.github.io/Luzzu
… Luzzu – QA Framework
[Figure: architecture diagram. Layers: Communication Layer, Assessment Layer, Knowledge Layer, Semantic Schema Layer. Units: Processing Unit, Quality Assessment Unit, LQML Comp. Unit, Annotation Unit, Operations Unit.]
… Luzzu – QA Framework
[Figure: processing workflow. Labels: Dataset, Processing Unit, Quality Assessment Unit, Metric 1, Metric 2, … Metric n, Annotation Unit, Communication Layer.]
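One way to read the workflow is that a dataset is streamed once and every triple is handed to each metric in turn. The sketch below illustrates that idea only; the class names, the `compute`/`value` methods and the tuple representation of triples are assumptions for this example and are not Luzzu's actual Java interfaces.

```python
class TripleCount:
    """Toy metric: counts the triples seen so far."""
    def __init__(self):
        self.count = 0

    def compute(self, triple):
        self.count += 1

    def value(self):
        return self.count


class LiteralObjectRatio:
    """Toy metric: share of triples whose object is a literal."""
    def __init__(self):
        self.literals = 0
        self.total = 0

    def compute(self, triple):
        _, _, obj = triple
        self.total += 1
        if obj.startswith('"'):  # assumption: literals are written with quotes
            self.literals += 1

    def value(self):
        return self.literals / self.total if self.total else 0.0


def assess(triples, metrics):
    """Stream the dataset once, feeding each triple to every metric."""
    for triple in triples:
        for metric in metrics:
            metric.compute(triple)
    return {type(m).__name__: m.value() for m in metrics}


dataset = iter([
    ("http://example.org/a", "http://example.org/knows", "http://example.org/b"),
    ("http://example.org/a", "http://example.org/name", '"Alice"'),
    ("http://example.org/b", "http://example.org/knows", "http://example.org/c"),
    ("http://example.org/c", "http://example.org/knows", "http://example.org/a"),
])
report = assess(dataset, [TripleCount(), LiteralObjectRatio()])
print(report)
```

Because each metric only keeps small running state, the dataset never has to be held in memory as a whole.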
…what’s missing in Luzzu
• Make Luzzu work better on Big Data Platforms
– We already have a Spark processor
– How can metrics be scaled across multiple cores?
Something like map-reduce, maybe?
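As a rough illustration of the map-reduce idea raised above, a metric whose partial results can be merged may be computed per chunk and then combined. Everything here is an assumption for the sketch: the metric (share of triples linking outside a hypothetical base namespace), the function names, and the use of a thread pool, chosen only to keep the example runnable anywhere. Real CPU-bound work in Python would more likely use processes or a cluster framework.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

BASE = "http://example.org/"  # hypothetical namespace of the assessed dataset

def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def map_chunk(triples):
    """Map step: return (external_links, total_triples) for one chunk."""
    external = sum(
        1 for _, _, obj in triples
        if obj.startswith("http") and not obj.startswith(BASE)
    )
    return external, len(triples)

def merge(left, right):
    """Reduce step: partial counts combine by element-wise addition."""
    return left[0] + right[0], left[1] + right[1]

def external_link_ratio(triples, workers=4, chunk_size=2):
    """Compute the metric by mapping over chunks and reducing the partials."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunked(triples, chunk_size)))
    external, total = reduce(merge, partials, (0, 0))
    return external / total if total else 0.0

triples = [
    ("http://example.org/a", "http://example.org/p", "http://example.org/b"),
    ("http://example.org/a", "http://example.org/p", "http://other.example.com/x"),
    ("http://example.org/b", "http://example.org/name", '"Bonn"'),
    ("http://example.org/b", "http://example.org/p", "http://other.example.com/y"),
]
ratio = external_link_ratio(triples)
print(ratio)
```

The approach only works for metrics whose state merges cleanly (counts, sums, sets); metrics that need a global view of the graph would need a different decomposition.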
… data quality lifecycle
1. Metric Identification and Definition
2. Assessment
3. Data Repairing and Cleaning
4. Storage/Cataloguing/Archiving
5. Exploration/Ranking
… quality metrics
• Traditional naïve way
• Probabilistic Techniques (A paper was
presented at ESWC this year)
… probabilistic technique hypothesis
Probabilistic approximation techniques would:
(H1) drastically improve computational time
(H2) give close to accurate results
… probabilistic techniques used
• Reservoir Sampling
– Dereferenceability
– Links to External Data Providers
• Bloom Filters
– Extensional Conciseness
• Clustering Coefficient Estimation
– Clustering Coefficient of a Network
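Of the techniques listed above, a Bloom filter is the easiest to show in miniature: it answers "have I probably seen this item before?" using a fixed-size bit array. The sketch below is a generic textbook Bloom filter, not the implementation used in the talk. The class and function names are made up, and treating conciseness as unique items divided by total items is an assumption for the example.

```python
import hashlib
import math

class BloomFilter:
    """Minimal Bloom filter sized with the standard formulas."""
    def __init__(self, expected_items, false_positive_rate):
        n, p = expected_items, false_positive_rate
        # m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions
        self.size = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.hashes = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all positions from two 64-bit values.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )

def approximate_conciseness(items, expected_items, false_positive_rate=0.01):
    """Estimate unique/total, treating 'probably seen' items as duplicates."""
    seen = BloomFilter(expected_items, false_positive_rate)
    unique = total = 0
    for item in items:
        total += 1
        if item not in seen:
            unique += 1
            seen.add(item)
    return unique / total if total else 1.0

items = ["a", "b", "a", "c", "b"]
score = approximate_conciseness(items, expected_items=1000, false_positive_rate=0.001)
print(score)
```

A false positive makes a new item look like a duplicate, so the estimate can only err on the low side. The two constructor parameters control how often that happens and how much memory is spent.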
… some results
• Reservoir Sampling
– Dereferenceability: Precision approx. 75%; Time Saved > 2 Orders of Magnitude
– Links to External Data Providers: Precision 100%; Time Saved > 2 Orders of Magnitude
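Reservoir sampling keeps a fixed-size uniform sample of a stream of unknown length, so an expensive per-item check runs on the sample rather than on every item. The sketch below uses the classic Algorithm R. The function names are made up, and the `predicate` stands in for a real check such as an HTTP lookup; no network access is performed here.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < k:
                reservoir[j] = item  # replace with probability k / (i + 1)
    return reservoir

def estimate_ratio(stream, k, predicate, rng=None):
    """Estimate the share of items satisfying `predicate` from a sample."""
    sample = reservoir_sample(stream, k, rng)
    if not sample:
        return 0.0
    return sum(1 for item in sample if predicate(item)) / len(sample)

# Hypothetical data: every fourth URI is treated as not dereferenceable.
uris = [f"http://example.org/resource/{i}" for i in range(1000)]
dead = {uri for i, uri in enumerate(uris) if i % 4 == 0}

estimate = estimate_ratio(uris, k=100, predicate=lambda u: u not in dead,
                          rng=random.Random(42))
print(estimate)  # an approximation of the true share, 0.75
```

The sample size `k` is the parameter that trades accuracy for time: a larger reservoir gives a tighter estimate but means more checks.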
… some results
Precision: approx. 97%
Time Saved: > 3 Orders of Magnitude
… some results
Precision: approx. 95%
Time Saved: > 1 Order of Magnitude
… what am I working on
• Large-scale (Data Web scale) evaluation (Journal
Paper)
– assessing the quality of LOD Cloud datasets
• daQ (Journal Paper)
… what do we do at Bonn
• Open Government Data – Publishing and
Consumption
– Data Value Chains, Value Creation, Budgeting
• Portal for publication and consumption of open
data
– Lowering of semantic data to shallower
domain-specific formats (RDB, CSV, etc.)
• RDF Visualisations and Recommendations
… what do we do at Bonn
• Dataset Change Detection
• Collaborative Authoring and Open Educational
Content
• Low-threshold agile methodology for
collaborative vocabulary development
• Mapping of AutomationML to RDF
… some tools
http://purl.org/net/exconquer/
… some tools
http://purl.org/net/dsaas
… some tools
http://slidewiki.org
… some tools
http://eis.iai.uni-bonn.de/Projects/LinkDaViz.html



Editor's Notes

  1. There are various reasons why a dataset should contain quality metadata. Convincing data consumers: is the published data fit for the user's needs? Filtering datasets: if the publisher does not care about their data, why should a consumer use it? Poor quality perspective: LD is a good use case for Veracity in Big Data, but it is often overlooked because it is perceived as being of poor quality. If the big data community is convinced otherwise, LD might be used more often on bigger platforms. Therefore we have to start by assessing data quality and stamping our datasets in a machine-readable format.
  2. Represent quality metadata in Named Graphs that can be attached to datasets. CDM (Category, Dimension, Metric) are abstract classes: they are only conceptual, and more concrete classes should be represented as sub-classes. A dataset can be assessed against multiple metrics. Each metric can be assessed over the dataset any number of times, each time with the new value represented as an observation. Each observation is also a Provenance Entity, enabling the representation of concepts such as the activity agent and how a metric was executed (for example, the parameter settings in reservoir techniques).
  3. The general architecture
  4. The processing workflow
  5. We identified the data quality lifecycle, which could be part of a bigger lifecycle like the LODStack, or of even bigger, more generic processes like data value chains. Metric Identification and Definition: choosing the right metrics for the dataset and task at hand. Assessment: assessing the dataset based on the metrics chosen. Data Repairing and Cleaning: ensuring that, following a quality assessment, a dataset is curated in order to improve its quality. Storage, Cataloguing and Archiving: updating the improved dataset on the cloud whilst making the quality metadata available to the public. Exploration and Ranking: finally, data consumers can explore cleaned datasets according to their quality metadata.
  6. Our hypothesis is that probabilistic approximation techniques would drastically improve computational time compared with the naïve implementations, which give 100% accurate results. Having said that, the probabilistic techniques will still give close to accurate results given the right parameter settings.
  7. Therefore, to sum up the metrics using the Reservoir Sampling technique: the dereferenceability metric gave around 75% precision, whilst the time saved can easily exceed 2 orders of magnitude even with small datasets of 1M triples. The Links to External Data Providers metric gave us 100% precision, whilst the difference in time becomes easily noticeable as datasets grow larger.
  8. From the results we saw that the precision was on average 97%, whilst the computational time improved by more than 3 orders of magnitude in most cases.
  9. sum up