Finding Data Sets



Anja Jentzsch, Freie Universität Berlin
17 April 2012
Tutorial: Practical Cross-Dataset Queries on the Web of Data
WWW2012, Lyon, France

Different motivations
•   Finding data sets
    •   Look for resources to link a data set to
    •   Find a data set with relevant data to consume / integrate


•   Finding vocabularies
    •   Find vocabularies to use to model data sets
    •   Find vocabularies to map your existing schema to




Different tool types
•   Search engines
    •   Find data sets based on keywords


•   Data catalogs / directories
    •   Explore data sets through browsing and faceted search


•   Data marketplaces
    •   Explore and consume data sets



Linked Data Search Engines
•   The description of each resource is published as a document in RDF
•   RDF search engines index these RDF documents
•   The process is similar to how search engines index HTML documents
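
To make the indexing idea concrete, here is a minimal sketch of a keyword index over RDF-style triples. It is a toy illustration, not how any particular search engine works: the triples, the example.org URIs, and the helper names (tokenize, build_index, search) are all made up for this example, and triples are plain string tuples rather than parsed RDF.

```python
import re
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object) triples as plain strings.
TRIPLES = [
    ("http://example.org/berlin", "http://www.w3.org/2000/01/rdf-schema#label", "Berlin"),
    ("http://example.org/berlin", "http://example.org/country", "Germany"),
    ("http://example.org/lyon", "http://www.w3.org/2000/01/rdf-schema#label", "Lyon"),
    ("http://example.org/lyon", "http://example.org/country", "France"),
]


def tokenize(text):
    """Split text into lowercase alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(triples):
    """Map each keyword found in a literal object to the subjects it describes."""
    index = defaultdict(set)
    for subject, _predicate, obj in triples:
        # Skip objects that are URIs; only literal values are indexed here.
        if obj.startswith(("http://", "https://")):
            continue
        for token in tokenize(obj):
            index[token].add(subject)
    return index


def search(index, query):
    """Return the subjects whose literals contain every keyword in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


index = build_index(TRIPLES)
print(sorted(search(index, "lyon france")))
```

A keyword query returns resource URIs rather than pages, which is why this kind of search suits the "find resources to link to" motivation.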




•   Sindice: http://sindice.com
•   Sig.ma: http://sig.ma
•   Swoogle: http://swoogle.umbc.edu
•   Watson: http://kmi-web05.open.ac.uk/WatsonWUI/
•   FactForge: http://factforge.net

Suitability
•   Look for resources to link a data set to
    •   Good


•   Find a data set with relevant data to consume
    •   Maybe good: depends on how the query is expressed


•   Find vocabularies to use to model data sets
    •   Not good: everything is indexed, too much noise



Data catalogs
•   Several governments and institutions are opening their catalogs
•   http://datacatalogs.org provides a manually curated index of 226 data catalogs




The Data Hub
•   Manually curated list of more than 3,500 data sets, at least 326 of them Linked Data sets
•   Various metadata for each data set


•   Other views over (part of) its content
    •   Semantic CKAN (http://semantic.ckan.net)
    •   LATC Data Source Inventory
    •   LOD Cloud
    •   State of the LOD Cloud
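
As a rough sketch of querying such a catalog programmatically, the snippet below assumes a catalog that exposes a CKAN-style action API with a `package_search` action. The host catalog.example.org, the sample response, and the dataset names in it are invented for illustration, and no network request is made; whether a given catalog offers this endpoint needs to be checked against its own documentation.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url, keywords, rows=10):
    """Build a keyword search URL for a CKAN-style package_search action."""
    params = urlencode({"q": keywords, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_names(response_text):
    """Extract dataset names from a CKAN-style JSON search response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    return [dataset["name"] for dataset in payload["result"]["results"]]


# Made-up response in the shape a CKAN-style search is assumed to return.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "example-dataset-a", "title": "Example Dataset A"},
            {"name": "example-dataset-b", "title": "Example Dataset B"},
        ],
    },
})

# Hypothetical catalog host; replace with a real catalog base URL.
url = build_search_url("https://catalog.example.org", "linked data", rows=5)
print(url)
print(dataset_names(SAMPLE_RESPONSE))
```

Keeping URL construction and response parsing separate makes it easy to test the parsing against a saved response before pointing the code at a live catalog.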



•   The Data Hub: http://thedatahub.org
•   LATC Data Source Inventory: http://dsi.lod-cloud.net
•   LOD Cloud: http://lod-cloud.net
•   State of the LOD Cloud: http://lod-cloud.net/state/

Data Marketplaces
•   “Services that make it easy to find data from a range of secondary data sources,
    then consume or acquire the data in a usable and unified format. Several of these
    services are trying to create marketplaces for data, envisioning that data providers
    can offer their data sets for sale to data seekers.” (http://datamarket.com)




Kasabi
•   Data domain
    •   All purpose, incl. DBpedia, GeoNames, BBC Linked Data, …
•   Data population
    •   Public datasets
    •   User submitted datasets
•   Data size
    •   186 data sets
•   Data model
    •   RDF
•   Website
    •   http://kasabi.com

Freebase
•   Metaweb (USA), now Google
•   Free for 100K read API calls per day (10K write), paid for higher volumes
•   Data access
    •   REST API
    •   Linked Data endpoint (http://rdf.freebase.com)
    •   Triple uploader / RDF dumps
•   Data tools
    •   Web based – schema editor, review queue, viewers, …
    •   GridWorks (Google Refine)
        •   Exploring, data cleaning, transformation of tabular data
        •   Map data to Freebase schema & RDF export (3rd party extension)
•   Website
    •   http://www.freebase.com

Linked Open Vocabularies (LOV)
•   Initiative similar to the LOD Cloud but focused on vocabularies
•   250+ vocabularies
•   http://labs.mondeca.com/dataset/lov/
