SlideShare a Scribd company logo
1 of 15
Data Science meets Linked
Data
Alasdair J G Gray
http://www.alasdairjggray.co.uk
@gray_alasdair
A.J.G.Gray@hw.ac.uk
SICSA Data Science Theme Launch
3 July 2014
BBC World Cup
3 July 2014SICSA Data Science Theme Launch
1
BBC Linked Data Platform
3 July 2014SICSA Data Science Theme Launch
2
Olympics 2012
3 July 2014SICSA Data Science Theme Launch
3
Linking Data
3 July 2014SICSA Data Science Theme Launch
4
1. Global ID – URI
2. Resolvable ID
3. Useful content
HTML for humans
RDF for machines
4. Link to other resources
Like the Web, but for data!
Linked Data Principles
3 July 2014SICSA Data Science Theme Launch
5
“RDF and OWL do not
solve the interoperability
problem, they just lay it
bare on the table!”
Challenge 1: Matching
Administrative Data Research Centre - Scotland | Alasdair J G Gray| 3 July 2014
John Grant
Fisherman
Fiona Sinclair
Ian Grant
Smithy
Born: 1861
Stuart Adam
Wheelwright
Morag Scott
Flora Adam
Seamstress
Born: 1866
Married: 1884
John Grant
Farmer
Fiona Grant
Iain Grant
Born: 1860
Messy data
Probabilistic
matches
Schema matching
Gleevec® = Imatinib Mesylate
3 July 2014 SICSA Data Science Theme Launch 7
DrugbankChemSpider PubChem
Imatinib
MesylateImatinib Mesylate
YLMAHDNUQAMNNX-UHFFFAOYSA-N
Challenge 2: Reusing mappings
3 July 2014 SICSA Data Science Theme Launch 8
Link: skos:closeMatch
Reason: non-salt form
Link: skos:exactMatch
Reason: drug name
Link: owl:sameAs
Challenge: Multiple Identities
Andy Law's Third Law
“The number of unique identifiers
assigned to an individual is never
less than the number of Institutions
involved in the study”
http://bioinformatics.roslin.ac.uk/lawslaws/
3 July 2014SICSA Data Science Theme Launch
9
P12047
X31045
GB:29384
http://rdf.ebi.ac.uk/resource/ch
embl/molecule/CHEMBL1642
https://www.ebi.ac.uk/chembl/co
mpound/inspect/CHEMBL1642
Challenge Open Data:
Licenses
5★ of linked data
Licenses who can
reuse the data
 Interoperability of
licenses
 Non-commercial:
academic use,
teaching, industry
3 July 2014SICSA Data Science Theme Launch
10
Challenges: Privacy
11
3 July 2014SICSA Data Science Theme Launch
Challenge: Query Performance
Response time
Data freshness
Reliability
Volume of
requests
Hosting
resources
3 July 2014SICSA Data Science Theme Launch
12
Queries Queries
In Data we Trust
How can we trust
the data we’ve got
back?
How can we ensure
that it hasn’t been
tampered on the
way?
Trusty URIs
3 July 2014SICSA Data Science Theme Launch
13
http://www.intelsat.com/wp-
content/uploads/2014/03/Red-padlock.jpg
Contact Details
www.alasdairjggray.co.uk
A.J.G.Gray@hw.ac.uk
@gray_alasdair
3 July 2014SICSA Data Science Theme Launch
15
“There is lots of data we all use every day, and it’s not part of the web. I
can see my bank statements on the web, and my photographs, and I can
see my appointments in a calendar. But can I see my photos in a calendar
to see what I was doing when I took them? Can I see bank statement lines
in a calendar?
No. Why not? Because we don’t have a web of data. Because data is
controlled by applications and each application keeps it to itself.”
Tim Berners-Lee

More Related Content

Similar to Data Science meets Linked Data

Fsci 2018 tuesday31_july_am6
Fsci 2018 tuesday31_july_am6Fsci 2018 tuesday31_july_am6
Fsci 2018 tuesday31_july_am6ARDC
 
Making Data Dynamic: Views from UC3, CDL
Making Data Dynamic: Views from UC3, CDLMaking Data Dynamic: Views from UC3, CDL
Making Data Dynamic: Views from UC3, CDLCarly Strasser
 
20160623 alia sydney
20160623 alia sydney20160623 alia sydney
20160623 alia sydneyARDC
 
Research Integrity Advisor and Data Management
Research Integrity Advisor and Data ManagementResearch Integrity Advisor and Data Management
Research Integrity Advisor and Data ManagementARDC
 
Presentation to the Woolcock Institute of Medical Research
Presentation to the Woolcock Institute of Medical Research Presentation to the Woolcock Institute of Medical Research
Presentation to the Woolcock Institute of Medical Research ARDC
 
Why we care about research data? Why we share?
Why we care about research data? Why we share?Why we care about research data? Why we share?
Why we care about research data? Why we share?Richard Ferrers
 
Ready and Prepared for Research data
Ready and Prepared for Research dataReady and Prepared for Research data
Ready and Prepared for Research dataARDC
 
Data Access & Storage @ UWA - UWA Research Week September 2017
Data Access & Storage @ UWA - UWA Research Week September 2017Data Access & Storage @ UWA - UWA Research Week September 2017
Data Access & Storage @ UWA - UWA Research Week September 2017Katina Toufexis
 
Ucla july 2018 natasha simons
Ucla july 2018 natasha simonsUcla july 2018 natasha simons
Ucla july 2018 natasha simonsARDC
 
Working on data stories: different approaches
Working on data stories: different approachesWorking on data stories: different approaches
Working on data stories: different approachesPaul Bradshaw
 
Managing & sharing your data and code for openness
Managing & sharing your data and code for opennessManaging & sharing your data and code for openness
Managing & sharing your data and code for opennessPhoebe Ayers
 
Kate Kelly - Open Research Ireland
Kate Kelly - Open Research IrelandKate Kelly - Open Research Ireland
Kate Kelly - Open Research Irelanddri_ireland
 
Meeting Federal Research Requirements
Meeting Federal Research RequirementsMeeting Federal Research Requirements
Meeting Federal Research RequirementsICPSR
 
2013 DataCite Summer Meeting - Closing Keynote: Building Community Engagement...
2013 DataCite Summer Meeting - Closing Keynote: Building Community Engagement...2013 DataCite Summer Meeting - Closing Keynote: Building Community Engagement...
2013 DataCite Summer Meeting - Closing Keynote: Building Community Engagement...datacite
 
Data are the new black : Susan Robbins
Data are the new black : Susan RobbinsData are the new black : Susan Robbins
Data are the new black : Susan Robbinstherese nolan-brown
 
Developing rich interactive eBooks to teach linked open data to professionals...
Developing rich interactive eBooks to teach linked open data to professionals...Developing rich interactive eBooks to teach linked open data to professionals...
Developing rich interactive eBooks to teach linked open data to professionals...John Domingue
 
Citrination-MRS Fall Meeting 2015
Citrination-MRS Fall Meeting 2015Citrination-MRS Fall Meeting 2015
Citrination-MRS Fall Meeting 2015bmeredig
 
Teaching Data Science to Undergraduate Students
Teaching Data Science to Undergraduate StudentsTeaching Data Science to Undergraduate Students
Teaching Data Science to Undergraduate StudentsNicole Vasilevsky
 

Similar to Data Science meets Linked Data (20)

Fsci 2018 tuesday31_july_am6
Fsci 2018 tuesday31_july_am6Fsci 2018 tuesday31_july_am6
Fsci 2018 tuesday31_july_am6
 
Making Data Dynamic: Views from UC3, CDL
Making Data Dynamic: Views from UC3, CDLMaking Data Dynamic: Views from UC3, CDL
Making Data Dynamic: Views from UC3, CDL
 
20160623 alia sydney
20160623 alia sydney20160623 alia sydney
20160623 alia sydney
 
Research Integrity Advisor and Data Management
Research Integrity Advisor and Data ManagementResearch Integrity Advisor and Data Management
Research Integrity Advisor and Data Management
 
Presentation to the Woolcock Institute of Medical Research
Presentation to the Woolcock Institute of Medical Research Presentation to the Woolcock Institute of Medical Research
Presentation to the Woolcock Institute of Medical Research
 
Why we care about research data? Why we share?
Why we care about research data? Why we share?Why we care about research data? Why we share?
Why we care about research data? Why we share?
 
Ready and Prepared for Research data
Ready and Prepared for Research dataReady and Prepared for Research data
Ready and Prepared for Research data
 
Data Access & Storage @ UWA - UWA Research Week September 2017
Data Access & Storage @ UWA - UWA Research Week September 2017Data Access & Storage @ UWA - UWA Research Week September 2017
Data Access & Storage @ UWA - UWA Research Week September 2017
 
Ucla july 2018 natasha simons
Ucla july 2018 natasha simonsUcla july 2018 natasha simons
Ucla july 2018 natasha simons
 
Working on data stories: different approaches
Working on data stories: different approachesWorking on data stories: different approaches
Working on data stories: different approaches
 
Managing & sharing your data and code for openness
Managing & sharing your data and code for opennessManaging & sharing your data and code for openness
Managing & sharing your data and code for openness
 
Claudia Bauzer Medeiros - Open Science meets Data Science: Some challenges to...
Claudia Bauzer Medeiros - Open Science meets Data Science: Some challenges to...Claudia Bauzer Medeiros - Open Science meets Data Science: Some challenges to...
Claudia Bauzer Medeiros - Open Science meets Data Science: Some challenges to...
 
Kate Kelly - Open Research Ireland
Kate Kelly - Open Research IrelandKate Kelly - Open Research Ireland
Kate Kelly - Open Research Ireland
 
Meeting Federal Research Requirements
Meeting Federal Research RequirementsMeeting Federal Research Requirements
Meeting Federal Research Requirements
 
2013 DataCite Summer Meeting - Closing Keynote: Building Community Engagement...
2013 DataCite Summer Meeting - Closing Keynote: Building Community Engagement...2013 DataCite Summer Meeting - Closing Keynote: Building Community Engagement...
2013 DataCite Summer Meeting - Closing Keynote: Building Community Engagement...
 
Seeking serendipity
Seeking serendipitySeeking serendipity
Seeking serendipity
 
Data are the new black : Susan Robbins
Data are the new black : Susan RobbinsData are the new black : Susan Robbins
Data are the new black : Susan Robbins
 
Developing rich interactive eBooks to teach linked open data to professionals...
Developing rich interactive eBooks to teach linked open data to professionals...Developing rich interactive eBooks to teach linked open data to professionals...
Developing rich interactive eBooks to teach linked open data to professionals...
 
Citrination-MRS Fall Meeting 2015
Citrination-MRS Fall Meeting 2015Citrination-MRS Fall Meeting 2015
Citrination-MRS Fall Meeting 2015
 
Teaching Data Science to Undergraduate Students
Teaching Data Science to Undergraduate StudentsTeaching Data Science to Undergraduate Students
Teaching Data Science to Undergraduate Students
 

More from Alasdair Gray

Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...Alasdair Gray
 
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...Alasdair Gray
 
An Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland ProjectAn Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland ProjectAlasdair Gray
 
Supporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life SciencesSupporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life SciencesAlasdair Gray
 
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Alasdair Gray
 
Validata: A tool for testing profile conformance
Validata: A tool for testing profile conformanceValidata: A tool for testing profile conformance
Validata: A tool for testing profile conformanceAlasdair Gray
 
The HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and DistributionsThe HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and DistributionsAlasdair Gray
 
Open PHACTS: The Data Today
Open PHACTS: The Data TodayOpen PHACTS: The Data Today
Open PHACTS: The Data TodayAlasdair Gray
 
Data Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case StudyData Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case StudyAlasdair Gray
 
Data Integration in a Big Data Context
Data Integration in a Big Data ContextData Integration in a Big Data Context
Data Integration in a Big Data ContextAlasdair Gray
 
Scientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry dataScientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry dataAlasdair Gray
 
Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...Alasdair Gray
 
Describing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileDescribing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileAlasdair Gray
 
Sensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-beingSensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-beingAlasdair Gray
 
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...Alasdair Gray
 
Dataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLSDataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLSAlasdair Gray
 
Computing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery DatasetsComputing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery DatasetsAlasdair Gray
 

More from Alasdair Gray (20)

Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
 
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
 
An Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland ProjectAn Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland Project
 
Supporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life SciencesSupporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life Sciences
 
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
 
Validata: A tool for testing profile conformance
Validata: A tool for testing profile conformanceValidata: A tool for testing profile conformance
Validata: A tool for testing profile conformance
 
The HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and DistributionsThe HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and Distributions
 
Open PHACTS: The Data Today
Open PHACTS: The Data TodayOpen PHACTS: The Data Today
Open PHACTS: The Data Today
 
Project X
Project XProject X
Project X
 
Data Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case StudyData Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case Study
 
Data Integration in a Big Data Context
Data Integration in a Big Data ContextData Integration in a Big Data Context
Data Integration in a Big Data Context
 
Data Linkage
Data LinkageData Linkage
Data Linkage
 
Scientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry dataScientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry data
 
Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...
 
Describing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileDescribing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community Profile
 
SensorBench
SensorBenchSensorBench
SensorBench
 
Sensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-beingSensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-being
 
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
 
Dataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLSDataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLS
 
Computing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery DatasetsComputing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery Datasets
 

Recently uploaded

New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 

Recently uploaded (20)

New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 

Data Science meets Linked Data

  • 1. Data Science meets Linked Data Alasdair J G Gray http://www.alasdairjggray.co.uk @gray_alasdair A.J.G.Gray@hw.ac.uk SICSA Data Science Theme Launch 3 July 2014
  • 2. BBC World Cup 3 July 2014SICSA Data Science Theme Launch 1
  • 3. BBC Linked Data Platform 3 July 2014SICSA Data Science Theme Launch 2
  • 4. Olympics 2012 3 July 2014SICSA Data Science Theme Launch 3
  • 5. Linking Data 3 July 2014SICSA Data Science Theme Launch 4
  • 6. 1. Global ID – URI 2. Resolvable ID 3. Useful content HTML for humans RDF for machines 4. Link to other resources Like the Web, but for data! Linked Data Principles 3 July 2014SICSA Data Science Theme Launch 5 “RDF and OWL do not solve the interoperability problem, they just lay it bare on the table!”
  • 7. Challenge 1: Matching Administrative Data Research Centre - Scotland | Alasdair J G Gray| 3 July 2014 John Grant Fisherman Fiona Sinclair Ian Grant Smithy Born: 1861 Stuart Adam Wheelwright Morag Scott Flora Adam Seamstress Born: 1866 Married: 1884 John Grant Farmer Fiona Grant Iain Grant Born: 1860 Messy data Probabilistic matches Schema matching
  • 8. Gleevec® = Imatinib Mesylate 3 July 2014 SICSA Data Science Theme Launch 7 DrugbankChemSpider PubChem Imatinib MesylateImatinib Mesylate YLMAHDNUQAMNNX-UHFFFAOYSA-N
  • 9. Challenge 2: Reusing mappings 3 July 2014 SICSA Data Science Theme Launch 8 Link: skos:closeMatch Reason: non-salt form Link: skos:exactMatch Reason: drug name Link: owl:sameAs
  • 10. Challenge: Multiple Identities Andy Law's Third Law “The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study” http://bioinformatics.roslin.ac.uk/lawslaws/ 3 July 2014SICSA Data Science Theme Launch 9 P12047 X31045 GB:29384 http://rdf.ebi.ac.uk/resource/ch embl/molecule/CHEMBL1642 https://www.ebi.ac.uk/chembl/co mpound/inspect/CHEMBL1642
  • 11. Challenge Open Data: Licenses 5★ of linked data Licenses who can reuse the data  Interoperability of licenses  Non-commercial: academic use, teaching, industry 3 July 2014SICSA Data Science Theme Launch 10
  • 12. Challenges: Privacy 11 3 July 2014SICSA Data Science Theme Launch
  • 13. Challenge: Query Performance Response time Data freshness Reliability Volume of requests Hosting resources 3 July 2014SICSA Data Science Theme Launch 12 Queries Queries
  • 14. In Data we Trust How can we trust the data we’ve got back? How can we ensure that it hasn’t been tampered on the way? Trusty URIs 3 July 2014SICSA Data Science Theme Launch 13 http://www.intelsat.com/wp- content/uploads/2014/03/Red-padlock.jpg
  • 15. Contact Details www.alasdairjggray.co.uk A.J.G.Gray@hw.ac.uk @gray_alasdair 3 July 2014SICSA Data Science Theme Launch 15 “There is lots of data we all use every day, and it’s not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar? No. Why not? Because we don’t have a web of data. Because data is controlled by applications and each application keeps it to itself.” Tim Berners-Lee

Editor's Notes

  1. Many of you will have visited this site recently Lot of sport coverage, how do the BBC cope within their resources?
  2. 700+ pages on teams, groups and players Minimal journalist involvement Automatic aggregation and links to relevant stories Article tagged with Frank Lampard, inference used to link team, group ,etc
  3. Coverage of 10,000+ athletes, 200+ countries, 400-500 disciplines and 30 venues Page for every athlete and country drawing on open data
  4. Internally DBPedia and Geonames
  5. Linked Data hugely successful since inception in 2009 About 300 datasets published Wide range of topics
  6. Familiar with birth, marriage and death records. Aligning individuals is hard Also applies to schema matching
  7. Data’s been aligned now what? Example drug: Gleevec Cancer drug for leukemia Lookup in three popular public chemical databases Different results Data is messy!
  8. sameAs != sameAs depends on your point of view Links relate individual data instances: source, target, predicate, reason. Links are grouped into Linksets which have VoID header providing provenance and justification for the link. Links need provenance to enable reuse – James’s talk
  9. Each captures a subtly different view of the world Are they the same? … depends on your point of view Different URIs for different representations (content negotiation)
  10. Not all data should be open Consider your interaction with the health service – its unique to you Need statistical aggregation to anonymise data As much about educating the public – Public relations