A presentation for a group of PhD students from the Leibniz Institutes (section B, social sciences) to discuss how they could use the Web, and even better the Web of Data, as an instrument in their research.
1. How the Web
can change
social science research
(including yours)
Frank van Harmelen
Computer Science Department
VU University Amsterdam
Creative Commons License:
allowed to share & remix,
but must attribute & non-commercial
2. Using the web (of data)
for e-science
in Social Sciences
Health Warning:
Computer
Scientist!
3. This talk is about
using the web
as an observational instrument
using the web of data
as an even better observational instrument
using the web of data
as a data-sharing platform
4. This talk is not about
it's NOT social science about e-science
(e.g. the Oxford research centre)
it's NOT about high-performance computing
(that's just boring infrastructure,
let the computer scientists deal with that)
I don’t discuss online social experiments
(crowdsourcing, social games, Mechanical Turk, etc.)
5. Who are you?
Who is using large computerised data sets?
Who is using data extracted from the web?
Who is using semantic web data?
6. This talk is about
using the web & the web of data
as an observational instrument &
as a sharing platform
Through:
A whole bunch of realistic examples
A sketch of the technology
Message = yes, you can do this too!
15. Question: Is the content of party-political
programmes and election speeches predictive
of government coalition attempts?
Data
• All party manifestos
• Half a year of all Dutch newspapers
17. Question: Can we predict the social network
at Tn from the content at Tn-1?
Data
• Discussions from online forum nl.politiek
• 21,000 participants talking about 19 Dutch
political parties over 259 weeks
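To make the question concrete, here is a minimal sketch of one naive way to test it, assuming (hypothetically) that the forum data can be reduced to records of week, author, the person replied to, and the parties mentioned. It links two participants at week n when the parties they talked about at week n-1 overlap, then scores that guess against the reply network actually observed at week n. The records, the Jaccard threshold and all function names are illustrative assumptions, not the study's data or method.

```python
from itertools import combinations

# Toy records, NOT the nl.politiek data: (week, author, replied_to, parties_mentioned).
posts = [
    (1, "ann", "bob", {"VVD", "PvdA"}),
    (1, "bob", "ann", {"PvdA"}),
    (1, "cor", None, {"SP"}),
    (2, "ann", "bob", {"VVD"}),
    (2, "cor", "dan", {"SP"}),
    (2, "dan", None, {"SP", "CDA"}),
]


def content_profiles(posts, week):
    """Map each participant to the set of parties they mentioned in `week`."""
    profiles = {}
    for w, author, _, parties in posts:
        if w == week:
            profiles.setdefault(author, set()).update(parties)
    return profiles


def reply_network(posts, week):
    """Undirected 'who replied to whom' edges observed in `week`."""
    return {frozenset((a, b)) for w, a, b, _ in posts if w == week and b and a != b}


def jaccard(s, t):
    """Overlap of two sets: |s & t| / |s | t| (0 for two empty sets)."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def predict_network(posts, week, threshold=0.5):
    """Predict edges at `week` by linking people whose party profiles overlapped at `week - 1`."""
    profiles = content_profiles(posts, week - 1)
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(profiles), 2)
        if jaccard(profiles[a], profiles[b]) >= threshold
    }


predicted = predict_network(posts, week=2)
actual = reply_network(posts, week=2)
hits = predicted & actual
precision = len(hits) / len(predicted) if predicted else 0.0
recall = len(hits) / len(actual) if actual else 0.0
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=1.00 recall=0.50
```

On real data one would sweep the threshold and compare against a baseline that simply copies the network from week n-1.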
23. General idea of Web of Data
(a.k.a. “Semantic Web”)
1. Make data available on the Web
in machine-understandable form
(formalised)
2. Structure the data
and meta-data
in ontologies
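"Machine-understandable form" can be as plain as one line per fact. As a sketch, the snippet below writes a few facts in the style of the W3C N-Triples format, where each line is `<subject> <predicate> <object> .`. The `http://example.org/` names are hypothetical; only the `rdf:type` URI is a standard one, and the literal escaping is deliberately simplified (newlines and other special characters are not handled).

```python
# Hypothetical namespace used only for illustration.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def term(value):
    """Write a URI as <...> and anything else as a quoted literal (simplified escaping)."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """One 'subject predicate object .' line per triple, as in the N-Triples format."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


data = [
    (EX + "Frank", RDF_TYPE, EX + "Person"),  # meta-data: Frank is typed by an ontology class
    (EX + "Frank", EX + "worksAt", EX + "VU_University_Amsterdam"),
    (EX + "Frank", EX + "name", "Frank van Harmelen"),
]
print(to_ntriples(data))
```

In practice an RDF library would handle parsing and full escaping; the point here is only how little structure the format needs.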
25. Bluffer’s Guide to RDF
• Express relations between things:
• Results in labelled network (“graph”)
• All labels are actually web-addresses (URIs)
• You can “ping” any label and find out more
• Bits of the graph can live at physically different
locations & have different owners
[Figure: a small labelled graph with nodes Frank, x, y and MIT and edges labelled AuthorOf and publishedBy; each edge reads Subject → Predicate → Object]
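The bullets above can be imitated in a few lines. This sketch assumes the graph reads "Frank AuthorOf x" and "x publishedBy MIT", and gives every label a hypothetical `example.org` URI. Each host's triples sit in a separate in-memory store, standing in for separately owned web servers; `ping` looks up a label in the store named by its host (a local stand-in for HTTP dereferencing, not a real network call), and `follow` walks the graph hop by hop across those stores.

```python
from urllib.parse import urlparse

# Hypothetical hosts and URIs. On the real Web each host would be a server answering
# HTTP requests; here each "store" is just an in-memory set of (subject, predicate, object).
stores = {
    "people.example.org": {
        ("http://people.example.org/Frank", "http://vocab.example.org/AuthorOf",
         "http://books.example.org/x"),
    },
    "books.example.org": {
        ("http://books.example.org/x", "http://vocab.example.org/publishedBy",
         "http://publishers.example.org/MIT"),
    },
}


def ping(uri):
    """Stand-in for dereferencing a URI: ask the store named by its host for triples about it."""
    host = urlparse(uri).netloc
    return {(s, p, o) for s, p, o in stores.get(host, set()) if s == uri}


def follow(start, *predicates):
    """Walk the labelled graph from `start`, following one predicate per hop."""
    current = {start}
    for pred in predicates:
        current = {o for node in current for s, p, o in ping(node) if p == pred}
    return current


print(follow("http://people.example.org/Frank",
             "http://vocab.example.org/AuthorOf",
             "http://vocab.example.org/publishedBy"))
# {'http://publishers.example.org/MIT'}
```

Answering "who published Frank's book?" needs both stores, yet neither owner had to know about the other.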
26. Bluffer’s Guide to RDF Schema
• Types for subjects, objects & predicates
• Types organised in a hierarchy
• Inheritance of properties
[Figure: the same graph (Frank, x, y, MIT; AuthorOf, publishedBy) annotated with the types author, book and publisher, arranged in a hierarchy with person, artifact and man]
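A sketch of what RDF Schema adds, under an assumed vocabulary (the slide's exact hierarchy is not recoverable): predicates get a domain (type of the subject) and a range (type of the object), types sit in a subClassOf hierarchy, and a resource of type C inherits every superclass of C. All names below are hypothetical, and this implements only these two RDFS-style inference rules, not a full RDFS reasoner.

```python
# Illustrative vocabulary (an assumption: the slide's exact hierarchy is not recoverable).
TRIPLES = {("Frank", "AuthorOf", "x"), ("x", "publishedBy", "MIT")}
DOMAIN = {"AuthorOf": "author", "publishedBy": "book"}       # type of the subject
RANGE = {"AuthorOf": "book", "publishedBy": "publisher"}     # type of the object
SUBCLASS_OF = {"author": {"person"}, "book": {"artifact"}}  # the type hierarchy


def superclasses(cls):
    """cls plus every class reachable from it via subClassOf (transitive closure)."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(SUBCLASS_OF.get(c, ()))
    return seen


def direct_types(triples):
    """Types implied by the predicates' domain and range."""
    types = {}
    for s, p, o in triples:
        if p in DOMAIN:
            types.setdefault(s, set()).add(DOMAIN[p])
        if p in RANGE:
            types.setdefault(o, set()).add(RANGE[p])
    return types


# Inheritance: a resource of type C is also of every superclass of C.
inferred = {
    r: set().union(*(superclasses(c) for c in cs))
    for r, cs in direct_types(TRIPLES).items()
}
for resource in sorted(inferred):
    print(resource, sorted(inferred[resource]))
# Frank ['author', 'person']
# MIT ['publisher']
# x ['artifact', 'book']
```

None of these types were stated explicitly for Frank, x or MIT; they all follow from the schema.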
27. Ontologies (= hierarchical
conceptual vocabularies)
Identify the key concepts in a domain
Identify a vocabulary for these concepts
Identify relations between these concepts
Make these precise enough
so that they can be shared between
• humans and humans
• humans and machines
• machines and machines
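One way to picture "precise enough to be shared" is a tiny vocabulary where several human terms map to one machine identifier, and concepts are linked by a "broader" relation. This is a minimal sketch; the `ex:` identifiers, labels, synonyms and hierarchy are hypothetical, chosen only to show the idea.

```python
# Hypothetical mini-ontology: identifiers, labels and relations are illustrative only.
CONCEPTS = {
    "ex:Organisation": {"label": "organisation", "synonyms": {"organization"}, "broader": None},
    "ex:PoliticalParty": {"label": "political party", "synonyms": {"party", "partij"},
                          "broader": "ex:Organisation"},
    "ex:Coalition": {"label": "coalition", "synonyms": {"government coalition"},
                     "broader": "ex:Organisation"},
}

# Index every label and synonym (lower-cased) to one shared concept identifier.
LOOKUP = {
    word.lower(): cid
    for cid, c in CONCEPTS.items()
    for word in {c["label"], *c["synonyms"]}
}


def concept_for(text):
    """Map a human's term to the machine identifier (None if not in the vocabulary)."""
    return LOOKUP.get(text.strip().lower())


def broader_chain(cid):
    """Follow 'broader' links up the hierarchy, e.g. party -> organisation."""
    chain = []
    while cid is not None:
        chain.append(cid)
        cid = CONCEPTS[cid]["broader"]
    return chain


print(concept_for("Partij"), concept_for("political party"))  # ex:PoliticalParty ex:PoliticalParty
print(broader_chain(concept_for("party")))  # ['ex:PoliticalParty', 'ex:Organisation']
```

Two coders who write "partij" and "political party" end up with the same identifier, which is what lets machines compare their annotations.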
28. Biomedical ontologies (a few)
MeSH
• Medical Subject Headings, National Library of Medicine
• 22,000 descriptions
EMTREE
• Commercial (Elsevier); drugs and diseases
• 45,000 terms, 190,000 synonyms
UMLS
• Integrates 100 different vocabularies
SNOMED
• 200,000 concepts, College of American Pathologists
Gene Ontology
• 15,000 terms in molecular biology
NCBI Cancer Ontology:
• 17,000 classes (about 1M definitions)
29. On the Web of Data, anyone
can link anything to anything
[Figure: a link [<x> IsOfType <T>] between a resource x and a type T that live at different owners & locations, e.g. an <institute>]
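The point of the slide is that a third party can publish a link between resources it does not own. In this sketch two hypothetical surveys each describe a question, and a separate party publishes `[<x> IsOfType <T>]` links typing both questions with a shared class `T`. The standard `rdf:type` URI plays the role of IsOfType and Dublin Core `title` supplies labels; the survey URIs, titles and class are invented for illustration. Because graphs are sets of triples, merging them is a set union, and only the merged graph can answer "which questions are of type T?".

```python
# Hypothetical data sets with different owners and locations.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # plays the role of IsOfType
DC_TITLE = "http://purl.org/dc/terms/title"
T = "http://vocab.example.org/IncomeQuestion"

survey_a = {("http://a.example.org/q17", DC_TITLE, "Monthly household income")}
survey_b = {("http://b.example.org/item3", DC_TITLE, "Net income per month")}

# A third party, owning neither survey nor vocabulary, publishes [<x> IsOfType <T>] links.
links = {
    ("http://a.example.org/q17", RDF_TYPE, T),
    ("http://b.example.org/item3", RDF_TYPE, T),
}


def titled_instances(graph, cls):
    """(resource, title) pairs for every resource typed as `cls` in `graph`."""
    members = {s for s, p, o in graph if p == RDF_TYPE and o == cls}
    return {(s, o) for s, p, o in graph if p == DC_TITLE and s in members}


print(titled_instances(survey_a | survey_b, T))  # set(): no types without the links
merged = survey_a | survey_b | links             # merging graphs is a set union
for resource, title in sorted(titled_instances(merged, T)):
    print(resource, "->", title)
```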
40. The World Bank is also doing it!
http://data.worldbank.org/
7,000 indicators from World Bank data sets.
41. The US gov is also doing it!
http://data.gov/: 390,000 data sets
Compare foreign aid budgets
Does tax influence smokers?
Compare campaign money
42. Already many billions of facts & rules
Everybody’s doing it!
May ’09 estimate: > 4.2 billion triples +
140 million interlinks
It gets bigger every month
44. And many more
• Reuters
• New York Times
• EU (EUROSTAT, others)
• BBC
• Facebook
• …
45. So how good is this
observational instrument?
Studies on validity (e.g. in science dynamics)
methods for provenance & trust
methods for attribution & citation
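A common starting point for provenance is to keep, next to each triple, the source that asserted it (a "quad"), similar in spirit to the named graphs of RDF datasets. This minimal sketch, with hypothetical statements and sources, shows two uses from the bullets above: filtering by trusted sources, and listing who asserts each statement so it can be attributed and cited. Real trust models are far richer than a fixed whitelist.

```python
from collections import defaultdict

# Each statement carries the source that asserted it (a "quad"); all values are hypothetical.
quads = [
    ("ex:PartyA", "ex:memberOf", "ex:Coalition", "http://gov.example.org/"),
    ("ex:PartyA", "ex:memberOf", "ex:Coalition", "http://news.example.org/"),
    ("ex:PartyB", "ex:memberOf", "ex:Coalition", "http://blog.example.org/"),
]
TRUSTED = {"http://gov.example.org/", "http://news.example.org/"}


def trusted_triples(quads, trusted):
    """Keep only statements asserted by at least one trusted source."""
    return {(s, p, o) for s, p, o, src in quads if src in trusted}


def sources_per_triple(quads):
    """For attribution and citation: which sources assert each statement."""
    support = defaultdict(set)
    for s, p, o, src in quads:
        support[(s, p, o)].add(src)
    return support


print(trusted_triples(quads, TRUSTED))  # {('ex:PartyA', 'ex:memberOf', 'ex:Coalition')}
for triple, sources in sorted(sources_per_triple(quads).items()):
    print(triple, len(sources), "source(s)")
```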
46. For real?
“Use the power of information to
explore social and economic life on
Earth”
€1bn over 10 years
48. Take home message
use the web & the web-of-data
to obtain your data
use the web-of-data to share your data
yes, you can do this too!
Collaborate with computer scientists
reflect on deeper consequences
for the social sciences
(methodological, theoretical, etc)
49. Acknowledgements
I’ve freely used material from the work of
Shenghui Wang
Paul Groth
Julie Birkholz
Wouter van Atteveldt
Laurens van Rietveld
Rinke Hoekstra
and many in the Semantic Web community
Speaker notes
Add pictures
Add pictures
Add pictures
Talk about citation data: difficult to get (2 weeks to gather a couple of hundred citation scores)