SPARQLasto
Auke Rijpma (UU)
(CC-BY-SA)
DH BeNeLux 2017
Utrecht University
Clariah datahub example
• Try to construct some queries to get a feel for
interacting with Clariah Structured Data Hub.
• Use the Catasto, a famous dataset compiled by David Herlihy
and Christiane Klapisch-Zuber.
• Fiscal census for 1427 Tuscany, covering 60k+
households and 270k+ individuals.
• Covering such fiscal matters as asset ownership,
occupations, etc., but also some basic demographic
information.
[Figure: sample coding form. Fields per record: Ser., Hhold No., Loc., Name, Father's, Family; Source: Vol., Pp., K, H, A, I, Oc., Inv., Public, Total, Deduct., Tax; Ser. & Hhold No. (1-6), Cd., Members. The handwritten entries are not recoverable.]
Catasto datasets
• Early versions: error-prone fixed-width format (fwf) files
• More recent versions offer tabular data
• Mix of household and individual data in rows:
need to know whether e.g. A11 will exist for a
given household.
• Early versions are strictly numeric, except for the names of household heads (hhh).
• Hard to browse and hard to interpret results.
Catasto as linked data
• New datamodel:
• individuals (rdf:type) inHousehold household
• observations (age, occupation, sex, marital
status, relation to head) for individuals
• households householdMember individual
• observations (fiscal, occupation, house)
• Codebook included using prefLabel
Browse
• Find links and other long, hard-to-type things at
goo.gl/pwnTZo.
• Browse the new data at <http://data.socialhistory.org/resource/catasto/household/2222>
• Try to find some individuals there.
• Try to find the meaning of the codes of a variable
like METIER (occupation) or maritalStatus.
SPARQL and triples
• The basic unit in linked data, and in linked data (SPARQL) queries, is
the triple.
• subject - predicate - object
• So here for example:
• individual - age - 75
• household - privateInvestments - 5000
• household (head) - occupation - Barbiere
• individual:4_11 - inHousehold - household:4
SPARQL and triples
• SPARQL queries are made with similar triple statements.
• Each part of a statement is either a URI: <http://…/…>
• Or a literal: "something"
• Put a question mark (?) in front of a name to turn that part of the
statement into a variable that can match anything.
• Specify part of the statement as a URI or literal to fix it.
• FROM specifies the named graph that holds the statements.
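A minimal sketch of a query that fixes one part of the statement and leaves the other two open. The full predicate URI is an assumption pieced together from the data model (inHousehold) and the dimension namespace used elsewhere in this workshop; check it in the browser before relying on it. No FROM clause is given because the slides do not name a graph; if you know the graph URI, a FROM line goes between SELECT and WHERE.

```sparql
# Sketch: ten individuals and the household each one belongs to.
# The predicate URI below is an assumption, not taken from the slides.
SELECT ?individual ?household
WHERE {
  # Subject and object are variables; the predicate is fixed as a URI.
  ?individual <http://data.socialhistory.org/resource/catasto/dimension/inHousehold> ?household .
}
LIMIT 10
```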
Query basics
• The basic starting query asks for all triples by
entering all three parts of the statement as variables.
• SELECT * to select all
• ?sub ?pred ?obj
• LIMIT 10 to go easy on the server.
• http://yasgui.org/short/rkQeY_vEZ
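Put together, the pieces listed on this slide give roughly the following query. This is a sketch of the pattern, not necessarily the exact query behind the short link.

```sparql
# All three parts of the statement are variables, so any triple matches.
SELECT *
WHERE {
  ?sub ?pred ?obj .
}
LIMIT 10   # go easy on the server
```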
Query basics: DISTINCT
• Putting DISTINCT after SELECT gives the unique
results; get rid of duplicates.
• Write a query to see all the predicates in the Catasto:
• http://yasgui.org/short/ry8iLdPNb
• Write a query to see all the possible codes for the
METIER predicate:
• http://yasgui.org/short/SytvcOD4W
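Two sketches for the exercises on this slide. Both are illustrations under assumptions: the first restricts results to subjects whose URI starts with the Catasto namespace (a stand-in for naming the graph, which the slides do not give), and the second assumes METIER lives in the dimension namespace.

```sparql
# Sketch 1: the unique predicates used on Catasto resources.
SELECT DISTINCT ?pred
WHERE {
  ?sub ?pred ?obj .
  # Keep only subjects inside the Catasto namespace (assumed filter).
  FILTER(STRSTARTS(STR(?sub), "http://data.socialhistory.org/resource/catasto/"))
}
```

```sparql
# Sketch 2: the unique codes that occur as values of METIER.
# The full predicate URI is an assumption.
SELECT DISTINCT ?code
WHERE {
  ?household <http://data.socialhistory.org/resource/catasto/dimension/METIER> ?code .
}
```

Without DISTINCT, each code would be returned once for every household that carries it.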
Query basics: PREFIXes
• Writing out full URIs all the time isn't fun and is prone to errors.
• Make your life easier by adding prefixes.
• PREFIX name: <uri goes here>
• Usage in the query is name:FINAL_BIT_OF_STATEMENT.
• Replace everything before “METIER” in previous query
by a sensible prefix.
• http://yasgui.org/short/S1SYjOwNb
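A sketch of the same kind of query with a prefix in place of the long URI. The prefix name catdim and its namespace are the ones listed among the useful prefixes for this workshop; that METIER sits in this namespace is an assumption.

```sparql
# Declare the prefix once, then use catdim:METIER instead of the full URI.
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

SELECT DISTINCT ?code
WHERE {
  ?household catdim:METIER ?code .
}
```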
Query basics: PREFIXes
• Useful prefixes for today:
• rdf (pre-added)
• skos (Simple Knowledge Organization System)
• Yasgui autocompletes prefixes it knows.
• catasto:
• <http://data.socialhistory.org/resource/catasto/>
• catdim:
• <http://data.socialhistory.org/resource/catasto/dimension/>
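For reference, a sketch of a query prologue that declares all four prefixes. The rdf and skos namespaces are the standard W3C ones; the example query itself (listing the types of Catasto resources) is only an illustration and assumes that resources are typed with rdf:type, as the data model suggests.

```sparql
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>
PREFIX catasto: <http://data.socialhistory.org/resource/catasto/>
PREFIX catdim:  <http://data.socialhistory.org/resource/catasto/dimension/>

# Which types (classes) do resources in the Catasto namespace have?
SELECT DISTINCT ?type
WHERE {
  ?thing rdf:type ?type .
  # STR(catasto:) expands the bare prefix to its namespace string.
  FILTER(STRSTARTS(STR(?thing), STR(catasto:)))
}
LIMIT 100
```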
Query basics: summarise
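• Add COUNT after SELECT to count how often a
statement in a triple exists in the data.
• Depending on the endpoint, the count may be grouped automatically by
the other variables in the query; standard SPARQL 1.1 requires an
explicit GROUP BY for that.
• You can also add GROUP BY at the end to group explicitly.
• Count the number of households (heads) in each
occupational category.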
• http://yasgui.org/short/HyCsnuvVb
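A sketch of such a count, written with an explicit GROUP BY so that it is valid on any SPARQL 1.1 endpoint. It assumes that the occupation code is attached to the household through catdim:METIER; the variable names are illustrative.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

# One row per occupation code, with the number of households carrying it.
SELECT ?occupation (COUNT(?household) AS ?households)
WHERE {
  ?household catdim:METIER ?occupation .
}
GROUP BY ?occupation
```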
Codebook access
• The codebook is an integrated part of the data.
• Explore it with skos:prefLabel
• Because the Clariah hub uses the CSVW standard, each
file has its own unique graph.
• Either add graph names (there are a lot!) or remove
the FROM statement to search the entire hub.
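A sketch of a codebook lookup without a FROM clause, so that the whole hub is searched. It assumes the METIER codes are resources that carry a skos:prefLabel; the first triple pattern ties the labels to codes actually used in the Catasto.

```sparql
PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

# Occupation codes used in the Catasto, with their human-readable labels.
SELECT DISTINCT ?code ?label
WHERE {
  ?household catdim:METIER ?code .   # limits the search to Catasto codes
  ?code skos:prefLabel ?label .      # codebook label for that code
}
LIMIT 100
```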
Ordering results
• Use ORDER BY or ORDER BY DESC() at the end of
the query to sort the results.
• Place the previous results in a sensible order
• http://yasgui.org/short/BJzFetvEb
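A sketch of the occupation count sorted from most to least common. As before, catdim:METIER as a household-level predicate is an assumption.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

SELECT ?occupation (COUNT(?household) AS ?households)
WHERE {
  ?household catdim:METIER ?occupation .
}
GROUP BY ?occupation
ORDER BY DESC(?households)   # largest categories first
```

Plain ORDER BY ?households would sort ascending instead.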
Codebook access
• Careful! You need some sort of triple statement that limits the query to
the right graphs, or you'll be flooded with results.
• Add LIMIT 100 for safety as well.
• Add meaningful labels to the occupation count query.
• To do this, you'll need to add a query line.
• Queries with multiple query lines require the lines to end
with a dot.
• http://yasgui.org/short/rkeLktDNZ
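A sketch of the labelled count. It has two query lines, each ending with a dot, and assumes that the METIER codes carry skos:prefLabel labels.

```sparql
PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

SELECT ?label (COUNT(?household) AS ?households)
WHERE {
  ?household catdim:METIER ?code .   # first query line, ends with a dot
  ?code skos:prefLabel ?label .      # second query line: label of the code
}
GROUP BY ?label
ORDER BY DESC(?households)
LIMIT 100
```

If a code has labels in several languages, it is counted once per label; a FILTER on LANG(?label) could narrow that down.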
Your turn
• Now build something from the ground up.
• Get the ages for individuals (use limit 10 at first).
• http://yasgui.org/short/rJZe-KDEb
• Then make a population distribution:
• http://yasgui.org/short/rkErbKwEZ
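A sketch of the population distribution: the number of individuals at each age. The predicate catdim:age is an assumption based on the data model, and the cast to xsd:integer is there only so that ages sort numerically if they happen to be stored as strings.

```sparql
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

# Number of individuals per recorded age.
SELECT ?age (COUNT(?individual) AS ?persons)
WHERE {
  ?individual catdim:age ?age .
}
GROUP BY ?age
ORDER BY ASC(xsd:integer(?age))
```

Dropping COUNT, GROUP BY and ORDER BY and adding LIMIT 10 gives the simpler first step of listing a few ages.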
Your turn
• Use catasto/dimension:relationToHead (not actually to head) and
catasto/dimension:sex (explore using brwsr) to find couples in the
catasto.
• Calculate the age difference between them
• http://yasgui.org/short/rJgIcFPNZ
• What do you notice?
• Can you extend the query to see if this varies by socio-economic group?
• http://yasgui.org/short/BkMA9YP4Z
• http://yasgui.org/short/rkW0V5PEZ (heavy on the browser)
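A rough sketch of the couples query, under several loudly stated assumptions: the predicates inHousehold, sex, relationToHead and age are assumed to live in the catdim namespace, and the code values in the VALUES line ("1", "2") are placeholders, not the real Catasto codes. Look up the actual codes in the browser and substitute them before running.

```sparql
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

SELECT ?household ?ageHusband ?ageWife
       ((xsd:integer(?ageHusband) - xsd:integer(?ageWife)) AS ?ageDifference)
WHERE {
  # Two individuals in the same household.
  ?husband catdim:inHousehold ?household ;
           catdim:sex ?male ;
           catdim:relationToHead ?husbandRelation ;
           catdim:age ?ageHusband .
  ?wife    catdim:inHousehold ?household ;
           catdim:sex ?female ;
           catdim:relationToHead ?wifeRelation ;
           catdim:age ?ageWife .
  # PLACEHOLDER codes: replace with the real codes from the codebook.
  VALUES (?male ?female ?husbandRelation ?wifeRelation) { ("1" "2" "1" "2") }
}
LIMIT 100
```

Households with more than one married couple would produce every husband-wife combination, so the relation codes need to be chosen with care.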

 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 

Rijpma's Catasto meets SPARQL dhb2017_workshop

  • 1. SPARQLasto Auke Rijpma (UU) (CC-BY-SA) DH BeNeLux 2017 Utrecht University
  • 2. Clariah datahub example
      • Try to construct some queries to get a feel for interacting with the Clariah Structured Data Hub.
      • We use the Catasto, a famous dataset made by David Herlihy and Christiane Klapisch-Zuber.
      • It is a fiscal census of Tuscany in 1427, covering 60k+ households and 270k+ individuals.
      • It covers fiscal matters such as asset ownership and occupations, but also some basic demographic information.
  • 3. SAMPLE CODING FORM (scanned image, not reproduced). Legible field headers: Ser., Hhold No., Loc., Name, Father's, Family; Source: Vol., Pp., K, H, A, I, Oc., Inv., Public, Total, Deduct., Tax; Ser. & Hhold No., Members (1-6), Cd.
  • 4.
  • 5.
  • 6.
  • 7. Catasto datasets
      • Early versions were error-prone fixed-width format (fwf) files.
      • More recent versions offer tabular data.
      • Rows mix household and individual data: you need to know whether e.g. A11 will exist for a given household.
      • Early versions were strictly numeric, except for the names of household heads (hhh-names).
      • Hard to browse, and hard to interpret results.
  • 8. Catasto as linked data
      • New data model:
          • individuals (rdf:type) inHousehold household
          • observations (age, occupation, sex, marital status, relation to head) for individuals
          • households householdMember individual
          • observations (fiscal, occupation, house) for households
      • Codebook included using skos:prefLabel
  • 9. Browse
      • Find links and other long, hard-to-type things at goo.gl/pwnTZo.
      • Browse the new data at <http://data.socialhistory.org/resource/catasto/household/2222>
      • Try to find some individuals there.
      • Try to find the meaning of the codes of a variable like METIER (occupation) or maritalStatus.
  • 10. SPARQL and triples
      • The basic unit in linked data, and in linked data (SPARQL) queries, is the triple.
      • subject - predicate - object
      • So here, for example:
          • individual - age - 75
          • household - privateInvestments - 5000
          • household (head) - occupation - Barbiere
          • individual:4_11 - inHousehold - household:4
  • 11. SPARQL and triples
      • SPARQL queries are made with similar triple statements.
      • Each part of a statement is either a URI: <http://…/…>
      • Or a literal: "something"
      • Put a question mark (?) in front of a name to turn that part of the statement into a variable that can match anything.
      • Specify part of the statement as a URI or a literal to fix it.
      • FROM specifies the named graph that contains the statements.
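As an illustration of fixing one part of a statement and leaving the rest open, here is a minimal sketch that uses the household URI from the Browse slide as a fixed subject. It assumes the endpoint exposes that resource in its default dataset; the original queries used a FROM clause whose graph URI is not given in the transcript, so it is left out here.

```sparql
# Fixed subject (a URI), variable predicate and object:
# "tell me everything stated about household 2222".
SELECT ?pred ?obj
WHERE {
  <http://data.socialhistory.org/resource/catasto/household/2222> ?pred ?obj .
}
LIMIT 50
```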
  • 12. Query basics
      • The basic starting query asks for all triples by entering all three parts of the statement as variables.
      • SELECT * to select all
      • ?sub ?pred ?obj
      • LIMIT 10 to go easy on the server.
      • http://yasgui.org/short/rkQeY_vEZ
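The query behind the short link is not reproduced in the transcript. Assembled from the three pieces listed on the slide, it would look roughly like this (the FROM clause is omitted because the graph name is not given):

```sparql
# All three positions are variables, so any triple matches.
SELECT *
WHERE {
  ?sub ?pred ?obj .
}
LIMIT 10
```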
  • 13. Query basics: DISTINCT
      • Putting DISTINCT after SELECT returns only the unique results, i.e. it gets rid of duplicates.
      • Write a query to see all the predicates in the Catasto:
      • http://yasgui.org/short/ry8iLdPNb
      • Write a query to see all the possible codes for the METIER predicate:
      • http://yasgui.org/short/SytvcOD4W
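A sketch of the first exercise, under the assumption that no FROM clause is used: without one, the query lists predicates from whatever default dataset the endpoint exposes, not only the Catasto, so a LIMIT is added for safety.

```sparql
# Only the predicate is selected; DISTINCT collapses the many
# triples that share a predicate into one row each.
SELECT DISTINCT ?pred
WHERE {
  ?sub ?pred ?obj .
}
LIMIT 100
```

For the second exercise, replace `?pred` in the pattern with the full METIER URI in angle brackets and select `?obj` instead.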
  • 14. Query basics: PREFIXes
      • Writing out full URIs all the time isn't fun and is prone to errors.
      • Make your life easier by adding prefixes.
      • PREFIX name: <uri goes here>
      • Usage in the query is name:FINAL_BIT_OF_STATEMENT.
      • Replace everything before "METIER" in the previous query with a sensible prefix.
      • http://yasgui.org/short/S1SYjOwNb
  • 15. Query basics: PREFIXes
      • Useful prefixes for today:
          • rdf (pre-added)
          • skos (Simple Knowledge Organization System)
          • catasto: <http://data.socialhistory.org/resource/catasto/>
          • catdim: <http://data.socialhistory.org/resource/catasto/dimension/>
      • Yasgui autocompletes prefixes it knows.
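With the catdim prefix declared, the METIER exercise might be written as follows. That METIER lives under the dimension namespace is an assumption based on the prefix list above; check it against the predicates you found earlier.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

# catdim:METIER expands to
# <http://data.socialhistory.org/resource/catasto/dimension/METIER>
SELECT DISTINCT ?code
WHERE {
  ?household catdim:METIER ?code .
}
LIMIT 100
```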
  • 16. Query basics: summarise
      • Add COUNT after SELECT to count how often a statement in a triple exists in the data.
      • Some endpoints group automatically by the other variables in the query.
      • You can also add GROUP BY at the end to make the grouping explicit (standard SPARQL 1.1 requires this when SELECT mixes COUNT with other variables).
      • Count the number of households (heads) in each occupational category.
      • http://yasgui.org/short/HyCsnuvVb
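A possible shape for the occupation count, written with an explicit GROUP BY so that it is valid SPARQL 1.1 on any endpoint. The variable names are illustrative, and catdim:METIER is assumed to be the occupation predicate attached to households.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

# One row per occupation code, with the number of households carrying it.
SELECT ?occupation (COUNT(?household) AS ?n)
WHERE {
  ?household catdim:METIER ?occupation .
}
GROUP BY ?occupation
```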
  • 17. Codebook access
      • The codebook is an integrated part of the data.
      • Explore it with skos:prefLabel.
      • Because the Clariah hub uses the CSVW standard, each file has its own unique graph.
      • Either add graph names (there are a lot!) or remove the FROM statement to search the entire hub.
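If you would rather name graphs than drop FROM altogether, one way to discover candidate graph names is the standard GRAPH keyword with a variable. This is a sketch that assumes the endpoint allows querying across named graphs and that catdim:METIER is a predicate used in the Catasto graphs.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

# List the named graphs that contain at least one METIER statement.
SELECT DISTINCT ?graph
WHERE {
  GRAPH ?graph {
    ?sub catdim:METIER ?obj .
  }
}
LIMIT 20
```

Each returned URI can then be used in a FROM clause between SELECT and WHERE.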
  • 18. Ordering results
      • Use ORDER BY or ORDER BY DESC() at the end of the query to sort the results.
      • Place the previous results in a sensible order.
      • http://yasgui.org/short/BJzFetvEb
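Applied to the occupation count, sorting from most to least frequent could look like this. As before, catdim:METIER and the variable names are assumptions rather than the query behind the short link.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

SELECT ?occupation (COUNT(?household) AS ?n)
WHERE {
  ?household catdim:METIER ?occupation .
}
GROUP BY ?occupation
ORDER BY DESC(?n)   # largest categories first; plain ORDER BY ?n sorts ascending
```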
  • 19. Codebook access
      • Careful! You need some sort of triple statement that limits the query to the right graphs, or you'll be flooded with results.
      • Add LIMIT 100 for safety as well.
      • Add meaningful labels to the occupation count query.
      • To do this, you'll need to add a query line.
      • Queries with multiple query lines require each line to end with a dot.
      • http://yasgui.org/short/rkeLktDNZ
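A sketch of the labelled count. It assumes that the occupation codes are resources (URIs) that carry a skos:prefLabel somewhere in the hub; the first triple pattern is what keeps the label lookup restricted to codes actually used in the Catasto.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>
PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>

SELECT ?label (COUNT(?household) AS ?n)
WHERE {
  ?household  catdim:METIER  ?occupation .   # limits the query to Catasto occupations
  ?occupation skos:prefLabel ?label .        # second query line: note the dots
}
GROUP BY ?label
ORDER BY DESC(?n)
LIMIT 100
```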
  • 20. Your turn
      • Now build something from the ground up.
      • Get the ages for individuals (use LIMIT 10 at first).
      • http://yasgui.org/short/rJZe-KDEb
      • Then make a population distribution:
      • http://yasgui.org/short/rkErbKwEZ
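One way the population distribution might be built, assuming the age observation is exposed as catdim:age (a guess by analogy with the other dimension predicates; verify it in the browser first). Start with `SELECT ?individual ?age` and `LIMIT 10`, then move to the aggregate:

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

# Number of individuals at each recorded age.
SELECT ?age (COUNT(?individual) AS ?n)
WHERE {
  ?individual catdim:age ?age .   # catdim:age is an assumed predicate name
}
GROUP BY ?age
ORDER BY ?age
```

If the ages turn out to be stored as plain strings, the ordering will be alphabetical rather than numeric; sorting on a cast to a number would be the usual remedy.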
  • 21. Your turn
      • Use catasto/dimension:relationToHead (not actually to head) and catasto/dimension:sex (explore using brwsr) to find couples in the Catasto.
      • Calculate the age difference between them.
      • http://yasgui.org/short/rJgIcFPNZ
      • What do you notice?
      • Can you extend the query to see if this varies by socio-economic group?
      • http://yasgui.org/short/BkMA9YP4Z
      • http://yasgui.org/short/rkW0V5PEZ (heavy on the browser)