These slides are by Auke Rijpma, who presented the “Catasto meets SPARQL” workshop. Everything is in beta, so let us know if something breaks (Twitter: @rlzijdeman).
2. Clariah datahub example
• Try to construct some queries to get a feel for interacting with the Clariah Structured Data Hub.
• Use the Catasto, a famous dataset compiled by David Herlihy and Christiane Klapisch-Zuber.
• It is a fiscal census of Tuscany in 1427, covering 60k+ households and 270k+ individuals.
• It covers fiscal matters such as asset ownership and occupations, but also some basic demographic information.
3. Sample coding form
[Scanned image: sample coding form with three example records. Fields on the first card: Ser., Hhold No., Loc., Name, Father's, Family; Source: Vol., Pp., K, H, A, I, Oc., Inv., Public, Total, Deduct., Tax. Fields on the second card: Ser. & Hhold No. (1-6), Cd., Members.]
7. Catasto datasets
• Early versions were error-prone fixed-width (fwf) files.
• More recent versions offer tabular data.
• Rows mix household and individual data: you need to know whether e.g. A11 will exist for a given household.
• Early versions were strictly numeric, except for the names of household heads (hhh).
• Hard to browse and hard to interpret results.
8. Catasto as linked data
• New data model:
  • individuals (rdf:type) inHousehold household
    • observations (age, occupation, sex, marital status, relation to head) for individuals
  • households householdMember individual
    • observations (fiscal, occupation, house)
• Codebook included using prefLabel
9. Browse
• Find links and other long, hard-to-type things at
goo.gl/pwnTZo.
• Browse the new data at <http://data.socialhistory.org/resource/catasto/household/2222>
• Try to find some individuals there.
• Try to find the meaning of the codes of a variable
like METIER (occupation) or maritalStatus.
10. SPARQL and triples
• The basic unit in linked data, and in linked data (SPARQL) queries, is the triple.
• subject - predicate - object
• So here for example:
• individual - age - 75
• household - privateInvestments - 5000
• household(head) - occupation - Barbiere
• individual:4_11 - inHousehold - household:4
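As a rough illustration, the example triples can be written down as plain (subject, predicate, object) tuples. This is a Python sketch, not the hub's actual storage: the short names stand in for full URIs, and attaching the age and occupation to individual:4_11 and household:4 is an assumption made only to get a connected toy example.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: Union[str, int]


# Toy data: short names are hypothetical stand-ins for the hub's full URIs.
triples = [
    Triple("individual:4_11", "age", 75),
    Triple("household:4", "privateInvestments", 5000),
    Triple("household:4", "occupation", "Barbiere"),
    Triple("individual:4_11", "inHousehold", "household:4"),
]

# Everything known about one subject is the set of triples that start with it.
about_household = [(t.predicate, t.obj) for t in triples if t.subject == "household:4"]
print(about_household)
```

Note that the object of one triple (household:4) is the subject of others; that is what links the statements into a graph.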
11. SPARQL and triples
• SPARQL queries are built from similar triple statements.
• Each part of a statement is either a URI: <http://…/…>
• Or a literal: “something”
• Or a variable: start a name with a question mark (?name) to allow that part of the statement to be anything.
• Specify part of the statement as a URI or literal to fix it.
• FROM specifies the named graph that holds the statements.
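The matching idea can be sketched in a few lines of Python: parts that start with a question mark bind to anything, fixed parts must match exactly. This is a toy matcher over made-up triples, not how a SPARQL endpoint is implemented, and it leaves out FROM and named graphs entirely.

```python
def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern.

    Pattern parts starting with '?' are variables; anything else must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for part, value in zip(pattern, triple):
            if isinstance(part, str) and part.startswith("?"):
                # A variable used twice in one pattern must bind the same value.
                if binding.setdefault(part, value) != value:
                    break
            elif part != value:
                break
        else:
            # No break: every part of the pattern fitted this triple.
            results.append(binding)
    return results


# Toy data with hypothetical short names instead of full URIs.
triples = [
    ("individual:4_11", "inHousehold", "household:4"),
    ("individual:4_11", "age", 75),
    ("individual:4_12", "inHousehold", "household:4"),
    ("household:4", "occupation", "Barbiere"),
]

# Fix predicate and object, leave the subject open.
members = match(("?person", "inHousehold", "household:4"), triples)
print(members)
```

Leaving all three parts as variables, as in `match(("?s", "?p", "?o"), triples)`, returns every triple.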
12. Query basics
• The basic starting query asks for all triples by entering all three parts of the statement as variables.
• SELECT * to select all
• ?sub ?pred ?obj
• LIMIT 10 to go easy on the server.
• http://yasgui.org/short/rkQeY_vEZ
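To see why LIMIT is kind to the server, here is a small Python sketch in which a generator plays the role of the endpoint. The generator and its triples are invented for illustration; the point is only that taking the first ten results means the remaining ones are never produced.

```python
from itertools import islice


def all_triples():
    """Pretend endpoint: lazily yields a large number of made-up triples."""
    for n in range(1, 100_001):
        yield (f"household:{n}", "householdMember", f"individual:{n}_1")


# SPARQL being mimicked: SELECT * WHERE { ?sub ?pred ?obj } LIMIT 10
first_ten = list(islice(all_triples(), 10))
for sub, pred, obj in first_ten:
    print(sub, pred, obj)
```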
13. Query basics: DISTINCT
• Putting DISTINCT after SELECT returns only the unique results, removing duplicates.
• Write a query to see all the predicates in the Catasto:
• http://yasgui.org/short/ry8iLdPNb
• Write a query to see all the possible codes for the METIER predicate:
• http://yasgui.org/short/SytvcOD4W
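The effect of DISTINCT can be pictured as collecting results into a set. The Python sketch below does this for both exercises on a handful of invented triples; the occupation codes are placeholders, not real Catasto codes.

```python
# Toy triples; predicate names and codes are hypothetical stand-ins for full URIs.
triples = [
    ("household:1", "METIER", "occupation:12"),
    ("household:2", "METIER", "occupation:12"),
    ("household:3", "METIER", "occupation:40"),
    ("individual:1_1", "age", 75),
    ("individual:2_1", "age", 30),
]

# Like SELECT DISTINCT ?pred WHERE { ?sub ?pred ?obj }
# (sorted only to get a stable print order; DISTINCT itself does not sort).
predicates = sorted({pred for _, pred, _ in triples})

# Same idea with the predicate fixed to METIER and the object left open.
metier_codes = sorted({obj for _, pred, obj in triples if pred == "METIER"})

print(predicates)
print(metier_codes)
```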
14. Query basics: PREFIXes
• Writing out full URIs all the time isn’t fun and is prone to errors.
• Make your life easier by adding prefixes.
• PREFIX name: <uri goes here>
• Usage in the query is name:FINAL_BIT_OF_STATEMENT.
• Replace everything before “METIER” in previous query
by a sensible prefix.
• http://yasgui.org/short/S1SYjOwNb
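A prefix is plain text substitution: the prefix name is swapped for the namespace URI and the local part is appended. The Python sketch below shows that expansion and its inverse. The namespace used here is an invented example.org address, so look up the real one while browsing the data.

```python
# Hypothetical namespace, for illustration only; not the hub's real URI.
PREFIXES = {"dim": "http://example.org/catasto/dimension/"}


def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'dim:METIER' into a full '<...>' URI."""
    prefix, _, local = name.partition(":")
    return f"<{prefixes[prefix]}{local}>"


def shorten(uri, prefixes=PREFIXES):
    """Inverse of expand(): replace a known namespace by its prefix."""
    bare = uri.strip("<>")
    for prefix, namespace in prefixes.items():
        if bare.startswith(namespace):
            return f"{prefix}:{bare[len(namespace):]}"
    return uri  # no known namespace: leave the URI untouched


print(expand("dim:METIER"))
print(shorten("<http://example.org/catasto/dimension/sex>"))
```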
16. Query basics: summarise
• Add COUNT after SELECT to count how often a statement in a triple exists in the data.
• Some endpoints group the counts automatically by the other variables in the query; standard SPARQL 1.1 requires an explicit GROUP BY for them.
• So you can also add GROUP BY at the end to state the grouping explicitly.
• Count the number of household (heads) in each occupational category.
• http://yasgui.org/short/HyCsnuvVb
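Counting per group is the same operation as a tally. A minimal Python sketch on invented triples, with the roughly equivalent SPARQL shape in the comments (the predicate is written as a placeholder, and this is not the workshop's actual query):

```python
from collections import Counter

# Toy triples; the occupation codes are made up.
triples = [
    ("household:1", "METIER", "occupation:12"),
    ("household:2", "METIER", "occupation:12"),
    ("household:3", "METIER", "occupation:40"),
    ("household:4", "METIER", "occupation:12"),
    ("household:4", "privateInvestments", 5000),
]

# Roughly:
#   SELECT ?occupation (COUNT(?household) AS ?n)
#   WHERE { ?household <METIER predicate URI> ?occupation }
#   GROUP BY ?occupation
counts = Counter(obj for _, pred, obj in triples if pred == "METIER")
print(dict(counts))
```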
17. Codebook access
• The codebook is an integrated part of the data.
• Explore it with skos:prefLabel.
• Because the Clariah hub uses the CSVW standard, each file has its own unique graph.
• Either add graph names (there are a lot!) or remove the FROM statement to search the entire hub.
18. Ordering results
• Use ORDER BY or ORDER BY DESC() at the end of
the query to sort the results.
• Place the previous results in a sensible order
• http://yasgui.org/short/BJzFetvEb
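Sorting the result rows works like sorting any table by one column. A short Python sketch with invented counts, where `?n` in the comments is a hypothetical name for the count variable:

```python
# Made-up (occupation code -> count) results, as a query might return them.
counts = {"occupation:12": 3, "occupation:40": 1, "occupation:7": 8}

# Like ORDER BY ?n (ascending is the default).
ascending = sorted(counts.items(), key=lambda row: row[1])

# Like ORDER BY DESC(?n): largest groups first.
descending = sorted(counts.items(), key=lambda row: row[1], reverse=True)

for code, n in descending:
    print(code, n)
```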
19. Codebook access
• Careful! You need some sort of triple statement that limits the query to the right graphs, or you’ll be flooded with results.
• Add LIMIT 100 for safety as well.
• Add meaningful labels to the occupation count query.
• To do this, you’ll need to add a second triple statement (query line).
• In queries with several triple statements, end each statement with a dot to separate it from the next.
• http://yasgui.org/short/rkeLktDNZ
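Adding a second statement that shares a variable with the first amounts to a join. The Python sketch below joins occupation codes to their labels on invented data: the code numbers are made up, “Notaio” is an invented label, and only “Barbiere” appears in the slides.

```python
from collections import Counter

# Toy triples: three households plus two codebook entries.
triples = [
    ("household:1", "METIER", "occupation:12"),
    ("household:2", "METIER", "occupation:12"),
    ("household:3", "METIER", "occupation:40"),
    ("occupation:12", "skos:prefLabel", "Barbiere"),
    ("occupation:40", "skos:prefLabel", "Notaio"),
]

# Statement 1: ?household METIER ?code .
# Statement 2: ?code skos:prefLabel ?label .
# The shared variable ?code ties the two statements together.
labels = {sub: obj for sub, pred, obj in triples if pred == "skos:prefLabel"}
counts = Counter(
    labels[obj]
    for _, pred, obj in triples
    if pred == "METIER" and obj in labels  # codes without a label drop out
)
print(dict(counts))
```

Because both statements must match, a household whose code has no label disappears from the result; that is worth remembering when totals look too low.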
20. Your turn
• Now build something from the ground up.
• Get the ages for individuals (use LIMIT 10 at first).
• http://yasgui.org/short/rJZe-KDEb
• Then make a population distribution:
• http://yasgui.org/short/rkErbKwEZ
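Once the ages are in hand, a population distribution is a count per age or per age band. The Python sketch below uses ten-year bands, which is a choice made here for readability and not necessarily what the linked query does; the ages are invented.

```python
from collections import Counter

# Toy (individual, predicate, age) triples with made-up ages.
triples = [
    ("individual:1_1", "age", 75),
    ("individual:1_2", "age", 30),
    ("individual:2_1", "age", 34),
    ("individual:2_2", "age", 8),
    ("individual:3_1", "age", 32),
]


def age_band(age, width=10):
    """Label the band an age falls in, e.g. 34 -> '30-39' for width 10."""
    start = age // width * width
    return f"{start}-{start + width - 1}"


distribution = Counter(age_band(obj) for _, pred, obj in triples if pred == "age")

# Print a tiny text histogram, youngest band first.
for band in sorted(distribution, key=lambda b: int(b.split("-")[0])):
    print(f"{band:>6} {'#' * distribution[band]}")
```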
21. Your turn
• Use catasto/dimension:relationToHead (not actually to head) and
catasto/dimension:sex (explore using brwsr) to find couples in the
catasto.
• Calculate the age difference between them
• http://yasgui.org/short/rJgIcFPNZ
• What do you notice?
• Can you extend the query to see if this varies by socio-economic group?
• http://yasgui.org/short/BkMA9YP4Z
• http://yasgui.org/short/rkW0V5PEZ (heavy on the browser)
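The couples exercise boils down to joining individuals to other individuals in the same household and subtracting their ages. The Python sketch below shows that self-join on invented data. The relation and sex values (“head”, “spouse”, “male”, “female”) are hypothetical labels; the real variables use codes that you need to look up in the codebook.

```python
from statistics import mean

# id: (household, relationToHead, sex, age); every value here is invented.
individuals = {
    "individual:1_1": ("household:1", "head", "male", 40),
    "individual:1_2": ("household:1", "spouse", "female", 27),
    "individual:1_3": ("household:1", "child", "male", 5),
    "individual:2_1": ("household:2", "head", "male", 52),
    "individual:2_2": ("household:2", "spouse", "female", 44),
    "individual:3_1": ("household:3", "head", "female", 60),
}

# Self-join: pair every individual with every other one in the same household,
# then keep only the (male head, female spouse) pairs.
gaps = {}
for hh_a, rel_a, sex_a, age_a in individuals.values():
    for hh_b, rel_b, sex_b, age_b in individuals.values():
        if (
            hh_a == hh_b
            and (rel_a, sex_a) == ("head", "male")
            and (rel_b, sex_b) == ("spouse", "female")
        ):
            gaps[hh_a] = age_a - age_b  # husband's age minus wife's age

print(gaps)
print(mean(gaps.values()))
```

Grouping the gaps by an extra household attribute, such as the occupation of the head, would be the analogue of the socio-economic extension.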