9. Pooling results
• Do we want to do it? (Not everyone does …)
• If so, how can it be done?
• How do you establish that you’re both talking
about the same person?
10. Current FreeUKGen search facilities
• BMD search is sophisticated and flexible
• Only one result type: people who match
• Census search has same approach, with links
to individual households
14. Limitations of current search
• Limit of 3000 hits per BMD search
• Difficult to get to household info
• Result pages can’t be bookmarked
– http://www.freecen.org.uk/cgi/search.pl
• Main problem: searches all return HTML!
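A result page can only be bookmarked if the whole query travels in the URL itself. The sketch below shows that idea. It assumes a GET-based search endpoint, and the parameter names (surname, year_from, year_to) are hypothetical, not the real FreeCEN form fields:

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint taken from the slide; whether it accepts these (hypothetical) GET parameters is an assumption.
BASE = "http://www.freecen.org.uk/cgi/search.pl"


def search_url(criteria: dict) -> str:
    """Encode every search criterion in the query string, so the URL alone reproduces the search."""
    # Sorting the keys means the same search always yields the same, bookmarkable URL.
    return BASE + "?" + urlencode(sorted(criteria.items()))


def criteria_from_url(url: str) -> dict:
    """Recover the search criteria from a bookmarked URL (values come back as strings)."""
    return {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}


url = search_url({"surname": "Kerridge", "year_from": 1841, "year_to": 1861})
print(url)
# http://www.freecen.org.uk/cgi/search.pl?surname=Kerridge&year_from=1841&year_to=1861
```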
15. Getting machine-processible data
• Save FreeBMD HTML results page
• Copy table of results
• Paste into spreadsheet
• Save as CSV file
• Convert to XML and load into Modes
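The manual save/copy/paste steps above could be automated. This is a minimal standard-library sketch. It assumes the results sit in a plain HTML table whose first row holds simple one- or two-word column headings. The sample markup is invented, and the XML element names (results, entry) are hypothetical, not a Modes import format:

```python
import csv
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell.append(data)


def table_to_csv(html: str) -> str:
    parser = TableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


def csv_to_xml(text: str) -> str:
    root = ET.Element("results")  # hypothetical element names
    for record in csv.DictReader(io.StringIO(text)):
        entry = ET.SubElement(root, "entry")
        for field, value in record.items():
            # Assumes simple headings; "First name" becomes <first_name>.
            ET.SubElement(entry, field.lower().replace(" ", "_")).text = value
    return ET.tostring(root, encoding="unicode")


# Invented sample, standing in for a saved results page.
SAMPLE_HTML = """
<table>
  <tr><th>Surname</th><th>First name</th><th>District</th></tr>
  <tr><td>LIGHT</td><td>John</td><td>Kendal</td></tr>
</table>
"""
csv_text = table_to_csv(SAMPLE_HTML)
xml_text = csv_to_xml(csv_text)
```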
22. Working with census data
• Initial efforts ‘broke’ FreeCen!
• Data had to be loaded from a full dump
• Loaded all Districts, Pieces and Households
• Selectively loaded Light and Kerridge records
• Then loaded all people registered in one of
these Light or Kerridge households
• Shows up Lights/Kerridges as servants, in
institutions, etc.
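The two-pass selection described above can be written compactly. First find every household that contains at least one target surname, then load everyone in those households. The second pass is what surfaces Lights and Kerridges living as servants in other people's households. A sketch, assuming each dump record is a dict with hypothetical household_id, surname, forename and relationship fields:

```python
TARGET_SURNAMES = {"LIGHT", "KERRIDGE"}


def households_of_interest(people: list) -> set:
    """Pass 1: households containing at least one person with a target surname."""
    return {p["household_id"] for p in people if p["surname"].upper() in TARGET_SURNAMES}


def people_in_households(people: list, households: set) -> list:
    """Pass 2: everyone registered in one of those households, whatever their surname."""
    return [p for p in people if p["household_id"] in households]


# Invented records using the hypothetical field names.
dump = [
    {"household_id": 1, "surname": "Light", "forename": "Mary", "relationship": "Servant"},
    {"household_id": 1, "surname": "Smith", "forename": "John", "relationship": "Head"},
    {"household_id": 2, "surname": "Brown", "forename": "Ann", "relationship": "Head"},
]
selected = households_of_interest(dump)
loaded = people_in_households(dump, selected)
```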
26. Census data: co-contextuality
• Each ‘household’ records relationships
between people
• Binary links between ‘Head’ and others, but
other family relationships can be inferred
• Nothing like the completeness of FreeBMD,
but more can be done with the data that is
there
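Because each relationship is recorded only relative to the Head, other links have to be derived. This sketch assumes relationship strings such as Head, Wife, Son and Daughter. The rules are deliberately naive (a Son may be the Wife's stepson, and two children may be half-siblings), so every derived link is labelled as probable rather than certain:

```python
from itertools import combinations

CHILD = {"Son", "Daughter"}


def infer_relationships(household: list) -> list:
    """Return (person_a, relation, person_b) triples inferred from each person's relationship to the Head."""
    children = [name for name, rel in household if rel in CHILD]
    wives = [name for name, rel in household if rel == "Wife"]
    inferred = []
    # Two children of the same Head: probably siblings.
    for a, b in combinations(children, 2):
        inferred.append((a, "probable sibling of", b))
    # The Head's Wife is probably the mother of the Head's children.
    for wife in wives:
        for child in children:
            inferred.append((wife, "probable mother of", child))
    return inferred


# Invented household of (name, relationship to Head) pairs.
household = [
    ("William Light", "Head"),
    ("Sarah Light", "Wife"),
    ("James Light", "Son"),
    ("Ellen Light", "Daughter"),
    ("Mary Brown", "Servant"),
]
result = infer_relationships(household)
```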
30. Cross-linking census data to BMD
• Census records include place of birth and age
• Can use same inference techniques to match
against BMD data
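An age recorded on census night places the birth in one of two calendar years, and the registration may fall in the following year. A minimal sketch of that matching step, assuming hypothetical record dicts. The caller supplies the census year, and "place of birth" is compared naively with the registration district name, which in practice would need a place authority:

```python
def birth_year_window(census_year: int, age: int) -> tuple:
    """Earliest and latest plausible registration years for someone aged `age` on census night."""
    # Born in census_year - age or the year before; registration may slip into the following year.
    return census_year - age - 1, census_year - age + 1


def candidate_births(person: dict, census_year: int, births: list) -> list:
    """BMD birth entries consistent with a census entry's name, age and place of birth."""
    earliest, latest = birth_year_window(census_year, person["age"])
    return [
        b for b in births
        if b["surname"].upper() == person["surname"].upper()
        and b["forename"].upper() == person["forename"].upper()
        and earliest <= b["year"] <= latest
        # Naive assumption: birthplace string equals the registration district name.
        and b["district"].upper() == person["birthplace"].upper()
    ]


# Invented census entry and BMD index entries.
person = {"surname": "Kerridge", "forename": "James", "age": 10, "birthplace": "Ipswich"}
births = [
    {"surname": "KERRIDGE", "forename": "James", "year": 1841, "district": "Ipswich"},
    {"surname": "KERRIDGE", "forename": "James", "year": 1845, "district": "Ipswich"},
    {"surname": "KERRIDGE", "forename": "James", "year": 1841, "district": "Woodbridge"},
]
matches = candidate_births(person, 1851, births)
```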
31. An Open Data FreeUKGen API …
• … could be HTTP-based; RESTful
• would support a wide variety of information
needs
• would deliver a variety of machine-processible
formats
• would allow re-use of the data
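To make the idea concrete, one possible RESTful shape is a resource path with an optional format suffix. The paths (/bmd/events/{id}, /census/households/{id}), the identifiers and the handle function are purely hypothetical. No such FreeUKGen API is described here:

```python
import csv
import io
import json
import re

# Hypothetical resource paths, each optionally ending in a format suffix.
ROUTE = re.compile(
    r"^/(?P<collection>bmd/events|census/households)/(?P<id>[A-Za-z0-9-]+)(?:\.(?P<fmt>json|csv))?$"
)


def handle(path: str, store: dict) -> tuple:
    """Return (status, content_type, body) for a GET request on `path`."""
    m = ROUTE.match(path)
    record = store.get((m["collection"], m["id"])) if m else None
    if record is None:
        return 404, "text/plain", "not found"
    if m["fmt"] == "csv":
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=list(record))
        writer.writeheader()
        writer.writerow(record)
        return 200, "text/csv", out.getvalue()
    # JSON is the default machine-processible format in this sketch.
    return 200, "application/json", json.dumps(record)


# Invented in-memory data keyed by (collection, id).
store = {("bmd/events", "b1841-123"): {"type": "birth", "surname": "LIGHT", "year": 1841}}
```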
32. The problem of identity
• All my data files use invented primary keys for
people, places, … which are only significant
within my database
• In general, how do we assert that two
statements are about the same person?
• None of these is sufficient on its own:
– Name
– Date of birth/death
– Place of birth/death
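Because no single attribute settles identity, one common approach is to add up partial agreements and treat only a high total as a candidate match. In this sketch the point values and threshold are arbitrary illustrative numbers, not calibrated figures, and the PersonStatement fields are hypothetical. A matching name alone, or matching dates and places without the name, falls short:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonStatement:
    surname: str
    forename: str
    birth_year: Optional[int] = None
    birth_place: Optional[str] = None


# Illustrative points: each agreeing attribute adds evidence; none reaches the threshold alone.
POINTS = {"name": 4, "birth_year": 3, "birth_place": 3}
THRESHOLD = 7


def match_score(a: PersonStatement, b: PersonStatement) -> int:
    score = 0
    if (a.surname.upper(), a.forename.upper()) == (b.surname.upper(), b.forename.upper()):
        score += POINTS["name"]
    # Allow a year either way, since ages and dates are often approximate.
    if a.birth_year is not None and b.birth_year is not None and abs(a.birth_year - b.birth_year) <= 1:
        score += POINTS["birth_year"]
    if a.birth_place and b.birth_place and a.birth_place.upper() == b.birth_place.upper():
        score += POINTS["birth_place"]
    return score


def probably_same(a: PersonStatement, b: PersonStatement) -> bool:
    return match_score(a, b) >= THRESHOLD


a = PersonStatement("Light", "John", 1841, "Kendal")
b = PersonStatement("LIGHT", "John", 1842)
```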
33. Linked Data
• One step beyond Open Data
• Combines idea of machine-processible data
with a persistent identity for each concept
• Uses content negotiation to return RDF, XML,
JSON, … for each URL
• Allows programmatic access to data;
processing chains (‘follow your nose’)
• Requires suitably open licensing
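"Following your nose" means a program starts from one URI, dereferences it, and discovers further URIs in the data it gets back. In this minimal sketch an in-memory dictionary stands in for HTTP dereferencing. The URIs (using an invented "ex:" prefix) and the property names are made up for illustration:

```python
from collections import deque

# Stand-in for the web: URI -> description. A real client would GET each URI and ask for RDF or JSON.
WEB = {
    "ex:census/household/1": {"type": "Household", "piece": "ex:census/piece/42"},
    "ex:census/piece/42": {"type": "Piece", "district": "ex:district/kendal"},
    "ex:district/kendal": {"type": "District", "label": "Kendal"},
}


def is_uri(value) -> bool:
    return isinstance(value, str) and value.startswith(("ex:", "http://", "https://"))


def follow_your_nose(start: str, dereference, max_hops: int = 10) -> dict:
    """Breadth-first walk from `start`, following every property whose value is a URI."""
    seen, queue = {}, deque([(start, 0)])
    while queue:
        uri, hops = queue.popleft()
        if uri in seen or hops > max_hops:
            continue
        description = dereference(uri)
        if description is None:  # URI could not be dereferenced
            continue
        seen[uri] = description
        for value in description.values():
            if is_uri(value):
                queue.append((value, hops + 1))
    return seen


graph = follow_your_nose("ex:census/household/1", WEB.get)
```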
36. Everything comes from the same URL
http://collections.wordsworth.org.uk/Object/WTcoll/id/GRMDC.C144.9
By default, return HTML:
http://collections.wordsworth.org.uk/Object/WTcoll/id/html/GRMDC.C144.9
When RDF is requested (in the Accept header), redirect to a variant URL:
http://collections.wordsworth.org.uk/Object/WTcoll/id/rdf/GRMDC.C144.9
Can support many variant formats, e.g. XML, JSON, …
This approach relies on a technique called content negotiation
Linked Data URLs are unique, persistent and dereferenceable
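The server-side decision on this slide can be sketched as follows: read the Accept header, pick a variant, and redirect to the variant URL. HTTP 303 See Other is the status commonly used for this in Linked Data. The URL pattern follows the Wordsworth Trust example above. The mapping of media types to path segments, the JSON variant and the simplified q-value parsing are assumptions, not that site's actual implementation:

```python
BASE = "http://collections.wordsworth.org.uk/Object/WTcoll/id/"

# Assumed mapping from media type to the path segment of each variant.
VARIANTS = {
    "text/html": "html",
    "application/rdf+xml": "rdf",
    "application/json": "json",
}


def parse_accept(header: str) -> list:
    """Media types from an Accept header, highest q-value first (simplified: no wildcard matching)."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        q = 1.0
        for f in fields[1:]:
            if f.startswith("q="):
                try:
                    q = float(f[2:])
                except ValueError:
                    q = 0.0
        prefs.append((q, fields[0]))
    # Python's sort is stable, so equal q-values keep the header's order.
    prefs.sort(key=lambda p: p[0], reverse=True)
    return [media for q, media in prefs if q > 0]


def negotiate(object_id: str, accept: str = "") -> tuple:
    """Return (status, Location) for a request to BASE + object_id."""
    for media in parse_accept(accept or "text/html"):
        if media in VARIANTS:
            return 303, f"{BASE}{VARIANTS[media]}/{object_id}"
    # Default to the HTML variant, as on the slide.
    return 303, f"{BASE}html/{object_id}"
```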
37. What FreeUKGen resources could we
publish as Linked Data?
• Can only assign identifiers to data we have
– BMD registration events
– Census return events
– Pieces, Districts etc.
• Can’t assign identifiers to people
• Problem: current database update strategy
generates identifiers afresh each time
– Conflicts with need for persistent identifiers
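One way to reconcile periodic reloads with persistent identifiers is to derive each identifier from the published reference itself rather than from a database sequence. For a BMD index page that reference might be event type, year, quarter, volume and page. A reload then regenerates the same identifiers. A sketch, in which the base URI and key fields are hypothetical. The surrogate-key function illustrates the problem described on the slide:

```python
from urllib.parse import quote

BASE_URI = "https://example.org/freebmd/"  # hypothetical base URI, not a real FreeUKGen address


def page_uri(event: str, year: int, quarter: int, volume: str, page: int) -> str:
    """Deterministic URI for a BMD index page: the same reference always yields the same URI."""
    event = event.lower()
    if event not in {"births", "marriages", "deaths"}:
        raise ValueError(f"unknown event type: {event!r}")
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter}")
    # Normalise the volume so trivial formatting differences don't change the identifier.
    vol = quote(volume.strip().lower(), safe="")
    return f"{BASE_URI}{event}/{year}/q{quarter}/{vol}/{page}"


def reload_with_surrogate_keys(rows: list) -> dict:
    """Identifiers generated afresh on each load: they depend on load order, not on the data."""
    return {i: row for i, row in enumerate(rows, start=1)}


# Invented index references.
rows = [("births", 1841, 1, "7a", 123), ("deaths", 1850, 3, "10", 45)]
```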
38. Potential Linked Data projects
• Produce authorities which can be integrated
into current approach:
– Geographical units: Districts, Parishes, Pieces,
named places. Link to GeoNames, OS Gazetteer
– Occupations: potential for useful groupings (e.g.
Ag Lab and variants). Link to SIC, SHIC?
• Generate persistent identifiers for the primary
references published by FreeUKGen
– e.g. a page within the BMD index
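An occupation authority could begin as a mapping from transcribed variants to a canonical term. In this sketch the variants and canonical labels are illustrative assumptions. Linking each canonical term to SIC or SHIC codes is left out, because those codes would have to be looked up in the schemes themselves:

```python
import re
from typing import Optional

# Hypothetical authority: canonical term -> transcribed variants, already normalised.
AUTHORITY = {
    "Agricultural labourer": ["ag lab", "ag labourer", "agric lab", "agricultural labourer"],
    "Domestic servant": ["servant", "house servant", "domestic servant", "general servant"],
}

# Invert once for direct lookup of a normalised variant.
VARIANT_INDEX = {v: canonical for canonical, variants in AUTHORITY.items() for v in variants}


def normalise(text: str) -> str:
    """Lower-case, replace punctuation such as 'Ag. Lab.' with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def canonical_occupation(transcribed: str) -> Optional[str]:
    """Canonical term, or None if this variant is not yet in the authority."""
    return VARIANT_INDEX.get(normalise(transcribed))
```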
39. Let the computer work harder!
• Current approach makes very little use of the
computer as a data-processing tool
• FreeUKGen resources as Open Data would
support new types of research and simplify
existing ones, e.g. Single Name Studies
• FreeUKGen resources as Linked Data would
give the community a common frame of
reference for its work