The slides explain what Linked Data is and how we experiment with it in the area of legislative documents in the Czech Republic.
Download the slides for detailed embedded comments.
Linked Data for Czech Legislation
1. Linked Data for Czech Legislation
Martin Nečaský, Ph.D.
necasky@xrg.cz
Matematicko-fyzikální fakulta Univerzity Karlovy
http://www.xrg.cz
http://www.opendata.cz
2. Our Projects in a Nutshell
The goal of our effort is to enable intelligent browsing and querying
of a set of semi-structured documents from a given domain.
legislative documents
project documentation
medical documentation
basic prerequisite – the documents share some common characteristics
The project consists of the following steps:
extract useful structured data from semi-structured documents with
NLP techniques
transform extracted data to Linked Data so that the data can be easily
(= quickly and cheaply) interconnected with other related data and
with the original documents
provide tools for browsing and querying the resulting space of data
and documents
4. Outline
What is Linked Data?
current Web
publishing data on Web
Linked Data principles
Linked Data for legislative documents
basic ideas
what we have done and what we want to do
sample data and queries
6. Web of Documents
The current Web (of Documents) provides a lot of
data about Prague. Problems:
• Data about Prague is encoded in
documents distributed across the Web
• Documents are intended for humans, not
for computers
• Documents about Prague or related things
are not linked
Computers are not able to process data
about Prague published on the Web.
http://monitor.statnipokladna.cz
Prague budget
http://registry.czso.cz
Basic info about Prague
http://www.praha.eu
Prague public contracts
http://www.czso.cz
Demography of Prague
http://www.risy.cz
EU funded projects in Prague
7. Web of Documents
Try to search for this information on the
current Web
• Top 100 suppliers of Prague with
headquarters outside of Prague region.
• Money spent in Prague on new public
playgrounds in the last 5 years, per
child.
• Public playgrounds in Prague funded by EU.
http://monitor.statnipokladna.cz
Prague budget
http://registry.czso.cz
Basic info about Prague
http://www.praha.eu
Prague public contracts
http://www.czso.cz
Demography of Prague
http://www.risy.cz
EU funded projects in Prague
8. Architecture of Web of Documents
Unified global space of documents
Built on top of several simple principles:
1. HTML as a format for publishing
documents
2. URLs as unique global identifiers of
documents
3. HTTP for locating and accessing
documents by their URLs
4. hyperlinks between documents
There are two kinds of applications
working in this space of documents:
• web browsers (locating and
browsing documents through
hyperlinks)
• search engines (indexing and full text
searching of documents)
[Figure: databases A, B, C and D each publish HTML documents, which a web browser and a search engine access over HTTP.]
9. What about publishing data?
The next step should be publishing data instead of
documents
Raw (open) data about things published on the Web which
can be processed by machines (applications,
domain-specific search engines)
See public administration efforts in the area of publishing
open data:
• http://data.gov.uk
• http://1.usa.gov/193lKN6
We can publish data on the current Web!
basic way: data files with their own URLs in different
formats (CSV, XLS, DBF, XML, etc.)
advanced way: Application Programming Interfaces (APIs)
10. Web can publish data! APIs
Different APIs provide machine-readable
data for further processing
in so-called mash-up applications.
Also built on several simple principles:
• XML/JSON as formats for publishing
data
• HTTP URIs as globally unique
identifiers of APIs and their
operations
• HTTP for transferring data
between APIs and applications
[Figure: databases A, B, C and D are each exposed through a proprietary data API (A, B, C, D), which mash-up applications access over HTTP.]
11. Current principles and technologies do not lead to a Web of Data!
Publishing data about things is not based on the principles
that were already invented for documents.
Problems with data on the current Web:
Web of Documents | Current Web IS NOT Web of Data!
HTML as a format for publishing documents | many formats for publishing data (XML, JSON, CSV, XLS, ...)
URLs as unique global identifiers of documents | no unique global identifiers of things
HTTP for locating and accessing documents by their URLs | HTTP for locating and accessing APIs (REST) [but not for locating things and accessing their data]
hyperlinks between documents | none of the current formats makes it possible to link related things
12. Linked Data
data published on the Web according to 4
simple principles (introduced by Sir Tim Berners-Lee)
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those
names.
3. When someone looks up a URI, provide useful
information, using the standards (RDF, SPARQL)
4. Include links to other URIs so that they can
discover more things.
13. Linked Data vs. Documents
Web of Documents | Linked Data = Web of Data!
HTML as a format for publishing documents | RDF as a format for publishing data about things
URLs as unique global identifiers of documents | HTTP URIs (URLs) as unique global identifiers of things
HTTP for locating and accessing documents by their URLs | HTTP for locating and accessing things by their HTTP URIs
hyperlinks between documents | links between related entities
14. Things as first-class citizens
Public contract
OSM/MZ/044/09
City of Prague
Prague council
Prague budget
Prague demography
EU funded project
CZ.2.16/2.1.00/22189
Public contract
MAN/23/07/007316/2010
Public contract
DIL/23/07/007302/2010
15. HTTP URIs for Things
czso.cz (Czech Statistical Office)
http://registry.cszo.cz/prague
http://www.czso.cz/prague
http://www.czso.cz/prague/stats/demog
mfcr.cz (Ministry of Finance of CZ)
http://www.mfcr.cz/prague
http://www.mfcr.cz/prague/budget
praha.eu (Prague)
http://www.praha.eu/contract/007316
http://www.praha.eu/city
http://www.praha.eu/council
http://www.praha.eu/contract/006870
http://www.praha.eu/contract/007302
risy.cz (Regional Information Service in CZ)
http://www.risy.cz/location/prague
http://www.risy.cz/project/412457
http://www.risy.cz/contract/007302
16. Data about Things in RDF
Client
HTTP REQUEST
http://www.praha.eu/contract/007302
Playground Revitalization
Authority: Prague
Delivery date: 31.8.2011
Price: 28 444 000 CZK
...
http://www.praha.eu/contract/007302
dcterms:title
Playground Revitalization
pc:contractingAuthority
http://www.praha.eu/council
pc:estimatedEndDate
31.8.2011
pc:agreedPrice
http://www.praha.eu/contract/007302/price
gr:hasCurrencyValue
gr:hasCurrency
28444000 CZK
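The contract description on this slide can be sketched as a small set of subject-predicate-object triples. The following is a minimal illustration in plain Python, not the authors' implementation. Several things are assumptions: the way predicates are paired with values is read off the slide's labels, the `pc` namespace URI is a guess at the ontology behind the prefix, the date is rewritten in ISO form, and the currency is modelled as a plain literal. The `Literal` class and both helper functions are hypothetical names introduced only for this sketch.

```python
# Namespace URIs. dcterms and gr are published namespaces; the pc URI is an
# assumption about which ontology the slide's "pc" prefix refers to.
DCTERMS = "http://purl.org/dc/terms/"
GR = "http://purl.org/goodrelations/v1#"
PC = "http://purl.org/procurement/public-contracts#"

CONTRACT = "http://www.praha.eu/contract/007302"
PRICE = CONTRACT + "/price"
COUNCIL = "http://www.praha.eu/council"


class Literal(str):
    """Marks a plain string value, as opposed to a URI (hypothetical helper)."""


# Each triple is (subject URI, predicate URI, object URI or Literal).
# The pairing of predicates and objects is inferred from the slide.
triples = [
    (CONTRACT, DCTERMS + "title", Literal("Playground Revitalization")),
    (CONTRACT, PC + "contractingAuthority", COUNCIL),
    (CONTRACT, PC + "estimatedEndDate", Literal("2011-08-31")),
    (CONTRACT, PC + "agreedPrice", PRICE),
    (PRICE, GR + "hasCurrencyValue", Literal("28444000")),
    (PRICE, GR + "hasCurrency", Literal("CZK")),
]


def to_ntriples(triple):
    """Render one triple as an N-Triples line (plain literals only)."""
    s, p, o = triple
    if isinstance(o, Literal):
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{o}>"
    return f"<{s}> <{p}> {obj} ."


def describe(uri):
    """Return every triple whose subject is the given URI."""
    return [t for t in triples if t[0] == uri]


for t in triples:
    print(to_ntriples(t))
```

Calling `describe(CONTRACT)` mimics what a client would receive after looking up the contract URI: only the statements published about that thing, with links (the price and council URIs) that can be followed in turn.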
18. Vocabularies
published RDF data would be hard to interpret
if each publisher used proprietary types:
types of properties (= predicates) and types of things
(= classes)
therefore, standardized (or at least widely used)
predicates should take priority over
proprietary ones
e.g. Dublin Core, Good Relations, FOAF, schema.org, ...
predicates are defined in so-called vocabularies
(or ontologies)
note: an ontology is a special case of a vocabulary; it
contains more detailed reasoning rules, which are out of
the scope of this lecture
19. Vocabularies
classes and predicates
semantic relationships between classes and predicates in one
vocabulary or more different vocabularies
subtyping (sub-class of, sub-property of)
semantic equivalence (equivalent class, equivalent property) – when
two different vocabularies define classes/properties with the same
semantics
vocabularies are themselves expressed in RDF, using the RDF Schema and OWL vocabularies
each class and predicate has its own HTTP URI
a mechanism of namespaces and prefixes (as in XML) is usually used to shorten these URIs
class URI is used to denote the type of a thing:
<http://www.praha.eu/contract/007302> rdf:type pc:Contract .
predicate URI is used to denote the predicate in a triple:
<http://www.praha.eu/contract/007302> dcterms:title "..." .
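The prefix mechanism used in the two triples above can be illustrated with a short sketch that expands prefixed names into full HTTP URIs. It is only an illustration: the `expand` function is a hypothetical helper, and the `pc` namespace URI is an assumption about which ontology the prefix denotes (the `rdf` and `dcterms` namespaces are the published ones).

```python
# Prefix table. The pc entry is an assumption, not taken from the slides.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pc": "http://purl.org/procurement/public-contracts#",
}


def expand(name, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full URI; unwrap '<...>' full URIs."""
    if name.startswith("<") and name.endswith(">"):
        return name[1:-1]
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


# The rdf:type triple from the slide, written with prefixed names.
triple = ("<http://www.praha.eu/contract/007302>", "rdf:type", "pc:Contract")
print(tuple(expand(term) for term in triple))
```

Because every class and predicate expands to an ordinary HTTP URI, a client can look it up in the same way as any other thing to find its definition in the vocabulary.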
20. Linking URIs of Related Things
risy.cz (Regional Information Service in CZ)
http://www.risy.cz/project/412457
czso.cz (Czech Statistical Office)
http://www.czso.cz/prague/stats/demog
mfcr.cz (Ministry of Finance of CZ)
http://www.mfcr.cz/prague/budget
praha.eu (Prague)
http://www.praha.eu/contract/007316
http://www.praha.eu/city
http://www.praha.eu/council
http://www.praha.eu/contract/006870
http://www.praha.eu/contract/007302
n1:budget
n2:demography
n3:beneficiary
n3:realizedBy
http://registry.cszo.cz/prague
http://www.czso.cz/prague
http://www.mfcr.cz/prague
http://www.risy.cz/location/prague
http://www.risy.cz/contract/007302
21. Linking URIs of Same Things
czso.cz (Czech Statistical Office)
http://registry.cszo.cz/prague
http://www.czso.cz/prague
mfcr.cz (Ministry of Finance of CZ)
http://www.mfcr.cz/prague
praha.eu (Prague)
http://www.praha.eu/city
http://www.praha.eu/council
http://www.praha.eu/contract/007302
risy.cz (Regional Information Service in CZ)
http://www.risy.cz/contract/007302
owl:sameAs
owl:sameAs
http://www.risy.cz/project/412457
http://www.czso.cz/prague/stats/demog
http://www.mfcr.cz/prague/budget
http://www.risy.cz/location/prague
22. Related vs. Same Things
Situation: Publisher A publishes some data about a
thing T under URI U.
If you want to publish something
new about T: create your
own URI V for T, publish the new
data under V, and link V to U
with owl:sameAs.
If you want to say that your
things are related to T but you
do not publish anything new
about T: do not create your own
HTTP URI for T and do not copy
data about T from A; only link
your things to U.
[Figure: in the first case, your dataset contains V, which is linked to U in A's dataset; in the second case, your things link directly to U in A's dataset.]
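A consumer that wants to combine data published under different URIs for the same thing has to follow the owl:sameAs links. The sketch below shows one possible way to do this with a small union-find over in-memory triples. It is an illustration under assumptions: the pairing of the two contract URIs, the `http://example.org/fundedBy` predicate, and both function names are hypothetical and only serve the example.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_groups(triples):
    """Group URIs connected by owl:sameAs using a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


def merged_description(uri, triples):
    """Collect (predicate, object) pairs published under any alias of uri."""
    aliases = {uri}
    for group in same_as_groups(triples):
        if uri in group:
            aliases = group
    return {(p, o) for s, p, o in triples if s in aliases and p != OWL_SAME_AS}


# U is the original publisher's URI, V is "your" URI for the same contract.
U = "http://www.praha.eu/contract/007302"
V = "http://www.risy.cz/contract/007302"
triples = [
    (U, "http://purl.org/dc/terms/title", "Playground Revitalization"),
    (V, OWL_SAME_AS, U),
    # Hypothetical predicate standing in for the new data published under V.
    (V, "http://example.org/fundedBy", "http://www.risy.cz/project/412457"),
]

for pair in sorted(merged_description(V, triples)):
    print(pair)
```

Looking up either U or V yields the same merged description, which is the practical effect of the owl:sameAs link: neither publisher has to copy the other's data.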
23. Primary Data vs. Secondary Data
czso.cz (Czech Statistical Office)
http://registry.cszo.cz/prague
http://www.czso.cz/prague
http://www.czso.cz/prague/stats/demog
mfcr.cz (Ministry of Finance of CZ)
http://www.mfcr.cz/prague
http://www.mfcr.cz/prague/budget
praha.eu (Prague)
http://www.praha.eu/contract/007316
http://www.praha.eu/city
http://www.praha.eu/council
http://www.praha.eu/contract/006870
http://www.praha.eu/contract/007302
risy.cz (Regional Information Service in CZ)
http://www.risy.cz/location/prague
http://www.risy.cz/project/412457
http://www.risy.cz/contract/007302
25. Linked Data in Czech Legislation
Acts and Regulations
Court Decisions
Public authorities
Agendas of Public Authorities
Rights and obligations
Life situations
define
determine
regulate
execute
Acts and Regulations Proposals
results from
26. Structural Layer of Legislative Documents
structural parts of acts and regulations
references between
court decisions and parts of acts and regulations
court decisions (legal case retrospection)
amendments
what we have done
vocabulary of legislative documents
metadata and structure of acts, regulations and decrees
represented as Linked Data
• metadata about each version of each act, regulation and decree
since 1945
• structured content of versions of all acts, regulations and decrees
valid in 2011, 2012
extraction of references and retrospection using NLP
27. Structural Layer of Legislative Documents
Public Contracts Act
Public Contracts Act Version 07/2006
Public Contracts Act Version 07/2012
Public Contracts Act Version 01/2015
Public Contracts Act Version 06/2007
Similarly, we represent
paragraphs, sections, etc. of each
version of each law. However, we
have difficulty obtaining consolidated
documents.
DECISION XYZ
DECISION ABC
refers
28. Structural Layer of Legislative Documents
CASE 6 C 135/2007
DECISION 6 C 135/2007-44
CASE 21 Co 472/2008
DECISION 21 Co 472/2008-62
made for
DECISION 6 C 135/2007-141
CASE 21 Co 458/2011
DECISION 21 Co 458/2011-173
CASE 26 Cdo 2523/2012
based on
extraordinary appeal against
DECISION 26 Cdo 2523/2012
Acts
Other Decisions
Metropolitan Court in Prague
District Court Prague 9
Supreme Court
29. Structural Layer of Legislative Documents
browsing data
http://linked.opendata.cz/resource/legislation/cz/act/2006/137-2006
• instance of lex:Act representing Public Procurement Act
31. Structural Layer of Legislative Documents
querying data (SPARQL)
http://linked.opendata.cz/sparql
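A SPARQL endpoint such as the one above is normally queried over HTTP using the SPARQL Protocol: the query text is sent in a `query` parameter and the result format is chosen with the `Accept` header. The following is a minimal standard-library sketch under stated assumptions: it assumes the endpoint accepts GET requests and can return JSON results, the sample query is only illustrative, and the function names are hypothetical. Whether the endpoint is currently reachable is not something the slides confirm, so the example only builds the request; `run_query` is the part that would actually send it.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://linked.opendata.cz/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the result rows as plain dicts."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


# Illustrative query: list a few instances of lex:Act.
QUERY = """
PREFIX lex: <http://purl.org/lex#>
SELECT ?act WHERE { ?act a lex:Act . } LIMIT 5
"""

request = build_request(QUERY)
print(request.full_url)
```

Building the request separately from sending it keeps the URL encoding easy to inspect and lets the same helper be pointed at a different endpoint.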
32. Structural Layer of Legislative Documents
Which acts amended the Act on Political Parties of the Czech Republic?
PREFIX lex: <http://purl.org/lex#>
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?amendmentTitle ?amendmentValidity
WHERE {
  ?version frbr:realizationOf
      <http://linked.opendata.cz/resource/legislation/cz/act/1991/424-1991> .
  ?change lex:changedOriginal ?version .
  ?amendment lex:definesChange ?change ;
             dcterms:title ?amendmentTitle ;
             dcterms:valid ?amendmentValidity .
}
33. Structural Layer of Legislative Documents
One of the well-known hidden
amendments. It increased
state payments to
political parties from 500k to
900k per member of
parliament.
34. Structural Layer of Legislative Documents
Which other acts were amended together with the Act on Political Parties of the Czech
Republic?
PREFIX lex: <http://purl.org/lex#>
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?anotherActTitle ?anotherVersionValidity
WHERE {
  ?version frbr:realizationOf
      <http://linked.opendata.cz/resource/legislation/cz/act/1991/424-1991> .
  ?change lex:changedOriginal ?version .
  ?amendment lex:definesChange ?change ;
             lex:definesChange ?anotherChange .
  FILTER (?change != ?anotherChange)
  ?anotherChange lex:changeResult ?anotherVersion .
  ?anotherVersion frbr:realizationOf ?anotherAct ;
                  dcterms:valid ?anotherVersionValidity .
  ?anotherAct dcterms:title ?anotherActTitle .
}
36. Structural Layer of Legislative Documents
How many changes were made to Czech legislation per year?
PREFIX lex: <http://purl.org/lex#>
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT (COUNT(?amendment) AS ?changeCnt) ?year
WHERE {
  ?amendment lex:definesChange ?change ;
             dcterms:valid ?validity .
}
GROUP BY (YEAR(?validity) AS ?year)
ORDER BY DESC(?year)
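For readers less familiar with SPARQL aggregation, the grouping performed by the query above can be mirrored in a few lines of Python. This is only a sketch of the same logic on in-memory rows: the function name is hypothetical and the sample rows are invented placeholders, not data from the endpoint.

```python
from collections import Counter
from datetime import date


def changes_per_year(rows):
    """Count rows per validity year, newest year first.

    rows: iterable of (amendment_uri, validity_date) pairs, one pair per
    change defined by the amendment, like the solutions matched by the query.
    """
    counts = Counter(validity.year for _amendment, validity in rows)
    return sorted(counts.items(), reverse=True)


# Placeholder rows: amendment "a" defines two changes, amendment "b" one.
rows = [
    ("urn:example:amendment-a", date(2011, 1, 1)),
    ("urn:example:amendment-a", date(2011, 1, 1)),
    ("urn:example:amendment-b", date(2012, 7, 1)),
]
print(changes_per_year(rows))
```

As in the query, an amendment that defines several changes contributes one row per change, so the count reflects changes rather than distinct amendments.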
38. Semantic Layer of Legislative Documents
rights, obligations and subjects defined by
legislation
their occurrences in court decisions
we are currently starting experiments with extracting
these concepts and the relationships between
them from the texts of acts, using NLP
based on syntactic parsing
we do not have an RDF representation yet