Linked Open Data for Public Contracts
1. Linked Open Data for Public Contracts
Martin Nečaský
Faculty of Mathematics and Physics, Charles University in Prague
Faculty of Informatics and Statistics, University of Economics in Prague
13.6.2013 – Publications Office of the European Union, Luxembourg
2. Outline
Introduction to Linked Data
What benefits does Linked Data bring for TED and public procurement in the EU?
What does it mean for TED and others to publish their data as Linked Data?
What have we already done in the LOD2 project?
4. Web Applications Eco-system
Linked Data helps to create an eco-system of web
applications which publish, enrich and consume
data about things in one shared global data space
[Figure: App 1 to App 5 connected to the Shared Global Data Space on the Web (Web of Data)]
5. Architecture of Web of Documents
Shared global space of documents
Built on top of several simple principles:
1. HTML as a format for publishing documents
2. URLs as unique global identifiers of documents
3. HTTP for locating and accessing documents by their URLs
4. Hyperlinks between documents
There are two kinds of applications working in this space of documents:
• web browsers (locating and browsing documents through hyperlinks)
• search engines (indexing and full-text searching of documents)
[Figure: a web browser and a search engine accessing HTML documents over HTTP]
6. Web of Documents
The current Web (of Documents) provides a lot of data about Prague. Problems:
• Data about Prague is encoded in documents distributed across the Web
• Documents are intended for humans, not computers
• Documents about Prague or related things are not linked
• Therefore, computers are not able to process data about Prague published on the Web
Example sources:
• http://monitor.statnipokladna.cz (Prague budget)
• http://registry.czso.cz (Basic info about Prague)
• http://www.praha.eu (Prague public contracts)
• http://www.czso.cz (Demography of Prague)
• http://www.risy.cz (EU funded projects in Prague)
7. Web of Documents
Try to search for this information on the current Web:
• Top 100 suppliers of Prague with headquarters outside of the Prague region.
• Money spent in Prague on new children's playgrounds in the last 5 years, per child.
• Organizations in Prague funded by EU structural funds and their top 100 suppliers.
Example sources:
• http://monitor.statnipokladna.cz (Prague budget)
• http://registry.czso.cz (Basic info about Prague)
• http://www.praha.eu (Prague public contracts)
• http://www.czso.cz (Demography of Prague)
• http://www.risy.cz (EU funded projects in Prague)
8. Linked Data
Data published on the Web according to four simple principles (introduced by Sir Tim Berners-Lee):
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.
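As a rough illustration of principles 2 and 3, the sketch below prepares an HTTP lookup of a thing's URI that asks for an RDF serialization. It only builds the request and does not send it. The function name `build_lookup_request` and the choice of `text/turtle` as the requested media type are assumptions made for this example; the slides do not say which RDF serialization a publisher would return.

```python
from urllib.parse import urlparse
from urllib.request import Request


def build_lookup_request(thing_uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF about a thing."""
    parts = urlparse(thing_uri)
    # Principle 2: the name must be an HTTP URI so that it can be looked up.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an HTTP URI: {thing_uri!r}")
    # Principle 3: ask for a standard RDF serialization via the Accept header.
    # (text/turtle is an assumed default, not something the slides prescribe.)
    return Request(thing_uri, headers={"Accept": media_type}, method="GET")


request = build_lookup_request("http://praha.eu/contract/7302")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

A name that is not an HTTP URI (for example a `mailto:` address) is rejected, because nobody could look it up over HTTP.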
9. Things as first-class citizens
• Project CZ.2.16/2.1.00/22189
• Prague City
• Prague Council
• Prague Demography
• Prague Budget
• Contract DIL/23/07/007302/2010
10. HTTP URIs for Things
praha.eu (Prague)
• http://praha.eu/contract/7302
• http://praha.eu/council
• http://praha.eu/city
mfcr.cz (Ministry of Finance)
• http://mfcr.cz/prague/budget
• http://mfcr.cz/prague
risy.cz (Regional Information Service)
• http://risy.cz/location/prague
• http://risy.cz/contract/22189-01
• http://risy.cz/project/22189 (Project CZ.2.16/2.1.00/22189)
czso.cz (Czech Statistical Office)
• http://registry.czso.cz/prague
• http://czso.cz/prague
• http://czso.cz/prague/demogstat
11. Data about Things in RDF
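Client record:
PlaygroundRevitalization
Authority: Prague
Delivery date: 31.8.2011
Price: 28 444 000 CZK
...
The same data as an RDF graph (subject, predicate, object):
• http://praha.eu/contract/7302, dcterms:title, "Playground Revitalization"
• http://praha.eu/contract/7302, pc:contractingAuthority, http://praha.eu/council
• http://praha.eu/contract/7302, pc:agreedPrice, http://praha.eu/contract/7302/price
• http://praha.eu/contract/7302, pc:estimatedEndDate, "31.8.2011"
• http://praha.eu/contract/7302/price, gr:hasCurrencyValue, 28444000
• http://praha.eu/contract/7302/price, gr:hasCurrency, CZK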
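To make the graph on this slide concrete, here is a minimal sketch that stores the contract data as (subject, predicate, object) triples in plain Python and follows the path from the contract to its price. Which node each edge connects is my reading of the slide's diagram, and the helper names `objects` and `agreed_price` are made up for this example; a real application would more likely use an RDF library and SPARQL.

```python
# Triples (subject, predicate, object) as read off the slide's diagram.
# Assumption: the price node carries both the amount and the currency.
CONTRACT = "http://praha.eu/contract/7302"
PRICE = "http://praha.eu/contract/7302/price"

triples = [
    (CONTRACT, "dcterms:title", "Playground Revitalization"),
    (CONTRACT, "pc:contractingAuthority", "http://praha.eu/council"),
    (CONTRACT, "pc:agreedPrice", PRICE),
    (CONTRACT, "pc:estimatedEndDate", "31.8.2011"),
    (PRICE, "gr:hasCurrencyValue", 28444000),
    (PRICE, "gr:hasCurrency", "CZK"),
]


def objects(graph, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def agreed_price(graph, contract):
    """Follow contract -> price node -> (amount, currency)."""
    results = []
    for price_node in objects(graph, contract, "pc:agreedPrice"):
        for amount in objects(graph, price_node, "gr:hasCurrencyValue"):
            for currency in objects(graph, price_node, "gr:hasCurrency"):
                results.append((amount, currency))
    return results


print(agreed_price(triples, CONTRACT))
```

The price is a thing with its own URI rather than a plain string, which is why the lookup takes two hops.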
13. Vocabularies
Published RDF data would be hard to interpret if each publisher used its own proprietary predicates.
Therefore, standardized (or at least widely used) predicates should take priority over proprietary ones,
e.g. Dublin Core, GoodRelations, FOAF, schema.org, ...
or more specific ones for public procurement
• e.g., Public Contracts Ontology (http://purl.org/procurement/public-contracts)
Predicates are defined in so-called vocabularies (or ontologies).
Note: an ontology is a special case of a vocabulary; it contains more detailed reasoning rules, which are out of scope of this lecture.
Note: not only predicates but also classes (= types of things) are defined in vocabularies/ontologies.
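A short sketch of how prefixed predicate names relate to vocabularies: each prefix stands for a vocabulary namespace, and a predicate whose prefix is not in the shared table is effectively proprietary. The Dublin Core, GoodRelations and FOAF namespaces below are the published ones; the `pc` namespace is an assumption (the PURL from the slide with `#` appended), and the function names and the `x:` prefix are invented for this example.

```python
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "gr": "http://purl.org/goodrelations/v1#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    # Assumption: Public Contracts Ontology terms live under the slide's PURL plus '#'.
    "pc": "http://purl.org/procurement/public-contracts#",
}


def expand(name: str) -> str:
    """Turn a prefixed name such as 'dcterms:title' into a full predicate URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown vocabulary prefix in {name!r}")
    return PREFIXES[prefix] + local


def proprietary(predicates):
    """Return the predicates whose prefix is not one of the shared vocabularies."""
    return [p for p in predicates if p.partition(":")[0] not in PREFIXES]


print(expand("dcterms:title"))
print(proprietary(["dcterms:title", "pc:agreedPrice", "x:myPrice"]))
```

A consumer that knows the shared vocabularies can interpret the first two predicates without contacting the publisher; the hypothetical `x:myPrice` would need publisher-specific documentation.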
14. Linking URIs of Related Things
praha.eu (Prague)
• http://praha.eu/contract/7302
• http://praha.eu/city
• http://praha.eu/council
mfcr.cz (Ministry of Finance)
• http://mfcr.cz/prague/budget
• http://mfcr.cz/prague
risy.cz (Regional Information Service)
• http://risy.cz/location/prague
• http://risy.cz/contract/22189-01
• http://risy.cz/project/22189
czso.cz (Czech Statistical Office)
• http://registry.czso.cz/prague
• http://czso.cz/prague
• http://czso.cz/prague/demogstat
Link predicates shown between the URIs: a:fundedBy, b:hasBudget, c:hasBeneficiary, d:hasDemography
15. Linking URIs of Related Things
praha.eu (Prague)
• http://praha.eu/contract/7302
• http://praha.eu/council
mfcr.cz (Ministry of Finance)
• http://mfcr.cz/prague/budget
• http://mfcr.cz/prague
risy.cz (Regional Information Service)
• http://risy.cz/contract/22189-01
• http://risy.cz/project/22189
czso.cz (Czech Statistical Office)
• http://czso.cz/prague/demogstat
URIs for Prague grouped together in the figure, with owl:sameAs links:
• http://praha.eu/city
• http://risy.cz/location/prague
• http://registry.czso.cz/prague
• http://czso.cz/prague
Other link predicates shown: a:fundedBy, b:hasBudget, c:hasBeneficiary, d:hasDemography
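The effect of owl:sameAs links can be sketched as follows: once several publishers' URIs for Prague are declared to denote the same thing, statements made about any one of them can be gathered into a single view. The extracted slides do not show which URIs each link connects, so the sameAs pairs and the subjects of `b:hasBudget` and `d:hasDemography` below are assumptions for illustration, and the function names are invented. The grouping uses a small union-find structure.

```python
def find(parent, uri):
    """Return the representative URI of the alias group that contains uri."""
    parent.setdefault(uri, uri)
    while parent[uri] != uri:
        parent[uri] = parent[parent[uri]]  # path halving keeps chains short
        uri = parent[uri]
    return uri


def link_same_as(parent, a, b):
    """Record an owl:sameAs link by merging the two alias groups."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        # The lexicographically smaller root stays representative (arbitrary but stable).
        parent[max(root_a, root_b)] = min(root_a, root_b)


def describe(parent, facts, uri):
    """Collect (predicate, object) pairs stated about uri or any of its aliases."""
    root = find(parent, uri)
    return sorted((p, o) for s, p, o in facts if find(parent, s) == root)


# Assumed owl:sameAs links between the publishers' URIs for Prague.
SAME_AS = [
    ("http://praha.eu/city", "http://risy.cz/location/prague"),
    ("http://praha.eu/city", "http://registry.czso.cz/prague"),
    ("http://registry.czso.cz/prague", "http://czso.cz/prague"),
    ("http://czso.cz/prague", "http://mfcr.cz/prague"),
]
# Assumed statements, each published by a different institution.
FACTS = [
    ("http://mfcr.cz/prague", "b:hasBudget", "http://mfcr.cz/prague/budget"),
    ("http://czso.cz/prague", "d:hasDemography", "http://czso.cz/prague/demogstat"),
]

parent = {}
for a, b in SAME_AS:
    link_same_as(parent, a, b)

prague_view = describe(parent, FACTS, "http://praha.eu/city")
print(prague_view)
```

Starting from the city's own URI, an application reaches the budget published by the Ministry of Finance and the demography published by the Statistical Office without either institution copying the other's data.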
17. Benefits of Publishing TED as LD
Problem: It is hard to get a unified view of a chosen thing (e.g. contracting authority, supplier, contract, contract notice, tender, ...) from TED.
The data about the thing is distributed across several contract notices.
LD solution: Each thing has a unique TED HTTP URI which third-party applications can use to get all TED data for this thing.
Data is represented as an RDF graph that respects openly defined vocabularies shared across developers and communities.
The data includes links to URIs of other things on TED.
TED can flexibly and continuously extend the data provided for the thing.
18. Benefits of Publishing TED as LD
User → Web application: ?detail=http://ted.eu/contract/CZ/54782145
Web application → TED LD Service: http://ted.eu/contract/CZ/54782145
Graph shown in the figure:
• http://praha.eu/contract/7302
• http://praha.eu/contract/7302/price
• http://praha.eu/council
TED easily assembles data related to the requested contract and returns it as an interconnected graph to the requesting web application.
19. Benefits of Publishing TED as LD
User (click) → Web application: ?detail=http://ted.eu/org/CZ/00064581
Web application → TED LD Service: http://ted.eu/org/CZ/00064581
Graph shown in the figure:
• http://praha.eu/contract/7302
• http://praha.eu/contract/7302/price
• http://praha.eu/council
TED easily assembles data related to the requested authority and returns it as an interconnected graph to the requesting web application.
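The `?detail=` parameter in these diagrams carries the URI of the thing the user clicked on. A minimal sketch of how a third-party web application might build and read such links is shown below. The application address `http://app.example/view` and both function names are hypothetical. Percent-encoding the URI is a precaution added here (the slide shows it unencoded), so that characters such as `&` or `#` inside a thing's URI cannot break the query string.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

APP_BASE = "http://app.example/view"  # hypothetical third-party application


def detail_link(thing_uri: str) -> str:
    """Build an application link whose 'detail' parameter names the thing."""
    return f"{APP_BASE}?{urlencode({'detail': thing_uri})}"


def requested_thing(link: str):
    """Read the thing URI back from the 'detail' parameter, or None if absent."""
    values = parse_qs(urlsplit(link).query).get("detail", [])
    return values[0] if values else None


link = detail_link("http://ted.eu/org/CZ/00064581")
print(link)
print(requested_thing(link))
```

The application would then look up the returned URI at the TED LD Service and render the graph it gets back.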
20. Problems with HTTP URIs
Today, public procurement data is collected from contracting authorities in the form of contract notices (calls for tender, contract award notices, etc.).
Notices usually do not contain explicit identifiers of contracting authorities and suppliers.
These organizations are usually identified in the notices only by names and addresses, which are often misspelled or incorrect.
Therefore, if we create an HTTP URI for an organization from one notice, it is often very hard to recognize whether an organization from another notice is the same one or not.
This raises serious questions: What should the HTTP URI of an organization (contracting authority/supplier) look like? How should an organization be identified in a notice so that we can recognize it unambiguously?
21. Problems with HTTP URIs
There are two possible solutions to this problem. Both are very simple from the technical point of view but very complex from the political point of view (enforcement in all EU countries).
1st solution:
Some countries define unique mandatory identifiers for organizations (both private companies and public institutions).
These identifiers should be present in the notices to identify contracting authorities and suppliers.
We can then use them to recognize organizations and associate them with the corresponding HTTP URIs.
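The first solution can be sketched in a few lines: if every notice carries the country code and the national identifier of the organization, the HTTP URI follows mechanically from them, no matter how the name is spelled. The URI pattern is taken from the example `http://ted.eu/org/CZ/00064581` used earlier in the slides; the function name, the notice field names and the two name spellings are invented for illustration.

```python
import re

TED_ORG_BASE = "http://ted.eu/org"  # pattern taken from the slides' example URI


def org_uri(country: str, identifier: str) -> str:
    """Derive an organization URI from a country code and a national identifier."""
    country = country.strip().upper()
    identifier = re.sub(r"\s+", "", identifier)  # drop spaces typed into the ID
    if not re.fullmatch(r"[A-Z]{2}", country):
        raise ValueError(f"expected a two-letter country code, got {country!r}")
    if not re.fullmatch(r"[A-Za-z0-9]+", identifier):
        raise ValueError(f"unexpected characters in identifier {identifier!r}")
    return f"{TED_ORG_BASE}/{country}/{identifier}"


# Two illustrative notices: the names differ, the identifier does not.
notices = [
    {"name": "Prague City Council", "country": "CZ", "org_id": "00064581"},
    {"name": "Pragu city council", "country": "cz", "org_id": "000 64581"},
]
uris = {org_uri(n["country"], n["org_id"]) for n in notices}
print(uris)
```

Both notices resolve to one URI, which is the unambiguous recognition that matching on names and addresses cannot provide.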
22. Problems with HTTP URIs
2nd solution:
Each organization involved in public procurement should have its own public profile on the Web with its own HTTP URI.
The public profile can be a simple HTML web page which also contains a few data items encoded in RDF (technically, this is very simple).
The public profile can be a part of the official web site of the organization, e.g. http://praha.eu/public-profile
Or, the organization can use a service that manages public web profiles of organizations. Such services already exist, e.g. http://opencorporates.org
• This service already contains profiles of many organizations, associates them with HTTP URIs and provides basic RDF data about them (title, address, etc.)
The HTTP URI of the profile should become a part of the notice.
This solution also saves time and money because details about the organization do not have to be repeated in each notice: each notice is linked to the HTTP URI where the information is present.
• A valid objection is that the profile holds only current information, which may differ from the information that was valid at the time of earlier notices. This can be solved technically (e.g. TED and other authorities responsible for collecting public procurement data can back up that information).
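One way the back-up idea could work, sketched under stated assumptions: the collecting authority keeps dated snapshots of each profile and, for an older notice, returns the snapshot that was valid on the notice date. The class name `ProfileArchive`, its methods and the sample addresses are all hypothetical; the slides only say that the information can be backed up.

```python
from bisect import bisect_right
from datetime import date


class ProfileArchive:
    """Dated snapshots of one organization's public profile (hypothetical design)."""

    def __init__(self):
        self._dates = []      # snapshot dates, kept sorted
        self._snapshots = []  # profile data, parallel to _dates

    def record(self, valid_from: date, profile: dict) -> None:
        """Store a copy of the profile as it looked from valid_from onwards."""
        i = bisect_right(self._dates, valid_from)
        self._dates.insert(i, valid_from)
        self._snapshots.insert(i, dict(profile))

    def as_of(self, day: date):
        """Return the latest snapshot taken on or before day, or None."""
        i = bisect_right(self._dates, day)
        return self._snapshots[i - 1] if i else None


archive = ProfileArchive()
archive.record(date(2010, 1, 1), {"address": "Old Street 1"})   # made-up data
archive.record(date(2012, 6, 1), {"address": "New Street 2"})   # made-up data

print(archive.as_of(date(2011, 8, 31)))  # profile valid for a 2011 notice
print(archive.as_of(date(2013, 6, 13)))  # current profile
```

A notice then needs to carry only the profile URI and its own publication date.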
24. Benefits of Publishing TED as LD
Problem: It is hard to find information related to public contracts, contracting authorities and suppliers which is published outside of TED, elsewhere on the Web, e.g.,
• data from the post-award phase
• public contracts not published on TED
• profiles of contracting authorities and suppliers
LD solution: TED publishes the basic data infrastructure of HTTP URIs of public contracts, contracting authorities, suppliers, etc.
Others can enrich this basic infrastructure with their own data.
The enriched TED datasets can be consumed by third-party applications and even by TED itself.
25. Benefits of Publishing TED as LD
[Figure: in the Shared Global Data Space (Web), a publisher of profiles of CZ suppliers and a publisher of post-award data of GE contracts build on the TED Linked Data Basic Infrastructure]
26. Benefits of Publishing TED as LD
PC Filing Application
• Suitable suppliers for a contract?
• Contracts similar to a contract
"HeatMap" Application
• Public spending per inhabitant in 2010
• Public spending in Czech Republic
27. Benefits of Publishing TED as LD
Problem: Other authorities must copy TED data to their databases if they want to use it (which also includes republishing TED data).
The repeated work of building and maintaining such databases is paid from public budgets (!)
LD solution: Other public authorities link their primary data (represented as Linked Data, not necessarily published) to TED without the need to copy, integrate and maintain TED data in their own databases.
Anyone who works with the data of such a public authority can get the data directly from TED if necessary.
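The difference between copying and linking can be sketched like this: the other authority's records hold only the TED URI of a contract, and contract details are fetched from TED when someone actually needs them. The grant record, the field names and `stub_lookup` are hypothetical; the stub stands in for an HTTP lookup of the URI and returns placeholder data only.

```python
from typing import Callable, Dict

Lookup = Callable[[str], Dict[str, str]]

# Hypothetical primary data of another authority: it stores only the TED URI,
# not a copy of the contract details.
GRANTS = [
    {"grant_id": "G-001", "contract": "http://ted.eu/contract/CZ/54782145"},
]


def grant_with_contract(grant: Dict[str, str], lookup: Lookup) -> dict:
    """Join a local record with TED data fetched on demand through its link."""
    return {**grant, "contract_data": lookup(grant["contract"])}


def stub_lookup(uri: str) -> Dict[str, str]:
    """Stand-in for an HTTP lookup of the URI; returns placeholder data only."""
    return {"uri": uri, "source": "stub"}


enriched = grant_with_contract(GRANTS[0], stub_lookup)
print(enriched)
```

The local database never stores contract details, so there is nothing to keep in sync when TED extends or corrects its data.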
28. Benefits of Publishing TED as LD
Our planned experiment in the Czech Republic, in cooperation with the Czech Ministry of Finance (MoF), using data about public contracts
Datasets:
• CZ Public Budgets (MoF)
• NUTS & LAU CZ regions
• CZ Public Contracts
• Demography (Czech Stat. Office)
Example question: Public contracts in Prague with Prague budget and demography statistics?
Goal: to show that institutions can share data by linking it instead of copying it
29. Benefits for Stakeholders
Contracting Authorities and Suppliers
Unified global data space covering various aspects of public procurement across all EU countries.
Contracting authorities
They can find contracts similar to their own.
They can group their calls with other authorities to achieve better offers from suppliers.
They can verify their requirements against the requirements of other buyers to increase the quality and completeness of their requirements and ask for better prices.
They can search for suitable suppliers who have successfully completed similar contracts in the past.
Suppliers
They can get the necessary information about open calls for tenders.
They can better inform potential customers about their offers.
They can analyze previous contracts in their market to better target their tenders and improve the quality of the services they offer.
They can group with other suppliers with complementary offers for joint tendering.
30. Benefits for Stakeholders
EU and Citizens
EU saves money
Only the basic infrastructure is built and primary data is published
• Related data is published and linked by third parties
There is no need to build and pay for complex applications and services
• These will be built by third parties, not only for citizens but also for contracting authorities and suppliers, solely on the basis of demand.
There is no need to duplicate data in different public administration services and applications
• Data is linked instead of copied
EU supports building a common market and interoperability (ISA)
EU supports transparency
Citizens can more easily monitor what public administrations buy in their city/country, from whom and for how much
They can also more easily compare the purchases of their city/country with other cities/countries.
31. Linked Data for TED – What needs to be done to adopt LD principles?
33. Public Procurement and LOD2 Project
Vocabulary for publishing public contracts as Linked Data
A combination of existing broadly adopted vocabularies and their extension for public procurement (GoodRelations, Payments Ontology, schema.org, Dublin Core, SKOS)
Public Contracts filing application
A web application for contracting authorities and suppliers
It enables data about public contracts to be published as Linked Data.
Contracting authorities can search for similar contracts and suitable suppliers.
Experimental Linked Data from Czech Republic, Great Britain and TED
34. Experimental Linked Data from Czech Republic, Great Britain and TED created as part of LOD2 project
Datasets shown in the figure:
• CZ Public Contracts
• Common Procurement Vocabulary
• CZ Business Entities
• CZ Demography Stats
• CZ Public Budgets
• DBPedia
• TED Public Contracts and Organizations
• SDMX
• CZ LAU Regions
• NUTS Regions (RAMON)
• GB Public Contracts and Organizations
• Products Ontology