Web2.0.2012 - lesson 10 - open data
1. Web 2.0
blog, wiki, tag, social network: what are they,
how to use them and why they are important
Lesson 10: Open Government and
Open Data
Carlo Vaccari
vaccaricarlo@gmail.com
http://vaccaricarlo.wordpress.com
Camerino University – 2011/2012
2. This material is distributed under the Creative Commons
"Attribution - NonCommercial - Share Alike - 3.0", available at
http://creativecommons.org/licenses/by-nc-sa/3.0/ .
3. Origins
Letter from Thomas Jefferson to Isaac McPherson (1813)
If nature has made any one thing less susceptible than all others of
exclusive property, it is the action of the thinking power called an
idea, which an individual may exclusively possess as long as he
keeps it to himself; but the moment it is divulged, it forces itself
into the possession of every one, and the receiver cannot
dispossess himself of it. Its peculiar character, too, is that no one
possesses the less, because every other possesses the whole of it.
He who receives an idea from me, receives instruction
himself without lessening mine; as he who lights his taper
at mine, receives light without darkening me. That ideas
should freely spread from one to another over the globe, for the
moral and mutual instruction of man, and improvement of his
condition, seems to have been peculiarly and benevolently
designed by nature, when she made them, like fire, expansible
over all space, without lessening their density in any point, and like
the air in which we breathe, move, and have our physical being,
incapable of confinement or exclusive appropriation.
4. Origins
1966: Freedom of Information Act (FOIA), consistent with the
belief that people have the “right to know” about government
records
The Act imposes on the government a series of rules that allow
anyone to know how the federal government works, including full
or partial access to classified documents
The measure guarantees the transparency of public
administration towards citizens and the freedom of the press
5. Origins
Since the 1980s, campaigns in favor of free access to information
have been decisive for the development and dissemination of new
digital media.
A key driver of these initiatives has been the movement for Free
or Open Source Software (ecumenically called FLOSS),
through the work of Richard Stallman and Linus Torvalds
Stallman, founder of the movement, coined the definition of
free software, an expression that includes the freedom to
run, copy, distribute, study and modify a program. Stallman
also introduced the concept of copyleft, a play on the word
"copyright": a license by which the author grants the public
certain rights and sets the conditions under which the work
can be used
Torvalds, Linux creator, and the large community of
programmers who have collaborated in its development, have
shown the feasibility of the model conceived by Stallman
6. Origins
Outside of software, the concept of copyleft has spread to the
field of content (text, music, video) through the work of
Lawrence Lessig, founder of Creative Commons, and has gone on to
reach the field of scientific research.
The Open Access movement, born in 2004, has focused on the
scientific literature
In 2008, the European Commission stated that 20% of the research
funded by the Commission under FP7 must be published open
access after an embargo of 6-12 months; this action was followed by
the European Research Council (ERC, open access publishing after
6 months), and then by the European Science Foundation (ESF)
and the European Heads of Research Councils (EuroHORCs)
7. Origins
The advent of so-called "social software", i.e. applications in
which users become content producers, grouped under the banner
of Web 2.0, has enabled the Internet to become a platform that
allows interaction between different users, who produce and
share content freely
These behaviors also engage the public sphere. In the past two
years, the demands of the movement for open access to
knowledge have also been directed at 'public sector
information' (PSI)
Encouraged by the changes underway and the results
obtained, a new grassroots movement, known as Open
Government Data, is spreading in industrialized countries with
the aim of achieving proactive open access to data in a
specific area: that of political institutions and public
administration
8. Gov 2.0 pillars
Government 2.0 three pillars:
1. Leadership, policy and governance to achieve necessary shifts
in public sector culture and practice. Cultural change is at the
heart of Government 2.0 and more important than the
development of policy or the technical challenges of adopting
new technologies.
2. Application of Web 2.0 collaborative tools and practices to
the business of government. As they do outside of government,
these tools and practices can increase productivity and
efficiency. They are an opportunity to make representative
democracy more responsive, participatory and informed
3. Open access to Public Sector Information (PSI)
9. Gov 2.0 links
Top 10 Gov 2.0 web sites:
http://shareable.net/blog/the-worlds-top-10-gov-20-initiatives
Australian Government govspace platform:
http://govspace.gov.au/about/
Victoria state (au) Gov 2.0 action plan:
http://www.egov.vic.gov.au/government-2-0/government-2-0-action-plan-vic
Cool gov 2.0 sites
http://www.socialbrite.org/2010/08/10/cool-gov-2-0-sites-you-dont-know-abo
Lessons for Gov 2.0 from Web 2.0
http://radar.oreilly.com/2010/10/top-10-lessons-for-gov-20-from.html
10. from Gov 2.0 to Open Gov
Gov 2.0 is a fundamental step along the way to Open
Government bringing together the utilisation of emerging Web
tools and mechanisms which enable multi-channel
communications and information sharing.
Much of the drive for “open” government comes from the
“open organisation” and “open data” movements because
essentially, as the Economist stated in February 2010 “the
nation has always been a product of information
management”
11. Open Government
In 2008 Barack Obama made extensive use of social media during
his election campaign: he won with 2x the web traffic, 4x the
YouTube views, 5x the Facebook friends and 10x the online staff
of McCain
First act of Obama as President: Memorandum on
Transparency and Open Government that starts:
“My Administration is committed to creating an unprecedented
level of openness in Government. We will work together to
ensure the public trust and establish a system of transparency,
public participation, and collaboration. Openness will
strengthen our democracy and promote efficiency and
effectiveness in Government”
Then Open Government Directive “to direct executive
departments and agencies to take specific actions to implement
the principles of transparency, participation, and collaboration
set forth in the President’s Memorandum”
All the actions are listed on the Open Government Initiative website
12. Open Government in Europe
An Open Declaration on European Public Services proposes three
core principles for European public services:
1. Transparency:
- public sector organisations “transparent by default”
- clear, regularly-updated information on all processes
- citizens able to highlight areas where transparency should be increased
- open, standard and reusable formats
2. Participation:
- citizens' input in all its activities
- collaboration with citizens core competence of government
3. Empowerment:
- public institutions as platforms for public value creation
- data and services available in ways that others can build on
- providing resources to enable citizens to solve problems
- citizens as owners of their own personal data, able to
monitor and control how these data are shared
The Declaration was taken up in the 2009 Malmö Ministerial Declaration
13. Open Government
In Europe: Council of Europe Convention on Access to Official
Documents, Tromsø, 18.VI.2009
http://conventions.coe.int/Treaty/EN/Treaties/Html/205.htm
(not yet signed http://www.access-info.org/en/council-of-europe )
Gov 2.0 examples:
http://shareable.net/blog/the-worlds-top-10-gov-20-initiatives
(see 10, 9, 2 http://demo.ushahidi.com/, 1)
http://govspace.gov.au/about/
http://www.guardian.co.uk/open-platform/blog/raw-data-now-one-year-on-in-t
http://www.flightradar24.com/
http://www.itoworld.com/static/gallery.html
http://traintimes.org.uk/
http://www.informationisbeautiful.net/visualizations/
http://www.informationisbeautiful.net/visualizations/google-ngram-experimen
15. Practical Steps for Government Agencies
Recommendations by Tim O'Reilly: Government as a Platform (see)
Issue your own open government directive
Create “a simple, reliable and publicly accessible infrastructure
that ‘exposes’ the underlying data” from your city, county, state,
or agency
Build your own websites and applications using the same open
systems for accessing the underlying data as you make available
to the public at large
Share those open APIs with the public, using Data.gov for federal
APIs and creating state and local equivalents
Share your work with other cities, counties, states, or agencies.
Provide your work as open source software, work with other bodies
to standardize web services, build a common cloud computing
platform, or simply share best practices (see Code for America)
16. Practical Steps for Government Agencies
Don’t reinvent the wheel: support existing open standards and
use open source software whenever possible. (eg Open311 is a
great example of an open standard being adopted by many
cities)
Create a list of software applications that can be reused by your
government employees without procurement
Create an “app store” that features applications created by the
private sector as well as those created by your own government
unit (see Apps.DC.gov)
Create permissive social media guidelines that allow government
employees to engage the public without having to get pre-
approval from superiors
Sponsor meetups, code camps, and other activity sessions to
actually put citizens to work on civic issues
17. OECD PSI Principles
OECD recommendations about PSI principles:
1. Openness. Maximize the availability of public sector
information for use and re-use - openness as the default rule.
2. Access and transparent conditions for re-use. In principle all
accessible information should be open to re-use by all.
3. Asset lists. Strengthening awareness of what public sector
information is available for access and re-use.
4. Quality. Ensuring methodical practices to enhance data
quality through cooperation of various government bodies
5. Integrity. Protect information from unauthorized
modification or from denial of authorized access
6. New technologies. Storing technologies, open formats,
multiple languages, technological obsolescence and long term
preservation
18. OECD PSI Principles
7. Copyright. Intellectual property rights should be respected,
exercising copyright in ways that facilitate re-use. Public
sector information must be copyright-free.
8. Pricing. PSI provided free of charge, or information pricing
transparent as far as possible
9. Competition. PSI open to all possible users and re-users on
non-exclusive terms
10. Transparent Redress mechanisms
11. Facilitate public private partnerships
12. International access and use. Support international co-
operation for commercial re-use and non-commercial use
13. Best practices. Encouraging the wide sharing of best
practices and exchange of information on implementation,
training, copyright and monitoring
19. From PSI to Open Data
PSI in Europe: EPSIplus platform
Public Sector Information (PSI) not necessarily open
Many different rules (national/regional laws!) about PSI re-use
For PSI to be open for re-use it needs to be
- discoverable
- legally open
- technically open
- free of charge
Rules for Open Government Data:
if it can’t be spidered or indexed, it doesn’t exist
if it isn’t available in open and machine readable format, it
can’t engage
if a legal framework doesn’t allow it to be re-used, it doesn’t
empower
20. Value of the Data
Opening data can have a big economic value and their value lies
in the possibility of their use and reuse.
What has real value is what you develop from them, and the fact
that they are available.
Data are so ubiquitous that they are becoming a commodity, such
as electricity and water.
In the political field, data have value only if a critical mass
forms of people who know them and use them to form opinions
and participate in public activities.
The more data are used, the greater their value, because the
number of decisions, goods, products and valuable services
based on them increases.
21. Value of the Data
2006 European Commission MEPSIR Study (Measuring European
Public Sector Information Resources) : estimates for the overall
market size for public sector information in the European Union
range from €10 to €48 billion, with a mean value around €27
billion
2011 Vickery's Study: a first estimate puts the economic gains
from opening up Public Sector Information and providing access
for free at up to €40 billion for the EU27
But PSI can be used in direct and indirect applications across the
economy: the direct and indirect economic impacts from PSI
applications and use across the whole EU economy are of the
order of €140 billion annually
Tim Berners-Lee: Raw Data Now! TED 2009
22. Recent EU moves
December 2011:
Digital Agenda – turning government data into gold
Open Data Strategy for Europe, expected to deliver a €40 billion
boost to the EU's economy each year.
Open Data Package consisting of:
- a proposal for a revision of the 2003 Directive on the re-use of PSI
- a Communication on Open Data
- new Commission rules on re-use of the documents it holds
Three actions to overcome barriers and fragmentation:
- adapt the legal framework for data re-use
- financial resources in favor of open data and European data
portals
- facilitate coordination between European countries, in particular
through:
  - PSI group and PSI platform (exchange of good practices)
  - LAPSI network (legal issues related to PSI)
  - ISA action, Interoperability Solutions for EU PA (€164 m)
23. Open Definition
From http://www.opendefinition.org/okd/ : what is “open”?
1. Access
Available as a whole and at a reasonable reproduction cost,
preferably downloading via the Internet without charge. The
work must be available in a convenient and modifiable form.
2. Redistribution
The license shall not restrict any party from selling or giving
away the work either on its own or as part of a package made
from derived work. License without royalty or other fee.
3. Reuse
The license must allow for modifications and must allow them to
be distributed under the terms of the original work.
4. Absence of Technological Restriction
The work must be provided in such a form that there are no
technological obstacles to the performance of the above
activities (eg. open data format)
24. Open Definition
5. Attribution
The license may require the attribution of the contributors
and creators to the work.
6. Integrity
The license may require as a condition for the work being
distributed in modified form that the resulting work carry a
different name or version number from the original work.
7. No Discrimination Against Persons or Groups
The license must not discriminate against any person or
group of persons.
8. No Discrimination Against Fields of Endeavor
The license must not restrict anyone from using the work in a
specific field of endeavor. For example, it may not restrict the
work from being used in a business, or for genetic research
25. Open Definition
9. Distribution of License
The rights attached to the work must apply to all to whom the
work is redistributed without the need for execution of an
additional license by those parties.
10. License Must Not Be Specific to a Package
The rights attached to the work must not depend on the work
being part of a particular package. If the work is extracted from
that package, all parties to whom it is redistributed should have
the same rights as those granted with the original package.
11. License Must Not Restrict the Distribution of other
Works
The license must not place restrictions on other works that are
distributed along with the licensed work. For example, the
license must not insist that all other works distributed on the
same medium are open.
Many items are similar to the Open Source Definition
27. How to Open Up data?
Key rules in opening up data:
Keep it simple (KISS). Start out fast, small and simple.
Not every dataset must become open right now. Moving as
rapidly as possible is good because it means you can learn
from experience
Engage early and engage often with actual and potential
users and reusers of the data: citizens, businesses,
developers. Much of the data will reach ultimate users via
infomediaries who take the data and transform and remix
them – users don’t need a large vector database but are
interested in the map. Thus, the primary users to engage are
the infomediary reusers.
Address common fears and misunderstandings especially if
you are working with large institutions such as government.
In opening up data one encounters plenty of questions and
concerns and it is important to (a) identify the main ones and
(b) address them at as early a stage as possible.
28. Steps to Open Data
There are 4 main steps in making data open (not strictly
sequential, sometimes iterative)
1. Choose the dataset(s) you plan to make open, though note
you may need to return to this step if you encounter problems,
especially at step 2.
2. Apply an open license, suitable for all rights existing on
the data (Legal Openness)
3. Make the data available, in bulk and in a useful format,
sometimes also via an API (Technical Openness)
4. Make them discoverable: post on the web and perhaps
organize a central catalogue to list your open datasets (or put
them in existing catalogues)
29. How to choose the datasets
Ask the ‘community’ (i.e. actual or potential users of the
data) what they want
Put up a web-page with details of this request for data
suggestions and a simple way to submit data requests (e.g.
via email or a simple webform). Some tips:
Avoid registration
Prepare a short (5-20 items) list of datasets as a prompt
Drawing up this list should be a quick process that identifies
which datasets could be made open
Circulate the request to relevant mailing lists, forums and
individuals pointing back to the main webpage
Run a consultation event, but make sure you run it at a
convenient time when the average business person and ‘data
hacker’ can attend
30. Apply an Open License
If you are planning to make your data available you should
put a license on it — and if you want your data to be open this
is even more important
For licensing purposes you must distinguish:
Data (the collection)
Contents (individual items, part of the collection,
rows/columns)
Structure (schema, metadata, Data Definition)
31. OpenData Licenses
Many licenses proposed:
OpenData Commons proposes three licenses:
Public Domain Dedication and License (PDDL)
Attribution License (ODC-By)
Open Database License (ODC-ODbL) - Like the GPL (or CC
Attribution Share-Alike) requires public reusers of your data to
share back changes (and attribute)
Opendefinition lists which licences conform, and which do not,
to the “open” definition
Many national licenses:
Canada
UK
Norway
Italy (now IODL 2.0)
32. Open Data organisations
Open Knowledge Foundation : “From sonnets to statistics,
genes to geodata”
Founded in 2004, a not-for-profit organization promoting open
knowledge: any kind of data and content – sonnets to statistics,
genes to geodata – that can be freely used, reused, and
redistributed
OKF created standards like the Open Definition, organizes
events like OKCon and Open Government Data Camp, runs projects
like “Where Does My Money Go” and Open Shakespeare and
develops tools like CKAN to help people share open material
Italian organisations: SpaghettiOpendata, DataGov.it, OpenPolis,
LinkedOpenData.it
33. Software for Open Data
CKAN (Comprehensive Knowledge Archive Network) is open-
source “data hub” software designed to make it easier to find,
share, reuse and collaboratively develop data and content,
especially open data and content
Used in http://data.gov.uk and others (see http://wiki.okfn.org/ckan/instances)
For Italy: http://it.ckan.net/
Data.gov code released as OSS (a modified Drupal version)
used also for India → Open Government Platform
34. CKAN features
Free/Open-Source software, written in Python
Core catalog based around Resources (Files and APIs) and
groupings of those (Packages)
Tagging
Package Groups
Ratings
Arbitrary metadata
Package relationships
Web user interface (WUI)
Package adding, editing, listing etc
Wiki features such as “Recent Changes”, edit histories,
purging of changes etc
User management and user home pages
Full JSON-based REST API with clients in Python, PHP, Perl …
RDF version also available
CKAN is easy to use as your “catalogue” backend
An Extension and Plugin system
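The JSON-based API listed above can be sketched as follows. This is a minimal illustration only: it assumes a CKAN instance that exposes the Action API (version 3) with its package_search action, the base URL demo.ckan.example is hypothetical, and the sample response is hand-written, not real data.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=5):
    """Build a URL for the package_search action (Action API v3 assumed)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract (name, title) pairs from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [(d["name"], d.get("title", "")) for d in payload["result"]["results"]]


# Hand-written sample shaped like a package_search response (not real data).
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [{"name": "example-dataset", "title": "Example dataset"}],
    },
})

# Hypothetical base URL; replace it with a real CKAN instance before fetching.
print(package_search_url("https://demo.ckan.example", "transport"))
print(dataset_titles(SAMPLE_RESPONSE))
```

To query a live catalogue, the URL can be fetched with urllib.request.urlopen and the response body passed to dataset_titles.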
35. Make the data available
Tim Berners-Lee: Linked Data as part of a continuum of web
publishing activities associated with gold stars, like the ones you
got in school.
Here they are:
★ make your stuff available on the web (whatever format)
★★ make it available as structured data (e.g. Excel instead
of image scan of a table)
★★★ non-proprietary format (e.g. CSV instead of Excel)
★★★★ use URLs to identify things, so that people can point at
your stuff
★★★★★ link your data to other people’s data to provide context
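The star scheme above can be read as a cumulative checklist, where each star requires all the previous ones. A minimal sketch under that reading follows; the Publication class and its field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical description of how a dataset is published."""
    on_web: bool        # 1 star: available on the web, whatever format
    structured: bool    # 2 stars: structured data (e.g. a spreadsheet)
    open_format: bool   # 3 stars: non-proprietary format (e.g. CSV)
    uses_uris: bool     # 4 stars: things are identified by URLs
    linked: bool        # 5 stars: linked to other people's data


def stars(publication):
    """Count stars cumulatively: stop at the first level that is not met."""
    levels = [
        publication.on_web,
        publication.structured,
        publication.open_format,
        publication.uses_uris,
        publication.linked,
    ]
    count = 0
    for met in levels:
        if not met:
            break
        count += 1
    return count


scanned_table = Publication(True, False, False, False, False)
csv_file = Publication(True, True, True, False, False)
print(stars(scanned_table), stars(csv_file))
```

With this reading, a dataset that uses URLs but is published only as an image scan still gets one star, because the second level is not met.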
36. Make the data available
Open data needs to be ‘technically’ open as well as legally
open. Specifically the data needs to be:
Available - at no more than a reasonable cost of reproduction,
preferably for free download on the Internet. Publish your
information on the Internet wherever possible
In bulk - the data should be available as a whole (a web
service may also be useful but is not a substitute for bulk
access)
In an open, machine-readable format - machine-readability is
important because it facilitates reuse (eg figures in a PDF are
read easily by humans but are very hard for a computer to use)
The key point to keep in mind here is: keep it simple, move
fast and be pragmatic. In particular it is better to give out raw
data now than perfect data in six months' time.
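To illustrate why machine-readability matters, here is a minimal sketch of reusing a bulk CSV file with nothing but the Python standard library. The sample rows and column names are invented for the example and are not real figures.

```python
import csv
import io

# Invented sample standing in for a downloaded bulk CSV file (not real data).
SAMPLE_CSV = """municipality,year,residents
Alfa,2011,1200
Beta,2011,800
"""


def column_total(csv_text, column):
    """Sum a numeric column of a CSV document given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row[column]) for row in reader)


print(column_total(SAMPLE_CSV, "residents"))
```

The same figures printed in a PDF table would first have to be extracted and checked by hand before any such calculation could be made.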
37. Make the data discoverable
Tell the world!
Contact prominent organisations or individuals interested in
this area
Contact relevant mailing lists or social networking groups
Contact prospective users you know may be interested in this
data
Getting folks in a room: Unconferences, Meetups and
Barcamps: face-to-face events can be a very effective way to
encourage others to use your data
Making things! Hackdays, prizes and prototypes,
conferences, barcamps, ...
38. Open Linked Data
Linked Data: a method of publishing structured data, so that it
can be interlinked and become more useful
Built upon standard Web technologies (HTTP and URIs) - but it
extends them to share information in a way that can be read
automatically by computers (this enables data from different
sources to be connected and queried)
Tim Berners-Lee, W3C director, coined the term in a
design note discussing the Semantic Web project
4 rules:
Use URIs as names for things
Use HTTP URIs so that people can look up those names
When someone looks up a URI, provide useful information,
using the standards (RDF, SPARQL)
Include links to other URIs, so that they can discover more
things
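The four rules above can be sketched with plain Python, writing statements as subject, predicate, object triples and serializing them in the N-Triples syntax of RDF. The data.example.org URI is hypothetical, and the DBpedia resource it links to is assumed for illustration; the two predicate URIs come from the standard RDFS and OWL vocabularies.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def uri(value):
    """Write an HTTP URI in N-Triples syntax (rules 1 and 2)."""
    return f"<{value}>"


def literal(text):
    """Write a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples, one per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical URI naming a thing published by our own site.
CITY = "http://data.example.org/city/camerino"

triples = [
    # Rule 3: useful information is returned when the URI is looked up.
    (uri(CITY), uri(RDFS_LABEL), literal("Camerino")),
    # Rule 4: a link to another dataset (target URI assumed for illustration).
    (uri(CITY), uri(OWL_SAME_AS), uri("http://dbpedia.org/resource/Camerino")),
]

print(to_ntriples(triples))
```

In practice an RDF library would handle serialization; the hand-written version is only meant to show that each rule maps to a concrete choice in the data.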
39. Open Linked Data
A site that exists to provide a home for, or pointers to,
resources from across the Linked Data community:
http://linkeddata.org/
Extended:
http://richard.cyganiak.de/2007/10/lod/lod-datasets_2010-09-22_colored.ht
Example: DBPedia - dataset from Wikipedia, see
http://wiki.dbpedia.org/
Dataset description http://wiki.dbpedia.org/Datasets?v=br4
DBPedia full Ontology and example
DBPedia SPARQL example: query
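Since the original query link is not preserved here, the following is only a sketch of what a DBpedia SPARQL request might look like. It assumes the public endpoint at https://dbpedia.org/sparql, the `format` parameter it accepts, and the dbo:populationTotal property; the sample result is hand-written in the standard SPARQL JSON results layout and contains no real value.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint

QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?population WHERE { dbr:Camerino dbo:populationTotal ?population }"""


def sparql_url(endpoint, query):
    """Build a GET URL carrying the query, asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


def binding_values(results_text, variable):
    """Read the values bound to one variable from SPARQL JSON results."""
    data = json.loads(results_text)
    bindings = data["results"]["bindings"]
    return [b[variable]["value"] for b in bindings if variable in b]


# Hand-written sample in the SPARQL JSON results layout (placeholder value).
SAMPLE_RESULTS = json.dumps({
    "head": {"vars": ["population"]},
    "results": {"bindings": [
        {"population": {"type": "literal", "value": "0000"}},
    ]},
})

print(sparql_url(ENDPOINT, QUERY))
print(binding_values(SAMPLE_RESULTS, "population"))
```

Fetching the printed URL with urllib.request.urlopen and passing the body to binding_values would complete the round trip against the live endpoint.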
40. OpenData links
Examples:
http://dati.gov.it (since October 2011)
http://data.gov http://www.data.gov/raw/34
http://data-gov.tw.rpi.edu/demo/stable/demo-34-earthquake-exhibit.html
http://www.police.uk/crime/?q=Cambridge,%20UK#crimetypes
http://jobseekers.direct.gov.uk/homepage.aspx
http://data.gov.uk/dataset/financial-transactions-data-nhslondon
http://data.london.gov.uk/datastore
http://data.worldbank.org/ (best practices!)
http://openlylocal.com/councils/open
Standard
http://www.w3.org/2011/gld/charter.html
Next generation data.gov
http://www.socrata.com/datagov/introducing-next-gen-platform-short-video/
opendata http://www.opendatacommons.org/guide/