FINODEX open data training

OPEN DATA TRAINING MATERIAL
November 2014
Page 1

Table of contents
1. Defining Open Data
2. Understanding Law and Licensing
3. Big Data vs. Open Data
4. Open data as part of your business model
5. Case studies: Open Data Business
6. Where do I find Open Data?
7. How to develop your open data business
8. Open Data training materials already available. A list
9. Slides and inspiring presentations: link-o-graphy
10. Recommended videos, audio files and books

1. Defining Open Data
“A key promise of open data is that it can be freely accessed and used. Without a clear definition of
what exactly that means (e.g. used by whom, for what purpose) there is a risk of dilution especially
as open data is attractive for data users” (Pollock, 2014).
The main goal of this material is to make sure that people who want to reuse open datasets know
what “open” really means.
The first step is to explore some guidelines available online. The Open Data Institute and Open
Knowledge regularly publish simple guides and other content for open data publishers
and reusers. Let’s start with the basics: What makes data open and The Open Definition v2.0.
What makes data open?
The original content for this material is available online at http://theodi.org/guides/what-open-data and
http://theodi.org/guides/publishers-guide-open-data-licensing .
Open data is data that is made available by organisations, businesses and individuals for anyone
to access, use and share.
Open data has to have a licence that says it is open data. Without a licence, the data can’t be
reused.
The licence might also say:
● that people who use the data must credit whoever is publishing it (this is called attribution)
● that people who mix the data with other data have to also release the results as open data
(this is called share-alike)
● that people can do whatever they want with the work, if the rights holder has waived the
copyright or database rights (public domain)
Example: The Department for Education makes available open data about the performance of
schools in England. The data is published as CSV under the Open Government
Licence, which only requires reusers to say that they got the data from the Department for
Education.
These principles for open data are detailed in the Open Definition in the next paragraph.
Good open data
● is rich in documentation and metadata
● can be linked to, so that it can be easily shared and talked about
● is available in a standard, structured format, so that it can be easily processed
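To illustrate why a standard, structured format matters, here is a minimal Python sketch that reads a small CSV extract using only the standard library. The column names and values are invented for illustration; they are not taken from any real dataset.

```python
import csv
import io

# Hypothetical CSV extract; the columns and values are made up for illustration.
SAMPLE_CSV = """school,pupils,pass_rate
Example Primary,210,0.81
Sample Academy,640,0.74
"""

def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def average(rows, column):
    """Return the mean of a numeric column."""
    values = [float(row[column]) for row in rows]
    return sum(values) / len(values)

rows = load_rows(SAMPLE_CSV)
print(len(rows), "rows; average pass rate:", round(average(rows, "pass_rate"), 3))
```

Because the format is structured, the same two functions would work on any CSV file with a header row, which is what makes such data easy to combine and reuse.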

Open Definition
The Open Definition, created in 2005, is the main international standard for open data and open
data licences, and provides principles and guidance for all things “open”.
Open Data Mark: indicates compliance with Open Definition
Definition
You can find the full, current version of the Open Definition at http://opendefinition.org/od/ . The Open
Definition is a project of Open Knowledge, which provides further details and additional content on the
project's official web page.
This material is licensed under Creative Commons Attribution 4.0: https://creativecommons.org/licenses/by/4.0/.
Open data is data that can be freely used, shared and built on by anyone, anywhere, for any
purpose. The “standard” provided by the Open Definition – common requirements that must be
met if data is to be called “open” – is crucial because much of the value of open data lies in the
ease with which different sources of open data can be combined – practically every app or
insight made with data requires combining several pieces of data. For example, you need to
know the bus timetable and have a map showing bus stops to be able to reach your destination on
time.
Both legal and technical compatibility is vital, and the Open Definition ensures that openly-licensed
data can be combined successfully. This eliminates the risk of a “Tower of Babel” of data, with a
proliferation of licences and terms of use for open data leading to complexity and incompatibility.
The Open Definition prevents this fragmentation – and resulting destruction in value – by ensuring
a common standard for all “open” data. Evidence for the practical success of the effort can be
found in the reuse of the definition’s key principles and language in other important areas, including
UK and US government policy, and in the transition in terminology from “public sector
information” to “open government data”.
Thanks to the efforts of many translators in the community, the Open Definition is available in 30+
languages.
The Open Definition explains what counts as an open work and an open license. The term work
is used to denote the item or piece of knowledge being transferred. The term license refers to the
legal conditions under which the work is made available. Where no license has been offered this
should be interpreted as referring to default legal conditions governing use of the work (for
example, copyright or public domain).

Open Works
An open work must satisfy the following requirements in its distribution:
● Open License
The work must be available under an open license (as defined in Section 2). Any additional
terms accompanying the work (such as terms of use, or patents held by the licensor) must not
contradict the terms of the license.
● Access
The work shall be available as a whole and at no more than a reasonable one-time
reproduction cost, preferably downloadable via the Internet without charge. Any additional
information necessary for license compliance (such as names of contributors required for
compliance with attribution requirements) must also accompany the work.
● Open Format
The work must be provided in a convenient and modifiable form such that there are no
unnecessary technological obstacles to the performance of the licensed rights. Specifically,
data should be machine-readable, available in bulk, and provided in an open format (i.e., a
format with a freely available published specification which places no restrictions, monetary
or otherwise, upon its use) or, at the very least, can be processed with at least one
free/libre/open-source software tool.

Open Licenses
A license is open if its terms satisfy the following conditions:
● Required Permissions: The license must irrevocably permit (or allow) the following:
1.1 Use: The license must allow free use of the licensed work.
1.2 Redistribution: The license must allow redistribution of the licensed work, including sale,
whether on its own or as part of a collection made from works from different sources.
1.3 Modification: The license must allow the creation of derivatives of the licensed work and allow
the distribution of such derivatives under the same terms of the original licensed work.
1.4 Separation: The license must allow any part of the work to be freely used, distributed, or
modified separately from any other part of the work or from any collection of works in which it was
originally distributed. All parties who receive any distribution of any part of a work within the terms
of the original license should have the same rights as those that are granted in conjunction with the
original work.
1.5 Compilation: The license must allow the licensed work to be distributed along with other
distinct works without placing restrictions on these other works.
1.6 Non-discrimination: The license must not discriminate against any person or group.
1.7 Propagation: The rights attached to the work must apply to all to whom it is redistributed
without the need to agree to any additional legal terms.
1.8 Application to Any Purpose: The license must allow use, redistribution, modification, and
compilation for any purpose. The license must not restrict anyone from making use of the work in a
specific field of endeavor.
1.9 No Charge: The license must not impose any fee arrangement, royalty, or other compensation
or monetary remuneration as part of its conditions.
● Acceptable Conditions
The license shall not limit, make uncertain, or otherwise diminish the permissions required
in the previous section except by the following allowable conditions:
Attribution: The license may require distributions of the work to include attribution of contributors,
rights holders, sponsors and creators as long as any such prescriptions are not onerous.
Integrity: The license may require that modified versions of a licensed work carry a different name
or version number from the original work or otherwise indicate what changes have been made.
Share-alike: The license may require copies or derivatives of a licensed work to remain under a
license the same as or similar to the original.
Notice: The license may require retention of copyright notices and identification of the license.
Source: The license may require modified works to be made available in a form preferred for
further modification.

Technical Restriction Prohibition: The license may prohibit distribution of the work in a manner
where technical measures impose restrictions on the exercise of otherwise allowed rights.
Non-aggression: The license may require modifiers to grant the public additional permissions (for
example, patent licenses) as required for exercise of the rights allowed by the license. The license
may also condition permissions on not aggressing against licensees with respect to exercising any
allowed right (again, for example, patent litigation).
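The two lists above can be read as a simple rule: a license is open when it grants every required permission and imposes nothing beyond the acceptable conditions. The Python sketch below encodes that rule. The short labels (such as "any-purpose") are our own shorthand for the clauses above, not official identifiers, and a real assessment still requires reading the license text.

```python
# Shorthand labels for the required permissions listed above (1.1 to 1.9).
REQUIRED_PERMISSIONS = {
    "use", "redistribution", "modification", "separation", "compilation",
    "non-discrimination", "propagation", "any-purpose", "no-charge",
}

# Shorthand labels for the conditions an open license may impose.
ACCEPTABLE_CONDITIONS = {
    "attribution", "integrity", "share-alike", "notice", "source",
    "technical-restriction-prohibition", "non-aggression",
}

def is_open(permissions, conditions):
    """Return True if all required permissions are granted and no
    condition outside the acceptable list is imposed."""
    missing = REQUIRED_PERMISSIONS - set(permissions)
    extra = set(conditions) - ACCEPTABLE_CONDITIONS
    return not missing and not extra

# Hypothetical license descriptions, for illustration only.
attribution_only = is_open(REQUIRED_PERMISSIONS, {"attribution", "notice"})
non_commercial = is_open(REQUIRED_PERMISSIONS - {"any-purpose"}, {"attribution"})
print(attribution_only, non_commercial)
```

In this sketch an attribution-only license passes, while a license that withholds use for any purpose (as a non-commercial license does) fails.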
A list of conformant licenses is available at http://opendefinition.org/licenses/ .
We explore licensing in the next section.

2. Understanding Law and Licensing
In this section we provide additional material on the licenses that applicants should
look for. Below is an extended list of licenses that conform to the
principles laid out in the Open Definition.
Conformant Licenses
The following licenses are conformant with the principles set forth in the Open Definition.
● Domain = Domain of application, i.e. what type of material this license should/can be
applied to. Note: if you are looking for an open license for software, please see the Open
Source Definition conformant licenses.
● BY = requires attribution
● SA = requires share-alike
● Recommended conformant licenses
These licenses conform to the Open Definition and are:
● Reusable: Not specific to an organization or jurisdiction.
● Compatible: Must be compatible with at least one of GPL-3.0+, CC-BY-SA-4.0, and
ODbL-1.0. Permissive/attribution-only licenses must be compatible with all 3 of the
aforementioned licenses, and at least one of Apache-2.0, CC-BY-4.0, and ODC-BY-1.0.
● Current: Widely used and generally considered best practice by a broad spectrum of
projects and actors within the domains of applicability of the license.
License | Domain | BY | SA | Comments
Creative Commons CCZero (CC0) | Content, Data | N | N | Dedicate to the Public Domain (all rights waived)
Open Data Commons Public Domain Dedication and Licence (PDDL) | Data | N | N | Dedicate to the Public Domain (all rights waived)
Creative Commons Attribution 4.0 (CC-BY-4.0) | Content, Data | Y | N |
Open Data Commons Attribution License (ODC-BY) | Data | Y | N | Attribution for data(bases)
Creative Commons Attribution Share-Alike 4.0 (CC-BY-SA-4.0) | Content, Data | Y | Y |
Open Data Commons Open Database License (ODbL) | Data | Y | Y | Attribution-ShareAlike for data(bases)

● Other conformant licenses
These licenses conform to the Open Definition, but do not meet reusability or compatibility
requirements for recommended licenses, or have been superseded by newer license versions or
newer licenses with similar use cases, or are little-used. These licenses may be reasonable for the
particular organization they were crafted for to use, or to use for legacy reasons. Projects outside
such contexts are strongly advised to use a recommended conformant license from the list above.
License | Domain | BY | SA | Comments
Against DRM | Content | Y | Y | Little used.
Creative Commons Attribution versions 1.0-3.0 | Content | Y | N | Includes all jurisdiction "ports"; superseded by CC-BY-4.0.
Creative Commons Attribution-ShareAlike versions 1.0-3.0 | Content | Y | Y | Includes all jurisdiction "ports"; superseded by CC-BY-SA-4.0. Additionally, CC-BY-SA-1.0 is incompatible with any other license.
Data licence Germany – attribution – version 2.0 | Data | Y | N | Non-reusable. For use by Germany government licensors. Note version 1.0 is not approved as conformant.
Data licence Germany – Zero – version 2.0 | Data | N | N | Non-reusable. For use by Germany government licensors. Note there is no previous version.
Design Science License | Content | Y | Y | Little used. Incompatible with any other license.
EFF Open Audio License | Content | Y | Y | Deprecated in favor of CC-BY-SA.
Free Art License (FAL) | Content | Y | Y |
GNU Free Documentation License (GNU FDL) | Content | Y | Y | Incompatible with any other license. Only conformant if used with no cover texts and no invariant sections.
MirOS License | Code, Content | Y | N | Little used.
Open Government Licence Canada 2.0 | Content, Data | Y | N | Non-reusable. For use by Canada government licensors. Note version 1.0 is not approved as conformant.
Open Government Licence United Kingdom 2.0 and 3.0 | Content, Data | Y | N | Non-reusable. For use by UK government licensors; re-uses of OGL-UK-2.0 and OGL-UK-3.0 material may be released under CC-BY or ODC-BY. Note version 1.0 is not approved as conformant.
Talis Community License | Data | Y | Y | Draft only. Deprecated in favour of ODC licenses.
Non-Conformant Licenses
Non-conformant licenses are usually those that support some of the definition’s
principles but not all of them.
● Creative Commons No-Derivatives Licenses
Creative Commons No-Derivatives licenses (by-nd-*) violate OD 1.1#3., “Reuse”, as they do not allow
works, in part or in whole, to be re-used in derivative works.
Creative Commons licenses with the NoDerivs stipulation include:
● Attribution-NoDerivs (by-nd)
● Attribution-NonCommercial-NoDerivs (by-nc-nd)
● Creative Commons NonCommercial
Creative Commons NonCommercial licenses (by-nc-*) do not support the OD 1.1#8., “No
Discrimination Against Fields of Endeavor”, as they exclude usage in commercial activities.
Creative Commons licenses with the non-commercial stipulation include:
● Attribution-NonCommercial (by-nc)
● Attribution-NonCommercial-ShareAlike (by-nc-sa)
● Attribution-NonCommercial-NoDerivs (by-nc-nd)
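The rule described above can be checked mechanically from a Creative Commons short code: any code containing the "nc" or "nd" element is not open. The following Python sketch shows one way to do this. The function names are our own, and the check covers only these two elements; it does not consider license versions or other details.

```python
def cc_elements(code):
    """Split a Creative Commons short code such as 'by-nc-sa' into its elements."""
    return set(code.lower().split("-"))

def is_open_cc(code):
    """Treat a CC code as open only if it has neither the
    NonCommercial (nc) nor the NoDerivs (nd) element."""
    return not ({"nc", "nd"} & cc_elements(code))

for code in ["by", "by-sa", "by-nd", "by-nc", "by-nc-sa", "by-nc-nd"]:
    print(code, "open" if is_open_cc(code) else "not open")
```

Only "by" and "by-sa" pass this check, which matches the lists of non-conformant licenses above.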

Licence Compatibility
The applicants, as reusers and publishers of open data, often need to understand whether the
licenses applied to datasets are "compatible".
We recommend that Finodex proposers consult this page:
https://github.com/theodi/open-data-licensing/blob/master/guides/licence-compatibility.md
The most important step towards understanding compatibility in more detail is to understand the
basic provisions of each license.
The Creative Commons Rights Expression Language defines some basic facets of licenses,
covering Permissions, Requirements and Prohibitions. As the CC licenses are already described
using these facets, which are also common to many other licenses, it is possible to put together a
matrix that identifies which facets apply to which licenses.
Table 1 summarises how a number of licenses can be classified based on these facets.
There are several things to note here:
● The Share Alike requirement requires that derived data is published under the same or
compatible terms as the original. This places limits on how remixes can be distributed, i.e.
only under compatible terms.
● The Derivative Works prohibition prevents re-users from creating any form of derivative
work at all, even if those derivatives are not distributed. However, it is still possible to
include the database in a collection in which the original is preserved.
When it comes to publishing derivatives there are, broadly, two different scenarios to consider:
● publishing a simple derivative based on a single source, and
● publishing a remix of several datasets.
Once a derivative has been created, it too can be the source of additional derivation.
Derivation is a process that can be repeated either by the original publisher (e.g. mixing in
further datasets) or by third parties (e.g. to create new derivatives).
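The facets discussed above can be turned into a rough compatibility check for a remix. The Python sketch below is a deliberately simplified model, not legal advice: Licence X, Y and Z are hypothetical, the three boolean facets are our own reduction of the Permissions, Requirements and Prohibitions, and the sketch conservatively assumes that two different share-alike licences can never be mixed, which real licences may relax through explicit compatibility clauses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Licence:
    """Simplified licence description; the fields are illustrative facets only."""
    name: str
    attribution: bool = False
    share_alike: bool = False
    no_derivatives: bool = False

def remix_outcome(sources):
    """Return (allowed, required_licence, attributions) for a remix of the sources.

    allowed: False if any source forbids derivatives, or if two different
        share-alike licences are mixed (a conservative assumption).
    required_licence: name of the share-alike licence the remix must use, or None.
    attributions: names of the source licences that require attribution.
    """
    if any(lic.no_derivatives for lic in sources):
        return (False, None, [])
    share_alike_names = {lic.name for lic in sources if lic.share_alike}
    if len(share_alike_names) > 1:
        return (False, None, [])
    required = next(iter(share_alike_names), None)
    attributions = [lic.name for lic in sources if lic.attribution]
    return (True, required, attributions)

# Hypothetical licences, for illustration only.
X = Licence("Licence X", attribution=True)
Y = Licence("Licence Y", attribution=True, share_alike=True)
Z = Licence("Licence Z", attribution=True, no_derivatives=True)

print(remix_outcome([X, Y]))
print(remix_outcome([X, Z]))
```

Mixing X with Y is allowed in this model, but the result must carry Licence Y and credit both sources; mixing anything with Z is rejected because Z prohibits derivatives.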
Questions about licence compatibility:
● Can some data published with Licence X be combined with some additional data published
under Licence Y?
● What license(s) could be applied to a derived or aggregated dataset?
● Are there provisions associated with a licence that inhibit or constrain the creation and
Set of questions for open data publishers and reusers
Author: David Tarrant
● Do you have rights or permission to publish?
● Do you have rights to use the information/data?
● Is the data derived from other sources?

Further reading:
http://www.scribd.com/doc/128356210/Business-considerations-or-privacy-and-open-data-how-not-to-get-caught-out
http://www.scribd.com/doc/125638490/Getting-to-grips-with-the-National-Pupil-Database-personal-data-in-an-open-data-world
USEFUL GUIDES for reusers and publishers released by The Open Data Institute
The ODI Publisher's Guide to Open Data Licensing
Source: http://theodi.org/guides/publishers-guide-open-data-licensing
In Europe, there are two kinds of rights that you are automatically given over things that you have
created:
● you get copyright over works (content) that you create and which are original to you, such
as text that you write or photographs you take
● you get a database right over collections of data that you have put a substantial effort into
obtaining, verifying or presenting
Note: As far as we know the database right only arises within the European Union and in Mexico.
In some countries there may be no protection for collections of data.
Database right: 15 years since database was last updated
Database copyright: Life of author + 70 years from date database was created
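As a rough worked example of the two terms above, the following Python sketch computes expiry years. It works at whole-year granularity and assumes the copyright term is counted from the author's death; the exact start and end dates depend on the jurisdiction, so treat the results as approximations.

```python
DATABASE_RIGHT_YEARS = 15          # counted from the last update of the database
COPYRIGHT_YEARS_AFTER_DEATH = 70   # counted from the author's death

def database_right_expiry(last_updated_year):
    """Approximate year in which the database right lapses."""
    return last_updated_year + DATABASE_RIGHT_YEARS

def copyright_expiry(author_death_year):
    """Approximate year in which database copyright lapses."""
    return author_death_year + COPYRIGHT_YEARS_AFTER_DEATH

# Hypothetical dates, for illustration only.
print(database_right_expiry(2014))
print(copyright_expiry(2040))
```

Note that every substantial update restarts the 15-year database right, so an actively maintained database can remain protected indefinitely.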
Suggestion for the proposers:
If you are uncertain about what rights you may have over a
piece of content or dataset or how you can use it...
Contact the owner. Ask.

If you apply original judgement in putting together a database, for example in choosing which items
to include within the database or which information about them to include, you have a copyright
over that database, because it is a creative work.
For example, if you were to build a database about the best 100 cars, this might involve:
● choosing which cars count as the best cars
● writing a description about each car
● researching and gathering facts about them
You would have copyright over the database, because you chose which cars were “best”. You
would have copyright over the descriptions, because you wrote them. And you would probably
have the database right for the database you’ve built, because you put substantial effort into
gathering information about them. Importantly, you don’t own the facts about the cars — anyone
else can build their own database containing exactly those facts without violating your database
right — but no one else can reuse your database or your descriptions without your permission
because you own the copyright over them.
You probably do not have a database right if you create the facts in a database, as opposed to
gathering them from elsewhere, unless you put substantial effort into verifying or presenting the
database. For example, if you own a restaurant and create a database of the dishes that you offer
and when you offer them, you probably do not have a database right over that database, though
you might have copyright because of the creative judgement involved in working out which dishes
should be offered on particular days to provide a balanced menu.
Copyright and database right are types of Intellectual Property Rights (IPR). There are other kinds
of IPR that you can get, such as patents, trademarks and (some) design rights, which must be
registered (for example with the Intellectual Property Office).
Database definition
“A collection of independent works, data or materials which are
a) arranged in a systematic or methodical way and
b) individually accessible by electronic or other means.”

● What About Data From Other Organisations?
You might not own all the content or data that you have and use within your organisation. In
particular, rather than being created or gathered by you, some of the content and
data you hold and use within your organisation, and might want to publish, might:
● be completely licensed from someone else
● include an extract of content or data that you have licensed from someone else
● be derived from the content or data that you have licensed from someone else
The Reuser’s Guide to Open Data Licensing describes what you can do with content or data that
you license from someone else. If you do reuse that content or data in your own publications, you
should indicate the licence under which you are reusing it, so that people reusing that
content or data know what they can do with it.
● What About My Brand?
Organisations who publish content or data under an open licence are often concerned that this
might enable reusers to also copy their brand.
Your brand should be protected through a trade mark. A trade mark restricts how other people use
your logo or company name. You will also have copyright on the logo.
Although your trade mark will protect you from other people using your logo directly, if your logo is
incorporated into some content that you license, you should make sure the logo is explicitly not
covered by that licence, as you will usually want to place additional restrictions on its use (such as
its adaptation).
For example, if you have written a report that includes your logo, and you want to license the
content of the report under the Creative Commons Attribution licence, you could say:
The text, figures and tables in this report are licensed under a Creative Commons Attribution 4.0
International License.
What If I Publish the Data on a Website?

You still have rights over your database and your content when you publish them on a website.
Others cannot legally extract and reuse a substantial portion of your data or content without your
permission.
You can also indicate that others should not scrape data from your website through your Terms
and Conditions and through technical mechanisms such as robots.txt.
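A reuser can check a site's robots.txt before scraping it. The Python sketch below uses the standard library's urllib.robotparser on an in-memory robots.txt; the rules, the "example-bot" user agent and the example.org URLs are invented for illustration. Keep in mind that robots.txt is a signal to crawlers, not a licence, so it does not replace checking the site's Terms and Conditions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content asking all crawlers to stay out of /data/.
ROBOTS_TXT = """User-agent: *
Disallow: /data/
"""

def may_fetch(robots_text, user_agent, url):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)

blocked = may_fetch(ROBOTS_TXT, "example-bot", "https://example.org/data/schools.csv")
allowed = may_fetch(ROBOTS_TXT, "example-bot", "https://example.org/about.html")
print(blocked, allowed)
```

For a live site, the same parser can load the file directly with its set_url and read methods instead of parsing a string.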
There are two sets of open licences. You should use a licence from one of these sets rather than
creating your own licence, for three reasons:
1. it’s less work
2. it ensures that the legal language in the licence is correct
3. it makes it a lot easier for reusers to know what they can do with your data
● Open Licences for Creative Content
Creative content, such as text, photographs, slides and so on, should be licensed using a Creative
Commons Licence. There are three of these that you should consider using for open content:
Level of Licence | Creative Commons Licence
public domain | CC0
attribution | CC-by
attribution & share-alike | CC-by-sa
Make sure that you use the latest (version 4.0) Creative Commons licenses, which are
international. The links in the table above go to the correct licences.
There are other types of Creative Commons licences that are not open licences. For example, the
Creative Commons Attribution-NonCommercial licence does not allow commercial reuse of
content, and therefore is not an open licence. If you use the Creative Commons licence chooser,
only those that are described as “Free Culture” licences are open licences.
● Open Licences for Databases
We now recommend that you use a Creative Commons 4.0 licence for data as well as for
content.
You may alternatively use a similar set of licences that was created specifically for databases from
the Open Data Commons. There are again three levels that you can choose from:
Level of Licence | Open Data Commons Licence
public domain | PDDL
attribution | ODC-by
attribution & share-alike | ODbL
The ODbL is the licence used by OpenStreetMap.
You can find more details here: https://blog.openstreetmap.org/2014/08/06/at-the-edge-of-the-license/
Which Licence Should I Use?
The licence that you use should support your open data business model. It is unusual for
organisations to place content or data in the public domain as being given attribution for the
content or data usually helps to achieve some of the goals of opening it up.
It is possible to license content or data under more than one licence, and let reusers choose which
licence to use it under. Typically you would dual-license some content or data by making it
available under an open licence and under a paid-for licence that does not have the same
restrictions. Dual-licensing is typically used with a share-alike licence, as outlined below.

Some open data business models work best with a share-alike licence. For example:
● a share-alike licence will usually be unattractive to commercial businesses who don’t want
to open up their own data, so using a share-alike licence coupled with a charged licence
can be a good basis for a freemium business model
● when you are collaborating with others to create a shared resource, a share-alike licence
can help to ensure that you can bring back into that resource any work that others do on
their own copies
On the other hand, if you are hoping to gain other benefits for your business through the reuse of
your data, using a cross-subsidy business model, you may find that a share-alike licence prevents
people from reusing it, and therefore want to avoid having a share-alike restriction.
There are two cases where you have no choice over what licence you can use for the content or
data that you publish.
1. If you are publishing content or data that is derived from content or data that was
licensed to you using a share-alike licence, then you must publish your content or data
using that same licence.
2. With very few exceptions, if you are a UK government department or arm’s-length body then
the content or data that you have created or gathered is owned by the Crown. Unless
you have an exemption granted by the Office of Public Sector Information (OPSI), you
must publish this data using the Open Government Licence.
What Attribution Should I Ask For?
If you choose a licence that includes a requirement for attribution, you need to specify what that
attribution should look like.
In choosing what attribution to ask for, you should consider the ways in which your data or content
might be reused, and the fact that it might be combined with other data or content that might
require its own attribution. If you want to encourage the reuse of your data or content, you need to
make it easy for reusers to satisfy your attribution requirements.
There are two things you should document:
1. What should the attribution include? You will usually want the name of your
organisation, and a link to either your organisation’s home page or a page about the
data or content you are licensing. Keep this as minimal as possible.
2. Where and how should the attribution be presented? Some attribution requirements
specify that the attribution must be presented directly wherever the data is used, and
may even specify the size or format of the attribution. These requirements can be
difficult to adhere to, particularly for mobile application developers who have limited
screen space to include such attributions. Allowing reusers to provide attribution on a
separate page makes this easier.
Note that under the terms of the licences listed above, when a reuser uses your data or content to
add value to or to create new data or content, they cannot relicense your work. Any onward
reusers are bound by the same attribution requirements as the direct reusers of your content or
data. It’s a good idea to explicitly document this requirement because it might not be obvious to
reusers.
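To show how minimal attribution requirements keep reuse easy, here is a small Python sketch that formats attribution lines and collects them on a separate credits page. The wording of the attribution, the publisher names and the URLs are all hypothetical; each publisher specifies its own required text.

```python
def attribution_line(publisher, url, licence_name):
    """Format a minimal attribution: who published the data, where, and under which licence."""
    return f"Contains data from {publisher} ({url}), licensed under {licence_name}."

def credits_page(sources):
    """Combine the attributions for several sources, one per line."""
    return "\n".join(attribution_line(*source) for source in sources)

# Hypothetical sources, for illustration only.
SOURCES = [
    ("Example Transport Agency", "https://example.org/timetables", "CC-BY-4.0"),
    ("Example Mapping Project", "https://example.org/maps", "ODbL"),
]

print(credits_page(SOURCES))
```

A mobile application could show this text on an "About" screen rather than next to every data point, which is the kind of flexibility the second point above recommends allowing.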
How Do I Indicate the Licence of Content or Data?
You should indicate the licence for content or data you make available using both a human-
readable description and computer-readable metadata. The clearer you make it which licence
applies to your content or data, the easier it is for reusers to know that they can reuse the content
or data you are licensing.
The human-readable descriptions and marks that you should use are spelled out on the Creative
Commons and Open Data Commons websites:
● Creative Commons licence chooser
● Open Data Commons licences
It is best to embed information about the licence that some content or data is available under
directly within the content or data. This ensures that the licensing information is carried around with
the content or data.
In addition to human-readable text, you should provide computer-readable metadata. The separate
Publisher’s Guide to the Open Data Rights Statement Vocabulary describes how to do this.
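As a rough illustration of computer-readable licence metadata, the Python sketch below builds a JSON record that could be published alongside a dataset. The field names are hypothetical and chosen for readability; they are not the terms of the Open Data Rights Statement Vocabulary, which the guide mentioned above defines.

```python
import json

def licence_metadata(title, licence_name, licence_url, attribution_text, attribution_url):
    """Build a simple, illustrative metadata record describing a dataset's licence."""
    return {
        "title": title,
        "licence": {"name": licence_name, "url": licence_url},
        "attribution": {"text": attribution_text, "url": attribution_url},
    }

# Hypothetical dataset and publisher, for illustration only.
record = licence_metadata(
    title="Example bus timetable",
    licence_name="Creative Commons Attribution 4.0",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
    attribution_text="Example Transport Agency",
    attribution_url="https://example.org/timetables",
)

print(json.dumps(record, indent=2))
```

Keeping the licence URL in a dedicated field lets catalogues and tools filter datasets by licence without parsing free text.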
If you add your dataset to a catalog, such as data.gov.uk or the Data Hub, you should make sure
that you indicate the licence under which the dataset is available within that catalog. This gives
people searching the catalog a quick and easy way of seeing that they will be able to reuse the
dataset.

The ODI Reuser's Guide to Open Data Licensing
Source: http://theodi.org/guides/reusers-guide-open-data-licensing
The fact that you can get hold of some information does not necessarily mean that you can do
whatever you want with it. You need to have permission from the owner of that information to do
what you want to do. A licence tells you what you can do.
But what does it mean to license data? What requirements can a licence place on you? What
different licences do publishers use? How can you find out what licence a dataset is available
under? This guide answers these questions.
Note: This guide focuses on data published by organisations based in the UK. Licensing law is
different in different countries, so some of this information might not apply to you if you are reusing
information that is published elsewhere. It does not address other potential legal considerations,
such as compliance with the Data Protection Act.
● What Do Publishers Own?
In Europe, there are two kinds of rights that publishers — organisations or individuals who make
available content or data — are given over things that they have created:
● they get copyright over works (content) that they create and which are original to them,
such as text that they write or photographs they take
● they get a database right over collections of data that they have put a substantial effort
into obtaining, verifying or presenting
Note: As far as we know the database right is unique to the European Union. In some countries
there may be no protection for collections of data.
If someone applies original judgement in putting together a database, for example in choosing
which items to include within the database or which information about them to include, they have a
copyright over that database, because it is a creative work.
For example, if someone were to build a database about the best 100 cars, this might involve:
● choosing which cars count as the best cars
● writing a description about each car
● researching and gathering facts about them

They would have copyright over the database, because they chose which cars were “best”. They
would have copyright over the descriptions, because they wrote them. And they would probably
have the database right for the database they’ve built, because they put substantial effort into
gathering information about the cars. Importantly, they don’t own the facts about the cars — you or
anyone else could build your own database containing exactly those facts without violating their
database right — but no one else can reuse their database or their descriptions without their
permission because they own the copyright over them.
Publishers probably do not have a database right if they create the facts in a database, as opposed
to gathering them from elsewhere, unless they put substantial effort into verifying or presenting the
database. For example, if someone owns a restaurant and creates a database of the dishes that
they offer, and when they offer them, they probably do not have a database right over that
database, though they might have copyright because of the creative judgement involved in working
out which dishes should be offered on particular days to provide a balanced menu.
● What About Data From Third Parties?
Publishers might not own all the content or data that they publish themselves. In particular, rather
than creating the content or gathering the data themselves, some of the content and data they
publish might be:
● completely licensed by them from someone else
● an extract of content or data that they have licensed from someone else
● derived from content or data that they have licensed from someone else
When they publish the data, the publisher should tell you about which content or data is owned by
another organisation, and under which licence it is being republished.
● What About Brands?
Brands are usually protected through a trade mark. A trade mark restricts how you can use an
organisation’s logo or company name. The organisation will also have copyright on the logo.
Licences for content or data usually explicitly exclude logos and company names, so you cannot,
for example, adapt a logo by changing the colours used within it. You also cannot use the company
name or logo to lend weight to your product without permission to do so. However, the attribution
requirements of a licence may require you to use the company name and logo to indicate that you
have reused data owned by that company.
● What Can’t You Do?
There are a few things that you can do with content or data without a licence, but in general you
need to be given a licence by a publisher if you want to reuse their content or data. Having access
to some content or data — for example by downloading it from a publisher’s website — does not
give you the right to reuse it.
● Republishing and Adding Value
You do not automatically have the right to republish, in its entirety, content or data that someone
else owns, even if they have given you a licence to use it yourself. You need to check the terms of
the licence for the content or data to make sure that you can republish it.
The same applies if you are adding value to the content or data, for example by automatically
adding links or styling to content, or adding columns with extra information into a dataset. The new
content or data includes the entirety of someone else’s content or data, so you cannot publish it
unless you have their permission.
● Publishing Extracts
You have the right to publish extracts of content or databases that you have access to, regardless
of what the licence says, so long as the extract is not “substantial”. However, it is often hard to tell
if the extract that you have made is “substantial”.
The licence that you have been given might let you republish any amount of the content or data
(open licences do this). Otherwise, you should take legal advice about whether the extracts that
you want to publish are likely to count as substantial or not.
● Publishing Derived Content or Data
You might want to create new content or databases by adapting, deriving, or otherwise processing
some content or data. To do that, you first have to ensure you have been given a licence to use the
data in the first place. You then need to look at what the licence says about creating derived works.
For example, say you have been given a licence to use a photograph on your website. You could
create a new version of that photograph by changing it from colour to black & white, or by adding a
speech bubble to it.
In this case, the photograph is a creative work, and the person who took it owns the copyright.
Because the photograph is protected by copyright, you can only create these new images if the
licence under which you are using the photograph allows you to do so.
Copyright can exist in small pieces of content, such as phrases. For example, if you analyse some
content to create a new database, you should make sure that you have the right to reuse any
snippets of content that you might keep in the new database. If the content includes a presentation
of data from a database, you have to consider database rights as well: scraping data from the page
might equate to creating an extract.
Database rights are slightly different, because they only extend to creating extracts or re-utilising
(republishing) a database.
For example, say you analysed the data about prescriptions of each drug within each GP practice
within the UK, along with other data about the coverage of each practice, to create a new dataset
that provided the average spend per patient of each practice. So long as you had no separate
contractual obligations to the owners of the two datasets you have brought together, you might well
be free to do what you liked with the result, as it would not be possible to reconstruct the original
databases from the aggregated data.
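The aggregation described above can be sketched in a few lines of Python. This is only an illustration: the practice codes, costs and patient counts are invented, and the record layout is an assumption rather than the format of any real prescribing dataset.

```python
from collections import defaultdict

# Hypothetical prescription records: (practice code, drug, cost in pounds).
prescriptions = [
    ("P001", "drug A", 120.0),
    ("P001", "drug B", 80.0),
    ("P002", "drug A", 300.0),
]

# Hypothetical coverage data: registered patients per practice.
patients = {"P001": 100, "P002": 150}


def average_spend_per_patient(prescriptions, patients):
    """Return {practice: total prescription cost / registered patients}."""
    totals = defaultdict(float)
    for practice, _drug, cost in prescriptions:
        totals[practice] += cost
    # Skip practices with no registered patients to avoid dividing by zero.
    return {
        practice: totals[practice] / count
        for practice, count in patients.items()
        if count > 0
    }


derived = average_spend_per_patient(prescriptions, patients)
print(derived)  # {'P001': 2.0, 'P002': 2.0}
```

The derived dataset holds a single figure per practice, so the individual prescription rows cannot be rebuilt from it. Whether a particular derivation is free of the original database rights is still a legal question, not one the code can answer.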
● What Do Licences Say?
Licences tell you what you can do with the content or data that you access. A licence will tell you
whether you can:
● republish the content or data on your own website
● derive new content or data from it
● make money by selling products that use it
● republish it while charging a fee for access
Many licences will let you access content or data for free, but say that you cannot republish it or
adapt it, or use it within commercial products. If you break the terms of the licence, the owner of
the content or data can take you to court.
● What Do Open Licences Say?
An open licence is one that places very few restrictions on what you can do with the content or
data that is being licensed.
According to the Open Definition, there are only two kinds of restrictions that an open licence can
place:
● that you must give attribution to the source of the content or data
● that you must publish any derived content or data under the same licence (this is called
share-alike)
An open licence might do neither or one or both of these. So, you might encounter content or data
available under one of three levels of licence:
1. a public domain licence has no restrictions at all (technically, these indicate that the rights
owner has waived their rights to the content or data)
2. an attribution licence just says that you must give attribution to the publisher
3. an attribution & share-alike licence says that you must give attribution and share any
derived content or data under the same licence
● How Do You Provide Attribution?
You should provide attribution even if the licence does not require it. Giving attribution is a way of
recognising both the efforts that the publisher has made to put together the content or data you are
reusing, and their generosity in making it available for reuse.
When content or data is licensed using a licence that includes attribution, the publisher might
specify:
● what wording the attribution should include
● where and how the attribution should be presented
You should follow what the publisher asks you to do. If it is not practical, for example if you are
providing a service that does not have room for the attribution statement that they request, then get
in touch with them to ask what to do.
It is good practice to provide the name of the organisation that published the data or content, and a
link to their home page. Specifying the name of the dataset and providing a link to its location also
helps other reusers to find the data you are reusing.
If you are building a tool that reuses some content or data, you should try to include attribution on
every page or screen in which the content or data is used. If this is impractical (for example
because you are pulling together information from lots of different sources), you should provide a
clear link to a page or screen that then provides attribution information.
If you are republishing data or content, its reusers are still bound by the attribution requirements of
the original data or content. To make it easier for them to understand and fulfil those requirements,
it is good practice to include the attribution for the source data or content in the attribution that you
ask for. This might sometimes be impractical, for example because you are creating derived data
or content that includes data or content from a large number of sources. In these cases, you should
provide a full list of the sources and request an attribution which links to that list.
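As a rough illustration of the good practice described above, the following Python sketch assembles an attribution line from the publisher name, a link to their home page, the dataset name and a link to its location. The function name, the wording of the statement and the example values are all assumptions; where a publisher specifies its own wording, that wording takes precedence.

```python
def attribution_statement(publisher, publisher_url, dataset, dataset_url, licence):
    """Build a plain-text attribution line for reused content or data.

    The wording here is a hypothetical default, not a legally vetted template.
    """
    return (
        f"Contains {dataset} ({dataset_url}) "
        f"published by {publisher} ({publisher_url}), "
        f"licensed under {licence}."
    )


# Example values are placeholders, not a real publisher or dataset.
statement = attribution_statement(
    publisher="Example Agency",
    publisher_url="https://example.org",
    dataset="Example Dataset",
    dataset_url="https://example.org/data/example",
    licence="CC-by 4.0",
)
print(statement)
```

A tool that draws on many sources could generate one such line per source and collect them on a single attribution page.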
● How Do You Share-Alike?
A share-alike licence requires you to republish new content or data that you create using the given
content or data under the same, share-alike licence. Creating new ways of presenting data does
not count as derivation or adaptation, but combining two sets of data to create a new set probably
does.
Publishing the content and data that you create from open data, as open data, is a good thing to
do even if the licence does not require it. Opening up your content and data enables others to
reuse and build on your work, and can add value to your work.
● What Open Licences Are There?
There are two sets of open licences that you may encounter.
● Open Licences for Creative Content
Creative content, such as text, photographs, slides and so on, may be licensed using a Creative
Commons Licence. There are three of these that you might encounter:
Level of Licence             Creative Commons Licence
public domain                CC0
attribution                  CC-by
attribution & share-alike    CC-by-sa
There are different versions for each of these licences, the most recent being version 4.0. There
are also different variants which take into account differences in the law in different countries. The
links in the table above are to the version 4.0 versions, which apply internationally, but you may
find publishers using other versions. You can reuse content under these licences no matter what
country you are in.
There are other types of Creative Commons licences that are not open licences. For example, the
Creative Commons Attribution-NonCommercial licence does not allow commercial reuse of
content, and therefore is not an open licence. The human-readable summaries of the Creative
Commons licences spell out exactly what you can do under each licence.
● Open Licences for Databases
You might encounter a similar set of licences which is available for databases from the Open Data
Commons. There are again three levels:
Level of Licence             Open Data Commons Licence
public domain                PDDL
attribution                  ODC-by
attribution & share-alike    ODbL
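The two tables above can be read as a simple lookup from licence name to level. The Python sketch below encodes that lookup and reports which obligations follow from each level. The function and dictionary names are hypothetical, and the sketch only covers the six licences listed here.

```python
# Licence levels exactly as listed in the Creative Commons and
# Open Data Commons tables above.
LICENCE_LEVELS = {
    "CC0": "public domain",
    "CC-by": "attribution",
    "CC-by-sa": "attribution & share-alike",
    "PDDL": "public domain",
    "ODC-by": "attribution",
    "ODbL": "attribution & share-alike",
}


def obligations(licence):
    """Return the level of a licence and the obligations it places on reusers."""
    level = LICENCE_LEVELS.get(licence)
    if level is None:
        # Unknown licences are rejected rather than guessed at.
        raise ValueError(f"{licence!r} is not one of the licences in the tables")
    return {
        "level": level,
        "attribution": level in ("attribution", "attribution & share-alike"),
        "share_alike": level == "attribution & share-alike",
    }


print(obligations("ODbL"))
# {'level': 'attribution & share-alike', 'attribution': True, 'share_alike': True}
```

Raising an error for an unlisted licence is a deliberate choice in this sketch: a licence that is not recognised should be read in full, not assumed to be open.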
● Other Licences
There are other licences that enable reuse and which you may encounter, particularly around
public sector information:
● Open Government Licence is an attribution licence that covers both copyright and
database right and is mainly used for information made available by UK central government
● OS Open Licence is an attribution licence that is exactly the same as the Open
Government Licence but ensures that the attribution is to the Ordnance Survey
● How is the Licence Indicated?
The licence under which information is published should be clear both in human-readable content
and as machine-readable data. If you cannot work out the licence for information that you discover
on the web, you should contact the owner of the site to ask: the lack of licensing information means
that you cannot assume the right to reuse the content or data.
Human-readable descriptions and marks that you may encounter are shown on the Creative
Commons and Open Data Commons websites:
● Creative Commons licence chooser
● Open Data Commons licences
Where possible, the publisher should have embedded information about the licence directly within
the content or data itself. Often, however, you will have to look at the page from which you access
the content or data, or the licence information for the entire website, which is often linked to from
the footer of the page.
If a publisher adds their dataset to a catalog, such as data.gov.uk or the Data Hub, they may
indicate the licence under which the dataset is available in the metadata supplied by the catalog.
You should check that this is consistent with any licence information they supply on their own site
or within the data itself: if it is not, you should ask them for clarification.
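The consistency check described above could be sketched as follows in Python. The function name and the returned status strings are assumptions made for illustration; real catalogs expose licence metadata in their own formats, which this sketch does not attempt to parse.

```python
def check_licence_consistency(catalog_licence, site_licence):
    """Compare the licence named in catalog metadata with the publisher's own.

    Both arguments are plain strings (or None when no licence is stated).
    """
    def normalise(value):
        # Treat None, empty and whitespace-only values as "not stated".
        return (value or "").strip().lower() or None

    catalog, site = normalise(catalog_licence), normalise(site_licence)
    if catalog is None and site is None:
        return "missing"       # no licence stated: do not assume a right to reuse
    if catalog is None or site is None:
        return "stated once"   # nothing to compare against
    if catalog != site:
        return "inconsistent"  # ask the publisher for clarification
    return "consistent"


print(check_licence_consistency("ODbL", " odbl "))   # consistent
print(check_licence_consistency("ODbL", "CC-by"))    # inconsistent
print(check_licence_consistency(None, ""))           # missing
```

Comparing names as normalised strings is a simplification: two different spellings of the same licence would be reported as inconsistent and would need a manual check.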
Legal tools for open data
Open Data Commons is the home of a set of legal tools to help you provide and use Open Data
http://opendatacommons.org/
http://opendatacommons.org/faq/licenses/
3. Big Data vs. Open Data
Big Data vs Open Data - Diagram
Source: http://www.opendatanow.com/2013/11/new-big-data-vs-open-data-mapping-it-out/#.VGDCrfSG9Zt
As Joel Gurin points out: “there’s general agreement that Open Data should be free of charge or
cost just a minimal amount. Starting with some basic descriptions, the intersection of these three
concepts (big data, open data, open government) defines the six subtypes of data shown on the
Venn diagram. (There’s no separate category for the intersection of Big Data and Open
Government – anything in that category is also Open Data.) Here are characteristic examples of
each, referring to the numbers above.
1. Big Data that’s not Open Data. A lot of Big Data falls in this category, including some Big Data
that has great commercial value. All of the data that large retailers hold on customers’ buying
habits, that hospitals hold about their patients, or that banks hold about their credit-card holders,
falls here. It’s information that the data-holders own and can use for commercial advantage.
National security data, like the data collected by the NSA, is also in this category.
2. Open Government work that’s not Open Data. This is the part of Open Government that
focuses purely on citizen engagement. For instance, the White House has started a petition
website, called We the People, to open itself to citizen input. While the site makes its data
available, publishing Open Data – beyond numbers of signatures – is not its main purpose.
3. Big, Open, Non-Governmental Data. Here we find scientific data-sharing and citizen science
projects like Zooniverse. Big data from astronomical observations, from large biomedical projects
like the Human Genome Project, or from other sources realizes its greatest value through an open,
shared approach. While some of this research may be government-funded, it’s not “government
data” because it’s not generally held, maintained, or analyzed by government agencies. This
category also includes a very different kind of Open Data: the data that can be analyzed from
Twitter and other forms of social media.
4. Open Government Data that’s not Big Data. Government data doesn’t have to be Big Data to
be valuable. Modest amounts of data from states, cities, and the federal government can have a
major impact when it’s released. This kind of data fuels the participatory budgeting movement,
where cities around the world invite their residents to look at the city budget and help decide how
to spend it. It’s also the fuel for apps that help people use city services like public buses or health
clinics.
5. Open Data – not Big, not from Government. This includes the private-sector data that
companies choose to share for their own purposes – for example, to satisfy their potential investors
or to enhance their reputations. Environmental, social, and governance (ESG) metrics fall here. In
addition, reputational data, such as data from consumer complaints, is highly relevant to business
and falls in this category.
6. Big, Open, Government Data (the trifecta). These datasets may have the most impact of any
category. Government agencies have the capacity and funds to gather very large amounts of data,
and making those datasets open can have major economic benefits. National weather data and
GPS data are the most often-cited examples. U.S. Census data, and data collected by the
Securities and Exchange Commission and the Department of Health and Human Services, are
others. With the new Open Data Policy, this category will likely become larger, more robust, and
even more significant.”
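One way to read the six categories is as a decision over three yes/no properties, one per circle of the Venn diagram. The Python sketch below is an interpretation of the quoted text, not part of Gurin's article; the function and parameter names are assumptions.

```python
def classify(big, open_data, open_government):
    """Return the category number (1 to 6) from the list above, or None.

    Each argument is a bool saying whether the item falls inside the
    Big Data, Open Data or Open Government circle respectively.
    """
    if big and open_data and open_government:
        return 6  # Big, Open, Government Data (the trifecta)
    if big and open_data:
        return 3  # Big, Open, Non-Governmental Data
    if open_data and open_government:
        return 4  # Open Government Data that's not Big Data
    if open_data:
        return 5  # Open Data, not Big, not from Government
    if big and open_government:
        # The text says anything in this intersection is also Open Data,
        # so this combination should not occur.
        raise ValueError("Big Data within Open Government is also Open Data")
    if big:
        return 1  # Big Data that's not Open Data
    if open_government:
        return 2  # Open Government work that's not Open Data
    return None   # outside all three circles


print(classify(big=True, open_data=True, open_government=True))    # 6
print(classify(big=False, open_data=True, open_government=False))  # 5
```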

4 key steps
These are in very approximate order — many of the steps can be done simultaneously.
1. Choose your dataset(s). Choose the dataset(s) you plan to make open. Keep in mind
that you can (and may need to) return to this step if you encounter problems at a later
stage.
2. Apply an open license.
○ Determine what intellectual property rights exist in the data.
○ Apply a suitable ‘open’ license that licenses all of these rights and supports the
definition of openness.
○ NB: if you can’t do this go back to step 1 and try a different dataset.
3. Make the data available — in bulk and in a useful format. You may also wish to
consider alternative ways of making it available such as via an API.
4. Make it discoverable — post on the web and perhaps organize a central catalog to list
your open datasets.

4. Categories and Type of Data
Open can apply to information from any source and about any topic. Anyone can release their data
under an open licence for free use by and benefit to the public. Although we may think mostly
about government and public sector bodies releasing public information such as budgets or maps,
or researchers sharing their results data and publications, any organisation can open information
(corporations, universities, NGOs, startups, charities, community groups and individuals).
There is open information in transport, science, products, education, sustainability, maps,
legislation, libraries, economics, culture, development, business, design, finance. So the
explanation of what open means applies to all of these information sources and types.
Source: http://blog.okfn.org/2013/10/03/defining-open-data/#sthash.nXnXf8Bx.dpuf
Categories
Business and Legal services
Data/Technology
Education
Energy
Environment and weather
Finance and Investment
Food and Agriculture
Geospatial/Mapping
Governance
Healthcare
Housing/ real estate
Insurance
Lifestyle and Consumer
Media
Research and Consulting
Scientific Research
Transportation
The Open Data Consumers Checklist:
Source: http://theodi.org/guides/the-open-data-consumers-checklist
The Open Data Handbook:
Source: http://opendatahandbook.org/
The handbook introduces you to the legal, social and technical aspects of open data. It can be
used by anyone but is especially useful for those working with government data. It discusses the
why, what and how of open data: why to go open, what open is, and how to do open. Read it
online or download a PDF.
4. Open data as part of your business model
Al-Debei and Avison (2010) derived a unified business model framework based on a
comprehensive review of the literature. They argue that the model provides an abstract but holistic
view and that the fundamental dimensions are value based. There are four relevant aspects to the
business model framework:
● Value proposition: the business logic for creating value for customers by offering products
and services for targeted segments,
● Value architecture: an architecture for the technological and organizational infrastructure
used in the provisioning of products and services,
● Value network: collaboration and coordination with other organizations, and
● Value finance: the costing, pricing, and revenue breakdown associated with sustaining and
improving the creation of value.
New business models and practices driven by social media and open data have hardly been
investigated. Exceptions are the analyses of companies in the United Kingdom (Hammell,
Perricos, Lewis, & Branch, 2012) and a classification of social business models based on the
revenue model (for instance, Ferro, 2012; Ferro & Osella, 2012; Ferro & Osella, 2013; Ubaldi,
2013).
Based on the analysis of a number of companies in the United Kingdom, five archetypes of
business models can be identified (Hammell et al., 2012).
These include:
(1) suppliers—public and private sector organizations—publishing the data,
(2) aggregators linking open data to produce useful insights,
(3) developers—organizations and individuals—building apps,
(4) enrichers using open data to enable their existing products and services, and
(5) enablers facilitating the supply and use of open data.
Ferro and Osella (2013) identify the following models:
1. Premium—end users are offered a service or product in exchange for payment.
2. Freemium—basic services or products are offered free of charge. Profit is made by having
end users pay for extended features.
3. Open source like—data are offered for free through cross subsidization.
4. Infrastructural Razor and Blades—data sets are stored for free and are accessible to everyone
via Application Programming Interfaces (APIs) (‘‘razors’’), while reusers are charged only
for the computing power that they employ on demand in as-a-service mode (‘‘blades’’).
5. Demand-oriented platform—the company provides developers with a one-stop shop of data
sets that are catalogued using metadata. Revenue is made in exchange for advanced services
and refined data sets or data flows.
6. Supply-oriented platform—this business model is quite similar to the previous one, but the
PSI providers are charged in lieu of developers.
7. Free as branded advertising—the company uses PSI as a tool to attract attention from
customers by providing them with useful services. The company expects that the public will
then favor its particular brand or company. Revenue is expected not to come directly from
PSI, but from other business lines that represent the company’s core business.
8. White label development—a company wants to use PSI as an attraction tool but does not
have the competencies required to do so. The company then uses an advertising factory,
which receives payment in the form of a lump sum or recurring fees in exchange for turnkey
solutions, depending upon whether the solution is in the form of a product or a service
(Ferro & Osella, 2013).
The revenue model can be payment by open data providers or users in the form of
(1) recurring fees, granting access for a specific time period, or pay per use,
(2) advertisement, or
(3) ensuring visibility for creating revenue for other activities (Ferro & Osella, 2013).
Although these eight options describe a complete array of possible business models, they are
derived from the revenue model.
Infomediary Business Models for Connecting Open Data Providers and Users
Available here: http://ssc.sagepub.com/content/early/2014/01/30/0894439314525902.full.pdf+html
All infomediary business models can be developed and operated by either public or private
organizations. The business model might be initiated by public events (hackathons) but operated
by a private party, yet when a best practice is adopted the roles can be reversed. The following six
business models were identified.
1. Single-purpose apps provide real-time services such as information about weather, quality of
restrooms, vehicles, houses, and pollution. These apps often provide a single function, based
on one type of open data provided. The app processes the data and presents it visually for the
ease of the users.
2. Interactive apps: In addition to single-purpose apps, this type of business model provides users
the opportunity to add content. Ratings are often included, as is additional information such as
complaints.
3. Information aggregators take many published open data sources and combine and process
them for subsequent presentation to the users. An example is a transportation planner that
aggregates information from various transport modalities and companies.
Often interoperability is a challenge that requires agreements among data providers.
4. Comparison models: This type of business model aggregates open data from various sources
for the purpose of comparing the performance of entities with each other. For example, it can be
used to compare schools and other public organizations. The data can originate from official
sources (school inspection) or from users (criminal chart) and be used by citizens (in determining a
school for their children or a place to live) and public organizations (in developing measures to
improve schools or for crime interventions).
5. Open data repositories are used by governments to publish their information. These can be
national open data portals or more specialized portals, such as websites of statistical agencies.
The essence is that these portals are relatively closed and only a limited number of public
organizations can publish open data on them. There is little to no user interaction, and the focus is
on being able to indiscriminately open data sets. Searching for open data is a key aspect, although
it is often difficult to find the right information. They can provide basic functionalities for processing
and visualizing data.
6. Service platforms: These platforms provide all kinds of features for searching, importing,
cleansing, processing, and visualizing information. Service platforms often contain open data
repositories or are connected to open data repositories that function as the data source. Service
platforms can vary in the level of openness; some are based on payment (e.g., www.junar.com)
whereas others are free of charge (www.engagedata.eu).
Further reading:
Business models for open data applications available at:
http://www.appsforeurope.eu/article/business-models-open-data-applications
5. Case studies: Open Data Business
Success stories about the open data startups from the ODI Startup
Programme
Transport API
http://www.transportapi.com
Clients: Transport for London, Heathrow Airport, Greater London Authority, Citymapper, Elgin, Giraffe.co.uk,
Network Rail
Products: TransportAPI
Achievements: Transport API solutions have powered award winning apps, such as Citymapper
The TransportAPI story:
TransportAPI is Britain’s first comprehensive open platform for transport solutions. The company’s objective
is to enhance travel experience through real-time information, and to enable new transportation insights
through analytics. It uses open data feeds from key industry sources such as Traveline, Network Rail and Transport
for London. The company offers nationwide timetables, departure and infrastructure information for
schedules, live departures and archived service running across all transport modes. The data feeds are
available for integration by web and app developers. Data components such as the ‘nearest transport’ widget
can be used in travel portals, hyperlocal sites and business analytics.
TransportAPI currently has 700 developers and organizations signed up on its platform. They are individual
taxpayers, but also public sector organizations like universities and local authorities who are getting free data.
As Jonathan Raper, Managing Director, says, “Our intervention in the market has led prices for transport data
fall and previous monopoly transport data providers to relax their terms.” The company also scales data
usage and provides a new, single source option for its customers, like Heathrow Airport, who now use
TransportAPI for all their public transport information. Jonathan further explains that “TransportAPI employs 6
people now and the tax we generate per year is nudging £75K”.

Mastodon C
http://www.mastodonc.com
Clients: Technology Strategy Board, CDEC’s Open Health Data platform, Nesta
Products: Kixi Data Platform
Achievements: Mastodon C identified £200m of potential savings to the NHS in its prescribing analytics project,
which investigated the use of branded statins over cheaper generic versions.
The Mastodon C story
Mastodon C helps businesses make sense of the proliferation of data that now exists, allowing them to make
better decisions. It does this using a cloud-based open source data processing and analytics platform, which it
customises to each client’s datasets. The team also applies data science techniques to gain insights, make
predictions and find business value from data, which is built back into client systems.
The team at Mastodon C uses open data together with the closed data that clients own. Francine Bennett,
Co-Founder and CEO at Mastodon C, says: “We often find ourselves introducing clients to open data concepts
through our work, as we’ll suggest useful datasets which they can make use of to help their business.”

6. Where do I find open data?
A list of open data catalogs
http://publicdata.eu/
https://open-data.europa.eu/it/data
http://datacatalogs.org/
http://planet.openstreetmap.eu
http://wikidata.org
dbpedia.org
7. How can you develop your open data business?
This chapter was prepared by the Finodex team and is already included in the Finodex
Handbook.
Summary:
In this chapter we provide basic knowledge regarding how you can develop your business
using open data. We’ll show how to generate a business model, exploring the components of
the Business Model Canvas in detail. In particular, we’ll offer an overview of open data
business models. In the case of reuse of PSI (Public sector information) Osella & Ferro have
developed an interesting framework “that focuses on decision-making levers that a business
developer has at his/her fingertips for molding the overarching architecture of a business
venture hinged on public data re-use”. They combined the framework with the business model
ontology by employing the Business Model Canvas in order to visualize archetypal business
models at an enterprise level. The tool has proved very useful and could probably be
adopted in the development and assessment of any data intensive business venture. After
exploring eight business models we introduce the importance of the adoption of the Lean
methodology for business development, offering a case study of open data business
development in which the Lean approach has been used. Moreover, defining and setting your
business goals requires a competitor analysis, which is also explained. Last but not least, we
describe the rights connected to using open datasets. Licensing and related issues of
compatibility between licenses are crucial when you deal with open data.
Index:
a. Business Modeling
b. Open Data Business models
c. Lean methodology
d. Competitor Analysis
e. Intellectual Property Rights
Introduction
In this chapter we provide essential knowledge regarding how you can develop your open data
business. We’ll show how to generate a business model, exploring the components of the
Business Model Canvas in detail. In particular, we’ll offer an overview of open data business
models. In the case of reuse of PSI (Public Sector Information) Osella & Ferro have developed an
interesting framework “that focuses on decision-making levers that a business developer has at
his/her fingertips for molding the overarching architecture of a business venture hinged on public
data re-use”. They combined the framework with the business model ontology by employing the
Business Model Canvas in order to visualize archetypal business models at an enterprise level.
The tool has proved very useful and could probably be adopted in the development and
assessment of any data intensive business venture. After exploring eight business models we
introduce the importance of the adoption of the Lean methodology for business development,
offering a case study of open data business development in which the Lean approach has been
used. Moreover, defining and setting your business goals requires a competitor analysis, which is also
explained. Last but not least, we describe the rights connected to using open datasets. Licensing
and related issues of compatibility between licenses are crucial when you deal with open data.
a) Business Modeling
A business model is a strategic tool that indicates how the company makes money, specifying the
sources of the company’s revenues as well as how much and how often these sources are willing
to pay. Since its publication in 2010, the book “Business Model Generation” by Osterwalder and
Pigneur has become the bible for startups and SMEs. In their book the authors explain the
so-called Business Model Canvas (Figure 1), a tool that helps you to visualize and capture the
components of a business model and assists you in the business model generation process.
In order to keep track of all of your steps in creating your business model, you may want to
download the “canvas” and start writing down all the assumptions and progress that you
make!
Figure 1. Business Model Canvas
Source: “A business model describes the rationale of how an organization creates, delivers, and
captures value” in Osterwalder & Pigneur, Business Model Generation, 2010.
According to Osterwalder, in order to build an effective business model you have to identify several
blocks. In the following we briefly list them. For each of them, rather than a theoretical description,
we provide a set of practical questions for you to answer. Down to work!
1. Customer segments
First of all, you need to define which customers you aim to reach. You have to answer two
important questions:
● For whom are we creating value?
● Who are our most important customers?
2. Value Proposition
You should provide your customers with a product or a service that has added value. The “value
proposition” is a statement that summarizes why potential consumers should buy your particular
product or service, and prefer it to similar offerings. In this case, you should answer the following
questions:
● What value do we deliver to the customer?
● Which one of our customer’s problems are we helping to solve?
● Which customer needs are we satisfying?
● What bundles of products and services are we offering to each Customer Segment?
Factors such as newness, performance, customization, design, brand/status, cost reduction, risk
reduction, accessibility, and convenience/usability can add value to your business. Your value
proposition may be qualitative (privileging customer experience and outcome) and/or quantitative
(price and efficiency).
3. Sales Channels
Once you have understood your value proposition and your customer segment, you need to choose
the channels that will deliver the value to your clients. You should ask yourself:
● Through which channels do our customer segments want to be reached?
● How are we reaching them now?
● How are our channels integrated? Which ones work best?
● Which ones are most cost-efficient?
● How are we integrating them with customer routines?
You can reach your clients either through your own channels (store front), your partner channels
(major distributors), or a combination of both.
4. Customer Relationships
Another important step: you have to identify the kind of relationship you establish with each of your
customer segments. These are the main questions you should answer:
● What type of relationship does each of our customer segments expect us to establish and
maintain with them?
● Which ones have we established?
● How costly are they?
● How are they integrated with the rest of our business model?
Types of customer relationships include personal assistance, automated service,
communities and so on.
5. Revenue streams
You need to plan how you are going to generate cash from each customer segment (costs must
be subtracted from revenues to create earnings). The meaningful questions are:
● For what value are our customers really willing to pay?
● For what do they currently pay?
● How are they currently paying?
● How would they prefer to pay?
● How much does each Revenue Stream contribute to overall revenues?
There are several ways to generate revenue streams, such as asset sales, usage fees,
subscription fees, lending/leasing/renting, licensing, etc.
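The question of how much each Revenue Stream contributes to overall revenues comes down to simple arithmetic. The following is a minimal Python sketch; the function name, the stream names and the figures are hypothetical and serve only as an illustration.

```python
def revenue_shares(streams):
    """Return each stream's share of total revenue as a fraction between 0 and 1."""
    total = sum(streams.values())
    if total <= 0:
        raise ValueError("total revenue must be positive")
    return {name: amount / total for name, amount in streams.items()}


# Hypothetical monthly revenues in euros (illustrative only).
monthly_revenue = {
    "subscription fees": 6000.0,
    "usage fees": 3000.0,
    "licensing": 1000.0,
}

for name, share in revenue_shares(monthly_revenue).items():
    print(f"{name}: {share:.0%}")
```

Tracking these shares over time shows whether the business depends too heavily on a single stream.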
6. Key resources & key activities
You then need to understand which assets will make your business model work. Hence,
answer the following questions:
● What Key Resources do our Value Propositions require?
● Our Distribution Channels?
● Customer Relationships?
● Revenue Streams?
Next, consider which actions you must take in order to make your business model work:
● What Key Activities do our Value Propositions require?
● Our Distribution Channels?
● Customer Relationships?
● Revenue Streams?
7. Key partnerships
You will probably need the help of external partners and/or suppliers in order to
make your business model work properly:
● Who are our Key Partners?
● Who are our key suppliers?
● Which Key Resources are we acquiring from partners?
● Which Key Activities do partners perform?
8. Cost structure
Last but not least, consider the costs you will incur, and their consequences,
once you start applying your business model to your product:
● What are the most important costs inherent in our business model?
● Which Key Resources are most expensive?
● Which Key Activities are most expensive?
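To keep track of your answers, the building blocks listed above can be stored in a small data structure. The following Python sketch is only one possible way to do it: the class name, the field names and the sample notes are assumptions made for illustration, and key resources and key activities are kept as separate fields.

```python
from dataclasses import dataclass, field, fields


@dataclass
class BusinessModelCanvas:
    """One list of notes per building block of the Business Model Canvas."""
    customer_segments: list = field(default_factory=list)
    value_propositions: list = field(default_factory=list)
    channels: list = field(default_factory=list)
    customer_relationships: list = field(default_factory=list)
    revenue_streams: list = field(default_factory=list)
    key_resources: list = field(default_factory=list)
    key_activities: list = field(default_factory=list)
    key_partnerships: list = field(default_factory=list)
    cost_structure: list = field(default_factory=list)

    def empty_blocks(self):
        """Names of the blocks that still have no notes."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical notes for an open data startup (illustrative only).
canvas = BusinessModelCanvas()
canvas.customer_segments.append("commuters in mid-sized cities")
canvas.value_propositions.append("real-time bus arrivals built on open transport data")

print(canvas.empty_blocks())
```

Listing the empty blocks is a quick reminder of which questions you have not answered yet.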
Further reading
● A. Osterwalder & Y. Pigneur, Business Model Generation, 2010
● Elements of a business plan, available online
b) Open data business models
In the case of PSI (Public Sector Information) reuse by private sector entrepreneurs,
many inherent roadblocks, coupled with a certain vagueness about the rationale underlying
business endeavors, keep slowing the process down. The advent of the Open Data framework,
oriented towards data openness (i.e. open by default), raises new issues: access to the
information itself is free of charge, while different forms of payment may be required for
restricted access to derivative works.
Two Italian researchers, Michele Osella and Enrico Ferro (2012), developed a framework “that
focuses on decision-making levers that a business developer has at his/her fingertips for molding
the overarching architecture of a business venture hinged on public data re-use”.
Figure 2. Framework for PSI business model analysis by Osella & Ferro
Source: Osella & Ferro, “Business Models for PSI Re-Use: A Multidimensional Framework”, 2012
Figure 3. Framework for PSI business model analysis by Osella & Ferro
Source: Osella & Ferro, “Business Models for PSI Re-Use: A Multidimensional Framework”, 2012
While developing the framework for PSI reuse, they realized that it was not sufficient to
grasp the business logic and the mechanisms needed to build an effective strategy. A solution
came from the combination with Osterwalder's business model ontology, by employing the
Business Model Canvas (explained in the previous paragraphs) in order to visualize archetypal
business models at an enterprise level. The tool has proved very useful and could probably
be adopted in the development and assessment of any data-intensive business venture.
The result is the identification of eight business models currently employed by the actors present in
the Public Sector Information centric (PSI-centric) ecosystem. In particular, the choice of the
business model to adopt is a function of the position occupied in the value chain and of the strategic
choices made.
Why are they useful?
From a business model viewpoint, which is one of the perspectives on the PSI realm shown by
Osella, our interest is to identify the steps needed to maximise the benefits for reusers of
open data, “a profit-driven reuse and value creation”.
You can find, in the following list, the eight business models as described by Osella and Ferro:
1. Premium Product / Service.
2. Freemium Product / Service. A classic example in this vein is represented by mobile apps
related to public transportation in urban areas.
3. Open Source Like. Examples: OpenCorporates and OpenPolis.
4. Infrastructural Razor & Blades. Example: Public Data Sets on Amazon Web Services.
5. Demand-Oriented Platform. Examples: DataMarket and Infochimps.
6. Supply-Oriented Platform. Example: Socrata.
7. Free, as Branded Advertising.
8. White-Label Development. This business model has not consolidated yet, but some
embryonic attempts seem to be particularly promising.
In this section we explore the eight business models in more detail. The main
references are two papers co-authored by Ferro and Osella: “Business Models for PSI Re-Use: A
Multidimensional Framework” (2012) and “Eight Business Model Archetypes for PSI Re-Use”
(2013).
#1 Premium Product / Service. While implementing this business model, a core re-user offers to
end-users a product or a service presumably characterized by high intrinsic value in exchange for
a payment that could occur à la carte or in the guise of a recurring fee: while the former implies the
payment of an amount of money for each unit of product purchased (pay-per-use), the latter has
an "all-inclusive" nature since it grants for a given timeframe the access to certain features in
accordance with contractual terms. In this business model, probably regarded as the
“mainstream” model by the majority of analysts, the high intrinsic value, coupled with the price
mechanism, calls for B2B customers (often called “high-end market”) and for long- or medium-term
relationships going beyond single transactions (Osella & Ferro, 2013).
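The difference between à la carte and recurring-fee pricing can be made concrete with a break-even calculation. The Python sketch below is not taken from Osella & Ferro; the function names and the prices are hypothetical and only illustrate the arithmetic a customer (or a re-user setting prices) might do.

```python
import math


def pay_per_use_cost(units, price_per_unit):
    """Total cost when every unit is paid for separately (à la carte)."""
    return units * price_per_unit


def break_even_units(price_per_unit, recurring_fee):
    """Smallest whole number of units at which pay-per-use costs at least the recurring fee."""
    if price_per_unit <= 0:
        raise ValueError("price_per_unit must be positive")
    return math.ceil(recurring_fee / price_per_unit)


# Hypothetical prices in euros (illustrative only).
PRICE_PER_REPORT = 2.0
MONTHLY_FEE = 50.0

print(break_even_units(PRICE_PER_REPORT, MONTHLY_FEE))
print(pay_per_use_cost(10, PRICE_PER_REPORT))
```

Customers who expect to stay below the break-even volume are better served by pay-per-use, while heavier users benefit from the "all-inclusive" fee.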
Figure 4. Premium Product / Service (framework view)
Source: M.Osella & E.Ferro, Eight Business Model Archetypes for PSI Re-Use, 2013
Figure 5. Premium Product / Service (“Canvas” view)
#2 Freemium Product / Service. Core re-users resorting to this business model offer to end-users
a product or a service in accordance with freemium price logic: one of the offerings is free-of-
charge and entails only basic features, while customers willing to take advantage of refined
features or add-ons are charged. In the PSI realm, the implementation of this business model has
its roots in limitations deliberately imposed by the core re-user in terms of data access: as a result,
ad-hoc payments may be required to enjoy advanced features, to have recourse to additional
formats or, sometimes, to weed out advertising. In contrast with the previous model, here the
prominent target market is the consumer one (often called “low-end market”), with which the firm
establishes medium- or short-term relationships that usually do not involve customization.
Target customers are generally reached via the Web or via the mobile channel, which
promise to “hit” a considerable number of installed bases (Osella & Ferro, 2013).
Figure 6. Freemium Product / Service (“Canvas” view)
Source: M.Osella & E.Ferro, Eight Business Model Archetypes for PSI Re-Use, 2013
#3 Open Source Like. This very peculiar business model takes place on top of products, services,
or simple unpackaged data that are provided for free and in an open format. In terms of
economics, a cross-subsidization occurs in the enterprise under examination since the costs
incurred for free offering of data are covered by revenues stemming from supplementary business
lines that are still PSI-based: in fact, trickles of revenue for the core re-users may stem only from
added-value services or from license variations (dual licensing). The resemblance with Open
Source software is given by the fact that in this circumstance data is provided in a totally open
format that allows free elaboration, usage and redistribution without any technical barrier (Osella &
Ferro, 2013).
Figure 7. Open Source Like. (“Canvas” view)
#4 Infrastructural Razor & Blades. Entering the realm of enablers, this business model is
chosen by enterprises acting as intermediaries that facilitate the access to PSI resources by
profit-oriented developers or scientists not driven by commercial intent. As happens in the well-known
“razor & blades” model, the value proposition hinges on an attractive, inexpensive or free initial
offer (“razor”) that encourages continuing future purchases of follow-up items or services (“blades”)
that are usually consumables characterized by an inelastic demand curve and high margins. Applying
this model in the PSI environment, datasets are stored for free on cloud computing platforms being
accessible by everyone via APIs (“razor”) while re-users are charged only for the computing power
that they employ on-demand in as-a-service mode (“blades”). This business model exhibits another
case of cross-subsidization whereby profits accrued from the provision of on-demand computing
capacity cover costs attributable to the storage and maintenance of data. Finally, it goes without
saying that application of this model is limited to contexts and domains in which the computational
costs are significant (Osella & Ferro, 2013).
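The cross-subsidization described above can be checked with a few lines of arithmetic. The following Python sketch is an illustration under stated assumptions: the function name, the cost categories and all figures are hypothetical and do not describe any real cloud provider.

```python
def monthly_margin(compute_hours, price_per_hour, cost_per_hour,
                   stored_tb, storage_cost_per_tb):
    """Profit from on-demand computing ("blades") minus the cost of hosting free datasets ("razor")."""
    blades_profit = compute_hours * (price_per_hour - cost_per_hour)
    razor_cost = stored_tb * storage_cost_per_tb
    return blades_profit - razor_cost


# Hypothetical monthly figures in euros (illustrative only).
margin = monthly_margin(
    compute_hours=10_000,
    price_per_hour=0.10,
    cost_per_hour=0.06,
    stored_tb=50,
    storage_cost_per_tb=5.0,
)
print(f"Monthly margin: {margin:.2f}")
```

A negative result would mean that computing revenues do not yet cover the cost of storing and maintaining the free data.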
Figure 8. Infrastructural Razor and Blades (“Canvas” view)
#5 Demand-Oriented Platform. Following this business model, the enabler acting as intermediary
provides developers with easier access to PSI resources that are stored on proprietary servers
having high reliability. Once collected, PSI datasets are subsequently catalogued using metadata,
harmonized in terms of formats and exposed through APIs, making it easier to dynamically retrieve
data in a meaningful way. As a result, a wide range of critical issues pertaining to original raw data
are made irrelevant by platforms capable of converting datasets into data streams,
contributing significantly to the "commoditization" and "democratization" of data. In addition,
developers may reap the benefits given by the "one stop shopping" nature of such platforms: they
may resort to one supplier and access a variety of information resources through standardized
APIs - even beyond the borders of the PSI - without having to worry about interfaces connecting to
each original source. This “procurement” approach is crucial to minimize search costs and, as a
consequence, transaction costs. In terms of pricing, as a good that was born free and open (such
as Open Government Data) cannot be charged for in the absence of added value on top of it, enablers
adopting this business model earn revenues in exchange for advanced services and refined
datasets or data flows. To sum up, re-users are charged according to a freemium pricing model
that sets the boundary between free and premium in light of feature limitations (Osella & Ferro,
2013).
Figure 9. Demand-oriented platform (“Canvas” view)
#6 Supply-Oriented Platform. To conclude with enablers, this business model entails the
presence of an intermediary business actor having again an infrastructural role. However,
contrary to the previous case, according to this logic PSI holders are charged in lieu of developers.
In fact, the enabler, following the golden rules of two-sided markets, fixes the price according to
the degree of positive externality that each side is able to exert on the other one. Consequently,
this approach is beneficial for both sides
of the resulting arena: from developers’ perspective, their barriers are wiped out (i.e., they can
retrieve data without incurring cost) while, from the governmental angle, PSI holders become
platform owners taking advantage of some handy features such as cloud storage, rapid upload of
brand-new datasets by public employees, standardization of formats, tagging with metadata and,
above all, automated external exposure of data via APIs and GUI. Public agencies that adhere to
such programs in order to dip their toes into the water of Open Data establish long term
relationships with providers and are required to pay a periodic fee that depends on the degree of
sophistication characterizing the solutions purchased and on some technical parameters (Osella &
Ferro, 2013).
Figure 10. Supply-oriented platform (“Canvas” view)
#7 Free as Branded Advertising. Service advertising is an emerging form of communication
aimed at encouraging or persuading an audience towards a brand or a company. In contrast to
the better-known “display advertising”, where commercial messages are simply visualized, in
service advertising the advertiser strives to conquer the customer by providing him or her with
services of general usefulness. That said, in the PSI realm, services offered in this way do not
generate any direct revenue but they are supposed to bring positive return in a broad sense,
driving economic results on other business lines - unrelated to PSI - that represent the enterprise’s
core business. The rationale fuelling this “enlightened” business model is twofold. Firstly, it may be
based on a powerful advertising boost that leads the company to consider the cost as a
promotional investment in the marketing mix. Secondly, it seems to be very convenient in the presence
of zero marginal costs, a situation that occurs when the costs of distribution and usage are not
significant (Osella & Ferro, 2013).
Figure 11. Free as Branded Advertising. (“Canvas” view)
#8 White-Label Development. Last but not least, if service advertisers do not have sufficient
in-house competencies to develop their business endeavors, they can knock on the door of
advertising factories. Such firms, in fact, come into play as outsourcers carrying out duties that
otherwise would be handled by service advertisers. Hence, the development of PSI-based
solutions is particularly compelling for companies willing to use PSI as "attraction tool" but not
equipped with competencies required to do so (e.g., data retrieval, software development, service
maintenance, marketing promotion). In order to let the service advertiser’s brand stand out,
solutions are developed in a white-label manner, i.e., shadowing the outsourcer’s brand and giving
full visibility to the sole service advertiser’s brand. Taking into account the “one stop shopping
supply” and the business-criticality of the solutions in terms of corporate image, the resulting one-
to-one relationship between provider and customer is tailor-made and “cemented”.
Concerning financials, advertising factories collect lump-sum payments or recurring fees in
exchange for turn-key solutions so developed, depending on whether the crafted solution takes the
form of product or service: whilst in the former case service advertisers perceive the cost as
CAPEX, in the latter one the respective cost assumes an OPEX nature (Osella & Ferro, 2013).
Figure 12. White-Label Development (“Canvas” view)
Case studies
You can find many examples of companies that employ the business models described above.
Here we describe one example of the freemium model. A variety of web applications use
the freemium business model. The free product or service here is subsidised through a paid-for
product or service that offers some kind of added value on top of what is made available as open
data. The free product acts as marketing, establishing the provider in the marketplace and
increasing the take-up of the paid-for product (The ODI Guide, How to make a business case for
open data). One way of using a freemium model is to release your open data using a share-alike
license. This ensures that organisations that do things with your data have to either openly share
their results (which means you can benefit from what they do) or negotiate with you to be
able to use the data under a different (potentially charged) license.
OpenCorporates uses this business model, licensing its database with a share-alike license
while offering paid-for licenses for companies that do not want to share their data.
Another approach to a freemium model is to offer a paid-for product that:
● incorporates additional data, perhaps from third-party sources
● is provided in a different format from the open data
● is more up-to-date, complete or detailed than the open data
● is the result of an analysis or model based on the released open data
● is a dump of data that can otherwise be accessed through an API
Alternatively, you could offer a paid-for service based on the open data you are publishing that:
● provides an API over open data that can otherwise be accessed as a dump
● provides availability guarantees through a Service-Level Agreement
● removes rate limits
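As an illustration of the last point, a paid-for service that removes rate limits can be sketched as a per-plan request counter. This is a minimal Python sketch, not the ODI's or any provider's implementation: the plan names, the limit of 60 requests per minute and the fixed-window strategy are all assumptions.

```python
import time

# Hypothetical plans: requests allowed per window; None means no rate limit.
PLAN_LIMITS = {"free": 60, "paid": None}


class RateLimiter:
    """Fixed-window request counter per API key."""

    def __init__(self, window_seconds=60, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self._windows = {}  # api_key -> (window_start, request_count)

    def allow(self, api_key, plan):
        """Return True if the request may proceed under the given plan."""
        limit = PLAN_LIMITS[plan]
        if limit is None:
            return True  # paid plan: rate limits removed
        now = self.clock()
        start, count = self._windows.get(api_key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # a new window begins
        if count >= limit:
            self._windows[api_key] = (start, count)
            return False
        self._windows[api_key] = (start, count + 1)
        return True


limiter = RateLimiter()
allowed = [limiter.allow("demo-key", "free") for _ in range(61)]
print(allowed.count(True), "requests allowed on the free plan in this window")
```

A fixed window is the simplest strategy to explain; production services often prefer sliding windows or token buckets, which smooth out bursts at window boundaries.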
Recently the U.S. Government has launched a new section of the open government data catalog,
data.gov. The new sub-domain “Impact” profiles companies that are making use of open
government data.
References and further reading
● The Open Data Institute, How to make a business case for open data, available online.
● Alex Howard, Open data economy: Eight business models for open data and
insight from Deloitte UK, available online.
● Elements of open data startups, presentation available online.
● Enrico Ferro, Emerging Business models in PSI reuse, available online.
● E.Ferro & M.Osella, Business Models for PSI Re-Use: A Multidimensional Framework,
2012, available online.
● E.Ferro & M.Osella, Eight Business Model Archetypes for PSI Re-Use, 2013, available
online.
c) Lean methodology
After exploring the eight business models on which PSI reuse relies, we introduce the
importance of adopting the Lean methodology for business development. You have already
identified the opportunities offered by the reuse of open data by employing the Business Model
Canvas and the framework developed by Osella and Ferro and now you want to start developing
your own business.
Lean methodology is a method for developing businesses and products with the goal of finding
product-market fit and building a cash-flow-positive, sustainable company before it runs out of
money. “Validated learning”, experimentation, testing, measuring actual progress and learning
what customers really want are the main pillars of the methodology. The whole process should
then be carried out as quickly and as cheaply as possible. Pioneers of the Lean Startup
movement are Steve Blank (The Startup Owner’s Manual: The Step-by-Step Guide for Building a
Great Company, 2012; The Four Steps to the Epiphany, 2006) and Eric Ries (The Lean Startup, 2011).
The lean approach aims at being as effective as possible in achieving your final goal.
According to lean methodology you should follow a build-measure-learn feedback loop.
Ideas > build > product > measure > data > learn > ideas > and so on (circle)
Figure 13. Build-measure-learn feedback loop
Image source: Andrew Walpole, Build - Measure - Learn Feedback Loop infographic, 2013
Here we explain the loop step by step:
1) Idea:
When you develop your idea, keep in mind that the final goal is to provide benefit to your customer;
the rest is just a waste of time. So, first of all, ask yourself:
➢ Can I build a sustainable business around this set of products and services?
What you want to achieve is, in fact, a compromise between your vision and what your customers
would accept.
Hence, you want to focus on an idea that answers a problem that really needs a solution. You also
want to make explicit all the implicit assumptions you are making about how you can create a
business on that idea.
Please answer the following questions before building your product:
➢ Do consumers recognize that they have the problem you are trying to solve?
➢ If there was a solution, would they buy it?
➢ Would they buy it from you?
➢ Can you build a solution for that problem?
“Success is not delivering a feature; success is learning how to solve the customer’s problem.”
(Eric Ries, The Lean Startup, 2011).
2) Build:
Develop a minimum viable product (MVP) in order to start the learning process as soon as possible.
➢ MVP
A minimum viable product is a version of a new product or feature that allows you to test the
assumptions you made. When you are building your MVP, remove any feature, process or effort
that does not contribute directly to the learning you seek. When you test your MVP, you will
learn which elements of your product or strategy are not appropriate.
3) Measure:
Once the MVP is established, measure how your customers respond, building on metrics that lead to
cause-and-effect questions. Metrics have to point to a clearly defined action to take once analyzed.
Examples:
➢ A/B Split-Test Results
➢ Per-customer metrics
➢ Direct customer feedback
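As an example of turning A/B split-test results into a decision, the Python sketch below compares the conversion rates of two variants. The source does not prescribe a statistical test; the two-proportion z-test used here is one common choice, and the function name and visitor numbers are hypothetical.

```python
import math


def ab_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided two-proportion z-test; returns (rate_a, rate_b, z, p_value)."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        raise ValueError("cannot test: no variation in the pooled conversion rate")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical results: variant A is the current sign-up page, B the new one.
rate_a, rate_b, z, p_value = ab_test(100, 1000, 150, 1000)
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p_value:.4f}")
```

A small p-value suggests that the difference between the variants is unlikely to be due to chance alone, which gives you a clearly defined action: keep the better variant.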
4) Learn:
Analyze your product, feedback and metrics to assess your progress in an objective way.
➢ Validated learning
“Validated learning” means running experiments and validating them scientifically, based on
empirical data collected from real customers, so that you can test each element of your
vision.
Throughout the whole process you should use an investigative technique, the so-called "Five Whys":
asking yourself simple questions to study and solve problems along the way. When this process of
measuring and learning is done and you have made small changes to optimize your product, you
should be able to understand whether the drivers of your business model are appropriate or not
and decide whether to pivot or persevere.
Figure 14. Description Step by Step of the feedback loop
Image source: Andrew Walpole, Build - Measure - Learn Feedback Loop infographic, 2013
Pivot:
If you decide to pivot, you need to make a big change in direction or a structural course
correction to test new ideas/hypotheses about the product, strategy and engine of growth, and start
the cycle once again from the beginning. If your new experiment runs in a more productive way
than the experiments you were running before, it is probably a sign that you made a successful
pivot.
Persevere:
If you think that your test is going in the right direction then you should continue to test more
assumptions and build towards executing your current vision.
The lean methodology underlines the importance of experimenting in order to learn. Pivoting is just
a part of the process - “if you cannot fail, you cannot learn.” (Eric Ries, The Lean Startup, 2011).
Until a precise business model is found, it is important to keep your initial vision. This way,
adjustments can be made to the model without reassessing the entire market.
Lean approach in open data business development: a case study
Steve Blank tells the story of a startup called Tidepool as the perfect example of the power of
customer development, one of the key parts of the Lean methodology. The Tidepool team was
severely criticized for its business model. They began
believing they were selling an open data and software platform for people with Type 1 Diabetes
into a multi-sided market comprised of patients, providers, device makers, app builders and
researchers. They first reduced what they thought was a five-sided market to a simpler two-sided
one. But the big payoff came when their discussions with medical device customers revealed an
entirely new way to think about pricing - potentially tripling their revenue.
Figure 15. Screenshot of Tidepool home page
Image source: http://tidepool.org
Further reading
● Eric Ries, The Lean Startup, available online
● Steve Blank, The Four Steps to the Epiphany, available online
● Steve Blank & Bob Dorf, The Startup Owner’s Manual: The Step-by-Step Guide for Building a
Great Company, available online
● Steve Blank, When Customers Make You Smarter, available online
● Andrew Walpole, Build - Measure - Learn Feedback Loop, available online
● The Lean Startup Methodology, available online
Learning resources
● Steve Blank, How to Build a Startup, available online
● Steve Blank, Lean Customer Development - Part 1, available online
● Steve Blank, Lean Customer Development - 3 tool for startups, Part 2, available online
● Steve Blank, Lean Customer Development - Customer Development in action, Part 3 - 3
tool for startups, available online
● Steve Blank, Lean Customer Development - Closing, Part 3, available online
8. Open Data training materials already available. A list
● Useful links that the ODI uses on its 3-day Open Data in Practice course
● Slides used in the business sections of the ODI’s Open Data in Practice course
● ODI’s stories section: a good place to find examples of real-world impact.
● It's also worth looking at the ODI Start-Ups page for ways entrepreneurs are using open data
to build new businesses. You'll find details of business approach, short pitch videos and,
for some of the companies, case studies.
● You can explore all the materials and tutorials released by the team of School of Data.
You can find interesting guides at http://schoolofdata.org/courses/
9. Slides and inspiring presentations: link-o-graphy
http://www.slideshare.net/MicheleOsella
http://www.slideshare.net/search/slideshow?searchfrom=header&q=open+data+business
http://www.slideshare.net/OReillyStrata
http://www.slideshare.net/TheODINC
http://www.slideshare.net/MGHProfessional/leading-with-data?qid=9626d5fe-9a72-4e37-9bcf-579ef5d75c88&v=qf1&b=&from_search=1
http://www.slideshare.net/JenvanderMeer/strata-open-data-its-not-just-for-govts2112014?qid=9626d5fe-9a72-4e37-9bcf-579ef5d75c88&v=default&b=&from_search=15
http://www.slideshare.net/deirdrelee/deirdre-lee-opendata?qid=9626d5fe-9a72-4e37-9bcf-579ef5d75c88&v=qf1&b=&from_search=8
http://www.slideshare.net/WorldBankGroupFinances/world-bank-gurin?qid=9626d5fe-9a72-4e37-9bcf-579ef5d75c88&v=qf1&b=&from_search=6
http://training.theodi.org/resources/ODP_Business.pdf
http://theodi.github.io/presentations/2013-10-tsb-workshop-tom.html#/cover
http://www.slideshare.net/napo/a-dive-into-open-data
10. Videos, Audio files and books
So you want to build an open data business?
https://www.youtube.com/watch?v=jNscjJ5DetM
The value of open data to business - the Open Data 500 Study
http://theodi.org/lunchtime-lectures/friday-lunchtime-lecture-the-value-of-open-data-to-business-the-open-data-500-study
Learning from New York City’s open-data effort
http://www.mckinsey.com/insights/public_sector/learning_from_new_york_citys_open_data_effort
Some useful webinars:
http://www.socrata.com/webinars/
Opening up open data: An interview with Tim O’Reilly
http://www.mckinsey.com/insights/business_technology/opening_up_open_data_an_interview_with_tim_o_reilly
What is Open Data and how can it transform your business?
https://www.youtube.com/watch?v=hXZaf08gjfo
A very interesting list of recommended books is available here:
https://github.com/theodi/training-web/blob/gh-pages/Bibliography/index.md