4. You are free to:
- copy, publish, distribute and transmit
the Information;
- adapt the Information;
- exploit the Information commercially
for example, by combining it with other
Information, or by including it in your
own product or application
5. You must:
- acknowledge the source of the Information by including
any attribution statement specified by the Information
Provider(s) and, where possible, provide a link to this
licence;
- ensure that you do not use the Information in a way that
suggests any official status;
- ensure that you do not mislead others or misrepresent
the Information or its source;
- ensure that your use of the Information does not breach
the Data Protection Act 1998 or the Privacy and
Electronic Communications (EC Directive) Regs 2003.
6. Exemptions:
- personal data;
- Information that has neither been published
nor disclosed under information access
legislation (FOI) by or with the consent of the
Information Provider;
- departmental or public sector organisation
logos, crests etc;
- third party rights the Information Provider is
not authorised to license;
- Information subject to other IPR
8. Availability and Access: the data must
be available as a whole and at no more
than a reasonable reproduction
cost, preferably by downloading over
the internet. The data must also be
available in a convenient and
modifiable form.
The Open Knowledge Foundation
9. Reuse and Redistribution: the data
must be provided under terms that
permit reuse and redistribution
including the intermixing with other
datasets.
The Open Knowledge Foundation
10. Universal Participation: everyone must be able
to use, reuse and redistribute – there should
be no discrimination against fields of
endeavour or against persons or groups. For
example, ‘non-commercial’ restrictions that
would prevent ‘commercial’ use, or restrictions
of use for certain purposes (e.g. only in
education), are not allowed.
The Open Knowledge Foundation
A great example of timely data is data relating to roadworks. This data is often released in an impenetrable form: screeds of text that list road names nobody uses and describe, in arcane language, where the roadworks are to take place and what diversions have been put in place. Why is it so hard to just publish the data as KML that can be rendered trivially in an online map?!
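To give a feel for how little is involved, here is a minimal sketch in Python that turns a list of roadworks records into a KML document using only the standard library. The record fields (`road`, `details`, `lat`, `lon`) and the function name `roadworks_to_kml` are hypothetical, made up for illustration; a real roadworks feed would have its own structure and would probably describe line segments rather than single points.

```python
import xml.etree.ElementTree as ET

# The standard KML 2.2 namespace.
KML_NS = "http://www.opengis.net/kml/2.2"


def _tag(name):
    """Return a namespace-qualified tag name for ElementTree."""
    return f"{{{KML_NS}}}{name}"


def roadworks_to_kml(roadworks):
    """Build a KML string with one Placemark per roadworks record.

    Each record is assumed (hypothetically) to be a dict with the keys
    'road', 'details', 'lat' and 'lon'.
    """
    ET.register_namespace("", KML_NS)
    kml = ET.Element(_tag("kml"))
    doc = ET.SubElement(kml, _tag("Document"))
    for rw in roadworks:
        placemark = ET.SubElement(doc, _tag("Placemark"))
        ET.SubElement(placemark, _tag("name")).text = rw["road"]
        ET.SubElement(placemark, _tag("description")).text = rw["details"]
        point = ET.SubElement(placemark, _tag("Point"))
        # KML coordinates are written longitude first, then latitude.
        coords = f"{rw['lon']},{rw['lat']}"
        ET.SubElement(point, _tag("coordinates")).text = coords
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    example = [
        {"road": "High Street", "details": "Resurfacing, one lane closed",
         "lat": 52.0, "lon": -0.75},
    ]
    print(roadworks_to_kml(example))
```

The resulting text can be saved with a .kml extension and opened in any mapping tool that reads KML.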
Another example of how CSV can be used to help data flow is provided by Google Spreadsheets. The =importData formula allows a user to specify a source data URL and pull the CSV data found at that location into the spreadsheet. Unlike Many Eyes Wikified, if the source data at the URL is updated, the update will (eventually) be pulled into the spreadsheet automatically.
One of the really good reasons for getting data into a data processing environment such as a spreadsheet is that you can start to work with it. In the case of Google Spreadsheets, the spreadsheet environment can also be used as a database environment. That is, we can treat one or more data-containing sheets in a spreadsheet as a database, generating new views over the data as well as running queries over it.
Another way of using a Google Spreadsheet as a database is via Google's APIs. The Google Visualization API provides a way of passing queries, written in the Google Visualization API query language, from an arbitrary web page or web application, and receiving the resulting data in a standard JSON-based format, which also happens to play nicely with the visualisation components of the same API.
The Guardian Datastore explorer is a crude demonstration, dating from around 2009, of how data from the Guardian datastore, which is stored across a range of Google spreadsheets, can be explored, queried and visualised via these APIs. Users can select a dataset from a drop-down menu, fed from a Delicious account to which various datastore spreadsheets have been bookmarked using a particular set of tags, or by pasting in the URL of an arbitrary (public) Google spreadsheet. The first row of the data, that is, the headings, can then be previewed (a simple spreadsheet is assumed, in which column headings appear in the first row of the spreadsheet).
A series of list boxes is then populated with the column labels and their names, providing a certain amount of help with the creation of a query over the spreadsheet data. A range of output formats can also be selected, from simple HTML data tables to a range of charts. URLs are also generated for HTML and CSV representations of the data returned from the query.
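As a rough illustration of what generating such URLs involves, the following Python sketch builds a query URL for a public Google spreadsheet. It assumes the `gviz/tq` endpoint form and the `tq` and `tqx=out:...` parameters as I understand them to work at the time of writing; this is not necessarily the URL pattern the explorer itself used. The function name `build_query_url`, the spreadsheet key and the column letters are all hypothetical.

```python
from urllib.parse import urlencode


def build_query_url(spreadsheet_key, query, out="csv"):
    """Return a URL that runs `query` against a public Google spreadsheet.

    Assumptions: the spreadsheet is publicly readable, and the endpoint
    accepts a query in the 'tq' parameter and an output format such as
    'csv', 'html' or 'json' in the 'tqx' parameter.
    """
    base = f"https://docs.google.com/spreadsheets/d/{spreadsheet_key}/gviz/tq"
    params = {"tq": query, "tqx": f"out:{out}"}
    # urlencode takes care of escaping spaces, commas and comparison signs.
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical key and columns, for illustration only.
    query = "select A, B where C > 100 order by C desc"
    print(build_query_url("HYPOTHETICAL_KEY", query, out="csv"))
    print(build_query_url("HYPOTHETICAL_KEY", query, out="html"))
```

Columns are referred to by their letter labels in the query, which is why it helps to show users both the column labels and the headings when they are building one.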
One of the nice things about the data table widget (a standard Google Visualization API component in this case, though similar examples exist for YUI, the Yahoo User Interface Library, and for frameworks such as jQuery) is that it supports things like row sorting by column for free, with no programming required, allowing even further manipulation of the data, albeit at a simplistic level. (It's probably worth pointing out here that when datasets are published, it may be worth providing a preview of the column headings and the first few rows, or a random sample of rows, just so that users can see what sort of data is on offer without having to download the whole data set.)
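A preview of that sort is cheap to produce. The Python sketch below takes some CSV text and returns the column headings along with either the first few rows or a random sample of rows. The function name `preview_csv` and its parameters are made up for illustration, and it assumes a simple file in which the headings appear in the first row.

```python
import csv
import io
import random


def preview_csv(text, n_rows=5, sample=False, seed=None):
    """Return (headings, rows) as a small preview of CSV text.

    The first row is assumed to hold the column headings. If `sample`
    is true, up to `n_rows` data rows are picked at random; otherwise
    the first `n_rows` data rows are returned.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], []
    header, body = rows[0], rows[1:]
    if sample and len(body) > n_rows:
        body = random.Random(seed).sample(body, n_rows)
    else:
        body = body[:n_rows]
    return header, body


if __name__ == "__main__":
    # Made-up data, purely for illustration.
    data = "council,spend\nAnytown,100\nOthertown,250\nThirdtown,75\n"
    headings, first_rows = preview_csv(data, n_rows=2)
    print(headings)
    print(first_rows)
```

Note that this reads the whole file before sampling, which is fine for a publisher generating the preview once, but would defeat the purpose if the user had to do it themselves.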
If you’re in the business of selling information as data, you are under threat where that information is published in an openly licensed way.