Creating Open Data with Open Source (beta2)
1. Creating Open Data with
Open Source
Sammy Fung
sammy.hk
[ITFest.HK] Seminar of Free / Open Source in Hong Kong, April 2013.
2. Agenda
● What is Open Data ?
● Use of Open Source Software in web crawling.
● Starting new Open Source projects to create
Open Data.
3. Sammy Fung
● Software Developer using open source.
– Perl → PHP → Python.
– Data Mining / Web Crawling.
– Also deploying OpenStack Cloud and Linux Solutions.
● Open Source Community Leader.
– opensource.hk, HKLUG, GNOME Asia committee, Mozilla
Rep, and program committee member of the largest
Taiwan open source conference - COSCUP.
● Blogger at sammy.hk.
4. Open Data
Three Laws of Open Government Data by David Eaves.
1. If it can't be spidered or indexed, it doesn't exist.
2. If it isn't available in open and machine readable format, it
can't engage.
3. If a legal framework doesn't allow it to be repurposed, it
doesn't empower.
http://eaves.ca/2009/09/30/three-law-of-open-government-data/
5. Open Data
● Tim Berners-Lee, the inventor of the Web.
– 5stardata.info
– 5 star deployment scheme of Open Data.
6. * One Star - Open Data
1. make your stuff available on the Web (whatever format) under an
open license.
2. make it available as structured data (e.g., Excel instead of image
scan of a table)
3. use non-proprietary formats (e.g., CSV instead of Excel)
4. use URIs to denote things, so that people can point at your stuff.
5. link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
7. ** Two Star - Open Data
2. make it available as structured data (e.g., Excel instead of image
scan of a table)
8. *** Three Star - Open Data
3. use non-proprietary formats (e.g., CSV instead of Excel)
9. **** Four Star - Open Data
4. use URIs to denote things, so that people can point at your stuff.
10. ***** Five Star - Open Data
5. link your data to other data to provide context.
11. Open Data from HK Government ?
● 2 Use Cases of Data:
– Legco Meeting Minutes and Voting Results.
– Weather at Data.One.
14. Legco Meeting Minutes
and Voting Results
● All Legco voting results are scanned and
released as PDF, so voting results can only
be retrieved manually.
● In recent years, minutes scanned from paper
sheets seem to have been replaced by minutes
converted from the original computer document
files.
15. Improving Legco Vote Result Data ?
● Legcovotes.net was created by Hong Kong
netizens(?).
● Only 20 well-known vote results are included.
● The public could be allowed to enter other vote
results by hand, with submissions verified by
a legcovotes.net authority.
● Other data could be included, e.g. minutes in plain
text or paragraphs related to a councillor.
16. Weather at Data.One
● My Chinese blog post 「香港政府機構開放資料
Open Data 情況」 (The state of Open Data from
Hong Kong government bodies) on 2013/1/17.
● Data.One was released on 2011/3/31.
● Weather at Data.One provides 7 dataset URLs,
which return RSS (XML) format (Eng/TChi/SChi).
– One word: Useless.
– The Data.One dataset (RSS) is completely different
from HKO's own paid service (XML).
17. Weather at Data.One
● Example - Current local weather report:
● Plain text report in RSS.
● The only difference is how the report content is wrapped:
– Website: a pair of HTML tags, e.g. <PRE>....</PRE>.
– Data.One: a pair of RSS description tags,
<description>....</description>.
● Other weather data is missing, e.g. regional
temperature updates every 12 mins.
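A minimal sketch of what a program gets back from such a feed, using only the Python standard library. The RSS payload below is invented for illustration (it is not real Data.One output) and the function name report_text is hypothetical. The point is that the parsed result is still one block of free text, not individual values.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS payload, invented for illustration; not real Data.One output.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Current Weather Report (sample)</title>
    <item>
      <title>Bulletin (sample)</title>
      <description><![CDATA[
At 10 a.m. the air temperature was 18 degrees Celsius
and the relative humidity was 75 per cent.
      ]]></description>
    </item>
  </channel>
</rss>"""


def report_text(rss_xml):
    """Return the free-text report held in the first item's <description>."""
    root = ET.fromstring(rss_xml)
    description = root.find("./channel/item/description")
    if description is None or description.text is None:
        return ""
    return description.text.strip()


if __name__ == "__main__":
    # Prints a paragraph of prose; temperature and humidity are not separate fields.
    print(report_text(SAMPLE_RSS))
```

Getting a number out of this text still needs pattern matching on the prose, which is the same work as scraping the <PRE> block on the website.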
18. Weather at Data.One
● Weather at Data.One is a 'report', not 'data'.
● Weather RSS was already released by HKO
before the launch of Data.One.
● Technically, JSON/XML formats are easier
for computer programs to read.
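To illustrate why a structured format is easier for programs, here is a small sketch with a made-up JSON document. The station values and the field names (station, temperature_c, humidity_pct) are assumptions for this example only and do not describe any HKO or Data.One format.

```python
import json

# Hypothetical data, invented for illustration; field names are assumptions.
SAMPLE_JSON = """[
  {"station": "King's Park", "temperature_c": 18.2, "humidity_pct": 75},
  {"station": "Sha Tin", "temperature_c": 19.1, "humidity_pct": 70},
  {"station": "Cheung Chau", "temperature_c": 17.8, "humidity_pct": 82}
]"""


def warmest(stations_json):
    """Return the name of the station with the highest temperature."""
    stations = json.loads(stations_json)
    return max(stations, key=lambda s: s["temperature_c"])["station"]


if __name__ == "__main__":
    print(warmest(SAMPLE_JSON))
```

Each value is addressed by name and already has a type, so no text pattern matching is needed.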
19. Overseas Open Data Project
Examples
● Toronto:
– City Data: http://map.toronto.ca/wellbeing/
– Transportation: http://www.rocketradar.net/
– Pollution: http://www.emitter.ca/
● US & Canada:
– https://www.crimereports.com/
20. Use of Open Source Software in
Web Crawling
● Use Open Source Tools to collect useful and
meaningful machine-readable data.
● No need to wait for the provider to release data
in a machine-readable format.
21. Open Source Tools
● Python programming language
● with Regular Expression library
● Scrapy web crawling framework
22. Why python + scrapy ?
● python: my favourite programming
language for the past few years.
● scrapy: web crawling framework written in
Python.
23. Scrapy
● scrapy: web crawling framework written in
Python.
● HtmlXPathSelector
● Output: built-in JSON, CSV, XML.
● Python: import re
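The two steps on this slide (select a part of the page, then export items as JSON or CSV) can be sketched as below. This is deliberately NOT Scrapy code: it swaps in the standard library (re, json, csv) so that it runs without extra installs, with a regular expression standing in for the XPath selector. The sample page, field names and function names are all invented for illustration.

```python
import csv
import io
import json
import re

# Hypothetical page, invented for illustration.
SAMPLE_HTML = """<html><body>
<h1>Sample report</h1>
<pre>
Sha Tin,19.1
Cheung Chau,17.8
</pre>
</body></html>"""

# Stand-in for an XPath selection such as //pre/text().
PRE_BLOCK = re.compile(r"<pre>(.*?)</pre>", re.IGNORECASE | re.DOTALL)


def extract_rows(html):
    """Turn each 'name,value' line inside the first <pre> block into a dict."""
    match = PRE_BLOCK.search(html)
    if match is None:
        return []
    rows = []
    for line in match.group(1).strip().splitlines():
        name, value = line.split(",")
        rows.append({"station": name.strip(), "temperature_c": float(value)})
    return rows


def to_json(rows):
    """Serialize the rows as a JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["station", "temperature_c"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = extract_rows(SAMPLE_HTML)
    print(to_json(rows))
    print(to_csv(rows))
```

In a Scrapy project the selector and the feed exporters would take over these roles, so the crawler code only has to describe what to extract.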
32. Starting new Open Source projects
to create Open Data
● Develop an open source project.
● Release data in a standard machine-readable
data format.
33. Open Source Project Examples
● hk0weather
● My weather-related open source project.
34. hk0weather
● https://github.com/sammyfung/hk0weather
● Open Source Hong Kong Weather Project.
● Converts HKO webpages into JSON data.
● python + scrapy
● 1st version: extracts temperature and humidity
for 20+ weather stations from the current
weather report and exports them in JSON format.
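A minimal sketch of the idea behind the 1st version: read per-station rows from a text report with a regular expression and export them as JSON. This is not the hk0weather code. The report layout, the field names and the function names below are assumptions invented for illustration, and the real HKO page layout may differ.

```python
import json
import re

# Hypothetical report excerpt, invented for illustration.
SAMPLE_REPORT = """\
Station            Temperature   Humidity
King's Park        18.2          75
Sha Tin            19.1          70
Cheung Chau        17.8          82
"""

# Station name, then temperature, then humidity, separated by 2+ spaces.
ROW = re.compile(
    r"^(?P<station>.+?)\s{2,}(?P<temp>\d+(?:\.\d+)?)\s{2,}(?P<hum>\d+)\s*$"
)


def parse_report(text):
    """Return one dict per station row; lines that do not match are skipped."""
    readings = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header line or unrelated text
        readings.append(
            {
                "station": match.group("station"),
                "temperature_c": float(match.group("temp")),
                "humidity_pct": int(match.group("hum")),
            }
        )
    return readings


if __name__ == "__main__":
    print(json.dumps(parse_report(SAMPLE_REPORT), indent=2))
```

Skipping lines that do not match keeps the parser tolerant of headers and surrounding prose, at the cost of silently dropping rows if the page layout changes.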
42. hk0weather
● Future Planning:
● Add more weather reports.
● Get ideas from and/or cooperate with 'pro'
weather hobbyists.
● Remarks:
● Development of hk0weather started from
ZERO; its code is separate from my Twitter
account @weatherhk.
43. Challenge
● A challenge on the first day of the hk0weather release.
● The director of a mobile app development company
left me a Facebook comment:
– HKO provides data in a pretty XML format under its
annual service plan for commercial companies.
– He thinks that ***MAYBE*** HKO would provide the XML
to me ***without*** any charge if I asked.
● Remark: This is an assumption only, not listed on the HKO
website.
44. Challenge
● I replied to him as follows after googling for the HKO XML
schema.
– HKO doesn't mention any 'free of charge' XML data feed service on
its website.
– I registered and got authorization from HKO to redistribute their
weather information for non-profit purposes. I have received some
emails from HKO about updates to the website and its HTML structure,
but they never mentioned an XML data feed service.
– The weather data available on the HKO XML data feed is still less than
what is on its HTML website.
● So, this challenge is FAIL! XD
45. Open Data Project Examples
● Open Government initiative from HKU JMSC.
● http://opengov.jmsc.hku.hk/
● https://github.com/jmschku
46. Agenda
● What is Open Data ?
● Use of Open Source Software in web crawling.
● Starting new Open Source projects to create
Open Data.