Use of Open Data in Hong Kong
Sammy Fung
sammy.hk
Incu-Lab ICE in StartMeUpHK - Open Data Initiative Gathering
2013/12/04
We want a better life with
public data.
We want an easier way to
access the public data.
Agenda
●

What is Open Data ?

●

Use of Open Source Software in web crawling.

●

Starting new Open Source project hk0weather
to create Open Weather Data.
Sammy Fung
●

Software Developer
–

to use and develop open source software.

–

Perl → PHP → Python.

–

interested in Data Mining / Web Crawling.

–

owns a web and mobile technology startup.
Sammy Fung
●

15+ years in Open Source Communities.
–

Founding Chairman, Hong Kong Linux User Group.

–

Founding Chairman, Open Source Hong Kong.

–

Member, GNOME Asia committee.

–

Mozilla Representative

–

Member, program committee at COSCUP
●

Conference for Open Source Coders, Users and Developers.

●

Largest open source conference in Taiwan.
What is Open Data ?
Open Data
Three Laws of Open Government Data by David Eaves.
1. If it can't be spidered or indexed, it doesn't exist.
2. If it isn't available in open and machine readable format, it can't engage.
3. If a legal framework doesn't allow it to be repurposed, it doesn't empower.
http://eaves.ca/2009/09/30/three-law-of-open-government-data/
Open Data
●

Tim Berners-Lee, the inventor of the Web.
–

5stardata.info

–

5 star deployment scheme of Open Data.
5 star deployment scheme (5stardata.info by Tim Berners-Lee, the inventor of the Web):
* One Star: make your stuff available on the Web (whatever format) under an open license.
** Two Star: make it available as structured data (e.g., Excel instead of image scan of a table).
*** Three Star: use non-proprietary formats (e.g., CSV instead of Excel).
**** Four Star: use URIs to denote things, so that people can point at your stuff.
***** Five Star: link your data to other data to provide context.
Open Data in Hong Kong
●

Data.One
–

http://www.gov.hk/en/theme/psi

–

released on 2011/3/31.

–

First App Competition on Data.One
●

Call for Submission now till 2014/02/28.
Weather Information in Hong Kong
●

Hong Kong Observatory
–

Hourly Hong Kong Weather Report

–

Regional Weather in Hong Kong (10 min updates)

–

Weather Forecast and Weekly Weather Forecast

–

Typhoon Report and Forecast
Hong Kong Observatory RSS
Weather at Data.One
● I posted a blog 'Progress of Open Government Data in Hong Kong' on 2013/01/17.
● Weather at Data.One provides 7 dataset URLs, returning RSS (XML) format (Eng/TChi/SChi).
– One word: Useless.
– The Data.One dataset (RSS) is completely different from HKO's own paid service (XML).
Weather at Data.One
● Example - Current local weather report:
● Plain text report in RSS.
● The only difference is how the report content is wrapped:
– Website: a pair of HTML tags, eg. <PRE>....</PRE>.
– Data.One: a pair of RSS description tags, <description>....</description>.
● Other weather data is missing, eg. regional temperature updates every 12 mins.
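The point above can be illustrated with a small sketch. The RSS snippet below is a made-up stand-in for a Data.One weather item (the real feed's layout may differ), and the sentence pattern matched by the regular expressions is likewise an assumption. It shows that even after parsing the RSS, a program still has to dig the numbers out of plain text:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RSS item: the report is plain text inside <description>.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Current Weather Report (sample)</title>
    <item>
      <title>Bulletin (sample)</title>
      <description>Air temperature : 17 degrees Celsius. Relative Humidity : 80 per cent.</description>
    </item>
  </channel>
</rss>"""


def extract_report(rss_text):
    """Return the plain-text report stored in the first item's description."""
    root = ET.fromstring(rss_text)
    return root.findtext("./channel/item/description")


def parse_report(report):
    """Pull temperature and humidity out of the report text with regexes."""
    temp = re.search(r"Air temperature\s*:\s*(-?\d+)", report)
    hum = re.search(r"Relative Humidity\s*:\s*(\d+)", report)
    return {
        "temperature": int(temp.group(1)) if temp else None,
        "humidity": int(hum.group(1)) if hum else None,
    }


reading = parse_report(extract_report(SAMPLE_RSS))
print(reading)
```

If the feed carried the values as separate fields, the regular expressions would not be needed at all.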
Weather at Data.One
● Weather at Data.One is 'report' but not 'data'.
● Weather RSS was already released by HKO before the launch of Data.One.
● Technically, JSON/XML format is more readable by computer programs.
Data.One
●

In November 2013, 43 datasets are available.
–

JSON/XML = 18

–

RSS = 10

–

XLS = 6

–

CSV = 4

–

JPG/PNG = 3

–

HTML/MDB = 2
Data.One
●

JSON/XML (18 datasets)
–

Air Pollution.
●

Past 24-hour Air Pollution Index from stations.

–

Approved Charitable Fund-raising Activities

–

Restaurant and Food Licences.

–

Details of facility locations.

–

Reward Notices from Police Force.

–

Marine Traffic (Arrival/Departure).

–

Traffic Speed and special news.

–

EventHK information.
Data.One
●

RSS (10 datasets)
–

Weather Information (7 datasets)

–

Beach Water Quality (1 dataset)

–

Current Air Pollution Index range and forecast (2 datasets)
Data.One
●

JPG/PNG (3 datasets)
–

Exhibition gallery of government building
projects.

–

Speed map panels.

–

Traffic snapshot images.
Data.One
● CSV
– Locations of Public Facility and GovWifi
– Past Record of Air Pollution Index
– Marine Shipping directory of HK
● HTML
– HTML version of Marine Traffic.
● XLS, MDB
– 2011 Population Census.
– Property Market Statistics.
– Monthly Digested Stats and Registers of Auth Persons from Building Dept.
– Routes and fares of public transport.
Data.One
● Many departments do not release their useful data; they only release information already available on their websites.
– Few of them keep open data available on their own websites.
● Most of them do not understand what 'real' open data is.
– Open data formats instead of proprietary data formats.
– Data instead of information.
– Usefulness of the data.
● Some departments should manage their open data in a better data structure.
Legco Meeting Minutes
and Voting Results
● In October 2013, LegCo started to publish voting results of the House Committee in XML.
● It is not a part of the Data.One project.
● My open source software on LegCo vote result XML:
– http://github.com/sammyfung/legcovotes
Open Data is important to citizens.
Use of Open Source
Software in web
crawling
Web Scraping
●

a computer software technique of extracting
information from websites. (Wikipedia)

●

for business, hobbies, research purposes.
Web Scraping
● Look for the right URLs to scrape.
● Look for the right content in webpages.
● Save the data into a data store.
● When to run the web scraping program?
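The steps above can be sketched with the Python standard library alone. Everything specific here is an assumption made for illustration: the sample page, the "name, temperature, degrees" line layout, and the table name are all hypothetical, and a real scraper would fetch the page from a URL instead of using a hard-coded string.

```python
import re
import sqlite3
from html.parser import HTMLParser


class PreTextParser(HTMLParser):
    """Collect the text found inside <pre>...</pre> tags."""

    def __init__(self):
        super().__init__()
        self.in_pre = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.in_pre = True

    def handle_endtag(self, tag):
        if tag == "pre":
            self.in_pre = False

    def handle_data(self, data):
        if self.in_pre:
            self.chunks.append(data)


def extract_readings(html):
    """Step 2: find the right content (station name and temperature)."""
    parser = PreTextParser()
    parser.feed(html)
    text = "".join(parser.chunks)
    # Assumed line layout: "<station name>   <temperature> degrees"
    pattern = re.compile(r"^\s*(.+?)\s+(-?\d+) degrees", re.MULTILINE)
    return [(name, int(temp)) for name, temp in pattern.findall(text)]


def save_readings(conn, readings):
    """Step 3: save the data into a data store (SQLite here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reading (station TEXT, temperature INTEGER)"
    )
    conn.executemany("INSERT INTO reading VALUES (?, ?)", readings)
    conn.commit()


# Step 1 would fetch a real URL (e.g. with urllib.request.urlopen);
# a hard-coded page keeps this sketch runnable offline.
SAMPLE_PAGE = """<html><body><pre>
King's Park      16 degrees
Sha Tin          17 degrees
</pre></body></html>"""

conn = sqlite3.connect(":memory:")
readings = extract_readings(SAMPLE_PAGE)
save_readings(conn, readings)
rows = conn.execute(
    "SELECT station, temperature FROM reading ORDER BY station"
).fetchall()
print(rows)
```

The last question, when to run the program, is usually answered outside the code, for example by a scheduler such as cron that matches how often the source page is updated.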
Use of Open Source Software in
Web Crawling
● Use Open Source Tools to collect useful and meaningful machine-readable data.
● No need to wait for the provider to release data in machine-readable format.
Open Source Tools
●

Python programming language

●

with Regular Expression library

●

Scrapy web crawling framework
Why python + scrapy ?
● python: my favourite programming language for the past few years.
● scrapy: web crawling framework written in Python.
What is Scrapy ?
● An open source web scraping framework for Python.
● Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
Scrapy Features
●

define the data you want to scrape

●

write spider to extract data

●

Built-in: selecting and extracting data from HTML
and XML

●

Built-in: JSON, CSV, XML output

●

Interactive shell console

●

Built-in: web service, telnet console, logging

●

Others
Programme List of Paid TVs in 2004
●

I wanted to know which channel was
showing a live football match.

●

Paid TV web site = M$ + IIS + ASP + Flash

●

Slow....... Very Slow...... Extremely Slow!

●

Couldn't connect at any peak hours!

●

Wrote my first web crawler in PHP in 2004.
Public Transportation in 2006-2010
● Kowloon Motor Bus (KMB)
– No map view for a bus route
● Public Transportation Enquiry System (PTES)
– Extremely Poor, Ugly (or much worse) map UI on PTES.
HK Observatory and Joint Typhoon
Warning Center
●

Is any typhoon coming to Hong Kong? And
when will it come?

●

No easy data exchange format.

●

No RSS nor ATOM.

●

We don't check websites every day.
My Products
●

WeatherHK ← ← ←

●

TCTrack
WeatherHK
●

http://twitter.com/weatherhk

●

hourly current weather report

●

weather forecast report

●

tropical cyclone warning signal
WeatherHK
● Backend: Python + Scrapy + Database + Twitter + NNTP......
● Frontend: Twitter + Newsgroup
WeatherHK
●

http://twitter.com/weatherhk

●

Interviewed by MetroPop in 2009.
My Products
●

WeatherHK

●

TCTrack ← ← ←
TCTrack
● http://sammy.hk/projects/tctrack/tctrack.php
● Plot TC current and forecast tracks over Google Map.
● Source:
– JTWC
– HKO
TCTrack
● http://sammy.hk/projects/tctrack/tctrack.php
● Probably the first TC track map in HK using Google Maps.
● Use of GMap: TCTrack -> Weather Underground Hong Kong -> HKO
TCTrack
●

http://twitter.com/tctrack

●

Tweet JTWC updates for Northwest Pacific.
Releases information to citizens
in a better presentation.
Starting new Open
Source project
hk0weather to create
Open Weather Data.
Starting new Open Source projects
to create Open Data
● Develop an open source project.
● Release data in standard machine-readable data format.
hk0weather
●

https://github.com/sammyfung/hk0weather

●

Open Source Hong Kong Weather Project.

●

converts HKO webpages to JSON data.

●

python + scrapy

●

1st version: extracts temperature and humidity
from 20+ weather stations in the current weather
report and exports them in JSON format.
hk0weather
● https://github.com/sammyfung/hk0weather
● $ virtualenv hk0weatherenv
● $ source hk0weatherenv/bin/activate
● $ pip install scrapy
● $ git clone https://github.com/sammyfung/hk0weather.git
● $ cd hk0weather
● $ scrapy crawl currwx -t json -o testresult
hk0weather
● Python
– import re
● Scrapy
– web crawling framework written in Python.
– HtmlXPathSelector.
– built-in JSON, CSV, XML output.
hk0weather
[{"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720},
{"station": "kingspark", "temperture": 16, "time": 1360785720},
{"station": "wongchukhang", "temperture": 17, "time": 1360785720},
{"station": "takwuling", "temperture": 16, "time": 1360785720},
{"station": "laufaushan", "temperture": 15, "time": 1360785720},
{"station": "taipo", "temperture": 16, "time": 1360785720},
{"station": "shatin", "temperture": 17, "time": 1360785720},
{"station": "tuenmun", "temperture": 17, "time": 1360785720},
{"station": "tseungkwano", "temperture": 16, "time": 1360785720},
{"station": "saikung", "temperture": 16, "time": 1360785720},
{"station": "cheungchau", "temperture": 17, "time": 1360785720},
{"station": "cheungchau", "temperture": 17, "time": 1360785720},
{"station": "tsingyi", "temperture": 17, "time": 1360785720},
{"station": "shekkong", "temperture": 15, "time": 1360785720},
{"station": "tsuenwanhokoon", "temperture": 15, "time": 1360785720},
{"station": "tsuenwanshingmunvalley", "temperture": 17, "time": 1360785720},
{"station": "hongkongpark", "temperture": 17, "time": 1360785720},
{"station": "shaukeiwan", "temperture": 16, "time": 1360785720},
{"station": "kowlooncity", "temperture": 16, "time": 1360785720},
{"station": "happyvalley", "temperture": 18, "time": 1360785720},
{"station": "wongtaisin", "temperture": 17, "time": 1360785720},
{"station": "stanley", "temperture": 16, "time": 1360785720},
{"station": "kwuntong", "temperture": 15, "time": 1360785720},
{"station": "shamshuipo", "temperture": 17, "time": 1360785720}]
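Output in this shape is straightforward for a program to consume. The sketch below copies a few records from the output above and reads them with the standard library; treating the "time" value as a Unix timestamp is an assumption based on its magnitude, and the "temperture" key is spelled as it appears in the output.

```python
import json
from datetime import datetime, timedelta, timezone

# A few records copied from the output above.
SAMPLE = """[
 {"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720},
 {"station": "kingspark", "temperture": 16, "time": 1360785720},
 {"station": "happyvalley", "temperture": 18, "time": 1360785720}
]"""

HKT = timezone(timedelta(hours=8))  # Hong Kong Time, UTC+8

records = json.loads(SAMPLE)

# Warmest station in the sample.
warmest = max(records, key=lambda r: r["temperture"])

# Interpret "time" as a Unix timestamp (an assumption, see above).
report_time = datetime.fromtimestamp(records[0]["time"], tz=HKT)

print(warmest["station"], warmest["temperture"])
print(report_time.isoformat())
```

No regular expressions or HTML parsing are involved, which is the practical difference between 'data' and 'report'.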
Items.py
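from scrapy.item import Item, Field

class Hk0WeatherItem(Item):
    # One reading from one weather station
    time = Field()
    station = Field()
    temperture = Field()  # spelling kept as in the project's output
    humidity = Field()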
Currwx.py
start_urls = (
    'http://www.weather.gov.hk/wxinfo/currwx/currentc.htm',
)
Currwx.py
def parse(self, response):
    laststation = ''
    temperture = int()
    stations = []
    # Select the <div id="ming"> element of the page
    hxs = HtmlXPathSelector(response)
    report = hxs.select('//div[@id="ming"]')
libhk0
class hk0:
    # Maps station names as written in the report to station ids
    stations = [
        (u' 天 文 台 ', 'hko'),
        (u' 京 士 柏 ', 'kingspark'),
        (u' 黃 竹 坑 ', 'wongchukhang'),
        (u' 打 鼓 嶺 ', 'takwuling'),
        (u' 流 浮 山 ', 'laufaushan'),
        …
    ]
libhk0
class hk0:
    def gettime(self, report):
        …
    def hk0current(self, report):
        …
Agenda
●

What is Open Data ?

●

Use of Open Source Software in web crawling.

●

Starting new Open Source project hk0weather
to create Open Weather Data.
We want an easier way to
access the public data.
We want a better life with
public data.
Thank You!
sammy.hk

Mais conteúdo relacionado

Mais procurados

Building a Web-Scale Dependency-Parsed Corpus from Common Crawl
Building a Web-Scale Dependency-Parsed Corpus from Common CrawlBuilding a Web-Scale Dependency-Parsed Corpus from Common Crawl
Building a Web-Scale Dependency-Parsed Corpus from Common CrawlAlexander Panchenko
 
Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerMining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerHeiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vecHeiko Paulheim
 
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...Ontotext
 
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSSemantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSHarsh Thakkar
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW
 
IN2N: Cross-institutional Authority Collaboration
IN2N: Cross-institutional Authority CollaborationIN2N: Cross-institutional Authority Collaboration
IN2N: Cross-institutional Authority CollaborationAlexander Haffner
 
Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
Link Discovery Tutorial Part III: Benchmarking for Instance Matching SystemsLink Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
Link Discovery Tutorial Part III: Benchmarking for Instance Matching SystemsHolistic Benchmarking of Big Linked Data
 
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...Ana Roxin
 
APTA TransITech 2013 - "Open Transit Data - A Developers Perspective"
APTA TransITech 2013 - "Open Transit Data - A Developers Perspective"APTA TransITech 2013 - "Open Transit Data - A Developers Perspective"
APTA TransITech 2013 - "Open Transit Data - A Developers Perspective"Sean Barbeau
 
Early Chinese Periodicals Online (ECPO): From Digitization Towards Open Data....
Early Chinese Periodicals Online (ECPO): From Digitization Towards Open Data....Early Chinese Periodicals Online (ECPO): From Digitization Towards Open Data....
Early Chinese Periodicals Online (ECPO): From Digitization Towards Open Data....Matthias Arnold
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...Martin Junghanns
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsHeiko Paulheim
 
Almost Scraping: Web Scraping without Programming
Almost Scraping: Web Scraping without ProgrammingAlmost Scraping: Web Scraping without Programming
Almost Scraping: Web Scraping without ProgrammingMichelle Minkoff
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataHeiko Paulheim
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataSören Auer
 
Scraping talk public
Scraping talk publicScraping talk public
Scraping talk publicNesta
 

Mais procurados (20)

Building a Web-Scale Dependency-Parsed Corpus from Common Crawl
Building a Web-Scale Dependency-Parsed Corpus from Common CrawlBuilding a Web-Scale Dependency-Parsed Corpus from Common Crawl
Building a Web-Scale Dependency-Parsed Corpus from Common Crawl
 
Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerMining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMiner
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...
 
Link Discovery Tutorial Part V: Hands-On
Link Discovery Tutorial Part V: Hands-OnLink Discovery Tutorial Part V: Hands-On
Link Discovery Tutorial Part V: Hands-On
 
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSSemantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow Tutorial
 
IN2N: Cross-institutional Authority Collaboration
IN2N: Cross-institutional Authority CollaborationIN2N: Cross-institutional Authority Collaboration
IN2N: Cross-institutional Authority Collaboration
 
Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
Link Discovery Tutorial Part III: Benchmarking for Instance Matching SystemsLink Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
 
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
 
APTA TransITech 2013 - "Open Transit Data - A Developers Perspective"
APTA TransITech 2013 - "Open Transit Data - A Developers Perspective"APTA TransITech 2013 - "Open Transit Data - A Developers Perspective"
APTA TransITech 2013 - "Open Transit Data - A Developers Perspective"
 
Early Chinese Periodicals Online (ECPO): From Digitization Towards Open Data....
Early Chinese Periodicals Online (ECPO): From Digitization Towards Open Data....Early Chinese Periodicals Online (ECPO): From Digitization Towards Open Data....
Early Chinese Periodicals Online (ECPO): From Digitization Towards Open Data....
 
The Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of LeipzigThe Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of Leipzig
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
 
Almost Scraping: Web Scraping without Programming
Almost Scraping: Web Scraping without ProgrammingAlmost Scraping: Web Scraping without Programming
Almost Scraping: Web Scraping without Programming
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open Data
 
Linked data life cycles
Linked data life cyclesLinked data life cycles
Linked data life cycles
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
 
Scraping talk public
Scraping talk publicScraping talk public
Scraping talk public
 

Destaque

الجهاز التنفسي
الجهاز التنفسيالجهاز التنفسي
الجهاز التنفسيthamee622
 
Intro to editing
Intro to editingIntro to editing
Intro to editingMs Olive
 
Applying representation theory
Applying representation theoryApplying representation theory
Applying representation theoryMs Olive
 
Sli̇deshar eyenidf
Sli̇deshar eyenidfSli̇deshar eyenidf
Sli̇deshar eyenidfSefa Doğan
 
David lee cates pp
David lee cates ppDavid lee cates pp
David lee cates ppElaine Ryan
 
Tembang macapat2
Tembang macapat2Tembang macapat2
Tembang macapat2Ayu Spears
 
Pierre-Andre Pochon, Leadformance - Solocal Group UK Event "How To Drive Onli...
Pierre-Andre Pochon, Leadformance - Solocal Group UK Event "How To Drive Onli...Pierre-Andre Pochon, Leadformance - Solocal Group UK Event "How To Drive Onli...
Pierre-Andre Pochon, Leadformance - Solocal Group UK Event "How To Drive Onli...Solocal Group UK
 
Northern Illinois Rockford Heart Walk Slated for May of 2015
Northern Illinois Rockford Heart Walk Slated for May of 2015 Northern Illinois Rockford Heart Walk Slated for May of 2015
Northern Illinois Rockford Heart Walk Slated for May of 2015 Dr . Randy David Hassen
 
Как я представляю использование информационных технологий в социальной работе.
Как я представляю использование информационных технологий в социальной работе.Как я представляю использование информационных технологий в социальной работе.
Как я представляю использование информационных технологий в социальной работе.alenochka94-94
 
5. new technologies
5. new technologies5. new technologies
5. new technologiesMs Olive
 

Destaque (15)

الجهاز التنفسي
الجهاز التنفسيالجهاز التنفسي
الجهاز التنفسي
 
Chardham ytara tours
Chardham ytara toursChardham ytara tours
Chardham ytara tours
 
Intro to editing
Intro to editingIntro to editing
Intro to editing
 
Applying representation theory
Applying representation theoryApplying representation theory
Applying representation theory
 
Sli̇deshar eyenidf
Sli̇deshar eyenidfSli̇deshar eyenidf
Sli̇deshar eyenidf
 
David lee cates pp
David lee cates ppDavid lee cates pp
David lee cates pp
 
Friends Forever
Friends ForeverFriends Forever
Friends Forever
 
Christmas at Ysgol Rhewl
Christmas at Ysgol RhewlChristmas at Ysgol Rhewl
Christmas at Ysgol Rhewl
 
rediscovering DUNCANVILLE
rediscovering DUNCANVILLErediscovering DUNCANVILLE
rediscovering DUNCANVILLE
 
Tembang macapat2
Tembang macapat2Tembang macapat2
Tembang macapat2
 
Drone Hijacking
Drone HijackingDrone Hijacking
Drone Hijacking
 
Pierre-Andre Pochon, Leadformance - Solocal Group UK Event "How To Drive Onli...
Pierre-Andre Pochon, Leadformance - Solocal Group UK Event "How To Drive Onli...Pierre-Andre Pochon, Leadformance - Solocal Group UK Event "How To Drive Onli...
Pierre-Andre Pochon, Leadformance - Solocal Group UK Event "How To Drive Onli...
 
Northern Illinois Rockford Heart Walk Slated for May of 2015
Northern Illinois Rockford Heart Walk Slated for May of 2015 Northern Illinois Rockford Heart Walk Slated for May of 2015
Northern Illinois Rockford Heart Walk Slated for May of 2015
 
Как я представляю использование информационных технологий в социальной работе.
Как я представляю использование информационных технологий в социальной работе.Как я представляю использование информационных технологий в социальной работе.
Как я представляю использование информационных технологий в социальной работе.
 
5. new technologies
5. new technologies5. new technologies
5. new technologies
 

Semelhante a Ice dec04-04-sammy

Use of Open Data in Hong Kong
Use of Open Data in Hong KongUse of Open Data in Hong Kong
Use of Open Data in Hong KongSammy Fung
 
Local Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell ExtensionLocal Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell ExtensionSammy Fung
 
Open Source Weather Information Project with OpenStack Object Storage
Open Source Weather Information Project with OpenStack Object StorageOpen Source Weather Information Project with OpenStack Object Storage
Open Source Weather Information Project with OpenStack Object StorageSammy Fung
 
Open Data and Web API
Open Data and Web APIOpen Data and Web API
Open Data and Web APISammy Fung
 
EDF2012 Rufus Pollock - Open Data. Where we are where we are going
EDF2012  Rufus Pollock - Open Data. Where we are where we are goingEDF2012  Rufus Pollock - Open Data. Where we are where we are going
EDF2012 Rufus Pollock - Open Data. Where we are where we are goingEuropean Data Forum
 
Put Your Desktop in the Cloud In Support of the Open Government Directive and...
Put Your Desktop in the Cloud In Support of the Open Government Directive and...Put Your Desktop in the Cloud In Support of the Open Government Directive and...
Put Your Desktop in the Cloud In Support of the Open Government Directive and...guest1e3ee089
 
Brand Niemann Tutorial12242009
Brand Niemann Tutorial12242009Brand Niemann Tutorial12242009
Brand Niemann Tutorial12242009guestbc60aee0
 
Put Your Desktop in the Cloud In Support of the Open Government Directive and...
Put Your Desktop in the Cloud In Support of the Open Government Directive and...Put Your Desktop in the Cloud In Support of the Open Government Directive and...
Put Your Desktop in the Cloud In Support of the Open Government Directive and...guest8c518a8
 
Dublinked tech workshop_15_dec2011
Dublinked tech workshop_15_dec2011Dublinked tech workshop_15_dec2011
Dublinked tech workshop_15_dec2011Dublinked .
 
What do we want computers to do for us?
What do we want computers to do for us? What do we want computers to do for us?
What do we want computers to do for us? Andrea Volpini
 
Data Science: Harnessing Open Data for High Impact Solutions
Data Science: Harnessing Open Data for High Impact SolutionsData Science: Harnessing Open Data for High Impact Solutions
Data Science: Harnessing Open Data for High Impact SolutionsMohd Izhar Firdaus Ismail
 
Semantics on services: the story so far (SALAD2015 keynote at ESWC2015)
Semantics on services: the story so far (SALAD2015 keynote at ESWC2015)Semantics on services: the story so far (SALAD2015 keynote at ESWC2015)
Semantics on services: the story so far (SALAD2015 keynote at ESWC2015)Sergio Fernández
 
Stop making tools! Nobody likes them anyway...
Stop making tools! Nobody likes them anyway...Stop making tools! Nobody likes them anyway...
Stop making tools! Nobody likes them anyway...Christophe Guéret
 
Open Data Portals: 9 Solutions and How they Compare
Open Data Portals: 9 Solutions and How they CompareOpen Data Portals: 9 Solutions and How they Compare
Open Data Portals: 9 Solutions and How they CompareSafe Software
 
WWW2014 Overview of W3C Linked Data Platform 20140410
WWW2014 Overview of W3C Linked Data Platform 20140410WWW2014 Overview of W3C Linked Data Platform 20140410
WWW2014 Overview of W3C Linked Data Platform 20140410Arnaud Le Hors
 
How Open Data can help entrepreneurs - ITFest 2014 E2
How Open Data can help entrepreneurs - ITFest 2014 E2How Open Data can help entrepreneurs - ITFest 2014 E2
How Open Data can help entrepreneurs - ITFest 2014 E2Sammy Fung
 
The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataOntotext
 
SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunk
 
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionS. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionFlink Forward
 
Config Management and Data Service Deep Dive
Config Management and Data Service Deep DiveConfig Management and Data Service Deep Dive
Config Management and Data Service Deep DiveCristina Vidu
 

Ice dec04-04-sammy

  • 1. Use of Open Data in Hong Kong Sammy Fung sammy.hk Incu-Lab ICE in StartMeUpHK - Open Data Initiative Gathering 2013/12/04
  • 2. We want a better life with public data.
  • 3. We want an easier way to access the public data.
  • 4. Agenda ● What is Open Data ? ● Use of Open Source Software in web crawling. ● Starting new Open Source project hk0weather to create Open Weather Data.
  • 5. Sammy Fung ● Software Developer – uses and develops open source software. – Perl → PHP → Python. – interested in Data Mining / Web Crawling. – owns a web and mobile technology startup.
  • 6. Sammy Fung ● 15+ years in Open Source Communities. – Founding Chairman, Hong Kong Linux User Group. – Founding Chairman, Open Source Hong Kong. – Member, GNOME Asia committee. – Mozilla Representative – Member, program committee at COSCUP ● Conference for Open Source Coders, Users and Developers. ● Largest open source conference in Taiwan.
  • 7. What is Open Data ?
  • 8. Open Data Three Laws of Open Government Data by David Eaves. 1. If it can't be spidered or indexed, it doesn't exist. 2. If it isn't available in open and machine readable format, it can't engage. 3. If a legal framework doesn't allow it to be repurposed, it doesn't empower. http://eaves.ca/2009/09/30/three-law-of-open-government-data/
  • 9. Open Data ● Tim Berners-Lee, the inventor of the Web. – 5stardata.info – 5 star deployment scheme of Open Data.
  • 10. * One Star - Open Data 1. make your stuff available on the Web (whatever format) under an open license. 2. make it available as structured data (e.g., Excel instead of image scan of a table) 3. use non-proprietary formats (e.g., CSV instead of Excel) 4. use URIs to denote things, so that people can point at your stuff. 5. link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 11. ** Two Star - Open Data 1. make your stuff available on the Web (whatever format) under an open license. 2. make it available as structured data (e.g., Excel instead of image scan of a table) 3. use non-proprietary formats (e.g., CSV instead of Excel) 4. use URIs to denote things, so that people can point at your stuff. 5. link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 12. *** Three Star - Open Data 1. make your stuff available on the Web (whatever format) under an open license. 2. make it available as structured data (e.g., Excel instead of image scan of a table) 3. use non-proprietary formats (e.g., CSV instead of Excel) 4. use URIs to denote things, so that people can point at your stuff. 5. link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 13. **** Four Star - Open Data 1. make your stuff available on the Web (whatever format) under an open license. 2. make it available as structured data (e.g., Excel instead of image scan of a table) 3. use non-proprietary formats (e.g., CSV instead of Excel) 4. use URIs to denote things, so that people can point at your stuff. 5. link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 14. ***** Five Star - Open Data 1. make your stuff available on the Web (whatever format) under an open license. 2. make it available as structured data (e.g., Excel instead of image scan of a table) 3. use non-proprietary formats (e.g., CSV instead of Excel) 4. use URIs to denote things, so that people can point at your stuff. 5. link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
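The five-star scheme on the slides above is cumulative: each star requires all the previous ones. A minimal sketch of that rule follows. The `Dataset` fields, the `star_rating` function and the two format sets are hypothetical names and assumptions made up for this illustration; they are not an official classification from 5stardata.info.

```python
from dataclasses import dataclass

# Assumed format groupings, for illustration only.
STRUCTURED = {"xls", "csv", "json", "xml", "rdf"}
NON_PROPRIETARY = {"csv", "json", "xml", "rdf"}


@dataclass
class Dataset:
    fmt: str                 # file format, e.g. "csv"
    open_license: bool       # published on the Web under an open license
    uses_uris: bool = False  # things are denoted by URIs
    linked: bool = False     # links to other data to provide context


def star_rating(ds: Dataset) -> int:
    """Return 0 to 5 stars; each star requires all the previous ones."""
    if not ds.open_license:
        return 0
    fmt = ds.fmt.lower()
    stars = 1
    if fmt in STRUCTURED:
        stars = 2
        if fmt in NON_PROPRIETARY:
            stars = 3
            if ds.uses_uris:
                stars = 4
                if ds.linked:
                    stars = 5
    return stars
```

Under these assumptions a scanned table published as JPG earns one star, an XLS file two, and a CSV file three, which matches the examples given on the slides.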
  • 15. Open Data in Hong Kong
  • 16. Open Data in Hong Kong ● Data.One – http://www.gov.hk/en/theme/psi – released on 2011/3/31. – First App Competition on Data.One ● Call for Submission now till 2014/02/28.
  • 17. Weather Information in Hong Kong ● Hong Kong Observatory – Hourly Hong Kong Weather Report – Regional Weather in Hong Kong (10 min updates) – Weather Forecast and Weekly Weather Forecast – Typhoon Report and Forecast
  • 20. Weather at Data.One ● I posted a blog entry 'Progress of Open Government Data in Hong Kong' on 2013/01/17. ● Weather at Data.One provides 7 dataset URLs, which return RSS (XML) in English, Traditional Chinese and Simplified Chinese. – One word: Useless. – The Data.One dataset (RSS) is completely different from HKO's own paid service (XML).
  • 21. Weather at Data.One ● Example - Current local weather report: ● Plain text report in RSS. ● The only difference is how the report content is wrapped: – Website: a pair of HTML tags, e.g. <PRE>....</PRE>. – Data.One: a pair of RSS description tags, <description>....</description>. ● Other weather data is missing, e.g. regional temperature updates every 12 mins.
  • 22. Weather at Data.One ● Weather at Data.One is a 'report', not 'data'. ● The weather RSS was already released by HKO before the launch of Data.One. ● Technically, JSON/XML formats are easier for computer programs to read.
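To illustrate the 'report, not data' point, here is a minimal sketch that pulls the report text out of an RSS `<description>` tag with Python's standard library. The RSS snippet is made up for this example and does not reproduce the real Data.One feed; `report_texts` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up RSS snippet for illustration; the real feed content differs.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Current Weather Report</title>
    <item>
      <title>Sample bulletin</title>
      <description>Air temperature : 17 degrees Celsius
Relative Humidity : 80 per cent</description>
    </item>
  </channel>
</rss>
"""


def report_texts(rss_xml):
    """Return the plain-text body of every <item><description> in the feed."""
    root = ET.fromstring(rss_xml)
    return [(item.findtext("description") or "").strip()
            for item in root.iter("item")]


texts = report_texts(SAMPLE_RSS)
print(texts[0])
```

What comes back is still free text: a program has to parse sentences such as "Air temperature : 17 degrees Celsius" before it can use the numbers, which is why a structured JSON/XML record per station would be easier to consume.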
  • 23. Data.One ● In November 2013, 43 datasets are available. – JSON/XML = 18 – RSS = 10 – XLS = 6 – CSV = 4 – JPG/PNG = 3 – HTML/MDB = 2
  • 24. Data.One ● JSON/XML (18 datasets) – Air Pollution. ● Past 24-hour Air Pollution Index from stations. – Approved Charitable Fund-raising Activities – Restaurant and Food Licences. – Details of facility locations. – Reward Notices from Police Force. – Marine Traffic (Arrival/Departure). – Traffic Speed and special news. – EventHK information.
  • 25. Data.One ● RSS (10 datasets) – Weather Information (7 datasets) – Beach Water Quality (1 dataset) – Current Air Pollution Index range and forecast (2 datasets)
  • 26. Data.One ● JPG/PNG (3 datasets) – Exhibition gallery of government building projects. – Speed map panels. – Traffic snapshot images.
  • 27. Data.One ● CSV – Past Record of Air Pollution Index – Locations of Public Facility and GovWifi – Marine Shipping directory of HK ● HTML – HTML version of Marine Traffic. ● XLS, MDB – 2011 Population Census. – Property Market Statistics. – Monthly Digested Stats and Registers of Auth Persons from Building Dept. – Routes and fares of public transport.
  • 28. Data.One ● Many departments do not release their useful data; they only release information already available on their websites. – Few of them keep open data available on their own. ● Most of them do not understand what 'real' open data is. – Data instead of information. – Open data formats instead of proprietary data formats. – Usefulness of data. ● Some departments should manage their open data in a better data structure.
  • 29. Legco Meeting Minutes and Voting Results
  • 30. Legco Meeting Minutes and Voting Results
  • 31. Legco Meeting Minutes and Voting Results ● In October 2013, LegCo started to publish voting results of the House Committee in XML. ● It is not a part of the Data.One project. ● My open source software on LegCo vote result XML: – http://github.com/smamyfung/legcovotes
  • 32. Open Data is important to citizens.
  • 33. Use of Open Source Software in web crawling
  • 34. Web Scraping ● a computer software technique of extracting information from websites. (Wikipedia) ● for business, hobbies, research purposes.
  • 35. Web Scraping ● Look for the right URLs to scrape. ● Look for the right content in webpages. ● Save data into a data store. ● When to run the web scraping program?
  • 36. Use of Open Source Software in Web Crawling ● Use Open Source Tools to collect useful and meaningful machine-readable data. ● No need to wait for the provider to release data in machine-readable format.
  • 37. Open Source Tools ● Python programming language ● with Regular Expression library ● Scrapy web crawling framework
  • 38. Why python + scrapy ? ● python: my favourite programming language for the past few years. ● scrapy: web crawling framework written in Python.
  • 39. What is Scrapy ? ● An open source web scraping framework for Python. ● Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
  • 40. Scrapy Features ● define data you want to scrape ● write spider to extract data ● Built-in: selecting and extracting data from HTML and XML ● Built-in: JSON, CSV, XML output ● Interactive shell console ● Built-in: web service, telnet console, logging ● Others
  • 41. Programme List of Paid TVs in 2004
  • 42. Programme List of Paid TVs in 2004 ● I wanted to know which channel a live football match was showing on. ● Paid TV web site = M$ + IIS + ASP + Flash ● Slow....... Very Slow...... Extremely Slow! ● Couldn't connect at any peak hours! ● Wrote my first web crawler in PHP in 2004.
  • 43. Public Transportation in 2006-2010 ● Kowloon Motor Bus (KMB) – No map view for a bus route ● Public Transportation Enquiry System (PTES) – Extremely Poor, Ugly (or much worse) map UI on PTES.
  • 44. HK Observatory and Joint Typhoon Warning Center ● Is any typhoon coming to Hong Kong? And when will it come? ● No easy data exchange format. ● No RSS nor ATOM. ● We don't check websites every day.
  • 45. My Products ● WeatherHK ● TCTrack
  • 46. WeatherHK ● http://twitter.com/weatherhk ● hourly current weather report ● weather forecast report ● tropical signal warning
  • 47. WeatherHK ● Backend: Python + Scrapy + Database + Twitter + NNTP...... ● Frontend: Twitter + Newsgroup
  • 50. TCTrack ● http://sammy.hk/projects/tctrack/tctrack.php ● Plot TC current and forecast tracks over Google Map. ● Source: – JTWC – HKO
  • 51. TCTrack ● http://sammy.hk/projects/tctrack/tctrack.php ● Probably first tctrack map in HK using GoogleMap ● Use of GMap: TCTrack -> Weather Underground Hong Kong -> HKO
  • 53. Releases information to citizens in a better presentation.
  • 54. Starting new Open Source project hk0weather to create Open Weather Data.
  • 55. Starting new Open Source projects to create Open Data ● Develop an open source project. ● Release data in standard machine-readable data format.
  • 56. hk0weather ● https://github.com/sammyfung/hk0weather ● Open Source Hong Kong Weather Project. ● converts HKO webpages into JSON data. ● python + scrapy ● 1st version: extracts temperature and humidity for 20+ weather stations from the current weather report and exports them in JSON format.
  • 57. hk0weather ● https://github.com/sammyfung/hk0weather ● $ virtualenv hk0weatherenv ● $ source hk0weatherenv/bin/activate ● $ pip install scrapy ● $ git clone https://github.com/sammyfung/hk0weather.git ● $ cd hk0weather ● $ scrapy crawl currwx -t json -o testresult
  • 58. hk0weather ● Python – import re ● Scrapy – web crawling framework written in Python. – HtmlXPathSelector. – built-in JSON, CSV, XML output.
  • 59. hk0weather [{"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720}, {"station": "kingspark", "temperture": 16, "time": 1360785720}, {"station": "wongchukhang", "temperture": 17, "time": 1360785720}, {"station": "takwuling", "temperture": 16, "time": 1360785720}, {"station": "laufaushan", "temperture": 15, "time": 1360785720}, {"station": "taipo", "temperture": 16, "time": 1360785720}, {"station": "shatin", "temperture": 17, "time": 1360785720}, {"station": "tuenmun", "temperture": 17, "time": 1360785720}, {"station": "tseungkwano", "temperture": 16, "time": 1360785720}, {"station": "saikung", "temperture": 16, "time": 1360785720}, {"station": "cheungchau", "temperture": 17, "time": 1360785720}, {"station": "cheungchau", "temperture": 17, "time": 1360785720}, {"station": "tsingyi", "temperture": 17, "time": 1360785720}, {"station": "shekkong", "temperture": 15, "time": 1360785720}, {"station": "tsuenwanhokoon", "temperture": 15, "time": 1360785720}, {"station": "tsuenwanshingmunvalley", "temperture": 17, "time": 1360785720}, {"station": "hongkongpark", "temperture": 17, "time": 1360785720}, {"station": "shaukeiwan", "temperture": 16, "time": 1360785720}, {"station": "kowlooncity", "temperture": 16, "time": 1360785720}, {"station": "happyvalley", "temperture": 18, "time": 1360785720}, {"station": "wongtaisin", "temperture": 17, "time": 1360785720}, {"station": "stanley", "temperture": 16, "time": 1360785720}, {"station": "kwuntong", "temperture": 15, "time": 1360785720}, {"station": "shamshuipo", "temperture": 17, "time": 1360785720}]
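Once the weather is available as JSON like the output above, consuming it takes only a few lines. The sketch below uses three records copied from that output; `summarize` is a hypothetical helper name, and the key is spelled `temperture` because that is how it appears in the output.

```python
import json
from datetime import datetime, timezone

# Three records copied from the sample output above.
SAMPLE = """[
 {"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720},
 {"station": "kingspark", "temperture": 16, "time": 1360785720},
 {"station": "happyvalley", "temperture": 18, "time": 1360785720}
]"""


def summarize(raw):
    """Return (temperature by station, observation time in UTC, warmest station)."""
    records = json.loads(raw)
    by_station = {r["station"]: r["temperture"] for r in records}
    # The "time" field is treated here as a Unix timestamp in seconds.
    observed = datetime.fromtimestamp(records[0]["time"], tz=timezone.utc)
    warmest = max(by_station, key=by_station.get)
    return by_station, observed, warmest


by_station, observed, warmest = summarize(SAMPLE)
print(observed.isoformat(), warmest, by_station[warmest])
```

No text parsing is needed at this stage, which is the practical difference between a machine-readable dataset and a plain-text report.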
  • 60. Items.py class Hk0WeatherItem(Item): time = Field() station = Field() temperture = Field() humidity = Field()
  • 62. Currwx.py def parse(self, response): laststation = '' temperture = int() stations = [] hxs = HtmlXPathSelector(response) report = hxs.select('//div[@id="ming"]')
  • 63. libhk0 class hk0: stations = [ (u'天文台', 'hko'), (u'京士柏', 'kingspark'), (u'黃竹坑', 'wongchukhang'), (u'打鼓嶺', 'takwuling'), (u'流浮山', 'laufaushan'),
  • 64. libhk0 class hk0: def gettime(self, report): … def hk0current(self, report): …
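The slides only show the outline of libhk0, so the following is a hedged sketch of how a station table and the `re` module could turn report text into records. It is not the project's actual code. The line layout matched by `LINE_RE`, the `parse_report` function and the sample text are assumptions for illustration, and the station table is written as a dict rather than the list of tuples shown on the slide.

```python
import re

# Subset of the station table shown on the slide (Chinese name -> station id).
STATIONS = {
    "天文台": "hko",
    "京士柏": "kingspark",
    "黃竹坑": "wongchukhang",
}

# Assumed line layout "<station name> <temperature> 度"; the real HKO page may differ.
LINE_RE = re.compile(r"^\s*(\S+)\s+(\d+)\s*度")


def parse_report(text, timestamp):
    """Turn report lines into dicts shaped like the hk0weather JSON output."""
    items = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # not a station reading
        station = STATIONS.get(match.group(1))
        if station is None:
            continue  # station not in the lookup table
        items.append({
            "station": station,
            "temperture": int(match.group(2)),  # key spelled as in the project output
            "time": timestamp,
        })
    return items


sample = "天文台 17 度\n京士柏 16 度\n未知站 20 度\nnot a reading"
print(parse_report(sample, 1360785720))
```

Lines that do not match the pattern, or that name a station missing from the table, are skipped rather than raising an error, so an unexpected line in the page does not stop the crawl.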
  • 65. Agenda ● What is Open Data ? ● Use of Open Source Software in web crawling. ● Starting new Open Source project hk0weather to create Open Weather Data.
  • 66. We want an easier way to access the public data.
  • 67. We want a better life with public data.