1. Instructor: Professor Lothar Piepmeyer
Beautifying Data
in the Real World
Group 5:
Toan Do - An Du
Vinh Nguyen - Tan Tran
2. How big is the data on the Internet?
2004: Internet data exceeded 1 EB for the first time
2005: Eric Schmidt estimated it at 5 million
terabytes (~5 EB)
Cisco forecasts that in 2015, the size of the
Internet will reach nearly 1,000 EB
How big is it?
Source: http://www.wisegeek.com/how-big-is-the-internet.htm
http://techland.time.com/
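The unit conversion behind these figures can be checked with a short sketch. It assumes decimal (SI) units, where 1 TB = 10^12 bytes and 1 EB = 10^18 bytes; the function name is illustrative only.

```python
# Decimal (SI) byte units; this choice of units is an assumption.
TB = 10**12
EB = 10**18


def terabytes_to_exabytes(terabytes: int) -> float:
    """Convert a size in terabytes to exabytes."""
    return terabytes * TB / EB


estimate_2005_eb = terabytes_to_exabytes(5_000_000)  # Schmidt's 2005 estimate
forecast_2015_eb = 1_000                             # Cisco's forecast, in EB

print(estimate_2005_eb)                     # size in EB
print(forecast_2015_eb / estimate_2005_eb)  # growth factor 2005 -> 2015
```

Under these assumptions, 5 million terabytes is exactly 5 EB, and the 2015 forecast is 200 times the 2005 estimate.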
3. If 1 byte = 0.5mm
Source: http://blog.fliptop.com/how-much-data-is-on-the-internet/
4. Content
Introduction
The Open Notebook Science approach
Curating and presenting the data
Beautifying the data
Data Visualization & Building a portal from
open data and free services
Demonstration
5. Data on the internet
Source: http://news.bbc.co.uk/2/hi/technology/8562801.stm
6. Problems of data in the real world
(Scientific)
Noisy source of data
The barrier of data presentation
OCR version
Text version
Human-readable
Machine readable
…
How to verify the data?
7. Open Notebook Science
Purpose: record the full raw data of scientific research
and make it available online
Benefits:
obtain detailed descriptions of procedures
improve the communication of science
speed up progress
reduce time lost due to the repetition of failed
experiments
…
14. Unique Identifiers for Chemical Entities
Standardize data
Facilitate the integration with other data sets
Consider 3 possibilities
CAS Registry Number
InChI
SMILES
15. CAS Registry Number
Proprietary
Cannot be converted to a chemical structure
Depends on an external organization to issue numbers
For example, the CAS number of water is 7732-18-5: the
checksum 5 is calculated as (8×1 + 1×2 + 2×3 + 3×4 + 7×5 +
7×6) = 105; 105 mod 10 = 5
http://en.wikipedia.org/wiki/CAS_registry_number
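The checksum rule on this slide can be sketched in a few lines: the digits before the check digit are weighted 1, 2, 3, ... from right to left, and the sum is taken modulo 10. The function names below are illustrative and not part of any CAS tooling.

```python
def cas_check_digit(cas_number: str) -> int:
    """Compute the check digit from the first two parts of a CAS number."""
    first, second, _ = cas_number.split("-")
    digits = first + second
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10


def is_valid_cas(cas_number: str) -> bool:
    """Check the NNNN-NN-N shape and compare the stated check digit."""
    parts = cas_number.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    if len(parts[2]) != 1:
        return False
    return cas_check_digit(cas_number) == int(parts[2])


print(cas_check_digit("7732-18-5"))  # water, the example from the slide
print(is_valid_cas("7732-18-4"))     # same number with a wrong check digit
```

This only verifies internal consistency. It cannot tell whether a number has actually been assigned, which still depends on the issuing organization.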
16. InChI
IUPAC International Chemical Identifier
Freely usable and non-proprietary
Does not have to be assigned by an organization
Can be computed from structural information
Human readable (with practice)
http://en.wikipedia.org/wiki/Inchi
17. SMILES
Simplified molecular-input line-entry system
More human-readable than InChI
Can be converted to InChI
http://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system
21. Google Docs API
Allows developers to create, retrieve, update, and
delete Google Docs files and collections
Also provides some advanced features such as resource
archives, Optical Character Recognition (OCR),
translation, and revision history
Useful for storing data in the cloud, managing
resources, and converting document formats
https://developers.google.com/google-apps/documents-list/
22. Google Visualization API
Chart Library: JavaScript classes
Data Table: JavaScript DataTable class
Data Source: Chart Tools Datasource protocol
https://developers.google.com/chart/interactive/docs/index
25. RESTful Web Service
Representational State Transfer (REST): a simpler alternative to
SOAP and Web Services Description Language (WSDL)
based Web services
Principles:
Use HTTP methods explicitly.
Be stateless.
Expose directory structure-like URIs.
Transfer XML, JavaScript Object Notation (JSON), or both.
http://www.ibm.com/developerworks/webservices/library/ws-restful/
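The four principles above can be illustrated with a minimal sketch. It is a plain function rather than a real web server, and the /compounds/<name> URI layout, the handle function, and the store dictionary are all assumptions made for illustration.

```python
import json


def handle(method, path, body, store):
    """Dispatch one request. All state lives in `store`, none in the handler."""
    # Directory-like URI, assumed here to be /compounds/<name>.
    parts = [p for p in path.split("/") if p]
    if len(parts) != 2 or parts[0] != "compounds":
        return 404, json.dumps({"error": "not found"})
    key = parts[1]

    # HTTP methods are used explicitly: GET reads, PUT writes, DELETE removes.
    if method == "GET":
        if key not in store:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(store[key])
    if method == "PUT":
        store[key] = json.loads(body)  # JSON is the transfer format
        return 200, json.dumps(store[key])
    if method == "DELETE":
        if key not in store:
            return 404, json.dumps({"error": "not found"})
        del store[key]
        return 204, ""
    return 405, json.dumps({"error": "method not allowed"})


store = {}
print(handle("PUT", "/compounds/water", '{"cas": "7732-18-5"}', store))
print(handle("GET", "/compounds/water", None, store))
```

Because each call carries everything the handler needs (method, URI, body), any call can be served without remembering earlier ones, which is what "be stateless" asks for.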
26. Compare REST and SOAP
Who's using REST?
All of Yahoo's web services use REST, including Flickr.
The del.icio.us API uses it, as do pubsub, bloglines, and
technorati. eBay and Amazon both offer web services for
both REST and SOAP.
Who's using SOAP?
Google seems to be consistent in implementing their
web services to use SOAP, with the exception of
Blogger, which uses XML-RPC. You will find SOAP web
services in lots of enterprise software as well.
http://www.petefreitag.com/item/431.cfm
27. Compare REST and SOAP
REST:
Lightweight - not a lot of extra XML markup
Human-readable results
Easy to build - no toolkits required
SOAP:
Easy to consume - sometimes
Rigid - type checking, adheres to a contract
Development tools
29. An Effort to Aggregate Data from
Multiple Sources
Introducing ChemSpider
An online lookup engine for chemists
http://www.chemspider.com
40 million substances
Multiple data sources
A "link farm" to other sources
33. Semantic Web
Describing things in a way that computer
applications can understand.
“The Beatles was a band from Liverpool”
Describes the relationships between things (like A
is a part of B and Y is a member of Z) and
the properties of things (like size, weight, age, and
price)
“...will make all the data in the world look like
one huge database” – Tim Berners-Lee
http://www.w3schools.com/web/web_semantic.asp
34. Resource Description Framework
Is a language to describe resources on
the web
Component of the Semantic Web
Data is self-describing
Triples: "subject", "predicate" and "value"
URIs are used to denote resources
35. RDF
Graph Database
Nodes
Edges
Well-suited for Knowledge Representation
Beautified Data => Knowledge
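A minimal sketch of RDF data as a graph, using plain Python tuples rather than an RDF library. It encodes the Beatles statement from the Semantic Web slide; every example.org URI below is hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical URIs; only the triple shape (subject, predicate, value) matters.
BEATLES = "http://example.org/band/TheBeatles"
TYPE = "http://example.org/terms/type"
ORIGIN = "http://example.org/terms/origin"

triples = [
    (BEATLES, TYPE, "band"),
    (BEATLES, ORIGIN, "Liverpool"),
]


def build_graph(triples):
    """Index triples as a graph: subjects are nodes, predicates label edges."""
    graph = defaultdict(list)
    for subject, predicate, value in triples:
        graph[subject].append((predicate, value))
    return graph


graph = build_graph(triples)
for predicate, value in graph[BEATLES]:
    print(predicate, "->", value)
```

Each triple becomes one labelled edge, so adding a new fact about a resource never requires changing a schema. That is one reason a graph suits knowledge representation.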
38. Query Language: SPARQL (pronounced "sparkle")
Query Language for RDF
Graph Traversal
Matching the triples
Example:
Data:
<http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "SPARQL Tutorial" .
Query:
SELECT ?title
WHERE { <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title . }
Query Result: title "SPARQL Tutorial"
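The "matching the triples" step can be sketched in pure Python. This is not a SPARQL engine: it handles a single triple pattern, and the convention that a term starting with "?" is a variable is an assumption borrowed from the query syntax above.

```python
BOOK = "http://example.org/book/book1"
TITLE = "http://purl.org/dc/elements/1.1/title"

data = [(BOOK, TITLE, "SPARQL Tutorial")]


def match(pattern, triples):
    """Return one variable binding per triple that fits the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable binds to the value; a repeated variable must agree.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # a constant term that does not match
        else:
            results.append(binding)
    return results


print(match((BOOK, TITLE, "?title"), data))
```

A full SPARQL query joins the bindings of several such patterns, which is where the graph traversal comes in.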
39. To Infinity and Beyond
• DB2 and Oracle are ready for this train
• Object Database
Versant OODBMS, anybody?
• Machine-Readable Data
Will it become self-aware?
46. TheGioiDiDong.com
[Diagram labels: LÂM's iPhone, BẢO's SS Galaxy, LÂM, BẢO]
Connection Detected!
- Could Bao have met Lam at Thegioididong?
- Could they have discussed their world domination
scheme during the meeting there?
- ???
53. Second Life (SL): The Opportunity for "Edutainment"
iSchool
Teaching: Quizzes and Lectures
Classrooms with PowerPoint
Research Center
Drexel Island on Second Life
56. Building A Portal From Open Data And
Free Services
Freely hosted Wiki service
Google Spreadsheet
Google Docs API / JavaScript
Visualization services / analysis services (2D, 3D)
RDF / Semantic Web / Web services
Cost: free or fit to the purpose
57. Key To Success
Model
+ Transparency
Information
Data
Records
59. References
O'Reilly, Beautiful Data, Chapter 16:
Beautifying Data in the Real World
http://techland.time.com/2011/06/01/how-big-is-the-internet-spoiler-not-as-big-as-itll-be-in-2015/
http://drexelisland.wikispaces.com/
SMILES to 3D, Second Life:
http://www.youtube.com/watch?v=tOfhuoRbnCg&feature=player_embedded