Instructor: Professor Lothar Piepmeyer




Beautifying Data
in the Real World
         Group 5:
     Toan Do - An Du
  Vinh Nguyen - Tan Tran

How big is the data on the Internet?


2004: Internet data exceeded 1 EB for the first time
2005: Eric Schmidt estimated it was 5 million
 Terabytes (~ 5EB)
Cisco forecasts that in 2015, the size of the
 Internet will reach nearly 1,000 EB

           How big is it?
                    Source: http://www.wisegeek.com/how-big-is-the-internet.htm
                                                      http://techland.time.com/
If 1 byte = 0.5mm




                    Source: http://blog.fliptop.com/how-much-data-is-on-the-internet/
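As a rough worked example of the "1 byte = 0.5 mm" analogy, the sketch below converts the sizes quoted on the previous slide into a length. It assumes decimal exabytes (1 EB = 10^18 bytes) and an approximate light-year of 9.4607 × 10^12 km; both constants are assumptions added for illustration and do not come from the slides.

```python
# Rough scale check for the "1 byte = 0.5 mm" analogy.
BYTES_PER_EB = 10**18          # assumption: decimal exabyte
MM_PER_BYTE = 0.5              # from the slide
MM_PER_KM = 10**6
KM_PER_LIGHT_YEAR = 9.4607e12  # assumption: approximate value


def length_km(exabytes: float) -> float:
    """Length in kilometres of `exabytes` of data laid end to end."""
    return exabytes * BYTES_PER_EB * MM_PER_BYTE / MM_PER_KM


# Sizes quoted on the "How big is the data on the Internet?" slide.
for year, size_eb in [(2004, 1), (2005, 5), (2015, 1000)]:
    km = length_km(size_eb)
    light_years = km / KM_PER_LIGHT_YEAR
    print(f"{year}: {size_eb} EB ~ {km:.2e} km ~ {light_years:.2f} light-years")
```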
Content



Introduction
The Open Notebook Science approach
Curating and presenting the data
Beautifying the data
Data Visualization & Building a portal from
 open data and free services
Demonstration
Data on the internet




                Source: http://news.bbc.co.uk/2/hi/technology/8562801.stm
Problems with data in the real world
(scientific)


Noisy source of data
The barrier of data presentation
  OCR version
  Text version
  Human-readable
  Machine-readable
  …
How to verify the data?
Open Notebook Science


Purpose: record the full raw data of scientific research
 and make it available online
Benefits:
   obtain detailed descriptions of procedures
   improve the communication of science
   accelerate progress
   reduce time lost due to the repetition of failed
    experiments
   …
Apply ONS on free services
Crowdsourcing


a distributed problem-solving and
 production model
Crowdsourcing




                Source: http://r18ultrachair.com/
Validating crowdsourced data



According to ONS, all detailed data are
 recorded
Doubtful data are also kept and marked
 for
Unique Identifiers for Chemical Entities



Standardize data

Facilitate the integration with other data sets

Consider 3 possibilities
   CAS Registry Number
   InChI
   SMILES
CAS Registry Number



 Proprietary

 Cannot be converted to a chemical structure

 Depends on an external organization to issue numbers

For example, the CAS number of water is 7732-18-5: the
   checksum 5 is calculated as (8×1 + 1×2 + 2×3 + 3×4 + 7×5 +
   7×6) = 105; 105 mod 10 = 5
http://en.wikipedia.org/wiki/CAS_registry_number
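
The checksum rule in the water example can be sketched in a few lines of Python. The function names `cas_check_digit` and `is_valid_cas` are hypothetical names chosen for this illustration, and the format check (2 to 7 digits, then 2 digits, then 1 check digit) is an assumption about the usual CAS layout rather than something stated on the slide.

```python
def cas_check_digit(body: str) -> int:
    """Check digit for a CAS number body such as "7732-18" (check digit omitted)."""
    digits = [int(ch) for ch in body if ch.isdigit()]
    # The rightmost digit is weighted 1, the next one 2, and so on.
    total = sum(weight * d for weight, d in enumerate(reversed(digits), start=1))
    return total % 10


def is_valid_cas(cas: str) -> bool:
    """True if `cas` looks like NNNNNNN-NN-N and its check digit matches."""
    parts = cas.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    if not (2 <= len(parts[0]) <= 7 and len(parts[1]) == 2 and len(parts[2]) == 1):
        return False
    return cas_check_digit(parts[0] + parts[1]) == int(parts[2])


print(cas_check_digit("7732-18"))  # 5, as in the water example
print(is_valid_cas("7732-18-5"))   # True
print(is_valid_cas("7732-18-4"))   # False
```

A check like this only catches transcription errors; it cannot recover the chemical structure, which is the limitation noted in the bullets above.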
InChI
 IUPAC International Chemical Identifier
 Freely usable and non-proprietary
 Does not have to be assigned by an organization
 Can be computed from structural information
 Human readable (with practice)




            http://en.wikipedia.org/wiki/Inchi
SMILES

   Simplified molecular-input
    line-entry system

   More human-readable than
    InChI

   Can be converted to InChI




http://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system
Analysis Options



Access to live data
Get Summary
Complex Statistical representations of
 models
Mark doubtful data for later
 consideration
Google Docs API


Allows developers to create, retrieve, update, and
 delete Google Docs files and collections
Also provides some advanced features like resource
 archives, Optical Character
Recognition, translation, and revision history.
Useful to store data in the cloud, perform resource
 management, convert document formats


https://developers.google.com/google-apps/documents-list/
Google Visualization API


Chart Library
  JavaScript classes
Data Table
  JavaScript DataTable class
Data Source
  Chart Tools Datasource
   protocol

                        https://developers.google.com/chart/interactive/docs/index
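
To make the Data Table idea concrete, here is a minimal sketch that builds a table in the JSON layout the Visualization API's DataTable accepts, as far as I understand the documented format: a "cols" list of column descriptors and a "rows" list of cells. The helper `to_datatable` and the sample measurements are hypothetical and invented for illustration only.

```python
import json


def to_datatable(columns, rows):
    """Build a dict in the DataTable JSON layout: "cols" plus "rows" of cells."""
    return {
        "cols": [
            {"id": col_id, "label": label, "type": col_type}
            for col_id, label, col_type in columns
        ],
        "rows": [{"c": [{"v": value} for value in row]} for row in rows],
    }


# Hypothetical measurements, invented for illustration only.
columns = [
    ("solvent", "Solvent", "string"),
    ("conc", "Concentration (M)", "number"),
]
rows = [("methanol", 0.42), ("ethanol", 0.31)]

table = to_datatable(columns, rows)
print(json.dumps(table, indent=2))
```

On the JavaScript side, a structure like this would typically be passed to the DataTable constructor before drawing a chart; that step is not shown here.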
https://google-developers.appspot.com/chart/interactive/docs/gallery
RESTful Web Service


 Representational State Transfer: a simpler alternative to
  SOAP- and Web Services Description Language (WSDL)-based
  Web services
 Principles:
      Use HTTP methods explicitly.
      Be stateless.
      Expose directory structure-like URIs.
      Transfer XML, JavaScript Object Notation (JSON), or both.

http://www.ibm.com/developerworks/webservices/library/ws-restful/
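
The four principles can be illustrated with a small sketch that involves no network code at all. It assumes a hypothetical `/compounds/<name>` resource and a plain dict as storage; the function `handle` and the status codes chosen are illustrative assumptions, not part of any service mentioned in these slides.

```python
import json


def handle(method, path, body, store):
    """Dispatch one request; all state lives in `store`, none in the handler."""
    # Directory structure-like URI: /compounds/<name>
    parts = [p for p in path.split("/") if p]
    if len(parts) != 2 or parts[0] != "compounds":
        return 404, json.dumps({"error": "not found"})
    key = parts[1]

    # HTTP methods are used explicitly: GET reads, PUT writes, DELETE removes.
    if method == "GET":
        if key not in store:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(store[key])
    if method == "PUT":
        created = key not in store
        store[key] = json.loads(body)
        return (201 if created else 200), json.dumps(store[key])
    if method == "DELETE":
        if key not in store:
            return 404, json.dumps({"error": "not found"})
        del store[key]
        return 204, ""
    return 405, json.dumps({"error": "method not allowed"})


store = {}
print(handle("PUT", "/compounds/water", '{"cas": "7732-18-5"}', store))
print(handle("GET", "/compounds/water", None, store))
print(handle("DELETE", "/compounds/water", None, store))
print(handle("GET", "/compounds/water", None, store))
```

Because each call carries everything needed to process it, the handler could be placed behind any HTTP server without keeping per-client session state.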
Compare REST and SOAP


Who's using REST?
     All of Yahoo's web services use REST, including Flickr.
      The del.icio.us API uses it, as do pubsub, bloglines, and
      technorati. Both eBay and Amazon have web services for
      both REST and SOAP.
Who's using SOAP?
     Google seems to be consistent in implementing their
      web services to use SOAP, with the exception of
      Blogger, which uses XML-RPC. You will find SOAP web
      services in lots of enterprise software as well.
http://www.petefreitag.com/item/431.cfm
Compare REST and SOAP



REST                   SOAP
 Lightweight - not a    Easy to consume -
  lot of extra xml        sometimes
  markup                 Rigid - type
 Human Readable          checking, adheres to
  Results                 a contract
 Easy to build - no     Development tools
  toolkits required
An Effort to Aggregate Data from
Multiple Sources



Introducing ChemSpider
  An online lookup engine for Chemists
     http://www.chemspider.com
     40 million substances
     Multiple data sources
     A "link farm" to other sources
What is "wrong" with
  wikipedia.com?


Wikipedia.com


Not “wrong”:

   Very informative for human readers
Wikipedia.com


This little guy is left behind

  Not machine-readable
Semantic Web

Describing things in a way that computer
 applications can understand.
   “The Beatles was a band from Liverpool”
Describes the relationships between things (like A
 is a part of B and Y is a member of Z) and
 the properties of things (like size, weight, age, and
 price)
“... will make all the data in the world look like
 one huge database” (Tim Berners-Lee)
                             http://www.w3schools.com/web/web_semantic.asp
Resource Description Framework

Is a language to describe resources on
 the web
Component of the Semantic Web
Data is self-describing
  Triples: "subject", "predicate" and "value" (the object)
  URIs are used to denote resources
RDF

Graph Database
  Nodes
  Edges




Well-suited for Knowledge Representation
  Beautified Data => Knowledge
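
A minimal sketch of the graph view, assuming triples are held as plain Python tuples: subjects and objects become nodes, predicates become labelled edges. The `ex:` names are made up for illustration, loosely following the Beatles statement from the Semantic Web slide; no RDF library is used.

```python
from collections import defaultdict

# Illustrative triples; the "ex:" identifiers are hypothetical.
triples = [
    ("ex:TheBeatles", "ex:type", "ex:Band"),
    ("ex:TheBeatles", "ex:origin", "ex:Liverpool"),
    ("ex:JohnLennon", "ex:memberOf", "ex:TheBeatles"),
]


def build_graph(triples):
    """Map each subject node to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def nodes(triples):
    """Every resource that appears as a subject or an object."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


graph = build_graph(triples)
print(sorted(nodes(triples)))
print(graph["ex:TheBeatles"])
```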
RDF Example

<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cd="http://www.recshop.fake/cd#">
<rdf:Description
rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
  <cd:artist>Bob Dylan</cd:artist>
  <cd:country>USA</cd:country>
  <cd:company>Columbia</cd:company>
  <cd:price>10.90</cd:price>
  <cd:year>1985</cd:year>
</rdf:Description>
</rdf:RDF>
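
The following sketch shows how the document above breaks down into triples, using only Python's standard XML parser. The helper `rdfxml_to_triples` is a hypothetical name, and it assumes the flat shape of this one example (`rdf:Description` elements with literal-valued children); a real RDF/XML parser handles many more cases.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cd="http://www.recshop.fake/cd#">
<rdf:Description
rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
  <cd:artist>Bob Dylan</cd:artist>
  <cd:country>USA</cd:country>
  <cd:company>Columbia</cd:company>
  <cd:price>10.90</cd:price>
  <cd:year>1985</cd:year>
</rdf:Description>
</rdf:RDF>"""


def rdfxml_to_triples(text):
    """Return (subject, predicate, value) tuples for flat rdf:Description blocks."""
    root = ET.fromstring(text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree stores tags as "{namespace}local"; join them into one URI.
            predicate = prop.tag[1:].replace("}", "")
            triples.append((subject, predicate, (prop.text or "").strip()))
    return triples


for triple in rdfxml_to_triples(RDF_XML):
    print(triple)
```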
Semantic Web Example: DBpedia

“Old school” Wikipedia:
     http://en.wikipedia.org/wiki/Porsche_Panamera


DBpedia entries

   http://dbpedia.org/page/Porsche_Panamera
   http://dbpedia.org/page/Chromium_carbide
Query Language: SPARQL (sparkle)

Query Language for RDF
    Graph Traversal
    Matching the triples
Example:
    Data:
<http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "SPARQL Tutorial" .

    Query:
  SELECT ?title
  WHERE { <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title>
  ?title . }

    Query Result:           title "SPARQL Tutorial"
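
"Matching the triples" can be sketched without a SPARQL engine. The function `match` below is a hypothetical helper that handles a single triple pattern, with terms starting with `?` treated as variables; it is an illustration of the idea, not how a real SPARQL implementation works internally.

```python
def match(pattern, triples):
    """Yield variable bindings for one triple pattern; "?name" terms are variables."""
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must bind to the same value everywhere it appears.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


# The data and query from the example above, written as tuples.
data = [
    (
        "http://example.org/book/book1",
        "http://purl.org/dc/elements/1.1/title",
        "SPARQL Tutorial",
    ),
]
pattern = (
    "http://example.org/book/book1",
    "http://purl.org/dc/elements/1.1/title",
    "?title",
)

for row in match(pattern, data):
    print(row["?title"])  # SPARQL Tutorial
```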
To Infinity and Beyond

• DB2 and Oracle are ready for this train

•Object Database
    Versant OODBMS, anybody?

•Machine-Readable Data
    Will it become self-aware?

“Data Finds Data” and Semantic Data
       Model – A Hypothesis




Non-Obvious Relationship Awareness

[Diagram, built up over several slides. Node labels: LÂM, LÂM’s iPhone,
 BẢO, BẢO’s SS Galaxy, TheGioiDiDong.com]

Connection Detected!
 - Could Bao have met Lam at Thegioididong?
 - Could they have discussed their world domination
   scheme during a meeting there?
 - ???
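
The "connection detected" step can be sketched as a search for entities that two people both reach within a few hops. The slides show the nodes but not the exact links, so the `edges` list below is a hypothetical reading of the diagram, and the function names are invented for this illustration.

```python
from collections import defaultdict

# Hypothetical links: the slides show these nodes but not the exact edges.
edges = [
    ("LÂM", "LÂM's iPhone"),
    ("BẢO", "BẢO's SS Galaxy"),
    ("LÂM's iPhone", "TheGioiDiDong.com"),
    ("BẢO's SS Galaxy", "TheGioiDiDong.com"),
]


def neighbours(edges):
    """Undirected adjacency sets built from the edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def reachable(adj, start, max_hops):
    """Nodes within `max_hops` of `start`, found breadth-first."""
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        frontier = {n for f in frontier for n in adj[f]} - seen
        seen |= frontier
    return seen - {start}


def shared_entities(edges, a, b, max_hops=2):
    """Entities that both `a` and `b` reach within `max_hops`."""
    adj = neighbours(edges)
    return reachable(adj, a, max_hops) & reachable(adj, b, max_hops)


print(shared_entities(edges, "LÂM", "BẢO"))  # {'TheGioiDiDong.com'}
```

A shared entity is only a hint that a relationship may exist, which is why the slide phrases its conclusions as questions.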
 Data Visualization

 Building a portal from open data and
free services
Visualization of Data




                        A scan of the top million web
                        sites (per Alexa traffic data)
                        was performed in early 2010


                        Source http://nmap.org/favicon/
Second Life
Second Life is a 3D world where everyone you see is a real person and
every place you visit is built by people just like you.
3D Visualization in SL
SL: The Opportunity for "Edutainment"

Drexel Island on Second Life (screenshots):
  iSchool
  Teaching: Quizzes and Lectures
  Classrooms with PowerPoint
  Research Center
3-D Environments

  http://www.secondlife.com/
  http://3rdrockgrid.com/
  http://www.craft-world.org
  http://www.osgrid.org/
  http://youralternativelife.com//
Visualization To Suggest New
Experiments
Building A Portal From Open Data And
 Free Services


 Freely hosted wiki service
 Google Spreadsheet
 Google Docs API / JavaScript
 Visualization services / analysis services (2D, 3D)
 RDF / Semantic Web / Web services
 Cost: free or fit for the purpose
Key To Success




[Pyramid diagram, top to bottom:]
  Model
  Information
  Data
  Records
+ Transparency
Demonstration
 Google Docs
 Second Life
References


O'Reilly, Beautiful Data, Chapter 16:
 Beautifying Data in the Real World
http://techland.time.com/2011/06/01/how-big-is-the-internet-spoiler-not-as-big-as-itll-be-in-2015/
http://drexelisland.wikispaces.com/
SMILES to 3D, Second Life:
 http://www.youtube.com/watch?v=tOfhuoRbnCg&feature=player_embedded
