Tetherless World Constellation




   Data: Big and Broad
             Jim Hendler
    Tetherless World Constellation
Tetherless World Professor of Computer and Cognitive Science
            Head, Computer Science Department

   Rensselaer Polytechnic Institute
   http://www.cs.rpi.edu/~hendler
         @jahendler (twitter)
Outline (if I stick to it)

• What is big data?
• How big is big?
• What is big data on the Web?
• What is Broad data?
• Got an example?
• What’s the problem?
• What’s going on
Useful Terms
• Machine-readable Data
   – Information available in a form that is accessible and
     manipulable by computer
   – Accessible ≠ Manipulable
      • e.g., PDF documents can be read in and displayed, but the
        information in the document is not readily available without special
        tooling
• Metadata
   – Information associated with (machine-readable) data that
     provides information about the data set
• Workflow, Provenance, and lots of other terms
   – Useful sorts of metadata with respect to who created the data,
     when, how was it processed, etc.
• Metadata and the other stuff most useful when it is
  machine-readable and openly available in commonly agreed
  upon formats
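
As a minimal sketch of the terms above, a machine-readable metadata record might carry descriptive fields plus provenance (who created the data, when, how it was processed). The class names, field names and values here are hypothetical and not taken from any particular standard; JSON is assumed as the commonly agreed upon format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ProvenanceStep:
    # One processing step applied to the data (hypothetical fields).
    agent: str       # who performed the step
    action: str      # what was done
    timestamp: str   # when, as an ISO 8601 string


@dataclass
class DatasetMetadata:
    # Metadata: information about the data set, kept apart from the data itself.
    title: str
    creator: str
    created: str
    data_format: str
    provenance: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so the provenance
        # steps are serialized along with the top-level fields.
        return json.dumps(asdict(self), indent=2)


# Hypothetical values throughout.
record = DatasetMetadata(
    title="Example rainfall measurements",
    creator="Example Agency",
    created="2012-07-09",
    data_format="text/csv",
    provenance=[
        ProvenanceStep("Example Agency", "collected", "2012-06-01T00:00:00Z"),
        ProvenanceStep("Example Lab", "converted PDF tables to CSV",
                       "2012-07-01T00:00:00Z"),
    ],
)

print(record.to_json())
```

Because the record is plain JSON, another program can read the provenance without special tooling, which is the sense of "manipulable" used above.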
BIG Data is NOT the Web of Data
• The term “Big Data” is widely used
  nowadays to refer to a whole bunch of
  machine-readable data in one accessible
  (to the researcher) place
   – 3 main contexts
    • The large data collections of “big science” projects
       – in traditional data warehouse or database formats
    • The enterprise data of large, non-Web-based
      companies (IBM, TATA, etc.)
       – Generally in multiple
    • The data holdings of a Google, Facebook or other
      large Web company
       – Include large “unstructured” holdings
       – Include “graph” data
Tera, Peta, Zetta, yotta, yotta, yotta…
• World Wide Web data is extremely large
• Extremely well “funded”
  – e.g., Facebook
     • 25 terabytes of logged data per day; valuation $33B (US
       NIH budget ~ $31B)
  – e.g., Google
     • In 2008 it was estimated at 20 petabytes per day (not
       including YouTube); current valuation $190B (about 1/3
       the entire US DoD budget)

• And really, really fascinating stuff
  – Data about people and their relationships
     •   To each other
     •   To products
     •   To activities and actions
     •   …
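
To put the quoted volumes on one scale, here is a small arithmetic sketch. It assumes decimal (SI) prefixes and takes the two daily figures at face value, even though they come from different years; the helper name `in_units` is hypothetical.

```python
# Decimal (SI) byte prefixes.
PREFIXES = {
    "tera": 10**12,
    "peta": 10**15,
    "exa": 10**18,
    "zetta": 10**21,
    "yotta": 10**24,
}

# Daily volumes as quoted on the slide.
facebook_per_day = 25 * PREFIXES["tera"]   # 25 terabytes of logged data
google_per_day = 20 * PREFIXES["peta"]     # 20 petabytes (2008 estimate)


def in_units(n_bytes: int, prefix: str) -> float:
    """Express a byte count in the given SI prefix."""
    return n_bytes / PREFIXES[prefix]


# How many times larger the Google figure is than the Facebook one.
ratio = google_per_day // facebook_per_day

# Facebook's daily figure accumulated over a 365-day year, in petabytes.
facebook_per_year_pb = in_units(facebook_per_day * 365, "peta")

print(f"Google/Facebook daily ratio: {ratio}x")
print(f"Facebook per year: {facebook_per_year_pb} PB")
```

On these numbers the Google estimate is 800 times the Facebook one, and a year of the Facebook figure comes to roughly 9 petabytes.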
How BIG is Big?

BIG Data

Google uses their data in many ways
         Search => ads => user
Big Data is becoming different on the Web

• New Work
  – is moving away from traditional relational
   models
     • cf. NoSQL
  – Moving towards third party application and
    extension
     • cf. Mobile apps for local governments
  – Includes a focus on interoperability and
    exchange with “lightweight” semantics
     • Using ideas from the Semantic Web
        – Search: Schema.org
        – Social Networking: OGP
Which in part gives rise to BROAD data

• 4th context: Broad Data
  – The huge amount of freely available, but widely varied,
    Open Data on the World Wide Web (Structured and
    Semi-structured)
     • Example: The extended Facebook OGP graph (the
       part outside Facebook’s datasets)
     • Example: The growing linked open data cloud of
       freely available RDF linked data
     • Example: Hundreds of thousands of datasets that are
       available on the Web free from governments around
       the world
Example: adding “Breadth”

                    April 2010
Facebook’s Open Graph Protocol

• Facebook now allows other sites to extend the graph
• Open Graph Protocol uses RDFa to let web sites contain
  information about the things people “like”
       og:title - The title of your object as it should appear within the graph, e.g., "The Rock".
       og:type - The type of your object, e.g., "movie". Depending on the type you specify, other
       properties may also be required.
       og:image - An image URL which should represent your object within the graph.
       og:url - The canonical URL of your object that will be used as its permanent ID in the graph
       og:description - A one to two sentence description of your object.
       og:site_name - If your object is part of a larger web site, the name which should be
       displayed for the overall site. e.g., "IMDb".




   – Not a traditional “ontology”
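
A minimal sketch of how a third-party program might read these properties from a page, assuming they appear as `<meta property="og:…" content="…">` tags in the HTML head. The sample page, its example.com URLs, and the names `OpenGraphParser` and `extract_open_graph` are hypothetical; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects og:* properties from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        if prop.startswith("og:") and "content" in attr_map:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, attr_map["content"])


def extract_open_graph(html: str) -> dict:
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


# Hypothetical page; the title, type and site name echo the slide's examples.
SAMPLE_PAGE = """
<html>
  <head>
    <title>The Rock (1996)</title>
    <meta property="og:title" content="The Rock">
    <meta property="og:type" content="movie">
    <meta property="og:url" content="http://example.com/title/the-rock/">
    <meta property="og:image" content="http://example.com/rock.jpg">
    <meta property="og:site_name" content="IMDb">
  </head>
  <body></body>
</html>
"""

og = extract_open_graph(SAMPLE_PAGE)
for name, value in og.items():
    print(f"{name}: {value}")
```

The result is a flat set of property/value pairs about one object, which fits the slide's point that this is lightweight markup, not a traditional ontology.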
Big Data

Facebook generates terabytes of data per day
          What could be learned from this?
Creates a platform for SW-powered apps

BROAD data challenges

• For broad data the new challenges
  that emerge include
  – (Web-scale) data search
  – “Crowd-sourced” modeling
  – rapid (and potentially ad hoc)
    integration of datasets
  – visualization and analysis of only
    partially modeled datasets
  – policies for data use, reuse and
    combination.
Huh?

“The more I work with data, the more I
realize I need Semantics”

 Huh?

The traditional database community has,
umm, not always been the first to embrace
semantics

What is different here?
Government Data Sharing

The Web of Open
Government Data is Growing
• Analytics based on over 1,000,000 datasets
  from around the world can be seen at
   – http://logd.tw.rpi.edu/iogds_data_analytics
• The examples that follow are from that page
Datasets                 1,028,054
Countries                43
Catalogs                 192
Categories               2,460
Languages                24
          2012 International Open Government Data Conference—Open Gov Data Tutorial
9 July 2012                                                                           17
International




Many others…




                                                   Important note:
                                                   quantity is not really the most
                                                   important issue

Topics (Across All Catalogs)




Combining data from different data sharing sites

Data Integration Problems

Head-to-head comparisons show that
burglaries in Avon and Somerset (UK) far
exceed those in Los Angeles, California
(one of the highest-crime areas in the US)
The problem is (likely) semantics

                                                        Same or
                                                        different?




Do the terms mean the same? Are they collected in the same way? Are
they processed differently? …
Example: Water

Example: Water/Kenya

Finding Data

World Bank: Africa     Africover: Agriculture




 Kenya: Agricultural   US Data.gov: Crop
5 Star Data

Broad Data “Integration”
requires simple semantics
Example: any Wikipedia topic!

Arizona

Arizona info (From the previous)

USDA data turns out to be crucial

Metadata is crucial for Broad Data
• Metadata design is crucial to govt data
  sharing
  – Needed for search and federation in large data
    sharing efforts
• International data sharing
  – W3C Govt Linked Data Working Group
  – Need for vocabularies within govt sectors
     • Esp. for cross-language use
        – How can we compare health (or legal, or social, or …) data
          between countries like US, UK, India, Kenya (English) with
          Norway, China, France, etc.?
        – How can we link local govts (in traditional languages, local
          dialects, etc.) with national data?
Database metadata

Dataset extension to schema.org (pending)
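
The slide describes the Dataset extension as pending. As a rough sketch of what such dataset metadata can look like, the example below builds a JSON-LD description using property names from the schema.org `Dataset` and `DataCatalog` types as later published. Treating those as the extension meant here is an assumption, and the function name and all values are hypothetical.

```python
import json


def dataset_jsonld(name, description, url, keywords, catalog_name, language):
    """Build a schema.org Dataset description as a JSON-LD string."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
        "inLanguage": language,
        # Links the dataset back to the catalog that lists it.
        "includedInDataCatalog": {
            "@type": "DataCatalog",
            "name": catalog_name,
        },
    }
    return json.dumps(doc, indent=2)


# Hypothetical dataset and catalog.
example = dataset_jsonld(
    name="Example crop yields by county",
    description="Annual crop yield figures, one row per county.",
    url="http://example.org/datasets/crop-yields",
    keywords=["agriculture", "crops"],
    catalog_name="Example Open Data Catalog",
    language="en",
)

print(example)
```

A shared vocabulary of this kind is what lets a search engine or a federated catalog find datasets across many publishers without a custom adapter for each one.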

Government Data in the linked open data cloud

    Government Data is
    currently over ½ the cloud in
    size (~17B triples), 10s of
    thousands of links to other
    data (within and without)

http://linkeddata.org/
Research in Govt Data => Broad Data challenges

• Trust
   – Government data is controversial, and potentially biased
       • How do we confirm or dispute?
• Combination
   – When we combine data we need to keep the provenance of
     information (see trust)
       • How do we make policies explicit and sharable
• Scaling
   – Our project has already converted 9.9B triples from just over
     2,000 of the 710,000 government databases we can identify
     (116 catalogs, 32 countries, 16 languages)
       • Cross-catalog
       • Cross-language
• Versioning and updating
• Archiving
• Visualization
Big Data needs bigger ideas
            for visualization
      (Fox & Hendler, Science, 2/11/10)
A new idea we’re playing with at RPI

• Data as “exhibition”
  – Museums/Performing Arts have explored
    accessibility for real world artifacts, can
    we extend these to the data web?
• Data via physical
  interaction
  – Using theatre techniques
    we can literally move a
    person through a data landscape, what
    new metaphors does this open up?
Conclusions
• Big data is going Broad
  – World Wide Web trend towards more and more
    varied data
     • In many domains
        – E-commerce, Open Govt, many more (cf.
          Health/Medical care)

• Broad data requires thinking outside the
  “Database” box
  – Including considering access
• Broad data opens exciting possibilities for
  research and innovation
  – And I hope will help provide tools for making
    data more accessible

Mais conteúdo relacionado

Mais procurados

The Semantic Web: 2010 Update
The Semantic Web: 2010 Update The Semantic Web: 2010 Update
The Semantic Web: 2010 Update James Hendler
 
Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveJames Hendler
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science James Hendler
 
The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration James Hendler
 
"Why the Semantic Web will Never Work" (note the quotes)
"Why the Semantic Web will Never Work"  (note the quotes)"Why the Semantic Web will Never Work"  (note the quotes)
"Why the Semantic Web will Never Work" (note the quotes)James Hendler
 
Social Machines: The coming collision of Artificial Intelligence, Social Netw...
Social Machines: The coming collision of Artificial Intelligence, Social Netw...Social Machines: The coming collision of Artificial Intelligence, Social Netw...
Social Machines: The coming collision of Artificial Intelligence, Social Netw...James Hendler
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)University of Washington
 
Big Data Talent in Academic and Industry R&D
Big Data Talent in Academic and Industry R&DBig Data Talent in Academic and Industry R&D
Big Data Talent in Academic and Industry R&DUniversity of Washington
 
The Other HPC: High Productivity Computing
The Other HPC: High Productivity ComputingThe Other HPC: High Productivity Computing
The Other HPC: High Productivity ComputingUniversity of Washington
 
The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?Frank van Harmelen
 
Data Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionData Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionUniversity of Washington
 
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Big Data Spain
 
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015Jonathan Woodward
 
Knowledge discoverylaurahollink
Knowledge discoverylaurahollinkKnowledge discoverylaurahollink
Knowledge discoverylaurahollinkSSSW
 
HyperMembrane Structures for Open Source Cognitive Computing
HyperMembrane Structures for Open Source Cognitive ComputingHyperMembrane Structures for Open Source Cognitive Computing
HyperMembrane Structures for Open Source Cognitive ComputingJack Park
 
Data Science For Social Scientists Workshop
Data Science For Social Scientists WorkshopData Science For Social Scientists Workshop
Data Science For Social Scientists WorkshopIan Hopkinson
 
Data, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data ScienceData, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data ScienceUniversity of Washington
 

Mais procurados (20)

The Semantic Web: 2010 Update
The Semantic Web: 2010 Update The Semantic Web: 2010 Update
The Semantic Web: 2010 Update
 
Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspective
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science
 
The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration
 
"Why the Semantic Web will Never Work" (note the quotes)
"Why the Semantic Web will Never Work"  (note the quotes)"Why the Semantic Web will Never Work"  (note the quotes)
"Why the Semantic Web will Never Work" (note the quotes)
 
Social Machines: The coming collision of Artificial Intelligence, Social Netw...
Social Machines: The coming collision of Artificial Intelligence, Social Netw...Social Machines: The coming collision of Artificial Intelligence, Social Netw...
Social Machines: The coming collision of Artificial Intelligence, Social Netw...
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow Tutorial
 
Intro to Data Science Concepts
Intro to Data Science ConceptsIntro to Data Science Concepts
Intro to Data Science Concepts
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
 
Big Data Talent in Academic and Industry R&D
Big Data Talent in Academic and Industry R&DBig Data Talent in Academic and Industry R&D
Big Data Talent in Academic and Industry R&D
 
The Other HPC: High Productivity Computing
The Other HPC: High Productivity ComputingThe Other HPC: High Productivity Computing
The Other HPC: High Productivity Computing
 
The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?
 
Data Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionData Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data Interaction
 
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
 
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015
 
Knowledge discoverylaurahollink
Knowledge discoverylaurahollinkKnowledge discoverylaurahollink
Knowledge discoverylaurahollink
 
Science Data, Responsibly
Science Data, ResponsiblyScience Data, Responsibly
Science Data, Responsibly
 
HyperMembrane Structures for Open Source Cognitive Computing
HyperMembrane Structures for Open Source Cognitive ComputingHyperMembrane Structures for Open Source Cognitive Computing
HyperMembrane Structures for Open Source Cognitive Computing
 
Data Science For Social Scientists Workshop
Data Science For Social Scientists WorkshopData Science For Social Scientists Workshop
Data Science For Social Scientists Workshop
 
Data, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data ScienceData, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data Science
 

Semelhante a Data Big and Broad (Oxford, 2012)

Semantic Web: "ten year" update
Semantic Web: "ten year" updateSemantic Web: "ten year" update
Semantic Web: "ten year" updateJames Hendler
 
First they have to find it: Getting Open Government Data Discovered and Used
First they have to find it: Getting Open Government Data Discovered and UsedFirst they have to find it: Getting Open Government Data Discovered and Used
First they have to find it: Getting Open Government Data Discovered and UsedRensselaer Polytechnic Institute
 
Big Data on the Web – What We Will Do
Big Data on the Web – What We Will Do Big Data on the Web – What We Will Do
Big Data on the Web – What We Will Do Haklae Kim
 
Prateek Jain dissertation defense, Kno.e.sis, Wright State University
Prateek Jain dissertation defense, Kno.e.sis, Wright State UniversityPrateek Jain dissertation defense, Kno.e.sis, Wright State University
Prateek Jain dissertation defense, Kno.e.sis, Wright State UniversityPrateek Jain
 
Linked Open Data Alignment and Enrichment Using Bootstrapping Based Techniques
Linked Open Data Alignment and Enrichment Using Bootstrapping Based TechniquesLinked Open Data Alignment and Enrichment Using Bootstrapping Based Techniques
Linked Open Data Alignment and Enrichment Using Bootstrapping Based TechniquesPrateek Jain
 
The Semantic Web: 2010 Update
The Semantic Web: 2010 UpdateThe Semantic Web: 2010 Update
The Semantic Web: 2010 UpdateJames Hendler
 
The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?Anna Fensel
 
Linked Open Government Data: What’s Next?
Linked Open Government Data:  What’s Next?Linked Open Government Data:  What’s Next?
Linked Open Government Data: What’s Next?Li Ding
 
20111120 warsaw learning curve by b hyland notes
20111120 warsaw   learning curve by b hyland notes20111120 warsaw   learning curve by b hyland notes
20111120 warsaw learning curve by b hyland notesBernadette Hyland-Wood
 
The CSO Open Data Experience
The CSO Open Data ExperienceThe CSO Open Data Experience
The CSO Open Data ExperienceDublinked .
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabadKelly Technologies
 
Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)James Hendler
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving UpPaco Nathan
 
The technical case for a semantic web
The technical case for a semantic webThe technical case for a semantic web
The technical case for a semantic webTony Dobaj
 
Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its ChallengesKathirvel Ayyaswamy
 

Semelhante a Data Big and Broad (Oxford, 2012) (20)

Semantic Web: "ten year" update
Semantic Web: "ten year" updateSemantic Web: "ten year" update
Semantic Web: "ten year" update
 
The Future of LOD
The Future of LODThe Future of LOD
The Future of LOD
 
First they have to find it: Getting Open Government Data Discovered and Used
First they have to find it: Getting Open Government Data Discovered and UsedFirst they have to find it: Getting Open Government Data Discovered and Used
First they have to find it: Getting Open Government Data Discovered and Used
 
Prateek Jain's Dissertation Defense - Linked Open Data Alignment and Querying
Prateek Jain's Dissertation Defense - Linked Open Data Alignment and QueryingPrateek Jain's Dissertation Defense - Linked Open Data Alignment and Querying
Prateek Jain's Dissertation Defense - Linked Open Data Alignment and Querying
 
Big Data on the Web – What We Will Do
Big Data on the Web – What We Will Do Big Data on the Web – What We Will Do
Big Data on the Web – What We Will Do
 
Prateek Jain dissertation defense, Kno.e.sis, Wright State University
Prateek Jain dissertation defense, Kno.e.sis, Wright State UniversityPrateek Jain dissertation defense, Kno.e.sis, Wright State University
Prateek Jain dissertation defense, Kno.e.sis, Wright State University
 
Linked Open Data Alignment and Enrichment Using Bootstrapping Based Techniques
Linked Open Data Alignment and Enrichment Using Bootstrapping Based TechniquesLinked Open Data Alignment and Enrichment Using Bootstrapping Based Techniques
Linked Open Data Alignment and Enrichment Using Bootstrapping Based Techniques
 
The Semantic Web: 2010 Update
The Semantic Web: 2010 UpdateThe Semantic Web: 2010 Update
The Semantic Web: 2010 Update
 
PhD Proposal Defense - Prateek Jain
PhD Proposal Defense - Prateek JainPhD Proposal Defense - Prateek Jain
PhD Proposal Defense - Prateek Jain
 
The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?
 
Linked Open Government Data: What’s Next?
Linked Open Government Data:  What’s Next?Linked Open Government Data:  What’s Next?
Linked Open Government Data: What’s Next?
 
Big dataorig
Big dataorigBig dataorig
Big dataorig
 
20111120 warsaw learning curve by b hyland notes
20111120 warsaw   learning curve by b hyland notes20111120 warsaw   learning curve by b hyland notes
20111120 warsaw learning curve by b hyland notes
 
The CSO Open Data Experience
The CSO Open Data ExperienceThe CSO Open Data Experience
The CSO Open Data Experience
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 
Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving Up
 
The technical case for a semantic web
The technical case for a semantic webThe technical case for a semantic web
The technical case for a semantic web
 
Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its Challenges
 

Mais de James Hendler

Knowing what AI Systems Don't know and Why it matters
Knowing what AI  Systems Don't know and Why it mattersKnowing what AI  Systems Don't know and Why it matters
Knowing what AI Systems Don't know and Why it mattersJames Hendler
 
Exploring the Boundaries of Artificial Intelligence (or "Modern AI")
Exploring the Boundaries of Artificial Intelligence (or "Modern AI")Exploring the Boundaries of Artificial Intelligence (or "Modern AI")
Exploring the Boundaries of Artificial Intelligence (or "Modern AI")James Hendler
 
Tragedy of the (Data) Commons
Tragedy of the (Data) CommonsTragedy of the (Data) Commons
Tragedy of the (Data) CommonsJames Hendler
 
Knowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityKnowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityJames Hendler
 
The Future(s) of the World Wide Web
The Future(s) of the World Wide WebThe Future(s) of the World Wide Web
The Future(s) of the World Wide WebJames Hendler
 
Enhancing Precision Wellness with Personal Health Knowledge Graphs
Enhancing Precision Wellness with Personal Health Knowledge Graphs Enhancing Precision Wellness with Personal Health Knowledge Graphs
Enhancing Precision Wellness with Personal Health Knowledge Graphs James Hendler
 
The Future of AI: Going Beyond Deep Learning, Watson, and the Semantic Web
The Future of AI: Going BeyondDeep Learning, Watson, and the Semantic WebThe Future of AI: Going BeyondDeep Learning, Watson, and the Semantic Web
The Future of AI: Going Beyond Deep Learning, Watson, and the Semantic WebJames Hendler
 
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...James Hendler
 
Enhancing Precision Wellness with Knowledge Graphs and Semantic Analytics: O...
Enhancing Precision Wellness with  Knowledge Graphs and Semantic Analytics: O...Enhancing Precision Wellness with  Knowledge Graphs and Semantic Analytics: O...
Enhancing Precision Wellness with Knowledge Graphs and Semantic Analytics: O...James Hendler
 
KR in the age of Deep Learning
KR in the age of Deep LearningKR in the age of Deep Learning
KR in the age of Deep LearningJames Hendler
 
Digital Archiving, The Semantic Web, and Modern AI
Digital Archiving, The Semantic Web, and Modern AIDigital Archiving, The Semantic Web, and Modern AI
Digital Archiving, The Semantic Web, and Modern AIJames Hendler
 
Social Machines - 2017 Update (University of Iowa)
Social Machines - 2017 Update (University of Iowa)Social Machines - 2017 Update (University of Iowa)
Social Machines - 2017 Update (University of Iowa)James Hendler
 
Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...
Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...
Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...James Hendler
 
Artificial Intelligence: Existential Threat or Our Best Hope for the Future?
Artificial Intelligence: Existential Threat or Our Best Hope for the Future?Artificial Intelligence: Existential Threat or Our Best Hope for the Future?
Artificial Intelligence: Existential Threat or Our Best Hope for the Future?James Hendler
 
Watson: An Academic's Perspective
Watson: An Academic's PerspectiveWatson: An Academic's Perspective
Watson: An Academic's PerspectiveJames Hendler
 
Big Data and Computer Science Education
Big Data and Computer Science EducationBig Data and Computer Science Education
Big Data and Computer Science EducationJames Hendler
 
Watson at RPI - Summer 2013
Watson at RPI - Summer 2013Watson at RPI - Summer 2013
Watson at RPI - Summer 2013James Hendler
 
Future of the World WIde Web (India)
Future of the World WIde Web (India)Future of the World WIde Web (India)
Future of the World WIde Web (India)James Hendler
 

Data Big and Broad (Oxford, 2012)

  • 1. Tetherless World Constellation Data: Big and Broad Jim Hendler Tetherless World Constellation Tetherless World Professor of Computer and Cognitive Science Head, Computer Science Department Rensselaer Polytechnic Institute http://www.cs.rpi.edu/~hendler @jahendler (twitter)
  • 2. Outline (if I stick to it) • What is big data? • How big is big? • What is big data on the Web? • What is Broad data? • Got an example? • What’s the problem? • What’s going on?
  • 3. Useful Terms • Machine-readable Data – Information available in a form that is accessible and manipulable by computer – Accessible ≠ Manipulable • e.g., PDF documents can be read in and displayed, but the information in the document is not readily available without special tooling • Metadata – Information associated with (machine-readable) data that provides information about the data set • Workflow, Provenance, and lots of other terms – Useful sorts of metadata with respect to who created the data, when, how it was processed, etc. • Metadata and the other stuff are most useful when machine-readable and openly available in commonly agreed-upon formats
  • 4. BIG Data is NOT the Web of Data • The term “Big Data” is widely used nowadays to refer to a whole bunch of machine-readable data in one accessible (to the researcher) place – 3 main contexts • The large data collections of “big science” projects – in traditional data warehouse or database formats • The enterprise data of large, non-Web-based companies (IBM, TATA, etc.) – Generally in multiple • The data holdings of a Google, Facebook or other large Web company – Include large “unstructured” holdings – Include “graph” data
  • 5. Tera, Peta, Zetta, yotta, yotta, yotta… • World Wide Web data is extremely large • Extremely well “funded” – e.g., Facebook • 25 terabytes of logged data per day; valuation $33B (US NIH budget ~$31B) – e.g., Google • In 2008 it was estimated at 20 petabytes per day (not including YouTube); current valuation $190B (about 1/3 of the entire US DoD budget) • And really, really fascinating stuff – Data about people and their relationships • To each other • To products • To activities and actions • …
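To put the per-day figures on slide 5 on a common scale, here is a small arithmetic sketch. It assumes decimal (SI) units and a 365-day year, neither of which the slide specifies, and it simply extrapolates the quoted daily rates as if they were constant.

```python
# Decimal (SI) byte units; the slide does not say whether binary units were meant.
TB = 10**12
PB = 10**15
EB = 10**18

# Daily volumes as quoted on the slide.
facebook_per_day = 25 * TB   # logged data per day
google_per_day = 20 * PB     # 2008 estimate, not including YouTube


def per_year(bytes_per_day, days=365):
    """Extrapolate a constant daily volume to a yearly total (an assumption)."""
    return bytes_per_day * days


facebook_per_year_pb = per_year(facebook_per_day) / PB
google_per_year_eb = per_year(google_per_day) / EB
ratio = google_per_day // facebook_per_day

print(f"Facebook: about {facebook_per_year_pb} PB per year")
print(f"Google:   about {google_per_year_eb} EB per year")
print(f"Google's daily figure is {ratio} times Facebook's")
```

On these assumptions the two quoted rates differ by nearly three orders of magnitude, which is why the slide steps through several unit prefixes.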
  • 6. How BIG is Big?
  • 7. BIG Data: Google uses their data in many ways. Search => ads => user
  • 8. Big Data is becoming different on the Web • New Work – is moving away from traditional relational models • cf. NoSQL – Moving towards third party application and extension • cf. Mobile apps for local governments – Includes a focus on interoperability and exchange with “lightweight” semantics • Using ideas from the Semantic Web – Search: Schema.org – Social Networking: OGP
  • 9. Which in part gives rise to BROAD data • 4th context: Broad Data – The huge amount of freely available, but widely varied, Open Data on the World Wide Web (Structured and Semi-structured) • Example: The extended Facebook OGP graph (the part outside Facebook’s datasets) • Example: The growing linked open data cloud of freely available RDF linked data • Example: Hundreds of thousands of datasets that are available on the Web free from governments around the world
  • 10. Example: adding “Breadth” (April 2010)
  • 11. Facebook’s Open Graph Protocol • Facebook now allows other sites to extend the graph • Open Graph Protocol uses RDFa to let web sites contain information about the things people “like” – og:title: The title of your object as it should appear within the graph, e.g., "The Rock". – og:type: The type of your object, e.g., "movie". Depending on the type you specify, other properties may also be required. – og:image: An image URL which should represent your object within the graph. – og:url: The canonical URL of your object that will be used as its permanent ID in the graph. – og:description: A one to two sentence description of your object. – og:site_name: If your object is part of a larger web site, the name which should be displayed for the overall site, e.g., "IMDb". – Not a traditional “ontology”
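As an illustration of how lightweight the properties on slide 11 are, the following minimal sketch reads `og:` properties out of a page using only Python's standard library. The sample page, its URLs, and the names `OpenGraphParser` and `extract_open_graph` are made up for this example; the property names and the "The Rock" / "movie" / "IMDb" values come from the slide.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)


def extract_open_graph(html):
    """Return a dict of Open Graph properties found in an HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


# Hypothetical page; example.org URLs are placeholders, not real resources.
SAMPLE_PAGE = """
<html prefix="og: http://ogp.me/ns#">
<head>
  <title>The Rock (1996)</title>
  <meta property="og:title" content="The Rock" />
  <meta property="og:type" content="movie" />
  <meta property="og:url" content="https://example.org/the-rock" />
  <meta property="og:image" content="https://example.org/the-rock.jpg" />
  <meta property="og:site_name" content="IMDb" />
</head>
<body></body>
</html>
"""

og = extract_open_graph(SAMPLE_PAGE)
for name, value in sorted(og.items()):
    print(f"{name}: {value}")
```

Nothing here depends on a formal ontology: a consumer only needs to recognize a handful of agreed property names, which is the sense in which the slide calls this "not a traditional ontology".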
  • 12. Big Data: Facebook generates terabytes of data per day. What could be learned from this?
  • 13. Creates a platform for SW-powered apps
  • 14. BROAD data challenges • For broad data the new challenges that emerge include – (Web-scale) data search – “Crowd-sourced” modeling – rapid (and potentially ad hoc) integration of datasets – visualization and analysis of only partially modeled datasets – policies for data use, reuse and combination.
  • 15. Huh? “The more I work with data, the more I realize I need Semantics.” Huh? The traditional database community has, umm, not always been the first to embrace semantics. What is different here?
  • 16. Government Data Sharing
  • 17. The Web of Open Government Data is Growing • Analytics based on over 1,000,000 datasets from around the world can be seen at – http://logd.tw.rpi.edu/iogds_data_analytics • The examples that follow are from that page • Datasets: 1,028,054 • Countries: 43 • Catalogs: 192 • Categories: 2,460 • Languages: 24 (2012 International Open Government Data Conference, Open Gov Data Tutorial, 9 July 2012)
  • 18. International
  • 20. Many others… Important note: quantity is not really the most important issue
  • 21. Topics (Across All Catalogs)
  • 22. Topics (Across All Catalogs)
  • 23. Combining data from different data sharing sites
  • 24. Data Integration Problems: A head-to-head comparison shows that burglaries in Avon and Somerset (UK) far exceed those in Los Angeles, California (one of the highest crime areas in the US)
  • 25. The problem is (likely) semantics: Same or different? Do the terms mean the same? Are they collected in the same way? Are they processed differently? …
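The questions on slide 25 can be turned into an explicit check that runs before two figures are compared. The sketch below is only one way to do that: the `Measure` record, the `ex:` concept identifiers, and all numbers are hypothetical placeholders, not values from the datasets shown in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    source: str       # which catalog the figure came from
    label: str        # the column heading as published
    concept: str      # identifier in a shared vocabulary (hypothetical here)
    per_capita: bool  # True if the value is a rate, False if a raw count
    value: float


def comparison_problems(a, b):
    """List the reasons two measures should not be compared directly."""
    problems = []
    if a.concept != b.concept:
        problems.append(f"different concepts: {a.concept} vs {b.concept}")
    if a.per_capita != b.per_capita:
        problems.append("different normalization: raw count vs per-capita rate")
    return problems


# Placeholder records: both say "Burglary", but are defined differently.
uk = Measure("catalog-A", "Burglary", "ex:burglary-any-premises", False, 250.0)
us = Measure("catalog-B", "Burglary", "ex:burglary-dwelling-only", True, 4.2)

for problem in comparison_problems(uk, us):
    print("cannot compare:", problem)
```

Two columns with the same label pass a string match but fail this check, which is the gap between syntactic and semantic integration that the slide points at.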
  • 28. Finding Data • World Bank: Africa • Africover: Agriculture • Kenya: Agricultural • US Data.gov: Crop
  • 29. 5 Star Data
  • 30. Broad Data “Integration” requires simple semantics
  • 31. Example: any Wikipedia topic!
  • 33. Arizona info (From the previous)
  • 34. USDA data turns out to be crucial
  • 35. Metadata is crucial for Broad Data • Metadata design is crucial to govt data sharing – Needed for search and federation in large data sharing efforts • International data sharing – W3C Govt Linked Data Working Group – Need for vocabularies within govt sectors • Esp. for cross-language use – How can we compare health (or legal, or social, or ….) data between countries like US, UK, India, Kenya (English) with Norway, China, France, etc.? – How can we link local govts (in traditional languages, local dialects, etc.) w/national data?
  • 37. Dataset extension to schema.org (pending)
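Slide 37 describes the schema.org dataset extension as pending; schema.org has since published a `Dataset` type. As a hedged sketch of the kind of metadata record slides 35 and 37 have in mind, the following builds a JSON-LD description with the standard library. The helper name `dataset_jsonld` and every value in the sample record are placeholders invented for this example.

```python
import json


def dataset_jsonld(name, description, url, keywords, language, publisher):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
        "inLanguage": language,
        "publisher": {"@type": "Organization", "name": publisher},
    }


# Placeholder values; data.example.gov is not a real catalog.
record = dataset_jsonld(
    name="Crop production by county (placeholder)",
    description="Illustrative record for a government agriculture dataset.",
    url="https://data.example.gov/dataset/crop-production",
    keywords=["agriculture", "crop"],
    language="en",
    publisher="Example Department of Agriculture",
)

print(json.dumps(record, indent=2, ensure_ascii=False))
```

Because the keys are shared vocabulary terms rather than catalog-specific column names, a search engine or federated catalog can index records like this without knowing anything about the publishing agency.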
  • 38. Government Data in the linked open data cloud: Government Data is currently over ½ the cloud in size (~17B triples), with 10s of thousands of links to other data (within and without) http://linkeddata.org/
  • 39. Research in Govt Data => Broad Data challenges • Trust – Government data is controversial, and potentially biased • How do we confirm or dispute? • Combination – When we combine data we need to keep the provenance of information (see trust) • How do we make policies explicit and sharable? • Scaling – Our project has already converted 9.9B triples from only a little over 2,000 of the 710,000 government databases we can identify (116 catalogs, 32 countries, 16 languages) • Cross-catalog • Cross-language • Versioning and updating • Archiving • Visualization
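The "Combination" point on slide 39 (keep provenance when combining data) can be sketched as a merge that never overwrites a value and always records where it came from. This is an illustrative toy, not the project's pipeline; the function names, source names, keys, and values are all hypothetical.

```python
from collections import defaultdict


def combine(*sources):
    """Merge (source_name, {key: value}) pairs, keeping every value's origin.

    Returns a dict mapping each key to a list of (value, source_name) pairs.
    """
    merged = defaultdict(list)
    for source_name, records in sources:
        for key, value in records.items():
            merged[key].append((value, source_name))
    return dict(merged)


def disputed(merged):
    """Return only the keys for which the sources report different values."""
    return {
        key: claims
        for key, claims in merged.items()
        if len({value for value, _ in claims}) > 1
    }


# Placeholder figures from two hypothetical catalogs.
national = ("national-catalog", {"region-1/population": 1000, "region-2/population": 500})
local = ("local-catalog", {"region-1/population": 1040})

merged = combine(national, local)
for key, claims in disputed(merged).items():
    print(key, "->", claims)
```

Keeping both claims side by side, each tagged with its source, is what makes the "How do we confirm or dispute?" question on the same slide answerable later.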
  • 40. Big Data needs bigger ideas for visualization (Fox & Hendler, Science, 2/11/10)
  • 41. A new idea we’re playing with at RPI • Data as “exhibition” – Museums/Performing Arts have explored accessibility for real world artifacts, can we extend these to the data web? • Data via physical interaction – Using theatre techniques we can literally move a person through a data landscape, what new metaphors does this open up?
  • 42. Conclusions • Big data is going Broad – World Wide Web trend towards more and more varied data • In many domains – E-commerce, Open Govt, many more (cf. Health/Medical care) • Broad data requires thinking outside the “Database” box – Including considering access • Broad data opens exciting possibilities for research and innovation – And I hope will help provide tools for making data more accessible