SlideShare uma empresa Scribd logo
1 de 26
From relational databases to linked
   data:R for the semantic web
            Jose Quesada,
      Max Planck Institute, Berlin
Who this talk targets
• You have big data; you use a database

• You have an evolving schema definition.
  Sometimes at runtime

• You are interested in alternative ways to present
  your data

• You would thrive by using data out there, if only
  they were more accessible
Semantic web
Credit: Jim Hendler

THE TWO TOWERS
The Semantic web
        • Ontology as Barad-dur
          (Sauron’s tower)
          – Extremely powerful

          – Patrolled by Orcs
             • Let one little hobbit in it,
               and the whole thing could
               come crashing down
          – OWL
The Semantic web
        • Ontology as Barad-dur
          (Sauron’s tower)
          – Extremely powerful
                     Decidable logic basis
          – Patrolled by Orcs
                           inconsistency
             • Let one little hobbit in it,
               and the whole thing could
               come crashing down
          – OWL
Inconsistency
The semantic web
        • The tower of Babel
           – We will build a tower to
             reach the sky
           – We only need a little
             ontological agreement
              • Who cares if we all speak
                different languages?
        This is RDFS
        Statistics matter here
        Web-scale
        Lots of data; finding
          anything in the mess can
          be a win
Approaches to data representation

•   Objects
•   Tables (relational databases)
•   Non-relational databases
•   Tables (data.frame)
•   Graphs
What one can do with semantic web data,
now:
 People that died in Nazi Germany and if possible, any
 notable works that they might have created
SELECT *
WHERE {
  ?subject dbpprop:deathPlace
<http://dbpedia.org/resource/Nazi_Germany> .
  OPTIONAL {
    ?subject dbpedia-owl:notableworks ?works
  }
}
subject                          works
:Anne_Frank                      :The_Diary_of_a_Young_Girl
:Martin_Bormann                  -
:Ir%C3%A8ne_N%C3%A9mirovsky -
:Erich_Fellgiebel                -
:Friedrich_Ferdinand%2C_Duke_of_Schleswig-Holstein
                                 -

:Friedrich_Olbricht              -
:Ludwig_Beck                     -
:Erwin_Rommel                    -
:Maurice_Bavaud                  -
:Early_Years_of_Adolf_Hitler     -
:Emil_Zegad%C5%82owicz           -
:Friedrich_Fromm                 -
:Helmuth_James_Graf_von_Moltke
• Scale to the entire web   • Use cases:
                               – Real time city
                               – Cancer monographs for
• Do reasoning with open         WHO
  word assumption              – Gene expression finding

• Retrieval in real-time

• Go beyond logics
RDF is a graph
• We have lots of interesting statistics that run on graphs

• In many Semantic Web (SW) domains a tremendous
  amount of statements (expressed as triples) might be
  true but, in a given domain, only a small number of
  statements is known to be true or can be inferred to be
  true. It thus makes sense to attempt to estimate the
  truth values of statements by exploring regularities in
  the SW data with machine learning
Scale
• You cannot use the entire thing at once:
  subsetting

• Are there patterns in knowledge structures
  that we can use for subsetting?
Idea
• Graph theory applied to subsetting large graphs

• Developing Semantic Web applications requires
  handling the RDF data model in a programming
  language

• Problem: current software is developed in the
  object-oriented paradigm, programming in RDF is
  currently triple-based.
Data
   IMDB is a big graph:
   – 1.4 m movies
   – 1.7 m actors
   – 11 M connections
        • Movies have votes
   – Bipartite network

Packages: igraph:
   –   Nice functions that you cannot find anywhere else
   –   Uses Sparse Matrices
   –   Implemented in C
   –   Some support for bipartite networks

Rmysql, Matrix (sparse m)
Centrality
Centrality
Pagerank
  • The pagerank vector is
    the stationary
    distribution of a markov           1                3
    chain in a link matrix

  • Some assumptions to               2                 4
    warrant convergence

  • The typical value of d
    is .85
norm <- function(x) x/sum(x)
norm(eigen(0.15/nVertices + 0.85 * t(A))$vectors[,1])
Top movies by pageRank
             in the actor->movie network

degree pagerank      cluster imdbID   title                               rank       votes
       0.000243688
  1298 252192870          0    822609Around the World in Eighty Days (1956) 40031 6134
       0.000103540
   313 862390464          0     76352Beyond Our Control" (1968)"               0       0
       0.000091669
   291 0099912811         0    993780Gone to Earth (1950)                 7.0          291
       0.000089025
   285 5923652847         0    915626Deadlands 2: Trapped (2008)          39971         15
       0.000083882
   424 328163772          0 1282574Stuck on You (2003)                    6.0        19709
       0.000080824
   629 1101098043         0    622100Shortland Street" (1992)"          39850        225
Problems
• Graphs have advantages over
  RDBMS/tables[1]. But we are used to think in
  tables
• There is no direct way to handle RDF in R.
  worth an R package?
Linked data are out there for the grabs

We need to start thinking in terms of graphs,
and slowly move away from tables




    Thanks for your attention
    Jose Quesada, quesada@workingcogs.com, http://josequesada.name
    Twitter: @Quesada

Mais conteúdo relacionado

Semelhante a R for the semantic web, Quesada useR 2009

Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 
Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Saltmarch Media
 
Involutionary%20Self-Replicating%20Machines.ppt_1
Involutionary%20Self-Replicating%20Machines.ppt_1Involutionary%20Self-Replicating%20Machines.ppt_1
Involutionary%20Self-Replicating%20Machines.ppt_1David Keirsey
 
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,..."Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...lisapaglia
 
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDBBig Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDBBigDataCloud
 
Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Ben Stopford
 
Introduction to Google BigQuery
Introduction to Google BigQueryIntroduction to Google BigQuery
Introduction to Google BigQueryCsaba Toth
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and GiraphDoug Needham
 
What Does Big Data Mean and Who Will Win
What Does Big Data Mean and Who Will WinWhat Does Big Data Mean and Who Will Win
What Does Big Data Mean and Who Will WinBigDataCloud
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databasesthai
 
PayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL ClusterPayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL ClusterMat Keep
 
DownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
DownTheRabbitHole.js – How to Stay Sane in an Insane EcosystemDownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
DownTheRabbitHole.js – How to Stay Sane in an Insane EcosystemFITC
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, HowIgor Moochnick
 
InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast
InfiniteGraph Presentation from Oct 21, 2010 DBTA WebcastInfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast
InfiniteGraph Presentation from Oct 21, 2010 DBTA WebcastInfiniteGraph
 
A review of the state of the art in Machine Learning on the Semantic Web
A review of the state of the art in Machine Learning on the Semantic WebA review of the state of the art in Machine Learning on the Semantic Web
A review of the state of the art in Machine Learning on the Semantic WebSimon Price
 
Scaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseScaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseAge Mooij
 
BDI- The Beginning (Big data training in Coimbatore)
BDI- The Beginning (Big data training in Coimbatore)BDI- The Beginning (Big data training in Coimbatore)
BDI- The Beginning (Big data training in Coimbatore)Ashok Rangaswamy
 
DownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
DownTheRabbitHole.js – How to Stay Sane in an Insane EcosystemDownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
DownTheRabbitHole.js – How to Stay Sane in an Insane EcosystemFITC
 

Semelhante a R for the semantic web, Quesada useR 2009 (20)

Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?
 
Involutionary%20Self-Replicating%20Machines.ppt_1
Involutionary%20Self-Replicating%20Machines.ppt_1Involutionary%20Self-Replicating%20Machines.ppt_1
Involutionary%20Self-Replicating%20Machines.ppt_1
 
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,..."Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
 
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDBBig Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
 
Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012
 
Introduction to Google BigQuery
Introduction to Google BigQueryIntroduction to Google BigQuery
Introduction to Google BigQuery
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and Giraph
 
What Does Big Data Mean and Who Will Win
What Does Big Data Mean and Who Will WinWhat Does Big Data Mean and Who Will Win
What Does Big Data Mean and Who Will Win
 
Spark
SparkSpark
Spark
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databases
 
PayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL ClusterPayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL Cluster
 
DownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
DownTheRabbitHole.js – How to Stay Sane in an Insane EcosystemDownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
DownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, How
 
InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast
InfiniteGraph Presentation from Oct 21, 2010 DBTA WebcastInfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast
InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast
 
Nosql public
Nosql publicNosql public
Nosql public
 
A review of the state of the art in Machine Learning on the Semantic Web
A review of the state of the art in Machine Learning on the Semantic WebA review of the state of the art in Machine Learning on the Semantic Web
A review of the state of the art in Machine Learning on the Semantic Web
 
Scaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseScaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBase
 
BDI- The Beginning (Big data training in Coimbatore)
BDI- The Beginning (Big data training in Coimbatore)BDI- The Beginning (Big data training in Coimbatore)
BDI- The Beginning (Big data training in Coimbatore)
 
DownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
DownTheRabbitHole.js – How to Stay Sane in an Insane EcosystemDownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
DownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
 

Último

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Último (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

R for the semantic web, Quesada useR 2009

  • 1. From relational databases to linked data:R for the semantic web Jose Quesada, Max Planck Institute, Berlin
  • 2. Who this talk targets • You have big data; you use a database • You have an evolving schema definition. Sometimes at runtime • You are interested in alternative ways to present your data • You would thrive by using data out there, if only they were more accessible
  • 4.
  • 5.
  • 7. The Semantic web • Ontology as Barad-dur (Sauron’s tower) – Extremely powerful – Patrolled by Orcs • Let one little hobbit in it, and the whole thing could come crashing down – OWL
  • 8. The Semantic web • Ontology as Barad-dur (Sauron’s tower) – Extremely powerful Decidable logic basis – Patrolled by Orcs inconsistency • Let one little hobbit in it, and the whole thing could come crashing down – OWL
  • 10. The semantic web • The tower of Babel – We will build a tower to reach the sky – We only need a little ontological agreement • Who cares if we all speak different languages? This is RDFS Statistics matter here Web-scale Lots of data; finding anything in the mess can be a win
  • 11. Approaches to data representation • Objects • Tables (relational databases) • Non-relational databases • Tables (data.frame) • Graphs
  • 12. What one can do with semantic web data, now: People that died in Nazi Germany and if possible, any notable works that they might have created SELECT * WHERE { ?subject dbpprop:deathPlace <http://dbpedia.org/resource/Nazi_Germany> . OPTIONAL { ?subject dbpedia-owl:notableworks ?works } }
  • 13. subject works :Anne_Frank :The_Diary_of_a_Young_Girl :Martin_Bormann - :Ir%C3%A8ne_N%C3%A9mirovsky - :Erich_Fellgiebel - :Friedrich_Ferdinand%2C_Duke_of_Schleswig-Holstein - :Friedrich_Olbricht - :Ludwig_Beck - :Erwin_Rommel - :Maurice_Bavaud - :Early_Years_of_Adolf_Hitler - :Emil_Zegad%C5%82owicz - :Friedrich_Fromm - :Helmuth_James_Graf_von_Moltke
  • 14. • Scale to the entire web • Use cases: – Real time city – Cancer monographs for • Do reasoning with open WHO word assumption – Gene expression finding • Retrieval in real-time • Go beyond logics
  • 15. RDF is a graph • We have lots of interesting statistics that run on graphs • In many Semantic Web (SW) domains a tremendous amount of statements (expressed as triples) might be true but, in a given domain, only a small number of statements is known to be true or can be inferred to be true. It thus makes sense to attempt to estimate the truth values of statements by exploring regularities in the SW data with machine learning
  • 16. Scale • You cannot use the entire thing at once: subsetting • Are there patterns in knowledge structures that we can use for subsetting?
  • 17.
  • 18. Idea • Graph theory applied to subsetting large graphs • Developing Semantic Web applications requires handling the RDF data model in a programming language • Problem: current software is developed in the object-oriented paradigm, programming in RDF is currently triple-based.
  • 19. Data IMDB is a big graph: – 1.4 m movies – 1.7 m actors – 11 M connections • Movies have votes – Bipartite network Packages: igraph: – Nice functions that you cannot find anywhere else – Uses Sparse Matrices – Implemented in C – Some support for bipartite networks Rmysql, Matrix (sparse m)
  • 22. Pagerank • The pagerank vector is the stationary distribution of a markov 1 3 chain in a link matrix • Some assumptions to 2 4 warrant convergence • The typical value of d is .85 norm <- function(x) x/sum(x) norm(eigen(0.15/nVertices + 0.85 * t(A))$vectors[,1])
  • 23.
  • 24. Top movies by pageRank in the actor->movie network degree pagerank cluster imdbID title rank votes 0.000243688 1298 252192870 0 822609Around the World in Eighty Days (1956) 40031 6134 0.000103540 313 862390464 0 76352Beyond Our Control" (1968)" 0 0 0.000091669 291 0099912811 0 993780Gone to Earth (1950) 7.0 291 0.000089025 285 5923652847 0 915626Deadlands 2: Trapped (2008) 39971 15 0.000083882 424 328163772 0 1282574Stuck on You (2003) 6.0 19709 0.000080824 629 1101098043 0 622100Shortland Street" (1992)" 39850 225
  • 25. Problems • Graphs have advantages over RDBMS/tables[1]. But we are used to think in tables • There is no direct way to handle RDF in R. worth an R package?
  • 26. Linked data are out there for the grabs We need to start thinking in terms of graphs, and slowly move away from tables Thanks for your attention Jose Quesada, quesada@workingcogs.com, http://josequesada.name Twitter: @Quesada