in collaboration with  Georgiana Ifrim, Gjergji Kasneci, Josiane Parreira, Maya Ramanath,  Ralf Schenkel, Fabian Suchanek, Martin Theobald
DB and IR: Two Parallel Universes
Database Systems (DB) vs. Information Retrieval (IR):
canonical application: accounting (DB); libraries (IR)
data type: numbers, short strings (DB); text (IR)
foundation: algebraic / logic based (DB); probabilistic / statistics based (IR)
search paradigm: Boolean retrieval with exact queries and result sets/bags (DB); ranked retrieval with vague queries and result lists (IR)
market leaders: Oracle, IBM DB2, MS SQL Server, etc. (DB); Google, Yahoo!, MSN, Verity, Fast, etc. (IR)
Parallel universes forever?
Why DB&IR Now? – Application Needs
Simplify life for application areas.
Typical data: Disease (DId, Name, Category, Pathogen …), UMLS-Categories ( … ), Patient (… Age, HId, Date, Report, TreatedDId), Hospital (HId, Address …)
Typical query: symptoms of tropical virus diseases and reported anomalies with young patients in central Europe in the last two weeks
Why DB&IR Now? – Platform Desiderata
Structured data (records) + structured search (SQL, XQuery): DB Systems
Unstructured data (documents) + unstructured search (keywords): IR Systems, Search Engines
Structured data (records) + unstructured search (keywords): Keyword Search on Relational Graphs (IIT Bombay, UCSD, MSR, Hebrew U, CU Hong Kong, Duke U, ...)
Unstructured data (documents) + structured search (SQL, XQuery): Querying entities & relations from IE (MSR Beijing, UW Seattle, IBM Almaden, UIUC, MPI, …)
Platform desiderata (from the app developer's viewpoint) for an Integrated DB&IR Platform
Why DB&IR Forever? Turn the Web, Web2.0, and Web3.0 into the world's most comprehensive knowledge base ("semantic DB")!
                          2000           2007
indexed Web               2 billion      20 billion
Flickr photos             ---            100 million
digital photos            ?              150 billion
Wikipedia                 8 000          1.8 million
OECD researchers          7.4 million    8.4 million
patents world-wide        ?              60 million
US Library of Congress    115 million    134 million
Google Scholar            ---            500 million
Outline • Past: Matter, Antimatter, and Wormholes • Present: XML and Graph IR • Future: From Data to Knowledge
Parallel Universes: A Closer Look (Matter vs. Antimatter)
Timeline of DB and IR work, 1990 to 2005: VAGUE (Motro); Proximal Nodes (Baeza-Yates et al.); WHIRL (Cohen); Prob. Datalog (Fuhr et al.); INEX; XPath; XPath Full-Text; Prob. DB (Cavallo & Pittarelli); Prob. Tuples (Barbara et al.); Web Entity Search: Libra, Avatar, ExDB …; Faceted Search: Flamenco …; 1st Gen. XML IR: XXL, XIRQL, Elixir, JuruXML; Multimedia IR; Web Query Languages: W3QS, WebOQL, Araneus …; Semistructured Data: Lore, Xyleme …; 2nd Gen. XML IR: XRank, Timber, TIJAH, XSearch, FleXPath, CoXML, TopX, MarkLogic, Fast …; Uncertain & Prob. Relations: Mystiq, Trio …; Struct. Docs; Deep Web Search; Digital Libraries; Graph IR
WHIRL: IR over Relations [W.W. Cohen: SIGMOD'98]
Add text-similarity selection and join to relational algebra.
Example: Select * From Movies M, Reviews R Where M.Plot ~ "fight" And M.Year > 1990 And R.Rating > 3 And M.Title ~ R.Title And M.Plot ~ R.Comment
Movies (Title, Plot, …, Year):
  Matrix | In the near future … computer hacker Neo … fight training … | 1999
  Hero | In ancient China … fights … sword fight … fights Broken Sword … | 2002
  Shrek 2 | In Far Far Away … our lovely hero fights with cat killer … | 2004
Reviews (Title, Comment, …, Rating):
  Matrix 1 | … cool fights … new techniques … | 4
  Matrix Reloaded | … fights … and more fights … fairly boring … | 1
  Matrix Eigenvalues | … matrix spectrum … orthonormal … | 5
  Ying xiong aka. Hero | … fight for peace … sword fight … dramatic colors … | 5
Scoring and ranking:
  s(<x,y>, q: A~B) = cosine(x.A, y.B)
  s(<x,y>, q_1 ∧ … ∧ q_m) = ∏_i s(<x,y>, q_i)
  x_j ~ tf(word j in x) · idf(word j), with dampening & normalization
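The similarity join M.Title ~ R.Title can be sketched in a few lines of Python. This is only an illustration of cosine scoring over tf·idf vectors, not Cohen's implementation: the whitespace tokenizer, the dampening formula `1 + log(tf)`, the idf variant `log(1 + n/df)`, and the function names are all assumptions made for the example.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization is good enough for a sketch.
    return text.lower().split()

def tfidf_vectors(texts):
    """Build unit-length tf*idf vectors (dampened tf, smoothed idf)."""
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    df = Counter(w for d in docs for w in d)  # document frequency per word
    vectors = []
    for d in docs:
        v = {w: (1 + math.log(tf)) * math.log(1 + n / df[w]) for w, tf in d.items()}
        norm = math.sqrt(sum(x * x for x in v.values()))
        vectors.append({w: x / norm for w, x in v.items()} if norm else {})
    return vectors

def cosine(u, v):
    # Vectors are already normalized, so the dot product is the cosine.
    return sum(x * v.get(w, 0.0) for w, x in u.items())

def similarity_join(left, right, k=3):
    """Return the k best (score, left, right) pairs with non-zero similarity."""
    vectors = tfidf_vectors(left + right)
    lv, rv = vectors[:len(left)], vectors[len(left):]
    pairs = [(cosine(a, b), l, r)
             for l, a in zip(left, lv)
             for r, b in zip(right, rv)]
    return sorted((p for p in pairs if p[0] > 0), reverse=True)[:k]

movies = ["Matrix", "Hero", "Shrek 2"]
reviews = ["Matrix 1", "Matrix Reloaded", "Matrix Eigenvalues", "Ying xiong aka. Hero"]
for score, m, r in similarity_join(movies, reviews, k=4):
    print(f"{score:.3f}  {m} ~ {r}")
```

A ranked list replaces the Boolean join result: every title pair sharing at least one word is a candidate, and the rarer the shared word, the higher the pair ranks.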
XXL: Early XML IR [Anja Theobald, GW: Adding Relevance to XML, WebDB'00]
Which professors from Saarbruecken (SB) are teaching IR and have research projects on XML?
Union of heterogeneous sources without global schema
Similarity-aware XPath: //~Professor [//* = "~SB"] [//~Course [//* = "~IR"]] [//~Research [//* = "~XML"]]
Source 1: Professor; Name: Gerhard Weikum; Address ... City: SB, Country: Germany; Teaching: Course (Title: IR; Description: Information retrieval ...; Syllabus ...; Book, Article ...); Research: Project (Title: Intelligent Search of Heterogeneous XML Data; Funding: EU ...)
Source 2: Lecturer; Name: Ralf Schenkel; Address: Max-Planck Institute for Informatics, Germany; Activities: Seminar (Contents: Ranked retrieval …; Literature: …); Scientific (Name: INEX task coordinator (Initiative for the Evaluation of XML …)); Other (Sponsor: EU …)
XXL: Early XML IR [Anja Theobald, GW: Adding Relevance to XML, WebDB'00]
Motivation: Union of heterogeneous sources has no schema
Which professors from Saarbruecken (SB) are teaching IR and have research projects on XML?
Similarity-aware XPath: //~Professor [//* = "~Saarbruecken"] [//~Course [//* = "~IR"]] [//~Research [//* = "~XML"]]
Similarity measures: Wu&Palmer: |path| through lca(x,y); Dice coeff.: 2 #(x,y) / (#x + #y) on Web
Query expansion model: disjunction of tags
Ontology neighborhood of "professor": magician, wizard, intellectual, artist, alchemist, director, primadonna, teacher, scholar, academic / academician / faculty member, scientist, researcher, investigator, mentor, lecturer; edge labels include HYPONYM (0.749) and RELATED (0.48)
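The Dice coefficient and the "disjunction of tags" expansion can be illustrated with a small Python sketch. It is not the XXL engine: the function names, the threshold, and the similarity values in the example dictionary are hypothetical and chosen only to show the mechanics.

```python
def dice(count_x, count_y, count_xy):
    """Dice coefficient 2 #(x,y) / (#x + #y) from (co-)occurrence counts."""
    total = count_x + count_y
    return 2.0 * count_xy / total if total else 0.0

def expand_tag(tag, related, threshold=0.4):
    """Expand ~tag into a weighted disjunction of sufficiently similar tags."""
    expansion = {tag: 1.0}  # the original tag always matches with weight 1
    for other, sim in related.get(tag, {}).items():
        if sim >= threshold:
            expansion[other] = sim
    return expansion

def tag_score(element_tag, expansion):
    # An element matches ~tag with the similarity of its own tag, else 0.
    return expansion.get(element_tag, 0.0)

# Hypothetical similarities for illustration only.
related = {"professor": {"lecturer": 0.48, "scholar": 0.749, "magician": 0.1}}
expansion = expand_tag("professor", related)
print(expansion)
print(tag_score("lecturer", expansion), tag_score("magician", expansion))
```

Raising the threshold keeps the expansion close to the original tag; lowering it admits more distant tags at the cost of weaker matches.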
The Past: Lessons Learned
Figure: precision vs. recall for //~Professor [...] expanded to //{Professor, Researcher, Lecturer, Scientist, Scholar, Academic, ...}[...]
Figure: ontology fragment with the concepts Golden Delicious, apple, pome, edible fruit, produce, food, solid, substance, entity, and gold, element
Outline ✓ Past: Matter, Antimatter, and Wormholes • Present: XML and Graph IR • Future: From Data to Knowledge
TopX: 2nd Generation XML IR [Martin Theobald, Ralf Schenkel, GW: VLDB'05, VLDB Journal]
"Semantic" XPath Full-Text query: /Article [ftcontains(//Person, "Max Planck")] [ftcontains(//Work, "quantum physics")] //Children[@Gender = "female"]//Birthdates
Supported by the TopX engine: http://infao5501.ag5.mpi-sb.mpg.de:8080/topx/ http://topx.sourceforge.net
Commercial Break [Martin Theobald, Ralf Schenkel, GW: VLDB'05] TopX demo today 3:30 – 5:30
Principled Ranking by Probabilistic IR
"God does not play dice." (Einstein) IR does.
Odds for item d with terms d_i being relevant for query q = {q_1, …, q_m}, assuming binary features and conditional independence of features [Robertson & Sparck-Jones 1976]
Related to but different from statistical language models
Relationship to tf*idf
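The formulas on this slide did not survive extraction. As a stand-in, the sketch below uses the commonly cited Robertson/Sparck-Jones term weight with 0.5 smoothing; whether the slide used exactly this form is an assumption, and the function names and toy statistics are hypothetical. With no relevance information (R = r = 0) the weight reduces to log((N - n + 0.5) / (n + 0.5)), which behaves like an idf and shows the link to tf*idf.

```python
import math

def rsj_weight(N, n, R=0, r=0):
    """Robertson/Sparck-Jones relevance weight with 0.5 smoothing.

    N: documents in the collection, n: documents containing the term,
    R: known relevant documents, r: relevant documents containing the term.
    """
    p_odds = (r + 0.5) / (R - r + 0.5)              # odds of term given relevant
    q_odds = (n - r + 0.5) / (N - n - R + r + 0.5)  # odds of term given non-relevant
    return math.log(p_odds / q_odds)

def score(doc_terms, query_terms, N, df):
    """Sum the weights of query terms present in the document (binary features)."""
    return sum(rsj_weight(N, df[t]) for t in query_terms if t in doc_terms)

# Toy collection statistics, made up for the example.
N = 1000
df = {"quantum": 10, "the": 900}
print(rsj_weight(N, df["quantum"]), rsj_weight(N, df["the"]))
print(score({"quantum", "the"}, ["quantum", "the"], N, df))
```

Summing per-term log-odds is exactly what conditional independence of the binary features buys: the product of odds becomes an additive score.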
Probabilistic Ranking for SQL
SQL queries that return many answers need ranking
Odds for tuple d with attributes X ∪ Y being relevant for query q: X_1 = x_1 ∧ … ∧ X_m = x_m
Estimate probabilities, exploiting workload W [S. Chaudhuri, G. Das, V. Hristidis, GW: TODS'06]
From Tables and Trees to Graphs
Schema-agnostic keyword search over multiple tables: graph of tuples with foreign-key relationships as edges [BANKS, Discover, DBExplorer, KUPS, SphereSearch, BLINKS]
Example: Conferences (CId, Title, Location, Year), Journals (JId, Title), CPublications (PId, Title, CId), JPublications (PId, Title, Vol, No, Year), Authors (PId, Person), Editors (CId, Person)
Select * From * Where * Contains "Gray, DeWitt, XML, Performance" And Year > 95
Result is a connected tree with nodes that contain as many query keywords as possible
Ranking: combines a nodeScore based on tf*idf or prob. IR with an edgeScore reflecting importance of relationships (or confidence, authority, etc.)
Top-k querying: compute best trees, e.g. Steiner trees (NP-hard)
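A rough Python sketch of keyword search over a tuple graph follows. Because exact Steiner trees are NP-hard, it swaps in a simple heuristic: from each candidate root, connect the nearest tuple matching each keyword via breadth-first search and prefer the tree with the fewest edges. This is not the algorithm of BANKS or any other cited system; the edge-count ranking stands in for the nodeScore/edgeScore combination, and the tuple identifiers and texts are hypothetical.

```python
from collections import deque

def bfs(graph, root):
    """Shortest-hop parent pointers and depths from root in an undirected graph."""
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in parent:
                parent[v], depth[v] = u, depth[u] + 1
                queue.append(v)
    return parent, depth

def answer_tree(graph, text, keywords, root):
    """Edges connecting root to the nearest tuple matching each keyword, or None."""
    parent, depth = bfs(graph, root)
    edges = set()
    for kw in keywords:
        hits = [n for n in parent if kw in text[n].lower()]
        if not hits:
            return None  # keyword not reachable from this root
        node = min(hits, key=lambda n: (depth[n], n))
        while parent[node] is not None:  # walk back up to the root
            edges.add((parent[node], node))
            node = parent[node]
    return edges

def best_tree(graph, text, keywords):
    """Smallest answer tree over all roots as (size, root, edges), or None."""
    candidates = [(len(t), root, sorted(t))
                  for root in graph
                  for t in [answer_tree(graph, text, keywords, root)]
                  if t is not None]
    return min(candidates) if candidates else None

# Hypothetical tuples: one conference, one publication, two authors.
graph = {"c1": ["p1"], "p1": ["c1", "a1", "a2"], "a1": ["p1"], "a2": ["p1"]}
text = {"c1": "VLDB 1999", "p1": "XML query performance", "a1": "Gray", "a2": "DeWitt"}
print(best_tree(graph, text, ["gray", "dewitt", "xml"]))
```

The result joins the two author tuples through the publication tuple, which is the kind of connected tree the slide describes.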
The Present: Observations & Opportunities
Figure: schema graphs with the node labels actor, movie, plot, director
Example: keyword query "life physicist Max Planck" vs. structured query //article[//person "Max Planck"] [//category "physicist"] //biography
Outline ✓ Past: Matter, Antimatter, and Wormholes ✓ Present: XML and Graph IR • Future: From Data to Knowledge
Knowledge Queries
Turn the Web, Web2.0, and Web3.0 into the world's most comprehensive knowledge base ("semantic DB")!
Answer "knowledge queries" such as:
Nobel laureate who survived both world wars and his children
drama with three women making a prophecy to a British nobleman that he will become king
proteins that inhibit both protease and some other enzyme
connection between Thomas Mann and Goethe
differences in Rembetiko music from Greece and from Turkey
neutron stars with X-ray bursts > 10^40 erg s^-1 & black holes in 10″
market impact of Web2.0 technology in December 2006
sympathy or antipathy for Germany from May to August 2006
Three Roads to Knowledge
High-Quality Knowledge Sources: growing with strong momentum
High-Quality Knowledge Sources
General-purpose thesauri and concept networks: WordNet family
enzyme -- (any of several complex proteins that are produced by cells and act as catalysts in specific biochemical reactions)
  => protein -- (any of a large group of nitrogenous organic compounds that are essential constituents of living cells; ...)
    => macromolecule, supermolecule ...
      => organic compound -- (any compound of carbon and another element or a radical) ...
  => catalyst, accelerator -- ((chemistry) a substance that initiates or accelerates a chemical reaction without itself being affected)
    => activator -- ((biology) any agency bringing about activation; ...)
High-Quality Knowledge Sources Wikipedia  and other lexical sources
Exploit Hand-Crafted Knowledge
Wikipedia, WordNet, and other lexical sources
{{Infobox_Scientist
| name = Max Planck
| birth_date = [[April 23]], [[1858]]
| birth_place = [[Kiel]], [[Germany]]
| death_date = [[October 4]], [[1947]]
| death_place = [[Göttingen]], [[Germany]]
| residence = [[Germany]]
| nationality = [[Germany|German]]
| field = [[Physicist]]
| work_institution = [[University of Kiel]], [[Humboldt-Universität zu Berlin]], [[Georg-August-Universität Göttingen]]
| alma_mater = [[Ludwig-Maximilians-Universität München]]
| doctoral_advisor = [[Philipp von Jolly]]
| doctoral_students = [[Gustav Ludwig Hertz]] …
| known_for = [[Planck's constant]], [[Quantum mechanics|quantum theory]]
| prizes = [[Nobel Prize in Physics]] (1918) …
YAGO: Yet Another Great Ontology [F. Suchanek, G. Kasneci, GW: WWW 2007]
Examples (entity1, relation, entity2): (Max_Planck, bornIn, Kiel); (Kiel, isInstanceOf, City)
YAGO Knowledge Representation
Figure: three layers (concepts, individuals, words). Concepts: Entity, Person, Location, City, Country, Scientist, Physicist, Biologist, connected by subclass edges. Individuals: Max_Planck (instanceOf Physicist; bornOn April 23, 1858; diedOn October 4, 1947; bornIn Kiel; hasWon Nobel Prize; FatherOf Erwin_Planck), Kiel (instanceOf City). Words: "Max Planck", "Dr. Planck", "Max Karl Ernst Ludwig Planck", each linked by means to Max_Planck.
Online access and download at http://www.mpi-inf.mpg.de/~suchanek/yago/
Accuracy: 97%
Knowledge Base   # Facts
KnowItAll        30 000
SUMO             60 000
WordNet          200 000
OpenCyc          300 000
Cyc              5 000 000
YAGO             6 000 000
NAGA: Graph IR on YAGO [G. Kasneci et al.: WWW'07]
Graph-based search on YAGO-style knowledge bases with built-in ranking based on confidence and informativeness; statistical language model for result graphs
Conjunctive queries: $x isa scientist; $x bornIn Kiel
Queries with regular expressions: $x (hasFirstName | hasLastName) Ling; $x isa scientist; $x worksFor $y; $y locatedIn* Zhejiang; $x (coAuthor | advisor)* Beng Chin Ooi
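Evaluating a conjunctive query such as "$x isa scientist; $x bornIn Kiel" amounts to finding variable bindings that satisfy every pattern. The Python sketch below does this by naive backtracking over a list of triples. It is not NAGA's implementation, it omits ranking and regular-expression edges, and the fact list and function names are assumptions for illustration.

```python
def match(pattern, fact, binding):
    """Extend binding so that pattern equals fact, or return None on conflict."""
    extended = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("$"):                  # variable position
            if extended.setdefault(p, f) != f:
                return None                    # already bound to something else
        elif p != f:                           # constant position must agree
            return None
    return extended

def query(facts, patterns, binding=None):
    """Return every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        return [binding]
    results = []
    for fact in facts:
        extended = match(patterns[0], fact, binding)
        if extended is not None:
            results.extend(query(facts, patterns[1:], extended))
    return results

# Illustrative facts in (entity1, relation, entity2) form.
facts = [
    ("Max_Planck", "isa", "scientist"),
    ("Max_Planck", "bornIn", "Kiel"),
    ("Albert_Einstein", "isa", "scientist"),
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Kiel", "isInstanceOf", "City"),
]
print(query(facts, [("$x", "isa", "scientist"), ("$x", "bornIn", "Kiel")]))
```

A ranking layer would then order the returned bindings, for instance by the confidence of the matched facts.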
Ranking Factors
Confidence examples: bornIn (Max Planck, Kiel) from "Max Planck was born in Kiel" (Wikipedia); livesIn (Elvis Presley, Mars) from "They believe Elvis hides on Mars" (Martian Bloggeria)
Informativeness examples: q: isa (Einstein, $y) with answers isa (Einstein, scientist), isa (Einstein, vegetarian); q: isa ($x, vegetarian) with answers isa (Einstein, vegetarian), isa (Al Nobody, vegetarian)
Figure: example graph with the nodes Einstein, vegetarian, Bohr, Nobel Prize, Tom Cruise, 1962 and the edge labels isa, bornIn, diedIn, won
Information Extraction (IE): Text to Records
Combine NLP, pattern matching, lexicons, statistical learning
Person | BirthDate | BirthPlace | ...
  Max Planck | 4/23, 1858 | Kiel
  Albert Einstein | 3/14, 1879 | Ulm
  Mahatma Gandhi | 10/2, 1869 | Porbandar
Person | ScientificResult
  Max Planck | Quantum Theory
Person | Collaborator
  Max Planck | Albert Einstein
  Max Planck | Niels Bohr
Constant | Value | Dimension
  Planck's constant | 6.626 × 10^-34 | Js
Knowledge Acquisition from the Web
Existing approaches and tools (Snowball [Gravano et al. 2000], KnowItAll [Etzioni et al. 2004], …): almost-unsupervised pattern matching and learning: seeds (known facts) → patterns (in text) → (extraction) rule → (new) facts
Methods for Web-Scale Fact Extraction
Pipeline: seeds → text → rules → new facts
Example (seed | text | rule):
  city (Seattle) | in downtown Seattle | in downtown X
  city (Seattle) | Seattle and other towns | X and other towns
  city (Las Vegas) | Las Vegas and other towns | X and other towns
  plays (Zappa, guitar) | playing guitar: … Zappa | playing Y: … X
  plays (Davis, trumpet) | Davis … blows trumpet | X … blows Y
  city (Beijing) | old center of Beijing | old center of X
  plays (Coltrane, sax) | sax player Coltrane | Y player X
New facts: in downtown Beijing → city(Beijing); Coltrane blows sax → plays(C., sax)
Assessment of facts & generation of rules based on statistics
Rules can be more sophisticated: playing NN: (ADJ|ADV)* NP & class(NN)=instrument & class(head(NP))=person → plays(head(NP), NN)
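The seeds → rules → new facts loop can be sketched in Python for the unary city(X) relation. This is a deliberately naive illustration rather than Snowball or KnowItAll: a rule is just the literal text before and after a seed entity, candidate entities are assumed to be capitalized word sequences, and the statistical assessment of rules and facts mentioned on the slide is omitted. The function names are hypothetical.

```python
import re

# Assumption: an entity is one or more capitalized words, e.g. "Las Vegas".
ENTITY = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"

def learn_rules(seeds, snippets):
    """Turn each snippet that mentions a seed into a (prefix, suffix) rule."""
    rules = set()
    for entity in seeds:
        for s in snippets:
            i = s.find(entity)
            if i >= 0:
                rules.add((s[:i], s[i + len(entity):]))
    return rules

def extract(rules, snippets):
    """Apply the rules to all snippets and collect the entities filling slot X."""
    found = set()
    for prefix, suffix in rules:
        regex = re.compile(re.escape(prefix) + ENTITY + re.escape(suffix))
        for s in snippets:
            m = regex.fullmatch(s)
            if m:
                found.add(m.group(1))
    return found

seeds = {"Seattle"}
snippets = [
    "in downtown Seattle",
    "Seattle and other towns",
    "Las Vegas and other towns",
    "in downtown Beijing",
    "Davis blows trumpet",
]
rules = learn_rules(seeds, snippets)
new_facts = extract(rules, snippets) - seeds
print(sorted(rules))
print(sorted(new_facts))
```

Feeding the new facts back in as seeds gives the bootstrapping loop; without statistical filtering, a single bad rule would quickly pollute the fact set, which is why the assessment step matters.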
Performance of Web-IE
State-of-the-art precision/recall results:
relation       precision   recall   corpus      systems
countries      80%         90%      Web         KnowItAll
cities         80%         ???      Web         KnowItAll
scientists     60%         ???      Web         KnowItAll
headquarters   90%         50%      News        Snowball, LEILA
birthdates     80%         70%      Wikipedia   LEILA
instanceOf     40%         20%      Web         Text2Onto, LEILA
Open IE        80%         ???      Web         TextRunner
Precision value-chain: entities 80%, attributes 70%, facts 60%, events 50%
Anecdotal evidence:
invented (A.G. Bell, telephone); married (Hillary Clinton, Bill Clinton); isa (yoga, relaxation technique); isa (zearalenone, mycotoxin); contains (chocolate, theobromine); contains (Singapore sling, gin)
invented (Johannes Kepler, logarithm tables); married (Segolene Royal, Francois Hollande); isa (yoga, excellent way); isa (your day, good one); contains (chocolate, raisins); plays (the liver, central role); makes (everybody, mistakes)
Beyond Surface Learning with LEILA
Learning to Extract Information by Linguistic Analysis [F. Suchanek, G. Ifrim, GW: KDD'06]
Almost-unsupervised Statistical Learning with Dependency Parsing
Limitation of surface patterns: who discovered or invented what, e.g. "Tesla's work formed the basis of AC electric power" vs. "Al Gore funded more work for a better basis of the Internet"
Examples: (Cologne, Rhine), (Cairo, Nile), …
Counterexamples: (Cairo, Rhine), (Rome, 0911), ( , [0..9]* ), …
Parsed sentences (link-grammar and constituent labels from the figure omitted): "Cologne lies on the banks of the Rhine"; "People in Cairo like wine from the Rhine valley"; "Paris was founded on an island in the Seine" yields (Paris, Seine)
IE Efficiency and Accuracy Tradeoffs [see also tutorials by Cohen, Doan/Ramakrishnan/Vaithyanathan, Agichtein/Sarawagi] IE is cool, but what's in it for DB folks?
The Future: Challenges
Outline ✓ Past: Matter, Antimatter, and Wormholes ✓ Present: XML and Graph IR ✓ Future: From Data to Knowledge
Major Trends in DB and IR malleable schema (later)‏ deep NLP, adding structure record linkage info extraction graph mining entity-relationship graph IR  ontologies ranking Database Systems Information Retrieval statistical language models data uncertainty programmability search as Web Service dataspaces Web objects Web 2.0 Web 2.0
Conclusion
DB&IR: Both Sides Now. Thank You!

Mais conteúdo relacionado

Mais procurados

Mid-Ontology Learning from Linked Data @JIST2011
Mid-Ontology Learning from Linked Data @JIST2011Mid-Ontology Learning from Linked Data @JIST2011
Mid-Ontology Learning from Linked Data @JIST2011
Lihua Zhao
 
A Model of the Scholarly Community
A Model of the Scholarly CommunityA Model of the Scholarly Community
A Model of the Scholarly Community
Marko Rodriguez
 

Mais procurados (19)

Lacey Liu SDE II Resume
Lacey Liu SDE II ResumeLacey Liu SDE II Resume
Lacey Liu SDE II Resume
 
Instance-Based Ontological Knowledge Acquisition
Instance-Based Ontological Knowledge AcquisitionInstance-Based Ontological Knowledge Acquisition
Instance-Based Ontological Knowledge Acquisition
 
"Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age"...
"Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age"..."Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age"...
"Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age"...
 
Mid-Ontology Learning from Linked Data @JIST2011
Mid-Ontology Learning from Linked Data @JIST2011Mid-Ontology Learning from Linked Data @JIST2011
Mid-Ontology Learning from Linked Data @JIST2011
 
Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...
 
SSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open DataSSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open Data
 
co:op-READ-Convention Marburg - Enrique Vidal
co:op-READ-Convention Marburg - Enrique Vidalco:op-READ-Convention Marburg - Enrique Vidal
co:op-READ-Convention Marburg - Enrique Vidal
 
(Semi-)Automatic analysis of online contents
(Semi-)Automatic analysis of online contents(Semi-)Automatic analysis of online contents
(Semi-)Automatic analysis of online contents
 
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
 
Programming with Semantic Broad Data
Programming with Semantic Broad DataProgramming with Semantic Broad Data
Programming with Semantic Broad Data
 
Automatic Metadata Generation using Associative Networks
Automatic Metadata Generation using Associative NetworksAutomatic Metadata Generation using Associative Networks
Automatic Metadata Generation using Associative Networks
 
A Model of the Scholarly Community
A Model of the Scholarly CommunityA Model of the Scholarly Community
A Model of the Scholarly Community
 
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...
 
Assessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset QualityAssessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset Quality
 
How to Build a Semantic Search System
How to Build a Semantic Search SystemHow to Build a Semantic Search System
How to Build a Semantic Search System
 
Perspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from textPerspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from text
 
Democratizing Big Semantic Data management
Democratizing Big Semantic Data managementDemocratizing Big Semantic Data management
Democratizing Big Semantic Data management
 
Scalable and privacy-preserving data integration - part 1
Scalable and privacy-preserving data integration - part 1Scalable and privacy-preserving data integration - part 1
Scalable and privacy-preserving data integration - part 1
 
Deriving human readable labels from sparql queries
Deriving human readable labels from sparql queries Deriving human readable labels from sparql queries
Deriving human readable labels from sparql queries
 

Destaque

Cem Karaca Kültür Merkezinde SERAP MUTLU AKBULUT Konseri resimleri 09_11_2013
Cem Karaca Kültür Merkezinde SERAP MUTLU AKBULUT Konseri resimleri 09_11_2013Cem Karaca Kültür Merkezinde SERAP MUTLU AKBULUT Konseri resimleri 09_11_2013
Cem Karaca Kültür Merkezinde SERAP MUTLU AKBULUT Konseri resimleri 09_11_2013
aokutur
 
Unit 14b Types of Managed Funds
Unit 14b Types of Managed FundsUnit 14b Types of Managed Funds
Unit 14b Types of Managed Funds
Andrew Hingston
 
Unit 11d Property disadvantages
Unit 11d Property disadvantagesUnit 11d Property disadvantages
Unit 11d Property disadvantages
Andrew Hingston
 
Sonajero de chupetes
Sonajero de chupetesSonajero de chupetes
Sonajero de chupetes
diegoredondo
 
KAYA KARATAS 26 02 2015 ckm konseri
KAYA KARATAS 26 02 2015 ckm konseriKAYA KARATAS 26 02 2015 ckm konseri
KAYA KARATAS 26 02 2015 ckm konseri
aokutur
 
Robotics Fall 2009
Robotics  Fall 2009Robotics  Fall 2009
Robotics Fall 2009
Anna Donskoy
 
Fra idé til handling
Fra idé til handlingFra idé til handling
Fra idé til handling
Aud Hakestad
 
Gospel Family Reunion Small Summary
Gospel Family Reunion Small SummaryGospel Family Reunion Small Summary
Gospel Family Reunion Small Summary
fpres1079
 

Destaque (20)

Career Outlook
Career OutlookCareer Outlook
Career Outlook
 
Cem Karaca Kültür Merkezinde SERAP MUTLU AKBULUT Konseri resimleri 09_11_2013
Cem Karaca Kültür Merkezinde SERAP MUTLU AKBULUT Konseri resimleri 09_11_2013Cem Karaca Kültür Merkezinde SERAP MUTLU AKBULUT Konseri resimleri 09_11_2013
Cem Karaca Kültür Merkezinde SERAP MUTLU AKBULUT Konseri resimleri 09_11_2013
 
Unit 14b Types of Managed Funds
Unit 14b Types of Managed FundsUnit 14b Types of Managed Funds
Unit 14b Types of Managed Funds
 
Unit 11d Property disadvantages
Unit 11d Property disadvantagesUnit 11d Property disadvantages
Unit 11d Property disadvantages
 
Cyber covenant
Cyber covenantCyber covenant
Cyber covenant
 
Sonajero de chupetes
Sonajero de chupetesSonajero de chupetes
Sonajero de chupetes
 
KAYA KARATAS 26 02 2015 ckm konseri
KAYA KARATAS 26 02 2015 ckm konseriKAYA KARATAS 26 02 2015 ckm konseri
KAYA KARATAS 26 02 2015 ckm konseri
 
De Novo
De NovoDe Novo
De Novo
 
Evan & ethan
Evan & ethanEvan & ethan
Evan & ethan
 
Информационный вестник. Июнь 2011
Информационный вестник. Июнь 2011Информационный вестник. Июнь 2011
Информационный вестник. Июнь 2011
 
When thieves strike: Executive briefing on SWIFT attacks
When thieves strike: Executive briefing on SWIFT attacksWhen thieves strike: Executive briefing on SWIFT attacks
When thieves strike: Executive briefing on SWIFT attacks
 
Unit 3d Job interviews
Unit 3d Job interviewsUnit 3d Job interviews
Unit 3d Job interviews
 
Defrag2014 anomalies final
Defrag2014 anomalies finalDefrag2014 anomalies final
Defrag2014 anomalies final
 
You Had Me at Hello: Tips for Building Relationships with Media and Influence...
You Had Me at Hello: Tips for Building Relationships with Media and Influence...You Had Me at Hello: Tips for Building Relationships with Media and Influence...
You Had Me at Hello: Tips for Building Relationships with Media and Influence...
 
Game Design Document
Game Design DocumentGame Design Document
Game Design Document
 
Robotics Fall 2009
Robotics  Fall 2009Robotics  Fall 2009
Robotics Fall 2009
 
Fra idé til handling
Fra idé til handlingFra idé til handling
Fra idé til handling
 
Бизнес-потенциал социальных технологий_РУС
Бизнес-потенциал социальных технологий_РУСБизнес-потенциал социальных технологий_РУС
Бизнес-потенциал социальных технологий_РУС
 
Gospel Family Reunion Small Summary
Gospel Family Reunion Small SummaryGospel Family Reunion Small Summary
Gospel Family Reunion Small Summary
 
Why Papble
Why PapbleWhy Papble
Why Papble
 

Semelhante a DB-IR-ranking

osm.cs.byu.edu
osm.cs.byu.eduosm.cs.byu.edu
osm.cs.byu.edu
butest
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
butest
 
GATE, HLT and Machine Learning, Sheffield, July 2003
GATE, HLT and Machine Learning, Sheffield, July 2003GATE, HLT and Machine Learning, Sheffield, July 2003
GATE, HLT and Machine Learning, Sheffield, July 2003
butest
 
Aggregation for searching complex information spaces
Aggregation for searching complex information spacesAggregation for searching complex information spaces
Aggregation for searching complex information spaces
Mounia Lalmas-Roelleke
 
download
downloaddownload
download
butest
 
download
downloaddownload
download
butest
 
Reflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemReflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data system
Trey Grainger
 
Web Data Extraction Como2010
Web Data Extraction Como2010Web Data Extraction Como2010
Web Data Extraction Como2010
Giorgio Orsi
 

Semelhante a DB-IR-ranking (20)

osm.cs.byu.edu
osm.cs.byu.eduosm.cs.byu.edu
osm.cs.byu.edu
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 





DB-IR-ranking

  • 1. in collaboration with Georgiana Ifrim, Gjergji Kasneci, Josiane Parreira, Maya Ramanath, Ralf Schenkel, Fabian Suchanek, Martin Theobald
  • 2. DB and IR: Two Parallel Universes
       Aspect: Database Systems | Information Retrieval
       canonical application: accounting | libraries
       data type: numbers, short strings | text
       foundation: algebraic / logic based | probabilistic / statistics based
       search paradigm: Boolean retrieval (exact queries, result sets/bags) | ranked retrieval (vague queries, result lists)
       market leaders: Oracle, IBM DB2, MS SQL Server, etc. | Google, Yahoo!, MSN, Verity, Fast, etc.
       Parallel universes forever?
  • 6. Outline • Past: Matter, Antimatter, and Wormholes • Present: XML and Graph IR • Future: From Data to Knowledge
  • 8. DB and IR timeline (axis: 1990, 1995, 2000, 2005)
       • VAGUE (Motro)
       • Proximal Nodes (Baeza-Yates et al.)
       • WHIRL (Cohen)
       • Prob. Datalog (Fuhr et al.)
       • INEX
       • XPath
       • XPath Full-Text
       • Prob. DB (Cavallo & Pittarelli)
       • Prob. Tuples (Barbara et al.)
       • Web Entity Search: Libra, Avatar, ExDB …
       • Faceted Search: Flamenco …
       • 1st Gen. XML IR: XXL, XIRQL, Elixir, JuruXML
       • Multimedia IR
       • Web Query Languages: W3QS, WebOQL, Araneus …
       • Semistructured Data: Lore, Xyleme …
       • 2nd Gen. XML IR: XRank, Timber, TIJAH, XSearch, FleXPath, CoXML, TopX, MarkLogic, Fast …
       • Uncertain & Prob. Relations: Mystiq, Trio …
       • Struct. Docs
       • Deep Web Search
       • Digital Libraries
       • Graph IR
  • 10. XXL: Early XML IR [Anja Theobald, GW: Adding Relevance to XML, WebDB'00]
        Union of heterogeneous sources without global schema.
        Query: Which professors from Saarbruecken (SB) are teaching IR and have research projects on XML?
        Similarity-aware XPath: //~Professor [//* = "~SB"] [//~Course [//* = "~IR"]] [//~Research [//* = "~XML"]]
        Example data, first source:
          Professor
            Name: Gerhard Weikum
            Address ...
              City: SB
              Country: Germany
            Teaching
              Course
                Title: IR
                Description: Information retrieval ...
                Syllabus ...
              Book, Article ...
            Research
              Project
                Title: Intelligent Search of Heterogeneous XML Data
                Funding: EU ...
        Example data, second source:
          Lecturer
            Name: Ralf Schenkel
            Address: Max-Planck Institute for Informatics, Germany
            Activities
              Seminar
                Contents: Ranked retrieval …
                Literature: …
              Scientific
                Name: INEX task coordinator (Initiative for the Evaluation of XML …)
              Other
                Sponsor: EU …
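The effect of the `~` operator on element names can be sketched in Python: a name matches if it is equal or listed as similar, and a candidate's score is the product of the similarities along the query. This is only a minimal sketch. The table `SIMILAR`, its scores, and the functions `sim`, `best_descendant`, and `rank` are illustrative assumptions, not XXL's ontology or scoring model, and the text conditions (`"~SB"` and so on) are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical name-similarity table (assumed values, for illustration only).
SIMILAR = {
    frozenset(("professor", "lecturer")): 0.8,
    frozenset(("course", "seminar")): 0.7,
    frozenset(("research", "scientific")): 0.6,
}

def sim(a, b):
    """1.0 for equal names, the table score for similar names, else 0.0."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    return SIMILAR.get(frozenset((a, b)), 0.0)

def best_descendant(node, tag):
    """Best name similarity among all proper descendants of node (the // axis)."""
    return max((sim(d.tag, tag) for d in node.iter() if d is not node), default=0.0)

def rank(root, root_tag, required_tags):
    """Score every element against //~root_tag[//~t1][//~t2]... and sort by score."""
    results = []
    for node in root.iter():
        score = sim(node.tag, root_tag)
        for tag in required_tags:
            score *= best_descendant(node, tag)
        if score > 0:
            results.append((score, node.findtext("Name")))
    return sorted(results, reverse=True)

# Two heterogeneous sources, loosely following the slide's example data.
DOC = """<People>
  <Professor><Name>Gerhard Weikum</Name>
    <Teaching><Course><Title>IR</Title></Course></Teaching>
    <Research><Project><Title>Intelligent Search of Heterogeneous XML Data</Title></Project></Research>
  </Professor>
  <Lecturer><Name>Ralf Schenkel</Name>
    <Activities>
      <Seminar><Contents>Ranked retrieval</Contents></Seminar>
      <Scientific><Name>INEX task coordinator</Name></Scientific>
    </Activities>
  </Lecturer>
</People>"""

ranked = rank(ET.fromstring(DOC), "Professor", ["Course", "Research"])
for score, name in ranked:
    print(f"{score:.3f}  {name}")
```

An exact XPath query would return only the Professor element; with similarity-aware matching the Lecturer element is also returned, ranked lower.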
  • 13. Outline ✓ Past: Matter, Antimatter, and Wormholes • Present: XML and Graph IR • Future: From Data to Knowledge
  • 15. Commercial Break [Martin Theobald, Ralf Schenkel, GW: VLDB'05] TopX demo today 3:30 – 5:30
  • 20. Outline ✓ Past: Matter, Antimatter, and Wormholes ✓ Present: XML and Graph IR • Future: From Data to Knowledge
  • 21. Knowledge Queries
        Turn the Web, Web 2.0, and Web 3.0 into the world's most comprehensive knowledge base ("semantic DB")!
        Answer "knowledge queries" such as:
        • Nobel laureate who survived both world wars and his children
        • drama with three women making a prophecy to a British nobleman that he will become king
        • proteins that inhibit both protease and some other enzyme
        • connection between Thomas Mann and Goethe
        • differences in Rembetiko music from Greece and from Turkey
        • neutron stars with X-ray bursts > 10^40 erg s^-1 & black holes in 10''
        • market impact of Web 2.0 technology in December 2006
        • sympathy or antipathy for Germany from May to August 2006
  • 25. High-Quality Knowledge Sources Wikipedia and other lexical sources
  • 26. Exploit Hand-Crafted Knowledge: Wikipedia, WordNet, and other lexical sources
        {{Infobox_Scientist
        | name = Max Planck
        | birth_date = [[April 23]], [[1858]]
        | birth_place = [[Kiel]], [[Germany]]
        | death_date = [[October 4]], [[1947]]
        | death_place = [[Göttingen]], [[Germany]]
        | residence = [[Germany]]
        | nationality = [[Germany|German]]
        | field = [[Physicist]]
        | work_institution = [[University of Kiel]]</br> [[Humboldt-Universität zu Berlin]]</br> [[Georg-August-Universität Göttingen]]
        | alma_mater = [[Ludwig-Maximilians-Universität München]]
        | doctoral_advisor = [[Philipp von Jolly]]
        | doctoral_students = [[Gustav Ludwig Hertz]]</br> …
        | known_for = [[Planck's constant]], [[Quantum mechanics|quantum theory]]
        | prizes = [[Nobel Prize in Physics]] (1918)
        …
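Infobox markup of this kind is regular enough to harvest attribute-value pairs with a few lines of code. The following Python sketch shows one way to do so. The function `parse_infobox` and its regular expressions are illustrative assumptions, not the extraction code behind YAGO, and the sketch handles only simple one-line `| key = value` entries.

```python
import re

# Shortened copy of the infobox from the slide.
INFOBOX = """{{Infobox_Scientist
| name = Max Planck
| birth_date = [[April 23]], [[1858]]
| birth_place = [[Kiel]], [[Germany]]
| nationality = [[Germany|German]]
| prizes = [[Nobel Prize in Physics]] (1918)
}}"""

# [[target]] or [[target|label]]; group 1 captures the displayed text.
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")
ENTRY = re.compile(r"\|\s*(\w+)\s*=\s*(.*)")

def parse_infobox(text):
    """Map each '| key = value' line to its value with wiki links unwrapped."""
    facts = {}
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            key, value = m.groups()
            facts[key] = LINK.sub(r"\1", value).strip()
    return facts

facts = parse_infobox(INFOBOX)
for key, value in facts.items():
    print(f"{key}: {value}")
```

Keeping the displayed label rather than the link target is a design choice of this sketch; an extractor that needs entity identifiers would keep the target instead.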
  • 28. YAGO Knowledge Representation
        Graph layers: concepts, individuals, words.
        Concepts: Entity, Person, Location, City, Country, Scientist, Physicist, Biologist
        Individuals: Max_Planck, Erwin_Planck, Kiel
        Other nodes: April 23, 1858; October 4, 1947; Nobel Prize
        Words: "Max Planck", "Dr. Planck", "Max Karl Ernst Ludwig Planck"
        Edge labels: subclass, instanceOf, means, bornOn, diedOn, bornIn, FatherOf, hasWon
        Accuracy: 97%
        Online access and download at http://www.mpi-inf.mpg.de/~suchanek/yago/
        Knowledge Base | # Facts
          KnowItAll | 30 000
          SUMO | 60 000
          WordNet | 200 000
          OpenCyc | 300 000
          Cyc | 5 000 000
          YAGO | 6 000 000
  • 29. NAGA: Graph IR on YAGO [G. Kasneci et al.: WWW'07]
        Graph-based search on YAGO-style knowledge bases with built-in ranking based on confidence and informativeness → statistical language model for result graphs
        Conjunctive queries: $x isa scientist; $x bornIn Kiel
        Queries with regular expressions: $x (hasFirstName | hasLastName) Ling; $x isa scientist; $x worksFor $y; $y locatedIn* Zhejiang; $x (coAuthor | advisor)* Beng Chin Ooi
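A conjunctive query such as `$x isa scientist; $x bornIn Kiel` can be evaluated over a set of subject-predicate-object triples by binding variables one pattern at a time. The Python sketch below illustrates only this matching step. The `FACTS` set, the flattened `isa` edges, and the functions `match` and `query` are assumptions made for the example; NAGA's regular-expression edges and its ranking by confidence and informativeness are not modelled.

```python
# Small illustrative fact set (an assumption; real YAGO facts go through
# instanceOf and subclass edges rather than a direct "isa scientist").
FACTS = {
    ("Max_Planck", "isa", "scientist"),
    ("Max_Planck", "bornIn", "Kiel"),
    ("Albert_Einstein", "isa", "scientist"),
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Kiel", "isa", "city"),
}

def is_var(term):
    """Variables are written with a leading dollar sign, as on the slide."""
    return term.startswith("$")

def match(pattern, fact, binding):
    """Return binding extended so that pattern equals fact, or None on conflict."""
    extended = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if extended.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return extended

def query(patterns, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for fact in FACTS:
        extended = match(first, fact, binding)
        if extended is not None:
            yield from query(rest, extended)

answers = sorted(b["$x"] for b in query([("$x", "isa", "scientist"),
                                         ("$x", "bornIn", "Kiel")]))
print(answers)
```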
  • 31. Information Extraction (IE): Text to Records
        Combine NLP, pattern matching, lexicons, statistical learning.
        Person | BirthDate | BirthPlace
          Max Planck | 4/23, 1858 | Kiel
          Albert Einstein | 3/14, 1879 | Ulm
          Mahatma Gandhi | 10/2, 1869 | Porbandar
          ...
        Person | ScientificResult
          Max Planck | Quantum Theory
        Person | Collaborator
          Max Planck | Albert Einstein
          Max Planck | Niels Bohr
        Constant | Value | Dimension
          Planck's constant | 6.626 × 10^-34 | Js
  • 33. Methods for Web-Scale Fact Extraction
        seeds → text → rules → new facts
        Seed fact | text occurrence | induced rule
          city(Beijing) | old center of Beijing | old center of X
          plays(Coltrane, sax) | sax player Coltrane | Y player X
        Example:
          city(Seattle) | in downtown Seattle | in downtown X
          city(Seattle) | Seattle and other towns | X and other towns
          city(Las Vegas) | Las Vegas and other towns | X and other towns
          plays(Zappa, guitar) | playing guitar: … Zappa | playing Y: … X
          plays(Davis, trumpet) | Davis … blows trumpet | X … blows Y
        New facts from the induced rules:
          in downtown Beijing → city(Beijing)
          Coltrane blows sax → plays(C., sax)
        Assessment of facts & generation of rules based on statistics.
        Rules can be more sophisticated: playing NN: (ADJ|ADV)* NP & class(NN)=instrument & class(head(NP))=person ⇒ plays(head(NP), NN)
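One round of the seeds → text → rules → new facts loop can be sketched in Python as follows. This is a deliberately simplified sketch: the functions `induce_rules`, `to_regex`, and `apply_rules` are hypothetical names, arguments are limited to single words, and the statistical assessment of facts and rules mentioned on the slide is omitted.

```python
import re

def induce_rules(seeds, sentences):
    """Turn each sentence containing all arguments of a seed into a pattern."""
    rules = set()
    for relation, args in seeds:
        for sentence in sentences:
            if all(arg in sentence for arg in args):
                pattern = sentence
                for i, arg in enumerate(args):
                    pattern = pattern.replace(arg, f"<{i}>")  # placeholder per argument
                rules.add((relation, pattern))
    return rules

def to_regex(pattern):
    """Compile a pattern with <0>, <1> placeholders into a regex with named groups."""
    parts = re.split(r"<(\d)>", pattern)  # odd positions hold placeholder indices
    pieces = [f"(?P<a{p}>\\w+)" if i % 2 else re.escape(p)
              for i, p in enumerate(parts)]
    return re.compile("".join(pieces))

def apply_rules(rules, sentences):
    """Match every rule against every sentence and collect the extracted facts."""
    facts = set()
    for relation, pattern in rules:
        regex = to_regex(pattern)
        for sentence in sentences:
            m = regex.search(sentence)
            if m:
                args = tuple(m.group(name) for name in sorted(regex.groupindex))
                facts.add((relation, args))
    return facts

seeds = [("city", ("Seattle",)), ("plays", ("Davis", "trumpet"))]
seed_text = ["in downtown Seattle", "Davis blows trumpet"]
new_text = ["in downtown Beijing", "Coltrane blows sax"]

rules = induce_rules(seeds, seed_text)
new_facts = apply_rules(rules, new_text)
print(sorted(rules))
print(sorted(new_facts))
```

In a full system the new facts would be scored, the reliable ones added to the seed set, and the loop repeated; without that scoring step, overly general patterns produce the kind of noise listed on the next slide.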
  • 34. Performance of Web-IE
        State-of-the-art precision/recall results:
        relation | precision | recall | corpus | systems
          countries | 80% | 90% | Web | KnowItAll
          cities | 80% | ??? | Web | KnowItAll
          scientists | 60% | ??? | Web | KnowItAll
          headquarters | 90% | 50% | News | Snowball, LEILA
          birthdates | 80% | 70% | Wikipedia | LEILA
          instanceOf | 40% | 20% | Web | Text2Onto, LEILA
          Open IE | 80% | ??? | Web | TextRunner
        Precision value-chain: entities 80%, attributes 70%, facts 60%, events 50%
        Anecdotal evidence:
        • invented (A.G. Bell, telephone)
        • married (Hillary Clinton, Bill Clinton)
        • isa (yoga, relaxation technique)
        • isa (zearalenone, mycotoxin)
        • contains (chocolate, theobromine)
        • contains (Singapore sling, gin)
        • invented (Johannes Kepler, logarithm tables)
        • married (Segolene Royal, Francois Hollande)
        • isa (yoga, excellent way)
        • isa (your day, good one)
        • contains (chocolate, raisins)
        • plays (the liver, central role)
        • makes (everybody, mistakes)
  • 38. Outline ✓ Past: Matter, Antimatter, and Wormholes ✓ Present: XML and Graph IR ✓ Future: From Data to Knowledge
  • 39. Major Trends in DB and IR: malleable schema (later) deep NLP, adding structure record linkage info extraction graph mining entity-relationship graph IR ontologies ranking Database Systems Information Retrieval statistical language models data uncertainty programmability search as Web Service dataspaces Web objects Web 2.0 Web 2.0