Making the Web Searchable. Peter Mika, Senior Researcher and Data Architect, Yahoo! Inc.
Agenda
Convergence of Search and Online Media
It used to be pretty simple…
Yahoo! today is a global network of online media sites
... with search as an important entry point to content. Information box with content from and links to Yahoo! Travel. Points of interest in Vienna, Austria. Since August 2010, ‘regular’ search results are ‘Powered by Bing’. Shopping results from Yahoo! Shopping.
Conversely, online media as an entry point to search. Hovering over an underlined phrase triggers a search for related news items.
Aggregation across space: hyperlocal pages. Hyperlocal: showing content from across Yahoo! that is relevant to a particular neighbourhood.
Aggregation across entity types: special events
Personalization. Yahoo’s Content Optimization Relevance Engine (CORE) technology uses machine learning to predict click behavior based on the user profile. Display advertising is also personalized by default. Users can opt out of behavioral targeting through AdChoices.
Contextualization. Show related content. Social discovery: connect with friends watching the same
Convergence of search and online media
Semantic technologies for Search
Search is really fast, without necessarily being intelligent
State of Search
Not just search…
What it’s like to be a machine? Roi Blanco
What it’s like to be a machine?  ✜ Θ ♬♬ţğ   ✜ Θ ♬♬ţğ √∞  ®ÇĤĪ ✜★  ♬☐ ✓✓ ţğ  ★  ✜   ✪✚✜ Δ ΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫ Γ ≠ =⅚ ©§ ★✓♪ ΒΓΕ  ℠   ✖ Γ ♫⅜±  ⏎ ↵⏏  ☐ģğğğμλκσςτ   ⏎  ⌥ °¶§ΥΦΦΦ ✗✕ ☐ 
If machines are dumb, how to make their job easier?
Enter the Semantic Web
History of metadata in HTML
HTML meta tags
Microformats (μf)
Example: the hCard microformat
<cite class="vcard"><a class="fn url" rel="friend colleague met" href="http://meyerweb.com/">Eric Meyer</a></cite> wrote a post (<cite><a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/">Tax Relief</a></cite>) about an unintentionally humorous letter he received from the <span class="vcard"><a class="fn org url" href="http://irs.gov/">Internal Revenue Service</a></span>.
<div class="vcard">
  <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>
  <div class="tel">+1-919-555-7878</div>
  <div class="title">Area Administrator, Assistant</div>
</div>
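To show how a machine might read the hCard markup above, here is a minimal sketch in Python using only the standard library's `html.parser`. It is a simplification, not the full microformats parsing algorithm: it handles only the `fn`, `org`, `tel`, `title`, `url`, and `email` properties and does not support nested cards. The names `HCardParser` and `parse_hcards` are hypothetical.

```python
from html.parser import HTMLParser

# Properties whose value is the element's text content (simplified subset).
TEXT_PROPS = {"fn", "org", "tel", "title"}


class HCardParser(HTMLParser):
    """Collects one dict per element whose class list contains 'vcard'."""

    def __init__(self):
        super().__init__()
        self.cards = []
        self._card = None   # card currently being filled, if any
        self._stack = []    # open elements: (tag, is_card_root, props, text_chunks)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        is_root = "vcard" in classes and self._card is None
        if is_root:
            self._card = {}
            self.cards.append(self._card)
        props = set()
        if self._card is not None:
            props = classes & TEXT_PROPS
            href = attrs.get("href") or ""
            if "url" in classes and href:
                self._card["url"] = href
            if "email" in classes and href.startswith("mailto:"):
                self._card["email"] = href[len("mailto:"):]
        self._stack.append((tag, is_root, props, []))

    def handle_data(self, data):
        # Text counts toward every open element that declared a text property.
        for _, _, props, chunks in self._stack:
            if props:
                chunks.append(data)

    def handle_endtag(self, tag):
        if tag not in [entry[0] for entry in self._stack]:
            return  # stray end tag: ignore it
        while self._stack:
            open_tag, is_root, props, chunks = self._stack.pop()
            text = " ".join("".join(chunks).split())
            for prop in props:
                self._card[prop] = text
            if is_root:
                self._card = None
            if open_tag == tag:
                break


def parse_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    return parser.cards


SAMPLE = """
<cite class="vcard"><a class="fn url" href="http://meyerweb.com/">Eric Meyer</a></cite>
<div class="vcard">
  <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>
  <div class="tel">+1-919-555-7878</div>
  <div class="title">Area Administrator, Assistant</div>
</div>
"""

for card in parse_hcards(SAMPLE):
    print(card)
```

The point of the sketch is that the class names alone tell the parser which text is a name, a phone number, or a job title; no language understanding is needed.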
Microformats: limitations
RDFa
RDFa evolution
Example: Yahoo! Enhanced Results (was: SearchMonkey)
Example: Google’s Rich Snippets
Example: Facebook’s Like and the Open Graph Protocol
Example: Facebook’s Open Graph Protocol
<html xmlns:og="http://opengraphprotocol.org/schema/">
<head>
  <title>The Rock (1996)</title>
  <meta property="og:title" content="The Rock" />
  <meta property="og:type" content="movie" />
  <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
  <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
  …
</head>
...
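As an illustration of how a consumer could read the Open Graph markup above, the following sketch collects every `<meta>` tag whose `property` starts with `og:`. It uses only Python's standard `html.parser`; the names `OpenGraphParser` and `extract_open_graph` are hypothetical, and keeping only the first value per property is a simplifying assumption.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects og:* properties declared in <meta property=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        content = attrs.get("content")
        if prop.startswith("og:") and content is not None:
            # Assumption for this sketch: the first occurrence of a property wins.
            self.properties.setdefault(prop, content)


def extract_open_graph(html):
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


PAGE = """
<html xmlns:og="http://opengraphprotocol.org/schema/">
<head>
  <title>The Rock (1996)</title>
  <meta property="og:title" content="The Rock" />
  <meta property="og:type" content="movie" />
  <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
  <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
</head>
</html>
"""

for name, value in extract_open_graph(PAGE).items():
    print(name, "=", value)
```

Note that the page title is "The Rock (1996)" while `og:title` is "The Rock": the metadata gives the consumer the clean entity name without any guessing about the year suffix.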
Example: rNews
Microdata
<div itemscope itemid="http://www.yahoo.com/resource/person">
  <p>My name is <span itemprop="name">Neil</span>.</p>
  <p>My band is called <span itemprop="band">Four Parts Water</span>.
  I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>.
  <img itemprop="image" src="me.png" alt="me"></p>
</div>
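The microdata example above can be read with a similarly small parser. This is a sketch under stated assumptions, not a conforming microdata implementation: it handles a single, non-nested item, takes the value of `time` from `datetime`, of `img` from `src`, of `a` and `link` from `href`, of `meta` from `content`, and of everything else from the text content. The names `MicrodataParser` and `extract_item`, and the `"@id"` key used for `itemid`, are hypothetical choices for this illustration.

```python
from html.parser import HTMLParser

# Elements whose property value comes from an attribute rather than from text.
ATTR_VALUED = {"a": "href", "link": "href", "img": "src", "meta": "content"}
VOID_TAGS = {"img", "link", "meta", "br", "hr", "input"}


class MicrodataParser(HTMLParser):
    """Extracts the itemprop name/value pairs of one flat item."""

    def __init__(self):
        super().__init__()
        self.item = {}
        self._open = []  # open elements: (tag, text_property_or_None, text_chunks)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs and attrs.get("itemid"):
            self.item["@id"] = attrs["itemid"].strip()
        prop = attrs.get("itemprop")
        text_prop = None
        if prop:
            if tag in ATTR_VALUED:
                self.item[prop] = attrs.get(ATTR_VALUED[tag]) or ""
            elif tag == "time" and attrs.get("datetime"):
                self.item[prop] = attrs["datetime"]
            else:
                text_prop = prop
        if tag not in VOID_TAGS:
            self._open.append((tag, text_prop, []))

    def handle_data(self, data):
        for _, text_prop, chunks in self._open:
            if text_prop:
                chunks.append(data)

    def handle_endtag(self, tag):
        if tag not in [entry[0] for entry in self._open]:
            return  # stray end tag: ignore it
        while self._open:
            open_tag, text_prop, chunks = self._open.pop()
            if text_prop:
                self.item[text_prop] = " ".join("".join(chunks).split())
            if open_tag == tag:
                break


def extract_item(html):
    parser = MicrodataParser()
    parser.feed(html)
    return parser.item


PERSON = """
<div itemscope itemid="http://www.yahoo.com/resource/person">
  <p>My name is <span itemprop="name">Neil</span>.</p>
  <p>My band is called <span itemprop="band">Four Parts Water</span>.
  I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>.
  <img itemprop="image" src="me.png" alt="me"></p>
</div>
"""

print(extract_item(PERSON))
```

The `birthday` property shows why the attribute matters: the visible text "May 10th 2009" is for people, while `datetime` carries the machine-readable value.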
Competing formats, competing schemas
schema.org
1st schema.org workshop (Sept 21, 2011)
Current state of semantic search
RDFa on the rise. [Chart: percentage of URLs with embedded metadata in various formats.] 510% increase between March 2009 and October 2010.
Semantic Search development
Semantic technologies for Data Integration
Today’s world is a Web of Pages
All these pages come from structured knowledge about people, places, and things. [Diagram: graph with nodes "MLB team", "Chicago Cubs", "Chicago", "Barack Obama", "Carlos Zambrano", and "10% off tickets"; edge labels "is a", "for", "plays for", "plays in", "from".]
This underlying world is WOO, the Web of Objects. [Same diagram.]
Today our knowledge of this world is siloed, incomplete, inconsistent, inaccurate, and hard to reuse. [Same diagram, split across the silos Sports, Entertainment, Finance, Local, Shopping, and Upcoming; the node "Scott Roy" appears where the other slides show "Barack Obama".]
Our vision is a single shared knowledge base: accurate, scalable, and easy to reuse. [Same diagram.]
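A shared knowledge base of this kind is often pictured as a set of subject, predicate, object statements. The sketch below is only an illustration in Python: the slides do not say how WOO stores its data, and the way the node and edge labels are paired up here is an assumption, since the transcript lists the labels but not which nodes each edge connects. `Triple`, `KB`, `match`, and `players_in_city` are hypothetical names.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Assumed reading of the diagram labels; the edge endpoints are a guess.
KB = [
    Triple("Chicago Cubs", "is a", "MLB team"),
    Triple("Chicago Cubs", "plays in", "Chicago"),
    Triple("Carlos Zambrano", "plays for", "Chicago Cubs"),
    Triple("Barack Obama", "from", "Chicago"),
    Triple("10% off tickets", "for", "Chicago Cubs"),
]


def match(kb, subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern; None acts as a wildcard."""
    return [
        t for t in kb
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def players_in_city(kb, city):
    """Two-hop query: who plays for a team that plays in the given city?"""
    teams = {t.subject for t in match(kb, predicate="plays in", obj=city)}
    return sorted(t.subject for t in match(kb, predicate="plays for") if t.obj in teams)


print(players_in_city(KB, "Chicago"))
print(match(KB, obj="Chicago Cubs"))
```

Because every fact has the same shape, facts from a sports site, a local listing, and a shopping offer can sit in one structure and be queried together, which is the reuse the slide argues for.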
Knowledge comes from many sources. [Diagram: grid of entities versus attributes. Example source: show times and other information for US movies from source B. Example entity: Harry Potter and the Deathly Hallows part II. Example attribute: show times. Example cell: show times for Harry Potter and the Deathly Hallows part II.]
Combining these requires working with complementary, parallel, and overlapping sources. [Diagram: grid of entities versus attributes covered by cast information for global movies from Wikipedia, cast information for US movies from source A, and cast and show time information for global movies from licensed feeds.]
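The three kinds of sources can be made concrete with a small blending sketch. This is an assumption-laden illustration in Python, not Yahoo!'s actual matching or blending logic: entities are matched by a normalized title only, sources are listed in an arbitrary priority order, the first source to supply an attribute wins, and all record values are made-up placeholders. `normalize` and `blend` are hypothetical names.

```python
def normalize(title):
    """Crude matching key: lowercase and collapse whitespace."""
    return " ".join(title.lower().split())


def blend(sources):
    """Merge records from (source_name, records) pairs listed in priority order.

    Returns {entity_key: {attribute: (value, source_name)}} so that every
    value keeps a note of which source it came from.
    """
    merged = {}
    for source_name, records in sources:
        for record in records:
            entity = merged.setdefault(normalize(record["title"]), {})
            for attr, value in record.items():
                # The highest-priority source that has the attribute wins.
                entity.setdefault(attr, (value, source_name))
    return merged


# Placeholder data; cast names and show times are invented for illustration.
SOURCES = [
    ("Wikipedia", [
        {"title": "Harry Potter and the Deathly Hallows part II",
         "cast": ["Actor One", "Actor Two"]},
        {"title": "Example Film", "cast": ["Actor Three"]},      # parallel: other entity
    ]),
    ("source A", [
        {"title": "Harry Potter and the Deathly Hallows Part II",
         "cast": ["Actor One"]},                                 # overlapping: same attribute
    ]),
    ("source B", [
        {"title": "Harry Potter and the Deathly Hallows Part II",
         "show_times": ["19:30"]},                               # complementary: new attribute
    ]),
]

for key, attributes in blend(SOURCES).items():
    print(key, attributes)
```

Keeping the source name next to each value is a deliberate choice in this sketch: when two sources disagree, the provenance is what lets a later step re-rank or correct the blended result.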
There is a tremendous opportunity to do this directly from Web pages, reverse engineering the Web. [Diagram: grid of entities versus attributes covered by information from structured data extraction on billions of Web pages.]
Semantic technologies for data integration
Components
WOO ontology
WOO ontology (continued)
Value #1: Breadth, depth, and accuracy at scale. [Figure labels: real entities; dups, errors, and outdated entities; up-to-date correct entities; incorrect store URL; no photo; no business hours; we show many entities we shouldn’t.] WOO improves our breadth, depth, and accuracy by combining knowledge from alternative sources, and by modernizing how we do matching, blending, and de-duping.
Value #2: Agility launching new experiences. Answers instead of links: WOO lets us quickly create entity-centric DD modules using the existing knowledge in the KB. Related knowledge in context: the integrated KB lets us show relevant knowledge from one Yahoo property on other properties and off network. Emerging markets and tail pages: the KB gets us deep into the tail by combining and blending knowledge from many sources.
Other potential benefits
Innovative media companies are moving in this direction Courtesy of Silver Oliver (BBC)
Innovative media companies are moving in this direction Courtesy of Evan Sandhaus (NYT).
Take home: use what works!
The End

 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Making the Web searchable

  • 1. Making the Web Searchable Peter Mika Senior Researcher and Data Architect Yahoo! Inc.
  • 3. Convergence of Search and Online Media
  • 4. It used to be pretty simple…
  • 5. Yahoo! today is a global network of online media sites
  • 6. ... with search as an important entry point to content. Callouts: information box with content from and links to Yahoo! Travel; points of interest in Vienna, Austria; since August 2010, ‘regular’ search results are ‘Powered by Bing’; shopping results from Yahoo! Shopping.
  • 7. Conversely, online media as an entry point to search: hovering over an underlined phrase triggers a search for related news items.
  • 8. Aggregation across space: hyperlocal pages. Hyperlocal: showing content from across Yahoo that is relevant to a particular neighbourhood.
  • 9. Aggregation across entity types: special events
  • 10. Personalization: Yahoo’s Content Optimization Relevance Engine (CORE) technology uses machine learning to predict click behavior based on the user profile. Display advertising is also personalized by default. Users can opt out of behavioral targeting through AdChoices.
  • 11. Contextualization: show related content. Social discovery: connect with friends watching the same content.
  • 14. Search is really fast, without necessarily being intelligent
  • 17. What it’s like to be a machine? Roi Blanco
  • 18. What it’s like to be a machine?  ✜ Θ ♬♬ţğ   ✜ Θ ♬♬ţğ √∞  ®ÇĤĪ ✜★  ♬☐ ✓✓ ţğ  ★  ✜   ✪✚✜ Δ ΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫ Γ ≠ =⅚ ©§ ★✓♪ ΒΓΕ  ℠   ✖ Γ ♫⅜±  ⏎ ↵⏏  ☐ģğğğμλκσςτ   ⏎  ⌥ °¶§ΥΦΦΦ ✗✕ ☐ 
  • 24. Example: the hCard microformat: <cite class="vcard"> <a class="fn url" rel="friend colleague met" href="http://meyerweb.com/">Eric Meyer</a> </cite> wrote a post (<cite> <a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/">Tax Relief</a></cite>) about an unintentionally humorous letter he received from the <span class="vcard"> <a class="fn org url" href="http://irs.gov/">Internal Revenue Service</a> </span>. <div class="vcard"> <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a> <div class="tel">+1-919-555-7878</div> <div class="title">Area Administrator, Assistant</div> </div>
  • 38. RDFa on the rise. Percentage of URLs with embedded metadata in various formats: 510% increase between March 2009 and October 2010.
  • 40. Semantic technologies for Data Integration
  • 41. Today’s world is a Web of Pages
  • 42. All these pages come from structured knowledge about people, places, and things. [Diagram labels: MLB team; Chicago Cubs; is a; Chicago; Barack Obama; Carlos Zambrano; 10% off tickets; for; plays for; plays in; from]
  • 43. This underlying world is WOO, the Web of Objects. [Diagram labels: MLB team; Chicago Cubs; is a; Chicago; Barack Obama; Carlos Zambrano; 10% off tickets; for; plays for; plays in; from]
  • 44. Today our knowledge of this world is siloed, incomplete, inconsistent, inaccurate, and hard to reuse. Silos: Sports, Entertainment, Finance, Local, Shopping, Upcoming. [Diagram labels: MLB team; Chicago Cubs; is a; Chicago; Scott Roy; Carlos Zambrano; 10% off tickets; for; plays for; plays in; from]
  • 45. Our vision is a single shared knowledge base: accurate, scalable, and easy to reuse. [Diagram labels: MLB team; Chicago Cubs; is a; Chicago; Barack Obama; Carlos Zambrano; 10% off tickets; for; plays for; plays in; from]
  • 46. Knowledge comes from many sources. [Figure labels: Entities; Attributes; Show times and other information for US movies from source B; Harry Potter and the Deathly Hallows part II; Show times; Show times for Harry Potter and the Deathly Hallows part II]
  • 47. Combining these requires working with complementary, parallel, and overlapping sources. [Figure labels: Attributes; Entities; Cast information for global movies from Wikipedia; Cast information for US movies from source A; Cast and show time information for global movies from licensed feeds]
  • 48. There is a tremendous opportunity to do this directly from Web pages, reverse engineering the Web. [Figure labels: Attributes; Entities; Information from structured data extraction on billions of Web pages]
  • 53. Value #1: breadth, depth, and accuracy at scale. [Figure labels: Real entities; Dups, errors, and outdated entities; Up-to-date correct entities; Incorrect store URL; No photo; We show many entities we shouldn’t; No business hours] WOO improves our breadth, depth, and accuracy by combining knowledge from alternative sources, and by modernizing how we do matching, blending, and de-duping.
  • 54. Value #2: agility in launching new experiences. Answers instead of links: WOO lets us quickly create entity-centric DD modules using the existing knowledge in the KB. Related knowledge in context: the integrated KB lets us show relevant knowledge from one Yahoo property on other properties and off network. Emerging markets and tail pages: the KB gets us deep into the tail by combining and blending knowledge from many sources.
  • 56. Innovative media companies are moving in this direction (courtesy of Silver Oliver, BBC).
  • 57. Innovative media companies are moving in this direction (courtesy of Evan Sandhaus, NYT).
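
The hCard markup on slide 24 can be read back by a program with very little machinery. The following is a minimal sketch in Python, using only the standard library's `html.parser`. The names `HCardParser`, `extract_hcards`, and `HCARD_PROPS` are hypothetical, only a handful of hCard properties are handled, and the sketch assumes well-formed markup without void elements such as `<br>` inside a card. It is not the extraction pipeline described in the talk.

```python
from html.parser import HTMLParser

# Property class names this sketch looks for; real hCard defines many more.
HCARD_PROPS = {"fn", "org", "url", "email", "tel", "title"}


class HCardParser(HTMLParser):
    """Collect a few hCard properties from elements inside class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []    # finished cards, one dict per vcard root element
        self._stack = []   # one (is_root, text_props) entry per open element
        self._card = None  # the card being filled, or None outside a vcard

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        is_root = "vcard" in classes and self._card is None
        if is_root:
            self._card = {}
        props = classes & HCARD_PROPS if self._card is not None else set()
        href = attrs.get("href") or ""
        # "url" and "email" take their value from href, not from the text.
        if "url" in props:
            self._card["url"] = href
        if "email" in props:
            prefix = "mailto:"
            self._card["email"] = href[len(prefix):] if href.startswith(prefix) else href
        self._stack.append((is_root, props - {"url", "email"}))

    def handle_data(self, data):
        text = data.strip()
        if self._card is None or not text:
            return
        # Text belongs to every property-bearing element that is still open.
        for _, props in self._stack:
            for prop in props:
                self._card[prop] = (self._card.get(prop, "") + " " + text).strip()

    def handle_endtag(self, tag):
        if not self._stack:
            return
        is_root, _ = self._stack.pop()
        if is_root:
            self.cards.append(self._card)
            self._card = None


def extract_hcards(html):
    """Return a list of dicts, one per vcard found in the HTML string."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


if __name__ == "__main__":
    sample = (
        '<div class="vcard"> <a class="email fn" '
        'href="mailto:jfriday@host.com">Joe Friday</a> '
        '<div class="tel">+1-919-555-7878</div> '
        '<div class="title">Area Administrator, Assistant</div> </div>'
    )
    for card in extract_hcards(sample):
        print(card)
```

The point of the sketch is that the class names do the work: the parser never needs to understand the sentence around the markup, which is the "hint" the machine otherwise lacks.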

Editor's Notes

  1. Everything is search: search and online media are converging businesses
  2. Yahoo serves over 600 million users in 25 countries. 38% of O&O revenue comes from search advertising, 53% from display advertising, and 9% from listings and other marketing services (Q3 2010).
  3. Search is a form of content aggregation
  4. Improvements in search are harder and harder to come by. The current search paradigm has reached a plateau: we have solved large classes of queries, and what remains is difficult to solve within the current paradigm.
  5. With ads, the situation is even worse due to the sparsity problem. Note how poor the ads are…
  6. This is how a human sees the world.
  7. This is how a machine sees the world. Machines are not ‘intelligent’ and cannot ‘read’; they just see a string of symbols and try to match the user's input to that stream.
  8. However, we can make the job of the machine easier by giving some hints…
  9. Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns.
  10. Facebook was invited but continues to pursue OGP (the Open Graph Protocol).
  11. Publishers: enable your website with schema.org and publish Linked Data. Developers: build standard APIs using Linked Data technology.
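
As an illustration of what "enabling a website with schema.org" can look like for a publisher, here is a minimal sketch that emits a schema.org `Person` description as JSON-LD, one of the syntaxes schema.org supports. The notes do not say which syntax the speaker had in mind, so the choice of JSON-LD is an assumption, and the helper names `person_jsonld` and `script_tag` are hypothetical. The sample values come from the hCard example on slide 24.

```python
import json


def person_jsonld(name, job_title=None, telephone=None, email=None):
    """Build a schema.org Person description as a JSON-LD string."""
    doc = {"@context": "https://schema.org", "@type": "Person", "name": name}
    optional = {"jobTitle": job_title, "telephone": telephone, "email": email}
    # Only emit the properties that were actually provided.
    doc.update({key: value for key, value in optional.items() if value is not None})
    return json.dumps(doc, indent=2)


def script_tag(jsonld):
    """Wrap a JSON-LD string in the script element used to embed it in HTML."""
    # Escape "<" so that data containing "</script>" cannot end the element early;
    # "\u003c" is an equivalent JSON escape, so the payload still parses the same.
    safe = jsonld.replace("<", "\\u003c")
    return '<script type="application/ld+json">\n' + safe + "\n</script>"


if __name__ == "__main__":
    markup = person_jsonld(
        "Joe Friday",
        job_title="Area Administrator, Assistant",
        telephone="+1-919-555-7878",
        email="jfriday@host.com",
    )
    print(script_tag(markup))
```

The same facts could equally be expressed inline as microdata or RDFa attributes; a separate JSON block is used here only because it keeps the example short.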