Entity linking in Advertisements
Team: Rounak Patni, Kumar Rishabh, Rohit Jain, Siva kumar
Mentor: Pulkit Goel
Goals
•Identify important entities within the
advertisements.
•Link them to the corresponding Wikipedia pages.
•Identify relevant concepts in order to
disambiguate each entity.
Benefits of Wikipedia
•An ever-expanding number of pages in the
Wikipedia corpus.
•A rigorous structure but with low coverage,
which emulates real-world data very well.
•A large number of entities, including proper
names unlikely to be found in any other
collection.
•Redirect pages and disambiguation pages.
Process Overview
•Parser Module - Parses the given web page and
produces two documents: the advertisement itself
and the surrounding document, which is used in the
final steps to disambiguate the results of the
search module.
•Tokenizer Module - Converts the advertisement
into a list of tokens.
•POS Tagger Module - Marks up each word in an ad
with its part of speech.
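The tokenizer and POS tagger steps above can be sketched as follows. This is a minimal illustration, not the team's implementation: the regular expression, the tiny `LEXICON` table, and the fallback rules (capitalized unknown word becomes a proper noun, other unknown words become nouns) are all assumptions made here. A real system would use a trained tagger.

```python
import re

# Hypothetical mini-lexicon for illustration only; a trained tagger
# would replace this lookup table in practice.
LEXICON = {
    "a": "DT", "an": "DT", "the": "DT",
    "keeps": "VBZ", "away": "RB", "to": "TO",
}


def tokenize(ad_text):
    """Split ad text into word tokens and single punctuation tokens."""
    return re.findall(r"[A-Za-z0-9]+(?:'[a-z]+)?|[^\sA-Za-z0-9]", ad_text)


def pos_tag(tokens):
    """Tag each token by lexicon lookup, with simple fallbacks.

    Assumed fallbacks: punctuation -> PUNCT, capitalized unknown
    word -> NNP (proper noun), any other unknown word -> NN (noun).
    """
    tagged = []
    for tok in tokens:
        tag = LEXICON.get(tok.lower())
        if tag is None:
            if not tok[0].isalnum():
                tag = "PUNCT"
            elif tok[0].isupper():
                tag = "NNP"
            else:
                tag = "NN"
        tagged.append((tok, tag))
    return tagged


tokens = tokenize("An Apple a day keeps the doctor away")
tagged = pos_tag(tokens)
```

Here `tagged` pairs every token with a Penn Treebank style tag, which is the input the later parsing and noun phrase steps expect.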
Process Overview
•Parsing Module - Returns each advertisement as a
parse tree.
•Noun Phrase Extraction Module - Extracts noun
phrases (NPs) from the tree generated in the
previous step.
•Noun Phrase Ranking - Ranks the NPs using a
heuristic function.
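A rough sketch of NP extraction and ranking over a parse tree is shown below. The slides do not say which heuristic function was used, so the scoring rule here (capitalized tokens weigh 3, other tokens weigh 1) is purely an assumption, as is the nested-tuple tree representation and the hand-written example tree.

```python
def leaves(node):
    """Return the words under a node. A leaf is a plain string;
    an inner node is a tuple (label, child, child, ...)."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]


def extract_nps(tree):
    """Collect the word list of every NP subtree, left to right."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return
        if node[0] == "NP":
            found.append(leaves(node))
        for child in node[1:]:
            walk(child)

    walk(tree)
    return found


def score(np_words):
    """Assumed heuristic: capitalized words are likelier to name entities."""
    return sum(3 if word[0].isupper() else 1 for word in np_words)


def rank_nps(nps):
    """Highest score first; sorted() is stable, so ties keep tree order."""
    return sorted(nps, key=score, reverse=True)


# Hand-written parse of "Apple makes great products" (illustrative only).
tree = ("S",
        ("NP", ("NNP", "Apple")),
        ("VP", ("VBZ", "makes"),
               ("NP", ("JJ", "great"), ("NNS", "products"))))

ranked = rank_nps(extract_nps(tree))
```

With this scoring, `ranked[0]` is the NP that the entity/keyword extraction step would examine first.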
Process Overview
•Entity/Keyword Extraction Module - Probable
entities and keywords are extracted from the
highest-ranked NP.
•Search Module - Returns a list of relevant
documents. The search module is essentially an
inverted index of the Wikipedia dump; only the
title and summary of each page are indexed.
•Filtering of Results - Selects the most likely
(closest-matching) Wikipedia page.
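The search module described above can be approximated with a small in-memory inverted index over page titles and summaries. The sketch below is an assumption about how such an index might look, not the team's code: the three `pages` entries are hand-written stand-ins for a Wikipedia dump, and ranking by the number of matching query terms is a simplification chosen for illustration.

```python
import re
from collections import defaultdict


def terms(text):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to the set of page titles whose title or
    summary contains it. `pages` is a dict of title -> summary."""
    index = defaultdict(set)
    for title, summary in pages.items():
        for term in terms(title + " " + summary):
            index[term].add(title)
    return index


def search(index, query):
    """Return titles matching any query term, ordered by the number
    of distinct query terms matched, then alphabetically."""
    hits = defaultdict(int)
    for term in set(terms(query)):
        for title in index.get(term, ()):
            hits[title] += 1
    return sorted(hits, key=lambda title: (-hits[title], title))


# Hypothetical stand-ins for entries extracted from the wiki dump.
pages = {
    "Apple (fruit)": "An apple is an edible fruit produced by an apple tree.",
    "Apple Inc.": "Apple is a technology company that makes consumer electronics products.",
    "Royal Stag": "Royal Stag is a brand of Indian whisky.",
}

index = build_index(pages)
candidates = search(index, "apple")
```

A query for "apple" returns both Apple pages as candidates, which is why a separate filtering step is needed to choose between them.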
Entity Detection
•The basic technique for entity detection is chunk
detection via shallow parsing.
•This technique reduces the number of keywords to be
searched in the corpus, improving performance
and accuracy.
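Chunk detection via shallow parsing can be illustrated with a tag-pattern chunker that groups POS-tagged tokens into noun phrases without building a full parse tree. The pattern used here (an optional determiner, any adjectives, then one or more nouns) is a common textbook choice and an assumption on our part; the slides do not state the exact chunk grammar.

```python
def np_chunks(tagged):
    """Group (token, tag) pairs into noun phrase chunks matching
    the assumed pattern DT? JJ* NN+ (Penn Treebank style tags)."""
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        # Optional determiner.
        if tagged[j][1] == "DT":
            j += 1
        # Any number of adjectives.
        while j < n and tagged[j][1].startswith("JJ"):
            j += 1
        # One or more nouns (NN, NNS, NNP, NNPS).
        k = j
        while k < n and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:
            chunks.append([tok for tok, _ in tagged[i:k]])
            i = k
        else:
            i += 1  # no noun found here, move on by one token
    return chunks


# Hand-tagged example ad (tags are illustrative).
tagged_ad = [
    ("Apple", "NNP"), ("innovates", "VBZ"), ("relentlessly", "RB"),
    ("to", "TO"), ("make", "VB"), ("great", "JJ"), ("products", "NNS"),
]

chunks = np_chunks(tagged_ad)
```

Only the words inside `chunks` need to be sent to the search module, which is how chunking reduces the number of keywords to look up.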
Evaluation and Results
•Advertisement: An Apple a day keeps the
doctor away. Wiki Page: Apple (fruit)
•Advertisement: Apple innovates relentlessly to
make great products, buy an apple. Wiki Page:
Apple Corporation
•Advertisement: Royal Stag, it's your life, make it
large. Wiki Page: Royal Stag
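One way to picture how the two Apple advertisements could end up on different pages is a context-overlap filter: each candidate page is scored by how many content words its summary shares with the advertisement plus its surrounding document. This is only a sketch under stated assumptions. The stopword list, the candidate summaries, and the surrounding page text are hypothetical, and the slides do not confirm that the filtering step works this way.

```python
import re

# Small assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "to", "of", "by", "that", "and", "it"}


def content_terms(text):
    """Lowercased alphanumeric terms with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS}


def pick_page(context, candidates):
    """Return the candidate title whose summary shares the most
    content terms with the context. Ties go to the alphabetically
    first title, because max() keeps the first maximum it sees."""
    ctx = content_terms(context)
    return max(sorted(candidates),
               key=lambda title: len(ctx & content_terms(candidates[title])))


# Hypothetical candidate summaries returned by the search module.
candidates = {
    "Apple (fruit)": "An apple is an edible fruit produced by an apple tree.",
    "Apple Inc.": "Apple is a technology company that makes consumer electronics products.",
}

# Advertisement text followed by hypothetical surrounding page text.
fruit_context = ("An Apple a day keeps the doctor away. "
                 "Healthy eating tips: fresh fruit every day")
company_context = ("Apple innovates relentlessly to make great products, "
                   "buy an apple")

fruit_choice = pick_page(fruit_context, candidates)
company_choice = pick_page(company_context, candidates)
```

The first advertisement on its own shares only the word "apple" with both summaries, so the surrounding document text is what breaks the tie in this sketch.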
Conclusions
•NLP techniques can be used to narrow
down the list of words to be searched in the search
engine.
•Context can be extracted from the
advertisement itself using NLP techniques.
•The search module gives satisfactory results on a
simple inverted index created using page titles
and summaries.
Thank You !!
