2. Goals
•Identify important entities within the
advertisements.
•Link them to their corresponding Wikipedia
pages.
•Identify relevant concepts in order to
disambiguate each entity.
3. Benefits of Wikipedia
•An ever-expanding number of pages in the
Wikipedia corpus.
•A rigorous structure, though with low coverage,
which emulates real-world data very well.
•A large number of entities, including proper
names unlikely to be found in any other
collection.
•Redirect pages and disambiguation pages.
4. Process Overview
•Parser Module – Parses the given webpage and
produces two documents: the advertisement
itself and the surrounding document, which is
later used in the final steps to disambiguate the
results of the search module.
•Tokenizer Module – Converts the
advertisement into a list of tokens.
•POS Tagger Module – Marks up each word in an
ad with its particular part of speech.
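The tokenizer step can be sketched with a simple regular-expression tokenizer. This is a minimal stand-in under stated assumptions: a real pipeline would typically use a library tokenizer and POS tagger (e.g. NLTK), and the sample ad text is illustrative.

```python
import re

def tokenize(ad_text):
    """Split an advertisement into word tokens, keeping contractions together."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", ad_text)

tokens = tokenize("An apple a day keeps the doctor away")
# tokens == ['An', 'apple', 'a', 'day', 'keeps', 'the', 'doctor', 'away']
```

The token list produced here is what the POS tagger consumes; each token is then labelled with its part of speech before chunking.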
5. Process Overview
•Parsing Module – Returns the advertisement in
tree format.
•Noun Phrase Extraction Module – Extracts NPs
from the tree generated in the previous step.
•Noun Phrase Ranking – Ranks the NPs using a
heuristic function.
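The NP extraction and ranking steps can be sketched as below. The chunk grammar (an optional determiner, optional adjectives, then one or more nouns) and the ranking heuristic (favouring longer phrases and capitalised words) are illustrative assumptions, not the deck's actual grammar or heuristic function.

```python
def extract_noun_phrases(tagged):
    """Greedy shallow chunker for (DT)? (JJ)* (NN.*)+ runs over POS-tagged tokens."""
    phrases, current, has_noun = [], [], False
    for word, tag in tagged:
        # Close the current chunk when this token cannot extend it.
        if current and (has_noun and not tag.startswith("NN")
                        or not has_noun and not (tag == "JJ" or tag.startswith("NN"))):
            if has_noun:                      # only keep chunks that contain a noun
                phrases.append(" ".join(current))
            current, has_noun = [], False
        if tag == "DT" and not current:
            current = [word]                  # optional leading determiner
        elif tag == "JJ" and not has_noun:
            current.append(word)              # adjectives before the head noun
        elif tag.startswith("NN"):
            current.append(word)
            has_noun = True
    if has_noun:
        phrases.append(" ".join(current))
    return phrases

def rank_noun_phrases(phrases):
    """Toy heuristic: longer phrases and capitalised words score higher."""
    def score(phrase):
        words = phrase.split()
        return len(words) + sum(w[0].isupper() for w in words)
    return sorted(phrases, key=score, reverse=True)
```

For the tagged ad [("An","DT"), ("apple","NN"), ("a","DT"), ("day","NN"), ("keeps","VBZ"), ("the","DT"), ("doctor","NN"), ("away","RB")], the chunker yields "An apple", "a day", and "the doctor", and the heuristic ranks "An apple" first.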
6. Process Overview
•Entity/Keyword Extraction Module – Probable
entities and keywords are extracted from the
highest-ranked NPs.
•Search Module – Returns a list of relevant
documents. The search module is essentially an
inverted index of the Wikipedia dump; only the
titles and summaries of the pages are indexed.
•Filtering of Results – Finds the most likely/
closest wiki page.
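A minimal sketch of the search module's inverted index over page titles and summaries. The two sample pages and the hit-count scoring are illustrative assumptions; the actual module indexes a full Wikipedia dump, and its filtering step may use a richer similarity measure.

```python
import re
from collections import defaultdict

def build_index(pages):
    """pages maps page title -> summary; index each word under the titles it appears in."""
    index = defaultdict(set)
    for title, summary in pages.items():
        for word in re.findall(r"[a-z]+", (title + " " + summary).lower()):
            index[word].add(title)
    return index

def search(index, keywords):
    """Score candidate pages by how many query keywords they contain."""
    scores = defaultdict(int)
    for kw in keywords:
        for title in index.get(kw.lower(), ()):
            scores[title] += 1
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical two-page "dump" standing in for the real Wikipedia index.
pages = {
    "Apple (fruit)": "The pomaceous fruit of the apple tree, eaten fresh or as juice.",
    "Apple Corporation": "An American technology company that designs consumer electronics.",
}
index = build_index(pages)
print(search(index, ["apple", "doctor", "fruit"]))  # ranks Apple (fruit) first
```

Filtering then amounts to picking the top-scoring candidate: keywords drawn from the ad's context ("doctor", "fruit") push the fruit page above the company page.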
8. Entity Detection
•The basic technique for entity detection is chunk
detection via shallow parsing.
•This technique reduces the keywords to be
searched in the corpus, improving performance
and accuracy.
9. Evaluation and Results
•Advertisement: "An apple a day keeps the
doctor away" – Wiki page: Apple (fruit)
•Advertisement: "Apple innovates relentlessly to
make great products, buy an Apple" – Wiki
page: Apple Corporation
•Advertisement: "Royal Stag, it's your life, make it
large" – Wiki page: Royal Stag
10. Conclusions
•It is possible to use NLP techniques to narrow
down the list of words to be searched in the
search engine.
•Context can be extracted from the
advertisement itself using NLP techniques.
•The search module gives satisfactory results on a
simple inverted index built from page titles
and summaries.