Semantic Search Jargon – a short guide
Mihai Lupu
TU Wien / RSA Data Science
mihai.lupu@researchstudio.at
“Semantic”
▪ adjective
– dictionary.com: of, relating to, or arising from the different meanings of
words or other symbols
– Merriam-Webster: of or relating to the meanings of words and phrases
– Cambridge: connected with the meanings of words
– Oxford: connected with the meaning of words and sentences
They are among us
A human characteristic
Counting words (aka Statistics)

 semantics
The geometric metaphor of meaning
“Meanings are locations in a semantic
space, and semantic similarity is proximity
between the locations”
(Sahlgren, 2006)
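A minimal sketch of the geometric metaphor, assuming meanings are plain numeric vectors and proximity is measured with cosine similarity. The three-dimensional vectors and the helper names `cosine_similarity` and `nearest` are invented for illustration and do not come from the slides.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical locations in a toy "semantic space" (made-up numbers).
space = {
    "crystal": [0.9, 0.1, 0.0],
    "crystals": [0.8, 0.2, 0.1],
    "dinner": [0.0, 0.1, 0.9],
}

def nearest(word, space):
    """Rank every other word by its proximity to `word`, closest first."""
    scored = [
        (cosine_similarity(space[word], vec), other)
        for other, vec in space.items()
        if other != word
    ]
    return sorted(scored, reverse=True)

# The word closest to "crystal" in this toy space.
closest = nearest("crystal", space)[0][1]
```

In this toy space "crystals" lies closer to "crystal" than "dinner" does, which is all the metaphor claims: similar meanings sit near each other.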
Hans Peter Luhn
and others
pure counting
term frequency
position in sentence
SMART
IDF
cosine similarity
and many more
[Timeline axis: decades from the 1950s to the 2020s]
from counting to predicting
Latent Semantic Analysis
Random Indexing
WWW appears
Semantic Web appears
Deep Learning
Speech
Vision
NLP
IR
The Golden Age of Artificial Intelligence
Expert Systems, Knowledge bases (e.g. Cyc)
Inference on billions of tuples
on trillions
Probabilistic models for IR
Language Models
where are we now?
▪ Inference directly from text
▪ [Bowman et al. 2016]
A man rides a bike on a snow covered road
A man is outside
2 female babies eating chips
Two female babies are enjoying chips
A man in an apron shopping at a market
A man in an apron is preparing dinner

Model                                                        % Accuracy
Feature-based classifier                                     78.2
Previous SOTA sentence encoder [Mou et al. 2016]             82.1
LSTM RNN sequence model                                      80.6
Tree LSTM                                                    80.9
SPINN                                                        83.2
SOTA (sentence pair alignment model) [Parikh et al. 2016]    86.8
Particular success cases:
Negation:
- The rhythmic gymnast completes her floor exercise at the competition
- The gymnast cannot finish her exercise
Long examples (>20 words):
- A man wearing glasses and a ragged costume is playing a Jaguar electric
guitar and singing with the accompaniment of a drummer
- A man with glasses and a disheveled outfit is playing a guitar and singing
along with a drummer.
Where are we for patents?
▪ Latent Semantic Indexing
– Some commercial systems claim to use it
▪ “Latent semantic analysis uses sophisticated statistical analysis of language to search on concepts, not just words, to help you find those documents - even if they don't contain any of the words you used in your search”
– Minimal improvements found in experiments
▪ [Moldovan:2005]
Random Indexing
▪ Initial experiments using the Semantic Vectors package
– Unsatisfactory results for document similarity
– Noticeably good results for term similarity
Term vectors
Document vectors
[Lupu et al.:2013]
1.0:coatings
0.9999339:rubs
0.9999338:coating
0.9999328:acrylics
0.9999271:vinyls
0.9999268:cratering
0.9999251:distinctness
0.9999246:blistering
0.9999235:pompano
0.9999234:cyanamid

1.0:crystal
0.9999378:cyrstal
0.9999305:crytal
0.9999022:nicol // a type of prism
0.9999014:jjap
0.9999006:nicols
0.9998996:nematic // a type of liquid crystal
0.9998943:uniaxial // minerals that form crystals used in optics
0.9998894:cb15 // a particular liquid crystal
0.9998887:anisotropy

1.0:crystals
0.9998632:supersaturation
0.9998519:crystallizing
0.9998281:supersaturated
0.9998213:crys
0.9998193:purer
0.9998166:soda
0.9998120:crystallize
0.9998105:crystallizers
0.9998081:tals
[Lupu et al.:2013]
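A rough sketch of how Random Indexing can produce term vectors like the ones queried above. This is not the Semantic Vectors package and not the setup used in the experiments: the function names, the toy sentences, and the parameters `dim`, `nonzero`, `window` and `seed` are all assumptions made for illustration. Each word receives a sparse random index vector, and a word's term vector is the sum of the index vectors of the words that occur near it.

```python
import math
import random
from collections import defaultdict

def index_vector(rng, dim, nonzero):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec

def random_indexing(sentences, dim=64, nonzero=4, window=2, seed=0):
    """Build term vectors by summing the index vectors of neighbouring words."""
    rng = random.Random(seed)
    index = {}
    for sentence in sentences:
        for word in sentence:
            if word not in index:
                index[word] = index_vector(rng, dim, nonzero)
    context = defaultdict(lambda: [0] * dim)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    neighbour = index[sentence[j]]
                    context[word] = [c + n for c, n in zip(context[word], neighbour)]
    return dict(context)

def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Made-up toy corpus; real experiments would use a full patent collection.
sentences = [
    ["the", "coating", "resists", "blistering"],
    ["the", "coatings", "resist", "blistering"],
    ["a", "spring", "biases", "the", "wedge"],
]
vectors = random_indexing(sentences)
similarity = cosine(vectors["coating"], vectors["coatings"])
```

Because "coating" and "coatings" share most of their neighbours in the toy corpus, their term vectors are built largely from the same index vectors. This also hints at why misspellings such as "cyrstal" end up next to "crystal" in the lists above: they occur in the same contexts.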
[Rekabsaz et al.:2016]
CLEF-IP patent collection
looks like we have a problem
Why
words are too simple and documents are too large
documents are too large
Particular success cases:
Negation:
- The rhythmic gymnast completes her floor exercise at the competition
- The gymnast cannot finish her exercise
Long examples (>20 words):
- A man wearing glasses and a ragged costume is playing a Jaguar electric
guitar and singing with the accompaniment of a drummer
- A man with glasses and a disheveled outfit is playing a guitar and singing
along with a drummer.
words are too simple
“In a railroad car truck, a windowed side frame, a bolster extending
through the window, a wedge pocket in said bolster having an
upwardly and outwardly inclined floor in opposition to a vertical
wear surface on the side frame, a stabilizing wedge in the pocket
having a vertical friction surface in contact with the wear surface on
the side frame and an inclined wedging surface in opposition to the
floor of the pocket, a removable wear plate inset in a recess in said
inclined floor, said recess having a horizontal lower edge, said wear
plate having an inclined lower edge formed and adapted to engage
and be supported on said horizontal lower edge of said recess, said
wear plate being held in said recess by a weldment located
between the upper edge of said recess and the lower edge of said
wear plate, and, a spring biasing the wedge upwardly against the
removable wear plate to cam the wedge laterally against the wear
surface on the side frame.”
How much is the patent corpus covered by the CELEX lexical database?
[Verberne et al., 2010]

         Patent data    COBUILD corpus
Tokens   96%            92%
Types    55%            (?)
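The gap between token coverage and type coverage can be made concrete with a small sketch. The word list, the stand-in lexicon and the function name `coverage` are invented for illustration; they are not CELEX, the patent data or the method of the cited study.

```python
from collections import Counter

def coverage(tokens, lexicon):
    """Return (token coverage, type coverage) of `tokens` by `lexicon`."""
    if not tokens:
        raise ValueError("need at least one token")
    counts = Counter(tokens)
    covered_types = [t for t in counts if t in lexicon]
    # Token coverage counts every occurrence of a covered word.
    token_cov = sum(counts[t] for t in covered_types) / sum(counts.values())
    # Type coverage counts each distinct word once.
    type_cov = len(covered_types) / len(counts)
    return token_cov, type_cov

# Made-up example: frequent function words are in the lexicon,
# rare technical words ("recess", "weldment") are not.
tokens = "a wedge in the pocket a wear plate in the recess a weldment".split()
lexicon = {"a", "in", "the", "wedge", "pocket", "plate", "wear"}

token_cov, type_cov = coverage(tokens, lexicon)
```

Here 11 of 13 tokens are covered but only 7 of 9 types. Frequent words dominate the token count, so a lexicon can cover most of the running text while still missing a large share of the distinct vocabulary.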
What to do? Research Evaluation
words are too simple
Query Generation [Andersson:2016]
– Baseline, NLP (words, phrases) and statistical (unigrams, bigrams)
– Section: Claims or entire document
– Termhood
▪ Experiment to learn termhood, two sample sets:
– 637 with C-value and 4,400 without C-value
▪ Upper boundary (manual list) versus machine learning
▪ Skip-gram versus exact phrase
▪ Technical terms versus non-technical terms
Continuous and objective evaluation
Search Engine
Effectiveness
Test
Artificial Intelligence - Will it ever come?
a machine will pass the Turing test by 2029
(Kurzweil 1999, pp. 189-235.)
* The Turing Test does not
specify the use of patents
in the conversation
Thank you
Glossary
▪ CBOW: Continuous Bag-of-Words
▪ DBpedia: automatically extracted knowledge resource from Wikipedia
▪ dimensionality reduction: any procedure that takes as input a vector of size N and outputs a vector of size M<N
▪ feed-forward: a particular type of neural network, which does not contain cycles between its neurons
▪ hypernym: a term denoting a broader category than another
▪ hyponym: a term denoting a narrower category than another
▪ LOD: Linked Open Data
▪ LSA: Latent Semantic Analysis
▪ LSI: Latent Semantic Indexing
▪ LSTM: Long Short-Term Memory
▪ matrix decomposition: a mathematical procedure to represent a matrix as the product of two or more matrices
▪ matrix factorization: matrix decomposition
▪ neural networks: an algorithmic model (loosely) simulating brain structures
▪ ontology: (here) a knowledge representation resource
▪ OWL: Web Ontology Language
▪ PCA: Principal Component Analysis
▪ PMI: Pointwise Mutual Information
▪ RDF: Resource Description Framework
▪ recurrent nn: a particular type of neural network, which contains cycles between its neurons
▪ RI: Random Indexing
▪ skip-grams: method to predict a context from a word
▪ SVD: Singular Value Decomposition
▪ WordNet: a large lexical database of English
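To make the skip-gram and CBOW entries concrete, here is a minimal sketch of the training examples each one looks at, assuming a symmetric context window. Skip-gram predicts a context word from a word; CBOW goes the other way and predicts the word from its context. The function names and the `window` parameter are illustrative, and no model is trained here.

```python
def context_of(tokens, i, window):
    """Words within `window` positions of index i, excluding position i itself."""
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]

def skipgram_pairs(tokens, window=2):
    """(word, context word) pairs: the word is the input, the context the target."""
    return [
        (word, ctx)
        for i, word in enumerate(tokens)
        for ctx in context_of(tokens, i, window)
    ]

def cbow_examples(tokens, window=2):
    """(context words, word) examples: the context is the input, the word the target."""
    return [
        (context_of(tokens, i, window), word)
        for i, word in enumerate(tokens)
    ]

sentence = ["a", "man", "is", "outside"]
pairs = skipgram_pairs(sentence, window=1)
examples = cbow_examples(sentence, window=1)
```

With `window=1`, the word "man" yields the skip-gram pairs `("man", "a")` and `("man", "is")`, while CBOW treats `["a", "is"]` as the input for predicting "man".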

Mais conteĂșdo relacionado

Mais procurados

Extracting keywords from texts - Sanda Martincic Ipsic
Extracting keywords from texts - Sanda Martincic IpsicExtracting keywords from texts - Sanda Martincic Ipsic
Extracting keywords from texts - Sanda Martincic Ipsic
Institute of Contemporary Sciences
 
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Trey Grainger
 

Mais procurados (18)

Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
 
Text Analytics for Dummies 2010
Text Analytics for Dummies 2010Text Analytics for Dummies 2010
Text Analytics for Dummies 2010
 
Text mining
Text miningText mining
Text mining
 
Conceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habibConceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habib
 
Gaining Advantage in e-Learning with Semantic Adaptive Technology
Gaining Advantage in e-Learning with Semantic Adaptive TechnologyGaining Advantage in e-Learning with Semantic Adaptive Technology
Gaining Advantage in e-Learning with Semantic Adaptive Technology
 
Extracting keywords from texts - Sanda Martincic Ipsic
Extracting keywords from texts - Sanda Martincic IpsicExtracting keywords from texts - Sanda Martincic Ipsic
Extracting keywords from texts - Sanda Martincic Ipsic
 
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
 
Textmining Information Extraction
Textmining Information ExtractionTextmining Information Extraction
Textmining Information Extraction
 
Role of Text Mining in Search Engine
Role of Text Mining in Search EngineRole of Text Mining in Search Engine
Role of Text Mining in Search Engine
 
Tutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and SystemsTutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and Systems
 
Lotus: Linked Open Text UnleaShed - ISWC COLD '15
Lotus: Linked Open Text UnleaShed - ISWC COLD '15Lotus: Linked Open Text UnleaShed - ISWC COLD '15
Lotus: Linked Open Text UnleaShed - ISWC COLD '15
 
Week12
Week12Week12
Week12
 
LOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked DataLOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked Data
 
Lexalytics Text Analytics Workshop: Perfect Text Analytics
Lexalytics Text Analytics Workshop: Perfect Text AnalyticsLexalytics Text Analytics Workshop: Perfect Text Analytics
Lexalytics Text Analytics Workshop: Perfect Text Analytics
 
The Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge GraphThe Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge Graph
 
Translating Ontologies in Real-World Settings
Translating Ontologies in Real-World SettingsTranslating Ontologies in Real-World Settings
Translating Ontologies in Real-World Settings
 
Irmac presentation for website
Irmac presentation for websiteIrmac presentation for website
Irmac presentation for website
 
Text Mining and Visualization
Text Mining and VisualizationText Mining and Visualization
Text Mining and Visualization
 

Semelhante a II-SDV 2017: Semantic Search Jargon - A short Guide

IEEE P2P 2011 Flexible Routing Tables
IEEE P2P 2011 Flexible Routing TablesIEEE P2P 2011 Flexible Routing Tables
IEEE P2P 2011 Flexible Routing Tables
Hiroya Nagao
 
Lecture 2 Hierarchy of NLP & TF-IDF.pptx
Lecture 2 Hierarchy of NLP & TF-IDF.pptxLecture 2 Hierarchy of NLP & TF-IDF.pptx
Lecture 2 Hierarchy of NLP & TF-IDF.pptx
KunalSingh560957
 
ìŠŹëŒìŽë“œ 1
ìŠŹëŒìŽë“œ 1ìŠŹëŒìŽë“œ 1
ìŠŹëŒìŽë“œ 1
butest
 

Semelhante a II-SDV 2017: Semantic Search Jargon - A short Guide (20)

Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge BasesExplanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspective
 
Different Semantic Perspectives for Question Answering Systems
Different Semantic Perspectives for Question Answering SystemsDifferent Semantic Perspectives for Question Answering Systems
Different Semantic Perspectives for Question Answering Systems
 
Normative Requirements as Linked Data
Normative Requirements as Linked DataNormative Requirements as Linked Data
Normative Requirements as Linked Data
 
IEEE P2P 2011 Flexible Routing Tables
IEEE P2P 2011 Flexible Routing TablesIEEE P2P 2011 Flexible Routing Tables
IEEE P2P 2011 Flexible Routing Tables
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Icml2018 naver review
Icml2018 naver reviewIcml2018 naver review
Icml2018 naver review
 
How the Web can change social science research (including yours)
How the Web can change social science research (including yours)How the Web can change social science research (including yours)
How the Web can change social science research (including yours)
 
Building AI Applications using Knowledge Graphs
Building AI Applications using Knowledge GraphsBuilding AI Applications using Knowledge Graphs
Building AI Applications using Knowledge Graphs
 
Lecture 2 Hierarchy of NLP & TF-IDF.pptx
Lecture 2 Hierarchy of NLP & TF-IDF.pptxLecture 2 Hierarchy of NLP & TF-IDF.pptx
Lecture 2 Hierarchy of NLP & TF-IDF.pptx
 
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Beyond Word2Vec: Embedding Words and Phrases in Same Vector SpaceBeyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP Systems
 
Knowledge Extraction and Linked Data: Playing with Frames
Knowledge Extraction and Linked Data: Playing with FramesKnowledge Extraction and Linked Data: Playing with Frames
Knowledge Extraction and Linked Data: Playing with Frames
 
Semantic Perspectives for Contemporary Question Answering Systems
Semantic Perspectives for Contemporary Question Answering SystemsSemantic Perspectives for Contemporary Question Answering Systems
Semantic Perspectives for Contemporary Question Answering Systems
 
Eurolan 2005 Pedersen
Eurolan 2005 PedersenEurolan 2005 Pedersen
Eurolan 2005 Pedersen
 
ìŠŹëŒìŽë“œ 1
ìŠŹëŒìŽë“œ 1ìŠŹëŒìŽë“œ 1
ìŠŹëŒìŽë“œ 1
 
ÖrĂŒntĂŒ tanıma - Pattern Recognition
ÖrĂŒntĂŒ tanıma - Pattern RecognitionÖrĂŒntĂŒ tanıma - Pattern Recognition
ÖrĂŒntĂŒ tanıma - Pattern Recognition
 
Why Semantic Search Is Hard
Why Semantic Search Is HardWhy Semantic Search Is Hard
Why Semantic Search Is Hard
 

Mais de Dr. Haxel Consult

AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
Dr. Haxel Consult
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
Dr. Haxel Consult
 
AI-SDV 2022: Where’s the one about
? Looney Tunes¼ Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about
? Looney Tunes¼ Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about
? Looney Tunes¼ Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about
? Looney Tunes¼ Revisited Jay Ven Eman (CE...
Dr. Haxel Consult
 

Mais de Dr. Haxel Consult (20)

AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementAI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
 
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
 
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
 
AI-SDV 2022: Where’s the one about
? Looney Tunes¼ Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about
? Looney Tunes¼ Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about
? Looney Tunes¼ Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about
? Looney Tunes¼ Revisited Jay Ven Eman (CE...
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance Center
 
AI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IPAI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IP
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOC
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
 

Último

Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
SUHANI PANDEY
 
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
singhpriety023
 
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
@Chandigarh #call #Girls 9053900678 @Call #Girls in @Punjab 9053900678
 
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
SUHANI PANDEY
 
Lucknow ❀CALL GIRL 88759*99948 ❀CALL GIRLS IN Lucknow ESCORT SERVICE❀CALL GIRL
Lucknow ❀CALL GIRL 88759*99948 ❀CALL GIRLS IN Lucknow ESCORT SERVICE❀CALL GIRLLucknow ❀CALL GIRL 88759*99948 ❀CALL GIRLS IN Lucknow ESCORT SERVICE❀CALL GIRL
Lucknow ❀CALL GIRL 88759*99948 ❀CALL GIRLS IN Lucknow ESCORT SERVICE❀CALL GIRL
imonikaupta
 
valsad Escorts Service ☎ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...valsad Escorts Service ☎ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure
 
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
SUHANI PANDEY
 
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Último (20)

Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
 
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
 
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
Russian Call Girls in %(+971524965298  )#  Call Girls in DubaiRussian Call Girls in %(+971524965298  )#  Call Girls in Dubai
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
 
Dubai Call Girls Milky O525547819 Call Girls Dubai Soft Dating
Dubai Call Girls Milky O525547819 Call Girls Dubai Soft DatingDubai Call Girls Milky O525547819 Call Girls Dubai Soft Dating
Dubai Call Girls Milky O525547819 Call Girls Dubai Soft Dating
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
 
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts ServiceReal Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
 
đ“€€Call On 7877925207 đ“€€ Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
đ“€€Call On 7877925207 đ“€€ Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...đ“€€Call On 7877925207 đ“€€ Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
đ“€€Call On 7877925207 đ“€€ Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
 
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
 
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎ 9205541914 ☎ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎ 9205541914 ☎ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎ 9205541914 ☎ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎ 9205541914 ☎ Independent Esc...
 
Al Barsha Night Partner +0567686026 Call Girls Dubai
Al Barsha Night Partner +0567686026 Call Girls  DubaiAl Barsha Night Partner +0567686026 Call Girls  Dubai
Al Barsha Night Partner +0567686026 Call Girls Dubai
 
Lucknow ❀CALL GIRL 88759*99948 ❀CALL GIRLS IN Lucknow ESCORT SERVICE❀CALL GIRL
Lucknow ❀CALL GIRL 88759*99948 ❀CALL GIRLS IN Lucknow ESCORT SERVICE❀CALL GIRLLucknow ❀CALL GIRL 88759*99948 ❀CALL GIRLS IN Lucknow ESCORT SERVICE❀CALL GIRL
Lucknow ❀CALL GIRL 88759*99948 ❀CALL GIRLS IN Lucknow ESCORT SERVICE❀CALL GIRL
 
valsad Escorts Service ☎ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...valsad Escorts Service ☎ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
 
All Time Service Available Call Girls Mg Road 👌 ⏭ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭ 6378878445All Time Service Available Call Girls Mg Road 👌 ⏭ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭ 6378878445
 
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
 
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
 

II-SDV 2017: Semantic Search Jargon - A short Guide

  • 1. Semantic Search Jargon – a short guide. Mihai Lupu, TU Wien / RSA Data Science, mihai.lupu@researchstudio.at
  • 2. “Semantic”
      ▪ adjective
        – dictionary.com: of, relating to, or arising from the different meanings of words or other symbols
        – Merriam-Webster: of or relating to the meanings of words and phrases
        – Cambridge: connected with the meanings of words
        – Oxford: connected with the meaning of words and sentences
  • 5. Counting words (aka Statistics)
  • 7. The geometric metaphor of meaning “Meanings are locations in a semantic space, and semantic similarity is proximity between the locations” (Sahlgren, 2006)
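The geometric metaphor on slide 7 can be made concrete with cosine similarity, which the deck lists among the counting-era techniques. The sketch below is only an illustration: the three term vectors and the context words they count are invented for this example and do not come from the slides.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical co-occurrence counts with the context words
# ("paint", "surface", "optics"); the numbers are made up for illustration.
vectors = {
    "coating": [8, 6, 0],
    "varnish": [7, 5, 1],
    "prism": [0, 1, 9],
}

near = cosine_similarity(vectors["coating"], vectors["varnish"])
far = cosine_similarity(vectors["coating"], vectors["prism"])
```

With these toy counts, `near` is larger than `far`: "coating" sits closer to "varnish" than to "prism" in the space, which is what the metaphor means by semantic similarity as proximity.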
  • 9. [Timeline figure: from counting to predicting] Labels, in reading order: “and others”; pure counting; term frequency; position in sentence; SMART; IDF; cosine similarity; and many more; Latent Semantic Analysis; Random Indexing; WWW appears; Semantic Web appears; Deep Learning (Speech, Vision, NLP, IR); The Golden Age of Artificial Intelligence; Expert Systems, Knowledge bases (e.g. Cyc); Inference on billions of tuples, on trillions; Probabilistic models for IR; Language Models
  • 11. where are we now?
      ▪ Inference directly from text
      ▪ [Bowman et al. 2016]
      Sentence pairs:
        – A man rides a bike on a snow covered road / A man is outside
        – 2 female babies eating chips / Two female babies are enjoying chips
        – A man in an apron shopping at a market / A man in an apron is preparing dinner
      Model                                                       % Accuracy
      Feature-based classifier                                    78.2
      Previous SOTA sentence encoder [Mou et al. 2016]            82.1
      LSTM RNN sequence model                                     80.6
      Tree LSTM                                                   80.9
      SPINN                                                       83.2
      SOTA (sentence pair alignment model) [Parikh et al. 2016]   86.8
      Particular success cases:
        Negation:
          - The rhythmic gymnast completes her floor exercise at the competition
          - The gymnast cannot finish her exercise
        Long examples (>20 words):
          - A man wearing glasses and a ragged costume is playing a Jaguar electric guitar and singing with the accompaniment of a drummer
          - A man with glasses and a disheveled outfit is playing a guitar and singing along with a drummer.
  • 12. Where are we for patents?
      ▪ Latent Semantic Indexing
        – Some commercial systems claim to use it
          ▪ “Latent semantic analysis uses sophisticated statistical analysis of language to search on concepts, not just words, to help you find those documents - even if they don't contain any of the words you used in your search”
        – Minimal improvements found in experiments
          ▪ [Moldovan:2005]
  • 14. Random Indexing
      ▪ Initial experiments using the Semantic Vectors package
        – Unsatisfactory results for document similarity
        – Noticeably good results for term similarity
      [Figure panels: Term vectors, Document vectors]
      Most similar terms (similarity:term) [Lupu et al.:2013]
        coatings:
          1.0:coatings
          0.9999339:rubs
          0.9999338:coating
          0.9999328:acrylics
          0.9999271:vinyls
          0.9999268:cratering
          0.9999251:distinctness
          0.9999246:blistering
          0.9999235:pompano
          0.9999234:cyanamid
        crystal:
          1.0:crystal
          0.9999378:cyrstal
          0.9999305:crytal
          0.9999022:nicol // a type of prism
          0.9999014:jjap
          0.9999006:nicols
          0.9998996:nematic // a type of liquid crystal
          0.9998943:uniaxial // minerals that form crystals used in optics
          0.9998894:cb15 // a particular liquid crystal
          0.9998887:anisotropy
        crystals:
          1.0:crystals
          0.9998632:supersaturation
          0.9998519:crystallizing
          0.9998281:supersaturated
          0.9998213:crys
          0.9998193:purer
          0.9998166:soda
          0.9998120:crystallize
          0.9998105:crystallizers
          0.9998081:tals
  • 17. looks like we have a problem
  • 18. Why words are too simple and documents are too large
  • 19. documents are too large
      Particular success cases:
        Negation:
          - The rhythmic gymnast completes her floor exercise at the competition
          - The gymnast cannot finish her exercise
        Long examples (>20 words):
          - A man wearing glasses and a ragged costume is playing a Jaguar electric guitar and singing with the accompaniment of a drummer
          - A man with glasses and a disheveled outfit is playing a guitar and singing along with a drummer.
  • 20. words are too simple
      “In a railroad car truck, a windowed side frame, a bolster extending through the window, a wedge pocket in said bolster having an upwardly and outwardly inclined floor in opposition to a vertical wear surface on the side frame, a stabilizing wedge in the pocket having a vertical friction surface in contact with the wear surface on the side frame and an inclined wedging surface in opposition to the floor of the pocket, a removable wear plate inset in a recess in said inclined floor, said recess having a horizontal lower edge, said wear plate having an inclined lower edge formed and adapted to engage and be supported on said horizontal lower edge of said recess, said wear plate being held in said recess by a weldment located between the upper edge of said recess and the lower edge of said wear plate, and, a spring biasing the wedge upwardly against the removable wear plate to cam the wedge laterally against the wear surface on the side frame.”
      How much is the patent corpus covered by the CELEX lexical database? [Verberne et al., 2010]
                Patent data   COBUILD corpus
      Tokens    96%           92%
      Types     55%           (?)
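The gap between token coverage and type coverage on slide 20 is easier to see with a small computation. The sketch below assumes whitespace-tokenised, lower-cased text and a lexicon given as a plain set of lower-case word forms. The function name `lexical_coverage` and the tiny lexicon are hypothetical, and this is not the procedure used by Verberne et al.

```python
from collections import Counter


def lexical_coverage(tokens, lexicon):
    """Return (token_coverage, type_coverage) of `tokens` against `lexicon`.

    Token coverage counts every occurrence; type coverage counts each
    distinct word form once. The lexicon is assumed to be lower-case.
    """
    counts = Counter(t.lower() for t in tokens)
    total_tokens = sum(counts.values())
    if total_tokens == 0:
        return 0.0, 0.0
    covered_types = [t for t in counts if t in lexicon]
    token_coverage = sum(counts[t] for t in covered_types) / total_tokens
    type_coverage = len(covered_types) / len(counts)
    return token_coverage, type_coverage


# A fragment in the style of the claim quoted on the slide, and a
# made-up general-language lexicon (illustrative only).
tokens = "a wedge pocket in said bolster a stabilizing wedge in the pocket".split()
lexicon = {"a", "in", "the", "pocket", "wedge"}

token_cov, type_cov = lexical_coverage(tokens, lexicon)
```

Frequent covered words push token coverage up, while each rare technical word counts fully against type coverage. That is why a corpus can be well covered by tokens and still poorly covered by types.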
  • 21. What to do? Research, Evaluation
  • 22. words are too simple
      Query Generation [Andersson:2016]
        – Baseline, NLP (word, phrases) and statistical (unigram, bigram)
        – Section Claims or entire document
        – Termhood
          ▪ Experiment to learn termhood, two sample sets:
            – 637 with C-value and 4,400 without C-value
          ▪ Upper boundary (manual list) versus machine learning
          ▪ Skip-gram versus exact phrase
          ▪ Technical terms versus non-technical terms
  • 23. Continuous and objective evaluation: Search Engine Effectiveness Test
  • 24. Artificial Intelligence - Will it ever come?
      A machine will pass the Turing test by 2029 (Kurzweil 1999, pp. 189-235). *
      * The Turing Test does not specify the use of patents in the conversation
  • 26. Glossary
      ▪ CBOW: Continuous Bag-of-Words
      ▪ DBPedia: Automatically extracted knowledge resource from Wikipedia
      ▪ dimensionality reduction: Any procedure that takes as input a vector of size N and outputs a vector of size M<N
      ▪ feed-forward: a particular type of neural network, which does not contain cycles between its neurons
      ▪ hypernym: a term denoting a broader category than another
      ▪ hyponym: a term denoting a narrower category than another
      ▪ LOD: Linked Open Data
      ▪ LSA: Latent Semantic Analysis
      ▪ LSI: Latent Semantic Indexing
      ▪ LSTM: Long Short Term Memory
      ▪ matrix decomposition: a mathematical procedure to represent a matrix as the product of two or more matrices
      ▪ matrix factorization: matrix decomposition
      ▪ neural networks: an algorithmic model (loosely) simulating brain structures
      ▪ ontology: (here) a knowledge representation resource
      ▪ OWL: Web Ontology Language
      ▪ PCA: Principal Component Analysis
      ▪ PMI: Pointwise Mutual Information
      ▪ RDF: Resource Description Framework
      ▪ recurrent nn: a particular type of neural network, which contains cycles between its neurons
      ▪ RI: Random Indexing
      ▪ skip-grams: method to predict a context from a word
      ▪ SVD: Singular Value Decomposition
      ▪ WordNet: a large lexical database of English
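The glossary defines dimensionality reduction as any procedure that maps a vector of size N to a vector of size M<N. As a minimal sketch of one such procedure, the code below uses a random projection with ±1 entries. This choice is an assumption made for illustration: the slides do not say which method any particular system uses, and SVD or PCA from the glossary are other procedures that fit the same definition. The names `make_projection` and `project` are hypothetical.

```python
import random


def make_projection(n, m, seed=0):
    """Build an m x n matrix with entries +/- 1/sqrt(m), chosen at random."""
    if not 0 < m < n:
        raise ValueError("need 0 < m < n")
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.choice((-scale, scale)) for _ in range(n)] for _ in range(m)]


def project(vector, matrix):
    """Map a length-n vector to a length-m vector (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Reduce a hypothetical 8-dimensional count vector to 3 dimensions.
matrix = make_projection(n=8, m=3, seed=42)
reduced = project([1, 0, 2, 0, 0, 3, 0, 1], matrix)
```

The input has N = 8 components and `reduced` has M = 3, which is all the glossary definition requires. How well distances between vectors survive the reduction depends on the method and on M.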