Semantic Search_ NLP_ ML.pdf

Semantic Search,
Machine Learning & AI in
the latest Google
algorithms
ABOUT SERPACT
More than 80 international certificates
Over 40 lectures in Bulgaria
More than 10 lectures abroad
Over 15 years of professional experience
More than 20 interviews for Bulgarian and international media
The only Bulgarian SEO agency with a presentation at a Google event
The only Bulgarian SEO agency officially accredited by Stone Temple
Nominees at the European Search Awards 2019
WHAT IS SEMANTIC SEARCH?
WHAT IS MACHINE LEARNING
AND WHY SEARCH ENGINES
USE IT?
FIND PATTERNS FOR URLS AND
PAGE CONTENT
Multiple outbound links to
unrelated pages
Excessive use of the same keywords
Excessive use of synonyms
Overoptimization of anchor texts
Other similar variables.
ANALYSIS AND CLASSIFICATION OF
SEARCH PHRASES
One of the best applications of machine learning
algorithms is classifying search phrases and,
accordingly, indexing documents based on the
user’s search intent.
As we know, search phrases are generally
informational, navigational, or transactional.
By analyzing click patterns and the type of
content that users engage with (e.g., CTR by
content type), a search engine can use machine
learning to determine user intent.
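As a rough illustration of classifying search phrases by intent, the sketch below trains a tiny Naive Bayes model on a handful of hand-labelled queries. The training queries, the labels, and the function names `train` and `classify` are assumptions made up for this example. It looks only at the words of the query, not at click data, and it is not a description of how any search engine actually works.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-labelled queries; a real system would learn from
# far more data, including click behaviour.
TRAINING = [
    ("how to cook chicken", "informational"),
    ("what is semantic search", "informational"),
    ("why is the sky blue", "informational"),
    ("facebook login", "navigational"),
    ("youtube homepage", "navigational"),
    ("gmail sign in", "navigational"),
    ("buy running shoes online", "transactional"),
    ("cheap flights to sofia", "transactional"),
    ("order pizza online", "transactional"),
]

def train(examples):
    """Count words per intent label for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for query, label in examples:
        label_counts[label] += 1
        word_counts[label].update(query.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(query, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)  # prior probability of the label
        for word in query.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAINING)
for query in ["buy cheap shoes", "gmail login", "what is chicken"]:
    print(query, "->", classify(query, model))
```

With so little training data the model only recognises words it has already seen, so it is meant to show the mechanics rather than give usable predictions.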
Identification of synonyms
When you see search results
that don’t include the keyword
in a snippet, it’s likely that
Google uses machine learning to
identify synonyms.
Better identification of word connections
Google identifies synonyms in order to understand the context and meaning of a page.
Writing clear, consistent content is much more important than
stuffing a page with keywords and synonyms.
Identifying the similarities between the words in a
search query
CUSTOM SIGNALS BASED ON
SPECIFIC QUERIES
Machine learning algorithms can give some variables more weight in some queries than in
others. The search engine “learns” the preferences of a particular user and can
draw on past queries to present the most relevant information.
Overall, according to consumer research, personalization through machine
learning has increased the clickthrough rate (CTR) of results by about 10%.
IDENTIFYING NEW SIGNALS
According to a 2016 podcast with Gary Illyes of Google, RankBrain not only helps
identify patterns in queries but also helps the search engine identify possible new
ranking signals. Google looks for these signals so that it can continue to
improve the quality of search results.
WHAT IS NATURAL LANGUAGE
PROCESSING -
OR HOW SEARCH ENGINES UNDERSTAND
OUR CONTENT?
Tasks of NLP
Tokenization, Lemmatization, Stemming
Sentence boundary detection
Part-of-speech tagging
Syntax parsing - Dependency parsing & Constituency parsing
Semantic role labeling
Semantic dependency parsing
Word sense disambiguation/induction
Named-entity recognition/classification
Entity linking
Temporal expression recognition/normalization
Co-reference resolution
Information extraction
Terminology extraction
Topic modeling
Attributional similarity (word similarity)
Relational similarity
Phrase similarity
Sentence similarity
Paraphrase identification
Textual entailment
Natural language generation
Speech recognition
Speech synthesis
Ontology population
Question answering
Machine translation
Text coherence
Fake news detection
Serpact Ltd. | AffiliateCon Sofia 2019
How Search Engines Like
Google Process The Content
Today?
Text Pre-Processing
Noise Removal
Lexicon Normalization
Stemming: Stemming is a rudimentary rule-based
process of stripping the suffixes (“ing”, “ly”, “es”, “s”
etc) from a word.
Lemmatization: Lemmatization, on the other hand, is
an organized, step-by-step procedure for obtaining
the root form of a word. It makes use of
vocabulary and morphological analysis (word
structure and grammar relations).
Object Standardization - acronyms, hashtags with
attached words, and colloquial slang
Normalization and Lemmatization: POS tags are the
basis of the lemmatization process for converting a word
to its base form (lemma).
Efficient stopword removal: POS tags are also useful
for removing stopwords efficiently.
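To make the difference between stemming and lemmatization concrete, here is a minimal sketch. The suffix list comes from the slide, but the minimum stem length and the tiny lemma lookup table are assumptions for illustration. This is not a published stemming algorithm such as Porter's, and a real lemmatizer relies on a full vocabulary and POS tags.

```python
# Toy suffix-stripping stemmer. The minimum stem length is an
# illustrative assumption, not part of any standard algorithm.
SUFFIXES = ("ing", "ly", "es", "s")  # the suffixes named above

def naive_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

# A lemmatizer needs a vocabulary; this three-entry table is hypothetical.
LEMMA_TABLE = {"went": "go", "better": "good", "mice": "mouse"}

def naive_lemma(word: str) -> str:
    """Look the word up in the table, falling back to the stemmer."""
    word = word.lower()
    return LEMMA_TABLE.get(word, naive_stem(word))

print([naive_stem(w) for w in ["playing", "quickly", "boxes", "cats", "went"]])
print([naive_lemma(w) for w in ["playing", "went", "mice"]])
```

The stemmer leaves "went" untouched because no rule applies, while the lookup-based lemmatizer can map it to "go". That gap is why lemmatization needs vocabulary knowledge.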
Syntactic Parsing
Dependency Trees – Sentences are composed of words stitched together. The relationships among the
words in a sentence are determined by a dependency grammar. Dependency grammar is a class of
syntactic text analysis that deals with (labeled) asymmetrical binary relations between two lexical items
(words).
Part of speech tagging – Apart from the grammar relations, every word in a sentence is also associated with
a part-of-speech (POS) tag (noun, verb, adjective, adverb, etc.). The POS tag defines the usage and function
of a word in the sentence.
Word sense disambiguation: Some words have multiple meanings depending on their usage. For
example, in the two sentences below, “book” is a verb in the first and a noun in the second:
I. “Please book my flight for Delhi”
II. “I am going to read this book in the flight”
Entity Extraction
Named Entity Recognition (NER)
Noun phrase identification
Phrase classification
Entity disambiguation: Entities are sometimes misclassified, so a validation layer on
top of the results is useful. Knowledge graphs can be used for this purpose. Popular knowledge
sources include Google Knowledge Graph, IBM Watson, and Wikipedia.
Topic Modelling - Latent Dirichlet
Allocation (LDA)
Topic modeling is the process of automatically
identifying the topics present in a text corpus. It
derives the hidden patterns among the words in the
corpus in an unsupervised manner.
Topics are defined as “a repeating pattern of
co-occurring terms in a corpus”. A good topic model
yields “health”, “doctor”, “patient”, “hospital” for the
topic “Healthcare”, and “farm”, “crops”, “wheat” for the
topic “Farming”.
Bag of Words
Bag of words is a commonly used model that counts every word in a piece of text.
It creates an occurrence matrix for the sentence or document,
disregarding grammar and word order. These word frequencies or occurrences
are then used as features for training a classifier.
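The occurrence matrix described above can be sketched in a few lines of standard-library Python. The tokenizer regex, the function names, and the two sample sentences are assumptions chosen for illustration.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and one row of counts per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    # Each row follows the vocabulary order; word order in the text is lost.
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

docs = ["The cat sat on the mat", "The dog sat"]  # made-up sample sentences
vocab, matrix = bag_of_words(docs)
print(vocab)
for row in matrix:
    print(row)
```

Each row of the matrix can then be passed to a classifier as a feature vector.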
N-Grams as Features
A sequence of N consecutive words is called an N-gram. N-grams (N > 1) are
generally more informative as features than single words (unigrams).
Bigrams (N = 2) are often considered the most useful of these.
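A minimal sketch of N-gram extraction follows. The function name `ngrams` and the sample phrase are assumptions for this example.

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every run of n consecutive tokens, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]

tokens = "safe temperature for chicken".split()
print(ngrams(tokens, 1))  # unigrams
print(ngrams(tokens, 2))  # bigrams
```

A text with fewer than N tokens simply yields an empty list.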
Statistical Features
Term Frequency – Inverse Document Frequency (TF – IDF)
Term Frequency (TF) – TF for a term “t” is defined as the count of the term “t” in a document “D”.
Inverse Document Frequency (IDF) – IDF for a term “t” is defined as the logarithm of the ratio of the total number of
documents in the corpus to the number of documents containing the term “t”.
Count / Density / Readability Features - Count- or density-based features can also be used in
models and analysis. These features might seem trivial but can have a great impact on learning models.
Some of these features are: word count, sentence count, punctuation counts, and industry-specific
word counts.
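The TF and IDF definitions above translate directly into code. The sketch below uses the raw count for TF and log(N / document frequency) for IDF, exactly as defined on the slide. Libraries often apply smoothing or normalization on top of this, so their numbers will differ. The function name `tf_idf` and the sample documents are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by raw count times log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

docs = [
    "chicken safe temperature".split(),
    "chicken recipe".split(),
    "safe cooking temperature chart".split(),
]
for row in tf_idf(docs):
    print(row)
```

A term that occurs in every document gets a weight of zero, which is how TF-IDF plays down words that do not distinguish one document from another.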
Word Embedding (text vectors)
Word embedding is the modern way of representing words as vectors. The aim of word
embedding is to map high-dimensional word features to low-dimensional feature
vectors while preserving contextual similarity in the corpus.
Embeddings are widely used in deep learning models such as Convolutional Neural Networks and
Recurrent Neural Networks.
Final Result
Text Classification - Email Spam Identification, topic classification of news, sentiment classification
and organization of web pages by search engines.
Text Matching / Similarity
Phonetic Matching – A phonetic matching algorithm takes a keyword as input (a person’s name, a
location name, etc.) and produces a character string that identifies a set of words that are (roughly)
phonetically similar.
Flexible String Matching – A complete text matching system pipelines different algorithms
together to handle a variety of text variations: exact string matching, lemmatized matching, and
compact matching (which takes care of spaces, punctuation, slang, etc.).
Cosine Similarity – When texts are represented as vectors, cosine similarity can
be applied to measure how close they are. For example, two texts can be converted to
term-frequency vectors and compared by the cosine of the angle between them.
Coreference Resolution - the process of finding relational links among the words (or phrases) within
sentences. Consider: “Donald went to John’s office to see the new table. He looked at it for an hour.”
Here “He” refers to Donald and “it” refers to the table.
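For the cosine similarity item above, a minimal sketch might look like this: each text becomes a sparse term-frequency vector, and the similarity is the dot product divided by the product of the vector lengths. The function names and the two sample sentences are assumptions for illustration.

```python
import math
import re
from collections import Counter

def text_to_vector(text: str) -> Counter:
    """Map a text to a sparse term-frequency vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

v1 = text_to_vector("This is an article on analytics")
v2 = text_to_vector("This article is about text analytics")
print(round(cosine_similarity(v1, v2), 3))
```

Values close to 1 mean the two texts use nearly the same words in similar proportions, and 0 means they share no words at all.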
What is GOOGLE BERT?
According to Google:
The phrase was “how to catch a cow fishing?”
In New England, the word “cow” in the context of
fishing means a large striped bass. A striped bass is a
popular saltwater game fish that millions of anglers fish
for on the Atlantic coast.
So earlier this month, during the course of research for a
PubCon Vegas presentation, I typed the phrase, “how to
catch a cow fishing” and Google provided results related
to livestock, to cows.
An Example?
How to Write Better
Optimized Texts
Understand
What Your
Audience
Wants…
Use Keyword / Phrase Research Tools
AnswerThePublic
Google Trends
Ahrefs
Semrush
Keywordtool.io
Ubersuggest.io
Moz Keyword Tool
Serpstat
SpyFu
Google Search Console
Group keywords around topics
Group keywords around intent
Group keywords around common classifiers - colors, w-words, sizes,
locations, brands etc.
Answer a question you want to target and provide the best answer
Answer Follow Up Questions
Be Careful With Your Website Structure - Be Topical
& Map Keywords
Connect questions & answers in
your content
Connect your current and following question
Combine questions into a piece of content
Split the questions into sub-topics
Optimizing content for NLP should begin with
simple sentence structure and a focus on
providing concise information for your audience.
Always try to include the exact
questions that people ask, along with relevant
answers, since people will be typing those
questions into search engines.
Identify Units, Classifications, and
Adjectives
Within NLP SEO, words have meaning and therefore may have expected units,
classifications, or adjectives associated with them. NLP parsing will be on the lookout for
these elements when determining if the content contains the precise answer to a
question. Let’s look at two examples.
Example Query 1: “Safe Temperature for Chicken”
For this query, temperature has a unit of degrees in either Fahrenheit or Celsius
expressed as a numerical value. If a sentence does not include these elements, it does
not satisfy the question. A well-structured sentence should contain a number and the
word degree or the degree symbol. If our sentence clarifies Fahrenheit or Celsius, the
answer is more accurate and specific, while also improving our localized targeting.
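The check described for the temperature query can be approximated with a regular expression: does the sentence contain a number followed by the word "degree" or the degree symbol, and does it name a scale? The pattern, the function name `find_temperature`, and the sample sentences are assumptions for illustration, not a description of Google's parser.

```python
import re

# A number, then a degree word or symbol, then an optional scale.
# This pattern is an illustrative assumption.
TEMPERATURE = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?:°|degrees?)\s*"
    r"(?P<scale>fahrenheit|celsius|[fc]\b)?",
    re.IGNORECASE,
)

def find_temperature(sentence: str):
    """Return (value, scale) for the first temperature found, else None."""
    match = TEMPERATURE.search(sentence)
    if match is None:
        return None
    scale = match.group("scale")
    # Normalize the scale to "F" or "C"; None means no scale was stated.
    scale_letter = scale[0].upper() if scale else None
    return float(match.group("value")), scale_letter

samples = [
    "Cook chicken to an internal temperature of 165 degrees Fahrenheit.",
    "Chicken is safe at 74 °C.",
    "Heat it to 165 degrees for safety.",
    "Cook chicken until it is done.",
]
for sentence in samples:
    print(find_temperature(sentence), "<-", sentence)
```

The third sample has a number and a unit but no scale, and the fourth has neither, which mirrors the distinction above between a precise answer, a less specific one, and no answer.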
Be Clear With Your Answers
Reduce Dependency Hops
Whether Google can determine that a sentence answers a question depends on
its NLP parsers not getting hung up as they “crawl” through the
sentence. If a sentence’s structure is overly complicated, Google may fail to
create clear links between words, or may need too many hops to
build that relationship.
Don’t Beat Around the Bush
A common NLP problem is “beating around the bush” when
answering a question. It’s not that these answers are “wrong,” but they
don’t give Google a precise determination of the answer.
Follow the Query Through
If a user searches for “Emergency Fund,” they may have the following goals on their journey:
What is an Emergency Fund
How Much to Save
Types of Emergency Funds
How to Build an Emergency Fund
Ask yourself: “does this article answer all the subjects and questions a searcher
might have when they search?” Google can identify these follow-through topics and questions by
looking at follow-up searches and query refinement within search sessions.
Google can improve searcher satisfaction if it’s able to satisfy searchers sooner by giving them
content that eliminates the need for two to three additional searches.
Disambiguate Entities
There are two simple rules here:
Isolate the entities when not used in a sentence. When an entity is used outside of a
sentence, try to isolate it in the text, and within an HTML tag where it appears, such as
headings, list items, or table cells.
Avoid grouping it with a price, year, category, parentheses, or any other data/text.
Simplify your content around entities you want Google to extract successfully.
Use indicator words to disambiguate entities.
When an entity can be confused, such as cities in multiple states, movies with the same
name, or films vs. books, you can disambiguate entities by using indicator words in the
same sentence.
For the sentence “Portland is a great place to live,” the extracted entity is Portland, OR. For
the sentence, “the Old Port neighborhood in Portland is a great place to live,” the extracted
entity is Portland, ME.
There is an entity relationship between “Portland (ME)” and “Old Port,” which allows Google
to disambiguate the entity “Portland.” Brainstorm these indicator words when your entities
could have multiple identities.
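The Portland example can be mimicked with a toy lookup: each candidate entity carries a set of indicator words, and the candidate with the most indicators in the sentence wins. The candidate table, the default choice, and the function name `disambiguate` are hypothetical. A real system would draw these relationships from a knowledge graph rather than a hand-written dictionary.

```python
# Hypothetical candidates and indicator words, written by hand for
# this example only.
CANDIDATES = {
    "Portland": [
        {"id": "Portland, OR",
         "indicators": {"oregon", "willamette", "pearl district"}},
        {"id": "Portland, ME",
         "indicators": {"maine", "old port", "casco bay"}},
    ]
}
# Assumed fallback when no indicator word is present.
DEFAULT = {"Portland": "Portland, OR"}

def disambiguate(mention: str, sentence: str):
    """Pick the candidate whose indicator words occur most often in the sentence."""
    text = sentence.lower()
    best, best_hits = DEFAULT.get(mention), 0
    for candidate in CANDIDATES.get(mention, []):
        hits = sum(1 for word in candidate["indicators"] if word in text)
        if hits > best_hits:
            best, best_hits = candidate["id"], hits
    return best

print(disambiguate("Portland", "Portland is a great place to live"))
print(disambiguate(
    "Portland",
    "The Old Port neighborhood in Portland is a great place to live",
))
```

Adding an indicator word to the sentence changes which entity is chosen, which is the effect this rule relies on.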
Text Formatting
Inverted Pyramid: Articles have a lede, body, and a tail. Content has
different meanings based on how far down the page it appears.
Headings: Each heading defines the content between it and the next heading.
Subtopics: Think of headings as sub-articles within the parent article.
Proximity: Proximity determines relationships.
Words/phrases in the same sentence are closely related.
Words/phrases in the same paragraph are related.
Words/phrases in different sections are distantly related.
Relationships: Subheadings have a parent -> child relationship. (A page with a
list of categories as H2s and products as H3s is a list of categories. A page with a
list of products as H2s and categories as H3s is a list of products.)
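To see the parent -> child relationship a page's headings express, you can extract them and group each H3 under the H2 that precedes it. The sketch below uses Python's standard `html.parser`. The class name, the function name, and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser

class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for every h1-h6 element."""

    def __init__(self):
        super().__init__()
        self.headings = []   # list of (level, text)
        self._level = None   # level of the heading currently open
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._parts).strip()))
            self._level = None

def children_by_parent(html: str, parent_level: int = 2) -> dict:
    """Group deeper headings under the nearest preceding parent-level heading."""
    parser = HeadingOutline()
    parser.feed(html)
    outline, current = {}, None
    for level, text in parser.headings:
        if level == parent_level:
            current = text
            outline[current] = []
        elif level > parent_level and current is not None:
            outline[current].append(text)
        else:
            current = None  # a shallower heading closes the current section
    return outline

# Made-up markup: categories as H2s, products as H3s.
page = ("<h2>Laptops</h2><h3>Model A</h3><h3>Model B</h3>"
        "<h2>Phones</h2><h3>Model C</h3>")
print(children_by_parent(page))
```

Swapping the H2 and H3 roles in the markup would turn the same content into a list of products with categories as children, which is the distinction drawn above.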
HTML Tags: Text doesn’t have to be in a heading tag to be a heading. (However,
heading tags are preferred.) Text also doesn’t need a heading tag to have a parent ->
child relationship. Its display and font formatting can visually dictate this without the
use of heading tags.
Lists: HTML ordered and unordered lists function as lists, which have a
meaning. Headings can also perform as lists. Headings with a number first can work
as an ordered list.
Ordered lists imply rankings, order, or process. Short bolded “labels” or “summary”
phrases at the start of a paragraph can function as a list.
Other Formats: Tables inherently imply row/column relationships. Some formatting suggests
classification, like addresses and date formats.
Structure by Content Type: Some content types have expected data that
define them. Events have names, locations, and dates. Products have names,
brands, and prices.
Tools for content optimization:
Google NLP Tool
Webtexttool - Text Metrics
Semrush SEO Writing Assistant
Contacts
EMAIL ADDRESS
info@serpact.com
PHONE NUMBER
+359 (032) 260 096
WEBSITE
serpact.com
SERP.AC/FB
SERP.AC/TWITTER
SERP.AC/YOUTUBE
Semantic Search_ NLP_ ML.pdf

  • 1. Semantic Search, Machine Learning & AI in the latest Google algorithms
  • 2. More than 80 international certificates Over 40 lectures in Bulgaria More than 10 lectures abroad Over 15 years of professional experience More than 20 interviews for Bulgarian and international medias More than 20 interviews for Bulgarian and international medias The only Bulgarian SEO agency with a presentation at a Google event The only Bulgarian SEO agency officially accredited by Stone Temple Nominees at the Europe Search Awards 2019 ABOUT SERPACT
  • 4. WHAT IS MACHINE LEARNING AND WHY SEARCH ENGINES USE IT?
  • 5. FIND PATTERNS FOR URLS AND PAGE CONTENT Multiple outbound links to unrelated pages Excessive use of the same keywords Excessive use of synonyms Overoptimization of anchor texts Other similar variables. ANALYSIS OF SEARCH AND CLASSIFICATION PHRASES One of the best applications of machine learning algorithms is the classification of search phrases and, accordingly, index documents based on the user’s search intent. As we know, phrases can generally be – information, navigation and transactional. By analyzing click patterns and the type of content that users engage with (e.g., CTR by content type), a search engine can use machine learning to determine user intent.
  • 6. Identification of synonyms When you see search results that don’t include the keyword in a snippet, it’s likely that Google uses machine learning to identify synonyms.
  • 7. Better identification of word connections Google’s purpose for synonyms is to understand the context and meaning of a page. Creating content in a clear and consistent sense is much more important than spamming a page with keywords and synonyms.
  • 8. Identifying the similarities between the words in a search query
  • 9. Identifying the similarities between the words in a search query
  • 10. Identifying the similarities between the words in a search query
  • 11. CUSTOM ALERTS BASED ON SPECIFIC REQUESTS Machine learning algorithms can put more weight on variables in some queries than others. The search engine “learns” about the preferences of a particular user and can base its information on past queries to present the most interesting information. Overall, according to consumer research, personalized phrases through machine learning have increased the clickthrough rate (CTR) of results by about 10%.
  • 12. IDENTIFYING NEW ALERTS According to a 2016 podcast by Gary Illyes of Google, RankBrain not only helps identify patterns in queries but also helps the search engine identify possible new ranking signals. These alerts are being sought so that Google can continue to improve the quality of search results.
  • 13. WHAT IS NATURAL LANGUAGE PROCESSING - OR HOW SEARCH ENGINES UNDERSTAND OUR CONTENT?
  • 14. Tasks of NLP Tokenization, Lemmatization, Stemming Sentence boundary detection Part-of-speech tagging Syntax parsing - Dependency parsing & Constituency parsing Semantic role labeling Semantic dependency parsing Word sense disambiguation/induction Named-entity recognition/classification Entity linking Temporal expression recognition/normalization Co-reference resolution Information extraction Terminology extraction Topic modeling
  • 15. Attributional similarity (word similarity) Relational similarity Phrase similarity Sentence similarity Paraphrase identification Textual entailment Natural language generation Speech recognition Speech synthesis Ontology population Question answering Machine translation Text coherence Fake news detection Tasks of NLP
  • 16. Serpact Ltd. | AffiliateCon Sofia 2019 How Search Engines Like Google Process The Content Today?
  • 18. Text Pre-Processing Noise Removal Lexicon Normalization Stemming: Stemming is a rudimentary rule-based process of stripping the suffixes (“ing”, “ly”, “es”, “s” etc) from a word. Lemmatization: Lemmatization, on the other hand, is an organized & step by step procedure of obtaining the root form of the word, it makes use of vocabulary and morphological analysis (word structure and grammar relations). Object Standardization - acronyms, hashtags with attached words, and colloquial slangs Normalization and Lemmatization: POS tags are the basis of lemmatization process for converting a word to its base form (lemma). Efficient stopword removal : P OS tags are also useful in efficient removal of stopwords.
  • 19. Serpact Ltd. | AffiliateCon Sofia 2019 Syntactic Parsing
  • 20. Serpact Ltd. | AffiliateCon Sofia 2019 Syntactic Parsing Dependency Trees – Sentences are composed of some words sewed together. The relationship among the words in a sentence is determined by the basic dependency grammar. Dependency grammar is a class of syntactic text analysis that deals with (labeled) asymmetrical binary relations between two lexical items (words). Part of speech tagging – Apart from the grammar relations, every word in a sentence is also associated with a part of speech (pos) tag (nouns, verbs, adjectives, adverbs etc). The pos tags defines the usage and function of a word in the sentence. Word sense disambiguation: Some language words have multiple meanings according to their usage. For example, in the two sentences below:I. “Please book my flight for Delhi”II. “I am going to read this book in the flight”
  • 21. Entity Extraction. Named Entity Recognition (NER): noun phrase identification, phrase classification, and entity disambiguation. Entities are sometimes misclassified, so a validation layer on top of the results is useful. Knowledge graphs can be used for this purpose. Popular knowledge sources include Google Knowledge Graph, IBM Watson, and Wikipedia.
  • 22. Topic Modelling - Latent Dirichlet Allocation (LDA). Topic modeling is the process of automatically identifying the topics present in a text corpus. It derives the hidden patterns among the words in the corpus in an unsupervised manner. Topics are defined as “a repeating pattern of co-occurring terms in a corpus”. A good topic model yields “health”, “doctor”, “patient”, “hospital” for the topic “Healthcare”, and “farm”, “crops”, “wheat” for the topic “Farming”.
  • 23. Bag of Words. Bag of Words is a commonly used model that counts all the words in a piece of text. It creates an occurrence matrix for the sentence or document, disregarding grammar and word order. These word frequencies or occurrences are then used as features for training a classifier.
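As an illustration of the occurrence matrix this slide describes, here is a minimal sketch in plain Python. The tokenizer, the function names, and the two sample sentences are assumptions made for the example; libraries such as scikit-learn provide more complete implementations.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(documents: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and one row of word counts per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({token for doc in tokenized for token in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        # Counter returns 0 for words that are absent from the document.
        matrix.append([counts[token] for token in vocabulary])
    return vocabulary, matrix


docs = ["The cat sat on the mat", "The dog sat"]
vocabulary, matrix = bag_of_words(docs)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row is a feature vector for one document. Word order is lost, which is exactly the simplification the model makes.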
  • 24. N-Grams as Features. A combination of N words together is called an N-gram. N-grams (N > 1) are generally more informative as features than single words (unigrams). Bigrams (N = 2) are often considered among the most useful of these features.
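A short sketch of how N-grams can be extracted from a token list. The function name `ngrams` and the sample query are chosen for illustration only.

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all runs of n consecutive tokens, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # When n is larger than the token list, the range is empty.
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


tokens = "how to catch a cow fishing".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)
```

A bigram such as ("cow", "fishing") carries context that neither word carries alone, which is why N-grams with N > 1 tend to be more informative features.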
  • 25. Statistical Features. Term Frequency - Inverse Document Frequency (TF-IDF): Term Frequency (TF) for a term “t” is defined as the count of “t” in a document “D”. Inverse Document Frequency (IDF) for a term “t” is defined as the logarithm of the ratio of the total number of documents in the corpus to the number of documents containing “t”. Count / Density / Readability Features: Count-based or density-based features can also be used in models and analysis. These features might seem trivial but can have a great impact on learning models. Some of these features are word count, sentence count, punctuation counts, and industry-specific word counts.
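The TF and IDF definitions on this slide translate directly into code. The sketch below follows those definitions literally (raw count for TF, natural logarithm of the document ratio for IDF); the tiny corpus and the choice to return 0.0 for a term that appears in no document are assumptions for the example. Many libraries use smoothed or normalized variants instead.

```python
import math


def term_frequency(term: str, document: list[str]) -> int:
    """TF: the raw count of the term in one tokenized document."""
    return document.count(term)


def inverse_document_frequency(term: str, corpus: list[list[str]]) -> float:
    """IDF: log(total documents / documents containing the term)."""
    containing = sum(1 for document in corpus if term in document)
    if containing == 0:
        return 0.0  # assumption: avoid division by zero for unseen terms
    return math.log(len(corpus) / containing)


def tf_idf(term: str, document: list[str], corpus: list[list[str]]) -> float:
    """TF-IDF: the product of the two values above."""
    return term_frequency(term, document) * inverse_document_frequency(term, corpus)


corpus = [
    ["health", "doctor", "patient"],
    ["farm", "crops", "wheat"],
    ["doctor", "farm", "doctor"],
]
print(tf_idf("doctor", corpus[2], corpus))
print(tf_idf("wheat", corpus[1], corpus))
```

A term that appears in every document gets an IDF of log(1) = 0, so it contributes nothing regardless of how often it is repeated.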
  • 26. Word Embedding (text vectors). Word embedding is the modern way of representing words as vectors. The aim of word embedding is to map high-dimensional word features into low-dimensional feature vectors while preserving the contextual similarity in the corpus. Embeddings are widely used in deep learning models such as Convolutional Neural Networks and Recurrent Neural Networks.
  • 27. Final Result. Text Classification: email spam identification, topic classification of news, sentiment classification, and organization of web pages by search engines. Text Matching / Similarity: Phonetic Matching: a phonetic matching algorithm takes a keyword as input (a person’s name, a location name, etc.) and produces a character string that identifies a set of words that are (roughly) phonetically similar. Flexible String Matching: a complete text matching system pipelines different algorithms together to handle a variety of text variations (exact string matching, lemmatized matching, and compact matching, which takes care of spaces, punctuation, slang, etc.). Cosine Similarity: when texts are represented as vectors, cosine similarity can be applied to measure how close they are; for example, each text can be converted to a term-frequency vector, and the cosine of the angle between the two vectors gives the closeness of the two texts. Coreference Resolution: the process of finding relational links among the words (or phrases) within sentences. Example: “Donald went to John’s office to see the new table. He looked at it for an hour.” Here “He” refers to Donald and “it” refers to the table.
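The cosine similarity measure on this slide can be sketched as follows: each text becomes a term-frequency vector, and the similarity is the dot product divided by the product of the vector lengths. The helper names and the tokenization rule are assumptions made for this example.

```python
import math
import re
from collections import Counter


def text_to_vector(text: str) -> Counter:
    """Convert a text into a term-frequency vector (word -> count)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(vec_a: Counter, vec_b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    shared_terms = set(vec_a) & set(vec_b)
    dot_product = sum(vec_a[term] * vec_b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot_product / (norm_a * norm_b)


text_1 = "This is an article about text similarity"
text_2 = "This article is about measuring similarity of text"
score = cosine_similarity(text_to_vector(text_1), text_to_vector(text_2))
print(round(score, 3))
```

The score ranges from 0.0 for texts with no words in common to 1.0 for texts with identical word counts. Because it uses raw term frequencies, it ignores word order and synonyms.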
  • 28. What is Google BERT?
  • 29. According to Google:
  • 30. An Example? The phrase was “how to catch a cow fishing?” In New England, the word “cow” in the context of fishing means a large striped bass. A striped bass is a popular saltwater game fish that millions of anglers fish for on the Atlantic coast. So earlier this month, during the course of research for a PubCon Vegas presentation, I typed the phrase, “how to catch a cow fishing” and Google provided results related to livestock, to cows.
  • 31. How to Write Better Optimized Texts
  • 32. Understand What Your Audience Wants…
  • 33. Use Keyword / Phrase Research Tools: AnswerThePublic, Google Trends, Ahrefs, Semrush, Keywordtool.io, Ubersuggest.io, Moz Keyword Tool, Serpstat, SpyFu, Google Search Console.
  • 35. Group keywords around topics. Group keywords around intent. Group keywords around common classifiers: colors, w-words, sizes, locations, brands, etc.
  • 36. Answer the question you want to target and provide the best answer. Answer follow-up questions.
  • 37. Be Careful With Your Website Structure - Be Topical & Map Keywords
  • 38. Connect questions and answers in your content. Connect your current question and the one that follows. Combine questions into a piece of content. Split the questions into sub-topics. Optimizing content for NLP should begin with simple sentence structure and a focus on providing concise information for your audience. You should always try to include the exact questions that people ask, along with relevant answers, since people will be typing those questions into search engines.
  • 39. Identify Units, Classifications, and Adjectives. Within NLP SEO, words have meaning and therefore may have expected units, classifications, or adjectives associated with them. NLP parsing will be on the lookout for these elements when determining whether the content contains the precise answer to a question. Let’s look at two examples. Example Query 1: “Safe Temperature for Chicken”. For this query, temperature has a unit of degrees, in either Fahrenheit or Celsius, expressed as a numerical value. If a sentence does not include these elements, it does not satisfy the question. A well-structured sentence should contain a number and the word “degree” or the degree symbol. If the sentence specifies Fahrenheit or Celsius, the answer is more accurate and specific, and it also improves localized targeting.
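As a rough self-check for the rule above, the following sketch tests whether a sentence contains a number followed by the word "degree(s)" or the degree symbol, and whether it names a scale. The regular expression, the function name, and the sample sentences are illustrative assumptions; this is a content-editing aid, not a description of how Google parses text.

```python
import re

# A number, then "°" or "degree(s)", then an optional scale name or letter.
TEMPERATURE_PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?:°|degrees?\b)\s*"
    r"(?P<scale>fahrenheit|celsius|[fc]\b)?",
    re.IGNORECASE,
)


def find_temperature(sentence: str):
    """Return (value, scale) if the sentence states a temperature, else None.

    The scale is "F", "C", or None when the sentence does not name one.
    """
    match = TEMPERATURE_PATTERN.search(sentence)
    if match is None:
        return None
    scale = match.group("scale")
    return float(match.group("value")), (scale[0].upper() if scale else None)


candidates = [
    "Cook chicken to an internal temperature of 165 degrees Fahrenheit.",
    "Chicken is safe at 74°C.",
    "Cook chicken to 165 degrees.",
    "Cook the chicken until it is hot enough.",
]
for sentence in candidates:
    print(find_temperature(sentence), "<-", sentence)
```

Sentences that return `None` lack the number and unit, and sentences that return a value without a scale could be made more specific by naming Fahrenheit or Celsius.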
  • 40. Be Clear With Your Answers. Reduce Dependency Hops: Reading a sentence and determining whether a question is answered depends on Google’s NLP parsers not getting hung up as they “crawl” through the sentence. If a sentence’s structure is overly complicated, Google may fail to create clear links between words or may need too many hops to build that relationship. Don’t Beat Around the Bush: A common NLP problem is “beating around the bush” when answering a question. It’s not that these answers are “wrong,” but they don’t give Google a precise determination of the answer.
  • 41. Be Clear With Your Answers
  • 42. Follow the Query Through. Ask yourself the question: “Does this article answer all the subjects and questions a searcher might have when they search?” Google can identify these follow-through topics and questions by looking at follow-up searches and query refinement within search sessions. Google can improve searcher satisfaction if it is able to satisfy searchers sooner by giving them content that eliminates the need for two to three additional searches. If a user searches for “Emergency Fund,” they may have the following goals on their journey: What Is an Emergency Fund; How Much to Save; Types of Emergency Funds; How to Build an Emergency Fund.
  • 43. Disambiguate Entities. Simplify your content around the entities you want Google to extract successfully. There are two simple rules here. The first: isolate entities when they are not used in a sentence. When an entity is used outside of a sentence, try to isolate it in the text and within the HTML tag where it appears, such as a heading, list item, or table cell. Avoid grouping it with a price, year, category, parentheses, or any other data or text.
  • 44. Disambiguate Entities. Use indicator words to disambiguate entities. When an entity can be confused, such as cities in multiple states, movies with the same name, or films vs. books, you can disambiguate entities by using indicator words in the same sentence. For the sentence “Portland is a great place to live,” the extracted entity is Portland, OR. For the sentence “The Old Port neighborhood in Portland is a great place to live,” the extracted entity is Portland, ME. There is an entity relationship between “Portland (ME)” and “Old Port,” which allows Google to disambiguate the entity “Portland.” Brainstorm these indicator words when your entities could have multiple identities.
  • 45. Text Formatting. Inverted Pyramid: Articles have a lede, a body, and a tail. Content has different meanings based on how far down the page it appears. Headings: A heading defines the content between it and the next heading. Subtopics: Think of headings as sub-articles within the parent article. Proximity: Proximity determines relationships. Words/phrases in the same sentence are closely related. Words/phrases in the same paragraph are related. Words/phrases in different sections are distantly related. Relationships: Subheadings have a parent -> child relationship. (A page with a list of categories as H2s and products as H3s is a list of categories. A page with a list of products as H2s and categories as H3s is a list of products.)
  • 47. HTML Tags: Text doesn’t have to be in a heading tag to be a heading. (However, heading tags are preferred.) Text also doesn’t need a heading tag to have a parent -> child relationship. Its display and font formatting can visually dictate this without the use of heading tags. Lists: HTML ordered and unordered lists function as lists, which have a meaning. Headings can also perform as lists. Headings with a number first can work as an ordered list. Ordered lists imply rankings, order, or process. Short bolded “label” or “summary” phrases at the start of a paragraph can function as a list. Other Formats: Tables inherently imply row/column relationships. Some formatting suggests classification, like addresses and date formats. Structure by Content Type: Some content types have expected data that define them. Events have names, locations, and dates. Products have names, brands, and prices.
  • 48. Tools for content optimization: Google NLP Tool, Webtexttool - Text Metrics, Semrush SEO Writing Assistant.
  • 52. Contacts. Email address: info@serpact.com. Phone number: +359 (032) 260 096. Website: serpact.com. Social: SERP.AC/FB, SERP.AC/TWITTER, SERP.AC/YOUTUBE.