Googling of Google: How the Google Search Engine Works
Introduction
What is a search engine?
WEB CRAWLER
Indexer
Word     Document
apple    Document 1, Document 2, Document 3
is       Document 2, Document 4
red      Document 5
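The word-to-document listing above is the shape of an inverted index. A minimal sketch of building one follows, assuming whitespace tokenization and lowercase matching; the function name `build_inverted_index` and the five sample documents are hypothetical, chosen only so that the result mirrors the listing.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical documents (not from the slides), picked to mirror the listing.
docs = {
    1: "apple",
    2: "apple is",
    3: "apple",
    4: "is",
    5: "red",
}
index = build_inverted_index(docs)
```

Looking up a query word is then a single dictionary access, which is why an indexer stores words as keys rather than scanning every document at query time.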
SEARCH ALGORITHM
Positive ranking factors
73%  Keyword focused anchor text from external links
71%  External link popularity
64%  Diversity of link sources
56%  Keyword use anywhere in the title tag
51%  Trustworthiness of the domain based on link distance from trusted

Negative ranking factors
68%  Cloaking with malicious intent
56%  Link acquisition from known link brokers
51%  Link from the page to web spam pages
51%  Cloaking by user agent
46%  Frequent server downtime & site inaccessibility
OVERALL RANKING FACTORS
Google architecture
Crawling deeply in Google's Architecture
Doc Id | Ecode | Url len | pagelen | url | page
Searching techniques
Page rank
Mathematical Page Ranks
Page C has a higher PageRank than Page E even though fewer pages link to it, because the one link it has is of much higher value. A web surfer who follows a random link on every page, but with 15% likelihood jumps to a random page anywhere on the web, will be on Page E 8.1% of the time. (The 15% likelihood of jumping to an arbitrary page corresponds to a damping factor of 85%.) Without damping, all web surfers would eventually end up on Pages A, B, or C, and all other pages would have PageRank zero. Because Page A has no outgoing links, it is assumed to link to all pages in the web.
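The random-surfer description above can be sketched as a simple power iteration. This is an illustrative sketch, not Google's implementation: the function name `pagerank`, the fixed iteration count, and the five-page link graph are assumptions, and the graph is not the one from the slide's figure. A page with no outgoing links is treated as linking to every page, as the text describes for Page A.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Return a dict of PageRank scores for a graph given as {page: [linked pages]}."""
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the random-jump share (1 - damping) / n.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page without outgoing links is assumed to link to all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: A is a dangling page, B and C link to each other.
links = {
    "A": [],
    "B": ["C"],
    "C": ["B"],
    "D": ["A", "B"],
    "E": ["B", "D"],
}
ranks = pagerank(links)
```

In this toy graph C receives only one link, from the highly ranked B, yet ends up ranked above D and E, which is the same effect the text describes for Page C.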
Trust rank
Google and Web Spam
Link based web spam
Web spam detection and result
Thus, based on these features, content-based spam pages can be detected by a Naïve Bayesian classifier, which focuses on the number of times a word is repeated in the content of the page.
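As a rough illustration of how word repetition can drive such a classifier, here is a minimal multinomial naïve Bayes sketch with add-one smoothing. The function names `train` and `classify`, the labels, and the four training texts are hypothetical; the slides do not give the actual features or data used.

```python
import math
from collections import Counter


def train(labeled_docs):
    """Collect per-class word counts and per-class document counts."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest log-probability for the text."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # class prior
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training pages: spam repeats a few words many times.
training = [
    ("cheap pills cheap pills cheap pills buy now", "spam"),
    ("buy cheap cheap cheap watches buy buy", "spam"),
    ("the crawler downloads pages and stores them", "non-spam"),
    ("the indexer sorts words by document", "non-spam"),
]
word_counts, doc_counts = train(training)
```

Each repeated occurrence of a word adds its log-likelihood again, so a page that repeats spam-typical words is pushed strongly toward the spam class.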
Link Based Features
1. A data set is obtained by using a web crawler.
2. For each page, its links and contents are obtained.
3. From the data set, a full graph is built.
4. For each host and page, certain features are computed.
5. Link-based features are extracted from the host graph.
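The steps above can be sketched as follows, assuming the crawl result is already available as a mapping from page URL to outgoing link URLs. The function names, the `example` URLs, and the choice of in-degree and out-degree as the features are illustrative assumptions; the slides do not list the exact features computed.

```python
from urllib.parse import urlparse


def build_host_graph(page_links):
    """Collapse a page-level link graph into a host-level graph {host: set of hosts}."""
    graph = {}
    for source_url, target_urls in page_links.items():
        source_host = urlparse(source_url).netloc
        graph.setdefault(source_host, set())
        for target_url in target_urls:
            target_host = urlparse(target_url).netloc
            graph.setdefault(target_host, set())
            if target_host != source_host:  # ignore links within the same host
                graph[source_host].add(target_host)
    return graph


def link_features(graph):
    """Compute two simple link-based features per host: in-degree and out-degree."""
    features = {host: {"in_degree": 0, "out_degree": len(targets)}
                for host, targets in graph.items()}
    for targets in graph.values():
        for target in targets:
            features[target]["in_degree"] += 1
    return features


# Hypothetical crawl output: page URL -> URLs it links to.
page_links = {
    "http://a.example/index.html": ["http://b.example/", "http://a.example/about.html"],
    "http://b.example/": ["http://c.example/page"],
    "http://c.example/page": ["http://b.example/"],
}
host_graph = build_host_graph(page_links)
features = link_features(host_graph)
```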
It has been observed that for a normal web page the number of supporters increases exponentially with distance. For a web spam page, the number of supporters rises suddenly over a small distance and then drops to zero after some distance. The figure shows the distribution of supporters over distance for spam and non-spam pages.
Figure: Distribution of supporters over distance for spam and non-spam pages (panels: non-spam, spam).
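One way to obtain such a distribution is a breadth-first search over reversed links. The sketch below assumes that a supporter at distance d is a page whose shortest link path to the target has length d; that definition, the function name, and the toy graph are assumptions made for illustration.

```python
from collections import Counter, deque


def supporters_by_distance(links, target, max_distance):
    """Return [count at distance 1, ..., count at max_distance] for the target page."""
    # Reverse the graph so we can walk from the target back to its supporters.
    reverse = {}
    for source, targets in links.items():
        for linked in targets:
            reverse.setdefault(linked, set()).add(source)
    distance = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if distance[node] == max_distance:
            continue  # do not expand beyond the requested distance
        for supporter in reverse.get(node, ()):
            if supporter not in distance:
                distance[supporter] = distance[node] + 1
                queue.append(supporter)
    counts = Counter(d for d in distance.values() if d > 0)
    return [counts.get(d, 0) for d in range(1, max_distance + 1)]


# Hypothetical graph: "spam" has a burst of direct supporters and nothing behind
# them, while support for "normal" keeps growing with distance.
links = {
    "s1": ["spam"], "s2": ["spam"], "s3": ["spam"],
    "n1": ["normal"],
    "n2": ["n1"], "n3": ["n1"],
    "n4": ["n2"], "n5": ["n2"], "n6": ["n3"], "n7": ["n3"],
}
spam_profile = supporters_by_distance(links, "spam", 3)
normal_profile = supporters_by_distance(links, "normal", 3)
```

The two resulting profiles can then be compared directly or fed to a classifier as link-based features.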
System performance
It is important for a search engine to crawl and index efficiently. This way, information can be kept up to date and major changes to the system can be tested relatively quickly. In total, it took roughly 9 days to download the 26 million pages (including errors). The last 11 million pages were downloaded in just 63 hours, averaging just over 4 million pages per day, or 48.5 pages per second. The indexer runs at roughly 54 pages per second. The sorters can be run completely in parallel; using four machines, the whole process of sorting takes about 24 hours.
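The quoted rates can be checked with a few lines of arithmetic. The helper name `throughput` is hypothetical; the page counts and durations are the ones given in the text.

```python
def throughput(pages, hours):
    """Return (pages per second, pages per day) for a crawl of the given size and duration."""
    per_second = pages / (hours * 3600)
    per_day = pages / hours * 24
    return per_second, per_day


# Last 11 million pages in 63 hours.
last_per_second, last_per_day = throughput(11_000_000, 63)

# Whole crawl: 26 million pages in roughly 9 days.
overall_per_second, overall_per_day = throughput(26_000_000, 9 * 24)
```

The last 11 million pages work out to about 48.5 pages per second and a little under 4.2 million pages per day, matching the text. Over the whole 9 days the average is lower, which is consistent with the crawler running faster toward the end.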
Future work
Conclusion
Thank You All!
