WORKING OF GOOGLE
What is a search engine?
A web search engine is a software system designed to search for information on the World Wide Web. The results are generally presented in a list, often referred to as search engine results pages (SERPs). The information may be a mix of web pages, images, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler.
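To make the idea concrete, here is a minimal sketch of the two halves of a search engine: building an inverted index from crawled pages and answering a query from it. The pages, URLs, and tokenization below are invented for illustration; this is not how any production engine is actually implemented.

```python
# Minimal sketch of a search-engine index (illustrative only).
from collections import defaultdict

# Pretend these pages were fetched by a crawler (placeholder URLs).
pages = {
    "http://example.com/a": "google is a web search engine",
    "http://example.com/b": "a web crawler fetches pages for the index",
}

# Inverted index: word -> set of URLs containing it.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

def search(query):
    """Return URLs containing every word of the query (AND semantics)."""
    words = query.split()
    results = index[words[0]].copy() if words else set()
    for word in words[1:]:
        results &= index[word]
    return results

print(search("web search"))  # -> {'http://example.com/a'}
```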
List of search engines
What is Google?
Google Inc. is an American multinational corporation specializing in Internet-related services and products. These include search, cloud computing, software, and online advertising technologies. Most of its profits are derived from AdWords.
Google was founded by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University. Together they own about 16 percent of its shares. They incorporated Google as a privately held company on September 4, 1998. An initial public offering followed on August 19, 2004. Its mission statement from the outset was "to organize the world's information and make it universally accessible and useful", and its unofficial slogan was "Don't be evil". In 2006 Google moved to headquarters in Mountain View, California, nicknamed the Googleplex.
Rapid growth since incorporation has triggered a chain of products, acquisitions, and partnerships beyond Google's core search engine. It offers online productivity software including email, an office suite, and social networking. Desktop products include applications for web browsing, organizing and editing photos, and instant messaging. The company leads the development of the Android mobile operating system and the browser-only Google Chrome OS for a specialized type of netbook known as a Chromebook. Google has moved increasingly into communications hardware: it partners with major electronics manufacturers to produce its high-end Nexus devices, and it acquired Motorola Mobility in May 2012. In 2012, a fiber-optic infrastructure was installed in Kansas City to support the Google Fiber broadband service.
Founders of Google
Google as the best search engine!
Google could be said to be the best search engine for the following reasons:
1. It relies on a simplicity that many other search engines lack.
2. It is fast, reliable, easy to use, and user-friendly, with fewer and less noticeable ads.
3. Its search algorithm seems to bring the most relevant items to the top, and its ads are more relevant.
4. More people use Google than any other search engine in the world, giving Google the data to improve its engine further.
Reasons why Google is the best!
Basically, Google is a crawler-based engine, meaning that it has
software programs designed to "crawl" the information on the
Net and add it to its sizeable database. Google has a great
reputation for relevant and thorough search results, and is a good
first place to start when searching. Google's home page is
extremely clean and simple, loads quickly, and delivers arguably
the best results of any search engine out there, mostly due to its
PageRank technology and massive listings. Google also earns high
marks for its maps and searches for images, videos and blog
posts; you can even search inside books with Book Search.
However, search experts say that no single search engine
provides the most relevant results for all queries. Google is also
ranked number one because more people use it than any other
search engine.
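The PageRank idea credited above can be sketched in a few lines: a page is important if other important pages link to it. Below is a toy power-iteration version over a made-up three-page link graph; the damping factor of 0.85 is the value commonly cited for the original algorithm, while the graph and iteration count are purely illustrative.

```python
# Toy PageRank by power iteration (illustrative sketch, not Google's code).
# graph maps each page to the pages it links to.
graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

damping = 0.85          # commonly cited damping factor
n = len(graph)
rank = {page: 1.0 / n for page in graph}

for _ in range(50):     # iterate until the ranks stabilize
    new_rank = {page: (1.0 - damping) / n for page in graph}
    for page, outlinks in graph.items():
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share
    rank = new_rank

# C should rank highest: both A and B link to it.
for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```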
Working behind Google
Running a web crawler is a challenging task. There are tricky performance and reliability issues and, even more importantly, there are social issues. Crawling is the most fragile application, since it involves interacting with hundreds of thousands of web servers and various name servers, all of which are beyond the control of the system. In order to scale to hundreds of millions of web pages, Google built a fast distributed crawling system. A single URLserver serves lists of URLs to a number of crawlers (we typically ran about three). Both the URLserver and the crawlers are implemented in Python. Each crawler keeps roughly 300 connections open at once; this is necessary to retrieve web pages at a fast enough pace. At peak speeds, the system can crawl over 100 web pages per second using four crawlers, which amounts to roughly 600 KB of data per second. A major performance stress is DNS lookup, so each crawler maintains its own DNS cache and does not need to do a DNS lookup before crawling each document. Each of the hundreds of connections can be in a number of different states: looking up DNS, connecting to the host, sending the request, and receiving the response. These factors make the crawler a complex component of the system. It uses asynchronous IO to manage events, and a number of queues to move page fetches from state to state.
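A rough sketch of that architecture is below, using the third-party aiohttp package as a stand-in for the custom asynchronous IO the passage describes: one URLserver task feeds a shared queue, a few crawler tasks consume it with a bounded number of connections, and a plain dictionary plays the role of the per-crawler DNS cache. The seed URLs, names, and limits are illustrative assumptions, not Google's actual code.

```python
# Sketch of a URLserver feeding multiple crawlers (illustrative only).
# Assumes the third-party aiohttp package: pip install aiohttp
import asyncio
import socket
from urllib.parse import urlsplit

import aiohttp

SEED_URLS = ["http://example.com/", "http://example.org/"]  # placeholders

dns_cache = {}  # hostname -> address, like the per-crawler DNS cache

def resolve(host):
    """Look up a host once and cache the result."""
    if host not in dns_cache:
        dns_cache[host] = socket.gethostbyname(host)
    return dns_cache[host]

async def url_server(queue):
    """Single URLserver: hands out lists of URLs to the crawlers."""
    for url in SEED_URLS:
        await queue.put(url)

async def crawler(name, queue, session):
    """One crawler: pulls URLs and fetches them; the per-connection
    states (DNS, connect, send, receive) are driven by the event loop."""
    while True:
        url = await queue.get()
        try:
            resolve(urlsplit(url).hostname)       # cached DNS lookup
            async with session.get(url) as resp:
                body = await resp.read()
                print(f"{name}: {url} -> {resp.status}, {len(body)} bytes")
        except Exception as exc:
            print(f"{name}: {url} failed: {exc}")
        queue.task_done()

async def main():
    queue = asyncio.Queue()
    # Cap simultaneous connections, loosely mirroring ~300 per crawler.
    conn = aiohttp.TCPConnector(limit=300)
    async with aiohttp.ClientSession(connector=conn) as session:
        tasks = [asyncio.create_task(crawler(f"crawler-{i}", queue, session))
                 for i in range(3)]               # "about three" crawlers
        await url_server(queue)
        await queue.join()                        # wait until all fetched
        for task in tasks:
            task.cancel()

asyncio.run(main())
```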
It turns out that running a crawler which connects to more than half a million servers and generates tens of millions of log entries also generates a fair amount of email and phone calls. Because of the vast number of people coming online, there are always those who do not know what a crawler is, because this is the first one they have seen. Almost daily, we receive an email like, "Wow, you looked at a lot of pages from my web site. How did you like it?" There are also some people who do not know about the robots exclusion protocol and think their page should be protected from indexing by a statement like "This page is copyrighted and should not be indexed", which, needless to say, is difficult for web crawlers to understand. Also, because of the huge amount of data involved, unexpected things will happen. For example, our system tried to crawl an online game, which resulted in lots of garbage messages in the middle of the game! That turned out to be an easy problem to fix, but it had not come up until we had downloaded tens of millions of pages. Because of the immense variation in web pages and servers, it is virtually impossible to test a crawler without running it on a large part of the Internet. Invariably, there are hundreds of obscure problems which may occur on only one page out of the whole web and cause the crawler to crash, or worse, cause unpredictable or incorrect behavior. Systems which access large parts of the Internet need to be designed to be very robust and carefully tested. Since large complex systems such as crawlers will invariably cause problems, significant resources must be devoted to reading the email and solving these problems as they come up.
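The robots exclusion protocol mentioned above is the robots.txt convention, which Python's standard library can check directly. A minimal sketch of the check a polite crawler performs before each fetch follows; the URLs and user-agent name are placeholders.

```python
# Checking the robots exclusion protocol before crawling a page.
# Standard library only; the URLs and agent name are placeholders.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("http://example.com/robots.txt")
robots.read()  # fetch and parse the site's robots.txt

url = "http://example.com/private/page.html"
if robots.can_fetch("MyCrawler", url):
    print("allowed to crawl", url)
else:
    print("robots.txt forbids crawling", url)
```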
Submitted to: Mrs. Madhuri Mam
Submitted by: Raman Madaan
