Introduction to Google API…
By Pratheepan Raveendranathan
What is Google API?
• The Google Web APIs service is a beta web program that enables developers to easily find and manipulate information on the web.
• Google Web APIs are for developers and researchers interested in using Google as a resource in their applications.
• The Google Web APIs service allows software developers to query more than 3 billion web documents directly from their own computer programs.
• Google uses the SOAP and WSDL standards as the interface between the user's program and the Google API.
• Programming environments such as Java, Perl, and Visual Studio .NET are compatible with the Google API.
Definitions from http://www.google.com/apis/
What can you do with the API
• Developers can issue search requests to Google's index of more than 3 billion web pages
• and receive results as structured data:
    • estimated number of results, URLs, snippets, query time, etc.
• access information in the Google cache,
• and check the spelling of words.
To start using the API
• You need to:
    • Download the API package from http://www.google.com/apis/
    • Create an account and get your license key
    • Install the kit in your UMD account
• You also need SOAP::Lite
    • However, it is already installed on all the csdev machines, so you don't need to get it. It is not on UB or Bulldog.
Contents of this package:
• googleapi.jar - Java library for accessing the Google Web APIs service.
• GoogleAPIDemo.java - Example program that uses googleapi.jar.
• dotnet/ - Example .NET programs that use Google Web APIs.
• APIs_Reference.html - Reference doc for the API. Describes semantics of all calls and fields.
• Javadoc - Documentation for the example Java libraries.
• Licenses - Licenses for Java code that is redistributed in this package.
• GoogleSearch.wsdl - WSDL description for Google SOAP API.
• soap-samples/
WSDL
Web Services Description Language
• The standard format for describing a web service.
• Expressed in XML, a WSDL definition describes how to access a web service and what operations it will perform.
• The WSDL file is the most important file in the kit: it is the only one you need to use the API with Perl.
SOAP - Simple Object Access Protocol
• SOAP stands for Simple Object Access Protocol
• SOAP is a communication protocol
• SOAP is for communication between applications
• SOAP is a format for sending messages
• SOAP is designed to communicate via the Internet
• SOAP is platform independent
• SOAP is language independent
• SOAP is based on XML
• SOAP is a W3C Recommendation
Google API for Perl
SOAP::Lite
SOAP::Lite for Perl is a collection of Perl modules that provides a simple and lightweight interface to SOAP on both the client and the server side.
So How do I Query Google?
#!/usr/local/bin/perl -w
use SOAP::Lite;
# Configuration
$key = "Your Key Goes Here";
# Initialize the service from the local WSDL file
$service = SOAP::Lite
    -> service('file:GoogleSearch.wsdl');
$query = "duluth";
Search Contd…
$result = $service
-> doGoogleSearch(
$key, # key
$query, # search query
0, # start results
10, # max results
"false", # filter: boolean
"", # restrict (string)
"false", # safeSearch: boolean
"", # lr
"", # ie
"" # oe
);
Name          Description
key           Provided by Google, this is required for you to access the Google service. Google uses the key for authentication and logging.
q             Query phrase.
start         Zero-based index of the first desired result.
maxResults    Number of results desired per query. The maximum value per query is 10. Note: If you do a query that doesn't have many matches, the actual number of results you get may be smaller than what you request.
filter        Activates or deactivates automatic results filtering, which hides very similar results and results that all come from the same Web host.
restrict      Restricts the search to a subset of the Google Web index, such as a country like "Ukraine" or a topic like "Linux."
safeSearch    A Boolean value which enables filtering of adult content in the search results.
lr            Language Restrict - Restricts the search to documents within one or more languages.
ie            Input Encoding - this parameter has been deprecated and is ignored. All requests to the APIs should be made with UTF-8 encoding.
oe            Output Encoding - this parameter has been deprecated and is ignored. All requests to the APIs should be made with UTF-8 encoding.
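Because doGoogleSearch takes its ten arguments by position, it is easy to put one in the wrong slot. The following is a minimal sketch of one way to guard against that: a hypothetical helper, search_args, that accepts named options, fills in the defaults used in the example call, checks the limits from the table, and returns the values in positional order. The helper name and its error messages are assumptions for illustration and are not part of the Google kit. It runs without a network connection.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Positional order of the doGoogleSearch arguments, as listed in the table.
my @ORDER = qw(key q start maxResults filter restrict safeSearch lr ie oe);

# Defaults mirror the values used in the example call in these slides.
my %DEFAULT = (
    start      => 0,
    maxResults => 10,
    filter     => 'false',
    restrict   => '',
    safeSearch => 'false',
    lr         => '',
    ie         => '',
    oe         => '',
);

# search_args is a hypothetical helper, not something shipped by Google.
sub search_args {
    my (%opt) = @_;
    my %arg = (%DEFAULT, %opt);
    for my $required ('key', 'q') {
        die "missing required parameter '$required'\n"
            unless defined $arg{$required} && length $arg{$required};
    }
    die "maxResults must be between 1 and 10\n"
        if $arg{maxResults} < 1 || $arg{maxResults} > 10;
    die "start must be zero or greater\n" if $arg{start} < 0;
    return @arg{@ORDER};    # hash slice in positional order
}

# Build the argument lists for the first three pages of ten results each.
for my $start (0, 10, 20) {
    my @args = search_args(
        'key' => 'Your Key Goes Here',
        'q'   => 'duluth',
        start => $start,
    );
    print join('|', @args), "\n";
}
```

With a live service object, the returned list could be passed straight through, as in $service->doGoogleSearch(search_args(...)). Since maxResults is capped at 10, stepping start by 10 is how further results would be requested.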
Now to Retrieve the Search Results
if (defined($result->{resultElements})) {
    print join "\n",
        "Found:",
        $result->{resultElements}->[0]->{title},
        $result->{resultElements}->[0]->{URL},
        $result->{resultElements}->[0]->{snippet} . "\n";
}
print "\n The search took ";
print $result->{searchTime};
print "\n\n";
print "The estimated Number of results for your query is: ";
print $result->{estimatedTotalResultsCount};
print "\n\n";
What you need for
your program
Search.pl Output
Found:
University of Minnesota <b>Duluth</b> Welcomes You
http://www.d.umn.edu/
The University of Minnesota <b>Duluth</b> Homepage: an overview of academic programs, campus<br> life, resources, news and events, with extensive links to other web sites <b>...</b>
The search took 0.159791
The estimated Number of results for your query is: 881000
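As the output shows, titles and snippets come back with HTML markup such as <b> and <br> embedded in them. If plain text is wanted, something like the following sketch could be applied to each field. The helper name plain_text is hypothetical, and the short entity list is an assumption about what might appear in results, not a complete HTML decoder.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# plain_text is a hypothetical helper: it strips tags and decodes a few
# common entities. It is a rough cleanup, not a full HTML parser.
sub plain_text {
    my ($html) = @_;
    return '' unless defined $html;
    my $text = $html;
    $text =~ s/<br\s*\/?>/ /gi;    # treat line breaks as spaces
    $text =~ s/<[^>]+>//g;         # drop remaining tags such as <b>
    my %entity = (amp => '&', lt => '<', gt => '>', quot => '"', '#39' => "'");
    $text =~ s/&(amp|lt|gt|quot|#39);/$entity{$1}/g;
    $text =~ s/\s+/ /g;            # collapse runs of whitespace
    $text =~ s/^ //;
    $text =~ s/ $//;
    return $text;
}

print plain_text('University of Minnesota <b>Duluth</b> Welcomes You'), "\n";
print plain_text('campus<br> life, resources, news and events <b>...</b>'), "\n";
```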
Or, to get all elements:
# Print every snippet
foreach $temp (@{$result->{resultElements}}) {
    print $temp->{snippet};
}
# Print every URL
foreach $temp (@{$result->{resultElements}}) {
    print $temp->{URL};
}
# Collect every title into an array
$count = 0;
foreach $temp (@{$result->{resultElements}}) {
    $title_array[$count++] = $temp->{title};
}
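The loops above rely on resultElements being a reference to an array of hashes, each with title, URL and snippet keys. The sketch below makes that shape explicit with mock data (the example.com entries are invented placeholders, not real search results) and walks it once instead of three times. The helper name result_lines is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mock data shaped like the result used in the slides. The entries are
# placeholders for illustration, not real search results.
my $result = {
    resultElements => [
        { title => 'First title',  URL => 'http://example.com/1', snippet => 'First snippet'  },
        { title => 'Second title', URL => 'http://example.com/2', snippet => 'Second snippet' },
    ],
};

# result_lines is a hypothetical helper that returns printable lines
# for every element, numbered from 1.
sub result_lines {
    my ($res) = @_;
    my $elements = $res->{resultElements} || [];
    my @lines;
    my $rank = 0;
    for my $element (@$elements) {
        $rank++;
        push @lines,
            "$rank. $element->{title}",
            "   $element->{URL}",
            "   $element->{snippet}";
    }
    return @lines;
}

print "$_\n" for result_lines($result);
```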
How to get a spelling suggestion?
#!/usr/local/bin/perl -w
use SOAP::Lite;
# Configuration
$key = "Your Key Goes Here";
# Initialize the service from the local WSDL file
$service = SOAP::Lite
    -> service('file:GoogleSearch.wsdl');
# Read the word to check from standard input
print "Enter a word\n";
chomp($searchString = <STDIN>);
$correction = $service->doSpellingSuggestion($key, $searchString);
How do I get the results?
• Easy:
    • The variable $correction will contain the spelling suggestion if Google has one; otherwise it will be empty.
• So retrieving the result is as easy as:
print "\n The suggested spelling for $searchString is $correction \n\n";
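Since the suggestion can be empty, printing it unconditionally may produce a confusing message (and a warning under -w if the value is undefined). A minimal sketch of handling both cases follows; spelling_message is a hypothetical helper, and the wording of its messages is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# spelling_message is a hypothetical helper that copes with a missing
# or empty suggestion.
sub spelling_message {
    my ($word, $correction) = @_;
    if (defined $correction && length $correction) {
        return "The suggested spelling for '$word' is: $correction";
    }
    return "No spelling suggestion for '$word'";
}

print spelling_message('dulut', 'duluth'), "\n";
print spelling_message('duluth', undef), "\n";
```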
Spelling output
Enter a word
dulut
The suggested spelling for “Duluth” is:
duluth
How do I get a cached web page?
• Given a URL, Google will try to retrieve the web page from its "cache".
    • The contents of the cached page may therefore be somewhat old: they date from the last time Google's web crawlers updated their copy of the site.
• Example:
Example Contd…
#!/usr/local/bin/perl -w
use SOAP::Lite;
# Configuration
$key = "Your Key Goes Here";
# Initialize the service from the local WSDL file
$service = SOAP::Lite
    -> service('file:GoogleSearch.wsdl');
$url = "http://www.d.umn.edu";
$cachedPage = $service->doGetCachedPage($key, $url);
How do I retrieve the results?
• This works the same way as the spelling suggestion.
    • If the web page does exist, the $cachedPage variable will hold the whole page's HTML.
    • Otherwise, you get a message from Google which says
        • "This web page has not been updated…blah…blah…blah"
Search with other options:
Google has four topic restricts:
Topic              <restrict> value
U.S. Government    unclesam
Linux              linux
Macintosh          mac
FreeBSD            bsd
Search with Restrictions:
$result = $service
-> doGoogleSearch(
$key, # key
$query, # search query
0, # start results
10, # max results
"false", # filter: boolean
"linux", # restrict (string)
"false", # safeSearch: boolean
"", # lr
"", # ie
"" # oe
);
Search with Language
Restrictions
$result = $service
-> doGoogleSearch(
$key, # key
$query, # search query
0, # start results
10, # max results
"false", # filter: boolean
"", # restrict (string)
"false", # safeSearch: boolean
"lang_de", # lr
"", # ie
"" # oe
);
print "\n The search took ";
print $result->{searchTime};
print "\n\n";
print "The estimated Number of results for your query is: ";
print $result->{estimatedTotalResultsCount};
print "\n\n";
if (defined($result->{resultElements})) {
    print join "\n",
        "Found:",
        $result->{resultElements}->[0]->{title},
        $result->{resultElements}->[0]->{URL},
        $result->{resultElements}->[0]->{snippet} . "\n";
}
lang_de = German
Search with Language
Restrictions Contd…
Please Enter Search Item
der sturm
The search took 0.309039
The estimated Number of results for your query is: 206000
Found:
SK <b>STURM</b> GRAZ - Willkommen beim Sk <b>Sturm</b>
http://www.sksturm.at/
Eintreten. Puntigamer das bierige Bier, Steiermark.com, Puma, Tipp3,<br>
Autohaus Jakob Prügger, Graz - Hausmannstätten. © 2003 SkSturm
<b>...</b>
Tips on Querying Google
Default Search
By default, Google only returns pages that include all of the terms in the query string.
Stop Words
Google ignores common words and characters such as "where" and "how," as well as certain single digits and single
letters. Common words that are ignored are known as stop words.
However, you can prevent Google from ignoring stop words by enclosing them in quotes, such as in the phrase "to be or
not to be".
Special Characters
By default, all non-alphanumeric characters that are included in a search query are treated as word separators.
The only exceptions are the following: double quote mark ("), plus sign (+), minus sign or hyphen (-), and ampersand
(&).
The ampersand character (&) is treated as another character in the query term in which it is included, while the
remaining exception characters correspond to search features listed in the section below.
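The rules above (all terms required, stop words dropped unless quoted, and the special meaning of the quote, plus and minus characters) lend themselves to a small query builder. The following is a sketch under the assumption that the operators behave as described in the API reference; build_query and its option names (terms, phrases, include, exclude) are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# build_query is a hypothetical helper. It assumes: "..." keeps a phrase
# together (including stop words), + forces a common word to be included,
# and - excludes a word.
sub build_query {
    my (%part) = @_;
    my @terms;
    push @terms, @{ $part{terms} || [] };
    push @terms, map { qq("$_") } @{ $part{phrases} || [] };
    push @terms, map { "+$_" }    @{ $part{include} || [] };
    push @terms, map { "-$_" }    @{ $part{exclude} || [] };
    return join ' ', @terms;
}

print build_query(terms => ['bass'], exclude => ['music']), "\n";
print build_query(terms => [qw(Star Wars Episode)], include => ['I']), "\n";
print build_query(phrases => ['to be or not to be']), "\n";
```

The resulting string would be passed as the query argument of doGoogleSearch.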
Special Query Terms
Google supports the use of several special query terms that allow the user or search administrator to access additional
capabilities of the Google search engine.
(The same explanations can be found in the API Reference section of your Google API download.)
(The following Special Query table can be found in the API Reference section of your Google API download.)
Special Query Capability / Example Query / Description

Include Query Term
  Example: Star Wars Episode +I
  Description: If a common word is essential to getting the results you want, you can include it by putting a "+" sign in front of it.

Exclude Query Term
  Example: bass -music
  Description: You can exclude a word from your search by putting a minus sign ("-") immediately in front of the term you want to exclude from the search results.

Phrase Search
  Example: "yellow pages"
  Description: Search for complete phrases by enclosing them in quotation marks or connecting them with hyphens. Words marked in this way will appear together in all results exactly as entered. Note: You may need to use a "+" to force inclusion of common words in a phrase.

Boolean OR Search
  Example: vacation london OR paris
  Description: Google search supports the Boolean "OR" operator. To retrieve pages that include either word A or word B, use an uppercase OR between terms.

Site Restricted Search
  Example: admission site:www.stanford.edu
  Description: If you know the specific web site you want to search but aren't sure where the information is located within that site, you can use Google to search only within a specific web site. Do this by entering your query followed by the string "site:" followed by the host name.

Date Restricted Search
  Example: Star Wars daterange:2452122-2452234
  Description: If you want to limit your results to documents that were published within a specific date range, then you can use the "daterange:" query term to accomplish this. The "daterange:" query term must be in the following format: daterange:<start_date>-<end_date>

Title Search (term)
  Example: intitle:Google search
  Description: If you prepend "intitle:" to a query term, Google search restricts the results to documents containing that word in the title. Note there can be no space between the "intitle:" and the following word.

Title Search (all)
  Example: allintitle: Google search
  Description: Starting a query with the term "allintitle:" restricts the results to those with all of the query words in the title.

URL Search (term)
  Example: inurl:Google search
  Description: If you prepend "inurl:" to a query term, Google search restricts the results to documents containing that word in the result URL. Note there can be no space between the "inurl:" and the following word. To find multiple words in a result URL, use the "inurl:" operator for each word. Note: Putting "inurl:" in front of every word in your query is equivalent to putting "allinurl:" at the front of your query.

URL Search (all)
  Example: allinurl: Google search
  Description: Starting a query with the term "allinurl:" restricts the results to those with all of the query words in the result URL.

Text Only Search (all)
  Example: allintext: Google search
  Description: Starting a query with the term "allintext:" restricts the results to those with all of the query words in only the body text, ignoring link, URL, and title matches.

Links Only Search (all)
  Example: allinlinks: Google search
  Description: Starting a query with the term "allinlinks:" restricts the results to those with all of the query words in the URL links on the page.

File Type Filtering
  Example: Google filetype:doc OR filetype:pdf
  Description: The query prefix "filetype:" filters the results returned to include only documents with the extension specified immediately after. Note there can be no space between "filetype:" and the specified extension.

File Type Exclusion
  Example: Google -filetype:doc -filetype:pdf
  Description: The query prefix "-filetype:" filters the results to exclude documents with the extension specified immediately after. Note there can be no space between "-filetype:" and the specified extension.

Web Document Info
  Example: info:www.google.com
  Description: The query prefix "info:" returns a single result for the specified URL if it exists in the index.

Back Links
  Example: link:www.google.com
  Description: The query prefix "link:" lists web pages that have links to the specified web page. Note there can be no space between "link:" and the web page URL.

Related Links
  Example: related:www.google.com
  Description: The query prefix "related:" lists web pages that are similar to the specified web page. Note there can be no space between "related:" and the web page URL.

Cached Results Page
  Example: cache:www.google.com web
  Description: The query prefix "cache:" returns the cached HTML version of the specified web document that the Google search crawled. Note there can be no space between "cache:" and the web page URL.
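The table does not say what form the dates in a "daterange:" term take. The example values (2452122 and 2452234) look like Julian day numbers, and the sketch below assumes that is the expected format. It converts Gregorian calendar dates with the standard integer algorithm and builds the query term. The helper names julian_day and daterange_term are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# julian_day is a hypothetical helper: Gregorian calendar date to Julian
# day number, using the standard integer-arithmetic conversion.
sub julian_day {
    my ($year, $month, $day) = @_;
    my $adj = int((14 - $month) / 12);    # 1 for January and February
    my $yy  = $year + 4800 - $adj;
    my $mm  = $month + 12 * $adj - 3;
    return $day
         + int((153 * $mm + 2) / 5)
         + 365 * $yy
         + int($yy / 4) - int($yy / 100) + int($yy / 400)
         - 32045;
}

# daterange_term takes two [year, month, day] array references.
# Assumption: the "daterange:" term expects Julian day numbers.
sub daterange_term {
    my ($from, $to) = @_;
    return sprintf 'daterange:%d-%d', julian_day(@$from), julian_day(@$to);
}

print 'Star Wars ', daterange_term([2001, 7, 31], [2001, 11, 20]), "\n";
```

Under that assumption, the example query in the table would cover 31 July 2001 through 20 November 2001.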
Other Interesting Issues
• Search for, say, "yahoo", and look at the estimated number of results.
• Wait a minute or so.
• Search again for "yahoo" and look at the estimated number of results.
• About 5 times out of 10, the estimate will be different.
Conclusion…
The API can be used as a means of retrieving "information" and "text" from the web.
Some interesting examples:
http://www.googleduel.com/original.php
http://douweosinga.com/projects/googlehacks
http://www.researchbuzz.org/archives/001418.shtml
http://cgi.sfu.ca/~gpeters/cgi-bin/pear/gender.php

Mais conteúdo relacionado

Mais procurados

Web Development Course: PHP lecture 4
Web Development Course: PHP  lecture 4Web Development Course: PHP  lecture 4
Web Development Course: PHP lecture 4Gheyath M. Othman
 
RESTful Web API and MongoDB go for a pic nic
RESTful Web API and MongoDB go for a pic nicRESTful Web API and MongoDB go for a pic nic
RESTful Web API and MongoDB go for a pic nicNicola Iarocci
 
Introduction to Web Programming with Perl
Introduction to Web Programming with PerlIntroduction to Web Programming with Perl
Introduction to Web Programming with PerlDave Cross
 
Gaelyk: Lightweight Groovy on the Google App Engine
Gaelyk: Lightweight Groovy on the Google App EngineGaelyk: Lightweight Groovy on the Google App Engine
Gaelyk: Lightweight Groovy on the Google App EngineTim Berglund
 
PHP Presentation
PHP PresentationPHP Presentation
PHP PresentationAnkush Jain
 
PHP Presentation
PHP PresentationPHP Presentation
PHP PresentationNikhil Jain
 
A winning combination: Plone as CMS and your favorite Python web framework as...
A winning combination: Plone as CMS and your favorite Python web framework as...A winning combination: Plone as CMS and your favorite Python web framework as...
A winning combination: Plone as CMS and your favorite Python web framework as...Carlos de la Guardia
 
Learning To Run - XPages for Lotus Notes Client Developers
Learning To Run - XPages for Lotus Notes Client DevelopersLearning To Run - XPages for Lotus Notes Client Developers
Learning To Run - XPages for Lotus Notes Client DevelopersKathy Brown
 
Mashups MAX 360|MAX 2008 Unconference
Mashups MAX 360|MAX 2008 UnconferenceMashups MAX 360|MAX 2008 Unconference
Mashups MAX 360|MAX 2008 UnconferenceElad Elrom
 
MITH Digital Dialogues: Intro to Programming for Humanists (with Ruby)
MITH Digital Dialogues: Intro to Programming for Humanists (with Ruby)MITH Digital Dialogues: Intro to Programming for Humanists (with Ruby)
MITH Digital Dialogues: Intro to Programming for Humanists (with Ruby)joegilbert
 
Web Development Course: PHP lecture 1
Web Development Course: PHP lecture 1Web Development Course: PHP lecture 1
Web Development Course: PHP lecture 1Gheyath M. Othman
 
Session Server - Maintaing State between several Servers
Session Server - Maintaing State between several ServersSession Server - Maintaing State between several Servers
Session Server - Maintaing State between several ServersStephan Schmidt
 
Abstracting functionality with centralised content
Abstracting functionality with centralised contentAbstracting functionality with centralised content
Abstracting functionality with centralised contentMichael Peacock
 
Web Development Course: PHP lecture 3
Web Development Course: PHP lecture 3Web Development Course: PHP lecture 3
Web Development Course: PHP lecture 3Gheyath M. Othman
 

Mais procurados (20)

Web Development Course: PHP lecture 4
Web Development Course: PHP  lecture 4Web Development Course: PHP  lecture 4
Web Development Course: PHP lecture 4
 
RESTful Web API and MongoDB go for a pic nic
RESTful Web API and MongoDB go for a pic nicRESTful Web API and MongoDB go for a pic nic
RESTful Web API and MongoDB go for a pic nic
 
Introduction to Web Programming with Perl
Introduction to Web Programming with PerlIntroduction to Web Programming with Perl
Introduction to Web Programming with Perl
 
Gaelyk: Lightweight Groovy on the Google App Engine
Gaelyk: Lightweight Groovy on the Google App EngineGaelyk: Lightweight Groovy on the Google App Engine
Gaelyk: Lightweight Groovy on the Google App Engine
 
PHP Presentation
PHP PresentationPHP Presentation
PHP Presentation
 
PHP Presentation
PHP PresentationPHP Presentation
PHP Presentation
 
A winning combination: Plone as CMS and your favorite Python web framework as...
A winning combination: Plone as CMS and your favorite Python web framework as...A winning combination: Plone as CMS and your favorite Python web framework as...
A winning combination: Plone as CMS and your favorite Python web framework as...
 
Learning To Run - XPages for Lotus Notes Client Developers
Learning To Run - XPages for Lotus Notes Client DevelopersLearning To Run - XPages for Lotus Notes Client Developers
Learning To Run - XPages for Lotus Notes Client Developers
 
Mashups MAX 360|MAX 2008 Unconference
Mashups MAX 360|MAX 2008 UnconferenceMashups MAX 360|MAX 2008 Unconference
Mashups MAX 360|MAX 2008 Unconference
 
MITH Digital Dialogues: Intro to Programming for Humanists (with Ruby)
MITH Digital Dialogues: Intro to Programming for Humanists (with Ruby)MITH Digital Dialogues: Intro to Programming for Humanists (with Ruby)
MITH Digital Dialogues: Intro to Programming for Humanists (with Ruby)
 
working with PHP & DB's
working with PHP & DB'sworking with PHP & DB's
working with PHP & DB's
 
Web Development Course: PHP lecture 1
Web Development Course: PHP lecture 1Web Development Course: PHP lecture 1
Web Development Course: PHP lecture 1
 
Develop webservice in PHP
Develop webservice in PHPDevelop webservice in PHP
Develop webservice in PHP
 
Php frameworks
Php frameworksPhp frameworks
Php frameworks
 
Day of code
Day of codeDay of code
Day of code
 
PHP
PHPPHP
PHP
 
Session Server - Maintaing State between several Servers
Session Server - Maintaing State between several ServersSession Server - Maintaing State between several Servers
Session Server - Maintaing State between several Servers
 
Abstracting functionality with centralised content
Abstracting functionality with centralised contentAbstracting functionality with centralised content
Abstracting functionality with centralised content
 
Advanced Json
Advanced JsonAdvanced Json
Advanced Json
 
Web Development Course: PHP lecture 3
Web Development Course: PHP lecture 3Web Development Course: PHP lecture 3
Web Development Course: PHP lecture 3
 

Semelhante a Introduction to Google API - Focusky

Google App Engine for PHP
Google App Engine for PHP Google App Engine for PHP
Google App Engine for PHP Eric Johnson
 
12 core technologies you should learn, love, and hate to be a 'real' technocrat
12 core technologies you should learn, love, and hate to be a 'real' technocrat12 core technologies you should learn, love, and hate to be a 'real' technocrat
12 core technologies you should learn, love, and hate to be a 'real' technocratJonathan Linowes
 
JBUG 11 - Django-The Web Framework For Perfectionists With Deadlines
JBUG 11 - Django-The Web Framework For Perfectionists With DeadlinesJBUG 11 - Django-The Web Framework For Perfectionists With Deadlines
JBUG 11 - Django-The Web Framework For Perfectionists With DeadlinesTikal Knowledge
 
Google App Engine With Java And Groovy
Google App Engine With Java And GroovyGoogle App Engine With Java And Groovy
Google App Engine With Java And GroovyKen Kousen
 
AEM Sightly Deep Dive
AEM Sightly Deep DiveAEM Sightly Deep Dive
AEM Sightly Deep DiveGabriel Walt
 
Introduction to Google App Engine with Python
Introduction to Google App Engine with PythonIntroduction to Google App Engine with Python
Introduction to Google App Engine with PythonBrian Lyttle
 
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)dnaber
 
Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8Websolutions Agency
 
Web Services
Web ServicesWeb Services
Web ServicesSHC
 
Making Chrome Extension with AngularJS
Making Chrome Extension with AngularJSMaking Chrome Extension with AngularJS
Making Chrome Extension with AngularJSBen Lau
 
Google App Engine for Java
Google App Engine for JavaGoogle App Engine for Java
Google App Engine for JavaLars Vogel
 
Googleappengineintro 110410190620-phpapp01
Googleappengineintro 110410190620-phpapp01Googleappengineintro 110410190620-phpapp01
Googleappengineintro 110410190620-phpapp01Tony Frame
 
Java to Golang: An intro by Ryan Dawson Seldon.io
Java to Golang: An intro by Ryan Dawson Seldon.ioJava to Golang: An intro by Ryan Dawson Seldon.io
Java to Golang: An intro by Ryan Dawson Seldon.ioMauricio (Salaboy) Salatino
 
Rapid Application Development with WSO2 Platform
Rapid Application Development with WSO2 PlatformRapid Application Development with WSO2 Platform
Rapid Application Development with WSO2 PlatformWSO2
 

Semelhante a Introduction to Google API - Focusky (20)

Google
GoogleGoogle
Google
 
Kibana: Real-World Examples
Kibana: Real-World ExamplesKibana: Real-World Examples
Kibana: Real-World Examples
 
Google App Engine for PHP
Google App Engine for PHP Google App Engine for PHP
Google App Engine for PHP
 
12 core technologies you should learn, love, and hate to be a 'real' technocrat
12 core technologies you should learn, love, and hate to be a 'real' technocrat12 core technologies you should learn, love, and hate to be a 'real' technocrat
12 core technologies you should learn, love, and hate to be a 'real' technocrat
 
JBUG 11 - Django-The Web Framework For Perfectionists With Deadlines
JBUG 11 - Django-The Web Framework For Perfectionists With DeadlinesJBUG 11 - Django-The Web Framework For Perfectionists With Deadlines
JBUG 11 - Django-The Web Framework For Perfectionists With Deadlines
 
Google App Engine With Java And Groovy
Google App Engine With Java And GroovyGoogle App Engine With Java And Groovy
Google App Engine With Java And Groovy
 
AEM Sightly Deep Dive
AEM Sightly Deep DiveAEM Sightly Deep Dive
AEM Sightly Deep Dive
 
Introduction to Google App Engine with Python
Introduction to Google App Engine with PythonIntroduction to Google App Engine with Python
Introduction to Google App Engine with Python
 
Apache Lucene Searching The Web
Apache Lucene Searching The WebApache Lucene Searching The Web
Apache Lucene Searching The Web
 
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
 
Android networking-2
Android networking-2Android networking-2
Android networking-2
 
Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8
 
Web Services
Web ServicesWeb Services
Web Services
 
Making Chrome Extension with AngularJS
Making Chrome Extension with AngularJSMaking Chrome Extension with AngularJS
Making Chrome Extension with AngularJS
 
Knolx session
Knolx sessionKnolx session
Knolx session
 
Google App Engine for Java
Google App Engine for JavaGoogle App Engine for Java
Google App Engine for Java
 
Ext Js
Ext JsExt Js
Ext Js
 
Googleappengineintro 110410190620-phpapp01
Googleappengineintro 110410190620-phpapp01Googleappengineintro 110410190620-phpapp01
Googleappengineintro 110410190620-phpapp01
 
Java to Golang: An intro by Ryan Dawson Seldon.io
Java to Golang: An intro by Ryan Dawson Seldon.ioJava to Golang: An intro by Ryan Dawson Seldon.io
Java to Golang: An intro by Ryan Dawson Seldon.io
 
Rapid Application Development with WSO2 Platform
Rapid Application Development with WSO2 PlatformRapid Application Development with WSO2 Platform
Rapid Application Development with WSO2 Platform
 

Último

Google-Next-Madrid-BBVA-Research inv.pdf
Google-Next-Madrid-BBVA-Research inv.pdfGoogle-Next-Madrid-BBVA-Research inv.pdf
Google-Next-Madrid-BBVA-Research inv.pdfMaria Adalfio
 
Generalities about NFT , as a new technology
Generalities about NFT , as a new technologyGeneralities about NFT , as a new technology
Generalities about NFT , as a new technologysoufianbouktaib1
 
如何办理朴茨茅斯大学毕业证书学位证书成绩单?
如何办理朴茨茅斯大学毕业证书学位证书成绩单?如何办理朴茨茅斯大学毕业证书学位证书成绩单?
如何办理朴茨茅斯大学毕业证书学位证书成绩单?krc0yvm5
 
Section 3 - Technical Sales Foundations for IBM QRadar for Cloud (QRoC)V1 P10...
Section 3 - Technical Sales Foundations for IBM QRadar for Cloud (QRoC)V1 P10...Section 3 - Technical Sales Foundations for IBM QRadar for Cloud (QRoC)V1 P10...
Section 3 - Technical Sales Foundations for IBM QRadar for Cloud (QRoC)V1 P10...hasimatwork
 
Mary Meeker Internet Trends Report for 2019
Mary Meeker Internet Trends Report for 2019Mary Meeker Internet Trends Report for 2019
Mary Meeker Internet Trends Report for 2019Eric Johnson
 
Tungsten Webinar: v6 & v7 Release Recap, and Beyond
Tungsten Webinar: v6 & v7 Release Recap, and BeyondTungsten Webinar: v6 & v7 Release Recap, and Beyond
Tungsten Webinar: v6 & v7 Release Recap, and BeyondContinuent
 
SQL Server on Azure VM datasheet.dsadaspptx
SQL Server on Azure VM datasheet.dsadaspptxSQL Server on Azure VM datasheet.dsadaspptx
SQL Server on Azure VM datasheet.dsadaspptxJustineGarcia32
 
APNIC Update and RIR Policies for ccTLDs, presented at APTLD 85
APNIC Update and RIR Policies for ccTLDs, presented at APTLD 85APNIC Update and RIR Policies for ccTLDs, presented at APTLD 85
APNIC Update and RIR Policies for ccTLDs, presented at APTLD 85APNIC
 
overview of Virtualization, concept of Virtualization
overview of Virtualization, concept of Virtualizationoverview of Virtualization, concept of Virtualization
overview of Virtualization, concept of VirtualizationRajan yadav
 
Benefits of Fiber Internet vs. Traditional Internet.pptx
Benefits of Fiber Internet vs. Traditional Internet.pptxBenefits of Fiber Internet vs. Traditional Internet.pptx
Benefits of Fiber Internet vs. Traditional Internet.pptxlibertyuae uae
 

Último (10)

Google-Next-Madrid-BBVA-Research inv.pdf
Google-Next-Madrid-BBVA-Research inv.pdfGoogle-Next-Madrid-BBVA-Research inv.pdf
Google-Next-Madrid-BBVA-Research inv.pdf
 
Generalities about NFT , as a new technology
Generalities about NFT , as a new technologyGeneralities about NFT , as a new technology
Generalities about NFT , as a new technology
 
如何办理朴茨茅斯大学毕业证书学位证书成绩单?
如何办理朴茨茅斯大学毕业证书学位证书成绩单?如何办理朴茨茅斯大学毕业证书学位证书成绩单?
如何办理朴茨茅斯大学毕业证书学位证书成绩单?
 
Section 3 - Technical Sales Foundations for IBM QRadar for Cloud (QRoC)V1 P10...
Section 3 - Technical Sales Foundations for IBM QRadar for Cloud (QRoC)V1 P10...Section 3 - Technical Sales Foundations for IBM QRadar for Cloud (QRoC)V1 P10...
Section 3 - Technical Sales Foundations for IBM QRadar for Cloud (QRoC)V1 P10...
 
Mary Meeker Internet Trends Report for 2019
Mary Meeker Internet Trends Report for 2019Mary Meeker Internet Trends Report for 2019
Mary Meeker Internet Trends Report for 2019
 
Tungsten Webinar: v6 & v7 Release Recap, and Beyond
Tungsten Webinar: v6 & v7 Release Recap, and BeyondTungsten Webinar: v6 & v7 Release Recap, and Beyond
Tungsten Webinar: v6 & v7 Release Recap, and Beyond
 
SQL Server on Azure VM datasheet.dsadaspptx
SQL Server on Azure VM datasheet.dsadaspptxSQL Server on Azure VM datasheet.dsadaspptx
SQL Server on Azure VM datasheet.dsadaspptx
 
APNIC Update and RIR Policies for ccTLDs, presented at APTLD 85
APNIC Update and RIR Policies for ccTLDs, presented at APTLD 85APNIC Update and RIR Policies for ccTLDs, presented at APTLD 85
APNIC Update and RIR Policies for ccTLDs, presented at APTLD 85
 
overview of Virtualization, concept of Virtualization
overview of Virtualization, concept of Virtualizationoverview of Virtualization, concept of Virtualization
overview of Virtualization, concept of Virtualization
 
Benefits of Fiber Internet vs. Traditional Internet.pptx
Benefits of Fiber Internet vs. Traditional Internet.pptxBenefits of Fiber Internet vs. Traditional Internet.pptx
Benefits of Fiber Internet vs. Traditional Internet.pptx
 

Introduction to Google API - Focusky

  • 1. Introduction to Google API… By Pratheepan Raveendranathan
  • 2.  The Google Web APIs service is a beta web program that enables developers to easily find and manipulate information on the web.  Google Web APIs are for developers and researchers interested in using Google as a resource in their applications.  The Google Web APIs service allows software developers to query more than 3 billion web documents directly from their own computer programs.  Google uses the SOAP and WSDL standards to act as an interface between the user’s program and Google API.  Programming environments such as Java, Perl, Visual Studio .NET are compatible with Google API. Definitions from http:// www.google.com/apis/ What is Google API?
  • 3. What can you do with the API  Developers can issue search requests to Google's index of more than 3 billion web pages.  and receive results as  structured data,  Estimated number of results, URL’s, Snippets, Query Time etc.  access information in the Google cache,  and check the spelling of words.
  • 4. To start using the API  You need to,  Download API Package from http:// www.google.com/apis/  Create an account and get your license key  Install kit in your UMD account  And also need Soap::Lite  However, it is on all the csdev machines, so you don’t need to get it. IT is not on UB or Bulldog.
  • 5. Contents of this package:  googleapi.jar - Java library for accessing the Google Web APIs service.  GoogleAPIDemo.java - Example program that uses googleapi.jar. dotnet/  Example .NET - programs that uses Google Web APIs.  APIs_Reference.html - Reference doc for the API. Describes semantics of all calls and fields.  Javadoc - Documentation for the example Java libraries.  Licenses - Licenses for Java code that is redistributed in this package.  GoogleSearch.wsdl -WSDL description for Google SOAP API.  soap-samples/
  • 6. WSDL: Web Services Description Language
      - The standard format for describing a web service.
      - Expressed in XML, a WSDL definition describes how to access a web service and what operations it will perform.
      - This is the most important file, and the only one needed, to use the API with Perl.
  • 7. SOAP: Simple Object Access Protocol
      - SOAP stands for Simple Object Access Protocol
      - SOAP is a communication protocol
      - SOAP is for communication between applications
      - SOAP is a format for sending messages
      - SOAP is designed to communicate via the Internet
      - SOAP is platform independent
      - SOAP is language independent
      - SOAP is based on XML
      - SOAP will be developed as a W3C standard
  • 8. Google API for Perl: SOAP::Lite. SOAP::Lite for Perl is a collection of Perl modules that provides a simple and lightweight interface to SOAP on both the client and the server side.
  • 9. So How do I Query Google?
      #!/usr/local/bin/perl -w
      use SOAP::Lite;

      # Configuration
      $key = "Your Key Goes Here";

      # Initialize the service from the local WSDL file
      $service = SOAP::Lite->service('file:GoogleSearch.wsdl');

      $query = "duluth";
  • 10. Search Contd…
      $result = $service->doGoogleSearch(
          $key,      # key
          $query,    # search query
          0,         # start results
          10,        # max results
          "false",   # filter: boolean
          "",        # restrict (string)
          "false",   # safeSearch: boolean
          "",        # lr
          "",        # ie
          ""         # oe
      );
  • 11. Search parameters (name: description)
      - key: Provided by Google, this is required for you to access the Google service. Google uses the key for authentication and logging.
      - q: Query phrase.
      - start: Zero-based index of the first desired result.
      - maxResults: Number of results desired per query. The maximum value per query is 10. Note: If you do a query that doesn't have many matches, the actual number of results you get may be smaller than what you request.
      - filter: Activates or deactivates automatic results filtering, which hides very similar results and results that all come from the same Web host.
  • 12. Search parameters, continued (name: description)
      - restrict: Restricts the search to a subset of the Google Web index, such as a country like "Ukraine" or a topic like "Linux."
      - safeSearch: A Boolean value which enables filtering of adult content in the search results.
      - lr: Language Restrict. Restricts the search to documents within one or more languages.
      - ie: Input Encoding. This parameter has been deprecated and is ignored. All requests to the APIs should be made with UTF-8 encoding.
      - oe: Output Encoding. This parameter has been deprecated and is ignored. All requests to the APIs should be made with UTF-8 encoding.
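Because doGoogleSearch takes its ten arguments by position, it is easy to put a value in the wrong slot. The sketch below shows one way to build that list from named options. The helper name google_search_args is hypothetical and not part of the Google kit; the defaults mirror the values used on the slides, and the cap of 10 comes from the maxResults description above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Google kit): turn named options into
# the ten positional arguments, in the order the parameter slides list them.
sub google_search_args {
    my (%opt) = @_;
    die "key and q are required\n"
        unless defined $opt{key} && defined $opt{q};

    my $max = defined $opt{maxResults} ? $opt{maxResults} : 10;
    $max = 10 if $max > 10;    # the API allows at most 10 results per query

    return (
        $opt{key},                                    # key
        $opt{q},                                      # search query
        defined $opt{start} ? $opt{start} : 0,        # start (zero-based)
        $max,                                         # maxResults
        $opt{filter} ? "true" : "false",              # filter
        defined $opt{restrict} ? $opt{restrict} : "", # restrict
        $opt{safeSearch} ? "true" : "false",          # safeSearch
        defined $opt{lr} ? $opt{lr} : "",             # lr
        "",                                           # ie: deprecated, ignored
        "",                                           # oe: deprecated, ignored
    );
}

my @args = google_search_args(
    key => "Your Key Goes Here",
    q   => "duluth",
    lr  => "lang_de",
);
print join("|", @args), "\n";
```

The returned list could then be passed straight through, for example as $service->doGoogleSearch(@args).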
  • 13. Now to Retrieve the Search Results
      What you need for your program:
      # Print the title, URL and snippet of the first result, if there is one
      if (defined($result->{resultElements})) {
          print join "\n",
              "Found:",
              $result->{resultElements}->[0]->{title},
              $result->{resultElements}->[0]->{URL},
              $result->{resultElements}->[0]->{snippet} . "\n";
      }
      print "\n The search took ";
      print $result->{searchTime};
      print "\n\n";
      print "The estimated Number of results for your query is: ";
      print $result->{estimatedTotalResultsCount};
      print "\n\n";
  • 14. Search.pl Output
      Found:
      University of Minnesota <b>Duluth</b> Welcomes You
      http://www.d.umn.edu/
      The University of Minnesota <b>Duluth</b> Homepage: an overview of academic programs, campus<br> life, resources, news and events, with extensive links to other web sites <b>...</b>

       The search took 0.159791

      The estimated Number of results for your query is: 881000
  • 15. Or, to get all elements:
      # Print every snippet
      foreach $temp (@{$result->{resultElements}}) {
          print $temp->{snippet};
      }
      # Print every URL
      foreach $temp (@{$result->{resultElements}}) {
          print $temp->{URL};
      }
      # Collect every title into an array
      foreach $temp (@{$result->{resultElements}}) {
          $title_array[$count++] = $temp->{title};
      }
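The three loops above walk the same list, so they can be folded into one pass that keeps each result's title, URL and snippet together. The sketch below runs without the Google service by using a hand-made structure shaped like the result on the slides. The helper format_results and the second mock entry are invented for illustration.

```perl
use strict;
use warnings;

# Mock data shaped like the search result used on the slides.
# The second entry is a placeholder, not real Google output.
my $result = {
    resultElements => [
        {
            title   => "University of Minnesota Duluth Welcomes You",
            URL     => "http://www.d.umn.edu/",
            snippet => "The University of Minnesota Duluth Homepage",
        },
        {
            title   => "Example page",
            URL     => "http://www.example.com/",
            snippet => "Placeholder snippet",
        },
    ],
};

# Hypothetical helper: one formatted entry per result element.
sub format_results {
    my ($res) = @_;
    my @lines;
    my $rank = 1;
    foreach my $element (@{ $res->{resultElements} || [] }) {
        push @lines, sprintf("%d. %s\n   %s\n   %s",
            $rank++, $element->{title}, $element->{URL}, $element->{snippet});
    }
    return @lines;
}

print "$_\n" for format_results($result);
```

The `|| []` fallback means a result with no resultElements simply yields an empty list instead of a runtime error.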
  • 16. How to get a spelling suggestion?
      #!/usr/local/bin/perl -w
      use SOAP::Lite;

      # Configuration
      $key = "Your Key Goes Here";

      # Initialize the service from the local WSDL file
      $service = SOAP::Lite->service('file:GoogleSearch.wsdl');

      # $searchString holds the word to check
      $correction = $service->doSpellingSuggestion($key, $searchString);
  • 17. How do I get the results?
      - Easy: the variable $correction will contain the spelling suggestion if Google has one, or it will be empty if there is no suggestion.
      - So retrieving the result is as easy as:
      print "\n The suggested spelling for $searchString is $correction \n\n";
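Since the suggestion can be empty, a program may want to say so rather than print a sentence with a blank in it. A minimal sketch, assuming the "no suggestion" case arrives as either an undefined value or an empty string; the helper name spelling_message is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build the message for a suggestion that may be
# undefined or empty (the "no suggestion" case).
sub spelling_message {
    my ($word, $correction) = @_;
    return "No spelling suggestion for $word"
        unless defined $correction && length $correction;
    return "The suggested spelling for $word is $correction";
}

print spelling_message("dulut", "duluth"), "\n";
print spelling_message("duluth", undef), "\n";
```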
  • 18. Spelling output
      Enter a word
      dulut
      The suggested spelling for “Duluth” is: duluth
  • 19. How do I get a cached web page?
      - Given a URL, Google will try to retrieve the web page from its "cache".
      - The cached contents might therefore be somewhat old, depending on when Google's web crawlers last updated the site.
      - Example:
  • 20. Example Contd…
      #!/usr/local/bin/perl -w
      use SOAP::Lite;

      # Configuration
      $key = "Your Key Goes Here";

      # Initialize the service from the local WSDL file
      $service = SOAP::Lite->service('file:GoogleSearch.wsdl');

      $url = "http://www.d.umn.edu";
      $cachedPage = $service->doGetCachedPage($key, $url);
  • 21. How do I retrieve the results?
      - This works the same way as the spelling suggestion.
      - If the web page exists in the cache, the $cachedPage variable will hold the page's entire HTML.
      - Otherwise, you get a message from Google that says "This web page has not been updated…blah…blah…blah".
  • 22. Search with other options: Google has four topic restricts:
      Topic             <restrict> value
      U.S. Government   unclesam
      Linux             linux
      Macintosh         mac
      FreeBSD           bsd
  • 23. Search with Restrictions:
      $result = $service->doGoogleSearch(
          $key,      # key
          $query,    # search query
          0,         # start results
          10,        # max results
          "false",   # filter: boolean
          "linux",   # restrict (string)
          "false",   # safeSearch: boolean
          "",        # lr
          "",        # ie
          ""         # oe
      );
  • 24. Search with Language Restrictions (lang_de = German)
      $result = $service->doGoogleSearch(
          $key,       # key
          $query,     # search query
          0,          # start results
          10,         # max results
          "false",    # filter: boolean
          "",         # restrict (string)
          "false",    # safeSearch: boolean
          "lang_de",  # lr
          "",         # ie
          ""          # oe
      );
      print "\n The search took ";
      print $result->{searchTime};
      print "\n\n";
      print "The estimated Number of results for your query is: ";
      print $result->{estimatedTotalResultsCount};
      print "\n\n";
      if (defined($result->{resultElements})) {
          print join "\n",
              "Found:",
              $result->{resultElements}->[0]->{title},
              $result->{resultElements}->[0]->{URL},
              $result->{resultElements}->[0]->{snippet} . "\n";
      }
  • 25. Search with Language Restrictions Contd…
      Please Enter Search Item
      der sturm

       The search took 0.309039

      The estimated Number of results for your query is: 206000

      Found:
      SK <b>STURM</b> GRAZ - Willkommen beim Sk <b>Sturm</b>
      http://www.sksturm.at/
      Eintreten. Puntigamer das bierige Bier, Steiermark.com, Puma, Tipp3,<br> Autohaus Jakob Prügger, Graz - Hausmannstätten. © 2003 SkSturm <b>...</b>
  • 26. Tips on Querying Google
      - Default Search: By default, Google only returns pages that include all of the terms in the query string.
      - Stop Words: Google ignores common words and characters such as "where" and "how," as well as certain single digits and single letters. Common words that are ignored are known as stop words. However, you can prevent Google from ignoring stop words by enclosing them in quotes, such as in the phrase "to be or not to be".
      - Special Characters: By default, all non-alphanumeric characters that are included in a search query are treated as word separators. The only exceptions are the following: double quote mark ("), plus sign (+), minus sign or hyphen (-), and ampersand (&). The ampersand character (&) is treated as another character in the query term in which it is included, while the remaining exception characters correspond to search features listed in the section below.
      - Special Query Terms: Google supports the use of several special query terms that allow the user or search administrator to access additional capabilities of the Google search engine.
      (The same explanations can be found in the API Reference section of your Google API download.)
  • 27. (The following Special Query table can be found in the API Reference Section in your Google API download)
  • 28. Special query terms (capability, example query, description)
      - Include Query Term
        Example: Star Wars Episode +I
        If a common word is essential to getting the results you want, you can include it by putting a "+" sign in front of it.
      - Exclude Query Term
        Example: bass -music
        You can exclude a word from your search by putting a minus sign ("-") immediately in front of the term you want to exclude from the search results.
      - Phrase Search
        Example: "yellow pages"
        Search for complete phrases by enclosing them in quotation marks or connecting them with hyphens. Words marked in this way will appear together in all results exactly as entered. Note: You may need to use a "+" to force inclusion of common words in a phrase.
  • 29. Special query terms, continued
      - Boolean OR Search
        Example: vacation london OR paris
        Google search supports the Boolean "OR" operator. To retrieve pages that include either word A or word B, use an uppercase OR between terms.
      - Site Restricted Search
        Example: admission site:www.stanford.edu
        If you know the specific web site you want to search but aren't sure where the information is located within that site, you can use Google to search only within a specific web site. Do this by entering your query followed by the string "site:" followed by the host name.
      - Date Restricted Search
        Example: Star Wars daterange:2452122-2452234
        If you want to limit your results to documents that were published within a specific date range, you can use the "daterange:" query term. It must be in the following format: daterange:<start_date>-<end_date>
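The numbers in the daterange example are not ordinary dates. The slide does not name the format; the sketch below assumes they are Julian day numbers, which is how Google's API reference described them. It uses the standard integer formula for converting a Gregorian date, and the helper names julian_day and daterange_term are hypothetical.

```perl
use strict;
use warnings;

# Convert a Gregorian calendar date to its Julian day number
# (assumption: this is the date format daterange: expects).
sub julian_day {
    my ($year, $month, $day) = @_;
    my $adj = int((14 - $month) / 12);    # 1 for January/February, else 0
    my $y   = $year + 4800 - $adj;
    my $m   = $month + 12 * $adj - 3;
    return $day + int((153 * $m + 2) / 5) + 365 * $y
         + int($y / 4) - int($y / 100) + int($y / 400) - 32045;
}

# Hypothetical helper: build the query term from two [year, month, day] refs.
sub daterange_term {
    my ($from, $to) = @_;
    return sprintf("daterange:%d-%d", julian_day(@$from), julian_day(@$to));
}

print "Star Wars ", daterange_term([2001, 7, 31], [2001, 11, 20]), "\n";
# prints: Star Wars daterange:2452122-2452234
```

Under that assumption, the example on the slide covers 31 July 2001 through 20 November 2001.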
  • 30. Special query terms, continued
      - Title Search (term)
        Example: intitle:Google search
        If you prepend "intitle:" to a query term, Google search restricts the results to documents containing that word in the title. Note there can be no space between "intitle:" and the following word.
      - Title Search (all)
        Example: allintitle: Google search
        Starting a query with the term "allintitle:" restricts the results to those with all of the query words in the title.
      - URL Search (term)
        Example: inurl:Google search
        If you prepend "inurl:" to a query term, Google search restricts the results to documents containing that word in the result URL. Note there can be no space between "inurl:" and the following word. To find multiple words in a result URL, use the "inurl:" operator for each word. Note: Putting "inurl:" in front of every word in your query is equivalent to putting "allinurl:" at the front of your query.
  • 31. Special query terms, continued
      - URL Search (all)
        Example: allinurl: Google search
        Starting a query with the term "allinurl:" restricts the results to those with all of the query words in the result URL.
      - Text Only Search (all)
        Example: allintext: Google search
        Starting a query with the term "allintext:" restricts the results to those with all of the query words in only the body text, ignoring link, URL, and title matches.
      - Links Only Search (all)
        Example: allinlinks: Google search
        Starting a query with the term "allinlinks:" restricts the results to those with all of the query words in the URL links on the page.
      - File Type Filtering
        Example: Google filetype:doc OR filetype:pdf
        The query prefix "filetype:" filters the results returned to include only documents with the extension specified immediately after. Note there can be no space between "filetype:" and the specified extension.
  • 32. Special query terms, continued
      - File Type Exclusion
        Example: Google -filetype:doc -filetype:pdf
        The query prefix "-filetype:" filters the results to exclude documents with the extension specified immediately after. Note there can be no space between "-filetype:" and the specified extension.
      - Web Document Info
        Example: info:www.google.com
        The query prefix "info:" returns a single result for the specified URL if it exists in the index.
      - Back Links
        Example: link:www.google.com
        The query prefix "link:" lists web pages that have links to the specified web page. Note there can be no space between "link:" and the web page URL.
      - Related Links
        Example: related:www.google.com
        The query prefix "related:" lists web pages that are similar to the specified web page. Note there can be no space between "related:" and the web page URL.
      - Cached Results Page
        Example: cache:www.google.com web
        The query prefix "cache:" returns the cached HTML version of the specified web document that the Google search crawled. Note there can be no space between "cache:" and the web page URL.
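Several of the special query terms above can be combined in a single query string, and the no-space rule is easy to get wrong when typing them by hand. The sketch below assembles a query from its parts. The helper build_query and its option names are hypothetical; it covers only a few of the operators in the table.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble a query string from plain terms, exact
# phrases, excluded words, a site restriction and a list of file types.
sub build_query {
    my (%opt) = @_;
    my @parts;
    push @parts, @{ $opt{terms} || [] };
    push @parts, map { qq("$_") } @{ $opt{phrases} || [] };  # phrase search
    push @parts, map { "-$_" } @{ $opt{exclude} || [] };     # exclude term
    push @parts, "site:$opt{site}" if defined $opt{site};    # no space after ':'
    if (my @types = @{ $opt{filetypes} || [] }) {
        push @parts, join(" OR ", map { "filetype:$_" } @types);
    }
    return join(" ", @parts);
}

print build_query(terms => ["admission"], site => "www.stanford.edu"), "\n";
# prints: admission site:www.stanford.edu
print build_query(terms => ["Google"], filetypes => ["doc", "pdf"]), "\n";
# prints: Google filetype:doc OR filetype:pdf
print build_query(terms => ["bass"], exclude => ["music"]), "\n";
# prints: bass -music
```

The resulting string would be passed as the query argument of doGoogleSearch.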
  • 33. Other Interesting Issues
      - Search for, say, "yahoo" and look at the estimated number of results.
      - Wait a minute or so.
      - Search again for "yahoo" and look at the estimated number of results.
      - About 5 times out of 10, the estimate will be different.
  • 34. Conclusion… The API can be used as a means of retrieving "information" and "text" from the web. Some interesting examples:
      - http://www.googleduel.com/original.php
      - http://douweosinga.com/projects/googlehacks
      - http://www.researchbuzz.org/archives/001418.shtml
      - http://cgi.sfu.ca/~gpeters/cgi-bin/pear/gender.php