WebSearch Academy
Internet Librarian International

Surfacing the Deep Web
Arthur Weiss
Email: a.weiss@aware.co.uk / Twitter: @awareci
www.marketing-intelligence.co.uk
14 October 2013
© AWARE 2013

Tel: +44 20 8954 9121 • Fax: +44 20 8954 2102 • Web: www.marketing-intelligence.co.uk
Not everything can be found with Google…
The ‘Invisible Web’ or ‘Deep Web’ consists of web pages and documents which are not indexed by conventional search engines, or are poorly or incompletely indexed.

5 Types of “Invisibility”
•  Not search engine optimised, so pages fail to appear in “simple” searches
•  Not indexed by search engines
•  Excluded from search index
•  Subscription or proprietary content
•  Encrypted or non-indexable content

Know your tool kit
Standard Google or multiple approaches & tools

What do I need to find?

What sort of needle? What sort of haystack?
http://www.morguefile.com/archive/display/21091

Why will the information be available?
Where will it be held? (Who will know it?)
Can I obtain it legally and ethically from this source & if so, how?
If not, are there other sources or ways of obtaining the information?
After obtaining the information are any checks needed to verify it?
What is the information’s relationship to other information?

Not everything is online or can be found!
•  Try to find:
  -  Original TV coverage of the storming of the Bastille [1]
  -  A newspaper interview with Christopher Columbus, following his return from discovering America
  -  A recording of Abraham Lincoln delivering the Gettysburg Address
  -  A photo of Jesus in his crib (Question from a 9-year-old: “Why didn’t anybody take photos with their phones?”)

[1] With thanks to Karen Blakeman of RBA Information (rba.co.uk) for these examples

“Forty-two! Is that all you’ve got to show for
seven and a half million years’ work?”

“I checked it very thoroughly and that quite
definitely is the answer. I think the problem,
to be quite honest with you, is that you’ve
never actually known what the question is.”
Douglas Adams, “The Hitchhiker’s Guide to the Galaxy”

If your search approach is wrong, it doesn’t
matter which approach or tool you use, or how
you use it. Your results will be poor or wrong.

Before starting to search consider sources for the subject / topic of interest…
Why is information likely to be available?
Consider also file formats, and location of search terms
What search tool / approach is most likely to access or index the information’s location (container)?
Are there unique terms or jargon that lead to a specialist tool?
e.g. Lung cancer (consumer) versus pulmonary carcinoma (medical)
Are there societies, organisations, people, or groups that may have information? (Who/where else could have information?)
Would any of the relevant pages be in another language?
“cheap hotel in Dubai” OR “فندق اقتصادي في دبي”

Before starting to search: consider search terms for the topic or subject of interest
Are there any synonyms or variant spellings?
Tyre or tire; Aluminium or aluminum; Candy or sweet; Basle or Basel
Are there any other words likely to be in documents on the topic?
Are any keywords part of a common phrase?
Are any keywords likely to be in irrelevant documents that should be excluded from searches?
How might the information be written?
“I work for Xcompany” to search for employees of Xcompany
“X is better than” for comparisons
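The questions on this slide can be turned into a query string mechanically. The following is a minimal sketch, assuming the widely supported conventions of OR between alternatives, double quotes around phrases and a leading minus sign for exclusions. The function names and the excluded term "bicycle" are illustrative only and do not come from the talk.

```python
def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(variant_groups, phrases=(), exclude=()):
    """Combine variant spellings, exact phrases and exclusions into one query string."""
    parts = []
    for group in variant_groups:
        terms = [quote(t) for t in group]
        # Variants of one concept are alternatives, so join them with OR.
        parts.append("(" + " OR ".join(terms) + ")" if len(terms) > 1 else terms[0])
    # Exact phrases are always quoted, even if they are a single word.
    parts.extend(f'"{p}"' for p in phrases)
    # A leading minus sign asks the engine to exclude a term.
    parts.extend("-" + quote(t) for t in exclude)
    return " ".join(parts)


query = build_query(
    variant_groups=[["tyre", "tire"], ["Basle", "Basel"]],
    phrases=["is better than"],
    exclude=["bicycle"],  # hypothetical irrelevant keyword
)
print(query)  # (tyre OR tire) (Basle OR Basel) "is better than" -bicycle
```

How an engine treats parentheses and OR varies, so it is worth checking the resulting query against each tool's own help pages.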

Research Planning

Information Requirements
•  Break down into individual questions that, when answered, will provide the required knowledge
•  Don’t start searching without knowing what you are looking for, and why

An example research plan
Copy & fill in sheet for each key information question / topic

Research Topic:
Research Questions (break down topic into answerable questions):

Source: LinkedIn
  Search approach / parameters: job title, current employer, etc.
  Type of information expected: people profiles
  Comments / possible problems: may not be accurate or in-date

Source: Google Scholar
  Search approach / parameters: author name, topic, date, etc.
  Type of information expected: citations, academic research papers
  Comments / possible problems: doesn’t cover everything

Source: National statistics
  Search approach / parameters: site search engine
  Type of information expected: census & demographic data
  Comments / possible problems: may be old or incomplete
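For anyone who prefers to keep the plan in a script rather than on a sheet, the same columns could be held in a small data structure. This is only a sketch: the class and field names are assumptions chosen to mirror the column headings, and the three rows are the examples from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class SourcePlan:
    """One row of the sheet: a source and how it will be searched."""
    source: str
    search_approach: str
    expected_information: str
    possible_problems: str


@dataclass
class ResearchPlan:
    """One sheet: a topic, its answerable questions and the planned sources."""
    topic: str
    questions: list[str] = field(default_factory=list)
    sources: list[SourcePlan] = field(default_factory=list)

    def as_text(self) -> str:
        lines = [f"Research topic: {self.topic}"]
        lines += [f"Question: {q}" for q in self.questions]
        for s in self.sources:
            lines.append(
                f"{s.source}: {s.search_approach} -> "
                f"{s.expected_information} ({s.possible_problems})"
            )
        return "\n".join(lines)


plan = ResearchPlan(
    topic="(fill in)",
    sources=[
        SourcePlan("LinkedIn", "job title, current employer, etc.",
                   "people profiles", "may not be accurate or in-date"),
        SourcePlan("Google Scholar", "author name, topic, date, etc.",
                   "citations, academic research papers", "doesn't cover everything"),
        SourcePlan("National statistics", "site search engine",
                   "census & demographic data", "may be old or incomplete"),
    ],
)
print(plan.as_text())
```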

Advanced Searching
•  Use advanced search operators and options, e.g. filetype: / intitle: / inurl: / .. (numeric range) and * (wildcard)
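As an illustration of how these operators combine, here is a small sketch that assembles a query string from the operators named on the slide. The helper name, its parameters and the example search terms are assumptions for illustration, and engine support for individual operators (the numeric range in particular) may change over time.

```python
def advanced_query(keywords, filetype=None, intitle=None, inurl=None, number_range=None):
    """Append the requested operators to a keyword string."""
    parts = [keywords]
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to a file format
    if intitle:
        parts.append(f"intitle:{intitle}")    # word must appear in the page title
    if inurl:
        parts.append(f"inurl:{inurl}")        # word must appear in the URL
    if number_range:
        low, high = number_range
        parts.append(f"{low}..{high}")        # numeric range, e.g. years
    return " ".join(parts)


# "*" inside a quoted phrase stands in for one or more unknown words.
print(advanced_query('"coffee * statistics"', filetype="pdf",
                     intitle="report", number_range=(2010, 2013)))
# "coffee * statistics" filetype:pdf intitle:report 2010..2013
```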

Search Engines – not just Google

Specialist Search / Deep Web Search

Search for Information “Containers”
•  Knowing a reason for the information to be available can lead to an information source
  -  Who else would want this information?
  -  Search for topic + “Database”, e.g. coffee database

Case Examples – Economics by Country

Case Examples – Trade Statistics

Case Examples – Economic Indicators

Case Examples – Genealogy

Proprietary sites / Blocked from Index
•  Register for password-protected sites
•  Use site search or site map, if available
•  If a robots.txt file exists, it may reveal the paths the site has asked search engines not to crawl (the “hidden” pages), e.g. nytimes.com/robots.txt
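The robots.txt tip can be sketched in a few lines: the file is plain text, and its Disallow lines name the paths a site asks crawlers to skip. The sample file below is invented for illustration (it is not the content of any real site's robots.txt), and the function name is an assumption. A listed path is only a hint about where unindexed content may sit; it may still need a login, and the earlier question of whether it can be obtained legally and ethically still applies.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the paths named in Disallow lines, in order, without duplicates."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        key, _, value = line.partition(":")
        value = value.strip()
        # An empty Disallow value means "nothing is disallowed", so skip it.
        if key.strip().lower() == "disallow" and value and value not in paths:
            paths.append(value)
    return paths


# Hypothetical robots.txt content, used only to demonstrate the parsing.
SAMPLE = """\
User-agent: *
Disallow: /archive/   # hypothetical path
Disallow: /members/
Allow: /members/join
Disallow:
"""

print(disallowed_paths(SAMPLE))  # ['/archive/', '/members/']
```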

Content that can’t / won’t be indexed
•  Non-textual information e.g. multimedia / audiovisual
  -  Bing has search operators that can find RSS feeds (hasfeed:) and pages containing specific types of file (e.g. mp3 files: contains:mp3)
  -  Search for related textual information e.g. descriptions, or sources (e.g. artwork or film titles)
•  Encrypted information / .onion sites
  -  The Tor Project (torproject.org) and the Tor Browser
  -  Access encrypted sites via proxy servers
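The two Bing operators mentioned above are typed straight into the search box, but a query can also be turned into a shareable search link. A minimal sketch, assuming Bing's usual `q` query parameter; the helper name and the example search terms ("coffee", "music") are illustrative only.

```python
from urllib.parse import urlencode


def bing_search_url(query: str) -> str:
    """Build a Bing results URL for a query, percent-encoding the operators."""
    return "https://www.bing.com/search?" + urlencode({"q": query})


# Pages about "coffee" that expose an RSS or Atom feed.
print(bing_search_url("hasfeed:coffee"))
# Pages about "music" that link to mp3 files.
print(bing_search_url("music contains:mp3"))
```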

Searching TOR
•  On regular Google: fake passport site:onion.to

TOR / .Onion Sites

Any Questions?

Arthur Weiss is the managing director of AWARE, a UK-based consultancy specialising in marketing & competitive intelligence analysis.
Contact Details:
Web Site: www.marketing-intelligence.co.uk
E-mail: a.weiss@aware.co.uk
Twitter: @awareci
Telephone: +44 20 8954 9121
Fax: +44 20 8954 2102
