UB Utrecht                  HvA-MIC                     GO Opleidingen




      searching the internet
better with Google / Google not always best


                      Eric Sieverts
                         @sieverts

                                               CODARTS, 04-03-2013
agenda


    • searching the web
    • smart searching
    • google options
    • beyond google
    • beyond general web search


         for all links see: http://sieverts.pbworks.com/codarts


agenda

    general web search                        specific material search
      – how to …                                – how to …
      – the general web ?=? everything?         – importance of specific
                                                  material types?

                          when & why
an ever-changing google landscape




            •   unreliable numbers
            •   irreproducible results
            •   disappearing functions
            •   changing interfaces

building block approach

    systematic searching in structured information systems (like JStor etc.)
      start analytically with the so-called building block approach
      e.g.: subject "modern american composers"
         – it breaks down into 3 facets
         – collect keywords for each facet
         – combine keywords with OR and AND operators

              modern                   american                 composers
       modern                   american                 composer
       contemporary             america                  composers
       20th century       OR    usa               OR     songwriters       OR
       twentieth century        united states            …
       …                        …
                         AND                      AND
    it makes a query:
    (modern OR contemporary OR "twentieth century" OR "20th century")
       AND (america OR american OR usa OR "united states")
       AND (composer OR composers OR songwriter OR songwriters)
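The building block approach is mechanical enough to script. A minimal sketch, assuming each facet is simply a list of keywords: terms within a facet are joined with OR, facets with AND, and multi-word terms are quoted as exact phrases. The function name `building_block_query` is hypothetical.

```python
def building_block_query(facets):
    """Combine keyword facets: OR within each facet, AND between facets."""
    blocks = []
    for terms in facets:
        # multi-word terms are quoted so they are searched as exact phrases
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)


facets = [
    ["modern", "contemporary", "twentieth century", "20th century"],
    ["america", "american", "usa", "united states"],
    ["composer", "composers", "songwriter", "songwriters"],
]
print(building_block_query(facets))
```

It prints the same query as on this slide; adding a keyword to a facet only means adding it to one list.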
building block approach

    also with Google?
    web search engines are not specifically designed for such structured
    queries, but it is possible


    Google and Yahoo make it even easier: you may omit the parentheses
    and the AND operator (AND is the default):

    modern OR contemporary OR "twentieth century" OR "20th century"
    america OR american OR usa OR "united states"
    composer OR composers OR songwriter OR songwriters

    [typed as one line; the spaces between the three OR-groups act as implied ANDs]
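The Google-style variant drops the parentheses and the explicit AND. A minimal sketch, assuming (as this slide states) that OR binds more tightly than the implied AND; search engines may change how they parse queries, so check the results. `google_style_query` is a hypothetical name.

```python
def google_style_query(facets):
    """Same facets, Google style: no parentheses, no explicit AND."""
    groups = []
    for terms in facets:
        # multi-word terms are quoted so they are searched as exact phrases
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append(" OR ".join(quoted))
    # a plain space between the OR-groups acts as the implied AND
    return " ".join(groups)


facets = [
    ["modern", "contemporary", "twentieth century", "20th century"],
    ["america", "american", "usa", "united states"],
    ["composer", "composers", "songwriter", "songwriters"],
]
print(google_style_query(facets))
```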
relevance ranking (1)

    Google (and other web search engines) focus primarily on
    presenting search results in order of relevance
    how do they know what is relevant?
     – they interpret the importance of words for the subject matter of
       the retrieved documents
       (are your search terms present in title, url, headings, ...?)
         • you can enhance the importance of a certain term for your
           query by repeating that word a couple of times
     – they estimate the importance of the relation between words in
       the retrieved documents, i.e. whether
        • your search words occur close together
        • your search words occur in the same order as you entered them

          >> formulate your query the way you expect it to be formulated
             in a relevant document: word order matters
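The signals above (how often the query words occur, and whether they appear next to each other in the entered order) can be illustrated with a toy scoring function. This is a hypothetical sketch, not how Google actually computes relevance; `toy_score` and the `pair_bonus` weight are made-up names and values.

```python
def toy_score(document, query, pair_bonus=3):
    """Toy relevance score: term frequency plus a bonus for adjacent, in-order query words."""
    words = document.lower().split()
    terms = query.lower().split()
    # term frequency; a word repeated in the query is counted once per repetition
    score = sum(words.count(t) for t in terms)
    # bonus for each consecutive pair of query words that also occurs
    # side by side, in the same order, in the document
    bigrams = set(zip(words, words[1:]))
    for a, b in zip(terms, terms[1:]):
        if (a, b) in bigrams:
            score += pair_bonus
    return score


d1 = "modern american composers of film music"
d2 = "composers american and modern"
print(toy_score(d1, "american composers"), toy_score(d2, "american composers"))
```

Both documents contain both words, but only the first has them together and in the entered order, so it scores higher.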
relevance ranking (2)

     Google (and other web search engines) focus primarily on
     presenting search results in order of relevance
     how do they know what is relevant?
      – the importance or quality of retrieved web pages is deduced from
        the number and the importance of links from other sites
        (a PageRank is calculated for each page)
      – the importance of retrieved web pages for your personal interest is
        deduced from your previous search and browse behaviour,
        which is monitored whenever you're logged in

     since every search engine uses somewhat different algorithms for its
     relevance calculations (and their coverage differs as well), there
     tends to be little overlap between the top 10 results from different engines
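Link-based importance is the idea behind PageRank. Below is a sketch of the classic textbook PageRank iteration, with the damping factor 0.85 suggested in the original PageRank paper; Google's actual ranking combines this with many other, unpublished signals. The three-page link graph is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page gets a small base share (the "random jump")
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page passes its rank on, divided over its outgoing links
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # a page without outgoing links spreads its rank over all pages
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


links = {"a": ["c"], "b": ["c"], "c": ["a"]}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page c, linked from both other pages, ends up with the highest score; b, which nobody links to, keeps only the base share.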
search terms

     the use of proper search terms is crucial for search success
     think of:
      –   singular / plural, verbs / nouns / adjectives, conjugations, ...
      –   spelling variations (behavior / behaviour)
      –   compound terms (writer / songwriter)
      –   synonyms, acronyms (compact disc / compact disk / cd / digital disc)

     how would the answer to my question be formulated in a
     relevant document? "think as if you were the document"
      –   the right terms
      –   as an "exact phrase" or in the most probable word order
      –   use a wildcard for variable words ("modern * * composers")
      –   when looking for items in a list, use known examples from that list
      –   popular vs. scientific terms, etc.
refining searches

 if results are too broad, too diverse
  – add another essential term or set of terms to your query
  – see what your search engine suggests
    while you enter your query




   – exclude an unwanted term with NOT (francis bacon NOT philosopher)
     NB: Google does not understand NOT; use a minus sign instead:
                                        francis bacon -philosopher
nice interactive infographic "how search works"
     http://www.google.com/insidesearch/howsearchworks/thestory/
is Google outsmarting us?
     Google tries to improve and to broaden your queries
     •   automatic spelling corrections (veilgheid >> veiligheid, Dutch for "safety")
     •   automatic search for words with the same word stem (singular/plural,
         verb, conjugation, inflection, …)
     •   expands acronyms (jfk >> john f kennedy | wwii >> world war II)
     •   adds some synonyms (vaccination >> immunization)
     •   transforms separate words into a compound term & vice versa
         (veiligheid maatregel >> veiligheidsmaatregel, "safety measure" | catfood >> cat food)
     •   may treat a term as optional and leave it out if it does not differentiate enough

     never sure what or when; more often and more elaborate in English than in Dutch
     • personalisation based on previous search behaviour

     but what if you don't like all of this...
                                                            >> "verbatim"

     "verbatim" searches literally, only for the exact words you entered
     on google.nl: "woord voor woord" ("word for word")
some more "how to"


     • domain search:   site:edu OR site:edu.* [for all edu (sub)domains]
                          site:shell.com OR site:philips.com
     • url search:        inurl:novelty
     • title search:      intitle:catalytic

     • filetype search:   filetype:pdf
                          filetype:xls OR filetype:xlsx
                          filetype:doc OR filetype:docx
                          filetype:rss
                          [more types than shown in the advanced search drop-down menu]
     • exact search:      "greenhouses"       [or VERBATIM for all words]
advanced search

     Google hides its advanced search screen:
     you must first perform a simple search
     to get the "cog wheel"
some more "how to"

     some of this can be done from the advanced search screen,
     but the regular search box offers greater flexibility
     once you know the syntax
     • domain search: [in combination with real search terms]
                         site:codarts.nl
                         site:edu OR site:edu.* [for all edu (sub)domains]
                         site:last.fm OR site:spotify.com
     • url search:       inurl:course
     • title search:     intitle:guitar



some more "how to" (2)

     • filetype search:    filetype:pdf
                           filetype:xls OR filetype:xlsx
                           filetype:doc OR filetype:docx
                           filetype:rss
                           [more types than shown in the advanced search drop-down menu]
     • numeric search:     10..20              [includes all values in between]
                           $10..$20            [not for other currencies]
     • punctuation:        &, %, dot, ...      [can be searched]
                           €, /, ", comma, ... [are ignored]
     • exact search:       "greenhouses"       [or VERBATIM for all words]
     • synonym search:     ~guitar
     • time limitations:   [after search, hidden in the top menu]
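Because the regular search box accepts these operators as plain text, a query can also be assembled programmatically. A minimal sketch, assuming Google's query URL pattern `https://www.google.com/search?q=…`; `build_query` and `google_url` are hypothetical helper names, and which operators keep working may change over time.

```python
from urllib.parse import urlencode


def build_query(terms, sites=(), filetypes=(), intitle=None, inurl=None,
                number_range=None, exclude=()):
    """Assemble a Google-style query string from search terms and operators."""
    parts = list(terms)
    if sites:
        # alternative domains are combined with OR
        parts.append(" OR ".join(f"site:{s}" for s in sites))
    if filetypes:
        parts.append(" OR ".join(f"filetype:{f}" for f in filetypes))
    if intitle:
        parts.append(f"intitle:{intitle}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if number_range:
        low, high = number_range
        parts.append(f"{low}..{high}")  # numeric range, e.g. 10..20 or $10..$20
    # unwanted terms get a minus sign (Google does not understand NOT)
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


def google_url(query):
    """Turn a query string into a search URL (spaces become '+')."""
    return "https://www.google.com/search?" + urlencode({"q": query})


print(build_query(["guitar"], sites=["last.fm", "spotify.com"]))
print(google_url(build_query(["francis", "bacon"], exclude=["philosopher"])))
```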

[screenshots: synonym search; date limitations]
someone who searches for "Bach" is probably more interested
       in data about him than in websites about him, and
       most probably in J.S. rather than in one of his relatives




Google's "Knowledge Graph"
knows 500 million objects
with 3.5 billion properties and
even more mutual relations
(but only in English)
it also interprets the intention of your query (sometimes ;-)
general search engines besides google
 • Bing         microsoft, large
 • Yahoo!       results from Bing, large
 • Blekko       uses "slashtags" to search more selectively [e.g. by domain]
                also many predefined slashtags; e.g. /likes for Facebook
 • DuckDuckGo   assures privacy, no personalisation, no filter bubble,
                rather small, !Bang function offers many extras
 • Gigablast    green search engine, rather small, some unique functions
 • Exalead      French, many advanced functions, primarily a demo system
 • Millionshort leaves out results from the most popular sites → the long tail
 • WolframAlpha knowledge engine, facts, calculations
 together, these others have a 30% market share in the US; in NL only 3%
 • Yandex       in Russia more popular than Google
 • Baidu        in China more popular than Google
 • Naver, Daum  in South Korea more popular than Google
 • Seznam       in Czechia more popular than Google
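Millionshort's idea of leaving out the most popular sites can be sketched as a filter on a result list. This is a hypothetical illustration: the popularity ranking and the `drop_popular` helper are made up, and it is not a description of how Millionshort itself works.

```python
from urllib.parse import urlparse


def drop_popular(result_urls, popular_domains, top_n):
    """Remove results whose host belongs to one of the top_n most popular domains."""
    blocked = popular_domains[:top_n]
    kept = []
    for url in result_urls:
        host = urlparse(url).hostname or ""
        # also block subdomains, e.g. en.wikipedia.org for wikipedia.org
        if not any(host == d or host.endswith("." + d) for d in blocked):
            kept.append(url)
    return kept


popular = ["wikipedia.org", "youtube.com", "amazon.com"]  # hypothetical popularity ranking
results = [
    "https://en.wikipedia.org/wiki/Minimalism",
    "https://www.youtube.com/watch?v=example",
    "https://www.example-composer.org/biography",
]
print(drop_popular(results, popular, top_n=2))
```

Raising `top_n` pushes the results further into the long tail.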
material type specific search
     science   google scholar, microsoft academic, scirus,
               oaister, scientific commons, science.gov
     reference wikipedia, quora, wolfram|alpha, answers.com
     news     google news, yahoo news, bing news, cnn, bbc
     old news way-back-machine, historische kranten KB
     images google image, yahoo image, bing image, flickr,
                tineye (ip-check), panoramio (geo-search)
     video      google video, youtube, youtube edu channel,
                bing video, blinkx, voxalead-news
     tweets     twitter search, topsy, postpost, snapbird
     social     socialsearcher, socialmention, whostalkin, kurrently
     forums     google groups, omgili, boardtracker
     blogs      google blogs, icerocket, [rss] CTRLQ, RSS SearchHub
scientific search

     books
       –   Google Books (full-text search)
       –   HathiTrust Digital Library (open book scan project / part of G-books)
       –   LibraryThing (catalogue of 58,000,000 books from 1,000,000 owners)
       –   Goodreads (reviews, recommendations, friends, ...)
       –   Open Textbook Catalog (open access textbooks)

     journal articles
       –   licensed databases (like JStor, ...)
       –   Google Scholar (articles, dissertations, reports, ...)
       –   sEURch / UvA-library ("discovery" systems of EUR / UvA)
       –   Scirus / SciVerse (journal articles from Elsevier, database content, web pages)
       –   Magportal (also popular magazines, in English)
       –   DeepDyve (scientific articles "for rent" for 24 hours)
Google Books

     •   all pages scanned and full-text searchable
     •   important for discovering specific subjects/terms that are not the book's primary topic
     •   often limitations on display and browsability
         (no preview / snippet view / limited preview / full view)
     •   content from publishers and large libraries
     •   viewing copyrighted material is restricted, also for books from libraries
     •   build your personal ‘My Library’
     •   Dutch books come not only from Ghent University (and soon the KB), but also
         from US/UK libraries
     •   also some ‘magazines’
     •   metadata on the "about this book" page
Google Scholar

     •   > 100 million scientific publications (mostly articles)
     •   differences between availability (and hence searchability) of
         full text (majority), bibliographic-only, and citation data
     •   competitor of Web of Science, Scopus, Scirus, ...
     •   indexes many selected, even licensed, sources (publishers,
         abstract databases, university sites, institutional repositories, ...)
     •   includes numbers of citations! [and links to them]
     •   number of citations is an important factor for relevance ranking
         (!! this is why recent publications get low rankings)
     •   advanced search limited, many mistakes in metadata (authors etc.)
     •   accessibility of full text often a problem because of licences
     •   often many versions of the same article (sometimes including free ones)
     •   coupling with library subscriptions allows smoother linking
     •   no info about sources, updates etc.
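Why a citation-based ranking disadvantages recent publications can be shown with hypothetical data: sorting by raw citation counts puts the older paper first, while citations per year (one possible correction, not something Google Scholar is documented to do) favours the recent one. The titles and numbers are invented.

```python
def rank_by_citations(papers):
    """Titles sorted by raw citation count, highest first."""
    return [p["title"] for p in sorted(papers, key=lambda p: p["citations"], reverse=True)]


def rank_by_citations_per_year(papers, current_year):
    """Titles sorted by citations per year since publication, highest first."""
    def per_year(p):
        age = max(current_year - p["year"], 1)  # avoid division by zero for this year's papers
        return p["citations"] / age
    return [p["title"] for p in sorted(papers, key=per_year, reverse=True)]


papers = [  # hypothetical example data
    {"title": "older study", "year": 2003, "citations": 200},
    {"title": "recent study", "year": 2011, "citations": 60},
]
print(rank_by_citations(papers))
print(rank_by_citations_per_year(papers, current_year=2013))
```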
[screenshot of a Google Scholar result list, with annotations:
 "open access" · "# of citations": if this article is interesting,
 these 23 more recent ones probably are too · "subscription univ. utrecht"]
facts and reference

     encyclopedias
       – wikipedia
       – internet movie database
       – ...
     Q&A (human powered)
       – Quora
       – Yahoo-answers
     direct answers, facts and calculations
       – Wolfram|Alpha
     dictionaries, translations
       –   answers.com (metasearch)
       –   Roget thesaurus
       –   Bartleby
       –   Google Translate
       –   Google Translated search
       –   Synoniemen.net (Dutch)
wikipedia

     •   >250 languages
     •   “wisdom of the crowds” ?=? “wisdom” for all topics?
     •   quite good for “factual” topics
     •   many detailed, specific topics (>20 million articles in total, >1 million in Dutch)
     •   there are policies & guidelines
         & management: stewards, administrators
     •   to search Wikipedia, use Google rather than the internal search
         limit to:               site:wikipedia.org
         this gives more complete results
         and searches all language versions at once
google's "translated search" is now almost hidden:
it translates the original query (here in English)
into the chosen languages and translates the results
back into English
... and pages selected from the result list are
translated into English too
old stuff : web & news

     •   web archive
          – "Wayback Machine": old versions of websites, back to 1996
            access through the original URL; NO search
            internal site links will mostly work
          – also other archived materials (among others, music)
     •   historical Dutch newspapers
           – historische kranten KB (historical newspapers of the KB, the Dutch
             national library; 1618-1995; full-text search)
     •   historical international newspapers
           – British newspapers 1800-1900
           – historic American newspapers
           – international overview
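Since the Wayback Machine is accessed through the original URL rather than by searching, such an address can be built directly. A minimal sketch, assuming the archive's usual URL pattern `https://web.archive.org/web/<timestamp>/<original url>`, where `*` asks for the overview of all captures and a YYYYMMDD date asks for the capture nearest to that day; `wayback_url` is a hypothetical helper.

```python
from datetime import date


def wayback_url(original_url, when=None):
    """Build a Wayback Machine address for an original URL (assumed URL pattern)."""
    # '*' = overview of all captures; a YYYYMMDD date = capture nearest to that day
    stamp = when.strftime("%Y%m%d") if when else "*"
    return f"https://web.archive.org/web/{stamp}/{original_url}"


print(wayback_url("http://www.codarts.nl/"))
print(wayback_url("http://www.codarts.nl/", date(2005, 1, 31)))
```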
… and the very oldest one, from February 1998:
twitter & social search

     twitter search (often limited to messages from the past 1-2 weeks)
           – twitter (also advanced search)
           – topsy (best one at the moment, also older messages)
           – postpost (search your own timeline: everything you're following)
           – snapbird (search through all tweets of a particular person;
                        you have to know the twitter name)
     real time / social search
           – socialsearcher (facebook | twitter | g+ : side by side)
           – socialmention (also weblogs)
           – samepoint, whostalkin, kurrently, … (also weblogs)
     forum discussions
         – omgili, boardtracker, ...
         – Google groups

multimedia search / images

     mostly search by keywords
       – Google-image (simple image recognition)
       – Yahoo-image (also pictures from Flickr)
       – Bing-image
       – Flickr (photo upload-site; search on user tags;
                      filter on “Creative Commons” material)
       – photographs on twitter (twicsy, picfog, topsy, skylines.io, …)
       – special sites (beeldbank nationaal archief, wikimedia commons, ...)

     special techniques:
       – geographical (panoramio [google-maps], worldc.am [instagram], ...)
       – Google (search by example)
       – Tineye (search for almost exact copies, e.g. to detect copyright infringement)

image search

     Content based image retrieval (CBIR)
     •   search on colors
          – examples: Tineye, Chromatik, Picitup, Google, ...
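Searching on colors can be illustrated with color histograms, one common content-based retrieval technique (not necessarily what Tineye, Chromatik, Picitup or Google use). In this sketch an image is assumed to be a plain list of (R, G, B) pixels; the helper names and the tiny test images are hypothetical.

```python
from collections import Counter


def color_histogram(pixels, levels=4):
    """Quantize each RGB channel into `levels` bins and return the share of pixels per bin."""
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {color_bin: n / total for color_bin, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity from 0.0 (no colors in common) to 1.0 (same color distribution)."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))


# hypothetical 10-pixel "images"
query = [(250, 10, 10)] * 10  # all red
collection = {
    "mostly red": [(240, 20, 20)] * 8 + [(10, 10, 250)] * 2,
    "all blue": [(10, 10, 250)] * 10,
}
q_hist = color_histogram(query)
ranked = sorted(collection,
                key=lambda name: histogram_intersection(q_hist, color_histogram(collection[name])),
                reverse=True)
print(ranked)
```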




image search

 Content based image retrieval
 • search by example

     – draw it yourself
       Retrievr, ...

     – existing image
       Google (visually similar)
       Tineye (almost exact copies)
       Retrievr, ...
       example found on the web or
       uploaded from your own computer
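Finding almost exact copies is often done with perceptual hashing. The sketch below uses average hashing, a simple and well-known variant; Tineye's own method is not public, so this is a swapped-in illustration. The image is assumed to be already reduced to a small grayscale grid; the helper names and values are hypothetical.

```python
def average_hash(gray):
    """gray: small grid (list of rows) of brightness values 0-255.
    Each bit says whether a pixel is brighter than the image's mean."""
    flat = [v for row in gray for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]


def hamming_distance(h1, h2):
    """Number of differing bits: 0 means (near-)identical images."""
    return sum(a != b for a, b in zip(h1, h2))


original = [[0, 50, 100, 150],
            [10, 60, 110, 160],
            [20, 70, 120, 170],
            [30, 80, 130, 180]]
brighter_copy = [[v + 10 for v in row] for row in original]
mirrored = [list(reversed(row)) for row in original]

print(hamming_distance(average_hash(original), average_hash(brighter_copy)))
print(hamming_distance(average_hash(original), average_hash(mirrored)))
```

A brightened copy keeps the same hash (distance 0), while a mirrored version already differs in every bit in this simple scheme.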



example

google looks for the most probable keywords to describe this image
and already combines them with the image in the search box

... and how about these "visually similar images"?
a photoshopped advertisement, but what's the original?
multimedia search / video
     (mostly) uploaded material
      – YouTube (growth: 70 hours of video uploaded per minute; also many "how to" videos)
        also: YouTube-channels / YouTube-education / YouTube-teachers /
        YouTube-movies / YouTube-shows / …
      – Vimeo

     (mostly) broadcast material
      – Blinkx (35 million hours of video, speech recognition?)
      – VoxaleadNews (speech recognition in several languages, also NL!
        hence "full-text" search on spoken words)
      – Bing-video (not easy to find from the European home page)
      – Google-video (also videos from YouTube; metadata search only)
      – Dutch TV programs:
          • Uitzending gemist (limited search functionality)
          • Beeld & Geluid (metadata search; use "uitgebreid zoeken", i.e. advanced search)
          • Academia (selection from Beeld & Geluid for higher education)
?
the end
     any questions?





 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 

Searching the internet - better with Google / Google not always best

  • 1. searching the internet: better with Google / Google not always best. Eric Sieverts (@sieverts), CODARTS, 04-03-2013 [logos: UB Utrecht, HvA-MIC, GO Opleidingen]
  • 2. agenda • searching the web • smart searching • google options • beyond google • beyond general web search; for all links see: http://sieverts.pbworks.com/codarts
  • 3. agenda (diagram): general web search (general web ?=? everything; how to …) versus specific material search (importance of specific material types?; how to …); when & why
  • 4. an ever-changing google landscape • unreliable numbers • irreproducible results • disappearing functions • changing interfaces
  • 6. building block approach: systematic searching in structured information systems (like JStor etc.) starts analytically with the so-called building block approach. E.g. for the subject "modern american composers": – it breaks up into 3 facets – collect keywords for each facet – combine the keywords with OR (within a facet) and AND (between the facets)
      modern               american          composers
      modern               american          composer
      contemporary         america           composers
      20th century         usa               songwriters
      twentieth century    united states     …
      …                    …
      (OR down each column, AND between the columns)
  • 7. building block approach: the three facets (modern AND american AND composers, each with its OR-alternatives) make a query: (modern OR contemporary OR "twentieth century" OR "20th century") AND (america OR american OR usa OR "united states") AND (composer OR composers OR songwriter OR songwriters)
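The slide 7 query can be generated mechanically from the facet lists. A minimal sketch in Python, assuming each facet is simply a list of keyword strings; `or_group` and `building_block_query` are hypothetical helper names, not part of any search engine's API:

```python
def or_group(terms: list[str]) -> str:
    """Join the alternative keywords of one facet with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def building_block_query(facets: list[list[str]]) -> str:
    """Combine the OR-groups of all facets with AND."""
    return " AND ".join(or_group(facet) for facet in facets)


facets = [
    ["modern", "contemporary", "twentieth century", "20th century"],
    ["america", "american", "usa", "united states"],
    ["composer", "composers", "songwriter", "songwriters"],
]
print(building_block_query(facets))
```

Multi-word keywords are quoted so that they are searched as exact phrases, matching the quoting used on the slide.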
  • 8. building block approach also with Google? Web search engines are not specifically designed for such structured queries, but it is possible. Google and Yahoo make it even easier, since you may omit the parentheses and the AND operator (AND is the default, an "implied AND"): modern OR contemporary OR "twentieth century" OR "20th century" [implied AND] america OR american OR usa OR "united states" [implied AND] composer OR composers OR songwriter OR songwriters
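The implied-AND form from slide 8 differs only in dropping the parentheses and the AND operators. A sketch, assuming Google still parses OR as binding more tightly than the implied AND (as the slide describes); the helper names are hypothetical, and the only external detail used is the `q` parameter of a Google search URL:

```python
from urllib.parse import urlencode


def facet_alternatives(terms: list[str]) -> str:
    """Join one facet's keywords with OR; multi-word terms become quoted phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


def implied_and_query(facets: list[list[str]]) -> str:
    """Separate the facets with plain spaces, which the search engine reads as AND."""
    return " ".join(facet_alternatives(facet) for facet in facets)


def google_search_url(query: str) -> str:
    """Percent-encode the query into the q parameter of a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


facets = [
    ["modern", "contemporary", "twentieth century", "20th century"],
    ["america", "american", "usa", "united states"],
    ["composer", "composers", "songwriter", "songwriters"],
]
query = implied_and_query(facets)
print(query)
print(google_search_url(query))
```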
  • 9. relevance ranking (1): Google (and other web search engines) are primarily focused on presenting search results in order of relevance. How do they know what is relevant? – they interpret the importance of words for the subject matter of the retrieved documents (are your search terms present in the title, url, headings, ...?) • you can enhance the importance of a certain term for your query by repeating that word a couple of times – they estimate the importance of the relation between words in the retrieved documents: whether • your search words occur close together • your search words occur in the same order as you entered them >> formulate your query the way you expect the answer to be formulated
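Slide 9's two proximity signals can be illustrated with a toy scoring function. This is a hypothetical sketch, not how Google actually weighs them (its formula is not public): it rewards the smallest window of words containing all query words, and doubles the score when the words appear in query order. Repeating a query word, the other trick on the slide, is ignored here for simplicity.

```python
from itertools import product


def proximity_score(document: str, query: str) -> float:
    """Toy score: higher when all query words occur close together,
    doubled when they also occur in the same order as in the query."""
    words = document.lower().split()
    terms = list(dict.fromkeys(query.lower().split()))  # unique terms, query order kept
    positions = [[i for i, w in enumerate(words) if w == t] for t in terms]
    if any(not p for p in positions):
        return 0.0  # at least one query word is missing entirely
    best = 0.0
    # try every combination of one occurrence per term (fine for short texts)
    for combo in product(*positions):
        span = max(combo) - min(combo) + 1
        in_order = list(combo) == sorted(combo)
        best = max(best, (2.0 if in_order else 1.0) / span)
    return best


query = "modern american composers"
print(proximity_score("modern american composers of the twentieth century", query))
print(proximity_score("composers who are american and modern", query))
```

The first document scores higher: the three words are adjacent and in query order, while in the second they are spread over six words in reverse order.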
  • 11. relevance ranking (2): Google (and other web search engines) are primarily focused on presenting search results in order of relevance. How do they know what is relevant? – the importance or quality of retrieved web pages is deduced from the number and the importance of links from other sites (a PageRank is calculated for each page) – the importance of retrieved web pages for your personal interest is deduced from your previous search and browse behaviour, which is monitored whenever you're logged in. Since every search engine uses somewhat different algorithms for its relevance calculations (and their coverage differs as well), there tends to be little overlap between the top 10 results from different engines
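The link-based part of slide 11 is the classic PageRank idea: a page is important if important pages link to it. A minimal power-iteration sketch in Python, assuming the commonly cited damping factor of 0.85 and a tiny hypothetical link graph; real search engines combine such a score with many other signals and do not publish their current weighting.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Classic PageRank by power iteration; links maps a page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page gets a small base share (the "random jump")
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page passes its rank on, split evenly over its outgoing links
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # a page without outgoing links spreads its rank over all pages
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page c ends up highest because two pages link to it; page b, which nothing links to, keeps only the base share.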
  • 13. search terms: the use of proper search terms is crucial for search success. Think of: – singular / plural, verbs / nouns / adjectives, conjugations, ... – spelling variations (behavior / behaviour) – compound terms (writer / songwriter) – synonyms, acronyms (compact disc / compact disk / cd / digital disc). How would the answer to my question be formulated in a relevant document? "think as if being a document" – the right terms – as an "exact phrase" or in the most probable word order – use a wildcard for variable words ("modern * * composers") – use known examples from a list to be found – popular vs. scientific terms etc.
  • 14. refining searches: if results are too broad or too diverse – add another essential term or set of terms to your query – see what your search engine suggests while you enter your query – exclude an unwanted term with NOT (francis bacon NOT philosopher). NB: Google does not understand NOT; use a minus sign instead: francis bacon -philosopher
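The NOT-to-minus rewrite on slide 14 is mechanical enough to automate. A small sketch, assuming NOT is written in capitals (as in the Boolean systems the slide compares with) and is followed by either a single word or a quoted phrase; `not_to_minus` is a hypothetical helper name:

```python
import re

# NOT in capitals, followed by one word or one "quoted phrase"
NOT_OPERAND = re.compile(r'\bNOT\s+("[^"]+"|\S+)')


def not_to_minus(query: str) -> str:
    """Rewrite Boolean 'NOT term' into Google's '-term' exclusion syntax."""
    return NOT_OPERAND.sub(r"-\1", query)


print(not_to_minus("francis bacon NOT philosopher"))  # francis bacon -philosopher
```

A lowercase "not" is left alone, since in a web search it is just an ordinary word.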
  • 15. nice interactive infographic "how search works": http://www.google.com/insidesearch/howsearchworks/thestory/
  • 16. is Google outsmarting us? Google tries to improve and to broaden your queries: • automatic spelling corrections (veilgheid >> veiligheid, Dutch for "safety") • automatic search for words with the same word stem (singular/plural, verb, conjugation, inflection, …) • expands acronyms (jfk >> john f kennedy | wwii >> world war II) • adds some synonyms (vaccination >> immunization) • transforms separate words into a compound term & vice versa (veiligheid maatregel >> veiligheidsmaatregel, "safety measure" | catfood >> cat food) • may leave out a term as optional if it does not differentiate enough • personalisation based on previous search behaviour. You are never sure what is done, or when; it happens more often and more elaborately in English than in Dutch. But what if you don't like all of this? >> "verbatim"
  • 19. [verbatim] searched only literally for the exact words you entered; on google.nl: "woord voor woord" ("word for word")
  • 20. some more "how to" • domain search: site:edu OR site:edu.* [for all edu (sub)domains]; site:shell.com OR site:philips.com • url search: inurl:novelty • title search: intitle:catalytic just • filetype search: filetype:pdf; filetype:xls OR filetype:xlsx; filetype:doc OR filetype:docx; filetype:rss [more types than shown in the advanced search drop-down menu] • exact search: "greenhouses" [or VERBATIM for all words]
  • 21. advanced search: Google hides its advanced search screen; you must perform a simple search first to get the "cog wheel" that leads to it
  • 22. some more "how to": some of this can be done from the advanced search screen, but the regular search box offers greater flexibility once you know the syntax • domain search [in combination with real search terms]: site:codarts.nl; site:edu OR site:edu.* [for all edu (sub)domains]; site:last.fm OR site:spotify.com • url search: inurl:course • title search: intitle:guitar
  • 23. some more "how to" (2) • filetype search: filetype:pdf; filetype:xls OR filetype:xlsx; filetype:doc OR filetype:docx; filetype:rss [more types than shown in the advanced search drop-down menu] • numeric search: 10..20 [includes all values in between]; $10..$20 [not for other currencies] • punctuation: &, %, dot, ... [can be searched]; €, /, ", comma, ... [are ignored] • exact search: "greenhouses" [or VERBATIM for all words] • synonym search: ~guitar • time limitations: [after searching, hidden in the top menu]
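The operators on slides 22 and 23 combine freely with ordinary search terms, so a query can be assembled from parts. A sketch in Python; the function `google_query` and its parameter names are hypothetical, and only the operator syntax (site:, inurl:, intitle:, filetype:, OR and the .. range) comes from the slides:

```python
from typing import Optional, Sequence


def google_query(terms: str,
                 sites: Sequence[str] = (),
                 inurl: Optional[str] = None,
                 intitle: Optional[str] = None,
                 filetypes: Sequence[str] = (),
                 number_range: Optional[tuple[str, str]] = None) -> str:
    """Append Google operators to the real search terms, separated by spaces (implied AND)."""
    parts = [terms]
    if sites:
        # alternatives of the same operator are joined with OR
        parts.append(" OR ".join(f"site:{s}" for s in sites))
    if inurl:
        parts.append(f"inurl:{inurl}")
    if intitle:
        parts.append(f"intitle:{intitle}")
    if filetypes:
        parts.append(" OR ".join(f"filetype:{ft}" for ft in filetypes))
    if number_range:
        low, high = number_range
        parts.append(f"{low}..{high}")  # e.g. 10..20 or $10..$20
    return " ".join(parts)


print(google_query("guitar", sites=["last.fm", "spotify.com"]))
print(google_query("course", sites=["codarts.nl"], intitle="guitar", filetypes=["pdf"]))
```

As slide 22 notes, the operators are meant to be combined with real search terms, which is why `terms` is the first, required argument.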
  • 27. someone who searches for "Bach" is probably more interested in data about him than in websites about him, and most probably in J.S. Bach rather than in one of his relatives. Google's "Knowledge Graph" knows 500 million objects with 3.5 billion properties and even more mutual relations (but only in English)
  • 28. it also interprets the intention of your query (sometimes ;-)
  • 30. general search engines besides google • Bing: microsoft, large • Yahoo!: content = Bing, large • Blekko: uses hashtags to search more selectively [by domain], also many predefined hashtags, e.g. /likes for Facebook • DuckDuckGo: assures privacy, no personalisation, no filter bubble, rather small, its !Bang function offers many extras • Gigablast: green search engine, rather small, some unique functions • Exalead: French, many advanced functions, primarily a demo system • Millionshort: leaves out results from the most popular sites → the long tail • WolframAlpha: knowledge engine, facts, calculations. Together, these others have a 30% market share in the US; in NL only 3% • Yandex: more popular than Google in Russia • Baidu: more popular than Google in China • Naver, Daum: more popular than Google in South Korea • Seznam: more popular than Google in Czechia
  • 31. material type specific search • science: google scholar, microsoft academic, scirus, oaister, scientific commons, science.gov • reference: wikipedia, quora, wolfram|alpha, answers.com • news: google news, yahoo news, bing news, cnn, bbc • old news: way-back-machine, historische kranten KB (historical newspapers, Dutch Royal Library) • images: google image, yahoo image, bing image, flickr, tineye (ip-check), panoramio (geo-search) • video: google video, youtube, youtube edu channel, bing video, blinkx, voxalead-news • tweets: twitter search, topsy, postpost, snapbird • social: socialsearcher, socialmention, whostalkin, kurrently • forums: google groups, omgili, boardtracker • blogs: google blogs, icerocket • [rss]: CTRLQ, RSS SearchHub
  • 32. scientific search • books – Google Books (full-text search) – Hathitrust Digital Library (open book scan project / part of G-books) – Librarything (catalog of 58,000,000 books from 1,000,000 owners) – GoodReads (reviews, recommendations, friends, ...) – Open Textbook Catalog (open access textbooks) • journal articles – licensed databases (like JStor, ...) – Google Scholar (articles, dissertations, reports, ...) – sEURch / UvA-library ("discovery" systems of EUR / UvA) – Scirus / SciVerse (journal articles from Elsevier, database content, webpages) – Magportal (also popular magazines, in English) – DeepDyve (scientific articles "for rent" for 24 hours)
  • 33. Google Books • all pages scanned and full-text searchable • important for discovering specific subjects/terms that are not the primary topic of a book • often limitations on display and browsability (no preview / snippet view / limited preview / full preview) • content from publishers and large libraries • problems with viewing copyrighted material, also from libraries • build your personal 'My Library' • Dutch books not only from Ghent University (and soon the KB), but also from the US/UK • also some 'magazines' • metadata on the about-this-book page
  • 37. Google Scholar • > 100 million scientific publications (mostly articles) • differences between the availability (and hence searchability) of full text (the majority), bibliographic-only, and citation data • competitor of Web of Science, Scopus, Scirus, ... • indexes many selected, even licensed, sources (publishers, abstract databases, university sites, institutional repositories, ...) • includes numbers of citations! [and links to them] • the number of citations is an important factor for relevance ranking (!! the reason why recent publications get low rankings) • advanced search is limited, many mistakes in metadata (authors etc.) • accessibility of full text often a problem because of licences • often many versions of the same article (sometimes including free ones) • coupling with library subscriptions allows smoother linking • no info about sources, updates etc.
  • 38. [annotated screenshot] open access | "if this article is interesting, these 23 more recent ones probably also are" | number of citations | subscription univ. utrecht
  • 41. facts and reference • encyclopedias – wikipedia – internet movie database – ... • Q&A (human powered) – Quora – Yahoo-answers • direct answers, facts and calculations – Wolfram|Alpha • dictionaries, translations – answers.com (metasearch) – Roget thesaurus – Bartleby – Google Translate – Google Translated search – Synoniemen.net (Dutch)
  • 42. wikipedia • >250 languages • "wisdom of the crowds" ?=? "wisdom" for all topics? • quite good for "factual" topics • many detailed specific topics (>20 million articles, >1 million in Dutch) • there are policies & guidelines & management: stewards, administrators • for searching the wikipedia, use Google rather than the internal search: limiting to site:wikipedia.org gives more complete results and searches all language versions together
  • 46. translates the original query (here in English) into the chosen languages and translates the results back into English
  • 47. ... and pages selected from the result list are translated into English too
  • 50. old stuff: web & news • web archive – "way-back machine": old versions of websites, back to 1996; access through the original url, NO search; internal site links will mostly work – also other archived materials (among others, music) • historical Dutch newspapers – historische kranten KB (1618-1995; full-text search) • historical international newspapers – British newspapers 1800-1900 – historic American newspapers – international overview
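Because the way-back machine is accessed through the original URL rather than through a search box, its addresses can be built directly. A sketch, assuming the familiar https://web.archive.org/web/<timestamp>/<url> address pattern with a 14-digit YYYYMMDDhhmmss timestamp; the archive is expected to show the capture nearest to that moment. The example site and the helper name `wayback_url` are hypothetical:

```python
from datetime import datetime


def wayback_url(original_url: str, moment: datetime) -> str:
    """Build a way-back machine address for the capture of original_url nearest to moment."""
    timestamp = moment.strftime("%Y%m%d%H%M%S")  # 14 digits: year, month, day, hour, minute, second
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


print(wayback_url("http://www.example.com/", datetime(1998, 2, 1)))
# https://web.archive.org/web/19980201000000/http://www.example.com/
```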
  • 53. … and the very oldest one, from February 1998
  • 54. twitter & social search • twitter search (often limited to messages from the past 1 - 2 weeks only) – twitter (also advanced search) – topsy (best one at the moment, also older messages) – postpost (search your own timeline: everything you're following) – snapbird (search through all tweets of a particular person; you have to know the twitter name) • real time / social search – socialsearcher (facebook | twitter | g+: side by side) – socialmention (also weblogs) – samepoint, whostalkin, kurrently, … (also weblogs) • forum discussions – omgili, boardtracker, ... – Google groups
  • 62. multimedia search / images: mostly search by keywords – Google-image (simple image recognition) – Yahoo-image (also pictures from Flickr) – Bing-image – Flickr (photo upload site; search on user tags; filter on "Creative Commons" material) – photographs on twitter (twicsy, picfog, topsy, skylines.io, …) – special sites (beeldbank nationaal archief, the image bank of the Dutch National Archives; wikimedia commons, ...) • special techniques: – geographical (panoramio [google-maps], worldc.am [instagram], ...) – Google (search by example) – Tineye (search for almost exact copies, e.g. to check for copyright infringement)
  • 64. image search: content based image retrieval (CBIR) • search on colors – examples: Tineye, Chromatik, Picitup, Google, ...
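Search on colors (slide 64) can be illustrated with colour histograms: each image is reduced to the share of its pixels falling into a few coarse colour bins, and two images are compared by histogram intersection. This is a textbook CBIR technique chosen for illustration; the slide does not say what Tineye, Chromatik, Picitup or Google use internally. Images are assumed to be plain lists of (r, g, b) tuples, and the tiny example images are hypothetical:

```python
def color_histogram(pixels: list[tuple[int, int, int]], bins: int = 4) -> list[float]:
    """Normalised histogram over bins**3 coarse colour cells (channel values 0-255)."""
    step = 256 // bins
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        # quantise each channel; min() keeps 255 inside the last bin for any bins value
        cell = [min(c // step, bins - 1) for c in (r, g, b)]
        hist[cell[0] * bins * bins + cell[1] * bins + cell[2]] += 1
    return [h / len(pixels) for h in hist]


def histogram_intersection(h1: list[float], h2: list[float]) -> float:
    """1.0 for identical colour distributions, 0.0 when no colours overlap."""
    return sum(min(a, b) for a, b in zip(h1, h2))


sunset = [(250, 80, 20)] * 8 + [(30, 30, 120)] * 2   # mostly orange, a little dark blue
orange = [(240, 90, 30)] * 10
sea = [(20, 60, 200)] * 10
query = color_histogram(sunset)
print(histogram_intersection(query, color_histogram(orange)))  # 0.8
print(histogram_intersection(query, color_histogram(sea)))     # 0.0
```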
  • 65. image search: content based image retrieval • search by example – draw it yourself: Retrievr, ... – existing image, found on the web or uploaded from your own computer: Google (visually similar), Tineye (almost exact copies), Retrievr, ...
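Finding almost exact copies, as Tineye does for search by example (slide 65), can be illustrated with a perceptual "average hash": shrink the image to a small grey-scale grid, mark every cell brighter than the mean as 1, and compare two images by counting the differing bits. This is a commonly used near-duplicate technique chosen as a stand-in; the slides do not say how Tineye works. The shrinking step is assumed to have been done already, and the example thumbnails are hypothetical:

```python
def average_hash(gray: list[list[int]]) -> list[int]:
    """One bit per cell of a small grey-scale grid: 1 if brighter than the grid's mean."""
    cells = [value for row in gray for value in row]
    mean = sum(cells) / len(cells)
    return [1 if value > mean else 0 for value in cells]


def hamming_distance(hash1: list[int], hash2: list[int]) -> int:
    """Number of differing bits; a small distance suggests (almost) the same image."""
    return sum(b1 != b2 for b1, b2 in zip(hash1, hash2))


# 8x8 grey-scale thumbnail: a smooth gradient from dark to light
original = [[(row * 8 + col) * 4 for col in range(8)] for row in range(8)]
brighter = [[value + 3 for value in row] for row in original]    # slightly edited copy
inverted = [[255 - value for value in row] for row in original]  # a very different image

h = average_hash(original)
print(hamming_distance(h, average_hash(brighter)))  # 0
print(hamming_distance(h, average_hash(inverted)))  # 64
```

Because every cell is compared with the image's own mean, a uniform brightness change leaves the hash untouched, which is what makes this kind of hash useful for spotting edited copies.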
  • 68. google looks for the most probable keywords to describe this image and combines them with the image in the search box ... and how about these "visually similar images"?
  • 74. multimedia search / video • (mostly) uploaded material – YouTube (growth: 70 hours of video per minute; also many "how to" videos); also: YouTube-channels / YouTube-education / YouTube-teachers / YouTube-movies / YouTube-shows / … – Vimeo • (mostly) broadcast material – Blinkx (35 million hours of video, speech recognition?) – VoxaleadNews (speech recognition in several languages, also Dutch! hence "full-text" search on spoken words) – Bing-video (not easy to find from the European home page) – Google-video (also videos from YouTube; metadata search only) – Dutch TV programs: • Uitzending gemist (limited search functionality) • Beeld & Geluid (metadata search; use "uitgebreid zoeken", the advanced search) • Academia (selection from Beeld & Geluid for higher education)
  • 76. ?
  • 77. the end: any questions?

Editor's Notes

  1. Assignment: refine the search until no non-relevant results remain among the first 50, paying attention to these points; use a thesaurus or Word synonyms; truncation