1
               So much information
               So little time

Paul.Nieuwenhuysen@vub.ac.be
Created to support a presentation
 at the bi-annual 2-day conference series “Informatie”
 organised by VVBAD, in Oostende, Belgium
 September 10-11, 2009
 “Informatie aan zee”
2
contents = summary = structure = overview of this presentation

0. Introduction with problem statements
1. Methods to make information retrieval efficient in a world of scattered sources
2. Applications of those methods
3. Comparison of the methods
4. Conclusions
3
These slides should be available from the WWW site
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)
and also from the WWW site of the conference organisers, VVBAD.
4




Information Retrieval in a World
of Scattered Information Sources
            0. Introduction
        and problem statements
5

                 Introduction:
              scattering of sources

• Users want to exploit information sources fast and
  effectively.
• This is hindered by the fact that the digital, electronic
  information sources that may contain relevant
  information are created on, and scattered over,
  numerous computers all over the intranet of the user’s
  organization AND over the Internet and the WWW.
6

Introduction:
scattering of sources

  • In other words:
    integration / aggregation
    is still far from perfect.
7

               Introduction:
    scattering of sources difficulties

• Using many information retrieval systems costs time:
  1. They must be used one after the other, which requires
   many decisions and actions.
8

               Introduction:
    scattering of sources difficulties

• Using many information retrieval systems costs time:
  2. They offer different user interfaces in the retrieval phase,
   which is confusing
9

               Introduction:
    scattering of sources difficulties

• Using many information retrieval systems costs time:
  3. They deliver the information items found in various data
   formats.
10

               Introduction:
    scattering of sources difficulties

• Using many information retrieval systems costs time:
  4. They display the items found in different ways on a computer
   screen.
11

           Introduction:
scattering of sources difficulties



 Small = BEAUTIFUL
12

           Introduction:
scattering of sources difficulties
13

   Introduction:
problem statements




         1. Which methods have been
             developed and applied to
             cope with this reality?
14

   Introduction:
problem statements




         2. Which concrete
             applications are available
             and how can an end-user
             exploit systems created in
             this domain?
15

   Introduction:
problem statements




         3. How can information
             intermediaries evaluate and
             apply these methods to
             bring information more
             efficiently to end-users?
16




Information Retrieval in a World
of Scattered Information Sources
                 1. Methods
   to make information retrieval efficient
       in a world of scattered sources
17

Method 1: Merging = aggregating
   into a searchable database
   [Diagram: users query one search engine over an aggregated database that
    merges the contents of several databases or web sites (or other sources)]
18

     Method 2: Federated searching
      through scattered databases
     [Diagram: users query one federated search engine, which passes the query
      on to the search engines of several scattered databases]
19

                 Both methods
           offer benefits to the users

+ Saves the users time executing queries to various servers
  or browsing through various systems.




                          ☺
20

                 Both methods
           offer benefits to the users

+ Offers a uniform / consistent display of results in the
  output phase.




                           ☺
21

                 Both methods
           offer benefits to the users

+ Some systems offer tools to refine the display of the results;
  for instance
  + to deduplicate very similar items in the result set,
  + to sort the results,
  + to rank the results,
  + to visualize the results in a more graphical way,
  + to search within the result set,
  +…
                            ☺
22

            Both methods bring
     difficulties / challenges / problems

- In many cases the merged sources differ in the way
  their database records are structured into fields.
  This hinders
  - searching limited to a field
  - displaying selected fields only (such as title)
  - sorting of the displayed records on the contents of a
    particular selected field (such as author or date)
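One common remedy is a crosswalk: each source's field names are mapped onto one common schema before records are merged, so that field-limited searching and sorting work again. A minimal sketch, assuming two hypothetical sources (`source_a`, `source_b`) with made-up field names:

```python
# Hypothetical field names for two sources; a real crosswalk would be
# derived from each source's own record documentation.
FIELD_MAPS = {
    "source_a": {"TI": "title", "AU": "author", "PY": "year"},
    "source_b": {"Title": "title", "Creator": "author", "Date": "year"},
}


def normalize(source, record):
    """Rename a source's fields to the common schema; unmapped fields are dropped."""
    mapping = FIELD_MAPS[source]
    common = {mapping[k]: v for k, v in record.items() if k in mapping}
    # Dates arrive in different forms ("2007", "2005-03"); keep the year as an int.
    if "year" in common:
        common["year"] = int(str(common["year"])[:4])
    return common


records = [
    normalize("source_a", {"TI": "Metasearch engines", "AU": "Smith", "PY": "2007"}),
    normalize("source_b", {"Title": "Federated search", "Creator": "Jones", "Date": "2005-03"}),
]
# With one schema, sorting on a selected field (here: year) works across sources.
by_year = sorted(records, key=lambda r: r["year"])
titles = [r["title"] for r in by_year]  # ['Federated search', 'Metasearch engines']
```

Fields that have no counterpart in the common schema are simply lost here, which is one concrete form of the difficulty described on this slide.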
23

           Both methods bring
    difficulties / challenges / problems

- In many cases there are differences among sources in the
  metadata schemes that are applied in the databases to
  improve retrieval, such as
  »classifications
  »taxonomies
  »thesaurus systems
  »ontologies
 This hinders the exploitation of the added value of such
 metadata.
24

            Both methods bring
    difficulties / challenges / problems

- How to deduplicate/dedupe/cluster
   very similar entries/results/items
  = near-duplicates,
  from various target sources?
  When is similar similar enough?
  Which entry/result/item to choose/select
  as the representative of a cluster of similar entries?
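The questions above can be made concrete with one simple technique among many: compare the word sets of titles (Jaccard similarity) and treat two entries as near-duplicates above a chosen threshold. A minimal sketch; the threshold of 0.8 and the rule for picking a representative are assumptions for illustration, not a standard:

```python
import re


def tokens(title):
    """Lower-case set of words in a title, ignoring punctuation."""
    return set(re.findall(r"\w+", title.lower()))


def jaccard(a, b):
    """Share of words two titles have in common (1.0 = identical word sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def cluster(items, threshold=0.8):
    """Greedy clustering: an item joins the first cluster whose first member
    is similar enough; otherwise it starts a new cluster."""
    clusters = []
    for item in items:
        words = tokens(item["title"])
        for group in clusters:
            if jaccard(words, tokens(group[0]["title"])) >= threshold:
                group.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def representative(group):
    """Hypothetical rule: prefer an entry with online full text, else the first."""
    return max(group, key=lambda it: it.get("full_text", False))


items = [
    {"title": "Federated Searching: a Review", "source": "A"},
    {"title": "Federated searching - a review", "source": "B", "full_text": True},
    {"title": "Harvesting open archives", "source": "C"},
]
groups = cluster(items)  # two clusters: the two reviews, and the harvesting paper
```

"When is similar similar enough?" becomes the choice of `threshold`: a lower value merges more entries, at the risk of merging items that are really different.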
25

            Both methods bring
    difficulties / challenges / problems

- When a specific target source database offers special,
  non-standard, dedicated retrieval software with features
  that let the user exploit the database better than a more
  classical standard retrieval interface, these features
  may be lost in the new retrieval system.
  Searches are reduced to the lowest common denominator.
  Examples:
  - clustering of results
  - deduplication of results…
26

Method 1: Merging = aggregating
   into a searchable database
   [Same diagram as on slide 17]
27

 Open Archives Initiative Protocol for
  Metadata Harvesting (OAI-PMH)
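   [Diagram: users work on a client computer with retrieval client software,
    which connects over the HTTP protocol to a Service Provider (a search &
    retrieval server with a metadata database). The Service Provider uses PMH
    over HTTP to request metadata from Data Providers, which return metadata
    describing their digital objects.]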
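A Service Provider harvests metadata with plain HTTP requests such as `verb=ListRecords`, and follows a `resumptionToken` when a Data Provider splits a large answer into pages. A minimal sketch of building such a request and reading the response; the base URL is hypothetical, and a small sample response is embedded instead of contacting a real repository:

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, from_date=None, token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if token:  # a resumptionToken request carries no other arguments
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date:  # harvest only records changed since this date
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return ([(identifier, title), ...], resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        records.append((ident, title))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None  # an empty token means: last page


# Hypothetical, shortened response of a Data Provider.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <header><identifier>oai:example.org:1</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>A sample preprint</dc:title>
    </oai_dc:dc>
   </metadata>
  </record>
  <resumptionToken>page2</resumptionToken>
 </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://repository.example.org/oai", from_date="2009-01-01")
records, token = parse_records(SAMPLE)
```

The harvested records would then be stored in the Service Provider's own metadata database, which is what makes this an instance of Method 1 (merging) rather than federated searching.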
28

   Merging into a searchable database
      offers benefits for the users

+ Works even when no data communication with the remote
  servers is possible at search time
  (whereas federated searching needs good, fast data
  communication).
  This is why it is the relatively ‘old’ method.




                         ☺
29

   Merging into a searchable database
     brings difficulties / challenges

- The contents of the aggregated database are less up to date
  than the original information sources.
 The importance of this aspect depends of course
  - on the particular application
  - on the time delay
30

     Method 2: Federated searching
      through scattered databases
     [Same diagram as on slide 18]
31

           Federated searching:
    terminology / vocabulary / synonyms

      federated searching
=     meta-searching = metasearching
=     cross-database searching
=     multi-database searching
=     multi-threaded searching
=     one-stop searching
=     poly-searching = polysearching
=     broadcast searching
=     searching through a portal / gateway
32

        Federated searching
  through scattered databases: why?


The perfect trip:

                                                ☺
1. A cheap and nice flight
2. A cheap and nice hotel
3. A visit to a nice museum
4. Something nice to read (free via your library)
Example                                    33

      Federated searching: application:
          finding a suitable flight

  Example:
  • http://CheapTickets.com/ for the USA
Example                                   34

      Federated searching: application:
      finding a hotel room in some city
Example                           35

           Federated searching:
          searching in a museum
Example                            36

          Federated searching:
          searching in a library
37

    Federated searching:
     integrating access

   [Diagram: a meta-searching system integrating access to the intranet,
    WWW search engines, catalog database(s) of other libraries, the local
    library catalog database(s), databases (full-text or bibliographic),
    publishers, journals and articles]
38

              Federated searching:
              benefits for the users

+ The system can help the user to select appropriate
  sources.




                         ☺
39

             Federated searching:
             benefits for the users

+ The system can help in the process of authentication and
  authorization when this involves not only simple
  recognition of the IP address of the user’s client computer,
  but also user IDs and passwords.




                         ☺
40

              Federated searching:
              benefits for the users

+ The need to know which particular database is suitable
  for a particular search is reduced, because several
  databases can be searched in one action.




                          ☺
41

              Federated searching:
              benefits for the users

+ The users have to learn only 1 user interface for
  searching and only 1 search syntax,
  instead of a user interface and a search syntax for each
  database!




                          ☺
42

             Federated searching:
             benefits for the users

+ Can lead users to search and exploit databases that they
  would never use otherwise, that is, without a federated
  search system!




                         ☺
43

              Federated searching:
              benefits for the users

+ Useful, relevant, interesting items/references can be
  found/uncovered from unexpected, unknown, unfamiliar
  databases!
  This is especially beneficial for interdisciplinary
  subjects/topics.




                          ☺
44

               Federated searching:
               benefits for the users

+ Some systems offer tools to refine the display of the results;
  for instance
  »to dedupe very similar items in the result set,
  »to sort the results,
  »to rank the results,
  »to search within the result set,
  »…


                            ☺
45

             Federated searching:
             benefits for the users

+ Some systems offer interesting links from a retrieval
   result to various related sources or services
   (such as the full text or a document delivery service),
   using a link generator based on the OpenURL standard.
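An OpenURL is an ordinary link that carries the citation data of a result to the library's link resolver, which then decides which services (full text, document delivery, ...) to offer. A minimal sketch of generating such a link in the OpenURL 1.0 (Z39.88-2004) key/value format; the resolver address and the article data are hypothetical:

```python
from urllib.parse import urlencode

# Hypothetical link resolver of a library; each library configures its own.
RESOLVER = "http://resolver.example.edu/openurl"


def openurl(article, resolver=RESOLVER):
    """Build an OpenURL 1.0 (Z39.88-2004) link in KEV format for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.atitle": article["title"],
        "rft.jtitle": article["journal"],
        "rft.issn": article["issn"],
        "rft.date": article["year"],
        "rft.volume": article["volume"],
        "rft.spage": article["start_page"],
    }
    return resolver + "?" + urlencode(params)


# Hypothetical citation taken from one retrieved result.
link = openurl({
    "title": "Federated searching in practice",
    "journal": "Example Journal of Librarianship",
    "issn": "1234-5678",
    "year": 2009,
    "volume": 12,
    "start_page": 45,
})
```

Because the link only describes the item, the same result can point to different services in different libraries, depending on each resolver's configuration.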




                         ☺
46

              Federated searching:
              benefits for the users

+ Some systems check for each retrieved bibliographic
   description whether the corresponding full text is available
   online and indicate this to the user immediately,
   on the fly.




                          ☺
47

              Federated searching:
              benefits for the users

+ Some systems further process the retrieved results and
   display them in an interesting way that is not offered by
   the searched original systems.
   For instance:
  »   Clustering of results according to
      subject or age or availability of full text
  »   Displaying the results in a graphical way
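Clustering by age or by availability of full text amounts to grouping the merged result set on one property of each record. A minimal sketch with a hypothetical result set; clustering by subject would work the same way but needs subject metadata, which the sources may not share:

```python
from collections import defaultdict


def cluster_by(results, key):
    """Group result titles by whatever property `key` extracts from a record."""
    groups = defaultdict(list)
    for r in results:
        groups[key(r)].append(r["title"])
    return dict(groups)


# Hypothetical merged result set from several sources.
results = [
    {"title": "Metasearch engines", "year": 2007, "full_text": True},
    {"title": "Online catalogs", "year": 1998, "full_text": False},
    {"title": "Federated search", "year": 2005, "full_text": False},
]

by_age = cluster_by(results, lambda r: f"{r['year'] // 10 * 10}s")
by_access = cluster_by(
    results, lambda r: "full text online" if r["full_text"] else "description only"
)
```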



                            ☺
48

Federated searching:
benefits for the users



     So far so good !




        ☺
49

           Federated searching
       through scattered databases
     [Same diagram as on slide 18]
50

           Federated searching:
    difficulties / challenges / problems

- How to provide some useful relevance ranking of search
  results/entries,
  even when the target databases can be quite different in
  type and quality, and
  even when no index is created in advance, just-in-case,
  well before the search action, like Google and other
  Internet search engines do.
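One simple way to combine ranked lists whose scores are not comparable is to rescale each source's scores to the range 0..1 (min-max normalisation) before merging. A minimal sketch with two hypothetical sources that score on different scales; this is one textbook technique, not the method of any particular federated search product:

```python
def normalize(results):
    """Min-max normalise one source's scores to the range 0..1."""
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # all scores equal: avoid division by zero
    return [(doc, (score - lo) / span) for doc, score in results]


def merge(result_lists):
    """Merge per-source result lists into one ranking; a document returned by
    several sources keeps its best normalised score."""
    merged = {}
    for results in result_lists.values():
        for doc, score in normalize(results):
            merged[doc] = max(merged.get(doc, 0.0), score)
    return sorted(merged, key=merged.get, reverse=True)


# Hypothetical sources that score on different scales.
ranked = merge({
    "catalog": [("doc1", 12.0), ("doc2", 3.0), ("doc3", 7.5)],
    "articles": [("doc4", 90), ("doc2", 60), ("doc5", 10)],
})
```

Note that the best item of a weak source still gets the top score 1.0; differences in type and quality between target databases are exactly what this simple scheme ignores, so a real system might weight sources as well.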
51

           Federated searching:
    difficulties / challenges / problems

- Powerful / sophisticated / refined forms of searching may
  not be applicable in a federated search.
  Example:
  limiting to a particular type of document,
  such as a therapy (in medicine).
  This may cause a LOSS of time, instead of winning time.
52

           Federated searching
       through scattered databases
     [Same diagram as on slide 18]
53

           Federated searching:
    difficulties / challenges / problems

- Differences among target sources in the Internet
  application protocols that they use by default
  for connection/communication and retrieval,
  such as
   »telnet, HTTP
   »proprietary, non-standard protocols
   »Z39.50 (= ISO 23950), SRU, and related protocols that are
     developed for federated searching!
54

          Federated searching
      through scattered databases
     [Same diagram as on slide 18]
55

           Federated searching:
    difficulties / challenges / problems

- Various search engines may act in different ways!
  For instance:
  Is truncation of a word in a search query possible?
  Is limitation to a particular field possible?

  How can a federated search engine take these differences
 into account?
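A federated search engine can keep a capability profile per target and translate each query accordingly, telling the user which features had to be dropped. A minimal sketch; the target names, the `*` truncation symbol and the `ti=` field prefix are all hypothetical:

```python
# Hypothetical capability profiles; a real federated engine would
# configure one profile per target source.
TARGETS = {
    "catalog_x": {"truncation": "*", "fields": {"title": "ti"}},
    "database_y": {"truncation": None, "fields": {}},
}


def translate(term, target, field=None, truncate=False):
    """Rewrite one search term in the syntax a target supports,
    and report every feature that had to be dropped."""
    caps = TARGETS[target]
    dropped = []
    if truncate:
        if caps["truncation"]:
            term += caps["truncation"]
        else:
            dropped.append("truncation")
    if field:
        prefix = caps["fields"].get(field)
        if prefix:
            term = f"{prefix}={term}"
        else:
            dropped.append("field limit")
    return term, dropped
```

For example, `translate("librar", "catalog_x", field="title", truncate=True)` gives `("ti=librar*", [])`, while the same request for `database_y` falls back to the bare term `librar` and reports both features as dropped.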
56

           Federated searching:
    difficulties / challenges / problems

- A query with several words and without explicit Boolean
  operators can be interpreted in various ways
  by the various database retrieval systems.
  For instance, the retrieval software may apply the
  Boolean operator AND to combine all the query words,
  but it may also use OR.
  If the federated search system does not handle this well,
  the result may be lower recall (AND applied where OR was
  meant) or lower precision (OR applied where AND was
  meant).
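The effect is easy to see on a toy collection: the same two-word query retrieves one record when the words are combined with AND, and three when they are combined with OR. A minimal sketch; the four records are hypothetical:

```python
# Hypothetical toy collection of one target database.
DOCS = {
    1: "federated search in libraries",
    2: "search engines on the web",
    3: "library catalogs and federated access",
    4: "museum collections online",
}


def search(query, operator):
    """Return the ids of records matching the query words combined with AND or OR."""
    words = query.lower().split()
    match = all if operator == "AND" else any
    hits = set()
    for doc_id, text in DOCS.items():
        terms = set(text.split())
        if match(w in terms for w in words):
            hits.add(doc_id)
    return hits


and_hits = search("federated search", "AND")  # {1}
or_hits = search("federated search", "OR")    # {1, 2, 3}
```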
57

           Federated searching:
    difficulties / challenges / problems

- When a specific target source database offers special,
  non-standard, dedicated retrieval software with features
  that let the user exploit the database better than a
  standard retrieval interface,
  then the federated search system can probably not exploit
  that source as well.
  Searches are reduced to the lowest common denominator.
58

           Federated searching:
    difficulties / challenges / problems

- Differences in response time among the target sources.
  A slow response of a target source can hinder the final
  analysis and presentation of the results to the user.
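A common way to cope with this is to query all targets in parallel and stop waiting after a time limit, showing what has arrived and telling the user which sources did not answer in time. A minimal sketch with Python's `concurrent.futures`; the two sources (`fast_catalog`, `slow_database`) are hypothetical stand-ins that just sleep instead of contacting real servers:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fake_source(name, delay, hits):
    """Hypothetical stand-in for a remote target database."""
    def search(query):
        time.sleep(delay)  # simulated network and search time
        return [f"{name}: {h} ({query})" for h in hits]
    return search


SOURCES = {
    "fast_catalog": fake_source("fast_catalog", 0.05, ["rec1", "rec2"]),
    "slow_database": fake_source("slow_database", 1.0, ["rec3"]),
}


def federated_search(query, timeout=0.5):
    """Query all sources in parallel; return the results that arrived within
    `timeout` seconds and the names of the sources that were too slow."""
    pool = ThreadPoolExecutor(max_workers=len(SOURCES))
    futures = {pool.submit(fn, query): name for name, fn in SOURCES.items()}
    done, not_done = wait(futures, timeout=timeout)
    results = [r for f in done for r in f.result()]
    late = sorted(futures[f] for f in not_done)
    pool.shutdown(wait=False)  # return now instead of blocking on the slow source
    return results, late


results, late = federated_search("federated search")
```

The timeout is a trade-off: a short one keeps the display fast but may silently lose the results of a slow, valuable source.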
59

           Federated searching
       through scattered databases
     [Same diagram as on slide 18]
60

           Federated searching:
    difficulties / challenges / problems

- Some databases can NOT be included as a target
  database in a federated searching engine,
  because their owners/producers do not allow this.
  This is an important difficulty, because in this way
  interesting / valuable databases are perhaps not exploited
  by users who rely on federated searching.
61

           Federated searching
       through scattered databases
     [Same diagram as on slide 18]
62

           Federated searching:
    difficulties / challenges / problems

- Users may be less impressed by a federated searching
  system than by the simple, common, familiar, famous
  Internet / WWW search engines, because its response time is
  in most cases slower, due to the following differences:
  - The computer hardware used by the systems
  - Slower distributed searching through several computer
    systems, versus faster searching through a more centralised
    computer database of a priori compiled records
63

           Federated searching:
    difficulties / challenges / problems

- The evaluation of the quality of each search result
  from a federated search action may be more difficult than
  when each database is searched separately,
  because the user may be less aware of the limitations,
  strengths, selection criteria and aims of the individual,
  separate databases that offer each result.
  For instance, peer-reviewed articles from reputable scientific
  journals may be mixed with more popular and more biased,
  unscientific texts from trade literature.
64

              Federated searching:
                  conclusion

Federated searching
- is a continuous challenge
  for developers of the sophisticated software and
  for the implementers in libraries and information centers
- offers benefits for those end-users
  who are not enthusiastic to work with separate target
  source databases
- does not eliminate the need for access to individual
  databases
65

                Hybrid method:
       merging data + federated searching
     [Diagram: users query a federated search engine, which searches the search
      engines of several databases and also the search engine of an aggregated
      database that merges several databases or web sites (or other sources)]
66




Information Retrieval in a World
of Scattered Information Sources
         2. Applications of methods
     for efficient information retrieval
67

Method 1: Merging = aggregating
   into a searchable database
   [Same diagram as on slide 17]
68

    Internet global subject directories:
               introduction

• They are virtual libraries with open shelves, for browsing.
• They are built manually, by many people.
• They can be browsed following a tree structure or a more
  complicated variation.
Example                                      69

      Internet global subject directories:
       Yahoo!: screenshot of home page
Example                                                         70

       Internet global subject directories:
                  BUBL LINK

   • A hypertext global subject directory to more than 10 000
     WWW sites for the higher education community can be
     found at
     http://bubl.ac.uk/link/ [accessed 2008]
   • Accessible free of charge.
   • The categories are based on the well-known general
     Dewey classification system.
Example                                      71

      Internet global subject directories:
     dmoz: screenshot of the starting page
Example                                      72

     Internet global subject directories:
    Librarians' Internet Index: screenshot
Example                                      73

      Internet global subject directories:
                IPL: screenshot
Example                                      74

      Internet global subject directories:
               Intute: screenshot
75

                Internet indexes:
            scheme of the mechanism
            [Diagram: a user searching for Internet-based information uses
             Internet client hardware and software and the user interface of a
             search engine. The Internet index search engine searches a database
             of Internet files, including an index, which an Internet crawler and
             indexing system builds from Internet information sources.]
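The "index" in this scheme is typically an inverted index: for every word, the list of pages that contain it, compiled in advance so that a search is only a quick lookup. A minimal sketch; crawling is not shown, and the two pages and their URLs are hypothetical:

```python
import re
from collections import defaultdict

# Hypothetical pages, as a crawler might have fetched them.
PAGES = {
    "http://www.example.org/a": "Federated searching in libraries",
    "http://www.example.org/b": "Searching the deep Web",
}


def build_index(pages):
    """Map each word to the set of page URLs that contain it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def lookup(index, query):
    """Return the pages that contain every query word (implicit AND)."""
    sets = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*sets) if sets else set()


index = build_index(PAGES)
both = lookup(index, "searching")            # both pages
only_a = lookup(index, "federated searching")  # only page a
```

Pages that the crawler never reaches never enter the index, which is one reason why such indexes cover only part of the Internet.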
Example                                                76

                    Internet indexes:
                         Google

   • http://www.google.com/
   • Available since 2001 with most of its features.
   • The most popular search system since 2003.
Example                                                        77

                    Internet indexes:
                     Google Scholar

   • Google Scholar allows us to search for more scholarly
     information sources, including journal articles.
   • A beta (test) version has been available since November
     2004.
   • The system is accessible starting from the home page of
     Google as one of the additional services,
     or more directly from http://scholar.google.com/
Example                                78

              Internet indexes:
          Google Scholar: screenshot
Example                                          79

                     Internet indexes:
                           Bing

   • http://www.bing.com/
   • Available in 2009 in beta = test version.
   • Replaces
     Microsoft Live
     as well as
     Yahoo Web Search ?
Example                                                            80

                     Internet indexes:
                          Scirus

   • The search interface: http://www.scirus.com/
   • Since 2001.
   • Offers not only access to files in html format,
     but also to files in PDF.
   • Allows you to search for more or less “manually” selected
      »scientific WWW pages, plus
      »the contents of some scientific, bibliographic databases.
   • In the sense that Scirus is dedicated to scientific
     information, it is similar to Google Scholar.
Example                                                          81

                    Internet indexes:
                          Ask

   • Available from: http://www.ask.com/

   • Offers a feature that is not offered by most other search
     systems:
     categorization = classification = refinement = clustering
     of search results,
     to help the user cope with ambiguous search queries
82

      Internet indexes cover only a part of
            the Internet: metaphor


                         The “visible” part of Internet




     The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google Web Search)
Example                                                         83

   Databases accessible over the Internet:
           example: OAISTER
   • http://oaister.umdl.umich.edu/
   • “Our goal is to create a collection of freely available,
     previously difficult-to-access, academically-oriented
     digital resources that are easily searchable by anyone.”
Example                                                        84

   Databases accessible over the Internet:
           example: OAISTER
   • OAISTER makes searching possible in millions of digital
     documents that form part of institutional repositories
     all over the world.
   • OAISTER covers this kind of document better than
     Google Web Search (according to independent academic
     investigations in 2006 and 2008).
Example                                                       85

   Databases accessible over the Internet:
        example: scientificcommons
   • http://www.scientificcommons.org/
   • Since 2007
   • Similar to OAISTER:
     Allows you to search the full texts in scientific open
     access repositories all over the world.




                             ☺
Example                                   86

          Databases accessible over the
           Internet: example: Medline

  • Medline/PubMed offers
    bibliographic descriptions
    of publications on
    medicine, free of charge.




                            ☺
87

  Current awareness services focusing
    on WWW pages: Google Alerts

• Available at http://www.google.com/ and then see the
  page with additional services
  or more directly from http://www.google.com/alerts/
• Since 2004.
• Can discover relevant changed or new WWW pages for
  you in the future.
• Is based on the popular Internet index Google.
• Works with search queries that you define and that are
  stored on Google’s server computers.
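The principle behind such a current awareness service can be sketched as a stored query that is re-run periodically and reports only results not seen before. A minimal sketch, not Google's implementation; the two "days" are hypothetical snapshots of a search engine, and detecting *changed* pages would additionally require something like a stored checksum per page, which is not shown:

```python
class StoredQuery:
    """A search query stored on the server and re-run periodically; each run
    reports only the result URLs that were not seen in earlier runs."""

    def __init__(self, query):
        self.query = query
        self.seen = set()

    def run(self, search):
        """`search` is any function that maps a query to a list of URLs."""
        current = search(self.query)
        new = [url for url in current if url not in self.seen]
        self.seen.update(current)
        return new


# Hypothetical snapshots of a search engine's results on two different days.
def day1(query):
    return ["http://www.example.org/a", "http://www.example.org/b"]


def day2(query):
    return ["http://www.example.org/a", "http://www.example.org/b", "http://www.example.org/c"]


alert = StoredQuery("federated search")
first = alert.run(day1)   # both pages are new
second = alert.run(day2)  # only http://www.example.org/c is new
```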
88

              Internet with WWW
               and printed books

• In recent years, the Internet and the WWW have become
  the primary information source for many people.
• However:
  »A lot of information is still distributed only in the form of
   printed books
  »The content of old printed books can still be interesting.
  »The content of most printed books is (still) not available on
   the Internet.
89

        Public access book databases:
                introduction

• Most general WWW search engines do NOT allow you
  to find out about the existence of books that may be
  interesting for you, at least not in a systematic and
  efficient way.
• So, specific search tools to find books can be useful.
90

        Public access book databases
          provided by bookshops

• To find currently available books, the bibliographic
  databases assembled by big bookshops are interesting.
• Several offer a good coverage.
• Many are accessible free of charge.
• The added price information can be useful for the
  acquisition and accounting department of a library or if
  an individual user wants to buy a book.
• Some provide a current awareness service,
  also free of charge.
• Take into account delivery costs: postage + import tax
Examples                                                     91

        Book databases accessible free of
          charge: examples in U.S.A.

   • Amazon.com (US):
     http://www.amazon.com/
   • This company also offers different, more local
     versions that offer books in other languages, such as
     http://www.amazon.co.uk/
     http://www.amazon.fr/
   • note: amazon, NOT amazone
   • Subject description is poor.
   • Take into account delivery costs: postage + import
     tax
Examples                                                    92

       Book databases accessible free of
         charge: examples in U.S.A.

   • Barnes and Noble (US):
     http://www.barnesandnoble.com/ or http://www.bn.com/
Examples                                               93

       Book databases accessible free of
         charge: examples in U.S.A.

   • http://www.completebook.com/cbmsi/bookaction.do
Examples                                   94

       Book databases accessible free of
         charge: examples in U.S.A.

   • http://www.overstock.com/
Examples                                    95

        Book databases accessible free of
          charge: examples in U.S.A.

   • http://www.powells.com/
   • Specialised in books only.
Examples                                                    96

        Book databases accessible free of
          charge: examples in Europe

   • Blackwell’s on the Internet
     (International, academic books):
     http://www.blackwell.co.uk/
   • VLB for books in German
     http://www.buchhandel.de/
   • For books in French
     http://www.chapitre.com
   • Boeknet - De Nederlandse Internet Boekhandel (Dutch)
     http://www.boeknet.nl/
97

       Search systems for books that are
          made available by dealers
[Diagram: a user searches one book dealer catalog database:
 descriptions of books & real books for sale]
98

       Search systems for books that are
          made available by dealers
[Diagram: a user searches several book dealer catalog databases:
 descriptions of books & real books for sale]
99

       Search systems for books that are
          made available by dealers
[Same diagram as on slide 98]
100

       Search systems for books that are
          made available by dealers
[Diagram: a user searches one multi-dealer database (= merged book dealer
 databases), built from several book dealer catalog databases:
 descriptions of books & real books for sale]
103

  Free public access multi-dealer book
         databases: examples

• http://www.abebooks.com/
  [accessed 2008]
• http://www.abebooks.fr/
  offers a user interface in
  French
• Covers > 10 000
  bookshops.
• The company was
  acquired by Amazon in
  2008.
104

  Free public access multi-dealer book
         databases: examples

• http://www.alibris.com/
  [accessed 2008]
105

  Free public access multi-dealer book
         databases: examples

• Amazon Marketplace:
  http://www.amazon.com/
  [accessed 2009]
• Combined with the online bookshop Amazon on 1
  WWW site:
  used books are displayed alongside Amazon’s new
  books.
• “the world’s biggest online book bazaar”
• Subject description is poor.
• Take delivery costs into account: postage + tax.
106

Free public access multi-dealer book
       databases: examples
107

  Free public access multi-dealer book
         databases: examples

• http://www.biblio.com/ or http://biblio.com/
  [accessed 2008]
108

  Free public access multi-dealer book
         databases: examples

• http://www.boekenverkoper.nl
  [accessed in 2007]
109

  Free public access multi-dealer book
         databases: examples

• http://www.choosebooks.com/
  [accessed 2008]
110

  Free public access multi-dealer book
         databases: examples

• http://www.tomfolio.com/
  [accessed 2008]
111

        Full-text databases of books:
                introduction

• Some organisations have scanned the contents of
  thousands of books,
  to make them full-text searchable through the Internet.
112

        Full-text databases of books:
                   Amazon

• http://www.amazon.com/ and choose BOOKS
• Since 2004
• Also incorporated in the search engine A9
113

        Full-text databases of books:
            Google Book Search

• http://www.books.google
• Since 2005
Example                                                      114

          Online Public Access Catalogues:
            union catalogues of libraries

   • Some systems offer access to the merged catalogues of
     several libraries, so-called ‘union catalogues’.
   • Example:
     Copac
     http://www.copac.ac.uk/
     is accessible free of charge.
Examples                                                   115

       Online Public Access Catalogues:
         union catalogues: examples

   • European National Libraries, catalogues harvested:
     http://www.theeuropeanlibrary.org/portal/index.html
Examples                                                       116

       Online Public Access Catalogues:
         union catalogues: examples

   • Europeana: documents on European culture.
     http://www.europeana.eu/portal/
     Metadata are harvested from co-operating organisations.
117

         Online access databases
      about journal articles: overview

• Thousands of fee-based online access databases offer
  bibliographies or full-texts of journal articles in
  particular subject domains and published by many
  publishers.
• Many publishers offer searchable bibliographies,
  but only of their own publications.
  (for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
  articles published in journals from many publishers,
  free of charge.
Example                                                         118

    Online access databases about journal
              articles: Ingenta

   • Available from: http://www.ingentaconnect.com/
   • Ingenta allows you to search a bibliographic database of
     millions of journal articles,
     including titles, authors and, in many cases, abstracts.
   • The organisation claims to be
     “The most comprehensive collection of academic and
     professional publications”
Example                                                            119

    Online access databases about journal
      articles: Infotrieve ArticleFinder

   • Available from: http://www.infotrieve.com/
   • Infotrieve allows you to search free of charge
     in a bibliographic database of the articles
     of more than 20 000 journal titles and conference
     proceedings,
     NOT full-text.
   • Payment is required to receive the full text of a document.
Example                                                         120

    Online access databases about journal
               articles: Scirus

   • The search interface: http://www.scirus.com
   • This is a specialised Internet index that allows you to
     search for selected scientific information (only) on the
     WWW.
   • This includes the peer-reviewed articles in the journals
     that are published in ScienceDirect by Elsevier.
   • Offered free of charge by Elsevier.
   • An article can be downloaded in full-text format only
     when a fee has been paid to the publisher.
Example                                                          121

    Online access databases about journal
           articles: Google Scholar

   • Google Scholar allows us to search specifically for scholarly
     information sources, including journal articles.
   • A beta (= test) version has been available since November
     2004.
   • The system is accessible from the home page of
     Google as one of the additional services besides the
     normal, classical WWW search.
Example                                     122

    Online access databases about journal
         articles: DOAJ screenshot
Example                                                            123

    Online access databases about journal
                articles: Eric

   • http://ericir.syr.edu/Eric/
   • Eric allows searching a bibliographic database of articles
     and other documents in the fields of information science
     and education.
   + Available in open access, free of charge
   - Payment is required to receive the full text of a document.
Example                                                         124

    Online access databases about journal
               articles: LISTA

   • http://www.libraryresearch.com/
   • Bibliographic database; covers libraries and information
     management, with subjects such as librarianship,
     classification, cataloging, bibliometrics, online
     information retrieval, information management and
     more, from more than 600 periodicals plus books,
     research reports, and proceedings
   • Offered since 2005
   • Delivered via the EBSCOhost platform
   + Free of charge
Example                                                         125

    Online access databases about journal
     articles: Teacher Reference Center

   • http://www.TeacherReference.com/
   • Teacher Reference Center (TRC)
     Journal Information for Teachers
     allows you to search popular teacher and administrator trade
     journals, periodicals, and books
   • via the EBSCOhost platform
   • since 2006
   + offered free of charge
Example                                                          126

              Online access databases:
                  Web of Science

   • One of the bibliographic databases in Web of Knowledge
     is the Web of Science.
   • This is a bibliographic database that covers the articles
     published in the most important scientific journals.




   [Diagram: Web of Science shown as one part of Web of Knowledge]
127

      Finding images on the Internet:
               introduction

+ Several public access search systems are available free of
  charge to search for
  images / pictures (artwork, photos, or both)
  on the Internet.


+ When searching for images, the search results from such
  a system offer not only links to the image files on the
  Internet, but also small versions of the images themselves
  (so-called “thumbnails”).
Examples                                    128

        Finding images on the Internet:
     screen shot of a Google image search
Example                                                    129

          Finding images on the Internet:
            examples of search engines

   • http://images.google.com/ !
     or through http://www.google.com/
     [accessed in 2009]
   • The largest database in this category
     (at least from 2002 to 2008).
     For each result, it offers not only a thumbnail
     but also the origin, with a readable URL;
     this makes it easier to judge the relevance of the
     document.
Example                                          130

         Finding images on the Internet:
           examples of search engines

   • http://www.bing.com/
   • Available in 2009 in beta = test version.
   • Replacing
     Microsoft Live and Yahoo Search?
131

     Method 2: Federated searching
      through scattered databases
   [Diagram, from top to bottom: several Users; Federated search engine; three Search engines; one Database behind each Search engine]
132

         Federated searching
   through scattered databases: why?

• Applications:
  »Finding information in bibliographic databases
  »Finding the availability of rooms in various hotels
  »Finding flights to a particular destination offered by
   various airline companies
  »Finding scientific data that are made available by various
   computers all over the world
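The principle of federated searching can be sketched in a few lines of Python. This is a minimal illustration, not a description of how any of the systems mentioned here are built. It assumes that each target database can be represented by a function that answers a query: `search_dealer_a` and `search_dealer_b` are hypothetical stand-ins for remote search engines, and their sample data are invented. The federated engine sends the same query to all targets in parallel and merges the partial results into one list.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical target "databases": in a real federated system each of these
# would be a network request to a separate remote search engine.
def search_dealer_a(query):
    catalogue = {"Ulysses": 12.50, "Dubliners": 7.00}
    return [("dealer A", title, price) for title, price in catalogue.items()
            if query.lower() in title.lower()]

def search_dealer_b(query):
    catalogue = {"Ulysses": 9.95, "Finnegans Wake": 15.00}
    return [("dealer B", title, price) for title, price in catalogue.items()
            if query.lower() in title.lower()]

TARGETS = [search_dealer_a, search_dealer_b]

def federated_search(query, targets=TARGETS):
    """Send one query to every target in parallel and merge the answers."""
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        # One partial result list per target, in the order of the targets
        partial_results = pool.map(lambda search: search(query), targets)
        merged = [hit for hits in partial_results for hit in hits]
    # Merging step: here simply one list of all hits, sorted by price
    return sorted(merged, key=lambda hit: hit[2])

print(federated_search("ulysses"))
```

In this sketch the engine waits for all targets before merging, so the slowest target determines the response time; a real system would also have to translate the query into the syntax of each target.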
Example                                   133

      Federated searching: application:
      finding a hotel room in some city
Example                                    134

       Federated searching: application:
            finding scientific data

  •   OBIS
      = Ocean Biogeographic
      Information System
  •   http://www.iobis.org/
  •   Gateway to scientific
      data on living systems
      in the oceans.
  •   The data reside on
      many computers all
      over the world.
135

                Hybrid method:
       merging data + federated searching
   [Diagram, from top to bottom: several Users; Federated search engine; three Search engines, two of them each searching one Database and the third searching an Aggregated database; the Aggregated database is fed by several sources (Database or web site or …)]
Example                                                          136

   Databases accessible over the Internet:
                 example

  • http://WorldWideScience.org/
  • “A global science gateway connecting you to national and
    international scientific databases and portals.
    Accelerates scientific discovery and progress by providing
    one-stop searching of global science sources.”
137

          Meta WWW search systems
       on a server computer in the WWW

   [Diagram: User → client computer + WWW client program → (In / Out) → WWW server computer with the meta-search system → Internet / WWW → WWW server computers with Internet search systems]
138

           Meta-search systems:
    terminology / vocabulary / synonyms

      “multi-threaded search systems”
=     “multiple search systems”
=     “multi-search systems”
=     “meta-search systems”
=     “intelligent search agents”
=     “federated search systems”
=     “portals”
Examples                                                    139

                     Meta-search systems
                     on a server computer
   •   http://aftervote.com/
   •   http://draze.com/
   •   http://www.all4one.com
   •   http://www.bytesearch.com
   •   http://clusty.com/
   •   http://www.cyber411.com
   •   http://www.dogpile.com = http://dogpile.com/
   •   http://www.go2net.com = http://www.metacrawler.com
   •   http://jux2.com
   •   http://www.kartoo.com
   •   http://www.mamma.com
   •   http://www.museseek.com
   •   http://www.profusion.com
   •   http://www.search.com
   •   http://www.vivisimo.com = http://vivisimo.com/
140

Meta-search systems: server-based:
        example: Vivisimo
141

   Meta-search systems: server-based:
           example: Vivisimo

• Vivisimo adds value by analysing the retrieved
  results / hits / links / WWW documents,
  in order to
  cluster / group / categorize / classify / map
  these under headings / classes / categories,
  to make further selections by the user / searcher easier
  and faster.
• Vivisimo can accomplish this on the fly,
  that is WITHOUT pre-processing the documents before
  the search.
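The exact clustering method of Vivisimo is not described in these slides. The following Python sketch only illustrates the general idea of clustering on the fly, using a deliberately simple heuristic chosen as an assumption: words shared by at least two retrieved titles become headings, and each hit is listed under every heading it contains. Nothing is computed before the search; the groups are derived from the hits themselves. The stopword list and sample titles are invented.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "of", "and", "a", "in", "for", "on"}

def words(title):
    """Distinct lower-case words of a title, in order, without stopwords."""
    return list(dict.fromkeys(w for w in title.lower().split() if w not in STOPWORDS))

def cluster_on_the_fly(hits, max_clusters=3):
    """Group retrieved hits under frequent shared words (a toy heuristic)."""
    counts = Counter(w for hit in hits for w in words(hit))
    # Headings: words shared by at least two hits, most frequent first
    headings = [w for w, n in counts.most_common() if n >= 2][:max_clusters]
    clusters = defaultdict(list)
    for hit in hits:
        matched = [h for h in headings if h in words(hit)]
        # A hit may appear under several headings, or under "other" if none fits
        for heading in matched or ["other"]:
            clusters[heading].append(hit)
    return dict(clusters)

hits = [
    "Jaguar cars for sale",
    "Jaguar habitat in the rainforest",
    "Used cars and prices",
    "Rainforest animals",
]
print(cluster_on_the_fly(hits))
```

With these four invented titles, the hits are grouped under `jaguar`, `cars` and `rainforest`, and the first two titles each appear under two headings.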
Example                                                         142

      Meta-search systems: server-based:
               example: Clusty

   • Adds value by analysing the retrieved
     results / hits / links / WWW documents,
     in order to
     cluster / group / categorize / classify / map
     these under headings / classes / categories,
     to make further selections by the user / searcher easier
     and faster.
   • Can accomplish this on the fly, that is WITHOUT pre-
     processing the documents before the search.
Example                                   143

     Meta-search systems: server-based:
     example: Clusty screenshot in 2006
144

              Meta-search systems:
                 disadvantages

- It is not always clear through which Internet indexes the
  meta-search system will search.
- Not all meta-search systems can search all the major
  primary search systems; for instance the famous Google
  Internet index is NOT included in most systems.
- Only a limited number of the results that can be obtained
  from the various Internet indexes are shown.
145

  Free public access book meta-search
             systems: types

We can make the following distinction between various
  types of meta-systems for searching:
1. Database resulting from merging several existing
   smaller databases = aggregator database
   In the case of books:
   multi-dealer database = “listing service”
2. Federated search system
   = cross-database search system
146

    Free public access search systems:
        federated search systems

• Each of the searched target databases can be
  »a catalogue database managed by the
   owner/dealer/shop/seller,
   as well as
  »a multi-dealer database
152

       Search systems for books that are
          made available by dealers
   [Diagram, from top to bottom: User; Federated book search systems; Multi-dealer databases = merged book dealer databases; Book dealer catalog databases; descriptions of books & real books for sale]
155

    Free public access federated search
       systems for books: examples
156

   Free public access federated search
      systems for books: examples

• http://www.allbookstores.com/ [accessed 2006]
157

Free public access federated search
   systems for books: examples
158

   Free public access federated search
      systems for books: examples

• http://www.BookFinder.com/
  [accessed 2009]
159

   Free public access federated search
      systems for books: examples

• http://www.bookfinder4u.com/ [accessed 2007]
160

   Free public access federated search
      systems for books: examples

• http://www.bookpursuit.com/
  [accessed 2006]
161

Free public access federated search
   systems for books: examples
162

   Free public access federated search
      systems for books: examples

• http://www.dealtime.com/ [accessed 2006]
163

   Free public access federated search
      systems for books: examples

• http://www.epinions.com/Books [accessed 2006]
164

   Free public access federated search
      systems for books: examples

• http://www.fetchbook.info/ [accessed 2006]
165

   Free public access federated search
      systems for books: examples

• http://www.gallileus.info/search/
  [accessed 2006]
166

   Free public access federated search
      systems for books: examples

• http://www.priceminister.com/livres-bd [accessed 2007]
• Can search not only books but also other products in
  various shops.
167

   Free public access federated search
      systems for books: examples

• http://www.usedbooksearch.co.uk/books.htm
  [accessed 2008]
• Specialised in used books, not in new books.
168

   Free public access federated search
      systems for books: examples

• http://www.vialibri.net/ [accessed 2008]
169

   Free public access federated search
    systems for books are interesting

• Knowledge about their quality is useful
  » for end users as well as for librarians who buy books,
  » for librarians who serve their users by performing
    searches for books,
  » for librarians who propose databases to their users, for
    instance on their library WWW site or who want to
    include one or several book search engines in their own
    local system for federated searching through several
    targets in one action.
170

    Online Public Access Catalogues:
        simultaneous searching

• Some meta-search services allow simultaneous, parallel
  searching in one search action over several databases of
  libraries.
Example                                                          171

          Online Public Access Catalogues:
          simultaneous searching: examples

   • Simultaneous access to catalogues of libraries related to
     water, organised by IAMSLIC, using Z39.50
172




Information Retrieval in a World
of Scattered Information Sources
         3. Comparison of methods
     for efficient information retrieval
173

Method 1: Merging = aggregating
   into a searchable database
   [Diagram, from top to bottom: several Users; Search engine; Aggregated database, fed by several sources (Database or web site or …)]
174

         Comparison of methods
    for efficient information retrieval

• Merged=aggregated databases respond faster than federated
  search systems (in most cases).
  »Explanation:
   They do not need several simultaneous Internet
   connections
   &
   they do not have to merge raw intermediate results into the
   result that is finally shown to the user.


                          ☺
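The speed advantage of a merged database can be illustrated with a small Python sketch, assuming toy in-memory sources (the library names and titles are invented). All records are copied into one inverted index before any search takes place. Answering a query is then a single local lookup, with no connections to the original sources and no merging of partial results.

```python
from collections import defaultdict

def build_aggregated_index(sources):
    """Copy all records into one inverted index; done once, before any search."""
    index = defaultdict(set)   # word -> set of record identifiers
    records = {}               # record identifier -> title
    for source_name, source_records in sources.items():
        for local_id, title in source_records.items():
            record_id = (source_name, local_id)   # remember where it came from
            records[record_id] = title
            for word in title.lower().split():
                index[word].add(record_id)
    return index, records

def search_aggregated(index, records, word):
    """Answer a one-word query with a single local lookup."""
    return sorted((rid, records[rid]) for rid in index.get(word.lower(), set()))

# Hypothetical sources whose records were harvested in advance
SOURCES = {
    "library A": {1: "Information retrieval", 2: "Library science"},
    "library B": {7: "Modern information retrieval"},
}
index, records = build_aggregated_index(SOURCES)
print(search_aggregated(index, records, "retrieval"))
```

The cost is moved to the harvesting step: the index is only as complete and as fresh as the last time the sources were copied.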
177

         Comparison of methods
    for efficient information retrieval

• Federated search systems offer higher coverage than
  direct searching of databases or merged databases
  (in most cases).
  »Explanation: They can exploit many databases and even
   merged=aggregated databases in one search action.
   For example, in 1 search, they can cover more than 100
   million descriptions of physical books
   = pairs of book and dealer (not distinct book titles).


                         ☺
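The coverage figure above counts listings, not titles. The Python sketch below, with invented sample data, shows the difference. Each listing is a (title, dealer) pair, so one title offered by two dealers counts twice. A listing returned by two targets (here a dealer catalogue and a multi-dealer database that already includes it) counts only once. The sketch assumes that duplicate listings can be recognised by exact matching, which real systems cannot always rely on.

```python
# Hypothetical listings returned by three targets: two single dealer catalogues
# and one multi-dealer (aggregated) database. Each listing is a (title, dealer) pair.
dealer_x = {("Ulysses", "dealer X")}
dealer_y = {("Ulysses", "dealer Y"), ("Dubliners", "dealer Y")}
multi_dealer_db = {("Ulysses", "dealer X"), ("Emma", "dealer Z")}

def coverage(*targets):
    """Return (number of distinct listings, number of distinct titles)."""
    listings = set().union(*targets)              # identical listings counted once
    titles = {title for title, _dealer in listings}
    return len(listings), len(titles)

print(coverage(multi_dealer_db))                      # (2, 2)
print(coverage(dealer_x, dealer_y, multi_dealer_db))  # (4, 3)
```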
178

          Comparison of methods
     for efficient information retrieval

• Federated search systems offer results that are more up
  to date than those from an aggregated database, whose
  contents are (only) a snapshot made in the past.
  This is important
  when data should be very fresh = up-to-date.
  Examples:
  booking=reservation systems for flights, hotel rooms



                          ☺
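The freshness argument can be shown with a minimal Python sketch, using invented hotel data. An aggregated database answers from a copy made at harvesting time, while a federated system asks the source at the moment of the search.

```python
# Hypothetical live source: number of rooms still available per hotel.
live_availability = {"Hotel Zeezicht": 3, "Hotel Duinen": 1}

# An aggregated database holds a copy (snapshot) made at harvesting time.
snapshot = dict(live_availability)

# Later, the last room in Hotel Duinen is booked directly at the source.
live_availability["Hotel Duinen"] = 0

def available_from_snapshot(hotel):
    return snapshot[hotel] > 0            # answer based on the old copy

def available_from_live_source(hotel):
    return live_availability[hotel] > 0   # federated: ask the source now

print(available_from_snapshot("Hotel Duinen"))     # True  (stale)
print(available_from_live_source("Hotel Duinen"))  # False (up to date)
```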
179




Information Retrieval in a World
of Scattered Information Sources
            Conclusions
180

                    Conclusions:
                     2 methods

• A single, simple, standard method = approach = solution
  does not (yet) exist.
• Two basic methods are common.
• They have their own
  »advantages
   and
  »disadvantages.
181

                      Conclusions:
                      1 dimension

•   So far we have mainly distinguished between
    »   Merging records in 1 database on 1 computer
        & searching this database

    »   Federated searching in one action of databases on
        various computers
182

                   Conclusions:
                  more dimensions

• However, the location of the databases is only 1 aspect /
  dimension of possible methodological approaches.
• Other dimensions / aspects are for instance:
  2. Unification / standardization of database record structures
   in fields according to a standard,
   for better interoperability.
  3. Unification / standardization of subject descriptions,
   for better interoperability.
• This brings us to 3 aspects / dimensions
  so we can visualize this as a cube.
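Unification of record structures (dimension 2) is often done with a crosswalk that maps the fields of each source onto a common set of fields. Here is a minimal Python sketch. The source names, local field names and common fields are assumptions for illustration; the common fields are loosely modelled on Dublin Core element names such as `title`, `creator` and `subject`.

```python
# Hypothetical crosswalks: for each source, which local field maps to which
# field of the common structure (names loosely modelled on Dublin Core).
CROSSWALKS = {
    "dealer_catalogue": {"book_title": "title", "writer": "creator", "keywords": "subject"},
    "library_opac": {"ti": "title", "au": "creator", "su": "subject"},
}

def to_common_structure(source, record):
    """Rename the fields of a record according to the crosswalk of its source."""
    mapping = CROSSWALKS[source]
    # Fields without a mapping (such as a price) are dropped
    return {mapping[field]: value for field, value in record.items() if field in mapping}

a = to_common_structure("dealer_catalogue",
                        {"book_title": "Ulysses", "writer": "Joyce, James", "price": 9.95})
b = to_common_structure("library_opac",
                        {"ti": "Ulysses", "au": "Joyce, James", "su": "Irish fiction"})
print(a)  # {'title': 'Ulysses', 'creator': 'Joyce, James'}
print(b)  # {'title': 'Ulysses', 'creator': 'Joyce, James', 'subject': 'Irish fiction'}
```

Even when two records both have a `subject` field, its values may come from different subject description systems. Mapping fields therefore addresses dimension 2, but not dimension 3.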
183

                       Conclusions:
                the cube of interoperability
   [Diagram: a cube, with interoperability increasing from the WORST CASE corner to the BEST CASE corner]
   BEST CASE:
   1. One computer
   2. One database field structure
   3. One subject description system
   WORST CASE:
   1. Various computers
   2. Various database field structures
   3. Various subject description systems
184

    Methods for efficient information
         retrieval: conclusions

• For end users, the underlying methods of most
  information systems are either
  “not clear”    (= negative formulation)
  or “transparent” (= positive formulation).
185

    Methods for efficient information
         retrieval: conclusions
• The examples given
  show at least that
  progress in this field
  is impressive.



        ☺
186




Questions? Suggestions? Remarks?
187

• You are free to copy, distribute and display this work under
  the following conditions:
  »Attribution:
   You must mention the author.
  »Noncommercial:
   You may not use this work for commercial purposes.
  »No Derivative Works:
   You may not change, modify, alter, transform, or build
   upon this work.
• For any reuse or distribution, you must make clear to
  others the license terms of this work.

Ircdl damico del-bimbo-meoni
 
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
 
University of Minho Data Repository - features to publish & share data and w...
University of Minho Data Repository - features to publish & share data and  w...University of Minho Data Repository - features to publish & share data and  w...
University of Minho Data Repository - features to publish & share data and w...
 
Publishing Data on the Web
Publishing Data on the Web Publishing Data on the Web
Publishing Data on the Web
 
Putting Linked Data to Use in a Large Higher-Education Organisation
Putting Linked Data to Use in a Large Higher-Education OrganisationPutting Linked Data to Use in a Large Higher-Education Organisation
Putting Linked Data to Use in a Large Higher-Education Organisation
 
Data Virtualization Deployments: How to Manage Very Large Deployments
Data Virtualization Deployments: How to Manage Very Large DeploymentsData Virtualization Deployments: How to Manage Very Large Deployments
Data Virtualization Deployments: How to Manage Very Large Deployments
 
Intro to Data Management Plans
Intro to Data Management PlansIntro to Data Management Plans
Intro to Data Management Plans
 
EUBrazilOpenbio Technologies
EUBrazilOpenbio TechnologiesEUBrazilOpenbio Technologies
EUBrazilOpenbio Technologies
 
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
 
Database part1-
Database part1-Database part1-
Database part1-
 
2012 Fall Data Management Planning Workshop
2012 Fall Data Management Planning Workshop2012 Fall Data Management Planning Workshop
2012 Fall Data Management Planning Workshop
 
Ontology Tutorial: Semantic Technology for Intelligence, Defense and Security
Ontology Tutorial: Semantic Technology for Intelligence, Defense and SecurityOntology Tutorial: Semantic Technology for Intelligence, Defense and Security
Ontology Tutorial: Semantic Technology for Intelligence, Defense and Security
 
Vinay bamane
Vinay bamaneVinay bamane
Vinay bamane
 
IOTA 2016 Social Recomender System Presentation.
IOTA 2016 Social Recomender System Presentation.IOTA 2016 Social Recomender System Presentation.
IOTA 2016 Social Recomender System Presentation.
 
Overview of the data pilot and OpenAIRE tools, Elly Dijk and Marjan Grootveld...
Overview of the data pilot and OpenAIRE tools, Elly Dijk and Marjan Grootveld...Overview of the data pilot and OpenAIRE tools, Elly Dijk and Marjan Grootveld...
Overview of the data pilot and OpenAIRE tools, Elly Dijk and Marjan Grootveld...
 
Introduction to Big data
Introduction to Big dataIntroduction to Big data
Introduction to Big data
 
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATResearch Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
 
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATResearch Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
 

Mais de Vlaamse Vereniging voor Bibliotheek, Archief & Documentatie vzw (VVBAD)

Mais de Vlaamse Vereniging voor Bibliotheek, Archief & Documentatie vzw (VVBAD) (20)

Presentatie AHD studiedag Leeszaalmedewerkers -
Presentatie AHD studiedag Leeszaalmedewerkers -Presentatie AHD studiedag Leeszaalmedewerkers -
Presentatie AHD studiedag Leeszaalmedewerkers -
 
ChatGPT, chatboxes en het einde van de databases
ChatGPT, chatboxes en het einde van de databasesChatGPT, chatboxes en het einde van de databases
ChatGPT, chatboxes en het einde van de databases
 
Connecting libraries to EU resources
Connecting libraries to EU resourcesConnecting libraries to EU resources
Connecting libraries to EU resources
 
Ben je klaar voor innovatie?
Ben je klaar voor innovatie?Ben je klaar voor innovatie?
Ben je klaar voor innovatie?
 
Hoe maak ik mijn project impactvol?
Hoe maak ik mijn project impactvol?Hoe maak ik mijn project impactvol?
Hoe maak ik mijn project impactvol?
 
Connecteren faciliteren in hoger onderwijs, welke rol heeft de bibliotheek?
Connecteren faciliteren in hoger onderwijs, welke rol heeft de bibliotheek?Connecteren faciliteren in hoger onderwijs, welke rol heeft de bibliotheek?
Connecteren faciliteren in hoger onderwijs, welke rol heeft de bibliotheek?
 
Netwerken bij Informatie aan Zee
Netwerken bij Informatie aan ZeeNetwerken bij Informatie aan Zee
Netwerken bij Informatie aan Zee
 
Islamtisch (religieus) erfgoed. Waar liggen de uitdagingen en kansen?
Islamtisch (religieus) erfgoed. Waar liggen de uitdagingen en kansen?Islamtisch (religieus) erfgoed. Waar liggen de uitdagingen en kansen?
Islamtisch (religieus) erfgoed. Waar liggen de uitdagingen en kansen?
 
Waarderen van archieven
Waarderen van archievenWaarderen van archieven
Waarderen van archieven
 
Okapi2-Vlaanderen een hulp richting data driven management
Okapi2-Vlaanderen een hulp richting data driven managementOkapi2-Vlaanderen een hulp richting data driven management
Okapi2-Vlaanderen een hulp richting data driven management
 
Van experiment naar structurele oplossing: gezichtsherkenning in functie van ...
Van experiment naar structurele oplossing: gezichtsherkenning in functie van ...Van experiment naar structurele oplossing: gezichtsherkenning in functie van ...
Van experiment naar structurele oplossing: gezichtsherkenning in functie van ...
 
Het gebruik van AI bij het catalogiseren van boeken in KBR
Het gebruik van AI bij het catalogiseren van boeken in KBRHet gebruik van AI bij het catalogiseren van boeken in KBR
Het gebruik van AI bij het catalogiseren van boeken in KBR
 
Data-interoperabiliteit in de praktijk
Data-interoperabiliteit in de praktijkData-interoperabiliteit in de praktijk
Data-interoperabiliteit in de praktijk
 
Droomhuis of luchtkasteel: De verbouwing van de Nederlandse informatiehuishou...
Droomhuis of luchtkasteel: De verbouwing van de Nederlandse informatiehuishou...Droomhuis of luchtkasteel: De verbouwing van de Nederlandse informatiehuishou...
Droomhuis of luchtkasteel: De verbouwing van de Nederlandse informatiehuishou...
 
Participative Registration of Intangible Cultural Heritage on immaterieelerfg...
Participative Registration of Intangible Cultural Heritage on immaterieelerfg...Participative Registration of Intangible Cultural Heritage on immaterieelerfg...
Participative Registration of Intangible Cultural Heritage on immaterieelerfg...
 
Een MaakBib voor iedereen in elke bib
Een MaakBib voor iedereen in elke bibEen MaakBib voor iedereen in elke bib
Een MaakBib voor iedereen in elke bib
 
De bib als derde plek? Ja, selvølgelig
De bib als derde plek? Ja, selvølgeligDe bib als derde plek? Ja, selvølgelig
De bib als derde plek? Ja, selvølgelig
 
ZB Bibliotheek van Zeeland maakt het verschil
ZB Bibliotheek van Zeeland maakt het verschilZB Bibliotheek van Zeeland maakt het verschil
ZB Bibliotheek van Zeeland maakt het verschil
 
Hybriditeit als uitgangspunt: de vernieuwde leeszaal van het ModeMuseum Antwe...
Hybriditeit als uitgangspunt: de vernieuwde leeszaal van het ModeMuseum Antwe...Hybriditeit als uitgangspunt: de vernieuwde leeszaal van het ModeMuseum Antwe...
Hybriditeit als uitgangspunt: de vernieuwde leeszaal van het ModeMuseum Antwe...
 
De bib is mens- en buurtversterkend
De bib is mens- en buurtversterkendDe bib is mens- en buurtversterkend
De bib is mens- en buurtversterkend
 

Zoveel informatie, zo weinig tijd (So much information, so little time)

  • 1. 1 Zo veel informatie, zo weinig tijd (So much information, so little time) Paul.Nieuwenhuysen@vub.ac.be Created to support a presentation at the biennial 2-day conference series “Informatie”, organised by VVBAD in Oostende, Belgium, September 10-11, 2009: “Informatie aan zee”
  • 2. 2 Contents = summary = structure = overview of this presentation: 0. Introduction with problem statements 1. Methods to make information retrieval efficient in a world of scattered sources 2. Applications of those methods 3. Comparison of the methods 4. Conclusions
  • 3. 3 These slides should be available from the website http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/ (note: BIBLIO and not biblio) and also from the website of the conference organisers, VVBAD.
  • 4. 4 Information Retrieval in a World of Scattered Information Sources 0. Introduction and problem statements
  • 5. 5 Introduction: scattering of sources • Users want to exploit information sources quickly and effectively. • This is hindered by the fact that the digital, electronic information sources that may contain relevant information are scattered over numerous computers, all over the intranet of the user’s organization AND over the Internet and the WWW.
  • 6. 6 Introduction: scattering of sources • In other words: integration / aggregation is still far from perfect.
  • 7. 7 Introduction: scattering of sources difficulties • Using many information retrieval systems costs time: 1. They must be used one after the other, which requires many decisions and actions
  • 8. 8 Introduction: scattering of sources difficulties • Using many information retrieval systems costs time: 2. They offer different user interfaces in the retrieval phase, which is confusing
  • 9. 9 Introduction: scattering of sources difficulties • Using many information retrieval systems costs time: 3. They deliver the items found in various data formats
  • 10. 10 Introduction: scattering of sources difficulties • Using many information retrieval systems costs time: 4. They display found items in different ways on a computer screen
  • 11. 11 Introduction: scattering of sources difficulties Small = BEAUTIFUL
  • 12. 12 Introduction: scattering of sources difficulties
  • 13. 13 Introduction: problem statements 1. Which methods have been developed and applied to cope with this reality?
  • 14. 14 Introduction: problem statements 2. Which concrete applications are available and how can an end-user exploit systems created in this domain?
  • 15. 15 Introduction: problem statements 3. How can information intermediaries evaluate and apply these methods to bring information more efficiently to end-users?
  • 16. 16 Information Retrieval in a World of Scattered Information Sources 1. Methods to make information retrieval efficient in a world of scattered sources
  • 17. 17 Method 1: Merging = aggregating into a searchable database [Diagram: several users search one search engine over an aggregated database, which is compiled from several databases or web sites]
  • 18. 18 Method 2: Federated searching through scattered databases [Diagram: several users search one federated search engine, which passes each query to the search engines of several separate databases]
  • 19. 19 Both methods offer benefits to the users + Saves users the time needed to send queries to various servers or to browse through various systems. ☺
  • 20. 20 Both methods offer benefits to the users + Offers a uniform / consistent display of results in the output phase. ☺
  • 21. 21 Both methods offer benefits to the users + Some systems offer tools to refine the display of the results; for instance + to deduplicate very similar items in the result set, + to sort the results, + to rank the results, + to visualize the results in a more graphical way, + to search within the result set, +… ☺
  • 22. 22 Both methods bring difficulties / challenges / problems - In many cases the merged sources differ in how their database records are formatted and structured into fields. This hinders - searching limited to a field - displaying selected fields only (such as title) - sorting of the displayed records on the contents of a particular selected field (such as author or date)
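Differences in field structure are often handled with a crosswalk that maps each source's field names onto one common set of fields, after which field searching, display and sorting work across sources. A minimal Python sketch; the two sources and their field names (`TI`/`AU`/`PY` and `title`/`creator`/`date`) are hypothetical:

```python
# Hypothetical crosswalks: source field name -> common field name.
CROSSWALKS = {
    "source_a": {"TI": "title", "AU": "author", "PY": "year"},
    "source_b": {"title": "title", "creator": "author", "date": "year"},
}

def normalize(source, record):
    """Map one record onto the common fields; unmapped fields are dropped."""
    mapping = CROSSWALKS[source]
    return {mapping[f]: v for f, v in record.items() if f in mapping}

def merge(results):
    """results: (source, record) pairs -> normalized records tagged with source."""
    return [dict(normalize(src, rec), source=src) for src, rec in results]

merged = merge([
    ("source_b", {"title": "Federated search", "creator": "Jones, K.", "date": "2008"}),
    ("source_a", {"TI": "Metasearch engines", "AU": "Smith, J.", "PY": "2007", "XX": "?"}),
])
# With common field names, sorting on author or date works across sources.
by_year = sorted(merged, key=lambda r: r.get("year", ""))
```

Fields without a mapping (here `XX`) are silently dropped, which is one concrete form of the information loss the slide describes.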
  • 23. 23 Both methods bring difficulties / challenges / problems - In many cases there are differences among sources in the metadata schemes that are applied in the databases to improve retrieval, such as »classifications »taxonomies »thesaurus systems »ontologies This hinders the exploitation of the added value of such metadata.
  • 24. 24 Both methods bring difficulties / challenges / problems - How to deduplicate/dedupe/cluster very similar entries/results/items = near-duplicates, from various target sources? When is similar similar enough? Which entry/result/item to choose/select as the representative of a cluster of similar entries?
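One simple way to answer "when is similar similar enough?" is a similarity threshold. The Python sketch below groups near-duplicates greedily. The word-set (Jaccard) similarity of titles, the 0.8 threshold and the "most non-empty fields" rule for choosing a representative are illustrative assumptions, not the method of any particular system:

```python
import re

def title_words(record):
    """Lower-case word set of the title, ignoring punctuation."""
    return set(re.findall(r"\w+", record.get("title", "").lower()))

def jaccard(a, b):
    """Share of words two sets have in common (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def cluster(records, threshold=0.8):
    """Greedy clustering: each record joins the first cluster whose first
    member is similar enough; otherwise it starts a new cluster."""
    clusters = []
    for rec in records:
        words = title_words(rec)
        for group in clusters:
            if jaccard(words, title_words(group[0])) >= threshold:
                group.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

def representative(group):
    """Pick the entry with the most non-empty fields to stand for the cluster."""
    return max(group, key=lambda r: sum(1 for v in r.values() if v))

results = [
    {"title": "Federated Searching: Benefits and Problems", "source": "A"},
    {"title": "Federated searching - benefits and problems", "source": "B", "doi": "10.1000/xyz"},
    {"title": "Harvesting metadata with OAI-PMH", "source": "C"},
]
groups = cluster(results)
shown = [representative(g) for g in groups]
```

Raising the threshold merges fewer entries (safer but more duplicates remain); lowering it risks merging genuinely different works.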
  • 25. 25 Both methods bring difficulties / challenges / problems - When a specific target source database makes special, non-standard, dedicated retrieval software available, offering the user features that exploit the database better than a more classical, standard retrieval interface, these features may be lost in the new retrieval system. Searches are reduced to the lowest common denominator. Examples: - clustering of results - deduplication of results…
  • 26. 26 Method 1: Merging = aggregating into a searchable database [Diagram: several users search one search engine over an aggregated database, which is compiled from several databases or web sites]
  • 27. 27 Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [Diagram: users with client computers and retrieval software send requests to a Service Provider, which offers searching in its metadata database; the Service Provider's PMH software harvests metadata over the http protocol from Data Providers, which hold the metadata and the digital objects]
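The harvesting step in this scheme can be sketched in Python with the standard library alone. The request arguments (`verb=ListRecords`, `metadataPrefix=oai_dc`, `resumptionToken`) and the XML namespaces follow the OAI-PMH 2.0 specification; the repository URL and the sample response are invented, and a real harvester would also need HTTP fetching, error handling and retries:

```python
import urllib.parse
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request. A resumptionToken is an exclusive
    argument: follow-up requests carry only the verb and the token."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)

def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response.
    The token is None when the list is complete (absent or empty element)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS) if t.text]
        records.append({"identifier": ident, "titles": titles})
    token_el = root.find(".//oai:resumptionToken", NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token

# Trimmed, invented response (a real one also has responseDate and request).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Federated searching in libraries</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
```

A service provider would request `list_records_url(...)`, store the parsed records in its own database, and repeat with the returned token until no token is left.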
  • 28. 28 Merging into a searchable database offers benefits for the users + Applicable even in the absence of data communication with remote servers at search time (whereas federated searching needs good, fast data communication). This is why it is the relatively ‘old’ method. ☺
  • 29. 29 Merging into a searchable database brings difficulties / challenges - The contents of the aggregated database are less up to date than the original information sources. How much this matters depends, of course, - on the particular application - on the time delay
  • 30. 30 Method 2: Federated searching through scattered databases [Diagram: several users search one federated search engine, which passes each query to the search engines of several separate databases]
  • 31. 31 Federated searching: terminology / vocabulary / synonyms federated searching = meta-searching = metasearching = cross-database searching = multi-database searching = multi-threaded searching = one-stop searching = poly-searching = polysearching = broadcast searching = searching through a portal / gateway
  • 32. 32 Federated searching through scattered databases: why? The perfect trip: ☺ 1. A cheap and nice flight 2. A cheap and nice hotel 3. A visit to a nice museum 4. Something nice to read (free via your library)
  • 33. Example 33 Federated searching: application: finding a suitable flight Example: • http://CheapTickets.com/ for the USA
  • 34. Example 34 Federated searching: application: finding a hotel room in some city
  • 35. Example 35 Federated searching: searching in a museum
  • 36. Example 36 Federated searching: searching in a library
  • 37. 37 Federated searching: integrating access [Diagram: one meta-searching system giving access to the intranet, articles, WWW search engines, journals, catalog database(s) of other libraries, publishers, databases (full-text or bibliographic), and the local library catalog database(s)]
  • 38. 38 Federated searching: benefits for the users + The system can help the user to select appropriate sources. ☺
  • 39. 39 Federated searching: benefits for the users + The system can help with authentication and authorization, when this involves not only simple recognition of the IP address of the user’s client computer but also user IDs and passwords. ☺
  • 40. 40 Federated searching: benefits for the users + The need to know which particular database is suitable for a particular search is reduced, because several databases can be searched in one action. ☺
  • 41. 41 Federated searching: benefits for the users + The users have to learn only 1 user interface for searching and only 1 search syntax, instead of a user interface and a search syntax for each database! ☺
  • 42. 42 Federated searching: benefits for the users + Can make users search and exploit databases that they would otherwise never use, that is, without a federated search system! ☺
  • 43. 43 Federated searching: benefits for the users + Useful, relevant, interesting items/references can be found/uncovered from unexpected, unknown, unfamiliar databases! This is mainly beneficial in the case of interdisciplinary subjects/topics. ☺
  • 44. 44 Federated searching: benefits for the users + Some systems offer tools to refine the display of the results; for instance »to dedupe very similar items in the result set, »to sort the results, »to rank the results, »to search within the result set, »… ☺
  • 45. 45 Federated searching: benefits for the users + Some systems offer interesting links from a retrieval result to various related sources or services (such as the full text or a document delivery service), using a link generator based on the OpenURL standard. ☺
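For illustration, an OpenURL 1.0 link in key/encoded-value (KEV) form for a journal article can be built as below. The `url_ver`, `ctx_ver`, `rft_val_fmt` and `rft.*` keys follow the Z39.88-2004 standard; the resolver address `resolver.example.edu` and the article data are hypothetical, since each library runs its own link resolver:

```python
from urllib.parse import urlencode

def build_openurl(resolver_base, article):
    """Build an OpenURL 1.0 (Z39.88-2004) KEV link for a journal article.

    `article` holds citation fields; missing ones are simply left out."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    for field in ("atitle", "jtitle", "issn", "volume", "issue", "spage", "date"):
        if article.get(field):
            params["rft." + field] = article[field]
    return resolver_base + "?" + urlencode(params)

link = build_openurl("https://resolver.example.edu/openurl",
                     {"atitle": "Federated search", "issn": "1234-5678"})
```

The resolver, not the search system, then decides which services (full text, document delivery, catalog lookup) to offer for that citation.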
  • 46. 46 Federated searching: benefits for the users + Some systems check, for each retrieved bibliographic description, whether the corresponding full text is immediately available online, and indicate this to the user on the fly. ☺
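Such an on-the-fly check can be as simple as looking up each result's ISSN and year in the library's list of licensed e-journal holdings. A Python sketch; the holdings table, its fields and the year-range rule are assumptions (real systems usually rely on a knowledge base behind the link resolver):

```python
# Hypothetical holdings: ISSN -> (first year, last year or None if ongoing).
HOLDINGS = {
    "1234-5678": (1995, None),
    "8765-4321": (2000, 2005),
}

def full_text_available(record, holdings=HOLDINGS):
    """True if the library licenses the journal for the record's year."""
    coverage = holdings.get(record.get("issn"))
    if coverage is None or record.get("year") is None:
        return False
    first, last = coverage
    return first <= record["year"] and (last is None or record["year"] <= last)

def flag_results(records):
    """Add a 'full_text' flag to each result before it is displayed."""
    return [dict(r, full_text=full_text_available(r)) for r in records]

flagged = flag_results([
    {"title": "A", "issn": "1234-5678", "year": 2008},
    {"title": "B", "issn": "8765-4321", "year": 2008},
    {"title": "C", "issn": "0000-0000", "year": 2001},
])
```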
  • 47. 47 Federated searching: benefits for the users + Some systems further process the retrieved results and display them in an interesting way that is not offered by the searched original systems. For instance: » Clustering of results according to subject or age or availability of full text » Displaying the results in a graphical way ☺
  • 48. 48 Federated searching: benefits for the users So far so good ! ☺
  • 49. 49 Federated searching through scattered databases [Diagram: several users search one federated search engine, which passes each query to the search engines of several separate databases]
  • 50. 50 Federated searching: difficulties / challenges / problems - How to provide a useful relevance ranking of the search results, even when the target databases differ greatly in type and quality, and even though no index is created in advance (just in case, well before the search action), as Google and other Internet search engines do?
  • 51. 51 Federated searching: difficulties / challenges / problems - Powerful / sophisticated / refined forms of searching may not be applicable in a federated search. Example: limiting to a particular type of document, such as therapy (in medicine). This may cause a LOSS of time instead of saving time.
  • 52. 52 Federated searching through scattered databases [Diagram: several users search one federated search engine, which passes each query to the search engines of several separate databases]
  • 53. 53 Federated searching: difficulties / challenges / problems - Differences among target sources in the Internet application protocols that they apply by default for connection/communication and retrieval, such as »(telnet,) HTTP »proprietary, non-standard protocols »Z39.50 = ISO 23950, SRU, and related protocols that were developed for federated searching!
  • 54. 54 Federated searching through scattered databases [Diagram: several users search one federated search engine, which passes each query to the search engines of several separate databases]
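Because relevance scores from different target databases are not comparable, one option is to combine only the rank positions each source reports. The sketch below uses reciprocal rank fusion, a technique named here as a substitute (the slide does not prescribe one): each item scores the sum of 1/(k + rank) over the sources that return it. k = 60 is a commonly used default; the item identifiers are invented:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge per-source ranked lists of item ids into one ranking.

    ranked_lists: one list per source, best item first.
    Items ranked high, or returned by several sources, float to the top."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by descending score; break ties on the id for a stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))

catalog = ["book1", "book2", "book3"]
articles = ["art1", "book2"]
merged = reciprocal_rank_fusion([catalog, articles])
```

No index is needed in advance: the fusion works only on the result lists that arrive at search time.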
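Of the protocols listed, SRU is the easiest to show: a search is an HTTP GET request whose query is written in CQL. The sketch builds an SRU 1.2 searchRetrieve URL; the parameter names come from the SRU 1.2 specification, while the base URL is hypothetical and the index names (`dc.title`) and record schema (`dc`) a target accepts vary by server:

```python
from urllib.parse import urlencode

def sru_search_url(base_url, cql_query, max_records=10, record_schema="dc"):
    """Build an SRU 1.2 searchRetrieve request (HTTP GET)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "maximumRecords": str(max_records),
        "recordSchema": record_schema,
    }
    return base_url + "?" + urlencode(params)

# CQL: title contains "federated" AND title contains "search"
url = sru_search_url("https://catalog.example.org/sru",
                     'dc.title = "federated" and dc.title = "search"')
```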
  • 55. 55 Federated searching: difficulties / challenges / problems - Various search engines may act in different ways! For instance: Is truncation of a word in a search query possible? Is limitation to a particular field possible? How can a federated search engine take these differences into account?
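One way a federated search engine can take such differences into account is to keep a capability profile per target and consult it when translating the user's query. The Python sketch is hypothetical throughout: the target names, the profile fields and the target query syntaxes (`title:`, `TI=`, `*`, `?`) are invented for illustration:

```python
# Hypothetical capability profiles of three target search engines.
TARGETS = {
    "catalog": {"truncation": "*", "title_field": "title:"},
    "articles": {"truncation": "?", "title_field": "TI="},
    "webindex": {"truncation": None, "title_field": None},
}

def translate(term, profile, truncate=False, title_only=False):
    """Rewrite one search term for a target, degrading gracefully
    when the target lacks a feature and reporting what was lost."""
    lost = []
    if truncate:
        if profile["truncation"]:
            term += profile["truncation"]
        else:
            lost.append("truncation")
    if title_only:
        if profile["title_field"]:
            term = profile["title_field"] + term
        else:
            lost.append("title limit")
    return term, lost

queries = {name: translate("librar", p, truncate=True, title_only=True)
           for name, p in TARGETS.items()}
```

Reporting the lost features lets the interface warn the user that results from that target may be broader than requested.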
  • 56. 56 Federated searching: difficulties / challenges / problems - A query with several words and without explicit Boolean operators can be interpreted in various ways by the various database retrieval systems. For instance, the retrieval software may combine all the query words with the Boolean operator AND, but it may also use OR. If the federated search system does not take care of this well, this may lead to lower recall or lower precision.
  • 57. 57 Federated searching: difficulties / challenges / problems - When a specific target source database makes special, non-standard, dedicated retrieval software available, offering the user features that exploit the database better than a standard retrieval interface, the federated search system can probably not exploit that source as well. Searches are reduced to the lowest common denominator.
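A toy illustration of why the implicit operator matters: the same two-word query, combined with AND and with OR over a few invented records, gives different result sets. The records and relevance judgments are made up purely for the example:

```python
RECORDS = {
    1: "federated search in academic libraries",
    2: "search engine optimisation",
    3: "federated identity management",
    4: "federated search engines compared",
    5: "federated searching tools",
}
RELEVANT = {1, 4, 5}  # hypothetical relevance judgments for the query below

def run(query, operator):
    """Match records containing all (AND) or any (OR) of the query words."""
    words = query.lower().split()
    combine = all if operator == "AND" else any
    return {rid for rid, text in RECORDS.items()
            if combine(w in text.split() for w in words)}

def recall_precision(found):
    """Recall and precision of a result set against RELEVANT."""
    hits = len(found & RELEVANT)
    recall = hits / len(RELEVANT)
    precision = hits / len(found) if found else 0.0
    return recall, precision

results = {op: recall_precision(run("federated search", op)) for op in ("AND", "OR")}
```

Here AND misses record 5 (lower recall), while OR also returns records 2 and 3 (lower precision).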
  • 58. 58 Federated searching: difficulties / challenges / problems - Differences in response time among the target sources. A slow response of a target source can hinder the final analysis and presentation of the results to the user.
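A common mitigation is to query all targets in parallel and show whatever arrives within a time limit, telling the user which sources were too slow. A Python sketch using `concurrent.futures`; the simulated sources, their delays and the 0.5-second limit are assumptions, and error handling is omitted:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def make_source(name, delay):
    """Hypothetical target: returns results for a query after `delay` seconds."""
    def search(query):
        time.sleep(delay)
        return [f"{name}: result for {query!r}"]
    return search

def federated_search(sources, query, timeout=0.5):
    """Query all sources in parallel; keep whatever arrives within `timeout`.

    Returns (results, names of sources that were too slow)."""
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(fn, query): name for name, fn in sources.items()}
    done, not_done = wait(futures, timeout=timeout)
    # Do not block on slow sources; their threads finish in the background.
    pool.shutdown(wait=False)
    results = [item for f in done for item in f.result()]
    too_slow = sorted(futures[f] for f in not_done)
    return results, too_slow

sources = {
    "catalog": make_source("catalog", 0.0),
    "articles": make_source("articles", 0.1),
    "slow_db": make_source("slow_db", 1.5),
}
results, too_slow = federated_search(sources, "federated search")
```

The trade-off is explicit: a shorter limit gives a faster display but may silently drop a relevant source, which is why the slow sources are reported.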
  • 59. 59 Federated searching through scattered databases [Diagram: several users search one federated search engine, which passes each query to the search engines of several separate databases]
  • 60. 60 Federated searching: difficulties / challenges / problems - Some databases can NOT be included as a target database in a federated searching engine, because their owners/producers do not allow this. This is an important difficulty, because in this way interesting / valuable databases are perhaps not exploited by users who rely on federated searching.
  • 61. 61 Federated searching through scattered databases [Diagram: several users search one federated search engine, which passes each query to the search engines of several separate databases]
  • 62. 62 Federated searching: difficulties / challenges / problems - Users may be less impressed by a federated searching system than by the simple, common, familiar, famous Internet / WWW search engines, because its response time is in most cases slower, due to differences in - the computer hardware used by the systems - slower distributed searching through several computer systems, versus faster searching through a more centralised database of records compiled in advance
  • 63. 63 Federated searching: difficulties / challenges / problems - The evaluation of the quality of each search result from a federated search action may be more difficult than when each database is searched separately, because the user may be less aware of the limitations, strengths, selection criteria and aims of the individual, separate databases that offer each result. For instance, peer-reviewed articles from reputable scientific journals may be mixed with more popular and more biased, unscientific texts from trade literature.
  • 64. 64 Federated searching: conclusion Federated searching - is a continuous challenge for developers of the sophisticated software and for the implementers in libraries and information centers - offers benefits for those end-users who are not enthusiastic to work with separate target source databases - does not eliminate the need for access to individual databases
  • 65. 65 Hybrid method: merging data + federated searching [Diagram: several users search one federated search engine, which queries the search engine of an aggregated database (compiled from several databases or web sites) as well as the search engines of other separate databases]
  • 66. 66 Information Retrieval in a World of Scattered Information Sources 2. Applications of methods for efficient information retrieval
  • 67. 67 Method 1: Merging = aggregating into a searchable database [Diagram: several users search one search engine over an aggregated database, which is compiled from several databases or web sites]
  • 68. 68 Internet global subject directories: introduction • They are virtual libraries with open shelves, for browsing. • They are compiled manually, by many people. • They can be browsed following a tree structure or a more complicated variation.
  • 69. Example 69 Internet global subject directories: Yahoo!: screenshot of home page
  • 70. Example 70 Internet global subject directories: BUBL LINK • A hypertext global subject directory to more than 10 000 WWW sites for the higher education community can be found at http://bubl.ac.uk/link/ [accessed 2008] • Accessible free of charge. • The categories are based on the well-known general Dewey classification system.
  • 71. Example 71 Internet global subject directories: dmoz: screenshot of the starting page
  • 72. Example 72 Internet global subject directories: Librarians' Internet Index: screenshot
  • 73. Example 73 Internet global subject directories: IPL: screenshot
  • 74. Example 74 Internet global subject directories: Intute: screenshot
  • 75. 75 Internet indexes: scheme of the mechanism [Diagram: a user searching for Internet-based information uses Internet client hardware and software with a user interface to the search engine of an Internet index; an Internet crawler and indexing system collects files from Internet information sources into a database of Internet files, including an index]
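The indexing system in this scheme builds an inverted index: for each word, the set of files that contain it, so that a query is answered from the index rather than by reading the files at search time. A minimal Python sketch; the crawled pages are supplied inline (a real crawler would fetch them over HTTP) and the URLs are invented:

```python
import re
from collections import defaultdict

def build_index(pages):
    """pages: {url: text} as collected by a crawler -> {word: set of urls}."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return URLs containing every query word (implicit AND), sorted."""
    sets = [index.get(w, set()) for w in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []

pages = {
    "http://example.org/a": "Federated searching in libraries",
    "http://example.org/b": "Searching the deep web",
    "http://example.org/c": "Libraries and the deep web",
}
index = build_index(pages)
hits = search(index, "deep web")
```

Only pages the crawler reaches end up in the index, which is why such indexes cover only part of the Internet.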
  • 76. Example 76 Internet indexes: Google • http://www.google.com/ • Available since 2001 with most of its features. • The most popular search system since 2003.
  • 77. Example 77 Internet indexes: Google Scholar • Google Scholar allows us to search for more scholarly information sources, including journal articles. • A beta (test) version has been available since November 2004. • The system is accessible starting from the home page of Google as one of the additional services, or more directly from http://scholar.google.com/
  • 78. Example 78 Internet indexes: Google Scholar: screenshot
  • 79. Example 79 Internet indexes: Bing • http://www.bing.com/ • Available in 2009 in beta = test version. • Replaces Microsoft Live as well as Yahoo Web Search ?
  • 80. Example 80 Internet indexes: Scirus • The search interface: http://www.scirus.com/ • Since 2001. • Offers access not only to files in HTML format, but also to files in PDF. • Allows you to search »more or less “manually” selected scientific WWW pages, plus »the contents of some scientific, bibliographic databases. • In the sense that Scirus is dedicated to scientific information, it is similar to Google Scholar.
  • 81. Example 81 Internet indexes: Ask • Available from: http://www.ask.com/ • Offers a feature that most other search systems do not offer: categorization = classification = refinement = clustering of search results, to help the user cope with an ambiguous search query.
  • 82. 82 Internet indexes cover only a part of the Internet: metaphor [Figure: the “visible” part of the Internet versus the “deep, hidden, invisible” part of the Internet and the WWW, which is not searchable using a global index like Google Web Search]
  • 83. Example 83 Databases accessible over the Internet: example: OAISTER • http://oaister.umdl.umich.edu/ • “Our goal is to create a collection of freely available, previously difficult-to-access, academically-oriented digital resources that are easily searchable by anyone.”
  • 84. Example 84 Databases accessible over the Internet: example: OAISTER • OAISTER makes searching possible in millions of digital documents that form part of institutional repositories all over the world. • OAISTER covers this kind of document better than Google Web Search (according to independent academic investigations in 2006 and 2008).
  • 85. Example 85 Databases accessible over the Internet: example: scientificcommons • http://www.scientificcommons.org/ • Since 2007 • Similar to OAISTER: Allows you to search the full texts in scientific open access repositories all over the world. ☺
  • 86. Example 86 Databases accessible over the Internet: example: Medline • Medline/PubMed offers bibliographic descriptions of publications on medicine, free of charge. ☺
  • 87. 87 Current awareness services focusing on WWW pages: Google Alerts • Available at http://www.google.com/ and then see the page with additional services or more directly from http://www.google.com/alerts/ • Since 2004. • Can discover relevant changed or new WWW pages for you in the future. • Is based on the popular Internet index Google. • Works with search queries given by you that are stored on their server computer.
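The principle behind such a service: the stored query is re-run periodically and only items not reported before are sent to the user. A Python sketch; `run_query` stands in for a real search engine and, like the URLs, is hypothetical (the source does not describe Google Alerts' internals, and detecting changed pages would additionally need, for example, stored content checksums):

```python
class Alert:
    """A stored search query that reports only items it has not seen before."""

    def __init__(self, query, run_query):
        self.query = query
        self.run_query = run_query  # hypothetical: query -> list of URLs
        self.seen = set()

    def check(self):
        """Re-run the query; return new URLs in result order and remember them."""
        new = [url for url in self.run_query(self.query) if url not in self.seen]
        self.seen.update(new)
        return new

# Simulated search engine whose results change between two runs.
snapshots = iter([
    ["http://example.org/1", "http://example.org/2"],
    ["http://example.org/2", "http://example.org/3"],
])
alert = Alert("federated search", lambda q: next(snapshots))
first = alert.check()
second = alert.check()
```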
  • 88. 88 Internet with WWW and printed books • In recent years, the Internet with the WWW has become the primary information source for many people. • However: »A lot of information is still distributed only in the form of printed books. »The content of old printed books can still be interesting. »The content of most printed books is (still) not available on the Internet.
  • 89. 89 Public access book databases: introduction • Most general WWW search engines do NOT allow you to find out about the existence of books that may be interesting for you, at least not in a systematic and efficient way. • So, specific search tools to find books can be useful.
  • 90. 90 Public access book databases provided by bookshops • To find currently available books, the bibliographic databases assembled by big bookshops are interesting. • Several offer a good coverage. • Many are accessible free of charge. • The added price information can be useful for the acquisition and accounting department of a library or if an individual user wants to buy a book. • Some provide a current awareness service, also free of charge. • Take into account delivery costs: postage + import tax
  • 91. Examples 91 Book databases accessible free of charge: examples in U.S.A. • Amazon.com (US): http://www.amazon.com/ • The company also offers more local versions, with books in other languages, such as http://www.amazon.co.uk/ http://www.amazon.fr/ • note: amazon, NOT amazone • Subject description is poor. • Take into account delivery costs: postage + import tax
  • 92. Examples 92 Book databases accessible free of charge: examples in U.S.A. • Barnes and Noble (US): http://www.barnesandnoble.com/ or http://www.bn.com/
  • 93. Examples 93 Book databases accessible free of charge: examples in U.S.A. • http://www.completebook.com/cbmsi/bookaction.do
  • 94. Examples 94 Book databases accessible free of charge: examples in U.S.A. • http://www.overstock.com/
  • 95. Examples 95 Book databases accessible free of charge: examples in U.S.A. • http://www.powells.com/ • Specialised in books only.
  • 96. Examples 96 Book databases accessible free of charge: examples in Europe • Blackwell’s on the Internet (International, academic books): http://www.blackwell.co.uk/ • VLB for books in German http://www.buchhandel.de/ • For books in French http://www.chapitre.com • Boeknet - De Nederlandse Internet Boekhandel (Dutch) http://www.boeknet.nl/
  • 97-102. Search systems for books that are made available by dealers [Diagram, built up over slides 97-102: User → multi-dealer databases (= merged book dealer databases) → book dealer catalog databases → descriptions of books & real books for sale]
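The idea behind a multi-dealer database (merging the catalogues of many dealers into one searchable database) can be sketched as follows. This is a minimal Python illustration, not the design of any particular listing service; the record fields `isbn`, `title` and `price`, the dealer names and both functions are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]


def merge_catalogues(catalogues: Dict[str, List[Record]]) -> Dict[str, List[Record]]:
    """Merge dealer catalogues into one aggregated index: ISBN -> list of offers."""
    merged: Dict[str, List[Record]] = defaultdict(list)
    for dealer, records in catalogues.items():
        for record in records:
            # Each offer is a (book, dealer) pair, tagged with its dealer.
            merged[record["isbn"]].append({**record, "dealer": dealer})
    return dict(merged)


def search_title(merged: Dict[str, List[Record]], word: str) -> List[Record]:
    """Return every offer whose title contains the word, cheapest first."""
    hits = [offer for offers in merged.values() for offer in offers
            if word.lower() in offer["title"].lower()]
    return sorted(hits, key=lambda offer: offer["price"])


catalogues = {
    "Dealer A": [{"isbn": "111", "title": "Information Retrieval", "price": 30.0}],
    "Dealer B": [{"isbn": "111", "title": "Information Retrieval", "price": 25.0},
                 {"isbn": "222", "title": "Library Science", "price": 12.0}],
}
merged = merge_catalogues(catalogues)
print([offer["dealer"] for offer in search_title(merged, "retrieval")])
# ['Dealer B', 'Dealer A']
```

Because all offers sit in one local index, a single search returns every dealer offering the book, and price comparison becomes a simple sort.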
  • 103. 103 Free public access multi-dealer book databases: examples • http://www.abebooks.com/ [accessed 2008] • http://www.abebooks.fr/ offers a user interface in French • Covers more than 10 000 bookshops. • The company was acquired by Amazon in 2008.
  • 104. 104 Free public access multi-dealer book databases: examples • http://www.alibris.com/ [accessed 2008]
  • 105. 105 Free public access multi-dealer book databases: examples • Amazon Marketplace: http://www.amazon.com/ [accessed 2009] • In synergy with the online bookshop Amazon on 1 WWW site: Used books are displayed alongside Amazon’s new books. • “the world’s biggest online book bazaar” • Subject description is poor. • Take into account delivery costs: postage + tax
  • 106. 106 Free public access multi-dealer book databases: examples
  • 107. 107 Free public access multi-dealer book databases: examples • http://www.biblio.com/ or http://biblio.com/ [accessed 2008]
  • 108. 108 Free public access multi-dealer book databases: examples • http://www.boekenverkoper.nl [accessed in 2007]
  • 109. 109 Free public access multi-dealer book databases: examples • http://www.choosebooks.com/ [accessed 2008]
  • 110. 110 Free public access multi-dealer book databases: examples • http://www.tomfolio.com/ [accessed 2008]
  • 111. 111 Full-text databases of books: introduction • Some organisations have scanned the contents of thousands of books, to make them full-text searchable through the Internet.
  • 112. 112 Full-text databases of books: Amazon • http://www.amazon.com/ and choose BOOKS • Since 2004 • Also incorporated in the search engine A9
  • 113. 113 Full-text databases of books: Google Book Search • http://www.books.google • Since 2005
  • 114. Example 114 Online Public Access Catalogues: union catalogues of libraries • Some systems offer access to the merged catalogues of several libraries, so-called ‘union catalogues’. • Example: Copac http://www.copac.ac.uk/ is accessible free of charge.
  • 115. Examples 115 Online Public Access Catalogues: union catalogues: examples • European National Libraries, catalogues harvested: http://www.theeuropeanlibrary.org/portal/index.html
  • 116. Examples 116 Online Public Access Catalogues: union catalogues: examples • Europeana: documents on European culture. http://www.europeana.eu/portal/ Metadata are harvested from co-operating organisations.
  • 117. 117 Online access databases about journal articles: overview • Thousands of fee-based online access databases offer bibliographies or full-texts of journal articles in particular subject domains and published by many publishers. • Many publishers offer searchable bibliographies, but only of their own publications. (for instance Elsevier, Emerald, Sage) • Only a few large databases offer access to bibliographies of articles published in journals from many publishers, free of charge.
  • 118. Example 118 Online access databases about journal articles: Ingenta • Available from: http://www.ingentaconnect.com/ • Ingenta allows you to search a bibliographic database of millions of journal articles, including titles, authors and, in many cases, abstracts. • The organisation claims to be “The most comprehensive collection of academic and professional publications”
  • 119. Example 119 Online access databases about journal articles: Infotrieve ArticleFinder • Available from: http://www.infotrieve.com/ • Infotrieve allows you to search free of charge in a bibliographic database of the articles of more than 20 000 journal titles and conference proceedings, NOT full-text. • Payment is required to receive the full text of a document.
  • 120. Example 120 Online access databases about journal articles: Scirus • The search interface: http://www.scirus.com • This is a specialised Internet index that allows you to search for selected scientific information (only) on the WWW. • This includes the peer-reviewed articles in the journals that are published in ScienceDirect by Elsevier. • Offered free of charge by Elsevier. • An article can be downloaded in full-text format only when a fee has been paid to the publisher.
  • 121. Example 121 Online access databases about journal articles: Google Scholar • Google Scholar allows you to search specifically for scholarly information sources, including journal articles. • A beta (= test) version has been available since November 2004. • The system is accessible from the home page of Google as one of the additional services besides the normal, classical WWW search.
  • 122. Example 122 Online access databases about journal articles: DOAJ screenshot
  • 123. Example 123 Online access databases about journal articles: Eric • http://ericir.syr.edu/Eric/ • Eric allows searching a bibliographic database of articles and other documents in the fields of information science and education. + Available in open access, free of charge - Payment is required to receive the full text of a document.
  • 124. Example 124 Online access databases about journal articles: LISTA • http://www.libraryresearch.com/ • Bibliographic database; covers libraries and information management, with subjects such as librarianship, classification, cataloging, bibliometrics, online information retrieval, information management and more, from more than 600 periodicals plus books, research reports, and proceedings • Offered since 2005 • Delivered via the EBSCOhost platform + Free of charge
  • 125. Example 125 Online access databases about journal articles: Teacher Reference Center • http://www.TeacherReference.com/ • Teacher Reference Center (TRC) Journal Information for Teachers allows you to search popular teacher and administrator trade journals, periodicals, and books • via the EBSCOhost platform • since 2006 + offered free of charge
  • 126. Example 126 Online access databases: Web of Science • One of the bibliographic databases in Web of Knowledge is the Web of Science. • This is a bibliographic database that covers the articles published in the most important scientific journals.
  • 127. 127 Finding images on the Internet: introduction + Several public access search systems are available free of charge to search for images / pictures (artwork, photos, or both) on the Internet. + In their search results, such systems offer not only links to the image files on the Internet, but also small versions of the images themselves (so-called “thumbnails”).
  • 128. Examples 128 Finding images on the Internet: screen shot of a Google image search
  • 129. Example 129 Finding images on the Internet: examples of search engines • http://images.google.com/ ! or through http://www.google.com/ [accessed in 2009] • The largest database in this category (at least in 2002…2008). For each result, not only a thumbnail is shown but also its origin as a readable URL; this makes it easier to guess the relevance of the document.
  • 130. Example 130 Finding images on the Internet: examples of search engines • http://www.bing.com/ • Available in 2009 in beta = test version. • Replacing Microsoft Live and Yahoo Search?
  • 131. 131 Method 2: Federated searching through scattered databases [Diagram: several users → one federated search engine → several search engines, each searching its own database]
  • 132. 132 Federated searching through scattered databases: why? • Applications: »Finding information in bibliographic databases »Finding the availability of rooms in various hotels »Finding flights to a particular destination offered by various airline companies »Finding scientific data that are made available by various computers all over the world
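The core of federated searching, sending one query to several independent search engines at the same time and merging the raw intermediate results, can be sketched in Python. This is a minimal sketch under assumptions: the "remote" engines are hypothetical in-memory functions standing in for network services such as hotel booking systems, and duplicates are simply dropped rather than re-ranked.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-in for a remote search engine: query -> list of hits.
Engine = Callable[[str], List[str]]


def federated_search(query: str, engines: Dict[str, Engine]) -> List[str]:
    """Send one query to all engines at once and merge the hits, dropping duplicates."""
    merged: List[str] = []
    seen = set()
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = [pool.submit(engine, query) for engine in engines.values()]
        for future in futures:
            try:
                hits = future.result()
            except Exception:
                continue  # a failing target is skipped instead of breaking the search
            for hit in hits:
                if hit not in seen:
                    seen.add(hit)
                    merged.append(hit)
    return merged


engines = {
    "hotel site A": lambda q: ["Hotel Zee", "Hotel Duin"],
    "hotel site B": lambda q: ["Hotel Duin", "Hotel Strand"],
}
print(federated_search("Oostende", engines))
# ['Hotel Zee', 'Hotel Duin', 'Hotel Strand']
```

Threads suit this case because each target spends most of its time waiting for the network; a real system would also need time-outs and a way to rank the merged hits.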
  • 133. Example 133 Federated searching: application: finding a hotel room in some city
  • 134. Example 134 Federated searching: application: finding scientific data • OBIS = Ocean Biogeographic Information System • http://www.iobis.org/ • Gateway to scientific data on living systems in the oceans. • The data reside on many computers all over the world.
  • 135. 135 Hybrid method: merging data + federated searching [Diagram: several users; a federated search engine that queries several search engines, one of which searches an aggregated database merged from several databases or web sites, while the others search their own databases or web sites]
  • 136. Example 136 Databases accessible over the Internet: example • http://WorldWideScience.org/ • “A global science gateway connecting you to national and international scientific databases and portals. Accelerates scientific discovery and progress by providing one-stop searching of global science sources.”
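  • 137. 137 Meta WWW search systems on a server computer in the WWW [Diagram: the user works with a WWW client program on a client computer; via the Internet the query goes in to a WWW server computer running the meta-search system, which passes it on to WWW server computers with Internet search systems; the results come back out to the user]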
  • 138. 138 Meta-search systems: terminology / vocabulary / synonyms “multi-threaded search systems” = “multiple search systems” = “multi-search systems” = “meta-search systems” = “intelligent search agents” = “federated search systems” = “portals”
  • 139. Examples 139 Meta-search systems on a server computer • http://aftervote.com/ • http://draze.com/ • http://www.all4one.com • http://www.bytesearch.com • http://clusty.com/ • http://www.cyber411.com • http://www.dogpile.com = http://dogpile.com/ • http://www.go2net.com = http://www.metacrawler.com • http://jux2.com • http://www.kartoo.com • http://www.mamma.com • http://www.museseek.com • http://www.profusion.com • http://www.search.com • http://www.vivisimo.com = http://vivisimo.com/
  • 141. 141 Meta-search systems: server-based: example: Vivisimo • Vivisimo adds value by analysing the retrieved results / hits / links / WWW documents, in order to cluster / group / categorize / classify / map these under headings / classes / categories, to make further selections by the user / searcher easier and faster. • Vivisimo can accomplish this on the fly, that is WITHOUT pre-processing the documents before the search.
  • 142. Example 142 Meta-search systems: server-based: example: Clusty • Adds value by analysing the retrieved results / hits / links / WWW documents, in order to cluster / group / categorize / classify / map these under headings / classes / categories, to make further selections by the user / searcher easier and faster. • Can accomplish this on the fly, that is WITHOUT pre-processing the documents before the search.
  • 143. Example 143 Meta-search systems: server-based: example: Clusty screenshot in 2006
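The slides do not describe how Vivisimo or Clusty cluster their results. As a rough illustration of clustering on the fly, the Python sketch below swaps in a much simpler technique: each retrieved snippet is filed under the word (other than the query words and a few stopwords) that it shares with the largest number of other results. The stopword list, the function names and the example snippets are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set

STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def content_words(text: str, exclude: Set[str]) -> Set[str]:
    """Lower-case words of a snippet, without stopwords and query words."""
    return {w for w in text.lower().split() if w not in STOPWORDS and w not in exclude}


def cluster_results(query: str, snippets: List[str]) -> Dict[str, List[str]]:
    """Group result snippets on the fly under a label word shared by many results."""
    query_words = set(query.lower().split())
    # Document frequency: in how many snippets does each word occur?
    doc_freq = Counter(w for s in snippets for w in content_words(s, query_words))
    clusters: Dict[str, List[str]] = defaultdict(list)
    for s in snippets:
        words = content_words(s, query_words)
        # Label = the snippet's most widely shared word (ties: alphabetical order).
        label = min(words, key=lambda w: (-doc_freq[w], w)) if words else "other"
        clusters[label].append(s)
    return dict(clusters)


snippets = ["jaguar car dealer", "jaguar car review",
            "jaguar animal habitat", "animal habitat rainforest"]
clusters = cluster_results("jaguar", snippets)
print(sorted(clusters))  # ['animal', 'car']
print(clusters["car"])   # ['jaguar car dealer', 'jaguar car review']
```

Like the systems on the slides, this works only on the retrieved snippets, with no pre-processing of the documents before the search; production clustering engines use far richer phrase and similarity analysis.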
  • 144. 144 Meta-search systems: disadvantages - It is not always clear through which Internet indexes the meta-search system will search. - Not all meta-search systems can search all the major primary search systems; for instance the famous Google Internet index is NOT included in most systems. - Only a limited number of the results that can be obtained from the various Internet indexes are shown.
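The last disadvantage can be illustrated with one simple, assumed merging strategy: a meta-search system that keeps only the top `k` hits of each Internet index, interleaves them rank by rank and removes duplicates. The function and the example URLs in this Python sketch are hypothetical; real systems may merge differently.

```python
from itertools import zip_longest
from typing import Dict, List


def merge_top_k(result_lists: Dict[str, List[str]], k: int, limit: int) -> List[str]:
    """Keep only the top k hits per index, interleave them rank by rank, drop duplicates."""
    truncated = [hits[:k] for hits in result_lists.values()]
    merged: List[str] = []
    seen = set()
    for same_rank in zip_longest(*truncated):  # rank 1 of every index, then rank 2, ...
        for url in same_rank:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged[:limit]


results = {"index 1": ["u1", "u2", "u3"], "index 2": ["u2", "u4", "u5"]}
print(merge_top_k(results, k=2, limit=10))  # ['u1', 'u2', 'u4']
```

Here `u3` and `u5` are never shown, although both indexes found them: the cut-off at `k` hits per index is exactly the limitation listed on the slide.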
  • 145. 145 Free public access book meta-search systems: types We can distinguish the following types of meta-systems for searching: 1. A database resulting from merging several existing smaller databases = aggregator database; in the case of books: multi-dealer database = “listing service” 2. A federated search system = cross-database search system
  • 146. 146 Free public access search systems: federated search systems • Each of the searched target databases can be »a catalogue database managed by the owner/dealer/shop/seller, as well as »a multi-dealer database
  • 147-154. Search systems for books that are made available by dealers [Diagram, built up over slides 147-154: User → federated book search systems → multi-dealer databases (= merged book dealer databases) → book dealer catalog databases → descriptions of books & real books for sale]
  • 155. 155 Free public access federated search systems for books: examples
  • 156. 156 Free public access federated search systems for books: examples • http://www.allbookstores.com/ [accessed 2006]
  • 157. 157 Free public access federated search systems for books: examples
  • 158. 158 Free public access federated search systems for books: examples • http://www.BookFinder.com/ [accessed 2009]
  • 159. 159 Free public access federated search systems for books: examples • http://www.bookfinder4u.com/ [accessed 2007]
  • 160. 160 Free public access federated search systems for books: examples • http://www.bookpursuit.com/ [accessed 2006]
  • 161. 161 Free public access federated search systems for books: examples
  • 162. 162 Free public access federated search systems for books: examples • http://www.dealtime.com/ [accessed 2006]
  • 163. 163 Free public access federated search systems for books: examples • http://www.epinions.com/Books [accessed 2006]
  • 164. 164 Free public access federated search systems for books: examples • http://www.fetchbook.info/ [accessed 2006]
  • 165. 165 Free public access federated search systems for books: examples • http://www.gallileus.info/search/ [accessed 2006]
  • 166. 166 Free public access federated search systems for books: examples • http://www.priceminister.com/livres-bd [accessed 2007] • Can search not only books but also other products in various shops.
  • 167. 167 Free public access federated search systems for books: examples • http://www.usedbooksearch.co.uk/books.htm [accessed 2008] • Specialised in used books, not in new books.
  • 168. 168 Free public access federated search systems for books: examples • http://www.vialibri.net/ [accessed 2008]
  • 169. 169 Free public access federated search systems for books are interesting • Knowledge about their quality is useful » for end users as well as for librarians who buy books, » for librarians who serve their users by performing searches for books, » for librarians who recommend databases to their users, for instance on their library WWW site, or who want to include one or several book search engines in their own local system for federated searching through several targets in one action.
  • 170. 170 Online Public Access Catalogues: simultaneous searching • Some meta-search services allow simultaneous, parallel searching in one search action over several databases of libraries.
  • 171. Example 171 Online Public Access Catalogues: simultaneous searching: examples • Simultaneous access to catalogues of libraries related to water, organised by IAMSLIC, using Z39.50
  • 172. 172 Information Retrieval in a World of Scattered Information Sources 3. Comparison of methods for efficient information retrieval
  • 173. 173 Method 1: Merging = aggregating into a searchable database [Diagram: several users → one search engine → one aggregated database, merged from several databases or web sites]
  • 174. 174 Comparison of methods for efficient information retrieval • Merged (= aggregated) databases respond faster than federated search systems (in most cases). »Explanation: They do not need several simultaneous Internet connections, and they do not have to merge raw intermediate results into the result that is finally shown to the user. ☺
  • 175. 175 Method 2: Federated searching through scattered databases [Diagram: several users → one federated search engine → several search engines, each searching its own database]
  • 176. 176 Hybrid method: merging data + federated searching [Diagram: several users; a federated search engine that queries several search engines, one of which searches an aggregated database merged from several databases or web sites, while the others search their own databases or web sites]
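The explanation can be made concrete with a simple response-time model. This Python sketch assumes, for illustration only, that a federated search engine queries all targets in parallel, so the user waits for the slowest target plus the merging step; the function names and timings are invented.

```python
from typing import Sequence


def aggregated_response_time(local_search: float) -> float:
    """One search engine and one database on one computer: only local search time."""
    return local_search


def federated_response_time(target_times: Sequence[float], merge: float) -> float:
    """Targets are queried in parallel, so the user waits for the slowest target
    (connection plus remote search) and then for merging the raw results."""
    return max(target_times) + merge


# Hypothetical timings in seconds.
print(aggregated_response_time(0.5))                   # 0.5
print(federated_response_time([0.5, 1.5, 4.0], 0.25))  # 4.25
```

Under this model a single slow target dominates the federated response time, whereas the aggregated database pays no network or merging cost at search time.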
  • 177. 177 Comparison of methods for efficient information retrieval • Federated search systems offer higher coverage than direct searching of databases or merged databases (in most cases). »Explanation: They can exploit many databases, and even merged (= aggregated) databases, in one search action. For example, in 1 search, they can cover more than 100 million descriptions of physical books = pairs of book and dealer (not book titles). ☺
  • 178. 178 Comparison of methods for efficient information retrieval • Federated search systems offer results that are more up to date than searching an aggregated database, whose contents are (only) a snapshot made in the past. This is important when data must be very fresh = up to date. Examples: booking = reservation systems for flights, hotel rooms ☺
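The difference between listings (book-dealer pairs) and book titles, and the gain in coverage from combining several targets, can be shown with a small Python sketch. The ISBNs, dealer names, target names and the function are hypothetical.

```python
from typing import Dict, Set, Tuple

Listing = Tuple[str, str]  # (ISBN, dealer): one physical book offered by one dealer


def federated_coverage(targets: Dict[str, Set[Listing]]) -> Tuple[int, int]:
    """Return (number of book-dealer listings, number of distinct books)
    reachable in one federated search over all targets."""
    listings: Set[Listing] = set().union(*targets.values())
    books = {isbn for isbn, _dealer in listings}
    return len(listings), len(books)


targets = {
    "multi-dealer database 1": {("111", "Dealer A"), ("111", "Dealer B")},
    "multi-dealer database 2": {("111", "Dealer B"), ("222", "Dealer C")},
}
print(federated_coverage(targets))  # (3, 2)
```

Each target alone holds 2 listings; together they reach 3 listings (one overlap removed) but only 2 distinct books, which is why a count of "descriptions of physical books" is larger than a count of titles.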
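Why a harvested snapshot can give stale answers can be shown with a toy hotel example in Python. The data and function names are hypothetical; `harvest` stands for building an aggregated database, while querying `live` stands for a federated search that asks the source at search time.

```python
from typing import Dict, List

Availability = Dict[str, int]  # hotel -> number of free rooms


def harvest(live: Availability) -> Availability:
    """Aggregation: copy the source's data into a local snapshot at harvest time."""
    return dict(live)


def free_hotels(source: Availability) -> List[str]:
    """Hotels in a data source that still have at least one free room."""
    return sorted(hotel for hotel, rooms in source.items() if rooms > 0)


live = {"Hotel Zee": 2, "Hotel Duin": 1}  # the hotels' own booking data
snapshot = harvest(live)                  # aggregated copy, made earlier
live["Hotel Zee"] = 0                     # rooms booked after the harvest
print(free_hotels(snapshot))  # ['Hotel Duin', 'Hotel Zee']  (stale answer)
print(free_hotels(live))      # ['Hotel Duin']  (asks the source at search time)
```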
  • 179. 179 Information Retrieval in a World of Scattered Information Sources Conclusions
  • 180. 180 Conclusions: 2 methods • A single, simple, standard method = approach = solution does not (yet) exist. • Two basic methods are common. • They have their own »advantages and »disadvantages.
  • 181. 181 Conclusions: 1 dimension • Up to now we have primarily distinguished between » Merging records in 1 database on 1 computer & searching this database » Federated searching in one action of databases on various computers
  • 182. 182 Conclusions: more dimensions • However, the location of the databases is only 1 aspect / dimension of possible methodological approaches. • Other dimensions / aspects are for instance: 2. Unification / standardization of database record structures in fields according to a standard, for better interoperability. 3. Unification / standardization of subject descriptions, for better interoperability. • This brings us to 3 aspects / dimensions, so we can visualize this as a cube.
  • 183. 183 Conclusions: the cube of interoperability [Diagram: a cube with one axis per dimension. BEST CASE interoperability: 1. one computer, 2. one database field structure, 3. one subject description system. WORST CASE: 1. various computers, 2. various database field structures, 3. various subject description systems]
  • 184. 184 Methods for efficient information retrieval: conclusions • For end users, the underlying methods of most information systems are either “not clear” (= negative formulation) or “transparent” (= positive formulation)
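Dimension 2 (one database field structure) can be approached with a crosswalk that maps each source's field names onto a common structure. The Python sketch below is a minimal illustration under assumptions: the source names and field names are hypothetical, and the common fields borrow three Dublin Core element names (title, creator, date). It does not address dimension 3, the unification of subject descriptions.

```python
from typing import Dict

# Hypothetical crosswalks: source-specific field names -> one common field structure.
# The target names (title, creator, date) follow Dublin Core element names.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "library_catalogue": {"ti": "title", "au": "creator", "py": "date"},
    "book_dealer": {"BookTitle": "title", "Author": "creator", "Year": "date"},
}


def normalise(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename a record's fields to the common structure; drop fields without a mapping."""
    mapping = CROSSWALKS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


print(normalise("book_dealer", {"BookTitle": "Information Retrieval", "Author": "Author A", "Price": "10"}))
# {'title': 'Information Retrieval', 'creator': 'Author A'}
```

Once records from both sources share one field structure, a single search on `title` or `creator` works across them, whether they are merged into one database or searched in a federated way.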
  • 185. 185 Methods for efficient information retrieval: conclusions • The examples given show at least that progress in this field is impressive. ☺
  • 187. 187 • You are free to copy, distribute, display this work under the following conditions: »Attribution: You must mention the author. »Noncommercial: You may not use this work for commercial purposes. »No Derivative Works: You may not change, modify, alter, transform, or build upon this work. • For any reuse or distribution, you must make clear to others the license terms of this work.