The document summarizes methods for efficient information retrieval in a world with scattered information sources. It discusses merging/aggregating sources into a searchable database and federated searching through scattered databases. Both methods offer benefits like saving user time, but also have challenges like differences in formatting, metadata schemes, and response times among sources. Federated searching in particular has issues with relevance ranking and interpreting queries across different systems.
1. 1
So much information
So little time
Paul.Nieuwenhuysen@vub.ac.be
Created to support a presentation
at the bi-annual 2-day conference series “Informatie”
organised by VVBAD, in Oostende, Belgium
September 10-11, 2009
“Informatie aan zee”
2. 2
Contents = summary = structure = overview of this presentation
0. Introduction with problem statements
1. Methods to make information retrieval efficient in a world of
scattered sources
2. Applications of those methods
3. Comparison of the methods
4. Conclusions
3. 3
These slides should be available from the WWW site
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)
and also from the WWW site of the organisers of the conference =
VVBAD
4. 4
Information Retrieval in a World
of Scattered Information Sources
0. Introduction
and problem statements
5. 5
Introduction:
scattering of sources
• Users want to exploit information sources fast and
effectively.
• This is hindered by the fact that digital, electronic
information sources that may contain relevant
information are created and scattered, distributed on
numerous computers all over the intranet of the user’s
organization AND over the Internet and the WWW.
7. 7
Introduction:
scattering of sources difficulties
• Using many information retrieval systems costs time:
1. They must be used one after the other which requires
many decisions and actions
8. 8
Introduction:
scattering of sources difficulties
• Using many information retrieval systems costs time:
2. They offer different user interfaces in the retrieval phase,
which is confusing
9. 9
Introduction:
scattering of sources difficulties
• Using many information retrieval systems costs time:
3. They offer found information items in various data
formats
10. 10
Introduction:
scattering of sources difficulties
• Using many information retrieval systems costs time:
4. They display found items in different ways on a computer
screen
11. 11
Introduction:
scattering of sources difficulties
Small = BEAUTIFUL
12. 12
Introduction:
scattering of sources difficulties
13. 13
Introduction:
problem statements
1. Which methods have been
developed and applied to
cope with this reality?
14. 14
Introduction:
problem statements
2. Which concrete
applications are available
and how can an end-user
exploit systems created in
this domain?
15. 15
Introduction:
problem statements
3. How can information
intermediaries evaluate and
apply these methods to
bring information more
efficiently to end-users?
16. 16
Information Retrieval in a World
of Scattered Information Sources
1. Methods
to make information retrieval efficient
in a world of scattered sources
17. 17
Method 1: Merging = aggregating
into a searchable database
[Diagram: several users → one search engine on an aggregated database ← several databases or web sites]
18. 18
Method 2: Federated searching
through scattered databases
[Diagram: several users → federated search engine → several search engines, each on its own database]
19. 19
Both methods
offer benefits to the users
+ Saves the users time executing queries to various servers
or browsing through various systems.
☺
20. 20
Both methods
offer benefits to the users
+ Offers a uniform / consistent display of results in the
output phase.
☺
21. 21
Both methods
offer benefits to the users
+ Some systems offer tools to refine display of the results;
for instance
+ to deduplicate very similar items in the result set,
+ to sort the results,
+ to rank the results,
+ to visualize the results in a more graphical way,
+ to search within the result set,
+…
☺
22. 22
Both methods bring
difficulties / challenges / problems
- In many cases there are differences among the merged
sources in the formatting/structuring of their database
records in fields.
This hinders
- searching limited to a field
- displaying selected fields only (such as title)
- sorting of the displayed records on the contents of a
particular selected field (such as author or date)
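One common way to cope with differing record structures is a field crosswalk: the field names of each source are mapped onto one shared schema before merging. A minimal sketch, in which the source names, the field names and the sample records are all hypothetical:

```python
# Hypothetical crosswalks: source-specific field name -> common field name.
CROSSWALKS = {
    "library_catalog": {"245a": "title", "100a": "author", "260c": "date"},
    "repository": {"dc:title": "title", "dc:creator": "author", "dc:date": "date"},
}

def to_common(source, record):
    """Rename the fields of one record to the common schema; drop unmapped fields."""
    mapping = CROSSWALKS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

records = [
    to_common("library_catalog",
              {"245a": "Deep web", "100a": "Smith, J.", "260c": "2007"}),
    to_common("repository",
              {"dc:title": "Open archives", "dc:creator": "Jones, A.", "dc:date": "2005"}),
]

# With one schema, sorting on a selected field works across sources.
by_date = sorted(records, key=lambda r: r["date"])
titles = [r["title"] for r in by_date]
```

Fields that have no counterpart in the common schema are simply dropped here, which is one way the added value of a source gets lost.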
23. 23
Both methods bring
difficulties / challenges / problems
- In many cases there are differences among sources in the
metadata schemes that are applied in the databases to
improve retrieval, such as
»classifications
»taxonomies
»thesaurus systems
»ontologies
This hinders the exploitation of the added value of such
metadata.
24. 24
Both methods bring
difficulties / challenges / problems
- How to deduplicate/dedupe/cluster
very similar entries/results/items
= near-duplicates,
from various target sources?
When is similar similar enough?
Which entry/result/item to choose/select
as the representative of a cluster of similar entries?
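The two questions above can be made concrete with a small sketch. It assumes that similarity is measured on normalized titles only, that a ratio of 0.9 is "similar enough", and that the record with the most filled-in fields represents its cluster; all three choices, and the sample records, are illustrative assumptions.

```python
from difflib import SequenceMatcher

def normalize(title):
    # Lowercase; replace punctuation by spaces; collapse runs of spaces.
    kept = "".join(c if c.isalnum() else " " for c in title.lower())
    return " ".join(kept.split())

def similar(a, b, threshold):
    ratio = SequenceMatcher(None, normalize(a["title"]), normalize(b["title"])).ratio()
    return ratio >= threshold

def dedupe(records, threshold=0.9):
    clusters = []
    for rec in records:
        for cluster in clusters:
            if similar(rec, cluster[0], threshold):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    # Representative: the record with the most non-empty fields.
    return [max(c, key=lambda r: sum(1 for v in r.values() if v)) for c in clusters]

records = [
    {"title": "Information retrieval: an introduction", "abstract": ""},
    {"title": "Information Retrieval - An Introduction.", "abstract": "Survey."},
    {"title": "Ocean biogeography", "abstract": ""},
]
unique = dedupe(records)
```

Lowering the threshold merges more aggressively and risks hiding items that are really different editions or versions.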
25. 25
Both methods bring
difficulties / challenges / problems
- When some special, non-standard, dedicated retrieval
software is made available by a specific target source
database, to offer special features to the user to exploit
the database better than with a more classical standard
retrieval interface, then this may be lost in the new
retrieval system.
Searches are reduced to the lowest common denominator.
Examples:
- clustering of results
- deduplication of results…
26. 26
Method 1: Merging = aggregating
into a searchable database
[Diagram: several users → one search engine on an aggregated database ← several databases or web sites]
27. 27
Open Archives Initiative Protocol for
Metadata Harvesting (OAI-PMH)
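[Diagram: users with client computers send requests over the http protocol to service providers (search providers), which hold a metadata database plus retrieval software. The service providers harvest metadata over http with the PMH protocol from data providers, whose servers hold the metadata and the digital objects.]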
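A harvesting step can be sketched as follows. The verb `ListRecords`, the `metadataPrefix` and `from` arguments, and the XML namespaces come from OAI-PMH 2.0; the base URL and the sample response are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "https://repository.example.org/oai"  # hypothetical data provider

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build the URL that a service provider would request to harvest records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since this date
    return base_url + "?" + urlencode(params)

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, heavily shortened response from a data provider.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        pairs.append((ident, title))
    return pairs

url = list_records_url(BASE_URL, from_date="2009-01-01")
harvested = parse_records(SAMPLE_RESPONSE)
```

A real harvester would also follow the resumption token that a data provider returns when the result list is delivered in parts.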
28. 28
Merging into a searchable database
offers benefits for the users
+ Applicable even in the absence of data communication to
remote servers
(whereas federated searching needs good, fast data
communication.)
Therefore this is the relatively ‘old’ method.
☺
29. 29
Merging into a searchable database
brings difficulties / challenges
- The contents of the aggregated database are less up to date
than the original information sources.
The importance of this aspect depends of course
- on the particular application
- on the time delay
30. 30
Method 2: Federated searching
through scattered databases
[Diagram: several users → federated search engine → several search engines, each on its own database]
32. 32
Federated searching
through scattered databases: why?
The perfect trip:
☺
1. A cheap and nice flight
2. A cheap and nice hotel
3. A visit to a nice museum
4. Something nice to read (free via your library)
33. Example 33
Federated searching: application:
finding a suitable flight
Example:
• http://CheapTickets.com/ for the USA
34. Example 34
Federated searching: application:
finding a hotel room in some city
35. Example 35
Federated searching:
searching in a museum
36. Example 36
Federated searching:
searching in a library
37. 37
Federated searching:
integrating access
[Diagram: a meta-searching system integrates access to the local library catalog database(s), the catalog database(s) of other libraries, databases (full-text or bibliographic), publishers (journals, articles), WWW search engines, and the intranet.]
38. 38
Federated searching:
benefits for the users
+ The system can help the user to select appropriate
sources.
☺
39. 39
Federated searching:
benefits for the users
+ The system can help with authentication and
authorization, not only when this relies on simple
recognition of the IP address of the user’s client computer,
but also when it involves user IDs and passwords.
☺
40. 40
Federated searching:
benefits for the users
+ The need to know which particular database is suitable
for a particular search is reduced, because several ones
can be searched in one action.
☺
41. 41
Federated searching:
benefits for the users
+ The users have to learn only 1 user interface for
searching and only 1 search syntax,
instead of a user interface and a search syntax for each
database!
☺
42. 42
Federated searching:
benefits for the users
+ Can make users search and exploit databases that they
would never use otherwise, that is, without a federated
search system!
☺
43. 43
Federated searching:
benefits for the users
+ Useful, relevant, interesting items/references can be
found/uncovered from unexpected, unknown, unfamiliar
databases!
This is mainly beneficial in the case of interdisciplinary
subjects/topics.
☺
44. 44
Federated searching:
benefits for the users
+ Some systems offer tools to refine display of the results;
for instance
»to dedupe very similar items in the result set,
»to sort the results,
»to rank the results,
»to search within the result set,
»…
☺
45. 45
Federated searching:
benefits for the users
+ Some systems offer interesting links from a retrieval
result to various related sources or services
(such as the full text or a document delivery service),
using a link generator based on the OpenURL standard.
☺
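As an illustration of such a link generator, the sketch below builds an OpenURL in the key/encoded-value format of the standard (Z39.88-2004) for a journal article. The resolver address and the article data, including the ISSN, are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

RESOLVER = "https://resolver.example.org/openurl"  # hypothetical link resolver

def openurl_for_article(resolver, atitle, jtitle, issn, volume, spage, year):
    """Describe one journal article as an OpenURL that a link resolver can act on."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", atitle),
        ("rft.jtitle", jtitle),
        ("rft.issn", issn),
        ("rft.volume", volume),
        ("rft.spage", spage),
        ("rft.date", year),
    ]
    return resolver + "?" + urlencode(params)

link = openurl_for_article(
    RESOLVER, "A sample article", "Journal of Examples",
    "1234-5678", "12", "34", "2009",
)
fields = parse_qs(urlparse(link).query)
```

The resolver of the user's own library then decides which service to offer for this description, such as the full text or document delivery.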
46. 46
Federated searching:
benefits for the users
+ Some systems check for each retrieved bibliographic
description if the corresponding full text is immediately
available online and indicate this immediately to the
user, on the fly.
☺
47. 47
Federated searching:
benefits for the users
+ Some systems further process the retrieved results and
display them in an interesting way that is not offered by
the searched original systems.
For instance:
» Clustering of results according to
subject or age or availability of full text
» Displaying the results in a graphical way
☺
49. 49
Federated searching
through scattered databases
[Diagram: several users → federated search engine → several search engines, each on its own database]
50. 50
Federated searching:
difficulties / challenges / problems
- How to provide some useful relevance ranking of search
results/entries,
even when the target databases can be quite different in
type and quality, and
even when no index is created in advance, just-in-case,
well before the search action, like Google and other
Internet search engines do.
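One simple heuristic, shown only as a sketch, rescales the scores reported by each target to a common range before merging. It assumes that every target returns a numeric score per hit, which is often not the case; the source names and scores are invented.

```python
def normalize_scores(hits):
    """Rescale one source's scores to the range 0..1 (min-max normalization)."""
    if not hits:
        return []
    scores = [score for _, score in hits]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [(doc, (score - lo) / span if span else 1.0) for doc, score in hits]

def merge_ranked(result_lists):
    """Merge the hit lists of all sources into one list, best normalized score first."""
    merged = []
    for source, hits in result_lists.items():
        merged.extend((doc, score, source) for doc, score in normalize_scores(hits))
    return sorted(merged, key=lambda item: item[1], reverse=True)

results = {
    "source_a": [("a1", 12.0), ("a2", 6.0), ("a3", 3.0)],  # scores on an open scale
    "source_b": [("b1", 0.9), ("b2", 0.8)],                 # scores between 0 and 1
}
ranking = [doc for doc, _, _ in merge_ranked(results)]
```

The weakness is visible in the example: the best hit of every source ends at the top, whatever the quality of that source.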
51. 51
Federated searching:
difficulties / challenges / problems
- Powerful / sophisticated / refined forms of searching may
not be applicable in a federated search.
Example:
limiting to a particular type of document,
such as a therapy (in medicine).
This may cause a LOSS of time instead of a gain.
52. 52
Federated searching
through scattered databases
[Diagram: several users → federated search engine → several search engines, each on its own database]
53. 53
Federated searching:
difficulties / challenges / problems
- Differences among target sources in the Internet
application protocols that are applied normally,
by default, for connection/communication and retrieval,
such as
»(telnet) HTTP
»proprietary, non-standard protocols
»Z39.50 (= ISO 23950), SRU, and related protocols that were
developed for federated searching!
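To illustrate the last category: with SRU, a search is an ordinary HTTP request with standardized parameters and a query in CQL. The sketch below only builds such a request; the server address is hypothetical, and the index name `dc.title` is an assumption about what the target supports.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def sru_search_url(base_url, cql_query, max_records=10, start_record=1):
    """Build an SRU searchRetrieve request for one target server."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    return base_url + "?" + urlencode(params)

url = sru_search_url(
    "https://sru.example.org/catalog",  # hypothetical SRU server
    'dc.title = "information retrieval"',
)
sent = parse_qs(urlparse(url).query)
```

A federated search engine can send the same request to every target that speaks SRU; targets with proprietary protocols each need their own connector.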
54. 54
Federated searching
through scattered databases
[Diagram: several users → federated search engine → several search engines, each on its own database]
55. 55
Federated searching:
difficulties / challenges / problems
- Various search engines may act in different ways!
For instance:
Is truncation of a word in a search query possible?
Is limitation to a particular field possible?
How can a federated search engine take these differences
into account?
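One possible answer is a table of capabilities per target, used to translate each query and to report what could not be honoured. In this sketch the three engines and their syntaxes are hypothetical.

```python
# Hypothetical capability table for three target search engines.
TARGETS = {
    "engine_a": {"truncation": "*", "field_syntax": "{field}:{term}"},
    "engine_b": {"truncation": "?", "field_syntax": "{term}[{field}]"},
    "engine_c": {"truncation": None, "field_syntax": None},
}

def translate(term, field, truncate, target):
    """Return the query string for one target, plus notes on unsupported features."""
    caps = TARGETS[target]
    notes = []
    if truncate:
        if caps["truncation"]:
            term = term + caps["truncation"]
        else:
            notes.append("truncation not supported")
    if field:
        if caps["field_syntax"]:
            term = caps["field_syntax"].format(field=field, term=term)
        else:
            notes.append("field limit not supported")
    return term, notes

queries = {name: translate("librar", "title", True, name) for name in TARGETS}
```

Showing the notes to the user makes clear why the hits from one target are broader or narrower than those from another.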
56. 56
Federated searching:
difficulties / challenges / problems
- A query with several words and without explicit Boolean
operators can be interpreted in various ways
by the various database retrieval systems.
For instance, the retrieval software may apply the
Boolean operator AND to combine all the query words,
but it may also use OR.
In the case that the federated search system does not take
care of this well, then this may lead to lower recall and
precision.
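A federated search system can avoid this ambiguity by making the operators explicit before the query is sent to the targets. A minimal sketch, assuming that AND is the intended default and that phrases are quoted:

```python
import re

OPERATORS = {"AND", "OR", "NOT"}

def make_explicit(query, default_operator="AND"):
    """Insert default_operator between adjacent terms; quoted phrases stay intact."""
    tokens = re.findall(r'"[^"]*"|\S+', query)
    out = []
    for token in tokens:
        # Only add an operator between two terms that have none between them.
        if out and token not in OPERATORS and out[-1] not in OPERATORS:
            out.append(default_operator)
        out.append(token)
    return " ".join(out)

explicit = make_explicit('federated search "digital library"')
unchanged = make_explicit("cats OR dogs")
```

The sketch does not handle parentheses, and each target may still need its own spelling of the operators.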
57. 57
Federated searching:
difficulties / challenges / problems
- When some special, non-standard, dedicated retrieval
software is made available by a specific target source
database, to offer special features to the user to exploit
the database better than with a standard retrieval
interface,
then the source can probably not be exploited as well by
the federated search system.
Searches are reduced to the lowest common denominator.
58. 58
Federated searching:
difficulties / challenges / problems
- Differences in response time among the target sources.
A slow response of a target source can hinder the final
analysis and presentation of the results to the user.
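A common remedy is to query all targets at the same time and to stop waiting for any target that exceeds a time limit. In the sketch below the targets are simulated with delays; the names, delays and time limit are invented.

```python
import asyncio

async def query_source(name, delay, hits):
    # Stand-in for a network call to one target database.
    await asyncio.sleep(delay)
    return hits

async def federated_search(sources, timeout):
    async def guarded(name, delay, hits):
        try:
            return name, await asyncio.wait_for(query_source(name, delay, hits), timeout)
        except asyncio.TimeoutError:
            return name, None  # too slow: reported as missing, not as empty

    # All targets are queried concurrently, so the wait is bounded by the timeout.
    pairs = await asyncio.gather(*(guarded(*source) for source in sources))
    return dict(pairs)

SOURCES = [
    ("catalog", 0.01, ["rec1", "rec2"]),
    ("slow_archive", 1.0, ["rec3"]),
]
results = asyncio.run(federated_search(SOURCES, timeout=0.2))
```

Marking the slow target as missing, rather than as a target without hits, lets the user decide whether to search it separately.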
59. 59
Federated searching
through scattered databases
[Diagram: several users → federated search engine → several search engines, each on its own database]
60. 60
Federated searching:
difficulties / challenges / problems
- Some databases can NOT be included as a target
database in a federated searching engine,
because their owners/producers do not allow this.
This is an important difficulty, because in this way
interesting / valuable databases are perhaps not exploited
by users who rely on federated searching.
61. 61
Federated searching
through scattered databases
[Diagram: several users → federated search engine → several search engines, each on its own database]
62. 62
Federated searching:
difficulties / challenges / problems
- Users may be less impressed by a federated searching
system than by the simple, common, familiar, famous
Internet / WWW search engines, as response time is in
most cases less impressive, due to differences as follows:
- The computer hardware used by the systems
- Slower distributed searching through several computer
systems, versus faster searching through a more centralised
computer database of a priori compiled records
63. 63
Federated searching:
difficulties / challenges / problems
- The evaluation of the quality of each search result
from a federated search action may be more difficult than
when each database is searched separately,
because the user may be less aware of the limitations,
strengths, selection criteria and aims of the individual,
separate databases that offer each result.
For instance, peer-reviewed articles from reputable scientific
journals may be mixed with more popular and more biased,
unscientific texts from trade literature.
64. 64
Federated searching:
conclusion
Federated searching
- is a continuous challenge
for developers of the sophisticated software and
for the implementers in libraries and information centers
- offers benefits for those end-users
who are not enthusiastic to work with separate target
source databases
- does not eliminate the need for access to individual
databases
65. 65
Hybrid method:
merging data + federated searching
[Diagram: several users → federated search engine → (a) a search engine on an aggregated database, compiled from several databases or web sites, and (b) separate search engines, each on its own database]
66. 66
Information Retrieval in a World
of Scattered Information Sources
2. Applications of methods
for efficient information retrieval
67. 67
Method 1: Merging = aggregating
into a searchable database
[Diagram: several users → one search engine on an aggregated database ← several databases or web sites]
68. 68
Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.
• They can be browsed following a tree structure or a more
complicated variation.
69. Example 69
Internet global subject directories:
Yahoo!: screenshot of home page
70. Example 70
Internet global subject directories:
BUBL LINK
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/ [accessed 2008]
• Accessible free of charge.
• The categories are based on the well-known general
Dewey classification system.
71. Example 71
Internet global subject directories:
dmoz: screenshot of the starting page
72. Example 72
Internet global subject directories:
Librarians' Internet Index: screenshot
73. Example 73
Internet global subject directories:
IPL: screenshot
74. Example 74
Internet global subject directories:
Intute: screenshot
75. 75
Internet indexes:
scheme of the mechanism
[Diagram: a user searching for Internet based information works, through Internet client hardware and software, with the user interface to a search engine. The Internet index search engine searches a database of Internet files, including an index, which an Internet crawler and indexing system builds from Internet information sources.]
76. Example 76
Internet indexes:
Google
• http://www.google.com/
• Available since 2001 with most of its features.
• The most popular search system since 2003.
77. Example 77
Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services,
or more directly from http://scholar.google.com/
78. Example 78
Internet indexes:
Google Scholar: screenshot
79. Example 79
Internet indexes:
Bing
• http://www.bing.com/
• Available in 2009 in beta = test version.
• Replaces
Microsoft Live
as well as
Yahoo Web Search ?
80. Example 80
Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Since 2001.
• Offers not only access to files in html format,
but also to files in PDF.
• Allows you to search for more or less “manually” selected
»scientific WWW pages, plus
»the contents of some scientific, bibliographic databases.
• In the sense that Scirus is dedicated to scientific
information, it is similar to Google Scholar.
81. Example 81
Internet indexes:
Ask
• Available from: http://www.ask.com/
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement = clustering
of search results,
to help the user cope with ambiguity in the meaning of
the search query that was made
82. 82
Internet indexes cover only a part of
the Internet: metaphor
The “visible” part of Internet
The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google Web Search)
83. Example 83
Databases accessible over the Internet:
example: OAISTER
• http://oaister.umdl.umich.edu/
• “Our goal is to create a collection of freely available,
previously difficult-to-access, academically-oriented
digital resources that are easily searchable by anyone.”
84. Example 84
Databases accessible over the Internet:
example: OAISTER
• OAISTER makes searching possible in millions of digital
documents that form part of institutional repositories
all over the world.
• OAISTER covers this kind of document better than
Google Web Search (according to independent academic
investigations in 2006 and 2008).
85. Example 85
Databases accessible over the Internet:
example: scientificcommons
• http://www.scientificcommons.org/
• Since 2007
• Similar to OAISTER:
Allows you to search the full texts in scientific open
access repositories all over the world.
☺
86. Example 86
Databases accessible over the
Internet: example: Medline
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.
☺
87. 87
Current awareness services focusing
on WWW pages: Google Alerts
• Available at http://www.google.com/ and then see the
page with additional services
or more directly from http://www.google.com/alerts/
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored
on their server computer.
88. 88
Internet with WWW
and printed books
• For some years now, the Internet with the WWW has been
the primary information source for many people.
• However:
»A lot of information is still distributed only in the form of
printed books
»The content of old printed books can still be interesting.
»The content of most printed books is (still) not available on
the Internet.
89. 89
Public access book databases:
introduction
• Most general WWW search engines do NOT allow you
to find out about the existence of books that may be
interesting for you, at least not in a systematic and
efficient way.
• So, specific search tools to find books can be useful.
90. 90
Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage.
• Many are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.
• Take into account delivery costs: postage + import tax
91. Examples 91
Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
• This company also offers different, more local
versions that offer books in other languages, such as
http://www.amazon.co.uk/
http://www.amazon.fr/
• note: amazon, NOT amazone
• Subject description is poor.
• Take into account delivery costs: postage + import
tax
92. Examples 92
Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.barnesandnoble.com/ or http://www.bn.com/
93. Examples 93
Book databases accessible free of
charge: examples in U.S.A.
• http://www.completebook.com/cbmsi/bookaction.do
94. Examples 94
Book databases accessible free of
charge: examples in U.S.A.
• http://www.overstock.com/
95. Examples 95
Book databases accessible free of
charge: examples in U.S.A.
• http://www.powells.com/
• Specialised in books only.
96. Examples 96
Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/
• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/
97. 97
Search systems for books that are
made available by dealers
[Diagram: user → book dealer catalog database → descriptions of books & real books for sale]
98. 98
Search systems for books that are
made available by dealers
[Diagram: user → book dealer catalog databases → descriptions of books & real books for sale]
100. 100
Search systems for books that are
made available by dealers
[Diagram: user → multi-dealer database (= merged book dealer databases) → book dealer catalog databases → descriptions of books & real books for sale]
101. 101
Search systems for books that are
made available by dealers
[Diagram: user → multi-dealer databases (= merged book dealer databases) → book dealer catalog databases → descriptions of books & real books for sale]
103. 103
Free public access multi-dealer book
databases: examples
• http://www.abebooks.com/
[accessed 2008]
• http://www.abebooks.fr/
offers a user interface in
French
• Covers > 10 000
bookshops.
• The company has been
acquired by Amazon in
2008.
104. 104
Free public access multi-dealer book
databases: examples
• http://www.alibris.com/
[accessed 2008]
105. 105
Free public access multi-dealer book
databases: examples
• Amazon Marketplace:
http://www.amazon.com/
[accessed 2009]
• In synergy with the online bookshop Amazon on 1
WWW site:
Used books are displayed alongside Amazon’s new
books.
• “the world’s biggest online book bazaar”
• Subject description is poor.
• Take into account delivery costs: postage + tax
107. 107
Free public access multi-dealer book
databases: examples
• http://www.biblio.com/ or http://biblio.com/
[accessed 2008]
108. 108
Free public access multi-dealer book
databases: examples
• http://www.boekenverkoper.nl
[accessed in 2007]
109. 109
Free public access multi-dealer book
databases: examples
• http://www.choosebooks.com/
[accessed 2008]
110. 110
Free public access multi-dealer book
databases: examples
• http://www.tomfolio.com/
[accessed 2008]
111. 111
Full-text databases of books:
introduction
• Some organisations have scanned the contents of
thousands of books,
to make them full-text searchable through the Internet.
112. 112
Full-text databases of books:
Amazon
• http://www.amazon.com/ and choose BOOKS
• Since 2004
• Also incorporated in the search engine A9
113. 113
Full-text databases of books:
Google Book Search
• http://www.books.google
• Since 2005
114. Example 114
Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.
115. Examples 115
Online Public Access Catalogues:
union catalogues: examples
• European National Libraries, catalogues harvested:
http://www.theeuropeanlibrary.org/portal/index.html
116. Examples 116
Online Public Access Catalogues:
union catalogues: examples
• Europeana: documents on European culture.
http://www.europeana.eu/portal/
Metadata are harvested from co-operating organisations.
117. 117
Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies,
but only of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers,
free of charge.
118. Example 118
Online access databases about journal
articles: Ingenta
• Available from: http://www.ingentaconnect.com/
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• The organisation claims to be
“The most comprehensive collection of academic and
professional publications”
119. Example 119
Online access databases about journal
articles: Infotrieve ArticleFinder
• Available from: http://www.infotrieve.com/
• Infotrieve allows you to search free of charge
in a bibliographic database of the articles
of more than 20 000 journal titles and conference
proceedings,
NOT full-text.
• Payment is required to receive the full text of a document.
120. Example 120
Online access databases about journal
articles: Scirus
• The search interface: http://www.scirus.com
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
• This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• Offered free of charge by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher.
121. Example 121
Online access databases about journal
articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (= test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services besides the
normal, classical WWW search.
122. Example 122
Online access databases about journal
articles: DOAJ screenshot
123. Example 123
Online access databases about journal
articles: Eric
• http://ericir.syr.edu/Eric/
• Eric allows searching a bibliographic database of articles
and other documents in the fields of information science
and education.
+ Available in open access, free of charge
- Payment is required to receive the full text of a document.
124. Example 124
Online access databases about journal
articles: LISTA
• http://www.libraryresearch.com/
• Bibliographic database; covers libraries and information
management, with subjects such as librarianship,
classification, cataloging, bibliometrics, online
information retrieval, information management and
more, from more than 600 periodicals plus books,
research reports, and proceedings
• Offered since 2005
• Delivered via the EBSCOhost platform
+ Free of charge
125. Example 125
Online access databases about journal
articles: Teacher Reference Center
• http://www.TeacherReference.com/
• Teacher Reference Center (TRC)
Journal Information for Teachers
allows you to search popular teacher and administrator trade
journals, periodicals, and books
• via the EBSCOhost platform
• since 2006
+ offered free of charge
126. Example 126
Online access databases:
Web of Science
• One of the bibliographic databases in Web of Knowledge
is the Web of Science.
• This is a bibliographic database that covers the articles
published in the most important scientific journals.
Web of Knowledge
Web of Science
127. 127
Finding images on the Internet:
introduction
+ Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
+ When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).
128. Examples 128
Finding images on the Internet:
screen shot of a Google image search
129. Example 129
Finding images on the Internet:
examples of search engines
• http://images.google.com/ !
or through http://www.google.com/
[accessed in 2009]
• The largest database in this category
(at least in 2002…2008).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.
130. Example 130
Finding images on the Internet:
examples of search engines
• http://www.bing.com/
• Available in 2009 in beta = test version.
• Replacing
Microsoft Live and Yahoo Search ?
131. 131
Method 2: Federated searching
through scattered databases
[Diagram: several users → federated search engine → several search engines, each on its own database]
132. 132
Federated searching
through scattered databases: why?
• Applications:
»Finding information in bibliographic databases
»Finding the availability of rooms in various hotels
»Finding flights to a particular destination offered by
various airline companies
»Finding scientific data that are made available by various
computers all over the world
133. Example 133
Federated searching: application:
finding a hotel room in some city
134. Example 134
Federated searching: application:
finding scientific data
• OBIS
= Ocean Biogeographic
Information System
• http://www.iobis.org/
• Gateway to scientific
data on living systems
in the oceans.
• The data reside on
many computers all
over the world.
135. 135
Hybrid method:
merging data + federated searching
[Diagram: several users → federated search engine → (a) a search engine on an aggregated database, compiled from several databases or web sites, and (b) separate search engines, each on its own database]
136. Example 136
Databases accessible over the Internet:
example
• http://WorldWideScience.org/
• “A global science gateway connecting you to national and
international scientific databases and portals.
Accelerates scientific discovery and progress by providing
one-stop searching of global science sources.”
137. 137
Meta WWW search systems
on a server computer in the WWW
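[Diagram: the user's client computer with a WWW client program connects over the Internet to a WWW server computer that runs the meta-search system; this server in turn queries WWW server computers with Internet search systems. Queries go in, results come out.]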
141. 141
Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.
• Vivisimo can accomplish this on the fly,
that is WITHOUT pre-processing the documents before
the search.
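The idea of grouping results under headings at query time can be illustrated with a minimal sketch. It is not Vivisimo's actual algorithm, which is not described in these slides; it is a naive grouping by shared words, and the function name `cluster_results`, the stop-word list and the sample titles are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical, very small stop-word list for the illustration.
STOPWORDS = {"the", "of", "in", "a", "and", "for"}


def cluster_results(titles, query):
    """Group result titles under the non-query words they share."""
    query_words = set(query.lower().split())
    clusters = defaultdict(list)
    for title in titles:
        # Candidate headings: words of the title that are neither
        # stop words nor part of the query itself.
        words = set(title.lower().split()) - STOPWORDS - query_words
        for word in sorted(words):
            clusters[word].append(title)
    # Keep only headings shared by at least two results.
    return {h: ts for h, ts in clusters.items() if len(ts) >= 2}


# Hypothetical result titles retrieved for the query "jaguar".
titles = [
    "jaguar car dealers",
    "jaguar car reviews",
    "jaguar habitat in the rainforest",
    "jaguar cat habitat",
]
result = cluster_results(titles, "jaguar")
```

Nothing is indexed beforehand: the headings ("car" and "habitat" here) are derived only from the results that were just retrieved, which is what "on the fly" means in this context.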
142. Example 142
Meta-search systems: server-based:
example: Clusty
• Adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.
• Can accomplish this on the fly, that is WITHOUT pre-
processing the documents before the search.
143. Example 143
Meta-search systems: server-based:
example: Clusty screenshot in 2006
144. 144
Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is NOT included in most systems.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.
145. 145
Free public access book meta-search
systems: types
We can make the following distinction between various
types of meta-systems for searching:
1. Database resulting from merging several existing
smaller databases = aggregator database
In this case of books:
multi-dealer database = “listing service”
2. Federated search system
= cross-database search system
146. 146
Free public access search systems:
federated search systems
• Each of the searched target databases can be
»a catalogue database managed by the
owner/dealer/shop/seller,
as well as
»a multi-dealer database
147. 147
Search systems for books that are
made available by dealers
User
Book dealer
catalog
database
descriptions of books & real books for sale
148. 148
Search systems for books that are
made available by dealers
User
Book dealer
catalog
databases
descriptions of books & real books for sale
150. 150
Search systems for books that are
made available by dealers
User
Multi-dealer
database
= merged
book dealer
databases
Book dealer
catalog
databases
descriptions of books & real books for sale
151. 151
Search systems for books that are
made available by dealers
User
Multi-dealer
databases
= merged
book dealer
databases
Book dealer
catalog
databases
descriptions of books & real books for sale
152. 152
Search systems for books that are
made available by dealers
User Federated
book search systems
Multi-dealer
databases
= merged
book dealer
databases
Book dealer
catalog
databases
descriptions of books & real books for sale
155. 155
Free public access federated search
systems for books: examples
156. 156
Free public access federated search
systems for books: examples
• http://www.allbookstores.com/ [accessed 2006]
162. 162
Free public access federated search
systems for books: examples
• http://www.dealtime.com/ [accessed 2006]
163. 163
Free public access federated search
systems for books: examples
• http://www.epinions.com/Books [accessed 2006]
164. 164
Free public access federated search
systems for books: examples
• http://www.fetchbook.info/ [accessed 2006]
165. 165
Free public access federated search
systems for books: examples
• http://www.gallileus.info/search/
[accessed 2006]
166. 166
Free public access federated search
systems for books: examples
• http://www.priceminister.com/livres-bd [accessed 2007]
• Can search not only books but also other products in
various shops.
167. 167
Free public access federated search
systems for books: examples
• http://www.usedbooksearch.co.uk/books.htm
[accessed 2008]
• Specialised in used books, not in new books.
168. 168
Free public access federated search
systems for books: examples
• http://www.vialibri.net/ [accessed 2008]
169. 169
Free public access federated search
systems for books are interesting
• Knowledge about their quality is interesting
» for end users as well as for librarians who buy books,
» for librarians who serve their users by performing
searches for books,
» for librarians who propose databases to their users, for
instance on their library WWW site or who want to
include one or several book search engines in their own
local system for federated searching through several
targets in one action.
170. 170
Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries.
171. Example 171
Online Public Access Catalogues:
simultaneous searching: examples
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50
172. 172
Information Retrieval in a World
of Scattered Information Sources
3. Comparison of methods
for efficient information retrieval
173. 173
Method 1: Merging = aggregating
into a searchable database
[Diagram: several users query one search engine on top of an aggregated database, which is built by merging several databases or web sites.]
174. 174
Comparison of methods
for efficient information retrieval
• Merged=aggregated databases react faster than federated
search systems (in most cases).
»Explanation:
They do not need several simultaneous Internet
connections
&
they do not have to merge raw intermediate results into the
result that is finally shown to the user.
☺
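The explanation above can be sketched in a few lines. This is only an illustration under stated assumptions: the three "databases" are in-memory lists, `search_one`, `federated_search` and `aggregated_search` are hypothetical names, and a real federated system would make network calls where this sketch does a local lookup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for three remote databases.
SOURCES = {
    "db_a": ["ocean currents atlas", "hotel guide Oostende"],
    "db_b": ["ocean biogeography data", "flight timetable"],
    "db_c": ["library catalogue of ocean science"],
}


def search_one(name, query):
    """Search a single source; in a real system this is a network call."""
    return [(name, record) for record in SOURCES[name] if query in record]


def federated_search(query):
    """Query every source in parallel, then merge the raw result lists."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        partial = pool.map(lambda name: search_one(name, query), SOURCES)
        merged = [hit for hits in partial for hit in hits]
    return sorted(merged)


# The aggregated database is built once, before any query arrives.
AGGREGATED = sorted(
    (name, rec) for name, recs in SOURCES.items() for rec in recs
)


def aggregated_search(query):
    """One local lookup: no per-query connections and no merge step."""
    return [(name, rec) for name, rec in AGGREGATED if query in rec]
```

Both functions return the same hits for the same query, but the federated version pays for one connection per source plus the merge on every search, while the aggregated version pays that cost once, when the database is built.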
175. 175
Method 2: Federated searching
through scattered databases
[Diagram: several users send their queries to one federated search engine, which passes them on to the separate search engines of several databases.]
176. 176
Hybrid method:
merging data + federated searching
[Diagram: several users query one federated search engine, which searches both the search engine of an aggregated database (merged from several databases or web sites) and the search engines of separate databases.]
177. 177
Comparison of methods
for efficient information retrieval
• Federated search systems offer higher coverage than
direct searching of databases or merged databases
(in most cases).
»Explanation: They can exploit many databases and even
merged=aggregated databases in one search action.
For example, in 1 search, they can cover more than 100
million descriptions of physical books
= couples of book and dealer (not book titles).
☺
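The difference between counting book-dealer couples and counting book titles can be made concrete with a small sketch. The identifiers and dealer names below are hypothetical and only serve the illustration.

```python
# Hypothetical offers returned by a federated book search,
# each given as a (book identifier, dealer) couple.
offers = [
    ("book-1", "dealer_x"),
    ("book-1", "dealer_y"),
    ("book-2", "dealer_x"),
    ("book-1", "dealer_x"),  # the same copy reported by two targets
]

unique_offers = set(offers)                    # distinct book-dealer couples
unique_titles = {book for book, _ in offers}   # distinct books

print(len(unique_offers), len(unique_titles))  # prints "3 2"
```

A coverage figure expressed in couples is therefore always at least as large as the number of distinct titles behind it.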
178. 178
Comparison of methods
for efficient information retrieval
• Federated search systems offer results that are more up
to date than those from an aggregated database,
whose content is (only) a snapshot made in the past.
This is important
when data should be very fresh = up-to-date.
Examples:
booking=reservation systems for flights, hotel rooms
☺
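A minimal sketch of why a snapshot goes stale, assuming a hypothetical hotel availability source; the names `live_rooms`, `snapshot` and `available` are invented for the illustration.

```python
# Hypothetical live source: rooms still available per hotel.
live_rooms = {"hotel_a": 2, "hotel_b": 0}

# An aggregated database holds a copy made at harvest time.
snapshot = dict(live_rooms)

# A booking takes the last rooms after the harvest.
live_rooms["hotel_a"] -= 2


def available(source):
    """Return the hotels that report at least one free room."""
    return sorted(hotel for hotel, rooms in source.items() if rooms > 0)


stale_answer = available(snapshot)    # still lists hotel_a
fresh_answer = available(live_rooms)  # reflects the booking
```

The federated search reads the live source at query time, so it sees the booking; the aggregated copy keeps offering the room until the next harvest.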
180. 180
Conclusions:
2 methods
• A single, simple, standard method = approach = solution
does not (yet) exist.
• Two basic methods are common.
• They have their own
»advantages
and
»disadvantages.
181. 181
Conclusions:
1 dimension
• Up to now we have mainly distinguished between
» Merging records in 1 database on 1 computer
& searching this database
and
» Federated searching, in one action, of databases on
various computers
182. 182
Conclusions:
more dimensions
• However, the location of the databases is only 1 aspect /
dimension of possible methodological approaches.
• Other dimensions / aspects are for instance:
2. Unification / standardization of database record structures
in fields according to a standard,
for better interoperability.
3. Unification / standardization of subject descriptions,
for better interoperability.
• This brings us to 3 aspects / dimensions
so we can visualize this as a cube.
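Dimension 2, the unification of record structures, can be sketched as a mapping from source-specific field names to one common set of fields. The source names, field names and the function `normalize` are assumptions made for this illustration; they do not refer to any particular standard.

```python
# Hypothetical field mappings from two source schemas to one common schema.
FIELD_MAPS = {
    "dealer_db": {"titel": "title", "auteur": "author"},
    "library_db": {"ti": "title", "au": "author"},
}


def normalize(record, source):
    """Rename the fields of a record to the common schema; drop unknown fields."""
    mapping = FIELD_MAPS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


# Two records with different field structures describe the same book.
rec_a = normalize({"titel": "X", "auteur": "Y", "prijs": 10}, "dealer_db")
rec_b = normalize({"ti": "X", "au": "Y"}, "library_db")
```

Once both records use the same fields they can be compared, merged or deduplicated, which is the kind of interoperability a shared standard would provide without a per-source mapping.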
183. 183
Conclusions:
the cube of interoperability
1. One computer
2. One database field structure
3. One subject description system
BEST CASE
Inter-
operability
1. Various computers
2. Various database field structures
3. Various subject description systems
WORST CASE
184. 184
Methods for efficient information
retrieval: conclusions
• For end users, the underlying methods of most
information systems are either
“not clear” (= negative formulation)
or “transparent” (= positive formulation)
185. 185
Methods for
efficient
information
retrieval:
conclusions
• The examples given
show at least that
progress in this field
is impressive.
☺
187. 187
• You are free to copy, distribute, display this work under
the following conditions:
»Attribution:
You must mention the author.
»Noncommercial:
You may not use this work for commercial purposes.
»No Derivative Works:
You may not change, modify, alter, transform, or build
upon this work.
• For any reuse or distribution, you must make clear to
others the license terms of this work.