The first workshop on the "Qatar Digital Library Project" was held at Qatar University on May 20, 2013.
This project is part of the National Priorities Research Program (NPRP) and is funded by the Qatar National Research Fund (QNRF).
The project is managed by Dr. Edward Fox, the Lead Principal Investigator (LPI) from Virginia Tech, and Dr. Mohammed Samaka, the Co-LPI from the Department of Computer Science and Engineering at Qatar University. It also involves digital library experts such as Dr. Lee Giles from Pennsylvania State University and Dr. Richard Furuta from Texas A&M University; consultants Dr. John Impagliazzo from Hofstra University in New York, Dr. Susan Lukesh, Carole Thompson, and Robert Laws; and researchers Myrna Tabet and Asad Nafees from Qatar University, Tarek Kanan from Virginia Tech, and Hamed Alhoori from Texas A&M University.
This workshop is the first in a series of workshops and seminars that present the project and train faculty, students, librarians, and other members of Qatar's digital community who are interested in joining the project and expanding the national collections and services.
More info at http://qdl.qu.edu.qa/
2. Qatar Digital Library (QDL)
Initiative
Workshop #1
Monday, 20 May 2013
08:45 to 14:00
Auditorium Room 117, New Library
Qatar University — Doha, Qatar
4. Qatar Digital Library Project
• Global Explosion of information:
o More than 30,000 peer-reviewed research journals
exist worldwide
o 2.5 million articles published per year
• Knowledge society requires:
o Deep awareness and access to the best content
o Real-time research results to assist and improve
strategic decision-making
5. Qatar Digital Library Project Team
Qatar University, Qatar:
Mohammed Samaka (Ph.D., Co-Lead PI)
Myrna Tabet
Khalid AbualSaud
Asad Nafees
Sumaya Ali S A Al-Maadeed
(Ph.D., Key Investigator)
This project was made possible by NPRP Grant # 4-029-1-007 from
the Qatar National Research Fund (a member of Qatar Foundation).
Virginia Tech, USA:
Edward Fox (Ph.D., Lead-PI)
Tarek Kanan
Penn. State University, USA:
C. Lee Giles (Ph.D., PI)
Texas A&M, USA:
Richard Furuta (Ph.D., PI)
Hamed Alhoori
Consultants:
John Impagliazzo (Ph.D., Key Investigator)
Susan Lukesh (Ph.D.)
Carole Thompson
Robert Laws
6. Project Mission
Transform the use of information in Qatar,
moving toward a knowledge society,
in accord with the Qatar National Vision 2030.
7. Project Objectives/Aims
A. Research and prototype digital library systems and infrastructure for
Qatar, focusing initially on Qatari information related to government
and scholarly activities.
Leverage the crawling engine from Penn State’s SeerSuite software infrastructure, and
extend it beyond its current focus on English to support Arabic-English collections, and
to cover a broad range of scholarly disciplines, and all types of government
information.
B. Research and build the digital library community in Qatar, supporting
digital library use, services, collection development, tailored systems,
and advancing toward a Knowledge Society.
Study scholarly activities, and engage in community building in Qatar, so DLs can be
tailored to specific domains and to the unique needs of Qatar. Through workshops, a
consulting center at the proposed Institute, and collaborative efforts with libraries and
museums in Qatar, we will identify particular needs and uses, and tailor collections,
systems, and services, to lead toward the Qatari Knowledge Society.
8. Welcoming Comments
Dr. Rashid Alammari
Dean, College of Engineering
Qatar University — Doha, Qatar
09:05 – 09:15
9. Acknowledgment
This workshop and presentation is due to
partial support from a grant from the
Qatar National Research Fund (QNRF)
through its
National Priority Research Program (NPRP)
Number 4-029-1-007
09:15 – 09:15
11. Overview of Digital Libraries
Dr. Edward Fox
Department of Computer Science
Virginia Tech — Blacksburg, Virginia USA
Project Lead Principal Investigator
09:15 – 09:40
12. Philosophy & Message
Collaboration
Local
National
Regional, Global
Empowerment
Uploading
Sharing
Open Access
Research
Computing,
Digital libraries,
Info. retrieval, …
Education
DL curriculum
Graduate: ETDs
Ugrad: Ensemble
13. Outline
• Acknowledgements
• Digital Libraries
• NDLTD (electronic theses / dissertations)
• Digital Library Curriculum Project
• Ensemble (Pathway in US NSDL)
• Crisis, Tragedy & Recovery Network (CTRnet)
• Saudi Digital Library - SDL
14. Acknowledgements
• Mentors (Licklider & Kessler 1967-71 MIT, Salton
1978-1983 Cornell)
• QNRF, Qatar University, NSF and other sponsors
• Students, colleagues, co-investigators
• Virginia Tech: Computer Science, Digital Library
Research Lab, Information Technology
• Collaborators on local, national, and international
projects
15. DLs — Objectives in 1991
• World Lit.: 24hr / 7day / from desktop
• Integrated “super” information systems: 5S: Table of related
areas and their coverage
• Ubiquitous, Higher Quality, Lower Cost
• Education, Knowledge Sharing, Discovery
• Disintermediation -> Collaboration
• Universities Reclaim Property
• Interactive Courseware, Student Works
• Scalable, Sustainable, Usable, Useful
16. DL Overview: Why of Global Interest?
• National projects can preserve antiquities and
heritage: cultural, historical, linguistic, scholarly
• Knowledge and information are essential to economic
and technological growth, education
• DL - a domain for international collaboration
o wherein all can contribute and benefit
o which leverages investment in networking
o which provides useful content on Internet & WWW
o which will tie nations and peoples together more strongly
and through deeper understanding
22. Digital Objects (DOs)
• “Born digital”
o Created digitally
• Digitized version of “real” object
o Is the DO version the same, better, or worse?
o Suggestion for documents: structured + rendered
• Surrogate for “real” object
o Scanned versions
o 3D models
23. Institutional Repositories
• “Institutional repositories are digital collections
that capture and preserve the intellectual output of
a single university or a multiple institution
community of colleges and universities.”
• Crow, R. “Institutional repository checklist and
resource guide”, SPARC, Washington, D.C., USA
• www.arl.org/sparc/IR/IR_Guide_v1.pdf
24. Goals of Institutional Repositories
(by Stevan Harnad, University of Southampton)
• Self Archiving of Institutional Research
o Theses and Dissertations (VTLS NDLTD Project)
o Article preprints and post prints
o Internal documents and map
• Management of digital collections
• Preservation of materials – decentralized approach
• Housing of teaching materials
• Electronic publishing of journals, books, posters,
maps, audio, video and other multimedia objects
Adapted from Slide by V. Chachra, VTLS
25. NDLTD: www.ndltd.org
• Networked Digital Library of Theses and
Dissertations (NDLTD)
• N D Ltd or “Noodle TD”
• Vision:
Every thesis and dissertation in the world is:
o Devised to take advantage of the most helpful
electronic publishing methods
o Shared globally and easily found
o Supported by a suite of digital library services to aid
authors, researchers, learners, universities
o Preserved and migrated permanently
26. What are we doing?
• Aiding universities and nations to enhance graduate
education, publishing, preservation (data sets next!),
and Intellectual Property Rights efforts
• Helping improve the availability and content of
(electronic) theses and dissertations (ETDs)
• Educating ALL future scholars so they can publish
electronically and effectively use digital libraries (i.e.,
are Information Literate and can be more expressive)
http://curric.dlib.vt.edu/wiki/index.php/ETD_Guide
27. Why ETD? Short Answer
• For Students:
o Gain knowledge and skills for the Information Age
o Richer communication
(digital information, multimedia, …)
• For Universities:
o Easy way to enter the digital library field and
benefit thereby
• For the World:
o Global digital library – large, useful, many services
• General:
o Save time and money
o Increased visibility for all associated with research
results
29. Curriculum Module Template
1. Module name
2. Scope
3. Learning objectives
4. 5S characteristics of the module
(streams, structures, spaces, scenarios,
society)
5. Level of effort required (in-class and
out-of-class time required for students)
6. Relationships with other modules
(flow between modules)
7. Prerequisite knowledge/skills
required (what the students need to
know prior to beginning the module;
completion optional; complete only if
prerequisite knowledge/skills are not
included in other modules)
8. Introductory remedial instruction
(the body of knowledge to be taught
for the prerequisite knowledge/skills
required; completion optional)
9. Body of knowledge (theory +
practice; an outline that could be used
as the basis for class lectures)
10. Resources (required readings for
students; additional suggested readings
for instructor and students)
11. Exercises / Learning activities
12. Evaluation of learning objective
achievement (graded exercises or
assignments)
13. Glossary
14. Additional useful links
15. Contributors (authors of module,
reviewers of module)
36. Ensemble, Pathway in NSDL
• National STEM (science, technology,
engineering, and mathematics)
education Digital Library – NSDL
• National Science Digital Library
• www.nsdl.org
• Many projects, largest now called
…… Pathways
37. NSDL Connects
Users: students, educators, life-long learners
Content: structured learning materials;
large real-time or archived datasets;
audio, images, animations;
primary sources;
digital learning objects (e.g. applets);
interactive (virtual, remote) laboratories; ...
Tools: search; refer; validate; integrate; create;
customize; publish; share; notify; collaborate; ...
This slide
from Lee Zia
39. Crisis, Tragedy, and Recovery (CTR)
• Human tragedies that result from man-made
and natural events affect humans and
communities significantly.
• During and after a tragic event, there is a
series of needs that have to be addressed.
o Compounded by communication failures and a
confusing plethora of data and information
42. Saudi Digital Library - about SDL
• … wide spreading of scientific blocks or groupings
• … linking between academic and research communities.
• … supporting these scientific groupings at the national level,
where it provides
o sophisticated information services
o … digital information resources in various forms,
o … accessible to faculty staff, researchers and students
o …
43. Saudi Digital Library - about SDL
• The digital library includes:
o The largest gathering of e-books in the Arab world.
More than 100,000 e-books in full text in various scientific
specializations
o More than 300 global publishers
such as Elsevier, Springer, Pearson, Wiley, Taylor & Francis,
McGraw-Hill, Yale University, Oxford University, Harvard
University
46. Project Aims / Non-Textual Content
Dr. John Impagliazzo
Emeritus, Computer Science Department
Hofstra University — Hempstead, New York USA
Project Consultant and Key Investigator
09:40 – 09:50
Additional Content Contributions by
Carole Thompson and Robert Laws
Project Consultants
47. Project Aims (1 of 5)
Aim #1
Research and prototype digital library systems
and infrastructure for Qatar, focusing initially on
Qatari information related to government and
scholarly activities.
Aim #2
Research and build the digital library community
in Qatar, supporting digital library use, services,
collection development, tailored systems, and
advancing toward a knowledge society.
48. Project Aims (2 of 5)
Regarding Aim 1:
• Leverage Penn State’s SeerSuite software infrastructure
• Implement novel advanced systems on the proposed
equipment
• Extend SeerSuite beyond its current focus on English to
support Arabic-English collections and cross-language
discovery
• Extend the effort to cover a broad range of scholarly
disciplines: computing, chemistry, …
• Support all types of government information
49. Project Aims (3 of 5)
Regarding Aim 1 (continued):
• Demonstrate how deep analysis of digital objects and
collections provides superior capabilities beyond those in
commercial systems
• Obtain pages, reports, and other information from all
branches of the government through websites as well as
other databases and other accessible online venues
• Collect information related to education and museums
• Focus on automatic or semi-automatic collection
development
• Cover key aspects of Qatari information currently available
50. Project Aims (4 of 5)
Regarding Aim 2:
• Study scholarly activities (by surveys and at your locations)
• Identify particular needs and uses
• Tailor DL content to specific domains and to the unique
needs of Qatar
• Establish a consulting center at the QDL Institute
• Collaborate in efforts with libraries and museums in Qatar
• Engage in community building in Qatar (join us!)
51. Project Aims (5 of 5)
Regarding Aim 2 (continued):
• Tailor collections, systems, and services to lead toward the
Qatari Knowledge Society
• Extend work on social networks to collect and utilize data,
allowing personalized as well as group and agency tailoring.
• Include key communities such as citizens, educators,
scholars, and students
• Partner with the new digital librarians to add other
collections, especially covering Qatari culture and heritage
• Identify collections with key metadata
52. The Need to Safeguard Culture
Maulana Khan (2001)
Doha Conference of Ulama on Islam and Cultural Heritage
“Every group or community has its own
particular culture and has the absolute
right to safeguard that culture”
Maulana Khan. Proceedings of the Doha Conference of ‘Ulama on Islam and Cultural Heritage.
Doha, Qatar. December 30–31, 2001. p. 66 New York: UNESCO, 2005. Retrieved Nov. 15, 2010 from
http://unesdoc.unesco.org/images/0014/001408/140834m.pdf
53. Non-Textual Collections (1 of 2)
• Initial focus of QDL on automatic or semi-automatic collection
development
• Project also includes supplemental collections consisting of non-
automatic material.
• Sampling would complete the spectrum of data collection useful
in supporting research and the long-term interests of Qatar.
• Project also seeks to develop a collection that demonstrates the
various aspects of content preservation
• Research effort seeks to explore non-text artifacts to:
o Enhance the textual research and collection
o Preserve the heritage and culture of the country and its people
o Offer a basis for related research
54. Non-Textual Collections (2 of 2)
• Examples of the sampling include areas such as:
o Qatari Art, Literature, Music
o Qatari Sports, Politics
o Qatari Education (historical and contemporary perspectives)
o Qatari Museum Collections
• Appropriate metadata would:
o Document each item
o Serve as examples of the varying types of description
o Serve as a basis to build new digital libraries in Qatar
• Sampling of some of these materials would:
o Serve the people of Qatar
o Become a model for other efforts
o Demonstrate the potential for future research
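As an illustration of what "documenting each item" with metadata might look like, the sketch below describes one sample item using Dublin Core element names. The slides do not name a metadata schema, so the choice of Dublin Core is an assumption, and the record itself and the helper `unknown_fields` are hypothetical.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def unknown_fields(record):
    """Return the field names in a record that are not Dublin Core elements."""
    return sorted(set(record) - DC_ELEMENTS)

# Hypothetical record for a non-textual item; all values are illustrative only.
record = {
    "title": "Camel racing photograph (hypothetical item)",
    "type": "Image",
    "format": "image/jpeg",
    "language": "ar",
    "subject": ["Sports", "Camel racing"],
}

print(unknown_fields(record))
```

A check like this is one simple way to keep descriptions consistent across collections that are catalogued by different partners.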
55. Non-Textual Examples (1 of 3)
• Images of landmarks in Qatar
o Buildings
o Mosques
• Oral Histories
o Over 2000 CDs
o Need to document them
o Make them available for preservation
• Literature
o Ministry of Culture, Arts, and Heritage (MoC) making an effort
o http://www.moc.gov.qa/English/Authors/Pages/default.aspx
Mubarak Al Malik
56. Non-Textual Examples (2 of 3)
• Qatari Art
o Qatari Artist Directory
o Names of Artists
o Examples of their Work
• Work of Dr. Wafa Al-Hamad
• Exhibits now at QU, New Library Exhibition Hall
o Need to make examples of these collections
• Sports in Qatar
o Document exhibits and activities of historical interest
o Falconry
o Camel Racing
o Horse racing
Talal Nayef Al Qasim
57. Non-Textual Examples (3 of 3)
• Music of Qatar
o Qatari Musical Band Concert images (MoC)
o Qatar Philharmonic Orchestra
o Document the Development of Music in Qatar
o Information from the QF Music Academy at Katara
• Education
o Qatar University (State University)
o Hamad bin Khalifa University (Education City)
• Media Collections
o Al Jazeera
o Newspapers
58. QDL and Librarian / Corporate /
Government Perspectives
Myrna Tabet
Library Services
Qatar University — Doha, Qatar
Project Research Associate
09:50 – 10:00
59. Qatar Digital Library (QDL)
• Importance, Benefits, and Content
• Preservation of Culture and History of Qatar
• Scholarly, Governmental, Institutional, and
Corporate Viewpoints
60. Importance of the QDL Project (1 of 2)
• Users have:
o Become sophisticated and adept at using technology
o Higher expectations of service provided by libraries
• Role of (digital) librarians:
o Changing as a consequence of the digital shift
o New ways of providing information
61. Importance of the QDL Project (2 of 2)
• Digital libraries can:
o Assist in the transformation of data into information
o Help in building a knowledge-based society
• Governments aim to:
o Provide access to relevant information for their citizens
• Nationwide digital library community can work
together toward these goals
62. DL Benefits to Library Users (1 of 2)
• Vastly more information at your fingertips
• Access 24/7, from anywhere, anytime
• Rapidly updated: Current + Historical information
• Information sharing and collaboration
63. DL Benefits to Library Users (2 of 2)
• New forms of access
o Multilingual / multimedia
o Hypermedia (Linked data, Text, Images, Audio, Video)
• Improved preservation with:
o Metadata
o Information exchange protocols
64. Significance to Librarians, Corporations,
and Governmental Agencies (1 of 2)
• The need to preserve cultural and historical heritage =>
o Collections of fragile and precious artifacts =>
o Libraries, museums, and archives developing digital
collections =>
o Users from all over the world accessing and studying
• SeerSuite (crawler, search engine)
o Collect from the Web and from curated collection of artifacts
o Images, audio, and text for browsing and searching
o Extractions of tables and references / citations
o Machine learning and artificial intelligence (AI) – beyond
commercial options
65. Significance to Librarians, Corporations,
and Governmental Agencies (2 of 2)
• A one stop search of:
o Information about Qatar
o Information to preserve the culture of Qatar
• Indexing, analysis, and retrieval of:
o Resources, reports, statistics, and other types of
information
o Information in the Arabic language as well as in English
66. Available Content (1 of 2)
• Materials captured:
o Local scholarly, cultural
o Governmental documents
• Metadata, data, and many types of documents
(including full text)
• Free and open as well:
o Freely accessible for anyone to use
o Available for authorized users due to:
• Licenses
• Cost issues
67. Available Content (2 of 2)
• Main resources:
o First appeared in digital form
o Often referred to as being ‘born’ digital
• At a later stage the project will include:
o Digital versions of material already existing in print
o Multimedia (image, audio, video) forms
• Surveys and studies will:
o Guide collection strategies and priorities
o Satisfy the needs of the Qatari community
68. Selected Digital Library References
Lesk, M. (2005).
Understanding digital libraries, 2nd ed.
San Francisco, CA: Morgan Kaufmann.
Tedd, L. & Large, A. (2005).
Digital libraries: Principles and practice in a global
environment.
München: K.G. Saur.
Witten, I., Bainbridge, D. & Nichols, D. (2010).
How to build a digital library, 2nd ed.
Burlington, MA: Morgan Kaufmann, Elsevier.
69. QDL and Researcher Perspectives
Hamed Alhoori
Department of Computer Science & Engineering
Texas A&M University — College Station, Texas USA
Project Research Associate
10:00 – 10:10
71. Introduction – Objective
Understand and support the dynamic information
needs, information-seeking behavior, information
use, and other scholarly activities of researchers,
scientists, engineers, scholars and students in Qatar.
72. Introduction - Research Questions
• How do researchers currently search, select, and
manage their information sources?
• What difficulties are researchers facing during the
literature review process?
• How have social reference management and
recommendation systems, used in scholarly
communities, influenced the research process?
• How can scientific impact be measured better for each
discipline using multi-dimensional metrics?
• What are the current scholarly research needs?
73. Related Studies
• New patterns of searching (Hallmark, J., 2004).
• Difficulty locating information (George, C., 2006).
• Not aware of or familiar with some of the services
and do not consult librarians (Kuruppu, P.U., 2006).
• Limitations
74. Methodology
• Qualitative and quantitative research methods
• Statistical hypothesis testing techniques
75. Initial Results — 1
• (Alhoori, et al., 2011) - acceptance rate 9%
• 164 researchers participated in the study
o 25 faculty members, 5 postdocs, 84 doctoral students,
28 master's students, 22 undergraduate students.
o 131 male and 33 female
o Participants were from 13 different disciplines from
Texas A&M University – College Station
76. Initial Results — 2
• Differences in reading habits
• Difficulties locating their needs
• Getting lost
• Repeated results
• Printing articles and using folders to organize
• Notes
• Research updates
• Social reference management - lack of awareness,
accuracy
77. Initial Results — 3
• Saving methods
• Significant relationship between
o Saving methods and collaboration
o Saving methods and retrieving articles
• Researchers’ satisfaction
• Search differences
• Research interests
• Publication overload (78%)
78. References
• Boote, D.N., Beile, P. Scholars Before Researchers: On the Centrality of the
Dissertation Literature Review in Research Preparation. Educational Researcher.
34, 3-15 (2005).
• Hallmark, J. Access and Retrieval of Recent Journal Articles: A Comparative
Study of Chemists and Geoscientists. Issues in Science and Technology
Librarianship. 40 (2004).
• George, C., Bright, A., Hurlbert, T., Linke, E.C., St. Clair, G., Stein, J. Scholarly use
of information: graduate students’ information seeking behaviour. Information
Research. 11, 1-19 (2006).
• Kuruppu, P.U., Gruber, A.M. Understanding the Information Needs of Academic
Scholars in Agricultural and Biological Sciences. The Journal of Academic
Librarianship. 32, 609-623 (2006).
• Alhoori, H., Furuta, R. Understanding the dynamic scholarly research needs
and behavior as applied to social reference management. International
Conference on Theory and Practice of Digital Libraries, TPDL 2011.
81. Web Archiving, Crawling,
and SeerSuite
Dr. Edward Fox and Tarek Kanan
Department of Computer Science
Virginia Tech — Blacksburg, Virginia USA
Project Lead Principal Investigator
Project Research Associate
10:40 – 11:15
82. Web Archiving and Web Crawling
Dr. Edward Fox
Department of Computer Science
Virginia Tech — Blacksburg, Virginia USA
Project Lead Principal Investigator
10:40 – 11:05
83. Archiving by Assembling
Content + Metadata
• Collect digital objects
o Digitize / purchase / obtain submissions
o Catalog each => metadata record
• Aggregate into a metadata catalog
o Searchable
o Browsable
o Usually free of intellectual property rights concerns
o Can be shared through the Open Archives Initiative (OAI)
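As a rough illustration of sharing a metadata catalog through OAI, the sketch below builds a harvesting request in the style of the OAI Protocol for Metadata Harvesting (OAI-PMH). The repository address and the helper name `list_records_url` are hypothetical placeholders, not a QDL endpoint or API; the `ListRecords` verb and the `oai_dc` metadata prefix are standard parts of the protocol.

```python
from urllib.parse import urlencode

# Hypothetical repository address, used only for illustration.
BASE_URL = "http://repository.example.org/oai"

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    return base_url + "?" + urlencode(params)

print(list_records_url(BASE_URL))
# → http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

A harvester would send this request over HTTP and parse the XML response; that step is left out here to keep the sketch self-contained.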
84. OAI = Technical Umbrella for
Practical Interoperability…
…that can be exploited by different
communities
• Reference Libraries
• Publishers
• E-Print Archives
• Museums
86. Web Archiving
• Introduction: Web archiving is the process of
o gathering up data recorded on the World Wide Web,
o storing it,
o ensuring the data is preserved in an archive, and
o making the collected data available for future
research.
• The Internet Archive and several national libraries
initiated Web archiving practices in 1996.
88. Web Archiving
• 2001: International Web Archiving Workshop (IWAW):
o share experiences and exchange ideas.
• 2003: International Internet Preservation Consortium
(IIPC):
o international collaboration in developing standards and open source
tools for the creation of Web archives.
• Tools: Heritrix, Memento, SiteStory
• Web growth =>
o concern with change and loss =>
o local and national Web archiving initiatives
89. Web Crawlers
• A Web crawler is an Internet bot that systematically
browses the World Wide Web, typically for the purpose
of Web indexing.
• A Web crawler also may be called a Web spider, an ant,
or an automatic indexer.
• Web search engines and some other sites use Web
crawling or spidering software to update their Web
content or indexes of other sites’ Web content.
• Web crawlers can copy all the pages they visit for later
processing by a search engine that indexes the
downloaded pages so that users can search them much
more quickly.
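To illustrate why indexing downloaded pages makes searching quicker, here is a minimal sketch of an inverted index. It is a toy under simple assumptions (whitespace tokenization, exact word matching), not how any particular search engine works; the page URLs and texts are hypothetical.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical downloaded pages (URL -> extracted text).
pages = {
    "http://example.org/a": "Qatar digital library workshop",
    "http://example.org/b": "digital heritage of Qatar",
}
index = build_index(pages)
print(sorted(search(index, "qatar digital")))
```

Because the index is built once, each query only looks up a few word entries instead of rescanning every downloaded page.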
90. Web Crawler
• A Web crawler starts with a list of URLs to visit,
called the seeds.
• On those pages, it identifies all the hyperlinks,
• adds them to the list of URLs to visit, and
• recursively visits the pages pointed to,
• according to a set of policies.
• Prioritizes its downloads – some pages change often.
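The steps above can be sketched as a short loop. This is a minimal illustration, not SeerSuite's crawler: it uses a queue instead of literal recursion, the only "policy" is a page limit, it does not prioritize frequently changing pages, and a hypothetical in-memory link graph (`LINKS`) stands in for real page fetching and HTML parsing.

```python
from collections import deque

def crawl(seeds, get_links, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs."""
    frontier = deque(seeds)  # URLs waiting to be visited
    seen = set(seeds)        # URLs already queued, to avoid repeats
    visited = []             # order in which pages were visited
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

# Hypothetical link graph standing in for HTTP fetching and link extraction.
LINKS = {
    "seed": ["a", "b"],
    "a": ["b", "c"],
    "b": ["seed"],
    "c": [],
}
print(crawl(["seed"], lambda url: LINKS.get(url, [])))
# → ['seed', 'a', 'b', 'c']
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, as "b" links back to "seed" here.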
91. Web Crawlers
Difficulties and Limitations
• Technical challenges of Web archiving
• Intellectual property laws.
• Peter Lyman states that "although the Web is popularly
regarded as a public domain resource, it is copyrighted;
thus, archivists have no legal right to copy the Web".
• However, national libraries in many countries have a
legal right to copy portions of the Web under an
extension of a requirement for legal deposit.
92. Web Crawlers
Difficulties and Limitations
• Removal requests: implemented by archives such as
WebCite, the Internet Archive, or Internet Memory
• Other Web archives are only accessible from
certain locations or have regulated usage.
• WebCite cites a recent lawsuit against Google's
caching, which Google won.
93. Focused Crawlers
• For a particular topic or event
• to build a Web collection focused in that area
• Start with URLs of interest, viewed as seeds to grow from
• Expand in a ‘smart’ way to get all and only what is relevant
• Use information retrieval / artificial intelligence / machine
learning
o Require ‘knowledge bases’ and/or human training examples
• Nevertheless, there is a tradeoff between the resulting
o Recall (i.e., coverage of what is out there)
o Precision (i.e., freedom from noise in what is collected)
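The two measures can be made concrete with a small calculation. The sketch below assumes we already know which pages are truly relevant, which in practice requires human judgment; the page names are hypothetical.

```python
def precision_recall(collected, relevant):
    """Compute precision and recall of a crawled collection."""
    collected, relevant = set(collected), set(relevant)
    hits = collected & relevant  # relevant pages that were actually collected
    precision = len(hits) / len(collected) if collected else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical crawl: 4 pages collected, 3 pages truly relevant, 2 in common.
collected = ["p1", "p2", "p3", "p4"]
relevant = ["p1", "p2", "p5"]
precision, recall = precision_recall(collected, relevant)
print(precision, recall)
```

Here half of what was collected is relevant (precision 0.5) and two of the three relevant pages were found (recall about 0.67). Crawling more broadly would tend to raise recall while adding noise, which lowers precision.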
94. SeerSuite
Tarek Kanan
Department of Computer Science
Virginia Tech — Blacksburg, Virginia USA
Project Research Associate
11:05 – 11:15
95. SeerSuite (1 of 4)
• Prof. C. Lee Giles, QDL Principal Investigator
o Created CiteSeer in 1997
o Associates: Steve Lawrence and Kurt Bollacker at the
NEC Research Institute (now NEC Labs) in Princeton,
New Jersey, USA
o Now at Penn State University, he continues to lead
SeerSuite into the “Next Generation”
• http://citeseerx.sourceforge.net/
96. SeerSuite (2 of 4)
• What is SeerSuite?
• SeerSuite built by…
• SeerSuite supports…
97. SeerSuite (3 of 4)
• SeerSuite was designed to provide a
framework that would replace CiteSeer
• SeerSuite improves on aspects of the
original CiteSeer, with features such as:
o Reliability
o Robustness
o Scalability
98. SeerSuite (4 of 4)
• The motivation behind SeerSuite
• SeerSuite design
• SeerSuite enables access to:
o Extensive document collections
o Citations
o Author metadata
99. SeerSuite - CiteSeerX
• CiteSeerX shares several components with
digital libraries and search engines
o Web Interface
o Crawlers
o Index
o Databases
100. SeerSuite - CiteSeerX
• Domain specific repositories and digital
library systems
o arXiv for physics
o RePEc for economics
o Greenstone
• CiteSeerX
o Closest in design to Google Scholar
104. SeerSuite - MyCiteSeerX
• SeerSuite improves users’ information access.
• MyCiteSeerX roles….
• MyCiteSeerX allows users to store
o Queries
o Document portfolios
o Tag documents
o Monitor and track documents of interest
• MyCiteSeerX needs:
o Registration
o Login
110. Site Objectives
• Provide in-depth information on everything related to the
“QDL Project”. This includes objectives, progress, data sets,
presentation slides, and interim reports.
• Raise awareness and provide material on concepts related
to digital libraries, associated systems, and processes.
• Create an online identity for the proposed institute with
information on activities and opportunities to participate.
• Collect feedback and data through surveys and opinion
polls.
111. Primary Audience
• Groups of visitors making up the project’s primary
audience will be the main focus of our site:
o Librarians and libraries in Qatar
o Researchers and academics
o Government organizations
o Non-Governmental organizations
(such as http://www.fsd.org.qa/)
112. Secondary Audience
• Groups of visitors making up the secondary
audience – these are important but not critical:
o University / School Students
o Teachers / Faculty
o Managers
o Qatari citizens
o Other stakeholders
113. Content
(Published or Under Development)
• Information and details about the project.
• A list of digital libraries currently available in Qatar.
• Information on the digital library systems available
so they can set up collections for their end users.
• Searchable database of example digital libraries
around the world - in English, Arabic, or other
languages, filtered by topic or other criteria.
• Lessons and tutorials on how to define and create
collections.
114. Content
(Published or Under Development)
• A ‘Try it now’ function that would allow an end user to go through the
steps of setting up a digital library, creating collections, and uploading
electronic artifacts.
• An online forum for librarians allowing for an open discussion of issues
and ideas that can be used to help organize collections and artifacts.
• Information page on how content in digital library research archives can
• increase their visibility around the world,
• put them in contact with other researchers and collaborators, and
• improve their funding opportunities.
• Examples and links to peer sites in other countries where government
information is successfully archived and accessed from a digital library
115. The Survey
• We have a survey
• Responses to this survey will help us tailor our
efforts to the resources and needs of Qatar.
• Provide your contact information; we will keep you
informed about the project’s progress over time,
and workshops that are offered.
• English:
http://qdl.qu.edu.qa/content/survey
• Arabic:
http://qdl.qu.edu.qa/ar/content/survey
116. Join the Mailing List
• Keep up to date with
information about our
project by joining our
mailing list.
• Submit your email and
we'll keep you informed
about our new
initiatives, upcoming
events, seminars and
news.
117. Audience Participation
Dr. John Impagliazzo
Emeritus, Computer Science Department
Hofstra University — Hempstead, New York USA
Project Consultant and Key Investigator
11:25 – 11:45
120. Global Perspective
and QDL Next Steps
Dr. Edward Fox
Department of Computer Science
Virginia Tech — Blacksburg, Virginia USA
Project Lead Principal Investigator
11:45 – 11:55
121. World Digital Library
• Free and growing Internet-available collection of significant
cultural treasures from many countries and cultures
• Photographs, books, manuscripts, maps, audio recordings,
and films
• Searchable in 7 languages: Arabic, English, French, Russian,
Chinese, Spanish and Portuguese
o With an easy-to-use interface that enables browsing by place, time,
topic, type of item, and contributing institution, or open-ended search.
122. World Digital Library
and Qatar National Library
• The World Digital Library -- for use by students,
scholars, and members of the public.
• UNESCO program led by the US Library of Congress in
partnership with libraries all over the world.
• The National Library Qatar Foundation is one of a
number of Library members from Arabic/Islamic
countries that contribute content.
• The Qatar Foundation is a major financial sponsor of
the World Digital Library.
123. World Digital Library
and British Library
• Qatar Foundation Qatar National Library
o partnership with the British Library to make Arabic
science and Gulf history available for worldwide
research
• Began in July 2012
o to digitize more than half a million pages of historic documents
detailing Arab history and culture.
124. World Digital Library
and British Library
• The three-year project
o to transform people’s understanding of the history of
the Middle East
o material from the UK’s India Office archive +
o medieval Arabic manuscripts on science and medicine.
• At the British Library, in close cooperation with the
Qatar National Library.
125. QDL Focus
Community in Qatar
• Identify interested stakeholders, to tailor to needs
• Train next generation of digital librarians, archivists,
and curators
• Partners helping with additional collection
development
Advanced Technology for Enhanced Access
• “Low hanging fruit” by crawling Qatar-related Web
• Improved analysis (citations, tables, chemicals, …)
• Support for both Arabic and English
126. Closing Remarks
Dr. Mohammed Samaka
College of Engineering
Qatar University — Doha, Qatar
Project Co-Lead Principal Investigator
11:55 – 12:00