COLLECTION METHODS




      Dr. Essam Obaid
COLLECTION METHODS
     WWW = Interplay between = Web Client + Web Server

A web server stores content, such as HTML pages or images, which it
delivers (serves) to a web browser in response to requests from that
browser.
A web browser requests content from a web server and then presents
the received content to the user.
Mechanism of Interaction
The protocol defines the standard format for communication between
the server and the browser.

Example:
The most commonly used protocol on the web is HTTP (Hypertext
Transfer Protocol). When a browser sends a request to a web server,
that request takes the form of an HTTP message, and the server's
reply is an HTTP message in the same format.
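
For illustration, a minimal HTTP exchange might look like the following
sketch (the host and page are hypothetical placeholders):

   GET /products/new.html HTTP/1.1      <- request line: method, path, version
   Host: www.mysite.com

   HTTP/1.1 200 OK                      <- status line of the server's reply
   Content-Type: text/html

   <html> ... the requested page ... </html>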
URL (Uniform Resource Locator)

All content on a web server is identified by a uniform resource
locator (URL): a reference which describes where the content is
located on the web.
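
As a small sketch, the parts of a URL can be separated with Python's
standard library; the URL reuses the hypothetical example that appears
later in these slides:

   from urllib.parse import urlparse

   parts = urlparse("http://www.mysite.com/products/new.html")
   print(parts.scheme)   # 'http'                -> the protocol to use
   print(parts.netloc)   # 'www.mysite.com'      -> the web server to contact
   print(parts.path)     # '/products/new.html'  -> the content's location on that server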
Fundamental Categories of Collection

There are two types of collection technique:

1- Content driven collection methods
2- Event driven collection methods
Content Driven Collection Methods

     Seek to archive the underlying content of the website.

        Event Driven Collection Methods
            Collect the actual transactions that occur.


Further distinctions can be made based on the source from which
the content is collected. It can be archived from either the:
1- Web Server (Server Side Collection)
2- Web Browser (Client Side Collection)
Applicability of Approach

Depends upon the type of website:

- Dynamic Websites
- Static Websites
Static Websites

A static website consists of a series of pre-existing web pages,
each of which is linked to from at least one other page. Each web
page is typically composed of one or more individual elements.

The structure is contained within the HTML documents, which
contain hyperlinks to other elements, such as images and other
pages.

All elements of the website can be stored in a hierarchical folder
structure on the web server, and the URL describes the location
of each element within that structure.
Form of URL
The target of a hyperlink is normally specified in the “HREF”
attribute of an HTML element and defines the URL of the target
resource.
The form of the URL may be absolute or relative, as the following
examples illustrate.
Absolute and Relative

    An absolute URL specifies a fully qualified domain name and path:

<A href="http://www.mysite.com/products/new.html">New Products</A>

A relative URL specifies only the path name relative to the source object:

<A href="new.html">New Products</A>
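
As an illustrative sketch, a browser (or an archiving tool) resolves a
relative URL against the URL of the page that contains it; Python's
standard library applies the same rule (the source page is hypothetical):

   from urllib.parse import urljoin

   base = "http://www.mysite.com/products/index.html"   # hypothetical source page
   print(urljoin(base, "new.html"))
   # -> http://www.mysite.com/products/new.html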
Dynamic Websites

In a dynamic website the pages are generated from smaller
elements of content. When a request is received, the required
elements are assembled into a web page and delivered. Types of
dynamic content are:

-   Databases
-   Syndicated Content
-   Scripts
-   Personalization
Databases
The content used to create web pages is often stored in a database,
such as a Content Management System, and dynamically assembled
into web pages.
                              Scripts

Scripts may be used to generate dynamic content, responding
differently depending on the values of certain variables, such as the
date, the type of browser making the request, or the identity of the user.
Syndicated Content
 A website may include content which is drawn from external
 sources, such as pop-ups or RSS feeds, and then dynamically
 inserted into its web pages.

                      Personalization
Many websites make increasing use of personalization to deliver
content which is customized to an individual user.

Example : Cookies may be used to store information on a user’s
         computer, and are returned by the browser whenever
         it makes a request to that website.
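
A hedged sketch of that exchange (the header values are hypothetical):

   HTTP/1.1 200 OK
   Set-Cookie: userid=abc123        <- the server asks the browser to store a value

   GET /index.html HTTP/1.1
   Cookie: userid=abc123            <- the browser returns it on later requests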
Depending on the nature of a dynamic website, these virtual pages
may be linked to from other pages or may only be available
through searching. Websites may contain both static and dynamic
elements.

Example:
The home page, and other pages that change only infrequently,
may be static, whereas pages that are updated on a regular basis,
such as a product catalogue, may be dynamic.
The Matrix of Collection Method

The range of possible methods for collecting web content is dictated by these
considerations. Four alternative collection methods are currently available.

Table 4.1                 The Matrix of Collection Methods
                          Content Driven                  Event Driven

 Client Side        Remote Harvesting             No method available

 Server Side        Direct Transfer               Transactional Archiving
                    Database Archiving
Direct Transfer

The simplest method of collecting web resources is to acquire a
copy of the data directly from the original source. This approach,
which requires direct access to the host web server, and therefore
the co-operation of the website owner, involves copying the
selected resources from the web server and transferring them to
the collecting institution, either on removable media such as CD,
or online using e-mail or FTP.
Direct Transfer

Direct transfer is most suited to static websites which comprise
only HTML documents and other objects stored in a hierarchical
folder structure on the web server. The whole website, or a part
of it, can be acquired simply by copying the relevant files and
folders to the collecting institution's storage system.
The copied website will function in precisely the same way as the
original, but with two limitations:
- The hyperlinks must be relative, not absolute.
- Any functionality provided by software on the original website,
   such as a search engine, will no longer be operable unless the
   appropriate software is installed in the new environment.
Strengths

The principal advantage of the direct transfer method is that it
potentially offers the most authentic rendition of the collected
website. By collecting from source, it is possible to ensure that
the complete content is captured with its original structure. In
effect, the collecting institution re-hosts a complete copy of the
original website. The degree of authenticity which it is possible
to recreate will depend upon the complexity of the technical
dependencies, and the extent to which the collecting institution is
capable of reproducing them.
Limitations

The major limitations of this approach are:
- The resources required to effect each transfer, and the
  sustainability of the supporting technologies.
- The method requires co-operation on the part of the website
  owner, to provide both the data and the necessary
  documentation.
Go through the
Case Study: Bristol Royal Infirmary Inquiry
              (see page 48 of the book)
Database Archiving
The increasing use of web databases has made the development of
new web archiving tools a priority, and such tools are now
beginning to appear. The process of archiving database-driven
sites involves three stages:

1- The repository defines a standard data model and format for
   archived databases.
2- Each source database is converted to that standard format.
3- A standard access interface is provided to the archived
   databases.
Database Format

The obvious technology to use for defining an archival database
format is XML, an open standard specifically designed for
describing structured data. Several tools are available which
convert proprietary databases to XML format.
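
As a minimal sketch of the idea (not the format of any particular
tool), the following Python exports a relational table to XML; the
database, table and column names are hypothetical:

   import sqlite3
   import xml.etree.ElementTree as ET

   conn = sqlite3.connect("catalogue.db")                    # hypothetical source database
   cursor = conn.execute("SELECT id, name, price FROM products")
   columns = [d[0] for d in cursor.description]

   root = ET.Element("table", name="products")
   for values in cursor:
       row = ET.SubElement(root, "row")
       for column, value in zip(columns, values):
           ET.SubElement(row, column).text = str(value)      # one element per column

   ET.ElementTree(root).write("products.xml", encoding="utf-8")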
Tools for Conversion in XML

- SIARD   = Swiss Federal Archives
- DeepArc = Bibliothèque nationale de France

Both of these tools allow the structure and content of a relational
           database to be exported into standard formats.
SIARD
The workflow of SIARD is:

1- It automatically analyses and maps the database structure of the
   source database.
2- It exports the definition of the database structure as a text file
   containing the data definition, described using SQL.
3- The content is exported as plain text files, together with any
   large binary objects stored in the database, and the metadata is
   exported as an XML document.
4- The data can then be reloaded into any relational database
   management system to provide access.
DeepArc
- It enables a user to map the relational model of the
  original database to an XML schema, and then export the
  content of the database into an XML document.
- It is intended to be used by the database owner, since its use in
  any particular case requires detailed knowledge of the
  underlying structure of the database being archived.
Flow of Work of DeepArc Tool
• First, the user creates a view of the database, called a skeleton,
  which is described using XML.
• The skeleton describes the desired structure of the XML
  documents that will be generated from the database.
• The user then builds the associations to map the database to
  this view.
• This entails mapping both the database structure (i.e. the
  tables) and the contents (i.e. the columns within those tables).
  Once these associations have been created and configured, the
  user can then export the content of the database into an XML
  document which conforms to the defined schema.
• If the collecting institution defines a standard XML data
  model for its archived databases, it can therefore use a tool such
  as DeepArc to transform each database to that structure.
Strengths
It offers a generic approach to collecting and preserving database
content, which avoids the problems of supporting multiple
technologies incurred by the alternative approach of direct transfer.
This limits issues of preservation and access to a single format,
against which all resources can be brought to bear. For example,
archives can use standard access interfaces such as that provided
by the XINQ tool.
Limitations
• Web database archiving tools are a recent development and are
  therefore still technologically immature compared to some other
  collection methods.
• Support for the underlying technologies is currently limited.
• The nature and timing of collection must be considered.
• The original ‘look and feel’ is not preserved (ideally the website
  itself should be collected, rather than just the database content).
• It requires the active co-operation and participation of the
  website owner.
Remote Harvesting Technique

Remote harvesting is the most common and most widely employed
method for collecting websites. It involves the use of web crawler
software to harvest content from remote web servers.
‘Crawlers’ are software programs designed to interact with online
services in the same way as human users, principally to gather the
required content. Most search engines use crawlers to collect and
index web pages.
Web Crawler
A web crawler shares many similarities with a desktop web browser: it
submits HTTP requests to a web server and stores the content that
it receives in return. The actions of the web crawler are dictated by a
list of URLs (or ‘seeds’) to visit. The crawler visits the first URL on the
list, collects the web page, identifies all the hyperlinks within the
page, and adds them to the seed list.
In this way, a web crawler that begins on the home page of a website
will eventually visit every linked page within that website. This is a
recursive process and is normally controlled by certain parameters,
such as the number of hyperlinks that should be followed.
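
A minimal sketch of this loop in Python, assuming a hypothetical seed
URL and restricting the crawl to one host; a real crawler (HTTrack,
Heritrix) adds politeness delays, robots.txt handling and storage:

   from urllib.parse import urljoin, urlparse
   from urllib.request import urlopen
   from html.parser import HTMLParser

   class LinkParser(HTMLParser):
       """Collect the targets of <a href="..."> elements."""
       def __init__(self):
           super().__init__()
           self.links = []
       def handle_starttag(self, tag, attrs):
           if tag == "a":
               for name, value in attrs:
                   if name == "href" and value:
                       self.links.append(value)

   def crawl(seed, max_pages=50):
       host = urlparse(seed).netloc
       queue, visited = [seed], set()
       while queue and len(visited) < max_pages:
           url = queue.pop(0)
           if url in visited:
               continue
           visited.add(url)
           try:
               html = urlopen(url).read().decode("utf-8", errors="replace")
           except OSError:
               continue                              # skip unreachable pages
           # ... store `html` here ...
           parser = LinkParser()
           parser.feed(html)
           for link in parser.links:
               absolute = urljoin(url, link)             # resolve relative links
               if urlparse(absolute).netloc == host:     # stay within the website
                   queue.append(absolute)                # recursive step: grow the seed list
       return visited

   crawl("http://www.mysite.com/")   # hypothetical seed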
Infrastructure

The infrastructure required to operate a web crawler can be
minimal; the software simply needs to be installed on a computer
with an internet connection and sufficient storage space for the
collected data. However, in most large-scale archiving programmes,
the crawler software is deployed on networked servers with
attached disk or tape storage.
Types of Web Crawlers
There is a wide variety of web crawler software available, both
proprietary and open source. The three most widely used web
crawlers are

1- HTTrack
2- NEDLIB Harvester
3- Heritrix

These web crawlers were already discussed in the first lecture, so
they will not be discussed again here.
Parameters
Web crawlers provide a number of parameters which can be set to
specify their exact behavior. Many crawlers are highly
configurable, offering a very wide variety of settings. Most
crawlers provide variations on the following parameters:

-   Connection
-   Crawl
-   Collection
-   Storage
-   Scheduling settings
Connection Settings
These settings relate to the manner in which the crawler connects
to web servers.

- Transfer Rate
- Connections

- Transfer Rate: The maximum rate at which the crawler will
   attempt to transfer data. A specific transfer rate is set so that
   data is captured at a sufficient rate to enable an entire site to
   be collected in a reasonable timescale.
- Connections: Specifies the number of simultaneous
   connections the web crawler may attempt to make with a host,
   or the delay between establishing connections.
Crawl Settings

These settings allow the user to control the behavior of the
crawler as it traverses a website, such as the direction and depth of
the crawl:
- Link Depth and Limits
- Robot Exclusion Notices
- Link Discovery

Settings will normally be available to control the size and
duration of the crawl. For example, it may be desirable to halt a
crawl after it has collected a given volume of data, or within a
given timeframe.
• Link Depths and Limits: This determines the number of
  links that the crawler should follow away from its starting
  point, and the direction in which it should move. It is possible
  to restrict the crawler to following links within the same path,
  website or domain, and to a given depth.
• Robot Exclusion Notices: A robot exclusion notice is a
  method used by websites to control the behavior of robots
  such as web crawlers. It uses a standard protocol to define
  which parts of a website are accessible to the robot. These
  rules are contained within a ‘robots.txt’ file in the top-level
  folder of the website (see the example after this list).
• Link Discovery: The user may also be able to configure how
  the crawler analyses hyperlinks: links may be dynamically
  constructed by scripts, or hidden within content such as Flash
  files, and therefore not transparent to the crawler. However,
  more sophisticated crawlers can be configured to discover
  many of these hidden links.
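
For example, a simple robots.txt file might read (the path is
hypothetical):

   User-agent: *            # these rules apply to all robots
   Disallow: /private/      # do not crawl anything under /private/

A well-behaved crawler fetches this file first and skips the
excluded paths.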
Collection Settings

These settings allow the user to fine-tune the behavior of the
crawler, and particularly to determine the content that is
collected. Filters can be defined to include or exclude certain
paths and file types.

For example: to exclude links to pop-up advertisements, or to
collect only links to PDF files. Filters may also be used to avoid
crawler traps, whereby the crawler becomes locked into an
endless loop, by detecting repeating patterns of links. The user
may also be able to place limits on the maximum size of files to
be collected.
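
As a hedged illustration of how these parameters are expressed in
practice, an HTTrack invocation might combine connection, crawl and
collection settings as in the sketch below (the site is hypothetical;
verify the options against your version's manual):

   httrack "http://www.mysite.com/" -O ./archive -r5 -c4 -A25000 "+*.mysite.com/*" "-*/ads/*"

Here -O sets the storage path, -r5 limits the link depth to five, -c4
allows four simultaneous connections, -A25000 caps the transfer rate at
25,000 bytes per second, and the '+'/'-' patterns are filters that keep
the crawl on the site while excluding advertisement paths.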
Storage Settings

These settings determine how the crawler stores the collected
content. By default, most crawlers will mirror the original
structure of the website, building a directory structure which
corresponds to the original hierarchy. However, it may be possible
to dictate other options, such as forcing all images to be stored in
a single folder. These options are unlikely to be useful in most
web archiving scenarios, where preservation of the original
structure will be considered desirable. The crawler may also
rewrite hyperlinks, to convert absolute links into relative links.
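
A minimal sketch of such link rewriting, using the hypothetical site
from the earlier slides; real crawlers handle many more cases (query
strings, other attributes, other hosts):

   import re

   def make_links_relative(html, site="http://www.mysite.com/"):
       """Rewrite absolute hrefs pointing at `site` as site-relative links."""
       pattern = re.compile(r'href="' + re.escape(site) + r'([^"]*)"')
       return pattern.sub(r'href="/\1"', html)

   page = '<A href="http://www.mysite.com/products/new.html">New Products</A>'
   print(make_links_relative(page))
   # -> <A href="/products/new.html">New Products</A>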
Scheduling Settings

Tools such as PANDAS, which provide workflow capabilities,
allow the scheduling of crawls to be controlled. Typical
parameters will include:

Frequency: e.g. daily or weekly.
Dates: the start, or commencement, of the process.
Non-scheduled Dates: It may also be possible to define specific
dates for crawling in addition to the standard schedule.
Identifying the Crawler
Software agents such as web browsers and crawlers identify
themselves to the online services with which they connect
through a ‘user agent’ identifier within the HTTP headers of the
requests they send. Thus, Internet Explorer 6.0 identifies itself
with the user agent ‘Mozilla/4.0 (compatible; MSIE 6.0; Windows
NT 5.1)’. The user agent string displayed by a web crawler can
generally be modified by the user.
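
A hedged sketch of setting a custom user agent with Python's standard
library; the agent string and URLs are hypothetical placeholders:

   import urllib.request

   request = urllib.request.Request(
       "http://www.mysite.com/",            # hypothetical target
       headers={"User-Agent": "InstitutionArchiveBot/1.0 (+http://archive.example.org/bot.html)"},
   )
   with urllib.request.urlopen(request) as response:
       content = response.read()            # the server sees the custom agent string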
Advantages of Identification
There are three advantages to this identification:
1- The crawler identifies the institution on whose behalf it is
   operating.
2- Web servers may be configured to block certain user agents,
   including web crawlers and search engine robots. Defining a
   more specific user agent can prevent such blocking, even if
   using a crawler that would otherwise be blocked.
3- Some websites are designed to display correctly only in certain
   browsers, and check the user agent in any HTTP request
   accordingly. User agents which do not indicate compatibility
   with the correct browser will then be redirected to a warning
   page.
Strengths
The greatest strengths of remote harvesting are:
-   Ease of use
-   Flexibility
-   Widespread applicability
-   The availability of a number of mature software tools

- A remote harvesting programme can be established very quickly and
  allows a large number of websites to be collected in a relatively
  short period.
- The infrastructure requirements are relatively simple, and the
  method requires no active participation from website owners: the
  process is entirely in the control of the archiving body.
- Most web crawler software is comparatively straightforward to
  use, and can be operated by non-technical staff with some training.
Limitations

- Careful configuration is required.
- Dynamic content cannot generally be collected.
- Large volumes of data can only be archived as fast as the
  available bandwidth allows, which is a drawback.
Transactional Archiving
Transactional archiving is a fundamentally different approach from any
of those previously described, being event driven rather than content
driven.
• Transactional archiving collects the actual transactions which take
    place between a web server and a web browser. It is primarily used
    as a means of preserving evidence of the content which was actually
    viewed on a particular website on a given date. This may be
    particularly important for organizations which need to comply with
    legal or regulatory requirements for disclosing and retaining
    information.
• A transactional archiving system typically operates by intercepting
    every HTTP request to, and response from, the web server, filtering
    each response to eliminate duplicate content, and permanently
    storing the responses as bitstreams. A transactional archiving system
    requires the installation of software on the web server, and cannot
    therefore be used to collect content from a remote website.
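
A minimal sketch of this interception-and-deduplication idea as Python
WSGI middleware, assuming a hypothetical archive directory; products
such as pageVault work at the web server level and are far more
sophisticated:

   import hashlib
   import pathlib

   class TransactionalArchiver:
       """Wrap a WSGI app; store each unique response body as a bitstream."""
       def __init__(self, app, store_dir="archive"):
           self.app = app
           self.seen = set()                        # digests already stored
           self.store = pathlib.Path(store_dir)
           self.store.mkdir(exist_ok=True)

       def __call__(self, environ, start_response):
           body = b"".join(self.app(environ, start_response))
           digest = hashlib.sha256(body).hexdigest()
           if digest not in self.seen:              # filter out duplicate responses
               self.seen.add(digest)
               (self.store / f"{digest[:16]}.bin").write_bytes(body)
           return [body]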
Example of Transactional Archive
• pageVault supports the archiving of all unique responses
  generated by a web server.
• It allows you to know exactly what information you have
  published on your website, whether static pages or dynamically
  generated content, regardless of format (HTML, XML, PDF,
  zip, Microsoft Office formats, images, sound) and regardless of
  rate of change.
• Although every unique HTTP response can be archived and
  indexed, you can define non-material content (such as the
  current date/time and trivial site personalisation) on a per-
  URL, directory or regular-expression basis, which pageVault
  will exclude when calculating the novelty of a response.
Strengths

The great strength of transactional archiving is that it
collects what is actually viewed. It offers the best option
for collecting evidence of how a website was used, and
what content was actually available at any given
moment. It can be a good solution for archiving certain
kinds of dynamic website.
Limitations
- Transactional collection does not collect content which has
  never been viewed by a user.
- Because transactional collection takes place on the web server,
  it cannot capture variations in the user experience which are
  introduced by the web browser.
- Transactional archiving must take place server side, and
  therefore requires the active co-operation of the website
  owner.
- The time taken for the server to process and respond to each
  request will be longer.

Mais conteúdo relacionado

Mais procurados

Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...redsys
 
Open for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin CoreOpen for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin CoreAndy Powell
 
Best Practices for Descriptive Metadata
Best Practices for Descriptive MetadataBest Practices for Descriptive Metadata
Best Practices for Descriptive MetadataOCLC
 
Intelligent web crawling
Intelligent web crawlingIntelligent web crawling
Intelligent web crawlingDenis Shestakov
 
Digital resources management_information_outreach_CSE
Digital resources management_information_outreach_CSEDigital resources management_information_outreach_CSE
Digital resources management_information_outreach_CSESrijan Technologies
 
Exchange of usage metadata in a network of institutional repositories: the ...
Exchange of usage metadata in a network of institutional repositories: the ...Exchange of usage metadata in a network of institutional repositories: the ...
Exchange of usage metadata in a network of institutional repositories: the ...Benoit Pauwels
 
Handout for Metadata for your Digital Collections
Handout for Metadata for your Digital CollectionsHandout for Metadata for your Digital Collections
Handout for Metadata for your Digital CollectionsJenn Riley
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...Martin Klein
 
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...UKSG: connecting the knowledge community
 
How Libraries Use Publisher Metadata Redux (Steven Shadle)
How Libraries Use Publisher Metadata Redux (Steven Shadle)How Libraries Use Publisher Metadata Redux (Steven Shadle)
How Libraries Use Publisher Metadata Redux (Steven Shadle)Charleston Conference
 
Introduction to Linked Data Platform (LDP)
Introduction to Linked Data Platform (LDP)Introduction to Linked Data Platform (LDP)
Introduction to Linked Data Platform (LDP)Hector Correa
 
Open for Business Open Archives, OpenURL, RSS and the Dublin Core
Open for Business  Open Archives, OpenURL, RSS and the Dublin CoreOpen for Business  Open Archives, OpenURL, RSS and the Dublin Core
Open for Business Open Archives, OpenURL, RSS and the Dublin CoreAndy Powell
 
Building Linked Data Applications
Building Linked Data ApplicationsBuilding Linked Data Applications
Building Linked Data ApplicationsEUCLID project
 
-Open Archives Initiatives(final)
-Open Archives Initiatives(final)-Open Archives Initiatives(final)
-Open Archives Initiatives(final)floyd taag
 
UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...
UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...
UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...UKSG: connecting the knowledge community
 

Mais procurados (17)

Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...
 
Open for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin CoreOpen for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin Core
 
Best Practices for Descriptive Metadata
Best Practices for Descriptive MetadataBest Practices for Descriptive Metadata
Best Practices for Descriptive Metadata
 
Intelligent web crawling
Intelligent web crawlingIntelligent web crawling
Intelligent web crawling
 
Digital resources management_information_outreach_CSE
Digital resources management_information_outreach_CSEDigital resources management_information_outreach_CSE
Digital resources management_information_outreach_CSE
 
Browser
BrowserBrowser
Browser
 
Exchange of usage metadata in a network of institutional repositories: the ...
Exchange of usage metadata in a network of institutional repositories: the ...Exchange of usage metadata in a network of institutional repositories: the ...
Exchange of usage metadata in a network of institutional repositories: the ...
 
Handout for Metadata for your Digital Collections
Handout for Metadata for your Digital CollectionsHandout for Metadata for your Digital Collections
Handout for Metadata for your Digital Collections
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
 
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
 
How Libraries Use Publisher Metadata Redux (Steven Shadle)
How Libraries Use Publisher Metadata Redux (Steven Shadle)How Libraries Use Publisher Metadata Redux (Steven Shadle)
How Libraries Use Publisher Metadata Redux (Steven Shadle)
 
Introduction to Linked Data Platform (LDP)
Introduction to Linked Data Platform (LDP)Introduction to Linked Data Platform (LDP)
Introduction to Linked Data Platform (LDP)
 
Open for Business Open Archives, OpenURL, RSS and the Dublin Core
Open for Business  Open Archives, OpenURL, RSS and the Dublin CoreOpen for Business  Open Archives, OpenURL, RSS and the Dublin Core
Open for Business Open Archives, OpenURL, RSS and the Dublin Core
 
Building Linked Data Applications
Building Linked Data ApplicationsBuilding Linked Data Applications
Building Linked Data Applications
 
-Open Archives Initiatives(final)
-Open Archives Initiatives(final)-Open Archives Initiatives(final)
-Open Archives Initiatives(final)
 
UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...
UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...
UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...
 
Introduction to W3C Linked Data Platform
Introduction to W3C Linked Data PlatformIntroduction to W3C Linked Data Platform
Introduction to W3C Linked Data Platform
 

Destaque

publishing production
publishing productionpublishing production
publishing productionEssam Obaid
 
PRESERVATION Web archiving
PRESERVATION  Web archivingPRESERVATION  Web archiving
PRESERVATION Web archivingEssam Obaid
 
7 شخصيات يجب أن تحذفهم فورا من الفيسبوك
7 شخصيات يجب أن تحذفهم فورا من الفيسبوك7 شخصيات يجب أن تحذفهم فورا من الفيسبوك
7 شخصيات يجب أن تحذفهم فورا من الفيسبوكEssam Obaid
 
تقنيات 6 سيجما فى المؤسسات الاكاديمية والمعلوماتية
تقنيات 6 سيجما فى المؤسسات الاكاديمية والمعلوماتيةتقنيات 6 سيجما فى المؤسسات الاكاديمية والمعلوماتية
تقنيات 6 سيجما فى المؤسسات الاكاديمية والمعلوماتيةEssam Obaid
 
تفاعل ادارة السجلات والوثائق مع مواقع التواصل الاجتماعى
تفاعل ادارة السجلات والوثائق مع مواقع التواصل الاجتماعىتفاعل ادارة السجلات والوثائق مع مواقع التواصل الاجتماعى
تفاعل ادارة السجلات والوثائق مع مواقع التواصل الاجتماعىEssam Obaid
 
Cloud computing دور الحوسبة السحابية فى المكتبات الرقمية ونظم الارشفة الالكتر...
Cloud computing دور الحوسبة السحابية فى المكتبات الرقمية ونظم الارشفة الالكتر...Cloud computing دور الحوسبة السحابية فى المكتبات الرقمية ونظم الارشفة الالكتر...
Cloud computing دور الحوسبة السحابية فى المكتبات الرقمية ونظم الارشفة الالكتر...Essam Obaid
 
ادارة السجلات والارشفة الالكترونية - E archive
ادارة السجلات والارشفة الالكترونية - E archiveادارة السجلات والارشفة الالكترونية - E archive
ادارة السجلات والارشفة الالكترونية - E archiveEssam Obaid
 
ECM نظم إدارة المحتوى المؤسسى
 ECM نظم إدارة المحتوى المؤسسى ECM نظم إدارة المحتوى المؤسسى
ECM نظم إدارة المحتوى المؤسسىEssam Obaid
 
models of e publishing
models of e publishingmodels of e publishing
models of e publishingEssam Obaid
 
الاتجاهات البحثية فى إدارة المعرفة
الاتجاهات البحثية فى إدارة المعرفةالاتجاهات البحثية فى إدارة المعرفة
الاتجاهات البحثية فى إدارة المعرفةEssam Obaid
 
introduction to electronic publishing
 introduction to electronic publishing introduction to electronic publishing
introduction to electronic publishingEssam Obaid
 
E archive ادارة السجلات والارشفة الالكترونية - المفاهيم والمصطلحات
E archive  ادارة السجلات والارشفة الالكترونية - المفاهيم والمصطلحات   E archive  ادارة السجلات والارشفة الالكترونية - المفاهيم والمصطلحات
E archive ادارة السجلات والارشفة الالكترونية - المفاهيم والمصطلحات Essam Obaid
 
content analysis
content analysiscontent analysis
content analysisEssam Obaid
 
1356947482.9353caiibgbmmarketingmngtmodule d
1356947482.9353caiibgbmmarketingmngtmodule d1356947482.9353caiibgbmmarketingmngtmodule d
1356947482.9353caiibgbmmarketingmngtmodule dمحمد الجوري
 
كتيب ورش عمل قياس الجاهزية مفرغ
كتيب ورش عمل قياس الجاهزية مفرغكتيب ورش عمل قياس الجاهزية مفرغ
كتيب ورش عمل قياس الجاهزية مفرغابراهيم الهدهود
 

Destaque (20)

publishing production
publishing productionpublishing production
publishing production
 
PRESERVATION Web archiving
PRESERVATION  Web archivingPRESERVATION  Web archiving
PRESERVATION Web archiving
 
7 شخصيات يجب أن تحذفهم فورا من الفيسبوك
7 شخصيات يجب أن تحذفهم فورا من الفيسبوك7 شخصيات يجب أن تحذفهم فورا من الفيسبوك
7 شخصيات يجب أن تحذفهم فورا من الفيسبوك
 
تقنيات 6 سيجما فى المؤسسات الاكاديمية والمعلوماتية
تقنيات 6 سيجما فى المؤسسات الاكاديمية والمعلوماتيةتقنيات 6 سيجما فى المؤسسات الاكاديمية والمعلوماتية
تقنيات 6 سيجما فى المؤسسات الاكاديمية والمعلوماتية
 
تفاعل ادارة السجلات والوثائق مع مواقع التواصل الاجتماعى
تفاعل ادارة السجلات والوثائق مع مواقع التواصل الاجتماعىتفاعل ادارة السجلات والوثائق مع مواقع التواصل الاجتماعى
تفاعل ادارة السجلات والوثائق مع مواقع التواصل الاجتماعى
 
Cloud computing دور الحوسبة السحابية فى المكتبات الرقمية ونظم الارشفة الالكتر...
Cloud computing دور الحوسبة السحابية فى المكتبات الرقمية ونظم الارشفة الالكتر...Cloud computing دور الحوسبة السحابية فى المكتبات الرقمية ونظم الارشفة الالكتر...
Cloud computing دور الحوسبة السحابية فى المكتبات الرقمية ونظم الارشفة الالكتر...
 
ادارة السجلات والارشفة الالكترونية - E archive
ادارة السجلات والارشفة الالكترونية - E archiveادارة السجلات والارشفة الالكترونية - E archive
ادارة السجلات والارشفة الالكترونية - E archive
 
ECM نظم إدارة المحتوى المؤسسى
 ECM نظم إدارة المحتوى المؤسسى ECM نظم إدارة المحتوى المؤسسى
ECM نظم إدارة المحتوى المؤسسى
 
models of e publishing
models of e publishingmodels of e publishing
models of e publishing
 
الاتجاهات البحثية فى إدارة المعرفة
الاتجاهات البحثية فى إدارة المعرفةالاتجاهات البحثية فى إدارة المعرفة
الاتجاهات البحثية فى إدارة المعرفة
 
introduction to electronic publishing
 introduction to electronic publishing introduction to electronic publishing
introduction to electronic publishing
 
E archive ادارة السجلات والارشفة الالكترونية - المفاهيم والمصطلحات
E archive  ادارة السجلات والارشفة الالكترونية - المفاهيم والمصطلحات   E archive  ادارة السجلات والارشفة الالكترونية - المفاهيم والمصطلحات
E archive ادارة السجلات والارشفة الالكترونية - المفاهيم والمصطلحات
 
content analysis
content analysiscontent analysis
content analysis
 
1356947482.9353caiibgbmmarketingmngtmodule d
1356947482.9353caiibgbmmarketingmngtmodule d1356947482.9353caiibgbmmarketingmngtmodule d
1356947482.9353caiibgbmmarketingmngtmodule d
 
من سيربح المليون للنشر
من سيربح المليون   للنشرمن سيربح المليون   للنشر
من سيربح المليون للنشر
 
دورة صيانة الذات
دورة صيانة الذاتدورة صيانة الذات
دورة صيانة الذات
 
خداع البصر
خداع البصرخداع البصر
خداع البصر
 
Brothers meetting للنشر
Brothers meetting   للنشرBrothers meetting   للنشر
Brothers meetting للنشر
 
اكتشف الصورة
اكتشف الصورةاكتشف الصورة
اكتشف الصورة
 
كتيب ورش عمل قياس الجاهزية مفرغ
كتيب ورش عمل قياس الجاهزية مفرغكتيب ورش عمل قياس الجاهزية مفرغ
كتيب ورش عمل قياس الجاهزية مفرغ
 

Semelhante a COLLECTION METHODS

Crawler-Friendly Web Servers
Crawler-Friendly Web ServersCrawler-Friendly Web Servers
Crawler-Friendly Web Serverswebhostingguy
 
21 Www Web Services
21 Www Web Services21 Www Web Services
21 Www Web Servicesroyans
 
Sharepoint 2010 enterprise content management features
Sharepoint 2010 enterprise content management featuresSharepoint 2010 enterprise content management features
Sharepoint 2010 enterprise content management featuresManish Rawat
 
IWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy Issues
IWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy IssuesIWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy Issues
IWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy IssuesIWMW
 
Migrating Very Large Site Collections (SPSDC)
Migrating Very Large Site Collections (SPSDC)Migrating Very Large Site Collections (SPSDC)
Migrating Very Large Site Collections (SPSDC)kiwiboris
 
Sitecore Personalization on websites cached on CDN servers
Sitecore Personalization on websites cached on CDN serversSitecore Personalization on websites cached on CDN servers
Sitecore Personalization on websites cached on CDN serversAnindita Bhattacharya
 
introduction to Web system
introduction to Web systemintroduction to Web system
introduction to Web systemhashim102
 
0_Leksion_Web_Servers (1).pdf
0_Leksion_Web_Servers (1).pdf0_Leksion_Web_Servers (1).pdf
0_Leksion_Web_Servers (1).pdfZani10
 
Evolution Of The Web Platform & Browser Security
Evolution Of The Web Platform & Browser SecurityEvolution Of The Web Platform & Browser Security
Evolution Of The Web Platform & Browser SecuritySanjeev Verma, PhD
 
Opinioz_intern
Opinioz_internOpinioz_intern
Opinioz_internSai Ganesh
 
WEB MODULE 5.pdf
WEB MODULE 5.pdfWEB MODULE 5.pdf
WEB MODULE 5.pdfDeepika A B
 
Migrating very large site collections
Migrating very large site collectionsMigrating very large site collections
Migrating very large site collectionskiwiboris
 
Arcomem training system-overview_advanced
Arcomem training system-overview_advancedArcomem training system-overview_advanced
Arcomem training system-overview_advancedarcomem
 

Semelhante a COLLECTION METHODS (20)

Web browser architecture.pptx
Web browser architecture.pptxWeb browser architecture.pptx
Web browser architecture.pptx
 
Crawler-Friendly Web Servers
Crawler-Friendly Web ServersCrawler-Friendly Web Servers
Crawler-Friendly Web Servers
 
21 Www Web Services
21 Www Web Services21 Www Web Services
21 Www Web Services
 
Sharepoint 2010 enterprise content management features
Sharepoint 2010 enterprise content management featuresSharepoint 2010 enterprise content management features
Sharepoint 2010 enterprise content management features
 
Ch-1_.ppt
Ch-1_.pptCh-1_.ppt
Ch-1_.ppt
 
IWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy Issues
IWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy IssuesIWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy Issues
IWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy Issues
 
Migrating Very Large Site Collections (SPSDC)
Migrating Very Large Site Collections (SPSDC)Migrating Very Large Site Collections (SPSDC)
Migrating Very Large Site Collections (SPSDC)
 
Sitecore Personalization on websites cached on CDN servers
Sitecore Personalization on websites cached on CDN serversSitecore Personalization on websites cached on CDN servers
Sitecore Personalization on websites cached on CDN servers
 
introduction to Web system
introduction to Web systemintroduction to Web system
introduction to Web system
 
0_Leksion_Web_Servers (1).pdf
0_Leksion_Web_Servers (1).pdf0_Leksion_Web_Servers (1).pdf
0_Leksion_Web_Servers (1).pdf
 
Seminar on crawler
Seminar on crawlerSeminar on crawler
Seminar on crawler
 
Basics of the Web Platform
Basics of the Web PlatformBasics of the Web Platform
Basics of the Web Platform
 
Evolution Of The Web Platform & Browser Security
Evolution Of The Web Platform & Browser SecurityEvolution Of The Web Platform & Browser Security
Evolution Of The Web Platform & Browser Security
 
Opinioz_intern
Opinioz_internOpinioz_intern
Opinioz_intern
 
WEB MODULE 5.pdf
WEB MODULE 5.pdfWEB MODULE 5.pdf
WEB MODULE 5.pdf
 
WEB Mod5@AzDOCUMENTS.in.pdf
WEB Mod5@AzDOCUMENTS.in.pdfWEB Mod5@AzDOCUMENTS.in.pdf
WEB Mod5@AzDOCUMENTS.in.pdf
 
Nadee2018
Nadee2018Nadee2018
Nadee2018
 
Migrating very large site collections
Migrating very large site collectionsMigrating very large site collections
Migrating very large site collections
 
L017447590
L017447590L017447590
L017447590
 
Arcomem training system-overview_advanced
Arcomem training system-overview_advancedArcomem training system-overview_advanced
Arcomem training system-overview_advanced
 

Mais de Essam Obaid

دورة مجاناً تاسيس إدارة الاعلام بالمؤسسات
دورة مجاناً تاسيس إدارة الاعلام بالمؤسساتدورة مجاناً تاسيس إدارة الاعلام بالمؤسسات
دورة مجاناً تاسيس إدارة الاعلام بالمؤسساتEssam Obaid
 
استراتيجية الاعلام الاجتماعى فى ادارة المعرفة الذكية
استراتيجية الاعلام الاجتماعى فى ادارة المعرفة الذكيةاستراتيجية الاعلام الاجتماعى فى ادارة المعرفة الذكية
استراتيجية الاعلام الاجتماعى فى ادارة المعرفة الذكيةEssam Obaid
 
الادارة الالكترونية
الادارة الالكترونيةالادارة الالكترونية
الادارة الالكترونيةEssam Obaid
 
الدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوط
الدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوطالدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوط
الدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوطEssam Obaid
 
الدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوط
الدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوطالدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوط
الدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوطEssam Obaid
 
مكتبات الجمعيات الأهلية و المؤسسات الخاصة بمحافظة الإسكندرية
مكتبات الجمعيات الأهلية و المؤسسات الخاصة بمحافظة الإسكندريةمكتبات الجمعيات الأهلية و المؤسسات الخاصة بمحافظة الإسكندرية
مكتبات الجمعيات الأهلية و المؤسسات الخاصة بمحافظة الإسكندريةEssam Obaid
 
مراقب وثائق الجودة
مراقب وثائق الجودةمراقب وثائق الجودة
مراقب وثائق الجودةEssam Obaid
 
برمجيات الأرشفة والسجلات الالكترونية بين التسويق والتطبيق
برمجيات الأرشفة والسجلات الالكترونية  بين التسويق والتطبيقبرمجيات الأرشفة والسجلات الالكترونية  بين التسويق والتطبيق
برمجيات الأرشفة والسجلات الالكترونية بين التسويق والتطبيقEssam Obaid
 
دليل لقيادة المشاريع واجتياز اختبار PMP
دليل لقيادة المشاريع واجتياز اختبار PMPدليل لقيادة المشاريع واجتياز اختبار PMP
دليل لقيادة المشاريع واجتياز اختبار PMPEssam Obaid
 
إدارة المعرفة والادارة الالكترونية فى المؤسسات
إدارة المعرفة  والادارة الالكترونية فى المؤسساتإدارة المعرفة  والادارة الالكترونية فى المؤسسات
إدارة المعرفة والادارة الالكترونية فى المؤسساتEssam Obaid
 
التطوع الالكتروني واستقطاب المتطوعين مهارات التطوع الافتراضي
 التطوع الالكتروني واستقطاب المتطوعين مهارات التطوع الافتراضي   التطوع الالكتروني واستقطاب المتطوعين مهارات التطوع الافتراضي
التطوع الالكتروني واستقطاب المتطوعين مهارات التطوع الافتراضي Essam Obaid
 
إدارة السجلات والارشفة الالكترونية
إدارة السجلات والارشفة الالكترونيةإدارة السجلات والارشفة الالكترونية
إدارة السجلات والارشفة الالكترونيةEssam Obaid
 
أنظمة البحث والاسترجاع في المكتبات العامة دراسة تقييميه لنظام مكتبة الملك عبد...
أنظمة البحث والاسترجاع في المكتبات العامة دراسة تقييميه لنظام مكتبة الملك عبد...أنظمة البحث والاسترجاع في المكتبات العامة دراسة تقييميه لنظام مكتبة الملك عبد...
أنظمة البحث والاسترجاع في المكتبات العامة دراسة تقييميه لنظام مكتبة الملك عبد...Essam Obaid
 
تطبيق منهجية 6 سيجما (Six Sigma) في المكتبات: دراسة استطلاعية لآراء مدراء الم...
تطبيق منهجية 6 سيجما (Six Sigma) في المكتبات: دراسة استطلاعية لآراء مدراء الم...تطبيق منهجية 6 سيجما (Six Sigma) في المكتبات: دراسة استطلاعية لآراء مدراء الم...
تطبيق منهجية 6 سيجما (Six Sigma) في المكتبات: دراسة استطلاعية لآراء مدراء الم...Essam Obaid
 
ادارة المشروعات الرقمية
ادارة المشروعات الرقميةادارة المشروعات الرقمية
ادارة المشروعات الرقميةEssam Obaid
 
إدارة محتوى مواقع التواصل الاجتماعي في المؤسسات الخدمية والتجارية
إدارة  محتوى مواقع التواصل الاجتماعي  في المؤسسات الخدمية والتجاريةإدارة  محتوى مواقع التواصل الاجتماعي  في المؤسسات الخدمية والتجارية
إدارة محتوى مواقع التواصل الاجتماعي في المؤسسات الخدمية والتجاريةEssam Obaid
 
تطبيق مبادئ إدارة الجودة الشاملة
تطبيق مبادئ  إدارة الجودة الشاملةتطبيق مبادئ  إدارة الجودة الشاملة
تطبيق مبادئ إدارة الجودة الشاملةEssam Obaid
 
تأثير النشر الالكتروني في خدمات المكتبات الجامعية
تأثير النشر الالكتروني في خدمات المكتبات الجامعية  تأثير النشر الالكتروني في خدمات المكتبات الجامعية
تأثير النشر الالكتروني في خدمات المكتبات الجامعية Essam Obaid
 
واقع العمل التطوعي فى المكتبات العامة المصرية
واقع العمل التطوعي فى المكتبات العامة المصريةواقع العمل التطوعي فى المكتبات العامة المصرية
واقع العمل التطوعي فى المكتبات العامة المصريةEssam Obaid
 
التخطيط الاستراتيجى فى مؤسسات المعلومات السعودية
التخطيط الاستراتيجى فى مؤسسات المعلومات السعوديةالتخطيط الاستراتيجى فى مؤسسات المعلومات السعودية
التخطيط الاستراتيجى فى مؤسسات المعلومات السعوديةEssam Obaid
 

Mais de Essam Obaid (20)

دورة مجاناً تاسيس إدارة الاعلام بالمؤسسات
دورة مجاناً تاسيس إدارة الاعلام بالمؤسساتدورة مجاناً تاسيس إدارة الاعلام بالمؤسسات
دورة مجاناً تاسيس إدارة الاعلام بالمؤسسات
 
استراتيجية الاعلام الاجتماعى فى ادارة المعرفة الذكية
استراتيجية الاعلام الاجتماعى فى ادارة المعرفة الذكيةاستراتيجية الاعلام الاجتماعى فى ادارة المعرفة الذكية
استراتيجية الاعلام الاجتماعى فى ادارة المعرفة الذكية
 
الادارة الالكترونية
الادارة الالكترونيةالادارة الالكترونية
الادارة الالكترونية
 
الدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوط
الدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوطالدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوط
الدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوط
 
الدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوط
الدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوطالدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوط
الدوريات الأجنبية فى مكتبات الكليات العلمية فى جامعة أسيوط
 
مكتبات الجمعيات الأهلية و المؤسسات الخاصة بمحافظة الإسكندرية
مكتبات الجمعيات الأهلية و المؤسسات الخاصة بمحافظة الإسكندريةمكتبات الجمعيات الأهلية و المؤسسات الخاصة بمحافظة الإسكندرية
مكتبات الجمعيات الأهلية و المؤسسات الخاصة بمحافظة الإسكندرية
 
مراقب وثائق الجودة
مراقب وثائق الجودةمراقب وثائق الجودة
مراقب وثائق الجودة
 
برمجيات الأرشفة والسجلات الالكترونية بين التسويق والتطبيق
برمجيات الأرشفة والسجلات الالكترونية  بين التسويق والتطبيقبرمجيات الأرشفة والسجلات الالكترونية  بين التسويق والتطبيق
برمجيات الأرشفة والسجلات الالكترونية بين التسويق والتطبيق
 
دليل لقيادة المشاريع واجتياز اختبار PMP
دليل لقيادة المشاريع واجتياز اختبار PMPدليل لقيادة المشاريع واجتياز اختبار PMP
دليل لقيادة المشاريع واجتياز اختبار PMP
 
إدارة المعرفة والادارة الالكترونية فى المؤسسات
إدارة المعرفة  والادارة الالكترونية فى المؤسساتإدارة المعرفة  والادارة الالكترونية فى المؤسسات
إدارة المعرفة والادارة الالكترونية فى المؤسسات
 
التطوع الالكتروني واستقطاب المتطوعين مهارات التطوع الافتراضي
 التطوع الالكتروني واستقطاب المتطوعين مهارات التطوع الافتراضي   التطوع الالكتروني واستقطاب المتطوعين مهارات التطوع الافتراضي
التطوع الالكتروني واستقطاب المتطوعين مهارات التطوع الافتراضي
 
إدارة السجلات والارشفة الالكترونية
إدارة السجلات والارشفة الالكترونيةإدارة السجلات والارشفة الالكترونية
إدارة السجلات والارشفة الالكترونية
 
أنظمة البحث والاسترجاع في المكتبات العامة دراسة تقييميه لنظام مكتبة الملك عبد...
أنظمة البحث والاسترجاع في المكتبات العامة دراسة تقييميه لنظام مكتبة الملك عبد...أنظمة البحث والاسترجاع في المكتبات العامة دراسة تقييميه لنظام مكتبة الملك عبد...
أنظمة البحث والاسترجاع في المكتبات العامة دراسة تقييميه لنظام مكتبة الملك عبد...
 
تطبيق منهجية 6 سيجما (Six Sigma) في المكتبات: دراسة استطلاعية لآراء مدراء الم...
تطبيق منهجية 6 سيجما (Six Sigma) في المكتبات: دراسة استطلاعية لآراء مدراء الم...تطبيق منهجية 6 سيجما (Six Sigma) في المكتبات: دراسة استطلاعية لآراء مدراء الم...
تطبيق منهجية 6 سيجما (Six Sigma) في المكتبات: دراسة استطلاعية لآراء مدراء الم...
 
ادارة المشروعات الرقمية
ادارة المشروعات الرقميةادارة المشروعات الرقمية
ادارة المشروعات الرقمية
 
إدارة محتوى مواقع التواصل الاجتماعي في المؤسسات الخدمية والتجارية
إدارة  محتوى مواقع التواصل الاجتماعي  في المؤسسات الخدمية والتجاريةإدارة  محتوى مواقع التواصل الاجتماعي  في المؤسسات الخدمية والتجارية
إدارة محتوى مواقع التواصل الاجتماعي في المؤسسات الخدمية والتجارية
 
تطبيق مبادئ إدارة الجودة الشاملة
تطبيق مبادئ  إدارة الجودة الشاملةتطبيق مبادئ  إدارة الجودة الشاملة
تطبيق مبادئ إدارة الجودة الشاملة
 
تأثير النشر الالكتروني في خدمات المكتبات الجامعية
تأثير النشر الالكتروني في خدمات المكتبات الجامعية  تأثير النشر الالكتروني في خدمات المكتبات الجامعية
تأثير النشر الالكتروني في خدمات المكتبات الجامعية
 
واقع العمل التطوعي فى المكتبات العامة المصرية
واقع العمل التطوعي فى المكتبات العامة المصريةواقع العمل التطوعي فى المكتبات العامة المصرية
واقع العمل التطوعي فى المكتبات العامة المصرية
 
التخطيط الاستراتيجى فى مؤسسات المعلومات السعودية
التخطيط الاستراتيجى فى مؤسسات المعلومات السعوديةالتخطيط الاستراتيجى فى مؤسسات المعلومات السعودية
التخطيط الاستراتيجى فى مؤسسات المعلومات السعودية
 

Último

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 

Último (20)

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 

COLLECTION METHODS

  • 13. Syndicated Content A website may include content which is drawn from external resources, such as pop-ups or RSS feeds, and then dynamically inserted into the web pages. Personalization Many websites make increasing use of personalization, to deliver content which is customized to an individual user. Example: Cookies may be used to store information on a user’s computer, which is returned by their browser whenever they make a request to that website.
  • 14. Depending on the nature of a dynamic website, these virtual pages may be linked to from other pages, or may only be available through searching. Websites may contain both static and dynamic elements. Example: The home page, and other pages that change only infrequently, may be static, whereas pages that are updated on a regular basis, such as a product catalogue, may be dynamic.
  • 15. The Matrix of Collection Methods The range of possible methods for collecting web content is dictated by these considerations. Four alternative collection methods are currently available. Table 4.1 The Matrix of Collection Methods: Client side: remote harvesting (content driven); no method available (event driven). Server side: direct transfer and database archiving (content driven); transactional archiving (event driven).
  • 16. Direct Transfer The simplest method of collecting web resources is to acquire a copy of the data directly from the original source. This approach, which requires direct access to the host web server, and therefore the co-operation of the website owner, involves copying the selected resources from the web server and transferring them to the collecting institution, either on removable media such as CD, or online using email or FTP.
  • 17. Direct Transfer Direct transfer is best suited to static websites which comprise only HTML documents and other objects stored in a hierarchical folder structure on the web server. The whole website, or any part of it, can be acquired simply by copying the relevant files and folders to the collecting institution’s storage system. The copied website will function in precisely the same way as the original, but with two limitations. - The hyperlinks must be relative, not absolute. - Any search functionality in the original website will no longer be operable unless the appropriate search engine is installed in the new environment.
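As a rough sketch of what this involves in practice, the following Python fragment copies a static site’s folder hierarchy into the archive’s storage. Both paths are hypothetical, and a real transfer would also capture documentation and metadata:

    import shutil
    from pathlib import Path

    DOCROOT = Path("/var/www/mysite")    # hypothetical document root on the web server
    ARCHIVE = Path("/archive/mysite")    # hypothetical storage at the collecting institution

    # A static site is self-contained, so copying the folder hierarchy
    # yields a working replica, provided its hyperlinks are relative.
    shutil.copytree(DOCROOT, ARCHIVE)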
  • 18. Strengths The principal advantage of the direct transfer method is that it potentially offers the most authentic rendition of the collected website. By collecting from source, it is possible to ensure that the complete content is captured with its original structure. In effect, the collecting institution re-hosts a complete copy of the original website. The degree of authenticity which it is possible to recreate will depend upon the complexity of the technical dependencies, and the extent to which the collecting institution is capable of reproducing them.
  • 19. Limitations The major limitations of this approach are: - The resources required to effect each transfer, and the sustainability of the supporting technologies. - The cooperation required on the part of the website owner, to provide both the data and the necessary documentation.
  • 20. Go through the case study: Bristol Royal Infirmary Inquiry (see page 48 of the book).
  • 21. Database Archiving The increased use of web databases has made the development of new web archiving tools a priority, and such tools are now beginning to appear. The process of archiving database-driven sites involves three stages: 1- The repository defines a standard data model and format for archived databases. 2- Each source database is converted to that standard format. 3- A standard access interface is provided to the archived databases.
  • 22. Database Format The obvious technology to use for defining an archival database format is XML, an open standard specifically designed for describing structured data. Several tools are available which convert proprietary databases to XML format.
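A minimal sketch of this idea, using Python’s built-in sqlite3 and ElementTree modules. The database file, table and column names below are hypothetical, and real tools such as SIARD or DeepArc do far more (schema capture, binary objects, metadata):

    import sqlite3
    import xml.etree.ElementTree as ET

    conn = sqlite3.connect("catalogue.db")                      # hypothetical source database
    rows = conn.execute("SELECT id, name, price FROM products")

    # One <row> element per record, one child element per column.
    root = ET.Element("table", name="products")
    for row in rows:
        rec = ET.SubElement(root, "row")
        for column, value in zip(("id", "name", "price"), row):
            ET.SubElement(rec, column).text = str(value)

    ET.ElementTree(root).write("products.xml", encoding="utf-8", xml_declaration=True)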
  • 23. Tools for Conversion to XML - SIARD (Swiss Federal Archives) - DeepArc (Bibliotheque Nationale de France) Both of these tools allow the structure and content of a relational database to be exported into standard formats.
  • 24. SIARD The workflow of SIARD is: 1- Automatically analyse and map the structure of the source database. 2- Export the definition of the database structure as a text file containing the data definition described using SQL. 3- Export the content as plain text files, together with any large binary objects stored in the database, and export the metadata as an XML document. 4- The data can then be reloaded into any relational database management system to provide access.
  • 25. DeepArc - It enables a user to map the relational model of the original database to an XML schema, and then export the content of the database into an XML document. - It is intended to be used by the database owner, since its use in any particular case requires detailed knowledge of the underlying structure of the database being archived.
  • 26. Flow of Work of the DeepArc Tool • First, the user creates a view of the database, called a skeleton, which is described using XML. • The skeleton describes the desired structure of the XML documents that will be generated from the database. • The user then builds associations to map the database to this view.
  • 27. • This entails mapping both the database structure (i.e. the tables) and the content (i.e. the columns within those tables). Once these associations have been created and configured, the user can then export the content of the database into an XML document which conforms to the defined schema. • If the collecting institution defines a standard XML data model for its archived databases, it can therefore use a tool such as DeepArc to transform each database to that structure.
  • 28. Strengths It offers a generic approach to collecting and preserving database content which avoids the problems of supporting multiple technologies incurred by the alternative approach of direct transfer. This limits issues of preservation and access to a single format, against which all resources can be brought to bear. For example, archives can use standard access interfaces such as that provided by the XINQ tool.
  • 29. Limitations • Web database archiving tools are a recent development and are therefore still technologically immature compared to some other collection methods. • Support for the underlying technologies is currently limited. • The nature and timing of collection are constrained. • The original ‘look and feel’ is not captured: the method collects the database content rather than the website as users experienced it. • It requires the active cooperation and participation of the website owner.
  • 30. Remote Harvesting Remote harvesting is the most common and most widely employed method for collecting websites. It involves the use of web crawler software to harvest content from remote web servers. ‘Crawlers’ are software programs designed to interact with online services in the same way as human users, principally to gather the required content. Most search engines use crawlers to collect and index web pages.
  • 31. Web Crawler A web crawler shares many similarities with a desktop web browser: it submits HTTP requests to a web server and stores the content that it receives in return. The actions of the web crawler are dictated by a list of URLs (or ‘seeds’) to visit. The crawler visits the first URL on the list, collects the web page, identifies all the hyperlinks within the page, and adds them to the seed list. In this way, a web crawler that begins on the home page of a website will eventually visit every linked page within that website. This is a recursive process and is normally controlled by certain parameters, such as the number of hyperlinks that should be followed.
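The recursive process described above can be illustrated with a toy crawler in Python. This is a sketch only: it assumes the third-party requests library, reuses the slides’ example domain as its seed, ignores robots.txt and politeness delays, and leaves storage of the collected content as a comment:

    from html.parser import HTMLParser
    from urllib.parse import urljoin
    import requests

    class LinkParser(HTMLParser):
        """Collects the targets of <a href="..."> elements."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(seed, max_depth=2):
        seen, queue = set(), [(seed, 0)]
        while queue:
            url, depth = queue.pop(0)
            if url in seen or depth > max_depth:
                continue
            seen.add(url)
            response = requests.get(url, timeout=10)
            # ... store response.content in the archive here ...
            parser = LinkParser()
            parser.feed(response.text)
            for link in parser.links:
                # Resolve relative links and add them to the seed list.
                queue.append((urljoin(url, link), depth + 1))
        return seen

    crawl("http://www.mysite.com/")   # seed URL, taken from the slides' example domain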
  • 32. Infrastructure The infrastructure required to operate a web crawler can be minimal; the software simply needs to be installed on a computer system with an internet connection and sufficient storage space for the collected data. However, in most large-scale archiving programmes, the crawler software is deployed on networked servers with attached disk or tape storage.
  • 33. Types of Web Crawlers A wide variety of web crawler software is available, both proprietary and open source. The three most widely used web crawlers are: 1- HTTrack 2- NEDLIB Harvester 3- Heritrix These crawlers were discussed in the first lecture, so they are not covered again here.
  • 34. Parameters Web crawlers provide a number of parameters that can be set to specify their exact behaviour. Many crawlers are highly configurable, offering a very wide variety of settings. Most crawlers provide variations on the following parameters: - Connection settings - Crawl settings - Collection settings - Storage settings - Scheduling settings
  • 35. Connection Settings These settings relate to the manner in which the crawler connects to web servers. - Transfer rate: the maximum rate at which the crawler will attempt to transfer data. A specific transfer rate is set so that data is captured quickly enough for an entire site to be collected in a reasonable timescale. - Connections: the number of simultaneous connections the web crawler may attempt to make to a host, or the delay between establishing connections.
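For instance, a crawler built on the Python requests library might cap its connection pool and impose a delay between requests along the following lines. The pool size and one-second delay are arbitrary illustrative values:

    import time
    import requests
    from requests.adapters import HTTPAdapter

    session = requests.Session()
    # Cap the number of pooled connections the crawler holds open to a host.
    session.mount("http://", HTTPAdapter(pool_connections=2, pool_maxsize=2))

    for url in ["http://www.mysite.com/", "http://www.mysite.com/products/new.html"]:
        session.get(url, timeout=10)
        time.sleep(1.0)   # a fixed delay between requests limits the effective transfer rate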
  • 36. Crawl Settings These settings allow the user to control the behaviour of the crawler as it traverses a website, such as the direction and depth of the crawl. - Link depth and limits - Robot exclusion notices - Link discovery Settings will normally be available to control the size and duration of the crawl. For example, it may be desirable to halt a crawl after it has collected a given volume of data, or within a given timeframe.
  • 37. • Link depth and limits: these determine the number of links that the crawler should follow away from its starting point, and the direction in which it should move. It is possible to restrict the crawler to following links within the same path, website or domain, and to a given depth. • Robot exclusion notices: a robot exclusion notice is a method used by websites to control the behaviour of robots such as web crawlers. It uses a standard protocol to define which parts of a website are accessible to the robot. These rules are contained within a ‘robots.txt’ file in the top-level folder of the website (a short example follows below). • Link discovery: the user may also be able to configure how the crawler analyses hyperlinks. Links may be dynamically constructed by scripts, or hidden within content such as Flash files, and therefore not transparent to the crawler. However, more sophisticated crawlers can be configured to discover many of these hidden links.
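Python’s standard library includes a parser for robot exclusion notices, so a crawler can check each URL before requesting it. In this sketch the crawler’s user agent name, MyArchiveBot, is hypothetical, and the URLs reuse the slides’ example domain:

    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("http://www.mysite.com/robots.txt")
    rp.read()   # fetch and parse the robot exclusion notice

    # Ask whether our (hypothetical) crawler may fetch a given page.
    if rp.can_fetch("MyArchiveBot", "http://www.mysite.com/products/new.html"):
        print("allowed by robots.txt")
    else:
        print("disallowed by robots.txt")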
  • 38. Collection Settings These settings allow the user to fine-tune the behaviour of the crawler, and particularly to determine the content that is collected. Filters can be defined to include or exclude certain paths and file types: for example, to exclude links to pop-up advertisements, or to collect only links to PDF files. Filters may also be used to avoid crawler traps, whereby the crawler becomes locked into an endless loop, by detecting repeating patterns of links. The user may also be able to place limits on the maximum size of files to be collected.
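Include and exclude filters of this kind are commonly expressed as regular expressions over URLs. A minimal sketch, with purely illustrative patterns:

    import re

    INCLUDE = re.compile(r"\.(html?|pdf)$", re.IGNORECASE)   # collect HTML pages and PDF files
    EXCLUDE = re.compile(r"/(ads|popups?)/", re.IGNORECASE)  # skip advertisement paths

    def should_collect(url):
        return bool(INCLUDE.search(url)) and not EXCLUDE.search(url)

    print(should_collect("http://www.mysite.com/products/new.html"))   # True
    print(should_collect("http://www.mysite.com/ads/banner.html"))     # False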
  • 39. Storage Settings These settings determine how the crawler stores the collected content. By default, most crawlers will mirror the original structure of the website, building a directory structure which corresponds to the original hierarchy. However, it may be possible to dictate other options, such as forcing all images to be stored in a single folder. These options are unlikely to be useful in most web archiving scenarios, where preservation of the original structure will be considered desirable. The crawler can also rewrite hyperlinks, for example to convert an absolute link into a relative link.
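Mirroring a site’s structure amounts to mapping each URL path onto a local directory tree, after which an absolute link can be rewritten as a relative one with standard path functions. A sketch, with a hypothetical archive location:

    import os
    from pathlib import Path
    from urllib.parse import urlparse

    ARCHIVE_ROOT = Path("/archive/mysite")   # hypothetical storage area

    def local_path(url):
        """Map a URL onto the mirrored directory hierarchy."""
        path = urlparse(url).path.lstrip("/") or "index.html"
        return ARCHIVE_ROOT / path

    # Rewrite an absolute link as a link relative to the referring page.
    page = local_path("http://www.mysite.com/products/index.html")
    target = local_path("http://www.mysite.com/products/new.html")
    print(os.path.relpath(target, start=page.parent))   # -> new.html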
  • 40. Scheduling Settings Tools such as PANDAS, which provide workflow capabilities, allow the scheduling of crawls to be controlled. Typical parameters include: Frequency: e.g. daily or weekly. Dates: the start or commencement of the process. Non-scheduled dates: it may also be possible to define specific dates for crawling, in addition to the standard schedule.
  • 41. Identifying the Crawler Software agents such as web browsers and crawlers identify themselves to the online services with which they connect through a ‘user agent’ identifier within the HTTP headers of the requests they send. Thus, Internet Explorer 6.0 identifies itself with the user agent string Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1). The user agent string displayed by a web crawler can generally be modified by the user.
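With the Python requests library, for example, the user agent is just an HTTP header that can be set on a session. The agent name and contact URL below are hypothetical:

    import requests

    session = requests.Session()
    # Identify the crawler and its operator to every server it visits.
    session.headers["User-Agent"] = "MyArchiveBot/1.0 (+http://archive.example.org/bot)"
    response = session.get("http://www.mysite.com/")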
  • 42. Advantages of Identification There are three advantages to this identification: 1- The crawler identifies which institution it belongs to. 2- Web servers may be configured to block certain user agents, including web crawlers and search engine robots. Defining a more specific user agent can prevent such blocking, even if using a crawler that would otherwise be blocked. 3- Some websites are designed to display correctly only in certain browsers, and check the user agent in any HTTP request accordingly. User agents which do not indicate compatibility with the correct browser will be redirected to a warning page.
  • 43. Strengths The greatest strengths of remote harvesting are: - Ease of use - Flexibility - Widespread applicability - The availability of a number of mature software tools - A remote harvesting programme can be established very quickly, and allows a large number of websites to be collected in a relatively short period. - The infrastructure requirements are relatively simple, and no active participation from website owners is required: the process is entirely in the control of the archiving body. - Most web crawler software is comparatively straightforward to use, and can be operated by non-technical staff with some training.
  • 44. Limitations - Careful configuration is required. - There is an inability to collect dynamic content. - Archiving large volumes of data is constrained by the maximum transfer speed available, which is a drawback.
  • 45. Transactional Archiving Transactional archiving is a fundamentally different approach from any of those previously described, being event driven rather than content driven. • It collects the actual transactions which take place between a web server and a web browser. It is primarily used as a means of preserving evidence of the content which was actually viewed on a particular website on a given date. This may be particularly important for organizations which need to comply with legal or regulatory requirements for disclosing and retaining information. • A transactional archiving system typically operates by intercepting every HTTP request to, and response from, the web server, filtering each response to eliminate duplicate content, and permanently storing the responses as bitstreams. A transactional archiving system requires the installation of software on the web server, and cannot therefore be used to collect content from a remote website.
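The intercept, filter and store cycle can be sketched as follows. This is an illustrative fragment, not how any particular product works: it hashes every response body and stores only those not seen before.

    import hashlib

    seen_digests = set()   # digests of response bodies already archived
    archive = []           # stand-in for permanent bitstream storage

    def record_response(url, body):
        """Called for every HTTP response the web server sends out."""
        digest = hashlib.sha256(body).hexdigest()
        if digest not in seen_digests:     # filter out duplicate content
            seen_digests.add(digest)
            archive.append((url, body))    # permanently store the bitstream

    record_response("/products/new.html", b"<html>New Products</html>")
    record_response("/products/new.html", b"<html>New Products</html>")  # duplicate: not stored again
    print(len(archive))   # -> 1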
  • 46. Example of a Transactional Archive • pageVault supports the archiving of all unique responses generated by a web server. • It allows you to know exactly what information you have published on your website, whether static pages or dynamically generated content, regardless of format (HTML, XML, PDF, zip, Microsoft Office formats, images, sound) and regardless of the rate of change. • Although every unique HTTP response can be archived and indexed, you can define non-material content (such as the current date/time and trivial site personalisation) on a per-URL, directory or regular expression basis, which pageVault will exclude when calculating the novelty of a response.
  • 47. Strengths The great strength of transactional archiving is that it collects what was actually viewed. It offers the best option for collecting evidence of how a website was used, and what content was actually available at any given moment. It can also be a good solution for archiving certain kinds of dynamic website.
  • 48. Limitations - Transactional collection does not capture content which has never been viewed by a user. - Because transactional collection takes place on the web server, it cannot capture variations in the user experience which are introduced by the web browser. - Transactional archiving must take place server side, and therefore requires the active co-operation of the website owner. - The time taken for the server to process and respond to each request will be longer.