SlideShare uma empresa Scribd logo
1 de 74
Baixar para ler offline
ResourceSync:
                    Web-Based
                      Resource
                 Synchronization
                     Herbert Van de Sompel
                           Los Alamos National Laboratory
                                              @hvdsomp


                            ResourceSync is funded by
                          The Sloan Foundation & JISC


         ResourceSync – Herbert Van de Sompel
TICER Summer School, August 22 2012, Tilburg, The Netherlands
ResourceSync Core Team – NISO & OAI


Cornell University & OAI:
   Berhard Haslhofer, Carl Lagoze, Simeon Warner

Old Dominion University & OAI:
   Michael L. Nelson

Los Alamos National Laboratory & OAI:
   Martin Klein, Robert Sanderson, Herbert Van de Sompel

NISO:
   Todd Carpenter, Nettie Lagace, Peter Murray


                             ResourceSync – Herbert Van de Sompel
                    TICER Summer School, August 22 2012, Tilburg, The Netherlands
ResourceSync Technical Group


•  Manuel Bernhardt, Delving B.V.
•  Kevin Ford, Library of Congress
•  Richard Jones, JISC
•  Graham Klyne, JISC
•  Stuart Lewis, JISC
•  David Rosenthal, LOCKSS
•  Christian Sadilek, Red Hat
•  Shlomo Sanders, Ex Libris, Inc.
•  Sjoerd Siebinga, Delving B.V.
•  Ed Summers, Library of Congress
•  Jeff Young, OCLC Online Computer Library Center


                             ResourceSync – Herbert Van de Sompel
                    TICER Summer School, August 22 2012, Tilburg, The Netherlands
ResourceSync


ResourceSync: What & Why?

Problem Perspective & Conceptual Approach

Technical Details

Q&A




                              ResourceSync – Herbert Van de Sompel
                     TICER Summer School, August 22 2012, Tilburg, The Netherlands
ResourceSync


ResourceSync: What & Why?

Problem Perspective & Conceptual Approach

Technical Details

Q&A




                              ResourceSync – Herbert Van de Sompel
                     TICER Summer School, August 22 2012, Tilburg, The Netherlands
Synchronize What?

•  Web resources – things with a URI that can be dereferenced and
   are cache-able (no dependency on underlying OS, technologies
   etc.)

•  Small websites/repositories (a few resources) to large
   repositories/datasets/linked data collections (many millions of
   resources)

•  That change slowly (weeks/months) or quickly (seconds), and
   where latency needs may vary

•  Focus on needs of research communication and cultural heritage
   organizations, but aim for generality


                                   ResourceSync – Herbert Van de Sompel
                          TICER Summer School, August 22 2012, Tilburg, The Netherlands
Why?

… because lots of projects and services are doing synchronization
but have to resort to ad-hoc, case by case, approaches!

•  Project team involved with projects that need this

•  Experience with OAI-PMH: widely used in repos but
    o  XML metadata only

    o  Attempts at synchronizing actual content via OAI-PMH

       (complex object formats, dc:identifier) not successful.
    o  Web technology has moved on since 1999




•  Devise a shared solution for data, metadata, linked data?


                                  ResourceSync – Herbert Van de Sompel
                         TICER Summer School, August 22 2012, Tilburg, The Netherlands
Use Cases – The Basics




              ResourceSync – Herbert Van de Sompel
     TICER Summer School, August 22 2012, Tilburg, The Netherlands
Use Cases - More




           ResourceSync – Herbert Van de Sompel
  TICER Summer School, August 22 2012, Tilburg, The Netherlands
Out Of Scope (For Now)

•  Bidirectional synchronization

•  Destination-defined selective synchronization (query)

•  Bulk URI migration




                                  ResourceSync – Herbert Van de Sompel
                         TICER Summer School, August 22 2012, Tilburg, The Netherlands
Use Case: arXiv Mirroring

•  1M article versions, ~800/day created or
   updated at 8 PM US Eastern Time

•  Metadata and full-text for each article

•  Accuracy important

•  Want low barrier for others to use

•  Look for more general solution than current
   homebrew mirroring (running with minor
   modifications since 1994!) and occasional rsync
   (filesystem layout specific, auth issues)

                                   ResourceSync – Herbert Van de Sompel
                          TICER Summer School, August 22 2012, Tilburg, The Netherlands
Use Case: DBpedia Live Duplication

•  Average of 2 updates per second
•  Want low latency => need a push technology




                                ResourceSync – Herbert Van de Sompel
                       TICER Summer School, August 22 2012, Tilburg, The Netherlands
ResourceSync


ResourceSync: What & Why?

Problem Perspective & Conceptual Approach

Technical Details

Q&A




                              ResourceSync – Herbert Van de Sompel
                     TICER Summer School, August 22 2012, Tilburg, The Netherlands
ResourceSync Problem


•  Consideration:
    •  Source (server) A has resources that change over time: they
       get created, modified, deleted
    •  Destination (servers) X, Y, and Z leverage (some) resources
       of Source A.
•  Problem:
    •  Destinations want to keep in step with the resource changes
       at Source A: resource synchronization.
•  Goal:
    •  Design an approach for resource synchronization aligned
       with the Web Architecture that has a fair chance of adoption
       by different communities.
        •  The approach must scale better than recurrent HTTP
           HEAD/GET on resources.


                                ResourceSync – Herbert Van de Sompel
                       TICER Summer School, August 22 2012, Tilburg, The Netherlands
Destination: 3 Basic Synchronization Needs

1.  Baseline synchronization – A destination must be able to
    perform an initial load or catch-up with a source
       -  avoid out-of-band setup

2.  Incremental synchronization – A destination must have some
    way to keep up-to-date with changes at a source
       -  subject to some latency; minimal: create/update/delete
       -  allow to catch-up after destination has been offline

3.  Audit – A destination should be able to determine whether it is
    synchronized with a source
       -  subject to some latency



                                  ResourceSync – Herbert Van de Sompel
                         TICER Summer School, August 22 2012, Tilburg, The Netherlands
Source Capability 1: Describing Content

In order to advertise the resources that a source wants destinations
to know about, it may describe them:

    o    Publish an inventory of resource URIs and possibly
         associated metadata
         -  Destination GETs the Content Description
         -  Destination GETs listed resources by their URI




                                    ResourceSync – Herbert Van de Sompel
                           TICER Summer School, August 22 2012, Tilburg, The Netherlands
Source Capability 2: Communicating Change Events

In order to achieve lower latency, a source may communicate about
changes to its resources:

   o     2.1. Change Set: Publish a list of recent change events
         (create, update, delete resource)
        -  Destination acts upon change events, e.g. GETs created/
           updated resources, removes deleted resources.




                                  ResourceSync – Herbert Van de Sompel
                         TICER Summer School, August 22 2012, Tilburg, The Netherlands
Source Capability 2: Communicating Change Events

In order to achieve lower latency, a source may communicate about
changes to its resources:

   o     2.1. Change Set: Publish a list of recent change events
         (create, update, delete resource)
        -  Destination acts upon change events, e.g. GETs created/
            updated resources, removes deleted resources.

   o     2.2. Push Change Set: Push a list of recent change events
         (create, update, delete resource) towards (a) destination(s)
        -  Destination acts upon change events, e.g. GETs created/
            updated resources, removes deleted resources.




                                   ResourceSync – Herbert Van de Sompel
                          TICER Summer School, August 22 2012, Tilburg, The Netherlands
Source Capability 3: Providing Access to Versions

In order to allow a destination to catch up with missed changes, a
source may support:

   o    3.1. Historical Change Sets: Provide access to change events that
        occurred prior to the ones listed in the current Change Set




                                    ResourceSync – Herbert Van de Sompel
                           TICER Summer School, August 22 2012, Tilburg, The Netherlands
Source Capability 3: Providing Access to Versions

In order to allow a destination to catch up with missed changes, a
source may support:

   o    3.1. Historical Change Sets: Provide access to change events that
        occurred prior to the ones listed in the current Change Set

   o    3.2. Historical Content: Provide access to prior resource versions




                                     ResourceSync – Herbert Van de Sompel
                            TICER Summer School, August 22 2012, Tilburg, The Netherlands
Source Capability 4: Transferring Content

By default, content is transferred in response to a GET issued by a
destination against a URI of a source’s resource. But a source may
support additional mechanisms:

   o     4.1. Dump: Publish a package of resource representations
         and necessary metadata
        -  Destination GETs the Dump
        -  Destination unpacks the Dump




                                  ResourceSync – Herbert Van de Sompel
                         TICER Summer School, August 22 2012, Tilburg, The Netherlands
Source Capability 4: Transferring Content

By default, content is transferred in response to a GET issued by a
destination against a URI of a source’s resource. But a source may
support additional mechanisms:

   o     4.1. Dump: Publish a package of resource representations
         and necessary metadata
        -  Destination GETs the Dump
        -  Destination unpacks the Dump

   o    4.2. Alternate Content Transfer: Support alternative
        mechanisms to optimize getting content, e.g. content via a
        mirror site, only changes not the entire changed resource.



                                  ResourceSync – Herbert Van de Sompel
                         TICER Summer School, August 22 2012, Tilburg, The Netherlands
Source: Advertise Capabilities

A source needs to advertise the capabilities it supports to allow a
destination to discover them

•     Some capabilities may be provided by a third party, not the
      source itself
     o   e.g. Historical Change Sets, Historical Content
     o   But the source should still make those third party capabilities
         discoverable - trust




                                    ResourceSync – Herbert Van de Sompel
                           TICER Summer School, August 22 2012, Tilburg, The Netherlands
ResourceSync


ResourceSync: What & Why?

Problem Perspective & Conceptual Approach

Technical Details

Q&A




                              ResourceSync – Herbert Van de Sompel
                     TICER Summer School, August 22 2012, Tilburg, The Netherlands
So Many Choices

                                                   Push
  DSNotify
                OAI-PMH                                                   Pull
     rsync
                   Crawl
                                         OAI-ORE
        RDFsync
                                                           WebDAV Col. Syn.
                                 XMPP
 Atom                                                SWORD                    AtomPub
             Sitemap             RSS

SPARQLpush                                                       PubSubHubbub
                  SDShare                  XMPP

                                  ResourceSync – Herbert Van de Sompel
                         TICER Summer School, August 22 2012, Tilburg, The Netherlands
ResourceSync – Herbert Van de Sompel
TICER Summer School, August 22 2012, Tilburg, The Netherlands
ResourceSync – Herbert Van de Sompel
TICER Summer School, August 22 2012, Tilburg, The Netherlands
A Framework Based on Sitemaps

•  Modular framework allowing selective deployment

•  Sitemap is the core component throughout the
   framework

   o    Introduce extension elements and attributes:
          -  In ResourceSync namespace (rs:) to
             accommodate synchronization needs
          -  In XHTML namespace (xhtml:) mainly to
             accommodate discovery needs
   o    Reuse Sitemap format for Change Sets (both
        current and historical) and for manifest in Dump

                                ResourceSync – Herbert Van de Sompel
                       TICER Summer School, August 22 2012, Tilburg, The Netherlands
Source Capabilities – Destination Needs




                     ResourceSync – Herbert Van de Sompel
            TICER Summer School, August 22 2012, Tilburg, The Netherlands
Source Capabilities – Destination Needs




                     ResourceSync – Herbert Van de Sompel
            TICER Summer School, August 22 2012, Tilburg, The Netherlands
Sitemap with Added Datetime




                ResourceSync – Herbert Van de Sompel
       TICER Summer School, August 22 2012, Tilburg, The Netherlands
Change Types: Extend lastmod, Use expires
                                        !




                       ResourceSync – Herbert Van de Sompel
              TICER Summer School, August 22 2012, Tilburg, The Netherlands
Sitemap with lastmod and expires
                               !




                   ResourceSync – Herbert Van de Sompel
          TICER Summer School, August 22 2012, Tilburg, The Netherlands
Sitemap Discovery via robots.txt
                               !




                  ResourceSync – Herbert Van de Sompel
         TICER Summer School, August 22 2012, Tilburg, The Netherlands
Source Capabilities – Destination Needs




                     ResourceSync – Herbert Van de Sompel
            TICER Summer School, August 22 2012, Tilburg, The Netherlands
Change Set: An rs Typed Sitemap




                  ResourceSync – Herbert Van de Sompel
         TICER Summer School, August 22 2012, Tilburg, The Netherlands
More rs Extension Elements




               ResourceSync – Herbert Van de Sompel
      TICER Summer School, August 22 2012, Tilburg, The Netherlands
Change Set with rs and xhtml Extensions




                      ResourceSync – Herbert Van de Sompel
             TICER Summer School, August 22 2012, Tilburg, The Netherlands
Change Set Discovery via Sitemap




                  ResourceSync – Herbert Van de Sompel
         TICER Summer School, August 22 2012, Tilburg, The Netherlands
Pushing Change Sets via XMPP PubSub

XMPP Publish-Subscribe: Client to Subscription Service,
  Subscription Service to Client(s) communication

•  One of the XMPP (Extensible Messaging and Presence Protocol)
   extensions http://xmpp.org/extensions/xep-0060.html
•  Apple Notifications based on XMPP PubSub
•  Available tools, see http://xmpp.org/about-xmpp/
   technology-overview/pubsub/#impl-client
    o  XMPP Servers with PubSub support:

        -  ejabberd , OpenFire , Tigase , SleekXMPP
    o  XMPP libraries with PubSub support:

        -  Strophe (C, JavaScript), XMPP4R (Ruby), SleekXMPP
           (Python), PubSub Client (Python)



                                 ResourceSync – Herbert Van de Sompel
                        TICER Summer School, August 22 2012, Tilburg, The Netherlands
Pushing Change Sets via XMPP PubSub




                    ResourceSync – Herbert Van de Sompel
           TICER Summer School, August 22 2012, Tilburg, The Netherlands
Change Set via XMPP




            ResourceSync – Herbert Van de Sompel
   TICER Summer School, August 22 2012, Tilburg, The Netherlands
Push Change Set Discovery via Sitemap




                     ResourceSync – Herbert Van de Sompel
            TICER Summer School, August 22 2012, Tilburg, The Netherlands
Source Capabilities – Destination Needs




                     ResourceSync – Herbert Van de Sompel
            TICER Summer School, August 22 2012, Tilburg, The Netherlands
Discovering a Historical Change Set via a Current Change Set




                                ResourceSync – Herbert Van de Sompel
                       TICER Summer School, August 22 2012, Tilburg, The Netherlands
Source Capabilities – Destination Needs




                     ResourceSync – Herbert Van de Sompel
            TICER Summer School, August 22 2012, Tilburg, The Netherlands
Discovering Historical Content – Link to Version Resource




                              ResourceSync – Herbert Van de Sompel
                     TICER Summer School, August 22 2012, Tilburg, The Netherlands
Memento Intermezzo




 http://www.mementoweb.org/

            ResourceSync – Herbert Van de Sompel
   TICER Summer School, August 22 2012, Tilburg, The Netherlands
Original Resources and Mementos




                  ResourceSync – Herbert Van de Sompel
         TICER Summer School, August 22 2012, Tilburg, The Netherlands
Bridge from Present to Past




               ResourceSync – Herbert Van de Sompel
      TICER Summer School, August 22 2012, Tilburg, The Netherlands
Bridge from Past to Present




               ResourceSync – Herbert Van de Sompel
      TICER Summer School, August 22 2012, Tilburg, The Netherlands
Memento Framework




           ResourceSync – Herbert Van de Sompel
  TICER Summer School, August 22 2012, Tilburg, The Netherlands
Discovering Historical Content – Link to Memento TimeGate




                              ResourceSync – Herbert Van de Sompel
                     TICER Summer School, August 22 2012, Tilburg, The Netherlands
Source Capabilities – Destination Needs




                     ResourceSync – Herbert Van de Sompel
            TICER Summer School, August 22 2012, Tilburg, The Netherlands
Dump

•  Two formats currently under discussion:

   o    Format based on ZIP:
         -  Package content
         -  Add manifest (manifest.xml) expressed in
            Sitemap format
         -  ZIP it up

   o    WARC files as used by the web archiving
        community



                               ResourceSync – Herbert Van de Sompel
                      TICER Summer School, August 22 2012, Tilburg, The Netherlands
Mapping URI to File Path with rs:path
                                    !




                     ResourceSync – Herbert Van de Sompel
            TICER Summer School, August 22 2012, Tilburg, The Netherlands
Manifest (manifest.xml) Expressed in Sitemap Format




                            ResourceSync – Herbert Van de Sompel
                   TICER Summer School, August 22 2012, Tilburg, The Netherlands
Dump Discovery via Sitemap




               ResourceSync – Herbert Van de Sompel
      TICER Summer School, August 22 2012, Tilburg, The Netherlands
Source Capabilities – Destination Needs




                     ResourceSync – Herbert Van de Sompel
            TICER Summer School, August 22 2012, Tilburg, The Netherlands
Alternate Location




           ResourceSync – Herbert Van de Sompel
  TICER Summer School, August 22 2012, Tilburg, The Netherlands
Alternate Protocol, e.g. Obtain Changes Only




                        ResourceSync – Herbert Van de Sompel
               TICER Summer School, August 22 2012, Tilburg, The Netherlands
Timeline
•  August 2012
    o  First draft spec shared for feedback with ResourceSync team




•  September 2012
    o  In-person meeting of ResourceSync Team

    o  Revise spec, conduct experiments

    o  Solicit broad feedback

    o  Paper in D-Lib Magazine




•  December 2012 – Finalize specification (?)




                                 ResourceSync – Herbert Van de Sompel
                        TICER Summer School, August 22 2012, Tilburg, The Netherlands
Pointers
•  First draft spec:
   http://www.openarchives.org/rs/0.1/resourcesync!

•  Simulator code on github
   http://github.org/resync/simulator!

•  NISO workspace
   http://www.niso.org/workrooms/resourcesync/!
   !
•  List for public comment coming soon




                           ResourceSync – Herbert Van de Sompel
                  TICER Summer School, August 22 2012, Tilburg, The Netherlands
ResourceSync:
                    Web-Based
                      Resource
                 Synchronization
                     Herbert Van de Sompel
                           Los Alamos National Laboratory
                                              @hvdsomp


                            ResourceSync is funded by
                          The Sloan Foundation & JISC


         ResourceSync – Herbert Van de Sompel
TICER Summer School, August 22 2012, Tilburg, The Netherlands

Mais conteúdo relacionado

Mais procurados

NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...
NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...
NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...
National Information Standards Organization (NISO)
 
Large Scale Data With Hadoop
Large Scale Data With HadoopLarge Scale Data With Hadoop
Large Scale Data With Hadoop
guest27e6764
 
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Mat Kelly
 

Mais procurados (20)

ResourceSync
ResourceSyncResourceSync
ResourceSync
 
ResourceSync: Conceptual and Technical Problem Perspective
ResourceSync: Conceptual and Technical Problem PerspectiveResourceSync: Conceptual and Technical Problem Perspective
ResourceSync: Conceptual and Technical Problem Perspective
 
NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...
NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...
NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...
 
Hadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciencesHadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciences
 
Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...
 
Druid Scaling Realtime Analytics
Druid Scaling Realtime AnalyticsDruid Scaling Realtime Analytics
Druid Scaling Realtime Analytics
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
 
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction
 
Large Scale Data With Hadoop
Large Scale Data With HadoopLarge Scale Data With Hadoop
Large Scale Data With Hadoop
 
Hadoop Family and Ecosystem
Hadoop Family and EcosystemHadoop Family and Ecosystem
Hadoop Family and Ecosystem
 
Lessons learnt from the Murchison Widefield Array Data Archive
Lessons learnt from the Murchison Widefield Array Data ArchiveLessons learnt from the Murchison Widefield Array Data Archive
Lessons learnt from the Murchison Widefield Array Data Archive
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
Hadoop tools with Examples
Hadoop tools with ExamplesHadoop tools with Examples
Hadoop tools with Examples
 
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
 
End-to-End : Open Access Process Review and Improvements
End-to-End: Open Access Process Review and ImprovementsEnd-to-End: Open Access Process Review and Improvements
End-to-End : Open Access Process Review and Improvements
 

Semelhante a ResourceSync: Web-Based Resource Synchronization

DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
Herbert Van de Sompel
 
Annotating Scholarly Resources
Annotating Scholarly ResourcesAnnotating Scholarly Resources
Annotating Scholarly Resources
Robert Sanderson
 

Semelhante a ResourceSync: Web-Based Resource Synchronization (20)

A new approach to aggregation
A new approach to aggregation A new approach to aggregation
A new approach to aggregation
 
A distributed network of digital heritage information - Unesco/NDL India
A distributed network of digital heritage information - Unesco/NDL IndiaA distributed network of digital heritage information - Unesco/NDL India
A distributed network of digital heritage information - Unesco/NDL India
 
ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013
 
Session 1.4 a distributed network of heritage information
Session 1.4   a distributed network of heritage informationSession 1.4   a distributed network of heritage information
Session 1.4 a distributed network of heritage information
 
A distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics AmsterdamA distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics Amsterdam
 
lodlam summit session browsable linked data
lodlam summit session browsable linked datalodlam summit session browsable linked data
lodlam summit session browsable linked data
 
CLARIAH Toogdag 2018: A distributed network of digital heritage information
CLARIAH Toogdag 2018: A distributed network of digital heritage informationCLARIAH Toogdag 2018: A distributed network of digital heritage information
CLARIAH Toogdag 2018: A distributed network of digital heritage information
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
 
Sharing Big Data - Bob Jones
Sharing Big Data - Bob JonesSharing Big Data - Bob Jones
Sharing Big Data - Bob Jones
 
IPRES 2014 paper presentation: significant environment information for LTDP
IPRES 2014 paper presentation: significant environment information for LTDPIPRES 2014 paper presentation: significant environment information for LTDP
IPRES 2014 paper presentation: significant environment information for LTDP
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
Illuminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportIlluminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data Support
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
 
Annotating Scholarly Resources
Annotating Scholarly ResourcesAnnotating Scholarly Resources
Annotating Scholarly Resources
 
What is New in W3C land?
What is New in W3C land?What is New in W3C land?
What is New in W3C land?
 
Open Energy Data
Open Energy DataOpen Energy Data
Open Energy Data
 
A distributed network of digital heritage information by Enno Meijers - Europ...
A distributed network of digital heritage information by Enno Meijers - Europ...A distributed network of digital heritage information by Enno Meijers - Europ...
A distributed network of digital heritage information by Enno Meijers - Europ...
 
FFL & CNYH
FFL & CNYHFFL & CNYH
FFL & CNYH
 

Mais de Herbert Van de Sompel

Mais de Herbert Van de Sompel (20)

The web is rotting and what to do about it
The web is rotting and what to do about itThe web is rotting and what to do about it
The web is rotting and what to do about it
 
Researcher Pod: Scholarly Communication Using the Decentralized Web
Researcher Pod: Scholarly Communication Using the Decentralized WebResearcher Pod: Scholarly Communication Using the Decentralized Web
Researcher Pod: Scholarly Communication Using the Decentralized Web
 
Persistent Identification: Easier Said than Done
Persistent Identification: Easier Said than DonePersistent Identification: Easier Said than Done
Persistent Identification: Easier Said than Done
 
FAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueFAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning Issue
 
Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)
 
Collecting the organizational scholarly record
Collecting the organizational scholarly recordCollecting the organizational scholarly record
Collecting the organizational scholarly record
 
To the Rescue of Scholarly Orphans
To the Rescue of Scholarly OrphansTo the Rescue of Scholarly Orphans
To the Rescue of Scholarly Orphans
 
Almost two decades at LANL
Almost two decades at LANLAlmost two decades at LANL
Almost two decades at LANL
 
Perseverance on Persistence
Perseverance on PersistencePerseverance on Persistence
Perseverance on Persistence
 
Paul Evan Peters Lecture
Paul Evan Peters LecturePaul Evan Peters Lecture
Paul Evan Peters Lecture
 
Achieving Link Integrity for Managed Collections
Achieving Link Integrity for Managed CollectionsAchieving Link Integrity for Managed Collections
Achieving Link Integrity for Managed Collections
 
Signposting Overview (Version November 2017)
Signposting Overview (Version November 2017)Signposting Overview (Version November 2017)
Signposting Overview (Version November 2017)
 
Signposting Overview
Signposting OverviewSignposting Overview
Signposting Overview
 
PID Signposting Pattern
PID Signposting PatternPID Signposting Pattern
PID Signposting Pattern
 
Interoperability for web based scholarship
Interoperability for web based scholarshipInteroperability for web based scholarship
Interoperability for web based scholarship
 
Reminiscing about interoperability
Reminiscing about interoperabilityReminiscing about interoperability
Reminiscing about interoperability
 
Creating Pockets of Persistence
Creating Pockets of PersistenceCreating Pockets of Persistence
Creating Pockets of Persistence
 
Memento 101
Memento 101Memento 101
Memento 101
 
A Perspective on Archiving the Scholarly Record
A Perspective on Archiving the Scholarly RecordA Perspective on Archiving the Scholarly Record
A Perspective on Archiving the Scholarly Record
 
Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013
 

Último

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 

ResourceSync: Web-Based Resource Synchronization

  • 1. ResourceSync: Web-Based Resource Synchronization Herbert Van de Sompel Los Alamos National Laboratory @hvdsomp ResourceSync is funded by The Sloan Foundation & JISC ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 2. ResourceSync Core Team – NISO & OAI Cornell University & OAI: Berhard Haslhofer, Carl Lagoze, Simeon Warner Old Dominion University & OAI: Michael L. Nelson Los Alamos National Laboratory & OAI: Martin Klein, Robert Sanderson, Herbert Van de Sompel NISO: Todd Carpenter, Nettie Lagace, Peter Murray ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 3. ResourceSync Technical Group •  Manuel Bernhardt, Delving B.V. •  Kevin Ford, Library of Congress •  Richard Jones, JISC •  Graham Klyne, JISC •  Stuart Lewis, JISC •  David Rosenthal, LOCKSS •  Christian Sadilek, Red Hat •  Shlomo Sanders, Ex Libris, Inc. •  Sjoerd Siebinga, Delving B.V. •  Ed Summers, Library of Congress •  Jeff Young, OCLC Online Computer Library Center ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 4. ResourceSync ResourceSync: What & Why? Problem Perspective & Conceptual Approach Technical Details Q&A ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 5. ResourceSync ResourceSync: What & Why? Problem Perspective & Conceptual Approach Technical Details Q&A ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 6. Synchronize What? •  Web resources – things with a URI that can be dereferenced and are cache-able (no dependency on underlying OS, technologies etc.) •  Small websites/repositories (a few resources) to large repositories/datasets/linked data collections (many millions of resources) •  That change slowly (weeks/months) or quickly (seconds), and where latency needs may vary •  Focus on needs of research communication and cultural heritage organizations, but aim for generality ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 7. Why? … because lots of projects and services are doing synchronization but have to resort to ad-hoc, case by case, approaches! •  Project team involved with projects that need this •  Experience with OAI-PMH: widely used in repos but o  XML metadata only o  Attempts at synchronizing actual content via OAI-PMH (complex object formats, dc:identifier) not successful. o  Web technology has moved on since 1999 •  Devise a shared solution for data, metadata, linked data? ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 8. Use Cases – The Basics ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 9. Use Cases - More ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 10. Out Of Scope (For Now) •  Bidirectional synchronization •  Destination-defined selective synchronization (query) •  Bulk URI migration ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 11. Use Case: arXiv Mirroring •  1M article versions, ~800/day created or updated at 8 PM US Eastern Time •  Metadata and full-text for each article •  Accuracy important •  Want low barrier for others to use •  Look for more general solution than current homebrew mirroring (running with minor modifications since 1994!) and occasional rsync (filesystem layout specific, auth issues) ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 12. Use Case: DBpedia Live Duplication •  Average of 2 updates per second •  Want low latency => need a push technology ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 13. ResourceSync ResourceSync: What & Why? Problem Perspective & Conceptual Approach Technical Details Q&A ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 14. ResourceSync Problem •  Consideration: •  Source (server) A has resources that change over time: they get created, modified, deleted •  Destination (servers) X, Y, and Z leverage (some) resources of Source A. •  Problem: •  Destinations want to keep in step with the resource changes at Source A: resource synchronization. •  Goal: •  Design an approach for resource synchronization aligned with the Web Architecture that has a fair chance of adoption by different communities. •  The approach must scale better than recurrent HTTP HEAD/GET on resources. ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 15. Destination: 3 Basic Synchronization Needs 1.  Baseline synchronization – A destination must be able to perform an initial load or catch-up with a source -  avoid out-of-band setup 2.  Incremental synchronization – A destination must have some way to keep up-to-date with changes at a source -  subject to some latency; minimal: create/update/delete -  allow to catch-up after destination has been offline 3.  Audit – A destination should be able to determine whether it is synchronized with a source -  subject to some latency ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 16. Source Capability 1: Describing Content In order to advertise the resources that a source wants destinations to know about, it may describe them: o  Publish an inventory of resource URIs and possibly associated metadata -  Destination GETs the Content Description -  Destination GETs listed resources by their URI ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 17.
  • 18.
  • 19. Source Capability 2: Communicating Change Events In order to achieve lower latency, a source may communicate about changes to its resources: o  2.1. Change Set: Publish a list of recent change events (create, update, delete resource) -  Destination acts upon change events, e.g. GETs created/ updated resources, removes deleted resources. ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 20.
  • 21.
  • 22.
  • 23.
  • 24. Source Capability 2: Communicating Change Events In order to achieve lower latency, a source may communicate about changes to its resources: o  2.1. Change Set: Publish a list of recent change events (create, update, delete resource) -  Destination acts upon change events, e.g. GETs created/ updated resources, removes deleted resources. o  2.2. Push Change Set: Push a list of recent change events (create, update, delete resource) towards (a) destination(s) -  Destination acts upon change events, e.g. GETs created/ updated resources, removes deleted resources. ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 25. Source Capability 3: Providing Access to Versions In order to allow a destination to catch up with missed changes, a source may support: o  3.1. Historical Change Sets: Provide access to change events that occurred prior to the ones listed in the current Change Set ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 26.
  • 27.
  • 28.
  • 29. Source Capability 3: Providing Access to Versions In order to allow a destination to catch up with missed changes, a source may support: o  3.1. Historical Change Sets: Provide access to change events that occurred prior to the ones listed in the current Change Set o  3.2. Historical Content: Provide access to prior resource versions ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 30. Source Capability 4: Transferring Content By default, content is transferred in response to a GET issued by a destination against a URI of a source’s resource. But a source may support additional mechanisms: o  4.1. Dump: Publish a package of resource representations and necessary metadata -  Destination GETs the Dump -  Destination unpacks the Dump ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 31.
  • 32. Source Capability 4: Transferring Content By default, content is transferred in response to a GET issued by a destination against a URI of a source’s resource. But a source may support additional mechanisms: o  4.1. Dump: Publish a package of resource representations and necessary metadata -  Destination GETs the Dump -  Destination unpacks the Dump o  4.2. Alternate Content Transfer: Support alternative mechanisms to optimize getting content, e.g. content via a mirror site, only changes not the entire changed resource. ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 33. Source: Advertise Capabilities A source needs to advertise the capabilities it supports to allow a destination to discover them •  Some capabilities may be provided by a third party, not the source itself o  e.g. Historical Change Sets, Historical Content o  But the source should still make those third party capabilities discoverable - trust ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 34. ResourceSync ResourceSync: What & Why? Problem Perspective & Conceptual Approach Technical Details Q&A ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 35. So Many Choices Push DSNotify OAI-PMH Pull rsync Crawl OAI-ORE RDFsync WebDAV Col. Syn. XMPP Atom SWORD AtomPub Sitemap RSS SPARQLpush PubSubHubbub SDShare XMPP ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 36. ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 37. ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 38. A Framework Based on Sitemaps •  Modular framework allowing selective deployment •  Sitemap is the core component throughout the framework o  Introduce extension elements and attributes: -  In ResourceSync namespace (rs:) to accommodate synchronization needs -  In XHTML namespace (xhtml:) mainly to accommodate discovery needs o  Reuse Sitemap format for Change Sets (both current and historical) and for manifest in Dump ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 39. Source Capabilities – Destination Needs ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 40. Source Capabilities – Destination Needs ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 41. Sitemap with Added Datetime ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 42. Change Types: Extend lastmod, Use expires ! ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 43. Sitemap with lastmod and expires ! ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 44. Sitemap Discovery via robots.txt ! ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 45. Source Capabilities – Destination Needs ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 46. Change Set: An rs Typed Sitemap ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 47. More rs Extension Elements ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 48. Change Set with rs and xhtml Extensions ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 49. Change Set Discovery via Sitemap ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 50. Pushing Change Sets via XMPP PubSub XMPP Publish-Subscribe: Client to Subscription Service, Subscription Service to Client(s) communication •  One of the XMPP (Extensible Messaging and Presence Protocol) extensions http://xmpp.org/extensions/xep-0060.html •  Apple Notifications based on XMPP PubSub •  Available tools, see http://xmpp.org/about-xmpp/ technology-overview/pubsub/#impl-client o  XMPP Servers with PubSub support: -  ejabberd , OpenFire , Tigase , SleekXMPP o  XMPP libraries with PubSub support: -  Strophe (C, JavaScript), XMPP4R (Ruby), SleekXMPP (Python), PubSub Client (Python) ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 51. Pushing Change Sets via XMPP PubSub ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 52. Change Set via XMPP ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 53. Push Change Set Discovery via Sitemap ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 54. Source Capabilities – Destination Needs ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 55. Discovering a Historical Change Set via a Current Change Set ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 56. Source Capabilities – Destination Needs ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 57. Discovering Historical Content – Link to Version Resource ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 58. Memento Intermezzo http://www.mementoweb.org/ ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 59. Original Resources and Mementos ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 60. Bridge from Present to Past ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 61. Bridge from Past to Present ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 62. Memento Framework ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 63. Discovering Historical Content – Link to Memento TimeGate ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 64. Source Capabilities – Destination Needs ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 65. Dump •  Two formats currently under discussion: o  Format based on ZIP: -  Package content -  Add manifest (manifest.xml) expressed in Sitemap format -  ZIP it up o  WARC files as used by the web archiving community ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 66. Mapping URI to File Path with rs:path ! ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 67. Manifest (manifest.xml) Expressed in Sitemap Format ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 68. Dump Discovery via Sitemap ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 69. Source Capabilities – Destination Needs ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 70. Alternate Location ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 71. Alternate Protocol, e.g. Obtain Changes Only ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 72. Timeline •  August 2012 o  First draft spec shared for feedback with ResourceSync team •  September 2012 o  In-person meeting of ResourceSync Team o  Revise spec, conduct experiments o  Solicit broad feedback o  Paper in D-Lib Magazine •  December 2012 – Finalize specification (?) ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 73. Pointers •  First draft spec: http://www.openarchives.org/rs/0.1/resourcesync! •  Simulator code on github http://github.org/resync/simulator! •  NISO workspace http://www.niso.org/workrooms/resourcesync/! ! •  List for public comment coming soon ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 74. ResourceSync: Web-Based Resource Synchronization Herbert Van de Sompel Los Alamos National Laboratory @hvdsomp ResourceSync is funded by The Sloan Foundation & JISC ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands