Using DISC to Reindex ~50Tb of URIs


                                   Robert Sanderson
                                              rsanderson@lanl.gov
                                              @azaroth42

                                   Herbert Van de Sompel
                                              herbertv@lanl.gov
                                              @hvdsomp

                                     http://www.mementoweb.org/



Towards Seamless Navigation of the Web of the Past
            Big Data Interest Group, LANL, Feb 21st 2013            1
                            [Unclassified]
Summary


Memento

Project Overview
Participants

The Plan …
… and the Reality

Technical Details

Next Run


Memento Project



   “Provide web citizens seamless access to the web of the past”
Some accolades:
•  $1M funding from the Library of Congress
•  Winner of 2010 International Digital Preservation Award
•  Accepted by International Internet Preservation Consortium
      as the way forwards for access to web archives
•  Nominated for Best Paper at JCDL 2010
•  Best Poster at JCDL 2012
•  Tim Berners-Lee @ LDOW10: “We absolutely need this”
•  Internet Draft 6 (final call) by May




Old Copies have New Names
Everyone knows the URL for CNN is:               http://www.cnn.com/
Everyone knows the URL for CNN was:              http://www.cnn.com/




Old Copies have New Names
Copies of the past versions of http://www.cnn.com/ exist
… but you don’t go there to get them, they have new URLs
… and there’s no way to automatically discover those new URLs




          http://web.archive.org/web/20031013111028/http://www2.cnn.com/


Discovery is Hard
People want to find, not search
… and especially not searching by hand!




               http://web.archive.org/web/*/http://www2.cnn.com/


Navigating in the Past
Links to the real resource bring you back out of the past
… or trap you in a single incomplete archive.




     [Figure labels: "Pentagon"; "Dec 20 2001, 4:51:00 UTC"; "current"]
     http://web.archive.org/web/*/http://www2.cnn.com/


Memento: Time Travel for the Web
  TimeGate: Resource that knows the locations of archived copies
  Memento: An archived copy of previous state of a web resource

Link header to TimeGate
                           302 redirect to Memento
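
A minimal sketch of the client side of this exchange, assuming the Accept-Datetime request header defined by the Memento protocol. The TimeGate URI in the usage note is hypothetical, and the Link parser is deliberately simplified (it does not handle parameter values that contain commas).

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime


def find_timegate(link_header):
    """Return the URI of the first link whose rel includes 'timegate', else None."""
    # Simplified parsing: each link is <uri> followed by ;param pairs.
    for match in re.finditer(r'<([^>]*)>\s*((?:;\s*[^,;]+)*)', link_header):
        uri, params = match.group(1), match.group(2)
        rel = re.search(r'rel="?([^";]+)"?', params)
        if rel and "timegate" in rel.group(1).split():
            return uri
    return None


def accept_datetime_header(when):
    """Build the request header asking a TimeGate for a copy near 'when'."""
    # The header value is an HTTP date, always expressed in GMT.
    return {"Accept-Datetime": format_datetime(when.astimezone(timezone.utc), usegmt=True)}
```

A client would read the Link header of the original resource, send a request with this header to the TimeGate it names, and follow the 302 redirect to the Memento. For example, `find_timegate('<http://example.org/timegate/http://www.cnn.com/>; rel="timegate"')` returns the hypothetical TimeGate URI.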




  How does the TimeGate know where the appropriate Memento is?


Project Overview

Goal:
•  To aggregate the metadata of the distributed archives
   of the IIPC, and
    •  To provide Memento based access to the holdings
       of open archives
    •  To provide knowledge of the holdings of restricted
       archives
    •  To provide knowledge to IIPC members of the
       holdings of totally closed archives




Experiment Participants

•    Austrian National Library
•    Bibliothèque Nationale de France
•    British Library
•    Institut National de l'Audiovisuel
•    Internet Archive
•    Koninklijke Bibliotheek
•    Library of Congress
•    Netarchive.dk
•    Swiss National Library
•    University of North Texas

• Los Alamos National Laboratory
• Old Dominion University

Data

Field            Value
Canonical URL    kosovakosovo.com/photo.php?id=5785
Datestamp        20090608161553
Request URL      http://www.kosovakosovo.com/photo.php?id=5785
MIME type        text/html
HTTP Status      200
Checksum         M36MRHSBVPLKMUN6PFOIEV3AH5ADITAN
Redirect?        -
Bytes            563096
Storage File     AIT-1068-20090608161511.warc.gz

           Multiply this one record across 6Tb of compressed CDX files (~50Tb uncompressed)
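
A minimal sketch of reading one such record, assuming the nine fields appear on a single whitespace-separated line in the order shown on this slide. The type and field names are illustrative, not taken from any CDX tooling, and real files may use "-" placeholders in numeric fields, which this sketch does not handle.

```python
from datetime import datetime
from typing import NamedTuple


class CdxRecord(NamedTuple):
    canonical_url: str
    captured_at: datetime
    request_url: str
    mime_type: str
    http_status: int
    checksum: str
    redirect: str
    num_bytes: int
    storage_file: str


def parse_cdx_line(line):
    """Split one CDX line into typed fields (assumed order: see the slide)."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    canonical, stamp, request, mime, status, checksum, redirect, size, storage = fields
    return CdxRecord(
        canonical,
        datetime.strptime(stamp, "%Y%m%d%H%M%S"),  # 14-digit datestamp
        request,
        mime,
        int(status),
        checksum,
        redirect,
        int(size),
        storage,
    )
```

For the index described here, only the canonical URL and the datestamp are needed per record; the other fields are carried along for completeness.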



The Plan…

•  To provide fast access to distributed archives, LANL
   would merge the indexes of the holdings of multiple
   archives and provide Memento based access

•  Step 1: Library of Congress gathers CDX files
   Step 2: LANL indexes (a.k.a. “…”)
   Step 3: Profit

•  Data: 6T of gzipped CDX files (mostly from IA)
    •  Shipped on hard drives
•  Computing: 210 node DISC cluster at LANL
    •  2x 2 GHz processors, 2x 2 TB HDD, 8 GB RAM


… and the Reality

•  Hardware failure killed one of the drives en route
    •  Transferred remaining files via BagIt from LoC

•  DISC has restricted access:
    •  Had to transfer data over intranet
    •  2 weeks to sync (5 MB/sec)
    •  And then 2 weeks to get the processed results off

•  Compute cluster has faulty switch, unreliable nodes:
    •  Ran original processing 15 times without success
       due to hardware failures



Processing Design

•  For each CDX file,
    •  For each URI + timestamps,
        •  Map URI to an appropriate database slice
        •  Merge timestamps with those of previous CDXs

•  Possible because:
    •  No need to do truncated search
    •  No need to walk through URIs in order
    •  No need for time based access, only URI

•  Problem is “Embarrassingly Parallel”
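
The loop above can be sketched as follows. The talk does not say how a URI is mapped to a slice, so the MD5-based hash here is an assumption, as are the function names. The slice count of 3000 is the one quoted later in the talk.

```python
import hashlib
from collections import defaultdict

NUM_SLICES = 3000  # slice count quoted later in the talk


def slice_for(uri, num_slices=NUM_SLICES):
    """Map a URI to a slice number with a stable hash (assumed scheme)."""
    digest = hashlib.md5(uri.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def merge_records(slices, records):
    """Add (uri, timestamp) pairs to the per-slice index, merging duplicates."""
    for uri, timestamp in records:
        slices[slice_for(uri)][uri].add(timestamp)


# slice number -> URI -> set of timestamps
slices = defaultdict(lambda: defaultdict(set))
merge_records(slices, [("cnn.com/", "20031013111028")])
merge_records(slices, [("cnn.com/", "20011220045100"),
                       ("cnn.com/", "20031013111028")])
```

Because a URI always hashes to the same slice, each slice can be built without looking at any other, which is what makes the problem embarrassingly parallel. The price is the one the slide names: no ordered walk and no prefix search across URIs.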



Approach 1: Online Messaging

•  25 read nodes, 1 control node, 150 write nodes
•  Messages (1000 URIs) sent via control node to write nodes




•  Failed 15 times due to hardware issues
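
A minimal sketch of the batching step only, grouping a stream of URIs into messages of 1000. It does not show the MPI transport, and the function name is illustrative.

```python
from itertools import islice

BATCH_SIZE = 1000  # URIs per message, as on the slide


def batches(items, size=BATCH_SIZE):
    """Yield successive lists of at most 'size' items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Batching amortizes the per-message overhead, but every batch still passes through the single control node, so one failure there stalls the whole run.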
Approach 1: Implementation

•  Python

•  MPI        ☹
•  PyMPI       ☹☹
•  No failure detection, no auto restart, no zombie killing,
   no …        ☹☹☹
•  Several months of experiments to find issues and fine-tune
   the parameters most likely to let a run complete…




Approach 2: No Interaction

•  43 read/split nodes
•  Phase 1: Read nodes split CDX files to 3000 slices
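
A minimal sketch of Phase 1 on one read node, assuming the slice is chosen by hashing the first field of each CDX line. The hash, the file naming and the function names are illustrative, not taken from the talk.

```python
import hashlib
import os
import tempfile


def slice_for(uri, num_slices):
    """Map a URI to a slice number with a stable hash (assumed scheme)."""
    digest = hashlib.md5(uri.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def split_cdx(lines, out_dir, num_slices=3000):
    """Append each CDX line to the slice file chosen by its first field."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        uri = line.split(None, 1)[0]
        name = f"slice-{slice_for(uri, num_slices):04d}.cdx"
        # Reopening per line is slow but avoids holding 3000 files open.
        with open(os.path.join(out_dir, name), "a", encoding="utf-8") as out:
            out.write(line + "\n")
```

Each read node can run this on its own CDX files with no communication, since every node uses the same hash and so agrees on which slice a URI belongs to.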




Approach 2: No Interaction

•  Phase 2a: Transfer CDX slices to Control node
•  Phase 2b: Transfer CDX slices to Write nodes




Approach 2: No Interaction

•  50 write nodes (* 60 slices each = 3000 slices)
•  Phase 3: Merge slices from nodes to BerkeleyDBs
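
A minimal sketch of Phase 3 for a single slice. Python's standard `dbm.dumb` module stands in for BerkeleyDB here, and the contiguous assignment of 60 slices per write node is an assumption. The function names are illustrative.

```python
import dbm.dumb
import os
import tempfile

SLICES_PER_NODE = 60  # 50 write nodes * 60 slices = 3000 slices


def node_for_slice(slice_id):
    """Assumed layout: slices 0-59 on node 0, 60-119 on node 1, and so on."""
    return slice_id // SLICES_PER_NODE


def merge_slice(db_path, cdx_lines):
    """Merge the URI and timestamp of each CDX line into one key-value file."""
    with dbm.dumb.open(db_path, "c") as db:
        for line in cdx_lines:
            fields = line.split()
            if len(fields) < 2:
                continue
            uri, stamp = fields[0].encode("utf-8"), fields[1]
            # Value is a space-separated, sorted list of timestamps.
            known = set(db[uri].decode("ascii").split()) if uri in db else set()
            known.add(stamp)
            db[uri] = " ".join(sorted(known)).encode("ascii")
```

Calling `merge_slice` repeatedly on the same path, once per incoming slice file, accumulates timestamps per URI, so slice files from different read nodes can be merged in any order.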




Next Steps


•  Re-index, using non-interactive approach
    •  New data: 14 Tb (~120Tb uncompressed?)
    •  Use FileMap for better automation?

•  Two 8T eSATA LaCie cubes and two 3T internal drives to
   install locally

•  More intelligent data partitioning
•  Some sort of error detection and handling!
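
A rough check of the uncompressed size estimate for the new data, assuming it compresses about as well as the first batch (6T gzipped, ~50T uncompressed):

```python
FIRST_COMPRESSED_T = 6
FIRST_UNCOMPRESSED_T = 50
NEW_COMPRESSED_T = 14

# Expansion ratio observed on the first batch, applied to the new data.
ratio = FIRST_UNCOMPRESSED_T / FIRST_COMPRESSED_T
estimate_t = NEW_COMPRESSED_T * ratio
print(f"ratio {ratio:.2f}, estimate about {estimate_t:.0f}T")
```

The result is in line with the ~120Tb guess on the slide. It holds only if the new CDX files have a similar mix of content.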




Memento

                              http://mementoweb.org/

                                          Robert Sanderson
                                        rsanderson@lanl.gov
                                                @azaroth42

                                     Herbert Van de Sompel
                                         herbertv@lanl.gov
                                                @hvdsomp


 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 

Último (20)

UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 

Big Data: Indexing ~50Tb of URIs

  • 1. Using DISC to Reindex ~50Tb of URIs
      Robert Sanderson, rsanderson@lanl.gov, @azaroth42
      Herbert Van de Sompel, herbertv@lanl.gov, @hvdsomp
      http://www.mementoweb.org/
      Towards Seamless Navigation of the Web of the Past
      Big Data Interest Group, LANL, Feb 21st 2013 [Unclassified]
  • 2. Summary
      •  Memento
      •  Project Overview
      •  Participants
      •  The Plan … and the Reality
      •  Technical Details
      •  Next Run
  • 3. Memento Project
      “Provide web citizens seamless access to the web of the past”
      Some accolades:
      •  $1M funding from the Library of Congress
      •  Winner of the 2010 International Digital Preservation Award
      •  Accepted by the International Internet Preservation Consortium as the way forward for access to web archives
      •  Nominated for Best Paper at JCDL 2010
      •  Best Poster at JCDL 2012
      •  Tim Berners-Lee @ LDOW10: “We absolutely need this”
      •  Internet Draft 6 (final call) by May
  • 4. Old Copies have New Names
      Everyone knows the URL for CNN is: http://www.cnn.com/
      Everyone knows the URL for CNN was: http://www.cnn.com/
  • 5. Old Copies have New Names
      Copies of past versions of http://www.cnn.com/ exist
      … but you don’t go there to get them, they have new URLs
      … and there’s no way to automatically discover those new URLs
      http://web.archive.org/web/20031013111028/http://www2.cnn.com/
  • 6. Discovery is Hard
      People want to find, not search
      … and especially not search by hand!
      http://web.archive.org/web/*/http://www2.cnn.com/
  • 7. Navigating in the Past
      Links to the real resource bring you back out of the past
      … or trap you in a single incomplete archive.
      (Figure labels: Pentagon; Dec 20 2001, 4:51:00 UTC; current)
      http://web.archive.org/web/*/http://www2.cnn.com/
  • 8. Memento: Time Travel for the Web
      •  TimeGate: Resource that knows the locations of archived copies
      •  Memento: An archived copy of a previous state of a web resource
      •  Link header to TimeGate
      •  302 redirect to Memento
      How does the TimeGate know where the appropriate Memento is?
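The TimeGate step on slide 8 can be sketched as follows. This is only an illustration: the "nearest datestamp" selection rule, the function names, and the in-memory list of datestamps are assumptions, and the Location pattern simply mirrors the web.archive.org URLs shown on the earlier slides. It assumes a non-empty, sorted list of 14-digit datestamps for one URI.

```python
from bisect import bisect_left
from datetime import datetime

STAMP_FORMAT = "%Y%m%d%H%M%S"  # 14-digit datestamps, as on the Data slide


def closest_memento(datestamps, requested):
    """Return the datestamp (string) nearest to the requested datetime.

    `datestamps` must be a non-empty list of 14-digit strings in sorted order.
    """
    times = [datetime.strptime(d, STAMP_FORMAT) for d in datestamps]
    i = bisect_left(times, requested)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = times[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda t: abs(t - requested))
    return best.strftime(STAMP_FORMAT)


def timegate_response(uri, datestamps, requested):
    """Hypothetical TimeGate: answer with a 302 redirect to the chosen Memento."""
    stamp = closest_memento(datestamps, requested)
    location = "http://web.archive.org/web/%s/%s" % (stamp, uri)
    return 302, {"Location": location}


known = ["20011220045100", "20031013111028", "20090608161553"]
status, headers = timegate_response(
    "http://www2.cnn.com/", known, datetime(2003, 1, 1))
```

The lookup needs only the datestamps held for a single URI, which is the kind of per-URI index the rest of the deck is about building.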
  • 9. Project Overview
      Goal:
      •  To aggregate the metadata of the distributed archives of the IIPC, and
      •  To provide Memento based access to the holdings of open archives
      •  To provide knowledge of the holdings of restricted archives
      •  To provide knowledge to IIPC members of the holdings of totally closed archives
  • 10. Experiment Participants
      •  Austrian National Library
      •  Bibliothèque Nationale de France
      •  British Library
      •  Institut National de l'Audiovisuel
      •  Internet Archive
      •  Koninklijke Bibliotheek
      •  Library of Congress
      •  Netarchive.dk
      •  Swiss National Library
      •  University of North Texas
      •  Los Alamos National Laboratory
      •  Old Dominion University
  • 11. Data
      Canonical URL   kosovakosovo.com/photo.php?id=5785
      Datestamp       20090608161553
      Request URL     http://www.kosovakosovo.com/photo.php?id=5785
      MIME type       text/html
      HTTP Status     200
      Checksum        M36MRHSBVPLKMUN6PFOIEV3AH5ADITAN
      Redirect?       -
      Bytes           563096
      Storage File    AIT-1068-20090608161511.warc.gz
      Multiplied by 6 TB compressed, ~50 TB uncompressed
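A record like the one on slide 11 could be parsed along these lines. This is a sketch under assumptions: it treats a CDX line as nine whitespace-separated fields in the order the slide lists them, the field names are invented for illustration, and it expects numeric status and byte values as in the slide's example. The sample line is assembled from the slide's values.

```python
from collections import namedtuple
from datetime import datetime

# Field names are illustrative; the order follows the slide.
CdxRecord = namedtuple(
    "CdxRecord",
    "canonical_url datestamp request_url mime_type status "
    "checksum redirect nbytes storage_file",
)


def parse_cdx_line(line):
    """Split one CDX line into a CdxRecord with typed datestamp, status, bytes."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError("expected 9 fields, got %d" % len(fields))
    rec = CdxRecord(*fields)
    return rec._replace(
        datestamp=datetime.strptime(rec.datestamp, "%Y%m%d%H%M%S"),
        status=int(rec.status),
        nbytes=int(rec.nbytes),
    )


sample = (
    "kosovakosovo.com/photo.php?id=5785 20090608161553 "
    "http://www.kosovakosovo.com/photo.php?id=5785 text/html 200 "
    "M36MRHSBVPLKMUN6PFOIEV3AH5ADITAN - 563096 "
    "AIT-1068-20090608161511.warc.gz"
)
record = parse_cdx_line(sample)
```

For the reindexing described later, only the canonical URL and the datestamp of each record are needed.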
  • 12. The Plan…
      •  To provide fast access to distributed archives, LANL would merge the indexes of the holdings of multiple archives and provide Memento based access
      •  Step 1: Library of Congress gathers CDX files
         Step 2: LANL indexes (a.k.a. “…”)
         Step 3: Profit
      •  Data: 6 TB of gzipped CDX files (mostly from IA)
          •  Shipped on hard drives
      •  Computing: 210 node DISC cluster at LANL
          •  2x 2 GHz processors, 2x 2 TB HDD, 8 GB RAM
  • 13. … and the Reality
      •  Hardware failure killed one of the drives en route
          •  Transferred remaining files via BagIt from LoC
      •  DISC has restricted access:
          •  Had to transfer data over intranet
          •  2 weeks to sync (5 MB/sec)
          •  And then 2 weeks to get the processed results off
      •  Compute cluster has a faulty switch and unreliable nodes:
          •  Ran original processing 15 times without success due to hardware failures
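The two-week sync figure on slide 13 is consistent with a quick back-of-the-envelope check. The sketch below assumes decimal units, the 6 TB compressed volume given on the earlier slides, and reads the slide's rate as roughly 5 megabytes per second; the function name is illustrative.

```python
SECONDS_PER_DAY = 86400


def transfer_days(size_bytes, rate_bytes_per_sec):
    """Days needed to move size_bytes at a sustained rate."""
    return size_bytes / rate_bytes_per_sec / SECONDS_PER_DAY


# 6 TB of gzipped CDX files at about 5 MB/sec
days = transfer_days(6e12, 5e6)
print("%.1f days" % days)  # prints "13.9 days", i.e. about two weeks
```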
  • 14. Processing Design
      •  For each CDX file,
          •  For each URI + timestamps,
              •  Map URI to an appropriate database slice
              •  Merge timestamps with those of previous CDXs
      •  Possible because:
          •  No need to do truncated search
          •  No need to walk through URIs in order
          •  No need for time based access, only URI
      •  Problem is “Embarrassingly Parallel”
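The loop on slide 14 can be sketched as below. The slides do not say how a URI is mapped to a slice, so the MD5-modulo rule here is an assumption chosen only because it is deterministic across nodes; the slice count of 3000 is taken from the later "Approach 2" slides, and the function names are illustrative.

```python
import hashlib
from collections import defaultdict

NUM_SLICES = 3000  # from the "Approach 2" slides


def slice_for(uri, num_slices=NUM_SLICES):
    """Map a URI to a slice number (assumed rule: MD5 of the URI, modulo)."""
    digest = hashlib.md5(uri.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_slices


def merge_cdx(records, slices=None):
    """Merge (uri, datestamp) pairs from one CDX file into the slices.

    `slices` maps slice number -> {uri: set of datestamps}; pass the result
    of a previous call to merge with earlier CDX files.
    """
    if slices is None:
        slices = defaultdict(dict)
    for uri, stamp in records:
        slices[slice_for(uri)].setdefault(uri, set()).add(stamp)
    return slices


first = [("kosovakosovo.com/photo.php?id=5785", "20090608161553")]
second = [("kosovakosovo.com/photo.php?id=5785", "20100101000000"),
          ("cnn.com/", "20031013111028")]
index = merge_cdx(second, merge_cdx(first))
```

Because the slice depends only on the URI, every node can compute it independently and no ordering of URIs or timestamps is required, which is what makes the problem embarrassingly parallel.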
  • 15. Approach 1: Online Messaging
      •  25 read nodes, 1 control node, 150 write nodes
      •  Messages (1000 URIs) sent via control node to write nodes
      •  Failed 15 times due to hardware issues
  • 16. Approach 1: Implementation
      •  Python
      •  MPI ☹
      •  PyMPI ☹☹
      •  No failure detection, no auto restart, no zombie killing, no … ☹☹☹
      •  Several months of experiments to find issues and fine-tune the parameters most likely to let a run complete…
  • 17. Approach 2: No Interaction
      •  43 read/split nodes
      •  Phase 1: Read nodes split CDX files to 3000 slices
  • 18. Approach 2: No Interaction
      •  Phase 2a: Transfer CDX slices to Control node
      •  Phase 2b: Transfer CDX slices to Write nodes
  • 19. Approach 2: No Interaction
      •  50 write nodes (* 60 slices each = 3000 slices)
      •  Phase 3: Merge slices from nodes to BerkeleyDBs
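Phase 3 might look roughly like the sketch below. Several things here are assumptions rather than the authors' implementation: slices are assigned to write nodes in contiguous blocks of 60, slice files hold one "URI datestamp" pair per line, and Python's standard `dbm.dumb` module stands in for BerkeleyDB so that the example runs without extra packages.

```python
import dbm.dumb
import os
import tempfile

WRITE_NODES = 50
SLICES_PER_NODE = 60  # 50 * 60 = 3000 slices


def write_node_for(slice_id):
    """Assumed assignment: contiguous blocks of 60 slices per write node."""
    return slice_id // SLICES_PER_NODE


def merge_slice(db_path, lines):
    """Merge 'uri datestamp' lines into the key-value store at db_path."""
    db = dbm.dumb.open(db_path, "c")  # stand-in for a BerkeleyDB file
    try:
        for line in lines:
            uri, stamp = line.split()
            key = uri.encode("utf-8")
            # Load datestamps already stored for this URI, if any.
            stamps = set(db[key].decode("ascii").split()) if key in db else set()
            stamps.add(stamp)
            db[key] = " ".join(sorted(stamps)).encode("ascii")
    finally:
        db.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "slice-0042")
    # The same slice arrives from two different read nodes.
    merge_slice(path, ["cnn.com/ 20031013111028"])
    merge_slice(path, ["cnn.com/ 20011220045100", "cnn.com/ 20031013111028"])
    db = dbm.dumb.open(path, "r")
    merged = db[b"cnn.com/"].decode("ascii")
    db.close()
```

Each write node touches only its own slices, so the merge needs no communication between nodes, in contrast to the messaging approach that failed.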
  • 20. Next Steps
      •  Re-index, using non-interactive approach
      •  New data: 14 TB (~120 TB uncompressed?)
      •  Use FileMap for better automation?
      •  Two 8 TB eSATA LaCie cubes and two 3 TB internal drives to install locally
      •  More intelligent data partitioning
      •  Some sort of error detection and handling!
  • 21. Memento
      http://mementoweb.org/
      Robert Sanderson, rsanderson@lanl.gov, @azaroth42
      Herbert Van de Sompel, herbertv@lanl.gov, @hvdsomp