SCAPE
Building Digital Preservation Infrastructure
Dr. Ross King
AIT Austrian Institute of Technology GmbH

eSciDoc Days
Berlin, October 27, 2011

Digital Preservation
• For the first time, the rate of
  increase of information creation is
  beginning to exceed the rate of
  increase in storage capacity.

• This massive volume of digital
  material raises a number of issues:
  • What is worth preserving?
  • How to preserve so much?
  • How to access preserved data?
  • How to create incentives to preserve?

 http://arstechnica.com/business/consumerization-of-it/2011/09/information-explosion-how-rapidly-expanding-storage-spurs-innovation.ars




Digital Preservation
• Standards, best practices, and technologies used to
  ensure access to digital information over time

• How long?

  “Digital documents last forever – or five years,
   whichever comes first.”
       http://www.clir.org/pubs/reports/rothenberg/introduction.html


• Generally we mean decades or centuries

SCAPE – what is it about?

• Planning and managing computing-intensive (digital)
  preservation processes such as the large-scale
  ingestion or migration of large (multi-Terabyte)
  data sets

  SCAPE is a follow-up to the highly successful FP6 IP Planets.

SCAPE Project Data
• Project instrument: FP7 Integrated Project
• Call 6
   • Objective ICT-2009.4.1:
     Digital Libraries and Digital Preservation
   • Target outcome (a) Scalable systems and services for
     preserving digital content
• Duration: 42 months
   • February 2011 – July 2014
• Budget: 11.3 Million Euro
   • Funded: 8.6 Million Euro

SCAPE Consortium
   Number         Partner name                                Partner short name   Country
1 (coordinator)   AIT Austrian Institute of Technology GmbH          AIT             AT
       2          British Library                                    BL              UK
       3          Internet Memory Foundation                        IMF              NL
       4          Ex Libris Ltd                                      EXL             IL
       5          Fachinformationszentrum Karlsruhe                  FIZ             DE
       6          Koninklijke Bibliotheek                            KB              NL
       7          KEEP Solutions                                   KEEPS             PT
       8          Microsoft Research                                MSR              UK
       9          Österreichische Nationalbibliothek                ONB              AT
      10          Open Planets Foundation                           OPF              UK
      11          Statsbiblioteket Aarhus                            SB              DK
      12          Science and Technology Facilities Council         STFC             UK
      13          Technische Universität Berlin                     TUB              DE
      14          Technische Universität Wien                      TUW               AT
      15          University of Manchester                        UNIMAN             UK
      16          Pierre & Marie Curie Université Paris 6          UPMC              FR

SCAPE Project Overview
SCAPE will enhance the state of the art in digital preservation in three ways:
• Infrastructure and tools for scalable preservation actions
• A framework for automated, quality-assured preservation workflows
• Integration of these components with policy-based automated
  preservation planning and watch

SCAPE results will be validated in three large-scale testbeds:
• Digital Repositories
• Web Content
• Research Data Sets

The SCAPE Consortium brings together a broad spectrum of expertise from
• Memory institutions
• Data centres
• Research labs
• Universities
• Industrial firms

[Figure: project structure diagram. Blocks and their labels:
  Takeup: Stakeholders, Communities, Dissemination, Training Activities, Sustainability
  Testbeds: Corpora, Integration, Benchmarking, Validation
  Platform: Automation, Workflows, Parallelization, Virtualization
  Planning and Watch: Institutional Policies, Technical Watch, Automated Planning
  Preservation Components: Quality Assurance, Scalable Components, Automation-ready Tools
  Cross-project Activities: Project Management, Technical Coordination, Research Roadmap]

Selected SCAPE Testbed Scenarios
• Characterise large video files
   •   The master MPEG2 files are so large that it is difficult to apply JHOVE and
       insufficient detail is provided. A detailed characterisation of the MPEG2 streams
       is needed in order to identify technical dependencies for extracting from or
       rendering the MPEG2 stream. This would enable preservation risks related to
       current access services to be monitored and action taken as necessary to ensure
       continued access and preservation.

• Carry out large scale migrations
   •   Migrating from one format to another introduces the possibility of damaging the
       content or failing to capture significant properties of the original in the resulting
       destination format.
   •   Specific requirements include:
         • Solution tools that operate reliably at scale (80TB, 2 million pages)
         • Automated QA, ideally with no manual intervention on a file by file basis
         • QA performed by a process independent of the migration process
         • QA demonstrates strong evidence of significant properties being captured
              in the destination format

• Quality assurance in web harvesting
   •   For large scale crawls, automation of the quality control processes is a necessary
       requirement. Currently, this process relies on random sampling and very basic
       quantitative checks.
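
The requirement that QA run independently of the migration can be illustrated with a minimal Python sketch. It assumes that significant properties have already been extracted separately from each source file and from its migrated counterpart. The `Properties` fields (`width`, `height`, `page_count`) and the function names are hypothetical and are not taken from any SCAPE tool.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Properties:
    """Significant properties of one file (hypothetical selection)."""
    width: int
    height: int
    page_count: int


def compare_properties(source: Properties, migrated: Properties) -> list[str]:
    """Return the names of the properties that differ after migration."""
    return [
        f.name
        for f in fields(Properties)
        if getattr(source, f.name) != getattr(migrated, f.name)
    ]


def qa_report(pairs: dict[str, tuple[Properties, Properties]]) -> dict[str, list[str]]:
    """Map each failing file id to its mismatched properties.

    Files whose properties all match are left out, so an empty report
    means no file needs manual inspection.
    """
    report = {}
    for file_id, (source, migrated) in pairs.items():
        mismatches = compare_properties(source, migrated)
        if mismatches:
            report[file_id] = mismatches
    return report
```

Because the check only consumes extracted properties, it never calls the migration code itself, which is one way to keep the two processes independent.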

Selected SCAPE Challenges
• Bridging the gap between test workflows and
  scalable workflows
• Applying Map/Reduce to binary data
• Locality of data
    • Bring the data to the computation, or
      bring the computation to the data?
• Repository Integration
    • Repository Consistency
    • Scalable Ingest
• Preservation Planning
    • How to scale?
    • How to automate?
• Research data sets
    • How to preserve contextual information?

SCAPE Solutions

• SCAPE Platform
  • HADOOP, Stratosphere
  • Virtualized cluster
  • Repository integration
     • HBASE, HDFS - Fedora
  • Three levels of parallelization
     • Distribution of files
     • Splitting binary files
     • Parallelisation of algorithms
  • Mapping Taverna to HADOOP
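
The second level of parallelization, splitting binary files, can be sketched in plain Python. This is not Hadoop code: it only imitates the map and reduce phases locally, and it assumes the analysis (here a byte histogram) gives the same result wherever the file is cut. All function names are illustrative.

```python
from collections import Counter
from functools import reduce


def split_bytes(data: bytes, chunk_size: int) -> list:
    """Cut a binary blob into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def map_chunk(chunk: bytes) -> Counter:
    """Map phase: count how often each byte value occurs in one chunk."""
    return Counter(chunk)


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce phase: merge two partial histograms."""
    return left + right


def byte_histogram(data: bytes, chunk_size: int) -> Counter:
    """Histogram of a whole blob, computed chunk by chunk."""
    partials = map(map_chunk, split_bytes(data, chunk_size))
    return reduce(reduce_counts, partials, Counter())
```

On a cluster each `map_chunk` call could run on a different node. Real container formats usually have internal structure, so a splitter would have to cut at boundaries the format allows rather than at fixed byte offsets.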

SCAPE Solutions

• Automated Planning and Watch
  • Building on the Planets PLATO tool
  • Automated watch based on
     • Results Evaluation Framework (REF) database
     • Monitoring trends in web harvests
  • Automated planning based on semantically
    formalized policies
• Automated Quality Assurance
  • QA in web harvesting through automated comparison of
    rendered pages – combined structural and image analysis
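
As a rough illustration of the image-analysis half of such a comparison, the sketch below scores two renderings of the same page by the share of pixels that differ. It is a stand-in and not the SCAPE method: it assumes both renderings are equally sized grids of grayscale values, does no structural analysis, and the threshold values are arbitrary.

```python
def pixel_difference_ratio(page_a, page_b, tolerance=0):
    """Fraction of pixels whose grayscale values differ by more than tolerance."""
    same_shape = len(page_a) == len(page_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(page_a, page_b)
    )
    if not same_shape:
        raise ValueError("renderings must have identical dimensions")
    total = sum(len(row) for row in page_a)
    if total == 0:
        return 0.0
    differing = sum(
        1
        for row_a, row_b in zip(page_a, page_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
        if abs(pixel_a - pixel_b) > tolerance
    )
    return differing / total


def needs_review(page_a, page_b, max_ratio=0.05, tolerance=0):
    """Flag a harvested page when its rendering deviates too much from the reference."""
    return pixel_difference_ratio(page_a, page_b, tolerance) > max_ratio
```

A check like this could replace random sampling only for pages where a reference rendering exists; choosing `max_ratio` would need calibration against real crawls.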

SCAPE Achievements
• Public Website
    • http://www.scape-project.eu/
• Development Infrastructure
    • Hosted by the Open Planets Foundation and GitHub
    • Development Wiki
        • http://wiki.opf-labs.org/display/SP/Home
• Deliverables
    • First Deliverables available for download
• Publications
    • 13 in the first nine months, including 6 at iPres next week
    • Report: comparative analysis of identification tools
• Platform
    • 10-node, 20 TB experimental cluster hosted by AIT

SCAPE Contact Information

• http://www.scape-project.eu/

• office@list.scape-project.eu

• Dr. Ross King
  AIT Austrian Institute of Technology GmbH
  Donau-City-Strasse 1
  A-1220 Wien





Thank you for your attention!





Mais conteúdo relacionado

Destaque

Digital Preservation Policies - SCAPE
Digital Preservation Policies - SCAPEDigital Preservation Policies - SCAPE
Digital Preservation Policies - SCAPESCAPE Project
 
Scalable Preservation Workflows
Scalable Preservation WorkflowsScalable Preservation Workflows
Scalable Preservation WorkflowsSCAPE Project
 
Characterisation - 101. An introduction to the identification and characteris...
Characterisation - 101. An introduction to the identification and characteris...Characterisation - 101. An introduction to the identification and characteris...
Characterisation - 101. An introduction to the identification and characteris...SCAPE Project
 
Scape information day at BL - Using Jpylyzer and Schematron for validating JP...
Scape information day at BL - Using Jpylyzer and Schematron for validating JP...Scape information day at BL - Using Jpylyzer and Schematron for validating JP...
Scape information day at BL - Using Jpylyzer and Schematron for validating JP...SCAPE Project
 
Taverna and myExperiment. SCAPE presentation at a Hack-a-thon
Taverna and myExperiment. SCAPE presentation at a Hack-a-thonTaverna and myExperiment. SCAPE presentation at a Hack-a-thon
Taverna and myExperiment. SCAPE presentation at a Hack-a-thonSCAPE Project
 
Matchbox tool. Quality control for digital collections – SCAPE Training event...
Matchbox tool. Quality control for digital collections – SCAPE Training event...Matchbox tool. Quality control for digital collections – SCAPE Training event...
Matchbox tool. Quality control for digital collections – SCAPE Training event...SCAPE Project
 
Planets, OPF & SCAPE - presentation of tools on digital preservation
Planets, OPF & SCAPE - presentation of tools on digital preservationPlanets, OPF & SCAPE - presentation of tools on digital preservation
Planets, OPF & SCAPE - presentation of tools on digital preservationSCAPE Project
 
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014SCAPE Project
 
TAVERNA Components - Semantically annotated and sharable units of functionality
TAVERNA Components - Semantically annotated and sharable units of functionalityTAVERNA Components - Semantically annotated and sharable units of functionality
TAVERNA Components - Semantically annotated and sharable units of functionalitySCAPE Project
 
SCAPE Preservation Platform. Design and Deployment
SCAPE Preservation Platform. Design and DeploymentSCAPE Preservation Platform. Design and Deployment
SCAPE Preservation Platform. Design and DeploymentSCAPE Project
 
PDF/A-3 for preservation. Notes on embedded files and JPEG2000
PDF/A-3 for preservation. Notes on embedded files and JPEG2000PDF/A-3 for preservation. Notes on embedded files and JPEG2000
PDF/A-3 for preservation. Notes on embedded files and JPEG2000SCAPE Project
 
Audio Quality Assurance. An application of cross correlation
Audio Quality Assurance. An application of cross correlationAudio Quality Assurance. An application of cross correlation
Audio Quality Assurance. An application of cross correlationSCAPE Project
 
Quality assurance for document image collections in digital preservation
Quality assurance for document image collections in digital preservation Quality assurance for document image collections in digital preservation
Quality assurance for document image collections in digital preservation SCAPE Project
 
Jpylyzer, a validation and feature extraction tool developed in SCAPE project
Jpylyzer, a validation and feature extraction tool developed in SCAPE projectJpylyzer, a validation and feature extraction tool developed in SCAPE project
Jpylyzer, a validation and feature extraction tool developed in SCAPE projectSCAPE Project
 
Duplicate detection for quality assurance of document image collections
Duplicate detection for quality assurance of document image collectionsDuplicate detection for quality assurance of document image collections
Duplicate detection for quality assurance of document image collectionsSCAPE Project
 
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...SCAPE Project
 
SCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with HadoopSCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with HadoopSCAPE Project
 
Evolving Domains, Problems and Solutions for Long Term Digital Preservation
Evolving Domains, Problems and Solutions for Long Term Digital PreservationEvolving Domains, Problems and Solutions for Long Term Digital Preservation
Evolving Domains, Problems and Solutions for Long Term Digital PreservationSCAPE Project
 

Destaque (20)

Historical Development of Photogrammetry
Historical Development of PhotogrammetryHistorical Development of Photogrammetry
Historical Development of Photogrammetry
 
Digital Preservation Policies - SCAPE
Digital Preservation Policies - SCAPEDigital Preservation Policies - SCAPE
Digital Preservation Policies - SCAPE
 
Scalable Preservation Workflows
Scalable Preservation WorkflowsScalable Preservation Workflows
Scalable Preservation Workflows
 
Characterisation - 101. An introduction to the identification and characteris...
Characterisation - 101. An introduction to the identification and characteris...Characterisation - 101. An introduction to the identification and characteris...
Characterisation - 101. An introduction to the identification and characteris...
 
Scape information day at BL - Using Jpylyzer and Schematron for validating JP...
Scape information day at BL - Using Jpylyzer and Schematron for validating JP...Scape information day at BL - Using Jpylyzer and Schematron for validating JP...
Scape information day at BL - Using Jpylyzer and Schematron for validating JP...
 
Taverna and myExperiment. SCAPE presentation at a Hack-a-thon
Taverna and myExperiment. SCAPE presentation at a Hack-a-thonTaverna and myExperiment. SCAPE presentation at a Hack-a-thon
Taverna and myExperiment. SCAPE presentation at a Hack-a-thon
 
Matchbox tool. Quality control for digital collections – SCAPE Training event...
Matchbox tool. Quality control for digital collections – SCAPE Training event...Matchbox tool. Quality control for digital collections – SCAPE Training event...
Matchbox tool. Quality control for digital collections – SCAPE Training event...
 
Planets, OPF & SCAPE - presentation of tools on digital preservation
Planets, OPF & SCAPE - presentation of tools on digital preservationPlanets, OPF & SCAPE - presentation of tools on digital preservation
Planets, OPF & SCAPE - presentation of tools on digital preservation
 
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014
 
TAVERNA Components - Semantically annotated and sharable units of functionality
TAVERNA Components - Semantically annotated and sharable units of functionalityTAVERNA Components - Semantically annotated and sharable units of functionality
TAVERNA Components - Semantically annotated and sharable units of functionality
 
SCAPE Preservation Platform. Design and Deployment
SCAPE Preservation Platform. Design and DeploymentSCAPE Preservation Platform. Design and Deployment
SCAPE Preservation Platform. Design and Deployment
 
PDF/A-3 for preservation. Notes on embedded files and JPEG2000
PDF/A-3 for preservation. Notes on embedded files and JPEG2000PDF/A-3 for preservation. Notes on embedded files and JPEG2000
PDF/A-3 for preservation. Notes on embedded files and JPEG2000
 
Audio Quality Assurance. An application of cross correlation
Audio Quality Assurance. An application of cross correlationAudio Quality Assurance. An application of cross correlation
Audio Quality Assurance. An application of cross correlation
 
Quality assurance for document image collections in digital preservation
Quality assurance for document image collections in digital preservation Quality assurance for document image collections in digital preservation
Quality assurance for document image collections in digital preservation
 
Jpylyzer, a validation and feature extraction tool developed in SCAPE project
Jpylyzer, a validation and feature extraction tool developed in SCAPE projectJpylyzer, a validation and feature extraction tool developed in SCAPE project
Jpylyzer, a validation and feature extraction tool developed in SCAPE project
 
Duplicate detection for quality assurance of document image collections
Duplicate detection for quality assurance of document image collectionsDuplicate detection for quality assurance of document image collections
Duplicate detection for quality assurance of document image collections
 
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...
 
SCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with HadoopSCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with Hadoop
 
Historical Development of Photogrammetry
Historical Development of PhotogrammetryHistorical Development of Photogrammetry
Historical Development of Photogrammetry
 
Evolving Domains, Problems and Solutions for Long Term Digital Preservation
Evolving Domains, Problems and Solutions for Long Term Digital PreservationEvolving Domains, Problems and Solutions for Long Term Digital Preservation
Evolving Domains, Problems and Solutions for Long Term Digital Preservation
 

Semelhante a SCAPE - Building Digital Preservation Infrastructure

SCAPE - Scalable Preservation Environments
SCAPE - Scalable Preservation EnvironmentsSCAPE - Scalable Preservation Environments
SCAPE - Scalable Preservation EnvironmentsSCAPE Project
 
Summary of Technical Coordinators discussions
Summary of Technical Coordinators discussionsSummary of Technical Coordinators discussions
Summary of Technical Coordinators discussionsRafael C. Jimenez
 
NordForsk Open Access Reykjavik 14-15/8-2014:NeIC
NordForsk Open Access Reykjavik 14-15/8-2014:NeICNordForsk Open Access Reykjavik 14-15/8-2014:NeIC
NordForsk Open Access Reykjavik 14-15/8-2014:NeICNordForsk
 
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hubCloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hubBjörn Backeberg
 
Presentation arsip nov 2012 frans smit handout
Presentation arsip nov 2012 frans smit handoutPresentation arsip nov 2012 frans smit handout
Presentation arsip nov 2012 frans smit handoutGemeente Almere
 
Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Sarah Anna Stewart
 
Spin off-ie at-10yearsmodfinal
Spin off-ie at-10yearsmodfinalSpin off-ie at-10yearsmodfinal
Spin off-ie at-10yearsmodfinalcrebusproject
 
Values & Vision - Cloud Sandboxes for BIG Earth Sciences
Values & Vision - Cloud Sandboxes for BIG Earth SciencesValues & Vision - Cloud Sandboxes for BIG Earth Sciences
Values & Vision - Cloud Sandboxes for BIG Earth Sciencesterradue
 
Archiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing ServicesArchiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing ServicesArchiver
 
The Irish Centre for High End Computing and IBM - The role of advanced comput...
The Irish Centre for High End Computing and IBM - The role of advanced comput...The Irish Centre for High End Computing and IBM - The role of advanced comput...
The Irish Centre for High End Computing and IBM - The role of advanced comput...MarieThrseCulligan
 
The Irish Centre for High End Computing and IBM: The role of advanced computi...
The Irish Centre for High End Computing and IBM: The role of advanced computi...The Irish Centre for High End Computing and IBM: The role of advanced computi...
The Irish Centre for High End Computing and IBM: The role of advanced computi...MarieThrseCulligan
 
The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?Carole Goble
 
Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...
Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...
Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...Archiver
 
Repository : A Brief Comparative Study Between The National University Of Mal...
Repository : A Brief Comparative Study Between The National University Of Mal...Repository : A Brief Comparative Study Between The National University Of Mal...
Repository : A Brief Comparative Study Between The National University Of Mal...tulipbiru64
 
Linking service capabilities to data stweardship competences for professional...
Linking service capabilities to data stweardship competences for professional...Linking service capabilities to data stweardship competences for professional...
Linking service capabilities to data stweardship competences for professional...EUDAT
 
Design phase kick-off event and Ceremony
Design phase kick-off event and CeremonyDesign phase kick-off event and Ceremony
Design phase kick-off event and CeremonyArchiver
 

Semelhante a SCAPE - Building Digital Preservation Infrastructure (20)

SCAPE - Scalable Preservation Environments
SCAPE - Scalable Preservation EnvironmentsSCAPE - Scalable Preservation Environments
SCAPE - Scalable Preservation Environments
 
Summary of Technical Coordinators discussions
Summary of Technical Coordinators discussionsSummary of Technical Coordinators discussions
Summary of Technical Coordinators discussions
 
E Infrastructure for OA
E Infrastructure for OAE Infrastructure for OA
E Infrastructure for OA
 
NordForsk Open Access Reykjavik 14-15/8-2014:NeIC
NordForsk Open Access Reykjavik 14-15/8-2014:NeICNordForsk Open Access Reykjavik 14-15/8-2014:NeIC
NordForsk Open Access Reykjavik 14-15/8-2014:NeIC
 
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hubCloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
 
Session 36 - Engage Results
Session 36 - Engage ResultsSession 36 - Engage Results
Session 36 - Engage Results
 
Presentation arsip nov 2012 frans smit handout
Presentation arsip nov 2012 frans smit handoutPresentation arsip nov 2012 frans smit handout
Presentation arsip nov 2012 frans smit handout
 
Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...
 
E-ARK: Open Data Mining for Government Archives
E-ARK: Open Data Mining for Government ArchivesE-ARK: Open Data Mining for Government Archives
E-ARK: Open Data Mining for Government Archives
 
RDM Programme at University of Edinburgh
RDM Programme at University of EdinburghRDM Programme at University of Edinburgh
RDM Programme at University of Edinburgh
 
Spin off-ie at-10yearsmodfinal
Spin off-ie at-10yearsmodfinalSpin off-ie at-10yearsmodfinal
Spin off-ie at-10yearsmodfinal
 
Values & Vision - Cloud Sandboxes for BIG Earth Sciences
Values & Vision - Cloud Sandboxes for BIG Earth SciencesValues & Vision - Cloud Sandboxes for BIG Earth Sciences
Values & Vision - Cloud Sandboxes for BIG Earth Sciences
 
Archiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing ServicesArchiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing Services
 
The Irish Centre for High End Computing and IBM - The role of advanced comput...
The Irish Centre for High End Computing and IBM - The role of advanced comput...The Irish Centre for High End Computing and IBM - The role of advanced comput...
The Irish Centre for High End Computing and IBM - The role of advanced comput...
 
The Irish Centre for High End Computing and IBM: The role of advanced computi...
The Irish Centre for High End Computing and IBM: The role of advanced computi...The Irish Centre for High End Computing and IBM: The role of advanced computi...
The Irish Centre for High End Computing and IBM: The role of advanced computi...
 
The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?
 
Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...
Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...
Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...
 
Repository : A Brief Comparative Study Between The National University Of Mal...
Repository : A Brief Comparative Study Between The National University Of Mal...Repository : A Brief Comparative Study Between The National University Of Mal...
Repository : A Brief Comparative Study Between The National University Of Mal...
 
Linking service capabilities to data stweardship competences for professional...
Linking service capabilities to data stweardship competences for professional...Linking service capabilities to data stweardship competences for professional...
Linking service capabilities to data stweardship competences for professional...
 
Design phase kick-off event and Ceremony
Design phase kick-off event and CeremonyDesign phase kick-off event and Ceremony
Design phase kick-off event and Ceremony
 


SCAPE - Building Digital Preservation Infrastructure

  • 1. SCAPE - Building Digital Preservation Infrastructure
      Dr. Ross King, AIT Austrian Institute of Technology GmbH
      eSciDoc Days, Berlin, October 27, 2011
  • 2. Digital Preservation
      • For the first time, the rate of increase of information creation is beginning to exceed the rate of increase in storage capacity.
      • This massive volume of digital material raises a number of issues:
          • What is worth preserving?
          • How to preserve so much?
          • How to access preserved data?
          • How to create incentives to preserve?
      http://arstechnica.com/business/consumerization-of-it/2011/09/information-explosion-how-rapidly-expanding-storage-spurs-innovation.ars
  • 3. Digital Preservation
      • Standards, best-practices, and technologies utilized in order to ensure access to digital information over time
      • How long?
        “Digital documents last forever – or five years, whichever comes first.”
        http://www.clir.org/pubs/reports/rothenberg/introduction.html
      • Generally we mean decades or centuries
  • 4. SCAPE – what is it about?
      • Planning and managing computing-intensive (digital) preservation processes such as the large-scale ingestion or migration of large (multi-Terabyte) data sets
      SCAPE is a follow-up to the highly successful FP6 IP Planets.
  • 5. SCAPE Project Data
      • Project instrument: FP7 Integrated Project
      • 6th Call
          • Objective ICT-2009.4.1: Digital Libraries and Digital Preservation
          • Target outcome (a): Scalable systems and services for preserving digital content
      • Duration: 42 months
          • February 2011 – July 2014
      • Budget: 11.3 Million Euro
      • Funded: 8.6 Million Euro
  • 6. SCAPE Consortium
      Number           Partner name                                 Partner short name  Country
      1 (coordinator)  AIT Austrian Institute of Technology GmbH    AIT                 AT
      2                British Library                              BL                  UK
      3                Internet Memory Foundation                   IMF                 NL
      4                Ex Libris Ltd                                EXL                 IL
      5                Fachinformationszentrum Karlsruhe            FIZ                 DE
      6                Koninklijke Bibliotheek                      KB                  NL
      7                KEEP Solutions                               KEEPS               PT
      8                Microsoft Research                           MSR                 UK
      9                Österreichische Nationalbibliothek           ONB                 AT
      10               Open Planets Foundation                      OPF                 UK
      11               Statsbiblioteket Aarhus                      SB                  DK
      12               Science and Technology Facilities Council    STFC                UK
      13               Technische Universität Berlin                TUB                 DE
      14               Technische Universität Wien                  TUW                 AT
      15               University of Manchester                     UNIMAN              UK
      16               Pierre & Marie Curie Université Paris 6      UPMC                FR
  • 7. SCAPE Project Overview
      SCAPE will enhance the state of the art in digital preservation in three ways:
      • Infrastructure and tools for scalable preservation actions
      • A framework for automated, quality-assured preservation workflows
      • Integration of these components with policy-based automated preservation planning and watch
      SCAPE results will be validated in three large-scale testbeds:
      • Digital Repositories
      • Web Content
      • Research Data Sets
      The SCAPE Consortium brings together a broad spectrum of expertise from
      • Memory institutions
      • Data centres
      • Research labs
      • Universities
      • Industrial firms
      [Project structure diagram; labels as extracted: Takeup, Stakeholders, Communities, Dissemination, Training Activities, Sustainability, Testbeds, Corpora, Integration, Benchmarking, Validation, Cross-project Activities, Project Management, Platform, Technical Coordination, Research Roadmap, Automation, Workflows, Planning and Watch, Parallelization, Preservation Components, Virtualization, Quality Assurance, Institutional Policies, Scalable Components, Technical Watch, Automated Planning, Automation-ready Tools]
  • 8. Selected SCAPE Testbed Scenarios
      • Characterise large video files
          • The master MPEG2 files are so large that it is difficult to apply JHOVE and insufficient detail is provided. A detailed characterisation of the MPEG2 streams is needed in order to identify technical dependencies for extracting from or rendering the MPEG2 stream. This would enable preservation risks related to current access services to be monitored and action taken as necessary to ensure continued access and preservation.
      • Carry out large scale migrations
          • Migrating from one format to another introduces the possibility of damaging the content or failing to capture significant properties of the original in the resulting destination format.
          • Specific requirements include:
              • Solution tools that operate reliably at scale (80TB, 2 million pages)
              • Automated QA, ideally with no manual intervention on a file by file basis
              • QA performed by independent process from the migration process
              • QA demonstrates strong evidence of significant properties being captured in the destination format
      • Quality assurance in web harvesting
          • For large scale crawls, automation of the quality control processes is a necessary requirement. Currently, this process relies on random sampling and very basic quantitative checks.
      (Image from digitalbevaring.dk)
  • 9. Selected SCAPE Challenges
      • Bridging the gap between test workflows and scalable workflows
      • Applying Map/Reduce to binary data
      • Locality of data
          • Bring the data to the computation, or bring the computation to the data?
      • Repository Integration
      • Repository Consistency
      • Scalable Ingest
      • Preservation Planning
          • How to scale?
          • How to automate?
      • Research data sets
          • How to preserve contextual information?
      (Image from digitalbevaring.dk)
  • 10. SCAPE Solutions
      • SCAPE Platform
          • HADOOP, Stratosphere
          • Virtualized cluster
          • Repository integration
              • HBASE, HDFS - Fedora
          • Three levels of parallelization
              • Distribution of files
              • Splitting binary files
              • Parallelisation of algorithms
          • Mapping Taverna to HADOOP
      (Image from digitalbevaring.dk)
  • 11. SCAPE Solutions
      • Automated Planning and Watch
          • Building on the Planets PLATO tool
          • Automated watch based on
              • Results Evaluation Framework (REF) database
              • Monitoring trends in web harvests
          • Automated planning based on semantically formalized policies
      • Automated Quality Assurance
          • QA in web harvesting through automated comparison of rendered pages – combined structural and image analysis
  • 12. SCAPE Achievements
      • Public Website
          • http://www.scape-project.eu/
      • Development Infrastructure
          • Hosted by the Open Planets Foundation and GitHub
      • Development Wiki
          • http://wiki.opf-labs.org/display/SP/Home
      • Deliverables
          • First Deliverables available for download
      • Publications
          • 13 in the first nine months, including 6 at iPres next week
          • Report: comparative analysis of identification tools
      • Platform
          • 10-node, 20 TB experimental cluster hosted by AIT
  • 13. SCAPE Contact Information
      • http://www.scape-project.eu/
      • office@list.scape-project.eu
      • Dr. Ross King
        AIT Austrian Institute of Technology GmbH
        Donau-City-Strasse 1
        A-1220 Wien
  • 14. Thank you for your attention!
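
Slide 10 lists three levels of parallelization, and slide 9 names "Applying Map/Reduce to binary data" as a challenge. The first two levels (distribution of files, splitting binary files) can be illustrated with a minimal local sketch. Everything in it is an assumption made for illustration and is not taken from the SCAPE platform: the function names are hypothetical, a byte histogram stands in for a real characterisation step, and a thread pool stands in for a Hadoop cluster.

```python
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # bytes; deliberately tiny so the splitting is easy to follow


def split_ranges(size, chunk_size):
    """Level 2 (splitting binary files): cut a file into (offset, length) ranges."""
    return [(off, min(chunk_size, size - off)) for off in range(0, size, chunk_size)]


def map_chunk(path, offset, length):
    """Map step: count the byte values found in one chunk of the file."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        return Counter(fh.read(length))


def reduce_counts(partials):
    """Reduce step: merge the per-chunk histograms into one."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def characterise(path, chunk_size=CHUNK_SIZE):
    """Stand-in characterisation: byte histogram of one file, built chunk by chunk."""
    ranges = split_ranges(os.path.getsize(path), chunk_size)
    return reduce_counts(map_chunk(path, off, n) for off, n in ranges)


def characterise_many(paths, workers=4):
    """Level 1 (distribution of files): hand whole files to a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(characterise, paths)))
```

Fixed-offset splitting only suits analyses whose result does not depend on where a chunk boundary falls, as with the histogram used here. Formats with internal structure, such as the MPEG2 streams on slide 8, would need format-aware split points, which is part of why the slides treat binary data as a challenge.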