Dr. Ross King
AIT Austrian Institute of Technology GmbH
Preservation at Scale Workshop
Lisbon, September 5, 2013
SCAPE
Tools and Infrastructure for Preservation at Scale
Outline
• SCAPE Project
• SCAPE Solutions
  • Scalable Planning
  • Scalable Tools
  • Scalable Computation
  • Scalable Repositories
• SCAPE Testbeds
• SCAPE Additional Information
  • Online Resources
  • Training Events
  • Contact Information
This work was partially supported by the SCAPE Project.
The SCAPE project is co‐funded by the European Union under FP7 ICT‐2009.4.1 (Grant Agreement number 270137).
SCAPE – what is it about?
• Planning and executing computing-intensive digital preservation
processes such as the large-scale ingestion, characterisation or
migration of large (multi-Terabyte) and complex data sets
• SCAPE results include
• Preservation scenarios
• Preservation tools
• Preservation workflows
• Preservation infrastructure
• Preservation best-practices
SCAPE is a follow-up to the highly successful FP6 IP Planets.
SCAPE Project Data
• Project instrument: FP7 Collaborative Project
• Call 6
  • Objective ICT-2009.4.1: Digital Libraries and Digital Preservation
  • Target outcome (a): Scalable systems and services for preserving digital content
• Call 10
  • Objective ICT-2013.11.4: Supplements to Strengthen Cooperation in ICT R&D in an Enlarged European Union
• Duration: 44 months (revised from 42)
  • February 2011 – September 2014 (revised from July 2014)
• Budget: 12.0 million euro (revised from 11.3)
• Funded: 9.2 million euro (revised from 8.6)
SCAPE Consortium
SCAPE Solutions
Scalable Planning and Watch
• SCOUT: an automated preservation watch system
  • Enables the planning tool and decision makers to monitor the world and the organisation
  • Collects relevant knowledge and enables automated notification
  • Open and extensible
• c3po: scalable content profiling
  • Analyses characterisation data based on FITS
  • Scale-out with MongoDB (100k/min/node)
  • Visual drill-down and well-documented profile
  • Automated sample selection
• PLATO 4.1: scalable preservation planning
  • www.ifs.tuwien.ac.at/dp/plato
  • Technology upgrade: refactored, rebuilt, standardised, tested
  • New features
    • Groups allow collaborative planning
    • Integration of control policies for groups
    • Quality domain – measures
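The idea behind content profiling and automated sample selection can be sketched in a few lines. This is not c3po's API (c3po stores its profiles in MongoDB); the record layout with `path`, `format` and `version` keys and both function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def profile(records):
    """Count files per (format, version) pair."""
    return Counter((r["format"], r.get("version")) for r in records)


def select_samples(records, per_format=1):
    """Pick up to `per_format` file paths for each format, in input order."""
    samples = defaultdict(list)
    for r in records:
        bucket = samples[r["format"]]
        if len(bucket) < per_format:
            bucket.append(r["path"])
    return dict(samples)


# Hypothetical characterisation records, e.g. distilled from FITS output.
records = [
    {"path": "a.tif", "format": "TIFF", "version": "6.0"},
    {"path": "b.tif", "format": "TIFF", "version": "6.0"},
    {"path": "c.pdf", "format": "PDF", "version": "1.4"},
]
distribution = profile(records)
samples = select_samples(records)
```

The format distribution gives planners an overview of a collection, while the per-format samples are candidates for experiments in a planning tool.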
Scalable Tools
• Tool Wrapper
  • Application that adapts existing tools to the SCAPE Platform
  • https://github.com/openplanets/scape-toolwrapper
  • Enhances wrapped tools
    • Standard naming scheme for CC, AS and QA tools
    • Standard invocation method (CLI)
    • Debian packages for easy deployment on the cluster
    • Support for data streaming (useful for Hadoop jobs)
  • Generates Preservation Components
    • Taverna workflows with embedded metadata for easy discovery
    • Automatic publication of components on myExperiment (to support discoverability)
    • Standard ports to enable composition of Preservation Components (based on well-defined component profiles: CC, AS and QA)
• Digital Preservation Toolkit
  • Software suite that contains a large set of DP tools
  • 77 operations in total
  • Easy to deploy on Linux machines (via apt-get)
    • apt-get install digital-preservation-tools
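Why a standard command-line invocation and stream support matter for Hadoop jobs can be illustrated with a minimal sketch. It is not the Tool Wrapper's generated code: the function names and the stand-in tool are hypothetical, and the tab-separated output merely follows the usual Hadoop Streaming key/value convention.

```python
import subprocess
import sys


def run_tool_on_stream(lines, command):
    """Run `command` once per input path; yield (path, exit_code, output)."""
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank records
        result = subprocess.run(command + [path], capture_output=True, text=True)
        yield path, result.returncode, result.stdout.strip()


def format_record(path, exit_code, output):
    """Tab-separated record: the path is the key, the rest is the value."""
    return f"{path}\t{exit_code}\t{output}"


# Hypothetical stand-in for a wrapped tool: prints the length of its argument.
tool = [sys.executable, "-c", "import sys; print(len(sys.argv[1]))"]
for record in run_tool_on_stream(["a.tif\n", "\n", "bb.tif\n"], tool):
    print(format_record(*record))
```

In a streaming job the list of paths would be replaced by `sys.stdin`, so the same wrapper works unchanged on one machine and on every node of the cluster.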
Scalable Computation
• Deployment of environments
  • Xen Hypervisor
  • Eucalyptus
• Deployment of tools
  • Debian Packages
  • Tool Spec
• Job Execution Service (JES)
  • Apache Oozie
  • Apache Hadoop
from digitalbevaring.dk
User view of the SCAPE development cloud at AIT: Eucalyptus web interface, Hybridfox browser add-on, and terminal-based interaction.
Scalable Repositories
• Fedora 4.0.0
  • All REST, no SOAP
  • RDF as first-class objects
  • JCR 2.0 implementation (ModeShape)
  • Infinispan distributed NoSQL datastore
• Lily 2.0
  • Built on top of HBase/HDFS
  • Integration of computation and storage
SCAPE Architecture
[Figure: SCAPE architecture diagram. Labels recoverable from the diagram include: Automated Planning (PLATO), Plan Management GUI, Plan Management API, Preservation Plan Store, Automated Watch, Sources, Source Adaptor, Push API, Pull API, Watch Request API, Notification API, Report API, Digital Object Repository, Digital Objects/Metadata, Data Connector API, Loader Application, Execution Platform (JES, Hadoop, JES API), Component Catalogue, Component Lookup API, Component Registration API, Component Profile Validator, Taverna Workbench, Data Publication Platform, LDS3 API.]
SCAPE Testbeds
• Large-scale Digital Repositories
  • Carry out large-scale image migrations
    • The master files from legacy digitized image collections are typically TIFF files, which can be costly to store due to their size. The cost benefit of migration can only be realized if the original TIFFs can be removed, and this can only be done if there is evidence of successful migration. (2.2 million pages, 80 TB)
  • Detect poor sound quality
    • In a collection of MP3 files (20 TB, 360,000 files) we have discovered files with very bad sound quality. Before ingesting everything into our DOMS we would like to be able to discover the bad files and potentially get those re-digitized from the original analogue media.
• Research Data Sets
  • RAW to NeXus conversion
    • File size and content volume challenges have been identified for NeXus files. The RAW-to-NeXus migration tool can be customised to account for various other types of experiment data files during the migration. However, the scalability challenge is that these other experiment data files vary significantly between instruments, which are specific to each facility.
from digitalbevaring.dk
See http://wiki.opf-labs.org/display/SP/Scenarios
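For the image migration scenario above, one common form of evidence is a pixel-level comparison between the original and the migrated image. The slides do not say which measure the testbed uses; the sketch below assumes peak signal-to-noise ratio (PSNR) and assumes that both files have already been decoded into flat sequences of pixel values by some other tool. The function names and the default threshold are illustrative.

```python
import math


def psnr(original, migrated, max_value=255):
    """Peak signal-to-noise ratio between two equally sized pixel sequences."""
    if len(original) != len(migrated):
        raise ValueError("images differ in size")
    if not original:
        raise ValueError("images are empty")
    mse = sum((a - b) ** 2 for a, b in zip(original, migrated)) / len(original)
    if mse == 0:
        return math.inf  # identical pixel data
    return 10 * math.log10(max_value ** 2 / mse)


def migration_ok(original, migrated, threshold_db=math.inf):
    """Default demands identical pixels; lower the threshold for lossy targets."""
    return psnr(original, migrated) >= threshold_db
```

With the default threshold only a pixel-identical result passes, which suits a lossless target format. A lossy migration would need a finite threshold chosen by policy.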
SCAPE Testbeds
• Web Content
  • Quality assurance in web harvesting
    • Web crawling is a process that is highly susceptible to errors. Often, essential data is missed by the crawler and thus not captured and preserved. Currently, quality assurance requires manual effort, and because crawls often contain millions of pages, manual quality assurance is not very efficient.
• Data Centers
  • Anonymization of medical data
    • To fulfil the safety and security requirements for storing medical data, it will be necessary to develop encryption and anonymization services that allow medical data to be transferred to a data center's remote storage facilities. On the one hand, encryption techniques will be used to secure sensitive personal data (e.g. internal documents, patient databases), which must only be accessible to authorized services and users. On the other hand, anonymization services will enable medical data (such as x-ray generator outputs, x-ray computed tomography outputs, surgery recordings) to be stored in the data center without sensitive data attached.
from digitalbevaring.dk
SCAPE Additional Information
Additional Resources of Interest
• Development Infrastructure
• Code repository hosted by the Open Planets Foundation and GitHub
• https://github.com/openplanets/scape/
• Development Wiki
• http://wiki.opf-labs.org/display/SP/Home
• Experimental Workflows
• http://www.myexperiment.org/search?query=SCAPE&type=all&commit=Search
• Publications
• http://www.scape-project.eu/category/publication
• Public Deliverables
• http://www.scape-project.eu/category/deliverable
• Tools
• http://www.scape-project.eu/tools
SCAPE Training Events
• Future Formats First:
Application Infrastructures for Action Services
• 16-17 September 2013, London
• Registration: http://scape-future-formats-first.eventbrite.co.uk/
• Critical Path: Effective Evidence Based Preservation Planning
• 13 November 2013, Aarhus
• Hadoop-driven Digital Preservation (Hackathon)
• 2-4 December 2013, Vienna
See http://www.scape-project.eu/events
SCAPE Contact Information
• http://www.scape-project.eu/
• Twitter: #scapeproject
• office@list.scape-project.eu
• Dr. Ross King
AIT Austrian Institute of Technology GmbH
Donau-City-Strasse 1
A-1220 Wien
Thank you for your attention!
Questions?

 
Control policy formulation
Control policy formulationControl policy formulation
Control policy formulationSCAPE Project
 
Preservation Policy in SCAPE - Training, Aarhus
Preservation Policy in SCAPE - Training, AarhusPreservation Policy in SCAPE - Training, Aarhus
Preservation Policy in SCAPE - Training, AarhusSCAPE Project
 
An image based approach for content analysis in document collections
An image based approach for content analysis in document collectionsAn image based approach for content analysis in document collections
An image based approach for content analysis in document collectionsSCAPE Project
 
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...SCAPE Project
 
TAVERNA Components - Semantically annotated and sharable units of functionality
TAVERNA Components - Semantically annotated and sharable units of functionalityTAVERNA Components - Semantically annotated and sharable units of functionality
TAVERNA Components - Semantically annotated and sharable units of functionalitySCAPE Project
 
Automatic Preservation Watch
Automatic Preservation WatchAutomatic Preservation Watch
Automatic Preservation WatchSCAPE Project
 
Policy levels in SCAPE
Policy levels in SCAPEPolicy levels in SCAPE
Policy levels in SCAPESCAPE Project
 
PDF/A-3 for preservation. Notes on embedded files and JPEG2000
PDF/A-3 for preservation. Notes on embedded files and JPEG2000PDF/A-3 for preservation. Notes on embedded files and JPEG2000
PDF/A-3 for preservation. Notes on embedded files and JPEG2000SCAPE Project
 
Quality assurance for document image collections in digital preservation
Quality assurance for document image collections in digital preservation Quality assurance for document image collections in digital preservation
Quality assurance for document image collections in digital preservation SCAPE Project
 
Digital Preservation Policies - SCAPE
Digital Preservation Policies - SCAPEDigital Preservation Policies - SCAPE
Digital Preservation Policies - SCAPESCAPE Project
 

Mais de SCAPE Project (20)

C sz z6
C sz z6C sz z6
C sz z6
 
SCAPE Information Day at BL - Characterising content in web archives with Nanite
SCAPE Information Day at BL - Characterising content in web archives with NaniteSCAPE Information Day at BL - Characterising content in web archives with Nanite
SCAPE Information Day at BL - Characterising content in web archives with Nanite
 
Scape information day at BL - Using Jpylyzer and Schematron for validating JP...
Scape information day at BL - Using Jpylyzer and Schematron for validating JP...Scape information day at BL - Using Jpylyzer and Schematron for validating JP...
Scape information day at BL - Using Jpylyzer and Schematron for validating JP...
 
SCAPE Information day at BL - Flint, a Format and File Validation Tool
SCAPE Information day at BL - Flint, a Format and File Validation ToolSCAPE Information day at BL - Flint, a Format and File Validation Tool
SCAPE Information day at BL - Flint, a Format and File Validation Tool
 
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
 
Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...
Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...
Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...
 
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014
 
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...
 
Hadoop and its applications at the State and University Library, SCAPE Inform...
Hadoop and its applications at the State and University Library, SCAPE Inform...Hadoop and its applications at the State and University Library, SCAPE Inform...
Hadoop and its applications at the State and University Library, SCAPE Inform...
 
Content profiling and C3PO
Content profiling and C3POContent profiling and C3PO
Content profiling and C3PO
 
Control policy formulation
Control policy formulationControl policy formulation
Control policy formulation
 
Preservation Policy in SCAPE - Training, Aarhus
Preservation Policy in SCAPE - Training, AarhusPreservation Policy in SCAPE - Training, Aarhus
Preservation Policy in SCAPE - Training, Aarhus
 
An image based approach for content analysis in document collections
An image based approach for content analysis in document collectionsAn image based approach for content analysis in document collections
An image based approach for content analysis in document collections
 
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
 
TAVERNA Components - Semantically annotated and sharable units of functionality
TAVERNA Components - Semantically annotated and sharable units of functionalityTAVERNA Components - Semantically annotated and sharable units of functionality
TAVERNA Components - Semantically annotated and sharable units of functionality
 
Automatic Preservation Watch
Automatic Preservation WatchAutomatic Preservation Watch
Automatic Preservation Watch
 
Policy levels in SCAPE
Policy levels in SCAPEPolicy levels in SCAPE
Policy levels in SCAPE
 
PDF/A-3 for preservation. Notes on embedded files and JPEG2000
PDF/A-3 for preservation. Notes on embedded files and JPEG2000PDF/A-3 for preservation. Notes on embedded files and JPEG2000
PDF/A-3 for preservation. Notes on embedded files and JPEG2000
 
Quality assurance for document image collections in digital preservation
Quality assurance for document image collections in digital preservation Quality assurance for document image collections in digital preservation
Quality assurance for document image collections in digital preservation
 
Digital Preservation Policies - SCAPE
Digital Preservation Policies - SCAPEDigital Preservation Policies - SCAPE
Digital Preservation Policies - SCAPE
 

Último

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 

Último (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

SCAPE - Scalable Preservation Environments

  • 1. SCAPE: Tools and Infrastructure for Preservation at Scale
      Dr. Ross King, AIT Austrian Institute of Technology GmbH
      Preservation at Scale Workshop, Lisbon, September 5, 2013
  • 2. Outline
      • SCAPE Project
      • SCAPE Solutions
          • Scalable Planning
          • Scalable Tools
          • Scalable Computation
          • Scalable Repositories
      • SCAPE Testbeds
      • SCAPE Additional Information
          • Online Resources
          • Training Events
          • Contact Information
      This work was partially supported by the SCAPE Project. The SCAPE project is co‐funded by the European Union under FP7 ICT‐2009.4.1 (Grant Agreement number 270137).
  • 3. SCAPE – what is it about?
      • Planning and executing computing-intensive digital preservation processes such as the large-scale ingestion, characterisation or migration of large (multi-Terabyte) and complex data sets
      • SCAPE results include
          • Preservation scenarios
          • Preservation tools
          • Preservation workflows
          • Preservation infrastructure
          • Preservation best practices
      SCAPE is a follow-up to the highly successful FP6 IP Planets.
  • 4. SCAPE Project Data
      • Project instrument: FP7 Collaborative Project
          • 6th Call
              • Objective ICT-2009.4.1: Digital Libraries and Digital Preservation
              • Target outcome (a) Scalable systems and services for preserving digital content
          • 10th Call
              • Objective ICT-2013.11.4: Supplements to Strengthen Cooperation in ICT R&D in an Enlarged European Union
      • Duration: 44 months (originally 42)
          • February 2011 – September 2014 (originally July 2014)
      • Budget: 12.0 Million Euro (originally 11.3)
      • Funded: 9.2 Million Euro (originally 8.6)
  • 7. Scalable Planning and Watch
      • SCOUT: an automated preservation watch system
          • Enables planning tools and decision makers to monitor the world and the organisation
          • Collects relevant knowledge and enables automated notification
          • Open and extensible
      • c3po: scalable content profiling
          • c3po analyses characterisation data based on FITS
          • Scale-out MongoDB (100k/min/node)
          • Visual drill-down and well-documented profile
          • Automated sample selection
      • PLATO 4.1: scalable preservation planning
          • www.ifs.tuwien.ac.at/dp/plato
          • Technology upgrade: refactored, rebuilt, standardised, tested
          • New features
              • Groups allow collaborative planning
              • Integration of control policies for group
              • Quality domain – measures
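As a rough illustration of what content profiling and sample selection involve, the following minimal Python sketch aggregates per-file characterisation records into a per-format profile and picks sample objects. The record fields (`path`, `format`, `size`), the function names, and the "largest file" sampling heuristic are assumptions made for this example; they do not reflect c3po's actual data model or its MongoDB-backed implementation.

```python
from collections import Counter, defaultdict


def build_profile(records):
    """Aggregate per-file characterisation records into a per-format profile."""
    counts = Counter(r["format"] for r in records)
    total_size = defaultdict(int)
    for r in records:
        total_size[r["format"]] += r["size"]
    # Most frequent formats come first in the resulting dict.
    return {
        fmt: {"files": n, "bytes": total_size[fmt]}
        for fmt, n in counts.most_common()
    }


def select_samples(records, per_format=1):
    """Pick the largest files of each format as samples (one simple heuristic)."""
    by_format = defaultdict(list)
    for r in records:
        by_format[r["format"]].append(r)
    return {
        fmt: [
            r["path"]
            for r in sorted(items, key=lambda r: r["size"], reverse=True)[:per_format]
        ]
        for fmt, items in by_format.items()
    }
```

A profile of this shape gives a planner a quick overview of which formats dominate a collection, and the samples can serve as representative objects for experiments.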
  • 8. Scalable Tools
      • Tool Wrapper
          • Application that adapts existing tools to the SCAPE Platform
          • https://github.com/openplanets/scape-toolwrapper
          • Enhances wrapped tools
              • Standard naming scheme for CC, AS and QA tools
              • Standard invocation method (CLI)
              • Debian packages for easy deployment on the cluster
              • Support for data streaming (useful for Hadoop jobs)
          • Generates Preservation Components
              • Taverna workflows with embedded metadata for easy discovery
              • Automatic publication of components on myExperiment (to support discoverability)
              • Standard ports to enable composition of Preservation Components (based on well defined component profiles, CC, AS & QA)
      • Digital Preservation Toolkit
          • Software suite that contains a large set of DP tools
          • 77 operations in total
          • Easy to deploy on Linux machines (via apt-get)
          • apt-get install digital-preservation-tools
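The idea of a standard command-line invocation combined with data streaming can be sketched in Python as follows. This is not the SCAPE Tool Wrapper itself: the command template syntax (`{input}`, `{output}`), the function names, and the `.out` output suffix are hypothetical, chosen only to show how a wrapped tool could be driven from a Hadoop-streaming-style mapper that reads one file path per line.

```python
import shlex
import subprocess
import sys


def build_command(template, input_path, output_path):
    """Fill a command template and split it into an argument list."""
    command = template.format(
        input=shlex.quote(input_path), output=shlex.quote(output_path)
    )
    return shlex.split(command)


def run_tool(template, input_path, output_path):
    """Run the wrapped tool once and return its exit code."""
    return subprocess.run(build_command(template, input_path, output_path)).returncode


def stream_mapper(template, lines, out=sys.stdout):
    """Streaming-style mapper: one input path per line, emits 'path<TAB>exit_code'."""
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines
        code = run_tool(template, path, path + ".out")
        out.write(f"{path}\t{code}\n")
```

Used as a streaming mapper, a script would call something like `stream_mapper("sometool {input} {output}", sys.stdin)`, where `sometool` stands for whichever wrapped tool is installed on the cluster nodes.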
  • 9. Scalable Computation
      • Deployment of environments
          • XEN Hypervisor
          • Eucalyptus
      • Deployment of tools
          • Debian Packages
          • Tool Spec
      • Job Execution Service (JES)
          • Apache Oozie
          • Apache Hadoop
      Image from digitalbevaring.dk
      Figure: User‐view on SCAPE development cloud at AIT: Eucalyptus web interface, Hybridfox browser add‐on, and terminal‐based interaction.
  • 10. Scalable Repositories
      • Fedora 4.0.0
          • All REST, no SOAP
          • RDF as first class objects
          • JCR 2.0 Implementation (ModeShape)
          • Infinispan distributed NoSQL datastore
      • Lily 2.0
          • Built on top of HBase/HDFS
          • Integration of computation and storage
  • 11. SCAPE Architecture
      Figure: architecture diagram. Labels shown: Plan Management API; Digital Object Repository; Execution Platform; JES; Hadoop; JES API; Data Connector API; Automated Watch; Automated Planning; PLATO; Plan Management GUI; Digital Objects/Metadata; Preservation Plan Store; Plan; Component Catalogue; Component Lookup API; Taverna Workbench; Component Registration API; Component Profile Validator; Automated Watch Sources; Push API; Pull API; Knowledge Source Adaptor; Client Service; Watch Request API; Notification API; Report API; Assessment Data Publication Platform; LDS3 API; Data Loader Application
  • 13. SCAPE Testbeds
      • Large-scale Digital Repositories
          • Carry out large scale image migrations
              • The master files from legacy digitized image collections are typically TIFF files that can be costly to store due to their size. The cost benefit can only be realized if one can remove the original TIFFs, and this can only be done if one can provide evidence of successful migration. (2.2 million pages, 80 TB)
          • Detect poor sound quality
              • In a collection of mp3 files (20 TB, 360,000 files) we have discovered files with very bad sound quality. Before ingesting everything into our DOMS we would like to be able to discover the bad files and potentially get those re-digitized from the original analogue media.
      • Research Data Sets
          • RAW to NEXUS conversion
              • File size and volume of content challenges have been identified for NeXus files. The RAW to NeXus format migration tool can be customised to account for various other types of experiment data files in the process of the migration. However, the scalability challenge here is that the other types of experiment data files vary significantly between instruments (which are specific to each facility).
      Image from digitalbevaring.dk
      See http://wiki.opf-labs.org/display/SP/Scenarios
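One way to think about "evidence of successful migration" is as a per-page comparison of properties extracted from the original and the migrated file. The Python sketch below assumes such properties are already available as dictionaries; the property names (`width`, `height`, `bits_per_sample`) and the report layout are hypothetical, and a real quality-assurance workflow would normally also compare the image content itself rather than metadata alone.

```python
def compare_properties(original, migrated, keys=("width", "height", "bits_per_sample")):
    """Return (key, original_value, migrated_value) for each property that differs."""
    return [
        (k, original.get(k), migrated.get(k))
        for k in keys
        if original.get(k) != migrated.get(k)
    ]


def migration_report(pairs):
    """Build a report from an iterable of (page_id, original_props, migrated_props)."""
    report = {"ok": [], "failed": {}}
    for page_id, original, migrated in pairs:
        diffs = compare_properties(original, migrated)
        if diffs:
            report["failed"][page_id] = diffs
        else:
            report["ok"].append(page_id)
    return report
```

Only pages listed under `ok` would be candidates for removing the original TIFF; anything under `failed` keeps its original until the difference is explained.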
  • 14. SCAPE Testbeds
      • Web Content
          • Quality assurance in web harvesting
              • Web crawling is a process that is highly susceptible to errors. Often, essential data is missed by the crawler and thus not captured and preserved. Currently, quality assurance requires manual effort and because crawls often contain millions of pages, manual quality assurance will be neither very efficient
      • Data Centers
          • Anonymization of medical data
              • In order to fulfil the requirements for storing medical data in terms of safety and security, it will be necessary to develop encryption and anonymization services that will allow medical data transfer to a data center’s remote storage facilities. On the one hand, the encryption techniques will be used to secure sensitive personal data (e.g. internal documents, patient databases), which must only be accessible from authorized services and users. On the other hand, the anonymization services will enable medical data (such as x-ray generator outputs, x-ray computed tomography outputs, surgery recordings) to be stored in the data center without having sensitive data attached.
      Image from digitalbevaring.dk
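The anonymization idea, storing medical data without sensitive data attached, can be sketched in Python as below. This is an illustrative assumption, not the service developed in the testbed: the field names, the 16-character pseudonym, and the use of HMAC-SHA256 are choices made for the example. Strictly speaking, replacing an identifier with a keyed hash is pseudonymization; the encryption of data in transit or at rest is not covered here.

```python
import hashlib
import hmac

# Hypothetical names of fields considered sensitive in a metadata record.
SENSITIVE_FIELDS = {"patient_name", "date_of_birth", "address"}


def pseudonym(patient_id, secret_key):
    """Derive a stable pseudonym from a patient ID using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize(record, secret_key):
    """Return a copy of the record without sensitive fields, keyed by a pseudonym."""
    cleaned = {
        k: v
        for k, v in record.items()
        if k not in SENSITIVE_FIELDS and k != "patient_id"
    }
    cleaned["pseudonym"] = pseudonym(record["patient_id"], secret_key)
    return cleaned
```

Because the pseudonym is derived with a secret key held by the data owner, records of the same patient can still be linked inside the data center, while the mapping back to the real identifier stays with the owner.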
  • 16. Additional Resources of Interest
      • Development Infrastructure
          • Code repository hosted by the Open Planets Foundation and GitHub
              • https://github.com/openplanets/scape/
          • Development Wiki
              • http://wiki.opf-labs.org/display/SP/Home
          • Experimental Workflows
              • http://www.myexperiment.org/search?query=SCAPE&type=all&commit=Search
      • Publications
          • http://www.scape-project.eu/category/publication
      • Public Deliverables
          • http://www.scape-project.eu/category/deliverable
      • Tools
          • http://www.scape-project.eu/tools
  • 17. SCAPE Training Events
      • Future Formats First: Application Infrastructures for Action Services
          • 16-17 September 2013, London
          • Registration: http://scape-future-formats-first.eventbrite.co.uk/
      • Critical Path: Effective Evidence Based Preservation Planning
          • 13 November 2013, Aarhus
      • Hadoop-driven Digital Preservation (Hackathon)
          • 2-4 December 2013, Vienna
      See http://www.scape-project.eu/events
  • 18. SCAPE Contact Information
      • http://www.scape-project.eu/
      • Twitter: #scapeproject
      • office@list.scape-project.eu
      • Dr. Ross King
        AIT Austrian Institute of Technology GmbH
        Donau-City-Strasse 1
        A-1220 Wien
  • 19. Thank you for your attention! Questions?