Dr. Ross King
AIT Austrian Institute of Technology GmbH
Preservation at Scale Workshop
Lisbon, September 5, 2013
SCAPE
Tools and Infrastructure for Preservation at Scale
Outline
• SCAPE Project
• SCAPE Solutions
  • Scalable Planning
  • Scalable Tools
  • Scalable Computation
  • Scalable Repositories
• SCAPE Testbeds
• SCAPE Additional Information
  • Online Resources
  • Training Events
  • Contact Information
This work was partially supported by the SCAPE Project.
The SCAPE project is co‐funded by the European Union under FP7 ICT‐2009.4.1 (Grant Agreement number 270137).
SCAPE – what is it about?
• Planning and executing computing-intensive digital preservation
processes such as the large-scale ingestion, characterisation or
migration of large (multi-Terabyte) and complex data sets
• SCAPE results include
• Preservation scenarios
• Preservation tools
• Preservation workflows
• Preservation infrastructure
• Preservation best-practices
SCAPE is a follow-up to the highly successful FP6 IP Planets.
SCAPE Project Data
• Project instrument: FP7 Collaborative Project
• 6th Call
  • Objective ICT-2009.4.1: Digital Libraries and Digital Preservation
  • Target outcome (a): Scalable systems and services for preserving digital content
• 10th Call
  • Objective ICT-2013.11.4: Supplements to Strengthen Cooperation in ICT R&D in an Enlarged European Union
• Duration: 44 months (extended from 42)
  • February 2011 – September 2014 (originally July 2014)
• Budget: 12.0 million euro (originally 11.3)
• Funded: 9.2 million euro (originally 8.6)
SCAPE Consortium
SCAPE Solutions
Scalable Planning and Watch
• SCOUT: an automated preservation watch system
  • Enables planning tools and decision makers to monitor the world and the organisation
  • Collects relevant knowledge and enables automated notification
  • Open and extensible
• c3po: scalable content profiling
  • Analyses characterisation data based on FITS
  • Scales out on MongoDB (100k/min/node)
  • Visual drill-down and well-documented profiles
  • Automated sample selection
• PLATO 4.1: scalable preservation planning
  • www.ifs.tuwien.ac.at/dp/plato
  • Technology upgrade: refactored, rebuilt, standardised, tested
  • New features
    • Groups allow collaborative planning
    • Integration of control policies for groups
    • Quality domain – measures
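The content profiling idea behind c3po (aggregate characterisation data into a profile, then pick samples) can be illustrated with a short sketch. This is not c3po's implementation: it assumes the characterisation output has already been parsed into plain dictionaries, and the field names `format` and `size` as well as the functions `profile` and `sample_per_format` are hypothetical.

```python
from collections import Counter


def profile(records):
    """Summarise a collection of characterisation records.

    Each record is assumed to be a dict with optional 'format' and
    'size' keys (hypothetical field names, not the FITS schema).
    """
    formats = Counter(r.get("format", "unknown") for r in records)
    total_size = sum(r.get("size", 0) for r in records)
    return {
        "count": len(records),
        "total_size": total_size,
        "formats": dict(formats),
    }


def sample_per_format(records, per_format=1):
    """Pick up to `per_format` records for every format, in input order."""
    taken = Counter()
    sample = []
    for record in records:
        fmt = record.get("format", "unknown")
        if taken[fmt] < per_format:
            sample.append(record)
            taken[fmt] += 1
    return sample
```

A profile built this way gives the format distribution that a planner would drill into, and the per-format sample is one simple stand-in for automated sample selection.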
Scalable Tools
• Tool Wrapper
  • Application that adapts existing tools to the SCAPE Platform
  • https://github.com/openplanets/scape-toolwrapper
  • Enhances wrapped tools
    • Standard naming scheme for characterisation (CC), action (AS) and quality assurance (QA) tools
    • Standard invocation method (CLI)
    • Debian packages for easy deployment on the cluster
    • Support for data streaming (useful for Hadoop jobs)
  • Generates Preservation Components
    • Taverna workflows with embedded metadata for easy discovery
    • Automatic publication of components on myExperiment (to support discoverability)
    • Standard ports to enable composition of Preservation Components (based on well-defined component profiles: CC, AS and QA)
• Digital Preservation Toolkit
  • Software suite that contains a large set of DP tools
  • 77 operations in total
  • Easy to deploy on Linux machines (via apt-get)
    • apt-get install digital-preservation-tools
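The "standard invocation method (CLI)" with streaming support can be sketched as a thin wrapper around a command template. This is an illustration of the general idea only, not the format or API of the SCAPE Tool Wrapper: the template syntax (`${input}`) and the function names `build_command` and `run_tool` are assumptions made for this example.

```python
import shlex
import subprocess
from string import Template


def build_command(template, **params):
    """Fill a command template and split it into an argv list.

    Parameter values are shell-quoted first so that paths containing
    spaces survive the split as a single argument.
    """
    quoted = {name: shlex.quote(str(value)) for name, value in params.items()}
    return shlex.split(Template(template).substitute(quoted))


def run_tool(template, stdin_data=None, **params):
    """Run the wrapped tool; `stdin_data` (bytes) is streamed to its stdin."""
    argv = build_command(template, **params)
    return subprocess.run(argv, input=stdin_data, capture_output=True, check=True)
```

For example, `build_command("file --mime-type ${input}", input="my scan.tif")` yields `['file', '--mime-type', 'my scan.tif']`. Passing `stdin_data` instead of a file path is what makes such a wrapper usable inside a streaming job.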
Scalable Computation
• Deployment of environments
  • Xen Hypervisor
  • Eucalyptus
• Deployment of tools
  • Debian Packages
  • Tool Spec
• Job Execution Service (JES)
  • Apache Oozie
  • Apache Hadoop
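To make the Hadoop side of the execution platform more concrete, the sketch below imitates the map, sort and reduce phases of a streaming-style job locally: it counts files per extension from a list of paths. It is a minimal illustration under stated assumptions, not code from the SCAPE Job Execution Service; the function names `map_lines` and `reduce_pairs` are hypothetical, and in a real Hadoop Streaming job the mapper and reducer would be separate scripts reading standard input.

```python
import os
from itertools import groupby


def map_lines(lines):
    """Map phase: emit (extension, 1) for every non-empty path line."""
    for line in lines:
        path = line.strip()
        if not path:
            continue
        extension = os.path.splitext(path)[1].lower() or "(none)"
        yield extension, 1


def reduce_pairs(pairs):
    """Sort and reduce phases: sum the counts for each key."""
    ordered = sorted(pairs)  # stands in for Hadoop's shuffle and sort
    for key, group in groupby(ordered, key=lambda pair: pair[0]):
        yield key, sum(count for _, count in group)
```

Running `list(reduce_pairs(map_lines(paths)))` on a list of paths returns one `(extension, count)` pair per extension, sorted by extension.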
from digitalbevaring.dk
Figure: User view of the SCAPE development cloud at AIT, showing the Eucalyptus web interface, the Hybridfox browser add-on, and terminal-based interaction.
Scalable Repositories
• Fedora 4.0.0
  • All REST, no SOAP
  • RDF as first-class objects
  • JCR 2.0 implementation (ModeShape)
  • Infinispan distributed NoSQL datastore
• Lily 2.0
  • Built on top of HBase/HDFS
  • Integration of computation and storage
SCAPE Architecture
[Figure: SCAPE architecture diagram. Labels in the diagram: Plan Management API, Digital Object Repository, Execution Platform, JES, Hadoop, JES API, Data Connector API, Automated Watch, Automated Planning, PLATO, Plan Management GUI, Digital Objects/Metadata, Preservation Plan Store, Component Catalogue, Component Lookup API, Taverna Workbench, Component Registration API, Component Profile Validator, Sources, Push API, Pull API, Knowledge, Source Adaptor, Client Service, Watch Request API, Notification API, Report API, Assessment, Data Publication Platform, LDS3 API, Data Loader Application.]
SCAPE Testbeds
• Large-scale Digital Repositories
  • Carry out large-scale image migrations
    • The master files from legacy digitized image collections are typically TIFF files, which are costly to store because of their size. The cost benefit of migration can only be realized if the original TIFFs can be removed, and that is only possible with evidence that the migration succeeded. (2.2 million pages, 80 TB)
  • Detect poor sound quality
    • In a collection of MP3 files (20 TB, 360,000 files) we have discovered files with very bad sound quality. Before ingesting everything into our DOMS, we would like to be able to find the bad files and potentially have them re-digitized from the original analogue media.
• Research Data Sets
  • RAW to NeXus conversion
    • File size and content volume challenges have been identified for NeXus files. The RAW to NeXus migration tool can be customised to account for various other types of experiment data files during the migration. The scalability challenge, however, is that these other types of experiment data files vary significantly between instruments, which are specific to each facility.
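For the image migration scenario above, one way to produce evidence of a successful migration is to compare the decoded pixel values of the original and the migrated image. The sketch below computes the peak signal-to-noise ratio (PSNR) for two pixel sequences. It is an illustration only: the slides do not say which quality measure SCAPE applies, the function name `psnr` is hypothetical, and the pixel values are assumed to have been decoded into flat lists by an image library beforehand.

```python
import math


def psnr(original, migrated, max_value=255):
    """Peak signal-to-noise ratio in dB between two pixel sequences.

    Returns math.inf when the sequences are identical, which is the
    expected result for a lossless migration.
    """
    if len(original) != len(migrated):
        raise ValueError("images must have the same number of pixel values")
    if not original:
        raise ValueError("images must not be empty")
    squared_errors = ((a - b) ** 2 for a, b in zip(original, migrated))
    mse = sum(squared_errors) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)
```

A migration workflow could record this value per page and only release the original TIFF for deletion when the value meets a threshold chosen by the institution.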
from digitalbevaring.dk
See http://wiki.opf-labs.org/display/SP/Scenarios
SCAPE Testbeds
• Web Content
  • Quality assurance in web harvesting
    • Web crawling is highly susceptible to errors. Often, essential data is missed by the crawler and thus not captured and preserved. Currently, quality assurance requires manual effort, and because crawls often contain millions of pages, manual quality assurance is not very efficient.
• Data Centers
  • Anonymization of medical data
    • To fulfil the safety and security requirements for storing medical data, it will be necessary to develop encryption and anonymization services that allow medical data to be transferred to a data center's remote storage facilities. On the one hand, encryption techniques will be used to secure sensitive personal data (e.g. internal documents, patient databases) that must only be accessible to authorized services and users. On the other hand, anonymization services will enable medical data (such as x-ray generator outputs, x-ray computed tomography outputs, and surgery recordings) to be stored in the data center without sensitive data attached.
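The anonymization step described for medical data can be sketched as follows. This is a simplified illustration, not the SCAPE service: it drops directly identifying fields and replaces the patient identifier with a keyed hash, which is strictly speaking pseudonymisation. The field names, the `SENSITIVE_FIELDS` set and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical names of directly identifying metadata fields.
SENSITIVE_FIELDS = {"patient_name", "birth_date", "address"}


def pseudonymise_id(patient_id, key):
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymise(record, key):
    """Return a copy of `record` without sensitive fields.

    The 'patient_id' field, if present, is replaced by a pseudonym so
    that records of the same patient can still be linked.
    """
    cleaned = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    if "patient_id" in cleaned:
        cleaned["patient_id"] = pseudonymise_id(str(cleaned["patient_id"]), key)
    return cleaned
```

The key must stay with the data owner; anyone who holds it can recompute the pseudonym for a known identifier.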
from digitalbevaring.dk
SCAPE Additional Information
Additional Resources of Interest
• Development Infrastructure
• Code repository hosted by the Open Planets Foundation and GitHub
• https://github.com/openplanets/scape/
• Development Wiki
• http://wiki.opf-labs.org/display/SP/Home
• Experimental Workflows
• http://www.myexperiment.org/search?query=SCAPE&type=all&commit=Search
• Publications
• http://www.scape-project.eu/category/publication
• Public Deliverables
• http://www.scape-project.eu/category/deliverable
• Tools
• http://www.scape-project.eu/tools
SCAPE Training Events
• Future Formats First:
Application Infrastructures for Action Services
• 16-17 September 2013, London
• Registration: http://scape-future-formats-first.eventbrite.co.uk/
• Critical Path: Effective Evidence Based Preservation Planning
• 13 November 2013, Aarhus
• Hadoop-driven Digital Preservation (Hackathon)
• 2-4 December 2013, Vienna
See http://www.scape-project.eu/events
SCAPE Contact Information
• http://www.scape-project.eu/
• Twitter: #scapeproject
• office@list.scape-project.eu
• Dr. Ross King
AIT Austrian Institute of Technology GmbH
Donau-City-Strasse 1
A-1220 Wien
Thank you for your attention!
Questions?

Mais conteúdo relacionado

Mais procurados

Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningDataWorks Summit
 
Compute-based sizing and system dashboard
Compute-based sizing and system dashboardCompute-based sizing and system dashboard
Compute-based sizing and system dashboardDataWorks Summit
 
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...DataWorks Summit
 
Performance Models for Apache Accumulo
Performance Models for Apache AccumuloPerformance Models for Apache Accumulo
Performance Models for Apache AccumuloSqrrl
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsJen Aman
 
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizonHadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizonDataWorks Summit/Hadoop Summit
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesJen Aman
 
Operating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentOperating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentDataWorks Summit
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United AirlinesDataWorks Summit
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?DataWorks Summit
 
Apache Metron in the Real World
Apache Metron in the Real WorldApache Metron in the Real World
Apache Metron in the Real WorldDataWorks Summit
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobileDataWorks Summit
 
Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...DataWorks Summit
 
Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataDataWorks Summit
 

Mais procurados (20)

Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learning
 
Compute-based sizing and system dashboard
Compute-based sizing and system dashboardCompute-based sizing and system dashboard
Compute-based sizing and system dashboard
 
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
 
Performance Models for Apache Accumulo
Performance Models for Apache AccumuloPerformance Models for Apache Accumulo
Performance Models for Apache Accumulo
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
 
Log I am your father
Log I am your fatherLog I am your father
Log I am your father
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizonHadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
Operating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentOperating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environment
 
Admiral Group
Admiral GroupAdmiral Group
Admiral Group
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United Airlines
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Apache Metron in the Real World
Apache Metron in the Real WorldApache Metron in the Real World
Apache Metron in the Real World
 
KNIME tutorial
KNIME tutorialKNIME tutorial
KNIME tutorial
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China Mobile
 
Shaping a Digital Vision
Shaping a Digital VisionShaping a Digital Vision
Shaping a Digital Vision
 
Hadoop Everywhere
Hadoop EverywhereHadoop Everywhere
Hadoop Everywhere
 
Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...
 
Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government data
 

Destaque

Gp cibercultura taciana de lima burgos
Gp cibercultura taciana de lima burgosGp cibercultura taciana de lima burgos
Gp cibercultura taciana de lima burgosLuara Schamó
 
Presentación groupstowork
Presentación groupstoworkPresentación groupstowork
Presentación groupstoworkJose Artiach
 
BID.workshop Sicherheitspolitik für Parlamentsmitarbeiter - Präsentation BAKS
BID.workshop Sicherheitspolitik für Parlamentsmitarbeiter - Präsentation BAKSBID.workshop Sicherheitspolitik für Parlamentsmitarbeiter - Präsentation BAKS
BID.workshop Sicherheitspolitik für Parlamentsmitarbeiter - Präsentation BAKSBerliner Informationsdienst
 
Avalon Media System (Open Repositories 2014 poster)
Avalon Media System (Open Repositories 2014 poster)Avalon Media System (Open Repositories 2014 poster)
Avalon Media System (Open Repositories 2014 poster)Avalon Media System
 
Javier del Villar, antiguo Alumno EAE, nuevo Director Comercial de Sambil Out...
Javier del Villar, antiguo Alumno EAE, nuevo Director Comercial de Sambil Out...Javier del Villar, antiguo Alumno EAE, nuevo Director Comercial de Sambil Out...
Javier del Villar, antiguo Alumno EAE, nuevo Director Comercial de Sambil Out...EAE Business School
 
Programa Microsoft Aceleración de Startups de Base Tecnológica 2014
Programa Microsoft Aceleración de Startups de Base Tecnológica 2014Programa Microsoft Aceleración de Startups de Base Tecnológica 2014
Programa Microsoft Aceleración de Startups de Base Tecnológica 2014MICProductivity
 
Comunicación humana por medio de herramientas.
Comunicación humana por medio de herramientas.Comunicación humana por medio de herramientas.
Comunicación humana por medio de herramientas.Zuze Benaviddes Salas
 
The Jigsaw Story - Data 2.0 2012 Keynote by Jim Fowler
The Jigsaw Story - Data 2.0 2012 Keynote by Jim FowlerThe Jigsaw Story - Data 2.0 2012 Keynote by Jim Fowler
The Jigsaw Story - Data 2.0 2012 Keynote by Jim FowlerInfoArmy
 
Grafton Recruitment Eng
Grafton Recruitment   EngGrafton Recruitment   Eng
Grafton Recruitment EngPSGrafton
 
El mejor empleo del mundo
El mejor empleo del mundoEl mejor empleo del mundo
El mejor empleo del mundonadiairacheta
 
Andres acosta riesgos_internet_actividad3.2
Andres acosta riesgos_internet_actividad3.2Andres acosta riesgos_internet_actividad3.2
Andres acosta riesgos_internet_actividad3.2Andres Acosta
 
Kongres Mobilny: Łukasz Ciechanek, Przemysław Jurgiel-Żyła, Netsprint
Kongres Mobilny: Łukasz Ciechanek, Przemysław Jurgiel-Żyła, NetsprintKongres Mobilny: Łukasz Ciechanek, Przemysław Jurgiel-Żyła, Netsprint
Kongres Mobilny: Łukasz Ciechanek, Przemysław Jurgiel-Żyła, Netsprintecommerce poland expo
 
Presentación Grupo 2
Presentación Grupo 2 Presentación Grupo 2
Presentación Grupo 2 Andrea Badilla
 
FY 2010 Annual Report-Tobacco Prevention and Control Program
FY 2010 Annual Report-Tobacco Prevention and Control  Program FY 2010 Annual Report-Tobacco Prevention and Control  Program
FY 2010 Annual Report-Tobacco Prevention and Control Program State of Utah, Salt Lake City
 
Dn nfor mobile_download_en
Dn nfor mobile_download_enDn nfor mobile_download_en
Dn nfor mobile_download_enmbeatrizoliveira
 

Destaque (20)

Gp cibercultura taciana de lima burgos
Gp cibercultura taciana de lima burgosGp cibercultura taciana de lima burgos
Gp cibercultura taciana de lima burgos
 
Presentación groupstowork
Presentación groupstoworkPresentación groupstowork
Presentación groupstowork
 
BID.workshop Sicherheitspolitik für Parlamentsmitarbeiter - Präsentation BAKS
BID.workshop Sicherheitspolitik für Parlamentsmitarbeiter - Präsentation BAKSBID.workshop Sicherheitspolitik für Parlamentsmitarbeiter - Präsentation BAKS
BID.workshop Sicherheitspolitik für Parlamentsmitarbeiter - Präsentation BAKS
 
Avalon Media System (Open Repositories 2014 poster)
Avalon Media System (Open Repositories 2014 poster)Avalon Media System (Open Repositories 2014 poster)
Avalon Media System (Open Repositories 2014 poster)
 
Javier del Villar, antiguo Alumno EAE, nuevo Director Comercial de Sambil Out...
Javier del Villar, antiguo Alumno EAE, nuevo Director Comercial de Sambil Out...Javier del Villar, antiguo Alumno EAE, nuevo Director Comercial de Sambil Out...
Javier del Villar, antiguo Alumno EAE, nuevo Director Comercial de Sambil Out...
 
Aa 125 gp-results-2010
Aa 125 gp-results-2010Aa 125 gp-results-2010
Aa 125 gp-results-2010
 
Programa Microsoft Aceleración de Startups de Base Tecnológica 2014
Programa Microsoft Aceleración de Startups de Base Tecnológica 2014Programa Microsoft Aceleración de Startups de Base Tecnológica 2014
Programa Microsoft Aceleración de Startups de Base Tecnológica 2014
 
Manual del usuario
Manual del usuarioManual del usuario
Manual del usuario
 
Comunicación humana por medio de herramientas.
Comunicación humana por medio de herramientas.Comunicación humana por medio de herramientas.
Comunicación humana por medio de herramientas.
 
Contaminación emitida por los barcos
Contaminación emitida por los barcosContaminación emitida por los barcos
Contaminación emitida por los barcos
 
The Jigsaw Story - Data 2.0 2012 Keynote by Jim Fowler
The Jigsaw Story - Data 2.0 2012 Keynote by Jim FowlerThe Jigsaw Story - Data 2.0 2012 Keynote by Jim Fowler
The Jigsaw Story - Data 2.0 2012 Keynote by Jim Fowler
 
Grafton Recruitment Eng
Grafton Recruitment   EngGrafton Recruitment   Eng
Grafton Recruitment Eng
 
El mejor empleo del mundo
El mejor empleo del mundoEl mejor empleo del mundo
El mejor empleo del mundo
 
Andres acosta riesgos_internet_actividad3.2
Andres acosta riesgos_internet_actividad3.2Andres acosta riesgos_internet_actividad3.2
Andres acosta riesgos_internet_actividad3.2
 
Kongres Mobilny: Łukasz Ciechanek, Przemysław Jurgiel-Żyła, Netsprint
Kongres Mobilny: Łukasz Ciechanek, Przemysław Jurgiel-Żyła, NetsprintKongres Mobilny: Łukasz Ciechanek, Przemysław Jurgiel-Żyła, Netsprint
Kongres Mobilny: Łukasz Ciechanek, Przemysław Jurgiel-Żyła, Netsprint
 
Presentación Grupo 2
Presentación Grupo 2 Presentación Grupo 2
Presentación Grupo 2
 
Curso de espanhol em português
Curso de espanhol em portuguêsCurso de espanhol em português
Curso de espanhol em português
 
TLCAN
TLCANTLCAN
TLCAN
 
FY 2010 Annual Report-Tobacco Prevention and Control Program
FY 2010 Annual Report-Tobacco Prevention and Control  Program FY 2010 Annual Report-Tobacco Prevention and Control  Program
FY 2010 Annual Report-Tobacco Prevention and Control Program
 
Dn nfor mobile_download_en
Dn nfor mobile_download_enDn nfor mobile_download_en
Dn nfor mobile_download_en
 

Semelhante a SCAPE - Scalable Preservation Environments

Scalable Preservation Workflows
Scalable Preservation WorkflowsScalable Preservation Workflows
Scalable Preservation WorkflowsSCAPE Project
 
SCAPE Information Day at BL - Some of the SCAPE Outputs Available
SCAPE Information Day at BL - Some of the SCAPE Outputs AvailableSCAPE Information Day at BL - Some of the SCAPE Outputs Available
SCAPE Information Day at BL - Some of the SCAPE Outputs AvailableSCAPE Project
 
SCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with HadoopSCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with HadoopSCAPE Project
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?OVHcloud
 
Presentation arsip nov 2012 frans smit handout
Presentation arsip nov 2012 frans smit handoutPresentation arsip nov 2012 frans smit handout
Presentation arsip nov 2012 frans smit handoutGemeente Almere
 
SCAPE general presentation
SCAPE general presentationSCAPE general presentation
SCAPE general presentationSCAPE Project
 
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Project
 
SCAPE Presentation at the Elag2013 conference in Gent/Belgium
SCAPE Presentation at the Elag2013 conference in Gent/BelgiumSCAPE Presentation at the Elag2013 conference in Gent/Belgium
SCAPE Presentation at the Elag2013 conference in Gent/BelgiumSven Schlarb
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud ComputingDavid Wallom
 
LIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven SchlarbLIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven SchlarbSCAPE Project
 
Application scenarios of the SCAPE project at the Austrian National Library
Application scenarios of the SCAPE project at the Austrian National LibraryApplication scenarios of the SCAPE project at the Austrian National Library
Application scenarios of the SCAPE project at the Austrian National LibrarySven Schlarb
 
DSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado Blasco
DSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado BlascoDSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado Blasco
DSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado BlascoDeltares
 
Hadoop-Automation-Tool_RamkishorTak
Hadoop-Automation-Tool_RamkishorTakHadoop-Automation-Tool_RamkishorTak
Hadoop-Automation-Tool_RamkishorTakRam Kishor Tak
 
Partner webinar featuring CatDV
Partner webinar featuring CatDVPartner webinar featuring CatDV
Partner webinar featuring CatDVFileCatalyst
 
Packaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reusePackaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reuseMatthew Vaughn
 
Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...David Wallom
 
OGC Interfaces in Thematic Exploitation Platforms
OGC Interfaces in Thematic Exploitation PlatformsOGC Interfaces in Thematic Exploitation Platforms
OGC Interfaces in Thematic Exploitation Platformsterradue
 
SCAPE - Building Digital Preservation Infrastructure
SCAPE - Building Digital Preservation InfrastructureSCAPE - Building Digital Preservation Infrastructure
SCAPE - Building Digital Preservation InfrastructureSCAPE Project
 
Scape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation EnvironmentsScape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation EnvironmentsSCAPE Project
 
New big data architecture in hadoop.pptx
New big data architecture in hadoop.pptxNew big data architecture in hadoop.pptx
New big data architecture in hadoop.pptxVanshGupta597842
 

Semelhante a SCAPE - Scalable Preservation Environments (20)

Scalable Preservation Workflows
Scalable Preservation WorkflowsScalable Preservation Workflows
Scalable Preservation Workflows
 
SCAPE Information Day at BL - Some of the SCAPE Outputs Available
SCAPE Information Day at BL - Some of the SCAPE Outputs AvailableSCAPE Information Day at BL - Some of the SCAPE Outputs Available
SCAPE Information Day at BL - Some of the SCAPE Outputs Available
 
SCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with HadoopSCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with Hadoop
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?
 
Presentation arsip nov 2012 frans smit handout
Presentation arsip nov 2012 frans smit handoutPresentation arsip nov 2012 frans smit handout
Presentation arsip nov 2012 frans smit handout
 
SCAPE general presentation
SCAPE general presentationSCAPE general presentation
SCAPE general presentation
 
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
 
SCAPE Presentation at the Elag2013 conference in Gent/Belgium
SCAPE Presentation at the Elag2013 conference in Gent/BelgiumSCAPE Presentation at the Elag2013 conference in Gent/Belgium
SCAPE Presentation at the Elag2013 conference in Gent/Belgium
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud Computing
 
LIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven SchlarbLIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven Schlarb
 
Application scenarios of the SCAPE project at the Austrian National Library
Application scenarios of the SCAPE project at the Austrian National LibraryApplication scenarios of the SCAPE project at the Austrian National Library
Application scenarios of the SCAPE project at the Austrian National Library
 
DSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado Blasco
DSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado BlascoDSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado Blasco
DSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado Blasco
 
Hadoop-Automation-Tool_RamkishorTak
Hadoop-Automation-Tool_RamkishorTakHadoop-Automation-Tool_RamkishorTak
Hadoop-Automation-Tool_RamkishorTak
 
Partner webinar featuring CatDV
Partner webinar featuring CatDVPartner webinar featuring CatDV
Partner webinar featuring CatDV
 
Packaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reusePackaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reuse
 
Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...
 
OGC Interfaces in Thematic Exploitation Platforms
OGC Interfaces in Thematic Exploitation PlatformsOGC Interfaces in Thematic Exploitation Platforms
OGC Interfaces in Thematic Exploitation Platforms
 
SCAPE - Building Digital Preservation Infrastructure
SCAPE - Building Digital Preservation InfrastructureSCAPE - Building Digital Preservation Infrastructure
SCAPE - Building Digital Preservation Infrastructure
 
Scape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation EnvironmentsScape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation Environments
 
New big data architecture in hadoop.pptx
New big data architecture in hadoop.pptxNew big data architecture in hadoop.pptx
New big data architecture in hadoop.pptx
 


SCAPE - Scalable Preservation Environments

  • 1. Dr. Ross King AIT Austrian Institute of Technology GmbH Preservation at Scale Workshop Lisbon, September 5, 2013 SCAPE Tools and Infrastructure for Preservation at Scale
  • 2. Outline
      • SCAPE Project
      • SCAPE Solutions
      • Scalable Planning
      • Scalable Tools
      • Scalable Computation
      • Scalable Repositories
      • SCAPE Testbeds
      • SCAPE Additional Information
      • Online Resources
      • Training Events
      • Contact Information
    This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).
  • 3. SCAPE – what is it about?
      • Planning and executing computing-intensive digital preservation processes such as the large-scale ingestion, characterisation or migration of large (multi-terabyte) and complex data sets
      • SCAPE results include
          • Preservation scenarios
          • Preservation tools
          • Preservation workflows
          • Preservation infrastructure
          • Preservation best practices
      • SCAPE is a follow-up to the highly successful FP6 IP Planets.
  • 4. SCAPE Project Data
      • Project instrument: FP7 Collaborative Project
      • 6th Call
          • Objective ICT-2009.4.1: Digital Libraries and Digital Preservation
          • Target outcome (a): Scalable systems and services for preserving digital content
      • 10th Call
          • Objective ICT-2013.11.4: Supplements to Strengthen Cooperation in ICT R&D in an Enlarged European Union
      • Duration: 44 months (originally 42)
          • February 2011 – September 2014 (originally July 2014)
      • Budget: 12.0 million euro (originally 11.3)
      • Funded: 9.2 million euro (originally 8.6)
  • 7. Scalable Planning and Watch
      • SCOUT: an automated preservation watch system
          • Enables planning tools and decision makers to monitor the world and the organisation
          • Collects relevant knowledge and enables automated notification
          • Open and extensible
      • c3po: scalable content profiling
          • c3po analyses characterisation data based on FITS
          • Scale-out MongoDB (100k/min/node)
          • Visual drill-down and well-documented profile
          • Automated sample selection
      • PLATO 4.1: scalable preservation planning
          • www.ifs.tuwien.ac.at/dp/plato
          • Technology upgrade: refactored, rebuilt, standardised, tested
          • New features
              • Groups allow collaborative planning
              • Integration of control policies for group
              • Quality domain – measures
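
The content profiling and automated sample selection steps listed for c3po can be illustrated with a small sketch. This is not c3po's actual implementation or API: the record fields (`path`, `format`, `size`) and the function names `build_profile` and `select_samples` are assumptions made for the example, and plain Python dictionaries stand in for the FITS output and the MongoDB store.

```python
from collections import Counter
import random

def build_profile(records):
    """Summarise characterisation records into a simple collection profile."""
    formats = Counter(r["format"] for r in records)
    return {
        "count": len(records),
        "total_size": sum(r["size"] for r in records),
        "formats": dict(formats),
    }

def select_samples(records, per_format=1, seed=0):
    """Pick up to `per_format` records for every format found in the collection."""
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    by_format = {}
    for r in records:
        by_format.setdefault(r["format"], []).append(r)
    samples = []
    for fmt in sorted(by_format):
        group = by_format[fmt]
        samples.extend(rng.sample(group, min(per_format, len(group))))
    return samples

# Hypothetical characterisation records (illustrative values only).
records = [
    {"path": "a.tif", "format": "TIFF", "size": 40_000_000},
    {"path": "b.tif", "format": "TIFF", "size": 38_000_000},
    {"path": "c.pdf", "format": "PDF", "size": 1_200_000},
]

profile = build_profile(records)
samples = select_samples(records)
print(profile["formats"], [s["path"] for s in samples])
```

Selecting per format rather than uniformly at random means rare formats still appear in the sample set, which is usually what a planner wants to test against.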
  • 8. Scalable Tools
      • Tool Wrapper
          • Application that adapts existing tools to the SCAPE Platform
          • https://github.com/openplanets/scape-toolwrapper
          • Enhances wrapped tools
              • Standard naming scheme for CC, AS and QA tools
              • Standard invocation method (CLI)
              • Debian packages for easy deployment on the cluster
              • Support for data streaming (useful for Hadoop jobs)
          • Generates Preservation Components
              • Taverna workflows with embedded metadata for easy discovery
              • Automatic publication of components on myExperiment (to support discoverability)
              • Standard ports to enable composition of Preservation Components (based on well-defined component profiles for CC, AS and QA)
      • Digital Preservation Toolkit
          • Software suite that contains a large set of DP tools
          • 77 operations in total
          • Easy to deploy on Linux machines (via apt-get)
          • apt-get install digital-preservation-tools
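
The idea of a standard command-line invocation with streaming support can be sketched as follows. This is only an illustration of the concept, not the SCAPE Tool Wrapper itself: the `WrappedTool` class, the tool name `as-demo-uppercase` and the `{input}`/`{output}` placeholders are hypothetical, and the Python interpreter stands in for a real preservation tool so that the example runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass

@dataclass
class WrappedTool:
    """A tool described by a name and an argv template (hypothetical scheme)."""
    name: str
    command: list  # argv template; parts may contain {placeholders}

    def argv(self, **params):
        # Fill placeholders such as {input} and {output} in each argv part.
        return [part.format(**params) for part in self.command]

    def run_stream(self, data: bytes, **params) -> bytes:
        # Feed data on stdin and collect stdout, as a streaming job would.
        result = subprocess.run(
            self.argv(**params), input=data, stdout=subprocess.PIPE, check=True
        )
        return result.stdout

# Stand-in "tool": upper-cases whatever arrives on stdin.
demo = WrappedTool(
    name="as-demo-uppercase",
    command=[sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"],
)

print(demo.run_stream(b"tiff to jp2"))
```

Because every wrapped tool is invoked the same way, a cluster job only needs the tool name and its parameters, not tool-specific glue code.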
  • 9. Scalable Computation
      • Deployment of environments
          • XEN Hypervisor
          • Eucalyptus
      • Deployment of tools
          • Debian Packages
          • Tool Spec
      • Job Execution Service (JES)
          • Apache Oozie
          • Apache Hadoop
    [Figure: User view of the SCAPE development cloud at AIT: Eucalyptus web interface, Hybridfox browser add-on, and terminal-based interaction. Illustration from digitalbevaring.dk.]
  • 10. Scalable Repositories
      • Fedora 4.0.0
          • All REST, no SOAP
          • RDF as first class objects
          • JCR 2.0 Implementation (ModeShape)
          • Infinispan distributed NoSQL datastore
      • Lily 2.0
          • Built on top of HBase/HDFS
          • Integration of computation and storage
  • 11. SCAPE Architecture
    [Architecture diagram. Labels recoverable from the figure: Digital Object Repository; Plan Management API; Plan Management GUI; Data Connector API; Digital Objects/Metadata; Execution Platform; JES; Hadoop; JES API; Automated Planning; PLATO; Automated Watch; Sources; Push API; Pull API; Knowledge Source Adaptor; Watch Request API; Notification API; Report API; Component Catalogue; Component Lookup API; Component Registration API; Component Profile Validator; Taverna Workbench; Data Publication Platform; LDS3 API; Loader Application.]
  • 13. SCAPE Testbeds
      • Large-scale Digital Repositories
          • Carry out large-scale image migrations
              • The master files from legacy digitized image collections are typically TIFF files that can be costly to store due to their size. The cost benefit can only be realized if one can remove the original TIFFs, and this can only be done if one can provide evidence of successful migration. (2.2 million pages, 80 TB)
          • Detect poor sound quality
              • In a collection of MP3 files (20 TB, 360,000 files) we have discovered files with very bad sound quality. Before ingesting everything into our DOMS we would like to be able to discover the bad files and potentially get those re-digitized from the original analogue media.
      • Research Data Sets
          • RAW to NEXUS conversion
              • File size and volume of content are challenges identified for Nexus files. The RAW-to-Nexus format migration tool can be customised to account for various other types of experiment data files during the migration. However, the scalability challenge is that, for different instruments (specific to each facility), the other types of experiment data files vary significantly.
    See http://wiki.opf-labs.org/display/SP/Scenarios
    Illustration from digitalbevaring.dk
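
The requirement to "provide evidence of successful migration" before deleting the original TIFFs can be sketched as a property comparison. This is a simplified illustration and not the quality assurance workflow used in the testbed: the property names (`width`, `height`, `bits_per_sample`) and the functions `verify_migration` and `safe_to_delete` are assumptions, and a real check would likely also compare image content.

```python
def verify_migration(source_props, target_props,
                     keys=("width", "height", "bits_per_sample")):
    """Compare selected properties of an original and its migrated copy."""
    mismatches = {
        k: (source_props.get(k), target_props.get(k))
        for k in keys
        if source_props.get(k) != target_props.get(k)
    }
    return {"ok": not mismatches, "mismatches": mismatches}

def safe_to_delete(pairs):
    """Return the names of originals whose migration passed every check."""
    return [name for name, src, dst in pairs if verify_migration(src, dst)["ok"]]

# Hypothetical characterisation output for two pages (illustrative values).
pairs = [
    ("page1.tif",
     {"width": 2480, "height": 3508, "bits_per_sample": 8},
     {"width": 2480, "height": 3508, "bits_per_sample": 8}),
    ("page2.tif",
     {"width": 2480, "height": 3508, "bits_per_sample": 8},
     {"width": 2480, "height": 3500, "bits_per_sample": 8}),
]

print(safe_to_delete(pairs))
print(verify_migration(pairs[1][1], pairs[1][2])["mismatches"])
```

Keeping the mismatch report, rather than only a pass/fail flag, gives the kind of per-file evidence that a deletion decision can be audited against.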
  • 14. SCAPE Testbeds
      • Web Content
          • Quality assurance in web harvesting
              • Web crawling is a process that is highly susceptible to errors. Often, essential data is missed by the crawler and thus not captured and preserved. Currently, quality assurance requires manual effort and because crawls often contain millions of pages, manual quality assurance will be neither very efficient
      • Data Centers
          • Anonymization of medical data
              • In order to fulfil the requirements for storing medical data in terms of safety and security, it will be necessary to develop encryption and anonymization services that will allow medical data transfer to a data center’s remote storage facilities. On one hand, the encryption techniques will be used to secure sensitive personal data (e.g. internal documents, patient databases) which must only be accessible from authorized services and users. On the other hand, the anonymization services will enable medical data (like x-ray generator outputs, x-ray computed tomography outputs, surgery recordings) to be stored in the data center without having sensitive data attached.
    Illustration from digitalbevaring.dk
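
The anonymization step described above, storing medical data without sensitive data attached, can be sketched roughly as below. This is an assumption-laden illustration rather than the service developed in the testbed: the field names, the `SENSITIVE_FIELDS` set and the keyed-hash pseudonym are all hypothetical choices, and the encryption side of the scenario is not covered.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as directly identifying.
SENSITIVE_FIELDS = {"patient_name", "birth_date", "address"}

def pseudonym(patient_id: str, secret: bytes) -> str:
    """Derive a stable pseudonym from the patient ID with a keyed hash."""
    digest = hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def anonymise(record: dict, secret: bytes) -> dict:
    """Drop identifying fields and replace the patient ID with a pseudonym."""
    cleaned = {
        k: v for k, v in record.items()
        if k not in SENSITIVE_FIELDS and k != "patient_id"
    }
    cleaned["pseudonym"] = pseudonym(record["patient_id"], secret)
    return cleaned

# Illustrative record; the secret would be kept outside the data center.
record = {
    "patient_id": "P-0001",
    "patient_name": "Jane Doe",
    "birth_date": "1970-01-01",
    "modality": "CT",
    "file": "scan_0001.dcm",
}
anonymised = anonymise(record, secret=b"example-secret")
print(sorted(anonymised))
```

A keyed hash keeps scans from the same patient linkable inside the data center while making it impractical to recover the original ID without the secret.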
  • 16. Additional Resources of Interest
      • Development Infrastructure
          • Code repository hosted by the Open Planets Foundation and GitHub
              • https://github.com/openplanets/scape/
          • Development Wiki
              • http://wiki.opf-labs.org/display/SP/Home
          • Experimental Workflows
              • http://www.myexperiment.org/search?query=SCAPE&type=all&commit=Search
      • Publications
          • http://www.scape-project.eu/category/publication
      • Public Deliverables
          • http://www.scape-project.eu/category/deliverable
      • Tools
          • http://www.scape-project.eu/tools
  • 17. SCAPE Training Events
      • Future Formats First: Application Infrastructures for Action Services
          • 16-17 September 2013, London
          • Registration: http://scape-future-formats-first.eventbrite.co.uk/
      • Critical Path: Effective Evidence Based Preservation Planning
          • 13 November 2013, Aarhus
      • Hadoop-driven Digital Preservation (Hackathon)
          • 2-4 December 2013, Vienna
    See http://www.scape-project.eu/events
  • 18. SCAPE Contact Information
      • http://www.scape-project.eu/
      • Twitter: #scapeproject
      • office@list.scape-project.eu
      • Dr. Ross King, AIT Austrian Institute of Technology GmbH, Donau-City-Strasse 1, A-1220 Wien
  • 19. Thank you for your attention! Questions?