Evolving Domains, Problems and Solutions for
       Long Term Digital Preservation

                      Dr. Ross King
         AIT Austrian Institute of Technology GmbH
Co-Authors
•   Orit Edelstein – IBM Research, Haifa
•   Michael Factor – IBM Research, Haifa
•   Thomas Risse – L3S Research Center, Hannover
•   Eliot Salant – IBM Research, Haifa
•   Philip Taylor – SAP Research, Belfast
Outline
• Why these projects?
• Introducing the projects
• Comparing and contrasting the projects
  – Motivation
  – Objectives
  – Approach
• Trends in Digital Preservation
Why these projects?
Timeline of Digital Preservation Projects




[Figure: timeline of digital preservation projects. Legend: Coordinated Action, Network of Excellence, STREP, Collaborative Project. Annotation: FP7 6th Call, Objective ICT-2009.4.1: Digital Libraries and Digital Preservation.]
from http://cordis.europa.eu/fp7/ict/telearn-digicult/report-research-digital-preservation_en.pdf
07.11.2011
EU Funding for Digital Preservation Projects
from http://cordis.europa.eu/fp7/ict/telearn-digicult/report-research-digital-preservation_en.pdf
•   FP7: 68.4 M€
•   FP6: 24.9 M€
•   FP5: 0.9 M€
Introducing the projects
ARCOMEM
•   Transforming Web archives into community memories that are much more
    tightly integrated with their community of current and future users.
•   Developing methods and tools based on novel socially-aware and socially-
    driven Web preservation models.
•   Three dimensions
     –   Social Web analysis: leverage Social Web information, relying on the Wisdom of the
         Crowds for intelligent content appraisal, selection, contextualization and preservation.
     –   Archive enrichment: extract information about entities, events, topics, and opinions.
     –   Intelligent and collaborative content acquisition support for archives


•   Two testbeds
     –   Media-related web archives
         (Südwestrundfunk, Deutsche Welle)
     –   Political archives
         (Hellenic and Austrian Parliaments)
ENSURE
Enabling kNowledge Sustainability, Usability and Recovery for Economic value
• EVALUATE Cost and Value
      •   Ability to compose different quality solutions at different costs
      •   Build a software stack that balances the cost of preservation against the value of the data
•   AUTOMATE Preservation Lifecycle
      •   Control the preservation lifecycle based on
            • the changing value of business data over time
            • changes in regulation
            • advances in underlying technology
•   PROTECT
      •   Content-aware data protection
            • Focus on long term access control, privacy and IPR,
              and de-identification
•   SCALE using ICT innovations
      •   Investigate economical and scalable solutions
          such as cloud storage
            • include issues of security and data locality
[Figure: innovations and use cases (Healthcare, Clinical Studies, Financial Services)]
•   Three testbeds
      •   Healthcare
      •   Clinical Trials
      •   Financial Services
SCAPE
SCAlable Preservation Environments
• Making preservation planning and preservation
  workflows scalable
   – Define and test an infrastructure for scalable
     preservation actions
   – Provide a framework for automated quality assurance
     workflows
   – Develop a policy-based preservation planning tool with
     automated preservation watch

• Three testbeds
   – Web archives
   – Large-scale repositories
   – Research data sets

                                                      from digitalbevaring.dk
TIMBUS
Timeless Business Processes and Services
• Exploring scenarios where the important digital information to be preserved is the
   execution context within which data are processed, analysed, transformed and
   rendered.
     –   Although there are significant advantages to SaaS and IoS models, there is the danger of services and
         service providers disappearing (for various reasons), leaving partially complete business processes.
•   Enlarging the understanding of digital preservation to include the set of activities,
    processes and tools that ensure continued access to services and software necessary
    to produce the context within which information can be accessed, properly rendered,
    validated and transformed into context based knowledge.
•   Three testbeds
     – engineering services and systems
       for digital preservation
     – civil engineering infrastructures
     – e-science and mathematical simulations
Comparing and contrasting
      the projects
Motivation
• ARCOMEM is unique in dealing with publicly available and non-regulated
  data and in harnessing the "wisdom of crowds" to help decide what to
  preserve.
• TIMBUS focuses on the environments that produce the data rather than
  the data itself.
• ENSURE and TIMBUS are motivated in part by accurate risk assessment
  and preservation lifecycle issues related to regulations.
• ENSURE, SCAPE and TIMBUS address the scalability of technology and
  software infrastructure for digital preservation.

• Targeted Stakeholders:
    –   scientific data (SCAPE, ENSURE, TIMBUS)
    –   memory institutions (SCAPE, ARCOMEM)
    –   web (SCAPE, ARCOMEM)
    –   engineering (TIMBUS)
    –   health care (ENSURE)
    –   finance (ENSURE)
Objectives
• ENSURE, SCAPE, and TIMBUS are focused on organisations (the
  organisation-focused projects); ARCOMEM is focused on the web
• All projects address the question "what is to be preserved?"
    –   ARCOMEM: social media can tell us
    –   ENSURE: extract this information from business rules
    –   SCAPE and TIMBUS: provide tools for responsible persons (curators)
    –   TIMBUS driven by risk management, ENSURE by cost/benefit
• ARCOMEM, ENSURE and SCAPE focus on issues of scalability
    – ARCOMEM, SCAPE: computational
    – ENSURE: storage infrastructure
• The organisation-focused projects also consider
    – the automation of the preservation lifecycle
    – the automation of quality assurance for preservation actions
• Both ENSURE and TIMBUS have the goal of re-running software after long
  periods of time
Approach
•   All four projects will produce prototype software frameworks
     –   The organisation-focused projects all propose to implement platforms for the execution of
         preservation workflows
•   SCAPE and ENSURE will make use of service-oriented architectures
     –   SCAPE for prototyping only; SOA model workflows should be translated into Map/Reduce jobs
•   Digital Lifecycle approach
     –   TIMBUS focuses on the legal and IPR aspects
     –   ENSURE focuses on the trade-offs between quality, cost and economic performance
•   Preservation planning plays a role in all projects
     –   ENSURE plans a configuration layer with special emphasis on cost versus value
     –   The TIMBUS approach is based on dependency and risk management
     –   Both ARCOMEM and SCAPE rely on the internet to guide preservation
           •   ARCOMEM through the monitoring of social media
           •   SCAPE through the monitoring of web harvests

•   Virtualisation plays a role in all organisation-focused projects
     –   ENSURE: as a means to access digital objects
     –   SCAPE: as a means to deploy complex preservation action environments
     –   TIMBUS: as a means to preserve and recover the entire business process
Some trends
in Digital Preservation
Trends in Digital Preservation Projects
[Figure: technology trends in digital preservation projects, plotted on a 2006 to 2012 time axis.
 Trend bands: CONTENT-DRIVEN, SEMANTIC WEB, WEB SERVICES, GRID, WORKFLOW, CLOUD.
 Labels: PANIC; Semantic Web Services; Semantic Web Services + Agents; Emulation; Virtualization;
 Linked Open Data; Workflow; SOA: Web Services; Security and Trust; Quality Assurance;
 Distributed Storage; Distributed Processing.]
Thank you for your attention!
            Ross King – AIT, Vienna
     Orit Edelstein – IBM Research, Haifa
     Michael Factor – IBM Research, Haifa
 Thomas Risse – L3S Research Center, Hannover
      Eliot Salant – IBM Research, Haifa
     Philip Taylor – SAP Research, Belfast

      ARCOMEM:     www.arcomem.eu
      ENSURE:      ensure-fp7.eu
      SCAPE:       www.scape-project.eu
      TIMBUS:      timbusproject.net

Finology Group – Insurtech Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 

Evolving Domains, Problems and Solutions for Long Term Digital Preservation

  • 1. Evolving Domains, Problems and Solutions for Long Term Digital Preservation Dr. Ross King AIT Austrian Institute of Technology GmbH
  • 2. Co-Authors • Orit Edelstein – IBM Research, Haifa • Michael Factor – IBM Research, Haifa • Thomas Risse – L3S Research Center, Hannover • Eliot Salant – IBM Research, Haifa • Philip Taylor – SAP Research, Belfast
  • 3. Outline • Why these projects? • Introducing the projects • Comparing and contrasting the projects – Motivation – Objectives – Approach • Trends in Digital Preservation
  • 5. Timeline of Digital Preservation Projects [figure: project timeline; legend: Coordinated Action, Network of Excellence, STREP, Collaborative Project; highlighted: FP7 6th Call, Objective ICT-2009.4.1: Digital Libraries and Digital Preservation] Source: http://cordis.europa.eu/fp7/ict/telearn-digicult/report-research-digital-preservation_en.pdf 07.11.2011
  • 6. EU Funding for Digital Preservation Projects (source: http://cordis.europa.eu/fp7/ict/telearn-digicult/report-research-digital-preservation_en.pdf) • FP7: 68.4 M€ • FP6: 24.9 M€ • FP5: 0.9 M€
  • 8. ARCOMEM • Transforming Web archives into community memories that are much more tightly integrated with their community of current and future users. • Developing methods and tools based on novel socially-aware and socially-driven Web preservation models. • Three dimensions – Social Web analysis: leverage Social Web information, relying on the Wisdom of the Crowds for intelligent content appraisal, selection, contextualization and preservation. – Archive enrichment: extract information about entities, events, topics, and opinions. – Intelligent and collaborative content acquisition support for archives • Two testbeds – Media-related web archives (Südwestrundfunk, Deutsche Welle) – Political archives (Hellenic and Austrian Parliaments)
  • 9. ENSURE Enabling kNowledge Sustainability, Usability and Recovery for Economic value • EVALUATE Cost and Value – Ability to compose different quality solutions at different costs – Build a software stack that balances the cost of preservation against the value of the data • AUTOMATE Preservation Lifecycle – Control the preservation lifecycle based on: the changing value of business data over time; changes in regulation; advances in underlying technology • PROTECT – Content-aware data protection – Focus on long term access control, privacy and IPR, and de-identification • SCALE using ICT innovations – Investigate economical and scalable solutions such as cloud storage – Include issues of security and data locality • Three testbeds – Healthcare – Clinical Trials – Financial Services
  • 10. SCAPE SCAlable Preservation Environments • Making preservation planning and preservation workflows scalable – Define and test an infrastructure for scalable preservation actions – Provide a framework for automated quality assurance workflows – Develop a policy-based preservation planning tool with automated preservation watch • Three testbeds – Web archives – Large-scale repositories – Research data sets (illustration from digitalbevaring.dk)
  • 11. TIMBUS Timeless Business Processes and Services • Exploring scenarios where the important digital information to be preserved is the execution context within which data are processed, analysed, transformed and rendered. – Although there are significant advantages to SaaS and IoS models, there is the danger of services and service providers disappearing (for various reasons), leaving partially complete business processes. • Enlarging the understanding of digital preservation to include the set of activities, processes and tools that ensure continued access to services and software necessary to produce the context within which information can be accessed, properly rendered, validated and transformed into context based knowledge. • Three testbeds – engineering services and systems for digital preservation – civil engineering infrastructures – e-science and mathematical simulations
  • 13. Motivation • ARCOMEM is unique in dealing with publicly available and non-regulated data and in harnessing the "wisdom of crowds" to help decide what to preserve. • TIMBUS focuses on the environments that produce the data rather than the data itself. • ENSURE and TIMBUS are motivated in part by accurate risk assessment and preservation lifecycle issues related to regulations. • ENSURE, SCAPE and TIMBUS address the scalability of technology and software infrastructure for digital preservation. • Targeted Stakeholders: – scientific data (SCAPE, ENSURE, TIMBUS) – memory institutions (SCAPE, ARCOMEM) – web (SCAPE, ARCOMEM) – engineering (TIMBUS) – health care (ENSURE) – finance (ENSURE)
  • 14. Objectives • ENSURE, SCAPE, and TIMBUS are focused on organisations (the organisation-focused projects); ARCOMEM is focused on the web • All projects address the question "what is to be preserved" – ARCOMEM: social media can tell us – ENSURE: extract this information from business rules – SCAPE and TIMBUS: provide tools for responsible persons (curators) – TIMBUS is driven by risk management, ENSURE by cost/benefit • ARCOMEM, ENSURE and SCAPE focus on issues of scalability – ARCOMEM, SCAPE: computational – ENSURE: storage infrastructure • The organisation-focused projects also consider – the automation of the preservation lifecycle – the automation of quality assurance for preservation actions • Both ENSURE and TIMBUS have the goal of re-running software after long periods of time
  • 15. Approach • All four projects will produce prototype software frameworks – The organisation-focused projects all propose to implement platforms for the execution of preservation workflows • SCAPE and ENSURE will make use of service-oriented architectures – SCAPE for prototyping only; SOA model workflows should be translated into Map/Reduce jobs • Digital Lifecycle approach – TIMBUS focuses on the legal and IPR aspects – ENSURE focuses on the trade-offs between quality, cost and economic performance • Preservation planning plays a role in all projects – ENSURE plans a configuration layer with special emphasis on cost versus value – The TIMBUS approach is based on dependency and risk management – Both ARCOMEM and SCAPE rely on the internet to guide preservation: ARCOMEM through the monitoring of social media, SCAPE through the monitoring of web harvests • Virtualisation plays a role in all organisation-focused projects – ENSURE: as a means to access digital objects – SCAPE: as a means to deploy complex preservation action environments – TIMBUS: as a means to preserve and recover the entire business process
  • 16. Some trends in Digital Preservation
  • 17. Trends in Digital Preservation Projects [figure: timeline of technology trends, 2006 to 2012; labels: Content-Driven, Semantic Web Services, Semantic Web Services + Agents, Emulation, Virtualization, PANIC, Workflow, Linked Open Data, Semantic Web, SOA: Web Services, Security and Trust, Quality Assurance, Grid, Distributed Storage, Distributed Processing, Cloud]
  • 18. Thank you for your attention! • Ross King – AIT, Vienna • Orit Edelstein – IBM Research, Haifa • Michael Factor – IBM Research, Haifa • Thomas Risse – L3S Research Center, Hannover • Eliot Salant – IBM Research, Haifa • Philip Taylor – SAP Research, Belfast • ARCOMEM: www.arcomem.eu • ENSURE: ensure-fp7.eu • SCAPE: www.scape-project.eu • TIMBUS: timbusproject.net