Overview of FP7 projects, including ARCOMEM, ENSURE, SCAPE and TIMBUS. Presentation by Dr. Ross King, AIT Austrian Institute of Technology GmbH, at iPRES 2011, Singapore. In Proceedings of the 8th International Conference on Preservation of Digital Objects (iPRES 2011), 2011, pp. 194-204. ISBN 978-981-07-0441-4
Evolving Domains, Problems and Solutions for Long Term Digital Preservation
1. Evolving Domains, Problems and Solutions for Long Term Digital Preservation
Dr. Ross King
AIT Austrian Institute of Technology GmbH
2. Co-Authors
• Orit Edelstein – IBM Research, Haifa
• Michael Factor – IBM Research, Haifa
• Thomas Risse – L3S Research Center, Hannover
• Eliot Salant – IBM Research, Haifa
• Philip Taylor – SAP Research, Belfast
3. Outline
• Why these projects?
• Introducing the projects
• Comparing and contrasting the projects
– Motivation
– Objectives
– Approach
• Trends in Digital Preservation
5. Timeline of Digital Preservation Projects
from http://cordis.europa.eu/fp7/ict/telearn-digicult/report-research-digital-preservation_en.pdf
Project types shown: Coordinated Action, Network of Excellence, STREP, Collaborative Project
FP7 6th Call, Objective ICT-2009.4.1:
Digital Libraries and Digital Preservation
5 07.11.2011
6. EU Funding for Digital Preservation Projects
from http://cordis.europa.eu/fp7/ict/telearn-digicult/report-research-digital-preservation_en.pdf
• FP7: 68.4 M€
• FP6: 24.9 M€
• FP5: 0.9 M€
8. ARCOMEM
• Transforming Web archives into community memories that are much more
tightly integrated with their community of current and future users.
• Developing methods and tools based on novel socially-aware and socially-
driven Web preservation models.
• Three dimensions
– Social Web analysis: leverage Social Web information, relying on the Wisdom of the
Crowds for intelligent content appraisal, selection, contextualization and preservation.
– Archive enrichment: extract information about entities, events, topics, and opinions.
– Intelligent and collaborative content acquisition support for archives
• Two testbeds
– Media-related web archives
(Südwestrundfunk, Deutsche Welle)
– Political archives
(Hellenic and Austrian Parliaments)
9. ENSURE
Enabling kNowledge Sustainability, Usability and Recovery for Economic value
• EVALUATE Cost and Value
– Ability to compose different quality solutions at different costs
– Build a software stack that balances the cost of preservation against the value of the data
• AUTOMATE Preservation Lifecycle
– Control the preservation lifecycle based on the changing value of business data over time, changes in regulation, and advances in underlying technology
• PROTECT
– Content-aware data protection
– Focus on long term access control, privacy and IPR, and de-identification
• SCALE using ICT innovations
– Investigate economical and scalable solutions such as cloud storage
– Include issues of security and data locality
• Three testbeds
– Healthcare
– Clinical Trials
– Financial Services
10. SCAPE
SCAlable Preservation Environments
• Making preservation planning and preservation
workflows scalable
– Define and test an infrastructure for scalable
preservation actions
– Provide a framework for automated quality assurance
workflows
– Develop a policy-based preservation planning tool with
automated preservation watch
• Three testbeds
– Web archives
– Large-scale repositories
– Research data sets
from digitalbevaring.dk
11. TIMBUS
Timeless Business Processes and Services
• Exploring scenarios where the important digital information to be preserved is the
execution context within which data are processed, analysed, transformed and
rendered.
– Although there are significant advantages to SaaS and IoS models, there is the danger of services and
service providers disappearing (for various reasons), leaving partially complete business processes.
• Enlarging the understanding of digital preservation to include the set of activities,
processes and tools that ensure continued access to services and software necessary
to produce the context within which information can be accessed, properly rendered,
validated and transformed into context-based knowledge.
• Three testbeds
– engineering services and systems
for digital preservation
– civil engineering infrastructures
– e-science and mathematical simulations
13. Motivation
• ARCOMEM is unique in dealing with publicly available and non-regulated
data and in harnessing the "wisdom of crowds" to help decide what to
preserve.
• TIMBUS focuses on the environments that produce the data rather than
the data itself.
• ENSURE and TIMBUS are motivated in part by the need for accurate risk
assessment and by preservation lifecycle issues related to regulations.
• ENSURE, SCAPE and TIMBUS address the scalability of technology and
software infrastructure for digital preservation.
• Targeted Stakeholders:
– scientific data (SCAPE, ENSURE, TIMBUS)
– memory institutions (SCAPE, ARCOMEM)
– web (SCAPE, ARCOMEM)
– engineering (TIMBUS)
– health care (ENSURE)
– finance (ENSURE)
14. Objectives
• ENSURE, SCAPE, and TIMBUS are focused on organisations (the
organisation-focused projects); ARCOMEM is focused on the web
• All projects address the question "what is to be preserved"
– ARCOMEM: social media can tell us
– ENSURE: extract this information from business rules
– SCAPE and TIMBUS: provide tools for responsible persons (curators)
– TIMBUS driven by risk management, ENSURE by cost/benefit
• ARCOMEM, ENSURE and SCAPE focus on issues of scalability
– ARCOMEM, SCAPE: computational
– ENSURE: storage infrastructure
• The organisation-focused projects also consider
– the automation of the preservation lifecycle
– the automation of quality assurance for preservation actions
• Both ENSURE and TIMBUS have the goal of re-running software after long
periods of time
15. Approach
• All four projects will produce prototype software frameworks
– The organisation-focused projects all propose to implement platforms for the execution of
preservation workflows
• SCAPE and ENSURE will make use of service-oriented architectures
– SCAPE for prototyping only; SOA model workflows should be translated into Map/Reduce jobs
• Digital Lifecycle approach
– TIMBUS focuses on the legal and IPR aspects
– ENSURE focuses on the trade-offs between quality, cost and economic performance
• Preservation planning plays a role in all projects
– ENSURE plans a configuration layer with special emphasis on cost versus value
– The TIMBUS approach is based on dependency and risk management
– Both ARCOMEM and SCAPE rely on the internet to guide preservation
• ARCOMEM through the monitoring of social media
• SCAPE through the monitoring of web harvests
• Virtualisation plays a role in all organisation-focused projects
– ENSURE: as a means to access digital objects
– SCAPE: as a means to deploy complex preservation action environments
– TIMBUS: as a means to preserve and recover the entire business process
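The idea of translating a workflow step into Map/Reduce jobs can be illustrated with a toy example. The following is a minimal sketch in plain Python and is not SCAPE code: the function names (map_file, reduce_counts, format_profile) are hypothetical, and the file extension stands in for real format identification, which is an assumption made only to keep the example short.

```python
from collections import Counter
from functools import reduce
from pathlib import PurePath


def map_file(path: str) -> Counter:
    """Map phase: emit a count of 1 for the format of a single file.

    The format is approximated by the lower-cased file extension
    (an assumption for illustration only).
    """
    ext = PurePath(path).suffix.lower() or "(none)"
    return Counter({ext: 1})


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce phase: merge two partial format counts."""
    return left + right


def format_profile(paths):
    """Run the map phase over all paths, then fold the partial results."""
    return reduce(reduce_counts, map(map_file, paths), Counter())


if __name__ == "__main__":
    profile = format_profile(["a.pdf", "b.PDF", "c.tiff", "README"])
    for fmt, count in sorted(profile.items()):
        print(fmt, count)
```

Because each map call touches only one file and the reduce step is associative, the same two functions could in principle be distributed over many workers, which is the property that makes this style attractive for large collections.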
17. Trends in Digital Preservation Projects
[Figure: timeline 2006 to 2012 of technology trends in digital preservation projects. Tracks: CONTENT-DRIVEN, SEMANTIC WEB, WORKFLOW, WEB SERVICES, GRID, CLOUD. Labels: Semantic Web Services, Agents, PANIC, Emulation, Virtualization, Workflow, Linked Open Data, SOA: Web Services, Security and Trust, Quality Assurance, Distributed Storage, Distributed Processing.]
18. Thank you for your attention!
Ross King – AIT, Vienna
Orit Edelstein – IBM Research, Haifa
Michael Factor – IBM Research, Haifa
Thomas Risse – L3S Research Center, Hannover
Eliot Salant – IBM Research, Haifa
Philip Taylor – SAP Research, Belfast
ARCOMEM: www.arcomem.eu
ENSURE: ensure-fp7.eu
SCAPE: www.scape-project.eu
TIMBUS: timbusproject.net