A report on the open source initiative to write a version 4 of the Fedora Commons repository platform, delivered at the Open Repositories 2013 conference in Prince Edward Island.
1. Repository Redux:
The Past, Present & Future of Fedora
Open Repositories 2013
Charlottetown, Prince Edward Island
Thursday, 11 July 2013
Tom Cramer
@tcramer
2. A (Funding) History of Fedora
• Original software created 2000–08 with
$2.4M from the Mellon Foundation
➪ +$500,000 from UVA Library
• Moore Grant: 2007-11, $4.9M
• Committers from 10 institutions
• 2009–present: DuraSpace’s
sponsorship program has provided
funding for a tech lead
3. The Success of Fedora
• Architecture is demonstrably
Flexible & Extensible
• Support for Durability
• One foot in the linked data world
• A decade of maturity & proven use
• Substantial community of adopters,
contributors, vendors
5. New Opportunities
• Front-ends: eSciDoc, Hydra,
Islandora
– Attracting new energy and adopters
– Creating new technical demands
• Evolved technical environment
– Web architectures & horizontal scaling
– Linked data
• Data management mandates
6. Fedora Futures Takes Shape
• OR12: Ad hoc meeting
• September ’12:
– Meet to compare needs and notes
– Charter a 3 month investigation
• December ’12:
– Commit to a three year project
– Announce, Invite and Launch
7. Fedora Futures Objectives
• Preserve the strengths of the
architecture and community
• Address the needs for robust and full-
featured repository services (that we
now understand very well)
• Provide a platform in the repository
ecosystem for the next 5-10 years
8. Technical Requirements
• Highly scalable
• High availability
• Higher performance
• Flexible storage
• Robust auditing, reporting and metrics
• Enhanced fixity and versioning
9. Requirements, continued
• Work for small, medium and large
institutions
– Easy to deploy, administer
• Support breadth of needs
– Traditional IR
– Heterogeneous content (e.g., media)
– Emerging data management needs
• Interoperate with other systems
– Lean core, modular, APIs
10. Organizational Requirements
• Revitalized corps of developers
• Robust community investment and
governance
• Bigger community base
– Geographic
– Commercial & Non-Profit
– Additional domains
11. Fedora Futures Delivers
• January – February ’13
– Evaluate platforms for Alpha
• March – June ’13
– Alpha v1 development
• July ’13
– Alpha v1 released!
• 2nd half of ’13: Beta development
12. Fedora 4 Alpha 1 Highlights
• Roughly 80% of Fedora 3.x functionality
– in 7% of the lines of code
– with 72% test coverage (vs. 10% for Fedora 3.x)
• Clustering
• Batch operations
• Transaction support
• Policy-driven & projected storage
• Self-healing
• One step install…
15. Next Steps
1. Donate money
http://duraspace.org/sponsors
2. Add a developer
contact awoods@duraspace.org
3. Join the email list
ff-tech@googlegroups.com
4. Install the alpha
github.com/futures/fcrepo4/
5. Chime in!
Give use cases, feedback
Editor's Notes
• Original software created 2000–2008
– $2.4 million from the Mellon Foundation
– $500,000+ from UVA Library
• Codebase management and improvement has been continued by the committers group
– Acuity Unlimited, Columbia, Cornell, DTU, FIZ Karlsruhe, the Denmark State Library, MediaShelf, UVa, U. of Wisconsin
– Plus 3 independent software developers
• DuraSpace's sponsorship program has provided funding for the technical lead
• V1 = 2003, V2 = 2005, V3 = 2008
• 3 full versions over 12 years
• Hundreds of adopters worldwide
• Its own 501(c)3! With dozens of institutional sponsors
• Its own ecosystem (vendors, other systems)
• A shared international annual conference (OR)
• Demonstrated success as a Flexible, Extensible digital repository architecture
Chinese sign for crisis = danger + opportunity
Grass roots across bottom Duraspace relative to F and FF
One of the very early decision points for Fedora Futures was whether to a) iterate on the existing Fedora 3.x codebase, b) pursue a burn-it-all-down and build-it-anew greenfield project, or c) build on top of or extend an existing platform.

For reasons of efficiency and risk management, we elected to build atop an existing platform. We devoted two sprints to candidate evaluation, which also included developing a test harness to help us measure performance. By the end of Sprint 2, we had selected ModeShape, which is a JCR implementation from JBoss.

By building on top of ModeShape:
• Rather than rebuilding from scratch (and therefore also taking on the responsibility of maintaining a full stack), we avoid re-inventing the wheel and re-use some best-of-breed products and technologies, so we can focus our always-limited resources on delivering a best-of-breed preservation repository service.
• We gain a solid foundation for building a highly available, scalable repository service.
• Specifically, Fedora will finally be able to support transactions and clustering, and offer higher performance with more flexible storage options.

As a result, we are on target to deliver (with varying degrees of completeness) on most if not all of the following objectives by mid-year (I’ve starred the objectives where development is already underway):
• High scalability, high availability architecture
– Scale out (horizontally) to meet high volume access or ingest requirements
– Cluster configurations to avoid any single point of failure
• Flexible storage
– Policy-driven storage
– Support storage such as AWS Glacier to meet retention requirements but reduce costs
• Durability
– Where Fedora 3 was preservation-enabling, Fedora 4 will be preservation-enabled
– One of the exciting features of Fedora 4 is really delivering on durability. In the current sprint, we're developing features to enable the repository to be self-healing: that is, detecting fixity failures and automatically restoring from a known-good redundant store
• Reporting and metrics
– More reporting and metrics to help repository managers make informed decisions
• Developer-friendliness
– Modular architecture
– Extensibility for non-Java developers (e.g. Ruby, Python & Scala)
• Ease of deployment
– Allow dev-ops and sysadmins to deploy Fedora easily and consistently across VMs or cloud infrastructure
– Better support for configuration management tools such as Puppet or Chef
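The self-healing idea in these notes (detect a fixity failure, then restore from a known-good redundant store) can be illustrated with a minimal sketch. This is not Fedora 4 code and does not use any Fedora or ModeShape API. The function names `sha256_of` and `check_and_heal`, the choice of SHA-256, and the two-file "primary plus replica" layout are all assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_and_heal(primary: Path, replica: Path, expected: str) -> str:
    """Verify the primary copy against its recorded checksum.

    Hypothetical helper: if the primary fails the fixity check and the
    replica still matches the recorded checksum, overwrite the primary
    with the replica. Returns "ok", "healed", or "unrecoverable".
    """
    if primary.exists() and sha256_of(primary) == expected:
        return "ok"
    # Only restore from a replica that itself passes the fixity check.
    if replica.exists() and sha256_of(replica) == expected:
        shutil.copyfile(replica, primary)
        return "healed"
    return "unrecoverable"
```

Calling `check_and_heal(primary, replica, expected)` on a schedule, with `expected` taken from the checksum recorded at ingest, would give a repository manager one of three outcomes per file. A real service would presumably also log each outcome for audit purposes, which this sketch omits.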
Dev team with 11 developers from 8 institutions = 4 FTEs, managed by Eddie Shin, working for twelve 2-week sprints