1. CineGrid Exchange: Building a Global Networked Testbed for Distributed Media Management and Preservation
Shaofeng Liu, UCSD/CSE
Jurgen Schulze, UCSD/Calit2
Laurin Herr, Pacific Interface
CENIC, March 9, 2010
2. What is CineGrid?
Formed 2004 – non-profit international membership organization
Mission – to build an interdisciplinary community
Focus – the research, development & demonstration of networked collaborative tools
To enable – the production, use, preservation & exchange of very high-quality digital media over high-speed photonic networks
Members – media arts schools, research universities, scientific labs, post-production facilities, hardware and software developers & network operators from around the world (~50 members worldwide)
Connected via 1 Gigabit & 10 Gigabit Ethernet networks
For – research & education
3. “Learning by Doing” – Early CineGrid Projects
CineGrid @ iGrid 2005
CineGrid @ AES 2006
CineGrid @ GLIF 2007
CineGrid @ Holland Festival 2007
4. CineGrid Exchange: Motivations
TERABYTES PILING UP. Need to store & distribute CineGrid’s own collection of digital media assets. Members access materials for experiments and demonstrations.
THE DIGITAL DILEMMA. Report published by AMPAS – lessons learned by NDIIPP and NARA, plus pioneering research at Stanford (LOCKSS) & UCSD (SRB and iRODS).
Create a global-scale testbed = high-quality media assets + distributed storage + fast networks. Explore strategic issues in digital media storage, access, distribution and preservation.
5. CineGrid Exchange 2009: Build on the Basic Concept
Receive “seed” funding from AMPAS to start building a multi-layer, open-source asset management & user access framework for a distributed digital media repository
Establish Working Group: AMPAS, PII, Ryerson, UCSD, UIC, Naval Postgraduate School, UW, UvA, Keio, NTT, CESNET
Write CineGrid Exchange functional requirements document
Expand CineGrid Exchange storage capacity
6. CineGrid Exchange 2009: Phase 1 Development Goals
Define basic workflows for ingest, distribution & deletion (see the ingest sketch below)
Define CineGrid Exchange metadata schema
Implement basic workflows using iRODS rule-based microservices to automate the management of CineGrid Exchange assets
Deploy iRODS to all CineGrid Exchange nodes
Enable transfers of large amounts of data between nodes
Prepare for integration of the CollectiveAccess cataloging application atop the iRODS-managed distributed repository
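The deck describes these workflows as iRODS rule-based microservices; as an illustration only, here is a minimal Python sketch of the ingest-and-replicate step using the python-irodsclient package, which is an assumption rather than the project's actual tooling. The host, zone, resource, and path names are hypothetical placeholders.

```python
# Hedged sketch of the ingest workflow: put a new asset into a dropbox
# collection, then replicate it onto other CineGrid Exchange resources.
# All names below (host, zone, resources, paths) are hypothetical.
from irods.session import iRODSSession

DROPBOX = "/cinegridZone/home/exchange/dropbox"        # hypothetical collection
REPLICA_RESOURCES = ["sandiegoResc", "amsterdamResc"]  # hypothetical resources

def ingest(session, local_path, name):
    """Put the asset into iRODS, then ask iRODS to replicate it."""
    logical_path = f"{DROPBOX}/{name}"
    session.data_objects.put(local_path, logical_path)
    for resc in REPLICA_RESOURCES:
        # Each replicate() call copies the object onto another storage
        # resource; the iCAT catalog records every replica's location.
        session.data_objects.replicate(logical_path, resource=resc)

with iRODSSession(host="icat.cinegrid.example.org", port=1247,
                  user="exchange", password="changeme",
                  zone="cinegridZone") as session:
    ingest(session, "/data/test_clip_4k.dpx", "test_clip_4k.dpx")
```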
11. CineGrid Exchange Middleware: iRODS (integrated Rule-Oriented Data System)
Programmable microservices to implement management policies for automated operations across distributed storage repositories
Transparent management of files in a distributed repository
Centralized catalog (iCAT) – see the query sketch below
File transfer using parallel TCP & UDP
Suitable for experimenting with digital media preservation strategies
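To make the iCAT feature concrete, here is a hedged python-irodsclient sketch (again an assumed client library, with hypothetical names) that lists every replica of an asset together with the resource that holds it and its recorded checksum.

```python
# Hedged sketch: query the centralized iCAT catalog for all replicas of an
# asset. Host, zone, and file names are hypothetical placeholders.
from irods.session import iRODSSession
from irods.models import Collection, DataObject

with iRODSSession(host="icat.cinegrid.example.org", port=1247,
                  user="exchange", password="changeme",
                  zone="cinegridZone") as session:
    query = (session.query(Collection.name, DataObject.name,
                           DataObject.resource_name, DataObject.checksum)
                    .filter(DataObject.name == "test_clip_4k.dpx"))
    for row in query:
        # One row per replica: where it lives and the checksum on record.
        print(row[Collection.name], row[DataObject.name],
              row[DataObject.resource_name], row[DataObject.checksum])
```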
19. Register files into the iRODS database
For all copies of each file, set the right access permissions (see the sketch below)
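As an illustration of this registration-and-permissions step, here is a minimal python-irodsclient sketch; the physical path, logical path, and group name are hypothetical, and the library itself is an assumption rather than the deck's stated tooling.

```python
# Hedged sketch: register an existing file on a storage resource into the
# iCAT, then grant a member group read access. All names are hypothetical.
from irods.session import iRODSSession
from irods.access import iRODSAccess

ASSET = "/cinegridZone/exchange/collections/test_clip_4k.dpx"

with iRODSSession(host="icat.cinegrid.example.org", port=1247,
                  user="exchange", password="changeme",
                  zone="cinegridZone") as session:
    # Register the file already sitting in a vault path (no data movement).
    session.data_objects.register("/vault/test_clip_4k.dpx", ASSET)
    # "read" lets members fetch the asset; ownership stays with the
    # exchange account.
    session.permissions.set(iRODSAccess("read", ASSET,
                                        "cinegrid_members", "cinegridZone"))
```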
22. CineGrid Exchange testing suggests checksums may become a bottleneck
24. Acknowledgements
Academy of Motion Picture Arts and Sciences, Science and Technology Council
Naval Postgraduate School, MOVES Institute
UCSD/Calit2
UIC/EVL
University of Washington, Research Channel
Ryerson University
CESNET
NTT Network Innovation Laboratory
Keio University/DMC
University of Amsterdam
Pacific Interface
DICE Team at UNC & SDSC
26. CineGrid Exchange Development: Multi-layer Open Software Stack
User Interface & Access – CineGrid Exchange Access Portal (for CineGrid, by CineGrid)
Open Source Application for Collections Management & On-Line Access – CollectiveAccess (extended by Whirl-i-Gig for AMPAS)
Open Source Digital Repository Interface – iRODS by DICE (for CineGrid, by Calit2)
Open Source Middleware for Rule-based Management of Distributed Digital Repository Resources & Assets
Testbed Infrastructure of Distributed Storage and Network Links – Resource Description Framework (for GLIF and CineGrid, by UvA)
27. CineGrid Exchange Distribution Workflow
Upon request for content, the CX metadata management interface (CollectiveAccess) is searched to identify assets for distribution (pending integration)
CollectiveAccess queries the iRODS resource management module to confirm availability of the asset within the CX distributed repository
The iRODS module identifies the best location from which to fetch the asset, sets parameters, and starts the transfer using either TCP or UDP
Integrity of all fetched files is confirmed by checksum comparison (see the sketch below)
Distribution permissions associated with the asset are checked
Member access authorization is confirmed
Assets are distributed via network or HDD
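A minimal sketch of the checksum comparison step, using only the Python standard library; verify_fetch() and the choice of SHA-256 are illustrative assumptions, not the deck's specified algorithm.

```python
# Hedged sketch of post-fetch integrity checking: recompute the file's
# checksum locally and compare it with the value recorded in the catalog.
import hashlib

def sha256_of(path, chunk_size=8 * 1024 * 1024):
    """Stream the file in 8 MB chunks so very large media files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_fetch(local_path, expected_checksum):
    """Raise if the fetched copy does not match the catalog's checksum."""
    actual = sha256_of(local_path)
    if actual != expected_checksum:
        raise IOError(f"checksum mismatch for {local_path}: "
                      f"expected {expected_checksum}, got {actual}")
```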
Editor’s Notes
These are the goals of the first-phase CX design and development. The primary goal is to integrate all the distributed storage provided by CineGrid members and provide a transparent way to access it; to define the right way to manage and distribute digital content; and to design the metadata management scheme.
In 2008, CineGrid Exchange started from three major sites in San Diego, Tokyo, and Amsterdam. These three sites are connected by 10 Gbps networks.
In 2009, we expanded CX to include seven major sites worldwide: San Diego, Los Angeles, Chicago, Tokyo, Toronto, Prague, and Amsterdam. The connections between them are still 10 Gbps networks.
This table lists the current storage provided by those sites. About 250 TB are currently managed by CineGrid Exchange.
We designed CX as a three-level hierarchical structure. The lowest level is the physical distributed content repositories, connected by high-speed optical networks. The middleware manages the resources and provides support for applications. iRODS is the major middleware in CX; it is data grid software that manages distributed storage, and it is programmable through its rule system. CollectiveAccess is metadata management software that we are working on. Our partners at AMPAS and NPS are working on binding iRODS and CollectiveAccess together.
Here are some features of iRODS. It is programmable, and it is transparent. It uses a centralized catalog to maintain file locations, checksums, and some metadata. We integrated UDP into iRODS, so it now supports very fast file transfer within CX. It forms the base of the testbed we want to build in CX. I have a short demo of how this works.
We have defined a few workflows in CX to start with. Here is an example: I will give a small demo of the ingest workflow. I will put a file into the dropbox, and that file will be automatically replicated to two other storage systems and then moved to a central point. This is only our first-phase implementation, so if you are not surprised by the demo, I am not surprised either.
A distributed testbed like CX requires a good network and good protocols. We integrated both TCP and UDP into CX, and performance is satisfactory. This figure shows the file transfer speed with TCP and UDP: the x-axis is the number of flows, and the y-axis is the overall transfer speed. With 3 UDP streams, we can transfer files at 350 MB/s from Japan to the US; with TCP, we can achieve 250 MB/s with around 15 streams.
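The following is an illustrative Python sketch of the parallel-stream idea, not the CX implementation: each thread sends one stripe of the file over its own TCP connection, so aggregate throughput is not capped by a single connection's window. The receiver side, host, and port numbers are hypothetical.

```python
# Hedged sketch of striped parallel TCP transfer. A hypothetical receiver
# listens on base_port..base_port+n_streams-1 and reassembles by offset.
import os
import socket
import threading

def send_stripe(host, port, path, offset, length):
    """Send bytes [offset, offset + length) over one TCP connection."""
    with socket.create_connection((host, port)) as sock, open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            chunk = f.read(min(4 * 1024 * 1024, remaining))
            if not chunk:
                break
            sock.sendall(chunk)
            remaining -= len(chunk)

def parallel_send(host, base_port, path, n_streams=15):
    """Split the file into n_streams stripes and send them concurrently."""
    size = os.path.getsize(path)
    stripe = (size + n_streams - 1) // n_streams
    threads = []
    for i in range(n_streams):
        offset = i * stripe
        length = min(stripe, size - offset)
        if length <= 0:
            break
        t = threading.Thread(target=send_stripe,
                             args=(host, base_port + i, path, offset, length))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
```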
The performance of ingest and distribution is subject to other factors, like checksums: we found that when checksums are computed during the transfer, the overall speed drops significantly. We are investigating ways to solve this problem, for example delayed checksums, pipelined checksums, or FEC coding. We expect this to be solved.
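As a sketch of the pipelined-checksum idea (an assumption about how it might look, not the solution the team shipped), the snippet below hashes each chunk in a background thread while the main thread keeps reading and sending, so hashing overlaps the transfer instead of adding a second pass over the file. Because hashlib releases the GIL on large buffers, the overlap is real in CPython.

```python
# Hedged sketch of a pipelined checksum: hash chunks concurrently with the
# transfer. send() is a hypothetical callback that transmits one chunk.
import hashlib
import queue
import threading

def pipelined_checksum(path, send, chunk_size=8 * 1024 * 1024):
    """Read the file once; send() each chunk while a worker thread hashes it."""
    chunks = queue.Queue(maxsize=4)   # small buffer bounds memory use
    digest = hashlib.sha256()

    def hasher():
        while True:
            chunk = chunks.get()
            if chunk is None:         # sentinel: no more data
                return
            digest.update(chunk)

    worker = threading.Thread(target=hasher)
    worker.start()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            chunks.put(chunk)         # hand the chunk to the hasher...
            send(chunk)               # ...and transmit it at the same time
    chunks.put(None)
    worker.join()
    return digest.hexdigest()
```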