Compute Canada is deploying a new national data cyberinfrastructure to provide robust, highly available large-scale storage across its hosting sites. This includes software defined storage using commodity storage building blocks, object storage software for efficient data replication, and backup capabilities. The infrastructure aims to control costs while meeting performance and capacity needs, avoiding vendor lock-in, and supporting data access, sharing, and integration with research platforms.
1. Presentation to the DDN User Group
November 14, 2016
Compute Canada's National Data Cyberinfrastructure
2. Today’s presentation
1. About Compute Canada
2. Technology Refresh: Challenge 2 Stage 1 (& beyond)
3. Compute Canada’s new National Data Cyberinfrastructure
4. Software defined storage and storage building blocks
5. The role of object storage
6. Visions of data availability, resiliency and usability
3. Abstract
Compute Canada is the national platform for Advanced Research
Computing, serving essentially all academic disciplines with
computational or storage needs “beyond the desktop.” Member
institutions include research universities and institutes, and there are
more than 3,000 active research projects that utilize the national
platform. As a result of a new federal funding program, matched by
provinces and member institutions, an ambitious technology refresh
program is underway. A cornerstone of the updated platform is a new
national data cyberinfrastructure (NDC). The NDC is deploying robust, highly
available, large-scale storage to the hosting sites. Building on
concepts of software defined storage and commodity storage building
blocks, the NDC is delivering backup and nearline services, persistent
filesystem-based storage, and object storage.
5. Technology Refresh: Challenge 2 Stage 1
Milestones tracked for each system: RFP Issued, RFP Closed, Delivered, In Production
National Data Cyberinfrastructure (ongoing delivery): Fall 2016
GP1 - UVIC (Cloud): Fall 2016
GP2 - SFU (General Purpose): Early 2017
GP3 - Waterloo (General Purpose): Spring 2017
LP - UofT (Large Parallel): Late 2017
Federal funding: $30M, total value of $75M with matching and
in-kind. Project time span: 2016-2018.
6. Technology Refresh: Challenge 2 Stage 2
Proposal submitted, outcomes not yet public
System/service type | CFI capital | Notes
Deep storage | $2,500,000 | One additional deep storage site, plus additional capacity for the current two sites.
Experimental systems | $750,000 | Small experimental systems at some Stage 2 sites; modest investment in commercial cloud.
Services infrastructure | $250,000 | 1 FTE for 2 years, plus small purchases of existing software and/or services.
Elastic secure cloud (ESC) | $750,000 | One standalone ESC site.
Expand LP | - | No expansion of LP.
GPx | $15,750,000 | Expansion of one or more GPx systems, and addition of one or more new GPx systems. All GPx systems will have ESC partitions.
TOTAL | $20,000,000 |
Details and descriptions of system/service types are in the “Cyberinfrastructure Initiative
Challenge 2” proposal, online at: https://www.computecanada.ca/publications/
7. Major Elements to Date of Compute Canada’s new
National Data Cyberinfrastructure
1. Storage Building Blocks (SBBs). Commodity storage systems that are
flexible, configurable, and will evolve over time as technology improves.
a. Provider: Scalar Decisions, Inc. (Toronto).
b. Technologies: SBB systems from Dell and Seagate.
c. Configurations to be provided: Multiple performance tiers & capacities.
2. Object Storage Software. Automated, efficient data replication across
the wide-area network, S3-compatible interface to data objects, and
POSIX-style access to object storage.
a. Provider: DDN Storage
b. Technologies: WOS
c. Configurations to be provided: Software at Stage 1 sites & beyond
3. Backup capabilities. To provide cost-efficient bulk storage of data
copies, including archives and nearline storage.
a. Provider: IBM Canada
b. Technologies: Spectrum Protect software; TS3500 tape silos and
LTO-7 tapes and drives; supporting infrastructure systems
c. Configurations to be provided: Multi-site redundant backups to SFU &
uWaterloo; other configurations and uses as needed.
RFP evaluation criteria focused on total cost of ownership, for desired
capabilities and capacities.
8. Software Defined Storage and Storage Building Blocks
Software Defined Storage (SDS): Compute Canada anticipates that storage features will
increasingly be presented by the software layer, irrespective of hardware. This will often
occur with flexible, interchangeable, and vendor-agnostic underlying hardware layers.
Key features of software defined storage include:
● Incorporation of different performance layers;
● Multiple access points and/or modalities to the same data items;
● Ease of expansion.
Storage building blocks (SBBs): Compute Canada is focused on cost-effective
technology deployment and growth. Total cost of ownership (TCO) calculations for
solutions are intended to include capital costs, operating costs, and all aspects of
support. Storage building blocks help to control TCO by:
● Obtaining the needed level of performance and other features;
● Controlling costs, by emphasizing commodity-based solutions;
● Expanding capacity as needed, to take advantage of price/performance
improvements over time;
● Avoiding proprietary solutions and vendor lock-in.
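The TCO reasoning above can be illustrated with a small calculation. The following is a minimal sketch, not Compute Canada's actual TCO model: the function names, prices, price-decline rate, and capacity figures are all hypothetical, and operating and support costs are folded into a single annual per-terabyte figure. It also assumes demand grows gradually, so that buying capacity in stages is a realistic option.

```python
def tco_upfront(capacity_tb, price_per_tb, annual_cost_per_tb, years):
    """Buy all capacity in year 0 and operate it for the whole period."""
    capital = capacity_tb * price_per_tb
    operating = capacity_tb * annual_cost_per_tb * years
    return capital + operating


def tco_incremental(capacity_tb, price_per_tb, annual_cost_per_tb, years,
                    annual_price_decline):
    """Buy an equal share of capacity each year at that year's price.

    The unit price is assumed (hypothetically) to fall by a fixed fraction
    every year, and only the capacity already installed incurs annual costs.
    """
    share = capacity_tb / years
    installed = 0.0
    total = 0.0
    for year in range(years):
        unit_price = price_per_tb * (1 - annual_price_decline) ** year
        total += share * unit_price               # capital for this year's share
        installed += share
        total += installed * annual_cost_per_tb   # operating + support
    return total


# Hypothetical inputs: 3,000 TB over 3 years, $100/TB, $10/TB/year to run,
# and a 20% yearly drop in purchase price.
upfront = tco_upfront(3000, 100, 10, 3)
staged = tco_incremental(3000, 100, 10, 3, 0.20)
print(f"upfront: ${upfront:,.0f}  staged: ${staged:,.0f}")
```

With these made-up inputs the staged purchase comes out lower, because later shares are bought at lower prices and are not operated before they are installed. Real comparisons would also need to account for procurement overhead and for whether the full capacity is needed from day one.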
9. The Role of Object Storage
Compute Canada engages in continuous assessment of current and future needs of the
user community (see https://www.computecanada.ca/research-portal/sparc2/ for the
early 2016 activities). Indications are that object storage will address several key
current and future needs:
● Modernizing and modularizing research platforms and portals, by providing an
object storage interface to data;
● Providing easy and cost-effective replication of data, including replication over the
wide-area network;
● Adding a compatibility layer for users seeking to employ commercial cloud
services;
● Adding an interoperability layer for data access via POSIX or S3;
● Enabling diverse metadata;
● Providing access control mechanisms, including public sharing of data.
The Storage RFP included solicitation of bids for object storage software, which was to
be software defined storage capable of running on commodity storage building blocks.
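To make the idea of reaching the same data item through either a POSIX path or an S3 bucket and key more concrete, here is a minimal sketch of one possible naming convention. It is purely illustrative: the mount point, the rule that the first directory under it names the bucket, and both function names are assumptions made for this example, and they do not describe how any particular object storage product or POSIX bridge actually maps names.

```python
from pathlib import PurePosixPath

# Hypothetical mount point under which object storage appears as a filesystem.
MOUNT_POINT = PurePosixPath("/project")


def posix_to_s3(path):
    """Map a POSIX path under MOUNT_POINT to a (bucket, key) pair.

    Assumed convention: the first directory below the mount point is the
    bucket name, and the rest of the path is the object key.
    Raises ValueError if the path is outside the mount point or names
    only a bucket.
    """
    rel = PurePosixPath(path).relative_to(MOUNT_POINT)
    bucket, *key_parts = rel.parts
    if not key_parts:
        raise ValueError("path names a bucket, not an object")
    return bucket, "/".join(key_parts)


def s3_to_posix(bucket, key):
    """Inverse mapping: build the POSIX path for a bucket and key."""
    return str(MOUNT_POINT / bucket / key)


bucket, key = posix_to_s3("/project/genomics/run42/reads.fastq")
print(bucket, key)
print(s3_to_posix(bucket, key))
```

Under this assumed convention, a portal could address an object by bucket and key while a job on a compute node opens the same data by path.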
10. WOS Status, Hopes and Plans
Access via S3 or POSIX bridge (Lustre or GPFS).
11. Visions of Data Availability, Resiliency and Usability
Step 1: Science DMZ + Persistent storage (object, filesystem, backups)
Step 2 (2017): Integrate with HPC systems
Step 3 (2017-2018): Integrate with research data
management systems, research platforms and portals
13. Your Presenter
Dr. Greg Newby is Chief Technology Officer of Compute Canada.
He has a passion for enabling diverse scientific, social and
educational opportunities. He has devoted his professional career
to advanced research computing. Born in Montreal, Dr. Newby
received his doctorate in Information Transfer from Syracuse
University and most recently completed an M.B.A. in Sustainable
Systems from the Bainbridge Graduate Institute of Presidio. Dr.
Newby also obtained a Master's in Communications from
University at Albany, State University of New York.
Author of several books and numerous publications, Dr. Newby
was a faculty member at two major US universities where he
developed and taught courses in information systems, information
security, and computer technology. His most recent roles include
Manager of the Supercomputing Core Laboratory at King Abdullah
University of Science and Technology in Saudi Arabia. Dr. Newby
was Director of the Arctic Region Supercomputing Center at the
University of Alaska Fairbanks, where he also served as a faculty
member for 11 years.
14. Questions, Discussion and Closing Thoughts
Visit Compute Canada at SC16 booth #4430.
Find Compute Canada online at www.computecanada.ca
Twitter: @ComputeCanada
Email: gbnewby@computecanada.ca