The National Center for Biotechnology Information (NCBI) at the National Institutes of Health (NIH) requires high-performance storage to support genome research for its public databases and projects like the 1000 Genomes Project. NCBI's existing storage could not scale to meet the demands of efforts such as the 1000 Genomes Project, which is expected to generate more than 1.5 petabytes of genetic data. NCBI implemented a Panasas storage system connected to an 1800-core Dell cluster, which improved application performance fivefold, enabled faster database updates, and provided scalable storage for NCBI's growing archives. The Panasas solution supports NCBI's mission of advancing biomedical discovery through public databases and genome research.
National Institutes of Health Maximize Computing Resources with Panasas
Customer Success Story
National Institutes of Health
The National Center for Biotechnology Information (NCBI), a division of the
National Library of Medicine (NLM) at the National Institutes of Health (NIH),
serves as a national resource for molecular biology information serving
research groups from around the world. Established in 1988, NCBI develops
new information technologies to aid in the understanding of fundamental
molecular and genetic processes that control health and disease. NCBI
creates public databases, conducts research in computational biology,
develops software tools for analyzing genomic data, and disseminates
biomedical information. Some 450 people—ranging from NCBI researchers
and staff scientists to programmers, curators, and indexers—generate, store,
and access NCBI databases.
SUMMARY
Industry: Life Sciences/Government

THE CHALLENGE
Meet demands of researchers from around the globe accessing the NCBI
public database to conduct genome research. Eliminate I/O bottlenecks and
maximize computing resources for public databases, including an estimated
1.5 PB of genetic information for the 1000 Genomes Project.

THE SOLUTION
PAS Storage system with the PanFS™ parallel file system, 1800-core Dell
PowerEdge Cluster, Cisco 6509 Network Switch

THE RESULT
• 5X application performance improvement
• Timelier database updates with faster time-to-results
• High performance irrespective of access patterns/dataset size
• Affordable scalability for fast-growing archives
• Administrative efficiencies across primary and secondary storage

The Challenge
Researchers at NCBI depend on high-performance compute clusters to run
complex analyses of genotyping and sequencing data. The existing storage
architecture did not effectively scale to support such efforts as the 1000
Genomes Project, an ambitious endeavor to sequence the genomes of at least
1,000 people from around the world. The project, creating the most detailed
and medically useful picture to date of human genetic variation, is expected
to generate more than 1.5 PB of genetic information. NCBI will be required to
archive and provide timely investigator access to as much as 3 TB of new
genome data arriving weekly from each of the six institutes participating in
the 1000 Genomes Project. To accommodate the expected high demand for data
access NCBI requires a storage solution that is reliable, manageable, and
affordable.

The Solution
NCBI selected Panasas Storage for the Center’s Dell PowerEdge compute farm.
The decision was based in part on testing results that indicated the Panasas
Storage solution delivers a significant performance improvement over existing
installed storage. Designed specifically to accelerate the performance of
applications deployed on Linux compute clusters, the Panasas storage cluster
effectively eliminated the research-impacting I/O bottlenecks.

PAS storage now provides scalable performance and capacity to multiple
internal production systems (both Linux- and Windows-based platforms),
including NCBI’s 1800-core Dell PowerEdge cluster that provides computing
resources to some 80 applications used by ten NCBI research groups. Panasas
Storage supports much of the daily computation that generates the data for
such high-visibility services as NCBI’s PubMed resource that brings together
more than 18 million citations from MEDLINE and other life science journals
for biomedical articles.

Most recently, NCBI implemented a PAS Storage system that provides economical
second-tier storage for the high-density data requirements of the 1000
Genomes Project. The PAS solution also provides storage resources to projects
such as the NCBI Short Read Archive (SRA), a central repository for short
read sequencing data, and the dbGaP public repository of genotypes and
phenotypes.
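
To put the ingest figures quoted for the 1000 Genomes Project in perspective (as much as 3 TB weekly from each of six institutes, against a project total of more than 1.5 PB), the arithmetic can be sketched as below. This is only a back-of-the-envelope illustration, not an NCBI or Panasas sizing method. The function names are hypothetical, and two assumptions are not in the source: decimal units (1 PB = 1000 TB) and every institute delivering at the quoted maximum every week.

```python
# Rough ingest estimate from the figures quoted in the case study.
# Assumptions (NOT from the source): decimal units (1 PB = 1000 TB) and
# a constant weekly rate at the quoted per-institute maximum.

TB_PER_INSTITUTE_PER_WEEK = 3   # "as much as 3 TB" weekly, per the text
INSTITUTES = 6                  # participating institutes, per the text
PROJECT_TOTAL_PB = 1.5          # "more than 1.5 PB", per the text
TB_PER_PB = 1000                # assumption: decimal units


def peak_weekly_ingest_tb(per_institute_tb=TB_PER_INSTITUTE_PER_WEEK,
                          institutes=INSTITUTES):
    """Upper bound on new data arriving per week, in TB."""
    return per_institute_tb * institutes


def weeks_to_reach_total(total_pb, weekly_tb, tb_per_pb=TB_PER_PB):
    """Weeks needed to accumulate total_pb at a constant weekly_tb rate."""
    return total_pb * tb_per_pb / weekly_tb


if __name__ == "__main__":
    weekly = peak_weekly_ingest_tb()
    weeks = weeks_to_reach_total(PROJECT_TOTAL_PB, weekly)
    print(f"Peak ingest: {weekly} TB/week")
    print(f"Weeks to reach {PROJECT_TOTAL_PB} PB at peak rate: {weeks:.1f}")
```

Under these assumptions the peak ingest is 18 TB per week, so accumulating 1.5 PB would take well over a year even at the maximum quoted rate. Actual arrival rates are not given in the source and may differ.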
1-888-panasas www.panasas.com