A view of the directions storage is taking in science & technology, from Ryan Sayre, technical strategist in the office of the CTO for EMC Isilon, using examples from recent work in life science genomics and other industries that take advantage of the combination of extreme computing (HPC) and big data. As presented at the Bull-sponsored Science & Innovation 2013 conference in Westminster.
High Performance Computing has influenced and changed the way we manage our scientific endeavours in the UK and beyond. The evolution of how we use scale-out compute infrastructure also affects the way we store data. Traditional islands of data storage used in previous eras cannot scale to solve the current challenges of bioinformatics, complex scientific simulations, and technical innovation. Scaling out is the only way to manage the size of the problems being solved today. UK case studies in research and technology, and related opportunities, will be discussed.
Note to Presenter: View in Slide Show mode for animation. We hear a lot about Big Data, but sometimes the definition isn’t clear. Here is a useful definition of Big Data from Wikipedia: Big Data is data that challenges the capabilities of a system to capture, manage, and process it within a tolerable elapsed time. In the context of today’s presentation, the two key attributes that we’ll be discussing are the volume of the data and the composition of the data. In terms of “volume,” we’ll focus on the multi-terabyte to multi-petabyte range. And for “composition,” we’ll focus primarily on unstructured, file-based data. In this context, Big Data includes audio, video, graphics, images, and enterprise file data sets such as office files, home directories, VMDKs, and large-scale file archives. Isilon supports all kinds of unstructured and file-based data.
Source for the 88 million outpatients figure: http://www.nhsconfed.org/priorities/political-engagement/Pages/NHS-statistics.aspx. If a quarter of the patients opted for genomic analysis due to a possible genetic factor in their health, this would amount to over an exabyte of storage. To put that into perspective, that’s about 10 days’ worth of the data processed across all of the servers at Google, extrapolating for data growth from http://techcrunch.com/2008/01/09/google-processing-20000-terabytes-a-day-and-growing/.
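The estimate above can be checked with a little arithmetic. The sketch below takes the two figures from these notes (88 million outpatients, one quarter opting for analysis) and works backwards to the per-patient data volume that a one-exabyte total would imply. The per-patient figure is not stated in the notes; it is derived here, and the use of decimal units (1 EB = 10^18 bytes) is an assumption.

```python
# Back-of-envelope check of the exabyte estimate.
# Figures from the notes: 88 million outpatients, one quarter opting in.
# Assumption: decimal units (1 EB = 10**18 bytes, 1 GB = 10**9 bytes).

OUTPATIENTS = 88_000_000
OPT_IN_FRACTION = 0.25
EXABYTE = 10**18
GIGABYTE = 10**9


def patients_sequenced(outpatients: int, fraction: float) -> int:
    """Number of patients who opt for genomic analysis."""
    return round(outpatients * fraction)


def implied_gb_per_patient(total_bytes: int, patients: int) -> float:
    """Per-patient data volume (GB) implied by a given total."""
    return total_bytes / patients / GIGABYTE


patients = patients_sequenced(OUTPATIENTS, OPT_IN_FRACTION)
per_patient = implied_gb_per_patient(EXABYTE, patients)
print(f"{patients:,} patients -> about {per_patient:.1f} GB each for 1 EB in total")
```

In other words, "over an exabyte" corresponds to roughly 45 GB or more of stored data per sequenced patient, under the assumptions stated above.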
Here’s an example of one of these next-generation sequencing machines. It’s beautiful, and it can output a lot of useful data that scientists can sift through to discover meaning.
These are prime examples of data-intensive industries where Isilon storage systems have been proven to deliver significant customer benefits: medical imaging, gene sequencing, seismic exploration in the oil and gas industry, video and graphics (media and entertainment), satellite images, and product development. Companies in these industries have been the leading edge because large-scale files and unstructured data (Big Data) have caused these firms to adopt innovative storage approaches and embrace Isilon.
Legacy scale-up file systems and volume sizes are inadequate. This leads to multiple file systems and hundreds of volumes, which increases management overhead, lowers capacity efficiency, and adds complexity.
Here we see the dilemma of scale-out and scale-up in graphic form. Scalability: scale-up achieves scalability through capacity growth only, with limited performance options. In contrast, scale-out provides both performance and capacity scalability. Performance: with scale-up, we see a true degradation of performance and capacity at scale. In contrast, scale-out has true linear predictability.
Isilon scale-out NAS is an ideal storage platform for consolidation of your application data. Note to Presenter: Click now in Slide Show mode for animation. We’ll go into these capabilities in more detail later, but here is a summary of a number of important innovations from Isilon. Isilon storage is easy to scale and can support over 20PB of data in a single Isilon cluster. Unlike traditional storage alternatives, Isilon storage performance increases linearly with growth in storage capacity. Isilon storage is highly efficient; you can achieve over 80 percent storage utilization with Isilon’s scale-out NAS solutions. Isilon’s storage systems are highly resilient and can maintain 100 percent data availability, even with multiple component failures (including disk drives or entire nodes). Isilon provides a comprehensive portfolio of data protection and management software to help you get the full value of your Isilon storage systems. And with Isilon, you never need to migrate data again.
Note to Presenter: Click now in Slide Show mode for animation. This slide shows how Isilon SmartPools software can help you optimize storage resources with automated tiering. SmartPools is integrated with the Isilon OneFS operating system to allow a single point of management, with a single scalable file system that offers multiple tiers of performance, depending on the data. The automated, policy-based data movement is transparent to the users, and there are no application changes required.
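To make the idea of policy-based tiering concrete, here is a minimal sketch of an age-based policy. It is illustrative only and is not SmartPools or its policy syntax: the tier names, the 30-day and 180-day thresholds, the file paths, and the helper names (`FileInfo`, `choose_tier`, `plan_moves`) are all hypothetical.

```python
# Illustrative only: a toy age-based tiering policy, not SmartPools itself.
# Tier names, thresholds, and paths are hypothetical.
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    days_since_access: int


def choose_tier(f: FileInfo, hot_days: int = 30, warm_days: int = 180) -> str:
    """Pick a tier from how recently the file was accessed."""
    if f.days_since_access <= hot_days:
        return "performance"
    if f.days_since_access <= warm_days:
        return "capacity"
    return "archive"


def plan_moves(files, current_tiers):
    """Return (path, from_tier, to_tier) for files whose tier should change.

    Paths never change; only the tier behind each path does, which is why
    such movement can be transparent to users and applications.
    """
    moves = []
    for f in files:
        target = choose_tier(f)
        if current_tiers.get(f.path) != target:
            moves.append((f.path, current_tiers.get(f.path), target))
    return moves


files = [
    FileInfo("/ifs/seq/run42.bam", 2),
    FileInfo("/ifs/seq/run07.bam", 400),
]
current = {
    "/ifs/seq/run42.bam": "performance",
    "/ifs/seq/run07.bam": "performance",
}
for move in plan_moves(files, current):
    print(move)
```

Only the file that has not been touched for 400 days is scheduled to move; the recently used file stays where it is, and both keep the same path.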
Note to Presenter: View in Slide Show mode for animation. Isilon storage systems are highly resilient and provide unmatched data protection and availability. Isilon uses the proven Reed-Solomon erasure coding algorithm rather than RAID to provide a level of data protection that goes far beyond traditional storage systems. Here is an example of the flexibility and types of data protection that are standard in an Isilon cluster. With N+1 protection, data is 100 percent available even if a single drive or node fails. This is similar to RAID 5 in conventional storage. Note to Presenter: Click now in Slide Show mode for animation. N+2 protection allows two components to fail within the system, similar to RAID 6. With N+3 or N+4 protection, three or four components can fail, keeping the data 100 percent available. Isilon FlexProtect is the foundation for data resiliency and availability in Isilon storage solutions. Legacy “scale-up” systems are still dependent on traditional data protection. They typically use traditional RAID, which consumes 30 to 50 percent of the available disk capacity. The time to rebuild a RAID group after a drive failure continues to increase with drive capacity, and the data is susceptible to loss from a two-disk failure. Isilon’s industry-leading data protection will provide 100 percent accessibility to data with one-, two-, three-, or four-node failures in a pool. And data protection levels can be established at a file, directory, or file system level, so all data can be treated independently, meeting SLAs based on the application or type of data. And due to the distributed yet symmetric nature of the cluster, all nodes participate in accelerating the restoration of the portions of files from a failed drive. As the cluster grows, the rebuild times become faster and more efficient, making the adoption of larger-capacity drives very simple. With Isilon, a drive replacement can be rebuilt quickly: the larger the storage system, the faster the rebuild.
And in Isilon solutions, drives are hot pluggable and hot swappable with no downtime.
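The trade-off between protection level and usable capacity in an N+M scheme can be sketched with simple arithmetic. With N data units and M protection units per stripe, up to M units can be lost, and N/(N+M) of the raw capacity remains usable. The stripe widths below are hypothetical examples chosen for illustration; the notes do not say what widths an Isilon cluster actually uses.

```python
# Usable-capacity fraction for N+M protection.
# The stripe widths (N values) below are hypothetical examples.

def usable_fraction(data_units: int, protection_units: int) -> float:
    """Fraction of raw capacity left for data in an N+M layout."""
    return data_units / (data_units + protection_units)


def tolerated_failures(protection_units: int) -> int:
    """An N+M layout survives the loss of up to M units."""
    return protection_units


for n, m in [(4, 1), (8, 2), (16, 2), (16, 4)]:
    print(f"{n}+{m}: survives {tolerated_failures(m)} failure(s), "
          f"{usable_fraction(n, m):.0%} usable")
```

The point of the sketch is that wider stripes keep the usable fraction high even at higher protection levels, whereas the traditional RAID overhead quoted above (30 to 50 percent) corresponds to a usable fraction of only 50 to 70 percent.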
With Isilon, you can streamline your storage infrastructure by consolidating large-scale file and unstructured data assets, eliminating silos of storage. Platform REST API: Isilon solutions incorporate a platform REST (representational state transfer) API to provide you and third-party ISVs with a robust control interface to the Isilon OneFS operating system for further automation, orchestration, and provisioning of your Isilon storage cluster. VMware integration: Isilon storage solutions readily integrate with your VMware environment and incorporate VMware VAAI and VASA APIs to simplify storage management in your virtualized IT environment. Multi-protocol support: Isilon scale-out NAS includes integrated support for a wide range of industry-standard protocols, including NFS, SMB, HTTP, FTP, and native Hadoop HDFS, to simplify your business analytics initiatives, simplify and consolidate workflows, increase flexibility, and get more value from your enterprise applications and data. These levels of interoperability help you leverage your large data assets more flexibly with a broad range of applications and workloads, and across a diverse IT infrastructure environment.
Isilon storage systems are extremely easy to use. This “simple to manage” approach translates into significant cost savings for you. A recent IDC white paper details Isilon’s cost advantages for enterprise environments. As shown in the graphic on the left, IDC investigated the relative amount of time needed by IT professionals to perform a wide range of data and storage management functions (listed on the y-axis) for Isilon as well as for traditional storage systems. Isilon storage is easier to manage and requires less time. The study showed that with Isilon scale-out NAS, enterprises were able to increase IT productivity by 48 percent and thereby reduce OpEx (operating expenditures).
The IDC study also found that as a result of Isilon storage systems’ unmatched efficiency (over 80 percent storage utilization), organizations were able to reduce CapEx (capital expenditures) significantly. With the reduced CapEx and the increase in IT productivity, enterprise customers were able to reduce their overall storage costs by 40 percent with Isilon scale-out NAS (compared to traditional storage systems).
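One way to see why utilization affects capital cost is to compute the raw capacity that must be purchased to hold a given amount of data. The sketch below uses the 80 percent utilization figure quoted above and, for comparison, the 50 to 70 percent range that follows from the earlier statement that traditional RAID consumes 30 to 50 percent of disk capacity. The 1 PB target is a hypothetical example, and this calculation is not how the IDC study derived its 40 percent figure.

```python
# Raw capacity needed to hold a given amount of data at different
# utilization rates. 0.80 is the figure quoted in the notes; 0.70 and 0.50
# follow from the "RAID consumes 30 to 50 percent" statement.
# The 1 PB target is a hypothetical example.

def raw_needed(usable_pb: float, utilization: float) -> float:
    """Raw capacity (PB) required to store usable_pb at a utilization rate."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return usable_pb / utilization


TARGET_PB = 1.0
scenarios = [
    ("scale-out at 80%", 0.80),
    ("RAID at 70%", 0.70),
    ("RAID at 50%", 0.50),
]
for label, util in scenarios:
    print(f"{label}: {raw_needed(TARGET_PB, util):.2f} PB raw")
```

Under these assumptions, storing 1 PB takes 1.25 PB of raw disk at 80 percent utilization, compared with between about 1.43 PB and 2 PB at the RAID utilization range.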