<Note to speakers: The EMC Isilon presenter will cover the first half of the presentation, through slide 24. The EMC Greenplum presenter will cover the second half of the presentation, slides 25–37. Both presenters will participate in the Q&A (with backup from other EMC team members attending the event).>
<To kick off the presentation:> Welcome the audience and thank them for joining us. Introduce yourself and the EMC Greenplum presenter.
Here’s what we’re going to cover in today’s session: Walk through the agenda.
Isilon has been a leading innovator in scale-out NAS for more than 10 years. Isilon scale-out storage is being used today across a wide range of organizations:
- Data-intensive, high-performance computing (HPC) environments such as Life Sciences, Electronic Design Automation, and Media & Entertainment, to name a few examples.
- Traditional enterprise IT environments: Isilon’s storage systems are used to support a variety of large-scale use cases including archiving, home directories and file shares, virtualization (Tier 3 and Tier 4), and business analytics (Hadoop).
In total, Isilon’s scale-out storage solutions are being used by over 3,000 organizations around the world today and, thanks to the success that customers have enjoyed, the business is growing rapidly: about 100 percent per year last year. The key engine of customers’ success is the Isilon OneFS operating system. It is instrumental in providing customers with an innovative, scale-out data environment.
Note to Presenter: Here are some additional facts that you may want to point out about Isilon: Isilon was founded more than 10 years ago (as Isilon Systems) and is now recognized as the industry leader in scale-out NAS storage solutions. Isilon joined the EMC team in December 2010 (when EMC acquired Isilon Systems). Since then, Isilon’s scale-out storage business has continued to grow rapidly, being adopted in large enterprises across a wide range of industries. The Gartner report can be found here: http://www.gartner.com/id=1960515 (abstract only)
This slide shows just a sampling of customers who are benefiting from Isilon scale-out storage.
One reason Hadoop has emerged as an important technology is that it is an innovative Big Data analytics engine designed specifically for massively large data volumes. With it, organizations can greatly reduce the time required to derive valuable insight from an enterprise’s dataset. By adopting Hadoop to store and analyze massive data volumes, enterprises are gaining an agile new platform to deliver new insights and identify new opportunities to accelerate their business. Hadoop has also been designed to tackle analytics for unstructured data. This is significant because unstructured data is the dominant area of data growth projected for the foreseeable future. Now let’s look at how the adoption of Hadoop is evolving.
The Isilon OneFS operating system provides the intelligence behind all Isilon scale-out storage systems. It combines the three layers of traditional storage architectures—file system, volume manager, and data protection—into one unified software layer, creating a single intelligent file system that spans all nodes within an Isilon cluster.
Note to Presenter: Click now in Slide Show mode for animation.
OneFS provides a number of important advantages:
- A single file system for great ease of management
- Unmatched efficiency, with over 80 percent storage utilization plus automated storage tiering to gain additional efficiencies
- High-performance NAS
- Easy, “grow as you go” flexibility
- Linear scalability that lets you scale performance and capacity to over 15 PB
Putting It All Together. The Isilon IQ X-Series, powered by the OneFS® operating system, uses Isilon's scale-out storage architecture to speed access to massive amounts of critical data while dramatically reducing cost and complexity. Isilon delivers a flexible solution to accelerate your high-concurrency and sequential-throughput applications. With SSD technology for file-system metadata, the Isilon X-Series significantly accelerates namespace-intensive operations. S-Series nodes provide balanced throughput and performance, and the NL nodes form the foundation for nearline storage and archive. Isilon’s modular architecture and intelligent software make deployment and management simple. You can have an Isilon cluster online in less than 10 minutes, without time-consuming, expensive integration services. Scale a cluster in performance and capacity in about one minute, all within a single pool of storage with a global namespace, eliminating the need to support multiple volumes and file systems. Isilon’s suite of applications then works together to provide the data management and protection capabilities required by corporate IT—from the front-end intelligence that eliminates client and data migration to quota management for file shares. SnapshotIQ and SyncIQ work in concert to protect and replicate important data for local and remote archive, while SnapLock provides for the immutability of data. And finally, the backup accelerator speeds file replication to tape with a scalable, parallel infrastructure that ensures backup windows and recovery time objectives are always met.
In this section, we’re going to identify and describe the key technology challenges of Hadoop, especially when deployed using direct-attached storage (DAS).
There are five basic roles in every Hadoop environment: HDFS is made up of the NameNode, Secondary NameNode, and DataNode roles. MapReduce comprises the JobTracker and TaskTracker.
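The five roles above can be summarized in a small lookup table, grouped by the layer each belongs to (a sketch for reference, not part of any Hadoop API):

```python
# The five daemon roles of a classic Hadoop cluster, grouped by layer.
HADOOP_ROLES = {
    "HDFS": ["NameNode", "Secondary NameNode", "DataNode"],
    "MapReduce": ["JobTracker", "TaskTracker"],
}

# Total role count across both layers.
total = sum(len(roles) for roles in HADOOP_ROLES.values())
print(total)  # -> 5
```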
The JobTracker is effectively the queue master of a Hadoop MapReduce environment. It schedules jobs, distributes tasks across available TaskTrackers, and allows administrators to get a glimpse into the overall activity of a Hadoop environment.
To go into more detail, the NameNode is effectively the metadata server for all HDFS data and data blocks. In large Hadoop clusters, this role is run on a dedicated host, typically with a large amount of DRAM, because all metadata for the entire HDFS namespace is stored in local DRAM on this host. As such, traditional Hadoop architectures have limits on the number of objects that can be stored within each HDFS namespace. The NameNode is contacted for every block request, both for reads and writes, and is responsible for making sure data blocks are mirrored to multiple DataNodes, spanning multiple racks.
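As an illustration of the rack-spanning replica placement the NameNode coordinates, here is a minimal Python sketch (not HDFS source code) of the commonly described default policy for a replication factor of 3: the first copy goes to the writer's node, and the second and third go to two nodes on a different rack.

```python
# Illustrative sketch of HDFS-style replica placement (replication = 3):
# replica 1 on the writer's node, replicas 2 and 3 on one remote rack.
import random

def place_replicas(writer_node, writer_rack, nodes_by_rack):
    """nodes_by_rack: dict mapping rack id -> list of datanode names."""
    replicas = [writer_node]                       # replica 1: local node
    remote_racks = [r for r in nodes_by_rack if r != writer_rack]
    rack = random.choice(remote_racks)             # pick one remote rack
    candidates = list(nodes_by_rack[rack])
    random.shuffle(candidates)
    replicas.extend(candidates[:2])                # replicas 2 and 3
    return replicas
```

Because the copies span two racks, the data survives the loss of either whole rack; losing the NameNode's metadata, however, is not covered by this scheme, which is the single-point-of-failure issue discussed later.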
One challenge associated with traditional deployments of Hadoop is that they have largely been done on dedicated infrastructure, not integrated with or connected to any other applications—in effect, a siloed environment, often outside the realm of the IT team. This poses a number of inefficiencies and risks.
<click>
A well-recognized issue with traditional Hadoop deployments is the single-point-of-failure problem with the Hadoop NameNode. In a Hadoop environment, a single NameNode manages the Hadoop file system. If it goes down, the Hadoop environment will immediately go offline. If the NameNode does not come back online, the data stored within all of HDFS is lost and cannot be reconstructed.
<Click to next build slide>
Another issue with traditional Hadoop environments is the lack of enterprise-level data protection. Typical Hadoop deployments do not have rigorous data protection, backup, and recovery capabilities such as snapshots or data replication for disaster recovery (DR) purposes.
<click>
Traditional Hadoop deployments on direct-attached storage (DAS) are also extremely inefficient. It’s not unusual for a DAS environment to operate with a 30–35% storage utilization rate (or less). Compounding this inefficiency is the fact that data is often mirrored (the default is 3 times). In addition to storage inefficiency, this type of infrastructure is very management-intensive.
<click>
Another issue with Hadoop running on direct-attached storage is that server and storage resources must be increased together in lock-step. For example, if more storage resources are required, a new server must be deployed (and vice versa). This rigidity adds further inefficiencies. Another issue is the manual import/export of data that is required in a traditional Hadoop environment. In addition to consuming time and resources (bandwidth), the Hadoop data in typical environments cannot be accessed or shared with other enterprise applications due to the lack of industry-standard protocol support. To address these challenges and to enable enterprises to begin realizing the benefits of Hadoop quickly and easily, EMC has recently introduced an exciting new Hadoop solution.
<click to advance to next slide>
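To make the mirroring point concrete, a few lines of Python show why 3x replication alone caps raw-capacity utilization near one third (the 100 TB raw figure is just an example, not from the slide):

```python
# Effect of Hadoop's default 3x replication on raw-capacity utilization.
raw_tb = 100        # example raw DAS capacity in TB
replication = 3     # Hadoop's default replication factor
usable_tb = raw_tb / replication
utilization_pct = round(usable_tb / raw_tb * 100)
print(utilization_pct)  # -> 33
```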
Isilon is able to “pretend” to be an HDFS cluster: it mimics the NameNode and DataNode protocols to host data. The underlying system is OneFS and does not follow the traditional HDFS scheme. Point HDFS clients (MapReduce, command line, etc.) to the DNS name of the Isilon cluster.
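As a sketch of that last step: HDFS clients find their file system through core-site.xml, so pointing them at the cluster amounts to setting the default file-system URI to the Isilon cluster's DNS (SmartConnect) name. The hostname and port below are illustrative assumptions, not values from a real deployment.

```xml
<!-- Hypothetical core-site.xml fragment; hostname and port are examples. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://isilon-cluster.example.com:8020</value>
</property>
```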
The new EMC solution also eliminates the single-point-of-failure issue. We do this by enabling all nodes in an EMC Isilon storage cluster to become, in effect, NameNodes. This greatly improves the resiliency of your Hadoop environment. The EMC solution for Hadoop also provides reliable, end-to-end data protection for Hadoop data, including snapshotting for backup and recovery and data replication (with SyncIQ) for disaster recovery. Our new Hadoop solution also takes advantage of the outstanding efficiency of EMC Isilon storage systems: with our solutions, customers can achieve 80% or more storage utilization. EMC Hadoop solutions can also scale easily and independently. This means that if you need to add more storage capacity, you don’t need to add another server (and vice versa). With EMC Isilon, you also get the added benefit of linear increases in performance as the scale increases. EMC also recently announced that we are the first vendor to integrate HDFS (the Hadoop Distributed File System) into our storage solutions. This means that with EMC Isilon storage, you can readily use your Hadoop data with other enterprise applications and workloads while eliminating the need to manually move data around as you would with direct-attached storage.
Math logic behind the 28-hour figure: 100 TB = 100,000,000 MB. A 10 GbE connection can transfer approximately 1 GB per second (not including spindle speeds in the calculation). So 100 TB / 1 GB per second gives the number of seconds to transfer; divide by 60 seconds and then by 60 minutes to get roughly 28 hours.
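The arithmetic above can be checked in a few lines of Python (the 1 GB/s rate is the approximation stated above):

```python
# Back-of-the-envelope check: moving 100 TB over 10 GbE at ~1 GB/s.
data_gb = 100 * 1000       # 100 TB expressed in GB
rate_gb_per_s = 1          # ~1 GB/s effective throughput
seconds = data_gb / rate_gb_per_s
hours = seconds / 60 / 60  # divide by 60 seconds, then 60 minutes
print(round(hours))        # -> 28
```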
Customer Profile: http://www.emc.com/collateral/customer-profiles/h11528-return-path-cp.pdf
Company background: www.returnpath.com
Return Path is the worldwide leader in email intelligence, serving Internet service providers (ISPs), businesses, and individuals. The company’s email intelligence solutions process and analyze massive volumes of data to maximize email performance, ensure email delivery, and protect users from spam and other abuse.
Previous Environment & Existing Applications: Previously a hodgepodge of more than 25 different storage systems, including server-attached storage, shared Oracle appliances, and NetApp and Hewlett-Packard systems.
Company Challenges:
- Data growing 25–50 terabytes per year
- Limited performance and capacity to support intensive Hadoop analytics
- Disparate systems lacked performance and capacity
EMC Solution & Important Benefits to Customer:
- EMC Isilon X-Series
- Hadoop, internally developed email intelligence solutions
- SmartPools, SmartConnect, SmartQuotas, InsightIQ
Results:
- Enables unconstrained access to email data for analysis
- Reduces shared storage data center footprint by 30 percent
- Improves availability and reliability for Hadoop analytics
- Achieves faster development and time to market for new products
- Estimates five-year cost savings of $350,000 from lower power, cooling, and maintenance
- Shortens weekly administration time by more than 35 percent
Quotes:
“Isilon serves NFS data across multiple product suites and makes it easily accessible to our Hadoop analytics team. That’s a significant business enabler, allowing Return Path to develop customer solutions much faster.” —Diz Carter, Vice President of Infrastructure Operations, Return Path
“Considering our projected growth, we were able to make a strong business case for Isilon,” says Carter.
“Looking out over five years, we estimate greater than $350,000 in savings from lower power, cooling, and maintenance requirements.”
“We went from having boxes on the dock to serving up 180 terabytes in just over three hours,” says Carter. “I’ve never come across another solution as easy to implement as Isilon.”
With Isilon, Return Path now has a single repository for all its Big Data, accessible to email analysts, product development teams, and external customers. Previously, performing analytics on email data residing in shared storage required making a separate copy of the data set and manually moving it to the Hadoop environment. Today, Isilon delivers real-time data to Return Path’s end-user applications while providing seamless integration with Hadoop for back-end data analytics, boosting customer satisfaction and business productivity.
“To have all this data being generated by our email intelligence products, but no way to access it directly by Hadoop, was a major hindrance,” Carter remarks. “Now, Isilon serves NFS data across multiple product suites and makes it easily accessible to our Hadoop analytics team. That’s a huge business enabler because we're able to develop products much faster.”