1© Copyright 2011 EMC Corporation. All rights reserved.
EMC Isilon HDFS – Enterprise Storage for Hadoop
Featuring EMC Isilon Scale-Out NAS Storage
Shai Harmelin
EMC System Engineer – Isilon Specialist
May 21, 2013
Today’s Agenda
• EMC Isilon Background
• HDFS Architectural Challenges
• Isilon HDFS Benefits
• Performance Comparison
• Customer Case Study
• Q+A
EMC Isilon
Setting the standard for scale-out NAS
• Founded in 2000; recognized as the leader in scale-out NAS (Gartner 2010)
• Broad adoption across many markets
– High Performance Computing (HPC): Life Sciences, Oil & Gas, Electronic Design Automation, Media & Entertainment, Financial Services
– Enterprise IT: Archive, Home Directories, File Shares, Virtualization, Business Analytics
• Acquired by EMC in 2011 for $2.5B
• Over 3,500 global customers
• Isilon OneFS: seventh-generation, industry-proven, innovative scale-out operating environment
• 2012 – EMC Isilon is the industry’s first scale-out NAS system with native HDFS support
Isilon Growing Momentum
3,500+ customers
Why Hadoop is Important to EMC Isilon Customers
• Pragmatic approach to analytics on a very large scale
– Opens up new ways of gaining insights and identifying opportunities for businesses
• Designed to address the rise of unstructured data
– Enterprise data to grow by 650% over the next 5 years
– More than 80% of this growth will be unstructured data
• Hadoop is only ONE component of the enterprise Big Data analytics PIPELINE
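Isilon Scale-Out NAS Architecture
[Diagram: servers in the client/application layer connect over the Ethernet layer to the Isilon cluster, which runs the OneFS operating environment as a single file system/volume. Nodes are joined by an intra-cluster communication layer. Access protocols: CIFS, NFS, FTP, HTTP, and HDFS for Hadoop.]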
Isilon Core Innovation
OneFS scale-out operating system
Single File System
Simplicity
Leadership Efficiency
High Performance
Easy Growth
Automated Tiering
Linear Scalability
Largest and Most Scalable File System
500X More Scalable than Traditional Storage Systems
OneFS™ can scale from 18 TB to over 20,000 TB in a single file system
AutoBalance
Automated data balancing across nodes reduces costs, complexity and risks for scaling storage
“Using Software to do Work Unfit for Humans”
• AutoBalance migrates content to new storage nodes while the system is online and in production
• Requires NO manual intervention, NO reconfiguration, NO server or client mount point or application changes
• Eliminates “Hot Spots”
[Figure: newly added nodes start EMPTY while existing nodes are FULL; after AutoBalance runs, all nodes are BALANCED.]
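As a rough illustration of what "balanced" means here, the sketch below computes how much data each node would have to gain or shed for every node to end at the same utilization. It is a simplified model with hypothetical node sizes, not the OneFS AutoBalance algorithm.

```python
def rebalance_plan(used_tb, capacity_tb):
    """Return per-node data movement (TB) needed to equalize utilization.

    Positive values mean the node receives data; negative values mean it sheds data.
    """
    target = sum(used_tb) / sum(capacity_tb)  # cluster-wide utilization
    return [round(target * cap - used, 3) for used, cap in zip(used_tb, capacity_tb)]


# Hypothetical cluster: four well-filled 36 TB nodes plus two newly added empty nodes.
used = [30.0, 30.0, 30.0, 30.0, 0.0, 0.0]
capacity = [36.0] * 6
plan = rebalance_plan(used, capacity)
print(plan)
```

In this model the movements always sum to zero: data is only relocated, never created or dropped.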
Isilon, Scale-Out NAS for Big Data
Single File System, Single Volume Simplicity For Active, Persistent, And Archive Data
• Load balancing
• Seamless failover
• Performance zones
• Quota management
• Thin provisioning
• High speed replication
• Disaster recovery
• Business continuance
• Instant recovery
• Data protection
• File immutability
• Protection from deletion/change
• Automated storage tiering
[Diagram labels: Client/Application Layer (clients, virtualized servers); Network; S-series, X-series, NL-series; Backup Accelerator; Primary & Nearline Storage; WAN/LAN; Local/Remote Archive (NL-series).]
Easiest Storage System to Manage
Single Level of Management
Manage an 18TB to 10PB single file system from one intuitive console
“Isilon has made some very bold claims with respect to its clustered storage products - not least the idea of genuinely revolutionizing the ease and speed with which mass storage - over 500 Terabytes - can be added and managed thereafter. We have conducted rigorous testing and unanimously agree with their assertions. This stuff is almost frighteningly simple to use.”
Steve Broadhead, Founder, Broadband-Testing Laboratories
HDFS Overview
Core Hadoop Components
• NameNode
• Secondary NameNode
• Job Tracker
• DataNode / Task Tracker
Compute Components
Job Tracker
• Manages all jobs submitted to the cluster
• Tracks and reports the status of jobs and tasks
• Provides job queuing functionality
• Communicates with the NameNode and tries to schedule tasks on Task Trackers close to the DataNodes that hold the data
Task Tracker
• The compute workhorse
• Serves read/write requests from the clients
• Executes Map/Reduce tasks
• Typically performs I/O against local or remote DataNodes
File System Components
NameNode
• Manages the file system namespace
• Stores all the metadata in RAM, which limits file system size
– Filenames, owners, group, access info
– Knows associated blocks
• Manages block replication across DataNodes
Secondary NameNode
• Manages the edit log and checkpointing of NameNode metadata
• Does not provide NameNode hot failover
– CDH4 has a solution for this, but it is not in full-scale production in most environments
DataNode
• Stores blocks of files on top of the native host OS file system (e.g. EXT3, XFS, ZFS)
• The same block is stored on multiple DataNodes for redundancy
• Has no “awareness” of data blocks living elsewhere (only the NameNode does)
Enterprise Challenges of Hadoop
Hadoop DAS Environment
1. Dedicated Storage Infrastructure
– One-off for Hadoop only
2. Single Point of Failure
– NameNode
3. Lacking Enterprise Data Protection
– No snapshots, replication, backup
4. Poor Storage Efficiency
– 3X mirroring
5. Fixed Scalability
– Rigid compute-to-storage ratio
6. Manual Import/Export
– No protocol interoperability support
Isilon HDFS Support
• Isilon supports the HDFS interfaces for the NameNode and DataNode to host metadata and data
• The underlying file system is OneFS
• As simple as pointing the Hadoop nodes to the DNS name of the Isilon cluster!
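A minimal sketch of what "pointing the Hadoop nodes to the DNS name" could look like in practice: generating a core-site.xml whose default file system URI names the Isilon cluster. The host name `isilon.example.com` and port 8020 are assumptions for illustration; `fs.defaultFS` is the Hadoop 2.x property name, while Hadoop 1.x releases use `fs.default.name`.

```python
import xml.etree.ElementTree as ET


def core_site_xml(isilon_dns_name, port=8020):
    """Render a minimal core-site.xml whose default file system is the Isilon cluster."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    # Hadoop 2.x name; Hadoop 1.x releases use fs.default.name instead.
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = f"hdfs://{isilon_dns_name}:{port}"
    return ET.tostring(configuration, encoding="unicode")


# Hypothetical DNS name for the cluster; substitute the real one.
xml_text = core_site_xml("isilon.example.com")
print(xml_text)
```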
HDFS is a protocol!
• Each Isilon node now “speaks” the HDFS NameNode and DataNode protocol
• This eliminates the need to run these services on the Hadoop compute cluster
• Every Isilon node acts as both a NameNode and a DataNode (isi_hdfs_d)
• Data is laid out within OneFS exactly the same as for NFS, SMB, etc.
• Data is protected just like any other data in the Isilon file system: no mirroring, only parity = 80% utilization
• All Isilon enterprise features are applied to Hadoop data: Snapshots, Replication, SmartCache, SmartLock, etc.
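The efficiency comparison between 3X mirroring and parity protection comes down to simple arithmetic, sketched below. The 8 data + 2 parity layout is a hypothetical example chosen because it yields the 80% figure quoted above; actual OneFS protection levels depend on cluster size and configuration.

```python
def usable_fraction_mirroring(copies):
    """Fraction of raw capacity that holds unique data when every block is stored `copies` times."""
    return 1 / copies


def usable_fraction_parity(data_units, parity_units):
    """Fraction of raw capacity that holds unique data in a data+parity stripe."""
    return data_units / (data_units + parity_units)


mirrored = usable_fraction_mirroring(3)   # 3X mirroring, as on a DAS Hadoop cluster
parity = usable_fraction_parity(8, 2)     # hypothetical 8+2 stripe

print(f"3X mirroring: {mirrored:.0%} usable")
print(f"8+2 parity:   {parity:.0%} usable")
```

Under these assumptions, the same raw capacity holds about 2.4 times as much unique data with parity as with triple mirroring.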
HDFS Writes on Isilon
1. The Job Tracker asks the Isilon NameNode (isi_hdfs_d): “tell me where to place /path/file”
2. isi_hdfs_d hands the Job Tracker a list of 3 “DataNode” addresses for each block (aligned to the block size defined on the Hadoop cluster)
3. The Job Tracker assigns a Task Tracker to communicate with the DataNode (isi_hdfs_d) to write each data block (an abstraction in our case)
4. When complete, isi_hdfs_d responds that the block is replicated (a lie), because the data is actually striped like any other file written over any protocol
• HDFS files are laid out on the Isilon file system (IFS) just like files written over any other protocol (NFS, CIFS, FTP)
• A file can be written over NFS (nfsd) or CIFS (lwiod) and accessed over HDFS (isi_hdfs_d)
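The write exchange above can be pictured with a small model: a file is divided into blocks of the Hadoop-defined size, and for each block the NameNode-style service returns three addresses. The round-robin choice and the IP addresses are assumptions for illustration only; this is not how isi_hdfs_d actually selects nodes.

```python
def plan_block_locations(file_size, block_size, node_addresses, replicas=3):
    """Return, for each block of the file, the list of addresses offered for it."""
    if replicas > len(node_addresses):
        raise ValueError("need at least as many nodes as addresses per block")
    block_count = (file_size + block_size - 1) // block_size  # ceiling division
    plan = []
    for index in range(block_count):
        # Rotate the starting node so successive blocks spread across the cluster.
        locations = [
            node_addresses[(index + offset) % len(node_addresses)]
            for offset in range(replicas)
        ]
        plan.append(locations)
    return plan


MB = 1024 * 1024
nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]  # hypothetical node IPs
plan = plan_block_locations(200 * MB, 64 * MB, nodes)
for block_index, locations in enumerate(plan):
    print(block_index, locations)
```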
HDFS Reads on Isilon
1. The Job Tracker asks the Isilon NameNode (isi_hdfs_d): “tell me where /path/file lives”
2. isi_hdfs_d responds with a list of block addresses (3 DataNode IPs per block). Note that the block size in this case is configurable on Isilon (default 64MB)
3. The Job Tracker assigns Task Trackers to read each block (first address out of 3 for each)
4. Tasks within each Task Tracker ask the NameNode (again) for block locations, then initiate I/O transactions to read the data over the network
• The concept of locality is eliminated, except for rack awareness.
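The read path can be sketched the same way: each block maps to a byte range of the file, and the reader uses the first of the three addresses returned for it. The 64MB block size follows the default mentioned above; the function name and addresses are hypothetical.

```python
def read_plan(file_size, block_locations, block_size=64 * 1024 * 1024):
    """Pair each block's byte range with the first address offered for that block."""
    plan = []
    for index, locations in enumerate(block_locations):
        start = index * block_size
        length = min(block_size, file_size - start)  # the last block may be short
        plan.append((start, length, locations[0]))
    return plan


MB = 1024 * 1024
# Hypothetical answer from the NameNode for a 150 MB file: 3 addresses per block.
locations = [
    ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    ["10.0.0.2", "10.0.0.3", "10.0.0.4"],
    ["10.0.0.3", "10.0.0.4", "10.0.0.1"],
]
reads = read_plan(150 * MB, locations)
for start, length, address in reads:
    print(f"read {length} bytes at offset {start} from {address}")
```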
Isilon HDFS Settings
How EMC Isilon Addresses the Hadoop Challenge
1. Dedicated Storage Infrastructure (one-off for Hadoop only)
– Isilon: Scale-Out Storage Platform (multiple applications & workflows)
2. Single Point of Failure (NameNode)
– Isilon: No Single Point of Failure (distributed NameNode)
3. Lacking Enterprise Data Protection (no snapshots, replication, backup)
– Isilon: End-to-End Data Protection (SnapshotIQ, SyncIQ, NDMP backup)
4. Poor Storage Efficiency (3X mirroring)
– Isilon: Industry-Leading Storage Efficiency (>80% storage utilization)
5. Fixed Scalability (rigid compute-to-storage ratio)
– Isilon: Independent Scalability (add compute & storage separately)
6. Manual Import/Export (no protocol support)
– Isilon: Multi-Protocol (industry standard protocols: NFS, CIFS, FTP, HTTP, HDFS)
Distributed (Clustered) NameNode When Using Isilon
• MTTDL (mean time to data loss) = 5,000 years
• Metadata is stored across systems the same way as standard file metadata
• Built-in clustered redundancy across many nodes
• Clustering the NameNode on Isilon gives it the failure protection level Isilon already provides
[Diagram labels: NameNode; Clustered NameNode]
Fixed Scaling / Independent Scaling
Hadoop (fixed scaling)
• Storage-to-compute ratio is fixed
• Scaling compute means scaling capacity
• Difficult to provide QoS
• Compute upgrade is a forklift
Isilon (independent scaling)
• Scale compute independently of storage
• Achieve optimal performance balance even as workloads evolve
• No data migrations, ever!
• Add new performance as hardware evolves
[Chart labels: storage; compute; desired performance/capacity]
Protocol Support
Before
• HDFS is not natively visible to Windows, Unix, Linux, Apple, or any other file system
• Big Data is only used for Big Data
After
• Inherent multi-protocol support in Isilon allows ubiquitous access to all file systems, including Hadoop
• Big Data is actual data!
Time-to-Results
Data Copy Analysis
• Have you ever copied 100TB from primary storage to a Hadoop system?
• How long does it take to copy 100TB from one place to another over a 10Gb link? More than 24 hours
[Diagram labels: Existing Primary Storage; Data Center Network; Hadoop on a Stick]
In-Place Analysis
• Hadoop processing nodes read only the data relevant to the analysis
[Diagram labels: Existing Primary Storage; Data Center Network; Hadoop Processing Nodes]
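The copy-time figure can be checked with back-of-the-envelope arithmetic. The sketch assumes decimal terabytes, a 10 gigabit-per-second link, and a hypothetical 90% link efficiency for the second estimate; real transfers also depend on disks, protocol overhead, and competing traffic.

```python
def transfer_hours(terabytes, link_gbps, efficiency=1.0):
    """Hours needed to move `terabytes` (decimal TB) over a link of `link_gbps` gigabits per second."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


ideal = transfer_hours(100, 10)                       # line rate, no overhead
realistic = transfer_hours(100, 10, efficiency=0.9)   # assumed 90% efficiency

print(f"ideal: {ideal:.1f} h, at 90% efficiency: {realistic:.1f} h")
```

Even at full line rate the copy takes roughly 22 hours, so any realistic overhead pushes it past a day.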
Snapshot/Version Control
Before
• Traditional HDFS does not have replication
• No snapshotting of data
• Loss of version control
• Not designed for mission-critical use
After
• Full SnapshotIQ™ integration identifies changes
• Multi-threaded, multi-node scale-out replication
• Improved RPO/RTO for business continuity
• Geo-replicated Hadoop!
Hadoop Distributions Support on Isilon HDFS
• Available now in 7.0.1.5
• Multiple hdfs:// namespaces
– hdfs://DAS + hdfs://isilon
– Potential for archive/tiering
– Hadoop cluster version mixing
• Distributions:
– Cloudera CDH4.x
– Hortonworks HDP-2
– PivotalHD 1.0 (aka GPHD 2.0)
– Apache 0.23 / Apache 2.0
[Diagram labels: HDFS v1; HDFS v2]
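One way to picture the archive/tiering idea with two namespaces is a copy from the DAS-backed namespace to the Isilon-backed one using fully qualified URIs. The sketch below only builds the argument list for Hadoop's `hadoop distcp` tool; the host names and paths are hypothetical, and whether DistCp suits a given version mix is something to verify per environment.

```python
def distcp_command(source_uri, target_uri):
    """Build the argument list for copying between two HDFS namespaces with `hadoop distcp`."""
    for uri in (source_uri, target_uri):
        if not uri.startswith("hdfs://"):
            raise ValueError(f"expected a fully qualified hdfs:// URI, got {uri!r}")
    return ["hadoop", "distcp", source_uri, target_uri]


# Hypothetical namespaces: a DAS-backed NameNode and the Isilon cluster.
command = distcp_command(
    "hdfs://das-namenode.example.com:8020/logs/2013",
    "hdfs://isilon.example.com:8020/archive/logs/2013",
)
print(" ".join(command))
```

The list form can be passed directly to `subprocess.run` on a machine with the Hadoop client installed.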
Performance
Tests Used HiBench
• Developed by Intel and open sourced
– Collection of standard Hadoop jobs
– Our tests focused on TeraSort and TestDFSIO
• All results normalized as throughput per node to allow comparison of differing configs
• TestDFSIO tests were uncompressed, which shows actual I/O efficiency
– Compressed gives much higher performance, but is not actual I/O
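Per-node normalization is a simple division, sketched below. The throughput figures and node counts are made-up placeholders to show the calculation, not results from these tests, and the deck does not say which node count was used as the divisor.

```python
def throughput_per_node(total_mb_per_s, node_count):
    """Normalize an aggregate throughput figure by the number of nodes that produced it."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return total_mb_per_s / node_count


# Hypothetical aggregate results for two differently sized configurations.
config_a = throughput_per_node(1200.0, 8)
config_b = throughput_per_node(700.0, 4)

print(f"config A: {config_a:.1f} MB/s per node")
print(f"config B: {config_b:.1f} MB/s per node")
```

In this made-up case the smaller configuration has the lower aggregate throughput but the higher per-node figure, which is the kind of comparison normalization makes visible.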
[Chart: GPHD-Isilon is Highly Competitive]
[Chart: Terasort Performance is Comparable Between Configurations]
[Chart: I/O Performance Scales As Isilon Nodes Are Added]
[Chart: For Typical Workloads, 1.5 Compute Nodes Per Isilon X400 Node is Good (4 Isilon X400 nodes tested)]
Return Path
http://www.emc.com/collateral/customer-profiles/h11528-return-path-cp.pdf
Company background
• Return Path is the worldwide leader in email intelligence, serving Internet service providers (ISPs), businesses, and individuals.
• The company’s email intelligence solutions process and analyze massive volumes of data to maximize email performance, ensure email delivery, and protect users from spam and other abuse.
• Developed Hadoop-based email intelligence solutions combined with NAS-based data access
Challenges
• Limited performance and capacity to support intensive Hadoop analytics
• NFS and Hadoop environments struggled to handle unique data sets comprised of hundreds of millions of small email files and large analytics files, which hindered analytics and delivery of customer solutions
• 25 different DAS and NAS storage systems lacked performance and capacity
• Storage projected to increase from 150TB to 2PB over the next 5 years
Return Path
Isilon Solution and Benefits
Solution
• Isilon X400 scale-out NAS, approx. 200TB capacity
• SmartConnect, SmartQuotas, InsightIQ software suite
• NFS and HDFS data access protocols
Results
• Return Path now has a single repository for all its Big Data, accessible to email analysts, product development teams and external customers.
• Isilon delivers real-time data to Return Path’s end-user applications while providing seamless integration with Hadoop for back-end data analytics
• Reduces shared storage data center footprint by 30 percent
• Shortens weekly administration time by more than 35 percent
• Improves availability and reliability for Hadoop analytics
• Savings of $350,000 from lower power, cooling, and maintenance
Return Path
Customer Quotes
“To have all this data being generated by our email intelligence products, but no way to access it directly by Hadoop, was a major hindrance,”
“Isilon serves NFS data across multiple product suites and makes it easily accessible to our Hadoop analytics team. That’s a significant business enabler, allowing Return Path to develop customer solutions much faster.”
“Isilon InsightIQ software has been invaluable, providing visibility into our infrastructure and managing our space efficiently as we grow.”
Diz Carter, VP Infrastructure Operations
Questions?
Thank You!
7. emc isilon hdfs   enterprise storage for hadoop

Mais conteúdo relacionado

Mais procurados

Software defined datacenter SDDC
Software defined datacenter SDDCSoftware defined datacenter SDDC
Software defined datacenter SDDCpsjitha
 
Dimitri Bellini - Monitoring Large Multi-Site Data Environment
Dimitri Bellini - Monitoring Large Multi-Site Data EnvironmentDimitri Bellini - Monitoring Large Multi-Site Data Environment
Dimitri Bellini - Monitoring Large Multi-Site Data EnvironmentZabbix
 
Object storage의 이해와 활용
Object storage의 이해와 활용Object storage의 이해와 활용
Object storage의 이해와 활용Seoro Kim
 
Introduction to OpenFlow, SDN and NFV
Introduction to OpenFlow, SDN and NFVIntroduction to OpenFlow, SDN and NFV
Introduction to OpenFlow, SDN and NFVKingston Smiler
 
Introduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network IssuesIntroduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network IssuesJason TC HOU (侯宗成)
 
VSAN – Architettura e Design
VSAN – Architettura e DesignVSAN – Architettura e Design
VSAN – Architettura e DesignVMUG IT
 
Presentation v mware virtual san 6.0
Presentation   v mware virtual san 6.0Presentation   v mware virtual san 6.0
Presentation v mware virtual san 6.0solarisyougood
 
Volume Encryption In CloudStack
Volume Encryption In CloudStackVolume Encryption In CloudStack
Volume Encryption In CloudStackShapeBlue
 
Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...
Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...
Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...xKinAnx
 
Introduction To OpenStack
Introduction To OpenStackIntroduction To OpenStack
Introduction To OpenStackHaim Ateya
 
Hyper-Converged Infrastructure Vx Rail
Hyper-Converged Infrastructure Vx Rail Hyper-Converged Infrastructure Vx Rail
Hyper-Converged Infrastructure Vx Rail Jürgen Ambrosi
 
ASA Firepower NGFW Update and Deployment Scenarios
ASA Firepower NGFW Update and Deployment ScenariosASA Firepower NGFW Update and Deployment Scenarios
ASA Firepower NGFW Update and Deployment ScenariosCisco Canada
 
RUCKUS Unleashed & SmartZone
RUCKUS Unleashed & SmartZoneRUCKUS Unleashed & SmartZone
RUCKUS Unleashed & SmartZoneCarla Nadin
 
VxRail Appliance - Modernize your infrastructure and accelerate IT transforma...
VxRail Appliance - Modernize your infrastructure and accelerate IT transforma...VxRail Appliance - Modernize your infrastructure and accelerate IT transforma...
VxRail Appliance - Modernize your infrastructure and accelerate IT transforma...Maichino Sepede
 
Do you have your Cloud Exit Plan Ready? | Sysfore
Do you have your Cloud Exit Plan Ready? | SysforeDo you have your Cloud Exit Plan Ready? | Sysfore
Do you have your Cloud Exit Plan Ready? | SysforeSysfore Technologies
 
A day in the life of a VSAN I/O - STO7875
A day in the life of a VSAN I/O - STO7875A day in the life of a VSAN I/O - STO7875
A day in the life of a VSAN I/O - STO7875Duncan Epping
 
Data center architecture
Data center architecture Data center architecture
Data center architecture RajuPrasad33
 

Mais procurados (20)

Software defined datacenter SDDC
Software defined datacenter SDDCSoftware defined datacenter SDDC
Software defined datacenter SDDC
 
Dimitri Bellini - Monitoring Large Multi-Site Data Environment
Dimitri Bellini - Monitoring Large Multi-Site Data EnvironmentDimitri Bellini - Monitoring Large Multi-Site Data Environment
Dimitri Bellini - Monitoring Large Multi-Site Data Environment
 
Object storage의 이해와 활용
Object storage의 이해와 활용Object storage의 이해와 활용
Object storage의 이해와 활용
 
Introduction to OpenFlow, SDN and NFV
Introduction to OpenFlow, SDN and NFVIntroduction to OpenFlow, SDN and NFV
Introduction to OpenFlow, SDN and NFV
 
Introduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network IssuesIntroduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network Issues
 
VSAN – Architettura e Design
VSAN – Architettura e DesignVSAN – Architettura e Design
VSAN – Architettura e Design
 
Presentation v mware virtual san 6.0
Presentation   v mware virtual san 6.0Presentation   v mware virtual san 6.0
Presentation v mware virtual san 6.0
 
Volume Encryption In CloudStack
Volume Encryption In CloudStackVolume Encryption In CloudStack
Volume Encryption In CloudStack
 
Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...
Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...
Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...
 
Introduction To OpenStack
Introduction To OpenStackIntroduction To OpenStack
Introduction To OpenStack
 
Basic VSAM
Basic VSAMBasic VSAM
Basic VSAM
 
Hyper-Converged Infrastructure Vx Rail
Hyper-Converged Infrastructure Vx Rail Hyper-Converged Infrastructure Vx Rail
Hyper-Converged Infrastructure Vx Rail
 
ASA Firepower NGFW Update and Deployment Scenarios
ASA Firepower NGFW Update and Deployment ScenariosASA Firepower NGFW Update and Deployment Scenarios
ASA Firepower NGFW Update and Deployment Scenarios
 
RUCKUS Unleashed & SmartZone
RUCKUS Unleashed & SmartZoneRUCKUS Unleashed & SmartZone
RUCKUS Unleashed & SmartZone
 
NetApp & Storage fundamentals
NetApp & Storage fundamentalsNetApp & Storage fundamentals
NetApp & Storage fundamentals
 
VxRail Appliance - Modernize your infrastructure and accelerate IT transforma...
VxRail Appliance - Modernize your infrastructure and accelerate IT transforma...VxRail Appliance - Modernize your infrastructure and accelerate IT transforma...
VxRail Appliance - Modernize your infrastructure and accelerate IT transforma...
 
Do you have your Cloud Exit Plan Ready? | Sysfore
Do you have your Cloud Exit Plan Ready? | SysforeDo you have your Cloud Exit Plan Ready? | Sysfore
Do you have your Cloud Exit Plan Ready? | Sysfore
 
CloudStack Architecture
CloudStack ArchitectureCloudStack Architecture
CloudStack Architecture
 
A day in the life of a VSAN I/O - STO7875
A day in the life of a VSAN I/O - STO7875A day in the life of a VSAN I/O - STO7875
A day in the life of a VSAN I/O - STO7875
 
Data center architecture
Data center architecture Data center architecture
Data center architecture
 

Destaque

Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Rajit Saha
 
WBDB 2014 Benchmarking Virtualized Hadoop Clusters
WBDB 2014 Benchmarking Virtualized Hadoop ClustersWBDB 2014 Benchmarking Virtualized Hadoop Clusters
WBDB 2014 Benchmarking Virtualized Hadoop Clusterst_ivanov
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoopChiou-Nan Chen
 
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study VMworld
 
Soyez Big Data ready avec Isilon
Soyez Big Data ready avec IsilonSoyez Big Data ready avec Isilon
Soyez Big Data ready avec IsilonRSD
 
EMC Isilon Solutions for Data Archives
EMC Isilon Solutions for Data ArchivesEMC Isilon Solutions for Data Archives
EMC Isilon Solutions for Data Archivessolarisyougood
 
EMC Hadoop Starter Kit
EMC Hadoop Starter KitEMC Hadoop Starter Kit
EMC Hadoop Starter KitEMC
 
Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop InnoTech
 
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC IsilonEMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC IsilonBoni Bruno
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastuctureDataWorks Summit
 
Gartner IT Symposium 2014 - VMware Cloud Services
Gartner IT Symposium 2014 - VMware Cloud ServicesGartner IT Symposium 2014 - VMware Cloud Services
Gartner IT Symposium 2014 - VMware Cloud ServicesPhilip Say
 
VMworld - vSphere Distributed Switch 6.0 Technical Deep Dive
VMworld - vSphere Distributed Switch 6.0 Technical Deep DiveVMworld - vSphere Distributed Switch 6.0 Technical Deep Dive
VMworld - vSphere Distributed Switch 6.0 Technical Deep DiveChris Wahl
 
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Nati Shalom
 
EMC isilon for -media-and-entertainment-sales-deck
EMC isilon for -media-and-entertainment-sales-deckEMC isilon for -media-and-entertainment-sales-deck
EMC isilon for -media-and-entertainment-sales-decksolarisyougood
 
MT129 Isilon Data Lake Overview
MT129 Isilon Data Lake OverviewMT129 Isilon Data Lake Overview
MT129 Isilon Data Lake OverviewDell EMC World
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdfEdureka!
 
David Goulden keynote at Dell EMC World
David Goulden keynote at Dell EMC WorldDavid Goulden keynote at Dell EMC World
David Goulden keynote at Dell EMC WorldDell EMC World
 
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...EMC
 

Destaque (20)

EMC config Hadoop
EMC config HadoopEMC config Hadoop
EMC config Hadoop
 
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
 
WBDB 2014 Benchmarking Virtualized Hadoop Clusters
WBDB 2014 Benchmarking Virtualized Hadoop ClustersWBDB 2014 Benchmarking Virtualized Hadoop Clusters
WBDB 2014 Benchmarking Virtualized Hadoop Clusters
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoop
 
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
 
Soyez Big Data ready avec Isilon
Soyez Big Data ready avec IsilonSoyez Big Data ready avec Isilon
Soyez Big Data ready avec Isilon
 
EMC Isilon Solutions for Data Archives
EMC Isilon Solutions for Data ArchivesEMC Isilon Solutions for Data Archives
EMC Isilon Solutions for Data Archives
 
EMC Hadoop Starter Kit
EMC Hadoop Starter KitEMC Hadoop Starter Kit
EMC Hadoop Starter Kit
 
Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop
 
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC IsilonEMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastucture
 
Gartner IT Symposium 2014 - VMware Cloud Services
Gartner IT Symposium 2014 - VMware Cloud ServicesGartner IT Symposium 2014 - VMware Cloud Services
Gartner IT Symposium 2014 - VMware Cloud Services
 
VMworld - vSphere Distributed Switch 6.0 Technical Deep Dive
VMworld - vSphere Distributed Switch 6.0 Technical Deep DiveVMworld - vSphere Distributed Switch 6.0 Technical Deep Dive
VMworld - vSphere Distributed Switch 6.0 Technical Deep Dive
 
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
 
EMC isilon for -media-and-entertainment-sales-deck
EMC isilon for -media-and-entertainment-sales-deckEMC isilon for -media-and-entertainment-sales-deck
EMC isilon for -media-and-entertainment-sales-deck
 
MT129 Isilon Data Lake Overview
MT129 Isilon Data Lake OverviewMT129 Isilon Data Lake Overview
MT129 Isilon Data Lake Overview
 
Cloud Management with vRealize Operations
Cloud Management with vRealize OperationsCloud Management with vRealize Operations
Cloud Management with vRealize Operations
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdf
 
David Goulden keynote at Dell EMC World
David Goulden keynote at Dell EMC WorldDavid Goulden keynote at Dell EMC World
David Goulden keynote at Dell EMC World
 
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
 

Semelhante a 7. emc isilon hdfs enterprise storage for hadoop

Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
7. emc isilon hdfs enterprise storage for hadoop

  • 1. EMC Isilon HDFS – Enterprise Storage for Hadoop. Featuring EMC Isilon Scale-Out NAS Storage. Shai Harmelin, EMC System Engineer – Isilon Specialist. May 21, 2013. © Copyright 2011 EMC Corporation. All rights reserved.
  • 2. Today’s Agenda
      • EMC Isilon Background
      • HDFS Architectural Challenges
      • Isilon HDFS Benefits
      • Performance Comparison
      • Customer Case Study
      • Q+A
  • 3. EMC Isilon: Setting the standard for scale-out NAS
      • Founded in 2000 as the leader in Scaleout NAS (Gartner 2010)
      • Broad adoption across many markets
        – High Performance Computing (HPC): Life Sciences, Oil & Gas, Electronic Design Automation, Media & Entertainment, Financial Services
        – Enterprise IT: Archive, Home Directories, File Shares, Virtualization, Business Analytics
      • Acquired by EMC in 2011 for $2.5B
      • Over 3,500 global customers
      • Isilon OneFS: Seventh generation, industry-proven, innovative scale-out operating environment
      • 2012 – EMC Isilon is Industry’s First Scale-Out NAS System with Native HDFS Support
  • 4. Isilon Growing Momentum: 3,500+ customers
  • 5. Why Hadoop is Important to EMC Isilon Customers
      • Pragmatic approach to analytics on a very large scale
        – Opens up new ways of gaining insights and identifying opportunities for businesses
      • Designed to address the rise of unstructured data
        – Enterprise data to grow by 650% over next 5 years
        – More than 80% of this growth will be unstructured data
      • Hadoop is only ONE component of Enterprise Big Data Analytics PIPELINE
  • 6. Isilon Scale-Out NAS Architecture
      Diagram labels: Client/Application Layer; Servers; Ethernet Layer; CIFS, NFS, FTP, HTTP, HDFS for Hadoop; OneFS Operating Environment; Single FS/Volume; Intra-cluster Communication Layer
  • 7. Isilon Core Innovation: OneFS scale-out operating system
      Diagram labels: Single File System; Simplicity; Leadership; Efficiency; High Performance; Easy Growth; Automated Tiering; Linear Scalability
  • 8. Largest and Most Scalable File System: 500X More Scalable than Traditional Storage Systems
      • OneFS™ can scale from 18TB to over 20,000 TB in a single file system
  • 9. AutoBalance: Automated data balancing across nodes reduces costs, complexity and risks for scaling storage. “Using Software to do Work Unfit for Humans”
      • AutoBalance migrates content to new storage nodes while the system is online and in production
      • Requires NO manual intervention, NO reconfiguration, NO server or client mount point or application changes
      • Eliminates “Hot Spots”
      Diagram labels: nodes marked EMPTY and FULL; nodes marked BALANCED
  • 11. Isilon, Scale-Out NAS for Big Data: Single File System, Single Volume Simplicity For Active, Persistent, And Archive Data
      • Load balancing
      • Seamless failover
      • Performance zones
      • Quota management
      • Thin provisioning
      • High speed replication
      • Disaster recovery
      • Business continuance
      • Instant recovery
      • Data protection
      • File immutability
      • Protection from deletion/change
      • Automated storage tiering
      Diagram labels: Client/Application Layer; Virtualized Servers; Clients; Network; WAN/LAN; Primary & Nearline Storage; Local/Remote Archive; S-series; X-series; NL-series; Backup Accelerator
  • 12. Easiest Storage System to Manage: Single-level of Management
      • Manage an 18TB to 10PB single file system from one intuitive console
      • “Isilon has made some very bold claims with respect to its clustered storage products - not least the idea of genuinely revolutionizing the ease and speed with which mass storage - over 500 Terabytes - can be added and managed thereafter. We have conducted rigorous testing and unanimously agree with their assertions. This stuff is almost frighteningly simple to use.” Steve Broadhead, Founder, Broadband-Testing Laboratories
  • 13. HDFS Overview
  • 14. Core Hadoop Components: NameNode; Secondary NameNode; Job Tracker; DataNode / Task Tracker
  • 15. Compute Components
      • Job Tracker
        – Manages all jobs submitted to the cluster
        – Tracks and reports the status of jobs and tasks
        – Provides job queuing functionality
        – Communicates with the NameNode and tries to align TaskTrackers to DataNodes
      • Task Tracker
        – The compute workhorse
        – Serves read/write requests from the clients
        – Executes Map/Reduce tasks
        – Typically performs I/O against local or remote DataNodes
  • 16. File System Components
      • NameNode
        – Manages the file system namespace
        – Stores all the metadata in RAM – a limitation on file system size
        – Filenames, owners, group, access info
        – Knows associated blocks
        – Manages block replication across DataNodes
      • Secondary NameNode
        – Manages edit log and checkpointing of NameNode metadata
        – Does not provide NameNode hot failover
        – CDH4 has a solution for this, but it is not in full scale production in most environments
      • DataNode
        – Stores blocks of files on top of native host OS file system (e.g. EXT3, XFS, ZFS)
        – Same block is stored on multiple DataNodes for redundancy
        – Has no “awareness” of data blocks living elsewhere (only the NameNode does)
  • 17. Enterprise Challenges of Hadoop: Hadoop DAS Environment
      1 Dedicated Storage Infrastructure – One-off for Hadoop only
      2 Single Point of Failure – Namenode
      3 Lacking Enterprise Data Protection – No Snapshots, replication, backup
      4 Poor Storage Efficiency – 3X mirroring
      5 Fixed Scalability – Rigid compute to storage ratio
      6 Manual Import/Export – No protocol interoperability support
  • 18. Enterprise Challenges of Hadoop: Hadoop DAS Environment (same six challenges as the previous slide)
      Diagram labels: block replicas marked 1x, 2x, 3x spread across nodes; Namenode
  • 19. Isilon HDFS Support
      • Isilon supports the HDFS interfaces for the NameNode and DataNode to host both metadata and data
      • Underlying file system is OneFS
      • As simple as pointing the Hadoop nodes to the DNS name of the Isilon cluster!
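As a rough illustration of "pointing the Hadoop nodes to the DNS name of the Isilon cluster", the sketch below renders a minimal `core-site.xml` whose default file system is an HDFS endpoint. Several things here are assumptions and not from the slides: the host name `isilon.example.com` is hypothetical, port 8020 is only a commonly used NameNode RPC port, and `fs.defaultFS` is the Hadoop 2 property name (Hadoop 1 releases use `fs.default.name`).

```python
import xml.etree.ElementTree as ET


def core_site_xml(cluster_dns_name: str, port: int = 8020) -> str:
    """Render a minimal core-site.xml pointing Hadoop at one HDFS endpoint."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    # Assumption: Hadoop 2 property name; Hadoop 1 uses fs.default.name.
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = f"hdfs://{cluster_dns_name}:{port}"
    return ET.tostring(configuration, encoding="unicode")


# Hypothetical DNS name for the storage cluster.
print(core_site_xml("isilon.example.com"))
```

The compute nodes would carry this setting in place of the address of a dedicated NameNode host; the rest of the Hadoop configuration is outside the scope of this sketch.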
  • 20. HDFS is a protocol!
      • Each Isilon node now “speaks” the HDFS NameNode and DataNode protocol
      • We eliminate need to run these services on the Hadoop compute cluster
      • Every Isilon node acts as both a namenode and datanode (isi_hdfs_d)
      • Data is laid out within OneFS exactly the same as for NFS, SMB, etc.
      • Data is protected just like any other data in the Isilon File System. No Mirroring, only Parity = 80% utilization
      • All Isilon Enterprise Features are applied to Hadoop data: Snapshots, Replication, SmartCache, SmartLock, etc…
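The efficiency claim (3X mirroring versus parity at about 80% utilization) comes down to simple arithmetic. The sketch below compares usable capacity under the two schemes. The 300 TB raw figure and the 8+2 stripe are hypothetical values chosen only because 8 data units out of 10 gives 80%; the slides do not state the actual protection layout.

```python
def usable_tb_mirroring(raw_tb: float, copies: int) -> float:
    """Usable capacity when every block is stored `copies` times."""
    return raw_tb / copies


def usable_tb_parity(raw_tb: float, data_units: int, parity_units: int) -> float:
    """Usable capacity when each stripe holds data_units of data plus parity_units of parity."""
    return raw_tb * data_units / (data_units + parity_units)


raw_tb = 300.0  # hypothetical raw capacity
mirrored = usable_tb_mirroring(raw_tb, copies=3)
# Assumption: an 8+2 stripe, picked only to match the 80% figure.
striped = usable_tb_parity(raw_tb, data_units=8, parity_units=2)
print(f"3x mirroring: {mirrored:.0f} TB usable ({mirrored / raw_tb:.0%})")
print(f"8+2 parity:   {striped:.0f} TB usable ({striped / raw_tb:.0%})")
```

Under these assumptions the same raw capacity holds roughly 2.4 times as much data with parity protection as with triple mirroring.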
  • 21. HDFS Writes on Isilon
      • Jobtracker asks Isilon namenode (isi_hdfs_d): “tell me where to place /path/file”
      • OneFS isi_hdfs_d hands the Jobtracker a list of 3 “datanode” addresses for each block (aligned to the block size defined on the Hadoop cluster)
      • Jobtracker assigns a task tracker to communicate with the datanode (isi_hdfs_d) to write each data block (an abstraction in our case)
      • When complete, isi_hdfs_d responds by saying the block is replicated (a lie), because data is striped like any other file, written over any protocol
      • HDFS files are laid out on the Isilon File System (IFS) similarly to any other protocol (NFS, CIFS, FTP)
      • A file can be written over NFS (nfsd) or CIFS (lwiod) and accessed over HDFS (isi_hdfs_d)
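Because a file written over NFS can be read back over HDFS, the same file has two names: a path under the client's NFS mount and an `hdfs://` URI. The sketch below shows one way the translation could look. Everything about the layout is an assumption for illustration: the mount point `/mnt/isilon`, the export root `/ifs`, an HDFS root directory of `/ifs/hadoop`, and the endpoint `isilon.example.com:8020` are all hypothetical.

```python
import posixpath


def nfs_path_to_hdfs_uri(nfs_path: str,
                         nfs_mount: str = "/mnt/isilon",
                         export_root: str = "/ifs",
                         hdfs_root: str = "/ifs/hadoop",
                         namenode: str = "isilon.example.com:8020") -> str:
    """Map a path seen through an NFS mount to the HDFS URI of the same file."""
    # Path relative to the client's mount point.
    rel_to_mount = posixpath.relpath(nfs_path, nfs_mount)
    if rel_to_mount == ".." or rel_to_mount.startswith("../"):
        raise ValueError(f"{nfs_path} is not under the mount {nfs_mount}")
    # Same file as named on the storage cluster itself.
    cluster_path = posixpath.normpath(posixpath.join(export_root, rel_to_mount))
    # Path relative to the directory assumed to be exposed as the HDFS root.
    rel_to_root = posixpath.relpath(cluster_path, hdfs_root)
    if rel_to_root == ".." or rel_to_root.startswith("../"):
        raise ValueError(f"{cluster_path} is outside the HDFS root {hdfs_root}")
    hdfs_path = "/" if rel_to_root == "." else "/" + rel_to_root
    return f"hdfs://{namenode}{hdfs_path}"


print(nfs_path_to_hdfs_uri("/mnt/isilon/hadoop/logs/2013-05-21.log"))
```

No data moves in this mapping; only the name changes, which is the point of serving one file system over several protocols.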
  • 22. HDFS Reads on Isilon
      • Jobtracker asks Isilon namenode (isi_hdfs_d): “tell me where /path/file lives”
      • isi_hdfs_d responds with a list of block addresses (3 datanode IPs per block). Note that the block size in this case is configurable on Isilon (default 64MB)
      • Jobtracker assigns task trackers to read each block (first address out of 3 for each)
      • Tasks within each task tracker ask the namenode (again) for block locations, then initiate I/O transactions to read the data over the network
      • The concept of locality is eliminated except for rack awareness
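The shape of the namenode's answer (a list of blocks, each with 3 datanode addresses, the first of which the reader uses) can be mimicked with a toy model. This is only a sketch: the round-robin rotation across nodes is an assumption made for illustration, not a description of how isi_hdfs_d actually chooses addresses, and the node IPs are hypothetical.

```python
from typing import List

DEFAULT_BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default quoted on the slide


def block_locations(file_size: int, node_ips: List[str],
                    block_size: int = DEFAULT_BLOCK_SIZE,
                    addresses_per_block: int = 3) -> List[List[str]]:
    """Return one list of candidate datanode addresses per block (toy model)."""
    if block_size <= 0 or not node_ips:
        raise ValueError("block_size must be positive and node_ips non-empty")
    num_blocks = -(-file_size // block_size)  # ceiling division on integers
    locations = []
    for block_index in range(num_blocks):
        # Assumption: rotate through the nodes so consecutive blocks
        # start on different nodes.
        locations.append([node_ips[(block_index + k) % len(node_ips)]
                          for k in range(addresses_per_block)])
    return locations


nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]  # hypothetical node IPs
for index, addresses in enumerate(block_locations(200 * 1024 * 1024, nodes)):
    # A reader would contact the first address listed for each block.
    print(index, addresses[0], addresses)
```

Since every node can serve every block, the three addresses act as alternative network paths and do not identify where replicas physically live.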
  • 23. Isilon HDFS Settings
  • 24. How EMC Isilon Addresses the Hadoop Challenge
      1 Challenge: Dedicated Storage Infrastructure – One-off for Hadoop only. Isilon: Scale-Out Storage Platform – Multiple applications & workflows
      2 Challenge: Single Point of Failure – Namenode. Isilon: No Single Point of Failure – Distributed Namenode
      3 Challenge: Lacking Enterprise Data Protection – No Snapshots, replication, backup. Isilon: End-to-End Data Protection – SnapshotIQ, SyncIQ, NDMP Backup
      4 Challenge: Poor Storage Efficiency – 3X mirroring. Isilon: Industry-Leading Storage Efficiency – >80% Storage Utilization
      5 Challenge: Fixed Scalability – Rigid compute to storage ratio. Isilon: Independent Scalability – Add compute & storage separately
      6 Challenge: Manual Import/Export – No protocol support. Isilon: Multi-Protocol – Industry standard protocols – NFS, CIFS, FTP, HTTP, HDFS
  • 25. Distributed (Clustered) Name Node When Using Isilon
      • MTTDL (mean time to data loss) = 5,000 years
      • Metadata stored across systems same way as standard file metadata
      • Built-in clustered redundancy across many nodes
      • Clustering the NameNode on Isilon allows for the failure protection level Isilon already provides
      Diagram labels: Name Node; Clustered NameNode
  • 26. Fixed Scaling / Independent Scaling
      • Hadoop
        – Storage to Compute ratio is fixed
        – Scaling compute means scaling capacity
        – Difficult to provide QoS
        – Compute upgrade is a forklift
      • Isilon
        – Scale compute independent of storage
        – Achieve optimal performance balance even as workloads evolve
        – No data migrations, ever!
        – Add new performance as hardware evolves
      Diagram labels: storage; compute; Desired performance/capacity
  • 27. Protocol Support
      • Before
        – HDFS is not visible to Windows, Unix, Linux, Apple, or any other file system natively
        – Big Data is only used for Big Data
      • After
        – Inherent Multi-Protocol Support in Isilon allows ubiquitous access to all file systems including Hadoop
        – Big Data is actual data!
      Diagram labels: Servers
  • 28. Time-to-Results: Data Copy Analysis vs. In-Place Analysis
      • Have you ever copied 100TB from Primary Storage to a Hadoop system?
      • How long does it take to copy 100TB from one place to another over a 10Gb link? Approximately >24 hours
      Diagram labels: Data Center Network; Existing Primary Storage; Hadoop on a Stick; Hadoop Processing Nodes; Reading relevant data to analysis
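The ">24 hours" figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes the link is 10 gigabits per second and that 100TB means 100 × 10^12 bytes; the 80% link efficiency in the second call is a hypothetical allowance for protocol overhead, not a measured value.

```python
def copy_time_hours(data_tb: float, link_gbit_per_s: float,
                    efficiency: float = 1.0) -> float:
    """Hours needed to move data_tb terabytes over a link of the given speed."""
    bits_to_move = data_tb * 1e12 * 8  # decimal terabytes to bits
    effective_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return bits_to_move / effective_bits_per_s / 3600


print(f"at line rate:      {copy_time_hours(100, 10):.1f} h")
# Assumption: only 80% of the nominal link speed is achieved in practice.
print(f"at 80% efficiency: {copy_time_hours(100, 10, efficiency=0.8):.1f} h")
```

Even at full line rate the copy takes most of a day, so any realistic overhead pushes it past 24 hours before analysis can start.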
  • 29. Snapshot/Version Control
      • Before
        – Traditional HDFS does not have replication
        – No Snapshotting of data
        – Loss of Version control
        – Not designed for Mission Critical
      • After
        – Full SnapshotIQ™ integration identifies changes
        – Multi-threaded, Multi-Node Scale-Out replication
        – Improved RPO/RTO for business continuity
        – Geo-replicated Hadoop!
  • 30. Hadoop Distributions Support on Isilon HDFS
      • Available now in 7.0.1.5
      • Multiple HDFS:// namespaces
        – hdfs://DAS + hdfs://isilon
        – Potential for archive/tiering
        – Hadoop cluster version mixing
      • Distributions:
        – Cloudera CDH4.x
        – Hortonworks HDP-2
        – PivotalHD 1.0 (aka: GPHD 2.0)
        – Apache 0.23 / Apache 2.0
      Diagram labels: HDFS v1; HDFS v2
  • 31. Performance
  • 32. Test Used: HiBench
      • Developed by Intel and Open Sourced
        – Collection of standard Hadoop jobs
        – Our tests focused on TeraSort and TestDFSIO
      • All results normalized as throughput per node to allow comparison of differing configs
      • TestDFSIO tests were uncompressed, which shows actual I/O efficiency
        – Compressed gives much higher performance, but is not actual I/O
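Normalizing to throughput per node is what lets clusters of different sizes be compared on one chart. A minimal sketch of that normalization follows; the configuration names and throughput numbers are made up for illustration and are not results from the presentation.

```python
def throughput_per_node(aggregate_mb_per_s: float, node_count: int) -> float:
    """Divide aggregate throughput by the number of nodes that produced it."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return aggregate_mb_per_s / node_count


# Hypothetical measurements: (aggregate MB/s, node count) per configuration.
runs = {"config A": (1200.0, 8), "config B": (900.0, 5)}
for name, (aggregate, nodes) in runs.items():
    print(f"{name}: {throughput_per_node(aggregate, nodes):.0f} MB/s per node")
```

In this made-up case the smaller cluster has lower aggregate throughput but higher throughput per node, which is the kind of difference the normalization is meant to expose.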
  • 33. 35© Copyright 2011 EMC Corporation. All rights reserved. GPHD-Isilon is Highly Competitive
  • 34. 36© Copyright 2011 EMC Corporation. All rights reserved. Terasort Performance is Comparable Between Configurations
  • 35. 37© Copyright 2011 EMC Corporation. All rights reserved. I/O Performance Scales As Isilon Nodes Are Added
  • 36. 38© Copyright 2011 EMC Corporation. All rights reserved. For Typical Workloads, 1.5 Compute Nodes Per Isilon x400 Node is Good (4) Isilon x400 Nodes Tested
  • 37. Return Path (http://www.emc.com/collateral/customer-profiles/h11528-return-path-cp.pdf). Challenges: Limited performance and capacity to support intensive Hadoop analytics; NFS and Hadoop environments struggled to handle unique data sets composed of hundreds of millions of small email files and large analytics files, which hindered analytics and delivery of customer solutions; 25 different DAS and NAS storage systems lacked performance and capacity; storage projected to increase from 150TB to 2PB over the next 5 years. Company background: • Return Path is the worldwide leader in email intelligence, serving Internet service providers (ISPs), businesses, and individuals. • The company’s email intelligence solutions process and analyze massive volumes of data to maximize email performance, ensure email delivery, and protect users from spam and other abuse. • Developed Hadoop-based email intelligence solutions combined with NAS-based data access
  • 38. Return Path: Isilon Solution and Benefits. Results: Return Path now has a single repository for all its Big Data, accessible to email analysts, product development teams and external customers. Isilon delivers real-time data to Return Path’s end-user applications while providing seamless integration with Hadoop for back-end data analytics. Reduces shared storage data center footprint by 30 percent. Shortens weekly administration time by more than 35 percent. Improves availability and reliability for Hadoop analytics. Savings of $350,000 from lower power, cooling, and maintenance. Solution: Isilon X400 scale-out NAS – approx. 200TB capacity; SmartConnect, SmartQuotas, InsightIQ software suite; NFS and HDFS data access protocols
  • 39. Return Path: Customer Quotes. “To have all this data being generated by our email intelligence products, but no way to access it directly by Hadoop, was a major hindrance.” “Isilon serves NFS data across multiple product suites and makes it easily accessible to our Hadoop analytics team. That’s a significant business enabler, allowing Return Path to develop customer solutions much faster.” “Isilon InsightIQ software has been invaluable, providing visibility into our infrastructure and managing our space efficiently as we grow.” Diz Carter, VP Infrastructure Operations
  • 40. Questions?
  • 41. Thank You!
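
The copy-time question on the Time-to-Results slide (100TB over a 10-gigabit link, written "10GB" on the slide) can be checked with a few lines of arithmetic. The sketch below is only an estimate under stated assumptions: decimal units (1 TB = 1,000 GB), a sustained rate of about 1 GB per second as in the speaker notes, and no disk or protocol overhead. The function name `transfer_hours` is illustrative and does not come from the deck.

```python
def transfer_hours(data_tb: float, throughput_gb_per_s: float) -> float:
    """Estimate hours needed to copy data_tb terabytes at a sustained rate.

    Assumes decimal units (1 TB = 1,000 GB) and ignores disk speed and
    protocol overhead, so a real copy would likely take longer.
    """
    data_gb = data_tb * 1000                 # terabytes -> gigabytes
    seconds = data_gb / throughput_gb_per_s  # total transfer time in seconds
    return seconds / 3600                    # seconds -> hours


if __name__ == "__main__":
    # Roughly 1 GB/s usable on a 10-gigabit link (assumption from the notes).
    print(f"{transfer_hours(100, 1.0):.1f} hours at 1 GB/s")
    # Theoretical line rate: 10 Gb/s divided by 8 bits per byte = 1.25 GB/s.
    print(f"{transfer_hours(100, 1.25):.1f} hours at 1.25 GB/s")
```

At 1 GB/s this gives a little under 28 hours, which is consistent with the slide's "more than 24 hours"; even at the theoretical line rate the copy still takes most of a day.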

Editor's Notes

  1. <Note to speakers: The EMC Isilon presenter will cover the 1st half of the presentation, through slide 24. The EMC Greenplum presenter will cover the 2nd half of the presentation, slides 25 – 37. Both presenters will participate in the Q+A (with backup from other EMC team members attending the event).> <To kick off the presentation>: Welcome the audience + thank them for joining us. Introduce yourself + the EMC Greenplum presenter.
  2. Here’s what we’re going to cover in today’s session: Walk through agenda.
  3. Isilon has been a leading innovator in scale-out NAS for more than 10 years. Isilon scale-out storage is being used today across a wide range of organizations: Data-intensive, high performance computing (HPC) environments such as Life Sciences, Electronic Design Automation, and Media & Entertainment, to name a few examples. Traditional enterprise IT environments: Isilon’s storage systems are used to support a variety of large-scale use cases including archiving, home directories and file shares; virtualization (Tier 3 and Tier 4); and business analytics (Hadoop). In total, Isilon’s scale-out storage solutions are being used by over 3,000 organizations around the world today and, thanks to the success that customers have enjoyed, the business is growing rapidly: about 100 percent last year. The key engine of customers’ success is the Isilon OneFS operating system. It is instrumental in providing customers with an innovative, scale-out data environment. Note to Presenter: Here are some additional facts that you may want to point out about Isilon: Isilon was founded more than 10 years ago (as Isilon Systems) and is now recognized as the industry leader in scale-out NAS storage solutions. Isilon joined the EMC team in December 2010 (when EMC acquired Isilon Systems). Since then, Isilon’s scale-out storage solutions business has continued to grow rapidly, being adopted in large enterprises across a wide range of industries. Gartner report can be found here: http://www.gartner.com/id=1960515 (abstract only)
  4. This slide shows just a sampling of customers who are benefiting from Isilon scale-out storage.
  5. One reason Hadoop has emerged as an important technology is because it is an innovative, Big Data analytics engine designed specifically for massively large data volumes. With it, organizations can greatly reduce the time required to derive valuable insight from an enterprise’s dataset. By adopting Hadoop to store and analyze massive data volumes, enterprises are gaining an agile new platform to deliver new insights and identify new opportunities to accelerate their business. Hadoop has also been designed to tackle analytics for unstructured data. This is significant because this is the dominant area of data growth projected for the foreseeable future. Now let’s look at how the adoption of Hadoop is evolving.
  6. The Isilon OneFS operating system provides the intelligence behind all Isilon scale-out storage systems. It combines the three layers of traditional storage architectures (file system, volume manager, and data protection) into one unified software layer, creating a single intelligent file system that spans all nodes within an Isilon cluster. Note to Presenter: Click now in Slide Show mode for animation. OneFS provides a number of important advantages: A single file system for great ease of management. Unmatched efficiency with over 80 percent storage utilization plus automated storage tiering to gain additional efficiencies. High-performance NAS. Easy, “grow as you go” flexibility. Linear scalability lets you scale performance and capacity to over 15 PB.
  7. Putting It All Together. The Isilon IQ X-Series, powered by the OneFS® operating system, uses Isilon's scale-out storage architecture to speed access to massive amounts of critical data, while dramatically reducing cost and complexity. Isilon delivers a flexible solution to accelerate your high-concurrency and sequential-throughput applications. With SSD technology for file-system metadata, the Isilon X-Series significantly accelerates namespace-intensive operations. S-Series nodes provide balanced throughput and performance, and the NL nodes form the foundation for nearline and archive. Isilon’s modular architecture and intelligent software make deployment and management simple. You can have an Isilon cluster online in less than 10 minutes, without time-consuming, expensive integration services. Scale a cluster in performance and capacity in about one minute, all within a single pool of storage with a global namespace, eliminating the need to support multiple volumes and file systems. Isilon’s suite of applications then works together to provide the data management and protection capabilities required by corporate IT, from the front-end intelligence that eliminates client and data migration to quota management for file shares. SnapshotIQ and SyncIQ work in concert to protect and replicate important data for local and remote archive, while SmartLock provides for the immutability of data. And finally, Backup Accelerator speeds file replication to tape with a scalable, parallel infrastructure that ensures backup windows and recovery time objectives are always met.
  8. In this section, we’re going to identify and describe the key technology challenges of Hadoop, especially when deployed using direct-attached storage (DAS).
  9. There are 5 basic roles in every Hadoop environment: HDFS is made up of the namenode, secondary namenode, and datanode roles. MapReduce is made up of the jobtracker and tasktracker.
  10. The jobtracker is effectively the queue master of a Hadoop MapReduce environment. It schedules jobs, distributes tasks across available tasktrackers, and allows administrators to get a glimpse into the overall activity for a Hadoop environment.
  11. To go into more detail, the namenode is effectively the metadata server for all HDFS data and data blocks. In large Hadoop clusters, this role is run on a dedicated host, typically with a large amount of DRAM. This is because all metadata for the entire HDFS namespace is stored in local DRAM on this host. As such, traditional Hadoop architectures have limitations on the number of objects which can be stored within each HDFS namespace. The namenode is contacted for every block request, both for reads and writes, and is responsible for making sure data blocks are mirrored to multiple datanodes, spanning multiple racks.
  12. One challenge associated with traditional deployments of Hadoop is that it has largely been done on a dedicated infrastructure and not integrated with or connected to any other applications. In effect, a siloed environment, often outside the realm of the IT team. This poses a number of inefficiencies and risks. <click> A well-recognized issue with traditional Hadoop deployments is the “single-point-of-failure” problem with the Hadoop namenode. In a Hadoop environment, a single namenode manages the Hadoop file system. If it goes down, the Hadoop environment will immediately go off-line. If the namenode does not come back online, the data stored within all of HDFS is lost and cannot be reconstructed. <Click to next build slide>
  13. Another issue with traditional Hadoop environments is the lack of enterprise-level data protection. Typical Hadoop deployments do not have rigorous data protection backup and recovery capabilities such as snapshots or data replication for disaster recovery (DR) purposes. <click> Traditional Hadoop deployments on direct-attached storage (DAS) are also extremely inefficient. It’s not unusual for a DAS environment to operate with a 30-35% storage utilization rate (or less). Compounding this inefficiency is the fact that data is often mirrored (the default is 3 times). In addition to storage inefficiency, this type of infrastructure is very management-intensive. <click> Another issue with Hadoop running with direct-attached storage is that ‘server’ and ‘storage’ resources must be increased together in lock-step. For example, if more storage resources are required, a new server must be deployed (and vice versa). This rigidity adds additional inefficiencies. Another issue is the manual import/export of data that is required in a traditional Hadoop environment. In addition to being time and resource (bandwidth) consuming, the Hadoop data in typical environments cannot be accessed or shared with other enterprise applications due to the lack of industry-standard protocol support. To address these challenges and to enable enterprises to begin realizing the benefits of Hadoop quickly and easily, EMC has recently introduced an exciting new Hadoop solution. <click to advance to next slide>
  14. Isilon is able to “pretend” to be an HDFS cluster: it mimics the NameNode and DataNode protocols to host data. The underlying system is OneFS and does not follow the traditional HDFS scheme. Point HDFS clients (MapReduce, command line, etc.) to the DNS name of the Isilon cluster.
  15. One reason Hadoop has emerged as an important technology is because it is an innovative, Big Data analytics engine designed specifically for massively large data volumes. With it, organizations can greatly reduce the time required to derive valuable insight from an enterprise’s dataset. By adopting Hadoop to store and analyze massive data volumes, enterprises are gaining an agile new platform to deliver new insights and identify new opportunities to accelerate their business. Hadoop has also been designed to tackle analytics for unstructured data. This is significant because this is the dominant area of data growth projected for the foreseeable future. Now let’s look at how the adoption of Hadoop is evolving.
  16. One reason Hadoop has emerged as an important technology is because it is an innovative, Big Data analytics engine designed specifically for massively large data volumes. With it, organizations can greatly reduce the time required to derive valuable insight from an enterprise’s dataset. By adopting Hadoop to store and analyze massive data volumes, enterprises are gaining an agile new platform to deliver new insights and identify new opportunities to accelerate their business. Hadoop has also been designed to tackle analytics for unstructured data. This is significant because this is the dominant area of data growth projected for the foreseeable future. Now let’s look at how the adoption of Hadoop is evolving.
  17. One reason Hadoop has emerged as an important technology is because it is an innovative, Big Data analytics engine designed specifically for massively large data volumes. With it, organizations can greatly reduce the time required to derive valuable insight from an enterprise’s dataset. By adopting Hadoop to store and analyze massive data volumes, enterprises are gaining an agile new platform to deliver new insights and identify new opportunities to accelerate their business. Hadoop has also been designed to tackle analytics for unstructured data. This is significant because this is the dominant area of data growth projected for the foreseeable future. Now let’s look at how the adoption of Hadoop is evolving.
  18. The new EMC solution also eliminates the “single-point-of-failure” issue. We do this by enabling all nodes in an EMC Isilon storage cluster to become, in effect, namenodes. This greatly improves the resiliency of your Hadoop environment. The EMC solution for Hadoop also provides reliable, end-to-end data protection for Hadoop data, including snapshotting for backup and recovery and data replication (with SyncIQ) for disaster recovery. Our new Hadoop solution also takes advantage of the outstanding efficiency of EMC Isilon storage systems. With our solutions, customers can achieve up to 80% or more storage utilization. EMC Hadoop solutions can also scale easily and independently. This means if you need to add more storage capacity, you don’t need to add another server (and vice versa). With EMC Isilon, you also get the added benefit of linear increases in performance as the scale increases. EMC also recently announced that we are the 1st vendor to integrate HDFS (the Hadoop Distributed File System) into our storage solutions. This means that with EMC Isilon storage, you can readily use your Hadoop data with other enterprise applications and workloads while eliminating the need to manually move data around as you would with direct-attached storage.
  19. Math logic on 28 hours. 100 TB = 100,000 GB (100,000,000 MB). A 10 Gb/s link can transfer approximately 1 GB per second (not including spindle speeds in the calculation). So 100,000 GB ÷ 1 GB per second = 100,000 seconds to transfer; dividing by 60 seconds and then by 60 minutes gives about 28 hours.
  20. In this section, we’re going to identify and describe the key technology challenges of Hadoop, especially when deployed using direct-attached storage (DAS).
  21. Customer Profile: http://www.emc.com/collateral/customer-profiles/h11528-return-path-cp.pdf Company background: www.returnpath.com. Return Path is the worldwide leader in email intelligence, serving Internet service providers (ISPs), businesses, and individuals. The company’s email intelligence solutions process and analyze massive volumes of data to maximize email performance, ensure email delivery, and protect users from spam and other abuse. Previous Environment & Existing Applications: Previously a hodge-podge of more than 25 different storage systems, including server-attached storage, shared Oracle appliances, as well as NetApp and Hewlett-Packard systems. Company Challenges: Data growing 25–50 terabytes per year; limited performance and capacity to support intensive Hadoop analytics; disparate systems lacked performance and capacity. EMC Solution & Important Benefits to Customer: EMC Isilon X-series; Hadoop, internally developed email intelligence solutions; SmartPools, SmartConnect, SmartQuotas, InsightIQ. Results: Enables unconstrained access to email data for analysis; reduces shared storage data center footprint by 30 percent; improves availability and reliability for Hadoop analytics; achieves faster development and time to market of new products; estimates five-year cost savings of $350,000 from lower power, cooling, and maintenance; shortens weekly administration time by more than 35 percent. Quotes: “Isilon serves NFS data across multiple product suites and makes it easily accessible to our Hadoop analytics team. That’s a significant business enabler, allowing Return Path to develop customer solutions much faster.” Diz Carter, Vice President of Infrastructure Operations, Return Path. “Considering our projected growth, we were able to make a strong business case for Isilon,” says Carter. “Looking out over five years, we estimate greater than $350,000 in savings from lower power, cooling, and maintenance requirements.” “We went from having boxes on the dock to serving up 180 terabytes in just over three hours,” says Carter. “I’ve never come across another solution as easy to implement as Isilon.”
  22. With Isilon, Return Path now has a single repository for all its Big Data, accessible to email analysts, product development teams and external customers. Previously, performing analytics on email data residing in shared storage required making a separate copy of the data set and manually moving it to the Hadoop environment. Today, Isilon delivers real-time data to Return Path’s end-user applications while providing seamless integration with Hadoop for back-end data analytics, boosting customer satisfaction and business productivity. “To have all this data being generated by our email intelligence products, but no way to access it directly by Hadoop, was a major hindrance,” Carter remarks. “Now, Isilon serves NFS data across multiple product suites and makes it easily accessible to our Hadoop analytics team. That’s a huge business enabler because we're able to develop products much faster.” Pam, please add a placeholder for time savings from the old process of manually creating multiple copies to now with Isilon.
  23. Customer Profile: http://www.emc.com/collateral/customer-profiles/h11528-return-path-cp.pdf Company background: www.returnpath.com. Return Path is the worldwide leader in email intelligence, serving Internet service providers (ISPs), businesses, and individuals. The company’s email intelligence solutions process and analyze massive volumes of data to maximize email performance, ensure email delivery, and protect users from spam and other abuse. Previous Environment & Existing Applications: Previously a hodge-podge of more than 25 different storage systems, including server-attached storage, shared Oracle appliances, as well as NetApp and Hewlett-Packard systems. Company Challenges: Data growing 25–50 terabytes per year; limited performance and capacity to support intensive Hadoop analytics; disparate systems lacked performance and capacity. EMC Solution & Important Benefits to Customer: EMC Isilon X-series; Hadoop, internally developed email intelligence solutions; SmartPools, SmartConnect, SmartQuotas, InsightIQ. Results: Enables unconstrained access to email data for analysis; reduces shared storage data center footprint by 30 percent; improves availability and reliability for Hadoop analytics; achieves faster development and time to market of new products; estimates five-year cost savings of $350,000 from lower power, cooling, and maintenance; shortens weekly administration time by more than 35 percent. Quotes: “Isilon serves NFS data across multiple product suites and makes it easily accessible to our Hadoop analytics team. That’s a significant business enabler, allowing Return Path to develop customer solutions much faster.” Diz Carter, Vice President of Infrastructure Operations, Return Path. “Considering our projected growth, we were able to make a strong business case for Isilon,” says Carter. “Looking out over five years, we estimate greater than $350,000 in savings from lower power, cooling, and maintenance requirements.” “We went from having boxes on the dock to serving up 180 terabytes in just over three hours,” says Carter. “I’ve never come across another solution as easy to implement as Isilon.”
  24. Thank you
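
The note on slide 14 says HDFS clients are simply pointed at the DNS name of the Isilon cluster. As a rough illustration of what that might look like on the client side, the sketch below generates a minimal Hadoop `core-site.xml` fragment. Several things here are assumptions rather than details from the deck: the host name `isilon.example.com` is hypothetical, port 8020 is the conventional NameNode RPC port and may differ in a given deployment, and `fs.defaultFS` is the Hadoop 2.x property name (older releases used `fs.default.name`).

```python
import xml.etree.ElementTree as ET


def core_site_xml(namenode_host: str, port: int = 8020) -> str:
    """Build a minimal core-site.xml that sets the default file system URI.

    namenode_host is the DNS name the cluster answers on (hypothetical
    here); port 8020 is an assumed default, not a value from the deck.
    """
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"  # Hadoop 2.x property
    ET.SubElement(prop, "value").text = f"hdfs://{namenode_host}:{port}"
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical DNS name for the storage cluster.
    print(core_site_xml("isilon.example.com"))
```

Because every client resolves the same name, no per-node NameNode address needs to be configured, which matches the "all nodes act as namenodes" description in the slide 18 note.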
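
The efficiency comparison in the notes for slides 13 and 18 (data mirrored 3 times by default on DAS versus roughly 80 percent storage utilization on Isilon) can be turned into a quick raw-capacity estimate. This is a simplified sketch, not a sizing tool: it ignores temporary space, growth headroom, and file-system overhead, and the function names are illustrative.

```python
def raw_tb_with_replication(data_tb: float, replicas: int = 3) -> float:
    """Raw capacity needed when every block is stored `replicas` times."""
    return data_tb * replicas


def raw_tb_at_utilization(data_tb: float, utilization: float = 0.80) -> float:
    """Raw capacity needed when `utilization` of raw space holds user data."""
    return data_tb / utilization


if __name__ == "__main__":
    data_tb = 100  # example data set size, chosen only for illustration
    print(f"3x mirroring:    {raw_tb_with_replication(data_tb):.0f} TB raw")
    print(f"80% utilization: {raw_tb_at_utilization(data_tb):.0f} TB raw")
```

For a 100 TB data set, the two assumptions differ by more than a factor of two in raw capacity, which is the gap the speaker notes are pointing at.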