Is Your Cloud Ready for Big Data?
Richard McDougall
CTO, Storage and Application Services

© 2009 VMware Inc. All rights reserved
Not Just for the Web Giants – The Intelligent Enterprise

Real-time analysis allows instant understanding of market dynamics.
Retailers can gain an intimate understanding of their customers' needs and use direct, targeted marketing.

Market Segment Analysis → Personalized Customer Targeting

The Emerging Pattern of Big Data Systems: Retail Example

Real-Time
Processing
Data Science

Real-Time
Streams

Machine
Learning

Parallel Data
Processing

Exa-scale Data Store
Cloud Infrastructure

Storage: Plan for Peta-scale Data Storage and Processing
Analytics Rapidly Outgrows Traditional Data Size by 100x

[Chart: PB of data, log scale from 0.01 to 1000, for Online Apps and Analytics, 2000 to 2015]

Unprecedented Scale

We are creating an Exabyte
of data every minute in 2013

Yottabyte by 2030

“Data transparency,
amplified by Social Networks
generates data at a
scale never seen before”
- The Human Face of Big Data

A single GE jet engine produces 10 terabytes of data in one hour, about 90 petabytes per year.
This enables early detection of faults and common-mode failures, and feedback to product engineering.

Post Mortem → Proactively Maintained Connected Product
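
The yearly figure on this slide can be checked from the hourly one. The sketch below is a back-of-the-envelope calculation, not something taken from the deck: it assumes the engine produces data continuously all year and uses decimal units (1 PB = 1000 TB). The function name is illustrative.

```python
# Rough check of the slide's figures, assuming decimal units (1 PB = 1000 TB).
TB_PER_HOUR = 10            # hourly rate quoted on the slide
HOURS_PER_YEAR = 24 * 365   # assumption: the engine reports around the clock


def petabytes_per_year(tb_per_hour: float, hours: int = HOURS_PER_YEAR) -> float:
    """Convert an hourly data rate in TB into a yearly volume in PB."""
    return tb_per_hour * hours / 1000


if __name__ == "__main__":
    print(f"{petabytes_per_year(TB_PER_HOUR):.1f} PB per year")
```

Under these assumptions the result is 87.6 PB, which the slide rounds to 90 PB. An engine that runs only part of the year would produce proportionally less.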

The Emerging Pattern of Big Data Systems: Manufacturing
Real-Time
Sensor
Analytics

Product
Engineering

Support

Data Science
Real-Time
Processing

Machine
Learning

Parallel Data
Processing

Exa-scale Data Store
Cloud Infrastructure

Cloud Platform

Cloud Platform: Supporting Mixed Big Data Workloads

Machine
Learning
Machine
Learning

Real-Time
Analytics

Real-Time
Analytics

Hadoop

Cloud Infrastructure

Compute
Hadoop

Management

Storage/Availability
Network/Security

Cloud Platform: Supporting Multiple Tenants

Web User
Analytics

Historical Customer
Behavior

Financial
Analysis

Cloud Infrastructure

Compute
Management

Storage/Availability
Network/Security

What if you can…
Recommendation engine

Production

Test

Ad targeting

Production

One physical platform to support multiple virtual big
data clusters

Test
Experimentation

Production
recommendation engine

Experimentation


Experimentation

Test/Dev
Production
Ad Targeting
Values of a Cloud Platform for Big Data

1. Agility / Rapid deployment
2. Isolation for resource control and security
3. Lower Capex
4. Operational efficiency

Compute
Management

Storage/Availability
Network/Security

Hadoop as a Service

Operational Simplicity, High Availability, Elasticity & Multi-tenancy

•  Rapid deployment
•  High availability for entire Hadoop stack
•  Shrink and expand cluster on demand
•  One click to set up
•  Independent scaling of compute and data
•  One-stop command center
•  Easy to configure/reconfigure
•  Battle-tested
•  Strong multi-tenancy

Self Service Access to Big Data Environments

Templates for Different Cloud Users

Developer
•  3 Hadoop nodes
•  Cloudera, Pivotal, MapR
•  Small VM
•  Local storage
•  No HA
•  …

Data Scientist
•  5 Hadoop nodes
•  Cloudera, Pivotal
•  Hive, Pig
•  Medium VM
•  HA
•  …

High priority
•  50 Hadoop nodes
•  Cloudera
•  Hive, Pig
•  Large VM
•  HA
•  …

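
One way to picture these per-user templates is as plain records that a self-service portal could look up by role. The sketch below is illustrative only: `ClusterTemplate`, `TEMPLATES` and `template_for` are hypothetical names, not part of Serengeti or any VMware API, and it assumes the three bullet groups on the slide belong, in order, to Developer, Data Scientist and High priority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterTemplate:
    """Hypothetical record for one self-service template (not a real API)."""
    user: str
    hadoop_nodes: int
    distributions: tuple
    tools: tuple
    vm_size: str
    high_availability: bool


# Values copied from the slide; the pairing of values to users is assumed.
TEMPLATES = {
    "developer": ClusterTemplate(
        "Developer", 3, ("Cloudera", "Pivotal", "MapR"), (), "small", False),
    "data_scientist": ClusterTemplate(
        "Data Scientist", 5, ("Cloudera", "Pivotal"), ("Hive", "Pig"), "medium", True),
    "high_priority": ClusterTemplate(
        "High priority", 50, ("Cloudera",), ("Hive", "Pig"), "large", True),
}


def template_for(role: str) -> ClusterTemplate:
    """Return the template for a role; raises KeyError for unknown roles."""
    return TEMPLATES[role]


if __name__ == "__main__":
    for role in TEMPLATES:
        t = template_for(role)
        print(f"{t.user}: {t.hadoop_nodes} nodes, {t.vm_size} VM, HA={t.high_availability}")
```

Keeping the templates as immutable records means a request only has to name a role; the node count, VM size and HA setting follow from it.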
Big Data needs a Mix of Workloads

Compute layer
•  Hadoop: batch analysis
•  HBase: real-time queries
•  NoSQL: Cassandra, Mongo, etc.
•  Big SQL: Impala, Pivotal HAWQ
•  Other: Spark, Shark, Solr, Platfora, etc.

File System/Data Store
Platform Virtualization Technology
Hosts

Strong Isolation between Workloads is Key

Hungry Workload 1
Reckless Workload 2
Nosy Workload 3

Virtualization Platform

Community activity in Isolation and Resource Management

YARN
•  Goal: support workloads other than MapReduce (M-R) on Hadoop
•  Initial need is for MPI/M-R from Yahoo
•  Non-POSIX file system self-selects workload types

Mesos
•  Distributed resource broker
•  Mixed workloads with some RM
•  Active project, in use at Twitter
•  Leverages OS virtualization, e.g. cgroups

Virtualization
•  Virtual machine as the primary isolation, resource management and versioned deployment container
•  Basis for Project Serengeti

Use case: Elastic Hadoop with Tiered SLA

•  Production workloads have high priority
•  Experimentation workloads have lower priority
Experimentation
Mapreduce

Compute layer

Dynamic resourcepool

Compute
VM

Compute
VM

Experimentation
Experimentation

Production
Mapreduce
Compute
VM

Compute
VM

Compute
VM


Compute
VM

Compute
VM

Production
recommendation engine

Production

vSphere

Data layer

Compute
VM
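
The tiered-SLA idea on this slide can be illustrated with a toy allocator. This is a minimal sketch under stated assumptions, not how vSphere resource pools work: it uses strict priority, so production is served first and experimentation gets whatever is left, and the pool size and demand numbers are made up. The function name `allocate_compute_vms` is hypothetical.

```python
def allocate_compute_vms(pool_size, demands):
    """Share a fixed pool of compute VMs by strict priority.

    demands is a list of (name, priority, requested_vms) tuples;
    a lower priority number is served first.
    """
    remaining = pool_size
    allocation = {}
    for name, _priority, requested in sorted(demands, key=lambda d: d[1]):
        granted = min(requested, remaining)  # never hand out more than is left
        allocation[name] = granted
        remaining -= granted
    return allocation


if __name__ == "__main__":
    # Illustrative numbers: 8 compute VMs, production wants 5, experimentation 6.
    demands = [("experimentation", 2, 6), ("production", 1, 5)]
    print(allocate_compute_vms(8, demands))
```

With these inputs production receives its full 5 VMs and experimentation is squeezed to the remaining 3. If production demand drops, the same call hands the freed VMs back to experimentation, which is the elastic behaviour the slide describes.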
Cloud Enabled Auto-elastic Hadoop

[Diagram: three ESX hosts running Hadoop compute VMs (JT, TT), Hadoop HDFS VMs (NN, three DATA VMs) and non-Hadoop VMs, with a VHM; storage is local disks and SAN/NAS]

JT: JobTracker
TT: TaskTracker
NN: NameNode
VHM: Virtual Hadoop Manager

Hadoop Performance with Virtualization
Test configuration: 32 hosts / 3.6 GHz, 8 cores / 15K RPM 146 GB SAS disks / 10 GbE / 72-96 GB RAM

[Chart not reproduced: lower is better]

Source: http://www.vmware.com/resources/techresources/10360, Jeff Buell, Apr 2013

Network Platform

A Typical Network Architecture
[Diagram, top to bottom:]
Aggregating Switch (x2)
L2 Switch (x4)
Top of Rack Switch (x4)
Host (x16)

Traditional Networks: Core Switch is the Choke Point

Network topology: Core, Aggregation, Rack, Hosts
Modeled bandwidth: non-uniform (10s of Gbits, 100s of Gbits)

Modern Networks: Great for Big Data

Network topology: Spine, Leaf, Hosts
Modeled bandwidth: uniform

Flat Networks Allow for New Infrastructure Models

[Diagram: hosts under top-of-rack switches in two layouts. Converged Storage: storage sits in the hosts. Separated Storage: compute hosts and storage hosts sit under separate top-of-rack switches.]

Storage Platform

Use Local Disk where it’s Needed

                 SAN Storage      NAS Filers       Local Storage
Cost per GB      $2 - $10         $1 - $5          $0.05
$1M gets:
  Capacity       0.5 Petabytes    1 Petabyte       10 Petabytes
  IOPS           200,000          200,000          400,000
  Throughput     8 GB/sec         10 GB/sec        250 GB/sec

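
The "$1M gets" capacities imply a cost per gigabyte that can be set against the per-GB prices quoted on the same slide. The sketch below is a rough calculation, not part of the deck: it uses decimal units (1 PB = 1,000,000 GB) and assumes the three capacities pair with SAN, NAS and local storage in that order. The names are illustrative.

```python
BUDGET_USD = 1_000_000
GB_PER_PB = 1_000_000  # decimal units

# Capacity that $1M buys per tier, as read from the slide (PB).
CAPACITY_PB = {"SAN Storage": 0.5, "NAS Filers": 1, "Local Storage": 10}


def implied_cost_per_gb(capacity_pb, budget_usd=BUDGET_USD):
    """Dollars per GB implied by spending the whole budget on this capacity."""
    return budget_usd / (capacity_pb * GB_PER_PB)


if __name__ == "__main__":
    for tier, pb in CAPACITY_PB.items():
        print(f"{tier}: ${implied_cost_per_gb(pb):.2f}/GB")
```

Under these assumptions the implied figures are $2.00/GB for SAN and $1.00/GB for NAS, the low end of the quoted ranges, and $0.10/GB for local storage, which is higher than the $0.05/GB quoted. The slide does not say how the two local-storage figures relate.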
Storage Economics: Traditional vs. Scale-out

[Chart: cost per GB ($0.50 to $5.50) against petabytes deployed (0.5 to 128) for three categories: Traditional SAN/NAS; Scale-out NAS (Isilon); Distributed Object Storage (HDFS, MAPR, CEPH, Gluster, Scality)]

Big Data Storage Architectures
•  External SAN with HDFS
•  Local disks with HDFS or other (HDFS, CEPH, MAPR, Gluster, Scality, …)
•  External scale-out NAS

Features from New Storage Solutions

•  Snapshots
•  Universal File Store
•  Clones
•  Geo Replication
•  Erasure Coding
•  NFS Access
•  POSIX Support
•  QoS Controls
•  SSD Capability

Summary

Customers Winning from Consolidated Big Data Platforms
“Dedicated hardware makes no sense”
“Software-defined Datacenter enables rapid deployment, multiple tenants and labs”
“Our mixed workloads include Hadoop, Database, ETL and App-servers”
“Any performance penalties are minor”

Compute
Management
Storage/Availability
Network/Security

Cloud Infrastructure is Ready for Big Data – Are you?

Cloud Infrastructure

Is Your Cloud Ready for Big Data?
Richard McDougall
CTO, Storage and Application Services
@richardmcdougll

 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 

Último (20)

Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 

Is your cloud ready for Big Data? Strata NY 2013

  • 1. Is Your Cloud Ready for Big Data? Richard McDougall, CTO, Storage and Application Services. © 2009 VMware Inc. All rights reserved.
  • 2. Not Just for the Web Giants – The Intelligent Enterprise
  • 3. Real-time analysis allows instant understanding of market dynamics. Retailers can have an intimate understanding of their customers' needs and use direct targeted marketing. Market Segment Analysis → Personalized Customer Targeting
  • 4. The Emerging Pattern of Big Data Systems: Retail Example. [Diagram labels: Data Science, Real-Time Processing, Real-Time Streams, Machine Learning, Parallel Data Processing, Exa-scale Data Store, Cloud Infrastructure]
  • 5. Storage: Plan for Peta-scale Data Storage and Processing. Analytics Rapidly Outgrows Traditional Data Size by 100x. [Chart: PB of data (log scale, 0.01 to 1000) by year (2000 to 2015); series: Online Apps, Analytics]
  • 6. Unprecedented Scale. We are creating an Exabyte of data every minute in 2013. Yottabyte by 2030. “Data transparency, amplified by Social Networks generates data at a scale never seen before” - The Human Face of Big Data
  • 7. A single GE Jet Engine produces 10 Terabytes of data in one hour – 90 Petabytes per year. Enabling early detection of faults, common mode failures, product engineering feedback. Post Mortem → Proactively Maintained Connected Product
  • 8. The Emerging Pattern of Big Data Systems: Manufacturing. [Diagram labels: Real-Time Sensor Analytics, Product Engineering, Support, Data Science, Real-Time Processing, Machine Learning, Parallel Data Processing, Exa-scale Data Store, Cloud Infrastructure]
  • 9. Cloud Platform
  • 10. Cloud Platform: Supporting Mixed Big Data Workloads. Workloads: Machine Learning, Real-Time Analytics, Hadoop. Cloud Infrastructure: Compute, Management, Storage/Availability, Network/Security
  • 11. Cloud Platform: Supporting Multiple Tenants. Tenants: Web User Analytics, Historical Customer Behavior, Financial Analysis. Cloud Infrastructure: Compute, Management, Storage/Availability, Network/Security
  • 12. What if you can… One physical platform to support multiple virtual big data clusters. [Diagram labels: Recommendation engine (Production, Test, Experimentation); Ad targeting (Production, Test, Experimentation); Experimentation, Test/Dev, Production recommendation engine, Production Ad Targeting]
  • 13. Values of a Cloud Platform for Big Data: (1) Agility / Rapid deployment; (2) Isolation for resource control and security; (3) Lower Capex; (4) Operational efficiency. Cloud Infrastructure: Compute, Management, Storage/Availability, Network/Security
  • 14. Hadoop as a Service. Operational Simplicity: rapid deployment; one click to set up; one-stop command center; easy to configure/reconfigure. High Availability: high availability for entire Hadoop stack; battle-tested. Elasticity & Multi-tenancy: shrink and expand cluster on demand; independent scaling of compute and data; strong multi-tenancy
  • 15. Self Service Access to Big Data Environments: Templates for Different Cloud Users. Developer: 3 Hadoop nodes; Cloudera, Pivotal, MapR; Small VM; Local storage; No HA; … Data Scientist: 5 Hadoop nodes; Cloudera, Pivotal; Hive, Pig; Medium VM; HA; … High priority: 50 Hadoop nodes; Cloudera; Hive, Pig; Large VM; HA; …
  • 16. Big Data needs a Mix of Workloads. Compute layer: Hadoop (batch analysis); HBase (real-time queries); NoSQL (Cassandra, Mongo, etc.); Big SQL (Impala, Pivotal HawQ); Other (Spark, Shark, Solr, Platfora, etc.). File System/Data Store. Platform Virtualization Technology. Hosts
  • 17. Strong Isolation between Workloads is Key. [Diagram labels: Hungry Workload 1, Reckless Workload 2, Nosy Workload 3, Virtualization Platform]
  • 18. Community activity in Isolation and Resource Management. YARN: goal is to support workloads other than M-R on Hadoop; initial need is for MPI/M-R from Yahoo; non-POSIX file system self-selects workload types. Mesos: distributed resource broker; mixed workloads with some RM; active project, in use at Twitter; leverages OS virtualization, e.g. cgroups. Virtualization: virtual machine as the primary isolation, resource management and versioned deployment container; basis for Project Serengeti
  • 19. Use case: Elastic Hadoop with Tiered SLA. Production workloads have high priority; experimentation workloads have lower priority. [Diagram labels: Experimentation MapReduce, Production MapReduce, Production recommendation engine, Compute layer, Dynamic resource pool, Compute VMs, Data layer, vSphere]
  • 20. Cloud Enabled Auto-elastic Hadoop. [Diagram labels: Hadoop Compute VMs, Hadoop HDFS VMs (DATA VM), Non-Hadoop VMs, three ESX hosts, Local Disks, SAN/NAS] Legend: JT: JobTracker; TT: TaskTracker; NN: NameNode; VHM: Virtual Hadoop Manager
  • 21. Hadoop Performance with Virtualization. Test setup: 32 hosts, 3.6GHz, 8 cores, 15K RPM 146GB SAS disks, 10GbE, 72-96GB RAM. [Chart; lower is better] [http://www.vmware.com/resources/techresources/10360, Jeff Buell, Apr 2013]
  • 22. Network Platform
  • 23. A Typical Network Architecture. [Diagram: 2 Aggregating Switches, 4 L2 Switches, 4 Top of Rack Switches, 16 Hosts]
  • 24. Traditional Networks: Core Switch is the Choke Point. Network Topology: Core, Aggregation, Rack, Hosts. Modeled Bandwidth: Non Uniform Bandwidth (labels: 10s of Gbits, 100s of Gbits)
  • 25. Modern Networks: Great for Big Data. Network Topology: Spine, Leaf, Hosts. Modeled Bandwidth: Uniform Bandwidth
  • 26. Flat Networks Allow for New Infrastructure Models: Converged Storage; Separated Storage. [Diagram labels: Top of Rack Switches, Hosts, Compute, Storage]
  • 27. Storage Platform
  • 28. Use Local Disk where it’s Needed. SAN Storage: $2 - $10/Gigabyte; $1M gets 0.5 Petabytes, 200,000 IOPS, 8 Gbyte/sec. NAS Filers: $1 - $5/Gigabyte; $1M gets 1 Petabyte, 200,000 IOPS, 10 Gbyte/sec. Local Storage: $0.05/Gigabyte; $1M gets 10 Petabytes, 400,000 IOPS, 250 Gbytes/sec
  • 29. Storage Economics: Traditional vs. Scale-out. [Chart: cost per GB ($0.50 to $5.50) by petabytes deployed (0.5 to 128); series: Traditional SAN/NAS; Scale-out NAS (Isilon); Distributed Object Storage (HDFS, MAPR, CEPH, Gluster, Scality)]
  • 30. Big Data Storage Architectures: External SAN with HDFS; Local Disks with HDFS or other (HDFS, CEPH, MAPR, Gluster, Scality, …); External Scale-out NAS
  • 31. Features from New Storage Solutions: Snapshots, Universal File Store, Clones, Geo Replication, Erasure Coding, NFS Access, POSIX Support, QoS Controls, SSD Capability
  • 32. Summary
  • 33. Customers Winning from Consolidated Big Data Platforms. “Dedicated hardware makes no sense” “Software-defined Datacenter enables rapid deployment multiple tenants and labs” “Our mixed workloads include Hadoop, Database, ETL and App-servers” “Any performance penalties are minor” Cloud Infrastructure: Compute, Management, Storage/Availability, Network/Security
  • 34. Cloud Infrastructure is Ready for Big Data – Are you?
  • 35. Is Your Cloud Ready for Big Data? Richard McDougall, CTO, Storage and Application Services, @richardmcdougll
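
As a rough cross-check of the storage figures on slide 28, the sketch below derives the cost per gigabyte implied by each "$1M gets" capacity. It assumes decimal units (1 PB = 1,000,000 GB); the dictionary and function names are illustrative and do not come from the deck.

```python
# Back-of-envelope check of slide 28: implied $/GB for each storage tier.
# Assumption: decimal units, 1 PB = 1_000_000 GB. All names are illustrative.

GB_PER_PB = 1_000_000
BUDGET_USD = 1_000_000  # the "$1M" budget quoted on the slide

# Capacity (in PB) that $1M buys, as quoted on the slide.
CAPACITY_PB = {
    "SAN Storage": 0.5,
    "NAS Filers": 1.0,
    "Local Storage": 10.0,
}


def implied_cost_per_gb(capacity_pb: float, budget_usd: float = BUDGET_USD) -> float:
    """Return the dollars per gigabyte implied by a budget and a capacity."""
    return budget_usd / (capacity_pb * GB_PER_PB)


if __name__ == "__main__":
    for tier, capacity in CAPACITY_PB.items():
        print(f"{tier}: ${implied_cost_per_gb(capacity):.2f}/GB")
```

On these figures, SAN and NAS land at the low end of the slide's quoted $2 - $10 and $1 - $5 per-gigabyte ranges, while local storage works out to $0.10/GB against the $0.05/GB quoted for it; the deck does not say how those two local-storage numbers relate.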