HIGH PERFORMANCE HARDWARE FOR DATA ANALYSIS
Michael Pittaro
Michael_Pittaro@dell.com
OPEN DATA SCIENCE CONFERENCE, BOSTON 2015
@opendatasci
WWW.SLIDESHARE.NET/LHRC_MIKEYP
WWW.GITHUB.COM/LHRC-MIKEYP
@pmikeyp
mikeyp@acm.org
3
About This Talk
• We can’t cover everything about hardware in a 30-minute session.
• We can go deep enough to help you
– Understand tradeoffs and balanced architectures
– Ask the right questions about choices
– Learn from what others are doing
• My Approach Today
1. Why look at high performance hardware?
2. Look at a production cluster design
3. Look at the choices and tradeoffs behind the scenes
4
Why consider High Performance Hardware?
• Choice of hardware can have large impacts
– On performance
– On budget
• Understanding the hardware helps with the software
– Scalable and parallel systems deal with both
• Data is heavy
– Local clusters are persistent
– Large data transfer may not be a viable option.
• Cloud hosting may not be an option
– You can’t or won’t delegate critical infrastructure to third parties.
– You need every bit of performance you can get.
5
Choices, Choices - The Hardware Toolbox
• Servers
• Processors
• Memory
• Lack of Trusted Information
• Jargon
• Disk Drives
• Networking
6
What the Customer Wants
• Performance
• Reliability
• Predictability
• Cost
• Management
• Proven Solutions
• Tested Configurations
7
Reference Architectures Fill The Gap
• Tested Server Configurations
• Tested Network Configurations
• Recommended Software Configuration
– Application and Workload Software
– OS Infrastructure
– Operational Infrastructure
• Opinionated Point of View
– Based on real world experience
• Recommended starting point
– Customization is possible
8
The secret to a good architecture is balance
Price
Performance
Fault Zones
Application Workload
Software
9
Cluster Architecture
• The Dell In-Memory Appliance for Cloudera Enterprise
10
Dell In-Memory Appliance – Summary Specs
Cluster            Starter    Mid-Size   Small Enterprise   Maximum
Data Nodes         4          12         20                 44
Total Memory       1536 GB    4608 GB    7680 GB            16896 GB
Total Storage      176 TB     528 TB     880 TB             2112 TB
Processing Cores   80         240        400                880
Racks (42U)        1          2          2                  4

Data Node Characteristic   Configuration
Server                     Dell R720xd (2 Rack Units)
Processor                  Two Intel Xeon E5-2670v2 2.5GHz, 25M Cache, 10 Core
Memory                     384GB
Memory Speed               1866 MT/s DRAM
Disks                      12 x 4TB SATA, 3.0 Gbps (48 TB)
Networking                 Dual 10GbE interfaces, with active bonding
Management Network         Two x 1GbE interfaces
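The memory and core totals in the summary follow from the per-node figures by multiplication. As a minimal sketch (the `DataNode` class and `cluster_totals` function are hypothetical names introduced here for illustration, not part of any Dell or Cloudera tooling), the totals for each cluster size can be recomputed as below. The storage value it returns is raw disk capacity (12 x 4 TB per node), which may differ from a summary figure that accounts for capacity in another way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    """Per-node figures taken from the data node configuration above."""
    memory_gb: int = 384
    sockets: int = 2
    cores_per_socket: int = 10
    disks: int = 12
    disk_tb: int = 4


def cluster_totals(nodes: int, node: DataNode = DataNode()) -> dict:
    """Scale the per-node figures to a cluster of `nodes` data nodes."""
    return {
        "memory_gb": nodes * node.memory_gb,
        "cores": nodes * node.sockets * node.cores_per_socket,
        "raw_storage_tb": nodes * node.disks * node.disk_tb,
    }


if __name__ == "__main__":
    # Data node counts for the four cluster sizes in the summary.
    for nodes in (4, 12, 20, 44):
        print(nodes, cluster_totals(nodes))
```

Keeping the per-node figures in one place makes it easy to see how a different DIMM or drive choice changes the cluster totals.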
11
Server Examples
M1000e Blade Chassis (10U)
4 Socket R920 (4U)
2 Socket R730xd (2U)
12
Server Choices
• 4 Socket Servers (e.g. Dell R920)
– Optimized for enterprise applications - Large RDBMS servers, SAP, SAP HANA, Microsoft Exchange
– Very large memory available (6 TB)
– Often use direct or network attached storage
• ‘Blade’ Servers (e.g. Dell M620, M1000e Chassis)
– Pluggable Processor and Storage modules
– Backplane and Chassis have a lot of shared interconnect logic
– Flexibility for enterprise applications - Virtualization is popular
• 2 Socket Servers (e.g. Dell R620, R630, R720, R730)
– Many options available
– 1U and 2U chassis footprints
– Developed for Web Hosting and Large Scale-Out Clusters
– Dell Internal Storage – 12 x 3.5” drives, 24 x 2.5” drives (in chassis)
13
Selecting Processors
• Assume 1-1.5 Hadoop tasks per core
– Allows headroom for other processes
• Hyperthreading
– Enable for Hadoop, Spark
– For others: it depends
• Hadoop: aim for 1 core / disk spindle
• Impala: can handle more spindles and cores easily
• Spark: I/O depends on back end storage
• Faster processor is better
– Most Hadoop jobs are I/O bound, not processor bound
– Hadoop compression uses processor cycles
– Fewer cores with a faster clock is often a good tradeoff
– The Map / Reduce balance depends on actual workload
– It’s hard to optimize more without knowing the actual workload
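The tasks-per-core guideline translates directly into a per-node task budget. The sketch below is only an illustration: `hadoop_task_range` is a hypothetical helper, and it assumes that "core" means whatever core count you choose to pass in (the slide does not say whether hyperthreads count).

```python
from typing import Tuple


def hadoop_task_range(cores: int, low: float = 1.0, high: float = 1.5) -> Tuple[int, int]:
    """Return (min, max) concurrent Hadoop tasks for a node.

    Uses the 1 to 1.5 tasks-per-core rule of thumb; fractional results
    are rounded down so the node is never over-committed.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    if low > high:
        raise ValueError("low must not exceed high")
    return int(cores * low), int(cores * high)


if __name__ == "__main__":
    # A dual-socket node with 10 cores per socket, as in the appliance data node.
    print(hadoop_task_range(20))
```

Starting at the low end of the range and raising it after observing real jobs fits the point that the right balance depends on the actual workload.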
14
Intel Xeon Dual Socket Processor Architecture
Haswell CPU: Up to 18 cores
TDP: Up to 145 W (SVR); 160 W (WS)
Socket: Socket-R3
Scalability: 2S capability
Memory: 4 x DDR4 channels; 1333, 1600, 1866 (2 DPC), 2133 (1 DPC); RDIMM, LRDIMM
QPI: 2 x QPI 1.1 channels; 6.4, 8.0, 9.6 GT/s
PCIe: PCIe 3.0 (2.5, 5, 8 GT/s); PCIe Extensions: Dual Cast, Atomics; 40 x PCIe 3.0 lanes
[Diagram: two Intel® Xeon® processor E5-2600 v3 sockets connected by 2 QPI channels; DDR4 memory channels on each socket; PCIe 3.0, 40 lanes; Intel® C610 series chipset (WBG); LAN up to 4x10GbE]
15
Intel Processor Generations
Product             Xeon E5-2600               E5-2600 v2                 E5-2600 v3
Microarchitecture   Sandy Bridge               Ivy Bridge                 Haswell
Cores / Threads     8 / 16                     12 / 24                    18 / 36
Last Level Cache    Up to 20 MB                Up to 30 MB                Up to 45 MB
Max Memory Speed    1600 MT/s DDR3             1866 MT/s DDR3             2133 MT/s DDR4
QPI (GT/s)          2 channels; 6.4, 7.2, 8.0  2 channels; 6.4, 7.2, 8.0  2 channels; 6.4, 8.0, 9.6
Max DIMMs           12                         12                         12
Max Clock Speed     3.1 GHz / 3.8 GHz          3.7 GHz / 3.8 GHz          3.7 GHz / 3.8 GHz
Process Tech        32 nm                      22 nm                      22 nm
Year                2012                       2013                       2014
16
Selecting Memory
• DDR3 versus DDR4, RDIMM versus LRDIMM
– DDR3 is cheaper now, DDR4 is faster (15%)
• DIMM Sizes
– 8GB, 16GB, 32GB, 64GB, 128GB
• Sweet Spot Varies
– DDR4 around 32GB right now
• Balance the memory banks
– 4 memory channels per processor
– 4 x 16GB better than 2 x 32GB
• Server Class Memory
– It’s all ECC checked
– Dell Server BIOS options to optimize checking method
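The "4 x 16GB better than 2 x 32GB" advice comes down to populating every memory channel evenly. A rough way to check a proposed layout is sketched below; `dimm_layout` is a hypothetical helper, and the only hardware fact it relies on is the 4 channels per processor stated above.

```python
def dimm_layout(total_gb: int, dimm_gb: int, sockets: int = 2,
                channels_per_socket: int = 4) -> dict:
    """Describe how a memory configuration maps onto the memory channels.

    A layout is reported as balanced when the DIMMs divide evenly across
    all channels, so every channel carries the same number of DIMMs.
    """
    dimms, remainder = divmod(total_gb, dimm_gb)
    if remainder:
        raise ValueError("total_gb must be a multiple of dimm_gb")
    channels = sockets * channels_per_socket
    return {
        "dimms": dimms,
        "channels": channels,
        "populated_channels": min(dimms, channels),
        "balanced": dimms > 0 and dimms % channels == 0,
    }


if __name__ == "__main__":
    # 64 GB on one processor: four 16 GB DIMMs versus two 32 GB DIMMs.
    print(dimm_layout(64, 16, sockets=1))
    print(dimm_layout(64, 32, sockets=1))
```

With two 32 GB DIMMs only two of the four channels are populated, which is the case the slide warns against.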
17
Selecting Disks
• 3.5” Drives
– 3TB, 4TB, 6TB per drive
– Pricing sweet spot is 3TB
– Use enterprise grade drives, not consumer!
– SATA or SAS. SAS slightly faster.
– 3.0 Gb/sec is fine, 6.0 Gb/sec is a waste with spinning drives
• 2.5” Drives
– 800GB and 1.2 TB
– More expensive than 3.5” drives
– More spindles and performance
• SATA Solid State Drives
– 6.0 Gb/sec
– 2.5” and 1.8” options
– Expensive for now
– Not as deterministic as spindles
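The claim that a 6.0 Gb/sec link is wasted on a spinning drive can be checked with simple arithmetic. SATA uses 8b/10b line coding, so every payload byte costs 10 bits on the wire. The sustained drive rate used below (150 MB/s) is an assumption chosen for illustration and is not a figure from the talk; substitute the datasheet value for your drives.

```python
# Assumption for illustration only: sustained sequential rate of one spinning drive.
ASSUMED_HDD_MB_S = 150.0


def sata_payload_mb_s(line_rate_gbps: float) -> float:
    """Payload bandwidth of a SATA link in MB/s.

    With 8b/10b coding, 10 line bits carry one payload byte,
    so 1 Gb/s of line rate yields 100 MB/s of payload.
    """
    return line_rate_gbps * 1000.0 / 10.0


def link_headroom(line_rate_gbps: float, drive_mb_s: float = ASSUMED_HDD_MB_S) -> float:
    """How many times faster the link is than the drive behind it."""
    if drive_mb_s <= 0:
        raise ValueError("drive_mb_s must be positive")
    return sata_payload_mb_s(line_rate_gbps) / drive_mb_s


if __name__ == "__main__":
    for rate in (3.0, 6.0):
        print(rate, sata_payload_mb_s(rate), link_headroom(rate))
```

Under that assumption the 3.0 Gb/sec link already has headroom over a single spindle, so the faster link mainly matters for solid state drives.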
18
Spindle / Core / Storage Depth Optimization
• Hadoop scales processing and storage together
– The cluster grows by adding more data nodes
– The ratio of processor to storage is the main adjustment
• Generally, aim for a 1 spindle / 1 core ratio
– I/O is large blocks (64 MB to 256 MB)
– Primarily sequential read/write, very little random I/O
– 8 tasks will be reading or writing 8 individual spindles
• Drive Sizes and Types
– NL SAS or Enterprise SATA 6 Gb/sec
– Drive size is mainly a price decision
• Depth per node
– Up to 48 TB/node is common
– 112 TB/node is possible
– Consider how much data is ‘active’
– Very deep storage impacts recovery performance
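To see why very deep storage affects recovery, it helps to estimate how long the cluster needs to re-create the data held by a failed node. The sketch below is a back-of-the-envelope model, not a description of how HDFS schedules re-replication: `rereplication_hours` is a hypothetical helper, and both the fill fraction and the aggregate recovery bandwidth are assumed inputs that you would measure on your own cluster.

```python
def rereplication_hours(node_tb: float, fill_fraction: float,
                        aggregate_gb_per_s: float) -> float:
    """Estimate hours needed to re-create the data stored on one failed node.

    node_tb            raw storage depth of the node, in TB
    fill_fraction      share of that storage actually holding data (0 to 1)
    aggregate_gb_per_s assumed cluster-wide recovery bandwidth, in GB/s
    """
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must be between 0 and 1")
    if aggregate_gb_per_s <= 0:
        raise ValueError("aggregate_gb_per_s must be positive")
    data_gb = node_tb * 1000.0 * fill_fraction
    return data_gb / aggregate_gb_per_s / 3600.0


if __name__ == "__main__":
    # Compare the two depths mentioned above under the same assumed conditions.
    for depth_tb in (48, 112):
        print(depth_tb, round(rereplication_hours(depth_tb, 0.7, 1.0), 1))
```

The estimate scales linearly with depth, so at the same fill level and bandwidth a 112 TB node takes a little over twice as long to recover as a 48 TB node.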
19
PowerEdge C8000 Hadoop Scaling - 16 core Xeon
[Chart: (1) 12-spindle 3 TB configuration versus (3) 6-spindle 3 TB configuration. Series: Cores (1), Storage (1), IOPS (1), Storage (3), IOPS (3). Axis label: TB storage.]
20
Network Architecture – Layer 2 Switching
21
Network and Switches
• Simple Tree Structure
– Top of Rack (TOR) for each rack / group of nodes
– Racks feed up to a Cluster or Aggregation Switch
– All switching is at Layer 2 (Ethernet)
› No fancy routing or layer 3 (IP) packet inspection
– Most switches are 48 ports in this class
• Switch Characteristics
– Line rate switching at 10Gbps
– Deep buffers to handle bursts
– Virtual Link Trunking (VLT) – two switches act as one, with failover
– Uplinks are 40GbE
• High Availability and Performance
– Use two 10GbE links to alternate switches
– Bond at the Linux level into a single device
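With 10GbE node ports feeding 40GbE uplinks, a useful number to check is the oversubscription ratio of each top-of-rack switch. The sketch below uses the 48-port and 40GbE figures from the slide; the number of uplinks (4) is an assumption made here for illustration, and `oversubscription` is a hypothetical helper.

```python
def oversubscription(node_ports: int, node_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    """Ratio of node-facing bandwidth to uplink bandwidth on one switch.

    1.0 means the uplinks can carry all node traffic at line rate;
    higher values mean traffic leaving the rack can be throttled.
    """
    if uplinks <= 0 or uplink_gbps <= 0:
        raise ValueError("uplink count and speed must be positive")
    return (node_ports * node_gbps) / (uplinks * uplink_gbps)


if __name__ == "__main__":
    # 48 x 10GbE node ports; 4 x 40GbE uplinks is an assumed value.
    print(oversubscription(48, 10, 4, 40))
```

Traffic that stays inside the rack is switched at line rate, so the ratio only limits traffic that has to cross the aggregation layer.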
22
In the Wild – Dell Customer Hadoop Configurations
Model           Data Node Configuration                         Comments                                                 RA
R730xd          Dual socket, 12 cores, 24 x 2.5” spindles       Most popular platform for Hadoop
C8000           Dual socket, 16 cores, 16 x 3.5” spindles       Popular for deep/dense Hadoop applications
C6100 / C6105   Dual socket, 8/12 cores, 12 x 3.5” spindles     Two node version. C6100 is hardware EOL
C2100           Dual socket, 12 cores, 12 x 3.5” spindles       Popular, hardware EOL but often repurposed for Hadoop
R620            Dual socket, 8 cores, 10 x 2.5” spindles        1U form factor
C6220           Dual socket, 8 cores, 6 x 2.5” spindles         Core/spindle ratio is not ideal for Hadoop.
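The comment about the C6220 can be made concrete by computing the spindle-to-core ratio for each configuration. This is only a sketch: the core and spindle counts are copied from the table at face value (the slide does not say whether the core count is per socket or per node), and `rank_by_ratio` is a hypothetical helper.

```python
# (cores, spindles) as listed in the table above, taken at face value.
CONFIGS = {
    "R730xd": (12, 24),
    "C8000": (16, 16),
    "C6100/C6105 (8 cores)": (8, 12),
    "C6100/C6105 (12 cores)": (12, 12),
    "C2100": (12, 12),
    "R620": (8, 10),
    "C6220": (8, 6),
}


def spindles_per_core(cores: int, spindles: int) -> float:
    """Spindles available per core; about 1.0 is the Hadoop rule of thumb."""
    return spindles / cores


def rank_by_ratio(configs: dict) -> list:
    """Return (model, ratio) pairs sorted from fewest to most spindles per core."""
    ratios = {model: spindles_per_core(cores, spindles)
              for model, (cores, spindles) in configs.items()}
    return sorted(ratios.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for model, ratio in rank_by_ratio(CONFIGS):
        print(f"{model:24s} {ratio:.2f}")
```

The C6220 is the only entry that falls below one spindle per core, which matches the remark in its Comments column.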
23
What I haven’t talked about!
• GPUs
– Possible, not seen too often with Hadoop
• Ingest / Streaming
– Usually a custom configuration for high speed capture/loading (e.g. Kafka, Storm)
• Dell PowerEdge VRTX
– Designed as a ‘mini-blade’ for branch offices
– Could make a killer data science workstation
24
Download Links / References
• Dell.com/hadoop
– Hadoop Reference Architectures
– Optimizing PowerEdge Configurations for Hadoop
• Slideshare
– http://www.slideshare.net/lhrc-mikeyp
25
High Performance Hardware for Data Analysis
• Choosing hardware for big data analysis is difficult because of the many options and variables involved. The problem is more complicated when you need a full cluster for big data analytics.
• This session will cover the basic guidelines and architectural choices involved in choosing analytics hardware for Spark and Hadoop. I will cover processor core and memory ratios, disk subsystems, and network architecture. This is a practical, advice-oriented session, and will focus on performance and cost tradeoffs for many different options.

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 

High Performance Hardware for Data Analysis

  • 1. HIGH PERFORMANCE HARDWARE FOR DATA ANALYSIS. Michael Pittaro, Michael_Pittaro@dell.com. Open Data Science Conference, Boston 2015. @opendatasci
  • 2. WWW.SLIDESHARE.NET/LHRC_MIKEYP, WWW.GITHUB.COM/LHRC-MIKEYP, @pmikeyp, mikeyp@acm.org
  • 3. About This Talk
      • We can’t cover everything about hardware in a 30-minute session.
      • We can go deep enough to help you
          – Understand tradeoffs and balanced architectures
          – Ask the right questions about choices
          – Learn from what others are doing
      • My Approach Today
          1. Why look at high performance hardware?
          2. Look at a production cluster design
          3. Look at the choices and tradeoffs behind the scenes
  • 4. Why consider High Performance Hardware?
      • Choice of hardware can have large impacts
          – On performance
          – On budget
      • Understanding the hardware helps with the software
          – Scalable and parallel systems deal with both
      • Data is heavy
          – Local clusters are persistent
          – Large data transfer may not be a viable option.
      • Cloud hosting may not be an option
          – You can’t or won’t delegate critical infrastructure to third parties.
          – You need every bit of performance you can get.
  • 5. Choices, Choices - The Hardware Toolbox: Servers, Processors, Memory, Disk Drives, Networking, Jargon, Lack of Trusted Information
  • 7. Reference Architectures Fill The Gap
      • Tested Server Configurations
      • Tested Network Configurations
      • Recommended Software Configuration
          – Application and Workload Software
          – OS Infrastructure
          – Operational Infrastructure
      • Opinionated Point of View
          – Based on real world experience
      • Recommended starting point
          – Customization is possible
  • 8. The secret to a good architecture is balance: Price, Performance, Fault Zones, Application Workload, Software
  • 9. Cluster Architecture
      • The Dell In-Memory Appliance for Cloudera Enterprise
  • 10. Dell In-Memory Appliance – Summary Specs

      Cluster             Starter    Mid-Size   Small Enterprise   Maximum
      Data Nodes          4          12         20                 44
      Total Memory        1536 GB    4608 GB    7680 GB            26896 GB
      Total Storage       176 TB     528 TB     880 TB             2112 TB
      Processing Cores    80         280        400                880
      Racks (42U)         1          2          2                  4

      Data Node Characteristic   Configuration
      Server                     Dell R720xd (2 Rack Units)
      Processor                  Two Intel Xeon E5-2670v2 2.5GHz, 25M Cache, 10 Core
      Memory                     384GB
      Memory Speed               1866 MT/s DRAM
      Disks                      12 x 4TB SATA, 3.0 Gbps (48 TB)
      Networking                 Dual 10GbE interfaces, with active bonding
      Management Network         Two x 1GbE interfaces
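A sizing table like this one can be cross-checked by multiplying the per-node specification by the node count. The sketch below does that with the data node figures from the slide. The class and function names are hypothetical, and the storage figure is raw disk capacity on data nodes only. Some totals on the slide differ from this straight multiplication (for example, 12 nodes give 240 cores rather than the listed 280), so treat the output as a sanity check rather than a corrected table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    """Per-node figures taken from the data node table on the slide."""
    cores: int = 20        # two 10-core Xeon E5-2670v2 processors
    memory_gb: int = 384
    raw_disk_tb: int = 48  # 12 x 4 TB SATA drives


def cluster_totals(data_nodes: int, node: DataNode = DataNode()) -> dict:
    """Multiply the per-node specification by the number of data nodes."""
    return {
        "data_nodes": data_nodes,
        "cores": data_nodes * node.cores,
        "memory_gb": data_nodes * node.memory_gb,
        "raw_storage_tb": data_nodes * node.raw_disk_tb,
    }


if __name__ == "__main__":
    sizes = [("Starter", 4), ("Mid-Size", 12), ("Small Enterprise", 20), ("Maximum", 44)]
    for name, count in sizes:
        print(name, cluster_totals(count))
```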
  • 11. Server Examples: M1000e Blade Chassis (10U), 4 Socket R920 (4U), 2 Socket R730xd (2U)
  • 12. Server Choices
      • 4 Socket Servers (e.g. Dell R920)
          – Optimized for enterprise applications - Large RDBMS servers, SAP, SAP HANA, Microsoft Exchange
          – Very large memory available (6 TB)
          – Often use direct or network attached storage
      • ‘Blade’ Servers (e.g. Dell M620, M1000e Chassis)
          – Pluggable Processor and Storage modules
          – Backplane and Chassis have a lot of shared interconnect logic
          – Flexibility for enterprise applications - Virtualization is popular
      • 2 Socket Servers (e.g. Dell R620, R630, R720, R730)
          – Many options available
          – 1U and 2U chassis footprints
          – Developed for Web Hosting and Large Scale-Out Clusters
          – Dell Internal Storage – 12 x 3.5” drives, 24 x 2.5” drives (in chassis)
  • 13. Selecting Processors
      • Assume 1-1.5 Hadoop tasks per core
          – Allows headroom for other processes
      • Hyperthreading
          – Enable for Hadoop, Spark
          – For others: it depends
      • Hadoop: aim for 1 core / disk spindle
      • Impala: can handle more spindles and cores easily
      • Spark: I/O depends on back end storage
      • Faster processor is better
          – Most Hadoop jobs are I/O bound, not processor bound
          – Hadoop compression uses processor cycles
          – Fewer cores with a faster clock is often a good tradeoff
          – The Map / Reduce balance depends on actual workload
          – It’s hard to optimize more without knowing the actual workload
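The "1-1.5 Hadoop tasks per core" guideline can be turned into a concurrent-task range for a node. This is a minimal sketch: the function name is hypothetical, and it assumes the guideline refers to physical cores, which the slide does not state.

```python
import math


def task_slot_range(cores: int, low: float = 1.0, high: float = 1.5) -> tuple[int, int]:
    """Return the (minimum, maximum) concurrent Hadoop tasks for a node.

    `low` and `high` are the tasks-per-core bounds from the slide. Whether
    `cores` should count physical cores or hyperthreads is an assumption
    left to the caller.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    if not 0 < low <= high:
        raise ValueError("expected 0 < low <= high")
    return math.floor(cores * low), math.floor(cores * high)


if __name__ == "__main__":
    # A dual-socket node with two 10-core processors.
    print(task_slot_range(20))
```

For a 20-core data node this gives between 20 and 30 concurrent tasks; starting at the low end leaves the headroom for other processes that the slide recommends.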
  • 14. Intel Xeon Dual Socket Processor Architecture (Haswell)

      CPU           Up to 18 cores; TDP: up to 145 W (SVR), 160 W (WS)
      Socket        Socket-R3
      Scalability   2S capability
      Memory        4 x DDR4 channels; 1333, 1600, 1866 (2 DPC), 2133 (1 DPC); RDIMM, LRDIMM
      QPI           2 x QPI 1.1 channels; 6.4, 8.0, 9.6 GT/s
      PCIe          PCIe 3.0 (2.5, 5, 8 GT/s); PCIe Extensions: Dual Cast, Atomics; 40 x PCIe* 3.0

      [Block diagram labels: two Intel® Xeon® processor E5-2600 v3 sockets, QPI (2 channels), DDR4 memory channels, PCIe* 3.0 (40 lanes), Intel® C610 series chipset (WBG), LAN up to 4x10GbE]
  • 15. Intel Processor Generations

      Product             Xeon E5-2600        E5-2600 V2          E5-2600 V3
      Microarchitecture   SandyBridge         IvyBridge           Haswell
      Cores / Threads     8 / 16              12 / 24             18 / 36
      Last Level Cache    Up to 20 MB         Up to 30 MB         Up to 45 MB
      Max Memory Speed    1600 MT/s DDR3      1866 MT/s DDR3      2133 MT/s DDR4
      QPI channels        2                   2                   2
      QPI (GT/s)          6.4, 7.2, 8.0       6.4, 7.2, 8.0       6.4, 8.0, 9.6
      Max DIMMs           12                  12                  12
      Max Clock Speed     3.1 GHz / 3.8 GHz   3.7 GHz / 3.8 GHz   3.7 GHz / 3.8 GHz
      Process Tech        32nm                22nm                22nm
      Year                2012                2013                2014
  • 16. Selecting Memory
      • DDR3 versus DDR4, RDIMM versus LRDIMM
          – DDR3 is cheaper now, DDR4 is faster (15%)
      • DIMM Sizes
          – 8GB, 16GB, 32GB, 64GB, 128GB
      • Sweet Spot Varies
          – DDR4 around 32GB right now
      • Balance the memory banks
          – 4 memory channels per processor
          – 4 x 16GB better than 2 x 32GB
      • Server Class Memory
          – It’s all ECC checked
          – Dell Server BIOS options to optimize checking method
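The advice to balance the memory banks follows from how peak bandwidth scales with populated channels. The sketch below is a theoretical upper bound only: it assumes a 64-bit (8-byte) data path per channel and at most one DIMM counted per channel, and the function name is hypothetical. Real throughput will be lower and depends on the workload.

```python
def peak_memory_bandwidth_gbs(
    dimm_count: int,
    transfer_rate_mts: int,
    channels_per_cpu: int = 4,
    bytes_per_transfer: int = 8,
) -> float:
    """Theoretical peak memory bandwidth for one processor, in GB/s.

    Only populated channels contribute, so two large DIMMs leave half of a
    four-channel processor's bandwidth unused.
    """
    populated_channels = min(dimm_count, channels_per_cpu)
    return populated_channels * transfer_rate_mts * bytes_per_transfer / 1000


if __name__ == "__main__":
    # Same 64 GB capacity per processor, different channel population.
    print("4 x 16GB:", peak_memory_bandwidth_gbs(4, 2133))
    print("2 x 32GB:", peak_memory_bandwidth_gbs(2, 2133))
```

Both layouts give the same capacity, but the four-DIMM layout has twice the theoretical peak bandwidth because all four channels are in use.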
  • 17. Selecting Disks
      • 3.5” Drives
          – 3TB, 4TB, 6TB per drive
          – Pricing sweet spot is 3TB
          – Use enterprise grade drives, not consumer!
          – SATA or SAS. SAS slightly faster.
          – 3.0 Gb/sec is fine; 6.0 Gb/sec is a waste with spinning drives
      • 2.5” Drives
          – 800GB and 1.2 TB
          – More expensive than 3.5” drives
          – More spindles and performance
      • SATA Solid State Drives
          – 6.0 Gb/sec
          – 2.5” and 1.8” options
          – Expensive for now
          – Not as deterministic as spindles
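The point about SATA link speed and spinning drives can be checked with simple arithmetic. SATA uses 8b/10b encoding, so a 3.0 Gb/sec link carries about 300 MB/s of payload and a 6.0 Gb/sec link about 600 MB/s. The sustained drive throughput used below (150 MB/s) is an assumed, illustrative figure and not a number from the talk; substitute the value from the drive's data sheet.

```python
def sata_payload_mb_per_s(line_rate_gbps: float) -> float:
    """Usable SATA payload rate in MB/s after 8b/10b encoding overhead."""
    bits_per_second = line_rate_gbps * 1000 * 8 / 10  # megabits of payload
    return bits_per_second / 8


def link_utilization(drive_mb_per_s: float, line_rate_gbps: float) -> float:
    """Fraction of the link a single drive can fill at its sustained rate."""
    return drive_mb_per_s / sata_payload_mb_per_s(line_rate_gbps)


if __name__ == "__main__":
    assumed_drive_mb_per_s = 150.0  # assumption: illustrative spinning-drive rate
    for rate in (3.0, 6.0):
        share = link_utilization(assumed_drive_mb_per_s, rate)
        print(f"{rate} Gb/sec link: {share:.0%} used by one drive")
```

Under that assumption a single spinning drive uses only about half of a 3.0 Gb/sec link, so the faster link adds cost without adding throughput.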
  • 18. Spindle / Core / Storage Depth Optimization
      • Hadoop scales processing and storage together
          – The cluster grows by adding more data nodes
          – The ratio of processor to storage is the main adjustment
      • Generally, aim for a 1 spindle / 1 core ratio
          – I/O is large blocks (64 MB to 256 MB)
          – Primarily sequential read/write, very little random I/O
          – 8 tasks will be reading or writing 8 individual spindles
      • Drive Sizes and Types
          – NL SAS or Enterprise SATA 6 Gb/sec
          – Drive size is mainly a price decision
      • Depth per node
          – Up to 48 TB/node is common
          – 112 TB/node is possible
          – Consider how much data is ‘active’
          – Very deep storage impacts recovery performance
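The remark that very deep storage impacts recovery performance can be illustrated with a back-of-the-envelope estimate: when a data node fails, the data it held has to be copied again onto the surviving nodes. The model below is a deliberate simplification with hypothetical names. It assumes the work is spread evenly across survivors and that each survivor can devote a fixed bandwidth to recovery; both inputs are assumptions, not figures from the talk.

```python
def rereplication_hours(
    used_tb_on_failed_node: float,
    surviving_nodes: int,
    recovery_mb_per_s_per_node: float,
) -> float:
    """Rough time to re-create the replicas lost with one data node."""
    if surviving_nodes <= 0 or recovery_mb_per_s_per_node <= 0:
        raise ValueError("surviving_nodes and bandwidth must be positive")
    total_mb = used_tb_on_failed_node * 1_000_000
    aggregate_mb_per_s = surviving_nodes * recovery_mb_per_s_per_node
    return total_mb / aggregate_mb_per_s / 3600


if __name__ == "__main__":
    # Assumed scenario: 12-node cluster, one node lost, 100 MB/s per survivor.
    for depth_tb in (48, 112):
        hours = rereplication_hours(depth_tb, surviving_nodes=11,
                                    recovery_mb_per_s_per_node=100)
        print(f"{depth_tb} TB/node: about {hours:.1f} hours")
```

Recovery time grows linearly with node depth in this model, which is why the amount of data that is actually stored and active per node matters as much as the raw capacity.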
  • 19. PowerEdge C8000 Hadoop Scaling - 16 core Xeon: (1) 12 spindle 3 TB versus (3) 6 spindle 3 TB [Chart; axis label: TB Storage; series: Cores (1), Storage (1), IOPS (1), Storage (3), IOPS (3)]
  • 20. Network Architecture – Layer 2 Switching
  • 21. Network and Switches
      • Simple Tree Structure
          – Top of Rack (TOR) for each rack / group of nodes
          – Racks feed up to a Cluster or Aggregation Switch
          – All switching is at Layer 2 (Ethernet)
              › No fancy routing or layer 3 (IP) packet inspection
          – Most switches are 48 ports in this class
      • Switch Characteristics
          – Line rate switching at 10Gbps
          – Deep buffers to handle bursts
          – Virtual Link Trunking (VLT) – two switches act as one, with failover
          – Uplinks are 40GbE
      • High Availability and Performance
          – Use two 10GbE links to alternate switches
          – Bond at the Linux level into a single device
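With 10GbE node ports and 40GbE uplinks, a useful number to check for each top-of-rack switch is its oversubscription ratio: node-facing bandwidth divided by uplink bandwidth. The sketch below shows the arithmetic. The port and uplink counts in the example are hypothetical, since the talk does not say how many uplinks each switch uses.

```python
def oversubscription_ratio(
    node_ports: int,
    node_port_gbps: float,
    uplinks: int,
    uplink_gbps: float,
) -> float:
    """Ratio of node-facing bandwidth to uplink bandwidth on one switch."""
    if uplinks <= 0 or uplink_gbps <= 0:
        raise ValueError("at least one uplink with positive bandwidth is required")
    return (node_ports * node_port_gbps) / (uplinks * uplink_gbps)


if __name__ == "__main__":
    # Assumed layout: 40 node-facing 10GbE ports and 4 x 40GbE uplinks.
    ratio = oversubscription_ratio(node_ports=40, node_port_gbps=10,
                                   uplinks=4, uplink_gbps=40)
    print(f"Oversubscription: {ratio}:1")
```

A ratio of 1:1 means traffic leaving the rack is never limited by the uplinks; higher ratios trade cross-rack bandwidth for fewer uplink ports.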
  • 22. In the Wild – Dell Customer Hadoop Configurations

      Model           Data Node Configuration                       Comments
      RA R730Xd       Dual socket, 12 cores, 24 x 2.5” spindles     Most popular platform for Hadoop
      C8000           Dual socket, 16 cores, 16 x 3.5” spindles     Popular for deep/dense Hadoop applications
      C6100 / C6105   Dual socket, 8/12 cores, 12 x 3.5” spindles   Two node version. C6100 is hardware EOL
      C2100           Dual socket, 12 cores, 12 x 3.5” spindles     Popular, hardware EOL but often repurposed for Hadoop
      R620            Dual socket, 8 cores, 10 x 2.5” spindles      1U form factor
      C6220           Dual socket, 8 cores, 6 x 2.5” spindles       Core/spindle ratio is not ideal for Hadoop
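The core/spindle comment in this table can be made concrete by computing the ratio for each listed configuration. The sketch below uses the core and spindle counts exactly as they appear on the slide; whether the core figure is per socket or per node is not stated, so the absolute ratios are an assumption. For the C6100 / C6105 row the larger core count (12) is used.

```python
# (cores, spindles) as listed on the slide; interpretation of "cores" is assumed.
CONFIGS = {
    "R730xd": (12, 24),
    "C8000": (16, 16),
    "C6100 / C6105": (12, 12),
    "C2100": (12, 12),
    "R620": (8, 10),
    "C6220": (8, 6),
}


def cores_per_spindle(cores: int, spindles: int) -> float:
    """Cores available per disk spindle; about 1.0 matches the guideline."""
    return cores / spindles


def more_cores_than_spindles(configs: dict) -> list:
    """Models whose listed core count exceeds their spindle count."""
    return [model for model, (cores, spindles) in configs.items() if cores > spindles]


if __name__ == "__main__":
    for model, (cores, spindles) in CONFIGS.items():
        print(f"{model}: {cores_per_spindle(cores, spindles):.2f} cores per spindle")
    print("More cores than spindles:", more_cores_than_spindles(CONFIGS))
```

With the figures as listed, the C6220 is the only model with more cores than spindles, which is consistent with the comment in the table.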
  • 23. What I haven’t talked about!
      • GPUs
          – Possible, not seen too often with Hadoop
      • Ingest / Streaming
          – Usually a custom configuration for high speed capture/loading (e.g. Kafka, Storm)
      • Dell PowerEdge VRTX
          – Designed as a ‘mini-blade’ for branch offices
          – Could make a killer data science workstation
  • 24. Download Links / References
      • Dell.com/hadoop
          – Hadoop Reference Architectures
          – Optimizing PowerEdge Configurations for Hadoop
      • Slideshare
          – http://www.slideshare.net/lhrc-mikeyp
  • 25. High Performance Hardware for Data Analysis
      • Choosing hardware for big data analysis is difficult because of the many options and variables involved. The problem is more complicated when you need a full cluster for big data analytics.
      • This session will cover the basic guidelines and architectural choices involved in choosing analytics hardware for Spark and Hadoop. I will cover processor core and memory ratios, disk subsystems, and network architecture. This is a practical, advice-oriented session, and will focus on performance and cost tradeoffs for many different options.