1. HIGH PERFORMANCE HARDWARE FOR DATA ANALYSIS
Michael Pittaro
Michael_Pittaro@dell.com
Open Data Science Conference, Boston 2015
@opendatasci
3. About This Talk
• We can't cover everything about hardware in a 30-minute session.
• We can go deep enough to help you
– Understand tradeoffs and balanced architectures
– Ask the right questions about choices
– Learn from what others are doing
• My Approach Today
1. Why look at high performance hardware?
2. Look at a production cluster design
3. Look at the choices and tradeoffs behind the scenes
4. Why Consider High Performance Hardware?
• The choice of hardware can have a large impact
– On performance
– On budget
• Understanding the hardware helps with the software
– Scalable and parallel systems deal with both
• Data is heavy
– Local clusters are persistent
– Large data transfers may not be a viable option
• Cloud hosting may not be an option
– You can't or won't delegate critical infrastructure to third parties
– You need every bit of performance you can get
7. Reference Architectures Fill the Gap
• Tested server configurations
• Tested network configurations
• Recommended software configuration
– Application and workload software
– OS infrastructure
– Operational infrastructure
• Opinionated point of view
– Based on real-world experience
• Recommended starting point
– Customization is possible
8. The secret to a good architecture is balance
• Price
• Performance
• Fault zones
• Application workload
• Software
12. Server Choices
• 4-socket servers (e.g. Dell R920)
– Optimized for enterprise applications: large RDBMS servers, SAP, SAP HANA, Microsoft Exchange
– Very large memory available (6 TB)
– Often use direct or network attached storage
• "Blade" servers (e.g. Dell M620, M1000e chassis)
– Pluggable processor and storage modules
– Backplane and chassis have a lot of shared interconnect logic
– Flexibility for enterprise applications; virtualization is popular
• 2-socket servers (e.g. Dell R620, R630, R720, R730)
– Many options available
– 1U and 2U chassis footprints
– Developed for web hosting and large scale-out clusters
– Dell internal storage: 12 x 3.5" drives or 24 x 2.5" drives (in chassis)
13. Selecting Processors
• Assume 1 to 1.5 Hadoop tasks per core
– Allows headroom for other processes
• Hyperthreading
– Enable for Hadoop and Spark
– For others: it depends
• Hadoop: aim for 1 core per disk spindle
• Impala: can handle more spindles and cores easily
• Spark: I/O depends on the back-end storage
• A faster processor is better
– Most Hadoop jobs are I/O bound, not processor bound
– Hadoop compression uses processor cycles
– Fewer cores with a faster clock is often a good tradeoff
– The Map/Reduce balance depends on the actual workload
– It's hard to optimize further without knowing the actual workload
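The rules of thumb above can be turned into a rough sizing helper. This is a sketch under the stated assumptions (1 to 1.5 tasks per core, roughly 1 core per spindle); the function name and thresholds are illustrative, not from any vendor tool.

```python
# Rough per-node sizing based on the rules of thumb above.
# 1-1.5 tasks/core and ~1 core/spindle are guidelines, not hard limits.

def node_sizing(cores, spindles, tasks_per_core=1.25):
    """Estimate Hadoop task slots and flag core/spindle imbalance."""
    slots = int(cores * tasks_per_core)    # leaves headroom for other processes
    balanced = abs(cores - spindles) <= 2  # aim for roughly 1 core per spindle
    return {"task_slots": slots, "balanced": balanced}

# Example: a dual-socket node with 12 cores and 12 x 3.5" spindles
print(node_sizing(cores=12, spindles=12))
# -> {'task_slots': 15, 'balanced': True}
```

The takeaway is that slot count follows cores, while the spindle count caps how many of those slots can stream I/O concurrently.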
14. Intel Xeon Dual-Socket Processor Architecture
• Haswell CPU: up to 18 cores; TDP up to 145 W (server), 160 W (workstation)
• Socket: Socket R3; scalability: 2S capable
• Memory: 4 x DDR4 channels; 1333, 1600, 1866 MT/s (2 DPC), 2133 MT/s (1 DPC); RDIMM, LRDIMM
• QPI: 2 x QPI 1.1 channels at 6.4, 8.0, or 9.6 GT/s
• PCIe: 40 lanes of PCIe 3.0 (2.5, 5, 8 GT/s); extensions: Dual Cast, Atomics
• LAN: up to 4 x 10GbE
[Diagram: two Intel Xeon E5-2600 v3 processors linked by 2 QPI channels, each with 4 DDR4 channels and 40 PCIe 3.0 lanes, attached to an Intel C610 series chipset (WBG)]
15. Intel Processor Generations

Product           | Xeon E5-2600        | E5-2600 v2          | E5-2600 v3
Microarchitecture | Sandy Bridge        | Ivy Bridge          | Haswell
Cores / Threads   | 8 / 16              | 12 / 24             | 18 / 36
Last Level Cache  | Up to 20 MB         | Up to 30 MB         | Up to 45 MB
Max Memory Speed  | 1600 MT/s DDR3      | 1866 MT/s DDR3      | 2133 MT/s DDR4
QPI (GT/s)        | 2 ch: 6.4, 7.2, 8.0 | 2 ch: 6.4, 7.2, 8.0 | 2 ch: 6.4, 8.0, 9.6
Max DIMMs         | 12                  | 12                  | 12
Max Clock Speed   | 3.1 GHz / 3.8 GHz   | 3.7 GHz / 3.8 GHz   | 3.7 GHz / 3.8 GHz
Process Tech      | 32 nm               | 22 nm               | 22 nm
Year              | 2012                | 2013                | 2014
16. Selecting Memory
• DDR3 versus DDR4, RDIMM versus LRDIMM
– DDR3 is cheaper now; DDR4 is about 15% faster
• DIMM sizes
– 8 GB, 16 GB, 32 GB, 64 GB, 128 GB
• The sweet spot varies
– For DDR4, around 32 GB right now
• Balance the memory banks
– 4 memory channels per processor
– 4 x 16 GB is better than 2 x 32 GB
• Server-class memory
– It's all ECC checked
– Dell server BIOS options optimize the checking method
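The "balance the memory banks" rule can be sketched as a quick check: with 4 channels per socket, populate DIMMs in multiples of the channel count so every channel carries the same load. The function and sizes below are illustrative assumptions, not a BIOS rule.

```python
# Sketch of the memory-bank balance rule: spread DIMMs evenly
# across the 4 channels of each socket.

def dimm_plan(total_gb_per_socket, dimm_gb, channels=4):
    """Return a DIMM layout, or None if it can't be spread evenly."""
    count = total_gb_per_socket // dimm_gb
    if count * dimm_gb != total_gb_per_socket or count % channels != 0:
        return None  # uneven population leaves channels idle or mismatched
    return {"dimms": count, "per_channel": count // channels}

print(dimm_plan(64, 16))  # 4 x 16 GB: one DIMM on each of the 4 channels
print(dimm_plan(64, 32))  # 2 x 32 GB: can't populate all 4 channels -> None
```

This is why 4 x 16 GB beats 2 x 32 GB for the same capacity: the smaller DIMMs keep all four channels active.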
17. Selecting Disks
• 3.5" drives
– 3 TB, 4 TB, 6 TB per drive
– The pricing sweet spot is 3 TB
– Use enterprise-grade drives, not consumer!
– SATA or SAS; SAS is slightly faster
– A 3.0 Gb/sec link is fine; 6.0 Gb/sec is a waste with spinning drives
• 2.5" drives
– 800 GB and 1.2 TB
– More expensive than 3.5" drives
– More spindles and more performance
• SATA solid state drives
– 6.0 Gb/sec
– 2.5" and 1.8" options
– Expensive for now
– Not as deterministic as spindles
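Spindle counts translate directly into streaming bandwidth. A back-of-the-envelope estimate, assuming roughly 100 MB/s of sustained sequential I/O per 7.2K RPM spindle (a common planning figure, not a measured value for any specific drive):

```python
# Back-of-the-envelope aggregate node throughput from spindle count.

SEQ_MB_S_PER_SPINDLE = 100  # assumption; measure your actual drives

def node_io_mb_s(spindles, per_spindle=SEQ_MB_S_PER_SPINDLE):
    """Aggregate sequential bandwidth if all spindles stream at once."""
    return spindles * per_spindle

# 12 x 3.5" spindles ~ 1200 MB/s, roughly a full 10GbE link (~1250 MB/s)
print(node_io_mb_s(12))
# -> 1200
```

This is one reason the per-drive link speed matters so little: the spindle, not the 3.0 vs 6.0 Gb/sec interface, is the bottleneck.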
18. Spindle / Core / Storage Depth Optimization
• Hadoop scales processing and storage together
– The cluster grows by adding more data nodes
– The ratio of processor to storage is the main adjustment
• Generally, aim for a 1 spindle / 1 core ratio
– I/O is large blocks (64 MB to 256 MB)
– Primarily sequential read/write, very little random I/O
– 8 tasks will be reading or writing 8 individual spindles
• Drive sizes and types
– NL-SAS or enterprise SATA at 6 Gb/sec
– Drive size is mainly a price decision
• Depth per node
– Up to 48 TB/node is common
– 112 TB/node is possible
– Consider how much data is "active"
– Very deep storage impacts recovery performance
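When sizing storage depth, remember that raw disk is not usable HDFS capacity. A hedged sketch, assuming HDFS triple replication plus headroom for temp/shuffle space (the 25% headroom figure is an assumption, not a Hadoop default):

```python
# Hedged cluster-sizing sketch: usable HDFS capacity from raw disk,
# assuming 3x replication and some reserved headroom.

def usable_tb(nodes, tb_per_node, replication=3, headroom=0.25):
    """Approximate usable HDFS capacity in TB."""
    raw = nodes * tb_per_node
    return raw * (1 - headroom) / replication

# 10 nodes at 48 TB each -> 480 TB raw, ~120 TB of usable HDFS space
print(usable_tb(10, 48))
# -> 120.0
```

The 4:1 ratio between raw and usable capacity is worth keeping in mind when comparing node depths, and deeper nodes also lengthen re-replication time after a failure.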
21. Network and Switches
• Simple tree structure
– Top of Rack (TOR) switch for each rack / group of nodes
– Racks feed up to a cluster or aggregation switch
– All switching is at Layer 2 (Ethernet)
› No fancy routing or Layer 3 (IP) packet inspection
– Most switches in this class are 48 ports
• Switch characteristics
– Line-rate switching at 10 Gbps
– Deep buffers to handle bursts
– Virtual Link Trunking (VLT): two switches act as one, with failover
– Uplinks are 40GbE
• High availability and performance
– Use two 10GbE links to alternate switches
– Bond at the Linux level into a single device
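A useful number to compute for this tree topology is the TOR oversubscription ratio: node-facing bandwidth versus uplink bandwidth. The port counts below are an illustrative example for a 48-port 10GbE switch with 40GbE uplinks.

```python
# Sketch: TOR switch oversubscription for the tree topology above.

def oversubscription(ports, port_gbps, uplinks, uplink_gbps):
    """Ratio of downlink to uplink bandwidth; 1.0 means non-blocking."""
    return (ports * port_gbps) / (uplinks * uplink_gbps)

# 48 x 10GbE down, 2 x 40GbE up -> 6:1 oversubscribed at the TOR
print(oversubscription(48, 10, 2, 40))
# -> 6.0
```

Shuffle-heavy workloads feel this ratio directly when traffic crosses racks, which is one argument for keeping racks (and their TOR domains) reasonably self-contained.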
22. In the Wild: Dell Customer Hadoop Configurations

Model         | Data Node Configuration                     | Comments
R730xd        | Dual socket, 12 cores, 24 x 2.5" spindles   | Most popular platform for Hadoop
C8000         | Dual socket, 16 cores, 16 x 3.5" spindles   | Popular for deep/dense Hadoop applications
C6100 / C6105 | Dual socket, 8/12 cores, 12 x 3.5" spindles | Two-node version; C6100 is hardware EOL
C2100         | Dual socket, 12 cores, 12 x 3.5" spindles   | Popular; hardware EOL but often repurposed for Hadoop
R620          | Dual socket, 8 cores, 10 x 2.5" spindles    | 1U form factor
C6220         | Dual socket, 8 cores, 6 x 2.5" spindles     | Core/spindle ratio is not ideal for Hadoop
23. What I Haven't Talked About
• GPUs
– Possible, but not seen too often with Hadoop
• Ingest / streaming
– Usually a custom configuration for high-speed capture/loading (e.g. Kafka, Storm)
• Dell PowerEdge VRTX
– Designed as a "mini-blade" for branch offices
– Could make a killer data science workstation
25. High Performance Hardware for Data Analysis
• Choosing hardware for big data analysis is difficult because of the many options and variables involved. The problem is more complicated when you need a full cluster for big data analytics.
• This session will cover the basic guidelines and architectural choices involved in choosing analytics hardware for Spark and Hadoop. I will cover processor core and memory ratios, disk subsystems, and network architecture. This is a practical, advice-oriented session, and will focus on performance and cost tradeoffs for many different options.