At Twitter we started out with a large monolithic cluster that served most of our use-cases. As usage expanded and the cluster grew accordingly, we realized we needed to split the cluster by access pattern. This allows us to tune the access policy, SLA, and configuration for each cluster. We will explain our various use-cases, their performance requirements, and operational considerations, and how those are served by the corresponding clusters. We will also discuss what our baseline Hadoop node looks like. Various, sometimes competing, considerations such as storage size, disk IO, CPU throughput, fewer fast cores versus many slower cores, bonded 1 GbE network interfaces versus a single 10 GbE card, 1 TB, 2 TB, or 3 TB disk drives, and power draw all need to be weighed in a trade-off where cost and performance are the major factors. We will show how we arrived at quite different hardware platforms at Twitter, not only saving money but also increasing performance.
4. @Twitter#HadoopSummit2013
Scale
Scaling limits:
• JobTracker: 10's of thousands of jobs per day; 10's of thousands of concurrent slots
• Namenode: 250-300 M objects in a single namespace
• Namenode at ~100 GB heap -> full GC pauses
• Shipping job jars to 1,000's of nodes
• JobHistory server at a few 100's of thousands of job history/conf files
[Chart: # Nodes]
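To see why the namenode heap lands in the 100 GB range at this object count, here is a rough sizing sketch; the per-object heap cost is a rule-of-thumb assumption, not a figure from the talk.

```python
# Rough namenode heap sizing sketch. The per-object heap cost is a
# rule-of-thumb assumption (not a figure from the talk).
objects = 300e6            # files + blocks in the namespace (slide: 250-300 M)
bytes_per_object = 300     # assumed average heap cost per namespace object

heap_gb = objects * bytes_per_object / 2**30
print(f"Estimated namenode heap: ~{heap_gb:.0f} GB")   # ~84 GB, i.e. 100 GB-class heaps
```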
5. @Twitter#HadoopSummit2013
When / why to split clusters?
• In principle, preference for a single cluster
  • Common logs, shared free space, reduced admin burden, more rack diversity
• Varying SLAs
• Workload diversity
  • Storage intensive
  • Processing (CPU / disk IO) intensive
  • Network intensive
• Data access
  • Hot, Warm, Cold
8. @Twitter#HadoopSummit2013
Service criteria for hardware
• Hadoop does not need live HDD swap
• Twitter DC: no SLA on data nodes
• Rack SLA: only 1 rack down at any time in a cluster
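The rack SLA leans on HDFS keeping replicas on more than one rack, which in turn needs a topology script that maps datanodes to racks (wired in via net.topology.script.file.name on Hadoop 2). Below is a minimal sketch of such a script; the host-to-rack mapping is hypothetical, a real script would query the site's host inventory.

```python
#!/usr/bin/env python
# Minimal HDFS topology script sketch: the namenode passes datanode IPs or
# hostnames as arguments and expects one rack path per argument on stdout.
# The mapping below is hypothetical; a real script would query the site's
# host inventory.
import sys

RACK_OF = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.11": "/dc1/rack2",
}

print(" ".join(RACK_OF.get(host, "/default-rack") for host in sys.argv[1:]))
```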
9. @Twitter#HadoopSummit2013
Baseline Hadoop Server (~ early 2012)
[Diagram: dual E56xx sockets with 3 DIMMs each, PCH, GbE NIC, HBA + SAS expander]
Characteristics:
• Standard 2U server
• 20 servers / rack
• E5645 CPU (dual 6-core)
• 72 GB memory
• 12 x 2 TB HDD
• 2 x 1 GbE
Works for the general cluster, but...
• Need more density for storage
• Potential IO bottlenecks
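As a sanity check, the baseline column of the rack-level table later in the deck follows directly from these per-node specs; a quick sketch of the arithmetic:

```python
# Baseline rack arithmetic from the per-node spec above.
servers = 20
raw_tb        = servers * 12 * 2    # 12 x 2 TB HDD per node  -> 480 TB raw
dram_gb       = servers * 72        # 72 GB per node          -> 1,440 GB
cpu_sockets   = servers * 2         # dual-socket             -> 40 sockets
internal_gbps = servers * 2 * 1     # 2 x 1 GbE per node      -> 40 Gbps
print(raw_tb, dram_gb, cpu_sockets, internal_gbps)   # 480 1440 40 40
```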
10. @Twitter#HadoopSummit2013
Hadoop Server: Possible evolution
[Diagram: dual E5-26xx or E5-24xx sockets with 4 DIMMs each, HBA + SAS expander to 16 x 2T? / 16 x 3T? / 24 x 3T?, GbE NIC or 10GbE?]
Characteristics:
• + CPU performance
• ? 20 servers / rack
• Candidate for DW
Can deploy into the general DW cluster, but...
• Too much CPU for storage-intensive apps
• Server failure domain too large if we scale up disks
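A back-of-the-envelope sketch of the failure-domain concern: the scaled-up node carries three times the raw data of a baseline node, so losing one takes far longer to re-protect. The aggregate re-replication rate used below is an illustrative assumption, not a measured figure.

```python
# Failure-domain sketch: raw data per node, and a rough re-replication time.
baseline_node_tb = 12 * 2    # 24 TB per baseline node
scaled_node_tb   = 24 * 3    # 72 TB per scaled-up node (24 x 3T option)

rereplication_gbps = 10      # assumed aggregate re-replication rate (illustrative)
hours = scaled_node_tb * 8e12 / (rereplication_gbps * 1e9) / 3600
print(f"baseline node: {baseline_node_tb} TB raw, scaled-up node: {scaled_node_tb} TB raw")
print(f"~{hours:.0f} h to re-replicate {scaled_node_tb} TB at {rereplication_gbps} Gbps (assumed)")
```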
13. @Twitter#HadoopSummit2013
THS variant for Hadoop-Proc and HBase
[Diagram: single E3-12xx socket with 2 DIMMs, PCH, SAS HBA, 10GbE NIC]
Characteristics:
• + IO performance
• Few fast cores
• E3-1230 V2 CPU
• 32 GB memory
• 12 x 1 TB HDD
• SSD boot
• 1 x 10 GbE
Processing / throughput focus:
• Cost efficient (single socket, 1 TB drives)
• More disk and network IO per socket
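The "more disk and network IO per socket" point is easy to quantify from the two specs; a small comparison sketch:

```python
# Disk spindles and network bandwidth per CPU socket: baseline vs THS Proc.
configs = {
    "Baseline (dual E5645)": {"sockets": 2, "spindles": 12, "net_gbps": 2 * 1},
    "THS Proc (single E3)":  {"sockets": 1, "spindles": 12, "net_gbps": 10},
}
for name, c in configs.items():
    print(f"{name}: {c['spindles'] / c['sockets']:.0f} spindles/socket, "
          f"{c['net_gbps'] / c['sockets']:.0f} Gbps/socket")
# Baseline (dual E5645): 6 spindles/socket, 1 Gbps/socket
# THS Proc (single E3): 12 spindles/socket, 10 Gbps/socket
```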
14. @Twitter#HadoopSummit2013
THS for cold cluster
[Diagram: single E3-12xx socket with 2 DIMMs, PCH, SAS HBA, GbE NIC]
Characteristics:
• Disk efficiency
• Some compute
• E3-1230 V2 CPU
• 32 GB memory
• 12 x 3 TB HDD
• 2 x 1 GbE
Combination of the previous 2 use cases:
• Space & power efficient
• Storage dense, with some processing capabilities
15. @Twitter#HadoopSummit2013
Rack-level view (THS = Twitter Hadoop Server)

                      Baseline        THS Backups      THS Proc        THS Cold
Power                 ~8 kW           ~8 kW            ~8 kW           ~8 kW
CPU sockets; DRAM     40; 1,440 GB    40; 640 GB       40; 1,280 GB    40; 1,280 GB
Spindles; TB raw      240; 480 TB     480; 1,440 TB    480; 480 TB     480; 1,440 TB
Uplink; Internal BW   20; 40 Gbps     20; 80 Gbps      40; 400 Gbps    20; 80 Gbps

[Rack diagrams: 1 GbE ToR switches, with a 10 GbE ToR for the Proc rack]
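The THS columns derive directly from 40 single-socket nodes per rack; a quick sketch of the arithmetic for the Proc and Cold racks:

```python
# THS rack columns derived from 40 single-socket nodes per rack.
nodes = 40
proc = {"spindles": nodes * 12, "raw_tb": nodes * 12 * 1, "internal_gbps": nodes * 10}
cold = {"spindles": nodes * 12, "raw_tb": nodes * 12 * 3, "internal_gbps": nodes * 2 * 1}
print("Proc:", proc)   # 480 spindles, 480 TB raw, 400 Gbps internal
print("Cold:", cold)   # 480 spindles, 1,440 TB raw, 80 Gbps internal
```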
16. @Twitter#HadoopSummit2013
Processing performance comparison

Benchmark                                            Baseline Server    THS (-Cold)
TestDFSIO (write, replication = 1)                   360 MB/s / node    780 MB/s / node
TeraGen (30 TB, replication = 3)                     1:36 hrs           1:35 hrs
TeraSort (30 TB, replication = 3)                    6:11 hrs           4:22 hrs
2 parallel TeraSorts (30 TB each, replication = 3)   10:36 hrs          6:21 hrs
Application #1                                       4:37 min           3:09 min
Application set #2                                   13:3 hrs           10:57 hrs

Performance benchmark setup:
• Each cluster: 102 nodes of the respective type
• Efficient server = 3 racks, Baseline = 5+ racks
• "Dated" stack: CentOS 5.5, Sun 1.6 JRE, Hadoop 2.0.3
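For a rough sense of the gains, the speedups implied by the table (same node count on both sides, with the THS cluster fitting in 3 racks versus 5+):

```python
# Speedups implied by the benchmark table (h:mm run times).
def to_hours(hms):
    h, m = hms.split(":")
    return int(h) + int(m) / 60

runs = {
    "TeraSort 30 TB":       ("6:11", "4:22"),
    "2 parallel TeraSorts": ("10:36", "6:21"),
}
for name, (baseline, ths) in runs.items():
    print(f"{name}: {to_hours(baseline) / to_hours(ths):.2f}x faster on THS")
# TeraSort 30 TB: ~1.42x; 2 parallel TeraSorts: ~1.67x
```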
19. @Twitter#HadoopSummit2013
Recap
• At a certain scale it makes sense to split into multiple clusters
• For us: RT, PROC, DW, COLD, BACKUPS, TST, EXP
• For large enough clusters, depending on use-case, it may be worth choosing different HW configurations