1. Hadoop on a Personal Supercomputer
Paul Dingman – Chief Technologist, Integration Division
pdingman@pervasive.com
PERVASIVE DATA INNOVATION
2. Pervasive and Hadoop
• Pervasive Software develops software products to manage, integrate
and analyze data.
• Innovation Lab projects around big data include:
– Hadoop
• Accelerate MapReduce (DataRush Community Edition)
• High-speed add-ons for HBase, Avro, Hive (TurboRush)
• Augment Sqoop
• Enhance ETL capabilities
– Benchmarks
• Terasort
• TPC-H
• SIEM/LogAnalytics EPS
• Genomics
3. Why are many-core systems interesting?
• Many-core processors make it possible to concentrate large amounts
of processing power in a single machine. Coupled with newer storage
technologies, these systems can have high-speed access to tremendous
amounts of storage.
• We have done a lot of work with multi-core systems at Pervasive
Software. Our Pervasive DataRush™ Dataflow Engine takes advantage of
all available processor cores to efficiently process large volumes
of data:
– Analytics
– Data mining
– Genomics
• Potential cost and energy savings due to the need for fewer nodes.
• Potential performance gains by eliminating inter-node data exchange.
4. Pervasive DataRush™ Speed and Scalability
• World-record performance set running the Smith-Waterman algorithm
• Code written on an 8-core machine scaled to 384 cores with no changes!
5. Malstone-B10* Scalability
[Chart: Run-time for 10B rows vs. core count]
• 2 cores: 370.0 minutes
• 4 cores: 192.4 minutes (~3.2 hours)
• 8 cores: 90.3 minutes (~1.5 hours)
• 16 cores: 51.6 minutes (under 1 hour)
• 32 cores: 31.5 minutes
* Cyber security benchmark from the Open Cloud Consortium
6. How well does Hadoop work on many-core systems?
• One of the areas we wanted to explore was how well Hadoop works on
systems with many cores. In other words, is it possible to run Hadoop
in an environment where you can exploit the cores for complex
operations while still keeping the benefits of the distributed
environment provided by Hadoop and HDFS?
7. Master Node (NameNode/JobTracker)
Commodity box:
• 2 Intel Xeon L5310 CPUs, 1.6 GHz (8 cores)
• 16 GB local DRAM (ECC)
• 8 SATA hard disks (4 TB total; 8 × 500 GB local spindles)
• Mellanox ConnectX-2 VPI dual-port InfiniBand adapter
8. Slave Nodes (DataNode/TaskTracker)
• 4 AMD Opteron 6172 CPUs (48 cores)
• Supermicro motherboard
• 1 LSI 8-port HBA (6 Gb/s)
• 2 SATA SSDs (512 GB)
• 256 GB local DRAM (ECC)
• 32 SATA hard disks (64 TB total: 24 × 2 TB spindles for HDFS in
JBOD, plus 8 local spindles)
• Mellanox ConnectX-2 VPI dual-port InfiniBand adapter
9. Hadoop Cluster
[Diagram: one master node and two slave nodes, each with local DRAM
and 2 TB HDFS disks, interconnected via IPoIB]
• CentOS 5.6
• Hadoop (Cloudera CDH3u0)
• 104 cores (8/48/48)
• 128 TB storage (96 TB HDFS)
• 512 GB of memory
• 40 Gb InfiniBand interconnects (IPoIB)
10. Hadoop Tuning
• We worked from the bottom up.
– Linux (various kernels and kernel settings)
– File systems (EXT2, EXT3, EXT4)
– Drivers (HBA)
– JVMs
• Initial tests were done using a single “fat” node (same
configuration as the worker nodes).
– This made it easier to test different disk configurations.
• For Hadoop tests we primarily used 100 GB Terasort jobs; this
exercised all phases of the MapReduce process while not being too
large to run frequently (see the sketch below).
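As a rough illustration of the harness, here is a minimal sketch of driving one 100 GB Terasort round trip programmatically, assuming the TeraGen/TeraSort example classes shipped in the CDH3-era hadoop-examples jar; the HDFS paths are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.examples.terasort.TeraGen;
import org.apache.hadoop.examples.terasort.TeraSort;
import org.apache.hadoop.util.ToolRunner;

public class TerasortRun {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // 1,000,000,000 rows x 100 bytes/row = 100 GB of input data.
    ToolRunner.run(conf, new TeraGen(),
        new String[] {"1000000000", "/benchmarks/terasort-in"});
    // Sort the generated data; this exercises map, shuffle, and reduce.
    ToolRunner.run(conf, new TeraSort(),
        new String[] {"/benchmarks/terasort-in", "/benchmarks/terasort-out"});
  }
}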
11. Lessons Learned with Single-Node Tuning
• We found we could comfortably run 40 maps and 20 reducers given
memory and CPU constraints.
• Use a large block size for HDFS.
– Execution time for map tasks was around 1 minute with a 512 MB block size.
• More spindles are better.
– A 1:1 ratio of map tasks to local HDFS spindles works well.
– EXT2 seems to work well with JBOD.
• Dedicate spindles to temporary files on each worker node.
• Configure JVM settings with a larger heap size to avoid spills.
– Parallel GC seemed to help as well.
• Compression of map outputs is a huge win (LZO).
• HBase scales well on fat nodes with DataRush (>5M rows/sec bulk
load; >10M rows/sec sequential scan).
A configuration sketch capturing these settings follows.
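These lessons map onto standard CDH3 / Hadoop 0.20 configuration properties. A minimal sketch, assuming the old-API property names of that release; the heap size, sort-buffer size, and local directories are illustrative values, not the presenters' actual files.

import org.apache.hadoop.conf.Configuration;

public class FatNodeConf {
  // Builds a Configuration reflecting the fat-node lessons above.
  public static Configuration create() {
    Configuration conf = new Configuration();
    conf.set("dfs.block.size", String.valueOf(512L << 20));          // 512 MB HDFS blocks
    conf.set("mapred.tasktracker.map.tasks.maximum", "40");          // 40 map slots per node
    conf.set("mapred.tasktracker.reduce.tasks.maximum", "20");       // 20 reduce slots per node
    conf.set("mapred.local.dir",
        "/mnt/tmp1/mapred,/mnt/tmp2/mapred");                        // dedicated temp spindles (hypothetical mounts)
    conf.set("mapred.compress.map.output", "true");                  // compress intermediate map outputs
    conf.set("mapred.map.output.compression.codec",
        "com.hadoop.compression.lzo.LzoCodec");                      // LZO codec
    conf.set("mapred.child.java.opts", "-Xmx2g -XX:+UseParallelGC"); // larger heaps, parallel GC
    conf.set("io.sort.mb", "512");                                   // larger sort buffer to reduce spills
    return conf;
  }
}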
12. Varying Spindles for HDFS
[Chart: Terasort average execution time (seconds) vs. number of
2 TB HDFS disks: 8, 16, 24, 32, 40, 48]
13. Varying Spindles for Intermediate Outputs
[Chart: Terasort average execution time (seconds) vs. drives used
for intermediate map output: 4 × 2 TB, 8 × 2 TB, 16 × 2 TB,
Fusion I/O drive, flash RAID 0 (4 × 2 TB)]
15. Clustering the Nodes
• We had a total of 64 hard disks for the cluster and had to split them
between the two nodes.
• Installed and configured the OpenFabrics OFED stack to enable IPoIB.
• Reconfigured Hadoop to cluster the nodes (see the sketch below).
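A minimal sketch of the pointer settings involved, using the old-API property names from CDH3; the host name "master", the port numbers, and the replication factor are assumptions for illustration, not the presenters' configuration.

import org.apache.hadoop.conf.Configuration;

public class ClusterConf {
  // Points HDFS and MapReduce daemons/clients at the master node.
  public static Configuration create() {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://master:8020"); // NameNode, reached over IPoIB
    conf.set("mapred.job.tracker", "master:8021");     // JobTracker on the master
    conf.set("dfs.replication", "2");                  // assumption: only two DataNodes available
    return conf;
  }
}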
17. Comparisons with Amazon Clusters
• The Amazon clusters were used to get a better idea of what to expect
using more conventionally sized Hadoop nodes (non-EMR).
• We used 'Cluster Compute Quadruple Extra Large' instances
– 23 GB of memory
– 33.5 EC2 Compute Units (dual Intel Xeon X5570 quad-core “Nehalem”
processors; 8 cores total)
– 1690 GB of instance storage (2 spindles)
– Very high I/O performance (10 GbE)
• Used a similar Hadoop configuration, but dialed back the number of
maps and reducers due to lower core count.
• Used cluster sizes that were roughly core-count equivalent for
comparison.
20. Conclusions
• From what we have seen, Hadoop works very well on many-core
systems. In fact, Hadoop runs quite well on even a single-node
many-core system.
• Using denser nodes may make failures more expensive for some
system components. When using disk arrays, the handling of hard disk
failures should be comparable to that on smaller nodes.
• The MapReduce framework treats all intermediate outputs as remote
resources, so the copy phase of MapReduce doesn't benefit from data
locality.