Hadoop 3.0 - Revolution or evolution?

With Hadoop-3.0.0-alpha2 being released in January 2017, it's time to have a closer look at the features and fixes of Hadoop 3.0.

We will have a look at Core Hadoop, HDFS and YARN, and answer the emerging question of whether Hadoop 3.0 will be an architectural revolution, as Hadoop 2 was with YARN & Co., or more of an evolution adapting to new use cases such as IoT, Machine Learning and Deep Learning (TensorFlow).

  1. 1. Hadoop 3.0 Revolution or evolution? uweprintz
  2. 2. /whoami & /disclaimer
  3. 3. Some Hadoop history
    • 2006, Hadoop 1: HDFS (redundant, reliable storage) + MapReduce (cluster resource mgmt. + data processing). "Let there be batch!" The era of Silicon Valley Hadoop.
    • Oct. 2013, Hadoop 2: HDFS (redundant, reliable storage), YARN (cluster resource management), and YARN apps such as MapReduce (data processing), Hive (SQL), Spark (in-memory), … "Let there be YARN apps!" The era of Enterprise Hadoop.
    • Late 2017, Hadoop 3: IoT, Machine Learning, GPUs, TensorFlow, Data Science, Streaming Data, Cloud, FPGAs, Artificial Intelligence, Kafka. "Let there be …?" The era of ?
  4. 4. Why Hadoop 3.0?
    • Deprecated APIs can only be removed in a major release
    • Wire-compatibility will be broken
    • Change of default ports
    • Hadoop 2.x Client —||—> Hadoop 3.x Server (and vice versa)
    • Hadoop command scripts rewrite
    • Big features that need a stabilizing major release
  5. 5. What is Hadoop 3.0? [Figure: release time line 2009-2017; source: Akira Ajisaka, with additions by Uwe Printz]
    • branch-1 (from branch-0.20): 0.20.1, 0.20.205 (security), 0.21.0 (new append), 1.0.0, 1.1.0, 1.2.1 (stable); Hadoop 1 is EOL
    • branch-0.23: 0.23.0 … 0.23.11 (final)
    • branch-2: 2.0.0-alpha, 2.1.0-beta, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0, adding features such as YARN, NameNode federation & HA, HDFS snapshots, NFSv3 support, HDFS ACLs, heterogeneous storage, HDFS in-memory caching, HDFS extended attributes, rolling upgrades (HDFS & YARN), RM automatic failover, transparent encryption, archival storage, Docker containers on Linux, ATS 1.5, file truncate API, drop of JDK6 support
    • trunk (Hadoop 3): 3.0.0-alpha1, 3.0.0-alpha2; Hadoop 2 and 3 diverged 5+ years ago
  6. 6. Hadoop 3.0 in a nutshell
    • HDFS: Erasure codes • Low-level performance enhancements with Intel ISA-L • 2+ NameNodes • Intra-DataNode Balancer
    • YARN: Better support for long-running services • Improved isolation & Docker support • Scheduler enhancements • Application Timeline Service v2 • New UI
    • MapReduce: Task-level native optimization • Derive heap-size automatically
    • DevOps: Drop JDK7 & move to JDK8 • Change of default ports • Library & dependency upgrade • Client-side classpath isolation • Shell script rewrite & ShellDoc • .hadooprc & .hadoop-env • Metrics2 Sink plugin for Kafka
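    The rewritten shell framework reads per-user ~/.hadoop-env and ~/.hadooprc files. A minimal sketch of how they divide the work, assuming the 3.x shell function API (treat the variable and function names as assumptions to verify against the release's Unix Shell Guide):
      # ~/.hadoop-env is sourced early and should only set environment variables
      export HADOOP_HEAPSIZE_MAX=2g        # 3.x heap setting, accepts units

      # ~/.hadooprc runs after the shell functions are loaded, so API calls work here
      hadoop_add_classpath /opt/my-libs/extra.jar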
  7. 7. HDFS
  8. 8. HDFS - Current implementation
    • 3 replicas by default
    • Tolerate maximum of 2 failures
    • Simple, scalable & robust
    • 200% space overhead
    [Figure: write path with replication. The HDFS client sends a write request, gets a lease for the file, splits it into blocks and asks the NameNode for data nodes. It calculates the checksum and writes each block to DataNode 1, which forwards it along the write pipeline to DataNodes 2 and 3; ACKs flow back until the write is complete.]
  9. 9. Erasure Coding (EC)
    • k data blocks + m parity blocks
    • Example: Reed-Solomon (6,3)
    [Figure: raw data is split into data cells (d), encoded into parity cells (p), and both data and parity are stored.]
    • Key Points
    • XOR Coding —> Saves space, slower recovery
    • Missing or corrupt data will be restored from available data and parity
    • Parity can be smaller than data
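    As a simple illustration of the parity idea (a sketch, not from the slides): with plain XOR coding one parity cell protects k data cells, and any single missing cell can be rebuilt from the remaining ones:
      p = d_1 \oplus d_2 \oplus \dots \oplus d_k
      \qquad\Rightarrow\qquad
      d_i = p \oplus \bigoplus_{j \neq i} d_j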
  10. 10. EC - Main characteristics
    • Maximum fault tolerance: 0 (Replication, factor 1), 2 (Replication, factor 3), 3 (Reed-Solomon (6,3)), 4 (Reed-Solomon (10,4))
    • Space efficiency: 100 % / 33 % / 67 % / 71 % (same order)
    • Data locality: Yes (replication); No in Phase 1, Yes in Phase 2 (Reed-Solomon)
    • Write performance: High (replication), Low (Reed-Solomon)
    • Read performance: High (replication), Medium (Reed-Solomon)
    • Recovery costs: Low (replication), High (Reed-Solomon)
    • Pluggable implementation; Reed-Solomon as first choice
    • Storage tiers: Hot (Memory/SSD, ~20 x day), Warm (Disk, ~5 x week), Cold (Dense Disk, ~5 x month), Frozen (EC, ~2 x year)
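    The space-efficiency figures follow directly from the ratio of data blocks to data-plus-parity blocks; a quick check of the numbers above:
      \text{efficiency} = \frac{k}{k+m}:\qquad
      \text{RS}(6,3) = \frac{6}{9} \approx 67\,\%,\qquad
      \text{RS}(10,4) = \frac{10}{14} \approx 71\,\%,\qquad
      \text{3-way replication} = \frac{1}{3} \approx 33\,\%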
  11. 11. EC - Contiguous blocks
    • Approach 1: Retain block size and add parity
    [Figure: six 128 MB blocks from files A, B and C spread over data nodes are encoded into 3 parity blocks on further data nodes.]
    • Pro: Better for locality
    • Con: Significant overhead for smaller files, always 3 parity blocks needed
    • Con: Client potentially needs to process GBs of data for encoding
  12. 12. EC - Striping
    • Approach 2: Splitting blocks into smaller cells (1 MB)
    [Figure: cells of files A, B and C are written round-robin into stripes across blocks 1-6 on different data nodes, with parity cells encoded onto three further data nodes.]
    • Pro: Works for small files
    • Pro: Allows parallel write
    • Con: No data locality -> increased read latency & more complicated recovery process
  13. 13. EC - Apache Hadoop’s decision (HDFS-7285)
    • Start from striping to deal with smaller files
    [Figure: design space of contiguous vs. striped layout and replication vs. erasure coding, with systems such as HDFS, Facebook f4, Azure, Ceph (before/with Firefly), Lustre and QFS placed in it, plus the HDFS EC roadmap: Phase 1.1 (HDFS-7285), Phase 1.2 (HDFS-8031), Phase 2 (HDFS-8030), Phase 3 (future work).]
    • Hadoop 3.0.x implements Phase 1.1 and 1.2
  14. 14. EC - Shell Command
    • Create an EC zone on an empty directory
    • All the files under a zone directory are automatically erasure coded
    • Renames across zones with different EC schemas are disallowed
    Usage: hdfs erasurecode [generic options]
      [-getPolicy <path>]
      [-help [cmd ...]]
      [-listPolicies]
      [-setPolicy [-p <policyName>] <path>]
    -getPolicy <path>: get erasure coding policy information about the specified path
    -listPolicies: get the list of erasure coding policies supported
    -setPolicy [-p <policyName>] <path>: set the specified erasure coding policy on a directory
    Options: -p <policyName>: erasure coding policy name to encode files; if not passed, the default policy is used. <path>: path to a directory; files under this directory will be encoded using the specified erasure coding policy
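    A minimal usage sketch of the commands above; the policy name is an assumption (policy identifiers and defaults differed between the 3.0 alpha releases, so check -listPolicies on the target cluster):
      # create an empty directory and turn it into an EC zone
      hdfs dfs -mkdir -p /data/cold
      hdfs erasurecode -setPolicy -p RS-DEFAULT-6-3-64k /data/cold   # policy name is hypothetical

      # verify the policy and list what the cluster supports
      hdfs erasurecode -getPolicy /data/cold
      hdfs erasurecode -listPolicies

      # anything written below the zone is now erasure coded automatically
      hdfs dfs -put big-archive.tar /data/cold/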
  15. 15. EC - Write Path
    • Parallel write: the client writes to 9 data nodes at the same time; parity is calculated at the client, at write time
    • Durability: Reed-Solomon (6,3) can tolerate max. 3 failures
    • Visibility: read is supported for files being written
    • Consistency: the client can start reading from any 6 of the 9 data nodes
    • Appendable: files can be reopened for appending data
    [Figure: the HDFS client streams 1 MB data cells to DataNodes 1-6 and parity cells to DataNodes 7-9, receiving ACKs from all of them.]
  16. 16. EC - Write Failure Handling
    • Data node failure: the client ignores the failed data node and continues writing
    • Write path is able to tolerate 3 data node failures
    • Requires at least 6 data nodes
    • Missing blocks will be constructed later
    [Figure: same write path as before, with failed data nodes skipped by the client.]
  17. 17. EC - Read Path
    • Read data from 6 data nodes in parallel
    [Figure: the HDFS client reads 1 MB data cells from DataNodes 1-6; the parity nodes (DataNodes 7-9) are not read.]
  18. 18. EC - Read Failure Handling
    • Read data from 6 arbitrary data nodes in parallel
    • Read parity block to reconstruct missing data block
    [Figure: with a data node missing, the client reads the remaining data cells plus parity cells and reconstructs the missing block.]
  19. 19. EC - Network behavior
    • Pros
    • Low latency because of parallel read & write
    • Good for small file sizes
    • Cons
    • Requires high network bandwidth between client & server
    • Dead data nodes result in high network traffic and long reconstruction times
  20. 20. EC - Coder implementation
    • Legacy coder: from Facebook’s HDFS-RAID project • [Umbrella] HADOOP-11264
    • Pure Java coder: code improvements over HDFS-RAID • HADOOP-11542
    • Intel ISA-L coder: native coder with Intel’s Intelligent Storage Acceleration Library • Accelerates EC-related linear algebra calculations by exploiting advanced hardware instruction sets like SSE, AVX, and AVX2 • HADOOP-11540
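    To see which native libraries a given build actually picked up, the existing checknative command can be used; in 3.x builds its report includes an ISA-L line when the native coder is available (a sketch, run on the cluster nodes):
      # lists the native libraries this Hadoop build can load, including ISA-L
      hadoop checknative -a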
  21. 21. EC - Coder performance I
  22. 22. EC - Coder performance II
  23. 23. EC - Coder performance III
  24. 24. 2+ NameNodes (HDFS-6440)
    • Hadoop 1: no built-in High Availability • Needed to be solved yourself, e.g. via VMware
    • Hadoop 2: High Availability out-of-the-box via an active-passive pattern (one active, one standby NameNode) • Needed to recover immediately after a failure
    • Hadoop 3: 1 active NameNode with N standby NameNodes • Trade-off between operating costs and hardware costs
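    A hedged sketch of what a 3-NameNode HA nameservice could look like; the nameservice, host names and port are made up, while the property names follow the standard HDFS HA configuration (these properties belong inside the <configuration> block of hdfs-site.xml):
      # write the HA-related properties to a fragment for review (names are hypothetical)
      cat > hdfs-site-ha-fragment.xml <<'EOF'
      <property><name>dfs.nameservices</name><value>mycluster</value></property>
      <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2,nn3</value></property>
      <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>nn1.example.com:8020</value></property>
      <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>nn2.example.com:8020</value></property>
      <property><name>dfs.namenode.rpc-address.mycluster.nn3</name><value>nn3.example.com:8020</value></property>
      EOF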
  25. 25. Intra-DataNode Balancer (HDFS-1312)
    • Hadoop already has a Balancer between DataNodes • Needs to be called manually by design • Typically used after adding additional worker nodes
    • The new Disk Balancer lets administrators rebalance data across the multiple disks of a DataNode • Useful to correct the skewed data distribution often seen after adding or replacing disks
    • Adds an hdfs diskbalancer command that submits a plan; it does not wait for the plan to finish executing, and the DataNode performs the moves itself
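    A typical diskbalancer session could look like this sketch; the host name and plan path are placeholders:
      # generate a balancing plan for one DataNode (writes a plan JSON and prints its path)
      hdfs diskbalancer -plan dn1.example.com

      # hand the plan to the DataNode; the command returns while the node moves blocks itself
      hdfs diskbalancer -execute /system/diskbalancer/<timestamp>/dn1.example.com.plan.json

      # check progress on that DataNode
      hdfs diskbalancer -query dn1.example.com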
  26. 26. YARN
  27. 27. YARN - Scheduling enhancements
    • Application priorities within a queue (YARN-1963): for example, in the queue Marketing, Hive jobs > MapReduce jobs
    • Inter-queue priorities (YARN-4945): Queue 1 > Queue 2, irrespective of demand & capacity; previously based only on unconsumed capacity
    • Affinity / anti-affinity (YARN-1042): more fine-grained constraints on locations, e.g. do not allocate HBase RegionServers and Storm workers on the same host
    • Global scheduling (YARN-5139): currently YARN scheduling is done one node at a time on arrival of heartbeats, which can lead to suboptimal decisions; with global scheduling the scheduler looks at more nodes and selects the best ones based on application requirements, leading to globally optimal placement and higher container scheduling throughput
    • Gang scheduling (YARN-624): allow allocation of sets of containers, e.g. 1 container with 128 GB of RAM and 16 cores OR 100 containers with 2 GB of RAM and 1 core; can already be achieved by holding on to containers, but that may lead to deadlocks and decreased cluster utilization
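    A hedged sketch of how in-queue priorities surface operationally; the queue name, application id and priority values are made up, and the flag/property names follow the 2.8/3.x Capacity Scheduler documentation:
      # raise the priority of a running application within its queue
      yarn application -appId application_1490000000000_0001 -updatePriority 10

      # capacity-scheduler.xml: default priority for apps submitted to the marketing queue
      #   yarn.scheduler.capacity.root.marketing.default-application-priority = 5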
  28. 28. YARN - Built-in support for long-running services
    • Simplified and first-class support for services (YARN-4692): an abstract common framework to support long-running services (similar to Apache Slider) and a simplified API for managing the service lifecycle of YARN apps
    • Better support for long-running services: recognition of long-running services (YARN-4725), auto-restart of containers, and retrying containers on the same node when they hold local state
    • Service/application upgrade support (YARN-4726): hold on to containers during an upgrade of the YARN app
    • Dynamic container resizing (YARN-1197): only ask for minimum resources at start and adjust them at runtime; currently the only way is releasing containers and allocating new ones with the expected size
  29. 29. YARN - Resource Isolation & Docker
    • Better resource isolation: support for disk isolation (YARN-2619) and network isolation (YARN-2140); uses cgroups to give containers their fair share
    • Docker support in the LinuxContainerExecutor (YARN-3611): the LinuxContainerExecutor already provides localization, cgroups-based resource management and isolation for CPU, network, disk, etc., as well as security mechanisms; it can now run Docker containers, which offers packaging plus resource isolation and complements YARN’s support for long-running services
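    A hedged sketch of launching a container image through the LinuxContainerExecutor's Docker runtime, using the distributed shell example application; the jar path, image name and cluster setup are assumptions to verify against the target release's Docker documentation:
      yarn jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-*.jar \
        -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-*.jar \
        -shell_command "cat /etc/os-release" \
        -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
        -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=library/centos:7 \
        -num_containers 1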
  30. 30. YARN - Service Discovery
    • Services can run on any YARN node • Dynamic IPs can change in case of node failures, etc.
    • YARN service discovery via DNS (YARN-4757): the YARN service registry already provides facilities for applications to register their endpoints and for clients to discover them, but they are only available via Java API and REST
    • Expose service information via an already available discovery mechanism: DNS • Current YARN Service Registry records need to be converted into DNS entries • Discovery of the container IP and service port via standard DNS lookups
    • Mapping of applications, e.g. zkapp1.griduser.yarncluster.com -> 172.17.0.2
    • Mapping of containers, e.g. container-e3741-1454001598828-0131-01000004.yarncluster.com -> 172.17.0.3
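    With the registry DNS server in place, discovery becomes an ordinary DNS lookup; a sketch using the example names from the slide, where the DNS server host and port are assumptions:
      # resolve the application-level record from the slide's example
      dig @registry-dns.yarncluster.com -p 5335 zkapp1.griduser.yarncluster.com A

      # resolve an individual container's address
      dig @registry-dns.yarncluster.com -p 5335 container-e3741-1454001598828-0131-01000004.yarncluster.com A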
  31. 31. YARN - Use the force! [Figure: MapReduce, Tez and Spark running on top of YARN]
  32. 32. YARN - New UI (YARN-3368)
  33. 33. Application Timeline Service v2 (YARN-2928)
    Why?
    • Scalability & performance: single global instance of writer/reader, local-disk-based LevelDB storage
    • Reliability: failure handling with local disk, single point of failure
    • Usability: add configuration and metrics as first-class members, better support for queries
    • Flexibility: data model is more describable
    Core concepts
    • Distributed write path with a logical per-app collector and separate reader instances
    • Pluggable backend storage (HBase)
    • Enhanced internal data model and metrics aggregation
    • Richer REST API for queries
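    The richer query API is exposed by the timeline reader as REST; a hedged example where the host, port, application id and exact path are placeholders to check against the release documentation:
      # ask the timeline reader (v2 REST API) for the container entities of one application
      curl "http://timeline-reader.example.com:8188/ws/v2/timeline/apps/application_1490000000000_0001/entities/YARN_CONTAINER"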
  34. 34. Revolution or evolution?
  35. 35. Summary
    • Major release, incompatible with Hadoop 2
    • Main features are Erasure Coding and better support for long-running services & Docker
    • Good fit for IoT and Deep Learning use cases
    Release time line
    • 3.0.0-alpha1 - Sep/3/2016
    • Alpha2 - Jan/25/2017
    • Alpha3 - Q2 2017 (estimated)
    • Beta/GA - Q3/Q4 2017 (estimated)
  36. 36. …but it’s not a revolution!
  37. 37. Twitter: @uweprintz uwe.seiler@codecentric.de Mail: uwe.printz@codecentric.de Phone +49 176 1076531 XING: https://www.xing.com/profile/Uwe_Printz Thank you!
  38. 38. Slide 1: https://unsplash.com/photos/CIXoFys3gsw Slide 2: Copyright by Uwe Printz Slide 7: https://unsplash.com/photos/LHlwgjbSo3k Slide 34: https://unsplash.com/photos/Cvf1IqUel9w Slide 36: https://imgflip.com/i/mkovb Slide 37: Copyright by Uwe Printz All pictures CC0 or shot by the author
