HDFS:
   Now and Future
   Todd Lipcon (todd@cloudera.com)
Sanjay Radia (sanjay@hortonworks.com)
Outline
Part 1 – Todd Lipcon (Cloudera)
•       Namenode HA
•       HDFS Performance improvements
•       Taking advantage of next-gen hardware
•       Storage Efficiency (RAID and compression)
Part 2 - Sanjay Radia (Hortonworks)
•       Federation and Generalized storage service
         –   Leverage it for further innovation
•       Snapshots
•       Other
         –   WebHDFS
         –   Wire compatibility


    2                                                       O'Reilly Strata & Hadoop World
HDFS HA in Hadoop 2.0.0
• Initial implementation last year
    – Introduced Standby NameNode and manual hot
      failover (see Hadoop World 2011 presentation)
       • Handled planned maintenance (e.g., upgrades) but not
         unplanned
    – Required a highly-available NFS filer to store
      NameNode metadata
       • Complicated and expensive to set up

HDFS HA Phase 2
• Automatic failover
    – Uses Apache ZooKeeper to automatically detect
      NameNode failures and trigger a failover
    – Ops may invoke manual failover for planned
      maintenance windows
• Removed dependency on NFS storage
    – HDFS HA is entirely self-contained
    – No special hardware or software required
    – No SPOF anywhere in the system
Automatic Failover
• Each NameNode has a new process called
  ZooKeeperFailoverController (ZKFC)
     – Maintains a session to ZooKeeper
     – Periodically runs a health-check against its local NameNode to verify
       that it is running properly
• Triggers failover if the health check fails or the ZK session expires
• Operators may still issue manual failover commands for planned
  maintenance
• Failover time: 30-40 seconds unplanned; 0-3 seconds planned.
• Handles all types of faults: machine, software, network, etc.
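
The trigger rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the actual ZooKeeperFailoverController code; `ControllerState` and `should_trigger_failover` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class ControllerState:
    """Hypothetical view of what one failover controller observes."""
    health_check_ok: bool   # result of the latest health check on the local NameNode
    zk_session_alive: bool  # whether the controller's ZooKeeper session is still valid


def should_trigger_failover(state: ControllerState) -> bool:
    """Apply the rule from the slide: fail over if the health check
    fails or the ZooKeeper session expires."""
    return (not state.health_check_ok) or (not state.zk_session_alive)
```

A healthy NameNode with a live session yields `False`; either condition failing on its own is enough to yield `True`.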


Removed NFS/filer dependency
• Shared storage on NFS practical for some
  organizations, but difficult for others
    –   Complex configuration, custom fencing scripts
    –   Filer itself must be highly available
    –   Expensive to buy, expensive to support
    –   Buggy NFS clients in Linux
• Introduced new system for reliable edit log
  storage: QuorumJournalManager
QuorumJournalManager
• Run 3 or 5 JournalNodes, colocated on existing hardware
• Each edit must be committed to a majority of the nodes (i.e.,
  a quorum)
     – A minority of nodes may crash or be slow without affecting
       system availability
     – Run N nodes to tolerate (N-1)/2 failures (same as ZooKeeper)
• Built into HDFS
     – Designed for existing Hadoop ops teams to understand
     – Hadoop Metrics support, full Kerberos support, etc.
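
The quorum arithmetic above can be checked with a short Python sketch. The function names are hypothetical and this is not QuorumJournalManager code; it only encodes the majority rule and the (N-1)/2 formula from the slide, using integer division.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Number of nodes that may fail while a majority remains: (N-1)/2, rounded down."""
    return (n - 1) // 2


def is_committed(acks: int, n: int) -> bool:
    """An edit counts as committed once a majority of the n nodes acknowledged it."""
    return acks >= majority(n)
```

With 3 nodes one failure is tolerated, and with 5 nodes two are. Running 4 nodes still tolerates only one failure, which is why odd counts such as 3 or 5 are the usual choice.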

HDFS HA Architecture
         (with Automatic Failover and QuorumJournalManager)
[Diagram: three ZooKeeper (ZK) nodes, three JournalNodes (JN), an active and a standby NameNode (NN), and four DataNodes (DN).
 - Each NN has a FailoverController that heartbeats to ZK, monitors the health of the NN, OS, and HW, and sends commands (Cmds) to its NN.
 - The active and standby NNs share NN state through a quorum of JournalNodes.
 - DNs send block reports to both active and standby.
 - DN fencing: DNs only obey commands from the active.]
HA Improvements Summary
• Automatic failover
    – Avoid both planned and unplanned downtime
• Non-NFS Shared Storage
    – No need to buy or configure a filer
• Result: HA with no external dependencies
• Available now in HDFS trunk and CDH4.1
• Come to our 5pm talk in this room for more
  details on these HA improvements!
HDFS Performance Update: 2.x vs 1.x
• Significant speedups from SSE4.2 hardware checksum
  calculation (2.5-3x less CPU on read path)
• Rewritten read path for fewer memory copies
• Short-circuit past datanodes for 2-3x faster random
  read (HBase workloads)
• I/O scheduling improvements: push down hints to
  Linux using posix_fadvise()
• Covered in my presentation from Hadoop World 2011
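
As an illustration of passing an access hint to Linux with posix_fadvise(), here is a minimal Python sketch. It assumes a sequential-read hint; the slide does not say which hints HDFS passes, and `read_sequentially` is a hypothetical helper, not HDFS code. `os.posix_fadvise` is only available on Unix-like platforms, so the call is guarded.

```python
import os


def read_sequentially(path: str, chunk_size: int = 1 << 20) -> int:
    """Read a file front to back after hinting sequential access to the kernel.

    Returns the number of bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            try:
                # offset=0 and len=0 apply the advice to the whole file.
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
            except OSError:
                # The hint is advisory; reading still works without it.
                pass
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total
```

The hint changes how the kernel schedules readahead, not what the read returns, so the function behaves the same on platforms where the call is missing.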

HDFS Performance: Recent Work
• Completed
     – Zero-copy read for libhdfs (2-3x improvement for C++
       clients like Impala reading cached data)
     – Expose mapping of blocks to disks: 2x improvement by
       avoiding contention on slower drives (HDFS-3672)
• In progress
     – Using native checksum computation on write path
     – Avoiding copies and allocation on write path
HDFS Performance Benchmarks
(as of June 2012)
[Chart: read and write throughput in MB/sec (axis from 0 to 1000) for raw ext4, HDFS, and HDFS with disk awareness.]
Dual quad-core, 12x2T 7200RPM drives, measured max disk throughput at 900MB/sec.
Write throughput is CPU bound; improvements in progress bring it to max disk throughput as well
Easily saturates SATA3 bus bandwidth on common hardware
Hardware Trends
• Denser storage
     – 36T per node already common
     – Millions of blocks per DN
         • New need to invest in scaling DataNode memory usage
• More RAM
     – 64GB common today. 256GB soon inexpensive
     – Customers want to explicitly pin recently ingested data in RAM
       (especially with efficient query engines like Impala)
• Solid state storage (SSD, FusionIO, etc)
     – HDFS should transparently or explicitly migrate hot random-
       access data to/from flash
     – Hierarchical storage management
HDFS Storage Efficiency
• Many customers are expanding their clusters simply to add storage
     – How can we better utilize the disks they already have?
• RAID (Reed-Solomon coding)
     – Store blocks at low replication, keep parity blocks to allow
       reconstruction if they are lost
     – Effective replication: 1.5x with same durability, less locality
• Transparent compression
     – Automatically detect infrequently used files, transparently re-
       compress with Snappy, GZip, bz2, or LZMA
     – Cloudera workload traces indicate 10% of files accessed 90% of the
       time!
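
The "effective replication" figure for RAID can be reproduced with a little arithmetic. The Python sketch below uses hypothetical function names, and the 10 data + 5 parity layout is only an assumed example that happens to give 1.5x; the slide does not state which layout is meant. The 3x baseline is the usual HDFS default replication factor.

```python
def erasure_coding_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw bytes stored per logical byte when a stripe of data blocks
    is protected by parity blocks."""
    return (data_blocks + parity_blocks) / data_blocks


def savings_vs_replication(overhead: float, replicas: int = 3) -> float:
    """Fraction of raw disk saved compared with keeping `replicas` full copies."""
    return 1 - overhead / replicas
```

Under those assumptions, `erasure_coding_overhead(10, 5)` is 1.5, and `savings_vs_replication(1.5)` is 0.5, i.e. half the raw disk of 3x replication.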


Outline
Part 1 – Todd Lipcon (Cloudera)
•    Namenode HA
•    HDFS Performance improvements
•    Taking advantage of next-gen hardware
•    Storage Efficiency (RAID and compression)
Part 2 - Sanjay Radia (Hortonworks)
•    Federation and Generalized storage service
      –   Leverage it for further innovation
•    Snapshots
•    Other
      –   WebHDFS
      –   Wire compatibility
HA in Hadoop 1!




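Federation: Generalized Block Storage
[Diagram: Namenodes NN-1 ... NN-k ... NN-n each own a namespace (NS1 ... NS k ... foreign NS n) in the Namespace layer. Each namespace has its own block pool (Pool 1 ... Pool k ... Pool n) in the Block Storage layer. Datanodes DN 1, DN 2 ... DN m provide common storage for the block pools.]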
•    Block Storage as generic storage service
      –   Set of blocks for a Namespace Volume is called a Block Pool
      –   DNs store blocks for all the Namespace Volumes – no partitioning
•    Multiple independent Namenodes and Namespace Volumes in a cluster
      –   Namespace Volume = Namespace + Block Pool
HDFS’ Generic Storage Service
                Opportunities for Innovation
[Diagram labels: HDFS Namespace, Alternate NN Implementation, HBase, MR tmp, Storage Service]
• Federation - Distributed (Partitioned) Namespace
     – Simple and Robust due to independent masters
     – Scalability, Isolation, Availability
• New Services – Independent Block Pools
     – New FS - Partial namespace in memory
     – MR Tmp storage directly on block storage
     – Shadow file system – caches HDFS, NFS, S3
• Future: move Block Management into DataNodes
     – Simplifies namespace/application implementation
     – Distributed namenode becomes significantly simpler

Managing Namespaces
[Diagram: client-side mount table rooted at / with entries data, project, home, and tmp, mapped onto namespaces NS1, NS2, NS3, and NS4]
•    Federation has multiple namespaces
•    Don’t you need a single global namespace?
      –   Some tenants want private namespace
      –   Do you create a single DB or Single Table?
      –   Many volumes, share what you want
      –   Global? Key is to share the data and the names used to access the data
•    Client-side mount table can implement global or private namespaces
      –   Shared mount-table => “global” shared view
      –   Personalized mount-table => per-application view
            •   Share the data that matter by mounting it
•    Client-side implementation of mount tables
      –   xInclude from shared place – global view
      –   No single point of failure
      –   No hotspot for root and top level directories
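
A client-side mount table can be pictured as a map from path prefixes to namespaces, resolved on the client. The Python sketch below assumes longest-prefix matching, which the slide does not specify. The `resolve` function and the pairing of directories with NS1 to NS4 are hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical shared mount table: mount point -> namespace that serves it.
SHARED_MOUNT_TABLE: Dict[str, str] = {
    "/data": "NS1",
    "/project": "NS2",
    "/home": "NS3",
    "/tmp": "NS4",
}


def resolve(mount_table: Dict[str, str], path: str) -> str:
    """Return the namespace whose mount point is the longest prefix of path."""
    best_key = None
    best_len = -1
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/")
        # Match whole path components only, so /database is not under /data.
        covers = path == prefix or path.startswith(prefix + "/")
        if covers and len(prefix) > best_len:
            best_key, best_len = mount_point, len(prefix)
    if best_key is None:
        raise KeyError(f"no mount point covers {path}")
    return mount_table[best_key]
```

Because the lookup runs in the client, no server sits on the path for root or top-level directories. A personalized view is just a different dictionary, for example a copy of the shared table with `/tmp` pointed at a private namespace.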

Next Steps… first class support for volumes
[Diagram: NameServers as containers of namespaces, above a storage layer of Datanodes]
•    NameServer - Container for namespaces
      –   Lots of small namespace volumes
            •   Chosen per user, tenant, data feed
            •   Management policies (quota, …)
•    Mount tables for unified namespace
      –   Centrally managed – (xInclude, ZK, ..)
•    Keep only WorkingSet of namespace in memory
      –   Break away from old NN’s full namespace in memory
      –   Faster startup, Billions of names, Hundreds of volumes
•    Number of NameServers =
      –   Sum of (Namespace working set)
      –   Sum of (Namespace throughput)
      –   Move namespace for balancing
Snapshots
• Take snapshot of any directory
     – Multiple snapshots allowed
• Snapshot metadata info stored in Namenode
     – Datanodes have no knowledge
     – Blocks are shared
• All regular commands/apis can be used against
  snapshots
     – cp /foo/bar/.snapshot/x/y /a/b/z
• New CLIs to create and delete snapshots
Snapshots - Status
• HDFS-2802 (feature branch)
     – Initial design and prototype – March 2012
     – Development active
        • Updated design document and test plan posted
           – Review meeting – 1st week November
        • 15+ patches
     – Expected completion – early December!

Enterprise Use Cases
•    Storage fault-tolerance – built into HDFS Architecture
      –   Over seven 9s of data reliability
•    High Availability
•    Standard Interfaces
      –   WebHDFS (REST), FUSE, and NFS access
            •   HTTPFS – (WebHDFS as farm of proxy servers)
            •   libWebhdfs – pure C library for HDFS

•    Wire protocol compatibility
      –   Protocol buffers
•    Rolling upgrades
      –   Rolling upgrades for dot-releases
•    Snapshots - Under active development
•    Disaster Recovery
      –   DistCp does parallel and incremental copies across clusters
            •   Future - Enhance using journal interface & Snapshots
Summary
• HA for Namenode
     – Hot failover, shared storage not required (QJM)
• Performance improvements
• Utilize today’s and tomorrow’s hardware to full potential
• Federation and Generalized storage layer
     – Opportunities for innovation
         • Partial namespace in memory, shadow/caching file system, MR tmp, etc.
• Wire compatibility, WebHDFS, …
• Snapshots - Development well in progress
Questions?


Mais conteúdo relacionado

Mais procurados

Introduction to hadoop high availability
Introduction to hadoop high availability Introduction to hadoop high availability
Introduction to hadoop high availability Omid Vahdaty
 
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsBest Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsJignesh Shah
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Kathleen Ting
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationAdam Kawa
 
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, HortonworksHadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, HortonworksCloudera, Inc.
 
Dumb Simple PostgreSQL Performance (NYCPUG)
Dumb Simple PostgreSQL Performance (NYCPUG)Dumb Simple PostgreSQL Performance (NYCPUG)
Dumb Simple PostgreSQL Performance (NYCPUG)Joshua Drake
 
My experience with embedding PostgreSQL
 My experience with embedding PostgreSQL My experience with embedding PostgreSQL
My experience with embedding PostgreSQLJignesh Shah
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and FutureDataWorks Summit
 
Implementing Parallelism in PostgreSQL - PGCon 2014
Implementing Parallelism in PostgreSQL - PGCon 2014Implementing Parallelism in PostgreSQL - PGCon 2014
Implementing Parallelism in PostgreSQL - PGCon 2014EDB
 
PostgreSQL and Benchmarks
PostgreSQL and BenchmarksPostgreSQL and Benchmarks
PostgreSQL and BenchmarksJignesh Shah
 
A DBA’s guide to using TSA
A DBA’s guide to using TSAA DBA’s guide to using TSA
A DBA’s guide to using TSAFrederik Engelen
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0enissoz
 
Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges DataWorks Summit
 
Postgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuitePostgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuiteEDB
 
Severalnines Training: MySQL® Cluster - Part IX
Severalnines Training: MySQL® Cluster - Part IXSeveralnines Training: MySQL® Cluster - Part IX
Severalnines Training: MySQL® Cluster - Part IXSeveralnines
 

Mais procurados (19)

Introduction to hadoop high availability
Introduction to hadoop high availability Introduction to hadoop high availability
Introduction to hadoop high availability
 
Hadoop, Taming Elephants
Hadoop, Taming ElephantsHadoop, Taming Elephants
Hadoop, Taming Elephants
 
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsBest Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS Federation
 
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, HortonworksHadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
 
Dumb Simple PostgreSQL Performance (NYCPUG)
Dumb Simple PostgreSQL Performance (NYCPUG)Dumb Simple PostgreSQL Performance (NYCPUG)
Dumb Simple PostgreSQL Performance (NYCPUG)
 
My experience with embedding PostgreSQL
 My experience with embedding PostgreSQL My experience with embedding PostgreSQL
My experience with embedding PostgreSQL
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
Implementing Parallelism in PostgreSQL - PGCon 2014
Implementing Parallelism in PostgreSQL - PGCon 2014Implementing Parallelism in PostgreSQL - PGCon 2014
Implementing Parallelism in PostgreSQL - PGCon 2014
 
HBase operations
HBase operationsHBase operations
HBase operations
 
PostgreSQL and Benchmarks
PostgreSQL and BenchmarksPostgreSQL and Benchmarks
PostgreSQL and Benchmarks
 
A DBA’s guide to using TSA
A DBA’s guide to using TSAA DBA’s guide to using TSA
A DBA’s guide to using TSA
 
D02 Evolution of the HADR tool
D02 Evolution of the HADR toolD02 Evolution of the HADR tool
D02 Evolution of the HADR tool
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0
 
Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges
 
Postgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuitePostgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster Suite
 
Severalnines Training: MySQL® Cluster - Part IX
Severalnines Training: MySQL® Cluster - Part IXSeveralnines Training: MySQL® Cluster - Part IX
Severalnines Training: MySQL® Cluster - Part IX
 

Destaque

HDFS Namenode High Availability
HDFS Namenode High AvailabilityHDFS Namenode High Availability
HDFS Namenode High AvailabilityHortonworks
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing DataWorks Summit
 
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...inside-BigData.com
 
Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)
Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)
Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)Jeff Hung
 
YCSB JSONB 対応版 を作ってMongoDB と 比較してみた
YCSB JSONB 対応版 を作ってMongoDB と 比較してみたYCSB JSONB 対応版 を作ってMongoDB と 比較してみた
YCSB JSONB 対応版 を作ってMongoDB と 比較してみたToshi Harada
 
Analysis postgre sql-vs_mongodb_report
Analysis   postgre sql-vs_mongodb_reportAnalysis   postgre sql-vs_mongodb_report
Analysis postgre sql-vs_mongodb_reportAbhishek Rakshe
 
Communication Frameworks for HPC and Big Data
Communication Frameworks for HPC and Big DataCommunication Frameworks for HPC and Big Data
Communication Frameworks for HPC and Big Datainside-BigData.com
 
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API ExamplesApache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API ExamplesBinu George
 
Apache Spark An Overview
Apache Spark An OverviewApache Spark An Overview
Apache Spark An OverviewMohit Jain
 
Introduction to ZooKeeper - TriHUG May 22, 2012
Introduction to ZooKeeper - TriHUG May 22, 2012Introduction to ZooKeeper - TriHUG May 22, 2012
Introduction to ZooKeeper - TriHUG May 22, 2012mumrah
 
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud DetectionArchitecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detectionhadooparchbook
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Adam Kawa
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsHortonworks
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...StampedeCon
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 

Destaque (20)

HDFS Namenode High Availability
HDFS Namenode High AvailabilityHDFS Namenode High Availability
HDFS Namenode High Availability
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
 
Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)
Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)
Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)
 
YCSB JSONB 対応版 を作ってMongoDB と 比較してみた
YCSB JSONB 対応版 を作ってMongoDB と 比較してみたYCSB JSONB 対応版 を作ってMongoDB と 比較してみた
YCSB JSONB 対応版 を作ってMongoDB と 比較してみた
 
Analysis postgre sql-vs_mongodb_report
Analysis   postgre sql-vs_mongodb_reportAnalysis   postgre sql-vs_mongodb_report
Analysis postgre sql-vs_mongodb_report
 
Communication Frameworks for HPC and Big Data
Communication Frameworks for HPC and Big DataCommunication Frameworks for HPC and Big Data
Communication Frameworks for HPC and Big Data
 
ClickHouse
ClickHouseClickHouse
ClickHouse
 
Fluentd and WebHDFS
Fluentd and WebHDFSFluentd and WebHDFS
Fluentd and WebHDFS
 
Introducción a Hadoop
Introducción a HadoopIntroducción a Hadoop
Introducción a Hadoop
 
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API ExamplesApache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
 
Apache Spark An Overview
Apache Spark An OverviewApache Spark An Overview
Apache Spark An Overview
 
Introduction to ZooKeeper - TriHUG May 22, 2012
Introduction to ZooKeeper - TriHUG May 22, 2012Introduction to ZooKeeper - TriHUG May 22, 2012
Introduction to ZooKeeper - TriHUG May 22, 2012
 
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud DetectionArchitecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detection
 
Apache ZooKeeper
Apache ZooKeeperApache ZooKeeper
Apache ZooKeeper
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 

Semelhante a Strata + Hadoop World 2012: HDFS: Now and Future

HDFS - What's New and Future
HDFS - What's New and FutureHDFS - What's New and Future
HDFS - What's New and FutureDataWorks Summit
 
Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment
Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and DeploymentOct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment
Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and DeploymentYahoo Developer Network
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanNarayana B
 
HDFS NameNode HA in CDH4
HDFS NameNode HA in CDH4HDFS NameNode HA in CDH4
Strata + Hadoop World 2012: HDFS: Now and Future

  • 1. HDFS: Now and Future Todd Lipcon (todd@cloudera.com) Sanjay Radia (sanjay@hortonworks.com)
  • 2. Outline Part 1 – Todd Lipcon (Cloudera) • Namenode HA • HDFS Performance improvements • Taking advantage of next-gen hardware • Storage Efficiency (RAID and compression) Part 2 - Sanjay Radia (Hortonworks) • Federation and Generalized storage service – Leverage it for further innovation • Snapshots • Other – WebHDFS – Wire compatibility 2 O'Reilly Strata & Hadoop World
  • 3. HDFS HA in Hadoop 2.0.0 • Initial implementation last year – Introduced Standby NameNode and manual hot failover (see Hadoop World 2011 presentation) • Handled planned maintenance (e.g., upgrades) but not unplanned outages – Required a highly available NFS filer to store NameNode metadata • Complicated and expensive to set up
  • 4. HDFS HA Phase 2 • Automatic failover – Uses Apache ZooKeeper to automatically detect NameNode failures and trigger a failover – Ops may invoke manual failover for planned maintenance windows • Removed dependency on NFS storage – HDFS HA is entirely self-contained – No special hardware or software required – No SPOF anywhere in the system
  • 5. Automatic Failover • Each NameNode has a new process called ZooKeeperFailoverController (ZKFC) – Maintains a session to ZooKeeper – Periodically runs a health-check against its local NameNode to verify that it is running properly • Triggers failover if the health check fails or the ZK session expires • Operators may still issue manual failover commands for planned maintenance • Failover time: 30-40 seconds unplanned; 0-3 seconds planned. • Handles all types of faults: machine, software, network, etc.
  • 6. Removed NFS/filer dependency • Shared storage on NFS is practical for some organizations, but difficult for others – Complex configuration, custom fencing scripts – Filer itself must be highly available – Expensive to buy, expensive to support – Buggy NFS clients in Linux • Introduced new system for reliable edit log storage: QuorumJournalManager
  • 7. QuorumJournalManager • Run 3 or 5 JournalNodes, collocated on existing hardware investment • Each edit must be committed to a majority of the nodes (i.e., a quorum) – A minority of nodes may crash or be slow without affecting system availability – Run N nodes to tolerate (N-1)/2 failures (same as ZooKeeper) • Built into HDFS – Designed for existing Hadoop ops teams to understand – Hadoop Metrics support, full Kerberos support, etc.
  • 8. HDFS HA Architecture (with Automatic Failover and QuorumJournalManager) [Diagram: three ZooKeeper (ZK) nodes receive heartbeats from a FailoverController next to each NameNode; each FailoverController monitors the health of its NN, OS, and HW and sends it commands; the Active NN and Standby NN share NN state through a quorum of three JournalNodes (JN); DataNodes (DN) send block reports to both Active and Standby; DN fencing: DNs only obey commands from the Active.]
  • 9. HA Improvements Summary • Automatic failover – Avoid both planned and unplanned downtime • Non-NFS Shared Storage – No need to buy or configure a filer • Result: HA with no external dependencies • Available now in HDFS trunk and CDH4.1 • Come to our 5pm talk in this room for more details on these HA improvements!
  • 10. HDFS Performance Update: 2.x vs 1.x • Significant speedups from SSE4.2 hardware checksum calculation (2.5-3x less CPU on read path) • Rewritten read path for fewer memory copies • Short-circuit past datanodes for 2-3x faster random read (HBase workloads) • I/O scheduling improvements: push down hints to Linux using posix_fadvise() • Covered in my presentation from Hadoop World 2011
  • 11. HDFS Performance: Recent Work • Completed – Zero-copy read for libhdfs (2-3x improvement for C++ clients like Impala reading cached data) – Expose mapping of blocks to disks: 2x improvement by avoiding contention on slower drives (HDFS-3672) • In progress – Using native checksum computation on write path – Avoiding copies and allocation on write path
  • 12. HDFS Performance Benchmarks (as of June 2012) [Chart: read and write throughput in MB/sec, on a 0-1000 axis, for raw ext4, HDFS, and HDFS with disk awareness.] Dual quad-core, 12x2T 7200RPM drives, measured max disk throughput at 900MB/sec. Write throughput is CPU bound; improvements in progress bring it to max disk throughput as well. Easily saturates SATA3 bus bandwidth on common hardware.
  • 13. Hardware Trends • Denser storage – 36T per node already common – Millions of blocks per DN • New need to invest in scaling DataNode memory usage • More RAM – 64GB common today. 256GB soon inexpensive – Customers want to explicitly pin recently ingested data in RAM (especially with efficient query engines like Impala) • Solid state storage (SSD, FusionIO, etc) – HDFS should transparently or explicitly migrate hot random-access data to/from flash – Hierarchical storage management
  • 14. HDFS Storage Efficiency • Many customers are expanding their clusters simply to add storage – How can we better utilize the disks they already have? • RAID (Reed-Solomon coding) – Store blocks at low replication, keep parity blocks to allow reconstruction if they are lost – Effective replication: 1.5x with same durability, less locality • Transparent compression – Automatically detect infrequently used files, transparently recompress with Snappy, GZip, bz2, or LZMA – Cloudera workload traces indicate 10% of files accessed 90% of the time!
  • 15. Outline Part 1 – Todd Lipcon (Cloudera) • Namenode HA • HDFS Performance improvements • Taking advantage of next-gen hardware • Storage Efficiency (RAID and compression) Part 2 - Sanjay Radia (Hortonworks) • Federation and Generalized storage service – Leverage it for further innovation • Snapshots • Other – WebHDFS – Wire compatibility • HA in Hadoop 1!
  • 16. Federation: Generalized Block Storage [Diagram: Namenodes NN-1 ... NN-k ... NN-n serve namespaces NS1 ... NS k ... Foreign NS n; each namespace has its own block pool (Pool 1 ... Pool k ... Pool n) in the block storage layer; DataNodes DN 1, DN 2, ... DN m provide common storage.] • Block Storage as generic storage service – Set of blocks for a Namespace Volume is called a Block Pool – DNs store blocks for all the Namespace Volumes – no partitioning • Multiple independent Namenodes and Namespace Volumes in a cluster – Namespace Volume = Namespace + Block Pool
  • 17. HDFS’ Generic Storage Service: Opportunities for Innovation • Federation - Distributed (Partitioned) Namespace – Simple and Robust due to independent masters – Scalability, Isolation, Availability • New Services – Independent Block Pools – New FS - Partial namespace in memory – MR Tmp storage directly on block storage – Shadow file system – caches HDFS, NFS, S3 • Future: move Block Management into DataNodes – Simplifies namespace/application implementation – Distributed namenode becomes significantly simpler [Diagram: an alternate NN implementation, HBase, the HDFS namespace, and MR tmp all sit on top of the storage service.]
  • 18. Managing Namespaces • Federation has multiple namespaces • Don’t you need a single global namespace? – Some tenants want private namespace – Do you create a single DB or Single Table? – Many volumes, share what you want – Global? Key is to share the data and the names used to access the data • Client-side mount table can implement global or private namespaces – Shared mount-table => “global” shared view – Personalized mount-table => per-application view • Share the data that matter by mounting it • Client-side implementation of mount tables – xInclude from shared place – global view – No single point of failure – No hotspot for root and top level directories [Diagram: client-side mount table with root / and entries data, project, home, tmp mounted from namespaces NS1, NS2, NS3, NS4.]
  • 19. Next Steps… first class support for volumes • NameServer - Container for namespaces – Lots of small namespace volumes • Chosen per user, tenant, data feed • Management policies (quota, …) • Mount tables for unified namespace – Centrally managed – (xInclude, ZK, ..) • Keep only WorkingSet of namespace in memory – Break away from old NN’s full namespace in memory – Faster startup, Billions of names, Hundreds of volumes • Number of NameServers = – Sum of (Namespace working set) – Sum of (Namespace throughput) – Move namespace for balancing [Diagram: NameServers as Containers of Namespaces, sitting above a storage layer of Datanodes.]
  • 20. Snapshots • Take snapshot of any directory – Multiple snapshots allowed • Snapshot metadata info stored in NameNode – Datanodes have no knowledge – Blocks are shared • All regular commands/APIs can be used against snapshots – cp /foo/bar/.snapshot/x/y /a/b/z • New CLIs to create and delete snapshots
  • 21. Snapshots - Status • HDFS-2802 (feature branch) – Initial design and prototype – March 2012 – Development active • Updated design document and test plan posted – Review meeting – 1st week November • 15+ patches – Expected completion – early December!
  • 22. Enterprise Use Cases • Storage fault-tolerance – built into HDFS Architecture – Over seven 9s of data reliability • High Availability • Standard Interfaces – WebHDFS (REST), FUSE, and NFS access • HTTPFS – (WebHDFS as farm of proxy servers) • libWebhdfs – pure C library for HDFS • Wire protocol compatibility – Protocol buffers • Rolling upgrades – Rolling upgrades for dot-releases • Snapshots - Under active development • Disaster Recovery – Distcp does parallel and incremental copies across clusters • Future - Enhance using journal interface & Snapshots
  • 23. Summary • HA for Namenode – Hot failover, shared storage not required (QJM) • Performance improvements • Utilize today’s and tomorrow’s hardware to full potential • Federation and Generalized storage layer – Opportunities for innovation • Partial namespace in memory, shadow/caching file system, MR tmp, etc. • Wire compatibility, WebHdfs, … • Snapshots - Development well in progress
  • 24. Questions?
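
The QuorumJournalManager slide states that each edit must reach a majority of JournalNodes and that N nodes tolerate (N-1)/2 failures. The arithmetic can be sketched as below; the function names are illustrative only and are not part of HDFS.

```python
def quorum_size(n: int) -> int:
    """Smallest number of JournalNodes that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that may crash or lag while a majority stays reachable."""
    return (n - 1) // 2


def edit_committed(acks: int, n: int) -> bool:
    """An edit counts as committed once a majority has acknowledged it."""
    return acks >= quorum_size(n)


# The two deployment sizes the slide recommends.
for n in (3, 5):
    print(f"{n} JournalNodes: quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

With 3 JournalNodes a quorum is 2 and one node may fail; with 5 a quorum is 3 and two may fail. An even count adds no extra tolerance (4 nodes still tolerate only one failure), which may be why the slide suggests 3 or 5.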

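The "Managing Namespaces" slide describes a client-side mount table that maps paths onto independent namespace volumes. A minimal sketch of that idea, using longest-prefix matching, might look like the following. The table contents and the `resolve` helper are hypothetical: the assignment of `/data`, `/project`, `/home`, and `/tmp` to NS1-NS4 is assumed for illustration, and this is not the HDFS client implementation.

```python
from typing import Dict, Tuple

# Hypothetical mount table: mount point -> namespace volume (assumed mapping).
MOUNT_TABLE: Dict[str, str] = {
    "/data": "NS1",
    "/project": "NS2",
    "/home": "NS3",
    "/tmp": "NS4",
}


def resolve(path: str, mounts: Dict[str, str]) -> Tuple[str, str]:
    """Return (namespace, path within that namespace) for a client path.

    The longest matching mount point wins, and a match must end on a
    path-component boundary, so "/homework" does not match "/home".
    """
    best = None
    for mount in mounts:
        if path == mount or path.startswith(mount.rstrip("/") + "/"):
            if best is None or len(mount) > len(best):
                best = mount
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    return mounts[best], path[len(best):] or "/"


print(resolve("/home/todd/file", MOUNT_TABLE))
```

Because resolution happens entirely in the client, no server sits on the lookup path for root and top-level directories, which is the "no hotspot, no single point of failure" property the slide lists.
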
Editor's Notes

  1. Need reliable, predictable latency