Putting Wings on the Elephant
Pritam Damania
Facebook, Inc.
Software Engineer
April 2, 2014
Agenda
1 Background
2 Major Issues in I/O path
3 Read Improvements
4 Write Improvements
5 Lessons learnt
High level Messages Architecture
[Diagram labels: Application Server, HBase, Message, Message, Write, Ack]
HBase Cluster Physical Layout
▪ Multiple clusters/cells for messaging
▪ 20 servers/rack; 5 or more racks per cluster
▪ Rack #1: ZooKeeper Peer, HDFS Namenode; Region Server + Data Node + Task Tracker (19x...)
▪ Rack #2: ZooKeeper Peer, Standby Namenode; Region Server + Data Node + Task Tracker (19x...)
▪ Rack #3: ZooKeeper Peer, Job Tracker; Region Server + Data Node + Task Tracker (19x...)
▪ Rack #4: ZooKeeper Peer, HBase Master; Region Server + Data Node + Task Tracker (19x...)
▪ Rack #5: ZooKeeper Peer, Backup HBase Master; Region Server + Data Node + Task Tracker (19x...)
Write Path Overview
HDFS
Write Ahead
Log
RegionServer
Memstore
HFiles
HDFS Write Pipeline
Datanode
OS page cache
Disk
Regionserver
64k
packet
Datanode
OS page cache
Disk
Datanode
OS page cache
Disk
Ack
Read Path Overview
HDFS
RegionServer
Memstore
HFiles
Get
Problems in R/W Path
• Skewed Disk Usage
• High disk IOPS
• High p99 latency for reads and writes
Improvements in Read Path
Disk Skew
Datanode
OS page cache
Disk
Datanode
OS page cache
Disk
Datanode
OS page cache
Disk
• HDFS block size : 256MB
• HDFS block resides on single disk
• Fsync of 256MB hitting single disk
Disk Skew - Sync File Range
[Diagram: block file written on the Linux filesystem in 64k packets, with sync_file_range every 1MB and fsync]
▪ sync_file_range(SYNC_FILE_RANGE_WRITE)
▪ Initiates async write
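The periodic flush described on this slide can be sketched roughly as follows. This is a minimal illustration, not the datanode's actual code: `write_block`, `sync_file_range_write`, and `demo` are hypothetical names, the `ctypes` call assumes Linux with glibc, and the closing `os.fsync` is an assumption about where the slide's fsync happens. The point is that writeback is requested in 1MB slices as packets arrive, so the disk never has to absorb the whole 256MB block at once.

```python
import ctypes
import os
import tempfile

SYNC_FILE_RANGE_WRITE = 2      # flag value from <linux/fs.h>
PACKET_SIZE = 64 * 1024        # 64k packets, as on the slide
SYNC_INTERVAL = 1024 * 1024    # request writeback every 1MB


def sync_file_range_write(fd, offset, nbytes):
    """Ask the kernel to start writeback of a byte range (Linux/glibc only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    rc = libc.sync_file_range(fd, ctypes.c_int64(offset), ctypes.c_int64(nbytes),
                              ctypes.c_uint(SYNC_FILE_RANGE_WRITE))
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def write_block(fd, packets, flush_range=sync_file_range_write,
                interval=SYNC_INTERVAL):
    """Write packets to fd, calling flush_range once per completed interval."""
    written = 0
    flushed = 0
    for packet in packets:
        view = memoryview(packet)
        while view:                      # os.write may write fewer bytes than asked
            n = os.write(fd, view)
            view = view[n:]
            written += n
        while written - flushed >= interval:
            flush_range(fd, flushed, interval)
            flushed += interval
    os.fsync(fd)                         # assumed: one fsync when the block is complete
    return written


def demo():
    """Record which ranges would be flushed for 40 packets (2.5MB)."""
    calls = []
    with tempfile.TemporaryFile() as f:
        write_block(f.fileno(), [b"x" * PACKET_SIZE] * 40,
                    flush_range=lambda fd, off, n: calls.append((off, n)))
    return calls
```

Passing the flush function as a parameter keeps the write loop testable on systems without `sync_file_range`; `demo()` swaps in a recorder instead of the real call.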
High IOPS
• Messages workload is random read
• Small preads (~4KB) on datanodes
• Two iops for each pread
[Diagram: a pread on the Datanode reads the checksum from the Checksum file and the data from the Block File]
High IOPS - Inline Checksums
[Diagram: HDFS Block laid out as 4096 byte Data Chunks, each with a 4 byte Checksum]
• Checksums inline with data
• Single iop for pread
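As an illustration of why an inline layout needs only one read per chunk, here is a small sketch. It assumes each 4 byte checksum immediately follows its 4096 byte chunk and uses CRC32 from `zlib`; the slide gives only the sizes, so the ordering, the checksum algorithm, and all function names are assumptions.

```python
import zlib

CHUNK = 4096   # data bytes per chunk (from the slide)
CSUM = 4       # checksum bytes per chunk (from the slide)


def encode_block(data):
    """Interleave each data chunk with its checksum (assumed order: data, then checksum)."""
    out = bytearray()
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        out += chunk
        out += zlib.crc32(chunk).to_bytes(CSUM, "big")
    return bytes(out)


def chunk_extent(chunk_index, data_len):
    """Physical (offset, length) covering one chunk plus its checksum."""
    start = chunk_index * (CHUNK + CSUM)
    payload = min(CHUNK, data_len - chunk_index * CHUNK)
    return start, payload + CSUM


def read_chunk(block, chunk_index, data_len):
    """Fetch and verify one chunk with a single contiguous read."""
    offset, length = chunk_extent(chunk_index, data_len)
    raw = block[offset:offset + length]
    chunk, stored = raw[:-CSUM], raw[-CSUM:]
    if zlib.crc32(chunk).to_bytes(CSUM, "big") != stored:
        raise ValueError("checksum mismatch in chunk %d" % chunk_index)
    return chunk
```

With a separate checksum file, the same lookup would need one read at `chunk_index * CHUNK` in the block file and another at `chunk_index * CSUM` in the checksum file, which is the two-iop cost the previous slide describes.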
High IOPS - Results
[Charts: No. of Put and Get above one second; Put avg time; Get avg time]
HBase Locality - HDFS Favored Nodes
▪ Each region’s data on 3 specific datanodes
▪ On failure locality preserved
▪ Favored nodes persisted at HBase layer
RegionServer
Local Datanode
HBase Locality - Solution
• Persisting info in NameNode complicated
• Region Directory :
▪ /*HBASE/<tablename>/<regionname>/cf1/…
▪ /*HBASE/<tablename>/<regionname>/cf2/…
• Build Histogram of locations in directory
• Pick lowest frequency to delete
[Chart: histogram of block locations per Datanode (D1, D2, D3, D4), axis from 0 to 10000]
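The histogram approach on this slide might look something like the sketch below. The input shape (a mapping from file path to the datanodes holding its replicas) and both function names are hypothetical, and the tie-break by node name is only there to make the result deterministic. The idea is that the datanode appearing least often across the region directory is probably not one of the region's favored nodes, so its replica is the one to drop.

```python
from collections import Counter


def build_histogram(file_locations):
    """Count how often each datanode holds a replica within one region directory.

    file_locations: {path: [datanode, ...]} (hypothetical input shape).
    """
    hist = Counter()
    for nodes in file_locations.values():
        hist.update(nodes)
    return hist


def pick_replica_to_delete(replicas, hist):
    """Among the replicas of one block, pick the least frequent datanode."""
    return min(replicas, key=lambda node: (hist[node], node))


locations = {
    "cf1/a": ["D1", "D2", "D3"],
    "cf1/b": ["D1", "D2", "D3"],
    "cf2/c": ["D1", "D2", "D4"],
}
hist = build_histogram(locations)
victim = pick_replica_to_delete(["D1", "D2", "D3", "D4"], hist)
```

Here D4 holds only one replica in the directory, so it is chosen, and the data stays concentrated on D1, D2, and D3.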
More Improvements
• Keep fds open
• Throttle re-replication
Improvements in Write Path
HBase WAL
Datanode
OS page cache
Disk
Regionserver
Datanode
OS page cache
Disk
Datanode
OS page cache
Disk
• Packets never hit disk
• > 1s outliers !
Instrumentation
1. Write to OS cache
2. Write to TCP buffers
3. sync_file_range(SYNC_FILE_RANGE_WRITE)
Steps 1 and 3 showed outliers > 1s!
Use of strace
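A minimal sketch of this kind of step-level instrumentation is shown below. `StepTimer` and the step names are hypothetical; the 1 second threshold matches the outliers mentioned on the slide, and the injectable clock exists only to make the example testable.

```python
import time
from contextlib import contextmanager


class StepTimer:
    """Time named steps and remember those slower than a threshold."""

    def __init__(self, threshold_s=1.0, clock=time.monotonic):
        self.threshold_s = threshold_s
        self.clock = clock
        self.outliers = []

    @contextmanager
    def step(self, name):
        start = self.clock()
        try:
            yield
        finally:
            elapsed = self.clock() - start
            if elapsed > self.threshold_s:
                self.outliers.append((name, elapsed))
```

Wrapping each stage (for example `with timer.step("write_to_os_cache"): ...`) shows which stage produced a slow request, which is how the slide narrows the problem down to steps 1 and 3.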
Interesting Observations
• write(2) outliers correlated with busy disk
• Reproducible by artificially stressing disk
dd oflag=sync,dsync if=/dev/zero of=/mnt/d7/test/tempfile bs=256M count=1000
Test Program
[Diagram: file written on the Linux filesystem, with sync_file_range every 1MB]
▪ 64k writes: No Outliers!
▪ Alternating 63k and 1k writes: Outliers Reproduced!
Some suspects
• Too many dirty pages
• Linux stable pages
• Kernel tracepoints revealed stable pages as the culprit
Stable Pages
[Diagram: WriteBack of an OS page to a Persistent Store (Device with Integrity Checking), with a Kernel Checksum and a Device Checksum]
• Checksum Error
• Solution: lock pages under writeback
Explanation of Write Outliers
[Diagram: a WAL write lands in a 4k OS Page; WriteBack (sync_file_range) of that page to the Persistent Store is in progress; the next WAL write to the same page is blocked]
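A little arithmetic shows why the 63k/1k pattern from the test program hits this case while plain 64k writes do not. The helper below is a hypothetical sketch that assumes sequential appends and 4k pages: it counts writes that start in the middle of a page, which means they touch a page the previous write already dirtied and that may be under writeback.

```python
PAGE = 4096  # 4k OS page, as on the slide


def writes_redirtying_a_page(sizes, page=PAGE):
    """Count sequential writes that begin inside a page the previous write touched."""
    offset = 0
    count = 0
    for size in sizes:
        if offset % page != 0:   # starts mid-page, so the page is already dirty
            count += 1
        offset += size
    return count


aligned = writes_redirtying_a_page([64 * 1024] * 16)
unaligned = writes_redirtying_a_page([63 * 1024, 1024] * 16)
```

A 64k write always ends on a page boundary, so no write re-touches a page. A 63k write ends 3072 bytes into a page, so every following 1k write lands in that same page and can stall if the page is locked for writeback.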
Solution ?
Patch: http://thread.gmane.org/gmane.comp.file-systems.ext4/35561
sync_file_range ?
• sync_file_range not async for > 128 write requests
• Solution – Use threadpool
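One way to picture the thread pool solution is the sketch below, which hands the flush call to worker threads so the thread writing packets never waits on it. `AsyncFlusher`, the `flush_range` callable, the pool size, and the thread name prefix are all hypothetical; the slide only says a thread pool was used.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AsyncFlusher:
    """Run a possibly blocking flush call on worker threads."""

    def __init__(self, flush_range, workers=2):
        self._flush_range = flush_range
        self._pool = ThreadPoolExecutor(max_workers=workers,
                                        thread_name_prefix="wal-sync")

    def _run(self, fd, offset, nbytes):
        self._flush_range(fd, offset, nbytes)
        # Return the worker's name so callers can see where the call ran.
        return threading.current_thread().name

    def request_flush(self, fd, offset, nbytes):
        """Queue the flush and return immediately with a Future."""
        return self._pool.submit(self._run, fd, offset, nbytes)

    def close(self):
        self._pool.shutdown(wait=True)
```

The writer calls `request_flush` and carries on; if the kernel makes the underlying call block, only a pool thread is held up.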
Results
[Chart: P99 write latency to OS cache (in ms)]
Per request profiling
• Entire profile of client requests
• Full profile of pipeline write
• Full profile of pread
• Lot of visibility !
Interesting Profiles
• In memory operations >1s
• No Java GC
• Correlated with busy root disk
• Reproducible by stressing root disk
Investigation
• Use lsof
• /tmp/hsperfdata_hadoop/<pid> suspicious
• Disable using -XX:-UsePerfData
• Stalls disappeared !
• -XX:-UsePerfData breaks jps, jstack
• Mount /tmp/hsperfdata_hadoop/ on tmpfs
Result
[Chart: p99 WAL write latency (in ms)]
Lessons learnt
• Instrumentation is key
• Per request profiling is very useful
• Understanding of Linux kernel and fs is important
Acknowledgements
▪ Hairong Kuang
▪ Siying Dong
▪ Kumar Sundararajan
▪ Binu John
▪ Dikang Gu
▪ Paul Tuckfield
▪ Arjen Roodselaar
▪ Matthew Byng-Maddick
▪ Liyin Tang
FB Hadoop code
• https://github.com/facebook/hadoop-20
Questions ?
(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0
Putting Wings on the Elephant: Improving HBase Write Performance

Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Último (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Putting Wings on the Elephant: Improving HBase Write Performance

  • 1. Putting Wings on the Elephant Pritam Damania Facebook, Inc.
  • 3. Putting wings on the Elephant! Pritam Damania Software Engineer April 2, 2014
  • 4. Agenda: 1. Background 2. Major Issues in I/O Path 3. Read Improvements 4. Write Improvements 5. Lessons Learnt
  • 5. High Level Messages Architecture (diagram labels: Application Server, HBase, Message, Message, Write, Ack)
  • 6. HBase Cluster Physical Layout ▪ Multiple clusters/cells for messaging ▪ 20 servers/rack; 5 or more racks per cluster
      ▪ Rack #1: ZooKeeper Peer, HDFS Namenode; Region Server / Data Node / Task Tracker (19x...)
      ▪ Rack #2: ZooKeeper Peer, Standby Namenode; Region Server / Data Node / Task Tracker (19x...)
      ▪ Rack #3: ZooKeeper Peer, Job Tracker; Region Server / Data Node / Task Tracker (19x...)
      ▪ Rack #4: ZooKeeper Peer, HBase Master; Region Server / Data Node / Task Tracker (19x...)
      ▪ Rack #5: ZooKeeper Peer, Backup HBase Master; Region Server / Data Node / Task Tracker (19x...)
  • 7. Write Path Overview (diagram labels: RegionServer, Memstore, HDFS, Write Ahead Log, HFiles)
  • 8. HDFS Write Pipeline (diagram: Regionserver sends 64k packets to a chain of three Datanodes, each with OS page cache and Disk; Ack flows back)
  • 10. Problems in R/W Path • Skewed Disk Usage • High Disk iops • High p99 for r/w
  • 12. Disk Skew (diagram: three Datanodes, each with OS page cache and Disk) • HDFS block size: 256MB • HDFS block resides on a single disk • Fsync of 256MB hits a single disk
  • 13. Disk Skew - Sync File Range (diagram: block file written on Linux filesystem as 64k packets; labels: sync_file_range every 1MB, fsync) ▪ sync_file_range(SYNC_FILE_RANGE_WRITE) ▪ Initiates async write
  • 14. High IOPS • Messages workload is random read • Small preads (~4KB) on datanodes • Two iops for each pread (diagram: Datanode with Block File and Checksum file; pread: Read checksum, Read data)
  • 15. High IOPS - Inline Checksums (diagram: HDFS Block laid out as 4096 byte Data Chunk, 4 byte Checksum) • Checksums inline with data • Single iop for pread
  • 16. High IOPS - Results (charts: No. of Puts and Gets above one second; Put avg time; Get avg time)
  • 17. HBase Locality - HDFS Favored Nodes ▪ Each region's data on 3 specific datanodes ▪ On failure locality preserved ▪ Favored nodes persisted at HBase layer (diagram labels: RegionServer, Local Datanode)
  • 18. HBase Locality - Solution • Persisting info in NameNode complicated • Region Directory: ▪ /*HBASE/<tablename>/<regionname>/cf1/… ▪ /*HBASE/<tablename>/<regionname>/cf2/… • Build histogram of locations in directory • Pick lowest frequency to delete (chart: histogram over Datanodes D1 to D4, axis 0 to 10000)
  • 19. More Improvements • Keep fds open • Throttle re-replication
  • 21. HBase WAL (diagram: Regionserver writing to three Datanodes, each with OS page cache and Disk) • Packets never hit disk • > 1s outliers!
  • 22. Instrumentation 1. Write to OS cache 2. Write to TCP buffers 3. sync_file_range(SYNC_FILE_RANGE_WRITE) • Steps 1 and 3 show outliers > 1s!
  • 24. Interesting Observations • write(2) outliers correlated with busy disk • Reproducible by artificially stressing disk: dd oflag=sync,dsync if=/dev/zero of=/mnt/d7/test/tempfile bs=256M count=1000
  • 25. Test Program (file written on Linux filesystem) ▪ 64k writes, sync_file_range every 1MB: No outliers! ▪ 63k + 1k writes, sync_file_range every 1MB: Outliers reproduced!
  • 26. Some suspects • Too many dirty pages • Linux stable pages • Kernel trace points revealed stable pages as the culprit
  • 27. Stable Pages (diagram: OS page, Kernel Checksum, WriteBack, Device Checksum, Persistent Store (Device with Integrity Checking)) • Checksum Error • Solution: Lock pages under writeback
  • 28. Explanation of Write Outliers (diagram: 4k OS Page under WriteBack (sync_file_range) to Persistent Store; WAL write to the same page is blocked)
  • 30. sync_file_range? • sync_file_range is not async for > 128 write requests • Solution: Use threadpool
  • 32. Per request profiling • Entire profile of client requests • Full profile of pipeline write • Full profile of pread • Lot of visibility !
  • 33. Interesting Profiles • In-memory operations > 1s • No Java GC • Correlated with busy root disk • Reproducible by stressing root disk
  • 34. Investigation • Use lsof • /tmp/hsperfdata_hadoop/<pid> suspicious • Disable using -XX:-UsePerfData • Stalls disappeared ! • -XX:-UsePerfData breaks jps, jstack • Mount /tmp/hsperfdata_hadoop/ on tmpfs
  • 37. Lessons learnt • Instrumentation is key • Per request profiling is very useful • Understanding of Linux kernel and fs is important
  • 38. Acknowledgements ▪ Hairong Kuang ▪ Siying Dong ▪ Kumar Sundararajan ▪ Binu John ▪ Dikang Gu ▪ Paul Tuckfield ▪ Arjen Roodselaar ▪ Matthew Byng-Maddick ▪ Liyin Tang
  • 39. FB Hadoop code • https://github.com/facebook/hadoop-20
  • 41. (c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0
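
Slide 13 describes writing a block file in 64k packets and requesting writeback every 1MB, with an fsync when the block is done. The loop might look roughly like the sketch below. It is an illustration only, not the datanode code: `write_block` and the `start_writeback` callback are hypothetical names, and the callback stands in for a `sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE)` call so the sketch does not bind to the Linux syscall directly.

```python
import os
import tempfile

PACKET_SIZE = 64 * 1024        # 64k packets, as on slide 13
SYNC_INTERVAL = 1024 * 1024    # request writeback every 1MB, as on slide 13


def write_block(fd, packets, start_writeback, sync_interval=SYNC_INTERVAL):
    """Write packets to fd and call start_writeback(fd, offset, nbytes)
    each time another sync_interval bytes have accumulated.

    start_writeback is a hypothetical stand-in for
    sync_file_range(..., SYNC_FILE_RANGE_WRITE).
    Returns the (offset, nbytes) ranges that were handed to it.
    """
    ranges = []
    written = 0   # total bytes written so far
    synced = 0    # bytes already handed to start_writeback
    for packet in packets:
        os.write(fd, packet)  # assumes a regular file, so the write is not short
        written += len(packet)
        while written - synced >= sync_interval:
            start_writeback(fd, synced, sync_interval)
            ranges.append((synced, sync_interval))
            synced += sync_interval
    os.fsync(fd)  # final fsync once the block is complete
    return ranges
```

Spreading writeback over 1MB ranges means the final fsync has little left to flush, instead of pushing a whole 256MB block at a single disk in one go.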
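
Slide 15 replaces the separate checksum file with checksums stored inline, so a small pread needs one contiguous read instead of two. The offset arithmetic can be sketched as below. Two things are assumptions made for illustration: each 4-byte checksum is placed directly after its 4096-byte chunk, and the checksum is CRC32 from `zlib`. The function names are hypothetical and the block is modelled as an in-memory byte string.

```python
import zlib

CHUNK = 4096   # data bytes per chunk (slide 15)
CSUM = 4       # checksum bytes per chunk (slide 15)


def encode(data):
    """Lay out data as [chunk][crc32][chunk][crc32]... (assumed ordering)."""
    out = bytearray()
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        out += chunk
        out += zlib.crc32(chunk).to_bytes(CSUM, "big")
    return bytes(out)


def physical_range(offset, length):
    """Map logical [offset, offset + length) to one physical (start, size).

    Requires length > 0.
    """
    first = offset // CHUNK
    last = (offset + length - 1) // CHUNK
    return first * (CHUNK + CSUM), (last - first + 1) * (CHUNK + CSUM)


def pread(block, offset, length):
    """Read and verify length bytes at logical offset using one contiguous read."""
    start, size = physical_range(offset, length)
    raw = block[start:start + size]          # the single "iop"
    data = bytearray()
    for i in range(0, len(raw), CHUNK + CSUM):
        record = raw[i:i + CHUNK + CSUM]
        chunk, stored = record[:-CSUM], record[-CSUM:]
        if zlib.crc32(chunk).to_bytes(CSUM, "big") != stored:
            raise IOError("checksum mismatch")
        data += chunk
    skip = offset % CHUNK                    # offset inside the first chunk
    return bytes(data[skip:skip + length])
```

With this layout a ~4KB pread touches at most two adjacent chunk records, all in the block file.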
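
Slide 18 says only "build histogram of locations in directory" and "pick lowest frequency to delete". One possible reading, sketched below, is that when a replica of a block in a region directory has to be removed, the replica on the datanode that holds the fewest blocks of that directory is dropped, so the most common (favored) nodes keep the region's data. The function names, the input shape (block id to list of datanodes) and the tie-break by name are assumptions made for the example.

```python
from collections import Counter


def location_histogram(block_locations):
    """Count how many blocks of a region directory each datanode holds.

    block_locations: dict mapping block id -> list of datanode names
    (hypothetical input shape).
    """
    return Counter(dn for locs in block_locations.values() for dn in locs)


def replica_to_delete(replicas, histogram):
    """Pick the replica on the least frequent datanode.

    Ties are broken by datanode name to keep the choice deterministic
    (an assumption, not something the slide states).
    """
    return min(replicas, key=lambda dn: (histogram[dn], dn))
```

For example, if D1, D2 and D3 hold every block of a region and D4 holds a single stray replica, D4 has the lowest frequency and its replica is the one chosen for deletion.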
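
Slide 30 notes that sync_file_range can block once too many write requests are outstanding, and that the fix was a threadpool. A minimal sketch of that pattern follows: the potentially blocking call is handed to a pool so the thread writing the WAL does not wait on it. `BackgroundSyncer` and `sync_fn` are hypothetical names, and `sync_fn` stands in for whatever call actually starts the writeback.

```python
from concurrent.futures import ThreadPoolExecutor


class BackgroundSyncer:
    """Runs a possibly blocking sync call on worker threads."""

    def __init__(self, sync_fn, workers=2):
        # sync_fn(fd, offset, nbytes) is a stand-in for the real writeback call.
        self._sync_fn = sync_fn
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, fd, offset, nbytes):
        """Queue a sync request and return immediately with a Future."""
        return self._pool.submit(self._sync_fn, fd, offset, nbytes)

    def close(self):
        """Wait for queued sync requests to finish and stop the workers."""
        self._pool.shutdown(wait=True)
```

The writer calls `submit(...)` after each 1MB and carries on; if the underlying call stalls, only a worker thread is held up.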
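
Slides 32 and 33 credit per-request profiling with exposing in-memory operations that took more than a second. The idea can be sketched as a per-request object that times each named stage and reports the slow ones. This is an illustration under assumed names (`RequestProfile`, `stage`, `slow_stages`), not the profiling code used in the talk.

```python
import time
from contextlib import contextmanager


class RequestProfile:
    """Collects (stage name, elapsed seconds) pairs for one request."""

    def __init__(self):
        self.stages = []

    @contextmanager
    def stage(self, name):
        """Time the enclosed block and record it under name."""
        start = time.monotonic()
        try:
            yield
        finally:
            self.stages.append((name, time.monotonic() - start))

    def slow_stages(self, threshold_s):
        """Return names of stages that took at least threshold_s seconds."""
        return [name for name, elapsed in self.stages if elapsed >= threshold_s]
```

A request handler would wrap each step (for example the memstore update and the pipeline write) in `with profile.stage("...")` and log `slow_stages(1.0)` when the request finishes.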