The document discusses managing Hadoop, HBase and Storm clusters at Yahoo scale. It describes Yahoo's grid infrastructure which includes 3 data centers with over 45k nodes across 18 Hadoop clusters, 9 HBase clusters and 13 Storm clusters. It then provides details on the rolling upgrade processes for HDFS, YARN, HBase and Storm which involve minimizing downtime, upgrading components independently and verifying upgrades. CI/CD processes are used to automate software deployment and upgrades.
4. Grid Infrastructure at Yahoo
HadoopSummit 2016
▪ A multi-tenant, secure, distributed compute and storage environment, based on the Hadoop stack, for large-scale data processing
▪ 3 data centers, over 45k physical nodes.
▪ 18 YARN (Hadoop) clusters, having 350 to 5200 nodes.
▪ 9 HBase clusters, having 80 to 1080 nodes.
▪ 13 Storm clusters, having 40 to 250 nodes
6. Deployment Model
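[Architecture diagram. Recoverable content:]
▪ Hadoop clusters: NameNode, RM; DataNode, NodeManager
▪ HBase clusters: NameNode, HBase Master; DataNodes, RegionServers
▪ Storm clusters: Nimbus; Supervisor
▪ Shared layers: Administration, Management and Monitoring; ZooKeeper Pools; HTTP/HDFS/GDM Load Proxies
▪ Applications and Data: Data Feeds, Data Stores, Oozie Server, HS2/HCat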
10. HDFS Upgrade CI/CD process
[Flow diagram. Recoverable content, in the order it appears; the diagram labels its steps 1, 2, 3a-3c and 4a-4f:]
▪ Git (release info) → Jenkins → Start RU process
▪ Create Dir Structure
▪ Put NN in RU mode
▪ SNN Upgrade → NN Failover → SNN Upgrade (involves service and IP failover from NN to SNN and vice versa)
▪ foreach DN (Safeupgrade-dn): Select DN → Check installed version → Stop DN → Activate new software → Start DN → Wait for DN to join
▪ After 100 hosts are successfully upgraded, check HDFS used percentage and live-node consistency on the NNs
▪ Stop/terminate the RU in case of more than X failures
▪ Finalize RU
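The per-DataNode loop in the HDFS upgrade flow could be sketched roughly as follows. This is a minimal illustration, not Yahoo's tooling: the function name, the `UpgradeReport` type and all callbacks are hypothetical, and it assumes the consistency check runs after every 100 successful upgrades (the slide may equally mean a single check after the first 100).

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


class RollingUpgradeAborted(Exception):
    """Raised when the upgrade must stop (too many failures or a failed check)."""


@dataclass
class UpgradeReport:
    upgraded: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def rolling_upgrade_datanodes(
    hosts: Iterable[str],
    target_version: str,
    installed_version: Callable[[str], str],   # hypothetical: query a host's version
    upgrade_host: Callable[[str], bool],       # hypothetical: stop, activate, start, wait
    cluster_is_consistent: Callable[[], bool], # hypothetical: used %, live nodes on NNs
    max_failures: int,                         # the "X" on the slide
    check_every: int = 100,
) -> UpgradeReport:
    report = UpgradeReport()
    for host in hosts:
        # Check installed version: hosts already on the target are skipped.
        if installed_version(host) == target_version:
            report.skipped.append(host)
            continue
        if upgrade_host(host):
            report.upgraded.append(host)
            # Periodic sanity check on the NameNodes.
            if len(report.upgraded) % check_every == 0 and not cluster_is_consistent():
                raise RollingUpgradeAborted("NameNode consistency check failed")
        else:
            report.failed.append(host)
            # Terminate the upgrade once more than X hosts have failed.
            if len(report.failed) > max_failures:
                raise RollingUpgradeAborted(
                    f"{len(report.failed)} failures exceed limit {max_failures}"
                )
    return report
```

Running one host at a time matches the parallelism of 1 that the timeline slide lists for HDFS.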
11. Hadoop 2.7.x improvements over 2.6.x
Performance
▪ Reduced NN failover time by parallelizing the quota initialization.
▪ Addressed the DataNode layout inefficiency that was causing high I/O load.
▪ Used an offline upgrade script to speed up the layout upgrade.
▪ Added a fake metrics sink to work around the JMX cache fix, which was causing delays in DataNode upgrades and health checks.
▪ Improved DataNode shutdown speed.
Failure handling
▪ Reduced read/write failures by blocking clients until the DN is fully initialized.
13. YARN Rolling Upgrade
▪ Minimize downtime, maximize service availability
▪ Work-preserving restart on RM and NM
▪ Retains state for 10 minutes.
▪ Ensures that applications keep running during an RM restart
▪ Save state, update software, restart and restore state.
▪ Uses LevelDB as the state store
▪ After the RM restarts, it loads all the application metadata and other credentials from the state store and populates them into memory.
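The "save state, update software, restart and restore state" cycle can be illustrated with a toy model. This is only a sketch of the pattern: `FileStateStore` is a JSON file standing in for the LevelDB state store, and `ToyResourceManager` and `restart_with_new_version` are hypothetical names, not YARN classes.

```python
import json
import tempfile
from pathlib import Path


class FileStateStore:
    """Stand-in for the state store: one JSON file on local disk."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def save(self, state: dict) -> None:
        # Write to a temporary file first so a crash never leaves a half-written store.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(self.path)

    def load(self) -> dict:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())


class ToyResourceManager:
    def __init__(self, store: FileStateStore, version: str) -> None:
        self.store = store
        self.version = version
        # On start, application metadata is loaded from the store into memory.
        self.apps = store.load()

    def submit(self, app_id: str, metadata: dict) -> None:
        self.apps[app_id] = metadata
        self.store.save(self.apps)


def restart_with_new_version(rm: ToyResourceManager, new_version: str) -> ToyResourceManager:
    rm.store.save(rm.apps)                            # save state
    return ToyResourceManager(rm.store, new_version)  # new software, restart, restore


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        store = FileStateStore(Path(tmp_dir) / "rm-state.json")
        old_rm = ToyResourceManager(store, "2.6.x")
        old_rm.submit("application_1", {"user": "alice", "queue": "default"})
        new_rm = restart_with_new_version(old_rm, "2.7.x")
        print(new_rm.version, new_rm.apps)
```

The point of the pattern is that the restarted process never depends on the old process's memory, only on what was persisted.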
16. Distributed Cache
▪ Distributed cache distributes application-specific, large, read-only files efficiently.
▪ Applications specify the files to be cached as URLs (hdfs://) in the Job.
▪ DistributedCache tracks the modification timestamps of the cached files.
▪ DistributedCache can be used to distribute simple, read-only data or text files and more complex types such as archives and JAR files.
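One way modification timestamps can drive a file cache is sketched below. It is an illustration only, with local files standing in for hdfs:// URLs; `TimestampedFileCache` and its `localize` method are hypothetical names, not the Hadoop DistributedCache API.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Dict, Tuple


class TimestampedFileCache:
    """Copies a source file into a cache directory once per modification timestamp."""

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = cache_dir
        self._entries: Dict[str, Tuple[int, Path]] = {}
        self.copies = 0  # number of real copies made, for demonstration

    def localize(self, source: Path) -> Path:
        mtime = source.stat().st_mtime_ns
        cached = self._entries.get(str(source))
        # Reuse the cached copy only if the source has not changed since it was cached.
        if cached is not None and cached[0] == mtime:
            return cached[1]
        target = self.cache_dir / f"{mtime}-{source.name}"
        shutil.copy2(source, target)
        self.copies += 1
        self._entries[str(source)] = (mtime, target)
        return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        src = Path(tmp_dir) / "lookup.txt"
        src.write_text("v1")
        cache_dir = Path(tmp_dir) / "cache"
        cache_dir.mkdir()
        cache = TimestampedFileCache(cache_dir)
        cache.localize(src)
        cache.localize(src)  # served from the cache, no second copy
        print("copies made:", cache.copies)
```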
17. Sharelib
▪ "Sharelib" is a management system for a directory in HDFS named /sharelib, which exists on every
cluster.
▪ Shared libraries can simplify the deployment and management of applications.
▪ The target directory is /sharelib, which contains:
• /sharelib/v1 - where all the packages are
• /sharelib/v1/conf - where the unique metafile for the cluster is (and all previous versions)
• /sharelib/v1/{tez, pig, ... } - where the package versions are kept
▪ The links/tags (metafile) are unique per cluster.
▪ Grid Ops maintains shared libraries on HDFS of each cluster
▪ Packages in shared libraries include mapreduce, pig, hbase, hcatalog, hive and oozie.
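To make the role of the per-cluster metafile concrete, here is a small sketch of resolving a tag to a package directory. The slides do not describe the metafile format, so the JSON layout (package, then tag, then version), the `current` tag, the version string and the function name are all assumptions made for illustration; only the /sharelib/v1 prefix comes from the slide.

```python
import json
import posixpath

SHARELIB_ROOT = "/sharelib/v1"  # from the slide


def resolve_package_path(metafile_text: str, package: str, tag: str) -> str:
    """Map a (package, tag) pair to an HDFS directory via a hypothetical metafile."""
    tags = json.loads(metafile_text)
    version = tags[package][tag]  # raises KeyError for unknown packages or tags
    return posixpath.join(SHARELIB_ROOT, package, version)


if __name__ == "__main__":
    # Made-up metafile content for one cluster.
    metafile = '{"pig": {"current": "0.14.0"}}'
    print(resolve_package_path(metafile, "pig", "current"))
```

Because the metafile is unique per cluster, the same tag can point at different package versions on different clusters.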
20. Component Upgrade
▪ New releases: the CI environment continuously releases certified builds and their versions.
▪ Generate state: package rulesets contain the list of core packages and their dependencies for every cluster.
▪ Deploy cookbooks: contain the Chef code and configuration that is pushed to the Chef server.
▪ Deploy pipelines: YAML files that specify the flow and order of the deploy for every environment/cluster.
▪ Validation jobs: run once a deploy has completed on all the nodes, to ensure that end-to-end functionality works as expected.
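The "generate state" step can be pictured as pinning every package named by a ruleset to a certified version, then comparing the result with the previous state. The sketch below assumes a ruleset shaped as a dict with `core` and `dependencies` keys; that shape, the function names and the sample versions are hypothetical and not taken from the actual tooling.

```python
from typing import Dict, List, Optional, Tuple


def generate_state(ruleset: dict, certified: Dict[str, str]) -> Dict[str, str]:
    """Pin each core package and its transitive dependencies to a certified version."""
    state: Dict[str, str] = {}
    pending: List[str] = list(ruleset["core"])
    dependencies = ruleset.get("dependencies", {})
    while pending:
        package = pending.pop()
        if package in state:
            continue  # already pinned, also guards against dependency cycles
        if package not in certified:
            raise KeyError(f"no certified build for {package}")
        state[package] = certified[package]
        pending.extend(dependencies.get(package, []))
    return state


def diff_state(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Tuple[Optional[str], str]]:
    """Return packages whose pinned version changed, as (old, new) pairs."""
    return {p: (old.get(p), v) for p, v in new.items() if old.get(p) != v}


if __name__ == "__main__":
    ruleset = {"core": ["hadoop"], "dependencies": {"hadoop": ["jdk"]}}
    certified = {"hadoop": "2.7.2", "jdk": "8", "pig": "0.14"}
    new_state = generate_state(ruleset, certified)
    print(new_state)
    print(diff_state({"hadoop": "2.6.0", "jdk": "8"}, new_state))
```

Only packages reachable from the ruleset end up in the state, so a certified build that no cluster asks for is never deployed.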
21. Components Upgrade (CI process)
[Flow diagram. Recoverable content:]
▪ Stages: 1 New Release, 2 Package Rulesets, 3 Deploy cookbooks
▪ Artifacts: component versions; Git bundles; certified releases; rule set files (cluster: component specific); certified package version info; statefiles; cookbook, roles, env and attribute files
▪ Tooling: Git (release info), Build Farms, Artifactory, Ruby (Rake), Chef
▪ Checks: Rspec, rubocop; state generate, compare & upload; validate, increment version
▪ Connectors A and B
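22. Components Upgrade cont. (CD process)
[Flow diagram. Recoverable content:]
▪ Stage: 4 Deploy Pipeline (Component, Node)
▪ Inputs: Git (release info), Build Farms, Statefiles; connectors A and B
▪ Tooling: Ruby (Rake), Chef
▪ Pipeline checks: min size, zero-downtime check, target size, validate
▪ Node actions: chef-client, cookbook converge, graceful shutdown and health check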
24. HBase Rolling Upgrade
Release Configs:
  default:
    group: 'all'
    command: 'start'
    system: 'ALL'
    verbose: 'true'
    retry: 3
    upgradeREST: 'false'
    upgradeGateway: 'true'
    dryrun: 'false'
    force: 'false'
    upgrade_type: 'rolling'
    skip_nn_upgrade: 'false'
    skip_master_upgrade: 'false'
Workflow definitions:
  default:
    continue_on_failure:
      - broken
      - badnodes
  relux.red:
    - master
    - default
    - user
    - ca_soln-stage
    - perf,perf2,projects
    - restALL
▪ Workflow-based system.
▪ Complete CI/CD for HDFS and HBase upgrades
▪ Builds a tgz and pushes it to the repo servers
▪ Installs the software beforehand and activates the new release during the upgrade
▪ Each component and region group is upgraded independently, e.g. HDFS, or a group of RegionServers.
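A workflow definition like the one on this slide could be executed along the following lines. This is a guess at the semantics, not the real engine: it assumes that groups are upgraded in the listed order, that a comma-separated entry names several groups, that `retry: 3` means up to three attempts per group, and that a failure stops the workflow unless the group appears under `continue_on_failure`. The function name and the `upgrade_group` callback are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set


def run_workflow(
    steps: Iterable[str],
    continue_on_failure: Set[str],
    upgrade_group: Callable[[str], bool],  # hypothetical: upgrades one region group
    retry: int = 3,
) -> Dict[str, bool]:
    results: Dict[str, bool] = {}
    for step in steps:
        # An entry such as "perf,perf2,projects" is treated as several groups.
        for group in step.split(","):
            # any() stops at the first successful attempt.
            ok = any(upgrade_group(group) for _ in range(retry))
            results[group] = ok
            if not ok and group not in continue_on_failure:
                return results  # stop the workflow at the first hard failure
    return results


if __name__ == "__main__":
    steps = ["master", "default", "user", "ca_soln-stage", "perf,perf2,projects", "restALL"]
    outcome = run_workflow(steps, {"broken", "badnodes"}, lambda group: True)
    print(outcome)
```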
25. HBase Upgrade CI/CD process
[Flow diagram. Recoverable content, in the order it appears; the diagram labels its steps 1 to 5 and 3a-3f:]
▪ Git (release info) → Jenkins → Start; Repo Server (package + conf version)
▪ Put NN in RU mode and upgrade NN and SNN
▪ Master Upgrade
▪ RegionServer upgrade process
▪ Stargate Upgrade
▪ Gateway Upgrade
▪ Foreach DN/RS (upgrade RegionServer): Stop RegionServer → DN Safeupgrade, Stop DN → Upgrade and Start DN → Upgrade and Start RS
▪ Annotations: HDFS Rolling Upgrade process; iterate over each group; iterate over each server in a group
28. Storm Upgrade CI/CD process
[Flow diagram. Recoverable content, in the order it appears:]
▪ Git (release info) → Jenkins → Start; Artifactory (state files and release info); RE Jenkins and SD process
▪ Pacemaker Upgrade → Nimbus Upgrade → Supervisor Upgrade → Bounce Workers → DRPC Upgrade (appears twice in the diagram) → Verify Supervisors → Run Test/Validation topology → Audit All Components
▪ RE Jenkins leads to statefile generation for each component and updates Git with the release info
▪ Statefiles are published in Artifactory and downloaded during the upgrade
▪ The upgrade fails if more than X supervisors fail to upgrade
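The supervisor stage, with its failure threshold, might look like the sketch below. It assumes supervisors are upgraded in fixed-size batches; the default of 10 is the parallelism the timeline slide lists for Storm. The function name and the `upgrade_one` callback are hypothetical, and `max_failures` plays the role of "X".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def upgrade_supervisors(
    hosts: Iterable[str],
    upgrade_one: Callable[[str], bool],  # hypothetical: upgrades one supervisor
    max_failures: int,
    parallelism: int = 10,
) -> List[str]:
    """Upgrade supervisors batch by batch and return the hosts that failed."""
    host_list = list(hosts)
    failed: List[str] = []
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        for start in range(0, len(host_list), parallelism):
            batch = host_list[start:start + parallelism]
            # pool.map preserves input order, so results line up with the batch.
            results = list(pool.map(upgrade_one, batch))
            failed.extend(h for h, ok in zip(batch, results) if not ok)
            # Fail the whole upgrade once more than X supervisors have failed.
            if len(failed) > max_failures:
                raise RuntimeError(
                    f"{len(failed)} supervisors failed, limit is {max_failures}"
                )
    return failed


if __name__ == "__main__":
    supervisors = [f"sup{i}" for i in range(25)]
    print(upgrade_supervisors(supervisors, lambda host: host != "sup7", max_failures=1))
```

Checking the threshold between batches means at most one extra batch is touched after the limit is crossed.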
29. Rolling Upgrade timeline
Component           Parallelism   Hadoop 2.6.x   Hadoop 2.7.x   HBase 0.98.x   Storm 0.10.1.x
HDFS (4k nodes)     1             4 days         1 day          X              X
YARN (4k nodes)     1             1 day          1 day          X              X
HBase (1k nodes)    1-4           4-5 days       X              4-5 days       X
Storm (350 nodes)   10            X              X              X              4-6 hrs
Components          1             1-2 hrs        1-2 hrs        1-2 hrs        X
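As a back-of-the-envelope reading of the HDFS row, the durations can be converted into an average time budget per node. This assumes "4k" means exactly 4,000 nodes and that the upgrade runs around the clock, neither of which the slides state; `seconds_per_node` is a helper invented for this estimate.

```python
def seconds_per_node(total_days: float, nodes: int, parallelism: int = 1) -> float:
    """Average wall-clock seconds spent per node, given how many run at once."""
    return total_days * 24 * 3600 * parallelism / nodes


if __name__ == "__main__":
    # HDFS row: 4,000 nodes, parallelism 1.
    print("Hadoop 2.6.x:", seconds_per_node(4, 4000), "s per DataNode")
    print("Hadoop 2.7.x:", seconds_per_node(1, 4000), "s per DataNode")
```

Under those assumptions the per-DataNode budget drops from roughly 86 seconds on 2.6.x to roughly 22 seconds on 2.7.x, which is consistent with the DataNode upgrade and shutdown improvements listed for 2.7.x.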