Managing Hadoop, HBase and Storm Clusters at
Yahoo Scale
PRESENTED BY Dheeraj Kapur, Savitha Ravikrishnan⎪June 30, 2016
Agenda
Topic                                                  Speaker(s)
Introduction, HDFS RU, HBase RU & Storm RU             Dheeraj Kapur
YARN RU, Component RU, Distributed Cache & Sharelib    Savitha Ravikrishnan
Q&A                                                    All Presenters
HadoopSummit 2016
Hadoop at Yahoo
Grid Infrastructure at Yahoo
▪ A multi-tenant, secure, distributed compute and storage environment, based on the Hadoop stack, for large-scale data processing
▪ 3 data centers, over 45k physical nodes
▪ 18 YARN (Hadoop) clusters, ranging from 350 to 5,200 nodes
▪ 9 HBase clusters, ranging from 80 to 1,080 nodes
▪ 13 Storm clusters, ranging from 40 to 250 nodes
Grid Stack
▪ Backend Support: Zookeeper
▪ Hadoop Storage: HDFS; HBase as NoSQL store; HCatalog for metadata registry
▪ Hadoop Compute: YARN (Mapred) and Tez for batch processing; Storm for stream processing; Spark for iterative programming
▪ Hadoop Services: PIG for ETL; Hive for SQL; Oozie for workflows; Proxy services; GDM for data management; Café on Spark for ML
▪ Support Shop: Monitoring; Starling for logging
Deployment Model
▪ Hadoop cluster: NameNode and RM; DataNode and NodeManager on worker nodes
▪ HBase cluster: NameNode and HBase Master; DataNodes and RegionServers on worker nodes
▪ Storm cluster: Nimbus; Supervisor on worker nodes
▪ Administration, Management and Monitoring: ZooKeeper Pools; HTTP/HDFS/GDM Load Proxies
▪ Applications and Data: Data Feeds; Data Stores; Oozie Server; HS2/HCat
HDFS
Hadoop Rolling Upgrade
▪ Complete CI/CD for HDFS and YARN upgrades
▪ Build software and config “tgz” bundles and push them to repo servers
▪ Install software and configs in the pre-deploy phase; activate them during the upgrade
▪ Slow upgrade: 1 node per cycle
▪ Each component is upgraded independently, i.e. HDFS, YARN & Client
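The split between a pre-deploy install and a later activation can be sketched as follows. This is a minimal illustration, not the tooling described in the slides: the symlink-swap mechanism, the `releases/` and `current` names, and the version strings are all assumptions made for the example.

```python
import os
import tempfile


def predeploy(root, version):
    """Install a release side by side (here: just create its directory).

    Nothing that is currently running is touched in this phase.
    """
    release_dir = os.path.join(root, "releases", version)
    os.makedirs(release_dir, exist_ok=True)
    return release_dir


def activate(root, version):
    """Repoint the hypothetical 'current' symlink at a pre-deployed release."""
    release_dir = os.path.join(root, "releases", version)
    if not os.path.isdir(release_dir):
        raise FileNotFoundError(f"release {version} was not pre-deployed")
    current = os.path.join(root, "current")
    tmp_link = current + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, current)  # rename over the old link in one step
    return os.path.realpath(current)


def demo():
    with tempfile.TemporaryDirectory() as root:
        predeploy(root, "2.7.2.12")   # hypothetical old version
        activate(root, "2.7.2.12")
        predeploy(root, "2.7.2.13")   # new version installed ahead of time
        before = os.path.basename(os.path.realpath(os.path.join(root, "current")))
        after = os.path.basename(activate(root, "2.7.2.13"))
        return before, after
```

Because the new release is already on disk, the activation step during the upgrade window is a single rename, which keeps the per-node downtime short.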
Release Configs/Bundles:
---
doc: This file is auto generated
packages:
  - label: hadoop
    version: 2.7.2.13.1606200235-20160620-000
  - label: conf
    version: 2.7.2.13.1606200235-20160620-000
  - label: gridjdk
    version: 1.7.0_17.1303042057-20160620-000
  - label: yjava_jdk
    version: 1.8.0_60.51-20160620-000
Package Download (pre-deploy)
[Diagram: Jenkins Start and Git (release info) drive the RU process (ygrid-deploy-software), which downloads packages from the Repo Farm onto the servers/cluster: Namenode, Datanodes, Resourcemanager, HBaseMaster, Regionserver, Gateways]
HDFS Upgrade
[Diagram: CI/CD process (Git release info, Jenkins Start) driving the RU process]
1   Create Dir Structure
2   Put NN in RU mode
3a  SNN Upgrade
3b  NN Failover (involves service and IP failover from NN to SNN and vice versa)
3c  SNN Upgrade
4   foreach DN (Safeupgrade-dn):
4a    Select DN
4b    Check installed version
4c    Stop DN
4d    Activate new software
4e    Start DN
4f    Wait for DN to join
      Stop/terminate RU on X failures (terminate the upgrade in case of more than X failures)
      After 100 hosts are successfully upgraded: check HDFS used %age and live-node consistency on the NNs
Finalize RU
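The per-DataNode loop of the HDFS upgrade flow can be sketched as below. This is a minimal sketch under assumptions: `upgrade_node` stands in for the per-node steps (check version, stop, activate, start, wait to join), `cluster_healthy` stands in for the used-percentage and live-node checks on the NameNodes, and the default threshold of 3 is a placeholder for the unspecified "X".

```python
def rolling_upgrade(datanodes, upgrade_node, cluster_healthy,
                    max_failures=3, checkpoint_every=100):
    """Upgrade one DataNode per cycle; stop once failures exceed max_failures."""
    upgraded, failed = [], []

    def result(status):
        return {"status": status, "upgraded": upgraded, "failed": failed}

    for node in datanodes:
        if upgrade_node(node):
            upgraded.append(node)
            # After every `checkpoint_every` successful hosts, verify the
            # cluster as a whole before continuing.
            if len(upgraded) % checkpoint_every == 0 and not cluster_healthy():
                return result("terminated")
        else:
            failed.append(node)
            if len(failed) > max_failures:
                return result("terminated")
    return result("completed")


def demo():
    nodes = [f"dn{i}" for i in range(250)]
    flaky = {"dn5", "dn6"}  # hypothetical hosts that fail to upgrade
    tolerant = rolling_upgrade(nodes, lambda n: n not in flaky, lambda: True)
    strict = rolling_upgrade(nodes, lambda n: n not in flaky, lambda: True,
                             max_failures=1)
    return tolerant, strict
```

Keeping the failure budget and the periodic health check inside the loop means a bad release stops after a handful of hosts rather than spreading across the cluster.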
Hadoop 2.7.x improvements over 2.6.x
Performance
▪ Reduce NN failover time by parallelizing the quota init
▪ Datanode layout inefficiency was causing high I/O load.
▪ Use an offline upgrade script to speed up the layout upgrade.
▪ Added a fake metrics sink to work around the JMX cache fix, which was causing delays in the datanode upgrade/health check.
▪ Improved datanode shutdown speed
Failure handling
▪ Reduce read/write failures by blocking clients until the DN is fully initialized.
YARN
YARN Rolling Upgrade
▪ Minimize downtime, maximize service availability
▪ Work-preserving restart on RM and NM
▪ Retains state for 10 minutes.
▪ Ensures that applications run continuously during an RM restart
▪ Save state, update software, restart and restore state.
▪ Uses leveldb as the state store
▪ After the RM restarts, it loads all the application metadata and other credentials from the state store and populates them into memory.
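The save, update, restart and restore cycle can be illustrated with a toy model. This is only a sketch: a JSON file stands in for the leveldb state store, and the `ResourceManager` class, its methods and the version strings are hypothetical names for the example, not YARN APIs.

```python
import json
import os
import tempfile


class FileStateStore:
    """Stand-in for the state store: persists application metadata to disk."""

    def __init__(self, path):
        self.path = path

    def save(self, apps):
        with open(self.path, "w") as fh:
            json.dump(apps, fh)

    def load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as fh:
            return json.load(fh)


class ResourceManager:
    """Toy RM that restores its in-memory application table on start."""

    def __init__(self, store, version):
        self.store = store
        self.version = version
        self.apps = store.load()  # restore state left by the previous process

    def submit(self, app_id, metadata):
        self.apps[app_id] = metadata
        self.store.save(self.apps)  # persist on every change


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        store = FileStateStore(os.path.join(tmp, "rm-state.json"))
        old_rm = ResourceManager(store, "2.6.x")
        old_rm.submit("app_1", {"user": "alice", "state": "RUNNING"})
        del old_rm                                  # old RM process stops
        new_rm = ResourceManager(store, "2.7.x")    # upgraded RM starts
        return new_rm.version, new_rm.apps
```

The upgraded process sees the same application table as the one it replaced, which is what lets running applications survive the restart.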
YARN Upgrade
[Diagram: CI/CD process (Git release info, Jenkins Start) driving the RU process; numbered steps 1 to 5, with sub-steps 2a to 2e]
▪ Create Dir Structure
▪ ResourceManager Upgrade
▪ HistoryServer Upgrade
▪ Foreach NM:
  • Select NM
  • Check installed version
  • Safestop NM (kill -9)
  • Activate new software
  • Start NM
  • Wait for NM to join
  • Stop/terminate RU on X failures (terminate the upgrade in case of more than X failures)
▪ Timeline Server Upgrade
Distributed cache & Sharelib
Distributed Cache
▪ Distributed cache distributes application-specific, large, read-only files efficiently.
▪ Applications specify the files to be cached via URLs (hdfs://) in the job.
▪ DistributedCache tracks the modification timestamps of the cached files.
▪ DistributedCache can be used to distribute simple, read-only data or text files and more complex types such as archives and JAR files.
Sharelib
▪ "Sharelib" is a management system for a directory in HDFS named /sharelib, which exists on every cluster.
▪ Shared libraries can simplify the deployment and management of applications.
▪ The target directory is /sharelib, which contains:
• /sharelib/v1 - where all the packages are
• /sharelib/v1/conf - where the unique metafile for the cluster is (and all previous versions)
• /sharelib/v1/{tez, pig, ... } - where the package versions are kept
▪ The links/tags (metafile) are unique per cluster.
▪ Grid Ops maintains the shared libraries on the HDFS of each cluster.
▪ Packages in the shared libraries include mapreduce, pig, hbase, hcatalog, hive and oozie.
Sharelib Update
[Diagram: Jenkins Start triggers the Sharelib Uploader, which uses Git Bundles and the Dist repo. Steps shown: Verify Dist Cache; Download toDo packages; Re-package and upload package; Re-generate Meta info (HDFS); Upload to Oozie; Generate clients to update Subsystems]
Component Upgrade
▪ New Releases: the CI environment continuously releases certified builds and their versions.
▪ Generate state: package rulesets contain the list of core packages and their dependencies for each cluster.
▪ Deploy cookbooks: contain Chef code and configuration that is pushed to the Chef server.
▪ Deploy pipelines: YAML files that specify the flow and order of the deploy for every environment/cluster.
▪ Validation jobs: run after a deploy completes on all the nodes, to ensure that end-to-end functionality is working as expected.
Components Upgrade
[Diagram: CI process in three stages, handing outputs A and B to the CD process (Chef)]
1 New Release: component versions; Git bundles; certified releases
2 Package Rulesets: rule set files (cluster: component specific); Git bundles with certified package version info; statefiles; build farms run rspec, rubocop, state generate, compare & upload
3 Deploy cookbooks: cookbook, roles, env and attribute files; Git (release info); build farms; Artifactory; Ruby (Rake); validate, increment version
Components Upgrade cont..
[Diagram: CD process taking inputs A and B]
4 Deploy Pipeline: Git (release info); build farms; statefiles; Ruby (Rake); Chef
  • Component: min size, zero-downtime check, target size, validate
  • Node: chef-client, cookbook converge, graceful shutdown and healthcheck
HBase
HBase Rolling Upgrade
Release Configs:
default:
  group: 'all'
  command: 'start'
  system: 'ALL'
  verbose: 'true'
  retry: 3
  upgradeREST: 'false'
  upgradeGateway: 'true'
  dryrun: 'false'
  force: 'false'
  upgrade_type: 'rolling'
  skip_nn_upgrade: 'false'
  skip_master_upgrade: 'false'
Workflow definitions:
default:
  continue_on_failure:
    - broken
    - badnodes
relux.red:
  - master
  - default
  - user
  - ca_soln-stage
  - perf,perf2,projects
  - restALL
▪ Workflow-based system.
▪ Complete CI/CD for HDFS and HBase upgrades
▪ Build tgz and push to repo servers
▪ Install software beforehand; activate the new release during the upgrade
▪ Each component and region group is upgraded independently, i.e. HDFS, group of regionservers.
HBase Upgrade
[Diagram: CI/CD process (Git release info, Jenkins Start); Repo Server (package + conf version)]
1 Put NN in RU mode & Upgrade NN, SNN (HDFS Rolling Upgrade process)
2 Master Upgrade
3 Regionserver Upgrade process: iterate over each group, then over each server in a group
  Foreach DN/RS (sub-steps 3a to 3f): Upgrade regionserver; Stop Regionserver; DN Safeupgrade, Stop DN; Upgrade and Start DN; Upgrade and Start RS
4 Stargate Upgrade
5 Gateway Upgrade
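The nested iteration over region groups and over servers within a group can be sketched as follows. This is an illustration under assumptions: the step names paraphrase the diagram, `run_step` is a hypothetical callback, and treating `continue_on_failure` as a list of groups whose failures do not stop the workflow is an interpretation of the config, not something the slides state.

```python
STEPS = (
    "stop_regionserver",
    "safeupgrade_stop_datanode",
    "upgrade_start_datanode",
    "upgrade_start_regionserver",
)


def upgrade_groups(groups, group_order, run_step, continue_on_failure=()):
    """Upgrade servers group by group; return (log, ok).

    groups: dict mapping group name -> list of hosts.
    run_step(host, step) -> bool is supplied by the caller.
    """
    log = []
    for group in group_order:
        for host in groups.get(group, []):
            for step in STEPS:
                ok = run_step(host, step)
                log.append((group, host, step, ok))
                if not ok:
                    break
            else:
                continue  # every step succeeded, move to the next host
            # A step failed on this host.
            if group not in continue_on_failure:
                return log, False
    return log, True


def demo():
    groups = {"default": ["rs1", "rs2"], "user": ["rs3"], "broken": ["rs9"]}
    order = ["default", "broken", "user"]

    def run_step(host, step):
        return host != "rs9"  # hypothetical: rs9 is a known-bad node

    tolerant = upgrade_groups(groups, order, run_step,
                              continue_on_failure=("broken", "badnodes"))
    strict = upgrade_groups(groups, order, run_step)
    return tolerant, strict
```

In the tolerant run the failure on `rs9` is logged and the `user` group still gets upgraded; in the strict run the workflow stops at the first failed host.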
STORM
Storm Rolling Upgrade
Release Configs:
default:
  parallel: 10
  verbose: 'true'
  retry: 3
  dryrun: 'false'
  upgrade_type: 'rolling'
  quarantine: 'true'
  terminate_on_failure: 'true'
  sup_failure_threshold: 10
  sendmail_to: 'dheerajk@yahoo-inc.com'
  sendmail_cc: 'storm-devel@yahoo-inc.com, grid-ops@yahoo-inc.com'
cluster_workflow:
  cluster1.colo1: pacemaker_drpc
  cluster2.colo2: default
Workflow Definition:
default:
  rolling_task:
    - upgradeNimbus
    - bounceNimbus
    - upgradeSupervisor
    - bounceSupervisor
    - upgradeDRPC
    - bounceDRPC
    - upgradeGateways
    - doGatewayTask
    - verifySupervisor
    - runDRPCTestTopology
    - verifySoftwareVersion
  full_upgrade_task:
    - killAllTopologies
    - specifyOperation_stop
    - sleep10
    - bounceNimbus
    - bounceSupervisor
    - bounceDRPC
    - clearDiskCache
    - cleanZKP
    - upgradeNimbus
    - upgradeSupervisor
    - upgradeDRPC
    - specifyOperation_start
    - bounceNimbus
    - bounceSupervisor
    - bounceDRPC
    - upgradeGateways
    - doGatewayTask
    - verifySupervisor
    - runDRPCTestTopology
    - verifySoftwareVersion
▪ Complete CI/CD system. Statefiles are built per component and pushed to Artifactory before the upgrade
▪ Install software beforehand; activate the new release during the upgrade
▪ Each component is upgraded independently, i.e. Pacemaker, Nimbus, DRPC & Supervisor
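A workflow definition of this shape lends itself to a small task dispatcher. The sketch below is hypothetical: the handler registry, the `run_workflow` function, and the mapping of an `upgrade_type` of `full` to `full_upgrade_task` are assumptions (the slides only show `upgrade_type: 'rolling'`), and the task list is shortened for the example.

```python
WORKFLOWS = {
    "default": {
        "rolling_task": [
            "upgradeNimbus", "bounceNimbus",
            "upgradeSupervisor", "bounceSupervisor",
            "verifySoftwareVersion",
        ],
    },
}

# Assumed mapping from upgrade_type to the task list to run.
TASK_LISTS = {"rolling": "rolling_task", "full": "full_upgrade_task"}


def run_workflow(workflows, name, upgrade_type, handlers, dryrun=False):
    """Run the named workflow's tasks in order, stopping at the first failure."""
    tasks = workflows[name][TASK_LISTS[upgrade_type]]
    results = []
    for task in tasks:
        if dryrun:
            results.append((task, "dryrun"))  # report only, execute nothing
            continue
        if task not in handlers:
            raise KeyError(f"no handler registered for task {task!r}")
        if handlers[task]():
            results.append((task, "ok"))
        else:
            results.append((task, "failed"))
            break  # mirrors terminate_on_failure: 'true'
    return results


def demo():
    handlers = {t: (lambda: True) for t in WORKFLOWS["default"]["rolling_task"]}
    handlers["bounceSupervisor"] = lambda: False  # simulate one failing task
    return run_workflow(WORKFLOWS, "default", "rolling", handlers)
```

Keeping the task order in data rather than in code is what lets one cluster use the `default` workflow and another use a variant such as `pacemaker_drpc` without changing the runner.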
Storm Upgrade
[Diagram: CI/CD process (Git release info, Jenkins Start); RE Jenkins and SD process; Artifactory (state files & release info)]
Upgrade sequence: Pacemaker Upgrade; Nimbus Upgrade; Supervisor Upgrade; Bounce Workers; DRPC Upgrade (shown twice in the diagram); Verify Supervisors; Run Test/Validation topology; Audit All Components
▪ RE Jenkins leads to statefile generation for each component and updates git with release info.
▪ Statefiles are published in Artifactory and downloaded during the upgrade.
▪ The upgrade fails if more than X supervisors fail to upgrade.
Rolling Upgrade timeline
Component           Parallelism   Hadoop 2.6.x   Hadoop 2.7.x   HBase 0.98.x   Storm 0.10.1.x
HDFS (4k nodes)     1             4 days         1 day          X              X
YARN (4k nodes)     1             1 day          1 day          X              X
HBase (1k nodes)    1-4           4-5 days       X              4-5 days       X
Storm (350 nodes)   10            X              X              X              4-6 hrs
Components          1             1-2 hrs        1-2 hrs        1-2 hrs        X
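The timeline figures translate into a per-node time budget. The helper below is a back-of-the-envelope sketch: it reads "4k nodes" as exactly 4,000 and assumes the wall-clock time is spread evenly across nodes, neither of which the slides state.

```python
def seconds_per_node(nodes, hours, parallelism=1):
    """Average wall-clock seconds available per node in a rolling upgrade."""
    return hours * 3600 * parallelism / nodes


def demo():
    return {
        "hdfs_2.6": seconds_per_node(4000, 4 * 24),  # 4 days, 1 node at a time
        "hdfs_2.7": seconds_per_node(4000, 1 * 24),  # 1 day, 1 node at a time
    }
```

Under these assumptions the HDFS upgrade went from roughly 86 seconds per DataNode on 2.6.x to roughly 22 seconds on 2.7.x.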
Rolling Upgrade Impact
YTD Availability by Cluster
[Chart: availability (%) per cluster, y-axis from 99.600 to 100.000; clusters AB, DB, FB, HB, IB, JB, LB, PB, UB, BT, LT, PT, TT, UT, BR, DR, IR, LR, MR, PR; labeled values 99.928, 99.898, 99.940, 99.687, 99.705 and 99.990]
Thank You
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

Mais de DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Último

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Último (20)

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Managing Hadoop, HBase and Storm Clusters at Yahoo Scale

  • 1. Managing Hadoop, HBase and Storm Clusters at Yahoo Scale PRESENTED BY Dheeraj Kapur, Savitha Ravikrishnan⎪June 30, 2016
  • 2. Agenda
      Topic | Speaker(s)
      Introduction, HDFS RU, HBase RU & Storm RU | Dheeraj Kapur
      YARN RU, Component RU, Distributed Cache & Sharelib | Savitha Ravikrishnan
      Q&A | All Presenters
      HadoopSummit 2016
  • 4. Grid Infrastructure at Yahoo
      ▪ A multi-tenant, secure, distributed compute and storage environment based on the Hadoop stack, for large-scale data processing
      ▪ 3 data centers, over 45k physical nodes
      ▪ 18 YARN (Hadoop) clusters of 350 to 5,200 nodes
      ▪ 9 HBase clusters of 80 to 1,080 nodes
      ▪ 13 Storm clusters of 40 to 250 nodes
  • 5. Grid Stack (diagram)
      Layers: Backend Support, Hadoop Storage, Hadoop Compute, Hadoop Services, Support Shop
      Components: ZooKeeper; Monitoring; Starling for logging; HDFS; HBase as NoSQL store; HCatalog for metadata registry; YARN (MapReduce) and Tez for batch processing; Storm for stream processing; Spark for iterative programming; Pig for ETL; Hive for SQL; Oozie for workflows; Proxy services; GDM for data management; Café on Spark for ML
  • 6. Deployment Model (diagram)
      Elements: DataNode, NodeManager; NameNode, RM; DataNodes, RegionServers; NameNode, HBase Master; Nimbus; Supervisor; Administration, Management and Monitoring; ZooKeeper Pools; HTTP/HDFS/GDM Load Proxies; Applications and Data; Data Feeds; Data Stores; Oozie Server; HS2/HCat
  • 8. Hadoop Rolling Upgrade
      ▪ Complete CI/CD for HDFS and YARN upgrades
      ▪ Build software and config “tgz” and push to repo servers
      ▪ Install software and configs in the pre-deploy phase, activate during upgrade
      ▪ Slow upgrade: 1 node per cycle
      ▪ Each component is upgraded independently, i.e., HDFS, YARN & Client
      Release Configs/Bundles:
        ---
        doc: This file is auto generated
        packages:
          - label: hadoop
            version: 2.7.2.13.1606200235-20160620-000
          - label: conf
            version: 2.7.2.13.1606200235-20160620-000
          - label: gridjdk
            version: 1.7.0_17.1303042057-20160620-000
          - label: yjava_jdk
            version: 1.8.0_60.51-20160620-000
  • 9. Package Download (pre-deploy) (diagram)
      Elements: Jenkins Start; Git (release info); RU process; ygrid-deploy-software; Repo Farm; Servers/Cluster: NameNode, DataNodes, ResourceManager, HBase Master, RegionServer, Gateways
  • 10. CI/CD process: HDFS Upgrade (diagram)
      Trigger: Git (release info), Jenkins Start, RU process
      1. Create Dir Structure
      2. Put NN in RU mode
      3a. SNN Upgrade
      3b. NN Failover (involves service and IP failover from NN to SNN and vice versa)
      3c. SNN Upgrade
      4. foreach DN (Safeupgrade-dn):
        4a. Select DN
        4b. Check installed version
        4c. Stop DN
        4d. Activate new software
        4e. Start DN
        4f. Wait for DN to join
      Stop/terminate RU on X failures (terminate the upgrade in case of more than X failures)
      After 100 hosts are successfully upgraded: check HDFS used percentage and live-node consistency on the NNs
      Finalize RU
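The per-DataNode loop on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tooling described in the talk: the callables `installed_version`, `upgrade_one` and `cluster_healthy` are hypothetical stand-ins for the real version check, the stop/activate/start/wait sequence, and the NameNode consistency check, and the defaults for `max_failures` (the slide's "X") and `check_every` are illustrative.

```python
from typing import Callable, Iterable, List


class UpgradeAborted(Exception):
    """Raised when the rolling upgrade stops before all hosts are done."""


def rolling_upgrade_datanodes(
    hosts: Iterable[str],
    target_version: str,
    installed_version: Callable[[str], str],  # hypothetical: version on a host
    upgrade_one: Callable[[str], bool],       # hypothetical: stop, activate, start, wait
    cluster_healthy: Callable[[], bool],      # hypothetical: used % and live-node check
    max_failures: int = 3,
    check_every: int = 100,
) -> List[str]:
    """Upgrade DataNodes one at a time and return the hosts that were upgraded."""
    upgraded: List[str] = []
    failures = 0
    for host in hosts:
        if installed_version(host) == target_version:
            continue  # already on the new release, nothing to do
        if not upgrade_one(host):
            failures += 1
            if failures > max_failures:
                raise UpgradeAborted(f"{failures} failures, last on {host}")
            continue
        upgraded.append(host)
        # Cluster-wide consistency check after every `check_every` successes.
        if len(upgraded) % check_every == 0 and not cluster_healthy():
            raise UpgradeAborted(f"health check failed after {len(upgraded)} hosts")
    return upgraded
```

Passing the three operations in as callables keeps the loop testable without a cluster; a real driver would wrap the cluster's own commands behind them.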
  • 11. Hadoop 2.7.x improvements over 2.6.x
      Performance
      ▪ Reduce NN failover time by parallelizing the quota initialization
      ▪ Address the DataNode layout inefficiency that caused high I/O load
      ▪ Use an offline upgrade script to speed up the layout upgrade
      ▪ Add a fake metrics sink to work around the JMX cache fix, which was delaying DataNode upgrades and health checks
      ▪ Improved DataNode shutdown speed
      Failure handling
      ▪ Reduce read/write failures by blocking clients until the DN is fully initialized
  • 12. YARN
  • 13. YARN Rolling Upgrade
      ▪ Minimize downtime, maximize service availability
      ▪ Work-preserving restart on RM and NM
      ▪ Retains state for 10 minutes
      ▪ Ensures that applications keep running during an RM restart
      ▪ Save state, update software, restart and restore state
      ▪ Uses leveldb as the state store
      ▪ After the RM restarts, it loads all the application metadata and other credentials from the state store and populates them into memory
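As a rough illustration of what enabling work-preserving restart involves, the sketch below renders a `yarn-site.xml` fragment from Python. The property names are the stock Apache Hadoop ones; the slide does not show the actual settings, so the values, the two directory paths, and the reading of "leveldb" as the RM state store are assumptions.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Stock Apache Hadoop property names. Values are illustrative assumptions,
# not the configuration used on the clusters described in the talk.
RECOVERY_PROPERTIES = {
    "yarn.resourcemanager.recovery.enabled": "true",
    "yarn.resourcemanager.work-preserving-recovery.enabled": "true",
    "yarn.resourcemanager.store.class":
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore",
    "yarn.resourcemanager.leveldb-state-store.path": "/var/lib/yarn/rm-state",  # hypothetical path
    "yarn.nodemanager.recovery.enabled": "true",
    "yarn.nodemanager.recovery.dir": "/var/lib/yarn/nm-recovery",  # hypothetical path
}


def render_yarn_site(properties: dict) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    raw = ET.tostring(root, encoding="unicode")
    return minidom.parseString(raw).toprettyxml(indent="  ")


if __name__ == "__main__":
    print(render_yarn_site(RECOVERY_PROPERTIES))
```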
  • 14. CI/CD process: YARN Upgrade (diagram)
      Trigger: Git (release info), Jenkins Start, RU process
      Steps, in diagram order: Create Dir Structure; Resource Manager Upgrade; HistoryServer Upgrade; foreach NM (Select NM, Check installed version, Safestop NM (kill -9), Activate new software, Start NM, Wait for NM to join); Timeline Server Upgrade
      Stop/terminate RU on X failures (terminate the upgrade in case of more than X failures)
      Step labels shown in the diagram: 1, 2, 2a-2e, 3, 4, 5
  • 16. Distributed Cache
      ▪ Distributed cache distributes application-specific, large, read-only files efficiently.
      ▪ Applications specify the files to be cached as URLs (hdfs://) in the job.
      ▪ DistributedCache tracks the modification timestamps of the cached files.
      ▪ DistributedCache can be used to distribute simple, read-only data or text files as well as more complex types such as archives and JAR files.
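The role of the modification timestamp can be illustrated with a small Python sketch. It is not Hadoop's implementation: `LocalCache` is a hypothetical stand-in for a node's local cache, and it only shows the idea that a file is reused when the same URI has already been localized with the same timestamp.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LocalCache:
    """Hypothetical per-node cache: URI -> modification timestamp that was localized."""
    localized: Dict[str, int] = field(default_factory=dict)

    def needs_localization(self, uri: str, mtime: int) -> bool:
        # Reuse the local copy only if this URI was localized with this timestamp.
        return self.localized.get(uri) != mtime

    def localize(self, uri: str, mtime: int) -> None:
        # A real node would copy the file from HDFS here; the sketch only records it.
        self.localized[uri] = mtime


cache = LocalCache()
uri = "hdfs://namenode/apps/lookup.dat"  # hypothetical path
if cache.needs_localization(uri, 1466400000):
    cache.localize(uri, 1466400000)
```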
  • 17. Sharelib
      ▪ "Sharelib" is a management system for a directory in HDFS named /sharelib, which exists on every cluster.
      ▪ Shared libraries can simplify the deployment and management of applications.
      ▪ The target directory is /sharelib, under which you will find:
          /sharelib/v1 - where all the packages are
          /sharelib/v1/conf - where the unique metafile for the cluster is (and all previous versions)
          /sharelib/v1/{tez, pig, ... } - where the package versions are kept
      ▪ The links/tags (metafile) are unique per cluster.
      ▪ Grid Ops maintains shared libraries on HDFS of each cluster.
      ▪ Packages in shared libraries include mapreduce, pig, hbase, hcatalog, hive and oozie.
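To make the tag-to-version idea concrete, here is a hedged Python sketch. The slide does not show the metafile format, so the nested dictionary, the tag name `current`, the version string, and the one-directory-per-version layout under each package are all assumptions; only the `/sharelib/v1` root comes from the slide.

```python
import posixpath
from typing import Dict

SHARELIB_ROOT = "/sharelib/v1"  # from the slide


def resolve_package_dir(metafile: Dict[str, Dict[str, str]], package: str, tag: str) -> str:
    """Return the HDFS directory for `package` at the version that `tag` points to.

    `metafile` is a hypothetical in-memory form: {package: {tag: version}}.
    """
    version = metafile[package][tag]  # KeyError if the package or tag is unknown
    return posixpath.join(SHARELIB_ROOT, package, version)


# Hypothetical metafile content for one cluster.
metafile = {"pig": {"current": "0.14.0"}}
print(resolve_package_dir(metafile, "pig", "current"))
```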
  • 18. Jenkins Start Sharelib Uploader Git Bundles Verify Dist Cache Download toDo packages Dist repo Re-package and upload package Re-generate Meta info (HDFS) Upload to Oozie Sharelib Update Generate clients to update
  • 20. Component Upgrade
      ▪ New releases: the CI environment continuously releases certified builds and their versions.
      ▪ Generate state: package rulesets contain the list of core packages and their dependencies for every cluster.
      ▪ Deploy cookbooks: contain Chef code and configuration that is pushed to the Chef server.
      ▪ Deploy pipelines: YAML files that specify the flow and order of the deploy for every environment/cluster.
      ▪ Validation jobs: run after a deploy completes on all nodes, to ensure that end-to-end functionality works as expected.
  • 21. Components Upgrade: CI process (diagram)
      Stages: New Release; Package Rulesets; Deploy cookbooks
      Elements: component versions; Git bundles; certified releases; certified package version info; rule set files (cluster: component specific); statefiles; Build Farms; Artifactory; Ruby (Rake); Chef; Git (release info); cookbook, roles, env and attribute files
      Checks: RSpec, RuboCop; state generate, compare & upload; validate; increment version
  • 22. Components Upgrade (cont.): CD process (diagram)
      Elements: Git (release info); Build Farms; Statefiles; Deploy Pipeline; Component Node; Ruby (Rake); Chef
      Checks: min size, zero-downtime check, target size, validate
      Node actions: chef-client, cookbook converge, graceful shutdown and health check
  • 23. HBase
  • 24. HBase Rolling Upgrade
      ▪ Workflow-based system
      ▪ Complete CI/CD for HDFS and HBase upgrades
      ▪ Build tgz and push to repo servers
      ▪ Install software beforehand, activate the new release during upgrade
      ▪ Each component and region group is upgraded independently, i.e., HDFS, group of regionservers
      Release Configs:
        default:
          group: 'all'
          command: 'start'
          system: 'ALL'
          verbose: 'true'
          retry: 3
          upgradeREST: 'false'
          upgradeGateway: 'true'
          dryrun: 'false'
          force: 'false'
          upgrade_type: 'rolling'
          skip_nn_upgrade: 'false'
          skip_master_upgrade: 'false'
      Workflow definitions:
        default:
          continue_on_failure:
            - broken
            - badnodes
        relux.red:
          - master
          - default
          - user
          - ca_soln-stage
          - perf,perf2,projects
          - restALL
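A workflow of this shape can be driven by a short loop. The Python sketch below is an interpretation, not the tool from the talk: it assumes that each workflow entry names one region group (or several, comma-separated), that `continue_on_failure` lists groups whose failures should not stop the run, and that `upgrade_server` is a hypothetical callable returning True on success.

```python
from typing import Callable, Dict, List


def run_group_workflow(
    steps: List[str],
    servers_by_group: Dict[str, List[str]],
    upgrade_server: Callable[[str], bool],  # hypothetical per-server upgrade
    continue_on_failure: List[str],
) -> List[str]:
    """Upgrade groups in workflow order and return the servers that failed."""
    failed: List[str] = []
    for step in steps:
        # Assumption: a comma-separated step names several groups handled together.
        for group in step.split(","):
            for server in servers_by_group.get(group, []):
                if upgrade_server(server):
                    continue
                failed.append(server)
                if group not in continue_on_failure:
                    raise RuntimeError(f"upgrade failed on {server} in group {group}")
    return failed
```

Iterating group by group, then server by server, mirrors the "each region group is upgraded independently" bullet; whether the real tool handles comma-separated groups in parallel is not stated in the slides.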
  • 25. CI/CD process: HBase Upgrade (diagram)
      Trigger: Git (release info), Jenkins Start; Repo Server provides package + conf version
      Top-level flow: Put NN in RU mode & upgrade NN/SNN (HDFS rolling upgrade process); Master Upgrade; Regionserver upgrade process; Stargate Upgrade; Gateway Upgrade
      Foreach DN/RS (iterate over each group, then over each server in a group): Stop Regionserver; DN Safeupgrade, Stop DN; Upgrade and Start DN; Upgrade and Start RS
      Step labels shown in the diagram: 1, 2, 3, 3a-3f, 4, 5
  • 26. STORM
  • 27. Storm Rolling Upgrade
      ▪ Complete CI/CD system. Statefiles are built per component and pushed to Artifactory before the upgrade
      ▪ Install software beforehand, activate the new release during upgrade
      ▪ Each component is upgraded independently, i.e., Pacemaker, Nimbus, DRPC & Supervisor
      Release Configs:
        default:
          parallel: 10
          verbose: 'true'
          retry: 3
          dryrun: 'false'
          upgrade_type: 'rolling'
          quarantine: 'true'
          terminate_on_failure: 'true'
          sup_failure_threshold: 10
          sendmail_to: 'dheerajk@yahoo-inc.com'
          sendmail_cc: 'storm-devel@yahoo-inc.com, grid-ops@yahoo-inc.com'
        cluster_workflow:
          cluster1.colo1: pacemaker_drpc
          cluster2.colo2: default
      Workflow Definition:
        default:
          rolling_task:
            - upgradeNimbus
            - bounceNimbus
            - upgradeSupervisor
            - bounceSupervisor
            - upgradeDRPC
            - bounceDRPC
            - upgradeGateways
            - doGatewayTask
            - verifySupervisor
            - runDRPCTestTopology
            - verifySoftwareVersion
          full_upgrade_task:
            - killAllTopologies
            - specifyOperation_stop
            - sleep10
            - bounceNimbus
            - bounceSupervisor
            - bounceDRPC
            - clearDiskCache
            - cleanZKP
            - upgradeNimbus
            - upgradeSupervisor
            - upgradeDRPC
            - specifyOperation_start
            - bounceNimbus
            - bounceSupervisor
            - bounceDRPC
            - upgradeGateways
            - doGatewayTask
            - verifySupervisor
            - runDRPCTestTopology
            - verifySoftwareVersion
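How `parallel`, `sup_failure_threshold` and `terminate_on_failure` might interact during the supervisor step can be sketched in Python. This is an assumption-laden illustration rather than the actual tool: `upgrade_one` is a hypothetical callable that upgrades one supervisor and returns True on success, and the abort rule follows the deck's note that the upgrade fails when more than X supervisors fail.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def upgrade_supervisors(
    hosts: Sequence[str],
    upgrade_one: Callable[[str], bool],  # hypothetical per-supervisor upgrade
    parallel: int = 10,
    sup_failure_threshold: int = 10,
    terminate_on_failure: bool = True,
) -> List[str]:
    """Upgrade supervisors in batches of `parallel`; return the hosts that failed."""
    failed: List[str] = []
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        for start in range(0, len(hosts), parallel):
            batch = hosts[start:start + parallel]
            # pool.map preserves input order, so results line up with the batch.
            results = list(pool.map(upgrade_one, batch))
            failed.extend(h for h, ok in zip(batch, results) if not ok)
            if terminate_on_failure and len(failed) > sup_failure_threshold:
                raise RuntimeError(f"{len(failed)} supervisors failed to upgrade")
    return failed
```

Checking the threshold between batches rather than per host means a run can overshoot the limit by up to one batch before it stops, which is a simplification of this sketch.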
  • 28. Storm Upgrade CI/CD process (diagram)
      Trigger: Git (release info), Jenkins Start, Artifactory (state files & release info), RE Jenkins and SD process
      Steps, in diagram order: Pacemaker Upgrade; Nimbus Upgrade; Supervisor Upgrade; Bounce Workers; DRPC Upgrade; Verify Supervisors; Run Test/Validation topology; Audit All Components
      Notes:
      ▪ RE Jenkins leads to statefile generation for each component and updates Git with release info
      ▪ Statefiles are published in Artifactory and downloaded during the upgrade
      ▪ The upgrade fails if more than X supervisors fail to upgrade
  • 29. Rolling Upgrade timeline
      Component | Parallelism | Hadoop 2.6.x | Hadoop 2.7.x | HBase 0.98.x | Storm 0.10.1.x
      HDFS (4k nodes) | 1 | 4 days | 1 day | X | X
      YARN (4k nodes) | 1 | 1 day | 1 day | X | X
      HBase (1k nodes) | 1-4 | 4-5 days | X | 4-5 days | X
      Storm (350 nodes) | 10 | X | X | X | 4-6 hrs
      Components | 1 | 1-2 hrs | 1-2 hrs | 1-2 hrs | X
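The timeline figures translate into a per-node (or per-batch) time budget with simple arithmetic. The Python sketch below takes "4k nodes" as exactly 4,000 and assumes the upgrade runs around the clock, neither of which the slide states.

```python
import math


def seconds_per_batch(nodes: int, parallelism: int, duration_hours: float) -> float:
    """Average wall-clock seconds available per batch of `parallelism` nodes."""
    batches = math.ceil(nodes / parallelism)
    return duration_hours * 3600 / batches


# HDFS, 4,000 nodes, one node at a time.
print(seconds_per_batch(4000, 1, 4 * 24))  # Hadoop 2.6.x, 4 days -> 86.4 s per node
print(seconds_per_batch(4000, 1, 24))      # Hadoop 2.7.x, 1 day  -> 21.6 s per node

# Storm, 350 nodes, 10 at a time, lower bound of the 4-6 hour range.
print(round(seconds_per_batch(350, 10, 4), 1))  # about 411.4 s per batch of 10
```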
  • 30. Rolling Upgrade Impact: YTD Availability by Cluster (bar chart)
      Clusters on the x-axis: AB, DB, FB, HB, IB, JB, LB, PB, UB, BT, LT, PT, TT, UT, BR, DR, IR, LR, MR, PR
      Availability axis: 99.600 to 100.000
      Data labels shown: 99.928, 99.898, 99.940, 99.687, 99.705, 99.990
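To put the availability labels in perspective, they can be converted into hours of unavailability. The sketch below assumes the values are percentages and that "YTD" covers January 1 to June 30, 2016 (182 days); the slide states neither, and the transcript does not say which cluster each value belongs to.

```python
def downtime_hours(availability_pct: float, period_days: float) -> float:
    """Hours of unavailability implied by an availability percentage over a period."""
    return (1 - availability_pct / 100) * period_days * 24


YTD_DAYS = 182  # assumption: Jan 1 to Jun 30, 2016 (leap year)

for pct in (99.990, 99.940, 99.928, 99.898, 99.705, 99.687):
    print(f"{pct:.3f}% -> {downtime_hours(pct, YTD_DAYS):.1f} h")
```

Under these assumptions the lowest label, 99.687, corresponds to roughly 13.7 hours over the period, and the highest, 99.990, to under half an hour.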