How to overcome mysterious problems
caused by a large and multi-tenant
Hadoop cluster at Rakuten
Oct/27/2016
Tomomichi Hirano
EC Core Technology Department, Rakuten Inc.
tomomichi.hirano@rakuten.com
Who I am
 Tomomichi Hirano (平野 智巌)
• Joined Rakuten in 2013
• Hadoop administrator
• Monitoring, tuning, and improving the Hadoop cluster
• Verifying and enabling new Hadoop-related components
• Troubleshooting all problems
• Regular operations such as adding servers and users, disk replacement, etc.
 Previous team
• Server provisioning, networking, and HW-related work.
Today’s Agenda
 Quick introduction
• About our clusters
• Hadoop use cases at Rakuten
 Mysterious problems
• Never ending jobs
• DataNode freezing
• NameNode freezing
• High load after restarting NameNode
• Lessons learned
 Server provisioning and management
• Background for Big Data systems
• Provisioning and management
1 Quick introduction
About our clusters
 Production cluster
• # of slaves : around 200
• HDFS capacity : around 8PB
• # of jobs per day : 30,000 - 50,000
• # of active Hadoop user accounts : around 40
• Types of jobs : MR, Hive, Tez, Spark, Pig, Sqoop, HBase, Slider, etc.
 Other clusters
• Another production cluster for Business Continuity (BC).
• Some clusters for staging and development.
About our clusters
 Provisioning Engine
• MAAS for OS provisioning
• Chef for configuring
 System Management
• Shinken and PagerDuty for alerting and incident management
• Splunk for reporting
• Ganglia and Grafana for graphing
 Security
• Kerberos for cluster security
Hadoop use cases at Rakuten
< Diagram: a feedback loop from Input through Analysis to Output >
• Input : shop data, purchase data, item data, user behavior, user membership
• Output : item search, reports for shops, search quality, search suggest, recommendation, page design, recommendation, advertisement, event planning, site design, KPI management, marketing and sales
2 Mysterious problems
Mystery 1 : Never ending jobs
Some jobs were very slow to submit or never ended, with a lot of preemption.
Never ending jobs
 Recognized
• Users began to complain: “Hadoop is very slow !!!”
• Actually, a lot of jobs were very slow to submit and/or never ended.
“Container preempted by scheduler”
Never ending jobs
 What is “Capacity Scheduler Preemption”?
• Jobs in a high-priority queue kill jobs in a low-priority queue.
 Who kills who?
• There were already too many jobs and queues.
• It was hard to grasp what was happening at all.
• So, we decided to build our own monitoring system.
Never ending jobs
 Original monitoring with Grafana / Graphite
< Architecture: collectd (exec-plugin running scripts with jq, plus the graphite-plugin) pulls metrics from the NameNode and the ResourceManager via REST API and feeds a Graphite (carbon-cache) instance for hadoop, alongside the existing Graphite for infra; Grafana graphs both >

curl -s "${RM}:8088/ws/v1/cluster/apps?state=RUNNING"
curl -s "${RM}:8088/ws/v1/cluster/apps?finishedTimeBegin=`date -d '10 minutes ago' +%s%3N`"
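
Just to illustrate how such a pipeline can be glued together, a minimal collectd exec-plugin script might look like the sketch below; the metric names, RM address, and jq filters are our assumptions, not the actual script behind the slides.

#!/bin/bash
# Hypothetical collectd exec-plugin script: emits the number of running
# and accepted (pending) YARN applications using collectd's PUTVAL protocol.
HOSTNAME="${COLLECTD_HOSTNAME:-$(hostname -f)}"
INTERVAL="${COLLECTD_INTERVAL:-60}"
RM="resourcemanager.example.com"   # assumption: your ResourceManager host

while sleep "$INTERVAL"; do
  running=$(curl -s "${RM}:8088/ws/v1/cluster/apps?state=RUNNING"  | jq '.apps.app | length')
  pending=$(curl -s "${RM}:8088/ws/v1/cluster/apps?state=ACCEPTED" | jq '.apps.app | length')
  echo "PUTVAL ${HOSTNAME}/yarn/gauge-running_jobs interval=${INTERVAL} N:${running:-0}"
  echo "PUTVAL ${HOSTNAME}/yarn/gauge-pending_jobs interval=${INTERVAL} N:${pending:-0}"
done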
Never ending jobs
 Graphs for YARN cluster
< Memory usage of YARN cluster >
< Running and Pending jobs >
Yellow : # of pending jobs
Green : # of running jobs
Pending jobs due to lack of memory.
Never ending jobs
 Graphs to analyze per user
< Running jobs per user >
< Pending jobs per user >
< Memory usage per user >
“Our cluster is not slow, you are just running too many jobs!”
Never ending jobs
 Never ending jobs with a lot of preemption
< YARN memory usage >
Too much preemption, jobs possibly killing each other.
< Number of preemptions per user >
Never ending jobs
 Tuning preemption, but how long should tasks run?
• Investigated the elapsed time of each task and analyzed it with Excel.
• 4.5 million tasks per day!
curl -s "http://${JH}:19888/ws/v1/history/mapreduce/jobs/${job_id}/tasks"
99% of tasks finished
within 5 min.
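
For reference, the per-task elapsed times can be pulled out of that JobHistory response with a filter along these lines; the jq expression is ours, though elapsedTime is a standard field of the tasks API.

# Hypothetical helper: print "task_id <TAB> elapsed_ms" for one job,
# ready to be aggregated in Excel, awk, or similar.
curl -s "http://${JH}:19888/ws/v1/history/mapreduce/jobs/${job_id}/tasks" \
  | jq -r '.tasks.task[] | [.id, (.elapsedTime|tostring)] | @tsv'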
Never ending jobs
 Our solution : Cooperation with users
• On the cluster side, we set a 10-minute grace period before preemption kills a container (see the config sketch after this slide).
• On the user side, we gave guidance like the below.

Please try to design your jobs so that tasks normally finish in less than 5 minutes,
which leaves healthy room (up to 10 minutes) to avoid getting killed in all cases.
• Still some preemption, but far less!
• Yes, the cluster is under control now.
• And we can now see “who kills who” and why.
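
The slides do not name the exact property, but assuming the stock Capacity Scheduler preemption policy, a 10-minute grace period would look roughly like this in yarn-site.xml:

<!-- Hypothetical yarn-site.xml excerpt: give a container selected for
     preemption 10 minutes (600000 ms) before it is actually killed.
     The team's real setting is not shown on the slide. -->
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill</name>
  <value>600000</value>
</property>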
Mystery 2 : DataNode freezing
DataNodes seemed to freeze for several minutes and sometimes went into dead status.
DataNode freezing
 Recognized
• Last contact values of some DataNodes were very high.
• Normally, less than 5 sec.
• But sometimes 1 min, in the worst case 10 mins, and the node went into “dead” status.
• But they recovered without any operation.
< Last contact of each DataNode >
* The last contact of a DataNode is the elapsed time since the last successful health check with the NameNode.
DataNode freezing
 Investigated the DataNode log
• No log output while this issue was happening.
• The DataNode seemed to be frozen.
 Tried restarting the DataNode and rebooting the OS
• Restarting the DataNode did not help at all.
• Rebooting the OS cleared the issue for a while, but it happened again.
 Observation
• Not a memory leak. An OS-related issue?
• Had to figure out on which nodes and when this issue happened.
DataNode freezing
 Added a graph to monitor the last contact value.

curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"

< Last contact of each DataNode (y-axis 0 - 200 sec) >
Figured out this issue happened only on newly added DataNodes.
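
The lastContact values live in the LiveNodes attribute of that bean, which is itself a JSON-encoded string; a sketch of how to flatten it into one line per DataNode (the jq filter is ours, not from the slides):

# Hypothetical extraction: print "datanode lastContact" pairs.
# LiveNodes is a JSON string inside the JMX response, hence fromjson.
curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo" \
  | jq -r '.beans[0].LiveNodes | fromjson | to_entries[]
           | "\(.key) \(.value.lastContact)"'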
DataNode freezing
 Analyzed with some other graphs
• Graphs for OS iowait and HDFS usage.
• The trigger of this issue seemed to be high load caused by HDFS writes.
DataNode freezing
 Many tries, but still no help
• Increasing the DataNode heap, increasing the handler count, upgrading the OS, etc.
 Then, took thread dumps and analyzed them
• To figure out whether it was actually freezing or not.
• To figure out whether a wrong thread was blocking other threads.

${java home}/bin/jcmd ${pid of target JVM} Thread.print
${java home}/bin/jstack ${pid of target JVM}
Note : these need to be executed as the process owner account.
DataNode freezing
 Thread dump analysis
• The “heartbeating”, “DataXceiver” and “PacketResponder” threads were
blocked by a thread named “Thread-41”.
"DataNode: XXX heartbeating to ${NAMENODE}:8020" daemon prio=10 tid=0x0000000002156000 nid=0xf26 waiting for monitor entry
[0x00007f9dd315a000] java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.deleteBlock(BlockPoolSliceScanner.java:305)
- waiting to lock <0x00000006fc309158> (a org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.deleteBlocks(BlockPoolSliceScanner.java:330)
...
"Thread-41" daemon prio=10 tid=0x00007f9dec7bf800 nid=0x1097 runnable [0x00007f9dd1c87000] ...
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.addBlockInfo(BlockPoolSliceScanner.java:237)
- locked <0x00000006fc309158> (a org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.assignInitialVerificationTimes(BlockPoolSliceScanner.java:602)
- locked <0x00000006fc309158> (a org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:645)
...
< First trace: a blocked thread; second trace: the blocking “Thread-41”, which holds the lock >
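
A handy trick when reading such dumps: take the lock address from a “waiting to lock” line and search the same dump for the thread that has it “locked”. For example (the file name is illustrative):

# Hypothetical one-liner: find the thread header of the lock holder.
grep -B 20 -- '- locked <0x00000006fc309158>' threaddump.txt | grep '^"'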
DataNode freezing
 What is “Thread-41”?
• It seems to be busy doing something with a Java “TreeMap”.
...
"Thread-41" daemon prio=10 tid=0x00007f9dec7bf800 nid=0x1097 runnable [0x00007f9dd1c87000]
java.lang.Thread.State: RUNNABLE
at java.util.TreeMap.put(TreeMap.java:2019)
at java.util.TreeSet.add(TreeSet.java:255)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.addBlockInfo(BlockPoolSliceScanner.java:243)
...
"Thread-41" daemon prio=10 tid=0x00007f9dec7bf800 nid=0x1097 runnable [0x00007f9dd1c87000]
java.lang.Thread.State: RUNNABLE
at java.util.TreeMap.remove(TreeMap.java:2382)
at java.util.TreeSet.remove(TreeSet.java:276)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.delBlockInfo(BlockPoolSliceScanner.java:253)
...
DataNode freezing
 Source code reading
• org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner
Scans the block files under a block pool and verifies that the files are not corrupt.
This keeps track of blocks and their last verification times.
private static final long DEFAULT_SCAN_PERIOD_HOURS = 21*24L; // three weeks
• 3 weeks???
• The DataNode scans every block at least once every 3 weeks by default.
• So, it must be something related to the “Datanode Block Scanner”!
DataNode freezing
 Found a workaround!
• Found some strange behavior in creating the block map that decides which blocks will be scanned.
• Also found two files for the “Datanode Block Scanner” in the local FS.
dncp_block_verification.log.curr
dncp_block_verification.log.prev
 So, we tried deleting them and restarting the DN
• A kind of re-initialization of the “Datanode Block Scanner”.
• The issue never happened again after that!
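
For the record, the workaround amounts to something like the sketch below; the data-directory layout assumed here is the usual Hadoop 2.x one (the verification logs sit under each block-pool directory), and the paths and service name are illustrative.

# Hypothetical workaround sketch (run as root / the DataNode's owner).
# Assumes dfs.datanode.data.dir entries like /data01 ... /dataNN.
service hadoop-hdfs-datanode stop
for d in /data*/current/BP-*; do
  rm -f "$d"/dncp_block_verification.log.curr \
        "$d"/dncp_block_verification.log.prev
done
service hadoop-hdfs-datanode start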
DataNode freezing
 Don’t worry!
• This issue has already been fixed as of HDFS 2.7.0.

https://issues.apache.org/jira/browse/HDFS-7430
Rewrite the BlockScanner to use O(1) memory and use multiple threads

 Lessons learned
• Thread dumps and source code reading for deep analysis.
• Especially when we can’t get any clues from logs.
Mystery 3 : NameNode freezing
The NameNode seemed to freeze for several minutes, repeatedly, at an interval.
NameNode freezing
 Recognized
• One day, we noticed strange behavior with the ResourceManager.

< Memory usage of YARN cluster >
< Running and pending jobs in last 10 min >

• It seemed the ResourceManager couldn’t accept new jobs.
• But running jobs were OK.
NameNode freezing
 Added some graphs for the NameNodes. It must be the HDFS checkpoint.
< lastTxId, checkpointTxId >
< if_octets.tx, if_octets.rx >
< RpcQueueTimeAveTime, RpcProcessingTimeAveTime >
< CallQueueLength >
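
The slides do not show where lastTxId / checkpointTxId are scraped from; one plausible source is the JournalTransactionInfo attribute of the NameNodeInfo bean, for example:

# Hypothetical query: JournalTransactionInfo is a JSON string holding the
# last written txid and the txid of the most recent checkpoint.
curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo" \
  | jq -r '.beans[0].JournalTransactionInfo | fromjson
           | "lastTxId=\(.LastAppliedOrWrittenTxId) checkpointTxId=\(.MostRecentCheckpointTxId)"'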
NameNode freezing
 Monitored continuously.
• Then we could catch the difference before and after a NameNode failover.
• A checkpoint on the standby NameNode should not affect the active NameNode.
• But it actually did!

• The white line is a fail-over from the second (nn2) to the first (nn1) NameNode.
• It happened only when the second NameNode was active.
NameNode freezing
 HDFS-7858
• Improve HA Namenode Failover detection on the client
• Fix Version/s : 2.8.0, 3.0.0-alpha1
 HDFS-6763
• Initialize file system-wide quota once on transitioning to active
• Fix Version/s : 2.8.0, 3.0.0-alpha1
 Workaround for now
• Our current workaround is just keeping the first NameNode active.
• So, we strongly want backports of these to an available HDP version!
Mystery 4 : High load after restarting NameNode
The NameNode went into an unstable state due to this unknown high load.
High load after restarting NameNode
 Symptom
• We met this unknown high load several times after restarting the NameNode.
• It would suddenly disappear several hours or a few days later.
• But the last time, it never went away...
• While this high load existed, the NameNode was in a very unstable state.
• When it happened on the standby NameNode, we couldn’t fail over (fail back).
• A very serious problem for us!
High load after restarting NameNode
 Added graphs for RPC queue activities in the NameNode
• Unknown high load between checkpoints.
< lastTxId (Green) >
< checkpointTxId (Yellow) >
< Waiting Time (Yellow) >
< QueueLength >
Good case Bad case
< Processing Time (Red) >
High load after restarting NameNode
 Multiple graphs
• The NameNode seemed to be receiving some amount of data from someone.
• The JournalNodes? No...
• The DataNodes? Hard to know...
• But it must be related to the high load!
< Receive data size (Blue) >
High load after restarting NameNode
 DataNode log analysis
• Three kinds of 60000 msec timeouts were continuously being output.
2016-09-28 05:33:35,384 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
java.net.SocketTimeoutException: Call From XXXX to bhdXXXX:8020 failed on socket timeout exception:
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/XXXX remote=XXX]; For more details see:
http://wiki.apache.org/hadoop/SocketTimeout
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:616)
60-sec timeout log
Three methods in the class “BPServiceActor” were failing repeatedly:
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:523)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reportReceivedDeletedBlocks(BPServiceActor.java:312)
High load after restarting NameNode
 Thread dump analysis on the NameNode side
• Almost all IPC server handlers were waiting for one lock.
"IPC Server handler 45 on 8020" daemon prio=10 tid=0x00007fbed169f800 nid=0x26b2 waiting on condition [0x00007f9e05be1000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00007fa30ad4d890> (a java.util.concurrent.locks.ReentrantLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
....
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(...)
"IPC Server handler 45 on 8020" daemon prio=10 tid=0x00007fbed169f800 nid=0x26b2 waiting on condition [0x00007f9e05be1000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00007fa30ad4d890> (a java.util.concurrent.locks.ReentrantLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
....
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReceivedAndDeleted(...)
High load after restarting NameNode
 Source code reading
• org.apache.hadoop.hdfs.server.datanode.BPServiceActor
• It seems to handle communication with the NameNode.
private void offerService() throws Exception {
LOG.info("For namenode " + nnAddr + " using"
+ " DELETEREPORT_INTERVAL of " + dnConf.deleteReportInterval + " msec "
+ " BLOCKREPORT_INTERVAL of " + dnConf.blockReportInterval + "msec"
+ " CACHEREPORT_INTERVAL of " + dnConf.cacheReportInterval + "msec"
+ " Initial delay: " + dnConf.initialBlockReportDelay + "msec"
+ "; heartBeatInterval=" + dnConf.heartBeatInterval);
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode bhdpXXXX:8020
using DELETEREPORT_INTERVAL of 500000 msec
BLOCKREPORT_INTERVAL of 21600000msec <= 6 hours
CACHEREPORT_INTERVAL of 10000msec
Initial delay: 0msec;
heartBeatInterval=5000 <= 5 sec
< First block: source code; second block: the actual DN’s log >
High load after restarting NameNode
 DataNode log analysis again
• Failing to send the block report repeatedly, at short intervals.
...
2016-10-03 03:40:47,141 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report ...
2016-10-03 03:44:19,759 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report ...
2016-10-03 03:47:43,464 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report ...
...
High load after restarting NameNode
 What was the high load?
• Almost all DataNodes failed to send their full block reports and kept retrying at a few minutes’ interval.
• Yes, it was a “Block Report Storm” from all of the DataNodes.
• While this storm existed, a full BlockReport never succeeded.
 Then, how to stop the storm?
• We had to reduce the concurrency of these requests somehow.
High load after restarting NameNode
 Trial and error
• Manual arbitration with iptables
• Worked well, but a little bit tricky.
• And some DataNodes sometimes lost their heartbeat with the active NameNode.

< Diagram: iptables gating the DataNodes’ connections to the Active and Standby NameNodes >

• Restarting the NameNode with different slaves files
• Worked several times, but unfortunately we got the whole cluster down once.
• So, you MUST NOT do this operation!!!
• Safest way
• A NameNode in the startup phase discards non-initial block reports.
• So, increase dfs.namenode.safemode.extension and wait.
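
A sketch of that safest option in hdfs-site.xml; the 30-minute value below is illustrative, not the team’s actual number:

<!-- Hypothetical hdfs-site.xml excerpt: keep the NameNode in safe mode
     longer after the block threshold is reached, so the startup phase
     (which discards non-initial block reports) lasts long enough for all
     initial full block reports to come in. The default is 30000 ms. -->
<property>
  <name>dfs.namenode.safemode.extension</name>
  <value>1800000</value>
</property>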
Lessons learned from mysteries
 Monitor, Monitor, Monitor!!!
• A graphing tool is a MUST for a large and multi-tenant cluster.
• Investigating and monitoring with multiple graphs is a great help.
 Cooperation with users
• For some issues, we have to solve cluster problems together with users.
 Thread dumps and source code reading for deep analysis
• When we can’t get any clues from logs, they are very important.
• Thread dumps are especially helpful for freezing or locking issues.
 Query examples for NameNode and ResourceManager
(Just as a reference)
Contents : Query
HDFS cluster : curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState"
NameNode JVM info : curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=JvmMetrics"
NameNode and DataNode : curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"
NameNode state : curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"
NameNode RPC : curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=RpcActivityForPort8020"
NameNode CMS : curl -s "${NN}:50070/jmx?qry=java.lang:type=GarbageCollector,name=ConcurrentMarkSweep"
NameNode heap : curl -s "${NN}:50070/jmx?qry=java.lang:type=Memory"
List of an HDFS directory * : curl -s --negotiate -u : "${NN}:50070/webhdfs/v1/${HDFS_PATH}?&op=LISTSTATUS"
Usage of an HDFS directory * : curl -s --negotiate -u : "${NN}:50070/webhdfs/v1/${HDFS_PATH}?&op=GETCONTENTSUMMARY"
Jobs finished in last 10 min : curl -s "${RM}:8088/ws/v1/cluster/apps?finishedTimeBegin=`date -d '10 minutes ago' +%s%3N`"
Running jobs : curl -s "${RM}:8088/ws/v1/cluster/apps?state=RUNNING"
Accepted jobs : curl -s "${RM}:8088/ws/v1/cluster/apps?state=ACCEPTED"
ResourceManager status : curl -s "${RM}:8088/ws/v1/cluster/info"
YARN cluster : curl -s "${RM}:8088/ws/v1/cluster/metrics" | jq "."
NodeManagers : curl -s "${RM}:8088/ws/v1/cluster/nodes" | jq "."
* kinit required in a secured cluster.
3 Server provisioning and management
Background for Big Data systems
 Virtualization vs Bare Metal
                         Bare Metal           Virtualization (Cloud)
Management (Operation)   Quite complicated    Easy
Performance              Best performance     Always a bottleneck
Solutions                Many legacy ways..   AWS, OpenStack..
 What’s your choice?
• Big Data, especially Hadoop, needs more resources.
• Bare metal is the best way to maximize HW power.
Background for Big Data systems
 Server capacity is the most important thing for Big Data
• Cheaper HW: we don’t care about warranty, cheaper parts, and furthermore NO REDUNDANCY.
• What we want is just more and more servers.

Sounds scary... Aren’t we afraid of trouble? No, we aren’t.
This is where fully automated OS provisioning works for Big Data.
 Bare metal only, but it feels just like cloud
• Full automation of OS installation.
• Full stack management with Chef.
• Everything should be there when you click.
Automation of Provisioning and Operation
< Diagram: a dashboard (“Request new server”) calls the provisioning engine’s API; workers drive MAAS (DNS API, power control) for scratch installation and Chef (organization, role/recipe, host name, custom data for Rakuten) for OS provisioning and configuration; application recipes are built by DevOps engineering; Shinken and Graphite handle monitoring; installation, management, monitoring, and operation are all done by Chef, for fully automated operation >
Provisioning, in just 3 steps
• 1st step : choose a server
• 2nd step : choose an action (Install / Destroy)
• 3rd step : specify hostname, OS distribution/version, tenant and environment, and the recipes of your application
Finally, click and get it.
“Hey, I want a new server” / “Just do it”
Provisioning Process
< Diagram: a request via GUI/API goes to the provisioning core; a worker runs three tasks in sequence: InstallOS (MAAS: basic install, DNS entry), SetupOS (Chef: default infra role, OS/app configuration, default infra monitoring), and SetupApp (Chef: app role, the app’s managed recipes, app monitoring, monitoring configuration); the whole run takes approximately 30 min to finish >
Full Stack Management
 Management of not only infra but also Hadoop
< Matrix: each layer and its criteria, managed by MAAS at the bottom layers and Chef (Infra Base / Platform XX / App XX roles and organizations) above >
• Inventory Data : detailed H/W spec, custom information for BDD
• OS Installation : simple image, disk partitioning / RAID configuration
• OS Configuration : default configuration on OS, basic packages
• Infra Monitoring : default OS monitoring
• Custom Configuration : custom OS configuration, Chef organization
• App Deployment : custom packages by application, designed by application
• App Monitoring : designed by application
4 Most important thing at the last
We are hiring!
 Rakuten is now really focusing on utilizing its rich data, so Hadoop will become more and more important.
 Current Hadoop admin team
• Leader (double-post)
• 3 members (2 full-time and 1 double-post)
 So, we need 2 or 3 more engineers for our team!
• Just mail me, and I can help you with your application!
http://global.rakuten.com/corp/careers/
tomomichi.hirano@rakuten.com
54

Mais conteúdo relacionado

Mais procurados

A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides Altinity Ltd
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...Altinity Ltd
 
Data Migration with Spark to Hive
Data Migration with Spark to HiveData Migration with Spark to Hive
Data Migration with Spark to HiveDatabricks
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Yahoo! JAPANのIaaSを支えるKubernetesクラスタ、アップデート自動化への挑戦 #yjtc
Yahoo! JAPANのIaaSを支えるKubernetesクラスタ、アップデート自動化への挑戦 #yjtcYahoo! JAPANのIaaSを支えるKubernetesクラスタ、アップデート自動化への挑戦 #yjtc
Yahoo! JAPANのIaaSを支えるKubernetesクラスタ、アップデート自動化への挑戦 #yjtcYahoo!デベロッパーネットワーク
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaCloudera, Inc.
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudDatabricks
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlJiangjie Qin
 
Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Julian Hyde
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureDan McKinley
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidDataWorks Summit
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introductioncolorant
 
Zeus: Locality-aware Distributed Transactions [Eurosys '21 presentation]
Zeus: Locality-aware Distributed Transactions [Eurosys '21 presentation]Zeus: Locality-aware Distributed Transactions [Eurosys '21 presentation]
Zeus: Locality-aware Distributed Transactions [Eurosys '21 presentation]Antonios Katsarakis
 

Mais procurados (20)

A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
 
YARN High Availability
YARN High AvailabilityYARN High Availability
YARN High Availability
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
Data Migration with Spark to Hive
Data Migration with Spark to HiveData Migration with Spark to Hive
Data Migration with Spark to Hive
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Yahoo! JAPANのIaaSを支えるKubernetesクラスタ、アップデート自動化への挑戦 #yjtc
Yahoo! JAPANのIaaSを支えるKubernetesクラスタ、アップデート自動化への挑戦 #yjtcYahoo! JAPANのIaaSを支えるKubernetesクラスタ、アップデート自動化への挑戦 #yjtc
Yahoo! JAPANのIaaSを支えるKubernetesクラスタ、アップデート自動化への挑戦 #yjtc
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Cassandra
CassandraCassandra
Cassandra
 
Zeus: Locality-aware Distributed Transactions [Eurosys '21 presentation]
Zeus: Locality-aware Distributed Transactions [Eurosys '21 presentation]Zeus: Locality-aware Distributed Transactions [Eurosys '21 presentation]
Zeus: Locality-aware Distributed Transactions [Eurosys '21 presentation]
 

Destaque

Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learnedtcurdt
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit/Hadoop Summit
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewKonstantin V. Shvachko
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Designsudhakara st
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingBart Vandewoestyne
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersRahul Jain
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 

Destaque (13)

Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology Overview
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarking
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Hadoop
HadoopHadoop
Hadoop
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 

Semelhante a How to overcome mysterious problems caused by large and multi-tenancy Hadoop cluster at Rakuten

Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxAlex Moundalexis
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationDataWorks Summit
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)mundlapudi
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraJon Haddad
 
DevOps throughout time
DevOps throughout timeDevOps throughout time
DevOps throughout timeHany Fahim
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
MongoDB Days UK: Tales from the Field
MongoDB Days UK: Tales from the FieldMongoDB Days UK: Tales from the Field
MongoDB Days UK: Tales from the FieldMongoDB
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionDataStax Academy
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionDataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACKristofferson A
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingGreat Wide Open
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionDataStax Academy
 
Cassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionCassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionDataStax Academy
 
Cassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in ProductionCassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in ProductionDataStax Academy
 
Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Jon Haddad
 

Semelhante a How to overcome mysterious problems caused by large and multi-tenancy Hadoop cluster at Rakuten (20)

Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via Linux
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - Cassandra
 
DevOps throughout time
DevOps throughout timeDevOps throughout time
DevOps throughout time
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
MongoDB Days UK: Tales from the Field
MongoDB Days UK: Tales from the FieldMongoDB Days UK: Tales from the Field
MongoDB Days UK: Tales from the Field
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in Production
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in Production
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed Debugging
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
 
Cassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionCassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in Production
 
Cassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in ProductionCassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in Production
 
Advanced Operations
Advanced OperationsAdvanced Operations
Advanced Operations
 
Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)
 

Mais de DataWorks Summit/Hadoop Summit

Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors DataWorks Summit/Hadoop Summit
 

Mais de DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
 

Último

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 

Último (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
• 4.5 million tasks per day!
16
curl -s "http://${JH}:19888/ws/v1/history/mapreduce/jobs/${job_id}/tasks"
99% of tasks finished within 5 min.
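As a reference, a minimal sketch of how such an elapsed-time survey can be scripted against the JobHistory REST API instead of Excel; the jobs.txt input file is hypothetical, and elapsedTime is the per-task field in the standard MapReduce history API.
# Sketch: collect per-task elapsed times (ms) for a list of job IDs
# and read off the 99th percentile. jobs.txt (one job ID per line) is assumed.
while read -r job_id; do
  curl -s "http://${JH}:19888/ws/v1/history/mapreduce/jobs/${job_id}/tasks" \
    | jq -r '.tasks.task[].elapsedTime'
done < jobs.txt | sort -n > elapsed_ms.txt
p99_line=$(( $(wc -l < elapsed_ms.txt) * 99 / 100 ))
sed -n "${p99_line}p" elapsed_ms.txt   # 99% of tasks finish within this time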
Never ending jobs
 Our solution : cooperation with users
• On the cluster side, we set 10 min as the preemption kill-wait (see the sketch below).
• On the user side, we gave guidance like this:
17
Please try to design your jobs so that tasks normally finish in less than 5 minutes, which leaves healthy room up to 10 minutes to avoid getting killed in any case.
• Still some preemption, but far less!
• Yes, the cluster is now under control.
• We can see “who kills who” and why now.
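A hedged sketch of how to verify that setting on a live ResourceManager; the /conf endpoint is standard, while the property name below is the usual CapacityScheduler preemption kill-wait and is our assumption about the config elided on the slide (10 min = 600000 ms).
# Sketch: confirm the preemption kill-wait the RM actually loaded
# (property name assumed; value should read 600000 for 10 min).
curl -s "${RM}:8088/conf" \
  | grep -A1 "yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill"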
18
DataNode freezing
DataNode seemed to be freezing for several minutes
and sometimes went into dead status.
Mystery 2
DataNode freezing
 Recognized
• The last contact values of some DataNodes were very high.
• Normally, less than 5 sec.
• But sometimes 1 min, in the worst case 10 min, and the node went into “dead” status.
• Yet it recovered without any operation.
19
Last contact of each DataNode
* Last contact of a DataNode is the elapsed time since the last successful health check with the NameNode.
DataNode freezing
 Investigated the DataNode log
• No log output while this issue was happening.
• The DataNode seemed to be freezing.
 Tried restarting the DataNode and rebooting the OS
• Restarting the DataNode did not help at all.
• Rebooting the OS cleared the issue for a while, but it happened again.
 Observation
• Not a memory leak. An OS-related issue?
• Had to figure out on which nodes and when this issue happened.
20
DataNode freezing
 Added a graph to monitor the last contact value.
21
curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"
< Last contact of each DataNode (0 - 200 sec) >
Figured out this issue happened only on newly added DataNodes.
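The per-node values come from the LiveNodes attribute of that bean, which is itself a JSON-encoded string keyed by hostname; a minimal extraction sketch (field names per the standard NameNodeInfo MXBean):
# Sketch: print "host lastContact" for every live DataNode.
# LiveNodes is a JSON string inside the JSON response, hence the double jq pass.
curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo" \
  | jq -r '.beans[0].LiveNodes' \
  | jq -r 'to_entries[] | "\(.key) \(.value.lastContact)"'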
DataNode freezing
 Analyzed with some other graphs
• Graphs for OS iowait and HDFS usage.
• The trigger of this issue seemed to be high load caused by HDFS writes.
22
DataNode freezing
 Many tries, but still no help
• Increasing the DataNode heap, increasing the handler count, upgrading the OS, etc.
 Then took thread dumps and analyzed them
• To figure out whether it was actually freezing or not.
• To figure out whether a wrong thread was blocking other threads.
23
${java home}/bin/jcmd ${pid of target JVM} Thread.print
${java home}/bin/jstack ${pid of target JVM}
Note : need to execute with the process owner account.
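In practice it helps to take several dumps a few seconds apart, so a lock that persists across dumps stands out; a minimal sketch, with the PID discovery and the hdfs user as assumptions about a typical install:
# Sketch: capture 3 thread dumps, 10 s apart, as the process owner.
DN_PID=$(pgrep -f proc_datanode | head -1)   # assumption: the DN JVM tags itself with -Dproc_datanode
for i in 1 2 3; do
  sudo -u hdfs jstack "${DN_PID}" > "/tmp/dn_jstack_${i}.txt"
  sleep 10
done
# A monitor like <0x00000006fc309158> shown as "locked" by one thread and
# "waiting to lock" by many others across all dumps points at the blocker.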
DataNode freezing
 Thread dump analysis
• “heartbeating”, “DataXceiver” and “PacketResponder” threads were blocked by a thread named “Thread-41”.
24
Blocked:
"DataNode: XXX heartbeating to ${NAMENODE}:8020" daemon prio=10 tid=0x0000000002156000 nid=0xf26 waiting for monitor entry [0x00007f9dd315a000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.deleteBlock(BlockPoolSliceScanner.java:305)
- waiting to lock <0x00000006fc309158> (a org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.deleteBlocks(BlockPoolSliceScanner.java:330)
...
Blocking:
"Thread-41" daemon prio=10 tid=0x00007f9dec7bf800 nid=0x1097 runnable [0x00007f9dd1c87000]
...
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.addBlockInfo(BlockPoolSliceScanner.java:237)
- locked <0x00000006fc309158> (a org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.assignInitialVerificationTimes(BlockPoolSliceScanner.java:602)
- locked <0x00000006fc309158> (a org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:645)
...
DataNode freezing
 What is “Thread-41”?
• It seems to be working on a java “TreeMap”.
25
...
"Thread-41" daemon prio=10 tid=0x00007f9dec7bf800 nid=0x1097 runnable [0x00007f9dd1c87000]
java.lang.Thread.State: RUNNABLE
at java.util.TreeMap.put(TreeMap.java:2019)
at java.util.TreeSet.add(TreeSet.java:255)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.addBlockInfo(BlockPoolSliceScanner.java:243)
...
"Thread-41" daemon prio=10 tid=0x00007f9dec7bf800 nid=0x1097 runnable [0x00007f9dd1c87000]
java.lang.Thread.State: RUNNABLE
at java.util.TreeMap.remove(TreeMap.java:2382)
at java.util.TreeSet.remove(TreeSet.java:276)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.delBlockInfo(BlockPoolSliceScanner.java:253)
...
DataNode freezing
 Source code reading
• org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner
26
Scans the block files under a block pool and verifies that the files are not corrupt.
This keeps track of blocks and their last verification times.
private static final long DEFAULT_SCAN_PERIOD_HOURS = 21*24L; // three weeks
• 3 weeks???
• A DataNode scans each block once every 3 weeks by default.
• So it must be something related to the “DataNode Block Scanner”!
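As a reference, the scan period is exposed as a normal HDFS setting; a one-liner sketch to check it (property name per hdfs-default.xml; unset means the 3-week default applies):
# Sketch: show the configured block scanner period in hours.
hdfs getconf -confKey dfs.datanode.scan.period.hours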
DataNode freezing
 Found a workaround!
• Found some strange behavior in creating the block map that decides which blocks will be scanned.
• Also found two files for the “DataNode Block Scanner” in the local FS.
27
dncp_block_verification.log.curr
dncp_block_verification.log.prev
 So we tried deleting them and restarting the DN
• A kind of initialization of the “DataNode Block Scanner”.
• This issue never happened again after that!
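For reference, a sketch of that workaround on one node; the data-dir glob is an assumption about the layout under dfs.datanode.data.dir (the files live in each block-pool slice directory), so adjust it to your cluster and stop the DataNode first.
# Sketch: remove the block scanner state files while the DataNode is stopped.
for bp in /data*/hadoop/hdfs/data/current/BP-*; do
  rm -f "${bp}/dncp_block_verification.log.curr" \
        "${bp}/dncp_block_verification.log.prev"
done
# Start the DataNode again; the scanner rebuilds its verification state.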
DataNode freezing
 Don’t worry!
• This issue has already been fixed as of HDFS 2.7.0.
28
https://issues.apache.org/jira/browse/HDFS-7430
Rewrite the BlockScanner to use O(1) memory and use multiple threads
 Lessons learned
• Thread dumps and source code reading for deep analysis.
• Especially in cases where we can’t get any clues from logs.
29
NameNode freezing
NameNode seemed to be freezing for several minutes,
repeatedly, with an interval.
Mystery 3
NameNode freezing
 Recognized
• One day, we recognized strange behavior with the ResourceManager.
30
< Memory usage of YARN cluster >
< Running and pending jobs in last 10 min >
• It seemed the ResourceManager couldn’t accept new jobs.
• But running jobs were OK.
NameNode freezing
 Added some graphs for the NameNodes.
It must be the HDFS checkpoint.
31
< lastTxId, checkpointTxId >
< if_octets.tx, if_octets.rx >
< RpcQueueTimeAveTime, RpcProcessingTimeAveTime >
< CallQueueLength >
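The transaction-ID pair can be pulled from the NameNodeInfo bean; JournalTransactionInfo is a JSON-encoded string per the standard MXBean, hence the second jq pass (a minimal sketch):
# Sketch: last written txid vs. last checkpointed txid. A steadily growing
# gap means the standby's checkpoint is overdue or stuck.
curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo" \
  | jq -r '.beans[0].JournalTransactionInfo' \
  | jq '{LastAppliedOrWrittenTxId, MostRecentCheckpointTxId}'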
NameNode freezing
 Monitored continuously
• Then we could catch the difference before and after a NameNode failover.
• A checkpoint on the standby NameNode should not affect the active NameNode.
• But it actually did!
32
• The white line is a fail-over from the second NameNode (nn2) to the first (nn1).
• The freeze happened only when the second NameNode was active.
NameNode freezing
 HDFS-7858
• Improve HA Namenode Failover detection on the client
• Fix Version/s : 2.8.0, 3.0.0-alpha1
 HDFS-6763
• Initialize file system-wide quota once on transitioning to active
• Fix Version/s : 2.8.0, 3.0.0-alpha1
 Workaround for now
• Our current workaround is simply to keep the first NameNode active.
• So we strongly want these fixes backported to an available HDP version!
33
34
High load after restarting NameNode
The NameNode went into an unstable state
due to an unknown high load.
Mystery 4
High load after restarting NameNode
 Symptom
• We met an unknown high load after restarting the NameNode several times.
• It suddenly disappeared several hours or a few days later.
• But the last time, it never went away...
• While this high load existed, the NameNode was in a very unstable state.
• When it happened on the standby NameNode, we couldn’t fail over (fail back).
• A very serious problem for us!
35
High load after restarting NameNode
 Added graphs for RPC queue activities in the NameNode
• Unknown high load between checkpoints
36
< lastTxId (Green) > < checkpointTxId (Yellow) >
< Waiting Time (Yellow) > < Processing Time (Red) >
< QueueLength >
Good case / Bad case
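Those RPC-side numbers come from the port-8020 RPC activity bean; note the live metric names are RpcQueueTimeAvgTime / RpcProcessingTimeAvgTime (a minimal sketch):
# Sketch: queue/processing averages (ms) and queue length for NameNode RPC.
curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=RpcActivityForPort8020" \
  | jq '.beans[0] | {RpcQueueTimeAvgTime, RpcProcessingTimeAvgTime, CallQueueLength}'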
High load after restarting NameNode
 Multiple graphs
37
< Receive data size (Blue) >
• The NameNode seemed to be receiving some amount of data from someone.
• The JournalNodes? No...
• The DataNodes? Hard to know...
• But it must be related to the high load!
High load after restarting NameNode
 DataNode log analysis
• Three kinds of 60000 msec timeouts were continuously being output.
38
60-sec timeout log:
2016-09-28 05:33:35,384 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
java.net.SocketTimeoutException: Call From XXXX to bhdXXXX:8020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/XXXX remote=XXX]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
Three methods in the class “BPServiceActor” were failing repeatedly:
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:616)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:523)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reportReceivedDeletedBlocks(BPServiceActor.java:312)
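A quick way to confirm that all three call sites are failing is to count them in the DataNode log; the log path below is an assumption, adjust it to your install.
# Sketch: count timeouts per failing BPServiceActor method.
for m in sendHeartBeat blockReport reportReceivedDeletedBlocks; do
  printf '%s: ' "${m}"
  cat /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log | grep -c "BPServiceActor.${m}"
done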
High load after restarting NameNode
 Thread dump analysis on the NameNode side
• Almost all server handlers were waiting for one lock.
39
"IPC Server handler 45 on 8020" daemon prio=10 tid=0x00007fbed169f800 nid=0x26b2 waiting on condition [0x00007f9e05be1000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00007fa30ad4d890> (a java.util.concurrent.locks.ReentrantLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
....
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(...)
"IPC Server handler 45 on 8020" daemon prio=10 tid=0x00007fbed169f800 nid=0x26b2 waiting on condition [0x00007f9e05be1000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00007fa30ad4d890> (a java.util.concurrent.locks.ReentrantLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
....
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReceivedAndDeleted(...)
High load after restarting NameNode
 Source code reading
• org.apache.hadoop.hdfs.server.datanode.BPServiceActor
• It seems to handle communication with the NameNode.
40
Source code:
private void offerService() throws Exception {
  LOG.info("For namenode " + nnAddr + " using"
      + " DELETEREPORT_INTERVAL of " + dnConf.deleteReportInterval + " msec "
      + " BLOCKREPORT_INTERVAL of " + dnConf.blockReportInterval + "msec"
      + " CACHEREPORT_INTERVAL of " + dnConf.cacheReportInterval + "msec"
      + " Initial delay: " + dnConf.initialBlockReportDelay + "msec"
      + "; heartBeatInterval=" + dnConf.heartBeatInterval);
Actual DN log:
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode bhdpXXXX:8020 using DELETEREPORT_INTERVAL of 500000 msec BLOCKREPORT_INTERVAL of 21600000msec <= 6 hours CACHEREPORT_INTERVAL of 10000msec Initial delay: 0msec; heartBeatInterval=5000 <= 5 sec
High load after restarting NameNode
 DataNode log analysis again
• Block reports failed repeatedly and were retried at short intervals.
41
...
2016-10-03 03:40:47,141 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report ...
2016-10-03 03:44:19,759 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report ...
2016-10-03 03:47:43,464 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report ...
...
High load after restarting NameNode
 What was the high load?
• Almost all DataNodes had failed to send their full block report and were retrying at intervals of a few minutes.
• Yes, it was a “Block Report Storm” from all of the DataNodes.
• While this storm existed, no full block report ever succeeded.
 Then, how to stop the storm?
• We had to reduce the concurrency of these requests somehow.
42
High load after restarting NameNode
 Trial and error
• Manual arbitration with iptables
• Worked well, but a little bit tricky.
• And some DataNodes sometimes lost their heartbeat with the active NameNode.
43
< diagram: DataNodes reporting to the standby and active NameNodes >
• Restarting the NameNode with different slaves files
• Worked several times, but unfortunately we once got a whole-cluster outage.
• So you MUST NOT do this operation!!!
• The safest way
• A NameNode in its startup phase discards non-initial block reports.
• So increase dfs.namenode.safemode.extension and wait (see the sketch below).
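dfs.namenode.safemode.extension is a standard hdfs-site.xml property (milliseconds); a sketch of how that safe restart could be driven, with the 30-minute value purely illustrative:
# Sketch: after raising the safemode extension in hdfs-site.xml, e.g.
#   dfs.namenode.safemode.extension = 1800000   (30 min, illustrative)
# restart the NameNode, then wait until it leaves safemode on its own;
# while in startup safemode it discards non-initial block reports.
while hdfs dfsadmin -safemode get | grep -q 'ON'; do
  sleep 60
done
echo "Safemode is off; block reports are being accepted normally."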
Lessons learned from the mysteries
 Monitor, Monitor, Monitor!!!
• A graphing tool is a MUST for a large, multi-tenant cluster.
• Investigating and monitoring with multiple graphs helps greatly.
 Cooperation with users
• For some issues, we have to solve cluster problems together with the users.
 Thread dumps and source code reading for deep analysis
• In cases where we can’t get any clues from logs, they are very important.
• Thread dumps are especially helpful for freezing or locking issues.
44
 Query examples for NameNode and ResourceManager
45
Just as a reference
Contents : Queries
HDFS cluster : curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState"
NameNode JVM info : curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=JvmMetrics"
NameNode and DataNode : curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"
NameNode state : curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"
NameNode RPC : curl -s "${NN}:50070/jmx?qry=Hadoop:service=NameNode,name=RpcActivityForPort8020"
NameNode CMS : curl -s "${NN}:50070/jmx?qry=java.lang:type=GarbageCollector,name=ConcurrentMarkSweep"
NameNode heap : curl -s "${NN}:50070/jmx?qry=java.lang:type=Memory"
List of an HDFS directory * : curl -s --negotiate -u : "${NN}:50070/webhdfs/v1/${HDFS_PATH}?&op=LISTSTATUS"
Usage of an HDFS directory * : curl -s --negotiate -u : "${NN}:50070/webhdfs/v1/${HDFS_PATH}?&op=GETCONTENTSUMMARY"
Jobs finished in last 10 min : curl -s "${RM}:8088/ws/v1/cluster/apps?finishedTimeBegin=`date -d '10 minutes ago' +%s%3N`"
Running jobs : curl -s "${RM}:8088/ws/v1/cluster/apps?state=RUNNING"
Accepted jobs : curl -s "${RM}:8088/ws/v1/cluster/apps?state=ACCEPTED"
ResourceManager status : curl -s "${RM}:8088/ws/v1/cluster/info"
YARN cluster : curl -s "${RM}:8088/ws/v1/cluster/metrics" | jq "."
NodeManager : curl -s "${RM}:8088/ws/v1/cluster/nodes" | jq "."
* kinit required in a secured cluster.
Background for Big Data systems
 Virtualization vs. Bare Metal
47
Management (Operation) : Bare Metal is quite complicated / Virtualization (Cloud) is easy
Performance : Bare Metal gives the best performance / Virtualization always has a bottleneck
Solutions : Bare Metal has many legacy ways / Virtualization has AWS, OpenStack, ...
 What’s your choice?
• Big Data, especially Hadoop, needs a lot of resources.
• Bare Metal is the best way to maximize HW power.
Background for Big Data systems
 Server capacity is the most important thing for Big Data
• Cheaper HW: we don’t care about warranty, we use cheaper parts, and furthermore NO REDUNDANCY.
• What we want is simply more and more servers.
48
But doesn’t that scare you? Aren’t we afraid of trouble?
No, we aren’t. Here, fully automated OS provisioning does the work for Big Data.
 Only bare metal, but it feels just like a cloud
• Full automation of OS installation.
• Full stack management with Chef.
• Everything should be there when you click.
Automation Provisioning and Operation
49
< diagram: fully automated provisioning and operation — a Chef dashboard (organization, role/recipe, host name, custom data for Rakuten) drives OS provisioning via MAAS and the Chef API through a controller and workers, app deploy and configuration run entirely by Chef with recipes built for each application by DevOps engineering, connected to PowerDNS (DNS API), Shinken and Graphite for monitoring >
Provisioning, Just 3 Steps
50
1st step : choose a server
2nd step : choose an action (install / destroy)
3rd step : enter the hostname, OS distribution/version, tenant and environment, and the recipes of your application
Finally, click and get it.
“Hey, I want a new server.” — “Just do it.”
Provisioning Process
51
< diagram: Request via GUI/API -> InstallOS (MAAS: basic install, DNS entry) -> SetupOS (Chef: default infra role, OS/APP configuration, default infra monitoring) -> SetupApp (Chef: app role, managed app recipes, app monitoring configuration) -> Finish; tasks run by the provisioning core worker, approximately 30 min >
Full Stack Management
 Management of not only Infra but also Hadoop
52
• Inventory Data : detailed H/W spec, custom information for BDD
• OS Installation : simple image, disk partitioning / RAID configuration
• OS Configuration : default configuration on the OS, basic packages
• Infra Monitoring : default OS monitoring
• Custom Configuration : custom OS configuration, Chef organization
• App Deployment : designed by the application, custom packages per application
• App Monitoring : designed by the application
(Managed layer by layer through MAAS and Chef roles/recipes, from the infra base up to each application organization.)
We are hiring!
 Rakuten is now really focusing on utilizing its rich data, so Hadoop will become more and more important.
 Current hadoop admin team
• Leader (double-posts)
• 3 members (2 full-time and 1 double-posts)
 So we need 2 or 3 more engineers for our team!
• Just mail me; I can help you with your application!
http://global.rakuten.com/corp/careers/
tomomichi.hirano@rakuten.com
54