© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Keep your Hadoop cluster at its best!
Chris Nauroth
Sheetal Dolas
Hadoop Summit, San Jose, 2016
About Us
⬢  Principal Engineer @ Hortonworks
⬢  Committer and PMC, Apache Hadoop
–  Key contributor to HDFS ACLs, Windows compatibility, and operability improvements
⬢  Hadoop user since 2010
–  Experience deploying, maintaining and using Hadoop clusters
cnauroth@hortonworks.com
cnauroth
Chris Nauroth
About Us
⬢  SmartSense Engineering Lead @ Hortonworks
⬢  Most of my career has been in the field, solving real-life business problems
⬢  Last 6+ years in Big Data
⬢  Committer and PMC, Apache Metron
sheetal@hortonworks.com
sheetal_dolas
Sheetal Dolas
Agenda
⬢  Days in the life of Hadoop users – Real war stories!
⬢  Hadoop Operational Challenges
⬢  Winning and avoiding the wars
⬢  Q & A
Days in the life of Hadoop users
Real war stories!
Story I: Unstable NameNode, Frequent Failovers
⬢  NameNode periodically becomes unresponsive
⬢  In an HA setup, it fails over to the standby
⬢  Shortly afterward, it fails back again
⬢  Very frequent failovers and failbacks
It was the garbage collection!
Story II: Very high CPU usage but low throughput
⬢  Unusually high system CPU usage
⬢  Jobs slowed down
⬢  Reduced data IO
[Chart: System CPU, User CPU, N/W IO]
Transparent Huge Pages (THP) was turned on!
Story III: Cascading impact and cluster meltdown
[Chart: Job Performance, Cluster Stability]
⬢  HDFS was upgraded
⬢  HDFS utilization kept increasing even after a large amount of data was deleted
⬢  Rebalancing made the situation worse
⬢  Eventually HDFS became unresponsive
The unfinalized HDFS upgrade had a cascading impact on the cluster!
Story IV: Overloaded cluster
⬢  Jobs run slower
⬢  Containers and jobs are always waiting; all YARN queues are fully utilized
⬢  Some jobs had to wait for hours to get container slots
Suboptimally configured container sizes!
[Chart: Requested Memory, Used Memory]
Story V: Accidental deletion of critical datasets
⬢  User accidentally executed hdfs dfs -rm -R on a root directory
⬢  The delete was issued in parallel; Ctrl+C did not help
⬢  In a panic, the user shut down HDFS immediately (fortunately)
⬢  After restarting it later to check the trash, all the data was lost
⬢  It is nearly impossible to recover blocks from the local file system
This is a more common mistake than one may think!
Story VI: Hive query returning random results
⬢  A Hive query returns different results every time
⬢  Results are usually accurate during office hours
⬢  After office hours, results change randomly on every execution
-- QUERY: WHAT ARE TODAY'S TOTAL SALES AS OF NOW?
SELECT SUM(amount)
FROM sales
WHERE sale_date = TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP()))
One of the hosts had a different time zone!
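The following sketch shows the mechanism. It assumes that Hive's `FROM_UNIXTIME` formats an epoch value in the local time zone of the JVM that evaluates it. Under that assumption, the same instant becomes a different calendar date on hosts with different UTC offsets. The offsets, dates and constant names are hypothetical and only illustrate the failure mode.

```python
from datetime import date, datetime, timedelta, timezone


def local_date(epoch_seconds: int, utc_offset_hours: float) -> date:
    """Calendar date of an epoch instant as seen by a host at a fixed UTC offset."""
    host_tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, host_tz).date()


def epoch_utc(year: int, month: int, day: int, hour: int) -> int:
    """Epoch seconds for a UTC wall-clock time."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp())


PACIFIC_DAYLIGHT = -7  # hypothetical: most hosts on US Pacific daylight time
UTC = 0                # hypothetical: one misconfigured host left on UTC

# 11:00 PDT (18:00 UTC): both hosts agree on the date.
office_hours = epoch_utc(2016, 6, 28, 18)
# 20:00 PDT (03:00 UTC the next day): the hosts disagree.
evening = epoch_utc(2016, 6, 29, 3)

for label, instant in (("office hours", office_hours), ("evening", evening)):
    print(label, local_date(instant, PACIFIC_DAYLIGHT), local_date(instant, UTC))
```

After 17:00 Pacific (midnight UTC), a task running on the UTC host already sees "tomorrow". Its `sale_date` predicate then matches no rows for today, so the sum depends on which splits happen to land on that host.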
and the stories continue…
Hadoop operational challenges
Hadoop has lots of configurations
⬢  So many configurations! Overwhelming for many users
⬢  Best practices are evolving and change across versions
Many configurations are cluster and workload specific
⬢  A configuration good for one cluster may not be suitable for another cluster
⬢  Optimally configured clusters may become suboptimal tomorrow as they grow
Large clusters add to the complexities
⬢  Managing, updating and keeping nodes in sync becomes challenging
⬢  Nodes that are down miss maintenance cycles and drift out of sync
⬢  Newly added nodes may follow different standards (Java version, OS, user configurations, etc.)
⬢  Over time, clusters accumulate heterogeneous hardware
Winning and avoiding the wars with SmartSense
SmartSense
⬢  Proactive support and personalized cluster insights by
–  Enabling faster case resolution
–  Applying industry best practices
–  Providing proactive analysis
⬢  SmartSense is a collection of tools and services
–  Evaluates the cluster's current configuration and runtime environment against a rich set of rules
–  Rules are dynamic, reacting to thresholds tailored to the specific cluster and its workloads
–  Rule sets evolve and improve continuously, and are developed by or in close consultation with active committers, support engineers and field engineers
SmartSense Architecture
[Diagram components: WORKER NODEs running AGENTs, AMBARI, SERVER, BUNDLE, GATEWAY, LANDING ZONE, SmartSense Analytics]
Addressing: Unstable NameNode, Frequent Failovers
Daunting Questions
⬢  What is the right heap size for my NameNode?
⬢  What should the new generation size be?
⬢  Which GC should I use?
⬢  Which GC options should be configured?
⬢  What if my cluster grows?
SmartSense Answer
⬢  Rule: hdfs_nn_jvm_opts
⬢  Calculates heap size based on
–  Current heap usage
–  Total number of objects in the file system
–  Best practices
⬢  Recalculates dependent JVM options based on heap size
⬢  Validates existing JVM opts
⬢  Provides continuous validation and proactive recommendations
Ã  Heap Size
–  200 bytes per HDFS object (files, directories, blocks)
–  25 % buffer
Ã  -Xms should be same as –Xmx
Ã  New generation size should be 1/8th of –Xmx (capped at 8G)
Ã  Use Concurrent Mark Sweep (CMS) Garbage Collection
–  -XX:+UseConcMarkSweepGC
–  -XX:CMSInitiatingOccupancyFraction=70
–  -XX:+UseCMSInitiatingOccupancyOnly
–  -XX:ParallelGCThreads=8
NameNode JVM Opts
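The sizing rules above can be turned into a small calculator. This is a minimal sketch, not the hdfs_nn_jvm_opts rule itself. It assumes the 25% buffer is added on top of the 200-bytes-per-object estimate and that the new generation is set with `-XX:NewSize`/`-XX:MaxNewSize`. The function names are hypothetical. CMS was deprecated in JDK 9 and removed in JDK 14, so these flags only apply to the Java 7/8 JVMs of that era.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

BYTES_PER_OBJECT = 200             # per file, directory or block (rule of thumb above)
BUFFER_FRACTION = 0.25             # 25% headroom on top of the estimate
NEW_GEN_CAP_MB = 8 * GIB // MIB    # new generation capped at 8G


def namenode_heap_bytes(files: int, directories: int, blocks: int) -> int:
    """Estimated NameNode heap: 200 bytes per namespace object plus a 25% buffer."""
    objects = files + directories + blocks
    return int(objects * BYTES_PER_OBJECT * (1 + BUFFER_FRACTION))


def namenode_jvm_opts(files: int, directories: int, blocks: int) -> str:
    """Build a JVM options string that follows the guidelines on this slide."""
    heap_bytes = namenode_heap_bytes(files, directories, blocks)
    heap_mb = max(1, -(-heap_bytes // MIB))         # round up to whole MiB
    new_gen_mb = min(heap_mb // 8, NEW_GEN_CAP_MB)  # 1/8 of -Xmx, capped at 8G
    opts = [
        f"-Xms{heap_mb}m",                          # -Xms equal to -Xmx
        f"-Xmx{heap_mb}m",
        f"-XX:NewSize={new_gen_mb}m",
        f"-XX:MaxNewSize={new_gen_mb}m",
        "-XX:+UseConcMarkSweepGC",
        "-XX:CMSInitiatingOccupancyFraction=70",
        "-XX:+UseCMSInitiatingOccupancyOnly",
        "-XX:ParallelGCThreads=8",
    ]
    return " ".join(opts)


# Hypothetical namespace: 40M files, 10M directories, 50M blocks.
print(namenode_jvm_opts(40_000_000, 10_000_000, 50_000_000))
```

For this hypothetical namespace of 100M objects, the estimate is 25 GB, so -Xmx is about 23842m and the new generation about 2980m. For namespaces of roughly 340M objects or more, the 8G cap takes over.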
Addressing: Very high CPU usage but low throughput
Daunting Questions
⬢  Is THP applicable to my OS version?
⬢  Is it disabled? Completely disabled?
⬢  How do I make sure it is disabled on newly added nodes too?
⬢  How do I make these configurations independent of any one person?
SmartSense Answer
⬢  Rule: os_thp
⬢  Checks if THP is completely disabled
⬢  Provides OS-specific disabling instructions
⬢  Continuous evaluation that validates newly added and re-commissioned nodes
Disable THP (as root; the setting does not persist across reboots)
⬢  For Red Hat and CentOS
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
⬢  For Debian, Ubuntu and SUSE
echo never > /sys/kernel/mm/transparent_hugepage/enabled
[Chart: System CPU, User CPU, N/W IO]
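To verify the result on a host, read back the sysfs files. The kernel marks the active value with brackets, for example `always madvise [never]`. This sketch assumes that "completely disabled" means both `enabled` and `defrag` report `[never]`; the exact check the os_thp rule performs is not stated here. The helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Standard location, plus the Red Hat-specific one named on this slide.
THP_DIRS = (
    Path("/sys/kernel/mm/transparent_hugepage"),
    Path("/sys/kernel/mm/redhat_transparent_hugepage"),
)


def active_mode(sysfs_text: str) -> str:
    """Return the bracketed (active) value, e.g. 'never' for 'always madvise [never]'."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    if match is None:
        raise ValueError(f"no active mode found in {sysfs_text!r}")
    return match.group(1)


def thp_fully_disabled(thp_dir: Path) -> bool:
    """True only if both 'enabled' and 'defrag' under thp_dir report [never]."""
    for name in ("enabled", "defrag"):
        setting = thp_dir / name
        if not setting.is_file():
            return False
        if active_mode(setting.read_text()) != "never":
            return False
    return True


def find_thp_dir() -> Optional[Path]:
    """First THP sysfs directory present on this host, if any."""
    return next((d for d in THP_DIRS if d.is_dir()), None)


if __name__ == "__main__":
    thp_dir = find_thp_dir()
    if thp_dir is None:
        print("THP sysfs settings not found on this host")
    else:
        print(thp_dir, "fully disabled:", thp_fully_disabled(thp_dir))
```

Running the same check from configuration management on every host, including newly added ones, addresses the "newly added nodes" question above.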
Addressing: Cascading impact and cluster meltdown
Daunting Questions
⬢  Should I finalize the upgrade?
⬢  What is the right time to finalize?
⬢  How do I make sure it does not fall through the cracks?
SmartSense Answer
⬢  Rule: hdfs_nn_finalize_upgrade
⬢  Checks HDFS health after the upgrade
⬢  Evaluates how long HDFS has been running in an unfinalized state
⬢  Sends reminders until the upgrade is finalized
Ã  Check NN UI / JMX for upgrade status
Ã  Do not finalize HDFS upgrade until
–  All files and blocks have been verified after upgrade
–  Critical jobs have been executed at least once after upgrade
Ã  Finalize between 2 - 7 days after upgrade
hdfs dfsadmin -finalizeUpgrade
HDFS Upgrade finalization
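The checklist above can be encoded as a single decision function. This is a hypothetical sketch: the enum and function names are illustrative. The inputs (days since the upgrade, whether files, blocks and critical jobs were verified) are assumed to be gathered elsewhere, for example from the NameNode UI or JMX and from job history.

```python
from enum import Enum

MIN_DAYS = 2   # earliest finalization point suggested on this slide
MAX_DAYS = 7   # latest finalization point suggested on this slide


class FinalizeAdvice(Enum):
    VERIFY_FIRST = "verify files, blocks and critical jobs before finalizing"
    WAIT = "too early: keep the rollback option a little longer"
    FINALIZE = "finalize now with: hdfs dfsadmin -finalizeUpgrade"
    OVERDUE = "overdue: finalize as soon as possible"


def finalize_advice(days_since_upgrade: float, blocks_verified: bool,
                    critical_jobs_ran: bool) -> FinalizeAdvice:
    """Map the finalization checklist on this slide to a single recommendation."""
    if not (blocks_verified and critical_jobs_ran):
        return FinalizeAdvice.VERIFY_FIRST
    if days_since_upgrade < MIN_DAYS:
        return FinalizeAdvice.WAIT
    if days_since_upgrade <= MAX_DAYS:
        return FinalizeAdvice.FINALIZE
    return FinalizeAdvice.OVERDUE


print(finalize_advice(3, blocks_verified=True, critical_jobs_ran=True).value)
```

Verification takes priority over the calendar window here. An unverified cluster is never told to finalize, even when it is past the 7-day mark. A reminder job could call this daily until the result is no longer OVERDUE.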
Addressing: Overloaded cluster
Daunting Questions
⬢  What is the right container size for my cluster?
⬢  If I add components (HBase, Storm), how does the container size change?
⬢  How do container sizes change when I add new types of nodes to the cluster?
⬢  What is the impact on container sizes if I add SSDs to the nodes?
SmartSense Answer
⬢  Rules: yarn_container_size, mr_container_size, tez_container_size
⬢  Evaluates resources available on each host (CPU, memory, disks, running services, etc.)
⬢  Calculates technology-specific container sizes (MR, Tez, Hive)
⬢  Continuously re-evaluates as cluster dynamics change
Container sizing
⬢  Identify the resources (CPU, memory, disks) available on each node
⬢  Set aside resources required by other processes (OS, DN, NM, HBase RS)
⬢  Calculate the maximum possible number of containers for each resource (CPU, memory, disks)
–  CPU containers: 4 × cores
–  Disk containers: 3 × HDDs + 10 × SSDs
–  Memory containers: available RAM / 2
⬢  Number of containers = min(CPU containers, disk containers, memory containers)
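Here is a minimal per-node sketch of the formula above. It makes three assumptions that are not stated on the slide. First, "available RAM / 2" means available gigabytes divided by 2, i.e. about 2 GB per container. Second, only memory is reserved for other processes. Third, memory per container is an even split of the available RAM. `container_plan` is a hypothetical name.

```python
from typing import Tuple


def container_plan(cores: int, hdds: int, ssds: int,
                   ram_gb: int, reserved_gb: int) -> Tuple[int, int]:
    """Return (container count, memory per container in MB) for one node.

    Assumes 'available RAM / 2' means available GB divided by 2 and that the
    reservation for OS, DataNode, NodeManager, HBase etc. is memory only.
    """
    available_gb = ram_gb - reserved_gb
    if available_gb <= 0:
        raise ValueError("reserved memory must be smaller than total RAM")
    cpu_containers = 4 * cores
    disk_containers = 3 * hdds + 10 * ssds
    memory_containers = available_gb // 2
    containers = min(cpu_containers, disk_containers, memory_containers)
    if containers <= 0:
        raise ValueError("node cannot host any containers")
    memory_per_container_mb = available_gb * 1024 // containers
    return containers, memory_per_container_mb


# Hypothetical worker: 16 cores, 12 HDDs, no SSDs, 128 GB RAM, 24 GB set aside.
print(container_plan(16, 12, 0, 128, 24))  # disk-bound: (36, 2958)
```

The minimum makes the scarcest resource the limit. In the example, disks cap the node at 36 containers, even though CPU would allow 64 and memory 52. Replacing some HDDs with SSDs raises the disk term, which answers the SSD question above.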
Addressing: Accidental deletion of critical datasets
Daunting Questions
⬢  Is HDFS trash enabled?
⬢  What is a safe trash interval?
⬢  How do I prevent accidental deletion of critical data?
SmartSense Answer
⬢  Rule: hdfs_trash_interval
–  Checks if trash is enabled
–  Validates that the trash interval is within reasonable limits
⬢  Rule: hdfs_nn_protect_imp_dirs
–  New feature available in Hadoop 2.8
–  Helps you mark critical directories such as “/”, “/user”, “/user/apps/hive”, “/user/apps/hbase”, etc. as delete-protected
HDFS Trash interval and directory protection
⬢  fs.trash.interval sets the number of minutes after which trashed data is permanently deleted
–  0 disables trash (data is deleted immediately)
–  Keep it in the range 1440 (1 day) to 10080 (7 days)
–  Recommended: 4320 (3 days)
⬢  fs.protected.directories lists directories that cannot be deleted, even by the superuser, unless they are empty
–  Available from Hadoop 2.8
–  List all key directories there (/, /user, /user/apps, /user/apps/hive, /user/apps/hbase, /user/apps/hbase/data, /mapred, /mapred/system, /tmp, etc.)
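This sketch checks a trash interval against the range above and renders both properties as Hadoop-style `<property>` entries. It assumes they go into `core-site.xml`. The function names are hypothetical, and the directory list is the one given on this slide.

```python
import xml.etree.ElementTree as ET
from typing import Mapping

MIN_TRASH_MINUTES = 1440          # 1 day
MAX_TRASH_MINUTES = 10080         # 7 days
RECOMMENDED_TRASH_MINUTES = 4320  # 3 days

PROTECTED_DIRS = ("/", "/user", "/user/apps", "/user/apps/hive", "/user/apps/hbase",
                  "/user/apps/hbase/data", "/mapred", "/mapred/system", "/tmp")


def check_trash_interval(minutes: int) -> str:
    """Classify an fs.trash.interval value against the ranges on this slide."""
    if minutes == 0:
        return "disabled"
    if minutes < MIN_TRASH_MINUTES:
        return "too short"
    if minutes > MAX_TRASH_MINUTES:
        return "too long"
    return "ok"


def core_site_snippet(properties: Mapping[str, str]) -> str:
    """Render <property> entries inside a <configuration> element."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(check_trash_interval(RECOMMENDED_TRASH_MINUTES))
print(core_site_snippet({
    "fs.trash.interval": str(RECOMMENDED_TRASH_MINUTES),
    "fs.protected.directories": ",".join(PROTECTED_DIRS),
}))
```

Trash only protects deletes that go through it. A `-skipTrash` delete, or a cluster with the interval set to 0, bypasses it entirely, which is why directory protection is listed alongside it.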
Addressing: Hive query returning random results
Daunting Questions
⬢  Is my cluster configured consistently?
⬢  How do I prevent such hard-to-analyze issues?
⬢  How do I make sure newly added nodes do not bring these types of issues?
⬢  How do I make these setups independent of any one person?
SmartSense Answer
⬢  Rule: os_time_zone
⬢  Checks that all hosts have the same time zone
⬢  Rule: os_service_ntpd_on makes sure all host clocks are in sync
⬢  Continuous evaluation that validates newly added and re-commissioned nodes
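The two checks above can be sketched as pure functions over an inventory of hosts. This is a hypothetical illustration, not the os_time_zone or os_service_ntpd_on rules. It assumes each host's time zone and clock reading have already been collected (for example, from `timedatectl` output) and that clock readings were taken at about the same moment. The 1-second skew limit is an arbitrary example value.

```python
from collections import Counter
from typing import Dict, List


def time_zone_outliers(host_zones: Dict[str, str]) -> List[str]:
    """Hosts whose time zone differs from the most common zone in the cluster."""
    if not host_zones:
        return []
    majority_zone, _ = Counter(host_zones.values()).most_common(1)[0]
    return sorted(host for host, zone in host_zones.items() if zone != majority_zone)


def clock_skew_outliers(host_clocks: Dict[str, float], reference: float,
                        max_skew_seconds: float = 1.0) -> List[str]:
    """Hosts whose clock reading differs from a reference time by more than the limit."""
    return sorted(host for host, clock in host_clocks.items()
                  if abs(clock - reference) > max_skew_seconds)


# Hypothetical inventory.
zones = {"worker-01": "America/Los_Angeles", "worker-02": "America/Los_Angeles",
         "worker-03": "UTC"}
print(time_zone_outliers(zones))  # ['worker-03']
```

Comparing against the majority zone avoids hard-coding a "correct" zone. On a tie, `Counter.most_common` returns the zone seen first, so a cluster split evenly between two zones should be treated as misconfigured as a whole.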
There are 250+ more such rules
Operations
⬢  hdfs_dn_volume_tolerance
⬢  hdfs_dn_xceivers
⬢  hdfs_nn_handler_count
⬢  …
⬢  yarn_zk_quorum
⬢  yarn_nm_recovery
⬢  …
⬢  os_hostname_reverse_lookup
⬢  os_ssd_tuning
⬢  …
⬢  hive_mr_strict_mode
⬢  hive_datanucleus_cache
⬢  …
⬢  tez_am_heap
⬢  tez_shuffle_buffer
⬢  …
Performance
⬢  ams_mc_distributed_configs
⬢  ams_mc_write_path
⬢  …
⬢  hbase_jvm_opts
⬢  hbase_rs_open_region_threads
⬢  hbase_tcp_nodelay
⬢  …
⬢  hdfs_dn_jvm_opts
⬢  hdfs_mount_options
⬢  hdfs_nn_dn_staleness_interval
⬢  …
⬢  hive_auto_convert_join
⬢  hive_disable_caching
⬢  hive_enable_cbo
⬢  …
Security
⬢  hdfs_dn_volume_tolerance
⬢  hdfs_audit_log
⬢  hdfs_block_access_token
⬢  hdfs_enable_security_check
⬢  hdfs_nn_super_user_group
⬢  hdfs_zkfc_ha_acl
⬢  …
⬢  ranger_policy_refresh_interval
⬢  smartsense_2_way_ssl_enabled
⬢  …
⬢  yarn_ats_security
⬢  yarn_enable_acl
⬢  …
There is more to it than just configuration
⬢  How do I show back / charge back to my tenants?
⬢  Who are the top users of my platform?
⬢  What types of workloads are running on my cluster?
⬢  Which jobs have a significant impact on my cluster?
⬢  How do I improve the performance of key jobs?
⬢  What is a good time for maintenance?
Activity Analysis
Summary
⬢  Managing a Hadoop cluster involves many moving parts
⬢  Best practices evolve and change across versions
⬢  What is optimal today may not be optimal tomorrow
⬢  Changing cluster dynamics and workload characteristics require continuous re-evaluation and configuration adjustments
⬢  SmartSense can significantly help avoid common mistakes, issues and pitfalls, and simplify Hadoop operations
Let's keep your Hadoop cluster at its best!
Thank You!
Appendix
More Resources
⬢  https://docs.hortonworks.com/index.html
⬢  http://hortonworks.com/products/subscriptions/smartsense/
⬢  http://hortonworks.com/info/smartsense/
⬢  http://hortonworks.com/blog/introducing-hortonworks-smartsense/
⬢  https://www.youtube.com/watch?v=IKulo9c8PjE
⬢  https://community.hortonworks.com/topics/smartsense.html
SmartSense Bundle Security
⬢  All bundles are anonymized and encrypted
⬢  Multiple built-in security measures
–  Ambari clear-text passwords are not collected
–  Hive and Oozie database properties are not collected
–  All IP addresses and host names are anonymized
⬢  Extensible security rules
–  Exclude properties within specific Hadoop configuration files
–  Global REGEX replacements across all configuration, metrics, and logs
SmartSense Stack Support
⬢  Stacks: HDP 2.4, HDP 2.3, HDP 2.2, HDP 2.1, HDP 2.0
⬢  SmartSense 1.x with Ambari 2.2: built-in
⬢  SmartSense 1.x with Ambari 2.1 and Ambari 2.0: plug-in
⬢  Ambari 1.7, Ambari 1.6: SmartSense 1.x

Mais conteúdo relacionado

Mais procurados

Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
DataWorks Summit
 

Mais procurados (20)

Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
 
Druid deep dive
Druid deep diveDruid deep dive
Druid deep dive
 
44CON 2014: Using hadoop for malware, network, forensics and log analysis
44CON 2014: Using hadoop for malware, network, forensics and log analysis44CON 2014: Using hadoop for malware, network, forensics and log analysis
44CON 2014: Using hadoop for malware, network, forensics and log analysis
 
Apache metron meetup presentation at capital one
Apache metron meetup presentation at capital oneApache metron meetup presentation at capital one
Apache metron meetup presentation at capital one
 
Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...
Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...
Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 
Resource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache StormResource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache Storm
 
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and SupersetInteractive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
 
What's Next for Google's BigTable
What's Next for Google's BigTableWhat's Next for Google's BigTable
What's Next for Google's BigTable
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
Fraud Detection Architecture
Fraud Detection ArchitectureFraud Detection Architecture
Fraud Detection Architecture
 
Apache Eagle: eBay构建开源分布式实时预警引擎实践
Apache Eagle: eBay构建开源分布式实时预警引擎实践Apache Eagle: eBay构建开源分布式实时预警引擎实践
Apache Eagle: eBay构建开源分布式实时预警引擎实践
 
Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
STORM as an ETL Engine to HADOOP
STORM as an ETL Engine to HADOOPSTORM as an ETL Engine to HADOOP
STORM as an ETL Engine to HADOOP
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn Meetup
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
Understanding apache-druid
Understanding apache-druidUnderstanding apache-druid
Understanding apache-druid
 
Grid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialGrid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and Potential
 

Destaque

Destaque (20)

潜力怎么挖?
潜力怎么挖?潜力怎么挖?
潜力怎么挖?
 
2016 Ideas Hakathon_社群輿情傾聽與品牌形象維護-智策慧行銷顧問
2016 Ideas Hakathon_社群輿情傾聽與品牌形象維護-智策慧行銷顧問2016 Ideas Hakathon_社群輿情傾聽與品牌形象維護-智策慧行銷顧問
2016 Ideas Hakathon_社群輿情傾聽與品牌形象維護-智策慧行銷顧問
 
Преодоление кризиса обучения в музыкальной школе
Преодоление кризиса обучения в музыкальной школеПреодоление кризиса обучения в музыкальной школе
Преодоление кризиса обучения в музыкальной школе
 
icebreaking of Cornellà meeting
icebreaking of Cornellà meetingicebreaking of Cornellà meeting
icebreaking of Cornellà meeting
 
Eutopia programme of the visit to warsaw 2016
Eutopia programme of the visit to warsaw 2016Eutopia programme of the visit to warsaw 2016
Eutopia programme of the visit to warsaw 2016
 
Hoe zijn de ervaringen met de digitale leeromgeving in PO en VO?
Hoe zijn de ervaringen met de digitale leeromgeving in PO en VO?Hoe zijn de ervaringen met de digitale leeromgeving in PO en VO?
Hoe zijn de ervaringen met de digitale leeromgeving in PO en VO?
 
電子學Ch1補充影片
電子學Ch1補充影片電子學Ch1補充影片
電子學Ch1補充影片
 
Exoscale: Pithos: your personal S3 object store on cassandra
Exoscale: Pithos: your personal S3 object store on cassandraExoscale: Pithos: your personal S3 object store on cassandra
Exoscale: Pithos: your personal S3 object store on cassandra
 
機械製造ⅱ隨堂講義 第7章(學生本)
機械製造ⅱ隨堂講義 第7章(學生本)機械製造ⅱ隨堂講義 第7章(學生本)
機械製造ⅱ隨堂講義 第7章(學生本)
 
Kohl's Pay
Kohl's PayKohl's Pay
Kohl's Pay
 
Hoe betrek je docenten bij het inrichten van de leeromgeving?
Hoe betrek je docenten bij het inrichten van de leeromgeving?Hoe betrek je docenten bij het inrichten van de leeromgeving?
Hoe betrek je docenten bij het inrichten van de leeromgeving?
 
數位邏輯實習(實習材料總表)
數位邏輯實習(實習材料總表)數位邏輯實習(實習材料總表)
數位邏輯實習(實習材料總表)
 
74【領導管理】優秀經理人曾走過的11道錯誤
74【領導管理】優秀經理人曾走過的11道錯誤74【領導管理】優秀經理人曾走過的11道錯誤
74【領導管理】優秀經理人曾走過的11道錯誤
 
電工機械II第13章自我評量
電工機械II第13章自我評量電工機械II第13章自我評量
電工機械II第13章自我評量
 
創業管理實戰:新創企業的成長模式
創業管理實戰:新創企業的成長模式創業管理實戰:新創企業的成長模式
創業管理實戰:新創企業的成長模式
 
69【簡報設計】賈伯斯簡報的15個秘訣
69【簡報設計】賈伯斯簡報的15個秘訣69【簡報設計】賈伯斯簡報的15個秘訣
69【簡報設計】賈伯斯簡報的15個秘訣
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
 
Castillo
CastilloCastillo
Castillo
 
LLUNA VERMELLA
LLUNA VERMELLALLUNA VERMELLA
LLUNA VERMELLA
 
Pantallas táctiles
Pantallas táctilesPantallas táctiles
Pantallas táctiles
 

Semelhante a Keep your Hadoop cluster at its best!

Semelhante a Keep your Hadoop cluster at its best! (20)

Keep your Hadoop Cluster at its Best
Keep your Hadoop Cluster at its BestKeep your Hadoop Cluster at its Best
Keep your Hadoop Cluster at its Best
 
Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4
 
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
Containers and Big Data
Containers and Big DataContainers and Big Data
Containers and Big Data
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSense
 
Democratizing Memory Storage
Democratizing Memory StorageDemocratizing Memory Storage
Democratizing Memory Storage
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise Hadoop
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementTaming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop Management
 
Containers and Big Data
Containers and Big Data Containers and Big Data
Containers and Big Data
 
The Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingThe Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral Processing
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30
 
Running Services on YARN
Running Services on YARNRunning Services on YARN
Running Services on YARN
 
The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for BeginnersThe Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
 
Hadoop Summit - Scheduling policies in YARN - San Jose 2016
Hadoop Summit - Scheduling policies in YARN - San Jose 2016Hadoop Summit - Scheduling policies in YARN - San Jose 2016
Hadoop Summit - Scheduling policies in YARN - San Jose 2016
 
Top10 list planningpostgresdeployment.2014
Top10 list planningpostgresdeployment.2014Top10 list planningpostgresdeployment.2014
Top10 list planningpostgresdeployment.2014
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Último (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Keep your Hadoop cluster at its best!

  • 1. 1   ©  Hortonworks  Inc.  2011  –  2016.  All  Rights  Reserved   Keep your Hadoop cluster at its best! Chris Nauroth Sheetal Dolas Hadoop Summit, San Jose, 2016
  • 2. 2   ©  Hortonworks  Inc.  2011  –  2016.  All  Rights  Reserved   About Us ⬢  Principal Engineer @ Hortonworks ⬢  Committer and PMC, Apache Hadoop –  Key  contributor  to  HDFS  ACLs,  Windows  compaJbility,  and  operability  improvements   ⬢  Hadoop user since 2010 –  Experience  deploying,  maintaining  and  using  Hadoop  clusters   cnauroth@hortonworks.com cnauroth Chris Nauroth
  • 3. 3   ©  Hortonworks  Inc.  2011  –  2016.  All  Rights  Reserved   About Us ⬢  SmartSense Engineering Lead @ Hortonworks ⬢  Most of the career has been in the field, solving real life business problems ⬢  Last 6+ years in Big Data ⬢  Committer and PMC, Apache Metron sheetal@hortonworks.com sheetal_dolas Sheetal Dolas
  • 4. 4   ©  Hortonworks  Inc.  2011  –  2016.  All  Rights  Reserved   Agenda ⬢  Days in a life of Hadoop users – Real war stories! ⬢  Hadoop Operational Challenges ⬢  Winning and avoiding the wars ⬢  Q & A
  • 5. 5   ©  Hortonworks  Inc.  2011  –  2016.  All  Rights  Reserved   Days in a life of Hadoop users Real war stories!
  • 6. 6   ©  Hortonworks  Inc.  2011  –  2016.  All  Rights  Reserved   Story I: Unstable NameNode, Frequent Fail Overs ⬢  NameNode periodically becomes unresponsive ⬢  In HA scenario, fails over to standby ⬢  In short time, falls back again ⬢  Very frequent fail overs and fail backs It was the garbage collection!
  • 7. Story II: Very high CPU usage but low throughput ⬢ Unusually high system CPU usage ⬢ Jobs slowed down ⬢ Reduced data IO [Chart: System CPU, User CPU, N/W IO] Transparent Huge Pages (THP) was turned on!
  • 8. Story III: Cascading impact and cluster meltdown [Figure: Job Performance, Cluster Stability] ⬢ HDFS was upgraded ⬢ HDFS utilization kept increasing even after large data deletions ⬢ Rebalancing made the situation worse ⬢ Eventually HDFS became unresponsive. The un-finalized HDFS upgrade had a cascading impact on the cluster!
  • 9. Story IV: Overloaded cluster ⬢ Jobs run slower ⬢ Containers and jobs are always waiting; all YARN queues are fully utilized ⬢ Some jobs had to wait for hours to get container slots [Chart: Requested Memory, Used Memory] Suboptimally configured container sizes!
  • 10. Story V: Accidental deletion of critical datasets ⬢ A user accidentally executed hdfs dfs -rm -R on a root directory ⬢ The delete is issued in parallel, so Ctrl+C did not help ⬢ In a panic, the user shuts down HDFS immediately (fortunately) ⬢ Restarts later to check the trash, and loses all data ⬢ It is nearly impossible to recover blocks from the local file system. This is a more common mistake than one may think!
  • 11. Story VI: Hive query returning random results ⬢ A Hive query returns different results every time ⬢ Results are usually accurate during office hours ⬢ After office hours, results keep changing randomly on every execution. -- QUERY: WHAT IS TODAY'S TOTAL SALE AS OF NOW? SELECT SUM(amount) FROM sales WHERE sale_date = TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP())) One of the hosts had a different time zone!
  • 12. and the stories continue…
  • 14. Hadoop has lots of configurations ⬢ So many configurations! Overwhelming for many users ⬢ Best practices are evolving and change across versions
  • 15. Many configurations are cluster and workload specific ⬢ A configuration that is good for one cluster may not be suitable for another ⬢ An optimally configured cluster may become suboptimal tomorrow as it grows
  • 16. Large clusters add to the complexity ⬢ Managing, updating, and keeping nodes in sync becomes challenging ⬢ Nodes that are down miss maintenance cycles and get out of sync ⬢ Newly added nodes may follow different standards (Java version, OS, user configurations, etc.) ⬢ Clusters accumulate heterogeneous hardware over time
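Configuration drift of this kind can be spotted by comparing each node with the rest of the cluster. The sketch below is a hypothetical illustration, not part of SmartSense: it assumes per-host settings (Java version, OS, and so on) have already been collected into a dictionary, and the function name find_drift and the sample values are made up. Any host whose value differs from the most common value of a setting is reported.

```python
from collections import Counter


def find_drift(host_settings):
    """Return {setting: {host: value}} for hosts that differ from the majority.

    host_settings maps a host name to a dict of setting name -> value.
    A host that does not report a setting is treated as having the value None.
    """
    keys = set()
    for settings in host_settings.values():
        keys.update(settings)

    drift = {}
    for key in sorted(keys):
        values = {host: s.get(key) for host, s in host_settings.items()}
        # The most common value across hosts is taken as the cluster standard.
        majority, _ = Counter(values.values()).most_common(1)[0]
        outliers = {h: v for h, v in values.items() if v != majority}
        if outliers:
            drift[key] = outliers
    return drift


if __name__ == "__main__":
    cluster = {
        "node1": {"java_version": "1.8.0_77", "os": "centos6"},
        "node2": {"java_version": "1.8.0_77", "os": "centos6"},
        "node3": {"java_version": "1.7.0_79", "os": "centos6"},
    }
    print(find_drift(cluster))  # {'java_version': {'node3': '1.7.0_79'}}
```

A majority vote is only a baseline: on an even split, Counter picks whichever value it saw first, so a production check would compare against an explicit, documented standard instead.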
  • 17. Winning and avoiding the wars with SmartSense
  • 18. SmartSense ⬢ Provides proactive support and personalized cluster insights by – Enabling faster case resolution – Applying industry best practices – Providing proactive analysis ⬢ SmartSense is a collection of tools and services – Evaluates the cluster's current configuration and runtime environment against a rich set of rules – Rules are dynamic, reacting to thresholds tailored to the specific cluster and its workloads – Rule sets are continuously evolving and improving, developed by or in close consultation with active committers, support engineers, and field engineers
  • 19. SmartSense Architecture [Diagram: agents on worker nodes, Ambari, SmartSense server with landing zone, bundle, gateway, SmartSense Analytics]
  • 20. Addressing: Unstable NameNode, Frequent Failovers. Daunting Questions ⬢ What is the right heap size for my NN? ⬢ What should the new generation size be? ⬢ Which GC should I use? ⬢ Which GC options should be configured? ⬢ What if my cluster grows? SmartSense Answer ⬢ Rule: hdfs_nn_jvm_opts ⬢ Calculates heap size based on – Current heap usage – Total number of objects in the file system – Best practices ⬢ Recalculates dependent JVM options based on heap size ⬢ Validates existing JVM opts ⬢ Provides continuous validation and proactive recommendations
  • 21. NameNode JVM Opts ⬢ Heap size – 200 bytes per HDFS object (files, directories, blocks) – 25% buffer ⬢ -Xms should be the same as -Xmx ⬢ New generation size should be 1/8th of -Xmx (capped at 8G) ⬢ Use Concurrent Mark Sweep (CMS) garbage collection – -XX:+UseConcMarkSweepGC – -XX:CMSInitiatingOccupancyFraction=70 – -XX:+UseCMSInitiatingOccupancyOnly – -XX:ParallelGCThreads=8
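The rules of thumb above can be turned into a small calculator. This is a sketch of the slide's arithmetic, not the hdfs_nn_jvm_opts rule itself: the function name is hypothetical, the object count (files + directories + blocks) is assumed to come from the NameNode, and expressing the new generation size with -XX:NewSize/-XX:MaxNewSize is an assumption (-Xmn would work too).

```python
import math

MIB = 1024 * 1024


def namenode_jvm_opts(total_objects, bytes_per_object=200, buffer=0.25):
    """Build NameNode JVM options from the rules of thumb on this slide.

    total_objects is the number of files + directories + blocks in HDFS.
    """
    # Heap: 200 bytes per HDFS object plus a 25% buffer, rounded up to whole MB.
    heap_mb = math.ceil(total_objects * bytes_per_object * (1 + buffer) / MIB)
    # New generation: 1/8th of the heap, capped at 8 GB (8192 MB).
    new_gen_mb = min(heap_mb // 8, 8 * 1024)
    return [
        f"-Xms{heap_mb}m",  # -Xms equal to -Xmx, so the heap is never resized
        f"-Xmx{heap_mb}m",
        f"-XX:NewSize={new_gen_mb}m",
        f"-XX:MaxNewSize={new_gen_mb}m",
        "-XX:+UseConcMarkSweepGC",
        "-XX:CMSInitiatingOccupancyFraction=70",
        "-XX:+UseCMSInitiatingOccupancyOnly",
        "-XX:ParallelGCThreads=8",
    ]


if __name__ == "__main__":
    # 100 million objects -> roughly a 23 GB heap.
    print(" ".join(namenode_jvm_opts(100_000_000)))
```

On Hadoop 2.x the resulting string would typically go into HADOOP_NAMENODE_OPTS in hadoop-env.sh. The CMS flags apply to the Java 7/8 JVMs of this era; CMS was removed in JDK 14.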
  • 22. Addressing: Very high CPU usage but low throughput. Daunting Questions ⬢ Does THP apply to my OS version? ⬢ Is it disabled? Completely disabled? ⬢ How do I make sure it is disabled on newly added nodes too? ⬢ How do I make these configurations person-independent? SmartSense Answer ⬢ Rule: os_thp ⬢ Checks if THP is completely disabled ⬢ Provides OS-specific disabling instructions ⬢ Continuous evaluation that validates newly added and re-commissioned nodes
  • 23. Disable THP ⬢ For Red Hat / CentOS 6: echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled ⬢ For Debian, Ubuntu, SUSE: echo never > /sys/kernel/mm/transparent_hugepage/enabled [Chart: System CPU, User CPU, N/W IO]
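A quick way to verify the setting on a host is to read the same sysfs file back: the kernel shows the active mode in brackets. This is a hypothetical check in the spirit of the os_thp rule, not its implementation, and the function names are made up; it only reads the files and changes nothing. The echo commands above need root and do not survive a reboot, so making the change permanent (for example with the transparent_hugepage=never kernel boot parameter) is left to host provisioning.

```python
from pathlib import Path

# RHEL/CentOS 6 kernels expose THP under a redhat_ prefix; other kernels
# use the upstream path.
THP_ENABLED_FILES = [
    Path("/sys/kernel/mm/transparent_hugepage/enabled"),
    Path("/sys/kernel/mm/redhat_transparent_hugepage/enabled"),
]


def active_thp_mode(content):
    """Return the active mode from a THP sysfs file.

    The kernel marks the active mode with brackets, e.g.
    'always madvise [never]' -> 'never'.
    """
    for token in content.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def thp_disabled():
    """True if every THP 'enabled' file present on this host is set to never.

    If no such file exists (kernel without THP support), THP cannot be on,
    so this also returns True.
    """
    present = [p for p in THP_ENABLED_FILES if p.exists()]
    return all(active_thp_mode(p.read_text()) == "never" for p in present)


if __name__ == "__main__":
    print("THP disabled:", thp_disabled())
```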
  • 24. Addressing: Cascading impact and cluster meltdown. Daunting Questions ⬢ Should I finalize the upgrade? ⬢ What is the right time to finalize? ⬢ How do I make sure it does not fall through the cracks? SmartSense Answer ⬢ Rule: hdfs_nn_finalize_upgrade ⬢ Checks HDFS health after the upgrade ⬢ Evaluates how long HDFS has been running in the un-finalized state ⬢ Reminds until it is finalized
  • 25. HDFS Upgrade Finalization ⬢ Check the NN UI / JMX for upgrade status ⬢ Do not finalize the HDFS upgrade until – All files and blocks have been verified after the upgrade – Critical jobs have been executed at least once after the upgrade ⬢ Finalize 2 to 7 days after the upgrade: hdfs dfsadmin -finalizeUpgrade
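The timing guideline above can be written down as a small decision function, which is roughly what a reminder like hdfs_nn_finalize_upgrade has to encode. This is a hypothetical sketch: the function name and return labels are made up, the upgrade date is assumed to come from the operator's own records, and the verified flag stands in for the two manual checks listed on the slide.

```python
from datetime import date


def finalize_status(upgrade_day, today, verified=False):
    """Classify a pending HDFS upgrade against the 2-7 day guideline.

    verified should be True only once all files and blocks have been checked
    and critical jobs have run at least once on the upgraded cluster.
    """
    days = (today - upgrade_day).days
    if days > 7:
        # Past the window: pre-upgrade block data is still retained, so
        # deleting files does not free disk space until the upgrade is finalized.
        return "overdue"
    if days >= 2 and verified:
        return "finalize"  # run: hdfs dfsadmin -finalizeUpgrade
    return "wait"


if __name__ == "__main__":
    print(finalize_status(date(2016, 6, 1), date(2016, 6, 4), verified=True))  # finalize
```

Treating "overdue" as independent of verification is a deliberate choice here: past the window the cluster should be either verified and finalized or rolled back, not left in between, which is how Story III escalated.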
  • 26. Addressing: Overloaded cluster. Daunting Questions ⬢ What is the right container size for my cluster? ⬢ If I add components (HBase, Storm), how does the container size change? ⬢ How do container sizes change when I add new types of nodes to the cluster? ⬢ What is the impact on container sizes if I add SSDs to the nodes? SmartSense Answer ⬢ Rules: yarn_container_size, mr_container_size, tez_container_size ⬢ Evaluates resources available on each host (CPU, memory, disks, running services, etc.) ⬢ Calculates technology-specific container sizes (MR, Tez, Hive) ⬢ Continuously re-evaluates as cluster dynamics change
  • 27. Container sizing ⬢ Identify resources (CPU, memory, disks) available on each node ⬢ Set aside resources required by other processes (OS, DN, NM, HBase RS) ⬢ Calculate the maximum possible containers for each resource (CPU, memory, disks) – CPU containers: 4 × cores – Disk containers: 3 × HDDs + 10 × SSDs – Memory containers: available RAM / 2 ⬢ Number of containers = min(CPU containers, disk containers, memory containers)
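The formula above can be sketched directly. The function name and parameters are hypothetical, and reading "available RAM / 2" as available GB divided by a 2 GB minimum container is an assumption; the RAM figure is also assumed to already exclude what was set aside for the OS, DataNode, NodeManager, and HBase RegionServer.

```python
def container_count(cores, hdds, ssds, available_ram_gb, min_container_gb=2):
    """Maximum YARN containers per node using the slide's formula.

    available_ram_gb must already exclude memory reserved for the OS,
    DataNode, NodeManager, HBase RegionServer, and other co-located services.
    """
    cpu_containers = 4 * cores
    disk_containers = 3 * hdds + 10 * ssds
    # Assumption: "Available RAM / 2" means available GB / 2 GB per container.
    memory_containers = int(available_ram_gb // min_container_gb)
    return min(cpu_containers, disk_containers, memory_containers)


if __name__ == "__main__":
    # 12 cores, 12 HDDs, 96 GB free: disks are the bottleneck (36 containers).
    print(container_count(cores=12, hdds=12, ssds=0, available_ram_gb=96))
    # Adding 2 SSDs lifts the disk limit to 56, so CPU and memory cap it at 48.
    print(container_count(cores=12, hdds=12, ssds=2, available_ram_gb=96))
```

The second call illustrates one of the questions on the previous slide: adding SSDs changes the binding resource, so container sizes computed for the old hardware no longer fit.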
  • 28. Addressing: Accidental deletion of critical datasets. Daunting Questions ⬢ Is HDFS trash enabled? ⬢ What is a safe trash interval? ⬢ How do I prevent accidental deletion of critical data? SmartSense Answer ⬢ Rule: hdfs_trash_interval – Checks if trash is enabled – Validates that the trash interval is within reasonable limits ⬢ Rule: hdfs_nn_protect_imp_dirs – New feature available in Hadoop 2.8 – Lets you mark critical directories such as “/”, “/user”, “/user/apps/hive”, “/user/apps/hbase”, etc. as delete-protected
  • 29. HDFS Trash Interval and Directory Protection ⬢ fs.trash.interval sets the number of minutes after which trashed data is permanently deleted – 0 means trash is disabled (data is deleted immediately) – Keep it in the range 1440 (1 day) to 10080 (7 days) – Recommended: 4320 (3 days) ⬢ fs.protected.directories specifies directories that are protected from deletion – Available from Hadoop 2.8 – List all key directories there (/, /user, /user/apps, /user/apps/hive, /user/apps/hbase, /user/apps/hbase/data, /mapred, /mapred/system, /tmp, etc.)
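Both recommendations above can be checked and rendered as configuration with a few lines of code. This is a hypothetical sketch, not the hdfs_trash_interval or hdfs_nn_protect_imp_dirs rule: the function names and status labels are made up, the directory list is just the slide's example, and the generated fragment would still have to be merged into core-site.xml by hand or through Ambari.

```python
import xml.etree.ElementTree as ET


def check_trash_interval(minutes):
    """Classify fs.trash.interval (in minutes) against the 1-7 day guideline."""
    if minutes == 0:
        return "disabled"   # deleted data is removed immediately
    if minutes < 1440:
        return "too_short"  # under 1 day
    if minutes > 10080:
        return "too_long"   # over 7 days of trashed data is kept
    return "ok"


def core_site_xml(props):
    """Render name/value pairs as Hadoop <configuration> XML."""
    config = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(config, encoding="unicode")


PROTECTED_DIRS = ["/", "/user", "/user/apps", "/user/apps/hive",
                  "/user/apps/hbase", "/user/apps/hbase/data",
                  "/mapred", "/mapred/system", "/tmp"]

if __name__ == "__main__":
    print(check_trash_interval(4320))  # ok
    print(core_site_xml({
        "fs.trash.interval": 4320,  # recommended: 3 days
        # Comma-separated list; honored by the NameNode from Hadoop 2.8.
        "fs.protected.directories": ",".join(PROTECTED_DIRS),
    }))
```

Per the Hadoop configuration documentation, a protected directory can still be deleted once it is empty, so the setting guards against recursive deletes of populated trees rather than freezing the namespace.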
  • 30. Addressing: Hive query returning random results. Daunting Questions ⬢ Is my cluster configured consistently? ⬢ How do I prevent such hard-to-analyze issues? ⬢ How do I make sure newly added nodes do not bring these types of issues? ⬢ How do I make these setups person-independent? SmartSense Answer ⬢ Rule: os_time_zone ⬢ Checks if all hosts have the same time zone ⬢ Rule: os_service_ntpd_on makes sure all host clocks are in sync ⬢ Continuous evaluation that validates newly added and re-commissioned nodes
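Story VI is a date-boundary effect: UNIX_TIMESTAMP() is the same instant on every host, but turning it into a calendar date depends on the time zone of the host doing the conversion (in Hive, typically the default time zone of the JVM running the task). The sketch below, with hypothetical host names and fixed UTC offsets standing in for real zones (-7 is US Pacific daylight time), shows why the answer only goes wrong after office hours, and how a simple majority check in the spirit of os_time_zone flags the odd host out. It is not the SmartSense rule itself.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def local_dates(epoch_seconds, host_utc_offsets):
    """Calendar date each host computes for the same instant.

    host_utc_offsets maps host name -> UTC offset in hours.
    """
    return {
        host: datetime.fromtimestamp(
            epoch_seconds, timezone(timedelta(hours=offset))).date()
        for host, offset in host_utc_offsets.items()
    }


def mismatched_hosts(host_utc_offsets):
    """Hosts whose UTC offset differs from the most common one."""
    majority, _ = Counter(host_utc_offsets.values()).most_common(1)[0]
    return sorted(h for h, off in host_utc_offsets.items() if off != majority)


if __name__ == "__main__":
    hosts = {"node1": -7, "node2": -7, "node3": 0}  # node3 left on UTC
    office = datetime(2016, 6, 28, 17, 0, tzinfo=timezone.utc).timestamp()
    evening = datetime(2016, 6, 29, 1, 30, tzinfo=timezone.utc).timestamp()
    print(local_dates(office, hosts))   # every host: 2016-06-28
    print(local_dates(evening, hosts))  # node3 already sees 2016-06-29
    print(mismatched_hosts(hosts))      # ['node3']
```

At 10:00 Pacific every host agrees on the date; at 18:30 Pacific the UTC host has already rolled over to the next day, so whichever tasks land on it filter for the wrong sale_date and the SUM changes from run to run.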
  • 31. There are 250+ more such rules. Operations: hdfs_dn_volume_tolerance, hdfs_dn_xceivers, hdfs_nn_handler_count, …; yarn_zk_quorum, yarn_nm_recovery, …; os_hostname_reverse_lookup, os_ssd_tuning, …; hive_mr_strict_mode, hive_datanucleus_cache, …; tez_am_heap, tez_shuffle_buffer, … Performance: ams_mc_distributed_configs, ams_mc_write_path, …; hbase_jvm_opts, hbase_rs_open_region_threads, hbase_tcp_nodelay, …; hdfs_dn_jvm_opts, hdfs_mount_options, hdfs_nn_dn_staleness_interval, …; hive_auto_convert_join, hive_disable_caching, hive_enable_cbo, … Security: hdfs_dn_volume_tolerance, hdfs_audit_log, hdfs_block_access_token, hdfs_enable_security_check, hdfs_nn_super_user_group, hdfs_zkfc_ha_acl, …; ranger_policy_refresh_interval, smartsense_2_way_ssl_enabled, …; yarn_ats_security, yarn_enable_acl, …
  • 32. There is more than just configuration. How do I show back/charge back my tenants? Who are the top users of my platform? What types of workloads are running on my cluster? Which jobs have a significant impact on my cluster? How do I improve the performance of key jobs? What is a good time for maintenance?
  • 33. Activity Analysis
  • 34. Summary ⬢ There are many things involved in managing a Hadoop cluster ⬢ Best practices evolve and change across versions ⬢ What is optimal today may not be optimal tomorrow ⬢ Changing cluster dynamics and workload characteristics need continuous re-evaluation and configuration adjustments ⬢ SmartSense can significantly help avoid common mistakes, issues, and pitfalls, and simplify Hadoop operations
  • 35. Let's keep your Hadoop cluster at its best! Thank You!
  • 36. Appendix
  • 37. More Resources ⬢ https://docs.hortonworks.com/index.html ⬢ http://hortonworks.com/products/subscriptions/smartsense/ ⬢ http://hortonworks.com/info/smartsense/ ⬢ http://hortonworks.com/blog/introducing-hortonworks-smartsense/ ⬢ https://www.youtube.com/watch?v=IKulo9c8PjE ⬢ https://community.hortonworks.com/topics/smartsense.html
  • 38. SmartSense Bundle Security ⬢ All bundles are anonymized and encrypted ⬢ Multiple built-in security measures – Ambari clear-text passwords are not collected – Hive and Oozie database properties are not collected – All IP addresses and host names are anonymized ⬢ Extensible security rules – Exclude properties within specific Hadoop configuration files – Global REGEX replacements across all configuration, metrics, and logs
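The idea of anonymizing IP addresses and host names with global regex replacements can be sketched as below. This is a hypothetical illustration, not SmartSense's implementation: the anonymize function, the placeholder format (ip-1, host-1), and the simplified IPv4 pattern (which does not validate octet ranges) are all assumptions. Each distinct value maps to the same placeholder within one call, so log lines about the same host can still be correlated after anonymization.

```python
import re


def anonymize(text, hostnames):
    """Replace IPv4 addresses and the given host names with stable placeholders."""
    # Simplified IPv4 pattern: four dot-separated groups of 1-3 digits.
    parts = [r"(?P<ip>\b(?:\d{1,3}\.){3}\d{1,3}\b)"]
    if hostnames:
        # Longest names first, so 'node1.example.com' wins over 'node1'.
        names = sorted(hostnames, key=len, reverse=True)
        parts.append(r"(?P<host>\b(?:" + "|".join(map(re.escape, names)) + r")\b)")
    # A single combined pass, so inserted placeholders are never re-matched.
    pattern = re.compile("|".join(parts))

    placeholders = {}
    counts = {"ip": 0, "host": 0}

    def replace(match):
        kind = "ip" if match.group("ip") else "host"
        value = match.group(0)
        if value not in placeholders:
            counts[kind] += 1
            placeholders[value] = f"{kind}-{counts[kind]}"
        return placeholders[value]

    return pattern.sub(replace, text)


if __name__ == "__main__":
    log = "DataNode 10.0.0.5 (node1.example.com) heartbeat; retry 10.0.0.5 via node2"
    print(anonymize(log, ["node1.example.com", "node2", "node1"]))
    # DataNode ip-1 (host-1) heartbeat; retry ip-1 via host-2
```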
  • 39. SmartSense Stack Support HDP 2.4 HDP 2.3 HDP 2.2 HDP 2.1 HDP 2.0 SmartSense 1.x Ambari 2.2 Built-In! Ambari 2.1 Plug-In Ambari 2.0 Plug-In Ambari 1.7 Ambari 1.6 SmartSense 1.x