Upgrade Webinar
Best Practices for Upgrading Hadoop with Cloudera Manager
Vala Dormiani | Product Manager
© Cloudera, Inc. All rights reserved.
Cloudera Enterprise powered by Apache Hadoop
A new kind of data platform.
• One place for unlimited data
• Unified, multi-framework data access
Only with Cloudera:
• Leading performance
• Enterprise system and data management
• Fundamentally secure
• Open source, open standards
Security and Administration
Unlimited Storage
Process Discover Model Serve
Deployment
Flexibility
On-Premises
Appliances
Engineered Systems
Public Cloud
Private Cloud
Hybrid Cloud
Cloudera Enterprise
End-to-End Administration
Hadoop Administration Made Easy
Cloudera Manager
Focus on the solution, not the
cluster, with the only complete,
zero-downtime administration
tool for Apache Hadoop.
Unique Capabilities:
• Unified configuration, management
and monitoring across all services
• Online installation and upgrades
• Direct connection to Cloudera Support
• 3rd Party Extensibility
Why You Need Cloudera Manager
Complexity
Context
Efficiency
Hadoop is more than a dozen services running across many machines
Hadoop is a system, not just a collection of parts
Managing Hadoop with multiple tools & manual process takes longer
• Hundreds of hardware components
• Thousands of settings
• Limitless permutations
• Everything is interrelated
• Raw data about individual pieces is not enough
• Must extract what’s important
• Complicated, error-prone workflows
• Longer issue resolution
• Lack of consistent and repeatable processes
End-to-End Administration for the EDH
1. Manage: Easily deploy, configure, & optimize clusters
2. Monitor: Maintain a central view of all activity
3. Diagnose: Easily identify and resolve issues
4. Integrate: Use with existing tools
One Tool For Everything
Managing Complexity
[Figure: do-it-yourself (a separate tool per task) versus Cloudera Manager (one tool) for deployment & configuration, monitoring, workflows, events & alerts, log search, diagnostics, reporting, and activity monitoring]
Raw Data vs. Hadoop Intelligence
Providing Context
1. Smart Configuration: Auto-sets configurations and guards against user error
2. Workflows: Ensures that multi-step tasks are accomplished completely and in the correct sequence
3. Dependencies: Aware of how a particular action affects the rest of the cluster and manages the impact
4. Events & Alerts: Makes you aware of what’s important at a Hadoop system level
Simple Diagnostic Workflow
Maximizing Efficiency
Do-It-Yourself: 4.5 hours total (step times shown on the slide: 1 hr, 2 hrs, 1 hr, 30 min)
1. Notice job is not completing
2. Identify problem task in TaskTracker web UI
3. Ganglia: study service, host & network metrics for root cause
4. Determine required heap size
5. Update heap size & restart TaskTracker with Chef
Root cause: low heap for TaskTracker
With Cloudera Manager: 15 minutes total (step times shown on the slide: 5 min, 3 min, 2 min, 5 min)
1. Receive alert: job running longer than expected
2. Visually locate problem task in task distribution view
3. Drill down to TaskTracker health, see ‘low heap’
4. Update heap size with recommended value
5. Restart TaskTracker
Root cause: low heap for TaskTracker
Why Cloudera Manager
One Holistic View of Everything
Best-in-Class
• Only enterprise-grade Hadoop management application
• Zero downtime rolling upgrades & BDR
• Integrated with Support
Simple
• Manage the complexity of dozens of tools through one interface
Intelligent
• Extract context from your data and Hadoop system
Efficient
• Simplify complex workflows and create consistent, repeatable processes
3rd Party Integration
• Broadest network of partners with complete integration
Cloudera Manager Features
Cloudera Manager Key Features
Install a Cluster in Three Simple Steps
1. Find Nodes: Enter the names of the hosts which will be included in the Hadoop cluster. Click Continue.
2. Install Components: Cloudera Manager automatically installs the CDH components on the hosts you specified.
3. Assign Roles: Verify the roles of the nodes within your cluster. Make changes as necessary.
Cloudera Manager Key Features
View Service Health and Performance
Cloudera Manager Key Features
Gather, View, and Search Hadoop Logs
Cloudera Manager Key Features
Manage Resources
Cloudera Manager Key Features
Customizable Landing Page
Open API for Extensibility
Integration with Leading ISVs
Alternative Storage Options
Hundreds of Partners Certified to Run In and On Cloudera
Cloudera Manager Key Features
“We always know the health of our cluster and
its nodes. We really can stay in touch with
what's happening on the system, and we can
deploy and manage things really easily”
Kathleen deValk
Senior Architect, Omneo
Cloudera Manager Key Features
Manage Backup and Disaster Recovery
Cloudera Manager Key Features
Upgrades
The Upgrade Wizard
Why Upgrade to CDH 5
• Software platform improvements
• New features
• Security and governance
• Bug fixes
• Technology Enablement
• Evolution of infrastructure
• Expanded application stack
Security and Administration
Unlimited Storage
Process Discover Model Serve
Motivation for the Wizard
• Upgrades are hard and unpredictable
• Downtime to mission-critical workloads impacts your revenue
• Hadoop can be especially complex
• Upgrading Hadoop can involve many steps, which depend on:
• Services Installed
• Start & End Versions
• Packages or Parcels…
The CDH Upgrade Wizard in Cloudera Manager
• Cloudera Manager has a built-in
Upgrade Wizard
• Major Upgrades (CDH4 to CDH5)
supported from CM 5.0
• CM 5.3 supports upgrades for Minor
(CDH 5.x to CDH 5.y) & Maintenance
Releases (CDH 5.b.x to CDH 5.b.y)
• Zero-downtime for non-major upgrades
• Wizard automatically performs upgrade
steps that were manual in the past
What is Included
• Confirmation of applicable manual steps
• Verification that proper binaries are installed and hosts are healthy
• Applicable automated commands for the upgrade
• Post-upgrade messages applicable to this upgrade
What is not Included
• Some steps are still manual
• Backing up existing databases & NameNode metadata
• Installing & removing packages
• Wizard doesn’t capture all upgrade caveats
• Does not include minor CDH 4 upgrades, e.g. CDH 4.2 to 4.3
Types of Upgrades Supported by the Wizard
• Major upgrades from CDH 4 to CDH 5
• In CM 5.3, the CDH upgrade wizard was extended to minor CDH 5 upgrades and to maintenance upgrades and downgrades
Note: You cannot upgrade to a CDH version higher than the CM version
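The support rules on this slide can be written down as a small check. The following is a minimal sketch, not Cloudera Manager's own logic: the function names are hypothetical, and it assumes that "higher than the CM version" is compared on major.minor only, which the slide does not state.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse 'major.minor.maintenance'; missing parts default to 0."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def classify_upgrade(current: Version, target: Version) -> str:
    """Name the kind of version change between two CDH versions."""
    if target[0] != current[0]:
        return "major"
    if target[1] != current[1]:
        return "minor"
    if target[2] != current[2]:
        return "maintenance"
    return "none"


def wizard_supports(current_cdh: str, target_cdh: str, cm_version: str) -> bool:
    """Return True if the slide's rules say the wizard covers this change."""
    cur, tgt, cm = (parse_version(v) for v in (current_cdh, target_cdh, cm_version))
    # Assumption: the CDH-vs-CM comparison uses major.minor only.
    if tgt[:2] > cm[:2]:
        return False
    kind = classify_upgrade(cur, tgt)
    if kind == "major":
        # Major upgrades (CDH 4 to CDH 5) are supported from CM 5.0.
        return cur[0] == 4 and tgt[0] == 5 and cm >= (5, 0, 0)
    if kind == "minor":
        # Minor CDH 5 upgrades need CM 5.3; minor CDH 4 upgrades are not covered.
        return cur[0] == 5 and tgt[1] > cur[1] and cm >= (5, 3, 0)
    if kind == "maintenance":
        # Maintenance upgrades and downgrades within CDH 5 need CM 5.3.
        return cur[0] == 5 and cm >= (5, 3, 0)
    return False
```

For example, `wizard_supports("5.2.1", "5.3.0", "5.3.0")` is true under these rules, while `wizard_supports("4.2.0", "4.3.0", "5.3.0")` is false because minor CDH 4 upgrades are outside the wizard.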
Parcels vs. Packages
• Using Parcels is preferred as packages must be manually installed
• See the FAQ to learn more about Parcels
• Supported:
• Package → Package
• Package → Parcel
• Parcel → Parcel
Zero-Downtime Rolling Upgrade
• New: rolling restart is available if you
1. Have enabled HDFS HA
2. Are using parcels
3. Have an Enterprise license
4. Are performing a non-major upgrade
• Supported services will be upgraded and
restarted without cluster downtime
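The four preconditions above lend themselves to a pre-flight check in an upgrade plan. This is a minimal sketch with hypothetical names (`ClusterState`, `rolling_upgrade_blockers`); it only encodes the slide's list and does not query Cloudera Manager.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterState:
    """Facts about the cluster, gathered by hand or by your own tooling."""
    hdfs_ha_enabled: bool
    uses_parcels: bool
    has_enterprise_license: bool
    is_major_upgrade: bool


def rolling_upgrade_blockers(state: ClusterState) -> List[str]:
    """Return the unmet preconditions; an empty list means a rolling restart is possible."""
    blockers = []
    if not state.hdfs_ha_enabled:
        blockers.append("HDFS HA is not enabled")
    if not state.uses_parcels:
        blockers.append("cluster uses packages, not parcels")
    if not state.has_enterprise_license:
        blockers.append("no Enterprise license")
    if state.is_major_upgrade:
        blockers.append("major upgrades cannot be rolling")
    return blockers
```

Returning the list of blockers rather than a single boolean makes it easier to record in the plan why a regular restart, and therefore a maintenance window, is needed.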
How to Start an Upgrade
• Trigger Points
1. Parcel Page:
Download → Distribute → Activate / Upgrade
• After downloading and distributing a parcel,
“Activate” is replaced by “Upgrade” if the version
change is supported by the wizard
• Note: Just “activating” is rarely a good idea
2. Cluster Actions Dropdown Menu:
Click “Upgrade Cluster”
Wizard Steps
1. Log in to Cloudera Manager & Trigger the upgrade
2. Select the package/parcel upgrade version
3. Pre-upgrade warnings
4. Perform the required actions before continuing (e.g., backing up databases)
5. Host health validation
6. Parcel is downloaded and distributed to all hosts
7. Restart selection:
o Regular vs Rolling
8. Commands Progress Screen:
o Activation of new parcel, Upgrading Services, Deploying client config files, Other CDH component steps…
9. Host Inspector
10. Post-upgrade warnings
11. YARN migration
Useful things to have in place
• Have an automated way to back up your NameNode metadata.
• You will need to back up your NN metadata prior to the upgrade, so you should have
scripts ready in advance
• All databases should be backed up regularly, including CM, HDFS, Hive, HBase, Oozie
• You will need to take backups prior to the upgrade, but you should have automated
backup procedures for these databases already
• You cannot revert to CDH 4 unless you restore a backup
• Maintain your own OS, CM and CDH package/parcel repos to protect against external
repositories being unavailable
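One way to script the NameNode metadata backup is with the standard `hdfs dfsadmin` subcommands `-safemode enter`, `-saveNamespace`, and `-fetchImage`. The sketch below is an illustration under assumptions: the helper names and the backup directory layout are hypothetical, the commands must run as the HDFS superuser, and the upgrade guide for your version remains the authority on the exact procedure (it may ask for a copy of the NameNode directories instead).

```python
import subprocess
from datetime import datetime
from pathlib import Path
from typing import List


def backup_commands(backup_dir: str, pre_upgrade: bool = False) -> List[List[str]]:
    """Build the hdfs commands for a NameNode metadata backup.

    A routine backup only downloads the latest fsimage. Before an upgrade,
    the NameNode is first put into safe mode and the namespace is saved so
    the downloaded image is current.
    """
    commands = []
    if pre_upgrade:
        commands.append(["hdfs", "dfsadmin", "-safemode", "enter"])
        commands.append(["hdfs", "dfsadmin", "-saveNamespace"])
    commands.append(["hdfs", "dfsadmin", "-fetchImage", backup_dir])
    return commands


def run_backup(root: str, pre_upgrade: bool = False, dry_run: bool = True) -> List[List[str]]:
    """Run (or, by default, only return) the backup commands for a timestamped directory."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(root) / f"nn-meta-{stamp}"  # hypothetical naming scheme
    commands = backup_commands(str(target), pre_upgrade)
    if not dry_run:
        target.mkdir(parents=True, exist_ok=True)
        for command in commands:
            subprocess.run(command, check=True)  # stop at the first failure
    return commands
```

Keeping `dry_run=True` as the default lets the same script be reviewed and tested in a sandbox before it is scheduled, in line with the advice to have scripts ready in advance.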
Other Best Practices for CDH 5 Upgrade
• In critical upgrades, create a fine-grained step-by-step production plan
• Document the existing cluster environment and dependencies
• Test the production upgrade plan in non-prod environment(s)
• Test the step by step upgrade plan in sandbox, test and other non-prod environments
and update the plan if anything unexpected happens
• Test all compatibility with the new version. If desired, run performance tests in a
performance cluster
• Reserve a maintenance window with enough time allotted to perform all steps
• Note that rolling upgrade from CDH 4 to CDH 5 is unsupported
• Enable maintenance mode on your cluster to avoid lots of alerts during the upgrade
CDH 4 to CDH 5 Upgrade Steps
• Documentation: Upgrading from CDH 4 to CDH 5 Parcels - Read “Before You Begin”
1. Download/distribute parcel
2. Reduce the upgrade time by reducing the amount of history that Oozie retains
3. Put the NameNode into safe mode and back up HDFS metadata
4. Stop the cluster & stop the CM service
5. Remove CDH Packages (if in use)
6. Deactivate and Remove the GPL Extras Parcel (if using LZO)
7. Run the Upgrade Wizard
• Recover from any failed steps before proceeding
8. Upgrade the GPL Extras Parcel (if using LZO)
9. Restart the Reports Manager Role
10. Finalize the HDFS Metadata Upgrade
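Because some of the steps above apply only to certain clusters, it can help to generate the step list for a specific environment when writing the production plan. The following is a minimal sketch; the function name and flags are hypothetical and the step wording is taken from the list above.

```python
from typing import List


def cdh4_to_cdh5_plan(uses_packages: bool, uses_lzo: bool) -> List[str]:
    """Return the applicable CDH 4 to CDH 5 upgrade steps, in order."""
    steps = [
        "Download/distribute parcel",
        "Reduce the amount of history that Oozie retains",
        "Put the NameNode into safe mode and back up HDFS metadata",
        "Stop the cluster & stop the CM service",
    ]
    if uses_packages:
        steps.append("Remove CDH packages")
    if uses_lzo:
        steps.append("Deactivate and remove the GPL Extras parcel")
    steps.append("Run the Upgrade Wizard (recover from any failed steps before proceeding)")
    if uses_lzo:
        steps.append("Upgrade the GPL Extras parcel")
    steps.append("Restart the Reports Manager role")
    steps.append("Finalize the HDFS metadata upgrade")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(cdh4_to_cdh5_plan(uses_packages=True, uses_lzo=False), start=1):
        print(f"{number}. {step}")
```

A generated list like this is a starting point for the fine-grained plan, which still needs to be tested in non-production environments first.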
Guided Upgrades Prevent Failed Jobs
Upgrading
Synopsis
• Customer manually upgraded CDH
• Misconfigured a MapReduce setting
• Resulted in failure of long-running jobs
With Cloudera Manager
• The upgrade process is managed
• Default configuration settings would
have prevented job failures
Cloudera Manager Benefits
• Streamlined upgrades
• Issue prevention
Upgrading Recommendations and Resources
• Start planning now
• Review Upgrade Guide Documentation
• Talk to your Account Team about a Professional Services
Engagement
Why Professional Services
• Minimize risk to production environment
• Assist your Hadoop Admin
• Minimize impact on resources (e.g., development)
• Educate the team on a release
• Provide additional guidance on best practices
Backup & Disaster Recovery
Why You Need Backup & Disaster Recovery
1. Your EDH is a Mission-Critical Part of the Data Management Infrastructure
• Stores valuable data and runs important workloads
• Business continuity is a MUST HAVE
2. Managing Business Continuity for Hadoop is Complex
• Different services that store data: HDFS, HBase, Hive
• Backup and disaster recovery is configured separately for each
• Processes are manual
Simplified Management of Backup & DR Policies
BDR in Cloudera Enterprise
[Diagram: HDFS, Hive, and nodes at Site A and at Site B]
Central Configuration
• HDFS - Select files & directories to replicate
• Hive - Select tables to replicate
• Schedule replication jobs for optimal times
Monitoring & Alerting
• Track progress of replication jobs
• Get notified when data is out of sync
Performance & Reliability
• High performance replication using MapReduce
• CDH-optimized version of DistCP
Benefits of Cloudera Manager’s BDR
Reduce Complexity
• Centrally manage backup and DR workflows
• Simple setup via an intuitive user interface
Maximize Efficiency
• Simplify processes to meet or exceed SLAs and Recovery Time Objectives (RTOs)
• Optimize system performance and network impact through scheduling
Reduce Risk & Exposure
• Eliminate error-prone manual processes
• Get notified when issues occur
• The only solution for metadata replication (Hive)
Data Threat Models and Solutions
Disk/Node/Rack Hardware Failure
• HDFS replica architecture
• Configure rack information and number of replicas
Application/User Error
• Snapshots of HDFS and HBase
• Optionally save HBase to S3
Datacenter Failure
• Off-site datacenter replication of HDFS and Hive
• Includes metadata
CDH 5 Backup and Disaster Recovery
HDFS Snapshots
• Minimal impact to production workload
• No unnecessary data copy
• Multiple versions maintained by HDFS
• Fast local restores
• HDFS consistency
HBase Snapshots
• Minimal impact to production workload
• No unnecessary data copy
• Multiple versions maintained by HBase
• HBase region consistency
• Optionally store snapshot to Amazon S3
HDFS Distributed Replication
• Snapshot-based replication ensures consistency across replicas
Hive Metastore Replication
• SQL import/export between two different metastores
• Fixes file paths and other cluster-specific information
Cloudera Manager Backup and Disaster Recovery Module
Select → Configure → Synchronize → Monitor
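For readers who have not used HDFS snapshots, the sketch below shows how the stock HDFS command line exposes them: `hdfs dfsadmin -allowSnapshot`, `hdfs dfs -createSnapshot`, and a restore that copies out of the read-only `.snapshot` directory. It illustrates the underlying mechanism only and is not how the Cloudera Manager BDR module is driven. The helper names and example paths are hypothetical.

```python
import posixpath
from typing import List


def snapshot_commands(directory: str, name: str) -> List[List[str]]:
    """Commands to make a directory snapshottable and take a named snapshot."""
    return [
        ["hdfs", "dfsadmin", "-allowSnapshot", directory],  # needs HDFS superuser
        ["hdfs", "dfs", "-createSnapshot", directory, name],
    ]


def restore_command(directory: str, name: str, relative_path: str, destination: str) -> List[str]:
    """Command to restore one path by copying it out of a snapshot.

    Snapshots appear under <directory>/.snapshot/<name>/, so a local
    restore is an ordinary HDFS copy within the same cluster.
    """
    source = posixpath.join(directory, ".snapshot", name, relative_path)
    return ["hdfs", "dfs", "-cp", source, destination]


if __name__ == "__main__":
    # Hypothetical paths, printed rather than executed.
    for command in snapshot_commands("/data/events", "before-upgrade"):
        print(" ".join(command))
    print(" ".join(restore_command("/data/events", "before-upgrade",
                                   "2015/part-0", "/data/events/2015/part-0")))
```

The restore being a copy inside the cluster is what the slide means by fast local restores: no data has to come back from a remote site.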
Cloudera Enterprise
Industry-Leading Support
Direct Integration with Cloudera Support in CM
Cloudera Manager + Support
Industry’s Best Hadoop Platform Support
• Leverages Cloudera to reduce time-to-resolution by 35%
• Comprehensive view of customers for Proactive and Predictive Support
• Prevent issues before they occur
• Provide guidance on tools and best practices
Differentiated Approach to Success
• Proactive Support: Technical guidance based on insights into performance patterns and the state-of-the-art
• Predictive Support: Sophisticated analytics across multiple clusters to prevent issues before they occur
• Voice of the Customer: Input into product roadmaps and projects supported by the Apache community
World-Class Support
Customers Love Cloudera Support
• 8.9/10: Overall satisfaction score makes Cloudera the industry benchmark for support
• 91%: Customers agree they benefit from proactive support outreach
• #1: Ability to solve technical issues is the top reason to recommend
Cloudera Enterprise 5
Built for Production Success
Hadoop delivers:
• One place for unlimited data
• Unified, multi-framework data access
Cloudera delivers:
• Enterprise Security
• Data Governance
• Complete Management
• Open Source, Open Standards
Security and Administration
Unlimited Storage
Process Discover Model Serve
Deployment
Flexibility
On-Premises
Appliances
Engineered Systems
Public Cloud
Private Cloud
Hybrid Cloud
A modern data platform plus what the enterprise requires.
Industrial Multi-Workload Performance
Batch, Interactive,
and Real-Time.
Leading performance and
usability in one platform.
• End-to-end analytic workflows
• Access more data
• Work with data in new ways
• Enable new users
Security and Administration
Process
Ingest
Sqoop, Flume
Transform
MapReduce,
Hive, Pig, Spark
Discover
Analytic Database
Impala
Search
Solr
Model
Machine Learning
SAS, R, Spark,
Mahout
Serve
NoSQL Database
HBase
Streaming
Spark Streaming
Unlimited Storage HDFS, HBase
YARN, Cloudera Manager,
Cloudera Navigator
Multiple big data opportunities in one optimized, high-performance, multi-tenant platform.
The Only Comprehensively Secure Hadoop Platform
Cloudera is the leader in
Hadoop security.
Unique Capabilities:
• Comprehensive and Unified
• Secure at the core
• No Performance Impact
• Jointly engineered with Intel
• Compliance-Ready
• Only distribution to pass PCI audit
1. Perimeter: Standards-based Authentication
2. Access: Unified Role-based Authorization
3. Visibility: Auditing & Governance
4. Data: Encryption & Key Management
Security and Administration
Unlimited Storage
Process Discover Model Serve
Meet compliance requirements and reduce risk exposure from storing sensitive data.
The Most Complete Partner Ecosystem
Data Systems
Enterprise Data Hub
Security and Administration
Unlimited Storage
Process Discover Model Serve
Applications
System Integration
Infrastructure
Operational Tools
More than 1,300 partners ensure compatibility with existing investments, lower skill barriers, and help maximize value from your data.
New In Cloudera Manager 5
Workload/Resource Management
• Pool, resource group & queue administration
• Static & dynamic partitioning of resources
• Usage monitoring & trending
Extensibility and Partner Product Integration
• Integration with ISVs: SAS, Syncsort, Revolution, and others
• Accumulo support
• Spark support
Platform Coverage
• CDH5 compatibility support
• Install & Upgrade wizards for CDH5
New In Cloudera Manager 5
Monitoring Improvements
• Advanced Impala query monitoring
• YARN service monitoring
• YARN/MR2 activity monitoring
• User defined triggers
• Updates to ‘tsquery’ language for custom charts
• Scalable back-end datastore for monitoring metrics
• Enhanced Operational Reports
Other Improvements
• Oozie HA and YARN/RM HA setup
• MR1->MR2 config upgrade wizard
• Updates to Parcel management workflows
• Several usability improvements including new visualizations, charting enhancements
• CM search box
• Java7 support
Security Improvements
• Direct AD Kerberos Integration
• Kerberos wizard for easy securing of non-secure clusters
• Manage & deploy Kerberos client configs
• Added Hadoop SSL related configs
• New user roles for fine-grained separation of duties
New in CDH 5
• Impala and Search are now part of CDH
• HDFS has caching and snapshots
• YARN is production-ready
• HBase has faster RegionServer failover, online merge, and batch indexing
• Impala has dynamic resource management through Llama and YARN
• Impala supports UDFs and UDAFs, leverages HDFS caching, and has improved
metadata refresh
• Sentry offers fine-grained role-based authorization for Search, Impala, and Hive
More details about Cloudera 5 can be found in the Release Notes
Webinar to Learn More
More Value from More Data: Production-Ready Hadoop
with Cloudera 5
• Feb. 17, 2015 at 10am PT
• More details on Security, Governance, Cloud, Apache Spark, and Impala 2.0
Register at bit.ly/ProductionReady
Thank You

UiPath_Orchestrtor_Upgrade_IAAS_PAAS.pptx
 
Instant hadoop of your own
Instant hadoop of your ownInstant hadoop of your own
Instant hadoop of your own
 
Best Practices For Workflow
Best Practices For WorkflowBest Practices For Workflow
Best Practices For Workflow
 
Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...
Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...
Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
 
VMworld Europe 2014: What’s New in End User Computing: Full Desktop Automatio...
VMworld Europe 2014: What’s New in End User Computing: Full Desktop Automatio...VMworld Europe 2014: What’s New in End User Computing: Full Desktop Automatio...
VMworld Europe 2014: What’s New in End User Computing: Full Desktop Automatio...
 
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
 

Hadoop Upgrade Webinar Best Practices

  • 1. Upgrade Webinar: Best Practices for Upgrading Hadoop with Cloudera Manager. Vala Dormiani | Product Manager
  • 2. © Cloudera, Inc. All rights reserved. Cloudera Enterprise, powered by Apache Hadoop: a new kind of data platform.
      • One place for unlimited data
      • Unified, multi-framework data access
      Only with Cloudera:
      • Leading performance
      • Enterprise system and data management
      • Fundamentally secure
      • Open source, open standards
      [Diagram labels: Security and Administration; Unlimited Storage; Process, Discover, Model, Serve; Deployment Flexibility: On-Premises, Appliances, Engineered Systems, Public Cloud, Private Cloud, Hybrid Cloud]
  • 3. Cloudera Enterprise: End-to-End Administration
  • 4. Hadoop Administration Made Easy: Cloudera Manager. Focus on the solution, not the cluster, with the only complete, zero-downtime administration tool for Apache Hadoop.
      Unique Capabilities:
      • Unified configuration, management and monitoring across all services
      • Online installation and upgrades
      • Direct connection to Cloudera Support
      • 3rd Party Extensibility
  • 5. Why You Need Cloudera Manager
      Complexity: Hadoop is more than a dozen services running across many machines
      • Hundreds of hardware components
      • Thousands of settings
      • Limitless permutations
      Context: Hadoop is a system, not just a collection of parts
      • Everything is interrelated
      • Raw data about individual pieces is not enough
      • Must extract what’s important
      Efficiency: Managing Hadoop with multiple tools & manual process takes longer
      • Complicated, error-prone workflows
      • Longer issue resolution
      • Lack of consistent and repeatable processes
  • 6. End-to-End Administration for the EDH
      1. Manage: Easily deploy, configure, & optimize clusters
      2. Monitor: Maintain a central view of all activity
      3. Diagnose: Easily identify and resolve issues
      4. Integrate: Use with existing tools
  • 7. Managing Complexity: One Tool For Everything. [Figure: do-it-yourself versus with Cloudera, covering deployment & configuration, monitoring, workflows, events & alerts, log search, diagnostics, reporting, and activity monitoring]
  • 8. Providing Context: Raw Data vs. Hadoop Intelligence
      1. Smart Configuration: Auto-sets configurations and guards against user error
      2. Workflows: Ensures that multi-step tasks are accomplished completely and in the correct sequence
      3. Dependencies: Aware of how a particular action affects the rest of the cluster and manages the impact
      4. Events & Alerts: Makes you aware of what’s important at a Hadoop system level
  • 9. Maximizing Efficiency: Simple Diagnostic Workflow
      Do-it-yourself (4.5 hours in total; step times shown: 1 hr, 2 hrs, 1 hr, 30 min):
      • Notice job is not completing
      • Identify problem task in Task Tracker web UI
      • Ganglia: study service, host & network metrics for root cause
      • Determine required heap size
      • Update heap size & restart Task Tracker with Chef
      • Root cause: low heap for Task Tracker
      With Cloudera Manager (15 min in total; step times shown: 5 min, 3 min, 2 min, 5 min):
      • Receive alert: job running longer than expected
      • Visually locate problem task in task distribution view
      • Drill down to Task Tracker health, see ‘low heap’
      • Update heap size with recommended value
      • Restart Task Tracker
      • Root cause: low heap for Task Tracker
  • 10. Why Cloudera Manager: One Holistic View of Everything
      Best-in-Class
      • Only enterprise-grade Hadoop management application
      • Zero downtime rolling upgrades & BDR
      • Integrated with Support
      Simple
      • Manage the complexity of dozens of tools through one interface
      Intelligent
      • Extract context from your data and Hadoop system
      Efficient
      • Simplify complex workflows and create consistent, repeatable processes
      3rd Party Integration
      • Broadest network of partners with complete integration
  • 11. Cloudera Manager Features
  • 12. Cloudera Manager Key Features: Install a Cluster in Three Simple Steps
      1. Find Nodes: Enter the names of the hosts which will be included in the Hadoop cluster. Click Continue.
      2. Install Components: Cloudera Manager automatically installs the CDH components on the hosts you specified.
      3. Assign Roles: Verify the roles of the nodes within your cluster. Make changes as necessary.
  • 13. Cloudera Manager Key Features: View Service Health and Performance
  • 14. Cloudera Manager Key Features: Gather, View, and Search Hadoop Logs
  • 15. Cloudera Manager Key Features: Manage Resources
  • 16. Cloudera Manager Key Features: Customizable Landing Page
  • 17. Open API for Extensibility: Integration with Leading ISVs; Alternative Storage Options; Hundreds of Partners Certified to Run In and On Cloudera
  • 18. Cloudera Manager Key Features: “We always know the health of our cluster and its nodes. We really can stay in touch with what's happening on the system, and we can deploy and manage things really easily.” Kathleen deValk, Senior Architect, Omneo
  • 19. Cloudera Manager Key Features: Manage Backup and Disaster Recovery
  • 20. Cloudera Manager Key Features: Upgrades
  • 21. The Upgrade Wizard
  • 22. Why Upgrade to CDH 5
      • Software platform improvements
      • New features
      • Security and governance
      • Bug fixes
      • Technology Enablement
      • Evolution of infrastructure
      • Expanded application stack
      [Diagram labels: Security and Administration; Unlimited Storage; Process, Discover, Model, Serve]
  • 23. Motivation for the Wizard
      • Upgrades are hard and unpredictable
      • Downtime to mission-critical workloads impacts your revenue
      • Hadoop can be especially complex
      • Upgrading Hadoop can have many steps that can depend on:
          • Services installed
          • Start & end versions
          • Packages or Parcels…
  • 24. The CDH Upgrade Wizard in Cloudera Manager
      • Cloudera Manager has a built-in Upgrade Wizard
      • Major upgrades (CDH 4 to CDH 5) are supported from CM 5.0
      • CM 5.3 supports upgrades for minor releases (CDH 5.x to CDH 5.y) and maintenance releases (CDH 5.b.x to CDH 5.b.y)
      • Zero downtime for non-major upgrades
      • The wizard automatically performs upgrade steps that were manual in the past
  • 25. What is Included
      • Confirmation of applicable manual steps
      • Verification that proper binaries are installed and hosts are healthy
      • Applicable automated commands for the upgrade
      • Post-upgrade messages applicable to this upgrade
  • 26. What is not Included
      • Some steps are still manual:
          • Backing up existing databases & NameNode metadata
          • Installing & removing packages
      • The wizard doesn’t capture all upgrade caveats
      • Does not include minor CDH 4 upgrades, e.g. CDH 4.2 to 4.3
  • 27. Types of Upgrades Supported by the Wizard
      • Major upgrades from CDH 4 to CDH 5
      • CDH upgrade wizard extended to minor CDH 5 upgrades & maintenance upgrades / downgrades in CM 5.3
      Note: You can’t upgrade to a CDH version higher than the CM version
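The release types and the version note on the slide above can be illustrated with a small version comparison. This is only a sketch, not Cloudera Manager's own logic: the function names are hypothetical, and comparing only the major and minor components for the CM constraint is an assumption about how the note is meant.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.maintenance' (e.g. '5.3.1'); missing parts default to 0."""
    parts = [int(p) for p in text.split(".")][:3]
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def classify_version_change(current: str, target: str) -> str:
    """Name the most significant version component that differs.

    Direction (upgrade vs. downgrade) is deliberately not checked here.
    """
    cur, tgt = parse_version(current), parse_version(target)
    if cur[0] != tgt[0]:
        return "major"
    if cur[1] != tgt[1]:
        return "minor"
    if cur[2] != tgt[2]:
        return "maintenance"
    return "none"


def cm_can_manage(cm_version: str, cdh_version: str) -> bool:
    """Assumption: the CDH major.minor must not exceed the CM major.minor."""
    return parse_version(cdh_version)[:2] <= parse_version(cm_version)[:2]


if __name__ == "__main__":
    print(classify_version_change("4.7.0", "5.3.0"))
    print(cm_can_manage("5.2.0", "5.3.0"))
```

A pre-flight script could run a check like `cm_can_manage` before a parcel is even downloaded, so that an unsupported target version is caught early.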
  • 28. Parcels vs. Packages
      • Using Parcels is preferred, as packages must be manually installed
      • See the FAQ to learn more about Parcels
      • Supported:
          • Package → Package
          • Package → Parcel
          • Parcel → Parcel
  • 29. Zero-Downtime Rolling Upgrade
      • New Rolling Restart if:
          1. Enabled HDFS HA
          2. Using Parcels
          3. Have an Enterprise License
          4. Performing a Non-Major Upgrade
      • Supported services will be upgraded and restarted without cluster downtime
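The four conditions on the slide above amount to a simple eligibility check. The sketch below encodes them as plain booleans; the `ClusterState` class and its field names are hypothetical and are not part of any Cloudera Manager API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterState:
    """Hypothetical summary of the facts the slide lists as prerequisites."""
    hdfs_ha_enabled: bool
    uses_parcels: bool
    has_enterprise_license: bool
    is_major_upgrade: bool


def rolling_upgrade_blockers(state: ClusterState) -> List[str]:
    """Return the unmet prerequisites; an empty list means a rolling restart is possible."""
    blockers = []
    if not state.hdfs_ha_enabled:
        blockers.append("HDFS HA is not enabled")
    if not state.uses_parcels:
        blockers.append("cluster does not use Parcels")
    if not state.has_enterprise_license:
        blockers.append("no Enterprise License")
    if state.is_major_upgrade:
        blockers.append("major upgrades cannot be rolling")
    return blockers


if __name__ == "__main__":
    state = ClusterState(True, True, True, is_major_upgrade=True)
    print(rolling_upgrade_blockers(state))
```

Returning the list of blockers rather than a single boolean makes it easier to tell an operator which prerequisite to fix first.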
  • 30. How to Start an Upgrade: Trigger Points
      1. Parcel Page: Download, Distribute, Activate, Upgrade
          • After downloading and distributing a parcel, “Activate” is replaced by “Upgrade” if the version change is supported by the wizard
          • Note: Just “activating” is rarely a good idea
      2. Cluster Actions Dropdown Menu: Click “Upgrade Cluster”
  • 31. Wizard Steps
      1. Log in to Cloudera Manager & trigger the upgrade
      2. Select the package/parcel upgrade version
      3. Pre-upgrade warnings
      4. Perform the required actions before continuing (e.g. backing up databases)
      5. Host health validation
      6. Parcel is downloaded and distributed to all hosts
      7. Restart selection:
          o Regular vs. Rolling
      8. Commands Progress Screen:
          o Activation of new parcel, upgrading services, deploying client config files, other CDH component steps…
      9. Host Inspector
      10. Post-upgrade warnings
      11. YARN Migration
  • 33. Useful things to have in place
      • Have an automated way to back up your NameNode metadata
          • You will need to back up your NN metadata prior to the upgrade, so you should have scripts ready in advance
      • All databases should be backed up regularly, including CM, HDFS, Hive, HBase, Oozie
          • You will need to take backups prior to the upgrade, but you should have automated backup procedures for these databases already
          • You cannot revert to CDH 4 unless you restore a backup
      • Maintain your own OS, CM and CDH package/parcel repos to protect against external repositories being unavailable
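As one possible shape for the backup script the slide above asks for, the sketch below wraps `hdfs dfsadmin -fetchImage`, which downloads the most recent fsimage from the NameNode into a local directory. The backup root, the directory naming scheme, and the function names are assumptions for illustration; the script would also need to run as a user with HDFS superuser rights.

```python
import os
import subprocess
from datetime import datetime
from typing import List, Tuple


def fetch_image_command(backup_root: str, now: datetime) -> Tuple[str, List[str]]:
    """Build the timestamped target directory and the matching fetchImage command."""
    # Hypothetical naming scheme: one directory per backup run.
    target_dir = os.path.join(backup_root, now.strftime("nn-meta-%Y%m%d-%H%M%S"))
    return target_dir, ["hdfs", "dfsadmin", "-fetchImage", target_dir]


def backup_namenode_metadata(backup_root: str) -> str:
    """Create the target directory and download the latest fsimage into it."""
    target_dir, argv = fetch_image_command(backup_root, datetime.now())
    os.makedirs(target_dir, exist_ok=True)
    # check=True raises CalledProcessError if the hdfs command fails.
    subprocess.run(argv, check=True)
    return target_dir


if __name__ == "__main__":
    # "/var/backups/namenode" is a placeholder path, not a Cloudera default.
    print(backup_namenode_metadata("/var/backups/namenode"))
```

Scheduling a script like this (for example from cron) gives a tested procedure before the upgrade window, rather than one written on the day.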
  • 34. Other Best Practices for CDH 5 Upgrade
      • In critical upgrades, create a fine-grained step-by-step production plan
      • Document the existing cluster environment and dependencies
      • Test the production upgrade plan in non-prod environment(s)
      • Test the step-by-step upgrade plan in sandbox, test and other non-prod environments and update the plan if anything unexpected happens
      • Test all compatibility with the new version. If desired, run performance tests in a performance cluster
      • Reserve a maintenance window with enough time allotted to perform all steps
      • Note that rolling upgrade from CDH 4 to CDH 5 is unsupported
      • Enable maintenance mode on your cluster to avoid lots of alerts during the upgrade
  • 35. CDH 4 to CDH 5 Upgrade Steps
      • Documentation: Upgrading from CDH 4 to CDH 5 Parcels. Read “Before You Begin”
      1. Download/distribute parcel
      2. Reduce the upgrade time by reducing the amount of history that Oozie retains
      3. Put the NameNode into safe mode and back up HDFS metadata
      4. Stop the cluster & stop the CM service
      5. Remove CDH Packages (if in use)
      6. Deactivate and remove the GPL Extras Parcel (if using LZO)
      7. Run the Upgrade Wizard
          • Recover from any failed steps before proceeding
      8. Upgrade the GPL Extras Parcel (if using LZO)
      9. Restart the Reports Manager Role
      10. Finalize the HDFS Metadata Upgrade
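Because some of the steps above are conditional (packages in use, LZO in use), it can help to generate the checklist for a specific cluster. The following is a minimal sketch that only reproduces the slide's ordering; the function name and flags are hypothetical, and it does not execute anything.

```python
from typing import List


def cdh4_to_cdh5_plan(uses_packages: bool, uses_lzo: bool) -> List[str]:
    """Return the ordered CDH 4 to CDH 5 steps that apply to this cluster."""
    steps = [
        "Download and distribute the CDH 5 parcel",
        "Reduce the amount of history that Oozie retains",
        "Put the NameNode into safe mode and back up HDFS metadata",
        "Stop the cluster and stop the CM service",
    ]
    if uses_packages:
        steps.append("Remove CDH packages")
    if uses_lzo:
        steps.append("Deactivate and remove the GPL Extras parcel")
    steps.append("Run the Upgrade Wizard (recover from any failed steps before proceeding)")
    if uses_lzo:
        steps.append("Upgrade the GPL Extras parcel")
    steps.append("Restart the Reports Manager role")
    steps.append("Finalize the HDFS metadata upgrade")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(cdh4_to_cdh5_plan(uses_packages=True, uses_lzo=False), 1):
        print(f"{number}. {step}")
```

The printed list can be pasted into the step-by-step production plan and checked off during the maintenance window.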
  • 36. Guided Upgrades Prevent Failed Jobs Upgrading Synopsis • Customer manually upgraded CDH • Misconfigured a MapReduce setting • Resulted in failure of long-running jobs With Cloudera Manager • The upgrade process is managed • Default configuration settings would have prevented job failures Cloudera Manager Benefits • Streamlined upgrades • Issue prevention
  • 37. Upgrading Recommendations and Resources • Start planning now • Review Upgrade Guide Documentation • Talk to your Account Team about a Professional Services Engagement
  • 38. Why Professional Services • Minimize risk to production environment • Assist your Hadoop Admin • Minimize impact on resources (i.e. development) • Educate the team on a release • Provide additional guidance on best practices
  • 39. Backup & Disaster Recovery
  • 40. Why You Need Backup & Disaster Recovery 1. Your EDH is a Mission-Critical Part of the Data Management Infrastructure • Stores valuable data and runs important workloads • Business continuity is a MUST HAVE 2. Managing Business Continuity for Hadoop is Complex • Different services that store data – HDFS, HBase, Hive • Backup and disaster recovery is configured separately for each • Processes are manual
  • 41. Simplified Management of Backup & DR Policies BDR in Cloudera Enterprise [Diagram: Site A and Site B, each with HDFS, Hive, and nodes] Central Configuration • HDFS - Select files & directories to replicate • Hive - Select tables to replicate • Schedule replication jobs for optimal times Monitoring & Alerting • Track progress of replication jobs • Get notified when data is out of sync Performance & Reliability • High performance replication using MapReduce • CDH-optimized version of DistCp
  • 42. Benefits of Cloudera Manager’s BDR Reduce Complexity • Centrally manage backup and DR workflows • Simple setup via an intuitive user interface Maximize Efficiency • Simplify processes to meet or exceed SLAs and Recovery Time Objectives (RTOs) • Optimize system performance and network impact through scheduling Reduce Risk & Exposure • Eliminate error-prone manual processes • Get notified when issues occur • The only solution for metadata replication (Hive)
  • 43. Data Threat Models and Solutions Disk/Node/Rack Hardware Failure • HDFS replica architecture • Configure rack information and number of replicas Application/User Error • Snapshots of HDFS and HBase • Optionally save HBase to S3 Datacenter Failure • Off-site datacenter replication of HDFS and Hive • Includes metadata
  • 44. CDH 5 Backup and Disaster Recovery HDFS Snapshots • Minimal impact to production workload • No unnecessary data copy • Multiple versions maintained by HDFS • Fast local restores • HDFS consistency HBase Snapshots • Minimal impact to production workload • No unnecessary data copy • Multiple versions maintained by HBase • HBase region consistency • Optionally store snapshot to Amazon S3 HDFS Distributed Replication • Snapshot-based replication ensures consistency across replicas Hive Metastore Replication • SQL import/export between two different metastores • Fixes file paths and other cluster-specific information [Diagram: Cloudera Manager Backup and Disaster Recovery Module: Select, Configure, Synchronize, Monitor]
  • 45. Cloudera Enterprise Industry-Leading Support
  • 46. Direct Integration with Cloudera Support in CM
  • 47. Cloudera Manager + Support Industry’s Best Hadoop Platform Support • Leverages Cloudera to reduce time-to-resolution by 35% • Comprehensive view of customers for Proactive and Predictive Support • Prevent issues before they occur • Provide guidance on tools and best practices
  • 48. Differentiated Approach to Success • Proactive Support: Technical guidance based on insights into performance patterns and the state-of-the-art • Predictive Support: Sophisticated analytics across multiple clusters to prevent issues before they occur • Voice of the Customer: Input into product roadmaps and projects supported by the Apache community
  • 49. World-Class Support Customers Love Cloudera Support • 8.9/10: Overall satisfaction score makes Cloudera the industry benchmark for support • 91%: Customers agree they benefit from proactive support outreach • #1: Ability to solve technical issues is the top reason to recommend
  • 50. Cloudera Enterprise 5
  • 51. Built for Production Success Hadoop delivers: • One place for unlimited data • Unified, multi-framework data access Cloudera delivers: • Enterprise Security • Data Governance • Complete Management • Open Source, Open Standards Security and Administration Unlimited Storage Process Discover Model Serve Deployment Flexibility On-Premises Appliances Engineered Systems Public Cloud Private Cloud Hybrid Cloud A modern data platform plus what the enterprise requires.
  • 52. Industrial Multi-Workload Performance Batch, Interactive, and Real-Time. Leading performance and usability in one platform. • End-to-end analytic workflows • Access more data • Work with data in new ways • Enable new users Security and Administration Process Ingest Sqoop, Flume Transform MapReduce, Hive, Pig, Spark Discover Analytic Database Impala Search Solr Model Machine Learning SAS, R, Spark, Mahout Serve NoSQL Database HBase Streaming Spark Streaming Unlimited Storage HDFS, HBase YARN, Cloudera Manager, Cloudera Navigator Multiple big data opportunities in one optimized, high-performance, multi-tenant platform.
  • 53. The Only Comprehensively Secure Hadoop Platform Cloudera is the leader in Hadoop security. Unique Capabilities: • Comprehensive and Unified • Secure at the core • No Performance Impact • Jointly engineered with Intel • Compliance-Ready • Only distribution to pass PCI audit [Diagram: platform stack (Security and Administration; Unlimited Storage; Process, Discover, Model, Serve)] 1. Perimeter: Standards-based Authentication 2. Access: Unified Role-based Authorization 3. Visibility: Auditing & Governance 4. Data: Encryption & Key Management Meet compliance requirements and reduce risk exposure from storing sensitive data.
  • 54. The Most Complete Partner Ecosystem [Diagram: Enterprise Data Hub (Security and Administration; Unlimited Storage; Process, Discover, Model, Serve) with partner categories Data Systems, Applications, System Integration, Infrastructure, Operational Tools] More than 1,300 partners ensure compatibility with existing investments, lower skill barriers, and help maximize value from your data.
  • 55. New In Cloudera Manager 5 Workload/Resource Management • Pool, resource group & queue administration • Static & dynamic partitioning of resources • Usage monitoring & trending Extensibility and Partner Product Integration • Integration with ISVs (SAS, Syncsort, Revolution, and others) • Accumulo support • Spark support Platform Coverage • CDH5 compatibility support • Install & Upgrade wizards for CDH5
  • 56. New In Cloudera Manager 5 Monitoring Improvements • Advanced Impala query monitoring • YARN service monitoring • YARN/MR2 activity monitoring • User defined triggers • Updates to ‘tsquery’ language for custom charts • Scalable back-end datastore for monitoring metrics • Enhanced Operational Reports Other Improvements • Oozie HA and YARN/RM HA setup • MR1->MR2 config upgrade wizard • Updates to Parcel management workflows • Several usability improvements including new visualizations, charting enhancements • CM search box • Java7 support Security Improvements • Direct AD Kerberos Integration • Kerberos wizard for easy securing of non-secure clusters • Manage & deploy Kerberos client configs • Added Hadoop SSL related configs • New user roles for fine-grained separation of duties
  • 57. New in CDH 5 • Impala and Search are now part of CDH • HDFS has caching and snapshots • YARN is production-ready • HBase has faster RegionServer failover, online merge, and batch indexing • Impala has dynamic resource management through Llama and YARN • Impala supports UDFs and UDAFs, leverages HDFS caching, and has improved metadata refresh • Sentry offers fine-grained role-based authorization for Search, Impala, and Hive More details about Cloudera 5 can be found in the Release Notes
  • 58. Webinar to Learn More More Value from More Data: Production-Ready Hadoop with Cloudera 5 • Feb. 17, 2015 at 10am PT • More details on Security, Governance, Cloud, Apache Spark, and Impala 2.0 Register at bit.ly/ProductionReady
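
Slides 33 and 35 recommend having NameNode metadata backup scripts ready before the upgrade, and list "put the NameNode into safe mode and back up HDFS metadata" as step 3. The following Python sketch shows one way such a script might look. It is an illustration under stated assumptions, not the procedure from the Cloudera upgrade documentation: it assumes the `hdfs` command is on the PATH, that the script runs as the HDFS superuser, and that the `hdfs dfsadmin` subcommands used here (`-safemode enter`, `-saveNamespace`, `-fetchImage`) are available in your CDH version. The function names and the directory naming scheme are hypothetical.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def build_backup_commands(backup_dir):
    """Return the hdfs dfsadmin commands to run, in order, as argv lists."""
    return [
        # Enter safe mode so the namespace cannot change during the backup.
        ["hdfs", "dfsadmin", "-safemode", "enter"],
        # Write the current namespace to a fresh fsimage on the NameNode.
        ["hdfs", "dfsadmin", "-saveNamespace"],
        # Download the most recent fsimage into the local backup directory.
        ["hdfs", "dfsadmin", "-fetchImage", str(backup_dir)],
    ]


def run_backup(backup_root, runner=subprocess.run, now=None):
    """Create a timestamped directory and run the backup commands into it.

    `runner` and `now` are injectable so the script can be dry-run tested
    without a cluster. The NameNode is deliberately left in safe mode,
    because the next upgrade step is to stop the cluster.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    backup_dir = Path(backup_root) / f"nn-metadata-{stamp}"
    backup_dir.mkdir(parents=True, exist_ok=True)
    for argv in build_backup_commands(backup_dir):
        # check=True aborts the script as soon as one command fails.
        runner(argv, check=True)
    return backup_dir
```

A call such as `run_backup("/var/backups/hdfs")` (the path is an example) would run the three commands against a live cluster. Passing a stub as `runner` lets you rehearse the script in a sandbox first, in line with the advice to test the upgrade plan in non-prod environments.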

Editor's Notes

  1. Darren Lo was the primary driver of the Upgrade Wizard and the lead engineer on making the changes in CM. === Upgrade Without the Headache
Best Practices for Upgrading Hadoop in Production Thursday, February 12th, 2015  •  10am – 11am Pacific Time During this webinar, Vala Dormiani, Product Manager at Cloudera, will walk you through some of the best practices to keep in mind when it comes to upgrading and how to leverage Cloudera Manager to upgrade your Cloudera cluster. He will also discuss some of the new Upgrade Wizard features released with Cloudera Enterprise 5.3. Finally, he will go over a few of the other production-ready capabilities available with Cloudera Manager, including backup and disaster recovery and direct integration with Cloudera Support. This is a technical webinar with a live Q&A at the end.
  2. Many organizations have turned to a new architecture – an enterprise data hub – to complement and extend existing investments. An enterprise data hub can store unlimited data, cost-effectively and reliably, for as long as you need, and lets users access that data in a variety of ways. Data can be collected, stored, processed, explored, modeled, and served in one unified platform. It’s connected to the systems you already rely on. Cloudera’s enterprise data hub, powered by Apache Hadoop, the popular open source distributed data platform, is differentiated in several crucial areas. We provide: Leading query performance. The enterprise management and governance that you require of all of your mission-critical infrastructure. Comprehensive, transparent, compliance-ready security at the core. An open source platform that is also built of open standards – projects that are supported by multiple vendors to ensure sustainability, portability, and compatibility. Our platform runs in your choice of environment, whether on-premises or in the cloud. === Cheat Sheet version: Our enterprise data hub is: One place for unlimited data Accessible to anyone Connected to the systems you already depend on Secure, governed, managed & compliant Built on open source and open standards Deployed however you want Coupled with the support and enablement you need to succeed. Important Note: Our EDH emphasizes “unified analytics” over “unified data”: It’s not practical or probable that customers will actually unify all their data. Much of it lives in the cloud or on storage (e.g. Isilon), in remote datacenters, is of uncertain value vs. cost of moving it to a hub, or security mandates preclude collocation. We enable customers to gather unlimited data, while bringing diverse processing and analytics to that data.
  3. Hadoop is more than a dozen services running across many machines Hundreds of hardware components Thousands of settings Limitless permutations Manager lets you manage the complexity of running all these tools through one, easy to use interface Hadoop is a system, not just a collection of parts Everything is interrelated Raw data about individual pieces is not enough Must extract what’s important Manager provides context to help you know what’s important Managing Hadoop with multiple tools and manual process takes longer Complicated, error-prone workflows Longer issue resolution Lack of consistent and repeatable processes Manager lets you maximize efficiency by simplifying your workflow (and allowing it to be repeated)
  4. Best-in-Class The only enterprise-grade Hadoop management application available Zero downtime rolling upgrades and BDR Most downtime is scheduled. Manager provides zero-downtime upgrades to minimize scheduled downtime Deploy jars across entire cluster Integration with Cloudera Support A direct connection to Cloudera Support to easily and efficiently support customers Simple Gain end-to-end administration for an enterprise data hub in a single tool Add/Remove nodes, diagnose issues Intelligent Manage Hadoop at a system level – Cloudera’s experience realized in software Efficient Simplify complex workflows and make administrators more productive 3rd Party Broadest network of partners to add greater functionality and have it be a completely integrated component of Cloudera Enterprise While CM is not "open-source", it is "open". By this I mean the following: 1) A rich set of APIs for customers to work with, so they can script their way through with CM. If at any point they decide to move away from Cloudera, they can "script" out/parse out any recommended setting that they have used with CM 2) Cloudera is very transparent with respect to how CM works. See CM docs and some of the blog posts: http://blog.cloudera.com/blog/2013/07/how-does-cloudera-manager-work/ 3) Customers can revert to the Free edition if they decide not to renew the subscription. The Free edition is a very capable version and we hold back very few features in the enterprise version for production requirements. See here - http://www.cloudera.com/content/cloudera/en/products-and-services/cloudera-enterprise/cloudera-manager/cloudera-manager-features.html 4) We have several partners now that are starting to integrate with CM, leveraging the APIs and other functionality (most notably 3rd party extensibility). All these integrations are available for the free users as well. Examples of partners include Syncsort, 0xData, Dell, StackIQ, Wibidata etc.
  5. Monitor and Diagnose Cluster Workloads Manage Workflows Get Host-Level Snapshots
  6. Track events across the cluster
  7. Create Custom Charts Report on System Performance and Usage
  8. Features View Service Health and Performance Get Host-Level Snapshots Monitor and Diagnose Cluster Workloads Gather, View, and Search Hadoop Logs Track Events from Across the Cluster Report on System Performance and Usage Create Custom Charts and Landing Page Manage Resources Manage Workflows Set Time Context Globally
  9. Key Features: Manage BDR
  10. Key Features: Zero Downtime Rolling Upgrades
  11. Hopefully, we have established that Cloudera Manager makes administration easy and is the only production-ready administration tool for Hadoop. Cloudera Manager has a built-in Upgrade Wizard to make upgrades simple and predictable, and it also features zero-downtime rolling upgrades.
  12. Cloudera Enterprise 5 provides additional enterprise-ready capabilities and marks the next step in the evolution of the Hadoop-based data management platform. Latest and Greatest
  13. Multitude of other factors For most mission critical workloads, downtime is never an option. Any downtime can have a direct impact on revenue and lead to frantic calls in the middle of the night. For this reason, upgrading the software that powers these workloads can often be a daunting task. It can cause unpredictable issues without access to support. Hadoop consists of dozens of components, running across multiple machines, all with their own configurations. That can lead to a lot of complexity and uncertainty - especially when taking the upgrade plunge. That’s why an enterprise-grade administration tool is crucial for running Hadoop in production.
  14. A built-in Upgrade Wizard in Cloudera Manager 5 makes it easy to upgrade CDH on your clusters. The Upgrade Wizard (enhanced) performs service-specific upgrade steps that you would have had to run manually in the past.
  15. Misc other things === No retry support on failure yet
  16. For example, to upgrade to CDH 5.3, you must be on Cloudera Manager 5.3 or higher. === Maintenance CDH 5 Downgrades are still called “upgrades” Other Variations: Environments / Cross-cutting features HA (HDFS, MR1, Yarn, Oozie...) Security (Kerberos, SSL, Sentry)
  17. Both parcel and package installations are supported by the Upgrade Wizard. Using parcels is the preferred and recommended way, as packages must be manually installed, whereas parcels are installed by Cloudera Manager. ==== By type of bits
  18. Rolling restart capability (vs. regular restart) enables zero-downtime upgrades under certain conditions. If you are using parcels, have a Cloudera Enterprise license, and have enabled HDFS high availability, you can perform a rolling upgrade for non-major upgrades. This enables you to upgrade your cluster software and restart the upgraded services without incurring any cluster downtime. Note that it is not possible to perform a rolling upgrade from CDH 4 to CDH 5 (i.e. major upgrade) because of incompatibilities between the two major versions. For minor and maintenance upgrades, you will have the option to select Rolling Upgrade: supported services will undergo a rolling restart, while the rest will undergo a normal restart, with some downtime.
  19. Log in to the Cloudera Manager Admin Console. To access the wizard, on the Home page, click the cluster’s drop down menu, and select Upgrade Cluster. Alternately, you can trigger the wizard from the Parcels page, by first downloading and distributing a parcel to upgrade to, and then selecting the Upgrade button for this parcel. Select the CDH version. If the option to pick between packages and parcels is provided, click the Use Parcels radio button. If there are no qualifying parcels, the location of the parcel repository will need to be added under Parcel Configuration Settings. It will provide additional steps to prepare your cluster for upgrade. The Wizard will now prompt you to backup existing databases. Check Yes for all required actions to be able to Continue. Please read the Upgrade Documentation for a more complete list of actions to be taken at this stage, before proceeding with the upgrade. The Wizard now performs consistency and health checks on all hosts in the cluster. This is particularly helpful if you have mismatched versions of packages across cluster hosts. If any problems are found, you will be prompted to fix these before continuing. The selected parcel is downloaded and distributed to all hosts. For major upgrades, the Wizard will warn that the services are about to be shut down for the upgrade. For minor and maintenance upgrades, if you are using parcels and have HDFS high availability enabled, you will have the option to select Rolling Upgrades on this page. Supported services will undergo a rolling restart, while the rest will undergo a normal restart. Check Rolling Upgrade to proceed with this option. Until this point, you can exit and resume the Wizard without impacting any running services. The Command Progress screen displays the results of the commands run by the Wizard as it shuts down all services, activates the new parcel, upgrades services, deploys client configuration files, and restarts services. 
The service commands include upgrading HDFS metadata, upgrading the Oozie database and installing ShareLib, upgrading the Sqoop server and Hive Metastore, among others. The Host Inspector runs to validate all hosts, as well as report CDH versions running on them. At the end of the Wizard, you are prompted to finalize the HDFS metadata upgrade. It is recommended at this stage to refer to the Upgrade Documentation for additional steps that might be relevant to your cluster configuration and upgrade path. For major (CDH 4 to CDH 5) upgrades, you have the option of importing your MapReduce configurations into your YARN service. Additional steps in the Wizard will assist with this migration. On completion, we recommend reviewing the YARN configurations for any additional tuning you might need. Your upgrade is now complete!
  20. === Whether or not the cluster can access the internet, customers may want to stage their own repos anyway. Also useful: a parallel shell or some other way to execute commands across all cluster hosts.
  21. Notable points & possible issues - before you upgrade job, app and tool compatibility === Restore a fresh environment and repeat if necessary
  22. You should still read the docs Please refer to the Upgrade Documentation for more comprehensive details on using the Upgrade Wizard and the steps if the upgrade wizard reports a failure
  23. === Synopsis: The customer upgraded to CDH 4.2.1, and long-running or large MapReduce jobs began failing. The customer was making several other configuration changes at the same time (introducing JT HA, Kerberizing the cluster), and it was discovered that the setting mapred.job.reuse.jvm.num.tasks = -1 was causing the failure (MAPREDUCE-4490). Where would CM help? Cloudera Manager’s default setting is mapred.job.reuse.jvm.num.tasks = 1, which could have prevented the customer from hitting this known issue.
  24. Sharing their expertise for large & critical upgrades (downtime, data loss) Cloudera’s goal is to deliver customer experience
  25. Makes the upgrade process less stressful
  26. Why You Need Backup & Disaster Recovery Your EDH is a Mission-Critical Part of the Data Management Infrastructure Stores valuable data and runs important workloads Business continuity is a MUST HAVE Managing Business Continuity for Hadoop is Complex Different services that store data – HDFS, HBase, Hive Backup and disaster recovery is configured separately for each Processes are manual BDR in CM is important and makes it easy to manage Hive replication, metadata replication, and have data readily available across datacenters. It also is automated and fault tolerant
  27. Central configuration: Define backup and disaster recovery policies and apply them across services. Track progress of replication jobs and get notified when data is out of sync. High-performance, CDH-optimized replication using MapReduce via DistCp: the replication uses the scalability and availability of MapReduce and YARN to parallelize the copying of files, using a specialized MapReduce job or YARN application that efficiently and quickly transfers only changed files from each Mapper to the replica side.
  28. Cloudera Manager provides an integrated, easy-to-use management solution for enabling data protection in the Hadoop platform. Manager provides rich functionality aimed towards replicating data stored in HDFS and accessed through Hive. When critical data is stored on HDFS, Cloudera Manager provides the necessary capabilities to ensure that the data is available at all times. Cloudera Manager provides key capabilities that are fully integrated into the Cloudera Manager Admin Console: Select - Choose the key datasets that are critical for your business operations. Schedule - Create an appropriate schedule for data replication and/or snapshots – trigger replication and snapshots as frequently as is appropriate for your business needs. Monitor - Track progress of your snapshots and replication jobs through a central console and easily identify issues or files that failed to be transferred. Alert - Issue alerts when a snapshot or replication job fails or is aborted so that the problem can be diagnosed expeditiously. Replication capabilities work seamlessly across Hive and HDFS – replication can be set up on files or directories in the case of HDFS and on tables in the case of Hive. Hive metastore information is also replicated, which means that the applications that depend upon the table definitions stored in Hive will work correctly on the replica side. All Cloudera BDR functionality is available directly through the Cloudera Manager Admin Console.
  29. Diagnostic Data Bundles - From log files and other longitudinal data sets drawn from Cloudera Manager Enable predictive and proactive support of customer clusters under license. Based on similar use cases- And comparative analysis of diagnostics across all nodes under subscription
  30. Cloudera’s dedicated Proactive Support unit ensures customers are prepared to benefit from every element of their subscription. Proactive Support includes reviewing configurations for known issues and providing comparisons of usage patterns to help enhance your operations and plan for future changes. Unique to Cloudera, our Predictive Support model means we're regularly monitoring the status of your EDH environment, allowing us to isolate and prevent issues before they even occur by analyzing support cases and platform usage across all deployments. === Full Lifecycle Support, starting on day one: The onboarding process scopes technical assistance to customer requirements, introduces key product documentation and community resources, and ensures you can take full advantage of the online support portal to meet your business goals. Support as a Strategic Advantage: Via our Proactive Support program, we also ensure that customers are optimizing their use of Cloudera's technical resources, starting with the onboarding process.
  31. Cloudera offers the industry’s best Hadoop support === Some internal results of external feedback loops, benefitting customers: COE Pods: support resources specialized by committership and allocated to support specific parts of Hadoop/EDH to enhance expertise, responsiveness, and staffing based on customer needs Support Portal: full Proactive Support orientation to online customer support and communication resources during onboarding License Key Provision: helps keep customers unified and up to date between systems and provide ability to generate and manage license keys for Cloudera Manager and CDH Cluster stats Cloudera Communities: user forums for basic self-support. insights into best practices, and virtual community-building Account Health Check: ongoing nine-attribute diagnostic to correlate the most important characteristics determining customer satisfaction Customer Operations Tools Team (COTT): Cloudera staff dedicated to building tools that enable predictive and proactive support of customer clusters under license using longitudinal data drawn primarily from Cloudera Manager CSI: HBase database of cluster data, community info, knowledge base, support records, Cloudera internal Monacle: Search-based user interface for CSI Validations (under development): automated alerting system based on comparative analysis of diagnostics across all nodes under subscription
  32. An enterprise data hub can store unlimited data, cost-effectively and reliably, for as long as you need, and lets users access that data in a variety of ways. Data can be collected, stored, processed, explored, modeled, and served in one unified platform. It’s connected to the systems you already rely on. == In response, many organizations have turned to a new architecture – an enterprise data hub – to complement and extend existing investments. Cloudera’s enterprise data hub, powered by Apache Hadoop, the popular open source distributed data platform, is differentiated in several crucial areas. We provide: Leading query performance. The enterprise management and governance that you require of all of your mission-critical infrastructure. Comprehensive, transparent, compliance-ready security at the core. An open source platform that is also built of open standards – projects that are supported by multiple vendors to ensure sustainability, portability, and compatibility. Our platform runs in your choice of environment, whether on-premises or in the cloud. === Cheat Sheet version: Our enterprise data hub is: One place for unlimited data Accessible to anyone Connected to the systems you already depend on Secure, governed, managed & compliant Built on open source and open standards Deployed however you want Coupled with the support and enablement you need to succeed. Important Note: Our EDH emphasizes “unified analytics” over “unified data”: It’s not practical or probable that customers will actually unify all their data. Much of it lives in the cloud or on storage (e.g. Isilon), in remote datacenters, is of uncertain value vs. cost of moving it to a hub, or security mandates preclude collocation. We enable customers to gather unlimited data, while bringing diverse processing and analytics to that data.
  33. * We offer the most complete set of processing, analysis, and serving frameworks for Hadoop. * Including comprehensive support for YARN. *For example, Impala runs on YARN. YARN is not a differentiator.* What’s really significant about this architecture is how it unifies diverse access to common data. In traditional approaches, you’d have separate systems to collect, store, process, explore, model, and serve data. Different teams would use different systems for each workload, and users whose roles span multiple systems would have to use several of them to achieve their objectives. With Cloudera’s enterprise data hub: You can perform end-to-end data workflows in a single system, dramatically lowering time to value. Each workload can access unlimited data, thanks to the underlying data platform, enhancing the value of each workload. Users can now access their data in new ways and are enabled by these diverse workloads to interact with data. Cloudera Enterprise provides comprehensive support for batch, interactive, and real-time workloads: Batch: data integration with Apache Sqoop; data processing with MapReduce, Apache Hive, Apache Pig; memory-centric processing with Apache Spark. Interactive: analytic SQL with Impala; search with Apache Solr; machine learning with Apache Spark. Real-Time: data integration with Apache Kafka, Apache Flume; stream processing with Apache Spark; data serving with Apache HBase. Shared resource management ensures that each workload is handled appropriately and abides by IT policy. === What’s more, 3rd party tools, such as SAS or Informatica, can run as native workloads inside Cloudera’s enterprise data hub.
  34. To enable you to achieve the benefits of an enterprise data hub without compromise, we offer the most comprehensive security capabilities of any Hadoop solution. We approach security in terms of 4 core pillars: Perimeter security. Can we ensure only the right people have access to the cluster? Access controls. Can we ensure people using the cluster can access only the right data? Visibility. Can we ensure that these rules are being followed, and that malicious activity isn’t taking place? Trust but verify. Data protection. If all else fails, can we ensure that data is comprehensively encrypted, both at rest and in transit? *It's all too easy for other vendors to claim their platforms are "secure" because they cover one or more of these pillars.* It’s important to ensure complete coverage in order to protect your customers and your most sensitive data. Key capabilities include: Active Directory and Kerberos for all identity management and user / service authentication; sensitive data is restricted to authorized personnel and secured against privileged users; data encrypted using dedicated key manager tied to corporate HSM as root of trust; full logging of data access, creation of derivative data sets, and changes to access permissions.
  35. Cloudera partners more broadly and deeply across the Hadoop ecosystem than any other vendor. With over 1300 partners and counting, our partnerships offer: Compatibility with your existing tools and skills 160+ certified on Cloudera 5, including all 12 of the 12 Gartner Business Intelligence Magic Quadrant leaders Flexible deployment options On-premises Public, private, or hybrid cloud Appliances and engineered systems Partnerships you can trust Deep engineering relationships Comprehensive certification program
  36. Workload/ Resource Management Service Extensibility - enabling 3rd party applications
  37. Enhanced Impala Query Monitoring YARN/MR2 Monitoring User defined triggers – custom alerts Oozie and YARN RM High Availability workflows
  38. Why C5 is great Any enhancements are ineffective if the benefits of the enterprise data hub are not easily accessible to existing users. That’s why Cloudera has placed an increased emphasis on the upgrade experience, to make it easier to upgrade to the latest version of the software. The team will continue to work on making improvements to this experience. More details about the Cloudera 5.3 release can be found at “Cloudera Enterprise 5.3 is Released.” CDH 5.3 is now available for download.
  39. Preview of the next webinar
  40. To ensure the highest level of functionality and stability, consider upgrading to the most recent version of CDH.
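
Notes 16 and 18 state two rules that are easy to encode as pre-upgrade checks: the Cloudera Manager version must be at least the target CDH version (for example, CDH 5.3 needs CM 5.3 or higher), and a rolling upgrade is available only for non-major upgrades on clusters that use parcels, hold a Cloudera Enterprise license, and have HDFS high availability enabled. The Python sketch below illustrates those rules. It is not part of Cloudera Manager or its API; the `Cluster` class and all function names are hypothetical, and comparing only the major and minor version numbers is an assumption drawn from the single example in note 16.

```python
from dataclasses import dataclass


def parse_version(text):
    """Turn '5.3.1' into (5, 3, 1); missing parts default to 0."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def upgrade_type(current_cdh, target_cdh):
    """Classify a version change as major, minor, or maintenance.

    Downgrades are classified the same way, since the notes say
    maintenance downgrades are still called "upgrades".
    """
    cur, tgt = parse_version(current_cdh), parse_version(target_cdh)
    if tgt[0] != cur[0]:
        return "major"
    if tgt[1] != cur[1]:
        return "minor"
    return "maintenance"


@dataclass
class Cluster:
    """Hypothetical summary of the facts the checks need."""
    cm_version: str
    cdh_version: str
    uses_parcels: bool
    has_enterprise_license: bool
    hdfs_ha_enabled: bool


def cm_supports_target(cm_version, target_cdh):
    """CM must be at least the target CDH release (major.minor compared)."""
    return parse_version(cm_version)[:2] >= parse_version(target_cdh)[:2]


def can_rolling_upgrade(cluster, target_cdh):
    """Apply the rolling-upgrade conditions listed in note 18."""
    return (
        upgrade_type(cluster.cdh_version, target_cdh) != "major"
        and cluster.uses_parcels
        and cluster.has_enterprise_license
        and cluster.hdfs_ha_enabled
    )
```

For a cluster described as `Cluster("5.3.0", "5.2.1", True, True, True)`, `can_rolling_upgrade(cluster, "5.3.0")` evaluates to `True`, while the same call for a cluster still on CDH 4 evaluates to `False` because the upgrade is major. Checks like these fit naturally into the written upgrade plan, but the Upgrade Wizard and the upgrade documentation remain the authority on what a given release supports.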