Apache Ambari provides a 100% open source and intuitive set of tools to monitor, manage and efficiently provision your Apache Hadoop cluster. Ambari simplifies the operation and hides the complexity of Hadoop, making Hadoop appear like a single, cohesive data platform. Hadoop cluster provisioning and ongoing management can be a complicated task, especially when there are hundreds or thousands of nodes involved. Ambari allows you to control Hadoop cluster services from a single point. In this session, we will provide an overview of the Apache Ambari key features, architecture and web service-based APIs.
Managing your Hadoop Clusters with Ambari
1. Apache Ambari – Management, Monitoring and Operations
Mahadev Konar
Co Founder & Software Engineer
@mahadevkonar (@hortonworks)
Pramod Thangali
Director of Engineering
@pramodthangali (@hortonworks)
2. Hello!
• Hadoop
– Contributor since 2006
– Committer/PMC
• ZooKeeper
– Contributor since 2008
– Committer/PMC
• Ambari
– Contributor since 2012
– Committer/PPMC
3. Ambari: Easiest way to manage Hadoop clusters
• Cluster Operations
– Core capabilities to include the critical tasks associated with provisioning and operating Hadoop clusters.
• Job Diagnostics
– Insight into job performance to reduce the burden on specialized Hadoop skills and knowledge.
• Extensible Platform
– Integration and customization points so Hadoop can interoperate with existing operational tooling.
4. Architecture
[Architecture diagram. Labels shown: Web Client and Ambari Web (JS) calling the REST API (/clusters); Ambari Server (java) with a configurable Auth Provider (LDAP, DB User), Request Dispatcher, Orchestrator, Cluster Configurations in an RDBMS (postgres), and a Monitoring Repo; Ambari Agent/s (python, puppet), set up by bootstrap or manual install; Ganglia Collector (php).]
6. How to plug into the APIs
• Consistent front-end REST API for
– Accessing Hadoop and system metrics
– Managing cluster resources
• Service provider plugin architecture to enable
– Adding new components and resources
– Providing alternative service providers to manage existing components
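As a rough illustration of the front-end REST API, the sketch below builds (but does not send) an authenticated GET request for the /clusters resource shown on the architecture slide. The host name, the credentials, port 8080, and the `/api/v1` prefix are assumptions for illustration and are not stated in the slides; check them against your own Ambari Server before relying on them.

```python
import base64
import urllib.request


def build_clusters_request(server, user, password, port=8080):
    """Build (without sending) a GET request for the /clusters resource.

    Assumptions: the server listens on `port`, exposes its API under
    /api/v1, and accepts HTTP Basic authentication.
    """
    url = f"http://{server}:{port}/api/v1/clusters"
    # HTTP Basic auth: base64 of "user:password"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Hypothetical host and credentials, for illustration only.
    req = build_clusters_request("ambari.example.com", "admin", "admin")
    print(req.get_method(), req.full_url)
    # To actually call the server, pass `req` to urllib.request.urlopen().
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before pointing the code at a real cluster.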
8. Ambari + Teradata Viewpoint Integration
• Integrate with Ambari to provide Monitoring REST API
• Use Ganglia for metrics collection
• No Ambari mgmt controls
• No Ambari UI
• No Ambari Agents
[Diagram labels: HDP Cluster, Ambari, Custom P (label truncated in extraction)]
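A monitoring-only integration like the one above would mostly issue read requests for metrics. The sketch below only composes such a URL; the resource path (`/clusters/<cluster>/hosts/<host>`), the `fields` query parameter, and the metric names are assumptions based on later public Ambari API documentation, not on these slides, and the base URL is hypothetical.

```python
from urllib.parse import quote


def host_metrics_url(base_url, cluster, host, fields):
    """Compose a read-only URL that asks for selected metric fields of one host.

    `base_url` is the API root (assumed, e.g. http://server:8080/api/v1).
    `fields` is a list of field paths to include in the response.
    """
    path = f"/clusters/{quote(cluster)}/hosts/{quote(host)}"
    return f"{base_url.rstrip('/')}{path}?fields={','.join(fields)}"


if __name__ == "__main__":
    # Hypothetical cluster, host, and metric names.
    print(host_metrics_url(
        "http://ambari.example.com:8080/api/v1",
        "c1",
        "host1.example.com",
        ["metrics/cpu", "metrics/memory"],
    ))
```

Because the integration uses no Ambari management controls, only GET requests of this kind would be needed.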
9. Extending Ambari – Adding new services
• Stack Definitions
– Define which Services are available (services)
– Define where to get the packages (repos)
[Diagram: Stack A and Stack B each define their own repos and services. Stack C defines its own repos and extends Stack A, offering Stack A's services plus additional ones.]
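The "extends" relationship between stacks can be sketched as simple inheritance of service lists. This is a toy model to illustrate the idea, not Ambari's actual stack definition format; the `Stack` class, its fields, the repo URLs, and the service names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stack:
    """Toy model of a stack definition: where packages come from and what it offers."""
    name: str
    repos: List[str] = field(default_factory=list)
    services: List[str] = field(default_factory=list)
    extends: Optional["Stack"] = None

    def all_services(self) -> List[str]:
        """Return inherited services first, then this stack's own additions."""
        merged = list(self.extends.all_services()) if self.extends else []
        for service in self.services:
            if service not in merged:
                merged.append(service)
        return merged


if __name__ == "__main__":
    stack_a = Stack("A", repos=["http://repo.example.com/a"],
                    services=["HDFS", "MAPREDUCE", "ZOOKEEPER"])
    # Stack C reuses everything in Stack A and adds one hypothetical service.
    stack_c = Stack("C", repos=["http://repo.example.com/c"],
                    services=["MYSERVICE"], extends=stack_a)
    print(stack_c.all_services())
```

Resolving services recursively means a chain of extensions (C extends A, D extends C) needs no extra code.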
11. Ambari RoadMap
• Releases
–Ambari 1.2.0
–Ambari 1.2.1
–Ambari 1.2.2 (under vote)
–Ambari 1.2.3 (under development for stability fixes)
• Ambari 1.3.0
• Ambari 1.4.0
• Futures
12. Ambari 1.2.2 – what's new
• Bug Fixes to 1.2 line
• Host-level Alerts
• Paging Controls on Zoomed Graphs
• Change Ambari Web Port #
• Support for AD Authentication
• Quicker Install/Start/Test progress display
• Upgrade Paths
– 1.2.0 → 1.2.2 and 1.2.1 → 1.2.2
• Performance
– Optimized deployment + monitoring on clusters
13. Host Level Alerts
• Host-specific alerts
• For example:
– TaskTracker process, DataNode process, DataNode storage
14. Paging Controls on Zoomed Graphs
Last 1 Hour
Last 2 Hours
Last 4 Hours
Last 12 Hours
Last 24 Hours
Last 1 Week
15. Ambari 1.3
• Manage Kerberos Secure Cluster
• Managed Stack Upgrade
• Disaster protection service via HDFS mirroring
• Multi-tenancy support via Capacity Scheduler
• Improved Configuration Mgmt with host-level controls
• Master service mobility to ease hardware maintenance
16. Ambari 1.3 continued
• External group mappings (LDAP/AD)
• Support API-driven and externally driven cluster installation and management
• Enhanced Job Diagnostic visualizations
• HUE Support for Hadoop data workers
17. Multi Tenancy Support
• Compute resource quotas
– Managed via Capacity Scheduler and used in PROD at Yahoo!
– Basic configuration support in Ambari 1.3
– Visualization for Queue Diagnostics in Ambari 1.4 (target)
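Compute quotas under the Capacity Scheduler are expressed as per-queue capacity percentages, and the capacities of sibling queues are expected to add up to 100. The sketch below is a small sanity check a configuration tool might run before applying queue settings; the function name and the queue names are hypothetical, and it does not reflect how Ambari itself validates configuration.

```python
def validate_queue_capacities(queues, tolerance=1e-6):
    """Check that sibling queue capacities (in percent) are sane.

    `queues` maps a queue name to its capacity percentage.
    Returns the total capacity; raises ValueError on any problem.
    """
    if not queues:
        raise ValueError("at least one queue is required")
    for name, percent in queues.items():
        if not 0 <= percent <= 100:
            raise ValueError(f"queue {name!r} has invalid capacity {percent}")
    total = sum(queues.values())
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"queue capacities sum to {total}, expected 100")
    return total


if __name__ == "__main__":
    # Hypothetical tenants sharing one cluster.
    print(validate_queue_capacities({"prod": 70, "research": 20, "adhoc": 10}))
```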
18. Multi Tenancy Support – User and Data Management
• Disk and namespace quotas
• Support for quota mgmt in Ambari 2.x (target)
19. Release Schedule and Info
• Ambari 1.2.2 (pending release)
• Ambari 1.2.3 (2-3 weeks)
• Ambari 1.3.0 (May)
• Ambari 1.4.0 (TBD)
• Links
– Website
– http://incubator.apache.org/ambari/
– Mailing Lists
– http://incubator.apache.org/ambari/mail-lists.html
– Development Wiki
– https://cwiki.apache.org/confluence/display/AMBARI
– Looking for Apache contributors