1. How Ambari manifest files are used by System Center and Windows Azure
Brian Swan
Program Manager, HDInsight Team
Microsoft
2. Manifest Files - Overview
PackageDefinition.json
A representation of the software packages to be installed on a cluster (typically Hadoop, but also any custom packages, such as Java or Python). This representation captures all the invariants, such as the services, components, and properties associated with a specific package.
Authored by package distributor.
HostComponentMapping.json
A mapping between a package component and one or more logical host groups defined in the host manifest.
Authored by Hadoop Admin.
HostManifest.json
Contains a list of logical host definitions, system-level resources, and (optionally) the actual hosts that fall into the host definition categories. When actual hosts are not described, references that are realized by on-demand services (such as a cloud provider) are included. A logical group may contain one or more hosts.
Authored by System Admin.
PackageConfiguration.json
Captures the specific configuration for a deployment at the cluster level, as well as overrides at the service and component levels.
Authored by Hadoop Admin.
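To make the relationship between the four manifest files concrete, here is a minimal sketch in Python. The slides do not show the actual JSON schemas, so every field name below (`services`, `hostGroups`, `mappings`, `serviceOverrides`, and so on) and every setting value is a hypothetical stand-in, not the real manifest format. Only the file roles, the authors, and the host group names MASTER_HOSTS and SLAVE_HOSTS come from the deck.

```python
import json

# Hypothetical shapes for the four manifest files. The real schemas are not
# shown in the slides; all field names and values are illustrative assumptions.
package_definition = {  # authored by the package distributor
    "package": "hadoop",
    "services": {
        "HDFS": {"components": ["NameNode", "DataNode"]},
        "MapReduce": {"components": ["JobTracker", "TaskTracker"]},
    },
}

host_manifest = {  # authored by the System Admin
    "hostGroups": {
        "MASTER_HOSTS": {"instances": 2},
        "SLAVE_HOSTS": {"instances": 2},
    },
}

host_component_mapping = {  # authored by the Hadoop Admin
    "package": "hadoop",
    "mappings": {
        "NameNode": ["MASTER_HOSTS"],
        "JobTracker": ["MASTER_HOSTS"],
        "DataNode": ["SLAVE_HOSTS"],
        "TaskTracker": ["SLAVE_HOSTS"],
    },
}

package_configuration = {  # authored by the Hadoop Admin
    "cluster": {"log_level": "INFO"},
    "serviceOverrides": {"HDFS": {"replication": 3}},
    "componentOverrides": {"DataNode": {"log_level": "DEBUG"}},
}


def components_for_group(mapping, group):
    """Return the components mapped to one logical host group, sorted by name."""
    return sorted(
        component
        for component, groups in mapping["mappings"].items()
        if group in groups
    )


def effective_config(config, service, component):
    """Layer cluster-level settings, then service and component overrides."""
    merged = dict(config["cluster"])
    merged.update(config["serviceOverrides"].get(service, {}))
    merged.update(config["componentOverrides"].get(component, {}))
    return merged


if __name__ == "__main__":
    print(components_for_group(host_component_mapping, "SLAVE_HOSTS"))
    print(json.dumps(effective_config(package_configuration, "HDFS", "DataNode")))
```

The point of the sketch is the division of responsibility: the mapping refers to host groups by name only, so the same mapping can be reused against a host manifest that lists real machines or one that just gives instance counts.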
7. Deployment using System Center
Note: The tools described here for deploying Hadoop clusters using System
Center are prototype tools used internally at Microsoft. The intent here is to
demonstrate one consumer of cluster manifest files.
8. System Center – Prerequisites
[Diagram labels: Deployment DB; System Center Virtual Machine Manager (VMM); HadoopServiceTemplate.xml; Win.vhd; >HDInsightDeployment.exe]
• System Center 2013
• VM running Virtual Machine Manager
(VMM) with…
• Hadoop Service Template
• Windows Server VHD
• HDInsight Deployment Tool
• Deployment Database (SQL Server)
12. Phase 1: Parse, Validate, Populate DB
[Diagram labels: Manifest Files; Deployment DB; System Center VMM; HadoopServiceTemplate.xml; Win.vhd; >HDInsightDeployment.exe]
• Copy manifest files to Deployment Tool directory.
• Update the Deployment Tool configuration file (HDInsightDeployment.exe.config).
• Start deployment with HDInsightDeployment.exe.
• Deployment tool reads and validates manifest files.
  • Schema validation.
  • Dependency validation.
• Deployment DB is populated with steps for creating system resources on hosts (e.g. users, groups, firewall rules).
• Deployment DB is populated with ordered steps for installing Hadoop (and other packages).
13. Phase 2: Download Packages
• Deployment tool downloads/copies packages to VMM based on
information in PackageDefinition.json.
18. Phase 3: Provision VMs, Install Packages
[Diagram labels: Deployment DB; System Center VMM; HadoopServiceTemplate.xml; Win.vhd; >HDInsightDeployment.exe; VM1 to VM4, grouped as MASTER_HOSTS and SLAVE_HOSTS, each running a Deployment Agent]
• VMM does VM provisioning based on the HostManifest.json file.
• Hadoop Service Template (a VMM template) specifies which system components to install (e.g. Deployment Agent).
• Starts Deployment Agent.
• Deployment Agents pull packages from SCVMM.
21. Phase 4: Create System Resources, Install Packages
[Diagram labels: Deployment DB; System Center; VM1 to VM4, each running a Deployment Agent; users hdfs_user, hadoop_admin, mapred_user; components HDFS NameNode, MapReduce JobTracker, and (twice) HDFS/MapReduce DataNode and TaskTracker]
• Deployment Agents create system resources (users, groups, firewall rules, etc.) from steps in the Deployment DB.
• Deployment Agents work through steps for installing Hadoop (and other packages).
  • Packages contain scripts that will be invoked for installing custom components (e.g. Java, Python).
• Deployment Agents store the state of each step for retries upon failure.
23. Phase 1: Submit request, generate manifest files
[Diagram labels: Windows Azure; Deployment Service; WA Blob Storage]
• Cluster creation request submitted via Windows Azure Portal.
• Deployment Service generates and validates manifest files.
• DA stores manifest files in Blob Storage.
• (Hadoop package files are already in Blob Storage.)
24. Phase 2: Generate/submit deployment files
[Diagram labels: Windows Azure; Deployment Service; WA Blob Storage; Windows Azure Fabric; .cspkg; .cscfg]
• Deployment Service generates Cloud Service deployment files.
  • .cspkg: contains Deployment Agent.
  • .cscfg: contains instance counts for VMs and location of generated manifest files.
• Cloud Service deployment files are submitted to Windows Azure Fabric.
26. Phase 3: Provision VMs, Deployment Agent
[Diagram labels: Windows Azure; Deployment Service; WA Blob Storage; Windows Azure Fabric; VM1 to VM4, grouped as WEB_ROLES and WORKER_ROLES, each running a Deployment Agent]
• Windows Azure Fabric provisions VMs and deploys the Deployment Agent on the VMs.
27. Phase 4: Get manifest files, install components
[Diagram labels: Windows Azure; WA Blob Storage; Windows Azure Fabric; VM1 to VM4, grouped as WEB_ROLES and WORKER_ROLES, each running a Deployment Agent]
• Deployment Agent determines environment and VM type.
• Deployment Agent gets manifest files based on location in the .cscfg file.
28. Phase 4: Get manifest files, install components
[Diagram labels: Windows Azure; WA Blob Storage; Windows Azure Fabric; VM1 to VM4, each running a Deployment Agent; placeholder activity lists]
• Deployment Agent generates in-memory list of activities for installing components.
• Deployment Agent retrieves packages (based on repo location in the PackageDefinition file).
Dependency validation makes sure the cluster can run once it is deployed. Examples include:
• Is there a Package Definition that matches the package specified in the Host-Component-Mapping file?
• Are host groups consistent across the Host-Component-Mapping and Host Manifest files?
• If Hive is selected for installation, are its dependencies selected and available?
In Azure, the Deployment DB is replaced by in-memory storage of this information.
In Azure and VMM, the HostManifest file only specifies the number of instances in each logical host group. The host groups are defined in the template (in VMM) or by Azure.
PackageDefinition: specifies settings for components selected in the Host-Component-Mapping file.
Note that SQL Authentication is shown in the sqlConnectionString. In a production environment, Integrated Authentication should be used.
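The three dependency validation questions in the notes above can be sketched as a small checker. This is an illustration only, not the Deployment Tool's code: the dictionary layouts, the `validate_dependencies` name, and the `SERVICE_DEPS` table are assumptions made for the example.

```python
def validate_dependencies(package_definitions, mapping, host_manifest, service_deps):
    """Return a list of human-readable problems; an empty list means valid.

    All input shapes are hypothetical; the real manifest schemas are not
    shown in the slides.
    """
    problems = []

    # Check 1: the mapped package must have a matching package definition.
    if mapping["package"] not in package_definitions:
        problems.append(f"no package definition for '{mapping['package']}'")

    # Check 2: every host group used in the mapping must exist in the host manifest.
    known_groups = set(host_manifest["hostGroups"])
    for component, groups in mapping["mappings"].items():
        for group in groups:
            if group not in known_groups:
                problems.append(f"{component}: unknown host group '{group}'")

    # Check 3: every selected service must have its dependencies selected too.
    selected = set(mapping["services"])
    for service in sorted(selected):
        for dep in service_deps.get(service, []):
            if dep not in selected:
                problems.append(f"{service} requires {dep}, which is not selected")

    return problems


# Illustrative inputs (assumed, not taken from real manifest files).
PACKAGE_DEFINITIONS = {"hadoop": {}}
SERVICE_DEPS = {"Hive": ["HDFS", "MapReduce"]}
HOST_MANIFEST = {"hostGroups": {"MASTER_HOSTS": {}, "SLAVE_HOSTS": {}}}
MAPPING = {
    "package": "hadoop",
    "services": ["HDFS", "Hive"],
    "mappings": {"NameNode": ["MASTER_HOSTS"], "DataNode": ["WORKER_HOSTS"]},
}

if __name__ == "__main__":
    for problem in validate_dependencies(
        PACKAGE_DEFINITIONS, MAPPING, HOST_MANIFEST, SERVICE_DEPS
    ):
        print(problem)
```

Collecting every problem into a list, rather than stopping at the first one, lets an admin fix a manifest set in one pass before starting a deployment.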
Deployment DB is populated with ordered steps for installing Hadoop (and other packages). For example:
• Install the HDFS service before MapReduce.
• Install the NameNode component before the DataNode component.
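One common way to derive such an ordering is a topological sort over "must come before" constraints. The sketch below uses Python's standard `graphlib` module (Python 3.9+) with the two constraints from the note above. Whether the Deployment Tool computes its ordering this way is not stated in the deck, and the step names are made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each step to the steps that must finish before it.
# Step names are illustrative; the constraints mirror the two examples above.
PREREQUISITES = {
    "MapReduce service": ["HDFS service"],
    "DataNode component": ["NameNode component"],
}


def ordered_steps(prerequisites):
    """Return every step in an order that respects all prerequisites.

    Raises graphlib.CycleError if the constraints contradict each other.
    """
    return list(TopologicalSorter(prerequisites).static_order())


if __name__ == "__main__":
    for position, step in enumerate(ordered_steps(PREREQUISITES), start=1):
        print(position, step)
```

Steps with no constraint between them (for example, the HDFS service and the NameNode component in this toy input) may appear in either relative order, so only the stated constraints should be relied on.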
Deployment Agents store the state of each step so that steps can be retried after failures. For example:
• If the NameNode install fails, it will be retried.
• If the NameNode install fails, the DataNode install will not proceed.
• Once the issue is resolved, the Deployment Agent picks up from the last successful step.
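The retry and resume behavior described above can be sketched as a small step runner. This is a simplified illustration under assumptions: `run_steps` and its parameters are hypothetical names, a plain dict stands in for the Deployment DB (or the in-memory store used in Azure), and the retry limit is arbitrary.

```python
def run_steps(steps, state, max_attempts=3):
    """Run steps in order, recording progress in `state` so a rerun can resume.

    `steps` is a list of (name, callable) pairs. `state` is a dict standing in
    for persistent storage. Returns True when every step has succeeded.
    """
    for name, action in steps:
        if state.get(name) == "done":
            continue  # finished in an earlier run; skip it
        for attempt in range(1, max_attempts + 1):
            try:
                action()
            except Exception:
                state[name] = f"failed (attempt {attempt})"
            else:
                state[name] = "done"
                break
        if state.get(name) != "done":
            return False  # stop here: later steps must not proceed
    return True


if __name__ == "__main__":
    state = {}
    steps = [
        ("install NameNode", lambda: None),  # placeholder actions
        ("install DataNode", lambda: None),
    ]
    print(run_steps(steps, state), state)
```

Calling `run_steps` again with the same `state` after a failure skips the steps already marked done and resumes at the first unfinished one, which mirrors the "pick up from the last successful step" behavior in the note.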
Deployment Service is transparent to users. Deployment Service is a Cloud Service running in Windows Azure.
Currently, the manifest files are mostly static. The HostManifest file isn't used at all; VM information is handled by Azure Fabric. We have flexibility going forward to incorporate user input (e.g. configuration overrides).
Manifest files are stored in the user's storage account. HDP and other packages are in the HDInsight blob storage account.
Web/Worker Roles are logical host groups in Windows Azure (the types of VMs). VM sizes are fixed (for now).
Deployment Agent is the same code that is used in the System Center scenario. Logic is forked based on environment.