3. intro
Adrian Cole (@jclouds)
founded jclouds march 2009
member of the Apache Whirr project
chief evangelist at Cloudsoft
4. so you want to build a cluster...
[Diagram: HBase cluster. Public side: Web Browser, REST Client and HTable Client. Private side: the Master and two RegionServers each listen on a UI port and an RPC port; the RestServer listens on an HTTP port.]
5. First, you need to analyze the roles you need in the cluster
[Diagram: the roles hadoop-namenode, hadoop-datanode, hbase-master, zookeeper, hadoop-jobtracker, hadoop-tasktracker, hbase-regionserver]
6. Next, organize and understand the topology
[Diagram: the same roles (hadoop-namenode, hadoop-datanode, hadoop-jobtracker, hadoop-tasktracker, hbase-master, hbase-regionserver, zookeeper) arranged into a cluster topology]
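A topology can be written down as groups of roles that share a node, each with an instance count. Below is a minimal Java sketch with a hypothetical Topology class; it assumes, as in Whirr's instance-templates syntax, that "+" joins roles placed on the same node. The grouping in the usage line is only an illustration, not the layout drawn on the slide.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical model: each group is a list of co-located roles plus how many
// identical nodes run that combination.
final class Topology {
    private final Map<List<String>, Integer> groups = new LinkedHashMap<>();

    // Adds `count` nodes that each run all of `roles`; repeated groups are summed.
    Topology add(int count, String... roles) {
        groups.merge(List.of(roles), count, Integer::sum);
        return this;
    }

    int totalNodes() {
        return groups.values().stream().mapToInt(Integer::intValue).sum();
    }

    // Renders "N role+role,..." in insertion order.
    String toInstanceTemplates() {
        StringJoiner out = new StringJoiner(",");
        groups.forEach((roles, count) -> out.add(count + " " + String.join("+", roles)));
        return out.toString();
    }
}
```

For example, one node running zookeeper, hadoop-namenode, hadoop-jobtracker and hbase-master plus six workers running hadoop-datanode, hadoop-tasktracker and hbase-regionserver gives a 7-node cluster.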
7. Then you need to develop the workflow
[Diagram: provision → bootstrap (install) → configure → operate (manage)]
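One way to read the workflow is as ordered phases, where each phase finishes on every role before the next one starts, so that configuration can refer to nodes that were provisioned earlier. A minimal Java sketch; Workflow, Phase and Step are hypothetical names, not part of Whirr or jclouds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the provision -> bootstrap -> configure -> operate workflow.
final class Workflow {
    enum Phase { PROVISION, BOOTSTRAP, CONFIGURE, OPERATE }

    // The work done for one role in one phase (start a node, run a script, ...).
    interface Step {
        void run(Phase phase, String role);
    }

    private Workflow() {}

    // Runs every phase for every role, completing a phase on all roles before
    // moving on, and returns a "PHASE:role" log of what ran, in order.
    static List<String> run(List<String> roles, Step step) {
        List<String> log = new ArrayList<>();
        for (Phase phase : Phase.values()) {
            for (String role : roles) {
                step.run(phase, role);
                log.add(phase + ":" + role);
            }
        }
        return log;
    }
}
```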
14. jclouds concepts
Templates abstractly describe nodes:
What OS to use, what versions...
What kind of hardware, what features are needed...
Groups organize identically configured nodes
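To make the two concepts concrete, here is a toy Java model. Templates, Offer, select and createGroup are hypothetical and are not the jclouds API (jclouds has its own TemplateBuilder for this). A template states requirements, the smallest matching offer is picked (that selection rule is an assumption of this sketch), and a group is a number of nodes built from that one template.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class Templates {
    // One OS/hardware combination a provider offers (hypothetical).
    static final class Offer {
        final String id, osFamily;
        final int cores, ramMb;
        Offer(String id, String osFamily, int cores, int ramMb) {
            this.id = id; this.osFamily = osFamily; this.cores = cores; this.ramMb = ramMb;
        }
    }

    // Abstract requirements: which OS, and at least how much hardware.
    static final class Template {
        final String osFamily;
        final int minCores, minRamMb;
        Template(String osFamily, int minCores, int minRamMb) {
            this.osFamily = osFamily; this.minCores = minCores; this.minRamMb = minRamMb;
        }
        boolean accepts(Offer o) {
            return o.osFamily.equals(osFamily) && o.cores >= minCores && o.ramMb >= minRamMb;
        }
    }

    private Templates() {}

    // Chooses the smallest matching offer, by RAM and then by cores.
    static Optional<Offer> select(Template t, List<Offer> offers) {
        return offers.stream()
                .filter(t::accepts)
                .min(Comparator.comparingInt((Offer o) -> o.ramMb).thenComparingInt(o -> o.cores));
    }

    // A group: `count` identically configured nodes, all built from one template.
    static List<String> createGroup(String group, int count, Template t, List<Offer> offers) {
        Offer chosen = select(t, offers)
                .orElseThrow(() -> new IllegalArgumentException("no offer matches the template"));
        List<String> nodes = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            nodes.add(group + "-" + i + ":" + chosen.id);
        }
        return nodes;
    }
}
```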
19. whirr apache
// apikey and secret hold the cloud provider credentials
ClusterSpec spec = new ClusterSpec();
spec.setProvider("cloudservers-uk");
spec.setIdentity(apikey);
spec.setCredential(secret);
spec.setClusterName("hbase");
// one master plus six regionservers
spec.setInstanceTemplates(ImmutableList.of(
    new InstanceTemplate(1, "hbase-master"),
    new InstanceTemplate(6, "hbase-regionserver")));
Cluster cluster = new ClusterController().launchCluster(spec);
$ whirr launch-cluster \
    --cluster-name=hbase \
    --instance-templates='1 hbase-master,6 hbase-regionserver'
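The --instance-templates value is a comma-separated list of "count roles" entries. A sketch of a parser for it in Java; InstanceTemplates and Entry are hypothetical and this is not Whirr's own parser. It assumes that roles sharing a node are joined with "+".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for values such as "1 hbase-master,6 hbase-regionserver".
final class InstanceTemplates {
    static final class Entry {
        final int count;
        final List<String> roles;
        Entry(int count, List<String> roles) { this.count = count; this.roles = roles; }
    }

    private InstanceTemplates() {}

    static List<Entry> parse(String spec) {
        List<Entry> entries = new ArrayList<>();
        for (String part : spec.split(",")) {
            String[] fields = part.trim().split("\\s+");
            if (fields.length != 2) {
                throw new IllegalArgumentException("expected '<count> <role>[+<role>...]': " + part);
            }
            int count = Integer.parseInt(fields[0]);
            if (count < 1) {
                throw new IllegalArgumentException("count must be positive: " + part);
            }
            // Roles joined by '+' are assumed to share each node of this entry.
            entries.add(new Entry(count, Arrays.asList(fields[1].split("\\+"))));
        }
        return entries;
    }

    static int totalNodes(List<Entry> entries) {
        return entries.stream().mapToInt(e -> e.count).sum();
    }
}
```

Validating the string up front gives a clear error before any machines are started.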
20. Whirr internals
BootstrapClusterAction
creates cluster and runs bootstrap script for all roles
ConfigureClusterAction
configures firewall and runs configure script for all roles
DestroyClusterAction
destroys the nodes in the cluster
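The pattern behind these actions is that each action visits every role in the cluster and lets that role contribute its script. A hypothetical Java model of that dispatch (ClusterActions is not one of Whirr's classes, and the firewall and destroy steps are left out):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model: role name -> handler that returns the script for an action.
final class ClusterActions {
    enum Action { BOOTSTRAP, CONFIGURE }

    private final Map<String, Function<Action, String>> handlers = new LinkedHashMap<>();

    ClusterActions register(String role, Function<Action, String> handler) {
        handlers.put(role, handler);
        return this;
    }

    // Runs one action across all roles, in registration order, returning "role: script".
    List<String> run(Action action) {
        List<String> scripts = new ArrayList<>();
        handlers.forEach((role, handler) -> scripts.add(role + ": " + handler.apply(action)));
        return scripts;
    }
}
```

Adding a new service then only means registering a handler for its role; the actions themselves stay the same.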
24. Puppet-Whirr implementation
beforeBootstrap
install all modules for roles prefixed with puppet:
beforeConfigure
apply all manifests with their attributes, then run puppet apply
Notes!
tar.gz files are staged to a BlobStore instead of being copied over with scp
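A sketch of the role handling described on this slide, with hypothetical names (PuppetRoles, classes, modules, manifest). It assumes the text after puppet: is a Puppet class name; by Puppet's autoloading convention, class ntp::server lives in module ntp. The manifest method shows one possible way to turn attributes into a class declaration for puppet apply, and assumes values contain no single quotes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PuppetRoles {
    static final String PREFIX = "puppet:";

    private PuppetRoles() {}

    // Puppet class names for roles carrying the puppet: prefix; other roles are skipped.
    static List<String> classes(List<String> roles) {
        List<String> out = new ArrayList<>();
        for (String role : roles) {
            if (role.startsWith(PREFIX)) {
                out.add(role.substring(PREFIX.length()));
            }
        }
        return out;
    }

    // Modules to install before bootstrap: the segment before the first "::".
    static Set<String> modules(List<String> roles) {
        Set<String> modules = new LinkedHashSet<>();
        for (String cls : classes(roles)) {
            int sep = cls.indexOf("::");
            modules.add(sep < 0 ? cls : cls.substring(0, sep));
        }
        return modules;
    }

    // A class declaration with string attributes (values must not contain ').
    static String manifest(String cls, Map<String, String> attribs) {
        StringBuilder sb = new StringBuilder("class { '").append(cls).append("':\n");
        attribs.forEach((k, v) -> sb.append("  ").append(k).append(" => '").append(v).append("',\n"));
        return sb.append("}\n").toString();
    }
}
```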
25. Special Thanks!
Alex Heneveld: design + integration
Garrett Honeycutt: puppety advice
Chad Metcalf: concept + elaboration
26. What’s next?
• More services
• VBox Support
• Chef Solo/Puppet Master integration
• Cluster healing
Podling Project Management Committee
jclouds focuses on cloud storage and infrastructure portability
whirr focuses on portable management of clustered services
The UI ports let web browsers see the HBase status pages (both the master and the regionservers provide them). The RPC port uses native Hadoop RPC for communication between clients and servers. The HTTP/REST server exposes an API similar to the RPC one and is used by clients that need to tunnel through a firewall, for example, or that are written in languages other than Java. Internally, servers communicate with each other over the same ports, i.e. through native RPC calls. The default ports, all of which can be changed, are 60000 (RPC) and 60010 (UI) for the Master, 60020 (RPC) and 60030 (UI) for a RegionServer, and 8080 for REST.
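The default ports in the note above can be collected per role, for example to work out which ports a firewall has to open for a given set of roles. A minimal Java sketch; HBasePorts and portsFor are hypothetical, and the role name hbase-restserver in particular is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class HBasePorts {
    // Default ports from the notes (RPC first, then UI); all are configurable.
    static final Map<String, List<Integer>> DEFAULT_PORTS = new LinkedHashMap<>();

    static {
        DEFAULT_PORTS.put("hbase-master", List.of(60000, 60010));
        DEFAULT_PORTS.put("hbase-regionserver", List.of(60020, 60030));
        DEFAULT_PORTS.put("hbase-restserver", List.of(8080)); // hypothetical role name
    }

    private HBasePorts() {}

    // Distinct default ports, sorted, that the given roles listen on.
    // Roles not in the table (e.g. zookeeper) contribute nothing here.
    static List<Integer> portsFor(List<String> roles) {
        return roles.stream()
                .flatMap(role -> DEFAULT_PORTS.getOrDefault(role, List.of()).stream())
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }
}
```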
Get hold of a few machines with the right base operating system; role-based templates can help select the appropriate hardware features.