Whatever happened to cman?

                             Christine “Chrissie” Caulfield, Red Hat
                                     ccaulfie@redhat.com

Version history
  0.1   30th January 2009        First draft
  0.2   3rd February 2009        Add a chapter on migrating from libcman
  0.3   27th February 2009       Add some commas and fix some typos
1. Introduction

CMAN was the mainstay of Red Hat's cluster suite in Red Hat Enterprise Linux 4, 5 and 6. Actually,
CMAN in RHEL4 was a totally different thing to the one found in RHEL5 and 6, but the external
interfaces looked very similar.

So what was CMAN?

CMAN stood for “Cluster MANager” (boring eh?) and provided the low-level infrastructure for the
cluster; node communications, votes, quorum etc. Most of the useful bits of the cluster stack were built
on top of cman so few people got to see it unless things went horribly wrong.

Starting with RHEL5 we dispensed with the custom-written kernel CMAN and based the cluster
manager on OpenAIS. For more details about this see my document on the subject:
http://people.redhat.com/ccaulfie/docs/aiscman.pdf.

OK, a short summary then. CMAN in RHEL 5/6 is really just a quorum module that loads into openais
(these days known as corosync). It provides other API services such as messaging but those are really
just wrappers around other openais services.

At the cluster summit in 2008 it was decided that keeping companies' own clustering modules was a
waste of effort and almost always failed to build a community. Even though all versions of CMAN were
released as free software, they lived in their own repositories and were seldom modified by the
original author (ie me!). So efforts were made to make the software even more open and to fold it into
the components of its parent projects. The pooling of resource agents, fence agents and several other
components was also part of the collaboration effort. But this is just about CMAN, OK? Come on now, concentrate.


2. corosync/openais – what happened there?
In Red Hat Enterprise Linux 5 cman was a module for openais. In RHEL6 (we're not actually supposed
to call it RHEL, but “Red Hat Enterprise Linux” is a lot of typing. And even if I do a search/replace
afterwards it'll be unwieldy to read ... sorry marketing, but this is an engineering paper, not a sales one)
it's a module for corosync. So, what happened there?
Just in case you weren't already confused enough, openais has now been split into two parts. The parts
still called openais are the modules for corosync that implement the SAF AIS services and APIs – you
know: the ones with the VeryAnnoyinglyCapitalisedNames. Corosync is the communications layer
below that, but it also includes other services such as cpg (Closed Process Groups ... very popular
amongst service writers, that one) and, now, quorum. Most people won't really notice this split, apart
from, possibly, the name of the process in 'ps' output.



Just to put a little bit of detail on this, the services that are included with corosync are:
cfg cpg evs quorum
Services included in openais are:
Amf Ckpt Clm Evt Lck Msg (see what I mean about the capitalisation ... and that's just the names!)


corosync is the name of the daemon, replacing aisexec.
I hope that's all clear because it gets more complicated from here on in.

3. Intermediate status – RHEL6
Before getting onto the really new code, I'll talk a little about the RHEL6 code which is sort of a half-
way house between RHEL5 and the code that will be in RHEL7. I suppose that makes some sort of
sense, numerically if nothing else.
The first thing you'll notice about RHEL6 is that ccsd is not there any more. Actually I very much
doubt it'll be really the very first thing you notice unless you're incredibly sad or a real clustering nerd,
but I do hope you might have spotted it at some stage. I'll talk (well, write – unless you want to invite
me, all expenses paid, to give a presentation) about that in the next chapter.
Apart from this, you will notice very little difference in the RHEL6 cman. We were under pressure to
allow rolling upgrades from RHEL5 (ie no wire protocol changes) and time constraints meant that the
code discussed below was not stable enough to be included by the deadline. The only other substantial
difference is that cman is now a quorum provider for corosync. I'll write more about this later
on (you're really agog now aren't you?), but basically it means that the rest of the corosync modules
will know about the quorum status and can act on it if they feel the urge, and that applications that are
only interested in the quorum status and node list can link straight into libquorum and bypass cman
altogether. This might help some people with the upgrade path ... or not ... we'll see.

4. Configuring corosync
This chapter is also relevant to RHEL6 by the way, just in case you were going to skip the rest of the
document and have an early night.
So, as I mentioned above, ccsd is now gone and I can hear the cheers already. The distribution of
cluster.conf files is now delegated to ricci and a new command ccs_sync. I will not focus too closely on
that bit any more but it brings me to the next main change in the way CMAN (as it still is at this stage)
is loaded. Actually you don't have to use ccs_sync to distribute cluster config files; conga, scp, nfs and
several other methods work perfectly well. Pick the one that suits you best; “choice is good” as the
free-marketeers love to say.
The way that corosync (not aisexec, remember?) is launched is still via cman_tool as before. But it now
sets a few environment variables first. The most important of these tells corosync which configuration
modules to use, and there are up to three of those now.
The first, xmlconfig, simply loads the contents of /etc/cluster/cluster.conf into the corosync object
database, where it keeps all the configuration information. The second, cman-preconfig, translates that
data into a format native to corosync by resolving node names, generating multicast addresses,
etc.
4.1. Those modules in full
Corosync config modules are different from the normal service modules that corosync uses. They are
specified in the environment variable COROSYNC_DEFAULT_CONFIG_IFACE before corosync is
started. This variable can contain a colon-separated list of modules that will be called before corosync
is started for real. They also manage the reloading of data when the underlying configuration file or
device changes. By default cman_tool starts corosync with
COROSYNC_DEFAULT_CONFIG_IFACE=xmlconfig:cmanpreconfig:openaisserviceenable

4.2. xmlconfig & friends
xmlconfig is simply an XML parser with a default filename of /etc/cluster/cluster.conf. It reads the tree
of objects it finds in that file and loads it, unmodified, into corosync's object database (objdb). There
are a few other config modules that can be used here. There is an ldap one (called, rather obviously I
hope, ldapconfig) and even a ccsd one too if you're hopelessly addicted to the past. These modules are
quite easy to write and if you have a different back-end database you want to use it's probably quite
simple to write a corosync config module for it.

Just in case you were wondering, objdb is simply an in-memory hierarchical database where corosync
holds configuration data and some state.

The point about xmlconfig and its similarly-named friends (actually, maybe they're relatives ... hmm)
is that they do not try to interpret the data, they simply copy what is there into objdb. If it makes no
sense to corosync then it will tell you when it tries to start up properly shortly afterwards. You don't
need cman-preconfig (see below) to use these if the format of the data matches the hierarchy of
corosync.conf(5).

cman_tool join allows you to specify a different main config module than xmlconfig on the command-
line with cman_tool join -C <module>. It will always add cman-preconfig though. eg the command:

# cman_tool join -C ldapconfig

will load corosync with the environment variable:

COROSYNC_DEFAULT_CONFIG_IFACE=ldapconfig:cmanpreconfig:openaisserviceenable


4.3. cman-preconfig
Cman-preconfig is a rather strange beast in some ways. Its main purpose is to re-interpret the loaded
configuration file. Let me endeavour to explain.
CMAN has used a very specific XML-based configuration format since RHEL4, and it has not changed
significantly in the meantime. It contains details of all the nodes in the cluster, along with their fence
details and the rgmanager services too. This format bears no resemblance to corosync.conf. If you load
a normal RHEL cluster.conf into corosync using xmlconfig alone, without passing it through
cman-preconfig, then corosync will not start: it will not be able to find a single entry that it needs to run.
cman-preconfig re-interprets this configuration data into a form that corosync expects, and adds some
sensible, we hope, defaults for things not specified at all, eg the cluster multicast address. It also has a
mode where it can start up cman/corosync with no configuration data at all (see cman_tool -X) which is
very useful for time-poor or lazy people who just want to test the cluster a little, or perhaps for
commissioning virtualised clusters. There's a whole chapter on this later on.



4.4. openaisserviceenable
This module simply tells corosync to load the openais services as well as its own default set.
Normally this is what you will want as many cluster daemons use openais services like checkpointing.
You can disable this by adding the -A switch to cman_tool join.



5. CMAN itself
Now we get to the nitty-gritty. The CMAN module exists in RHEL5 and RHEL6 and does all the
cman-y things that cman does, such as quorum and messaging. It also serves the libcman library using
an IPC mechanism all of its own. It does not use corosync's IPC (library to daemon communications)
system, for reasons that are now unlikely ever to become clear.

The new code, as I've said, dispenses with anything called cman at all. The main work of managing
quorum is done by a built-in corosync module called votequorum. This code is actually largely lifted
from the original cman module, with all the non-quorum bits removed, and converted to use the
corosync IPC layer.

This module also includes code for managing quorum devices (eg qdiskd) as well as the hated
“Disallowed” mode, though both are inactive in a default corosync system.

At the time of writing libcman still exists. It provides the same API as the libcman in previous versions
though there might be small semantic differences. It's also not ABI compatible so the soname has been
incremented to 4. This libcman uses a combination of calls to quorum, confdb and a custom cman
module to provide these services. It is provided largely as a compatibility layer for older applications
though the source code to libcman can also provide ideas on how to do cman-y type things with the
existing corosync services. All the programs in Red Hat clustering will be changed to use the standard
APIs that are provided with corosync.


6. The CMAN module
Now hang on there, you just said “a custom cman module”. So cman does still exist?


Well, yes and no.


There is a module called “cmanbits” which provides services for some libcman APIs. These are:
cman_send_data
cman_start_recv_data
cman_end_recv_data
cman_is_listening


Of these, the messaging API can be replaced with the normal openais messaging system. There is no
suitable replacement for is_listening.
Missing totally from this new cman is the barriers API, which nobody used, to my knowledge. If you
did, and are fuming with me for leaving it out ... sorry. But there are actually better ways of achieving
the same effect with existing corosync or openais calls.
Some other functions that were in libcman are now provided by the corosync cfg service. These are:
corosync_cfg_kill_node
corosync_cfg_try_shutdown
corosync_cfg_replyto_shutdown
corosync_cfg_get_node_addrs


These are wrapped in the new libcman but are deprecated.



7. corosync quorum

7.1. quorum plugins
A brief word about corosync's quorum system might be helpful here. Corosync has a basic notion of
quorum built in to the executive. Services cannot be synchronised without it, and several IPC calls will
be blocked if the cluster is inquorate.
By default corosync does not load the quorum module. This is for backwards compatibility so that
existing clusters will work. If the quorum module was loaded and no quorum provider was specified
then the cluster would never be quorate and never get any work done. Not a popular system I would
guess. If the quorum module is absent then everything just works all the time.
To load the quorum module, you need to add the following stanza to corosync.conf (or equivalent):
service {
       name: corosync_quorum
       ver: 0
}
cman-preconfig does this for you by the way. Corosync will then load the quorum module and allow
users to use the basic quorum API as defined in /usr/include/corosync/quorum.h. This simply allows
programs to query the quorate state and to receive notifications when it changes. This notification also
includes a list of nodes that are part of the quorum. It's useful (I hope) to realise that this plugin is also
available in RHEL6 and that cman provides quorum for it. So if you only need to know the quorum
state (ie you don't need any of the other libcman gubbins) then calling into libquorum instead will
provide you with forward-compatibility.
However, as I mentioned above (or on the previous page if you were ungreen enough to print this
document out), this will not give you quorum; all it will do is block a lot of useful activity in
corosync. To make it work you need a quorum provider.
A quorum provider is a corosync module that tells the corosync quorum plugin when the quorate state
changes, based on its own internal algorithm. There are many ways of calculating quorum, so there are
several modules available, and I hope people will write more to suit their own needs. The quorum
provider is loaded by putting its name in corosync.conf as follows:


quorum {
       provider: testquorum
}
This example, as I sincerely hope you have guessed, loads the 'testquorum' module. Testquorum is an
extremely simple module which I recommend as a template to other quorum provider writers. It simply
changes the corosync quorate status based on the quorum.quorate variable in the object database. Using
corosync-objctl the quorate state can be switched on and off at will. If you have some form of external
quorum system this might actually be useful to you.
Another quorum provider shipped with corosync is YKD. This is the YKD system that has been
shipped with openais for some time; it has merely been re-designated a quorum plugin now. Sadly the
developers did not have time to give it a shiny new logo too, so it looks pretty much like it always did.
To be quite frank, I know almost nothing about YKD and it scares me a little, so unless a knight in
shining armour (ideally looking like David Tennant) comes along to rescue me I'll leave it there.
Now, where was I? Sorry, I got distracted by the David Tennant reference. Oh yes, votequorum is the
quorum module based on cman. This is the one I know most about and is probably of the most interest
to those of you coming from a cman background. So I'll devote an entire subsection to it. Favouritism? Pah!

7.2. votequorum
Votequorum is the name of the corosync module that replaces most of what cman used to do. Only it
does it in a more 'corosync-y' way. It uses the corosync IPC system for communication with users,
provides quorum to the central core of the corosync daemon, and provides the familiar tracking callbacks.
I'll outline the API calls for votequorum here. If you are familiar with libcman then you can go and get
a hot chocolate or something while this bit is going on ... see you in chapter 8.


cs_error_t votequorum_initialize(votequorum_handle_t *handle, votequorum_callbacks_t
*callbacks)
cs_error_t votequorum_finalize(votequorum_handle_t handle)
These two are the standard corosync initialise & finalise calls that corosync/openais developers will
already know. libcman users will note that the callbacks are specified here rather than in later
API calls. The callbacks are for quorum changes or expected votes changes – remember, votequorum
does not have a messaging API.
cs_error_t votequorum_fd_get(votequorum_handle_t handle, int *fd);
cs_error_t votequorum_dispatch(votequorum_handle_t handle,cs_dispatch_flags_t
dispatch_types)
Again, these should be a familiar sort of call to both corosync and libcman users. You call
votequorum_dispatch when the FD returned from votequorum_fd_get becomes active from select or
poll.
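
Just to make the shape of that concrete, here is a minimal sketch of the wiring; the header paths and
the NULL callbacks argument are my assumptions, so check corotypes.h and votequorum.h on your system:

#include <stddef.h>
#include <poll.h>
#include <corosync/corotypes.h>
#include <corosync/votequorum.h>

/* Hook the votequorum FD into a poll() loop. Any callbacks passed to
 * votequorum_initialize() (NULL here for brevity) are invoked from
 * inside votequorum_dispatch(). */
int main(void)
{
        votequorum_handle_t handle;
        int fd;

        if (votequorum_initialize(&handle, NULL) != CS_OK)
                return 1;

        votequorum_fd_get(handle, &fd);

        for (;;) {
                struct pollfd pfd = { .fd = fd, .events = POLLIN };

                if (poll(&pfd, 1, -1) > 0)
                        votequorum_dispatch(handle, CS_DISPATCH_ONE);
        }

        /* not reached in this sketch */
        votequorum_finalize(handle);
        return 0;
}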


cs_error_t votequorum_getinfo (votequorum_handle_t handle, unsigned int nodeid, struct
votequorum_info *info)
This call fills in the info structure with information about the quorum subsystem and the node for
which you requested information. Members starting node_ are specific to the node in question; the
others are for the cluster as a whole.
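
Here is a hedged example of using it (a fragment, assuming the includes from the earlier sketch plus
<stdio.h>); the struct member names below are assumptions based on the description above and on
later corosync headers, so verify them against struct votequorum_info in votequorum.h:

/* Assumes 'handle' came from votequorum_initialize(), as in the
 * earlier sketch. Member names are assumptions – check votequorum.h. */
static void print_quorum_info(votequorum_handle_t handle, unsigned int nodeid)
{
        struct votequorum_info info;

        if (votequorum_getinfo(handle, nodeid, &info) != CS_OK)
                return;

        /* node_ members describe the requested node ... */
        printf("node votes:     %u\n", info.node_votes);
        printf("expected votes: %u\n", info.node_expected_votes);
        /* ... the rest describe the cluster as a whole */
        printf("total votes:    %u\n", info.total_votes);
        printf("quorum:         %u\n", info.quorum);
}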


cs_error_t votequorum_setexpected(votequorum_handle_t handle, unsigned int expected_votes)
This call changes the expected_votes value for the cluster. This will persist until it is changed again.
NOTE: if you are using a configuration file then a forced reload of that file will overwrite a manually
set votes value, as will adding a new node to the cluster.


cs_error_t votequorum_setvotes (votequorum_handle_t handle, unsigned int nodeid, unsigned
int votes)
This call changes the number of votes assigned to an existing node in the cluster. This will persist until it
is changed again. NOTE: if you are using a configuration file then a forced reload of that file will
overwrite a manually set votes value.
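
For example, a small sketch of shrinking a cluster down, using only the two calls above and assuming a
handle from votequorum_initialize():

/* Assumes 'handle' from votequorum_initialize(). Remember that a forced
 * reload of the configuration file will overwrite both of these values. */
static void shrink_cluster(votequorum_handle_t handle, unsigned int survivor_nodeid)
{
        /* The cluster now only expects three votes in total ... */
        votequorum_setexpected(handle, 3);

        /* ... and one surviving node carries two of them. */
        votequorum_setvotes(handle, survivor_nodeid, 2);
}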


cs_error_t votequorum_qdisk_register(votequorum_handle_t handle, char *name, unsigned int
votes)
cs_error_t votequorum_qdisk_unregister(votequorum_handle_t handle)
cs_error_t votequorum_qdisk_poll(votequorum_handle_t handle, unsigned int state)
cs_error_t votequorum_qdisk_getinfo(votequorum_handle_t handle, struct
votequorum_qdisk_info *info)
This group of calls manages an optional extra quorum device, often a quorum disk. There can only be
one quorum device per cluster and it is up to the quorum device software to ensure that its state is
consistent across the cluster or to remove it from quorum. The device can have any number of votes,
but those votes cannot be changed without re-registering the device – doing this might cause a
temporary loss of quorum. Quorum devices must poll corosync regularly to be regarded as alive; by
default, if a quorum device has not called votequorum_qdisk_poll() for ten seconds it will be
regarded as dead and removed from quorum. This poll period is customisable, see below.
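
As a rough sketch of that lifecycle (assuming a handle from votequorum_initialize(); treating the state
argument as a simple 0/1 for unavailable/available is also an assumption, and check_shared_disk() is a
hypothetical health check):

/* Register a quorum device worth one vote, then keep telling
 * votequorum it is alive, well within the 10-second default. */
static void qdisk_loop(votequorum_handle_t handle)
{
        extern int check_shared_disk(void);    /* hypothetical */

        if (votequorum_qdisk_register(handle, "qdisk", 1) != CS_OK)
                return;

        for (;;) {
                votequorum_qdisk_poll(handle, check_shared_disk() ? 1 : 0);
                sleep(2);                      /* needs <unistd.h> */
        }

        /* votequorum_qdisk_unregister(handle) on a clean shutdown */
}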
cs_error_t votequorum_setstate(votequorum_handle_t handle)
This call sets a non-resettable flag “HasState” in votequorum. This state is available from the getinfo
call above and is used by some subsystems to prevent cluster merges. It only makes sense to call this if
the dreaded “disallowed” mode is enabled (see the configuration variable below).


cs_error_t votequorum_trackstart(votequorum_handle_t handle, uint64_t context, unsigned int
flags)
cs_error_t votequorum_trackstop(votequorum_handle_t handle)
These are the usual tracking start/stop calls for a corosync plugin. Trackstart enables quorum or
expected_votes change messages to be sent to the notification callback. To avoid race conditions it is
recommended that the flags CS_TRACKSTART | CS_TRACKCHANGES are used so the program
will get one immediate callback containing the current quorum state, then further callbacks as nodes
join or leave the cluster.
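
Putting that together with the dispatch loop from earlier, a sketch of the tracking sequence looks like
this (a fragment; the flag names are as given above, and the callback itself is the one registered in
votequorum_initialize(), whose exact signature you should take from votequorum.h):

/* Ask for one immediate callback with the current quorum state, then
 * further callbacks as nodes join or leave, delivered from inside
 * votequorum_dispatch(). */
uint64_t context = 0;    /* handed back to the callback; see context_get/set below */

votequorum_trackstart(handle, context, CS_TRACKSTART | CS_TRACKCHANGES);

/* ... run the poll()/votequorum_dispatch() loop shown earlier ... */

votequorum_trackstop(handle);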


cs_error_t votequorum_leaving(votequorum_handle_t handle)
This call sets a flag inside all nodes in the cluster telling them that the node will be leaving the cluster
soon. When the node actually does leave the cluster, then quorum will be reduced and activity will
carry on. This is like 'cman_tool leave remove' in cman. If this is not called before removing a node
then the number of active votes will be reduced and the cluster could lose quorum.
Note that this call does not remove the node from the cluster; it merely sets a flag saying it will leave.
There is a timeout on this flag of ten seconds. If the node is not shut down within this time, the flag
will be cancelled. It is important that this timeout is longer than cfg's shutdown timeout, otherwise the
leave will be cancelled if daemons don't respond.
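
A tiny sketch of the intended order of operations (shut_this_node_down() is a hypothetical stand-in for
whatever shutdown path you normally use):

/* Tell the other nodes we are about to leave, so expected votes drop
 * and the remaining cluster keeps quorum ... */
if (votequorum_leaving(handle) == CS_OK) {
        /* ... then actually shut down within leaving_timeout
         * (ten seconds by default) or the flag is cancelled. */
        shut_this_node_down();
}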


cs_error_t votequorum_context_get(votequorum_handle_t handle, void **context)
cs_error_t votequorum_context_set(votequorum_handle_t handle, void *context)
These two calls simply allow the context variable returned to the callbacks to be changed.



7.2.1. configuring votequorum
The votequorum module has a small number of variables that you can set in objdb. They are all held
under the main quorum{} section.


       quorumdev_poll
              The time between quorum device polls, in milliseconds. The default is 10000.

       leaving_timeout
              If the node is not shut down within <leaving_timeout> milliseconds of
              votequorum_leaving() being called then the leave flag will be cancelled.
              The default is 10000.

       votes
              The number of votes this node has. The default is 1.

       expected_votes
              The number of votes that this cluster expects. votequorum will use the largest value
              seen across all the nodes in the cluster. The default is 1024, ie you will not have
              quorum unless you change it!

       two_node
              Enables a special two-node mode where a cluster with only two nodes and no quorum
              device can continue running with only one node. This expects you to have hardware
              fencing configured. two_node mode can be cleared without a cluster restart (ie you can
              add a third node) but it cannot be set without restarting all nodes. The default is 0.

       disallowed
              This is an odd mode that prevents two clusters with state (see setstate() above) from
              merging. If this happens then nodes will be marked “disallowed” and not included in
              quorum calculations. Those who have read up on RHEL5 cman will know this mode and
              fear it. It cannot be changed without a cluster restart. The default is 0.
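
For example, an illustrative corosync.conf fragment only; the service{} stanza and a quorum provider
from section 7.1 are still needed, and these values are just to show the layout:

quorum {
       quorumdev_poll: 10000
       leaving_timeout: 10000
       expected_votes: 3
       two_node: 0
}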

8. Configuration-free clusters
Configuring a cluster is hard work. Don't you wish you could just plug in a node and have it “just
work”? Well, within several limitations you can. This mode has existed in cman since RHEL4 but is
not well documented (oops, maybe that's my fault) and is hardly used to my knowledge.
The key command to remember is
# cman_tool join -X


On a freshly installed system this will fail with the error:


corosync died: Could not read cluster configuration


OK, so it's not that configuration-free. You don't get something for nothing in this life. However, don't
despair. All you need to do is generate a security key for the cluster and try again:


# mkdir /etc/cluster
# corosync-keygen
# mv /etc/ais/authkey /etc/cluster/cman_authkey


You now need to copy this file to all the nodes you want to join in this cluster. Once you've done that
try cman_tool join -X again and it should magically start up. If you start up the other nodes then they
will join the same cluster.
cman_tool join -X sets up several defaults to get you going. Maybe the first thing you will notice is that
the node numbers are HUGE! They are the IP addresses of the nodes. This isn't a problem; it just looks
a little unwieldy, especially on little-endian systems, where adjacently numbered nodes might appear to
have random and massively different nodeids, eg:


# cman_tool nodes
Node Id         Sts    Name
16951488        M      amy
33728704        M      anna
50505920        M      clara
67283136        M      fanny
117614784       M      nadia
134392000       M      sofia


Anyway, if you can cope with those, this system should get you going with a simple single cluster that
has no fencing requirements (ie GFS cannot be used).
You can override several things on the cman_tool command-line to add things like sensible nodeids
and a different cluster name or cluster ID. The last two will allow you to run multiple configuration-
less clusters on a single network. By default all nodes will join a cluster called “RHCluster”.
Actually, you can change the nodeids by adding -n<nodeid> to the cman_tool command-line, but that
probably doesn't fit in too well with the zero-configuration system I suppose. See the cman_tool man
page for a list of options.
Quorum is the big problem with this system. There is no way to know how many nodes are needed in
such a cluster. You can set expected_votes on the cman_tool command-line, and I urge you to do so,
but without this quorum will default to 1. Yes ONE. Your cluster will always be quorate. I know this
goes against the principles of a lot of clustering but there are two main reasons why it is done this way.
1. It's easy to get going. A lot of the time, this mode will be used by people just testing clusters and
they want to get going and see what all the fuss is about. Once they understand clusters they can get
involved in sorting out quorum, fencing and configuration files.
2. A lot of the time these clusters are run as virtual machines in systems like Xen or KVM. In this case
there is no danger of a “split-brain” occurring as the systems are either all on the same host computer or
on nodes that are already part of a properly configured cluster with real hardware fencing.



9. Migrating code from libcman
This chapter is just a list of libcman calls showing you where in corosync to find equivalents. Most of
them have very similar names so it shouldn't be too hard I hope. The compatibility libcman provides
most of these calls so you might like to look at the source code for that library if you are unsure about
how to achieve a particular result.

cman_admin_init
cman_init
cman_finish
cman_setprivdata
cman_getprivdata
cman_start_notification
cman_stop_notification
cman_start_confchg
cman_stop_confchg
cman_get_fd
cman_dispatch

Use the equivalent calls for the subsystem(s) you are using.
cman_set_votes
cman_set_expected_votes
cman_set_dirty
cman_register_quorum_device
cman_unregister_quorum_device
cman_poll_quorum_device
cman_get_quorum_device
cman_get_node_count
cman_get_nodes
cman_get_disallowed_nodes
cman_get_node

Use libvotequorum. Note though that this does not provide you with node names for the cman_get_
calls.
cman_is_quorate

Use libquorum, or libvotequorum. You should only use libvotequorum if you are also using other
features of that module. If all you need to know is the quorum status then you should use libquorum as
that will work regardless of quorum provider.
cman_get_version
cman_set_version
cman_get_cluster
cman_set_debuglog

Use libconfdb or libccs to read/write to the objdb.
cman_kill_node
cman_shutdown
cman_replyto_shutdown
cman_get_node_addrs

Use libcfg
cman_leave_cluster

If you just need to leave the cluster then libcfg will work. If you need to leave the cluster and make the
remaining nodes keep quorum then you also need to tell votequorum.

cman_send_data
cman_start_recv_data
cman_end_recv_data
Use libcpg
cman_get_node_extra
cman_get_subsys_count
cman_is_active
cman_is_listening
cman_barrier_register
cman_barrier_change
cman_barrier_wait
cman_barrier_delete
cman_get_extra_info
cman_get_fenceinfo
cman_node_fenced

These calls are not implemented at all, sorry. You're on your own.

Mais conteúdo relacionado

Semelhante a Whither cman

Automated Deployment using Open Source
Automated Deployment using Open SourceAutomated Deployment using Open Source
Automated Deployment using Open Sourceduskglow
 
OS Lab Manual.pdf
OS Lab Manual.pdfOS Lab Manual.pdf
OS Lab Manual.pdfQucHunh15
 
ABC Present-Service-Mesh.pptx
ABC Present-Service-Mesh.pptxABC Present-Service-Mesh.pptx
ABC Present-Service-Mesh.pptxBrodyMitchum
 
why c++11?
why c++11?why c++11?
why c++11?idrajeev
 
Trigger maxl from fdmee
Trigger maxl from fdmeeTrigger maxl from fdmee
Trigger maxl from fdmeeBernard Ash
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Applicationsupertom
 
7 ways to execute scheduled jobs with python
7 ways to execute scheduled jobs with python7 ways to execute scheduled jobs with python
7 ways to execute scheduled jobs with pythonHugo Shi
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapakapa rohit
 
Scaling and Embracing Failure: Clustering Docker with Mesos
Scaling and Embracing Failure: Clustering Docker with MesosScaling and Embracing Failure: Clustering Docker with Mesos
Scaling and Embracing Failure: Clustering Docker with MesosRob Gulewich
 
Piattaforma Web Linux completa dai sorgenti
Piattaforma Web Linux completa dai sorgentiPiattaforma Web Linux completa dai sorgenti
Piattaforma Web Linux completa dai sorgentiGiulio Destri
 
Austin Web Architecture
Austin Web ArchitectureAustin Web Architecture
Austin Web Architecturejoaquincasares
 
Connections fornewbies
Connections fornewbiesConnections fornewbies
Connections fornewbiesr4ttl3r
 
Dotnet Basics Presentation
Dotnet Basics PresentationDotnet Basics Presentation
Dotnet Basics PresentationSudhakar Sharma
 
dotCloud (now Docker) Paas under the_hood
dotCloud (now Docker) Paas under the_hood dotCloud (now Docker) Paas under the_hood
dotCloud (now Docker) Paas under the_hood Susan Wu
 
How to write shared libraries!
How to write shared libraries!How to write shared libraries!
How to write shared libraries!Stanley Ho
 

Semelhante a Whither cman (20)

Readme
ReadmeReadme
Readme
 
Automated Deployment using Open Source
Automated Deployment using Open SourceAutomated Deployment using Open Source
Automated Deployment using Open Source
 
OS Lab Manual.pdf
OS Lab Manual.pdfOS Lab Manual.pdf
OS Lab Manual.pdf
 
Rac&asm
Rac&asmRac&asm
Rac&asm
 
Operating system lab manual
Operating system lab manualOperating system lab manual
Operating system lab manual
 
ABC Present-Service-Mesh.pptx
ABC Present-Service-Mesh.pptxABC Present-Service-Mesh.pptx
ABC Present-Service-Mesh.pptx
 
why c++11?
why c++11?why c++11?
why c++11?
 
Trigger maxl from fdmee
Trigger maxl from fdmeeTrigger maxl from fdmee
Trigger maxl from fdmee
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Application
 
Php mysql-tutorial-en
Php mysql-tutorial-enPhp mysql-tutorial-en
Php mysql-tutorial-en
 
7 ways to execute scheduled jobs with python
7 ways to execute scheduled jobs with python7 ways to execute scheduled jobs with python
7 ways to execute scheduled jobs with python
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
 
Scaling and Embracing Failure: Clustering Docker with Mesos
Scaling and Embracing Failure: Clustering Docker with MesosScaling and Embracing Failure: Clustering Docker with Mesos
Scaling and Embracing Failure: Clustering Docker with Mesos
 
Piattaforma Web Linux completa dai sorgenti
Piattaforma Web Linux completa dai sorgentiPiattaforma Web Linux completa dai sorgenti
Piattaforma Web Linux completa dai sorgenti
 
Austin Web Architecture
Austin Web ArchitectureAustin Web Architecture
Austin Web Architecture
 
Connections fornewbies
Connections fornewbiesConnections fornewbies
Connections fornewbies
 
Dotnet Basics Presentation
Dotnet Basics PresentationDotnet Basics Presentation
Dotnet Basics Presentation
 
dotCloud (now Docker) Paas under the_hood
dotCloud (now Docker) Paas under the_hood dotCloud (now Docker) Paas under the_hood
dotCloud (now Docker) Paas under the_hood
 
Php simple
Php simplePhp simple
Php simple
 
How to write shared libraries!
How to write shared libraries!How to write shared libraries!
How to write shared libraries!
 

Mais de jorg_marq

Clase9 threads
Clase9 threadsClase9 threads
Clase9 threadsjorg_marq
 
Clase8 innerclasses
Clase8 innerclassesClase8 innerclasses
Clase8 innerclassesjorg_marq
 
Clase7 generics
Clase7 genericsClase7 generics
Clase7 genericsjorg_marq
 
Clase6 collections
Clase6 collectionsClase6 collections
Clase6 collectionsjorg_marq
 
Clase5 controldeflujo
Clase5 controldeflujoClase5 controldeflujo
Clase5 controldeflujojorg_marq
 
Clase4 operadores
Clase4 operadoresClase4 operadores
Clase4 operadoresjorg_marq
 
Clase3 asignaciones
Clase3 asignacionesClase3 asignaciones
Clase3 asignacionesjorg_marq
 
Clase2 ejemplosdeenumpoo
Clase2 ejemplosdeenumpooClase2 ejemplosdeenumpoo
Clase2 ejemplosdeenumpoojorg_marq
 
Clase1 introduccinalcurso
Clase1 introduccinalcursoClase1 introduccinalcurso
Clase1 introduccinalcursojorg_marq
 
Capitulos 8 9-10
Capitulos 8 9-10Capitulos 8 9-10
Capitulos 8 9-10jorg_marq
 
Clase10 stringsio
Clase10 stringsioClase10 stringsio
Clase10 stringsiojorg_marq
 
Ex300 objectives
Ex300   objectivesEx300   objectives
Ex300 objectivesjorg_marq
 
Ex200 objectives
Ex200   objectivesEx200   objectives
Ex200 objectivesjorg_marq
 

Mais de jorg_marq (20)

Clase9 threads
Clase9 threadsClase9 threads
Clase9 threads
 
Clase8 innerclasses
Clase8 innerclassesClase8 innerclasses
Clase8 innerclasses
 
Clase7 generics
Clase7 genericsClase7 generics
Clase7 generics
 
Clase6 collections
Clase6 collectionsClase6 collections
Clase6 collections
 
Clase5 controldeflujo
Clase5 controldeflujoClase5 controldeflujo
Clase5 controldeflujo
 
Clase4 operadores
Clase4 operadoresClase4 operadores
Clase4 operadores
 
Clase3 asignaciones
Clase3 asignacionesClase3 asignaciones
Clase3 asignaciones
 
Clase2 ejemplosdeenumpoo
Clase2 ejemplosdeenumpooClase2 ejemplosdeenumpoo
Clase2 ejemplosdeenumpoo
 
Clase1 introduccinalcurso
Clase1 introduccinalcursoClase1 introduccinalcurso
Clase1 introduccinalcurso
 
Capitulos 8 9-10
Capitulos 8 9-10Capitulos 8 9-10
Capitulos 8 9-10
 
Capitulo 7
Capitulo 7Capitulo 7
Capitulo 7
 
Capitulo 6
Capitulo 6Capitulo 6
Capitulo 6
 
Capitulo 5
Capitulo 5Capitulo 5
Capitulo 5
 
Capitulo 4
Capitulo 4Capitulo 4
Capitulo 4
 
Capitulo 3
Capitulo 3Capitulo 3
Capitulo 3
 
Capitulo 2
Capitulo 2Capitulo 2
Capitulo 2
 
Capitulo 1
Capitulo 1Capitulo 1
Capitulo 1
 
Clase10 stringsio
Clase10 stringsioClase10 stringsio
Clase10 stringsio
 
Ex300 objectives
Ex300   objectivesEx300   objectives
Ex300 objectives
 
Ex200 objectives
Ex200   objectivesEx200   objectives
Ex200 objectives
 

Último

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 

Último (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 

Whither cman

  • 1. Whatever happened to cman? Christine “Chrissie” Caulfield, Red Hat ccaulfie@redhat.com Version history 0.1 30th January 2009 First draft 0.2 3rd February 2009 Add a chapter on migrating from libcman 0.2 27th February 2009 Add some commas and fix some typos
  • 2. 1. Introduction CMAN was the mainstay of Red Hat's cluster suite in Red Hat Enterprise Linux 4, 5 and 6. Actually CMAN in RHEL4 was a totally different thing to that found in RHEL5 and 6 but the external interfaces looked very similar. So what was CMAN? CMAN stood for “Cluster MANager” (boring eh?) and provided the low-level infrastructure for the cluster; node communications, votes, quorum etc. Most of the useful bits of the cluster stack were built on top of cman so few people got to see it unless things went horribly wrong. Starting with RHEL5 we dispensed with the custom-written kernel CMAN and based the cluster manager on OpenAIS. For more details about this see my document on the subject: http://people.redhat.com/ccaulfie/docs/aiscman.pdf. OK, a short summary then. CMAN in RHEL 5/6 is really just a quorum module that loads into openais (these days known as corosync). It provides other API services such as messaging but those are really just wrappers around other openais services. At the cluster summit in 2008 it was decided that keeping companies' own clustering modules was a waste of effort and almost always fails to build a community. Even though CMAN (all versions of it) were released as free software, they lived in their own repositories and were seldom modified by the original author (ie me!). So efforts were made to make the software even more open and included in the components of their parent projects. Also part of the collaboration efforts were the pooling of resource and fence agents and several others. But this is just about CMAN OK? Come on now, concentrate. 2. corosync/openais – what happened there? In Red Hat Enterprise Linux 5 cman was a module for openais. In RHEL6 (we're not actually supposed to call it RHEL, but “Red Hat Enterprise Linux” is a lot of typing. And even if I do a search/replace afterwards it'll be unwieldy to read ... sorry marketing, but this is an engineering paper, not a sales one) it's a module for corosync. So, what happened there? Just in case you weren't already confused enough, openais has now been split into two parts. The parts still called openais are the modules for corosync that implement the SAF AIS services and APIs – you know: the ones with the VeryAnnoyinglyCapitalisedNames. Corosync is the communications layer below that, but it also includes other services such as cpg (Closed Process Groups ... very popular amongst service writers than one) and, now, quorum. Most people won't really notice this split, apart from, possibly, the name of the process in 'ps' output. Just to put a little bit of detail on this, the services that are included with corosync are: cfg cpg evs quorum
  • 3. Services included in openais are: Amf Ckpt Clm Evt Lck Msg (see what I mean about the capitalisation ... and that's just the names!) corosync is the name of the daemon, replacing aisexec. I hope that's all clear because it gets more complicated from here on in. 3. Intermediate status – RHEL6 Before getting onto the really new code, I'll talk a little about the RHEL6 code which is sort of a half- way house between RHEL5 and the code that will be in RHEL7. I suppose that makes some sort of sense, numerically if nothing else. The first thing you'll notice about RHEL6 is that ccsd is not there any more. Actually I very much doubt it'll be really the very first thing you notice unless you're incredibly sad or a real clustering nerd, but I do hope you might have spotted it at some stage. I'll talk (well, write – unless you want to invite me, all expenses paid, to give a presentation) about that in the next chapter. Apart from this, you will notice very little difference in the RHEL6 cman. We were under pressure to allow rolling upgrades from RHEL5 (ie no wire protocol changes) and time constraints mean that the code discussed below was not stable enough to be included for the deadline. The only other substantial difference is that cman now is a quorum provider for corosync. I'll also write more about this this later on (you're really agog now aren't you?), but basically it means that the rest of the corosync modules will know about the quorum status and can act on it if they feel the urge, and that applications that are only interested in the quorum status and node list can link straight into libquorum and bypass cman altogether. This might help some people with the upgrade path ... or not ... we'll see. 4. Configuring corosync This chapter is also relevant to RHEL6 by the way, just in case you were going to skip the rest of the document and have an early night. So, as I mentioned above, ccsd is now gone and I can hear the cheers already. The distribution of cluster.conf files is now delegated to ricci and a new command ccs_sync. I will not focus too closely on that bit any more but it brings me to the next main change in the way CMAN (as it still is at this stage) is loaded. Actually you don't have to use ccs_sync to distribute cluster config files, conga, scp, nfs and several other methods work perfectly well. Pick that one that suits you best; “choice is good” as the free-marketeers love to say. The way that corosync (not aisexec remember?) is launched is still via cman_tool as before. But it now sets a few environment variables first. The most important of these tells corosync which configuration modules to use, and there are up to three of those now. The first, xmlconfig, simply loads the contents of /etc/cluster/cluster.conf into the corosync object database, where it keeps all the configuration information. The second, cman-preconfig, interprets that format into a format that is native to corosync by resolving nodenames, generating multicast addresses etc.
  • 4. 4.1. Those modules in full Corosync config modules are different from the normal service modules that corosync uses. They are specified in the environment variable COROSYNC_DEFAULT_CONFIG_IFACE before corosync is started. This variable can contain a colon-separated list of modules that will be called before corosync is started for real. They also manage the reloading of data when the underlying configuration file or device changes. By default cman_tool starts corosync with COROSYNC_DEFAULT_CONFIG_IFACE=xmlconfig:cmanpreconfig:openaisserviceenable 4.2. xmlconfig & friends xmlconfig is simply an XML parser with a default filename of /etc/cluster/cluster.conf. It reads the tree of objects it finds in that file and loads it, unmodified, into corosync's object database (objdb). There are a few other config modules that can be used here. The is an ldap one (called, rather obviously I hope, ldapconfig) and even a ccsd one too if you're hopelessly addicted to the past. These modules are quite easy to write and if you have a different back-end database you want to use it's probably quite simple to write a corosync config module for it. Just in case you were wondering, objdb is simply an in-memory hierarchical database where corosync holds configuration data and some state. The point about xmlconfig and it's similarly-named friends (actually, maybe they're relatives ... hmm) is that they do not try to interpret the data, they simply copy what is there into objdb. If it makes no sense to corosync then it will tell you when it tries to start up properly shortly afterwards. You don't need cman-preconfig (see below) to use these if the format of the data matches the hierarchy of corosync.conf(5). cman_tool join allows you to specify a different main config module than xmlconfig on the command- line with cman_tool join -C <module>. It will always add cman-preconfig though. eg the command: # cman_tool join -C ldapconfig will load corosync with the environment variable: COROSYNC_DEFAULT_CONFIG_IFACE=ldapconfig:cmanpreconfig:openaisserviceenable 4.3 cman-preconfig Cman-preconfig is a rather strange beast in some ways. It's main purpose is to re-interpret the loaded configuration file. Let me endeavour to explain. CMAN since RHEL4 has a very specific XML-based format which has not changed significantly in the meantime. It contains details of all the nodes in the cluster, along with their fence details and also rgmanager services too. This format bears no resemblance to corosync.conf. If you load a normal RHEL cluster.conf into corosync using xmlconfig alone, without passing it through cman-preconfig then corosync will not start. It will not be able to find a single entry that it needs to run. cman-preconfig re-interprets this configuration data into a form that corosync expects, and adds some sensible, we hope, defaults for things not specified at all, eg the cluster multicast address. It also has a
  • 5. mode where it can start up cman/corosync with no configuration data at all (see cman_tool -X) which is very useful for time-poor or lazy people who just want to test the cluster a little, or perhaps for commissioning virtualised clusters. There's a whole chapter on this later on. 4.4. openaisserviceenable This module simply tells corosync to also load the openais services as well it its own default set. Normally this is what you will want as many cluster daemons use openais services like checkpointing. You can disable this by adding the -A switch to cman_tool join. 5. CMAN itself Now we get to the nitty-gritty. The CMAN module exists in RHEL5 and RHEL6 and does all the cman-y things that cman does such as quorum and messaging. It also services the libcman library using an IPC mechanism all of its own. It does not use corosync's IPC (library to daemon communications) system for reasons that are unlikely to become clear any more. The new code, as I've said, dispenses with anything called cman at all. The main work of managing quorum is done by a built-in corosync module called votequorum. This code is actually largely lifted from the original cman module, with all the non-quorum bits removed, and converted to use the corosync IPC layer. This module also includes code for managing quorum devices (eg qdiskd) as well as the hated “Disallowed” mode. Though both are inactive in a default corosync system. At the time of writing libcman still exists. It provides the same API as the libcman in previous versions though there might be small semantic differences. It's also not ABI compatible so the soname has been incremented to 4. This libcman uses a combination of calls to quorum, confdb and a custom cman module to provide these services. It is provided largely as a compatibility layer for older applications though the source code to libcman can also provide ideas on how to do cman-y type things with the existing corosync services. All the programs in Red Hat clustering will be changed to use the standard APIs that are provided with corosync. 6. The CMAN module Now hang on there, you just said “a custom cman module”. So cman does still exist? Well, yes and no. There is a module called “cmanbits” which provides services for some libcman APIs. These are: cman_send_data, cman_start_recv_data
  • 6. cman_end_recv_data cman_is_listening Of these, the messaging API can be replaced with the normal openais messaging system. There is no suitable replacement for is_listening. Missing totally from this new cman is the barriers API which nobody used, to my knowledge. If you did and are fuming with me for leaving it out ... sorry. But there are actually better ways of achieving the same effect with existing corosync or openais calls. Some other functions that were in libcman are now provided by the corosync cfg service. These are corosync_cfg_kill_node corosync_cfg_try_shutdown corosync_cfg_replyto_shutdown corosync_cfg_get_node_addrs The are wrapped in the new libcman but are deprecated. 7. corosync quorum 7.1. quorum plugins A brief word about corosync's quorum system might he helpful here. Corosync has a basic notion of quorum built in to the executive. Services cannot be synchronised without it and several IPC calls will be blocked if the cluster is inquorate. By default corosync does not load the quorum module. This is for backwards compatibility so that existing clusters will work. If the quorum module was loaded and no quorum provider was specified then the cluster would never be quorate and never get any work done. Not a popular system I would guess. If the quorum module is absent then everything just works all the time. To load the quorum module, you need to add the following stanza to corosync.conf (or equivalent): service { name: corosync_quorum ver:0 } cman-preconfig does this for you by the way. Corosync will then load the quorum module and allow users to use the basic quorum API as defined in /usr/include/corosync/quorum.h. This simply allows programs to query the quorate state and to receive notifications when it changes. This notification also includes a list of nodes that are part of the quorum. It's useful (I hope) to realise that this plugin is also available in RHEL6 and that cman provides quorum for it. So if you only need to know the quorum state (ie you don't need any of the other libcman gubbins) then calling into libquorum instead will provide you with forward-compatibility.
However, as I mentioned above (or on the previous page if you were ungreen enough to print this document out), this will not give you quorum; all it will do is block a lot of useful activity in corosync. To make it work you need a quorum provider.

A quorum provider is a corosync module that tells the corosync quorum plugin when the quorate state changes, based on its own internal algorithm. There are many ways of calculating quorum, so there are several modules available, and I hope people will write more to suit their own needs. The quorum provider is loaded by putting its name in corosync.conf as follows:

quorum {
    provider: testquorum
}

This example, as I sincerely hope you have guessed, loads the 'testquorum' module. Testquorum is an extremely simple module which I recommend as a template to other quorum provider writers. It simply changes the corosync quorate status based on the quorum.quorate variable in the object database. Using corosync-objctl the quorate state can be switched on and off at will. If you have some form of external quorum system this might actually be useful to you.

Another quorum provider shipped with corosync is YKD. This is the YKD system that has been shipped with openais for some time; it has merely been re-designated a quorum plugin now. Sadly the developers did not have time to give it a shiny new logo too, so it looks pretty much like it always did. To be quite frank, I know almost nothing about YKD and it scares me a little, so unless a knight in shining armour (ideally looking like David Tennant) comes along to rescue me I'll leave it there.

Now, where was I? Sorry, I got distracted by the David Tennant reference. Oh yes, votequorum is the quorum module based on cman. This is the one I know most about and is probably of the most interest to those of you coming from a cman background. So I'll devote an entire subsection to it. Favouritism? Pah!

7.2. votequorum

Votequorum is the name of the corosync module that replaces most of what cman used to do. Only it does it in a more 'corosync-y' way. It uses the corosync IPC system for communication with users, it provides quorum to the central core of the corosync daemon and provides familiar tracking callbacks. I'll outline the API calls for votequorum here. If you are familiar with libcman then you can go and get a hot chocolate or something while this bit is going on ... see you in chapter 8.

cs_error_t votequorum_initialize(votequorum_handle_t *handle, votequorum_callbacks_t *callbacks)
cs_error_t votequorum_finalize(votequorum_handle_t handle)

These two are the usual corosync initialise & finalise calls that corosync/openais developers should already be familiar with. libcman users will note that the callbacks are specified here rather than in later API calls. The callbacks are for quorum change or expected votes change – remember votequorum does not have a messaging API.
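To show roughly what that looks like in practice, here is a sketch of setting up a connection with a quorum-change callback. Be warned: the callback structure member and prototype names below are my assumptions rather than gospel, so treat votequorum.h as the authority; the dispatch loop that actually delivers the callback uses the two calls described next.

/* Sketch only: initialise votequorum with a quorum-change callback.
 * The callback struct member name and the notification prototype are
 * assumed here; consult corosync/votequorum.h for the real definitions. */
#include <stdio.h>
#include <corosync/corotypes.h>
#include <corosync/votequorum.h>

static void quorum_changed(votequorum_handle_t handle, uint64_t context,
                           uint32_t quorate,
                           uint32_t node_list_entries,
                           votequorum_node_t node_list[])
{
    printf("quorum is now %s (%u nodes in list)\n",
           quorate ? "set" : "lost", node_list_entries);
}

int main(void)
{
    votequorum_callbacks_t callbacks = {
        .votequorum_notify_fn = quorum_changed, /* assumed member name */
    };
    votequorum_handle_t handle;

    if (votequorum_initialize(&handle, &callbacks) != CS_OK) {
        fprintf(stderr, "votequorum_initialize failed\n");
        return 1;
    }

    /* ... votequorum_trackstart() and a fd_get()/dispatch() loop go here ... */

    votequorum_finalize(handle);
    return 0;
}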
cs_error_t votequorum_fd_get(votequorum_handle_t handle, int *fd)
cs_error_t votequorum_dispatch(votequorum_handle_t handle, cs_dispatch_flags_t dispatch_types)

Again, these should be a familiar sort of call to both corosync and libcman users. You call votequorum_dispatch when the FD returned from votequorum_fd_get becomes active from select or poll.

cs_error_t votequorum_getinfo(votequorum_handle_t handle, unsigned int nodeid, struct votequorum_info *info)

This call fills in the info structure with information about the quorum subsystem and the node for which you requested information. Members starting node_ are specific to the node in question, the others are for the cluster as a whole.

cs_error_t votequorum_setexpected(votequorum_handle_t handle, unsigned int expected_votes)

This call changes the expected_votes value for the cluster. This will persist until it is changed again. NOTE: if you are using a configuration file then a forced reload of that file will overwrite a manually set votes value, as will adding a new node to the cluster.

cs_error_t votequorum_setvotes(votequorum_handle_t handle, unsigned int nodeid, unsigned int votes)

This call changes the number of votes assigned to an existing node in the cluster. This will persist until it is changed again. NOTE: if you are using a configuration file then a forced reload of that file will overwrite a manually set votes value.

cs_error_t votequorum_qdisk_register(votequorum_handle_t handle, char *name, unsigned int votes)
cs_error_t votequorum_qdisk_unregister(votequorum_handle_t handle)
cs_error_t votequorum_qdisk_poll(votequorum_handle_t handle, unsigned int state)
cs_error_t votequorum_qdisk_getinfo(votequorum_handle_t handle, struct votequorum_qdisk_info *info)

This group of calls manages an optional extra quorum device, often a quorum disk. There can only be one quorum device per cluster and it is up to the quorum device software to ensure that its state is consistent across the cluster, or to remove it from quorum. The device can have any number of votes, but those votes cannot be changed without re-registering the device – doing this might cause a temporary loss of quorum. Quorum devices must poll corosync regularly to be regarded as alive; by default, if a quorum device has not called votequorum_qdisk_poll() after ten seconds then it will be regarded as dead and removed from quorum. This poll period is customisable, see below.
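Putting those quorum device calls together, a quorum-device daemon might be structured along the lines of the sketch below. The device name “mydisk” is made up, and I am assuming that passing 1 as the state argument simply means “the device is healthy” – check the header and the qdiskd sources for the real convention:

/* Skeleton of a quorum-device daemon, using the qdisk calls listed above.
 * The meaning of the 'state' argument (1 = device healthy) is assumed;
 * the device name "mydisk" is purely illustrative. */
#include <unistd.h>
#include <corosync/corotypes.h>
#include <corosync/votequorum.h>

int main(void)
{
    votequorum_handle_t handle;

    if (votequorum_initialize(&handle, NULL) != CS_OK)
        return 1;

    /* One quorum device per cluster; here it carries a single vote. */
    if (votequorum_qdisk_register(handle, "mydisk", 1) != CS_OK) {
        votequorum_finalize(handle);
        return 1;
    }

    for (;;) {
        int healthy = 1;   /* replace with a real check of the shared device */

        /* Must be called more often than quorumdev_poll (10s by default),
         * otherwise the device is declared dead and dropped from quorum. */
        votequorum_qdisk_poll(handle, healthy);
        sleep(2);
    }

    /* not reached */
    votequorum_qdisk_unregister(handle);
    votequorum_finalize(handle);
    return 0;
}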
cs_error_t votequorum_setstate(votequorum_handle_t handle)

This call sets a non-resettable flag “HasState” in votequorum. This state is available from the getinfo call above and is used by some subsystems to prevent cluster merges. It only makes sense to call this if the dreaded “disallowed” flag is set (see the configuration variable below).

cs_error_t votequorum_trackstart(votequorum_handle_t handle, uint64_t context, unsigned int flags)
cs_error_t votequorum_trackstop(votequorum_handle_t handle)

These are the usual tracking start/stop calls for a corosync plugin. Trackstart enables quorum or expected_votes change messages to be sent to the notification callback. To avoid race conditions it is recommended that the flags CS_TRACKSTART | CS_TRACKCHANGES are used, so the program will get one immediate callback containing the current quorum state, then further callbacks as nodes join or leave the cluster.

cs_error_t votequorum_leaving(votequorum_handle_t handle)

This call sets a flag inside all nodes in the cluster telling them that the node will be leaving the cluster soon. When the node actually does leave the cluster, quorum will be reduced and activity will carry on. This is like 'cman_tool leave remove' in cman. If this is not called before removing a node then the number of active votes will be reduced and the cluster could lose quorum. Note that this call does not remove the node from the cluster, it merely sets the flag saying it will. There is a timeout on this flag of ten seconds; if the node is not shut down within this time it will be cancelled. It is important that this timeout is longer than CFG's shutdown timeout, otherwise leaving will be cancelled if daemons don't respond.

cs_error_t votequorum_context_get(votequorum_handle_t handle, void **context)
cs_error_t votequorum_context_set(votequorum_handle_t handle, void *context)

These two calls simply allow the context variable returned to the callbacks to be changed.

7.2.1 configuring votequorum

The votequorum module has a small number of variables that you can set in objdb. They are all held under the main quorum{} section.

quorumdev_poll
This is the time between quorum device polls, in milliseconds. The default is 10000.

leaving_timeout
If the node is not shut down <leaving_timeout> milliseconds after votequorum_leaving is called then the leave flag will be cancelled. The default is 10000.

votes
The number of votes this node has. The default is 1.
expected_votes
The number of votes that this cluster expects. votequorum will use the largest value of all the nodes in the cluster. The default is 1024, ie you will not have quorum unless you change it!

two_node
Enables a special two-node mode where a cluster with only two nodes and no quorum device can continue running with only one node. This expects you to have hardware fencing configured. two_node mode can be cleared without a cluster restart (ie you can add a third node) but it cannot be set without restarting all nodes. The default is 0.

disallowed
This is an odd mode that prevents two clusters with state (see setstate() above) from merging. If this happens then nodes will be marked “disallowed” and not included in quorum calculations. Those who have read up on RHEL5 cman will know this mode and fear it. It cannot be changed without a cluster restart. The default is 0.

8. Configuration-free clusters

Configuring a cluster is hard work. Don't you wish you could just plug in a node and have it “just work”? Well, within several limitations, you can. This mode has existed in cman since RHEL4 but is not well documented (oops, maybe that's my fault) and is hardly used to my knowledge.

The key command to remember is

# cman_tool join -X

On a freshly installed system this will fail with the error:

corosync died: Could not read cluster configuration

OK, so it's not that configuration-free. You don't get something for nothing in this life. However, don't despair. All you need to do is generate a security key for the cluster and try again:

# mkdir /etc/cluster
# corosync-keygen
# mv /etc/ais/authkey /etc/cluster/cman_authkey

You now need to copy this file to all the nodes you want to join in this cluster. Once you've done that, try cman_tool join -X again and it should magically start up. If you start up the other nodes then they will join the same cluster.

cman_tool join -X sets up several defaults to get you going. Maybe the first thing you will notice is that the node numbers are HUGE! They are the IP addresses of the nodes. This isn't a problem, it just looks a little unwieldy, especially on little-endian systems where adjacently numbered nodes might appear to
have random and massively different nodeids, eg:

# cman_tool nodes
Node Id     Sts  Name
16951488    M    amy
33728704    M    anna
50505920    M    clara
67283136    M    fanny
117614784   M    nadia
134392000   M    sofia

Anyway, if you can cope with those, this system should get you going with a simple single cluster that has no fencing requirements (ie GFS cannot be used).

You can override several things on the cman_tool command-line to add things like sensible nodeids and a different cluster name or cluster ID. The last two will allow you to run multiple configuration-less clusters on a single network. By default all nodes will join a cluster called “RHCluster”. Actually, you can change the nodeids by adding -n<nodeid> to the cman_tool command-line, but that probably doesn't fit in too well with the zero-configuration system, I suppose. See the cman_tool man page for a list of options.

Quorum is the big problem with this system. There is no way to know how many nodes are needed in such a cluster. You can set expected_votes on the cman_tool command-line, and I urge you to do so, but without this quorum will default to 1. Yes, ONE. Your cluster will always be quorate. I know this goes against the principles of a lot of clustering but there are two main reasons why it is done this way.

1. It's easy to get going. A lot of the time, this mode will be used by people just testing clusters and they want to get going and see what all the fuss is about. Once they understand clusters they can get involved in sorting out quorum, fencing and configuration files.

2. A lot of the time these clusters are run as virtual machines in systems like Xen or KVM. In this case there is no danger of a “split-brain” occurring as the systems are either all on the same host computer or on nodes that are already part of a properly configured cluster with real hardware fencing.

9. Migrating code from libcman

This chapter is just a list of libcman calls showing you where in corosync to find equivalents. Most of them have very similar names so it shouldn't be too hard, I hope. The compatibility libcman provides most of these calls, so you might like to look at the source code for that library if you are unsure about how to achieve a particular result.

cman_admin_init
cman_init
cman_finish
cman_setprivdata
cman_getprivdata
cman_start_notification
cman_stop_notification
cman_start_confchg
cman_stop_confchg
cman_get_fd
cman_dispatch

Use the equivalent calls for the subsystem(s) you are using.

cman_set_votes
cman_set_expected_votes
cman_set_dirty
cman_register_quorum_device
cman_unregister_quorum_device
cman_poll_quorum_device
cman_get_quorum_device
cman_get_node_count
cman_get_nodes
cman_get_disallowed_nodes
cman_get_node

Use libvotequorum. Note though that this does not provide you with node names for the cman_get_ calls.

cman_is_quorate

Use libquorum, or libvotequorum. You should only use libvotequorum if you are also using other features of that module. If all you need to know is the quorum status then you should use libquorum, as that will work regardless of quorum provider.

cman_get_version
cman_set_version
cman_get_cluster
cman_set_debuglog

Use libconfdb or libccs to read/write to the objdb.

cman_kill_node
cman_shutdown
cman_replyto_shutdown
cman_get_node_addrs

Use libcfg.

cman_leave_cluster

If you just need to leave the cluster then libcfg will work. If you need to leave the cluster and make the remaining nodes keep quorum then you also need to tell votequorum.

cman_send_data
cman_start_recv_data
cman_end_recv_data