Cluster on Red Hat
Mon 22 January 2007
Tips for managing dual-control nodes

If you find yourself in a situation where you need to do hands-on management of the control nodes, here are a few bits of advice.

* This may sound a bit obvious, but the clumanager software depends on Collage’s ability to start itself, as defined in /etc/init.d/collage-core. In the unlikely event that Collage does not start, look at /var/log/messages and /var/log/cluster to determine what is causing the start-up problem, and fix any issues.

* To the extent that you need to manage the dual-control nodes, you should be familiar with a few clumanager commands:

  o clustat - shows the status of the failover cluster, the configured services, and the IP address of the active control node. Output from clustat looks like this:

      [sample clustat output]

  o clusvcadm - lets you enable, disable, relocate, and restart services in the failover cluster. Using clusvcadm requires that the failover cluster be operational (that is, the daemons are running and able to access the shared disk) on the node from which the command is invoked.

    A service can have one of the following states:

    - Pending – the service is transitioning to the running or disabled state.
    - Running – the service is online and being actively monitored.
    - Disabled – the service is not online and has been stopped. This state warrants your attention.
    - Stopped – the service is disabled, but will start when the failover cluster processes are started.
    - Failed – the service is not online. Again, look into this one.

* If clumanager fails over too many times in succession (failing back and forth between the two control nodes), it automatically stops the collage-core service and puts it in a disabled state. The clumanager software then uses chkconfig to set clumanager to off. After you debug and correct any problems with collage-core, you have to restart clumanager manually. To do so on a dual-control node system, use the following commands on both control nodes:

      rm /opt/cassatt/etc/pingpong.txt
      chkconfig clumanager on
      service clumanager start

* You can use the following command to verify that the clumanager services are running properly:

      service clumanager status

  If the output is similar to the following (PIDs will vary), the clumanager service is running properly:

      clumembd (pid 5144) is running…
      cluquorumd (pid 5138) is running…
      clulockd (pid 5155) is running…
      clusvcmgrd (pid 5209) is running…

* If for some reason you are in a shell and doing something on /cassatt (the mounted file system that houses all of the Collage-specific files, the system database, and so on), a failover will pull the rug out from under you. So, if your shell session just disappears, a failover might have occurred. This could leave your editing session, or whatever else you were doing on /cassatt, in an indeterminate state.
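
If you check the daemons often, the status check described above can be scripted rather than read by eye. What follows is only a minimal sketch: check_clumanager_status is a hypothetical helper name, not something shipped with clumanager. It assumes the status output uses the "name (pid N) is running" line format shown above, and that all four daemons listed there are expected on the node.

```shell
#!/bin/sh
# Hypothetical helper (not part of clumanager): reads the text printed by
# `service clumanager status` on stdin and checks that each of the four
# daemons named in this post is reported as running.
# Assumption: each daemon appears on a line like "clumembd (pid 5144) is running".
check_clumanager_status() {
    status_text=$(cat)
    missing=0
    for daemon in clumembd cluquorumd clulockd clusvcmgrd; do
        if ! printf '%s\n' "$status_text" |
            grep -q "$daemon (pid [0-9][0-9]*) is running"; then
            echo "not running: $daemon"
            missing=1
        fi
    done
    # Exit status 0 only if every daemon was reported as running.
    return "$missing"
}
```

On a control node you might then run `service clumanager status | check_clumanager_status` and look at the exit status. A non-zero result only says that a daemon line was missing from the output; the logs in /var/log/messages and /var/log/cluster are still the place to find out why.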