Products/HA/DellRedHatHALinuxCluster/Cluster

From DellLinuxWiki


Configuring Your Cluster

This section describes how to install and configure Red Hat Cluster Suite and Global File System on your Dell|Red Hat HA Cluster system using Conga and CLI tools.

Conga is a configuration and management suite based on a server/agent model. You can access the management server, luci, with a standard web browser from anywhere on the network. Luci communicates with the ricci agent on each cluster node to install all required packages, synchronize the cluster configuration file, and manage the storage cluster. Other methods exist, such as system-config-cluster or writing the XML configuration file by hand, but Conga is the recommended way to configure and manage your cluster.

Setting Up a High-Availability Cluster

The following section provides an overview of installing your cluster using Conga. For more information on using Conga, see the section Configuring Red Hat Cluster With Conga in the Cluster Administration guide at www.redhat.com/docs/manuals/enterprise/.

Preparing the Cluster Nodes for Conga

Run the following commands on each cluster node to install and start ricci:

  • Install ricci:
[root]# yum install ricci
  • Configure ricci to start on boot:
[root]# chkconfig ricci on
  • Start the ricci service:
[root]# service ricci start
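On a cluster with several nodes, the three ricci steps can be run from one machine in a loop. The bash function below is a minimal sketch, assuming passwordless root SSH to every node (this guide does not set that up); the function name prepare_ricci_nodes and the node names in the usage line are hypothetical. Setting SSH=echo prints the remote commands instead of running them.

```shell
# Sketch: install, enable, and start ricci on each node named on the command line.
# Assumes passwordless root SSH to every node (not covered by this guide).
# Override SSH (e.g. SSH=echo) to print the remote commands instead of running them.
prepare_ricci_nodes() {
    local ssh_cmd=${SSH:-ssh}
    local node
    for node in "$@"; do
        echo "== $node =="
        # yum -y answers the install prompt; && stops at the first failing step
        if ! $ssh_cmd "root@$node" "yum -y install ricci && chkconfig ricci on && service ricci start"; then
            echo "ricci setup failed on $node" >&2
            return 1
        fi
    done
}
```

For example:

[root]# prepare_ricci_nodes node1.example.com node2.example.com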

Execute the following commands on the management node to install the Conga server luci:

  • Install luci:
[root]# yum install luci

NOTE: You can configure luci on any node, but it is recommended that you install luci on a dedicated management node. A dedicated management node needs access to the Cluster channel on RHN or a Satellite server. Alternatively, you can log in to RHN and manually download luci for installation on your management node.

  • Initialize the luci server and assign the admin password:
[root]# luci_admin init
Initializing the luci server
  
Creating the 'admin' user
  
Enter password:
Confirm password:
  
Please wait...
The admin password has been successfully set.
Generating SSL certificates...
The luci server has been successfully initialized
  
You must restart the luci server for changes to take effect.
  
Run "service luci restart" to do so

  • Configure luci to start on boot:
[root]# chkconfig luci on
  • Start the luci service:
[root]# service luci start
Starting luci: Generating https SSL certificates...  done   [  OK  ]
 
Point your web browser to https://management.example.com:8084 to access luci

For more information on configuring your cluster with Conga, see the section Configuring Red Hat Cluster With Conga on the Red Hat website at www.redhat.com/docs/manuals/enterprise/ or locally from the Cluster_Administration-en-US package.

Creating Your Cluster Using Conga

Conga automatically installs the software required for clustering on all cluster nodes. Before proceeding, ensure you have completed the steps in Preparing the Cluster Nodes for Conga. Also verify that the steps in Configuring the Firewall were completed on each cluster node; otherwise, luci will not be able to communicate with ricci.


1. Connect to the luci server from any browser on the same network as the management node. In your web browser, enter:

https://{management_node_hostname_or_IP_address}:8084

Where {management_node_hostname_or_IP_address} is the hostname or IP address of the management server running luci.
NOTE: If you encounter any errors, see section Troubleshooting Conga.
2. Enter your username and password to securely log in to the luci server.
3. Go to the cluster tab.
4. Click Create a New Cluster.
5. Enter a cluster name of 15 characters or less.
6. Add the fully qualified private hostname or IP address and root password for each cluster node.
NOTE: You may also select Check if node passwords are identical and only enter the password for the first node.
NOTE: The password is sent over an encrypted SSL session and is not saved.
7. Ensure that the option for Enable Shared Storage Support is selected and click Submit.

Conga downloads and installs all the required cluster software, creates a configuration file, and reboots each cluster node. Watch the Conga status window for details on the progress of each cluster node.

NOTE: If an error message such as "An error occurred when trying to contact any of the nodes in the cluster" appears on the luci server web page, wait a few minutes and refresh your browser.

Configuring Fencing Using Conga

Fencing ensures data integrity on the shared storage file system by removing a problematic node from the cluster. This is accomplished by cutting off power to the node so that it cannot attempt to write to the storage device.

In your Dell|Red Hat HA Cluster system, network power switches provide the most reliable fencing method. Use remote access controllers such as DRAC or IPMI as secondary fencing methods. If no network power switches are available for primary fencing, a method such as manual fencing can be used, but this configuration is not supported. Use a log-watching utility to notify you if your primary fencing method is failing.

When using a Dell M1000e modular blade enclosure, you may use the Dell CMC as the primary fencing method instead, because it controls power to each individual blade. In this case, each blade's iDRAC or IPMI may be used as a secondary fencing method.

For more information, see Testing Fencing Mechanisms and the section Fencing in the Cluster Suite Overview at www.redhat.com/docs/manuals/enterprise.

Configure any network power switches and remote access controllers (DRAC or IPMI) on the same private network as the cluster nodes. Refer to the section Additional Dell Configuration for more information. For details on configuring your network power switches for remote access, see the documentation for that product.

To configure fencing:

  1. Log in to luci.
  2. Go to Cluster -> Nodes.
  3. Select one of the cluster nodes.
  4. Under the section for Main Fencing Method, click Add.
  5. Configure both primary and secondary fencing.
    NOTE: If you have network power switches, they will be configured as shared devices.

Additional Configuration for DRAC Fencing

Depending on the specific DRAC model your systems are using, one or more of the following sections may be applicable.

Configure iDRAC6 Fencing

Dell PowerEdge servers with iDRAC6 need specific parameters set in order for fencing to work properly. For the latest information on Conga support for this, see Bug 496749.

  1. Log in to the iDRAC6 manually over SSH.
  2. Copy the prompt that is displayed after a successful login (e.g. admin1->).
  3. On one node only, edit /etc/cluster/cluster.conf and change each fencedevice line as follows:
    1. Change references to the agent fence_drac to fence_drac5.
    2. Add the parameter cmd_prompt="your_iDRAC6_prompt" to the fencedevice line for each node, where your_iDRAC6_prompt is the prompt you copied in step 2 (e.g. admin1->).

Example:

Find the line for each fence device. This example shows a two-node cluster with DRAC fencing:

       <fencedevices>
               <fencedevice agent="fence_drac" ipaddr="192.168.0.101" login="root" name="node1-drac" passwd="drac_password"/>
               <fencedevice agent="fence_drac" ipaddr="192.168.0.102" login="root" name="node2-drac" passwd="drac_password"/>
       </fencedevices>

Change the agent to fence_drac5 and add the option cmd_prompt="admin1->" on each line:

       <fencedevices>
               <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="192.168.0.101" login="root" name="node1-drac" passwd="drac_password"/>
               <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="192.168.0.102" login="root" name="node2-drac" passwd="drac_password"/>
       </fencedevices>
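If you prefer not to edit each line by hand, the same two changes can be made with sed. This is a sketch, assuming GNU sed, that every fencedevice element sits on a single line as in the example above, and that none of them has a cmd_prompt yet; the function name convert_idrac6_fencing is hypothetical. It keeps a .bak copy of the original file. The prompt is placed into a sed replacement, so it must not contain the characters |, & or \.

```shell
# Sketch: switch fence_drac devices to fence_drac5 and add the iDRAC6 prompt.
# Assumes GNU sed and one <fencedevice .../> element per line.
convert_idrac6_fencing() {
    local conf=$1 prompt=$2
    cp -p "$conf" "$conf.bak"    # backup of the unmodified file
    # Matching the closing quote (agent="fence_drac") leaves fence_drac5 lines alone.
    sed -i "/<fencedevice /s|agent=\"fence_drac\"|agent=\"fence_drac5\" cmd_prompt=\"$prompt\"|" "$conf"
}
```

For example:

[root]# convert_idrac6_fencing /etc/cluster/cluster.conf 'admin1->'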

NOTE: You must update the cluster configuration as described in Update Cluster Configuration.

Configure DRAC CMC Fencing

PowerEdge M1000e Chassis Management Controller (CMC) acts as a network power switch of sorts. You configure a single IP address on the CMC, and connect to that IP for management. Individual blade slots can be powered up or down as needed. At this time Conga does not have an entry for the Dell CMC when configuring fencing. The steps in this section describe how to manually configure fencing for the Dell CMC. See Bug 496724 for details on Conga support.

NOTE: At the time of this writing, there is a bug that prevents the CMC from powering the blade back up after it is fenced. To recover from a fenced outage, manually power the blade on (or connect to the CMC and issue the command racadm serveraction -m server-# powerup). New code available for testing can correct this behavior. See Bug 466788 for beta code and further discussions on this issue.

NOTE: Using the individual iDRAC on each Dell Blade is not supported at this time. Instead use the Dell CMC as described in this section. If desired, you may configure IPMI as your secondary fencing method for individual Dell Blades. For information on support of the Dell iDRAC, see Bug 496748.

To configure your nodes for DRAC CMC fencing:

  1. Select Dell Drac as the fencing device in Conga.
  2. Enter a unique name for the node that will be fenced.
  3. For IP Address enter the DRAC CMC IP address.
  4. Enter the specific blade for Module Name. For example, enter server-1 for blade 1, and server-4 for blade 4.
  5. On one node only, edit /etc/cluster/cluster.conf and change each fencedevice line as follows.
    1. Change references of the agent fence_drac to fence_drac5
    2. Edit the parameter modulename= to read module_name= instead.

Example:

Find the line for each fence device. This example shows a two-node cluster with DRAC CMC fencing:

       <fencedevices>
               <fencedevice agent="fence_drac" modulename="server-1" ipaddr="192.168.0.200" login="root" name="drac-cmc-blade1" passwd="drac_password"/>
               <fencedevice agent="fence_drac" modulename="server-2" ipaddr="192.168.0.200" login="root" name="drac-cmc-blade2" passwd="drac_password"/>
       </fencedevices>

Change the agent to fence_drac5 and change the option modulename= to module_name= on each line:

       <fencedevices>
               <fencedevice agent="fence_drac5" module_name="server-1" ipaddr="192.168.0.200" login="root" name="drac-cmc-blade1" passwd="drac_password"/>
               <fencedevice agent="fence_drac5" module_name="server-2" ipaddr="192.168.0.200" login="root" name="drac-cmc-blade2" passwd="drac_password"/>
       </fencedevices>
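The CMC edits can be scripted the same way. A minimal sketch, assuming GNU sed and one fencedevice element per line as shown above; convert_cmc_fencing is a hypothetical name, and a .bak copy of the file is kept.

```shell
# Sketch: switch fence_drac CMC devices to fence_drac5 and rename modulename= to module_name=.
# Assumes GNU sed and one <fencedevice .../> element per line.
convert_cmc_fencing() {
    local conf=$1
    cp -p "$conf" "$conf.bak"    # backup of the unmodified file
    sed -i -e '/<fencedevice /s|agent="fence_drac"|agent="fence_drac5"|' \
           -e '/<fencedevice /s| modulename=| module_name=|' "$conf"
}
```

For example:

[root]# convert_cmc_fencing /etc/cluster/cluster.conf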

NOTE: You must update the cluster configuration as described in Update Cluster Configuration.

Configure DRAC SSH Fencing

By default, DRAC5, iDRAC, and iDRAC6 controllers have SSH enabled and telnet disabled. To fence through one of these controllers over SSH, check the Use SSH option when adding the fencing device to a node.
NOTE: This SSH option in Conga is included with luci-0.12.1-7.3.el5_3 and greater.

Update Cluster Configuration

If you make any manual edits to /etc/cluster/cluster.conf, you will need to update all nodes in the cluster with the new configuration. Perform these steps from any one node to update the cluster configuration:

1. Edit /etc/cluster/cluster.conf and change the config_version number at the top of the file:

<cluster alias="my_cluster" config_version="2" name="my_cluster">

Increment it by one:

<cluster alias="my_cluster" config_version="3" name="my_cluster">

2. Save your changes and distribute the cluster configuration file to all nodes:

[root]# ccs_tool update /etc/cluster/cluster.conf
Config file updated from version 2 to 3
  
Update complete.
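Forgetting to increment config_version is a common reason for ccs_tool update to refuse a change. The function below is a sketch that bumps the number and prints the new value, assuming GNU sed and that the first config_version attribute in the file is the one on the cluster element; bump_config_version is a hypothetical name.

```shell
# Sketch: increment config_version in cluster.conf and print the new value.
# Assumes GNU sed; only the first config_version attribute (on <cluster>) is changed.
bump_config_version() {
    local conf=${1:-/etc/cluster/cluster.conf}
    local old new
    old=$(sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1)
    if [ -z "$old" ]; then
        echo "no config_version found in $conf" >&2
        return 1
    fi
    new=$((old + 1))
    # 0,/re/ limits the substitution to the first matching line (GNU sed extension)
    sed -i "0,/config_version=\"$old\"/s//config_version=\"$new\"/" "$conf"
    echo "$new"
}
```

For example, after saving your manual edits:

[root]# bump_config_version && ccs_tool update /etc/cluster/cluster.conf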

Setting Up a Storage Cluster

This section describes the procedure to set up a Global File System (GFS) that is shared between the cluster nodes. The Dell|Red Hat HA Cluster consists of a Red Hat Cluster Suite high-availability cluster and a Red Hat GFS storage cluster. Verify that the high-availability cluster is running before setting up the storage cluster.

Configuring shared storage consists of the following steps:

  1. Configuring a Clustered Logical Volume (CLVM).
  2. Configuring Global File System (GFS).

For more information, see the LVM Administrator's Guide and the Global File System guide at www.redhat.com/docs/manuals/enterprise/. The procedure for configuring a storage cluster is documented using both Conga and the CLI tools. You can use either method; you only need to complete one of them.

Configuring a Storage Cluster With Conga

For the procedure, see Configuring a Storage Cluster with Conga.

Configuring a Storage Cluster With CLI Tools

For the procedure, see Configuring a Storage Cluster with CLI Tools.

Managing the Cluster Infrastructure

It may be necessary to start or stop the cluster infrastructure on one or more nodes at any time. You can do this through the Conga user interface, or individually on each node from the CLI.

Managing the Cluster Infrastructure with Conga

The easiest way to start and stop the cluster is through the Conga management interface, which starts and stops all cluster infrastructure daemons on all nodes simultaneously.

  1. Log in to the web interface at: https://{management_node_hostname_or_IP_address}:8084
    Where {management_node_hostname_or_IP_address} is the hostname or IP address of the luci server.
  2. Select the cluster.
  3. Select the desired action next to the cluster name and click Go.

Managing the Cluster Infrastructure from the CLI

The procedure for starting and stopping the cluster infrastructure from the CLI is outlined below. These commands must be executed on each node, and it is best to run them on all nodes as close to simultaneously as possible.

  • Starting the cluster infrastructure:
[root]# service cman start
[root]# service clvmd start
[root]# service gfs start
[root]# service rgmanager start

Start the services in the order listed above, and make sure each one has started before proceeding.

  • Stopping the cluster infrastructure (the reverse of the start order):
[root]# service rgmanager stop
[root]# service gfs stop
[root]# service clvmd stop
[root]# service cman stop
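Typing the four commands on every node is easy to get wrong. The bash helper below is a minimal sketch, assuming it is run as root on each node; cluster_ctl and CLUSTER_SERVICES are hypothetical names. It starts cman, clvmd, gfs, and rgmanager in that order, stops them in the reverse order (rgmanager, gfs, clvmd, cman), and returns at the first failure. Setting SERVICE=echo prints the commands instead of running them.

```shell
# Sketch: start or stop this node's cluster stack in a fixed order.
# Override SERVICE (e.g. SERVICE=echo) to print the commands instead of running them.
CLUSTER_SERVICES=(cman clvmd gfs rgmanager)   # start order

cluster_ctl() {
    local service_cmd=${SERVICE:-service}
    local i
    case $1 in
        start)
            for ((i = 0; i < ${#CLUSTER_SERVICES[@]}; i++)); do
                $service_cmd "${CLUSTER_SERVICES[i]}" start || return 1
            done
            ;;
        stop)
            # Reverse order: each daemon stops after everything that depends on it.
            for ((i = ${#CLUSTER_SERVICES[@]} - 1; i >= 0; i--)); do
                $service_cmd "${CLUSTER_SERVICES[i]}" stop || return 1
            done
            ;;
        *)
            echo "usage: cluster_ctl start|stop" >&2
            return 2
            ;;
    esac
}
```

For example, on each node:

[root]# cluster_ctl stop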

Configuring Cluster Services

This section describes the procedure to create and test HA cluster services on your Dell|Red Hat HA Cluster system.

  • A resource is anything that needs to be monitored or managed to ensure the availability of an application or daemon (e.g. an IP address, file system, database, or web server).
  • A service is a collection of resources required to provide an application or daemon to clients (e.g. a highly available web server needs all of these resources: an IP address, a file system, and httpd).

Creating Resources

The following steps provide an overview for creating resources:

  1. Click Cluster List.
  2. Select the cluster name and click Resources on the left-pane.
  3. Select Add a Resource.
  4. Select the type of resource, enter the required data in the appropriate fields, and click Submit.
    NOTE: For more information, see Adding Cluster Resources in the Cluster Administration guide at www.redhat.com/docs/manuals/enterprise/.

Example GFS resource

After clicking Add a Resource as described above:

  1. From the Select a Resource Type drop-down list, select GFS filesystem.
  2. In the GFS Resource Configuration section, enter the details as described below:
    • Name - a name to describe the GFS file system.
    • Mount Point - a mount point on the local file system (e.g. /gfs). This is the directory where the clustered logical volume is mounted on every node.
    • Device - the GFS device created earlier (e.g. /dev/vg_cluster/lv_cluster). This is the logical volume created in the Setting Up a Storage Cluster section.
    • Filesystem Type - GFS2. Use GFS2, because the clustered logical volume created above was formatted with GFS2.
    • Options - mount options (e.g. rw,debug). Be sure to include debug.
  • The remaining fields may be left blank.

NOTE: Including debug among the mount options is critical: if a node has a problem accessing the shared storage, this option causes the node to panic and therefore be fenced.

Creating a Failover Domain (Optional)

A Failover Domain is a group of nodes in the cluster. By default all nodes can run any cluster service. To provide better administrative control over cluster services, Failover Domains limit which nodes are permitted to run a service or establish node preference. For more information see Configuring a Failover Domain in the Cluster Administration guide at www.redhat.com/docs/manuals/enterprise/.

Creating Services

The following steps provide an overview for creating services:

  1. Click Cluster List.
  2. Select the cluster name and click Services on the left-pane.
  3. Select Add a service.
  4. Choose a service name that describes the function that you are creating the service for.
  5. Leave the default Failover Domain set to None to allow any node in the cluster to run this service. Otherwise, select a previously created failover domain.
  6. Click Add a resource to this service.
  7. Select either new local or existing global resource.
  8. Select the type of resource, enter any required data in the appropriate fields, and click Submit.
    NOTE: For more information, see Adding a Cluster Service to the Cluster in the Cluster Administration guide at www.redhat.com/docs/manuals/enterprise/.

Example Configuration of NFS

  • 1. Before configuring an NFS service, ensure that all of your nodes have nfs-utils installed:
[root]# yum install nfs-utils
  • 2. Create resources to be used for the NFS service:
    1. Click Resources
    2. Add an IP Address - IP address to be used as a virtual IP address by the NFS service
    3. Add a GFS File System - follow the steps at Example GFS resource
    4. Add an NFS Export - a name to describe this export (e.g. nfs-export)
    5. Add an NFS Client - the target network client options with the following values:
      • Name - A name to describe this resource (e.g. nfs-cli)
      • Target - The client network that will have access to the exports (e.g. 172.16.0.0/16)
      • Options - NFS client options (e.g. rw,sync,no_root_squash)
  • 3. Create an NFS service:
    1. Click Services and enter a service name (e.g. nfs)
    2. Check Automatically start this service if you want this service to start automatically with the cluster.
    3. Select a recovery policy. Relocate will move the NFS service to another node upon a failure.
    4. Select None for Failover Domain if you want the application to failover to any other node in the cluster. If you want the application to failover to a particular set of nodes, configure a new Failover Domain and select it.
    5. Click Add a resource to this service.
    6. From the drop-down menu, choose Use an existing global resource.
    7. Select the resource IP Address that you created in step 2.
    8. Click Add a child to the newly added IP resource.
    9. From the drop-down menu, choose Use an existing global resource.
    10. Choose GFS File System created in step 2.
    11. Click Add a child to the added GFS File System resource. Choose NFS Export created in step 2.
    12. Click Add a child resource to the newly added NFS Export resource. Choose NFS Client created in step 2.
      This nesting creates dependencies among the resources: each child resource waits for its parent resource to start before it starts. For example, it ensures the NFS service does not try to start unless the GFS file system shared between the nodes is mounted.

Example Configuration of FTP

  • 1. Before configuring an FTP service, ensure that all of your nodes have the vsftpd package installed:
[root]# yum install vsftpd
  • 2. Create resources to be used for the FTP service:
    1. Click Resources
    2. Add an IP Address - IP address to be used by the clients to access the FTP service
    3. Add a GFS File System - follow the steps at Example GFS resource
  • 3. Create the FTP service:
    1. Click Services and enter a service name (e.g. ftp).
    2. Select a recovery policy. Relocate will move the FTP service to another node upon a failure.
    3. Click Add a resource to this service.
    4. From the drop-down menu, choose Use an existing global resource.
    5. Select the resource IP Address that you created in step 2.
    6. Click Add a child to the newly added IP Address resource.
    7. From the drop-down menu, choose Use an existing global resource.
    8. Choose GFS File System created in step 2.
    9. Click Add a child to the newly added GFS File System resource. Under Select a Resource Type, select Script. Enter a name for Name, and for Full path to script file enter /etc/init.d/vsftpd.
      Configuring the Script resource as a child to GFS File System ensures that the file system is mounted before the ftp service attempts to start, as the ftp root will reside on the GFS file system.
  • 4. Additional FTP configuration
    • Replace the configuration file on every node with a symbolic link to a single copy on the GFS file system. This ensures that configuration changes only need to be made in one location for all nodes. Additionally, if a node is having issues and cannot mount the GFS file system, the service will fail to read the configuration file. This is a desired side effect.
      On one node, manually mount the GFS file system, move the file /etc/vsftpd/vsftpd.conf to a directory on the GFS file system, and create the link. For example:
[root]# mount -t gfs2 /dev/vg_cluster/lv_cluster /gfs
[root]# mkdir -p /gfs/ftp/conf/
[root]# mv /etc/vsftpd/vsftpd.conf /gfs/ftp/conf/vsftpd.conf
[root]# ln -s /gfs/ftp/conf/vsftpd.conf /etc/vsftpd/vsftpd.conf
  • Add a value in the vsftpd.conf file to change the root directory for anonymous logins:
anon_root=/gfs/ftp/pub
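The move-and-link step is the same for vsftpd.conf, httpd.conf, and smb.conf, and it differs slightly between the first node (which moves the real file onto GFS) and the remaining nodes (which only need the link). The bash function below is a sketch of that logic, assuming the GFS file system is already mounted on the node; move_conf_to_gfs is a hypothetical name, and on the remaining nodes it keeps the local file as a .local copy rather than deleting it.

```shell
# Sketch: replace a local config file with a symlink to a shared copy on GFS.
# Assumes the GFS file system holding gfs_dir is already mounted on this node.
move_conf_to_gfs() {
    local conf=$1 gfs_dir=${2%/}
    local target
    target="$gfs_dir/$(basename "$conf")"

    # Already linked to the shared copy: nothing to do.
    if [ -L "$conf" ] && [ "$(readlink "$conf")" = "$target" ]; then
        return 0
    fi

    mkdir -p "$gfs_dir" || return 1
    if [ ! -e "$target" ]; then
        mv "$conf" "$target" || return 1        # first node: move the real file onto GFS
    else
        mv "$conf" "$conf.local" || return 1    # other nodes: keep the local copy aside
    fi
    ln -s "$target" "$conf"
}
```

Run it on the first node, then on each remaining node:

[root]# move_conf_to_gfs /etc/vsftpd/vsftpd.conf /gfs/ftp/conf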

If desired, also change the home directory of each user with FTP access to a location on the GFS file system. Every node must also recognize the same users, either through a central authentication mechanism such as NIS or LDAP, or by creating the same usernames and passwords on each node. See man vsftpd.conf for more information.

Example Configuration of HTTP

  • 1. Before configuring an HTTP service, ensure that all of your nodes have httpd installed:
[root]# yum install httpd
  • 2. Create resources to be used for the HTTP service:
    1. Click Resources
    2. Add an IP Address - IP address to be used by the clients to access the HTTP service
    3. Add a GFS File System - follow the steps at Example GFS resource
  • 3. Create the HTTP service:
    1. Click Services and enter a service name (e.g. http).
    2. Select a recovery policy. Relocate will move the HTTP service to another node upon a failure.
    3. Click Add a resource to this service.
    4. From the drop-down menu, choose Use an existing global resource.
    5. Select the resource IP Address that you created in step 2.
    6. Click Add a child to the newly added IP Address resource.
    7. From the drop-down menu, choose Use an existing global resource.
    8. Choose GFS File System created in step 2.
    9. Click Add a child to the newly added GFS File System. Under Select a Resource Type, select Script. Enter any name for Name. For Full path to script file, enter /etc/init.d/httpd.
      Configuring the Script resource as a child to GFS ensures that the file system is mounted before the http service attempts to start, as the http root will reside on the GFS file system.
  • 4. Additional HTTP configuration:
    • Replace the configuration file on every node with a symbolic link to a single copy on the GFS file system. This ensures that configuration changes only need to be made in one location for all nodes. Additionally, if a node is having issues and cannot mount the GFS file system, the service will fail to read the configuration file. This is a desired side effect.
      On one node, manually mount the GFS file system, move the file /etc/httpd/conf/httpd.conf to a directory on the GFS file system, and create the link. For example:
[root]# mount -t gfs2 /dev/vg_cluster/lv_cluster /gfs
[root]# mkdir -p /gfs/web/conf/
[root]# mv /etc/httpd/conf/httpd.conf /gfs/web/conf/httpd.conf
[root]# ln -s /gfs/web/conf/httpd.conf /etc/httpd/conf/httpd.conf
    • Edit httpd.conf and change DocumentRoot to a location on the GFS file system. (e.g. /gfs/web/html/)
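The DocumentRoot change can be scripted with sed. This sketch assumes GNU sed and a stock Red Hat httpd.conf, in which the default document root /var/www/html also appears in a matching Directory section that normally needs the same new path; set_document_root is a hypothetical name. Because httpd.conf is now a symbolic link, the function resolves it first so sed -i edits the shared copy instead of replacing the link with a regular file. Run it on one node only.

```shell
# Sketch: point DocumentRoot (and the stock <Directory> section) at a GFS path.
# Assumes GNU sed and the default "/var/www/html" entries of a stock httpd.conf.
set_document_root() {
    local conf docroot=${2%/}    # drop a trailing slash, e.g. /gfs/web/html/ -> /gfs/web/html
    conf=$(readlink -f "$1") || return 1    # resolve the symlink so sed edits the shared copy
    cp -p "$conf" "$conf.bak"
    sed -i -e "s|^DocumentRoot .*|DocumentRoot \"$docroot\"|" \
           -e "s|^<Directory \"/var/www/html\">|<Directory \"$docroot\">|" "$conf"
}
```

For example, followed by a syntax check:

[root]# set_document_root /etc/httpd/conf/httpd.conf /gfs/web/html/
[root]# apachectl configtest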

Example Configuration of Samba

  • 1. Before configuring a Samba service, ensure that all of your nodes have samba installed:
[root]# yum install samba
  • 2. Create resources to be used for the Samba service:
    1. Click Resources
    2. Add an IP Address - IP address to be used by the clients to access the Samba service
    3. Add a GFS File System - follow the steps at Example GFS resource
  • 3. Create the Samba service:
    1. Click Services and enter a service name (e.g. samba).
    2. Select a recovery policy. Relocate will move the Samba service to another node upon a failure.
    3. Click Add a resource to this service.
    4. From the drop-down menu, choose Use an existing global resource.
    5. Select the resource IP Address that you created in step 2.
    6. Click Add a child to the newly added IP Address resource.
    7. From the drop-down menu, choose Use an existing global resource.
    8. Choose GFS File System created in step 2.
    9. Click Add a child to the newly added GFS File System. Under Select a Resource Type, select Script. Enter any name for Name. For Full path to script file, enter /etc/init.d/smb.
      Configuring the Script resource as a child to GFS ensures that the file system is mounted before the Samba service attempts to start, as the Samba share will reside on the GFS file system.

  • 4. Additional Samba configuration:
    • Replace the configuration file on every node with a symbolic link to a single copy on the GFS file system. This ensures that configuration changes only need to be made in one location for all nodes. Additionally, if a node is having issues and cannot mount the GFS file system, the service will fail to read the configuration file. This is a desired side effect.
      On one node, manually mount the GFS file system, move the file /etc/samba/smb.conf to a directory on the GFS file system, and create the link. For example:
[root]# mount -t gfs2 /dev/vg_cluster/lv_cluster /gfs
[root]# mkdir -p /gfs/samba/conf/
[root]# mv /etc/samba/smb.conf /gfs/samba/conf/smb.conf
[root]# ln -s /gfs/samba/conf/smb.conf /etc/samba/smb.conf
  • Edit smb.conf and create a share that points to a location on the GFS file system. (e.g. /gfs/samba/share/)
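A share on GFS needs only a few lines in smb.conf. The function below is a sketch that appends such a section and creates the shared directory; the function name add_gfs_share, the share name gfs-share, and the chosen options are illustrative, and access control (valid users and similar settings) is left to your site policy. Because smb.conf is now a symbolic link to the shared copy, run it on one node only; the >> append writes through the link.

```shell
# Sketch: append a share section pointing at a directory on the GFS file system.
# The share name, path, and options are illustrative; adjust access control to your site.
add_gfs_share() {
    local conf=$1 name=$2 path=$3
    mkdir -p "$path" || return 1    # the shared directory must exist on GFS
    cat >> "$conf" <<EOF

[$name]
    comment = Share on the clustered GFS file system
    path = $path
    writable = yes
    browseable = yes
EOF
}
```

For example, followed by a syntax check with testparm:

[root]# add_gfs_share /etc/samba/smb.conf gfs-share /gfs/samba/share
[root]# testparm -s /etc/samba/smb.conf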

Managing Cluster Services

You can manage cluster services from Conga as follows:

  • Click the Cluster tab.
  • Click the name of the service you want to manage.
  • Select a task to perform such as Enable this service, Disable this service, Restart this service, or Relocate this service

You can also manage services from the CLI with the clusvcadm command. Run it without arguments to list the available options:

[root]# clusvcadm

For example, to relocate a service to node2, enter the following command:

[root]# clusvcadm -r service_name -m node2

Use the command clustat to view cluster status:

[root]# clustat

Verification Checklist

Item                                                  Verified
Cluster and Cluster Storage
    Red Hat Cluster Suite installed and configured    [ ]
    Nodes participating in cluster                    [ ]
    Fencing configured on nodes                       [ ]
    Clustered logical volume                          [ ]
    Global File System                                [ ]
    Services created                                  [ ]

Troubleshooting

Networking Issues

Red Hat Cluster nodes use multicast to communicate. Your switches must be configured to enable multicast addresses and support IGMP. For more information, see section 2.6, Multicast Addresses, in the Cluster Administration guide at www.redhat.com/docs/manuals/enterprise/, and the documentation that came with your switches.

Cluster Status

Conga will allow you to monitor your cluster. Alternatively, you may run the command clustat from any node. For example:

[root]# clustat

Other utilities that may help:

[root]# cman_tool nodes
[root]# cman_tool status
[root]# cman_tool services

Logging: Any important messages are logged to /var/log/messages. The following is an example of loss of network connectivity on node1 which causes node2 to fence it.

Nov 28 15:37:56 node2 openais[3450]: [TOTEM] previous ring seq 24 rep 172.16.0.1
Nov 28 15:37:56 node2 openais[3450]: [TOTEM] aru 2d high delivered 2d received flag 1
Nov 28 15:37:56 node2 openais[3450]: [TOTEM] Did not need to originate any messages in recovery .
Nov 28 15:37:56 node2 openais[3450]: [TOTEM] Sending initial ORF token
Nov 28 15:37:56 node2 openais[3450]: [CLM ] CLM CONFIGURATION CHANGE
Nov 28 15:37:56 node2 openais[3450]: [CLM ] New Configuration:
Nov 28 15:37:56 node2 kernel: dlm: closing connection to node 2
Nov 28 15:37:56 node2 fenced[3466]: node1.ha.lab not a cluster member after 0 sec post_fail_delay
Nov 28 15:37:56 node2 openais[3450]: [CLM ] r(0) ip(172.16.0.2)
Nov 28 15:37:56 node2 fenced[3466]: fencing node "node1.example.com"
Nov 28 15:37:56 node2 openais[3450]: [CLM ] Members Left:
Nov 28 15:37:56 node2 openais[3450]: [CLM ] r(0) ip(172.16.0.1)
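To pull fencing activity out of a busy messages file, filtering on the fenced daemon's tag is usually enough. A minimal sketch, assuming the syslog format shown above (fenced[PID]: ...); message wording can differ between releases, so check the pattern against your own logs. fence_events is a hypothetical name.

```shell
# Sketch: list fenced messages (fencing attempts and their results) from a syslog file.
# Assumes the "fenced[PID]:" tag shown in the sample above.
fence_events() {
    local log=${1:-/var/log/messages}
    grep -E 'fenced\[[0-9]+\]:' "$log"
}
```

To watch for fencing live instead, the same pattern can be applied to tail output:

[root]# tail -f /var/log/messages | grep --line-buffered -E 'fenced\[[0-9]+\]:'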

Troubleshooting Conga

The following sections describe issues you may encounter while initially creating the cluster, and possible workarounds.

Running luci on a Cluster Node

If you use a cluster node as the management node running luci, you must restart luci manually after the initial configuration. For example:

[root]# service luci restart

Debugging problems with luci

To start luci in debug mode, edit /var/lib/luci/etc/zope.conf, change the debug-mode value to on, and restart luci on the management node. Debug messages are then written to /var/log/messages.
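The edit can be made with sed. This is a sketch, assuming zope.conf contains an uncommented line of the form "debug-mode off"; set_luci_debug is a hypothetical name, and a .bak copy of the file is kept.

```shell
# Sketch: turn on debug mode for luci by editing its zope.conf.
# Assumes an uncommented "debug-mode off" line.
set_luci_debug() {
    local conf=${1:-/var/lib/luci/etc/zope.conf}
    if ! grep -q '^[[:space:]]*debug-mode[[:space:]]' "$conf"; then
        echo "no debug-mode line found in $conf" >&2
        return 1
    fi
    cp -p "$conf" "$conf.bak"
    sed -i 's/^\([[:space:]]*debug-mode[[:space:]]\{1,\}\)off/\1on/' "$conf"
}
```

For example:

[root]# set_luci_debug && service luci restart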

Issues While Creating a Cluster Initially

If the following error appears when initially installing the cluster:

The following errors occurred:
Unable to add the key for node node1.ha.lab to the trusted keys list.
Unable to add the key for node node2.ha.lab to the trusted keys list.
Unable to connect to node2.ha.lab: Unable to establish an SSL connection to node2.ha.lab:11111:
ClientSocket(hostname, port, timeout): connect() failed

Unable to connect to node1.ha.lab: Unable to establish an SSL connection to node1.ha.lab:11111:
ClientSocket(hostname, port, timeout): connect() failed

This error occurs when the luci server cannot communicate with the ricci agent. Verify that ricci is installed and started on each node. Ensure that the firewall has been configured correctly, and that Security-Enhanced Linux (SELinux) is not the issue. Check /var/log/audit/audit.log for details on SELinux issues.
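From the luci server, you can test whether each node's ricci port accepts connections before investigating SELinux. The error above shows luci connecting to port 11111. This sketch assumes bash with /dev/tcp support and a timeout command from coreutils (older releases may lack it; without the timeout prefix, a filtered port can take much longer to fail). A reachable port only shows that something is listening, not that the SSL handshake will succeed. check_ricci and RICCI_PORT are hypothetical names.

```shell
# Sketch: check that ricci's port on each node accepts TCP connections.
# Uses bash's /dev/tcp redirection and the coreutils timeout command.
check_ricci() {
    local port=${RICCI_PORT:-11111}
    local node rc=0
    for node in "$@"; do
        if timeout 5 bash -c "exec 3<>/dev/tcp/$node/$port" 2>/dev/null; then
            echo "$node:$port reachable"
        else
            echo "$node:$port NOT reachable (check ricci, the firewall, and SELinux)" >&2
            rc=1
        fi
    done
    return $rc
}
```

For example:

[root]# check_ricci node1.ha.lab node2.ha.lab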

Make sure your nodes have the latest SELinux policy with the following command:

[root]# yum update selinux-policy

If you continue to encounter errors, it may be necessary to disable SELinux. This is not recommended, and should only be used as a last resort. Disable SELinux with the command:

[root]# setenforce 0

See Security and SELinux in the Deployment Guide on www.redhat.com/docs/manuals/enterprise/.

Configuration File Issues

Configuration errors manifest themselves as the following error in /var/log/messages:

"AIS Executive exiting (-9)"

Check for syntax errors in your /etc/cluster/cluster.conf file. This is unlikely to happen if you are using Conga to manage your cluster configuration file.
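Before distributing a hand-edited file, a quick well-formedness check catches unbalanced tags and stray quotes. This sketch assumes xmllint from the libxml2 package is installed; it checks XML syntax only, not whether the cluster elements and attributes are valid. check_cluster_conf is a hypothetical name.

```shell
# Sketch: verify that cluster.conf is well-formed XML before running ccs_tool update.
# Checks syntax only; it does not validate cluster-specific elements or attributes.
check_cluster_conf() {
    local conf=${1:-/etc/cluster/cluster.conf}
    if ! command -v xmllint >/dev/null 2>&1; then
        echo "xmllint not found; install libxml2" >&2
        return 2
    fi
    if xmllint --noout "$conf"; then
        echo "$conf is well-formed"
    else
        echo "$conf has XML syntax errors (see above)" >&2
        return 1
    fi
}
```

For example:

[root]# check_cluster_conf /etc/cluster/cluster.conf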

Logical Volume Issues

It may be necessary to restart the clustered logical volume manager with the command:

[root]# service clvmd restart

Ensure all nodes have a consistent view of the shared storage by running partprobe or by clicking reprobe storage in Conga. As a last resort, reboot all nodes, or select restart cluster in Conga.

In some cases you may need to rescan for logical volumes if you still cannot see the shared volume:

[root]# partprobe -s
[root]# pvscan
[root]# vgscan
[root]# vgchange -ay
[root]# lvscan
[root]# service clvmd restart

Shared Storage Issues

If you are experiencing errors when creating the clustered logical volume, you may need to wipe any previous labels from the virtual disk.
NOTICE: This will destroy all data on the shared storage disk!

Execute the following command from one node:

[root@node1 ~]# pvremove -ff {/dev/sdXY}

Where {/dev/sdXY} is the partition intended for data. See the output of /proc/mpp to verify. For example:

[root@node1 ~]# pvremove -ff /dev/sdb1

If you are using Conga, click reprobe storage, otherwise type:

[root@node1 ~]# partprobe -s /dev/sdb

If you have imaged the nodes with a cloning method, then the unique identifier (uuid) for the system logical volumes may be the same. It may be necessary to change the uuid with the commands pvchange --uuid or vgchange --uuid. For more information, see LVM Administrator's Guide on the Red Hat website at www.redhat.com/docs/manuals/enterprise.

Testing Fencing Mechanisms

Fence each node to ensure that fencing is working properly.
1. Watch the logs from node 1 with the following command:

[root@node1]# tail -f /var/log/messages


2. Fence node 2 by executing the following command:

[root@node1]# fence_node {fully qualified hostname or ip address of node2}


3. View the logs on node 1 and the console of node 2. Node 1 should successfully fence node 2.


4. Continue to watch the messages file for status changes. You can also use the Cluster Status tool (clustat) to see the cluster's view of each node. The parameter -i 2 refreshes the display every two seconds. For example:

[root]# clustat -i 2


5. After you successfully fence one node, repeat this process for the second node.
