<font size=1>[[../../DellRedHatHALinuxCluster|Dell|Red Hat HA Linux]] > Cluster</font>
=Configuring Your Cluster=

This section discusses how to install and configure Red Hat Cluster Suite and Global File System on your Dell|Red Hat HA Cluster system using Conga and CLI tools.

Conga is a configuration and management suite based on a server/agent model. You can access the management server <tt>luci</tt> using a standard web browser from anywhere on the network. Luci communicates with the client agent <tt>ricci</tt> on the nodes, installs all required packages, synchronizes the cluster configuration file, and manages the storage cluster. Though there are other possible methods, such as <tt>system-config-cluster</tt> or creating an <tt>xml</tt> configuration file by hand, it is recommended that you use Conga to configure and manage your cluster.
==Setting Up a High-Availability Cluster==

The following section provides an overview of installing your cluster using Conga.  For more information on using Conga, see the section ''Configuring Red Hat Cluster With Conga'' in the ''Cluster Administration'' guide at [http://www.redhat.com/docs/manuals/enterprise/ www.redhat.com/docs/manuals/enterprise/].
===Preparing the Cluster Nodes for Conga===

Run the following commands on each cluster node to install and start <tt>ricci</tt>:

* Install ricci:
  [root]# '''yum install ricci'''
* Configure ricci to start on boot:
  [root]# '''chkconfig ricci on'''
* Start the <tt>ricci</tt> service:
  [root]# '''service ricci start'''
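
A quick sanity check (a minimal sketch using the standard <tt>service</tt> and <tt>chkconfig</tt> tools) confirms that <tt>ricci</tt> is running and enabled on each node before you continue:
  [root]# '''service ricci status'''
  [root]# '''chkconfig --list ricci'''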
Execute the following commands on the management node to install the Conga server <tt>luci</tt>:

* Install luci:
  [root]# '''yum install luci'''
'''NOTE''': You can configure <tt>luci</tt> on any node, but it is recommended that you install luci on a dedicated management node. If you do have a dedicated management node, that node needs access to the ''Cluster'' channel on RHN or a Satellite server. You can also log in to RHN and manually download <tt>luci</tt> for installation on your management node.
* Initialize the luci server and assign the <tt>admin</tt> password:
  [root]# '''luci_admin init'''
  Initializing the luci server
  Creating the 'admin' user
  Enter password:
  Confirm password:
  Please wait...
  The admin password has been successfully set.
  Generating SSL certificates...
  The luci server has been successfully initialized
  You must restart the luci server for changes to take effect.
  Run "service luci restart" to do so
* Configure <tt>luci</tt> to start on boot:
  [root]# '''chkconfig luci on'''
* Start the <tt>luci</tt> service:
  [root]# '''service luci start'''
  Starting luci: Generating https SSL certificates...  done  [  OK  ]
  Point your web browser to https://management.example.com:8084 to access luci

For more information on configuring your cluster with Conga, see the section ''Configuring Red Hat Cluster With Conga'' on the Red Hat website at [http://www.redhat.com/docs/manuals/enterprise/ www.redhat.com/docs/manuals/enterprise/] or locally from the <tt>Cluster_Administration-en-US</tt> package.
===Creating Your Cluster Using Conga===
Conga automatically installs the software required for clustering on all cluster nodes. Ensure you have completed the steps in [[#Preparing the Cluster Nodes for Conga|Preparing the Cluster Nodes for Conga]] before proceeding.  Verify that the steps in [[../System#Configuring the Firewall|Configuring the Firewall]] were completed for each cluster node, otherwise <tt>luci</tt> will not be able to communicate with <tt>ricci</tt>.
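
As an illustration only: <tt>ricci</tt> normally listens on TCP port 11111 and the <tt>luci</tt> web interface on TCP port 8084, so if you manage the firewall with <tt>iptables</tt> directly, rules similar to the sketch below need to be in place. Treat this port list as an assumption and follow the ''Configuring the Firewall'' section for the complete, authoritative list:
  [root]# '''iptables -I INPUT -p tcp --dport 11111 -j ACCEPT'''    (each cluster node, for ricci)
  [root]# '''iptables -I INPUT -p tcp --dport 8084 -j ACCEPT'''     (management node, for luci)
  [root]# '''service iptables save'''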
<br>1. Connect to the luci server from any browser on the same network as the management node. In your web browser, enter:
  https://{management_node_hostname_or_IP_address}:8084
Where {''management_node_hostname_or_IP_address''} is the hostname or IP address of the management server running <tt>luci</tt>.
<br>'''NOTE''': If you encounter any errors, see section [[#Troubleshooting Conga|Troubleshooting Conga]].
<br>2. Enter your username and password to securely log in to the luci server.
<br>3. Go to the cluster tab.
Conga downloads and installs all the required cluster software, creates a configuration file, and reboots each cluster node. Watch the Conga status window for details on the progress of each cluster node.

'''NOTE''': If an error message such as '''An error occurred when trying to contact any of the nodes in the cluster''' appears in the luci web page, wait a few minutes and refresh your browser.
===Configuring Fencing Using Conga===
Fencing ensures data integrity on the shared storage file system by removing problematic nodes from the cluster. This is accomplished by cutting off power to the node so that it cannot attempt to write to the storage device.

In your Dell|Red Hat HA Cluster system, network power switches provide the most reliable fencing method.  Remote access controllers such as DRAC or IPMI should be used as secondary fencing methods.  However, if no network power switches are available for primary fencing, a remote access controller may be used as the primary method with manual fencing as the secondary method, although this configuration is not supported. Use a log-watching utility to notify you if your primary fencing method is failing.

When using a Dell M1000e modular blade enclosure, the Dell CMC may be used as the primary fencing method instead, as it controls power to each individual blade. In this case, each blade's individual iDRAC or IPMI may be used as the secondary fencing method.

For more information, see [[#Testing Fencing Mechanisms|Testing Fencing Mechanisms]] and the section ''Fencing'' in the ''Cluster Suite Overview'' at [http://www.redhat.com/docs/manuals/enterprise www.redhat.com/docs/manuals/enterprise].

Configure any network power switches and remote access controllers (DRAC or IPMI) on the same private network as the cluster nodes. Refer to the section [[../System#Additional_Dell_Configuration|Additional Dell Configuration]] for more information.  For details on configuring your network power switches for remote access, see the documentation for that product.
To configure fencing:
# Log in to <tt>luci</tt>.
# Go to '''Cluster''' -> '''Nodes'''.
# Select one of the cluster nodes.
# Under the section for ''Main Fencing Method'', click '''Add'''.
# Configure both primary and secondary fencing.<br>'''NOTE''': If you have network power switches, they will be configured as shared devices.

====Additional Configuration for DRAC Fencing====
Depending on the specific DRAC model your systems are using, one or more of the following sections may be applicable.

=====Configure iDRAC6 Fencing=====
Dell PowerEdge servers using iDRAC6 need specific parameters set in order for fencing to function properly.  For the latest information on support for this in Conga, see [https://bugzilla.redhat.com/show_bug.cgi?id=496749 Bug 496749].

# Manually SSH to the iDRAC6.
# Copy the prompt that is displayed after successful login (e.g. '''admin1->''').
# On one node only, edit <tt>/etc/cluster/cluster.conf</tt> and change each '''fencedevice''' line as follows:
## Change references of the agent '''fence_drac''' to '''fence_drac5'''
## Add the parameter '''cmd_prompt="''your_iDRAC6_prompt''"''' to each '''fencedevice''' line, where ''your_iDRAC6_prompt'' is the prompt you copied in step 2 (e.g. '''admin1->''').

Example:

Find the line for each fence device.  This example shows a two node cluster with DRAC fencing:
        <fencedevices>
                <fencedevice agent="fence_drac" ipaddr="192.168.0.101" login="root" name="node1-drac" passwd="drac_password"/>
                <fencedevice agent="fence_drac" ipaddr="192.168.0.102" login="root" name="node2-drac" passwd="drac_password"/>
        </fencedevices>
Change the ''agent'' to '''<tt>fence_drac5</tt>''' and add the option '''<tt>cmd_prompt="admin1->"</tt>''' on each line:
        <fencedevices>
                <fencedevice agent="'''fence_drac5'''" '''cmd_prompt="admin1->"''' ipaddr="192.168.0.101" login="root" name="node1-drac" passwd="drac_password"/>
                <fencedevice agent="'''fence_drac5'''" '''cmd_prompt="admin1->"''' ipaddr="192.168.0.102" login="root" name="node2-drac" passwd="drac_password"/>
        </fencedevices>
* Lastly, you can also [[#Configure DRAC SSH Fencing|Configure DRAC SSH Fencing]]. Otherwise you will need to enable telnet on the iDRAC6.

'''NOTE''': You must update the cluster configuration as described in [[#Update Cluster Configuration|Update Cluster Configuration]].

=====Configure DRAC CMC Fencing=====
The PowerEdge M1000e Chassis Management Controller (CMC) acts as a network power switch of sorts. You configure a single IP address on the CMC and connect to that IP for management. Individual blade slots can be powered up or down as needed. At this time Conga does not have an entry for the Dell CMC when configuring fencing.  The steps in this section describe how to manually configure fencing for the Dell CMC. See [https://bugzilla.redhat.com/show_bug.cgi?id=496724 Bug 496724] for details on Conga support.

'''NOTE''': At the time of this writing, there is a bug that prevents the CMC from powering the blade back up after it is fenced.  To recover from a fenced outage, manually power the blade on (or connect to the CMC and issue the command '''racadm serveraction -m server-# powerup''').  New code available for testing can correct this behavior. See [https://bugzilla.redhat.com/show_bug.cgi?id=466788 Bug 466788] for beta code and further discussion of this issue.

'''NOTE''': Using the individual iDRAC on each Dell blade is not supported at this time. Instead, use the Dell CMC as described in this section. If desired, you may configure IPMI as your secondary fencing method for individual Dell blades. For information on support of the Dell iDRAC, see [https://bugzilla.redhat.com/show_bug.cgi?id=496748 Bug 496748].

To configure your nodes for DRAC CMC fencing:
# Select '''Dell Drac''' as the fencing device in Conga.
# Enter a unique name for the node that will be fenced.
# For '''IP Address''' enter the DRAC CMC IP address.
# Enter the specific blade for '''Module Name'''.  For example, enter '''server-1''' for blade 1, and '''server-4''' for blade 4.
# On one node only, edit <tt>/etc/cluster/cluster.conf</tt> and change each '''fencedevice''' line as follows:
## Change references of the agent '''fence_drac''' to '''fence_drac5'''
## Edit the parameter '''modulename=''' to read '''module_name=''' instead.

Example:

Find the line for each fence device. This example shows a two node cluster with DRAC CMC fencing:
        <fencedevices>
                <fencedevice agent="fence_drac" modulename="server-1" ipaddr="192.168.0.200" login="root" name="drac-cmc-blade1" passwd="drac_password"/>
                <fencedevice agent="fence_drac" modulename="server-2" ipaddr="192.168.0.200" login="root" name="drac-cmc-blade2" passwd="drac_password"/>
        </fencedevices>
Change the ''agent'' to '''<tt>fence_drac5</tt>''' and change the option '''<tt>modulename=</tt>''' to '''<tt>module_name=</tt>''' on each line:
        <fencedevices>
                <fencedevice agent="'''fence_drac5'''" '''module_name="server-1"''' ipaddr="192.168.0.200" login="root" name="drac-cmc-blade1" passwd="drac_password"/>
                <fencedevice agent="'''fence_drac5'''" '''module_name="server-2"''' ipaddr="192.168.0.200" login="root" name="drac-cmc-blade2" passwd="drac_password"/>
        </fencedevices>
* Lastly, you can also [[#Configure DRAC SSH Fencing|Configure DRAC SSH Fencing]]. Otherwise you will need to enable telnet on the DRAC CMC.

'''NOTE''': You must update the cluster configuration as described in [[#Update Cluster Configuration|Update Cluster Configuration]].

=====Configure DRAC SSH Fencing=====
By default any DRAC5/iDRAC/iDRAC6 has SSH enabled, but telnet disabled.
To use DRAC5/iDRAC/iDRAC6 fencing over SSH, check the '''Use SSH''' option while adding a fencing device to a node.
<br>'''NOTE''': This SSH option in Conga is included with <tt>luci-0.12.1-7.3.el5_3</tt> and greater.
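
If you maintain <tt>/etc/cluster/cluster.conf</tt> by hand rather than through Conga, the SSH option is set on the fence device line itself. The line below is an illustrative sketch only; the attribute name <tt>secure="1"</tt> is an assumption that should be verified against the <tt>fence_drac5</tt> man page shipped with your version:
                <fencedevice agent="fence_drac5" secure="1" cmd_prompt="admin1->" ipaddr="192.168.0.101" login="root" name="node1-drac" passwd="drac_password"/>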

====Update Cluster Configuration====
If you make any manual edits to <tt>/etc/cluster/cluster.conf</tt>, you will need to update all nodes in the cluster with the new configuration.  Perform these steps from any one node to update the cluster configuration:

1. Edit <tt>/etc/cluster/cluster.conf</tt> and increment the ''config_version'' number at the top of the file by one. For example, change:
  <cluster alias="my_cluster" '''config_version="2"''' name="my_cluster">
to:
  <cluster alias="my_cluster" '''config_version="3"''' name="my_cluster">

2. Save your changes and distribute the cluster configuration file to all nodes:
  [root]# '''ccs_tool update /etc/cluster/cluster.conf'''
  Config file updated from version 2 to 3
  Update complete.
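
To confirm that every node picked up the new version, you can query the running configuration on each node; this is a minimal sketch using <tt>cman_tool</tt>, which reports the config version the cluster manager is currently using:
  [root]# '''cman_tool version'''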

==Setting Up a Storage Cluster==

This section describes the procedure to set up a Global File System (GFS) that is shared between the cluster nodes.  Verify that the high-availability Red Hat cluster is running before setting up the storage cluster.  The Dell|Red Hat HA Cluster consists of a Red Hat Cluster Suite high-availability cluster and a Red Hat GFS storage cluster.

Configuring shared storage consists of the following steps:
# Configuring a Clustered Logical Volume (CLVM).
# Configuring the Global File System (GFS).

For more information, see the ''LVM Administrator's Guide'' and ''Global File System'' on [http://www.redhat.com/docs/manuals/enterprise/ www.redhat.com/docs/manuals/enterprise/].  The procedure for configuring a storage cluster is documented using both Conga and CLI tools.  You can use either method, but only one needs to be completed.

===Configuring a Storage Cluster With Conga===
[[../Cluster/Storage Cluster with Conga|Configuring a Storage Cluster with Conga]]

===Configuring a Storage Cluster With CLI Tools===
[[../Cluster/Storage Cluster with CLI|Configuring a Storage Cluster with CLI Tools]]
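
Whichever method you follow, a quick check (using the example names from this guide) confirms that the clustered logical volume exists and that every node sees the same shared device:
  [root]# '''lvdisplay /dev/vg_cluster/lv_cluster'''
  [root]# '''cat /proc/partitions'''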

==Managing the Cluster Infrastructure==

===Managing the Cluster Infrastructure with Conga===
The easiest way to start and stop the cluster is using the Conga management interface.  This will start and stop all cluster infrastructure daemons on all nodes simultaneously.
# Log in to the web interface at: https://{management_node_hostname_or_IP_address}:8084<br>Where {management_node_hostname_or_IP_address} is the hostname or IP address of the luci server.
# Select ''cluster''

===Managing the Cluster Infrastructure from the CLI===
The proper procedure for starting and stopping the cluster infrastructure from the CLI is outlined below.  Note that these commands need to be executed on each node.  It is best to run these commands as close to parallel as possible.

* Starting the cluster infrastructure:
  [root]# '''service cman start'''
  [root]# '''service clvmd start'''
  [root]# '''service gfs start'''
  [root]# '''service rgmanager start'''
Before proceeding further, make sure all of the services above are started, in the order listed.
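
If you want the cluster infrastructure to come up automatically after a reboot, the same services can be enabled at boot time. This is a minimal sketch using <tt>chkconfig</tt>; whether to enable automatic startup is a policy decision for your environment:
  [root]# '''chkconfig cman on'''
  [root]# '''chkconfig clvmd on'''
  [root]# '''chkconfig gfs on'''
  [root]# '''chkconfig rgmanager on'''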

* Stopping the cluster infrastructure:
  [root]# '''service rgmanager stop'''
  [root]# '''service clvmd stop'''
  [root]# '''service gfs stop'''
  [root]# '''service cman stop'''

=Configuring Cluster Services=
This section describes the procedure to create and test HA cluster services on your Dell|Red Hat HA Cluster system.
# Select '''Add a Resource'''.
# Select the type of resource, enter the required data in the appropriate fields, and click '''Submit'''.<br>'''NOTE:''' For more information, see ''Adding Cluster Resources'' in the ''Cluster Administration'' guide at [http://www.redhat.com/docs/manuals/enterprise/ www.redhat.com/docs/manuals/enterprise/].

===Example GFS resource===
After clicking '''Add a Resource''' following the steps above:
# In the drop-down list '''Select a Resource Type''', select '''GFS filesystem'''.
# In '''GFS Resource Configuration''', enter the details as described below:
#* Name - a name to describe the GFS file system.
#* Mount Point - a mount point on the local file system (e.g. '''/gfs'''). This is the directory to which the clustered logical volume will be mounted on all the nodes.
#* Device - the GFS device created (e.g. '''/dev/vg_cluster/lv_cluster'''). This is the logical volume created in the [[#Setting_Up_a_Storage_Cluster|Setting Up a Storage Cluster]] section.
#* Filesystem Type - '''GFS2''', since a gfs2 clustered logical volume was created above.
#* Options - mounting options (e.g. '''rw,debug'''). Remember to include '''debug''' among the options.
#* The remaining fields may be left blank.

'''NOTE:''' Among the mounting options, it is critical to include ''debug'' because this option causes a cluster node to ''panic'', and thereby be fenced, if there is a problem accessing the shared storage.
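
For reference, a resource defined this way is written to <tt>/etc/cluster/cluster.conf</tt> as a <tt>clusterfs</tt> element. The snippet below is an illustrative sketch only, using the example values above and a hypothetical resource name; treat the file Conga generates as authoritative:
        <resources>
                <clusterfs device="/dev/vg_cluster/lv_cluster" fstype="gfs2" mountpoint="/gfs" name="gfs-data" options="rw,debug"/>
        </resources>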

==Creating a Failover Domain (Optional)==
A Failover Domain is a group of nodes in the cluster. By default, all nodes can run any cluster service. To provide better administrative control over cluster services, Failover Domains limit which nodes are permitted to run a service or establish node preference.  For more information, see ''Configuring a Failover Domain'' in the ''Cluster Administration'' guide at [http://www.redhat.com/docs/manuals/enterprise/ www.redhat.com/docs/manuals/enterprise/].
==Creating Services==
# Select the type of resource, enter any required data in the appropriate fields, and click '''Submit'''.<br>'''NOTE:''' For more information, see ''Adding a Cluster Service to the Cluster'' in the ''Cluster Administration'' guide at [http://www.redhat.com/docs/manuals/enterprise/ www.redhat.com/docs/manuals/enterprise/].

===Example Configuration of NFS===
* 1. Before configuring an NFS service, ensure that all of your nodes have <tt>nfs-utils</tt> installed:
  [root]# '''yum install nfs-utils'''

* 2. Create resources to be used for the NFS service:
*# Click '''Resources'''
*# Add an '''IP Address''' - IP address to be used as a virtual IP address by the NFS service
*# Add a '''GFS File System''' - follow the steps at [[#Example_GFS_resource|Example GFS resource]]
*# Add an '''NFS Export''' - a name to describe this export (e.g. nfs-export)
*# Add an '''NFS Client''' - the target network client options, with the following values:
*#* Name - A name to describe this resource (e.g. nfs-cli)
*#* Target - The client network that will have access to the exports (e.g. 172.16.0.0/16)
*#* Options - NFS client options (e.g. rw,sync,no_root_squash)

* 3. Create an NFS service:
*# Click '''Services''' and enter a service name (e.g. nfs).
*# Check '''Automatically start this service''' if you want this service to start automatically with the cluster.
*# Select a recovery policy. '''Relocate''' will move the NFS service to another node upon a failure.
*# Select '''None''' for '''Failover Domain''' if you want the application to fail over to any other node in the cluster. If you want the application to fail over to a particular set of nodes, configure a new Failover Domain and select it.
*# Click '''Add a resource to this service'''.
*# From the drop-down menu, choose '''Use an existing global resource'''.
*# Select the '''IP Address''' resource that you created in step 2.
*# Click '''Add a child''' to the newly added '''IP Address''' resource.
*# From the drop-down menu, choose '''Use an existing global resource'''.
*# Choose the '''GFS File System''' created in step 2.
*# Click '''Add a child''' to the newly added '''GFS File System''' resource. Choose the '''NFS Export''' created in step 2.
*# Click '''Add a child''' resource to the newly added '''NFS Export''' resource. Choose the '''NFS Client''' created in step 2.<br>This process creates a dependency check among resources. All child resources wait for the parent resource they are associated with before they start. For example, the above process ensures the NFS service does not try to start if there is no mounted GFS file system shared between the nodes.
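
To verify the service from a client machine, a hedged example, assuming the virtual IP you assigned is 172.16.0.100 and the export is the GFS mount point <tt>/gfs</tt>:
  [client]# '''showmount -e 172.16.0.100'''
  [client]# '''mount -t nfs 172.16.0.100:/gfs /mnt'''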

===Example Configuration of FTP===
* 1. Before configuring an FTP service, ensure that all of your nodes have the <tt>vsftpd</tt> package installed:
  [root]# '''yum install vsftpd'''

* 2. Create resources to be used for the FTP service:
*# Click '''Resources'''
*# Add an '''IP Address''' - IP address to be used by the clients to access the FTP service
*# Add a '''GFS File System''' - follow the steps at [[#Example_GFS_resource|Example GFS resource]]

* 3. Create the FTP service:
*# Click '''Services''' and enter a service name (e.g. ftp).
*# Select a recovery policy. '''Relocate''' will move the FTP service to another node upon a failure.
*# Click '''Add a resource to this service'''.
*# From the drop-down menu, choose '''Use an existing global resource'''.
*# Select the '''IP Address''' resource that you created in step 2.
*# Click '''Add a child''' to the newly added '''IP Address''' resource.
*# From the drop-down menu, choose '''Use an existing global resource'''.
*# Choose the '''GFS File System''' created in step 2.
*# Click '''Add a child''' to the newly added '''GFS File System''' resource.  In '''Select a Resource Type''' select '''Script'''. Enter a name for '''Name''' and for the '''Full path to script file''' enter '''/etc/init.d/vsftpd'''.<br>Configuring the ''Script'' resource as a child of the '''GFS File System''' resource ensures that the file system is mounted before the FTP service attempts to start, as the FTP root will reside on the GFS file system.

* 4. Additional FTP configuration:
** Create a symbolic link on all your nodes for their configuration file.  This ensures that any configuration changes only need to be made in a single location for all nodes.  Additionally, if a node is having issues and cannot mount the GFS file system, the service will fail to read the configuration file; this is a desired side effect.<br>On one node, manually mount the GFS file system, and move the file <tt>/etc/vsftpd/vsftpd.conf</tt> to a directory on the GFS file system.  For example:
  [root]# '''mount -t gfs2 /dev/vg_cluster/lv_cluster /gfs'''
  [root]# '''mkdir -p /gfs/ftp/conf/'''
  [root]# '''mv /etc/vsftpd/vsftpd.conf /gfs/ftp/conf/vsftpd.conf'''
  [root]# '''ln -s /gfs/ftp/conf/vsftpd.conf /etc/vsftpd/vsftpd.conf'''
** Add a value in the <tt>vsftpd.conf</tt> file to change the root directory for anonymous logins:
  anon_root=/gfs/ftp/pub

Each user that has access would also need their home directory changed to the GFS root if desired.  Each node will also need to reference the same users through a central authentication mechanism such as NIS or LDAP, or by creating the same usernames and passwords on each node.  See '''<tt>man vsftpd.conf</tt>''' for more information.
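
Once the additional configuration is in place, a hedged way to start and check the service from the command line, assuming the service name ''ftp'' used in the example above:
  [root]# '''clusvcadm -e ftp'''
  [root]# '''clustat'''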

===Example Configuration of HTTP===
* 1. Before configuring an HTTP service, ensure that all of your nodes have <tt>httpd</tt> installed:
  [root]# '''yum install httpd'''

* 2. Create resources to be used for the HTTP service:
*# Click '''Resources'''
*# Add an '''IP Address''' - IP address to be used by the clients to access the HTTP service
*# Add a '''GFS File System''' - follow the steps at [[#Example_GFS_resource|Example GFS resource]]

* 3. Create the HTTP service:
*# Click '''Services''' and enter a service name (e.g. http).
*# Select a recovery policy. '''Relocate''' will move the HTTP service to another node upon a failure.
*# Click '''Add a resource to this service'''.
*# From the drop-down menu, choose '''Use an existing global resource'''.
*# Select the '''IP Address''' resource that you created in step 2.
*# Click '''Add a child''' to the newly added '''IP Address''' resource.
*# From the drop-down menu, choose '''Use an existing global resource'''.
*# Choose the '''GFS File System''' created in step 2.
*# Click '''Add a child''' to the newly added '''GFS File System''' resource. In '''Select a Resource Type''' select '''Script'''. Enter any name for '''Name'''. For the '''Full path to script file''' enter '''/etc/init.d/httpd'''.<br>Configuring the ''Script'' resource as a child of the '''GFS File System''' resource ensures that the file system is mounted before the HTTP service attempts to start, as the HTTP root will reside on the GFS file system.

* 4. Additional HTTP configuration:
** Create a symbolic link on all your nodes for their configuration file.  This ensures that any configuration changes only need to be made in a single location for all nodes.  Additionally, if a node is having issues and cannot mount the GFS file system, the service will fail to read the configuration file; this is a desired side effect.<br>On one node, manually mount the GFS file system, and move the file <tt>/etc/httpd/conf/httpd.conf</tt> to a directory on the GFS file system.  For example:
  [root]# '''mount -t gfs2 /dev/vg_cluster/lv_cluster /gfs'''
  [root]# '''mkdir -p /gfs/web/conf/'''
  [root]# '''mv /etc/httpd/conf/httpd.conf /gfs/web/conf/httpd.conf'''
  [root]# '''ln -s /gfs/web/conf/httpd.conf /etc/httpd/conf/httpd.conf'''
** Edit <tt>httpd.conf</tt> and change '''<tt>DocumentRoot</tt>''' to a location on the GFS file system (e.g. '''<tt>/gfs/web/html/</tt>''').
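
A hedged sketch of the relevant <tt>httpd.conf</tt> lines for the example path above; adjust the Directory block to match your existing configuration:
  DocumentRoot "/gfs/web/html"
  <Directory "/gfs/web/html">
      Options Indexes FollowSymLinks
      AllowOverride None
      Order allow,deny
      Allow from all
  </Directory>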

===Example Configuration of Samba===
* 1. Before configuring a Samba service, ensure that all of your nodes have <tt>samba</tt> installed:
  [root]# '''yum install samba'''

* 2. Create resources to be used for the Samba service:
*# Click '''Resources'''
*# Add an '''IP Address''' - IP address to be used by the clients to access the Samba service
*# Add a '''GFS File System''' - follow the steps at [[#Example_GFS_resource|Example GFS resource]]

* 3. Create the Samba service:
*# Click '''Services''' and enter a service name (e.g. samba).
*# Select a recovery policy. '''Relocate''' will move the Samba service to another node upon a failure.
*# Click '''Add a resource to this service'''.
*# From the drop-down menu, choose '''Use an existing global resource'''.
*# Select the '''IP Address''' resource that you created in step 2.
*# Click '''Add a child''' to the newly added '''IP Address''' resource.
*# From the drop-down menu, choose '''Use an existing global resource'''.
*# Choose the '''GFS File System''' created in step 2.
*# Click '''Add a child''' to the newly added '''GFS File System''' resource. In '''Select a Resource Type''' select '''Script'''. Enter any name for '''Name'''. For the '''Full path to script file''' enter '''/etc/init.d/smb'''.<br>Configuring the ''Script'' resource as a child of the '''GFS File System''' resource ensures that the file system is mounted before the Samba service attempts to start, as the Samba share will reside on the GFS file system.

* 4. Additional Samba configuration:
** Create a symbolic link on all your nodes for their configuration file.  This ensures that any configuration changes only need to be made in a single location for all nodes.  Additionally, if a node is having issues and cannot mount the GFS file system, the service will fail to read the configuration file; this is a desired side effect.<br>On one node, manually mount the GFS file system, and move the file <tt>/etc/samba/smb.conf</tt> to a directory on the GFS file system.  For example:
  [root]# '''mount -t gfs2 /dev/vg_cluster/lv_cluster /gfs'''
  [root]# '''mkdir -p /gfs/samba/conf/'''
  [root]# '''mv /etc/samba/smb.conf /gfs/samba/conf/smb.conf'''
  [root]# '''ln -s /gfs/samba/conf/smb.conf /etc/samba/smb.conf'''
** Edit <tt>smb.conf</tt> and create a share that points to a location on the GFS file system (e.g. '''<tt>/gfs/samba/share/</tt>''').
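
A hedged sketch of an <tt>smb.conf</tt> share definition for the example path above (the share name and options are illustrative; see '''<tt>man smb.conf</tt>''' for details):
  [cluster-share]
      path = /gfs/samba/share
      browseable = yes
      writable = yes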

=Managing Cluster Services=
You can also use the CLI to manage services. Use the following command:
  [root]# '''clusvcadm'''
For example, to relocate a service from node 1 to node 2, enter the following command:
  [root]# '''clusvcadm -r service_name node2'''
Use the command <tt>clustat</tt> to view cluster status:
  [root]# '''clustat'''
=Verification Checklist=
|-
|Cluster and Cluster Storage
|&nbsp;
|-
|Red Hat Cluster Suite installed and configured
|&nbsp;
|-
|Nodes participating in cluster
|&nbsp;
|-
|Fencing configured on nodes
|&nbsp;
|-
|Clustered logical volume
|&nbsp;
|-
|Global File System
|&nbsp;
|-
|Services created
|&nbsp;
|}
==Cluster Status==
Conga will allow you to monitor your cluster. Alternatively, you may run the command <tt>clustat</tt> from any node. For example:
  [root]# '''clustat'''
Other utilities that may help:
  [root]# '''cman_tool nodes'''
  [root]# '''cman_tool status'''
  [root]# '''cman_tool services'''

'''Logging:'''
Any important messages are logged to <tt>/var/log/messages</tt>.  The following is an example of a loss of network connectivity on node1, which causes node2 to fence it.
  Nov 28 15:37:56 node2 openais[3450]: [TOTEM] previous ring seq 24 rep 172.16.0.1
  Nov 28 15:37:56 node2 openais[3450]: [CLM ] Members Left:
  Nov 28 15:37:56 node2 openais[3450]: [CLM ] r(0) ip(172.16.0.1)

==Troubleshooting Conga==
The following sections describe issues you may encounter while creating the cluster initially and possible workarounds.

===Running luci on a Cluster Node===
If you are using a cluster node also as a management node and running luci, you have to restart luci manually after the initial configuration. For example:
  [root]# '''service luci restart'''

===Debugging Problems with luci===
luci can be started in debug mode by changing the settings in the <tt>/var/lib/luci/etc/zope.conf</tt> file. Change the '''debug-mode''' value to '''on''' and restart luci on the management node. After setting debug mode, the debug messages are directed to the <tt>/var/log/messages</tt> file.

===Issues While Creating a Cluster Initially===
Make sure your nodes have the latest SELinux policy with the following command:
  [root]# '''yum update selinux-policy'''
If you continue to encounter errors, it may be necessary to disable SELinux.  This is not recommended, and should only be used as a last resort. Disable SELinux with the command:
  [root]# '''setenforce 0'''
See ''Security and SELinux'' in the ''Deployment Guide'' on [http://www.redhat.com/docs/manuals/enterprise/ www.redhat.com/docs/manuals/enterprise/].
  "AIS Executive exiting (-9)"
Check for syntax errors in your <tt>/etc/cluster/cluster.conf</tt> file. This is unlikely to happen if you are using Conga to manage your cluster configuration file.
==Logical Volume Issues==
It may be necessary to restart the clustered logical volume manager with the command:
  [root]# '''service clvmd restart'''
Ensure all nodes have a consistent view of the shared storage with the command <tt>partprobe</tt> or by clicking '''reprobe storage''' in Conga. As a last resort, reboot all nodes, or select '''restart cluster''' in Conga.

In some cases you may need to rescan for logical volumes if you still cannot see the shared volume:
  [root]# '''partprobe -s'''
  [root]# '''pvscan'''
  [root]# '''vgscan'''
  [root]# '''vgchange -ay'''
  [root]# '''lvscan'''
  [root]# '''service clvmd restart'''
==Shared Storage Issues==
If you are experiencing errors when creating the clustered logical volume, you may need to wipe any previous labels from the virtual disk.
<br>'''NOTICE''': This will destroy all data on the shared storage disk!
Execute the following command from one node:
  [root@node1 ~]# '''pvremove -ff {/dev/sdXY}'''
Where {/dev/sdXY} is the partition intended for data. See the output of <tt>/proc/mpp</tt> to verify. For example:
  [root@node1 ~]# '''pvremove -ff /dev/sdb1'''
If you are using Conga, click '''reprobe storage''', otherwise type:
  [root@node1 ~]# '''partprobe -s /dev/sdb'''
If you have imaged the nodes with a cloning method, then the unique identifier (uuid) for the system logical volumes may be the same. It may be necessary to change the uuid with the commands <tt>pvchange --uuid</tt> or <tt>vgchange --uuid</tt>. For more information, see the ''LVM Administrator's Guide'' on the Red Hat website at [http://www.redhat.com/docs/manuals/enterprise www.redhat.com/docs/manuals/enterprise].
Fence each node to ensure that fencing is working properly.
Fence each node to ensure that fencing is working properly.
<br>1. Watch the logs from node 1 with the following command:
<br>1. Watch the logs from node 1 with the following command:
-
  [root@node1]# tail -f /var/log/messages
+
  [root@node1]# '''tail -f /var/log/messages'''
<br>2. Fence the node 2 by executing the following command:
<br>2. Fence the node 2 by executing the following command:
-
  [root@node1]# fence_node {fully qualified hostname or ip address of node2}
+
  [root@node1]# '''fence_node {fully qualified hostname or ip address of node2}'''
<br>3. View the logs on node1 and the console node2. Node 1 should successfully fence node2.
<br>3. View the logs on node1 and the console node2. Node 1 should successfully fence node2.
 +
<br>4. Continue to watch the messages file for status changes. You can also use the Cluster Status tool to see the cluster view of a node. The parameter-i 2 refreshes the tool every two seconds. For more information on clusters see:
<br>4. Continue to watch the messages file for status changes. You can also use the Cluster Status tool to see the cluster view of a node. The parameter-i 2 refreshes the tool every two seconds. For more information on clusters see:
-
  [root]# clustat -i 2
+
  [root]# '''clustat -i 2'''
<br>5. After you successfully fence one node, repeat this process for the second
<br>5. After you successfully fence one node, repeat this process for the second
node.
node.
 +
 +
 +
 +
----
 +
<font size=1>[[../../DellRedHatHALinuxCluster|Dell|Red Hat HA Linux]] > Cluster</font>

Configuring Your Cluster

This section discusses how to install and configure Red Hat Cluster Suite and Global File System on your Dell|Red Hat HA Cluster system using Conga and CLI Tools.

Conga is a configuration and management suite based on a server/agent model. You can access the management server luci using a standard web browser from anywhere on the network. Luci communicates to the client agent ricci on the nodes and installs all required packages, synchronizes the cluster configuration file, and manages the storage cluster. Though there are other possible methods such as system-config-cluster and creating an xml configuration file by hand, it is recommended that you use Conga to configure and manage your cluster.

Setting Up a High-Availability Cluster

The following section provides an overview to installing your cluster using Conga. For more information on using Conga, see the section Configuring Red Hat Cluster With Conga in the Cluster Administration guide at www.redhat.com/docs/manuals/enterprise/.

Preparing the Cluster Nodes for Conga

Run the following commands on each cluster node to install and start ricci:

  • Install ricci:
[root]# yum install ricci
  • Configure ricci to start on boot:
[root]# chkconfig ricci on

Start the ricci service:

[root]# service ricci start

Execute the following commands on the management node to install the Conga server luci:

  • Install luci:
[root]# yum install luci

NOTE: You can configure luci on any node, but it is recommended that you install luci on a dedicated management node. If you do have a dedicated management node, that node will need access to the Cluster channel on RHN or a Satellite server. You can also login to RHN and manually download luci for installation on your management node.

  • Initialize the luci server and assign the admin password:
[root]# luci_admin init
Initializing the luci server
  
Creating the 'admin' user
  
Enter password:
Confirm password:
  
Please wait...
The admin password has been successfully set.
Generating SSL certificates...
The luci server has been successfully initialized
  
You must restart the luci server for changes to take effect.
  
Run "service luci restart" to do so

  • Configure luci to start on boot:
[root]# chkconfig luci on
  • Start the luci service:
[root]# service luci start
Starting luci: Generating https SSL certificates...  done   [  OK  ]
 
Point your web browser to https://management.example.com:8084 to access luci

For more information on configuring your cluster with Conga, see the section Configuring Red Hat Cluster With Conga on the Red Hat website at www.redhat.com/docs/manuals/enterprise/ or locally from the Cluster_Administration-en-US package.

Creating Your Cluster Using Conga

Conga automatically installs the software required for clustering on all cluster nodes. Ensure you have completed the steps in Preparing Cluster Nodes for Conga before proceeding, and verify that the steps in Configuring the Firewall were completed for each cluster node; otherwise luci will not be able to communicate with ricci.
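
If your nodes use iptables, rules along the following lines may help. The port numbers are taken from the luci URL and the ricci connection errors shown later in this document (TCP 11111 for ricci on each cluster node, TCP 8084 for luci on the management node); this is only a sketch to adapt to your own firewall policy as described in Configuring the Firewall.

On each cluster node, allow the luci server to reach ricci:

[root]# iptables -I INPUT -p tcp --dport 11111 -j ACCEPT

On the management node, allow browser access to luci:

[root]# iptables -I INPUT -p tcp --dport 8084 -j ACCEPT

Save the rules so they persist across reboots:

[root]# service iptables save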


1. Connect to the luci server from any browser on the same network as the management node. In your web browser, enter:

https://{management_node_hostname_or_IP_address}:8084

Where {management_node_hostname_or_IP_address} is the hostname or IP address of the management server running luci.
NOTE: If you encounter any errors, see section Troubleshooting Conga.
2. Enter your username and password to securely log in to the luci server.
3. Go to the cluster tab.
4. Click Create a New Cluster.
5. Enter a cluster name of 15 characters or less.
6. Add the fully qualified private hostname or IP address and root password for each cluster node.
NOTE: You may also select Check if node passwords are identical and only enter the password for the first node.
NOTE: The password is sent in an encrypted SSL session, and not saved.
7. Ensure that the option for Enable Shared Storage Support is selected and click Submit.

Conga downloads and installs all the required cluster software, creates a configuration file, and reboots each cluster node. Watch the Conga status window for details on the progress of each cluster node.

NOTE: If an error message such as An error occurred when trying to contact any of the nodes in the cluster appears in the luci server web page, wait a few minutes and refresh your browser.

Configuring Fencing Using Conga

Fencing ensures data integrity on the shared storage file system by removing any problematic nodes from the cluster. This is accomplished by cutting off power to the system to ensure it does not attempt to write to the storage device.

In your Dell|Red Hat HA Cluster system, network power switches provide the most reliable fencing method. Remote access controllers such as DRAC or IPMI should be used as secondary fencing methods. However, if no network power switches are available for primary fencing, a secondary method such as manual can be used, but is not supported. Use a log watching utility to notify you if your primary fencing method is failing.

When using a Dell M1000e modular blade enclosure, the Dell CMC may be used as a primary fencing method instead, as it controls power to each individual blade. In this case each blade's individual iDRAC or IPMI may be used as a secondary fencing method.

For more information, see Testing Fencing Mechanisms and the section Fencing in the Cluster Suite Overview at www.redhat.com/docs/manuals/enterprise.

Configure any network power switches and remote access controllers (DRAC or IPMI) on the same private network as the cluster nodes. Refer to the section Additional Dell Configuration for more information. For details on configuring your network power switches for remote access, see the documentation for that product.

To configure fencing:

  1. Log in to luci.
  2. Go to Cluster -> Nodes.
  3. Select one of the cluster nodes.
  4. Under the section for Main Fencing Method, click Add.
  5. Configure both primary and secondary fencing.
    NOTE: If you have network power switches, they will be configured as shared devices.

Additional Configuration for DRAC Fencing

Depending on the specific DRAC model your systems are using, one or more of the following sections may be applicable.

Configure iDRAC6 Fencing

Dell PowerEdge servers using iDRAC6 need specific parameters set in order to function properly. For the latest information on support for this in Conga, see Bug 496749.

  1. Manually SSH to the iDRAC6
  2. Copy the prompt that is displayed after successful login. (e.g. admin1->)
  3. On one node only, edit /etc/cluster/cluster.conf and change each fencedevice line as follows.
    1. Change references of the agent fence_drac to fence_drac5
    2. Add the parameter cmd_prompt="your_iDRAC6_prompt" to each fencedevice line for each node.

Where your_iDRAC6_prompt is the prompt you copied in step 2 (e.g. admin1->).

Example:

Find the line for each fence device. This example shows a two node cluster with DRAC fencing:

       <fencedevices>
               <fencedevice agent="fence_drac" ipaddr="192.168.0.101" login="root" name="node1-drac" passwd="drac_password"/>
               <fencedevice agent="fence_drac" ipaddr="192.168.0.102" login="root" name="node2-drac" passwd="drac_password"/>
       </fencedevices>

Change the agent to fence_drac5 and add the option cmd_prompt="admin1->" on each line:

       <fencedevices>
               <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="192.168.0.101" login="root" name="node1-drac" passwd="drac_password"/>
               <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="192.168.0.102" login="root" name="node2-drac" passwd="drac_password"/>
       </fencedevices>

NOTE: You must update the cluster configuration as described in Update Cluster Configuration.

Configure DRAC CMC Fencing

The PowerEdge M1000e Chassis Management Controller (CMC) acts as a network power switch of sorts: you configure a single IP address on the CMC and connect to that IP for management, and individual blade slots can be powered up or down as needed. At this time Conga does not have an entry for the Dell CMC when configuring fencing. The steps in this section describe how to manually configure fencing for the Dell CMC. See Bug 496724 for details on Conga support.

NOTE: At the time of this writing, there is a bug that prevents the CMC from powering the blade back up after it is fenced. To recover from a fenced outage, manually power the blade on (or connect to the CMC and issue the command racadm serveraction -m server-# powerup). New code available for testing can correct this behavior. See Bug 466788 for beta code and further discussions on this issue.

NOTE: Using the individual iDRAC on each Dell Blade is not supported at this time. Instead use the Dell CMC as described in this section. If desired, you may configure IPMI as your secondary fencing method for individual Dell Blades. For information on support of the Dell iDRAC, see Bug 496748.

To configure your nodes for DRAC CMC fencing:

  1. Select Dell Drac as the fencing device in Conga.
  2. Enter a unique name for the node that will be fenced.
  3. For IP Address enter the DRAC CMC IP address.
  4. Enter the specific blade for Module Name. For example, enter server-1 for blade 1, and server-4 for blade 4.
  5. On one node only, edit /etc/cluster/cluster.conf and change each fencedevice line as follows.
    1. Change references of the agent fence_drac to fence_drac5
    2. Edit the parameter modulename= to read module_name= instead.

Example:

Find the line for each fence device. This example shows a two node cluster with DRAC CMC fencing:

       <fencedevices>
               <fencedevice agent="fence_drac" modulename="server-1" ipaddr="192.168.0.200" login="root" name="drac-cmc-blade1" passwd="drac_password"/>
               <fencedevice agent="fence_drac" modulename="server-2" ipaddr="192.168.0.200" login="root" name="drac-cmc-blade2" passwd="drac_password"/>
       </fencedevices>

Change the agent to fence_drac5 and change the option modulename= to module_name= on each line:

       <fencedevices>
               <fencedevice agent="fence_drac5" module_name="server-1" ipaddr="192.168.0.200" login="root" name="drac-cmc-blade1" passwd="drac_password"/>
               <fencedevice agent="fence_drac5" module_name="server-2" ipaddr="192.168.0.200" login="root" name="drac-cmc-blade2" passwd="drac_password"/>
       </fencedevices>

NOTE: You must update the cluster configuration as described in Update Cluster Configuration.

Configure DRAC SSH Fencing

By default any DRAC5/iDRAC/iDRAC6 has SSH enabled, but telnet disabled. To use DRAC5/iDRAC/iDRAC6 fencing over SSH check the Use SSH option while adding a fencing device to a node.
NOTE: This SSH option in Conga is included with luci-0.12.1-7.3.el5_3 and greater.

Update Cluster Configuration

If you make any manual edits to /etc/cluster/cluster.conf, you will need to update all nodes in the cluster with the new configuration. Perform these steps from any one node to update the cluster configuration:

1. Edit /etc/cluster/cluster.conf and change the config_version number at the top of the file:

<cluster alias="my_cluster" config_version="2" name="my_cluster">

Increment it by one:

<cluster alias="my_cluster" config_version="3" name="my_cluster">

2. Save your changes and distribute the cluster configuration file to all nodes:

[root]# ccs_tool update /etc/cluster/cluster.conf
Config file updated from version 2 to 3
  
Update complete.
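
To confirm that every node picked up the new version, you can check the configuration version that cman reports on each node (the exact output format varies by release, so treat this as a quick sanity check):

[root]# cman_tool version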

Setting Up a Storage Cluster

This section describes the procedure to set up a Global File System (GFS) that is shared between the cluster nodes. Verify that the high-availability Red Hat cluster is running before setting up the storage cluster. The Dell|Red Hat HA Cluster consists of a Red Hat Cluster Suite high-availability cluster and a Red Hat GFS storage cluster.

Configuring shared storage consists of the following steps:

  1. Configuring a Clustered Logical Volume (CLVM).
  2. Configuring Global File System (GFS).

For more information, see the LVM Administrator's Guide and Global File System on www.redhat.com/docs/manuals/enterprise/. The procedure for configuring a Storage Cluster is documented using both Conga and CLI tools. You can use either method, but only one needs to be completed.

Configuring a Storage Cluster With Conga

Configuring a Storage Cluster with Conga

Configuring a Storage Cluster With CLI Tools

Configuring a Storage Cluster with CLI Tools

Managing the Cluster Infrastructure

It may be necessary to start or stop the cluster infrastructure on one or more nodes at any time. This can be accomplished through the Conga user interface, or individually on each node via the CLI.

Managing the Cluster Infrastructure with Conga

The easiest way to start and stop the cluster is using the Conga management interface. This starts and stops the cluster infrastructure daemons on all nodes simultaneously.

  1. Login to the web interface at: https://{management_node_hostname_or_IP_address}:8084
    Where {management_node_hostname_or_IP_address} is the hostname or IP address of the luci server.
  2. Select cluster.
  3. Select the desired action next to the cluster name and click Go.

Managing the Cluster Infrastructure from the CLI

The proper procedure for starting and stopping the cluster infrastructure from the CLI is outlined below. Note that these commands need to be executed on each node, and it is best to run them as close to in parallel as possible; a small wrapper sketch follows the lists below.

  • Starting the cluster infrastructure:
[root]# service cman start
[root]# service clvmd start
[root]# service gfs start
[root]# service rgmanager start

Before proceeding further, make sure all of the services mentioned above are started in the order listed.

  • Stopping the cluster infrastructure:
[root]# service rgmanager stop
[root]# service clvmd stop
[root]# service gfs stop
[root]# service cman stop
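
The following one-liner is a minimal sketch of how the start sequence could be kicked off from a single machine. It assumes passwordless SSH as root and nodes named node1 and node2 (hypothetical names; substitute your own), and it backgrounds each session so cman starts on the nodes in parallel:

[root]# for n in node1 node2; do ssh $n 'service cman start && service clvmd start && service gfs start && service rgmanager start' & done; wait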

Configuring Cluster Services

This section describes the procedure to create and test HA cluster services on your Dell|Red Hat HA Cluster system.

  • A resource is anything that needs to be monitored or managed to ensure availability of an application or daemon (e.g. IP address, file system, database, or web server).
  • A service is a collection of resources required to provide an application or daemon to the clients (e.g. for a highly available web server, all of these resources are necessary: IP address, file system, and httpd).

Creating Resources

The following steps provide an overview for creating resources:

  1. Click Cluster List.
  2. Select the cluster name and click Resources on the left-pane.
  3. Select Add a Resource.
  4. Select the type of resource and enter in the required data in the appropriate fields and click Submit.
    NOTE: For more information, see Adding Cluster Resources in the Cluster Administration guide at www.redhat.com/docs/manuals/enterprise/.

Example GFS resource

After clicking Add a Resource following the steps above:

  1. In the drop down list Select a Resource Type select GFS filesystem
  2. In the GFS Resource Configuration enter the details as described below:
    • Name - a name to describe the GFS file system.
    • Mount Point - a mount point on the local file system (e.g. /gfs). This is the directory to which the clustered logical volume will be mounted on all nodes.
    • Device - the GFS device created (e.g. /dev/vg_cluster/lv_cluster). This is the logical volume created in the Setting Up a Storage Cluster section.
    • Filesystem Type - GFS2. Use GFS2, since a GFS2 clustered logical volume was created above.
    • Options - mount options (e.g. rw,debug). Remember to include debug among the options.
  • The remaining fields may be left blank.

NOTE: Among the mount options, it is critical to include debug because this option causes a cluster node to panic, and therefore be fenced, if there is a problem accessing the shared storage.
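
To verify the device, mount point, and options before handing them to the cluster service, you can mount the clustered logical volume by hand on one node and then unmount it. The cluster infrastructure, including clvmd, must already be running; the device and mount point below follow the examples used elsewhere in this document:

[root]# mount -t gfs2 -o rw,debug /dev/vg_cluster/lv_cluster /gfs
[root]# mount | grep /gfs
[root]# umount /gfs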

Creating a Failover Domain (Optional)

A Failover Domain is a group of nodes in the cluster. By default all nodes can run any cluster service. To provide better administrative control over cluster services, Failover Domains limit which nodes are permitted to run a service or establish node preference. For more information see Configuring a Failover Domain in the Cluster Administration guide at www.redhat.com/docs/manuals/enterprise/.

Creating Services

The following steps provide an overview for creating services:

  1. Click Cluster List.
  2. Select the cluster name and click Services on the left-pane.
  3. Select Add a service.
  4. Choose a service name that describes the function that you are creating the service for.
  5. Leave the default Failover Domain set to None to allow any node in the cluster to run this service. Otherwise, select a previously created failover domain.
  6. Click Add a resource to this service.
  7. Select either new local or existing global resource.
  8. Select the type of resource and enter any required data in the appropriate fields and click submit.
    NOTE: For more information, see Adding a Cluster Service to the Cluster in the Cluster Administration guide at www.redhat.com/docs/manuals/enterprise/.

Example Configuration of NFS

  • 1. Before configuring an NFS service, ensure that all of your nodes have nfs installed:
[root]# yum install nfs nfs-utils
  • 2. Create resources to be used for the NFS service:
    1. Click Resources
    2. Add an IP Address - IP address to be used as a virtual IP address by the NFS service
    3. Add a GFS File System - follow the steps at Example GFS resource
    4. Add an NFS Export - a name to describe this export (e.g. nfs-export)
    5. Add an NFS Client - the target network client options with the following values:
      • Name - A name to describe this resource (e.g. nfs-cli)
      • Target - The client network that will have access to the exports (e.g. 172.16.0.0/16)
      • Options - NFS client options (e.g. rw,sync,no_root_squash)
  • 3. Create an NFS service:
    1. Click Services and enter a service name (e.g. nfs)
    2. Check Automatically start this service if you want this service to start automatically with the cluster.
    3. Select a recovery policy. Relocate will move the NFS service to another node upon a failure.
    4. Select None for Failover Domain if you want the application to failover to any other node in the cluster. If you want the application to failover to a particular set of nodes, configure a new Failover Domain and select it.
    5. Click Add a resource to this service.
    6. From the drop-down menu, choose Use an existing global resource.
    7. Select the resource IP Address that you created in step 2.
    8. Click Add a child to the newly added IP resource.
    9. From the drop-down menu, choose Use an existing global resource.
    10. Choose GFS File System created in step 2.
    11. Click Add a child to the added GFS File System resource. Choose NFS Export created in step 2.
    12. Click Add a child resource to the newly added NFS Export resource. Choose NFS Client created in step 2.
      This process creates a dependency chain among resources: each child resource waits for the parent resource it is associated with before starting. For example, the above process ensures the NFS service does not try to start if the GFS file system shared between the nodes is not mounted; a sketch of the resulting configuration follows.
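
For reference, the service section that Conga writes to /etc/cluster/cluster.conf nests the resources in the same parent/child order. The fragment below is only an illustrative sketch: the resource names, virtual IP address, and options are placeholders based on the examples above, and the exact attributes Conga generates may differ.

       <rm>
               <resources>
                       <ip address="172.16.0.100" monitor_link="1"/>
                       <clusterfs name="gfs-data" mountpoint="/gfs" device="/dev/vg_cluster/lv_cluster" fstype="gfs2" options="rw,debug"/>
                       <nfsexport name="nfs-export"/>
                       <nfsclient name="nfs-cli" target="172.16.0.0/16" options="rw,sync,no_root_squash"/>
               </resources>
               <service autostart="1" name="nfs" recovery="relocate">
                       <ip ref="172.16.0.100">
                               <clusterfs ref="gfs-data">
                                       <nfsexport ref="nfs-export">
                                               <nfsclient ref="nfs-cli"/>
                                       </nfsexport>
                               </clusterfs>
                       </ip>
               </service>
       </rm>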

Example Configuration of FTP

  • 1. Before configuring an FTP service, ensure that all of your nodes have vsftpd package installed:
[root]# yum install vsftpd
  • 2. Create resources to be used for the FTP service:
    1. Click Resources
    2. Add an IP Address - IP address to be used by the clients to access the FTP service
    3. Add a GFS File System - follow the steps at Example GFS resource
  • 3. Create the FTP service:
    1. Click Services and enter a service name (e.g. ftp).
    2. Select a recovery policy. Relocate will move the FTP service to another node upon a failure.
    3. Click Add a resource to this service.
    4. From the drop-down menu, choose Use an existing global resource.
    5. Select the resource IP Address that you created in step 2.
    6. Click Add a child to the newly added IP Address resource.
    7. From the drop-down menu, choose Use an existing global resource.
    8. Choose GFS File System created in step 2.
    9. Click Add a child to the newly-added GFS File System resource. In Select a Resource Type, select Script. Enter a name for Name, and for the Full path to script file enter /etc/init.d/vsftpd.
      Configuring the Script resource as a child to GFS File System ensures that the file system is mounted before the ftp service attempts to start, as the ftp root will reside on the GFS file system.
  • 4. Additional FTP configuration
    • Create a symbolic link on all your nodes for their configuration file. This ensures that any configuration changes only need to be made in a single location for all nodes. Additionally, if a node is having issues and cannot mount the GFS file system, the service will fail to read the configuration file. This is a desired side effect.
      On one node, manually mount the GFS file system, and move the file /etc/vsftpd/vsftpd.conf to a directory on the GFS file system. For example:
[root]# mount -t gfs2 /dev/vg_cluster/lv_cluster /gfs
[root]# mkdir /gfs/ftp/conf/
[root]# mv /etc/vsftpd/vsftpd.conf /gfs/ftp/conf/vsftpd.conf
[root]# ln -s /gfs/ftp/conf/vsftpd.conf /etc/vsftpd/vsftpd.conf
  • Add a value in the vsftpd.conf file to change the root directory for anonymous logins:
anon_root=/gfs/ftp/pub

Each user that has access would also need their home directory changed to the GFS root, if desired. Each node will also need to reference the same users through a central authentication mechanism such as NIS or LDAP, or by creating the same usernames and passwords on each node. See man vsftpd.conf for more information.

Example Configuration of HTTP

  • 1. Before configuring an HTTP service, ensure that all of your nodes have httpd installed:
[root]# yum install httpd
  • 2. Create resources to be used for the HTTP service:
    1. Click Resources
    2. Add an IP Address - IP address to be used by the clients to access the HTTP service
    3. Add a GFS File System - follow the steps at Example GFS resource
  • 3. Create the HTTP service:
    1. Click Services and enter a service name (e.g. http).
    2. Select a recovery policy. Relocate will move the HTTP service to another node upon a failure.
    3. Click Add a resource to this service.
    4. From the drop-down menu, choose Use an existing global resource.
    5. Select the resource IP Address that you created in step 2.
    6. Click Add a child to the newly added IP Address resource.
    7. From the drop-down menu, choose Use an existing global resource.
    8. Choose GFS File System created in step 2.
    9. Click Add a child to the newly-created GFS File System resource. In Select a Resource Type, select Script. Enter any name for Name. For the Full path to script file enter /etc/init.d/httpd.
      Configuring the Script resource as a child to GFS ensures that the file system is mounted before the http service attempts to start, as the http root will reside on the GFS file system.
  • 4. Additional HTTP configuration:
    • Create a symbolic link on all your nodes for their configuration file. This ensures that any configuration changes only need to be made in a single location for all nodes. Additionally, if a node is having issues and cannot mount the GFS file system, the service will fail to read the configuration file. This is a desired side effect.
      On one node, manually mount the GFS file system, and move the file /etc/httpd/conf/httpd.conf to a directory on the GFS file system. For example:
[root]# mount -t gfs2 /dev/vg_cluster/lv_cluster /gfs
[root]# mkdir /gfs/web/conf/
[root]# mv /etc/httpd/conf/httpd.conf /gfs/web/conf/httpd.conf
[root]# ln -s /gfs/web/conf/httpd.conf /etc/httpd/conf/httpd.conf
    • Edit httpd.conf and change DocumentRoot to a location on the GFS file system (e.g. /gfs/web/html/); a brief excerpt follows.
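
The excerpt below is a hedged sketch of the relevant httpd.conf lines, assuming the DocumentRoot /gfs/web/html used in the example above (directives follow the Apache 2.2 syntax shipped with RHEL 5; adjust to your own layout):

# Serve content from the GFS file system
DocumentRoot "/gfs/web/html"
<Directory "/gfs/web/html">
    Order allow,deny
    Allow from all
</Directory>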

Example Configuration of Samba

  • 1. Before configuring a Samba service, ensure that all of your nodes have samba installed:
[root]# yum install samba
  • 2. Create resources to be used for the Samba service:
    1. Click Resources
    2. Add an IP Address - IP address to be used by the clients to access the SAMBA service
    3. Add a GFS File System - follow the steps at Example GFS resource
  • 3. Create the Samba service:
    1. Click Services and enter a service name (e.g. samba).
    2. Select a recovery policy. Relocate will move the SAMBA service to another node upon a failure.
    3. Click Add a resource to this service.
    4. From the drop-down menu, choose Use an existing global resource.
    5. Select the resource IP Address that you created in step 2.
    6. Click Add a child to the newly added IP Address resource.
    7. From the drop-down menu, choose Use an existing global resource.
    8. Choose GFS File System created in step 2.
    9. Click Add a child to the newly-created GFS File System resource. In Select a Resource Type, select Script. Enter any name for Name. For the Full path to script file enter /etc/init.d/smb.

Configuring the Script resource as a child to GFS ensures that the file system is mounted before the Samba service attempts to start, as the Samba share will reside on the GFS file system.

  • 4. Additional Samba configuration:
    • Create a symbolic link on all your nodes for their configuration file. This ensures that any configuration changes only need to be made in a single location for all nodes. Additionally, if a node is having issues and cannot mount the GFS file system, the service will fail to read the configuration file. This is a desired side effect.
      On one node, manually mount the GFS file system, and move the file /etc/samba/smb.conf to a directory on the GFS file system. For example:
[root]# mount -t gfs2 /dev/vg_cluster/lv_cluster /gfs
[root]# mkdir /gfs/samba/conf/
[root]# mv /etc/samba/smb.conf /gfs/samba/conf/smb.conf
[root]# ln -s /gfs/samba/conf/smb.conf /etc/samba/smb.conf
  • Edit smb.conf and create a share that points to a location on the GFS file system (e.g. /gfs/samba/share/); a brief excerpt follows.
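
The excerpt below is a hedged sketch of such a share in smb.conf; the share name clustershare is illustrative, and the path follows the example above:

[clustershare]
    comment = Clustered Samba share on GFS
    path = /gfs/samba/share
    writable = yes
    browseable = yes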

Managing Cluster Services

To manage cluster services from Conga:

  • Click the Cluster tab.
  • Click the name of the service you want to manage.
  • Select a task to perform, such as Enable this service, Disable this service, Restart this service, or Relocate this service.

You can also use the CLI to manage services. Use the following command:

[root]# clusvcadm

For example, to relocate a service from node 1 to node 2, enter the following command:

[root]# clusvcadm -r service_name node2
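
Other commonly used clusvcadm operations are sketched below; the service name nfs is only an example, and man clusvcadm has the authoritative list of flags:

  • Enable (start) a service:
[root]# clusvcadm -e nfs
  • Disable (stop) a service:
[root]# clusvcadm -d nfs
  • Restart a service in place on its current node:
[root]# clusvcadm -R nfs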

Use the command clustat to view cluster status:

[root]# clustat

Verification Checklist

Item Verified
Cluster and Cluster Storage  
Red Hat Cluster Suite installed and configured  
Nodes participating in cluster  
Fencing configured on nodes  
Clustered logical volume  
Global File System  
Services created  

Troubleshooting

Networking Issues

Red Hat cluster nodes use multicast to communicate. Your switches must be configured to enable multicast addresses and support IGMP. For more information, see section 2.6, Multicast Addresses, in the Cluster Administration guide on www.redhat.com/docs/manuals/enterprise/, as well as the documentation that came with your switches.
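
As a quick check that each node has actually joined a multicast group on the cluster interface, you can list the kernel's IGMP memberships (interface names and group addresses will vary with your configuration):

[root]# cat /proc/net/igmp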

Cluster Status

Conga will allow you to monitor your cluster. Alternatively, you may run the command clustat from any node. For example:

[root]# clustat

Other utilities that may help:

[root]# cman_tool nodes
[root]# cman_tool status
[root]# cman_tool services

Logging: Important messages are logged to /var/log/messages. The following is an example of a loss of network connectivity on node1, which causes node2 to fence it.

Nov 28 15:37:56 node2 openais[3450]: [TOTEM] previous ring seq 24 rep 172.16.0.1
Nov 28 15:37:56 node2 openais[3450]: [TOTEM] aru 2d high delivered 2d received flag 1
Nov 28 15:37:56 node2 openais[3450]: [TOTEM] Did not need to originate any messages in recovery .
Nov 28 15:37:56 node2 openais[3450]: [TOTEM] Sending initial ORF token
Nov 28 15:37:56 node2 openais[3450]: [CLM ] CLM CONFIGURATION CHANGE
Nov 28 15:37:56 node2 openais[3450]: [CLM ] New Configuration:
Nov 28 15:37:56 node2 kernel: dlm: closing connection to node 2
Nov 28 15:37:56 node2 fenced[3466]: node1.ha.lab not a cluster member after 0 sec post_fail_delay
Nov 28 15:37:56 node2 openais[3450]: [CLM ] r(0) ip(172.16.0.2)
Nov 28 15:37:56 node2 fenced[3466]: fencing node "node1.example.com"
Nov 28 15:37:56 node2 openais[3450]: [CLM ] Members Left:
Nov 28 15:37:56 node2 openais[3450]: [CLM ] r(0) ip(172.16.0.1)

Troubleshooting Conga

The following sections describe issues you may encounter while initially creating the cluster, and possible workarounds.

Running luci on a Cluster Node

If you are using a cluster node also as a management node and running luci, you have to restart luci manually after the initial configuration. For example:

[root]# service luci restart

Debugging problems with luci

luci can be started in debug mode by changing the settings in the /var/lib/luci/etc/zope.conf file. Change the debug-mode value to on and restart luci on the management node. After setting debug mode, debug messages are directed to /var/log/messages.
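
For example, the relevant zope.conf line and restart step would look roughly like this (excerpt only; the rest of the file is left unchanged):

# In /var/lib/luci/etc/zope.conf, set:
debug-mode on

Then restart luci:

[root]# service luci restart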

Issues While Creating a Cluster Initially

If the following error appears when initially installing the cluster:

The following errors occurred:
Unable to add the key for node node1.ha.lab to the trusted keys list.
Unable to add the key for node node2.ha.lab to the trusted keys list.
Unable to connect to node2.ha.lab: Unable to establish an SSL connection to node2.ha.lab:11111:
ClientSocket(hostname, port, timeout): connect() failed

Unable to connect to node1.ha.lab: Unable to establish an SSL connection to node1.ha.lab:11111:
ClientSocket(hostname, port, timeout): connect() failed

This error occurs when the luci server cannot communicate with the ricci agent. Verify that ricci is installed and started on each node. Ensure that the firewall has been configured correctly, and that Security-Enhanced Linux (SELinux) is not the issue. Check /var/log/audit/audit.log for details on SELinux issues.

Make sure your nodes have the latest SELinux policy with the following command:

[root]# yum update selinux-policy

If you continue to encounter errors, it may be necessary to disable SELinux. This is not recommended, and should only be used as a last resort. Disable SELinux with the command:

[root]# setenforce 0

See Security and SELinux in the Deployment Guide on www.redhat.com/docs/manuals/enterprise/.

Configuration File Issues

Configuration errors manifest themselves as the following error in /var/log/messages:

"AIS Executive exiting (-9)"

Check for syntax errors in your /etc/cluster/cluster.conf file. This is unlikely to happen if you are using Conga to manage your cluster configuration file.

Logical Volume Issues

It may be necessary to restart the clustered logical volume manager with the command:

[root]# service clvmd restart

Ensure all nodes have a consistent view of the shared storage with the partprobe command or by clicking reprobe storage in Conga. As a last resort, reboot all nodes, or select restart cluster in Conga.

In some cases you may need to rescan for logical volumes if you still cannot see the shared volume:

[root]# partprobe -s
[root]# pvscan
[root]# vgscan
[root]# vgchange -ay
[root]# lvscan
[root]# service clvmd restart

Shared Storage Issues

If you are experiencing errors when creating the clustered logical volume, you may need to wipe any previous labels from the virtual disk.
NOTICE: This will destroy all data on the shared storage disk!

Execute the following command from one node:

[root@node1 ~]# pvremove -ff {/dev/sdXY}

Where {/dev/sdXY} is the partition intended for data. See the output of /proc/mpp to verify. For example:

[root@node1 ~]# pvremove -ff /dev/sdb1

If you are using Conga, click reprobe storage, otherwise type:

[root@node1 ~]# partprobe -s /dev/sdb

If you have imaged the nodes with a cloning method, then the unique identifier (uuid) for the system logical volumes may be the same. It may be necessary to change the uuid with the commands pvchange --uuid or vgchange --uuid. For more information, see LVM Administrator's Guide on the Red Hat website at www.redhat.com/docs/manuals/enterprise.
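
As a rough sketch of how those commands are invoked (the device and volume group names are placeholders, and the affected volumes must not be in use; see the LVM Administrator's Guide before changing UUIDs on a running system):

[root]# pvchange --uuid /dev/sdXY
[root]# vgchange --uuid {volume_group_name}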

Testing Fencing Mechanisms

Fence each node to ensure that fencing is working properly.
1. Watch the logs from node 1 with the following command:

[root@node1]# tail -f /var/log/messages


2. Fence node 2 by executing the following command:

[root@node1]# fence_node {fully qualified hostname or ip address of node2}


3. View the logs on node1 and the console of node2. Node 1 should successfully fence node2.


4. Continue to watch the messages file for status changes. You can also use the Cluster Status tool to see the cluster's view of a node; the parameter -i 2 refreshes the tool every two seconds:

[root]# clustat -i 2


5. After you successfully fence one node, repeat this process for the second node.



Dell|Red Hat HA Linux > Cluster
