Configuring Your Cluster

This section discusses how to install and configure Red Hat Cluster Suite and Global File System on your Dell|Red Hat HA Cluster system using Conga and CLI tools.

Conga is a newer configuration and management suite based on a server/agent model. You can access the management server (luci) using a standard web browser from anywhere on the network. Luci communicates through the client agent (ricci) on the nodes and installs all required packages, synchronizes the configuration file, and manages the storage cluster. Though there are many possible methods, including the command-line interface, it is recommended that you use Conga to configure and manage your cluster.

Setting Up a High-Availability Cluster

The following sections provide information about installing and configuring your high-availability Red Hat® cluster using Conga. For more information on using Conga, see the section Configuring Red Hat Cluster With Conga in the Cluster Administration guide at www.redhat.com/docs/manuals/enterprise/.

Preparing Cluster Nodes for Conga

The dell-config-cluster script installs the correct packages. You can use the following sections to verify that the correct software is installed.

To prepare the cluster nodes:
1. Execute the following commands on each cluster node to install, enable, and start the Conga client agent ricci:

  • Install ricci:
[root]# yum install ricci
  • Configure ricci to start on boot:
[root]# chkconfig ricci on
  • Start the ricci agent:
[root]# service ricci start
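
To verify that ricci is running and enabled at boot, you can optionally run:

[root]# service ricci status
[root]# chkconfig --list ricci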


2. Execute the following commands on the management node to install, initialize, and start the Conga server luci.

NOTE: You can configure the luci server on any node, but it is recommended that you install luci on a dedicated management node.

On the management node:

  • Install the Conga server luci:
[root]# yum install luci

NOTE: The install command will fail if your node does not have an RHEL AP license.

If the above command fails with the following error:

No Match for argument: luci

Verify you have access to the "Cluster" channel with the command:

[root]# yum grouplist

The RHEL AP license provides access to the "Cluster" channel. If your dedicated management node is not running RHEL AP, you can manually obtain the luci package from the RHEL installation media or via RHN.
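
For example, if you copy the luci package from the installation media, you can install it directly with rpm. The path below is only an illustration; adjust it to your media mount point and the directory that contains the cluster packages:

[root]# rpm -ivh /media/cdrom/Cluster/luci-*.rpm   # hypothetical media path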

  • Initialize the luci server:
[root]# luci_admin init
  • Configure the luci server to start on boot:
[root]# chkconfig luci on
  • Start the luci server:
[root]# service luci start
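
As with ricci, you can optionally confirm that luci is running and set to start at boot:

[root]# service luci status
[root]# chkconfig --list luci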

For more information on configuring your cluster with Conga, see the section Configuring Red Hat Cluster With Conga on the Red Hat website at www.redhat.com/docs/manuals/enterprise/ or locally in the Cluster_Administration-en-US package on each node.

Creating Your Cluster Using Conga

Conga automatically installs the software required for clustering on all cluster nodes. Ensure you have completed the steps in "Preparing Cluster Nodes for Conga" before proceeding.


1. Connect to the luci server from any browser on the same network as the management node. In your web browser, enter:

https://{management_node_hostname_or_IP_address}:8084

Where {management_node_hostname_or_IP_address} is the hostname or IP address of the luci server.
NOTE: If you encounter any errors, see section "Troubleshooting Conga".
2. Enter your username and password to securely log in to the luci server.
3. Go to the cluster tab.
4. Click Create a New Cluster.
5. Enter a cluster name of 15 characters or less.
6. Add the fully qualified private hostname or IP address and root password for each cluster node.
NOTE: You may also select Check if node passwords are identical and only enter the password for the first node.
NOTE: The password is sent in an encrypted SSL session, and not saved.
7. Ensure that the option for Enable Shared Storage Support is selected and click Submit.

Conga downloads and installs all the required cluster software, creates a configuration file, and reboots each cluster node. Watch the Conga status window for details on the progress of each cluster node.
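
After the nodes reboot and rejoin the cluster, you can optionally confirm cluster membership from the CLI on any node (clustat and cman_tool are installed with the cluster software):

[root]# clustat
[root]# cman_tool nodes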

Configuring Fencing Using Conga

In your Dell|Red Hat HA Cluster system, the network power switches provide the most reliable fencing method. For more information on configuring your network power switches for remote access, see the documentation that came with your network power switches.

Configure your network power switches and remote access controllers (DRAC or IPMI) on the same private network as the cluster nodes. Refer to the section "Configuring Remote Access" for more information.

Remote access controllers such as DRAC or IPMI should be used as secondary methods unless no network power switches are available; in that case, use the remote access controller as the primary method and configure the secondary method as manual. Use a log watching utility to notify you if your primary fencing method fails. For more information, see Testing Fencing Mechanisms and the section Fencing in the Cluster Suite Overview on the Red Hat website at www.redhat.com/docs/manuals/enterprise.
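
For example, one simple way to watch for fencing activity is to follow the system log on each node:

[root]# tail -f /var/log/messages | grep -i fence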

To configure fencing:

  1. Log in to luci.
  2. Go to Cluster→ Nodes.
  3. Select one of the cluster nodes.
  4. Under the section for Main Fencing Method, click Add.
  5. Configure primary and secondary fencing.


NOTE: If you have optional network power switches, they will be configured as shared devices.

Setting Up a Storage Cluster

This section describes the procedure to set up a Global File System (GFS) that is shared between the cluster nodes. Verify that the high-availability Red Hat cluster is running before setting up the storage cluster. The Dell|Red Hat HA Cluster system comprises a Red Hat Cluster Suite high-availability cluster and a GFS storage cluster.
NOTE: Before setting up the storage, complete the steps in "Installing the RDAC Multi-Path Proxy Driver".

Configuring shared storage consists of the following steps:

  1. Configuring a Clustered Logical Volume (CLVM).
  2. Configuring Global File System (GFS).

For more information, see the LVM Administrator's Guide and the Global File System guide at www.redhat.com/docs/manuals/enterprise/. The procedure for configuring a storage cluster is documented using both Conga and CLI tools; you can use either method.

Configuring a Storage Cluster With Conga

To configure a storage cluster with Conga:

  1. Log in to Luci.
  2. Click the Storage tab.
  3. In the System list section, select a node. The hard drives that are visible to the node are displayed. Repeat this step for all nodes; each node must display the same hard-drive list. If any node does not display the same hard drives, see the storage documentation to ensure all nodes are viewing the same virtual disk.

Partitioning Shared Storage with Conga

NOTICE: Executing the commands in the following steps erases all partitions and data from your hard drive.
NOTE: Execute all commands from one cluster node only. All nodes have access to the same virtual disk and the following steps must be performed only one time.

  1. In the system list section, select a node.
  2. Select Partition Tables.
  3. Select New Partition Table. For Label, ensure the option GPT is selected. On the right pane, check the box for your virtual disk and click Create.
  4. Luci reprobes your storage and should display your virtual disk. If your virtual disk is not listed, follow the steps in the Troubleshooting section under "Shared Storage Issues".
  5. At the top, select Unused Space.
  6. The entire space is used by default, but you may change the size to fit your needs. Leave the value for Content as Empty and click Create.
  7. On the left pane, select System List, then select another node and view the partitions it has access to. If the data displayed is inconsistent, click reprobe storage. It may be necessary to run the command partprobe on all nodes to ensure a consistent view (see the example below). See "Shared Storage Issues" in the Troubleshooting section for more information.
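
For example, to force the kernel to re-read the partition table, you can run the following on each node (replace /dev/sdb with your shared virtual disk):

[root]# partprobe /dev/sdb
[root]# cat /proc/partitions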

Creating Clustered Logical Volume with Conga

  1. On the left-pane, click Volume Groups and click New Volume Group.
  2. In Volume Group Name enter a name that identifies your shared storage volume group. For example, vg_cluster.
  3. Ensure that the option Clustered is set to True.
  4. In the right-pane, select the newly-created data partition. In this example, the partition is /dev/sdb1.
  5. Click Create. The new volume group is displayed. Verify that the volume group size is correct.
  6. Click New Logical Volume. Enter a value in the Logical Volume Name field to identify your shared storage logical volume. For example, lv_cluster.
  7. Select a size. All the available space is used by default; however, you can create several logical volumes to meet your specific needs.

Creating Global File System with Conga

  1. In the Content field, select GFS1 - Global FS v.1.
  2. The GFS entry dialog box appears.
  3. Verify that the value in the Cluster Name field is the same as the value listed in the Cluster tab.
  4. Enter a unique GFS name. You do not have to specify a mount point or list the mount point in /etc/fstab.
  5. In Number of Journals enter the number of cluster nodes plus one.
  6. Verify that the clustered value is set to true. Do not change any other values unless you need to customize the cluster as per your requirements.
    NOTE: If any errors are displayed, see "Shared Storage Issues".
  7. Skip to "Managing the Cluster Infrastructure".

Configuring a Storage Cluster With CLI Tools

The following is an example of creating a clustered logical volume for use with GFS using CLI commands. For instructions on using Conga, see "Configuring a Storage Cluster With Conga".

Partitioning Shared Storage with CLI Tools

To partition shared storage:
1. Verify that the devices on each node are consistent by executing the following command:

[root]# cat /proc/partitions


All devices must be accessible from all nodes.
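
One convenient way to compare this output between nodes, assuming the second node is reachable over ssh as node2, is:

[root@node1 ~]# diff <(cat /proc/partitions) <(ssh node2 cat /proc/partitions)
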
NOTICE: Executing the command in step 2 erases all partitions and data from your hard drive.
NOTE: Execute all commands from one cluster node only. All nodes have access to the same virtual disk and the following steps must be performed one time only.
2. From one node only, set up shared storage partitioning. Create a partition by executing the following command. For Red Hat® Cluster Suite and Global File System, it is typical for one partition to consume the entire disk.

[root@node1 ~]# parted {your storage device}

Where {your storage device} is the device that points to the shared virtual disk. If you are unsure which device is your virtual disk, follow the steps in "Determining Virtual Disk". For example:

[root@node1 ~]# parted /dev/sdb

The following screen is displayed:

GNU Parted 1.8.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.


3. Set the disk label (partition table type) to GPT.

(parted) mklabel gpt

View the disk information with the print command:

(parted) print
Model: DELL MD Virtual Disk (scsi)
Disk /dev/sdb: 1000GB
Disk geometry for /dev/sdb: 0kB - 1000GB
Disk label type: gpt


4. Create a single partition that uses the entire disk:

(parted) mkpart primary 0 -1


NOTE: The value "-1" uses the entire disk.


5. Set the lvm flag for the data partition. Use the set command:

(parted) set 1 lvm on


6. Verify that the partitions are set up properly:

(parted) print
Disk geometry for /dev/sdb: 0kB - 1000GB
Model: DELL MD Virtual Disk (scsi)
Disk /dev/sdb: 1000GB
Disk label type: gpt
Number  Start   End     Size    File system  Name     Flags
 1      17.4kB  1000GB  1000GB               primary  lvm


7. Quit parted. The changes are saved immediately.

(parted) quit


8. Run the following command on all the nodes and ensure that the partitions are present on each node:

cat /proc/partitions


9. If the partitions are not visible, run the following command to update the kernel:

[root]# partprobe {your storage device}

Where {your storage device} is the device on which you created the partition. For example:

[root]# partprobe /dev/sdb

To view partitions again:

[root]# cat /proc/partitions


NOTE: If the newly-created partition is not displayed after executing the partprobe command, reboot the system.

Creating Clustered Logical Volume with CLI Tools

NOTE: Ensure that Red Hat Cluster Suite is functioning before you create a cluster logical volume.

To create a logical volume, execute the following commands in series:
1. Create a physical volume (pv):

[root@node1 ~]# pvcreate {your mpp device partition}

Where {your mpp device partition} is the newly-created partition. For example:

[root@node1 ~]# pvcreate /dev/sdb1

The message below is displayed:

Physical volume "/dev/sdb1" successfully created.
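
Optionally, verify the physical volume:

[root@node1 ~]# pvdisplay /dev/sdb1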


2. Create a volume group (vg):

[root@node1 ~]# vgcreate {volume group name} {your mpp device partition}

Where {volume group name} is a name of your choice to identify this volume group, and {your mpp device partition} is the data partition you created. For example:

[root@node1 ~]# vgcreate vg_cluster /dev/sdb1

The following message is displayed:

Volume group vg_cluster successfully created
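
Optionally, verify the volume group and its size:

[root@node1 ~]# vgdisplay vg_cluster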


3. Create a logical volume (lv):

[root@node1 ~]# lvcreate {size of logical volume} -n {logical volume name} {volume group name}

Where {size of logical volume} is one of the following options:

  • -l 100%FREE (consumes all available space)
  • -l {physical extents} (see man lvcreate for more info)
  • -L {size} (see man lvcreate for more info)

You can create several logical volumes based on your specific application's needs, or a single one that consumes all available space.

Finally, {logical volume name} is a name of your choice. For example:

[root@node1 ~]# lvcreate -l 100%FREE -n lv_cluster vg_cluster

The message below is displayed:

Logical volume "lv_cluster" created


4. Verify the lv was created successfully:

[root@node1 ~]# lvdisplay /dev/vg_cluster/lv_cluster

The following screen is displayed:

  --- Logical volume ---
  LV Name                /dev/vg_cluster/lv_cluster
  VG Name                vg_cluster
  LV UUID                zcdt4M-VI2U-l9bA-h2mJ-dmfofFBE-ub3RYk
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                1020.00 MB
  Current LE             255
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:2

NOTE: It may be necessary to restart the clustered LVM daemon (clvmd) on all nodes at this point:

[root]# service clvmd restart


5. On all nodes run the following command:

[root]# lvmconf --enable-cluster


6. Start the clustered lvm daemon (clvmd) by typing:

[root]# service clvmd start

Ensure that clvmd starts on boot:

[root]# chkconfig clvmd on
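
To confirm that the volume group is marked as clustered, you can check the Attr column in the vgs output; a trailing "c" indicates a clustered volume group:

[root]# vgs vg_cluster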

Creating Global File System with CLI Tools

NOTE: For more information on configuring GFS, see the Global File System Guide on the Red Hat website at www.redhat.com/docs/manuals/enterprise/ or the package Global_File_System-en-US locally on each node. For instructions on using Conga to configure GFS, see the section "Configuring a Storage Cluster With Conga".

Before you create a GFS:

  • Verify that Red Hat Cluster Suite is functioning properly and ensure that a clustered logical volume is created.
  • Gather the following information:
    • {cluster name}—Listed in Conga on the Cluster tab, and also in the /etc/cluster/cluster.conf file on each node.
    • {gfs filesystem name}—A unique identifier among GFS file systems.
    • {number of journals}—One journal per node is required. However, at least one additional journal is recommended.
    • {name of clustered logical volume}—The clustered logical volume that you created. For example, /dev/vg_cluster/lv_cluster.

To create a file system:
1. From one system only execute the following command:

[root@node1 ~]# gfs_mkfs -t {cluster name}:{gfs filesystem name} -p lock_dlm -j {number of journals} {name of clustered logical volume}

For example:

[root@node1 ~]# gfs_mkfs -t my_cluster:gfs -p lock_dlm -j 3 /dev/vg_cluster/lv_cluster

The following warning message is displayed:

This will destroy any data on
/dev/vg_cluster/lv_cluster.
Are you sure you want to proceed? [y/n]


2. Enter <y>.


3. A confirmation message is displayed indicating the creation of a file system.

Device:                    /dev/vg_cluster/lv_cluster
Blocksize: 4096
Filesystem Size: 162772
Journals: 3
Resource Groups: 8
Locking Protocol: lock_dlm
Lock Table: my_cluster:gfs
Syncing...
All Done
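
If you want to verify the new file system manually from one node, a test mount similar to the following can be used. The mount point /mnt/gfs is only an example; in normal operation the cluster service (or an /etc/fstab entry used by the gfs service) handles mounting:

[root@node1 ~]# mkdir -p /mnt/gfs                                  # example mount point
[root@node1 ~]# mount -t gfs /dev/vg_cluster/lv_cluster /mnt/gfs
[root@node1 ~]# df -h /mnt/gfs
[root@node1 ~]# umount /mnt/gfs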

Managing the Cluster Infrastructure

It may be necessary to start or stop the cluster infrastructure on one or more nodes at any time. This can be accomplished through the Conga user interface, or individually on each node from the CLI.

Managing the Cluster Infrastructure with Conga

The easiest way to start and stop the cluster is through the luci management interface, which starts and stops the cluster infrastructure daemons on all nodes simultaneously.

  1. Login to the web interface at: https://{management_node_hostname_or_IP_address}:8084
    Where {management_node_hostname_or_IP_address} is the hostname or IP address of the luci server.
  2. Select the Cluster tab.
  3. Select the desired action next to the cluster name and click Go.

Managing the Cluster Infrastructure from the CLI

The proper procedure for starting and stopping the cluster infrastructure from the CLI is outlined below. These commands must be executed on each node, and it is best to run them as close to in parallel as possible.

Starting the cluster infrastructure:

[root]# service cman start
[root]# service clvmd start
[root]# service gfs start
[root]# service rgmanager start

Stopping the cluster infrastructure:

[root]# service rgmanager stop
[root]# service clvmd stop
[root]# service gfs stop
[root]# service cman stop
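
If you want the cluster infrastructure to start automatically at boot, you can enable each daemon with chkconfig on every node:

[root]# chkconfig cman on
[root]# chkconfig clvmd on
[root]# chkconfig gfs on
[root]# chkconfig rgmanager on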

Configuring a Cluster Service

This section describes the procedure to create and test HA cluster services on your Dell|Red Hat HA Cluster system.

  • A resource is anything that must be monitored or managed to ensure availability of an application or daemon (for example, an IP address, file system, database, or web server).
  • A service is a collection of resources required to provide an application or daemon to clients (for example, a highly available web server requires all of these resources: an IP address, a file system, and httpd).

Creating Resources

The following steps provide an overview for creating resources:

  1. Click Cluster List.
  2. Select the cluster name and click Resources on the left-pane.
  3. Select Add a Resource.
  4. Select the type of resource and enter in the required data in the appropriate fields and click Submit.
    NOTE: For more information, see Adding Cluster Resources in the Cluster Administration guide at www.redhat.com/docs/manuals/enterprise/.

Creating a Failover Domain (Optional)

A Failover Domain is a group of nodes in the cluster. By default all nodes can run any cluster service. To provide greater administrative control over cluster services, Failover Domains limit which nodes are permitted to run a service or establish node preference. For more information see Configuring a Failover Domain in the Cluster Administration guide at www.redhat.com/docs/manuals/enterprise/.

Creating Services

The following steps provide an overview for creating services:

  1. Click Cluster List.
  2. Select the cluster name and click Services on the left-pane.
  3. Select Add a service.
  4. Choose a service name that describes the function the service provides.
  5. Leave the default Failover Domain set to None to allow any node in the cluster to run this service. Otherwise, select a previously created failover domain.
  6. Click Add a resource to this service.
  7. Select either new local or existing global resource.
  8. Select the type of resource, enter any required data in the appropriate fields, and click Submit.
    NOTE: For more information, see Adding a Cluster Service to the Cluster in the Cluster Administration guide at www.redhat.com/docs/manuals/enterprise/.

Example Configuration of NFS

Use Conga to configure an NFS service by first creating resources:
1. Click Resources and add the following resources:

  • IP Address—IP address to be used as a virtual IP address by the NFS service
  • GFS File System—the clustered GFS logical volume
    • Name - a name to describe the GFS file system
    • Mount Point - the GFS device created (e.g. /dev/vg_cluster/lv_cluster)
    • The remaining fields may be left blank
  • NFS Export— a name to describe this export
  • NFS Client—the target network client options
    • Name - A name to describe this resource
    • Target - The client network that will have access (e.g. 172.16.0.0/16)
    • Options - NFS client options (e.g. rw,sync,no_root_squash)

2. Click Services and enter a service name.
3. Select a recovery policy.
4. Click Add a resource to this service.
5. From the drop-down menu, choose Use an existing global resource.
6. Select the resource IP Address that you created in step 1.
7. Select Add a resource to this service
8. From the drop-down menu, choose Use an existing global resource.
9. Choose GFS File System created in step 1.
10. Click Add a child to the newly-created GFS File System. Choose NFS Export created in step 1.
11. Click Add a child resource to the newly-created NFS Export. Choose NFS Client created in step 1.

This process creates a dependency chain among the resources: a child resource does not attempt to start until the parent resource it is associated with is available. For example, this ensures that the NFS service does not try to start if the file system is not mounted, and so on.
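
After the service is created, you can optionally enable it and check its status from any node. The service name nfs_service below is only an example; use the name you entered in step 2:

[root]# clusvcadm -e nfs_service
[root]# clustat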

Managing Cluster Services

To manage cluster services from Conga:

  • Click the Cluster tab.
  • Click the name of the service you want to manage.
  • Select a task to perform, such as Enable this service, Disable this service, Restart this service, or Relocate this service.

You can also use the CLI to manage services. Use the following command:

[root]# clusvcadm

For example, to relocate a service named service_name from its current node to node2, enter the following command:

[root]# clusvcadm -r service_name -m node2

Use the command clustat to view cluster status:

[root]# clustat

Verification Checklist

Verify each of the following items for the cluster and cluster storage:

  • Red Hat Cluster Suite installed and configured
  • Nodes participating in cluster
  • Fencing configured on nodes
  • Clustered logical volume
  • Global File System
  • Services created

Troubleshooting

Networking Issues

Red Hat Cluster Suite nodes use multicast to communicate. Your switches must be configured to enable multicast addresses and support IGMP. For more information, see section 2.6, Multicast Addresses, in the Cluster Administration guide at www.redhat.com/docs/manuals/enterprise/, and the documentation that came with your switches.

Cluster Status

Conga allows you to monitor your cluster. Alternatively, you can run the command clustat from any node. For example:

[root]# clustat

Logging

Important messages are logged to /var/log/messages. The following is an example of a loss of network connectivity on node1 that causes node2 to fence it.

Nov 28 15:37:56 node2 openais[3450]: [TOTEM] previous ring seq 24 rep 172.16.0.1
Nov 28 15:37:56 node2 openais[3450]: [TOTEM] aru 2d high delivered 2d received flag 1
Nov 28 15:37:56 node2 openais[3450]: [TOTEM] Did not need to originate any messages in recovery .
Nov 28 15:37:56 node2 openais[3450]: [TOTEM] Sending initial ORF token
Nov 28 15:37:56 node2 openais[3450]: [CLM ] CLM CONFIGURATION CHANGE
Nov 28 15:37:56 node2 openais[3450]: [CLM ] New Configuration:
Nov 28 15:37:56 node2 kernel: dlm: closing connection to node 2
Nov 28 15:37:56 node2 fenced[3466]: node1.ha.lab not a cluster member after 0 sec post_fail_delay
Nov 28 15:37:56 node2 openais[3450]: [CLM ] r(0) ip(172.16.0.2)
Nov 28 15:37:56 node2 fenced[3466]: fencing node "node1.example.com"
Nov 28 15:37:56 node2 openais[3450]: [CLM ] Members Left:
Nov 28 15:37:56 node2 openais[3450]: [CLM ] r(0) ip(172.16.0.1)

Troubleshooting Conga

The following sections describe issues you may encounter during initial cluster creation and possible workarounds.

Running luci on a Cluster Node

If you are also using a cluster node as the management node running luci, you must restart luci manually after the initial configuration. For example:

[root]# service luci restart

Issues While Creating a Cluster Initially

If the following error appears when initially installing the cluster:

The following errors occurred:
Unable to add the key for node node1.ha.lab to the trusted keys list.
Unable to add the key for node node2.ha.lab to the trusted keys list.
Unable to connect to node2.ha.lab: Unable to establish an SSL connection to node2.ha.lab:11111:
ClientSocket(hostname, port, timeout): connect() failed

Unable to connect to node1.ha.lab: Unable to establish an SSL connection to node1.ha.lab:11111:
ClientSocket(hostname, port, timeout): connect() failed

This error occurs when the luci server cannot communicate with the ricci agent. Verify that ricci is installed and started on each node. Ensure that the firewall has been configured correctly, and that Security-Enhanced Linux (SELinux) is not the issue. Check /var/log/audit/audit.log for details on SELinux issues.
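
To verify that ricci is listening on its default port (11111, as shown in the error above), you can run the following on each node:

[root]# netstat -tlnp | grep 11111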

Make sure your nodes have the latest SELinux policy with the following command:

[root]# yum update selinux-policy

If you continue to encounter errors, it may be necessary to disable SELinux. This is not recommended, and should only be used as a last resort. Disable SELinux with the command:

[root]# setenforce 0

See Security and SELinux in the Deployment Guide on www.redhat.com/docs/manuals/enterprise/.

Configuration File Issues

Configuration errors manifest themselves as the following error in /var/log/messages:

"AIS Executive exiting (-9)"

Check for syntax errors in your /etc/cluster/cluster.conf file. This is unlikely to happen if you are using Conga to manage your cluster configuration file.

Logical Volume Issues

It may be necessary to restart the clustered logical volume manager with the command:

[root]# service clvmd restart

Ensure all nodes have a consistent view of the shared storage by running the partprobe command or by clicking reprobe storage in Conga. As a last resort, reboot all nodes, or select restart cluster in Conga.

Shared Storage Issues

If you are experiencing errors when creating the clustered logical volume, you may need to wipe any previous labels from the virtual disk.
NOTICE: This will destroy all data on the virtual disk.

Execute the following command from one node:

[root@node1 ~]# pvremove -ff {/dev/sdXY}

Where {/dev/sdXY} is the partition intended for data. See the output of /proc/mpp to verify. For example:

[root@node1 ~]# pvremove -ff /dev/sdb1

If you are using Conga, click reprobe storage, otherwise type:

[root@node1 ~]# partprobe /dev/sdb

If you have imaged the nodes with a cloning method, then the unique identifier (uuid) for the system logical volumes may be the same. It may be necessary to change the uuid with the commands pvchange --uuid or vgchange --uuid. For more information, see LVM Administrator's Guide on the Red Hat website at www.redhat.com/docs/manuals/enterprise.

Testing Fencing Mechanisms

Fence each node to ensure that fencing is working properly.
1. Watch the logs from node 1 with the following command:

[root@node1]# tail -f /var/log/messages


2. Fence node 2 by executing the following command:

[root@node1]# fence_node {fully qualified hostname or ip address of node2}


3. View the logs on node1 and the console of node2. Node1 should successfully fence node2.
4. Continue to watch the messages file for status changes. You can also use the clustat tool to see the cluster's view of each node; the parameter -i 2 refreshes the display every two seconds. For example:

[root]# clustat -i 2


5. After you successfully fence one node, repeat this process for the second node.
