Set up an OCFS2 cluster filesystem

From DellLinuxWiki


This is an example of how to set up the Oracle Cluster File System (OCFS2) for use with cluster databases. The example uses SUSE Linux Enterprise Server 10 Service Pack 4, with an iSCSI target as shared storage.

Installation

We assume two cluster nodes named node1 and node2 with the IP addresses 192.168.0.11 and 192.168.0.12.
  • On both nodes, configure your iSCSI initiator and install everything that YaST proposes:
yast2 iscsi-client
  • On both nodes, install the OCFS2 software:
yast -i ocfs2-tools ocfs2console
  • On both nodes, make the cluster services start at boot
chkconfig o2cb on
/etc/init.d/o2cb enable

You will get the message "cluster not known". That is okay at this point; the cluster is configured in a later step.

  • Check that the nodes can reach each other:
node1:~ # ping node2
PING node2 (192.168.0.12) 56(84) bytes of data.
64 bytes from node2 (192.168.0.12): icmp_seq=1 ttl=64 time=1.09 ms
node2:~ # ping node1
PING node1 (192.168.0.11) 56(84) bytes of data.
64 bytes from node1 (192.168.0.11): icmp_seq=1 ttl=64 time=1.09 ms
  • Start ocfs2console
    • Choose Cluster->Configure Nodes...
    • Enter the cluster nodes with their local host names (exactly what the command "hostname" returns).
    • Choose Cluster->Propagate Configuration...
  • Make sure that the o2cb service is running on all nodes:
/etc/init.d/o2cb start
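Once the cluster is configured, an OCFS2 volume can also be mounted automatically at boot via /etc/fstab. This is a sketch: the device /dev/sdb1 and mount point /filer are example names from this article, and the _netdev option keeps the mount from being attempted before the network (and thus the o2cb cluster stack) is available.

```shell
# Example /etc/fstab line for an OCFS2 volume.
# /dev/sdb1 and /filer are assumptions; adjust to your setup.
/dev/sdb1  /filer  ocfs2  _netdev  0 0
```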

Troubleshooting

Unable to access cluster service while starting heartbeat

Symptom

ls3147:~ # mount /dev/sdb /FilerOCFS/
ocfs2_hb_ctl: Unable to access cluster service while starting heartbeat
mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted"

Reason 1: hostname

The local hostname did not match the node name configured in /etc/ocfs2/cluster.conf. Setting the hostname to the configured node name fixes the mount:

ls3147:~ # hostname bwa3
ls3147:~ # /etc/init.d/o2cb start
Starting O2CB cluster ocfs2: OK
ls3147:~ # mount /dev/sdb /FilerOCFS/
ls3147:~ # ls /FilerOCFS/
lost+found  this_is_an_OCFS2_filesystem
ls3147:~ #     
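A quick way to detect this mismatch is to compare the local hostname against the node names in the configuration file. A sketch, assuming the cluster.conf format shown later in this article:

```shell
#!/bin/sh
# check_node_name HOSTNAME CONF
# Succeeds if HOSTNAME appears as a "name = ..." entry in CONF.
check_node_name() {
    grep -q "name = $1\$" "$2"
}

# Example (path and message are assumptions):
# check_node_name "$(hostname)" /etc/ocfs2/cluster.conf \
#     || echo "hostname $(hostname) is not configured in cluster.conf"
```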

Reason 2: IP address

The IP address configured for this node in /etc/ocfs2/cluster.conf is not assigned to any local interface, so o2net cannot bind its listening socket (ret=-99 is -EADDRNOTAVAIL):

bwa4:~ # mount -a
ocfs2_hb_ctl: Unable to access cluster service while starting heartbeat
mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted"
bwa4:~ # /etc/init.d/o2cb enable
Writing O2CB configuration: OK
Starting O2CB cluster ocfs2: Failed
Cluster ocfs2 created
Node bwa1 added
Node bwa2 added
Node bwa3 added
o2cb_ctl: Internal logic failure while adding node bwa4

Stopping O2CB cluster ocfs2: OK
bwa4:~ # dmesg
(15575,11):o2net_open_listening_sock:1896 ERROR: unable to bind socket at 192.168.51.4:7777, ret=-99
(15606,11):o2net_open_listening_sock:1896 ERROR: unable to bind socket at 192.168.51.4:7777, ret=-99
bwa4:~ # ping 192.168.51.4
PING 192.168.51.4 (192.168.51.4) 56(84) bytes of data.

--- 192.168.51.4 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1009ms
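This check can be scripted: verify that the interconnect address from cluster.conf is actually assigned to a local interface. A sketch, where 192.168.51.4 is the failing address from the dmesg output above:

```shell
#!/bin/sh
# check_ip IP: succeed if IP is assigned to a local interface.
check_ip() {
    ip -o -4 addr show | awk '{print $4}' | cut -d/ -f1 | grep -qx "$1"
}

# 192.168.51.4 is the node address from the failing example above.
if check_ip 192.168.51.4; then
    echo "192.168.51.4 is configured locally"
else
    echo "192.168.51.4 is missing - fix the interface or cluster.conf"
fi
```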

Reason 3: Cluster service is not started

node2:~ # mount.ocfs2 /dev/sdb1 /filer/
ocfs2_hb_ctl: Unable to access cluster service while starting heartbeat
mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted"
node2:~ # /etc/init.d/o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Offline
node2:~ # /etc/init.d/o2cb start
Starting O2CB cluster ocfs2: OK
node2:~ # mount.ocfs2 /dev/sdb1 /filer/
node2:~ # ls /filer/
lost+found
node2:~ # /etc/init.d/o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold = 31
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Active

Unable to access cluster service while trying to initialize cluster

Symptom

When trying to mount the cluster volume you get the error message

node1:~ # mount /dev/sdb /mnt
mount.ocfs2: Unable to access cluster service while trying initialize cluster

Reason

In one case, the reason was that the cluster service had not been started on all nodes.

Solution

Start the cluster service on all nodes:

node2:~ # /etc/init.d/o2cb status
Driver for "configfs": Not loaded
Driver for "ocfs2_dlmfs": Not loaded
node2:~ # /etc/init.d/o2cb enable
Writing O2CB configuration: OK
Loading filesystem "configfs": OK
Mounting configfs filesystem at /sys/kernel/config: OK
Loading stack plugin "o2cb": OK
Loading filesystem "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Cluster not known
node2:~ # /etc/init.d/o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted

Unable to access cluster service while creating node

Symptom

In ocfs2console, when adding nodes, you get the error message

o2cb_ctl: Unable to access cluster service while creating node
Could not add node node1

Solution 1

The following solution worked in one case: delete /etc/ocfs2/cluster.conf:

rm /etc/ocfs2/cluster.conf

Solution 2

The following solution worked in one case: write /etc/ocfs2/cluster.conf manually:

node:
        name = node1
        cluster = ocfs2
        number = 0
        ip_address = 192.168.0.11
        ip_port = 7777

node:
        name = node2
        cluster = ocfs2
        number = 1
        ip_address = 192.168.0.12
        ip_port = 7777

cluster:
        name = ocfs2
        node_count = 2
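A frequent editing mistake in this file is a node_count that no longer matches the number of node: stanzas. A small sanity check, assuming the cluster.conf layout shown above:

```shell
#!/bin/sh
# check_node_count CONF: succeed if node_count matches the number
# of "node:" stanzas in CONF.
check_node_count() {
    stanzas=$(grep -c '^node:' "$1")
    declared=$(awk '/node_count/ {print $3}' "$1")
    [ "$stanzas" -eq "$declared" ]
}

# Example (path is an assumption):
# check_node_count /etc/ocfs2/cluster.conf || echo "node_count mismatch"
```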

Unable to access cluster service while trying to join the group

Symptom

When trying to mount an OCFS2 share, you get the above error message.

/etc/init.d/o2cb start

gives you

Cluster not known

Solution

Configure ocfs2 to start a cluster:

/etc/init.d/o2cb configure

and answer all questions. At the prompt

Cluster to start on boot

enter the cluster name from /etc/ocfs2/cluster.conf, e.g.:

Cluster to start on boot (Enter "none" to clear) []: ocfs2

Unable to access cluster service Cannot initialize cluster

Symptom

When trying to mount an ocfs2 volume you get the above error message.

Solution

Start the cluster service using

/etc/init.d/o2cb enable

And try to mount again.

cluster not known

Symptom

You get a message

o2cb cluster not known

Reason 1

You have not yet formatted a LUN using the tool ocfs2console.

Solution 1

Run ocfs2console as described in chapter OCFS2 Volume.

Reason 2

Your host names in /etc/ocfs2/cluster.conf differ from your actual hostnames.

Solution 2

Make sure that the command

hostname

delivers bwa1 on the first blade (and accordingly on the other blades). Make sure the cluster nodes in /etc/ocfs2/cluster.conf are named accordingly and have the correct (192.…) IP addresses. Start yast2 lan and check that your IP addresses are local (starting with 192.168.).

Reason 3

Your cluster is not started.

Solution 3

Configure ocfs2 to start a cluster:

/etc/init.d/o2cb configure

and answer all questions. At the prompt

Cluster to start on boot

enter the cluster name from /etc/ocfs2/cluster.conf, e.g.:

Cluster to start on boot (Enter "none" to clear) []: ocfs2

no free slots available

Symptom

ls3154:~ # mount /dev/sdb /FilerOCFS/
mount.ocfs2: Invalid argument while mounting /dev/sdb on /FilerOCFS/. Check 'dmesg' for more information on this error.
ls3154:~ # dmesg | grep slots
(23985,4):ocfs2_find_slot:243 ERROR: no free slots available!
(24030,4):ocfs2_find_slot:243 ERROR: no free slots available!
(24145,4):ocfs2_find_slot:243 ERROR: no free slots available!
(24352,4):ocfs2_find_slot:243 ERROR: no free slots available!
ls3154:~ # man mkfs.ocfs2 
[...]
       -N, --node-slots number-of-node-slots
              Valid  number  ranges from 1 to 255. This number specifies the maximum number of nodes that can concurrently mount the partition. If omitted, the
              number defaults to 4. This number can be later increased using tunefs.ocfs2.
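As the man page notes, the slot count can be raised afterwards with tunefs.ocfs2. A sketch, assuming the volume is /dev/sdb and is unmounted on all nodes first; the new count of 8 is an example value:

```shell
# Run on one node while the volume is unmounted everywhere.
# /dev/sdb and the slot count 8 are example values.
umount /FilerOCFS            # if still mounted locally
tunefs.ocfs2 -N 8 /dev/sdb   # raise the number of node slots
```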

is apparently in use by the system

Symptom

When trying to format an ocfs2 volume you get

/dev/disk/by-id/scsi-3600601600a302600aeddc308894cdf11 is apparently in use by the system; will not make a ocfs2 volume here

Solution

Use another name for the same device, for example from the /dev/disk/by-name path.

transport endpoint not connected

Symptom

When trying to mount an ocfs2 volume you get an error message containing

transport endpoint not connected

Reason 1

Your firewall is up and blocks the cluster interconnect (ip_port 7777 from /etc/ocfs2/cluster.conf).

Solution 1

Stop your firewall, e.g. for SUSE Linux

rcSuSEfirewall2 stop

Reason 2

Your cluster nodes have different opinions about the cluster's node count.

Solution 2

Adapt the node count in /etc/ocfs2/cluster.conf on every cluster node, then reboot all blades.

Reason 3

Your cluster nodes have different opinions about the timeout values.

Solution 3

Read your timeout configuration from your cluster nodes using the command

/etc/init.d/o2cb status

Set them consistently on all nodes using the command

/etc/init.d/o2cb configure

Reason 4

Your values from /etc/sysconfig/o2cb differ.

Solution 4

Change /etc/sysconfig/o2cb, restart the cluster using

/etc/init.d/o2cb restart

You see files only on one node

Symptom

You have your filesystem mounted and add a file on one node, but do not see it on the other node.

Reason 1

In one case the reason was that the user had forgotten to "Propagate Configuration" and node2 could not be reached over the network.

Solution 1

Make sure the nodes can ping one another, then use "Propagate Configuration" in ocfs2console.
