Set up an OCFS2 cluster filesystem
From DellLinuxWiki
This is an example of how to set up the Oracle Cluster File System 2 (OCFS2) for use with cluster databases, using SUSE Linux Enterprise 10 Service Pack 4. The shared storage is an iSCSI storage system.
Installation
- We assume here that the two cluster nodes are named node1 and node2 and have the IP addresses 192.168.0.11 and 192.168.0.12.
- On both nodes, configure your iSCSI initiator and install everything that YaST proposes:
yast2 iscsi-client
- On both nodes, install the OCFS2 software:
yast -i ocfs2-tools ocfs2console
- On both nodes, make the cluster services start at boot:
chkconfig o2cb on
/etc/init.d/o2cb enable
You get the message "Cluster not known". That is okay at this point, because no cluster has been configured yet.
- Check that the nodes can reach each other:
node1:~ # ping node2
PING node2 (192.168.0.12) 56(84) bytes of data.
64 bytes from node2 (192.168.0.12): icmp_seq=1 ttl=64 time=1.09 ms
node2:~ # ping node1
PING node1 (192.168.0.11) 56(84) bytes of data.
64 bytes from node1 (192.168.0.11): icmp_seq=1 ttl=64 time=1.09 ms
- On one node, start ocfs2console.
- Choose Cluster->Configure Nodes...
- Add the cluster nodes with their local host names (what the command "hostname" returns) and their IP addresses.
- Choose Cluster->Propagate Configuration...
- Make sure that the o2cb service is running on all nodes:
/etc/init.d/o2cb start
Troubleshooting
Unable to access cluster service while starting heartbeat
Symptom
ls3147:~ # mount /dev/sdb /FilerOCFS/
ocfs2_hb_ctl: Unable to access cluster service while starting heartbeat
mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted"
Reason 1: hostname
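The host name of the machine (ls3147) did not match any node name in /etc/ocfs2/cluster.conf. After setting the host name to the configured node name bwa3 and starting the cluster, the volume mounts (the shell prompt still shows the old name):
ls3147:~ # hostname bwa3
ls3147:~ # /etc/init.d/o2cb start
Starting O2CB cluster ocfs2: OK
ls3147:~ # mount /dev/sdb /FilerOCFS/
ls3147:~ # ls /FilerOCFS/
lost+found this_is_an_OCFS2_filesystem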
Reason 2: IP address
bwa4:~ # mount -a
ocfs2_hb_ctl: Unable to access cluster service while starting heartbeat
mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted"
bwa4:~ # /etc/init.d/o2cb enable
Writing O2CB configuration: OK
Starting O2CB cluster ocfs2: Failed
Cluster ocfs2 created
Node bwa1 added
Node bwa2 added
Node bwa3 added
o2cb_ctl: Internal logic failure while adding node bwa4
Stopping O2CB cluster ocfs2: OK
bwa4:~ # dmesg
(15575,11):o2net_open_listening_sock:1896 ERROR: unable to bind socket at 192.168.51.4:7777, ret=-99
(15606,11):o2net_open_listening_sock:1896 ERROR: unable to bind socket at 192.168.51.4:7777, ret=-99
bwa4:~ # ping 192.168.51.4
PING 192.168.51.4 (192.168.51.4) 56(84) bytes of data.
--- 192.168.51.4 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1009ms
The IP address configured for bwa4 in /etc/ocfs2/cluster.conf (192.168.51.4) is not assigned to any network interface of bwa4; bwa4 cannot even ping it. Therefore the cluster socket cannot be bound to it (ret=-99 is EADDRNOTAVAIL, "Cannot assign requested address"). Configure this address on bwa4, or correct ip_address in /etc/ocfs2/cluster.conf on all nodes.
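To check for this cause before mounting, the sketch below looks up the ip_address of a node in /etc/ocfs2/cluster.conf and compares it with a list of local addresses. It is a minimal sketch, not part of ocfs2-tools: the function names node_ip and ip_is_local are hypothetical, and the parser assumes stanza headers in the first column and "key = value" parameter lines, as in the cluster.conf example in this article.

```shell
#!/bin/sh
# Hypothetical helper (not part of ocfs2-tools): print the ip_address of
# node NAME from an OCFS2 cluster.conf (default /etc/ocfs2/cluster.conf).
# Assumes stanza headers in column 1 and "key = value" parameter lines.
node_ip() {
    awk -v want="$1" '
        function emit_node() { if (in_node && name == want) print ip; name = ""; ip = "" }
        /^[^ \t]/ { emit_node(); in_node = ($1 == "node:"); next }  # a new stanza starts
        in_node && $1 == "name"       { name = $3 }
        in_node && $1 == "ip_address" { ip = $3 }
        END { emit_node() }                                          # flush the last stanza
    ' "${2:-/etc/ocfs2/cluster.conf}"
}

# Hypothetical helper: succeed if the ip_address configured for NAME is in
# LOCAL_IPS (a whitespace-separated list). Usage: ip_is_local NAME LOCAL_IPS [CONF]
# Returns 1 if the address is not local, 2 if NAME is not in cluster.conf.
ip_is_local() {
    conf_ip=$(node_ip "$1" "$3")
    if [ -z "$conf_ip" ]; then
        echo "node $1 not found in cluster.conf" >&2
        return 2
    fi
    for addr in $2; do
        [ "$addr" = "$conf_ip" ] && return 0
    done
    echo "ip_address $conf_ip of node $1 is not configured on this host" >&2
    return 1
}
```

On bwa4 it could be called as ip_is_local "$(hostname)" "$(ip -o -4 addr show | awk '{sub(/\/.*/, "", $4); print $4}')", where the ip command lists the IPv4 addresses of the local interfaces.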
Reason 3: Cluster service is not started
node2:~ # mount.ocfs2 /dev/sdb1 /filer/
ocfs2_hb_ctl: Unable to access cluster service while starting heartbeat
mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted"
node2:~ # /etc/init.d/o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Offline
node2:~ # /etc/init.d/o2cb start
Starting O2CB cluster ocfs2: OK
node2:~ # mount.ocfs2 /dev/sdb1 /filer/
node2:~ # ls /filer/
lost+found
node2:~ # /etc/init.d/o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold = 31
Network idle timeout: 30000
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Active
Unable to access cluster service while trying to initialize cluster
Symptom
When trying to mount the cluster volume, you get the error message:
node1:~ # mount /dev/sdb /mnt
mount.ocfs2: Unable to access cluster service while trying initialize cluster
Reason
In one case, the reason was that the cluster service had not been started on all nodes.
Solution
Start the cluster service on all nodes:
node2:~ # /etc/init.d/o2cb status
Driver for "configfs": Not loaded
Driver for "ocfs2_dlmfs": Not loaded
node2:~ # /etc/init.d/o2cb enable
Writing O2CB configuration: OK
Loading filesystem "configfs": OK
Mounting configfs filesystem at /sys/kernel/config: OK
Loading stack plugin "o2cb": OK
Loading filesystem "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Cluster not known
node2:~ # /etc/init.d/o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Unable to access cluster service while creating node
Symptom
In ocfs2console, when adding nodes, you get the error message:
o2cb_ctl: Unable to access cluster service while creating node
Could not add node node1
Solution 1
The following worked in one case: delete /etc/ocfs2/cluster.conf:
rm /etc/ocfs2/cluster.conf
Solution 2
The following worked in one case: write /etc/ocfs2/cluster.conf manually, identically on every node. Stanza headers (node:, cluster:) start in the first column; the parameter lines below them are indented with a tab:
node:
	name = node1
	cluster = ocfs2
	number = 0
	ip_address = 192.168.0.11
	ip_port = 7777

node:
	name = node2
	cluster = ocfs2
	number = 1
	ip_address = 192.168.0.12
	ip_port = 7777

cluster:
	name = ocfs2
	node_count = 2
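Writing this file by hand for many nodes is error-prone. As a minimal sketch (gen_cluster_conf is a hypothetical helper, not an ocfs2-tools command), the following function prints a cluster.conf in the same layout as above from "hostname IP" pairs, numbering the nodes from 0 and setting node_count to the number of nodes. The default port 7777 is taken from the example above.

```shell
#!/bin/sh
# Hypothetical helper: print a cluster.conf for CLUSTER (port PORT, default 7777)
# from "hostname ip_address" lines on stdin. Nodes are numbered from 0.
gen_cluster_conf() {
    awk -v cluster="$1" -v port="${2:-7777}" '
        NF == 0 { next }    # ignore blank input lines
        {
            printf "node:\n\tname = %s\n\tcluster = %s\n\tnumber = %d\n", $1, cluster, n
            printf "\tip_address = %s\n\tip_port = %s\n\n", $2, port
            n++
        }
        END { printf "cluster:\n\tname = %s\n\tnode_count = %d\n", cluster, n }
    '
}
```

For the two nodes of this article: printf 'node1 192.168.0.11\nnode2 192.168.0.12\n' | gen_cluster_conf ocfs2 > /etc/ocfs2/cluster.conf, then copy the file unchanged to every node.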
Unable to access cluster service while trying to join the group
Symptom
When trying to mount an OCFS2 volume, you get the error message above.
/etc/init.d/o2cb start
gives you
Cluster not known
Solution
Configure ocfs2 to start a cluster:
/etc/init.d/o2cb configure
and answer all questions. At the prompt
Cluster to start on boot
enter the cluster name from /etc/ocfs2/cluster.conf, e.g.:
Cluster to start on boot (Enter "none" to clear) []: ocfs2
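The answer is stored on the node. We assume here that on SUSE it ends up as the variable O2CB_BOOTCLUSTER in /etc/sysconfig/o2cb; check your own file before relying on this. The hypothetical helpers below compare that value with the name in the cluster: stanza of /etc/ocfs2/cluster.conf.

```shell
#!/bin/sh
# Hypothetical helper: print the name from the cluster: stanza of cluster.conf.
cluster_name() {
    awk '/^[^ \t]/ { in_cluster = ($1 == "cluster:"); next }
         in_cluster && $1 == "name" { print $3 }' "${1:-/etc/ocfs2/cluster.conf}"
}

# Hypothetical helper: print O2CB_BOOTCLUSTER (quotes stripped) from the
# sysconfig file. The variable name is an assumption about the o2cb script.
boot_cluster() {
    sed -n 's/^O2CB_BOOTCLUSTER="*\([^"]*\)"*$/\1/p' "${1:-/etc/sysconfig/o2cb}"
}

# Succeed if the cluster started on boot is the one defined in cluster.conf.
# Usage: check_boot_cluster [CLUSTER_CONF] [SYSCONFIG_FILE]
check_boot_cluster() {
    want=$(cluster_name "$1")
    have=$(boot_cluster "$2")
    if [ -n "$want" ] && [ "$want" = "$have" ]; then
        echo "OK: cluster $want is started on boot"
    else
        echo "mismatch: cluster.conf defines '$want', O2CB_BOOTCLUSTER is '$have'" >&2
        return 1
    fi
}
```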
Unable to access cluster service Cannot initialize cluster
Symptom
When trying to mount an OCFS2 volume, you get the error message above.
Solution
Start the cluster service using
/etc/init.d/o2cb enable
and try to mount again.
cluster not known
Symptom
You get a message
o2cb cluster not known
Reason 1
You have not yet formatted a LUN using the tool ocfs2console.
Solution 1
Run ocfs2console as described in the chapter "OCFS2 Volume".
Reason 2
The node names in /etc/ocfs2/cluster.conf differ from the actual host names.
Solution 2
Make sure the command
hostname
returns bwa1 on the first blade (and the matching name on each other blade). Make sure the cluster nodes in /etc/ocfs2/cluster.conf are named exactly like these host names and have the correct (192….) IP addresses. Start yast2 lan and check that your IP addresses are local (starting with 192.168.).
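A minimal sketch for the host name check, assuming the "name = value" layout shown in this article (node_names and check_hostname are hypothetical helpers): it lists the node names from /etc/ocfs2/cluster.conf and tests whether the given host name is exactly one of them.

```shell
#!/bin/sh
# Hypothetical helper: list the node names defined in cluster.conf.
node_names() {
    awk '/^[^ \t]/ { in_node = ($1 == "node:"); next }
         in_node && $1 == "name" { print $3 }' "${1:-/etc/ocfs2/cluster.conf}"
}

# Succeed if HOST (e.g. the output of "hostname") is one of the node names.
# Usage: check_hostname HOST [CLUSTER_CONF]
check_hostname() {
    if node_names "$2" | grep -qxF -- "$1"; then
        echo "OK: $1 is a node of the cluster"
    else
        echo "host name $1 not in cluster.conf; nodes are: $(node_names "$2" | tr '\n' ' ')" >&2
        return 1
    fi
}
```

On each blade, run check_hostname "$(hostname)".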
Reason 3
Your cluster is not started.
Solution 3
Configure ocfs2 to start a cluster:
/etc/init.d/o2cb configure
and answer all questions. At the prompt
Cluster to start on boot
enter the cluster name from /etc/ocfs2/cluster.conf, e.g.:
Cluster to start on boot (Enter "none" to clear) []: ocfs2
no free slots available
ls3154:~ # mount /dev/sdb /FilerOCFS/
mount.ocfs2: Invalid argument while mounting /dev/sdb on /FilerOCFS/. Check 'dmesg' for more information on this error.
ls3154:~ # dmesg | grep slots
(23985,4):ocfs2_find_slot:243 ERROR: no free slots available!
(24030,4):ocfs2_find_slot:243 ERROR: no free slots available!
(24145,4):ocfs2_find_slot:243 ERROR: no free slots available!
(24352,4):ocfs2_find_slot:243 ERROR: no free slots available!
ls3154:~ # man mkfs.ocfs2
[...]
-N, --node-slots number-of-node-slots
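Valid number ranges from 1 to 255. This number specifies the maximum number of nodes that can concurrently mount the partition. If omitted, the number defaults to 4. This number can be later increased using tunefs.ocfs2.
Reason
The volume has fewer node slots than there are nodes trying to mount it concurrently; each mounting node needs its own slot.
Solution
Increase the number of node slots with tunefs.ocfs2 (option -N) to at least the number of nodes that mount the volume.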
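To size the slots, a rough sketch: count the node stanzas in /etc/ocfs2/cluster.conf and print, without running it, the tunefs.ocfs2 -N command that would set the slot count on the device. The helper names slots_needed and slots_cmd are hypothetical; check tunefs.ocfs2(8) on your system for whether the volume must be unmounted on all nodes first.

```shell
#!/bin/sh
# Hypothetical helper: number of node: stanzas in cluster.conf, i.e. the
# number of nodes that may want to mount the volume at the same time.
slots_needed() {
    grep -c '^node:' "${1:-/etc/ocfs2/cluster.conf}"
}

# Hypothetical dry-run helper: print (do not run) the tunefs.ocfs2 command
# that sets the node slots of DEVICE to SLOTS. The man page allows 1 to 255.
# Usage: slots_cmd SLOTS DEVICE
slots_cmd() {
    case $1 in
        ''|*[!0-9]*) echo "slot count must be a number" >&2; return 1 ;;
    esac
    if [ "$1" -lt 1 ] || [ "$1" -gt 255 ]; then
        echo "slot count must be between 1 and 255" >&2
        return 1
    fi
    echo "tunefs.ocfs2 -N $1 $2"
}
```

For the volume above: slots_cmd "$(slots_needed)" /dev/sdb prints the command to review and then run by hand.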
is apparently in use by the system
Symptom
When trying to format an OCFS2 volume, you get:
/dev/disk/by-id/scsi-3600601600a302600aeddc308894cdf11 is apparently in use by the system; will not make a ocfs2 volume here
Solution
Use another name for the same device, for example from the /dev/disk/by-name path.
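To find the other names of the same device, this sketch (other_names is a hypothetical helper) lists every symlink under /dev/disk/by-*/ that resolves to the same device node, followed by the kernel device name itself. It relies on readlink -f from GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: list the other names of DEVICE, i.e. every symlink under
# BASE/by-*/ (BASE defaults to /dev/disk) that resolves to the same device node,
# followed by the resolved kernel name itself. Needs GNU readlink -f.
other_names() {
    target=$(readlink -f "$1") || return 1
    for link in "${2:-/dev/disk}"/by-*/*; do
        [ -L "$link" ] || continue          # skip non-symlinks and unmatched globs
        if [ "$(readlink -f "$link")" = "$target" ] && [ "$link" != "$1" ]; then
            echo "$link"
        fi
    done
    echo "$target"
}
```

Example: other_names /dev/disk/by-id/scsi-3600601600a302600aeddc308894cdf11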
transport endpoint not connected
Symptom
When trying to mount an OCFS2 volume, you get an error message containing
transport endpoint not connected
Reason 1
Your firewall is running and blocks the cluster traffic (TCP port 7777 in the example configuration).
Solution 1
Stop your firewall, e.g. on SUSE Linux:
rcSuSEfirewall2 stop
Reason 2
Your cluster nodes disagree about the cluster's node count (node_count in /etc/ocfs2/cluster.conf).
Solution 2
Correct node_count in /etc/ocfs2/cluster.conf so that it is the same on every cluster node, then reboot all blades.
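A quick consistency check, as a sketch with a hypothetical helper name: count the node: stanzas in /etc/ocfs2/cluster.conf and compare them with node_count in the cluster: stanza. It assumes the stanza layout shown earlier in this article.

```shell
#!/bin/sh
# Hypothetical helper: compare the number of node: stanzas in cluster.conf
# with node_count in the cluster: stanza; fail if they differ.
check_node_count() {
    awk '
        /^node:/  { nodes++ }
        /^[^ \t]/ { in_cluster = ($1 == "cluster:"); next }
        in_cluster && $1 == "node_count" { count = $3 }
        END {
            printf "node stanzas: %d, node_count: %s\n", nodes, count
            exit ((count != "" && nodes + 0 == count + 0) ? 0 : 1)
        }
    ' "${1:-/etc/ocfs2/cluster.conf}"
}
```

Run it on every node. To confirm that all nodes have the same file, compare checksums, e.g. for n in bwa1 bwa2 bwa3 bwa4; do ssh $n md5sum /etc/ocfs2/cluster.conf; done (assuming root ssh access between the nodes).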
Reason 3
Your cluster nodes disagree about the timeout values.
Solution 3
Read your timeout configuration from your cluster nodes using the command
/etc/init.d/o2cb status
Set them to the same values on all nodes using the command
/etc/init.d/o2cb configure
Reason 4
The values in /etc/sysconfig/o2cb differ between the nodes.
Solution 4
Make /etc/sysconfig/o2cb identical on all nodes and restart the cluster using
/etc/init.d/o2cb restart
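To find which values differ, copy /etc/sysconfig/o2cb from each node to one machine and compare the O2CB_* lines. This is a minimal sketch with a hypothetical helper; it assumes the o2cb init script stores the timeouts in variables such as O2CB_HEARTBEAT_THRESHOLD and O2CB_IDLE_TIMEOUT_MS, which you should verify in your own file.

```shell
#!/bin/sh
# Hypothetical helper: show the O2CB_* settings that differ between two copies
# of /etc/sysconfig/o2cb. Returns 0 if identical, 1 if they differ.
o2cb_diff() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    grep '^O2CB_' "$1" | sort > "$tmp_a"    # ignore comments and blank lines
    grep '^O2CB_' "$2" | sort > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    rc=$?
    rm -f "$tmp_a" "$tmp_b"
    return $rc
}
```

Example: scp bwa2:/etc/sysconfig/o2cb /tmp/o2cb.bwa2 && o2cb_diff /etc/sysconfig/o2cb /tmp/o2cb.bwa2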
you see files only on one node
Symptom
You have your filesystem mounted and add a file on one node, but do not see it on the other node.
Reason 1
In one case the reason for this was that the user had forgotten to "propagate configuration" AND node2 could not be reached over the network.
Solution 1
Make sure the nodes can ping one another and use "propagate configuration".