21.2 Installing and Configuring OCFS2

21.2.1 Preparing a Cluster for OCFS2
21.2.2 Configuring the Firewall
21.2.3 Configuring the Cluster Software
21.2.4 Creating the Configuration File for the Cluster Stack
21.2.5 Configuring the Cluster Stack
21.2.6 Configuring the Kernel for Cluster Operation
21.2.7 Starting and Stopping the Cluster Stack
21.2.8 Creating OCFS2 volumes
21.2.9 Mounting OCFS2 Volumes
21.2.10 Querying and Changing Volume Parameters

The procedures in the following sections describe how to set up a cluster to use OCFS2.

21.2.1 Preparing a Cluster for OCFS2

For best performance, each node in the cluster should have at least two network interfaces. One interface is connected to a public network to allow general access to the systems. The other interface is used for private communication between the nodes, namely the cluster heartbeat, which determines how the cluster nodes coordinate their access to shared resources and how they monitor each other's state. These interfaces must be connected via a network switch. Ensure that all network interfaces are configured and working before continuing to configure the cluster.

You have a choice of two cluster heartbeat configurations:

  • Local heartbeat thread for each shared device. In this mode, a node starts a heartbeat thread when it mounts an OCFS2 volume and stops the thread when it unmounts the volume. This is the default heartbeat mode. There is a large CPU overhead on nodes that mount a large number of OCFS2 volumes as each mount requires a separate heartbeat thread. A large number of mounts also increases the risk of a node fencing itself out of the cluster due to a heartbeat I/O timeout on a single mount.

  • Global heartbeat on specific shared devices. You can configure any OCFS2 volume as a global heartbeat device provided that it occupies a whole disk device and not a partition. In this mode, the heartbeat to the device starts when the cluster comes online and stops when the cluster goes offline. This mode is recommended for clusters that mount a large number of OCFS2 volumes. A node fences itself out of the cluster if a heartbeat I/O timeout occurs on more than half of the global heartbeat devices. To provide redundancy against failure of one of the devices, you should therefore configure at least three global heartbeat devices.

Figure 21.1 shows a cluster of four nodes connected via a network switch to a LAN and a network storage server. The nodes and the storage server are also connected via a switch to a private network that they use for the local cluster heartbeat.

Figure 21.1 Cluster Configuration Using a Private Network


It is possible to configure and use OCFS2 without a private network, but such a configuration increases the probability of a node fencing itself out of the cluster due to a heartbeat I/O timeout.

21.2.2 Configuring the Firewall

Configure or disable the firewall on each node to allow access on the interface that the cluster will use for private cluster communication. By default, the cluster uses both TCP and UDP over port 7777.

To allow incoming TCP connections and UDP datagrams on port 7777 from the private network, use the following commands:

# iptables -I INPUT -s subnet_addr/prefix_length -p tcp \
  -m state --state NEW -m tcp --dport 7777 -j ACCEPT
# iptables -I INPUT -s subnet_addr/prefix_length -p udp \
  -m udp --dport 7777 -j ACCEPT
# service iptables save

where subnet_addr/prefix_length specifies the network address of the private network, for example 10.0.1.0/24.
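
As a minimal sketch, the following shell function prints the two rules for a given private subnet so that you can review them before running them as root. The function name print_o2cb_fw_rules is hypothetical, and the default port of 7777 is taken from the text above. The function only echoes the commands and does not modify the firewall.

```shell
# Print (do not execute) the iptables rules for the cluster's private subnet.
# print_o2cb_fw_rules is a hypothetical helper name used for illustration.
print_o2cb_fw_rules() {
    subnet="$1"          # for example 10.0.1.0/24
    port="${2:-7777}"    # default cluster port described above
    echo "iptables -I INPUT -s $subnet -p tcp -m state --state NEW -m tcp --dport $port -j ACCEPT"
    echo "iptables -I INPUT -s $subnet -p udp -m udp --dport $port -j ACCEPT"
}

print_o2cb_fw_rules 10.0.1.0/24
```

Pass a second argument only if your cluster is configured to use a port other than the default.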

21.2.3 Configuring the Cluster Software

Ideally, each node should be running the same version of the OCFS2 software and a compatible version of the Oracle Linux Unbreakable Enterprise Kernel (UEK). It is possible for a cluster to run with mixed versions of the OCFS2 and UEK software, for example, while you are performing a rolling update of a cluster. The cluster node that is running the lowest version of the software determines the set of usable features.

Use yum to install or upgrade the following packages to the same version on each node:

  • kernel-uek

  • ocfs2-tools

Note

If you want to use the global heartbeat feature, you must install ocfs2-tools-1.8.0-11 or later.

21.2.4 Creating the Configuration File for the Cluster Stack

You can create the configuration file by using the o2cb command or a text editor.

To configure the cluster stack by using the o2cb command:

  1. Use the following command to create a cluster definition.

    # o2cb add-cluster cluster_name 

    For example, to define a cluster named mycluster with four nodes:

    # o2cb add-cluster mycluster

    The command creates the configuration file /etc/ocfs2/cluster.conf if it does not already exist.

  2. For each node, use the following command to define the node.

    # o2cb add-node cluster_name node_name --ip ip_address

    The name of the node must be the same as the value of the system's HOSTNAME that is configured in /etc/sysconfig/network. The IP address is the one that the node will use for private communication in the cluster.

    For example, to define a node named node0 with the IP address 10.1.0.100 in the cluster mycluster:

    # o2cb add-node mycluster node0 --ip 10.1.0.100

  3. If you want the cluster to use global heartbeat devices, use the following commands.

    # o2cb add-heartbeat cluster_name device1
    .
    .
    .
    # o2cb heartbeat-mode cluster_name global

    Note

    You must configure global heartbeat to use whole disk devices. You cannot configure a global heartbeat device on a disk partition.

    For example, to use /dev/sdd, /dev/sdg, and /dev/sdj as global heartbeat devices:

    # o2cb add-heartbeat mycluster /dev/sdd
    # o2cb add-heartbeat mycluster /dev/sdg
    # o2cb add-heartbeat mycluster /dev/sdj
    # o2cb heartbeat-mode mycluster global

  4. Copy the cluster configuration file /etc/ocfs2/cluster.conf to each node in the cluster.

    Note

    Any changes that you make to the cluster configuration file do not take effect until you restart the cluster stack.
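
For a cluster with several nodes, step 2 can be scripted. The following sketch prints the add-node commands for the four example nodes used in this section (node0 through node3 at 10.1.0.100 through 10.1.0.103). The function name print_add_node_cmds and the assumption that node names and addresses are consecutive are illustrative only. The function echoes the commands instead of running them.

```shell
# Print the o2cb add-node command for each node in a cluster.
# Assumes (for illustration only) nodes named node0..node(N-1) with
# consecutive addresses starting at <prefix>.<first_host>.
print_add_node_cmds() {
    cluster="$1"; count="$2"; prefix="$3"; first_host="$4"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "o2cb add-node $cluster node$i --ip $prefix.$((first_host + i))"
        i=$((i + 1))
    done
}

print_add_node_cmds mycluster 4 10.1.0 100
```

Review the printed commands and run them as root on the node where you are building the configuration file.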

The following sample configuration file /etc/ocfs2/cluster.conf defines a 4-node cluster named mycluster with a local heartbeat.

node:
	name = node0
	cluster = mycluster
	number = 0
	ip_address = 10.1.0.100
	ip_port = 7777

node:
        name = node1
        cluster = mycluster
        number = 1
        ip_address = 10.1.0.101
        ip_port = 7777

node:
        name = node2
        cluster = mycluster
        number = 2
        ip_address = 10.1.0.102
        ip_port = 7777

node:
        name = node3
        cluster = mycluster
        number = 3
        ip_address = 10.1.0.103
        ip_port = 7777

cluster:
        name = mycluster
        heartbeat_mode = local
        node_count = 4

If you configure your cluster to use a global heartbeat, the file also includes entries for the global heartbeat devices.

node:
        name = node0
        cluster = mycluster
        number = 0
        ip_address = 10.1.0.100
        ip_port = 7777

node:
        name = node1
        cluster = mycluster
        number = 1
        ip_address = 10.1.0.101
        ip_port = 7777

node:
        name = node2
        cluster = mycluster
        number = 2
        ip_address = 10.1.0.102
        ip_port = 7777

node:
        name = node3
        cluster = mycluster
        number = 3
        ip_address = 10.1.0.103
        ip_port = 7777

cluster:
        name = mycluster
        heartbeat_mode = global
        node_count = 4

heartbeat:
        cluster = mycluster
        region = 7DA5015346C245E6A41AA85E2E7EA3CF

heartbeat:
        cluster = mycluster
        region = 4F9FBB0D9B6341729F21A8891B9A05BD

heartbeat:
        cluster = mycluster
        region = B423C7EEE9FC426790FC411972C91CC3

The cluster heartbeat mode is now shown as global, and the heartbeat regions are represented by the UUIDs of their block devices.

If you edit the configuration file manually, ensure that you use the following layout:

  • The cluster:, heartbeat:, and node: headings must start in the first column.

  • Each parameter entry must be indented by one tab character.

  • A blank line must separate each section that defines the cluster, a heartbeat device, or a node.
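
The layout rules above can be illustrated with a short sketch that prints a single node stanza. The helper name print_node_stanza is hypothetical. It uses printf with \t so that each parameter line begins with a real tab character rather than spaces, and it ends the stanza with a blank line.

```shell
# Print one node: stanza in the layout that cluster.conf requires:
# heading in column one, each parameter indented by a single tab,
# and a trailing blank line to separate it from the next stanza.
# print_node_stanza is a hypothetical helper name.
print_node_stanza() {
    printf 'node:\n'
    printf '\tname = %s\n' "$1"
    printf '\tcluster = %s\n' "$2"
    printf '\tnumber = %s\n' "$3"
    printf '\tip_address = %s\n' "$4"
    printf '\tip_port = %s\n' "${5:-7777}"
    printf '\n'
}

print_node_stanza node0 mycluster 0 10.1.0.100
```

Generating stanzas this way avoids editors that silently convert tabs to spaces.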

21.2.5 Configuring the Cluster Stack

To configure the cluster stack:

  1. Run the following command on each node of the cluster:

    # service o2cb configure

    The following table describes the values for which you are prompted.

    Prompt

    Description

    Load O2CB driver on boot (y/n)

    Whether the cluster stack driver should be loaded at boot time. The default response is n.

    Cluster stack backing O2CB

    The name of the cluster stack service. The default and usual response is o2cb.

    Cluster to start at boot (Enter "none" to clear)

    Enter the name of your cluster that you defined in the cluster configuration file, /etc/ocfs2/cluster.conf.

    Specify heartbeat dead threshold (>=7)

    The number of 2-second heartbeats that must elapse without response before a node is considered dead. To calculate the value to enter, divide the required threshold time period by 2 and add 1. For example, to set the threshold time period to 120 seconds, enter a value of 61. The default value is 31, which corresponds to a threshold time period of 60 seconds.

    Note

    If your system uses multipathed storage, the recommended value is 61 or greater.

    Specify network idle timeout in ms (>=5000)

    The time in milliseconds that must elapse before a network connection is considered dead. The default value is 30,000 milliseconds.

    Note

    For bonded network interfaces, the recommended value is 30,000 milliseconds or greater.

    Specify network keepalive delay in ms (>=1000)

    The maximum delay in milliseconds between sending keepalive packets to another node. The default and recommended value is 2,000 milliseconds.

    Specify network reconnect delay in ms (>=2000)

    The minimum delay in milliseconds between reconnection attempts if a network connection goes down. The default and recommended value is 2,000 milliseconds.

    To verify the settings for the cluster stack, enter the service o2cb status command:

    # service o2cb status
    Driver for "configfs": Loaded
    Filesystem "configfs": Mounted
    Stack glue driver: Loaded
    Stack plugin "o2cb": Loaded
    Driver for "ocfs2_dlmfs": Loaded
    Filesystem "ocfs2_dlmfs": Mounted
    Checking O2CB cluster "mycluster": Online
      Heartbeat dead threshold: 61
      Network idle timeout: 30000
      Network keepalive delay: 2000
      Network reconnect delay: 2000
      Heartbeat mode: Local
    Checking O2CB heartbeat: Active

    In this example, the cluster is online and is using local heartbeat mode. If no volumes have been configured, the O2CB heartbeat is shown as Not active rather than Active.

    The next example shows the command output for an online cluster that is using three global heartbeat devices:

    # service o2cb status
    Driver for "configfs": Loaded
    Filesystem "configfs": Mounted
    Stack glue driver: Loaded
    Stack plugin "o2cb": Loaded
    Driver for "ocfs2_dlmfs": Loaded
    Filesystem "ocfs2_dlmfs": Mounted
    Checking O2CB cluster "mycluster": Online
      Heartbeat dead threshold: 61
      Network idle timeout: 30000
      Network keepalive delay: 2000
      Network reconnect delay: 2000
      Heartbeat mode: Global
    Checking O2CB heartbeat: Active
      7DA5015346C245E6A41AA85E2E7EA3CF /dev/sdd
      4F9FBB0D9B6341729F21A8891B9A05BD /dev/sdg
      B423C7EEE9FC426790FC411972C91CC3 /dev/sdj

  2. Configure the o2cb and ocfs2 services so that they start at boot time after networking is enabled:

    # chkconfig o2cb on
    # chkconfig ocfs2 on

    These settings allow the node to mount OCFS2 volumes automatically when the system starts.
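
The heartbeat dead threshold arithmetic described in step 1 (divide the required time period in seconds by 2 and add 1) can be sketched as a small shell function. The name dead_threshold is hypothetical, and shell integer arithmetic rounds an odd number of seconds down before adding 1.

```shell
# Convert a desired timeout in seconds to the O2CB heartbeat dead threshold:
# threshold = seconds / 2 + 1, because heartbeats occur every 2 seconds.
# dead_threshold is a hypothetical helper name.
dead_threshold() {
    echo $(( $1 / 2 + 1 ))
}

dead_threshold 120   # prints 61
dead_threshold 60    # prints 31
```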

21.2.6 Configuring the Kernel for Cluster Operation

For the correct operation of the cluster, you must configure the kernel settings shown in the following table:

Kernel Setting

Description

panic

Specifies the number of seconds after a panic before a system will automatically reset itself.

If the value is 0, the system hangs, which allows you to collect detailed information about the panic for troubleshooting. This is the default value.

To enable automatic reset, set a non-zero value. If you require a memory image (vmcore), allow enough time for Kdump to create this image. The suggested value is 30 seconds, although large systems will require a longer time.

panic_on_oops

Specifies that a system must panic if a kernel oops occurs. If a kernel thread required for cluster operation crashes, the system must reset itself. Otherwise, another node might not be able to tell whether a node is slow to respond or unable to respond, causing cluster operations to hang.

On each node, enter the following commands to set the recommended values for panic and panic_on_oops:

# sysctl -w kernel.panic=30
# sysctl -w kernel.panic_on_oops=1

To make the change persist across reboots, add the following entries to the /etc/sysctl.conf file:

# Define panic and panic_on_oops for cluster operation
kernel.panic = 30
kernel.panic_on_oops = 1
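
To confirm that the persistent entries are in place on each node, a check along the following lines may help. This is a sketch only: check_sysctl_entry is a hypothetical helper, it ignores whitespace around the equals sign, skips comment lines, and treats the last matching entry in the file as the effective one.

```shell
# Report whether a sysctl.conf-style file sets a key to the expected value.
# check_sysctl_entry is a hypothetical helper name.
check_sysctl_entry() {
    file="$1"; key="$2"; want="$3"
    [ -r "$file" ] || return 1
    awk -F= -v k="$key" -v w="$want" '
        /^[ \t]*#/ { next }              # skip comment lines
        {
            name = $1; value = $2
            gsub(/[ \t]/, "", name)      # strip whitespace around the key
            gsub(/[ \t]/, "", value)     # and around the value
            if (name == k) last = value  # a later entry overrides an earlier one
        }
        END { exit ((last == w) ? 0 : 1) }
    ' "$file"
}

check_sysctl_entry /etc/sysctl.conf kernel.panic 30 && echo "kernel.panic OK"
check_sysctl_entry /etc/sysctl.conf kernel.panic_on_oops 1 && echo "kernel.panic_on_oops OK"
```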

21.2.7 Starting and Stopping the Cluster Stack

The following table shows the commands that you can use to perform various operations on the cluster stack.

Command                 Description

service o2cb status     Check the status of the cluster stack.
service o2cb online     Start the cluster stack.
service o2cb offline    Stop the cluster stack.
service o2cb unload     Unload the cluster stack.

21.2.8 Creating OCFS2 volumes

You can use the mkfs.ocfs2 command to create an OCFS2 volume on a device. If you want to label the volume and mount it by specifying the label, the device must correspond to a partition. You cannot mount an unpartitioned disk device by specifying a label. The following table shows the most useful options that you can use when creating an OCFS2 volume.

Command Option

Description

-b block-size

--block-size block-size

Specifies the unit size for I/O transactions to and from the file system, and the size of inode and extent blocks. The supported block sizes are 512 bytes, 1 KB, 2 KB, and 4 KB. The default and recommended block size is 4K (4 KB).

-C cluster-size

--cluster-size cluster-size

Specifies the unit size for space used to allocate file data. The supported cluster sizes are 4 KB, 8 KB, 16 KB, 32 KB, 64 KB, 128 KB, 256 KB, 512 KB, and 1 MB. The default cluster size is 4K (4 KB). If you intend the volume to store database files, do not specify a cluster size that is smaller than the block size of the database.

--fs-feature-level=feature-level

Allows you to select a set of file-system features:

default

Enables support for the sparse files, unwritten extents, and inline data features.

max-compat

Enables only those features that are understood by older versions of OCFS2.

max-features

Enables all features that OCFS2 currently supports.

--fs-features=feature

Allows you to enable or disable individual features such as support for sparse files, unwritten extents, and backup superblocks. For more information, see the mkfs.ocfs2(8) manual page.

-J size=journal-size

--journal-options size=journal-size

Specifies the size of the write-ahead journal. If not specified, the size is determined from the file system usage type that you specify to the -T option, and, otherwise, from the volume size. The default size of the journal is 64M (64 MB) for datafiles, 256M (256 MB) for mail, and 128M (128 MB) for vmstore.

-L volume-label

--label volume-label

Specifies a descriptive name for the volume that allows you to identify it easily on different cluster nodes.

-N number

--node-slots number

Determines the maximum number of nodes that can concurrently access a volume, which is limited by the number of node slots for system files such as the file-system journal. For best performance, set the number of node slots to at least twice the number of nodes. If you subsequently increase the number of node slots, performance can suffer because the journal will no longer be contiguously laid out on the outer edge of the disk platter.

-T file-system-usage-type

Specifies the type of usage for the file system:

datafiles

Database files are typically few in number, fully allocated, and relatively large. Such files require few metadata changes, and do not benefit from having a large journal.

mail

Mail server files are typically many in number, and relatively small. Such files require many metadata changes, and benefit from having a large journal.

vmstore

Virtual machine image files are typically few in number, sparsely allocated, and relatively large. Such files require a moderate number of metadata changes and a medium sized journal.

For example, create an OCFS2 volume on /dev/sdc1 labeled as myvol using all the default settings for generic usage (4 KB block and cluster size, eight node slots, a 256 MB journal, and support for default file-system features).

# mkfs.ocfs2 -L "myvol" /dev/sdc1

Create an OCFS2 volume on /dev/sdd2 labeled as dbvol for use with database files. In this case, the cluster size is set to 128 KB and the journal size to 32 MB.

# mkfs.ocfs2 -L "dbvol" -T datafiles /dev/sdd2

Create an OCFS2 volume on /dev/sde1 with a 16 KB cluster size, a 128 MB journal, 16 node slots, and support enabled for all features except refcount trees.

# mkfs.ocfs2 -C 16K -J size=128M -N 16 --fs-feature-level=max-features \
  --fs-features=norefcount /dev/sde1

Note

Do not create an OCFS2 volume on an LVM logical volume. LVM is not cluster-aware.

You cannot change the block and cluster size of an OCFS2 volume after it has been created. You can use the tunefs.ocfs2 command to modify other settings for the file system with certain restrictions. For more information, see the tunefs.ocfs2(8) manual page.
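
Because the cluster size cannot be changed later, it can be worth validating the value before running mkfs.ocfs2. The following sketch checks a size in KB against the supported cluster sizes listed in the table above. The helper name is_valid_cluster_kb is hypothetical.

```shell
# Return success if the given size in KB is a supported OCFS2 cluster size
# (4 KB through 1 MB in powers of two, per the option table above).
# is_valid_cluster_kb is a hypothetical helper name.
is_valid_cluster_kb() {
    case "$1" in
        4|8|16|32|64|128|256|512|1024) return 0 ;;
        *) return 1 ;;
    esac
}

is_valid_cluster_kb 16 && echo "16K is supported"
is_valid_cluster_kb 12 || echo "12K is not supported"
```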

21.2.9 Mounting OCFS2 Volumes

As shown in the following example, specify the _netdev option in /etc/fstab if you want the system to mount an OCFS2 volume at boot time after networking is started, and to unmount the file system before networking is stopped.

myocfs2vol  /dbvol1  ocfs2     _netdev,defaults  0 0

Note

The file system will not mount unless you have enabled the o2cb and ocfs2 services to start after networking is started. See Section 21.2.5, “Configuring the Cluster Stack”.
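
As a quick sanity check, the following sketch lists any ocfs2 entries in an fstab-style file whose mount options do not include _netdev. The helper name ocfs2_missing_netdev is hypothetical. It prints the device and mount point of each offending entry and prints nothing if every entry is correct.

```shell
# List ocfs2 entries in an fstab-style file whose mount options lack _netdev.
# ocfs2_missing_netdev is a hypothetical helper name.
ocfs2_missing_netdev() {
    awk '
        /^[ \t]*#/ { next }                      # skip comment lines
        $3 == "ocfs2" {
            n = split($4, opts, ",")             # mount options are comma-separated
            ok = 0
            for (i = 1; i <= n; i++) if (opts[i] == "_netdev") ok = 1
            if (!ok) print $1, $2                # device and mount point
        }
    ' "$1"
}

[ -r /etc/fstab ] && ocfs2_missing_netdev /etc/fstab
```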

21.2.10 Querying and Changing Volume Parameters

You can use the tunefs.ocfs2 command to query or change volume parameters. For example, to find out the label, UUID and the number of node slots for a volume:

# tunefs.ocfs2 -Q "Label = %V\nUUID = %U\nNumSlots = %N\n" /dev/sdb
Label = myvol
UUID = CBB8D5E0C169497C8B52A0FD555C7A3E
NumSlots = 4

Generate a new UUID for a volume:

# tunefs.ocfs2 -U /dev/sdb
# tunefs.ocfs2 -Q "Label = %V\nUUID = %U\nNumSlots = %N\n" /dev/sdb
Label = myvol
UUID = 48E56A2BBAB34A9EB1BE832B3C36AB5C
NumSlots = 4
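
If you need a single value from this output in a script, for example to compare UUIDs across nodes, a small filter can extract it. The following sketch assumes the "Name = value" layout produced by the query format string shown above. The helper name query_field is hypothetical, and the sample input is the output from the first example in this section.

```shell
# Extract the value of one field from "Name = value" lines on standard input,
# such as the tunefs.ocfs2 -Q output shown above.
# query_field is a hypothetical helper name.
query_field() {
    awk -F' = ' -v k="$1" '$1 == k { print $2 }'
}

printf 'Label = myvol\nUUID = CBB8D5E0C169497C8B52A0FD555C7A3E\nNumSlots = 4\n' | query_field UUID
```

In practice, pipe the output of the tunefs.ocfs2 query into the function instead of the printf sample.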