Deploy Lustre File System

Deploying OCI File Storage with Lustre comprises the following steps:
  • Creating an OCI File Storage with Lustre file system
  • Mounting the file system
  • Configuring LNet
  • Testing read and write operations

Create Lustre File System

The following steps walk through creating an OCI File Storage with Lustre file system.

  1. In the OCI Console navigation menu, click Storage, and then click Lustre file systems.
  2. Click Create.
  3. Configure the Lustre file system details:
    • File system name: Accept the default name or enter a friendly name for the file system.
    • Mount name: Accept the default or enter a friendly name for use when mounting the file system.
    • File system description: (Optional) Enter a description for the file system.
    • Compartment: Accept the default compartment, or select the list to change compartments.
    • Availability domain: Accept the default availability domain, or select the list to change availability domains.
    • Tags: If you have permissions to create a resource, then you also have permissions to apply free-form tags to that resource. To apply a defined tag, you must have permissions to use the tag namespace. If you're not sure whether to apply tags, skip this option or ask an administrator. You can apply tags later.
    • Cluster placement group: We recommend selecting a cluster placement group to physically place file systems and other resources together in an availability domain to ensure low latency. The cluster placement group can't be changed after the file system is created.
    • Performance tier: Select the performance tier for the file system. The performance tier controls the throughput of the file system. Throughput is specified per terabyte (TB) of provisioned storage. The performance tier can't be changed after the file system is created.
    • Capacity: Select the storage capacity of the file system. If you select a capacity larger than the service limit, you might be prompted to request an increase.

      The aggregate throughput of the file system is calculated from the performance tier and capacity. For example, at a tier of 125 MB/s per TB, 30 TB of provisioned capacity yields an aggregate of about 3,750 MB/s.

    • Networking: Select the VCN and subnet in which to mount the file system. By default, the Console shows a list of VCNs and subnets in the compartment you're working in. Change compartments to select a VCN or subnet from a different compartment.

      The selected subnet is used to deploy the Lustre servers. Ensure that this subnet allows communication to and from the Lustre server-side port 988, with client-side source ports in the range 512-1023 (a quick reachability check is sketched at the end of this section).

    • Use network security groups to control traffic: Enable this option and select a network security group (NSG) to act as a virtual firewall for the file system. Select + Another network security group to add the file system to up to five NSGs.
    • Root squash: These settings control whether clients accessing the file system have their User ID (UID) and Group ID (GID) remapped to Squash UID and Squash GID. A quick client-side check is sketched after this list.

      Squash: Select None or Root. The default value is None, so no remapping is done by default.

      Squash UID: If Squash is set to Root, the root user (UID 0) is remapped to this value. The default value is 65534.

      Squash GID: If Squash is set to Root, the root group (GID 0) is remapped to this value. The default value is 65534.

    • Root squash exceptions: To exclude specific clients from the root squash configuration, enter their Client address as a valid IP address or range. For example, 10.0.2.4 or 10.0.[2-10].[1-255]. Select + Another client address to add up to 10 exceptions.
    • Encryption key: By default, Oracle manages the keys that encrypt the file system. If you want greater control over the key's lifecycle and how it's used, you can select your own Vault encryption key.
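
As a quick client-side check once the file system is mounted (a sketch, assuming Squash is set to Root with the default squash values and the mount point used later in this walkthrough):

    # Create a file as root on a squashed client, then inspect its numeric ownership
    touch /mnt/mymountpoint/squash-test
    ls -ln /mnt/mymountpoint/squash-test
    # With root squash active, the owner and group display as the squash IDs (65534 65534 by default)
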
After creation starts, provisioning the back-end resources can take several minutes. The time varies with the performance tier and size of the file system.
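
Before mounting, you can sanity-check that the security rules pass Lustre traffic. A minimal sketch from a client in the selected subnet, assuming 10.0.3.8 is the server address shown in the Console mount command and an nc (ncat) build that supports these flags; run it as root so nc can bind a privileged source port in the 512-1023 range:

    # Test TCP reachability of the Lustre server on port 988 from a privileged source port
    nc -z -v -w 5 -p 1021 10.0.3.8 988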

Mount Lustre File System

  1. Navigate to the Lustre file system details page in the Console and find the mount command.
  2. From a Linux client with the Lustre client modules installed, mount the file system. In this example, the client has the Lustre DKMS modules installed:
    [root@lustre-ol8-client ~]# rpm -qa | grep lustre
    lustre-client-2.15.5_oci1-1.el8.x86_64
    lustre-client-dkms-2.15.5_oci1-1.el8.noarch

    On GPU hosts with RDMA networking managed by Oracle Cloud Agent, adding the LNet interface injects a routing rule that prevents host communication through the default interface (the first interface of the host). This can cause problems in environments that use the local IP for internal communication, especially OKE environments. The immediate symptom is that pings to the local IP fail. To work around this, prevent LNet from adding the additional routes and rules:

    Apply this before adding the LNet interface. If the LNet interface is not added explicitly, apply it before mounting the file system.

    # Prevent LNet's socket LND from installing extra routes and rules, then reload the module
    echo 'options ksocklnd skip_mr_route_setup=1' > /etc/modprobe.d/lnet.conf
    rmmod lnet
    modprobe -v lnet
  3. Unload the Lustre modules (if they are already loaded) to get a clean start. Unloading the modules fails if a file system is still mounted. If lustre_rmmod doesn't unload all the modules, reboot the system for a fresh start.
    [root@lustre-ol8-client ~]# lustre_rmmod 
  4. Mount the Lustre file system. If a specific interface must be configured for Lustre, apply the lnetctl configuration before loading the modules, as explained in the next section.
    [root@lustre-ol8-client ~]# modprobe lustre
    [root@lustre-ol8-client ~]# mount -t lustre 10.0.3.8@tcp:/lustrefs /mnt/mymountpoint

    If the mount fails with the error message No such file or directory. Is the MGS specification correct?, the client is likely unable to auto-configure LNet (Lustre networking) because the host has multiple interfaces or other networking complications. When this happens, configure LNet manually as described in the next section; the kernel log check sketched after these steps can also help pinpoint the cause.

    The error resembles:

    mount.lustre: mount 10.0.3.8@tcp:/lustrefs at /mnt/mymountpoint failed: No such file or directory
    Is the MGS specification correct?
    Is the filesystem name correct?
    If upgrading, is the copied client log valid? (see upgrade docs)
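
Before reconfiguring, the kernel log often shows why LNet failed to come up. A minimal check:

    # Look for recent LNet/Lustre messages in the kernel log
    dmesg | grep -iE 'lnet|lustre' | tail -n 20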

Configure LNet (Lustre Networking)

The following steps are required only if mounting the file system fails and lnetctl net show doesn't display the correct interface for Lustre.
  1. Find the interface name. The system may have multiple interfaces; pick the one best placed to reach the Lustre file system. Use the ip route command to display the networks in the system and identify the most appropriate interface; this is the interface that carries the bulk of the data transfer. In this example, the interface is enp0s5.
    [root@lustre-ol8-client ~]# ip addr
    The output resembles:
    ...
    2: enp0s5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP group default qlen 1000
        link/ether 02:00:17:12:91:e9 brd ff:ff:ff:ff:ff:ff
        inet 10.0.3.42/24 brd 10.0.3.255 scope global dynamic enp0s5
           valid_lft 84865sec preferred_lft 84865sec
        inet6 fe80::17ff:fe12:91e9/64 scope link
           valid_lft forever preferred_lft forever
  2. Run the following commands:
    [root@lustre-ol8-client opc]# modprobe lnet
    [root@lustre-ol8-client opc]# lnetctl lnet configure
    [root@lustre-ol8-client opc]# lnetctl net add --net tcp --if enp0s5
    [root@lustre-ol8-client opc]# lnetctl net show
    The output resembles:
    net:
        - net type: lo
          local NI(s):
            - nid: 0@lo
              status: up
        - net type: tcp
          local NI(s):
            - nid: 10.0.3.42@tcp  <<<<<<<<<
              status: up
              interfaces:
                  0: enp0s5   <<<<<<<<
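
With the interface configured, you can verify LNet reachability of the server NID and, optionally, persist the configuration across reboots. A sketch, assuming the NID from the mount command and that your Lustre client package ships the lnet systemd service (which imports /etc/lnet.conf at boot):

    # Verify LNet-level connectivity to the Lustre server NID
    lnetctl ping 10.0.3.8@tcp
    # Persist the current LNet configuration so the lnet service restores it at boot
    lnetctl export > /etc/lnet.conf
    systemctl enable lnet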

Mount the File System

  1. Run the commands:
    [root@lustre-ol8-client opc]# mount -t lustre 10.0.3.8@tcp:/lustrefs /mnt/mymountpoint
    [root@lustre-ol8-client opc]# df -h /mnt/mymountpoint
    The output resembles:
    Filesystem              Size  Used Avail Use% Mounted on
    10.0.3.8@tcp:/lustrefs   31T   17M   30T   1% /mnt/mymountpoint
  2. Run the command:
    [root@lustre-ol8-client opc]# lfs df -h /mnt/mymountpoint
    The output resembles:
    UUID                    bytes  Used Available Use% Mounted on
    lustrefs-MDT0000_UUID  563.4G 33.8M    513.4G   1% /mnt/mymountpoint[MDT:0]
    lustrefs-MDT0001_UUID  563.4G 33.7M    513.4G   1% /mnt/mymountpoint[MDT:1]
    lustrefs-OST0000_UUID    2.6T  1.4M      2.4T   1% /mnt/mymountpoint[OST:0]
    lustrefs-OST0001_UUID    2.6T  1.4M      2.4T   1% /mnt/mymountpoint[OST:1]
    lustrefs-OST0002_UUID    2.6T  1.4M      2.4T   1% /mnt/mymountpoint[OST:2]
    lustrefs-OST0003_UUID    2.6T  1.4M      2.4T   1% /mnt/mymountpoint[OST:3]
    lustrefs-OST0004_UUID    2.6T  1.4M      2.4T   1% /mnt/mymountpoint[OST:4]
    lustrefs-OST0005_UUID    2.6T  1.4M      2.4T   1% /mnt/mymountpoint[OST:5]
    lustrefs-OST0006_UUID    2.6T  1.4M      2.4T   1% /mnt/mymountpoint[OST:6]
    lustrefs-OST0007_UUID    2.6T  1.4M      2.4T   1% /mnt/mymountpoint[OST:7]
    lustrefs-OST0008_UUID    2.6T  1.4M      2.4T   1% /mnt/mymountpoint[OST:8]
    lustrefs-OST0009_UUID    2.6T  1.4M      2.4T   1% /mnt/mymountpoint[OST:9]
    lustrefs-OST000a_UUID    2.6T  1.4M      2.4T   1% /mnt/mymountpoint[OST:10]
    lustrefs-OST000b_UUID    2.6T  1.4M      2.4T   1% /mnt/mymountpoint[OST:11]

    filesystem_summary:     30.9T 16.4M     29.3T   1% /mnt/mymountpoint
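
To remount the file system automatically at boot, you can add an /etc/fstab entry. A minimal sketch using the same NID, file system name, and mount point as above; the _netdev option defers mounting until the network is up:

    # /etc/fstab entry for the Lustre file system
    10.0.3.8@tcp:/lustrefs  /mnt/mymountpoint  lustre  defaults,_netdev  0  0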

Test Read and Write Operations

The File Storage with Lustre service metrics help you measure the performance, capacity, and health of a file system. You can use metrics data to diagnose and troubleshoot issues.

We'll test reads and writes by running a dd command, and then navigate to the Metrics page to view the metrics dashboard.

  1. Run a dd command that resembles:
    [root@hpc-client-0 test]# dd if=/dev/zero of=10G_file bs=1M count=10240 oflag=direct; sleep 120; dd of=/dev/null if=10G_file bs=1M count=10240 iflag=direct
    The output resembles:
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB, 10 GiB) copied, 40.6989 s, 264 MB/s
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB, 10 GiB) copied, 39.2326 s, 274 MB/s
    This example uses a 1 MB block size; the direct I/O flags (oflag=direct and iflag=direct) bypass the client page cache so the result reflects file system throughput rather than cached reads.
  2. To view a default set of metrics charts in the Console, click the navigation menu and click Storage.
  3. Under Lustre, select Lustre file systems.
  4. Select the file system for which you want to view metrics.
  5. On the details page, under Resources, select Metrics.
The Metrics page displays a default set of charts for the current file system. File Storage with Lustre provides the following metrics:
  • ReadThroughput: Expressed in bytes read per minute.
  • WriteThroughput: Expressed in bytes written per minute.
  • DataReadOperations: Number of read operations per minute.
  • DataWriteOperations: Number of write operations per minute.
  • MetadataOperations: Number of metadata operations per minute. Fourteen different metadata operations are available as dimensions, such as getattr, setattr, mknod, link, unlink, mkdir, and so on.
  • FileSystemCapacity: Total and available capacity of the file system.
  • FileSystemInodeCapacity: Total and available inodes of the file system.

These metrics can be explored in the OCI Metrics Explorer using the oci_lustrefilesystem namespace, as shown in the following screenshot.


[Screenshot: lustre-metrics-explorer.png, showing the Metrics Explorer with the oci_lustrefilesystem namespace]

Here is an example of how Monitoring Query Language (MQL) queries can be used in the Metrics Explorer or in other dashboards, such as Grafana.

To get read throughput:

ReadThroughput[1m]{resourceId = "your_filesystem_ocid", targetType = "OST", clientName ="all@all"}.grouping().sum()/60

Note:

Dividing by 60 converts the per-minute sum to a per-second rate, because the metric interval here is 1 minute. You can build similar MQL queries for the other metrics.
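
For example, assuming the same dimensions apply, an analogous query returns write throughput in bytes per second:

WriteThroughput[1m]{resourceId = "your_filesystem_ocid", targetType = "OST", clientName = "all@all"}.grouping().sum()/60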