7.3.4 Debugging File System Locks

If an OCFS2 volume hangs, use the following steps to determine which locks are busy and which processes are likely to be holding them.

  1. Mount the debug file system.

    # mount -t debugfs debugfs /sys/kernel/debug

  2. Dump the lock statuses for the file system device (/dev/sdx1 in this example).

    # echo "fs_locks" | debugfs.ocfs2 /dev/sdx1 >/tmp/fslocks
    Lockres: M00000000000006672078b84822 Mode: Protected Read
    Flags: Initialized Attached
    RO Holders: 0 EX Holders: 0
    Pending Action: None Pending Unlock Action: None
    Requested Mode: Protected Read Blocking Mode: Invalid

    The Lockres field is the lock name used by the DLM. The lock name is a combination of a lock-type identifier, an inode number, and a generation number. The following table shows the possible lock types.

    Identifier  Lock Type
    ----------  ----------
    D           File data
    M           Metadata
    R           Rename
    S           Superblock
    W           Read-write

  3. Use the Lockres value to obtain the inode number and generation number for the lock.

    # echo "stat <M00000000000006672078b84822>" | debugfs.ocfs2 -n /dev/sdx1
    Inode: 419616   Mode: 0666   Generation: 2025343010 (0x78b84822)
    ... 

  4. Determine the file system object to which the inode number relates by using the following command.

    # echo "locate <419616>" | debugfs.ocfs2 -n /dev/sdx1
    419616 /linux-2.6.15/arch/i386/kernel/semaphore.c

  5. Obtain the lock names that are associated with the file system object.

    # echo "encode /linux-2.6.15/arch/i386/kernel/semaphore.c" | \
      debugfs.ocfs2 -n /dev/sdx1
    M00000000000006672078b84822 D00000000000006672078b84822 W00000000000006672078b84822  

    In this example, a metadata lock, a file data lock, and a read-write lock are associated with the file system object.

  6. Determine the DLM domain of the file system.

    # echo "stats" | debugfs.ocfs2 -n /dev/sdx1 | grep UUID: | while read a b ; do echo $b ; done
    82DA8137A49A47E4B187F74E09FBBB4B  

  7. Use the values of the DLM domain and the lock name with the following command, which enables debugging for the DLM.

    # echo R 82DA8137A49A47E4B187F74E09FBBB4B \
      M00000000000006672078b84822 > /proc/fs/ocfs2_dlm/debug  

  8. Examine the debug messages.

    # dmesg | tail
    struct dlm_ctxt: 82DA8137A49A47E4B187F74E09FBBB4B, node=3, key=965960985
      lockres: M00000000000006672078b84822, owner=1, state=0 last used: 0, on purge list: no
        granted queue:
          type=3, conv=-1, node=3, cookie=11673330234144325711, ast=(empty=y,pend=n), bast=(empty=y,pend=n)
        converting queue:
        blocked queue:

    The DLM supports three lock modes: no lock (type=0), protected read (type=3), and exclusive (type=5). In this example, the lock is mastered by node 1 (owner=1), and node 3 has been granted a protected-read lock on the file system resource.

  9. Run the following command, and look for processes that are in an uninterruptible sleep state, as shown by the D flag in the STAT column.

    # ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN

    At least one of the processes that are in the uninterruptible sleep state will be responsible for the hang on the other node.
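
On a busy node, the ps listing from step 9 can be long. The following is a minimal sketch of a filter that keeps the header line and only those processes whose STAT value begins with D. The function name d_state_only is hypothetical, and the filter assumes that the state is the second column, as it is with the column order used in step 9.

```shell
# d_state_only: read ps output on stdin (a header line, then one line per
# process, with the process state in the second column) and print the header
# plus every process whose state starts with D (uninterruptible sleep).
d_state_only() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Possible use on a live node, with the same ps invocation as in step 9:
#   ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | d_state_only
```

The wchan column of the remaining lines shows the kernel function in which each process is sleeping, which can help to separate lock waits from I/O waits.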

If a process is waiting for I/O to complete, the problem could be anywhere in the I/O subsystem, from the block device layer through the drivers to the disk array. If the hang concerns a user lock (flock()), the problem could lie in the application. If possible, kill the process that holds the lock. If the hang is due to a lack of memory or to fragmented memory, you can free up memory by killing non-essential processes. The most immediate solution is to reset the node that is holding the lock. The DLM recovery process can then clear all the locks that the dead node owned, which allows the cluster to continue operating.
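
As a cross-check on steps 2 and 3, a lock name can also be decoded offline, for example when you only have a saved copy of /tmp/fslocks. The following sketch assumes the layout suggested by the example in this section: the first character is the lock-type identifier, the last eight hexadecimal digits are the generation number, and the hexadecimal digits in between are the zero-padded inode number. The function name decode_lockres is hypothetical. The stat command of debugfs.ocfs2 in step 3 remains the authoritative way to resolve a lock name.

```shell
# decode_lockres NAME
# Split an OCFS2 lock name into lock type, inode number, and generation.
# Assumed layout (inferred from the example in this section, not from a
# specification): <type letter><hex inode number><8 hex digits of generation>.
# The body runs in a subshell so that no variables leak into the caller.
decode_lockres() (
    name=$1
    rest=${name#?}            # everything after the type letter
    kind=${name%"$rest"}      # the type letter itself

    case $kind in
        D) desc="file data" ;;
        M) desc="metadata" ;;
        R) desc="rename" ;;
        S) desc="superblock" ;;
        W) desc="read-write" ;;
        *) echo "unknown lock type in: $name" >&2; return 1 ;;
    esac

    # Accept only hexadecimal digits, and require more than eight of them
    # so that both an inode number and a generation number are present.
    case $rest in
        *[!0-9a-fA-F]*) echo "not hexadecimal: $name" >&2; return 1 ;;
    esac
    [ "${#rest}" -gt 8 ] || { echo "lock name too short: $name" >&2; return 1; }

    gen_hex=$(printf '%s' "$rest" | sed 's/^.*\(........\)$/\1/')
    ino_hex=${rest%"$gen_hex"}

    printf 'type=%s (%s) inode=%d generation=%d\n' \
        "$kind" "$desc" "$((0x$ino_hex))" "$((0x$gen_hex))"
)
```

For the lock name used in this section, decode_lockres M00000000000006672078b84822 prints type=M (metadata) inode=419616 generation=2025343010, which matches the Inode and Generation values reported in step 3.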