4 Servicing Storage Drives

This section describes how to service storage drives.

Storage drives are replaceable components that do not require you to power off the server before servicing. For more information about replaceable components, see Illustrated Parts Breakdown and Replaceable Components.

Note:

The procedures and illustrations in this chapter apply to both NVMe and SAS storage drives, except where noted.

The following sections describe how to remove and replace hard-disk drives (HDDs) and NVMe solid-state drives (SSDs).

Storage Drive Hot-Plug Conditions

The SAS hard-disk drives (HDDs) or NVMe solid-state drives (SSDs) that are installed in the server are in most cases hot-pluggable. The hot-plug capability depends on how the drives are configured and whether the drive is an NVMe device. Before you remove a hot-pluggable drive, you must take the drive offline. Taking the drive offline prevents any application from accessing the drive and removes the logical software links to the drive. For an NVMe storage drive, you must also power down the drive slot.

The following conditions inhibit the ability to perform hot-plugging of a drive:

  • The drive provides the operating system, and the operating system is not mirrored on another drive.

  • The drive cannot be logically isolated from the online operations of the server.

  • The operating system does not support hot plug for the drive.

If any of these conditions is true, you must shut down the server before you replace the drive. See Powering Down the Server.
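
On a running Linux system, the first condition can be partly checked before you decide whether a shutdown is required. The following is a minimal sketch and not part of the documented procedure. The function name root_on_nvme is hypothetical. It only detects a root file system that is mounted directly from a namespace of the named NVMe instance; it does not detect a root file system on LVM, md RAID, or multipath devices, and it does not check mirroring, so treat a negative result as inconclusive.

```shell
# Hypothetical helper: reads a mount table in /proc/mounts format on stdin
# and succeeds (exit 0) if / is mounted directly from the given NVMe
# instance, for example nvme0 -> /dev/nvme0n1p2.
root_on_nvme() {
    # The trailing "n" keeps nvme1 from also matching nvme10.
    prefix="/dev/${1}n"
    awk -v p="$prefix" '
        $2 == "/" && index($1, p) == 1 { hit = 1 }
        END { exit (hit ? 0 : 1) }
    '
}
```

For example, `root_on_nvme nvme0 < /proc/mounts && echo "nvme0 holds the root file system"` prints the message only when the root file system is mounted from that instance.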

Note:

Replacing a drive does not require extending or removing the server from a rack.

Storage Drive Failure and RAID

A single storage drive failure does not cause data loss if you configured the storage drives as a mirrored RAID 1 volume (optional). You can remove the failed storage drive, and when you insert a new storage drive, its contents are automatically rebuilt from the rest of the array with no need to reconfigure the RAID parameters. If you configured the replaced storage drive as a hot-spare, the new storage drive is automatically configured as a new hot-spare.

See Configure NVMe RAID Using BRU for server instructions.

Remove a Storage Drive

  1. Prepare the system for the drive removal.
  2. Identify the location of the drive that you want to remove.

    For storage drive locations, see either NVMe Storage Drives or SAS Storage Drives.

    If NVMe storage drives are installed in the server front panel, they are labeled NVMe0 through NVMe3. Server operating systems may assign these storage drives different names.

    The following table shows sample corresponding names assigned by the operating systems. Example drive names provided in the table assume that the NVMe cabling between the motherboard and the NVMe disk backplane is correct.

    Storage Drive Labels    Example Names Assigned by Server Operating Systems
    NVMe0                   PCIe Slot 100
    NVMe1                   PCIe Slot 101
    NVMe2                   PCIe Slot 102
    NVMe3                   PCIe Slot 103

  3. Remove the storage drive.
  4. Push the latch release button to open the drive latch [1, 2].

    Figure showing the location of the storage drive release button and latch.

    Callout    Description
    1          Pressing the latch release button.
    2          Opening the latch.

    Caution:

    The latch is not an ejector. Do not open the latch too far to the right. Doing so can damage the latch.
  5. Grasp the latch and pull the drive out of the drive slot.

    Figure showing a storage drive being removed from the server.
  6. Consider your next steps:
    • If you are replacing the drive, continue to Install a Storage Drive.

    • If you are not replacing the drive, install a filler in the empty drive slot to maintain proper airflow and perform administrative tasks to configure the server to operate without the drive. See Remove and Install Filler Panels.

      1. Locate the vacant storage drive module slot in the server, and then ensure that the release lever on the filler panel is fully opened.

      2. Slide the filler panel into the vacant slot by pressing the middle of the filler panel faceplate with your thumb or finger.

        The release lever will close as it makes contact with the chassis. Do not slide the filler panel in all the way. Leave the filler panel out approximately 0.25 to 0.50 inch (6 to 12 mm) from the opening.

      3. Using your thumb or finger, press on the middle of the filler panel faceplate until the release lever engages with the chassis.

      4. Close the release lever until it clicks into place and is flush with the front of the server.

Install a Storage Drive

  1. Remove the replacement drive from its packaging, and place the drive on an antistatic mat.
  2. If necessary, remove the drive filler panel.
    1. Locate the storage drive filler panel to be removed from the server.

    2. To unlatch the storage drive filler panel, pull the release lever, and tilt the lever out into a fully opened position.

    3. To remove the filler panel from the slot, hold the opened release lever, and gently slide the filler panel toward you.

    See Remove and Install Filler Panels.

  3. Align the replacement drive with the drive slot.

    The drive is physically addressed according to the slot in which it is installed. It is important to install a replacement drive in the same slot as the drive that you removed.

  4. Slide the drive into the slot until the drive is fully seated.

    Figure showing a storage drive being installed in the server.
  5. Close the drive latch to lock the drive in place.
  6. Perform administrative procedures to reconfigure the drive.

    The procedures that you perform at this point depend on how your data is configured. You might need to partition the drive, create file systems, load data from backups, or have the drive updated from a RAID configuration.

Removing and Replacing Storage Drives Using an OS

The following sections describe how to remove and replace an HDD or SSD storage drive using supported operating systems.

Removing and Replacing an NVMe Storage Drive Using Oracle Linux

The following sections describe how to remove and replace an NVMe storage drive on a server that is running the Oracle Linux operating system.

Unmount an NVMe Storage Drive
  1. Log in to Oracle Linux that is running on the server.
  2. Remove the NVMe storage device path.
    1. To find the PCIe addresses (Bus Device Function), type:

      # find /sys/devices |egrep 'nvme[0-9][0-9]'

      This command returns output similar to the following example. In each path, the PCIe address of the NVMe drive is the component immediately before /nvme (0000:e8:00.0 in this example):

      /sys/devices/pci0000:d7/0000:d7:02.0/0000:e3:00.0/0000:e4:07.0/0000:e8:00.0/nvme/nvme10
      /sys/devices/pci0000:d7/0000:d7:02.0/0000:e3:00.0/0000:e4:07.0/0000:e8:00.0/nvme/nvme10/uevent
      /sys/devices/pci0000:d7/0000:d7:02.0/0000:e3:00.0/0000:e4:07.0/0000:e8:00.0/nvme/nvme10/cntlid
    2. To obtain the slot number (APIC ID) for the bus address, type the following command to list the PCIe slot numbers with corresponding bus addresses:

      # egrep -H '.*' /sys/bus/pci/slots/*/address

      This command returns output similar to the following example. The annotations in parentheses identify the bus addresses that correspond to NVMe instances.

      Note:

      In the following output, notice that the instance names for the NVMe drives do not correspond to the NVMe drive labels on the front of the server. That is, pci/slots/12/address: 0000:b2:00 corresponds to instance nvme0; however, on the front of the server, this drive is labeled NVMe2. For a table that shows the relationship between the pci/slot# and the NVMe storage drive label on the front of the server, see Server Operating System Names for NVMe Storage Drives.
      /sys/bus/pci/slots/10/address:0000:b8:00
      /sys/bus/pci/slots/11/address:0000:b6:00
      /sys/bus/pci/slots/12/address:0000:b2:00 (instance nvme0, pcie slot 12, drive label nvme2)
      /sys/bus/pci/slots/13/address:0000:b4:00 (instance nvme1, pcie slot 13, drive label nvme3)
    3. Disconnect all users from the NVMe drive and back up the NVMe drive data, as needed.
      1. Use the umount command to unmount any file systems that are mounted on the device.
      2. Remove the device from any multiple device (md) and Logical Volume Manager (LVM) volume using the device.
      3. If the device uses multipathing, run multipath -l and note all the paths to the device. Then, remove the multipathed device using the multipath -f device command.
      4. Run the blockdev --flushbufs device command to flush any outstanding I/O to all paths to the device.
  3. To prepare the NVMe drive for removal, that is, to detach the NVMe device driver and power off the NVMe drive slot, type:

    # echo 0 > /sys/bus/pci/slots/$slot/power

    Where $slot is the slot number that you obtained in the second substep of Step 2 above.

  4. Verify that the OK to Remove indicator (LED) on the NVMe drive is lit.
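
The two lookups in Step 2 (instance name to PCIe address, then PCIe address to slot number) can be combined in a short script. The following is a sketch under stated assumptions, not the documented procedure: the function names and the SYSFS_ROOT variable are hypothetical, and the sketch assumes the sysfs layout shown in the example output above, where each slot address file omits the function suffix (0000:b2:00 rather than 0000:b2:00.0).

```shell
# SYSFS_ROOT is a hypothetical override so that the functions can be tried
# against a copy of the sysfs tree. It defaults to the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the PCIe address of an NVMe instance such as nvme0. The address is
# the path component immediately before /nvme/<instance>.
nvme_pci_address() {
    find "$SYSFS_ROOT/devices" -type d -path "*/nvme/$1" 2>/dev/null |
        head -n 1 |
        sed -n 's|.*/\([^/]*\)/nvme/[^/]*$|\1|p'
}

# Print the number of the slot whose address file matches a PCIe address.
# Returns 1 if no slot matches.
nvme_slot_for_address() {
    addr="${1%.*}"    # drop the function suffix, for example ".0"
    for f in "$SYSFS_ROOT"/bus/pci/slots/*/address; do
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "$addr" ]; then
            basename "$(dirname "$f")"
            return 0
        fi
    done
    return 1
}
```

For example, `addr=$(nvme_pci_address nvme0) && slot=$(nvme_slot_for_address "$addr")` leaves the slot number in $slot for use in Step 3. Confirm the result against the drive label table before you power off a slot, because instance names do not follow the front panel labels.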
Remove an NVMe Storage Drive

Perform this procedure to physically remove an NVMe storage drive from the server.

  1. Identify the location of the NVMe drive that you want to remove.

    For storage drive locations, see NVMe Storage Drives.

  2. Verify that the OK to Remove indicator (LED) on the NVMe drive is lit.
  3. On the NVMe drive you plan to remove, push the latch release button to open the drive latch.
  4. Grasp the latch and pull the drive out of the drive slot.
  5. Consider your next steps:
Verify Removal of an NVMe Storage Drive
  1. To check the NVMe drive enumeration and verify that the NVMe drive has been removed, type:

    # lspci -nnd :0a54

  2. View the command output and verify that the entry for the slot number that was disabled no longer appears.

    This command returns output similar to the following:

    86:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0a54]
    8d:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0a54]
    d9:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0a54]
    e0:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0a54]
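
The visual check in Step 2 can also be scripted. The following is a minimal sketch, assuming that you know the bus address (for example e0:00.0) of the drive that you removed. The function name pci_device_absent is hypothetical. It reads lspci output on stdin and succeeds only if no line begins with that address.

```shell
# Hypothetical helper: succeeds (exit 0) when the lspci output on stdin
# contains no entry whose first field equals the given bus address.
pci_device_absent() {
    awk -v a="$1" '
        $1 == a { found = 1 }
        END { exit (found ? 1 : 0) }
    '
}
```

For example, `lspci -nnd :0a54 | pci_device_absent e0:00.0 && echo "drive no longer enumerated"` prints the message only when the entry is gone.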
Install an NVMe Storage Drive

Perform this procedure to physically install an NVMe storage drive into the server.

Note:

After you physically remove an NVMe storage drive from the server, wait at least 10 seconds before installing a replacement drive.
  1. Remove the replacement drive from its packaging and place the drive on an antistatic mat.
  2. If necessary, remove the drive filler panel.
  3. Align the replacement drive with the drive slot.

    The drive is physically addressed according to the slot in which it is installed. It is important to install a replacement drive in the same slot as the drive that you removed.

  4. Slide the drive into the slot until the drive is fully seated.
  5. Close the drive latch to lock the drive in place.
Power On an NVMe Storage Drive and Attach a Device Driver
  1. To power on the slot and attach the device driver, type:

    # echo 1 > /sys/bus/pci/slots/$slot/power

    Where $slot is the slot number for the NVMe storage drive.
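
The power command can be wrapped so that a mistyped slot number or value is rejected before anything is written. The following is a sketch only: the function name set_slot_power and the SLOTS_DIR variable are hypothetical, and the sketch assumes that the slot power file can be read back to confirm the new state.

```shell
# SLOTS_DIR is a hypothetical override for trying the function against a
# test directory. It defaults to the real sysfs location.
SLOTS_DIR="${SLOTS_DIR:-/sys/bus/pci/slots}"

# Write 0 (power off) or 1 (power on) to a slot power file, then print the
# state that is read back from the file.
set_slot_power() {
    slot="$1"
    state="$2"
    case "$state" in
        0|1) ;;
        *) echo "state must be 0 or 1" >&2; return 2 ;;
    esac
    power_file="$SLOTS_DIR/$slot/power"
    if [ ! -w "$power_file" ]; then
        echo "no writable power file for slot $slot" >&2
        return 1
    fi
    echo "$state" > "$power_file" && cat "$power_file"
}
```

For example, `set_slot_power 12 1` powers on slot 12 and prints 1 if the slot reports that it is powered. Run it as root, as with the echo command above.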

Verify Operation of an NVMe Storage Drive
  1. To verify that an NVMe drive is operating properly, do one of the following:
    • Check the /var/log/messages log file.

    • Type: ls -l /dev/nvme*
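
The device node check can be narrowed to a single NVMe instance. The following is a minimal sketch: the function name list_nvme_nodes and the DEV_DIR variable are hypothetical. It lists the controller node and any namespace or partition nodes for one instance and fails if none exist. It checks only that the nodes exist, not that the drive is healthy.

```shell
# DEV_DIR is a hypothetical override for trying the function against a
# test directory. It defaults to /dev.
DEV_DIR="${DEV_DIR:-/dev}"

# Print the device nodes for one NVMe instance, for example nvme0,
# nvme0n1, and nvme0n1p1. Returns 1 if no node exists.
list_nvme_nodes() {
    status=1
    # The "n" after the instance name keeps nvme1 from matching nvme10.
    for node in "$DEV_DIR/$1" "$DEV_DIR/$1"n*; do
        [ -e "$node" ] || continue
        echo "$node"
        status=0
    done
    return "$status"
}
```

For example, `list_nvme_nodes nvme0 || echo "nvme0 not present"` reports a missing drive after the slot has been powered on.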