G Manual Disk Configuration Steps
It is recommended that you run the bdaconfiguredisk utility to resolve disk errors and to configure or reconfigure a data or operating system disk on Oracle Big Data Appliance. The utility automates the process. The following manual steps should not be needed, but they are included as a fallback that you can refer to if necessary.
G.1 Identifying the Function of a Disk Drive
The server with the failed disk is configured to support either HDFS or Oracle NoSQL Database, and most disks are dedicated to that purpose. However, two disks are dedicated to the operating system. Before configuring the new disk, find out how the failed disk was configured.
Oracle Big Data Appliance is configured with the operating system on the first two disks.
To confirm that a failed disk supported the operating system:
-
Check whether the replacement disk corresponds to /dev/sda or /dev/sdb, which are the operating system disks:
# lsscsi
See the output from Step 11 of "Replacing a Disk Drive".
-
Verify that /dev/sda and /dev/sdb are the mirrored, partitioned operating system disks:
# mdadm -Q --detail /dev/md2
/dev/md2:
Version : 0.90
Creation Time : Mon Jul 22 22:56:19 2013
Raid Level : raid1
. . .
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
-
If the previous steps indicate that the failed disk is an operating system disk, then proceed to "Configuring an Operating System Disk".
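The check of the RAID members can be scripted. The following is a minimal sketch, not part of the appliance software: md_members is a hypothetical helper name, and it assumes that the mdadm detail output marks healthy members with the words "active sync" followed by the partition name, as in the sample output above.

```shell
#!/bin/sh
# md_members (hypothetical helper): read `mdadm -Q --detail` output on
# stdin and print the member partitions reported as "active sync",
# one per line. It only parses text and changes nothing on the system.
md_members() {
    awk '/active sync/ { print $NF }'
}

# Example use on a live system (run as root):
#   mdadm -Q --detail /dev/md2 | md_members
```

If both /dev/sda2 and /dev/sdb2 are printed, the two operating system disks are members of the mirror.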
G.2 Configuring an Operating System Disk
The first two disks support the Linux operating system. These disks store a copy of the mirrored operating system, a swap partition, a mirrored boot partition, and an HDFS data partition.
To configure an operating system disk, you must copy the partition table from the surviving disk, create an HDFS partition (ext4 file system), and add the software raid partitions and boot partitions for the operating system.
Complete these procedures after replacing the disk in either slot 0 or slot 1.
G.2.2 Repairing the RAID Arrays
After partitioning the disks, you can repair the two logical RAID arrays:
-
/dev/md0 contains /dev/disk/by-hba-slot/s0p1 and /dev/disk/by-hba-slot/s1p1. It is mounted as /boot.
-
/dev/md2 contains /dev/disk/by-hba-slot/s0p2 and /dev/disk/by-hba-slot/s1p2. It is mounted as / (root).
Caution:
Do not dismount the /dev/md devices, because that action shuts down the system.
To repair the RAID arrays:
-
Remove the partitions from the RAID arrays:
# mdadm /dev/md0 -r detached
# mdadm /dev/md2 -r detached
-
Verify that the RAID arrays are degraded:
# mdadm -Q --detail /dev/md0
# mdadm -Q --detail /dev/md2
-
Verify that the degraded file for each array is set to 1:
# cat /sys/block/md0/md/degraded
1
# cat /sys/block/md2/md/degraded
1
-
Restore the partitions to the RAID arrays:
# mdadm --add /dev/md0 /dev/disk/by-hba-slot/snp1
# mdadm --add /dev/md2 /dev/disk/by-hba-slot/snp2
-
Check that resynchronization is started, so that /dev/md2 is in a state of recovery and not idle:
# cat /sys/block/md2/md/sync_action
repair
-
To verify that resynchronization is proceeding, you can monitor the mdstat file. A counter identifies the percentage complete.
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
204736 blocks [2/2] [UU]
md2 : active raid1 sdb2[2] sda2[0]
174079936 blocks [2/1] [U_]
[============>........] recovery = 61.6% (107273216/174079936) finish=18.4min speed=60200K/sec
The following output shows that synchronization is complete:
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
204736 blocks [2/2] [UU]
md2 : active raid1 sdb2[1] sda2[0]
174079936 blocks [2/2] [UU]
unused devices: <none>
-
Display the content of /etc/mdadm.conf:
# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=df1bd885:c1f0f9c2:25d6...
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=6c949a1a:1d45b778:a6da...
-
Compare the output of the following command with the content of /etc/mdadm.conf from Step 7:
# mdadm --examine --brief --scan --config=partitions
-
If the UUIDs in the file are different from the UUIDs in the output of the mdadm command:
-
Open /etc/mdadm.conf in a text editor.
-
Select from ARRAY to the end of the file, and delete the selected lines.
-
Copy the output of the command into the file where you deleted the old lines.
-
Save the modified file and exit.
-
-
Complete the steps in "Formatting the HDFS Partition of an Operating System Disk".
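The mdstat check in the procedure above can be automated. The following sketch lists the arrays that are still degraded. The function name degraded_arrays is hypothetical, and the parsing assumes the mdstat layout shown in the sample output: an array line that starts with the array name, followed by a line containing "blocks" that ends with a status such as [UU] or [U_].

```shell
#!/bin/sh
# degraded_arrays (hypothetical helper): print the name of every md array
# whose member status (for example [U_]) contains an underscore, which
# marks a missing or still-recovering member.
# $1 is an optional file in mdstat format; the default is /proc/mdstat.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember the current array
        /blocks/ && name != "" {
            if ($NF ~ /_/) print name         # last field is [UU] or [U_]
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}

# Example use: poll until both arrays are fully synchronized.
#   while [ -n "$(degraded_arrays)" ]; do sleep 60; done
```

An empty result corresponds to the [UU] state that the procedure describes as complete synchronization.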
G.2.3 Formatting the HDFS Partition of an Operating System Disk
Partition 4 (sda4) on an operating system disk is used for HDFS. After you format the partition and set the correct label, HDFS rebalances the job load to use the partition if the disk space is needed.
To format the HDFS partition:
-
Format the HDFS partition as an ext4 file system:
# mkfs -t ext4 /dev/disk/by-hba-slot/snp4
Note:
If this command fails because the device is mounted, then dismount the drive now and skip step 3. See "Prerequisites for Replacing a Failing Disk" for dismounting instructions.
-
Verify that the partition label (such as /u01 for s0p4) is missing:
# ls -l /dev/disk/by-label
-
Dismount the appropriate HDFS partition, either /u01 for /dev/sda, or /u02 for /dev/sdb:
# umount /u0n
-
Reset the partition label:
# tune2fs -c -1 -i 0 -m 0.2 -L /u0n /dev/disk/by-hba-slot/snp4
-
Mount the HDFS partition:
# mount /u0n
-
Complete the steps in "Restoring the Swap Partition".
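To avoid substituting n by hand, the commands in this procedure can be generated for a given slot. The following is a dry-run sketch only: hdfs_partition_cmds is a hypothetical helper that prints the commands and runs none of them. It assumes the pairing stated above, in which slot 0 uses the label /u01 and slot 1 uses /u02.

```shell
#!/bin/sh
# hdfs_partition_cmds (hypothetical helper): print, without executing,
# the commands from this procedure for operating system slot 0 or 1.
hdfs_partition_cmds() {
    slot=$1
    case "$slot" in
        0|1) ;;
        *) echo "slot must be 0 or 1" >&2; return 1 ;;
    esac
    label="/u0$((slot + 1))"                    # s0p4 -> /u01, s1p4 -> /u02
    part="/dev/disk/by-hba-slot/s${slot}p4"
    printf '%s\n' \
        "mkfs -t ext4 $part" \
        "umount $label" \
        "tune2fs -c -1 -i 0 -m 0.2 -L $label $part" \
        "mount $label"
}

# Example use: review the commands for slot 1 before typing them.
#   hdfs_partition_cmds 1
```

The verification step (ls -l /dev/disk/by-label) and the note about skipping the dismount are intentionally left to the operator.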
G.2.4 Restoring the Swap Partition
To restore the swap partition:
-
Set the swap label:
# mkswap -L SWAP-sdn3 /dev/disk/by-hba-slot/snp3
Setting up swapspace version 1, size = 12582907 kB
LABEL=SWAP-sdn3, no uuid
-
Verify that the swap partition is restored:
# bdaswapon; bdaswapoff
Filename Type Size Used Priority
/dev/sda3 partition 12287992 0 1
/dev/sdb3 partition 12287992 0 1
-
Verify that the replaced disk is recognized by the operating system:
$ ls -l /dev/disk/by-label
total 0
lrwxrwxrwx 1 root root 10 Aug 3 01:22 BDAUSB -> ../../sdn1
lrwxrwxrwx 1 root root 10 Aug 3 01:22 BDAUSBBOOT -> ../../sdm1
lrwxrwxrwx 1 root root 10 Aug 3 01:22 SWAP-sda3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Aug 3 01:22 SWAP-sdb3 -> ../../sdb3
lrwxrwxrwx 1 root root 10 Aug 3 01:22 u01 -> ../../sda4
lrwxrwxrwx 1 root root 10 Aug 3 01:22 u02 -> ../../sdb4
lrwxrwxrwx 1 root root 10 Aug 3 01:22 u03 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Aug 3 01:22 u04 -> ../../sdd1
. . .
-
If the output does not list the replaced disk:
-
On Linux 6, run udevadm trigger.
Then repeat step 3. The lsscsi command should also report the correct order of the disks.
-
-
Complete the steps in "Restoring the GRUB Master Boot Records and HBA Boot Order".
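Checking the by-label listing for one specific label can be scripted. The following sketch assumes the listing format shown above, in which each symbolic link appears as "LABEL -> target". The function name label_listed is hypothetical.

```shell
#!/bin/sh
# label_listed (hypothetical helper): read `ls -l /dev/disk/by-label`
# output on stdin and succeed (exit status 0) only if the label given
# as $1 appears as the name of a symbolic link.
label_listed() {
    awk -v want="$1" '
        {
            # The link name is the field immediately before "->".
            for (i = 2; i <= NF; i++)
                if ($i == "->" && $(i - 1) == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example use, following the procedure for a replaced disk in slot 1:
#   ls -l /dev/disk/by-label | label_listed SWAP-sdb3 || udevadm trigger
```

The example re-triggers udev only when the label is missing, which mirrors the conditional step in the procedure.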
G.2.5 Restoring the GRUB Master Boot Records and HBA Boot Order
After restoring the swap partition, you can restore the Grand Unified Bootloader (GRUB) master boot record.
The device.map file maps the BIOS drives to operating system devices. The following is an example of a device map file:
# this device map was generated by anaconda
(hd0) /dev/sda
(hd1) /dev/sdb
However, the GRUB device map does not support symbolic links, and the mappings in the device map might not correspond to those used by /dev/disk/by-hba-slot. The following procedure explains how you can correct the device map if necessary.
To restore the GRUB boot record:
-
Check which kernel device the drive in slot 1 is using:
# ls -ld /dev/disk/by-hba-slot/s1
lrwxrwxrwx 1 root root 9 Apr 22 12:54 /dev/disk/by-hba-slot/s1 -> ../../sdb
-
If the output displays /dev/sdb as shown in step 1, then proceed to the next step (open GRUB).
If another device is displayed, such as /dev/sdn, then you must first set hd1 to point to the correct device:
-
Make a copy of the device.map file:
# cd /boot/grub
# cp device.map mydevice.map
# ls -l *device*
-rw-r--r-- 1 root root 85 Apr 22 14:50 device.map
-rw-r--r-- 1 root root 85 Apr 24 09:24 mydevice.map
-
Edit mydevice.map to point hd1 to the new device. In this example, s1 pointed to /dev/sdn in step 1.
# more /boot/grub/mydevice.map
# this device map was generated by bda install
(hd0) /dev/sda
(hd1) /dev/sdn
-
Use the edited device map (mydevice.map) in the remaining steps.
-
-
Open GRUB, using either device.map as shown, or the edited mydevice.map:
# grub --device-map=/boot/grub/device.map
GNU GRUB version 0.97 (640K lower / 3072K upper memory)
[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename. ]
-
Set the root device, entering hd0 for /dev/sda, or hd1 for /dev/sdb:
grub> root (hdn,0)
root (hdn,0)
Filesystem type is ext2fs, partition type 0x83
-
Install GRUB, entering hd0 for /dev/sda, or hd1 for /dev/sdb:
grub> setup (hdn)
setup (hdn)
Checking if "/boot/grub/stage1" exists... no
Checking if "/grub/stage1" exists... yes
Checking if "/grub/stage2" exists... yes
Checking if "/grub/e2fs_stage1_5" exists... yes
Running "embed /grub/e2fs_stage1_5 (hdn)"... failed (this is not fatal)
Running "embed /grub/e2fs_stage1_5 (hdn,0)"... failed (this is not fatal)
Running "install /grub/stage1 (hdn) /grub/stage2 p /grub/grub.conf "... succeeded
Done.
-
Close the GRUB command-line interface:
grub> quit
-
Ensure that the boot drive in the HBA is set correctly:
# MegaCli64 /c0 show bootdrive
If BootDrive VD:0 is set, the command output is as follows:
Controller = 0
Status = Success
Description = None
Controller Properties :
=====================
----------------
Ctrl_Prop Value
----------------
BootDrive VD:0
----------------
If BootDrive VD:0 is not set, the command output shows No Boot Drive:
Controller = 0
Status = Success
Description = None
Controller Properties :
=====================
----------------
Ctrl_Prop Value
----------------
BootDrive No Boot Drive
----------------
-
If MegaCli64 /c0 show bootdrive reports that the boot drive is not set, then set it as follows:
# MegaCli64 /c0/v0 set bootdrive=on
Controller = 0
Status = Success
Description = None
Detailed Status :
===============
-----------------------------------------
VD Property Value Status ErrCd ErrMsg
-----------------------------------------
0 Boot Drive On Success 0 -
------------------------------------------
-
Verify that the boot drive is now set:
# MegaCli64 /c0 show bootdrive
Controller = 0
Status = Success
Description = None
Controller Properties :
=====================
----------------
Ctrl_Prop Value
----------------
BootDrive VD:0
----------------
-
Ensure that the auto-select boot drive feature is enabled:
# MegaCli64 adpBIOS EnblAutoSelectBootLd a0
Auto select Boot is already Enabled on Adapter 0.
-
Check the configuration. See "Verifying the Disk Configuration".
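The boot drive check in the procedure above depends on reading one line of the controller output. The following sketch tests that line. The function name bootdrive_is_set is hypothetical, and the parsing assumes the two output forms shown above: "BootDrive VD:0" when the boot drive is set, and "BootDrive No Boot Drive" when it is not.

```shell
#!/bin/sh
# bootdrive_is_set (hypothetical helper): read the output of the
# "show bootdrive" command on stdin and succeed (exit status 0) only
# if the BootDrive property names a virtual drive such as VD:0.
bootdrive_is_set() {
    awk '
        $1 == "BootDrive" && $2 ~ /^VD:/ { ok = 1 }
        END { exit (ok ? 0 : 1) }
    '
}

# Example use (run as root), setting the boot drive only when needed:
#   MegaCli64 /c0 show bootdrive | bootdrive_is_set \
#       || MegaCli64 /c0/v0 set bootdrive=on
```

Both commands in the example are taken from the procedure; only the conditional wrapper is new.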
G.3 Configuring an HDFS or Oracle NoSQL Database Disk
Complete the following instructions for any disk that is not used by the operating system. See "Identifying the Function of a Disk Drive".
To configure a disk, you must partition and format it.
Note:
Replace snp1 in the following commands with the appropriate symbolic name, such as s4p1.
To format a disk for use by HDFS or Oracle NoSQL Database:
-
Complete the steps in "Replacing a Disk Drive", if you have not done so already.
-
Partition the drive:
# parted /dev/disk/by-hba-slot/sn -s mklabel gpt mkpart primary ext4 0% 100%
-
Format the partition for an ext4 file system:
# mkfs -t ext4 /dev/disk/by-hba-slot/snp1
-
Reset the appropriate partition label to the missing device. See Table 13-2.
# tune2fs -c -1 -i 0 -m 0.2 -L /unn /dev/disk/by-hba-slot/snp1
For example, this command resets the label for /dev/disk/by-hba-slot/s2p1 to /u03:
# tune2fs -c -1 -i 0 -m 0.2 -L /u03 /dev/disk/by-hba-slot/s2p1
Setting maximal mount count to -1
Setting interval between checks to 0 seconds
Setting reserved blocks percentage to 0.2% (976073 blocks)
-
Verify that the replaced disk is recognized by the operating system:
$ ls -l /dev/disk/by-label
total 0
lrwxrwxrwx 1 root root 10 Aug 3 01:22 BDAUSB -> ../../sdn1
lrwxrwxrwx 1 root root 10 Aug 3 01:22 BDAUSBBOOT -> ../../sdm1
lrwxrwxrwx 1 root root 10 Aug 3 01:22 SWAP-sda3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Aug 3 01:22 SWAP-sdb3 -> ../../sdb3
lrwxrwxrwx 1 root root 10 Aug 3 01:22 u01 -> ../../sda4
lrwxrwxrwx 1 root root 10 Aug 3 01:22 u02 -> ../../sdb4
lrwxrwxrwx 1 root root 10 Aug 3 01:22 u03 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Aug 3 01:22 u04 -> ../../sdd1
. . .
-
If the output does not list the replaced disk:
-
On Linux 6, run udevadm trigger.
Then repeat step 5. The lsscsi command should also report the correct order of the disks.
-
-
Mount the HDFS partition, entering the appropriate mount point:
# mount /unn
For example, mount /u03.
-
If you are configuring multiple drives, then repeat the previous steps.
-
If you previously removed a mount point in Cloudera Manager for an HDFS drive, then restore it to the list.
-
Open a browser window to Cloudera Manager. For example:
http://bda1node03.example.com:7180
-
Open Cloudera Manager and log in as admin.
-
On the Services page, click hdfs.
-
Click the Instances subtab.
-
In the Host column, locate the server with the replaced disk. Then click the service in the Name column, such as datanode, to open its page.
-
Click the Configuration subtab.
-
If the mount point is missing from the Directory field, then add it to the list.
-
Click Save Changes.
-
From the Actions list, choose Restart.
-
-
If you previously removed a mount point from NodeManager Local Directories, then also restore it to the list using Cloudera Manager.
-
On the Services page, click Yarn.
-
In the Status Summary, click NodeManager.
-
From the list, click to select the NodeManager that is on the host with the failed disk.
-
Click the Configuration subtab.
-
If the mount point is missing from the NodeManager Local Directories field, then add it to the list.
-
Click Save Changes.
-
From the Actions list, choose Restart.
-
-
Check the configuration. See "Verifying the Disk Configuration".
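The partition, format, label, and mount commands in this procedure differ only in the slot number and mount point. The following dry-run sketch prints them for one slot and runs none of them. The function name data_disk_cmds is hypothetical. The label rule (slot number plus one, zero-padded to two digits) is an assumption inferred from the examples in this appendix (s0 with /u01, s1 with /u02, s2 with /u03); Table 13-2 remains the authoritative mapping.

```shell
#!/bin/sh
# data_disk_cmds (hypothetical helper): print, without executing, the
# commands from this procedure for a data disk in the given HBA slot.
# ASSUMPTION: the mount point is /uNN with NN = slot + 1.
data_disk_cmds() {
    slot=$1
    case "$slot" in
        ''|*[!0-9]*|0*|1)
            echo "slot must be 2 or greater; slots 0 and 1 hold the operating system" >&2
            return 1 ;;
    esac
    label=$(printf '/u%02d' "$((slot + 1))")
    disk="/dev/disk/by-hba-slot/s${slot}"
    printf '%s\n' \
        "parted $disk -s mklabel gpt mkpart primary ext4 0% 100%" \
        "mkfs -t ext4 ${disk}p1" \
        "tune2fs -c -1 -i 0 -m 0.2 -L $label ${disk}p1" \
        "mount $label"
}

# Example use: review the commands for slot 2 before typing them.
#   data_disk_cmds 2
```

For slot 2, the third printed line matches the tune2fs example given earlier in the procedure.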
G.4 Verifying the Disk Configuration
Before you can reinstall the Oracle Big Data Appliance software on the server, you must verify that the configuration is correct on the new disk drive.
To verify the disk configuration:
-
Check the software configuration:
# bdachecksw
-
If there are errors, then redo the configuration steps as necessary to correct the problem.
-
Check the /root directory for a file named BDA_REBOOT_SUCCEEDED.
-
If you find a file named BDA_REBOOT_FAILED, then read the file to identify and fix any additional problems.
-
Use this script to generate a BDA_REBOOT_SUCCEEDED file:
# /opt/oracle/bda/lib/bdastartup.sh
-
Verify that BDA_REBOOT_SUCCEEDED exists. If you still find a BDA_REBOOT_FAILED file, then redo the previous steps.