A Issues with Oracle Database Appliance X6-2-HA, X5-2, X4-2, X3-2, and V1

The following are known issues deploying, updating, and managing Oracle Database Appliance X6-2-HA, X5-2, X4-2, X3-2, and V1:

ODA_BASE is in read-only mode or cannot start
The /OVS directory is full and ODA_BASE is in read-only mode.
OAKERR:7007 Error encountered while starting VM
When starting a virtual machine (VM), an error message appears that the domain does not exist.
Unrecognized Token Messages Appear in /var/log/messages
After updating Oracle Database Appliance, unrecognized token messages appear in /var/log/messages.
Virtual Machine Task Blocked
After updating to Oracle Database Appliance 12.1.2.11.0, the IOs to local disks can get stuck and block tasks.
ERROR: ORA-27301 ASM alert error log shows resource problem in the OS
After upgrading to Oracle Database Appliance 12.1.2.11.0, the ASM alert error log shows failed processes due to a resource problem in the operating system.

ODA_BASE is in read-only mode or cannot start

The /OVS directory is full and ODA_BASE is in read-only mode.

A vmcore file in the /OVS/var directory can cause the /OVS directory (Dom0) to become 100% used. When Dom0 is full, ODA_BASE is in read-only mode or cannot start.

Hardware Models

Oracle Database Appliance X6-2-HA, X5-2, X4-2, X3-2, and V1

Workaround

Perform the following to correct or prevent this issue:

Periodically check the file usage on Dom0 and clean up the vmcore file as needed.
Edit the oda_base vm.cfg file and change the on_crash = 'coredump-restart' parameter to on_crash = 'restart'. This change is especially important when ODA_BASE is using more than 200 GB of memory.
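
The two workaround items can be sketched as shell helpers. This is a minimal sketch, not an Oracle-provided tool: the function names, the default threshold of 90 percent, and the path arguments are assumptions for illustration. Pass the actual location of the oda_base vm.cfg file on your system.

```shell
# Hypothetical helpers; names and the default threshold are illustrative assumptions.

# Report whether the file system holding the given directory is at or
# above the given usage percentage (default 90).
check_usage() {
    dir=$1
    limit=${2:-90}
    # df -P prints one POSIX-format line per file system; field 5 is the capacity.
    used=$(df -P "$dir" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $dir is ${used}% full"
    else
        echo "OK: $dir is ${used}% full"
    fi
}

# Change on_crash = 'coredump-restart' to on_crash = 'restart' in a vm.cfg
# file. The previous contents are kept in a copy with a .bak suffix.
set_on_crash_restart() {
    cfg=$1
    cp -p "$cfg" "$cfg.bak" &&
    sed "s/^\(on_crash[[:space:]]*=[[:space:]]*\)'coredump-restart'/\1'restart'/" \
        "$cfg.bak" > "$cfg"
}
```

For example, check_usage /OVS could be run periodically on Dom0, and set_on_crash_restart could be run once against the oda_base vm.cfg file.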

OAKERR:7007 Error encountered while starting VM

When starting a virtual machine (VM), an error message appears that the domain does not exist.

If a VM was cloned in Oracle Database Appliance 12.1.2.10 or earlier, you cannot start the HVM domain VMs in Oracle Database Appliance 12.1.2.11.

This issue does not impact newly cloned VMs in Oracle Database Appliance 12.1.2.11 or any other type of VM cloned on older versions. The VM templates were fixed in 12.1.2.11.0.

When trying to start the VM (vm4 in this example), the output is similar to the following:

# oakcli start vm vm4 -d 
.
Start VM : test on Node Number : 0 failed.
DETAILS:
        Attempting to start vm on node:0=>FAILED.  
<OAKERR:7007 Error  encountered while starting VM -  Error: Domain 'vm4' does not exist.>

The following is an example of the vm.cfg file for vm4:

vif = ['']
name = 'vm4'
extra = 'NODENAME=vm4'
builder = 'hvm'
cpus = '0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23'
vcpus = 2
memory = 2048
cpu_cap = 0
vnc = 1
serial = 'pty'
disk = [u'file:/OVS/Repositories/odarepo1/VirtualMachines/vm4/68c32afe2ba8493e89f018a970c644ea.img,xvda,w']
maxvcpus = 2
maxmem = 2048

Hardware Models

Oracle Database Appliance X6-2-HA, X5-2, X4-2, X3-2, and V1

Workaround

Delete the extra = 'NODENAME=vm_name' line from the vm.cfg file for the VM that failed to start.

Open the vm.cfg file for the virtual machine (vm) that failed to start.
- Dom0: /Repositories/vm_repo_name/.ACFS/snaps/vm_name/VirtualMachines/vm_name
- ODA_BASE: /app/sharedrepo/vm_repo_name/.ACFS/snaps/vm_name/VirtualMachines/vm_name

Delete the following line: extra = 'NODENAME=vm_name'. For example, if virtual machine vm4 failed to start, delete the line extra = 'NODENAME=vm4'.

vif = ['']
name = 'vm4'
extra = 'NODENAME=vm4' 
builder = 'hvm'
cpus = '0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23'
vcpus = 2
memory = 2048
cpu_cap = 0
vnc = 1
serial = 'pty'
disk = [u'file:/OVS/Repositories/odarepo1/VirtualMachines/vm4/68c32afe2ba8493e89f018a970c644ea.img,xvda,w']
maxvcpus = 2
maxmem = 2048

Start the virtual machine on Oracle Database Appliance 12.1.2.11.0.
```
# oakcli start vm vm4
```
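
The deletion step can also be scripted instead of editing the file by hand. The following is a minimal sketch, assuming the line has the form shown in the example above; the function name remove_nodename_extra is hypothetical and not part of oakcli.

```shell
# Hypothetical helper; the function name is an illustrative assumption.
# Delete the extra = 'NODENAME=...' line from a vm.cfg file.
# The previous contents are kept in a copy with a .bak suffix.
remove_nodename_extra() {
    cfg=$1
    cp -p "$cfg" "$cfg.bak" &&
    sed "/^[[:space:]]*extra[[:space:]]*=[[:space:]]*'NODENAME=[^']*'[[:space:]]*\$/d" \
        "$cfg.bak" > "$cfg"
}
```

After running remove_nodename_extra against the vm.cfg file of the affected VM, start the VM with oakcli start vm as shown above.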

Unrecognized Token Messages Appear in /var/log/messages

After updating Oracle Database Appliance, unrecognized token messages appear in /var/log/messages.

Updating to Oracle Database Appliance 12.1.2.11.0 updates the Oracle VM Server version to 3.4.3. After updating, the following messages appear in /var/log/messages:

Unrecognized token: "max_seq_redisc"
Unrecognized token: "rereg_on_guid_migr"
Unrecognized token: "aguid_inout_notice"
Unrecognized token: "sm_assign_guid_func"
Unrecognized token: "reports"
Unrecognized token: "per_module_logging"
Unrecognized token: "consolidate_ipv4_mask"

You can ignore the messages for these parameters because they do not impact the functionality of the InfiniBand compliant Subnet Manager and Administration (opensm). However, Oracle recommends removing the parameters to avoid flooding /var/log/messages.

Hardware Models

Oracle Database Appliance X6-2-HA and X5-2 with InfiniBand

Workaround

Perform the following to remove the parameters:

After patching, update the /etc/opensm/opensm.conf file to remove the parameters. Do this on bare metal deployments, and in Dom0 on virtualized platform deployments. The following command lists the parameters that are currently set:

cat /etc/opensm/opensm.conf | egrep -w 'max_seq_redisc|rereg_on_guid_migr|aguid_inout_notice|sm_assign_guid_func|reports|per_module_logging|consolidate_ipv4_mask' | grep -v ^#
max_seq_redisc 0
rereg_on_guid_migr FALSE
aguid_inout_notice FALSE
sm_assign_guid_func uniq_count
reports 2
per_module_logging FALSE
consolidate_ipv4_mask 0xFFFFFFFF

Reboot. The messages will not appear after rebooting the node.
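
Removing the seven parameters can be sketched as a single filter over the configuration file. This is an illustrative sketch only: the function name remove_opensm_tokens is hypothetical, and the script assumes each parameter starts its own line, as in the listing above. Commented-out lines are left in place.

```shell
# Hypothetical helper; the function name is an illustrative assumption.
# Delete the seven unrecognized parameters from an opensm.conf file.
# The previous contents are kept in a copy with a .bak suffix.
remove_opensm_tokens() {
    conf=$1
    tokens='max_seq_redisc|rereg_on_guid_migr|aguid_inout_notice|sm_assign_guid_func|reports|per_module_logging|consolidate_ipv4_mask'
    cp -p "$conf" "$conf.bak" &&
    # Match a parameter name only when it is a whole word at the start of a line.
    sed -E "/^[[:space:]]*($tokens)([[:space:]]|\$)/d" "$conf.bak" > "$conf"
}
```

For example, remove_opensm_tokens /etc/opensm/opensm.conf could be run before the reboot, and the egrep command above could then be rerun to confirm that no matching lines remain.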

Virtual Machine Task Blocked

After updating to Oracle Database Appliance 12.1.2.11.0, the IOs to local disks can get stuck and block tasks.

The issue is caused by an Oracle Linux bug when using Oracle VM 3.4.3. All Oracle Database Appliance guest virtual machines that use multiple VLANs and have VDISKs might encounter this bug, which causes the IO to hang. The problem can manifest itself in different ways, depending on which process gets stuck. For example, after deploying ODA_BASE, the untar command cannot proceed or virtual machines can hang.

Hardware Models

Oracle Database Appliance X6-2-HA, X5-2, X4-2, X3-2, and V1 with guest virtual machines that use multiple VLANs and VDISKs.

Workaround

Oracle Database Appliance X6-2-HA and X5-2 Dom0 use grub2. For these models, perform the following to set gnttab_max_frames to 256 on Dom0 of both nodes:

Increase gnttab_max_frames in the /etc/default/grub file by changing the following line:

GRUB_CMDLINE_XEN="dom0_mem=max:4096MM allowsuperpage crashkernel=256M@64M extra_guest_irqs=256,2048 nr_irqs=2048 dom0_vcpus_pin dom0_max_vcpus=20"

to:

GRUB_CMDLINE_XEN="dom0_mem=max:4096MM allowsuperpage crashkernel=256M@64M extra_guest_irqs=256,2048 nr_irqs=2048 dom0_vcpus_pin dom0_max_vcpus=20 gnttab_max_frames=256"

Create a new configuration file based on the changes.
```
grub2-mkconfig -o /boot/grub2/grub.cfg 
```
Reboot.
Repeat the process on Dom0 of the second node.
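
The edit to /etc/default/grub in the grub2 procedure can be sketched as a small idempotent helper. This is a sketch under assumptions: the function name add_gnttab_grub2 is hypothetical, and the script assumes the GRUB_CMDLINE_XEN value is a single double-quoted line as shown above.

```shell
# Hypothetical helper; the function name is an illustrative assumption.
# Append gnttab_max_frames=256 to the GRUB_CMDLINE_XEN line of a grub
# defaults file unless that line already sets gnttab_max_frames.
# The previous contents are kept in a copy with a .bak suffix.
add_gnttab_grub2() {
    file=$1
    cp -p "$file" "$file.bak" &&
    sed '/^GRUB_CMDLINE_XEN="/{
/gnttab_max_frames=/!s/"[[:space:]]*$/ gnttab_max_frames=256"/
}' "$file.bak" > "$file"
}
```

For example, add_gnttab_grub2 /etc/default/grub would be run on Dom0 before regenerating the configuration with grub2-mkconfig and rebooting.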

Oracle Database Appliance X4-2, X3-2, and V1 Dom0 use grub1. For these models, perform the following to set gnttab_max_frames to 256 in the Xen hypervisor on Dom0 of both nodes:

Open the /boot/grub/grub.conf file in Dom0.

Append gnttab_max_frames=256 to the end of the xen.gz kernel line.

For example, change the following line:

kernel /xen.gz dom0_mem=4096M crashkernel=256M@64M

to:

kernel /xen.gz dom0_mem=4096M crashkernel=256M@64M gnttab_max_frames=256

Reboot.
Repeat the process on Dom0 of the second node.
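
The grub1 edit can be sketched in the same way. The function name add_gnttab_grub1 is hypothetical, and the sketch assumes every kernel line that references xen.gz should receive the option.

```shell
# Hypothetical helper; the function name is an illustrative assumption.
# Append gnttab_max_frames=256 to each "kernel ... xen.gz" line of a
# grub.conf file unless that line already sets gnttab_max_frames.
# The previous contents are kept in a copy with a .bak suffix.
add_gnttab_grub1() {
    file=$1
    cp -p "$file" "$file.bak" &&
    sed '/^[[:space:]]*kernel[[:space:]].*xen\.gz/{
/gnttab_max_frames=/!s/[[:space:]]*$/ gnttab_max_frames=256/
}' "$file.bak" > "$file"
}
```

For example, add_gnttab_grub1 /boot/grub/grub.conf would be run on Dom0 of each node before rebooting.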

ERROR: ORA-27301 ASM alert error log shows resource problem in the OS

After upgrading to Oracle Database Appliance 12.1.2.11.0, the ASM alert error log shows failed processes due to a resource problem in the operating system.

The failure indicates that buffer space is not available. It is caused by a high MTU value for the loopback adapter.

Hardware Models

Oracle Database Appliance X6-2-HA, X5-2, X4-2, X3-2, and V1

Workaround

Add the following line to the /etc/sysconfig/network-scripts/ifcfg-lo file.
```
MTU=16436
```
Save the file and restart the network service to load the changes.
```
# service network restart
```
Repeat the process on the second node.
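
The file edit can be sketched as a helper that is safe to run more than once. The function name set_loopback_mtu is hypothetical. As an assumption beyond the steps above, the sketch replaces any MTU line already in the file rather than adding a second one.

```shell
# Hypothetical helper; the function name is an illustrative assumption.
# Ensure the given ifcfg file contains exactly one MTU line, MTU=16436.
# The previous contents are kept in a copy with a .bak suffix.
set_loopback_mtu() {
    file=$1
    cp -p "$file" "$file.bak" &&
    # Copy every line except an existing MTU setting, then append the new value.
    awk '!/^MTU=/ { print } END { print "MTU=16436" }' "$file.bak" > "$file"
}
```

For example, set_loopback_mtu /etc/sysconfig/network-scripts/ifcfg-lo would be followed by the service network restart command shown above.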