21 Troubleshooting Data Preserving Reprovisioning on Oracle Database Appliance
Understand tools you can use to validate changes and troubleshoot issues that may occur when using Data Preserving Reprovisioning on Oracle Database Appliance.
- Errors When Running odaupgradeutil on Oracle Database Appliance
Troubleshooting errors that may occur during initialization of the utility.
- Errors Detected by the Prechecks Option of the Upgrade Utility
Troubleshooting errors that are detected by the prechecks option of the odaupgradeutil utility.
- Errors When Running odaupgradeutil detach-node Command on Oracle Database Appliance
Troubleshooting errors that may occur when running the odaupgradeutil detach-node command.
- Errors When Running odacli restore-node Command on Oracle Database Appliance
Troubleshooting errors that may occur when running the odacli restore-node command.
Errors When Running odaupgradeutil on Oracle Database Appliance
Troubleshooting errors that may occur during initialization of the utility.
Errors that can occur during initialization of the odaupgradeutil utility
Cause of the failure: odaupgradeutil fails to initialize.
When you run the odaupgradeutil tool, it discovers basic parameters and saves them in /opt/oracle/oak/restore/init.params. In case of failures, the error is reported on the screen and is also logged in /opt/oracle/oak/restore/log/odaupgradeutil_init_timestamp.log.
A successful run is as follows:
[root@node1 odaupgradeutil]# ./odaupgradeutil run-prechecks
Initializing...
########################## ODAUPGRADEUTIL - INIT - BEGIN ##########################
Please check /opt/oracle/oak/restore/log/odaupgradeutil_init_30-03-2022_22:30:28.log for details.
Get System Version...BEGIN
System Version is: 12.1.2.12.0
Get System Version...DONE
Get Hardware Info...BEGIN
Hardware Model: X5-2, Hardware Platform: HA
Get Hardware Info...DONE
Get Grid home...BEGIN
Grid Home is: /u01/app/12.1.0.2/grid
Get Grid home...DONE
Get system configuration details...BEGIN
Grid user is: grid
Oracle user is: oracle
Get system configuration details...DONE
########################## ODAUPGRADEUTIL - INIT - END ##########################
A failed run can display as follows:
Initializing...
########################## ODAUPGRADEUTIL - INIT - BEGIN ##########################
Please check /opt/oracle/oak/restore/log/odaupgradeutil_init_30-03-2022_22:39:13.log for details.
Get System Version...BEGIN
System Version is: 12.1.2.12.0
Get System Version...DONE
Get Hardware Info...BEGIN
Hardware Model: X5-2, Hardware Platform: HA
Get Hardware Info...DONE
Get Grid home...BEGIN
Grid Home is: /u01/app/12.1.0.2/grid
Get Grid home...DONE
Get system configuration details...BEGIN
Exception occurred: Failed to find configured databases, Cause: Error processing command output: list.index(x): x not in list
As the following log excerpt shows, Oracle Clusterware was not running, and hence the utility could not collect the information about the databases.
2022-03-30 22:39:13,956 - DEBUG - CMD: /opt/oracle/oak/bin/oakcli show databases
2022-03-30 22:39:14,461 - DEBUG - Output:
^[[1m^[[35mWARNING: ^[[0m2022-03-30 22:39:14: Clusterware is not running on one or more nodes of the cluster
^[[1m^[[34mINFO: ^[[0m2022-03-30 22:39:14: Start the clusterware before running this command again
Name Type Storage HomeName HomeLocation Version
----- ------ -------- -------------- ---------------- ----------
2022-03-30 22:39:14,461 - ERROR - Exception occurred:
Traceback (most recent call last):
File "/root/odaupgradeutil/src/init.py", line 291, in get_configuration
name_index = headers.index("Name")
ValueError: list.index(x): x not in list
2022-03-30 22:39:14,461 - ERROR - Exception occurred:
Traceback (most recent call last):
File "/root/odaupgradeutil/src/init.py", line 394, in main
get_configuration(ENV)
File "/root/odaupgradeutil/src/init.py", line 294, in get_configuration
raise UtilException("Failed to find configured databases", "Error processing command output: %s" % (str(e)))
UtilException: Failed to find configured databases, caused by: Error processing command output: list.index(x): x not in list
2022-03-30 22:39:14,462 - ERROR - Exception occurred:
Traceback (most recent call last):
File "/root/odaupgradeutil/src/init.py", line 415, in <module>
main()
File "/root/odaupgradeutil/src/init.py", line 398, in main
raise ue
UtilException: Failed to find configured databases, caused by: Error processing command output: list.index(x): x not in list
Resolution: Errors in this phase are typically caused by issues on the system. The causes include run-time errors related to Oracle Grid Infrastructure software or DCS software. You must review the causes, fix them, and rerun the command.
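When reviewing an initialization log, the relevant entries can be hard to spot among DEBUG lines and terminal color codes. The following is a minimal Python sketch, not part of odaupgradeutil, that strips the color sequences and keeps only the WARNING and ERROR lines. The function name summarize_init_log is hypothetical, and the sketch assumes the log format shown in the excerpt above.

```python
import re

# Matches ANSI color sequences, written either as a real ESC byte or in the
# caret notation (^[) that some viewers print. Assumption: only color
# sequences ending in "m" appear, as in the excerpt above.
ANSI_RE = re.compile(r"(?:\x1b|\^\[)\[[0-9;]*m")


def summarize_init_log(text):
    """Return the WARNING and ERROR lines of a log, without color codes."""
    findings = []
    for raw in text.splitlines():
        line = ANSI_RE.sub("", raw).strip()
        if "WARNING:" in line or " - ERROR - " in line:
            findings.append(line)
    return findings


sample = (
    "2022-03-30 22:39:13,956 - DEBUG - CMD: /opt/oracle/oak/bin/oakcli show databases\n"
    "^[[1m^[[35mWARNING: ^[[0m2022-03-30 22:39:14: Clusterware is not running "
    "on one or more nodes of the cluster\n"
    "2022-03-30 22:39:14,461 - ERROR - Exception occurred:\n"
)
for finding in summarize_init_log(sample):
    print(finding)
```

To use it on a real log, read the file named on the "Please check ... for details" line and pass its contents to the function.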
Errors Detected by the Prechecks Option of the Upgrade Utility
Troubleshooting errors that are detected by the prechecks option of the odaupgradeutil utility.
Prechecks run by the odaupgradeutil utility
The following is a sample precheck report from the odaupgradeutil utility:
[root@node1 odaupgradeutil]# ./odaupgradeutil describe-precheck-report
******************************
ODAUPGRADEUTIL
------------------------------
Version : 19.20.0.0.0
Build : 19.20.0.0.0.230629
******************************
COMPONENT STATUS MESSAGE ACTION
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SYSTEM VERSION PASSED PASSED NONE
SYSTEM CONFIG PASSED PASSED NONE
REQUIRED FILES PASSED PASSED NONE
DISK SPACE PASSED PASSED NONE
OAK PASSED PASSED NONE
ASM PASSED PASSED NONE
DATABASES PASSED PASSED NONE
AUDIT FILES WARNING Audit files found under ['/u01/app/oracle/product/12.1.0.2/dbhome_1/rdbms/audit', '/u01/app/ These files will be lost after reimage. Backup the audit files to a location outside the ODA
oracle/admin', '/var/log'] system.
OS RPMS PASSED PASSED NONE
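On a long report it can help to list only the components that need attention. The following Python sketch is an illustration, not a documented interface: it assumes the report text has been captured to a string, and that every row starts with one of the component names seen in the sample report. Wrapped continuation lines are ignored. The function name non_passing_components is hypothetical.

```python
import re

# Component names as they appear in the sample report above.
COMPONENTS = (
    "SYSTEM VERSION", "SYSTEM CONFIG", "REQUIRED FILES", "DISK SPACE",
    "OAK", "ASM", "DATABASES", "AUDIT FILES", "OS RPMS",
)
ROW_RE = re.compile(
    "^(" + "|".join(re.escape(name) for name in COMPONENTS) + ")"
    + r"\s+(PASSED|FAILED|WARNING)\b"
)


def non_passing_components(report_text):
    """Return (component, status) pairs for rows that are not PASSED."""
    results = []
    for line in report_text.splitlines():
        match = ROW_RE.match(line.strip())
        if match and match.group(2) != "PASSED":
            results.append((match.group(1), match.group(2)))
    return results


# A few rows from the sample report; the AUDIT FILES message is truncated here.
report = """\
SYSTEM VERSION PASSED PASSED NONE
DISK SPACE PASSED PASSED NONE
AUDIT FILES WARNING Audit files found under ...
OS RPMS PASSED PASSED NONE
"""
print(non_passing_components(report))
```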
Errors Detected by the System Version Precheck
The system version must be one of the following supported versions:
- 12.1.2.12.0
- 12.2.1.4.0
- 18.3.0.0.0
- 18.5.0.0.0
- 18.7.0.0.0
- 18.8.0.0.0
COMPONENT STATUS MESSAGE ACTION
------------------------------------------------------------------
SYSTEM VERSION FAILED System version not supported. NONE
Check the log file /opt/oracle/oak/restore/log/odaupgradeutil_prechecks_timestamp.log for the following log entry:
2022-03-30 23:00:21,276 - INFO - System version precheck...BEGIN
2022-03-30 23:00:21,276 - ERROR - System version found: 12.1.2.10.0, Supported system versions: ['12.1.2.12.0', '12.2.1.4.0', '18.3.0.0.0', '18.5.0.0.0', '18.7.0.0.0', '18.8.0.0.0']
2022-03-30 23:00:21,276 - INFO - System version precheck...FAILED
Resolution: Patch to a supported system version and then run the command odaupgradeutil reinitialize. This regenerates the metadata in the init.params file.
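The check can be reproduced by hand before running the utility. The following Python sketch assumes the comparison is an exact string match against the supported list, which is how the log entry above reads; the actual logic inside odaupgradeutil is not shown in this chapter. The function name is_supported is hypothetical.

```python
# Supported source versions, copied from the list above.
SUPPORTED_VERSIONS = (
    "12.1.2.12.0", "12.2.1.4.0", "18.3.0.0.0",
    "18.5.0.0.0", "18.7.0.0.0", "18.8.0.0.0",
)


def is_supported(system_version):
    """Return True if the version string exactly matches a supported one."""
    return system_version.strip() in SUPPORTED_VERSIONS


for version in ("12.1.2.10.0", "12.1.2.12.0"):
    state = "supported" if is_supported(version) else "not supported"
    print(version, "is", state)
```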
Errors Detected by the Required Files Precheck
COMPONENT STATUS MESSAGE ACTION
-------------------------------------------------------------------------------------------------------------------------------
...
REQUIRED FILES FAILED Required file /opt/oracle/extapi/asmappl.config not found. No advisable action. Unsafe to continue
Resolution: Investigate why the file is missing. Recreate the file manually with the right format and content. Contact Oracle Support, if needed.
Errors Detected by the Disk Space Precheck
Cause of the failure: Available space in the Oracle ASM disk groups may be exhausted.
The space is consumed on the target Oracle Database Appliance system, where the odacli restore-node command will be run. On such a target system, the database homes are created on Oracle ACFS. Hence, the space required at the time of running odacli restore-node -d can be deduced using the following expression:
Space required for database homes = Number of database homes in the system to be upgraded X approximate space for each database home to be created on target version
The approximate space for each database home created by the odacli restore-node -d command is about 15 GB. Additionally, the database clones are unzipped on Oracle ACFS, so running the odacli update-repository command also consumes space from the Oracle ACFS volume. The Oracle ACFS volume is created out of the DATA disk group.
Total space required on Oracle ASM disk group = Space required for database homes + (Space required for database clones X Number of database clones needed)
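The two expressions above can be worked through with a short calculation. The following Python sketch uses the figure of about 15 GB for each database home from the text; the space needed for each unpacked clone is not given in this chapter, so clone_gb is a value the caller must supply. The function name required_space_gb is hypothetical.

```python
GIB = 1024 ** 3  # bytes in one GiB (assumption: "GB" in the report means GiB)
DB_HOME_GB = 15  # approximate space per database home, from the text above


def required_space_gb(num_db_homes, num_clones, clone_gb):
    """Apply the two expressions above; clone_gb is a caller-supplied estimate."""
    homes_gb = num_db_homes * DB_HOME_GB
    return homes_gb + clone_gb * num_clones


# One database home and no clones: 15 GB, or 16106127360 bytes.
total_gb = required_space_gb(num_db_homes=1, num_clones=0, clone_gb=0)
print(total_gb, "GB =", total_gb * GIB, "bytes")

# Three homes and two clones, assuming (hypothetically) 5 GB per unpacked clone.
print(required_space_gb(num_db_homes=3, num_clones=2, clone_gb=5), "GB")
```

Note that 15 GiB expressed in bytes is 16106127360, the same digits that the sample report below prints for one database home.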
COMPONENT STATUS MESSAGE ACTION
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
...
DISK SPACE FAILED Insufficient space for DB homes on DATA and RECO disk group(s). On target ODA environment, D Free up space on DATA or RECO disk group(s).
B homes are on ACFS for which space is allocated from ASM disk groups ( DATA or RECO for ODA
HA and DATA for ODA Lite ).Total space required is: 15GB * 1 (no. of DB homes on this syste
m) = 16106127360 MB. Usable space on DATA disk group = 6819749 MB. Usable space on RECO disk
group = 9022620 MB.
Insufficient space for clones repository on DATA disk group. On target ODA environment, clon Free up space on the DATA disk group.
es repository is mounted on ACFS for which space is allocated from the DATA disk group. Addi
tional space required = 16099461211 MB.
Resolution: The approximate space required is reported in the MESSAGE column of the report. The required space must be freed up on the Oracle ASM disk groups to ensure that databases are restored successfully.
Errors Detected by the OAK and Oracle ASM Precheck
OAK FAILED Cluster is not online Please start the cluster
Failed to acquire disk info Check oakd status
Failed to get valid disk configurations for current hardware Check /opt/oracle/oak/restore/log/odaupgradeutil_prechecks_30-03-2022_23:30:24.log
ASM FAILED Cluster is not online Please start the cluster
Failed to run command /bin/su grid -c ' /opt/oracle/oak/bin/stordiag/asm_script.sh 1 6 ' | / Check /opt/oracle/oak/restore/log/odaupgradeutil_prechecks_30-03-2022_23:30:24.log for ASM e
bin/grep -E /dev/mapper/.*D_.*p.* rrors
Failed to get valid disk configurations for current hardware Check /opt/oracle/oak/restore/log/odaupgradeutil_prechecks_30-03-2022_23:30:24.log
Resolution: The ACTION column in the report points to the log files. Review these log files and fix the causes of the issues. After that, rerun the command odaupgradeutil run-prechecks to confirm that the validation can be completed.
Errors Detected by the Databases Precheck
DATABASES FAILED Database mydb is not running. Investigate why the database is down. Fix the cause and start the database. The utility cannot collect metadata if database is not running.
Database tdb is not running. Investigate why the database is down. Fix the cause and start the database. The utility cannot collect metadata if database is not running.
Database test is not running. Investigate why the database is down. Fix the cause and start the database. The utility cannot collect metadata if database is not running.
Resolution: The database must be in a RUNNING status for the utility to collect metadata. The instances must be fixed and restarted.
Errors Detected by the Audit Files Precheck
AUDIT FILES WARNING Audit files found under ['/u01/app/oracle/product/12.1.0.2/dbhome_1/rdbms/audit', '/u01/app/oracle/product/12.1.0.2/dbhome_2/rdbms/audit', '/u01/app/oracle/product/12.1.0.2/dbhome_3/rdbms/audit', '/u01/app/oracle/admin', '/var/log'] These files will be lost after reimage. Backup the audit files to a location outside the ODA system.
Resolution: The check is a warning to alert users so that files can be copied outside of the Oracle Database Appliance system. This alert does not prevent the command odaupgradeutil detach-node from running successfully.
Errors Detected by the Custom RPMs Precheck
If there are user-installed operating system RPMs on the system, this precheck raises a warning and saves the list of such RPMs under /opt/oracle/oak/restore/prechecks/custom-rpms.list. If these RPMs are required to be present, you must reinstall the appropriate RPMs for Oracle Linux 7 on the target environment.
Resolution: The check is a warning to alert users about the loss of additional RPMs. The warning does not prevent the command odaupgradeutil detach-node from running successfully. You can reinstall the RPMs after the command odacli restore-node -d runs successfully.
Errors When Running odaupgradeutil detach-node Command on Oracle Database Appliance
Troubleshooting errors that may occur when running the odaupgradeutil detach-node command.
Errors that can occur due to inability to discover databases
The command odaupgradeutil detach-node has two phases of operation. The first phase saves the configuration of the Oracle Database Appliance system and the second performs the detach operation.
Potential causes of the failure: Inability to communicate with the database, or run-time failures.
The error reported is "not all components discovered". This is the generic error message when database configuration discovery fails. The following error message is displayed and is also available in the log at /opt/oracle/oak/restore/log/odaupgradeutil_saveconf_timestamp.log:
########################## ODAUPGRADEUTIL - SAVECONF - BEGIN ##########################
Please check /opt/oracle/oak/restore/log/odaupgradeutil_saveconf_04-06-2023_23:22:13.log for details.
Setting up passwordless SSH login on node2...BEGIN
root@node2's password:
Setting up passwordless SSH login...SUCCESS
Backup files to /opt/oracle/oak/restore/bkp...BEGIN
Backup files to /opt/oracle/oak/restore/bkp...SUCCESS
Get provision instance...BEGIN
Need to scan database homes for os user/group discovery
Get Database homes...BEGIN
Database Home: /u01/app/oracle/product/12.1.0.2/dbhome_1, Database Home Name: OraDb12102_home1, Database Home Version: 12.1.0.2.170814
Database Home: /u01/app/oracle/product/12.1.0.2/dbhome_2, Database Home Name: OraDb12102_home2, Database Home Version: 12.1.0.2.170814
Database Home: /u01/app/oracle/product/12.1.0.2/dbhome_3, Database Home Name: OraDb12102_home3, Database Home Version: 12.1.0.2.170814
Get Database homes...SUCCESS
Get provision instance...SUCCESS
Get network configuration...BEGIN
Get network configuration...SUCCESS
Get databases...BEGIN
Database Name: mydb
Oracle Home: /u01/app/oracle/product/12.1.0.2/dbhome_1
Database Name: tdb
Oracle Home: /u01/app/oracle/product/12.1.0.2/dbhome_2
Database Name: test
Oracle Home: /u01/app/oracle/product/12.1.0.2/dbhome_3
Failed to find configuration info for database 'test'
Exception occurred: DB discovery failed, Cause: Not all components discovered for database 'test'
Check the log at /opt/oracle/oak/restore/log/odaupgradeutil_saveconf_timestamp.log for potential causes.
2022-04-04 23:23:06,274 - DEBUG - Target node found: scaoda415c1n1
2023-06-04 23:23:06,275 - INFO - Could not find passwd file from srvctl
2023-06-04 23:23:06,275 - INFO - Database test is configured on local node, looking for passwd file inside $ORACLE_HOME/dbs/ ...
2023-06-04 23:23:06,275 - INFO - Looking for file /u01/app/oracle/product/12.1.0.2/dbhome_3/dbs/orapwtest
2022-04-04 23:23:06,275 - ERROR - passwd file not found
2022-04-04 23:23:06,275 - INFO - Failed to find configuration info for database 'test'
For example, in the above case, the utility was looking for the database password file orapwtest in the $ORACLE_HOME/dbs/ directory, but could not find it. In this case, the password file must be created with the same path name and then the odaupgradeutil detach-node command should be rerun.
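The same lookup can be done ahead of time for every database on the node. The following Python sketch only covers the local fallback described in the log excerpt, that is, a file named orapw followed by the database name under $ORACLE_HOME/dbs/. It does not query srvctl, so a password file registered elsewhere is not detected. The function names and the dictionary of databases are illustrative assumptions.

```python
from pathlib import Path


def password_file_path(oracle_home, db_name):
    """Build the path the log excerpt shows: $ORACLE_HOME/dbs/orapw<db_name>."""
    return Path(oracle_home) / "dbs" / ("orapw" + db_name)


def missing_password_files(databases):
    """Given {db_name: oracle_home}, return the expected files that do not exist."""
    missing = []
    for db_name, oracle_home in sorted(databases.items()):
        candidate = password_file_path(oracle_home, db_name)
        if not candidate.is_file():
            missing.append(candidate)
    return missing


# Database names and homes as listed in the detach-node output above.
databases = {
    "mydb": "/u01/app/oracle/product/12.1.0.2/dbhome_1",
    "tdb": "/u01/app/oracle/product/12.1.0.2/dbhome_2",
    "test": "/u01/app/oracle/product/12.1.0.2/dbhome_3",
}
for path in missing_password_files(databases):
    print("missing:", path)
```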
Errors in discovery of Oracle ACFS volumes or file systems
Get Volumes...BEGIN
Exception occurred: Volumes discovery failed, Cause: Volume on device '/dev/asm/datastore-2' is not running
Resolution: Use srvctl status volume or srvctl status filesystem or the relevant CRSCTL commands to investigate why the volume or file system is not available. Once the issues are fixed, the volumes or file systems can be started by srvctl start volume or srvctl start filesystem respectively. If the issue persists, contact Oracle Support.
Errors When Running odacli restore-node Command on Oracle Database Appliance
Troubleshooting errors that may occur when running the odacli restore-node command.
Error: Server archive files not unpacked
Cause of the failure: If the command odacli restore-node -g is run without unpacking the server archives, the following error is displayed:
DCS-10001:Internal error encountered: Failed to get source system version from /opt/oracle/oak/restore/init.params: File does not exist.
Possible cause:update-repository was not done for server archives.
Resolution: Run the command odacli update-repository with the server archives:
/opt/oracle/dcs/bin/odacli update-repository -f serverarchivefile_node0, serverarchivefile_node1, serverarchive_common
Error: GI clone not unpacked
Cause of the failure: The Oracle Database Appliance release 19.20 Oracle Grid Infrastructure clone must be updated in the repository before running the command odacli restore-node -g, or else an error message is displayed.
Resolution: The command odacli update-repository must be run with the Oracle Grid Infrastructure clone:
/opt/oracle/dcs/bin/odacli update-repository -f p30403673_1920000_Linux-x86-64.zip
To run the -g option with the command odacli restore-node, only the Oracle Grid Infrastructure clone is required. It is recommended to run the command odacli update-repository with the Oracle Grid Infrastructure clone first. After the Oracle Grid Infrastructure clone is updated in the repository, run the command odacli update-repository with the database clones. The Oracle Grid Infrastructure restore creates an Oracle ACFS file system and mounts the clones repository /opt/oracle/oak/pkgrepos/clones on it. This means that the available space becomes 150 GB (the size of the Oracle ACFS clones repository) and space check failures do not occur while unpacking the database clones.
Validation errors when running the command odacli restore-node -g
Cause of the failure: Incorrectly configured public network.
The server archive files (serverarchive_nodename.zip) contain the configure-firstnet.rsp file. The file can be viewed on a terminal by running the command unzip -p serverarchive restore/configure-firstnet.rsp. The values in this file must be used to run the command odacli configure-firstnet to set up the public network after reimage. If this is not done, the public network does not match the network on the source and a validation error is displayed:
DCS-10045: Validation error encountered: No existing network matches this public network on (detached) source: string representation of the network.
Possible cause: configure-firstnet was not done correctly
Resolution: List the networks in the DCS metadata using the command odacli list-networks and delete all of them using the command odacli delete-network. Rerun the command odacli configure-firstnet using the values in configure-firstnet.rsp and restart the DCS agent using systemctl restart initdcsagent. In some cases, another reimage may be required.
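Where unzip is not convenient, the response file can also be read with a few lines of Python. This is a minimal sketch that assumes the member path restore/configure-firstnet.rsp named in the unzip -p command above; the layout of the file contents is not described in this chapter, so the sketch returns the text as is. The function name read_firstnet_rsp is hypothetical.

```python
import zipfile

# Member path taken from the unzip -p command above.
RSP_MEMBER = "restore/configure-firstnet.rsp"


def read_firstnet_rsp(archive):
    """Return the text of configure-firstnet.rsp stored in a server archive.

    archive may be a path to the zip file or an open binary file object.
    A KeyError is raised if the archive does not contain the member.
    """
    with zipfile.ZipFile(archive) as zipped:
        with zipped.open(RSP_MEMBER) as member:
            return member.read().decode("utf-8", errors="replace")
```

Calling read_firstnet_rsp with the path of a server archive returns the saved values, which can then be entered when running odacli configure-firstnet.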
Possibility of system reboot when running the command odacli restore-node -g
This step also reconfigures the CPU core count on the target environment. If the number of active CPUs is equal to the number of licensed CPUs, then no reboot occurs. This is the case when you have licensed the maximum number of CPU cores. Otherwise, if the licensed CPU count is less than the count available on the Oracle Database Appliance system, the nodes restart to enable the CPU count at the BIOS level.
Validation errors when running the command odacli restore-node -g
Potential causes of the failure: Runtime failures while setting up Oracle Grid Infrastructure
The -g option of the command odacli restore-node cannot be rerun directly after a failure. Hence, if a failure is encountered during this operation, then follow these steps:
- Locate the cause of failure using odacli describe-job and the dcs-agent log at /opt/oracle/dcs/log/dcs-agent.log.
- Use the utility /opt/oracle/oak/onecmd/cleanup.pl to clean up the failed system. Note that the nodes are restarted after the operation is successful. On an environment where odacli restore-node -g has been attempted, running cleanup.pl is as follows:
[root@node1 ~]# /opt/oracle/oak/onecmd/cleanup.pl -griduser ygrid
INFO: Log file is /opt/oracle/oak/log/node1/cleanup/cleanup_2023-06-22_11-46-28.log
INFO: Log file is /opt/oracle/oak/log/node1/cleanup/dcsemu_diag_precleanup_2023-06-22_11-46-28.log
INFO: *******************************************************************
INFO: ** Starting process to cleanup provisioned host scaoda7s002 **
INFO: *******************************************************************
WARNING: DPR environment detected. DPR specific cleanup involves
WARNING: deconfiguring the ODA software stack without touching ASM
WARNING: storage to allow rerunning of the 'odacli restore-node -g'
WARNING: command. If regular cleanup(which erases ASM disk headers)
WARNING: is intended, rerun cleanup.pl with '-nodpr' option.
Do you want to continue (yes/no) : yes
INFO: nodes will be rebooted
Do you want to continue (yes/no) : yes
INFO: /u01/app/19.20.0.0/ygrid/.patch_storage/33781359_Jan_27_2023_08_45_38/files/bin/crsctl.bin
INFO: /u01/app/19.20.0.0/ygrid/.patch_storage/33529556_Jan_9_2023_21_15_36/files/bin/crsctl.bin
INFO: /u01/app/19.20.0.0/ygrid/bin/crsctl.bin
INFO: *************************************
INFO: ** Checking for GI bits presence
INFO: *************************************
INFO: GI bits /u01/app/19.20.0.0/ygrid found on system under /u01/app directory...
INFO: *************************************
INFO: ** DPR Cleanup
INFO: *************************************
INFO: ** Disabling AFD filtering
SUCCESS: AFD filtering disabled on all devices
INFO: Cleaning up acfsclone filesystem...
INFO: Deconfiguring GI on this node...
SUCCESS: DPR cleanup actions completed.
. . . .
INFO: Cleanup was successful
INFO: Log file is /opt/oracle/oak/log/scaoda7s002/cleanup/cleanup_2023-06-22_11-46-28.log
WARNING: After system reboot, please re-run "odacli update-repository" for GI/DB clones,
WARNING: before running "odacli restore-node -g".
Connection to scaoda7s002 closed by remote host.
Connection to scaoda7s002 closed.
Note that this cleanup does not remove the Oracle ASM disk headers, to allow odacli restore-node -g to be reattempted.
- Update the repository with the GI clones and rerun odacli restore-node -g after the nodes have rebooted.
Note: The script cleanup.pl also provides a -nodpr flag which overrides the default behaviour on environments using the Data Preserving Reprovisioning feature. Using this flag runs regular cleanup and all the disks are formatted. This option can be used if a full reset of the appliance is required.
Validation errors when running the odacli restore-node -d command: System not provisioned
Cause of the failure: The command odacli restore-node -g was not run.
[root@oak bin]# odacli restore-node -d
DCS-10037:System is not yet Provisioned.
Resolution: Run the command odacli restore-node -g and then run the command odacli restore-node -d.
Error: No disk group configured for storing database homes
A required step after running odacli restore-node -g is to specify the name of the Oracle ASM disk group where database homes can be created using Oracle ACFS. The following error is displayed if this is not set up:
[root@node1 ~]# odacli restore-node -d
DCS-10601:The system is not set up to create database homes on ACFS.
Resolution: Run the following command:
[root@oda1 opt]# /opt/oracle/dcs/bin/odacli configure-dbhome-storage -dg DATA -s 80
Error in unpacking required clones
For odacli restore-node -d to succeed, every database home needs a clone that is present in the repository. The clones that must be unpacked are listed in /opt/oracle/oak/restore/metadata/dbVersions.list. If the repository is not updated with all the required clones, then the following error is displayed:
[root@oak bin]# odacli restore-node -d
DCS-10237:The DB clone for version 11.2.0.4.170814, 12.1.0.2.170814 is not registered.
Resolution: Unpack all the required clones using odacli update-repository.
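Before running odacli restore-node -d, the required versions can be compared against the clones already unpacked. The following Python sketch assumes that dbVersions.list holds one version string per line, which this chapter does not confirm, and it compares the strings exactly as given. How the list of registered clones is obtained is left to the caller. The function name missing_clone_versions is hypothetical.

```python
def missing_clone_versions(required_versions, registered_versions):
    """Return required database versions that have no registered clone, in order."""
    registered = {version.strip() for version in registered_versions}
    missing = []
    for version in required_versions:
        version = version.strip()
        if version and version not in registered and version not in missing:
            missing.append(version)
    return missing


# Assumption: dbVersions.list holds one version string per line.
required = "11.2.0.4.170814\n12.1.0.2.170814\n".splitlines()
registered = ["12.1.0.2.170814"]  # hypothetical: only this clone was unpacked
print(missing_clone_versions(required, registered))
```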
Error: Insufficient space error during odacli update-repository
An insufficient space error can be displayed when running odacli update-repository. For example:
[root@scaoda7m001 clones]# odacli update-repository -f odacli-dcs-19.20.0.0.0-date-DB-12.1.0.2.zip
DCS-10802:Insufficient disk space on file system: /opt/oracle/oak/pkgrepos/orapkgs/clones.
Expected free space: 8.3 Gb, available space: 1.16 Gb
Resolution: To allocate additional space to this volume, run the following:
acfsutil size +<value>G /opt/oracle/oak/pkgrepos/orapkgs/clones
For example, to add an additional 10 GB of space, use the command:
acfsutil size +10G /opt/oracle/oak/pkgrepos/orapkgs/clones
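The value to pass to acfsutil size can be derived from the DCS-10802 message itself. The following Python sketch parses the expected and available figures, rounds the shortfall up to a whole number of GB, and prints the resulting command without running it. It assumes the message wording shown above, and growing by exactly the shortfall is a lower bound, so a larger value such as the 10 GB in the example may be preferable. The function name acfsutil_resize_command is hypothetical.

```python
import math
import re

# Pattern for the DCS-10802 text shown above (path, expected and available space).
MESSAGE_RE = re.compile(
    r"file system: (?P<path>\S+?)\.?\s+"
    r"Expected free space: (?P<expected>[\d.]+) Gb, "
    r"available space: (?P<available>[\d.]+) Gb"
)


def acfsutil_resize_command(message):
    """Build an 'acfsutil size' command that covers the reported shortfall."""
    match = MESSAGE_RE.search(message)
    if match is None:
        raise ValueError("not a recognized DCS-10802 message")
    shortfall = float(match.group("expected")) - float(match.group("available"))
    grow_gb = max(1, math.ceil(shortfall))  # round up to a whole number of GB
    return "acfsutil size +%dG %s" % (grow_gb, match.group("path"))


message = (
    "DCS-10802:Insufficient disk space on file system: "
    "/opt/oracle/oak/pkgrepos/orapkgs/clones.\n"
    "Expected free space: 8.3 Gb, available space: 1.16 Gb"
)
print(acfsutil_resize_command(message))
```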
Error: Insufficient space for database homes
The following error can be displayed when running odacli restore-node:
[root@scaoda703c1n1 ~]# odacli restore-node -d
DCS-10609:The configured size for Database homes storage is insufficient to create all database homes.
Current size 25 GB is less than expected size of 46 GB.
Resolution: More space needs to be allocated to the file system. Run the command odacli configure-dbhome-storage to allocate more space to the file system. Note that the file system is created only when the first database home is created.
odacli configure-dbhome-storage -dg DATA -s 100