Example Configuration Problems

Below are some configuration problems that can be encountered with GoldenGate in a RAC environment and how to diagnose and resolve them.

Incorrect parameter settings in the mount-dbfs.conf file

When XAG fails to mount DBFS, the failure will be reported either on the command line (if you are running the manual agctl command) or in the XAG log file:

$ agctl start goldengate <instance_name> --node exadb-node1

CRS-2672: Attempting to start 'dbfs_mount' on 'exadb-node1'
CRS-2674: Start of 'dbfs_mount' on 'exadb-node1' failed
CRS-2679: Attempting to clean 'dbfs_mount' on 'exadb-node1'
CRS-2681: Clean of 'dbfs_mount' on 'exadb-node1' succeeded
CRS-4000: Command Start failed, or completed with errors.

The XAG log file (agctl_goldengate_grid.trc) has the advantage that it shows timestamps that can be used when looking at other log or trace files:

2022-04-19 15:32:16: executing cmd: /u01/app/19.0.0.0/grid/bin/crsctl start resource xag.<INSTANCE_NAME>.goldengate -f -n exadb-node1
2022-04-19 15:32:19: Command output:
>  CRS-2672: Attempting to start 'dbfs_mount' on 'exadb-node1'
>  CRS-2674: Start of 'dbfs_mount' on 'exadb-node1' failed
>  CRS-2679: Attempting to clean 'dbfs_mount' on 'exadb-node1'
>  CRS-2681: Clean of 'dbfs_mount' on 'exadb-node1' succeeded
>  CRS-4000: Command Start failed, or completed with errors.
>End Command output
2022-04-19 15:32:19: start resource failed rc=1

Next, check the CRS trace file (crsd_scriptagent_oracle.trc), which shows why DBFS failed to mount. Below are some example errors caused by incorrect parameter settings in the mount-dbfs.conf file.

  • Incorrect DBNAME
    2022-04-19 15:32:16.679 :    AGFW:1190405888: {1:30281:17383} dbfs_mount
     1 1 state changed from: UNKNOWN to: STARTING
    2022-04-19 15:32:16.680 :CLSDYNAM:1192507136: [dbfs_mount]{1:30281:17383} [start]
     Executing action script: /u01/oracle/scripts/mount-dbfs.sh[start]
    2022-04-19 15:32:16.732 :CLSDYNAM:1192507136: [dbfs_mount]{1:30281:17383} [start]
     mount-dbfs.sh mounting DBFS at /mnt/dbfs from database ggdg
    2022-04-19 15:32:17.883 :CLSDYNAM:1192507136: [dbfs_mount]{1:30281:17383} [start]
     ORACLE_SID is
    2022-04-19 15:32:17.883 :CLSDYNAM:1192507136: [dbfs_mount]{1:30281:17383} [start]
     No running ORACLE_SID available on this host, exiting
    2022-04-19 15:32:17.883 :    AGFW:1192507136: {1:30281:17383} Command: start for
     resource: dbfs_mount 1 1 completed with invalid status: 2
  • Incorrect MOUNT_POINT
    2022-04-19 16:45:14.534 :    AGFW:1734321920: {1:30281:17604} dbfs_mount
     1 1 state changed from: UNKNOWN to: STARTING
    2022-04-19 16:45:14.535 :CLSDYNAM:1736423168: [dbfs_mount]{1:30281:17604} [start]
     Executing action script: /u01/oracle/scripts/mount-dbfs.sh[start]
    2022-04-19 16:45:14.586 :CLSDYNAM:1736423168: [dbfs_mount]{1:30281:17604} [start]
     mount-dbfs.sh mounting DBFS at /mnt/dbfs from database ggdgs
    2022-04-19 16:45:15.638 :CLSDYNAM:1736423168: [dbfs_mount]{1:30281:17604} [start]
     ORACLE_SID is ggdg1
    2022-04-19 16:45:15.738 :CLSDYNAM:1736423168: [dbfs_mount]{1:30281:17604} [start]
     spawning dbfs_client command using SID ggdg1
    2022-04-19 16:45:20.745 :CLSDYNAM:1736423168: [dbfs_mount]{1:30281:17604} [start]
     fuse: bad mount point `/mnt/dbfs': No such file or directory
    2022-04-19 16:45:21.747 :CLSDYNAM:1736423168: [dbfs_mount]{1:30281:17604} [start]
     Start - OFFLINE
    2022-04-19 16:45:21.747 :    AGFW:1736423168: {1:30281:17604} Command: start for
     resource: dbfs_mount 1 1 completed with status: FAIL
  • Incorrect DBFS_USER or DBFS_PASSWD
    2022-04-19 16:47:47.855 :    AGFW:1384478464: {1:30281:17671} dbfs_mount
     1 1 state changed from: UNKNOWN to: STARTING
    2022-04-19 16:47:47.856 :CLSDYNAM:1386579712: [dbfs_mount]{1:30281:17671} [start]
     Executing action script: /u01/oracle/scripts/mount-dbfs.sh[start]
    2022-04-19 16:47:47.908 :CLSDYNAM:1386579712: [dbfs_mount]{1:30281:17671} [start]
     mount-dbfs.sh mounting DBFS at /mnt/dbfs from database ggdgs
    2022-04-19 16:47:48.959 :CLSDYNAM:1386579712: [dbfs_mount]{1:30281:17671} [start]
     ORACLE_SID is ggdg1
    2022-04-19 16:47:49.010 :CLSDYNAM:1386579712: [dbfs_mount]{1:30281:17671} [start]
     spawning dbfs_client command using SID ggdg1
    2022-04-19 16:47:55.118 :CLSDYNAM:1386579712: [dbfs_mount]{1:30281:17671} [start]
     Fail to connect to database server. Error: ORA-01017: invalid username/password;
     logon denied
    2022-04-19 16:47:55.118 :CLSDYNAM:1386579712: [dbfs_mount]{1:30281:17671} [start]
    2022-04-19 16:47:56.219 :CLSDYNAM:1386579712: [dbfs_mount]{1:30281:17671} [start]
     Start - OFFLINE
    2022-04-19 16:47:56.220 :    AGFW:1386579712: {1:30281:17671} Command: start for
     resource: dbfs_mount 1 1 completed with status: FAIL
  • Incorrect ORACLE_HOME
    2022-04-19 16:50:38.952 :    AGFW:567502592: {1:30281:17739} dbfs_mount
     1 1 state changed from: UNKNOWN to: STARTING
    2022-04-19 16:50:38.953 :CLSDYNAM:569603840: [dbfs_mount]{1:30281:17739} [start]
     Executing action script: /u01/oracle/scripts/mount-dbfs.sh[start]
    2022-04-19 16:50:39.004 :CLSDYNAM:569603840: [dbfs_mount]{1:30281:17739} [start]
     mount-dbfs.sh mounting DBFS at /mnt/dbfs from database ggdgs
    2022-04-19 16:50:39.004 :CLSDYNAM:569603840: [dbfs_mount]{1:30281:17739} [start]
     /u01/oracle/scripts/mount-dbfs.sh: line 136:
     /u01/app/oracle/product/19.0.0.0/rdbms/bin/srvctl: No such file or directory
    2022-04-19 16:50:39.004 :CLSDYNAM:569603840: [dbfs_mount]{1:30281:17739} [start]
     /u01/oracle/scripts/mount-dbfs.sh: line 139:
     /u01/app/oracle/product/19.0.0.0/rdbms/bin/srvctl: No such file or directory
    2022-04-19 16:50:39.004 :CLSDYNAM:569603840: [dbfs_mount]{1:30281:17739} [start]
     ORACLE_SID is
    2022-04-19 16:50:39.004 :CLSDYNAM:569603840: [dbfs_mount]{1:30281:17739} [start]
     No running ORACLE_SID available on this host, exiting
    2022-04-19 16:50:39.004 :    AGFW:569603840: {1:30281:17739} Command: start for
     resource: dbfs_mount 1 1 completed with invalid status: 2

To resolve these configuration issues, set the correct parameter values in mount-dbfs.conf.

Problems with file locking on DBFS

If using Oracle Database 12c Release 2 (12.2) and the nolock DBFS mount option is not used, there can be problems with GoldenGate processes trying to lock checkpoint or trail files. The same problem will be encountered if using Oracle Database 11g Release 2 (11.2.0.4) or 12c Release 1 (12.1) with a patch for bug 22646150 applied. This patch changes how DBFS handles file locking to match Oracle Database 12c Release 2 (12.2). To add the nolock DBFS mount option, a patch for bug 27056711 must be applied to the database. If the patch for bug 22646150 has not been applied to the database, the patch for bug 27056711 and the nolock mount option is not required.

Below is an example of diagnosing a GoldenGate Microservices Architecture locking problem.

When starting a deployment with XAG, one or more processes may not start due to detecting a locking conflict on one or more files. This will often occur after a RAC node failover where the deployment did not get a chance to shut down cleanly.

When one of the deployment server processes fails to start (Administration Server, Performance Metrics Server, Distribution Server, Receiver Server, or Service Manager), check the log file for the particular server located in the deployment var/log directory.

For example, the log file /mnt/dbfs/goldengate/deployments/<INSTANCE_NAME>/var/log/pmsrvr.log shows the following error on startup:

2022-04-11T12:41:57.619-0700 ERROR| SecureStore failed on open after
 retrying due to extended file lock. (main)
2022-04-11T12:41:57.619-0700 ERROR| SecureStore failed to close (28771). (main)
2022-04-11T12:41:57.619-0700 INFO | Set network configuration. (main)
2022-04-11T12:41:57.619-0700 INFO | Asynchronous operations are enabled with default
 synchronous wait time of 30 seconds (main)
2022-04-11T12:41:57.619-0700 INFO | HttpServer configuration complete. (main)
2022-04-11T12:42:07.674-0700 ERROR| Unable to lock process file, Error is [1454]
 - OGG-01454 (main)
2022-04-11T12:42:07.675-0700 ERROR| Another Instance of PM Server is Already Running
 (main)

An Extract process will report start-up failures in the ER-events.log logfile located in the deployment log file directory.

For example, /mnt/dbfs/goldengate/deployments/<instance_name>/var/log/ER-events.log shows the following error:

2022-04-11T00:14:56.845-0700  ERROR   OGG-01454  Oracle GoldenGate Capture for
 Oracle, EXT1.prm:  Unable to lock file
 "/mnt/dbfs/goldengate/deployments/<instance_name>/var/run/EXT1.pce" (error 11, Resource
 temporarily unavailable).  Lock currently held by process id (PID) 237495.
2022-04-11T00:14:56.861-0700  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle,
 EXT1.prm:  PROCESS ABENDING.

Next, check to ensure the process failing to start up is not running on any of the RAC nodes.

Example:

$ ps -ef|grep EXT1|grep -v grep

Once it has been determined that the process is not running, the deployment must be shutdown cleanly, the file system unmounted, and the correct DBFS patch applied.

Example:

$ agctl stop goldengate <INSTANCE_NAME>
$ crsctl stop resource dbfs_mount

Check the DBFS mount options:

$ ps -ef|grep dbfs_client

oracle   204017      1  0 14:37 ? 
       00:00:00 /u01/app/oracle/product/19.1.0.0/dbhome_1/bin/dbfs_client dbfs@dbfs.local
 -o allow_other,failover,direct_io /mnt/dbfs

It is clear the nolock mount option was not used, which leads to the locking errors.

Use the guidelines above to determine if a DBFS patch is required. After which, add the nolock mount option to the mount-dbfs.conf file on all Oracle RAC nodes that are part of the deployment.

Example:

MOUNT_OPTIONS=allow_other,direct_io,failover,nolock

Finally, restart the deployment:

$ agctl start goldengate <INSTANCE_NAME>