11.7 Oracle Trace File Analyzer Fails with TFA-00103 After Applying the July 2015 Release Update Revision (RUR) or Later

Phase 1 of Oracle Trace File Analyzer upgrade

The Oracle Trace File Analyzer communication model changed in versions later than 12.1.2.4.1. To avoid communication problems, the change must be completed across all nodes of the Oracle Trace File Analyzer configuration. Oracle Trace File Analyzer is upgraded on each node locally as part of applying the Release Update Revision (RUR). The RUR process applies the new software and restarts Oracle Trace File Analyzer, but does not put the new connection model in place.

Phase 2 of Oracle Trace File Analyzer upgrade

The new Oracle Trace File Analyzer communication model is not implemented (phase 2) until the Release Update Revision (RUR) has been applied on all nodes (phase 1). Oracle Trace File Analyzer waits up to 24 hours for the RUR to be applied on all nodes before implementing the new communication model automatically. Once Oracle Trace File Analyzer is upgraded on all the nodes, phase 2 should complete within about 10 minutes.

While phase 2 is pending, Oracle Trace File Analyzer indicates this state by displaying the message:

TFA-00103 - TFA is not yet secured to run all commands.

Once Oracle Trace File Analyzer is upgraded on all nodes in the configuration (phase 1), Oracle Trace File Analyzer:

  • Generates new SSL keys

  • Sends the keys to the valid nodes in the cluster

  • Restarts Oracle Trace File Analyzer on each of these nodes (phase 2)

On completion of phase 2, Oracle Trace File Analyzer processes commands normally using the new communication model.

How can I verify that both phases have been completed and that Oracle Trace File Analyzer communication among all the nodes has been established?

First, as root run:

 tfactl print status

.--------------------------------------------------------------------------------.
| Host   | Status  | PID   | Port |   Version  |      Build ID        | Inventory|
+--------+---------+-------+------+------------+----------------------+----------+
| sales1 | RUNNING | 4390  | 5000 | 12.1.2.4.2 | 12124220150629072212 | COMPLETE |
| sales2 | RUNNING | 23604 | 5000 | 12.1.2.4.2 | 12124220150629072212 | COMPLETE |
| sales3 | RUNNING | 28653 | 5000 | 12.1.2.4.2 | 12124220150629072212 | COMPLETE |
| sales4 | RUNNING | 5989  | 5000 | 12.1.2.4.2 | 12124220150629072212 | COMPLETE |
'--------+---------+-------+------+------------+----------------------+----------'

Once all nodes are shown at the same version and build ID, the synchronization of keys should complete within about 10 minutes at most.
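The uniform-build check above can be scripted. The sketch below is illustrative only: the sample rows are copied from the status table, and in practice you would capture the live `tfactl print status` output instead. It counts the distinct Build ID values reported by the running nodes.

```shell
# Sample rows as produced by 'tfactl print status'; in practice capture the
# live output, e.g. status_output=$(tfactl print status).
status_output='| sales1 | RUNNING | 4390  | 5000 | 12.1.2.4.2 | 12124220150629072212 | COMPLETE |
| sales2 | RUNNING | 23604 | 5000 | 12.1.2.4.2 | 12124220150629072212 | COMPLETE |'

# Field 7 (between the sixth and seventh pipes) holds the Build ID; strip
# spaces, keep only data rows, and count the distinct values.
distinct=$(printf '%s\n' "$status_output" \
    | awk -F'|' '/RUNNING/ {gsub(/ /,"",$7); print $7}' \
    | sort -u | wc -l)

if [ "$distinct" -eq 1 ]; then
    echo "All nodes report the same Build ID; key synchronization should follow shortly."
else
    echo "Nodes report $distinct different Build IDs; phase 1 is not complete."
fi
```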

Next, run the following command:

tfactl print directories

Running tfactl print directories should return the list of directories registered in Oracle Trace File Analyzer. If communication is not established among all the nodes, then the command instead returns the message, TFA is not yet secured to run all commands.

The message also indicates that phase 2 has not been completed. To determine on which nodes phase 2 has not yet been completed, check on each node for the existence of the following files. The files must be readable only by root and owned by root:root. The checksum for each file must match on all nodes.

# ls -al /u01/app/12.1.0/grid/tfa/sales1/tfa_home/client.jks

-rwx------   1 root  root   3199 Jun 30 14:12 /u01/app/12.1.0/grid/tfa/sales1/tfa_home/client.jks

# ls -al /u01/app/12.1.0/grid/tfa/sales1/tfa_home/server.jks

-rwx------   1 root  root   3201 Jun 30 14:12 /u01/app/12.1.0/grid/tfa/sales1/tfa_home/server.jks

# ls -al /u01/app/12.1.0/grid/tfa/sales1/tfa_home/internal/ssl.properties

-rwx------   1 root  root   220 Jun 30 14:12 /u01/app/12.1.0/grid/tfa/sales1/tfa_home/internal/ssl.properties
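The per-node check above can be sketched as a short loop. The `TFA_HOME` path below is an assumption based on the listings above (Grid home plus the short host name); adjust it for your installation. Note that `stat -c` is GNU coreutils syntax; on other platforms, inspect the `ls -l` output instead.

```shell
# Assumed TFA home layout, following the examples above; adjust as needed.
TFA_HOME=/u01/app/12.1.0/grid/tfa/$(hostname -s)/tfa_home

for f in client.jks server.jks internal/ssl.properties; do
    path="$TFA_HOME/$f"
    if [ ! -f "$path" ]; then
        # A missing file suggests phase 2 has not completed on this node.
        echo "MISSING: $path"
        continue
    fi
    # Print octal mode and owner:group; expect '700 root:root'.
    echo "$path -> $(stat -c '%a %U:%G' "$path")"
done
```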

What if I do not upgrade all my nodes at the same time by choice or if some are down for maintenance?

Oracle Trace File Analyzer waits to complete the phase 2 operations until all nodes have completed the upgrade or until 24 hours have passed.

After 24 hours, Oracle Trace File Analyzer:

  • Generates new keys

  • Copies the key to all the nodes that have been upgraded

  • Restarts Oracle Trace File Analyzer on those nodes

Any nodes that did not get the keys are outside of the Oracle Trace File Analyzer configuration. After upgrading Oracle Trace File Analyzer on those nodes, manually synchronize the keys with the other nodes.

If the application of the Release Update Revision (RUR) on all the nodes is not completed within 24 hours, then manually synchronize the keys.

To manually synchronize the keys, go to one node that has completed phase 2 and run the synctfanodes.sh script as root:

# $GIHOME/tfa/nodename/tfa_home/bin/synctfanodes.sh

Note:

The script uses SSH and SCP. If root does not have passwordless SSH configured, then Oracle Trace File Analyzer prompts you for a password three times per node each time a command is run.

If the Expect utility is available on the node, then Oracle Trace File Analyzer uses Expect, thus reducing the number of password prompts.

The script displays all the nodes in the Oracle Trace File Analyzer configuration, including the nodes where Oracle Trace File Analyzer has yet to be upgraded.

The script also shows the nodes that are part of the Oracle Grid Infrastructure configuration.

Verify the node list provided and supply a space-separated list of nodes to synchronize. Including nodes that were previously upgraded is harmless because the process is idempotent.

For example:

Nodes sales1, sales2, sales3, and sales4 are all part of Oracle Grid Infrastructure. The nodes were running Oracle Trace File Analyzer 12.1.2.0.0 until the July 2015 Release Update Revision (RUR) was applied.

The Release Update Revision (RUR) was applied initially only to sales1 and sales3 due to outage restrictions.

After completion of phase 1 of the Oracle Trace File Analyzer upgrade, run print status. The command lists all nodes, even though different versions of Oracle Trace File Analyzer are running on some of them.

-bash-3.2# /u01/app/12.1.0/grid/bin/tfactl print status
.--------------------------------------------------------------------------------.
| Host   | Status  | PID   | Port |   Version  |      Build ID        | Inventory|
+--------+---------+-------+------+------------+----------------------+----------+
| sales1 | RUNNING | 27270 | 5000 | 12.1.2.4.2 | 12124220150629072212 | COMPLETE |
| sales3 | RUNNING | 19222 | 5000 | 12.1.2.4.2 | 12124220150629072212 | COMPLETE |
| sales2 | RUNNING | 10141 | 5000 | 12.1.2.0.0 | 12120020140619094932 | COMPLETE |
| sales4 | RUNNING | 17725 | 5000 | 12.1.2.0.0 | 12120020140619094932 | COMPLETE |
'--------+---------+-------+------+------------+----------------------+----------'

Since the new Oracle Trace File Analyzer communication model is not set up among all the nodes, many commands when run as root fail with the message:

TFA is not yet secured to run all commands.

Failed attempts to run tfactl commands as a non-root user indicate insufficient permission to use Oracle Trace File Analyzer.

After 24 hours, Oracle Trace File Analyzer completes phase 2 for sales1 and sales3. The Oracle Trace File Analyzer communication model is established for sales1 and sales3, and you can perform normal Oracle Trace File Analyzer operations on those nodes. Communication with sales2 and sales4 has not yet been established, so running remote commands against them fails.

When running print status on sales1 and sales3, sales2 and sales4 no longer appear. Only nodes using the new Oracle Trace File Analyzer communication model can communicate with each other.

-bash-3.2# /u01/app/12.1.0/grid/bin/tfactl print status

.--------------------------------------------------------------------------------.
| Host   | Status  | PID   | Port |   Version  |      Build ID        | Inventory|
+--------+---------+-------+------+------------+----------------------+----------+
| sales1 | RUNNING | 4390  | 5000 | 12.1.2.4.2 | 12124220150629072212 | COMPLETE |
| sales3 | RUNNING | 23604 | 5000 | 12.1.2.4.2 | 12124220150629072212 | COMPLETE |
'--------+---------+-------+------+------------+----------------------+----------'

Running the command tfactl diagcollect collects from sales1 and sales3 but not from the other nodes.

$ tfactl diagcollect
Choose the event you want to perform a diagnostic collection for:
1. Mar/12/2019 16:08:20 [ db.orcl.orcl ]  ORA-04030: out of process memory when trying to allocate
2. Mar/12/2019 16:08:18 [ db.orcl.orcl ]  ORA-04031: unable to allocate 8 bytes of shared memory
3. Mar/12/2019 16:08:16 [ db.orcl.orcl ]  ORA-00494: enqueue held for too long more than seconds by osid
4. Mar/12/2019 16:08:14 [ db.orcl.orcl ]  ORA-29709: Communication failure with Cluster Synchronization
5. Mar/12/2019 16:08:04 [ db.orcl.orcl ]  ORA-29702: error occurred in Cluster Group Service operation
6. Mar/12/2019 16:07:59 [ db.orcl.orcl ]  ORA-32701: Possible hangs up to hang ID= detected
7. Mar/12/2019 16:07:51 [ db.orcl.orcl ]  ORA-07445: exception encountered: core dump [] [] [] [] [] []
8. Mar/12/2019 16:07:49 [ db.orcl.orcl ]  ORA-00700: soft internal error, arguments: [700], [], [],[]
9. Mar/11/2019 22:02:19 [ db.oradb.oradb ]  DIA0 Critical Database Process Blocked: Hang ID 1 blocks 5 sessions
10. Default diagnostic collection, for no specific event

Please choose the event : 1-10 [] 10

By default TFA will collect diagnostics for the last 12 hours. This can result in large collections
For more targeted collections enter the time of the incident, otherwise hit <RETURN> to collect for the last 12 hours
[YYYY-MM-DD HH24:MI:SS,<RETURN>=Collect for last 12 hours] :

Collecting data for the last 12 hours for all components...
Collecting data for all nodes

Collection Id : 20190312163846node1

Detailed Logging at : /scratch/app/product/18c/tfa/repository/collection_Tue_Mar_12_16_38_47_PDT_2019_node_all/diagcollect_20190312163846_node1.log
2019/03/12 16:38:50 PDT : NOTE : Any file or directory name containing the string .com will be renamed to replace .com with dotcom
2019/03/12 16:38:50 PDT : Collection Name : tfa_Tue_Mar_12_16_38_47_PDT_2019.zip
2019/03/12 16:38:50 PDT : Collecting diagnostics from hosts : [node1]
2019/03/12 16:38:50 PDT : Scanning of files for Collection in progress...
2019/03/12 16:38:50 PDT : Collecting additional diagnostic information...
2019/03/12 16:38:55 PDT : Getting list of files satisfying time range [03/12/2019 04:38:50 PDT, 03/12/2019 16:38:55 PDT]
2019/03/12 16:39:02 PDT : Collecting ADR incident files...
2019/03/12 16:39:06 PDT : Completed collection of additional diagnostic information...
2019/03/12 16:39:07 PDT : Completed Local Collection
.------------------------------------.
|         Collection Summary         |
+----------+-----------+------+------+
| Host     | Status    | Size | Time |
+----------+-----------+------+------+
| node1    | Completed | 21MB | 17s  |
'----------+-----------+------+------'

Logs are being collected to: /scratch/app/product/18c/tfa/repository/collection_Tue_Mar_12_16_38_47_PDT_2019_node_all
/scratch/app/product/18c/tfa/repository/collection_Tue_Mar_12_16_38_47_PDT_2019_node_all/node1.tfa_Tue_Mar_12_16_38_47_PDT_2019.zip

While Oracle Trace File Analyzer is being upgraded on the remaining nodes, it cannot see the nodes already upgraded until the configuration is synchronized.

bash-3.2# /u01/app/12.1.0/grid/bin/tfactl print status

.------------------------------------------------------------------------------.
| Host   | Status  | PID | Port | Version    |     Build ID         | Inventory|
+--------+---------+-----+------+------------+----------------------+----------+
| sales3 | RUNNING | 9   | 5000 | 12.1.2.4.2 | 12124220150629072212 | COMPLETE |
'--------+---------+-----+------+------------+----------------------+----------'

To bring nodes on which the application of the Release Update Revision (RUR) was not completed within the 24-hour waiting period into the Oracle Trace File Analyzer configuration, do one of the following:

  1. Run the synchronize script from a node that has the keys already generated

  2. Manually copy the SSL configuration to those nodes

In our example from sales1:

/u01/app/12.1.0/grid/tfa/sales1/tfa_home/bin/synctfanodes.sh
Current Node List in TFA :
sales1
sales2
sales3
sales4

Node List in Cluster :
sales1 sales2 sales3 sales4

Node List to sync TFA Certificates :
1 sales2
2 sales3
3 sales4

Do you want to update this node list? [Y|N] [N]: Y

Please Enter all the nodes you want to sync...

Enter Node List (seperated by space) : sales2 sales4

Syncing TFA Certificates on sales2 :

TFA_HOME on sales2 : /u01/app/12.1.0/grid/tfa/sales2/tfa_home

Copying TFA Certificates to sales2...
Copying SSL Properties to sales2...
Shutting down TFA on sales2...
Sleeping for 5 seconds...
Starting TFA on sales2...

Syncing TFA Certificates on sales4 :

TFA_HOME on sales4 : /u01/app/12.1.0/grid/tfa/sales4/tfa_home

Copying TFA Certificates to sales4...
Copying SSL Properties to sales4...
Shutting down TFA on sales4...
Sleeping for 5 seconds...
Starting TFA on sales4...

Successfully re-started TFA..

.--------------------------------------------------------------------------------.
| Host   | Status  | PID   | Port |   Version  |      Build ID        | Inventory|
+--------+---------+-------+------+------------+----------------------+----------+
| sales1 | RUNNING | 4390  | 5000 | 12.1.2.4.2 | 12124220150629072212 | COMPLETE |
| sales2 | RUNNING | 23604 | 5000 | 12.1.2.4.2 | 12124220150629072212 | COMPLETE |
| sales3 | RUNNING | 28653 | 5000 | 12.1.2.4.2 | 12124220150629072212 | COMPLETE |
| sales4 | RUNNING | 5989  | 5000 | 12.1.2.4.2 | 12124220150629072212 | COMPLETE |
'--------+---------+-------+------+------------+----------------------+----------'

Note:

The node list was changed to include only the nodes that needed the keys synchronized: sales2 and sales4.

In this case, it is also harmless to include sales3; it would receive the same files again and Oracle Trace File Analyzer would simply be restarted on it.

I know that not all nodes are upgraded at the same time. I do not want to wait 24 hours for Oracle Trace File Analyzer to sync the key files. What do I do?

Use the synchronize script to force Oracle Trace File Analyzer to generate and synchronize certificates. The script prompts you to confirm generating the SSL configuration files and then synchronizes them to the remote nodes.

For example:

-bash-3.2# /u01/app/12.1.0/grid/tfa/sales1/tfa_home/bin/synctfanodes.sh

Current Node List in TFA : 
sales1 
sales2 
sales3 
sales4

TFA has not yet generated any certificates on this Node.

Do you want to generate new certificates to synchronize across the nodes? [Y|N] [Y]:

Generating new TFA Certificates...

Restarting TFA on sales1...
Shutting down TFA 
TFA-00002 : Oracle Trace File Analyzer (TFA) is not running
TFA Stopped Successfully 
. . . . . 
. . . 
Successfully shutdown TFA.. 
Starting TFA.. 
Waiting up to 100 seconds for TFA to be started.. 
. . . . . 
. . . . . 
Successfully started TFA Process.. 
. . . . . 
TFA Started and listening for commands

Node List in Cluster :
sales1 sales2 sales3 sales4

Node List to sync TFA Certificates : 
1 sales2 
2 sales3 
3 sales4

Do you want to update this node list? [Y|N] [N]:

After the key files are generated and synchronized, you should find the following files on each node:

# ls -al /u01/app/12.1.0/grid/tfa/sales1/tfa_home/client.jks

-rwx------   1 root     root    3199 Jun 30 14:12 /u01/app/12.1.0/grid/tfa/sales1/tfa_home/client.jks

# ls -al /u01/app/12.1.0/grid/tfa/sales1/tfa_home/server.jks

-rwx------   1 root     root    3201 Jun 30 14:12 /u01/app/12.1.0/grid/tfa/sales1/tfa_home/server.jks

# ls -al /u01/app/12.1.0/grid/tfa/sales1/tfa_home/internal/ssl.properties

-rwx------   1 root     root    220 Jun 30 14:12 /u01/app/12.1.0/grid/tfa/sales1/tfa_home/internal/ssl.properties

The files must be readable only by root and owned by root:root. The checksum for each file must match on all nodes.
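The checksum comparison itself can be sketched as follows. This is an illustrative local simulation: it creates stand-in copies of `client.jks` for the four example nodes in a temporary directory and checks that their `cksum` values agree. On a real cluster you would gather the checksums over SSH instead, for example with `ssh <node> cksum <file>` against each node's TFA home.

```shell
# Simulate one key file per node in a scratch directory (stand-in content,
# not a real keystore).
workdir=$(mktemp -d)
for node in sales1 sales2 sales3 sales4; do
    mkdir -p "$workdir/$node"
    printf 'dummy-keystore-bytes' > "$workdir/$node/client.jks"
done

# Collect the checksum column from each copy and count distinct values;
# a single distinct value means the file is identical everywhere.
distinct=$(for node in sales1 sales2 sales3 sales4; do
    cksum "$workdir/$node/client.jks" | awk '{print $1}'
done | sort -u | wc -l)

if [ "$distinct" -eq 1 ]; then
    echo "client.jks identical on all nodes"
else
    echo "client.jks differs on at least one node"
fi
rm -rf "$workdir"
```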