Oracle® Communications IP Service Activator System Administrator's Guide
Release 7.2

E39366-01

9 Troubleshooting IP Service Activator

This chapter provides troubleshooting information for Oracle Communications IP Service Activator.

Running Troubleshooting Scripts

The following script is available to assist you in retrieving information about your deployment platform. This script is located in the IP Service Activator bin directory.

  • ipsaps: when run, this script displays the IP Service Activator processes that are currently running.

Example: Using the ipsaps script to verify which components are running

  1. On the server where you want to verify the components, log in as the IP Service Activator administrator (ipsaadm).

  2. View which processes are running:

    $ ipsaps
    
  3. Check, for example, that the Network Processor is started. If a process is listed that contains the word networkprocessor, the Network Processor is running.

        USER   PID  PPID S   RSS    VSZ        TIME COMMAND
     ipsaadm 13964 13763 S 31760 138176        0:14 event_handler +debugFile -debu
     ipsaadm 13778 13763 S 38640 143848        0:37 policy_server +debugFile -debu
     ipsaadm 13768 13763 S 13368 114008        0:00 system_log -ComponentName mast
     ipsaadm 13763     1 S  7448   8864        0:00 cman -Service no +debugFile -debu
     ipsaadm 13974 13763 S 22656  29424        0:14 integration_manager -timeout 0 
     ipsaadm 13982 13763 S 56784  87120        0:55 networkprocessor 
     ipsaadm 12919     1 S  5352   6440        0:02 omniNames -logdir /opt/Orchestrea
    
  4. If a component is not running, start the component. See "Starting and Stopping IP Service Activator".

  5. If problems arise, refer to install-related logs in logs/ipsacm.log.
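
The check in step 3 can also be scripted. The following is a minimal sketch, assuming that ipsaps prints one process per line as in the sample listing above. The function name component_running is hypothetical and is not part of IP Service Activator.

```shell
#!/bin/sh
# component_running NAME
# Hypothetical helper: reads a process listing (for example, the output
# of ipsaps) on standard input and succeeds if any line contains NAME.
component_running() {
    grep -qF -- "$1"
}

# Example usage (assumes ipsaps is on the PATH of the ipsaadm user):
#   if ipsaps | component_running networkprocessor; then
#       echo "Network Processor is running"
#   else
#       echo "Network Processor is not running"
#   fi
```

Because the function only reads standard input, the same helper can be used with the output of ps -ef when ipsaps is not installed.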

Communication Errors with the Database

Communication errors may be reported when starting IP Service Activator and can indicate a problem with the connection to the database. You may see an error message displayed in the Start Component Manager dialogue box. For example, the following type of error message may appear:

!!!The Orchestream Policy Server has failed to start!!!
This failure is possibly due to the following:
  1. OCDB or Oracle client are not set up correctly.
  2. Database Username and Password are incorrect.
  3. Database is not contactable from this machine.
SHUTTING DOWN
!!!!Component Manager is now shutting down and stopping components!!!!
    

This type of error can result from improper configuration of the Oracle client or ODBC Driver. Refer to IP Service Activator Installation Guide for configuration instructions.

Component Errors

This section provides topics to help you resolve issues with restarting components, connecting to the server, and other component errors.

Component Will Not Restart

If a component does not start, the component manager attempts to restart it. If, after three attempts, the component fails to start, the component manager abandons the restart and the following message is displayed in the current faults pane.

Component bouncing: shutting it down

Restart the component manager to restart the failed component.

GUI Cannot Connect to Server

The error message "Could not connect to server" can occur if you attempt to log in to the user interface before you start the Policy Server. The problem generally arises if you are manually starting components on the Policy Server, and you start components and the GUI in the wrong order. If you have set up the system components to run automatically, the Policy Server automatically starts before you attempt to log in to the user interface.

Note:

If this error is displayed and you have installed system components to run automatically, remember that initially you must start the component manager manually. The component manager will subsequently restart automatically if the host machine is rebooted or the run level lowered and resumed.

To resolve the error:

  1. Click OK to close the error message box.

  2. Run ipsaps to confirm that the naming service component (omniNames) is running. If ipsaps is not installed, you can use the following command to confirm that omniNames is running:

    $ ps -ef | grep omniNames
    
  3. Navigate to /opt/OracleCommunications/ServiceActivator/bin and run the command to start the component manager:

    $ ipsacm start
    
  4. If the Policy Server starts successfully, start the user interface.

    If the problem persists and the same error message appears, contact Oracle GCS.

    If the Policy Server fails to start, see the discussion about manually configuring the Oracle client connection for Solaris in IP Service Activator Installation Guide.

GUI Hangs When Running Multiple Interfaces on Same Machine

The IP Service Activator GUI can hang when multiple network interfaces are running on the same machine. If the GUI hangs, either turn off the extra network interfaces or add the -BOAiiop_name_port option.

By default, the BOA can work out the IP address of the host machine. This address is recorded in the object references of the local objects. However, when the host has multiple network interfaces and multiple IP addresses, it may be desirable for the application to control what address the BOA should use. This can be done by using the -BOAiiop_name_port option as mentioned in the following procedure.

To resolve hanging of IP Service Activator GUI:

  1. Choose which of the network interfaces you want to use to communicate with the IP Service Activator server from the IP Service Activator GUI client.

  2. Note the IP address of the machine on which the IP Service Activator GUI is running.

  3. Edit the "User Interface" shortcut. For this do the following:

    • From Start > Programs, go to <ORACLE_HOME> > IP Service Activator > User Interface.

    • Right-click User Interface and select Properties. On the User Interface Properties dialog box, select the Shortcut tab.

  4. Add the "-BOAiiop_name_port <hostname[:port number]>" option in the Target field, where <hostname[:port number]> is the IP address noted in Step 2. This option tells the BOA the hostname and, optionally, the port number to use.

    For example:

    "C:\Program Files\OracleCommunications\IP Service Activator\Program\ipsa_explorer.exe" -BOAiiop_name_port 10.178.139.176

Client Cannot Connect to Server

The following error message may appear:

Client not authorized to connect to server

To resolve the error:

  1. When you change users, you should export your display to the machine you are working on. Enter:

    DISPLAY=machine_name:0.0 
    export DISPLAY
    
  2. In a new terminal window, enter:

    xhost +machine_name
    

    The xhost + command allows other programs to use your display. For example, if you are logged in as root, it enables the oracle user to use your display also.

Component Manager Fails on Start-up

The component manager may fail while appearing to start successfully. The user interface component will therefore not start.

  1. Check whether the component manager is running, using ipsaps.

  2. If the component manager is running, go to the directory /opt/OracleCommunications/ServiceActivator/bin and shut it down:

    $ ./ipsacm stop
    
  3. Delete the cman.ini file.

    The default path is /opt/OracleCommunications/ServiceActivator/WorkingData

  4. Restart the component manager. For this, go to the directory /opt/OracleCommunications/ServiceActivator/bin and use the command:

    $ ./ipsacm start
    
  5. If you have changed the IP address, delete it from the omninames_*.log files in the /opt/OracleCommunications/ServiceActivator/WorkingData directory.

If the problem persists, contact Oracle GCS.
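
Steps 2 to 4 of this procedure can be sketched as a small shell function. This is an illustration only: the function name reset_component_manager is hypothetical, the default installation path is the one used in this guide, and the function should be run as the ipsaadm user.

```shell
#!/bin/sh
# reset_component_manager [INSTALL_DIR]
# Hypothetical helper sketching steps 2 to 4: stop the component manager,
# delete cman.ini, then start the component manager again.
# INSTALL_DIR defaults to the path used in this guide; adjust it for your site.
reset_component_manager() {
    ipsa_home="${1:-/opt/OracleCommunications/ServiceActivator}"

    # Step 2: stop the component manager (ignore failure if it is not running).
    (cd "$ipsa_home/bin" && ./ipsacm stop) || true

    # Step 3: delete cman.ini from the WorkingData directory.
    rm -f "$ipsa_home/WorkingData/cman.ini"

    # Step 4: start the component manager.
    (cd "$ipsa_home/bin" && ./ipsacm start)
}
```

The sketch does not cover step 5; editing the omninames_*.log files after an IP address change remains a manual task.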

Investigating a User Interface Error on Windows

If an error occurs in the IP Service Activator user interface, a Critical message is written to the current faults pane. These messages are displayed on a red background.

If you are running IP Service Activator on Windows, a message is written to the Windows Application Log (viewable with the Event Viewer).

Details of the error appear in a Problem Reports and Solutions report file. The file can be sent to Oracle GCS for investigation of the cause.

Insufficient privilege to stop the component manager or naming service

The component manager and naming service can only be started and stopped by the ipsaadm user. If you try to stop either of these processes as another user, the message 'Insufficient privilege' is displayed. Change to the ipsaadm user before trying the stop command.

Transactions

This section provides topics to help the installer resolve any transaction issues that the user may be having.

Cannot Save or Commit a Transaction

An inability to save or commit a transaction generally indicates a loss of connection between the user interface and the policy server, possibly because the policy server has failed. If the policy server is not available, the toolbar's save and commit buttons are grayed out and the relevant menu options are not available. When the connection to the policy server is restored, users can save or commit the changes they have made in the current transaction.

Reviewing Transactions

When a problem is encountered, it is helpful to review the transaction to identify the problem.

To review a transaction:

  1. Click the System tab in the hierarchy pane.

  2. Expand the Transactions folder.

  3. Right-click on either the Committed Transactions or the Scheduled Transactions folder.

    Note:

    You can also right-click on the Transactions folder and perform the search at a higher level.

  4. Select Find From Here from the pop-up menu.

    The Find dialog box appears.

  5. Enter search criteria in the Find What field and click Find.

    Search results appear in the Find dialog box.

  6. Right-click on the desired transactions and select Properties to open the transaction's dialog box.

    From the transaction dialog box, you can also review all information related to the transaction.

    Note:

    You can also search for transactions from the Tools menu. Select Find to open the Find dialog box.

Exporting Transactions

If you want to send information on the transaction to Oracle GCS, you can export the details of the transaction to a text file.

To export all transactions:

  1. Select Options from the Tools menu.

    The Options dialog box appears.

  2. Select the Transactions property page.

  3. Click Export to export the transactions to a text file.

    The Save As dialog appears.

  4. Enter a filename for the text file and click Save.

To export a single transaction:

  1. Locate the single transaction that you want to export. If needed, refer to the procedure "Reviewing Transactions".

  2. Right-click on the transaction in the Find dialog box and select Properties from the pop-up menu to open the transaction's dialog box.

  3. From the Transaction property page, click Export to export the transaction to a text file.

    The Save As dialog appears.

  4. Enter a filename for the text file and click Save.

Log Files

This section describes:

  • How to generate the debug log files required by Oracle GCS

  • How to collate log files on Solaris/Linux

Generating debug log files for non-Java based IP Service Activator components

Note: This section applies to the following components:

  • Component manager

  • Integration manager

  • Policy server

  • Event handler

  • Proxy agent

  • Device drivers

  • IP Service Activator GUI

For details about how to set the logging level to Debug for the Network Processor, refer to "Changing the Logging Level Filter".

For more information about setting logging levels for Java-based components, refer to "Generating Debug Log Files for Java-based IP Service Activator Components".

By default, IP Service Activator outputs basic logging information to its system log file. You may be instructed by Oracle GCS to switch on more detailed 'debug' logging for one or more components if, for example:

  • Configuration is not being successfully propagated to devices

  • A component fails to restart after failing

  • A critical error message is displayed in the user interface

Switching on debug logging for a component outputs additional information about the component's execution to a log file stored in the DebugLogs directory.

Note:

Debug logging significantly increases CPU usage on the component's host machine. Turn on debug logging only when instructed by Oracle GCS, and turn it off when the necessary information has been obtained.

Debug logging is switched on using a command-line parameter that is passed to the relevant component. The parameter can be passed to a component 'on the fly' using a component parameters utility which is supplied with IP Service Activator. This is a command-line utility that can be run against any component.

Note:

The component parameters utility is installed by default on Windows, Solaris, and Linux. If the utility is not installed, re-run the installation program.

Note that when full debug output is turned on, output debugging trace log files can become very large. Parameters are also available for limiting the size of these files and for reusing the same output log file for a component when its maximum size is reached. By default there is no maximum file size/file reuse instruction set for the output log file.

You can enable debug output for any IP Service Activator component.

Log files are named according to the date on which they were started, in the format yyyymmdd, for example 20040403 for 3 April 2004. If there is more than one log file for the same day, they are numbered successively after a dash, such as cisco.log.20040403-1.
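
The naming scheme makes it straightforward to pick out all log files that a component started on a given day. The following is a minimal sketch; the function name logs_for_date is hypothetical, and the pattern matches both an unnumbered file for the date, if one exists, and the numbered files that follow it.

```shell
#!/bin/sh
# logs_for_date DIR BASENAME YYYYMMDD
# Hypothetical helper: lists BASENAME.YYYYMMDD (if present) and any
# same-day numbered files such as BASENAME.YYYYMMDD-1.
logs_for_date() {
    for f in "$1/$2.$3" "$1/$2.$3"-*; do
        # An unmatched glob stays literal, so test that the file exists.
        [ -e "$f" ] && printf '%s\n' "${f##*/}"
    done
    return 0
}

# Example usage:
#   logs_for_date /path/to/DebugLogs cisco.log 20040403
```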

To enable or disable debug output for a component:

  1. Do one of the following:

    • On a Windows platform:

      • Open a command window.

      • Change to the IP Service Activator Program directory.

    • On a Solaris/Linux platform:

      • Change to the /opt/OracleCommunications/ServiceActivator/bin directory.

  2. Type the following:

    ComponentParameters -ComponentName ComponentName 
    -set Debug.All enabled | disabled | default [-set Debug.FileMaxSize value] [-set Debug.FileReuse enabled | disabled] 
    

    where ComponentName is the name of an IP Service Activator component as listed on IP Service Activator's System tab.

    Table 9-1 lists the parameter definitions.

Table 9-1 Component Parameter Definitions

  Parameter: Debug.All enabled | disabled | default
  Description: Enables or disables full debug output for a component. Setting the parameter to default resets debug output to the default debug streams, that is, only critical and important events.

  Parameter: Debug.FileMaxSize value
  Description: Specifies the maximum log file size in bytes. Note that value must be 2 GB or less, but we strongly recommend limiting file size to no more than 20 MB. A value of 0 indicates there is no limit, i.e. the log file may be as large as the file system allows.

  Parameter: Debug.FileReuse enabled | disabled
  Description: Enables or disables file reuse. When the maximum log file size is reached the log starts again at the beginning of the file.


For example, on Solaris/Linux:

./ComponentParameters -ComponentName juniper-USNTW666 
-set Debug.All enabled -set Debug.FileMaxSize 150000 -set Debug.FileReuse enabled

This command enables debug output for the Juniper M-series driver, restricting the size of the debug log file to 150,000 bytes (150 KB). When the maximum file size is reached, the log starts again at the beginning of the same output file.
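
To avoid passing an out-of-range size, the command line can be assembled by a small wrapper that checks the limit first. The following sketch only prints the command and does not run it. The function name debug_enable_cmd is hypothetical, and the 2 GB limit is taken here as 2147483648 bytes, which is an assumption because this guide does not say whether GB is binary or decimal.

```shell
#!/bin/sh
# debug_enable_cmd COMPONENT MAX_BYTES
# Hypothetical helper: prints (does not run) a ComponentParameters command
# line that enables full debug output with a size cap and file reuse.
debug_enable_cmd() {
    component="$1"
    max_bytes="$2"

    # Accept only a non-negative whole number of bytes.
    case "$max_bytes" in
        ''|*[!0-9]*) echo "max size must be a whole number of bytes" >&2; return 1 ;;
    esac

    # Assumed limit: 2 GB taken as 2147483648 bytes.
    if [ "$max_bytes" -gt 2147483648 ]; then
        echo "max size exceeds 2 GB" >&2
        return 1
    fi

    printf './ComponentParameters -ComponentName %s -set Debug.All enabled -set Debug.FileMaxSize %s -set Debug.FileReuse enabled\n' \
        "$component" "$max_bytes"
}

# Example usage:
#   debug_enable_cmd juniper-USNTW666 150000
```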

To check the value of a specific parameter for a component:

On Windows:

  1. Open a command window.

  2. Change to the IP Service Activator Program directory and type:

    ComponentParameters -ComponentName ComponentName -get Parameter
    

On Solaris/Linux:

  • Change to the IP Service Activator bin directory and enter:

    ./ComponentParameters -ComponentName ComponentName -get Parameter
    

The -get parameter does not work for some components such as:

  • All Cartridges

  • LogReader

  • Discovery Server

  • AL5620SAMDiscoveryServer

  • CTM Server

  • TransactionMonitor

To check the value of all parameters for a component:

On Windows:

  1. Open a command window.

  2. Change to the IP Service Activator Program directory and enter:

    ComponentParameters -ComponentName ComponentName -all
    

On Solaris/Linux:

  • Change to the IP Service Activator bin directory and enter:

    ./ComponentParameters -ComponentName ComponentName -all
    

Collating Log Files

Note:

If you have identified the time at which the problem occurred, we recommend you collate only the relevant log file sections. For example, if the problem occurred at 13:20:45, send logging information for the five-minute period leading up to this time.
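
Extracting such a time window can be sketched with awk. This example assumes, purely for illustration, that each log line begins with a timestamp in HH:MM:SS form followed by a space; the actual log line format is not described in this chapter, so adjust the field handling to match your files. The function name extract_window is hypothetical.

```shell
#!/bin/sh
# extract_window END_TIME [MINUTES] < LOGFILE
# Hypothetical helper: prints lines whose first field is an HH:MM:SS
# timestamp within MINUTES (default 5) before END_TIME, inclusive.
# Assumes the log does not span midnight within the window.
extract_window() {
    end="$1"
    minutes="${2:-5}"
    awk -v end="$end" -v mins="$minutes" '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,   p) { split(t, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }
        BEGIN { hi = secs(end); lo = hi - mins*60 }
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            s = secs($1)
            if (s >= lo && s <= hi) print
        }
    '
}

# Example usage:
#   extract_window 13:20:45 < component.log > component-extract.log
```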

Solaris/Linux

Supply the log files from the following directory:

  • /opt/OracleCommunications/ServiceActivator/logs

Windows

Supply the log files from the following directory:

  • Program Files\Oracle Communications\Service Activator\DebugLogs

For information about generating debug log files, see "Generating debug log files for non-Java based IP Service Activator components".

Enabling Debug Log Files for a Windows GUI

If you encounter any errors or issues with the Windows GUI, turn on the debug log file as follows:

  1. Close IP Service Activator.

  2. From the Start menu, go to Oracle Communications > Service Activator > User Interface.

  3. Right-click on User Interface and select Properties.

  4. In the Target field add the following text after the existing text:

    "CORBA;master_server" +debugAll +debugFile -debugFileName explorer.log
    
  5. Close the Properties window.

Refer to this debug log file if additional troubleshooting is required with Oracle GCS.

Generating Debug Log Files for Java-based IP Service Activator Components

Logging levels for IP Service Activator components using log4j are set by editing values in their properties files. These components include:

  • Network Processor

  • Configuration Template module

  • InfoVista Integration module

  • Micromuse module

  • TACC module

  • Policy Services INA Integration module

  • Log reader

For details about logging levels, refer to "Logging Levels". The same settings apply to all Java-based IP Service Activator components.

For details about how to set the logging level to Debug for the Network Processor, refer to "Changing the Logging Level Filter". The same principles apply for all Java-based IP Service Activator components, except you must set the level in the properties file for the particular component you are configuring.

Checking and Deleting Core Files

If an IP Service Activator component fails, a process creates a core dump file in the logs directory. Core dump files are named as follows:

core.process_name.

The following processes create a core dump file on failure:

  • Integration Manager

  • Policy Server

  • Event Handler

  • Proxy Agent

  • Device drivers

  • Component Manager

  • Naming Service

To conserve disk space you should regularly check and delete these files.
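
A periodic check can be sketched with find. The function name list_core_files is hypothetical, and the default age of 7 days is an arbitrary value chosen for illustration. The sketch only lists files so that they can be reviewed before anything is deleted.

```shell
#!/bin/sh
# list_core_files LOGS_DIR [MIN_AGE_DAYS]
# Hypothetical helper: prints core dump files (core.*) under LOGS_DIR
# that were last modified more than MIN_AGE_DAYS days ago (default 7).
list_core_files() {
    find "$1" -type f -name 'core.*' -mtime +"${2:-7}" -print
}

# Example usage:
#   list_core_files /opt/OracleCommunications/ServiceActivator/logs
# After reviewing the list, the same find expression with
# "-exec rm -f {} +" in place of "-print" would delete the files.
```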

Managing Swap File Size

When installing the IP Service Activator GUI, Windows is unable to correctly process the available swap space when the target Windows computer is configured for "System Managed" swap file size. A command window appears explaining that no swap space is available and a dialog box appears reporting an error.

To avoid this error, change the swap file management to 'Custom Setting'.

To change the swap file management:

  1. Click Start, Settings, Control Panel, and then System.

  2. On the Advanced tab, Performance panel, click Settings.

  3. On the Advanced tab, Virtual memory panel, click Change.

  4. Select Custom size and set a swap file size.

Discovering Manually Created Subinterfaces on Juniper M-series Devices

To discover manually created subinterfaces on Juniper M-series devices using the Network Processor, set the atm-options vpi maximum-vcs statement on the device so that ATM subinterface data is populated into the SNMP data. For example:

at-1/1/1 {
    atm-options {
        vpi 0 {
            maximum-vcs 200;
        }
        vpi 1 {
            maximum-vcs 200;
        }
    }
}

Troubleshooting the Network Processor

This section provides topics to help the installer isolate and resolve issues during installation, setup, and operation.

Note:

There is no mechanism for repairing IP Service Activator components. If faults are reported by the operating system when trying to start up a system component, you should remove and reinstall the software.

Viewing Debug Logs

One of the first steps to solving a problem is to view the logging at a more detailed level. The default logging level is "Info", which provides a medium logging level. You can open up Network Processor logging for debug purposes by performing the procedure "Changing the Logging Level Filter".

Then you can use the following procedure to display debug logging for the Network Processor component while performing other Network Processor-related activities.

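  1. To display the log output on the screen, enter the following commands, each in its own terminal window because tail -f runs until it is interrupted: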

    $ tail -f /opt/OracleCommunications/ServiceActivator/logs/networkprocessor.log
    $ tail -f /opt/OracleCommunications/ServiceActivator/AuditTrails/np<cartridgeName>.audit.log
    

    Replace <cartridgeName> with the name of the cartridge.

  2. To exit the log file, press Ctrl+C.

Network Processor Cannot Find a Cartridge

If the cartridge does not appear in the Drivers folder under the Network Processor in the Hierarchy pane (and on the Drivers property page of the Network Processor dialog box), or if the cartridge appears in the failed state, perform the following steps:

  1. You may not have installed the cartridge jar file. Check that the cartridge jar file is present in the /opt/OracleCommunications/ServiceActivator/lib/java-lib/cartridges/ directory.

  2. You may not have installed the latest version. Check the version/date of the cartridge jar file to ensure that the current version has been installed.

  3. The cartridge file might not have been detected by the Network Processor. Stop and restart the Network Processor.

  4. Verify that the Network Processor and cartridge are now functional by checking the GUI screens:

    • Right-click the <NetworkName> on the Domains tab and select Properties. The Network Processor dialog box shows the expected Network Processor state and restart time.

    • Check the Drivers property page. The cartridge should be listed as cartridgeName-<hostname>, for example, Huawei-srvotlab481.

    • In the Drivers folder under the Network Processor object, right-click the cartridge object and select Properties. On the Cartridge dialog box (the cartridgeName-<hostname> dialog box), ensure that the cartridge is in a running state.
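
The file checks in steps 1 and 2 can be sketched as follows. The function name list_cartridge_jars is hypothetical; the default directory is the one given in step 1.

```shell
#!/bin/sh
# list_cartridge_jars [CARTRIDGE_DIR]
# Hypothetical helper: lists the jar files in the cartridge directory with
# their sizes and modification dates, and fails if none are present.
list_cartridge_jars() {
    dir="${1:-/opt/OracleCommunications/ServiceActivator/lib/java-lib/cartridges}"

    # Expand the glob into the positional parameters of this function.
    set -- "$dir"/*.jar

    # An unmatched glob stays literal, so test that the first entry exists.
    [ -e "$1" ] || { echo "no cartridge jar files in $dir" >&2; return 1; }

    ls -l "$@"
}
```

Compare the dates shown against the cartridge release you expect to be installed before restarting the Network Processor.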

Turning Off the Offline Maintenance Mode Warning

When a device object is put into Offline Maintenance mode in the GUI, or the Network Processor is running with -FileInterface enabled, any IP Service Activator-generated configuration commands intended for that device may not actually be sent to the device.

A warning is issued for each affected concrete:

Message 3311: Changes to configuration were attempted while the device was in Offline Maintenance mode; some configuration has not been applied to the device.

The parameter warnInOfflineMaintenanceMode can be set using the Configuration GUI to control whether these warnings are issued.

See "About the warnInOfflineMaintenanceMode Parameter".

Command line and ComponentParameters for Network Processor

The command line and ComponentParameters for the Network Processor recognize the FileInterface and NoCommandDelivery options. They also recognize the OfflineMaintenance and OfflineTest options. The same set of rules applies to the new preferred options. The values are Boolean (true, false): an option is enabled when set to true and disabled when set to false.

For more information about command delivery modes, see "Monitoring and Managing the Network Processor".

Setting Severity of Fault "Re-issue commands operation failed"

The fault message 3510 DeviceID: Re-issue commands operation failed is generated if commands were not successfully delivered to a device, following a Configuration Audit - Reissue commands operation.

You can modify the severity level of this fault message. To do this, edit the parameter reconcileFailureIsCritical in the network processor default.properties file, as follows:

  • True (default value) sets message 3510 severity level as Critical. When message 3510 is raised with severity Critical, the device is put into Intervention Required state. In this state, the device no longer accepts configuration commands.

  • False sets message 3510 severity level as Error. When message 3510 is raised with severity Error, the device is put into Down state. In this state, the device can still accept configuration commands.

  • If the parameter reconcileFailureIsCritical is omitted from the default.properties file, the default setting True (Critical) is applied to message 3510.

The default.properties file is in Service_Activator_Home/Config/networkProcessor/com/Oracle/serviceactivator/networkprocessor.
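
To see which setting is currently in effect, the properties file can be inspected from the shell. The following is a minimal sketch that assumes the usual key=value (or key:value) properties syntax with # comment lines. The function name reconcile_failure_is_critical is hypothetical.

```shell
#!/bin/sh
# reconcile_failure_is_critical PROPERTIES_FILE
# Hypothetical helper: prints the effective value of
# reconcileFailureIsCritical, that is, the last uncommented assignment in
# the file, or "True" when the parameter is omitted (the documented default).
reconcile_failure_is_critical() {
    value="$(sed -n 's/^[[:space:]]*reconcileFailureIsCritical[[:space:]]*[=:][[:space:]]*//p' "$1" \
        | tail -n 1 | tr -d '[:space:]')"
    printf '%s\n' "${value:-True}"
}

# Example usage:
#   reconcile_failure_is_critical /path/to/default.properties
```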

(For more information about the Configuration Audit in the IP Service Activator GUI, refer to Online Help topic Device properties - Audit/Migrate page.)

Recovering from Rollback Failure

This procedure describes how to recover from rollback failure on a cartridge-managed device. Rollback failure can occur when manual configuration exists on a device. When rollback failure occurs, the device is put into Intervention Required state. No additional configuration can be sent to the device until you clear the manual configuration that is causing the problem. You should restore the proper configuration on the device so that its state can change back to Installed.

Pre-requisites: Only experienced IP Service Activator users with in-depth knowledge of device configuration commands should perform this procedure, or use the Command Re-issue feature.

A feature is provided to show the commands that were not sent due to failure on rollback. Such command lines are marked in the audit log file with the prefix [unsent-command].

To recover from rollback failure:

  1. Look in the Network Processor audit trails for this device. Find the entries with the "unsent-command" prefix.

    The last sent command entry before the block of unsent-commands is the one that caused the failure.

  2. Note the full command string that failed, and its context (such as the sub-interface, ...).

  3. Note the device response in the fault text. If required, get the complete response from the Network Processor log. (Again, search for "unsent-command"; the rollback failure will precede the block of unsent-commands.)

  4. Based on the command being sent and on the nature of the failure, determine which conflicting manual configuration on the device caused the failure.

    • For example: the rollback cannot remove a service policy applied by IP Service Activator because a user has manually re-used the same policy on another interface.

    • It may be helpful to perform a device audit.

    • A better approach is to get the running configuration from the device. This step requires knowledge of the language and form of the vendor's device configuration.

  5. After you identify the conflicting manual configuration, either remove the conflicting configuration or remove the conflict.

    So, in the example given in step 4:

    • either remove the reference to the policy on the non-managed interface,

    • or clone the policy manually and change the reference on the non-managed interface to the cloned policy.

  6. When the conflicting manual configuration has been resolved, you can decide which of the following methods to use to synchronize the IP Service Activator device model with what is actually on the device.

    Note:

    Command Re-issue allows you to send missing device configuration commands independently. If additional device configuration is required, you must manually re-issue the commands. Do not attempt to use Command Re-issue in this case.

    • Run a device audit and see what commands are identified as missing. If the missing commands can be re-sent in isolation (without any pre-requisite commands other than context establishing commands), use the Command Re-issue feature on the Device properties - Audit/Migrate page to resend the missing commands. For more details, follow the procedure "Re-sending commands marked as MISSING in the Configuration Audit" in the Online Help.

    • Look at the unsent-command list to determine which of the commands must be issued in order to allow the missing commands to be re-sent, and, to identify commands that are either no longer in the audit or are present but flagged as Conflict and need to be removed. All commands from the unsent-command list must be manually re-issued to the device.

      Note:

      It might be simpler to manually re-issue all unsent-commands in the list and then run a device audit to see what commands are reported as missing. If there are any missing commands, then these can be re-sent using the Command Re-issue feature (if appropriate) or manually. However, this approach may churn the device configuration more than necessary.

  7. Repeat this procedure until a clean audit response is returned.

  8. Delete the Critical fault against the device.

    The device is restored to its previous Online state and provisioning can resume.
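
Step 1 of this procedure, locating the last sent command before each block of unsent commands, can be sketched with awk. The sketch assumes only that unsent lines contain the literal text [unsent-command], as described above; the rest of the audit log line format is not assumed. The function name last_sent_before_unsent is hypothetical.

```shell
#!/bin/sh
# last_sent_before_unsent < AUDIT_LOG
# Hypothetical helper: prints each line that immediately precedes a block
# of lines containing "[unsent-command]". According to step 1, that line
# is the command that caused the failure.
last_sent_before_unsent() {
    awk '
        index($0, "[unsent-command]") {
            # Print the preceding line only at the start of a block.
            if (!in_block && NR > 1) print prev
            in_block = 1
            next
        }
        { in_block = 0; prev = $0 }
    '
}

# Example usage (audit file name pattern as shown under "Viewing Debug Logs"):
#   last_sent_before_unsent < /opt/OracleCommunications/ServiceActivator/AuditTrails/np<cartridgeName>.audit.log
```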

Network Processor Does Not Start

The Network Processor may fail to start for reasons such as errors in the cartridge registry, options, or synonym files; missing cartridge files; or missing error message files.

Some of the primary reasons are:

CORBA handshake delay time

The component manager waits a configurable amount of time for a CORBA handshake from the network processor to indicate its readiness. If the handshake is not completed in that time, the component manager restarts the network processor.

Cartridge handshake delay time

A large number of complex cartridges may require the network processor startup time to exceed the default wait time (300 seconds) before the component manager checks for the handshake. Increasing the component manager startup wait time before checking for the handshake allows the network processor additional time to complete its startup.

To change the component manager startup wait time, edit the cman.cfg file in the directory /opt/OracleCommunications/ServiceActivator/. Change the default value to a higher value.

An example from the cman.cfg file follows:

#NetworkProcessorEntry
/opt/OracleCommunications/ServiceActivator/bin/networkprocessor "-ComponentName proxy-np-srvotlab452 -ComponentLocation srvotlab452" 1 240 1 0

Credentials expiry

The Network Processor does not start when the database credentials have expired or are about to expire. In this case, the Network Processor displays one of the following SQL exceptions:

  • ORA-28002: The password will expire within %s days

  • ORA-28001: The password has expired

The Oracle database alerts the database user to change the password after it expires and enters a grace period, which has a default value of 10 days. The password must be changed during the grace period.

There are two ways to prevent this:

  • Set PASSWORD_GRACE_TIME to UNLIMITED. A warning is issued but the connection to the database can still be made.

  • Change PASSWORD_VERIFY_FUNCTION to NULL for the specific user profile.

However, these two methods are not recommended. They are not part of a strong security policy as they do not encourage the user to change the password. It is recommended that the database user change the password within the defined expiry time for the specific database user profile.
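
To confirm that a startup failure is caused by expired credentials, you can search the relevant log output for the two error codes listed above. The following is a minimal sketch: check_password_expiry is a hypothetical helper name, and which log file to pass to it depends on your installation.

```shell
#!/bin/sh
# Sketch: report whether a log file contains either password-expiry error.
# check_password_expiry is a hypothetical helper, not a product command.
check_password_expiry() {
    log="$1"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    if grep -q 'ORA-28001' "$log"; then
        echo "expired"      # ORA-28001: the password has expired
    elif grep -q 'ORA-28002' "$log"; then
        echo "expiring"     # ORA-28002: the password will expire soon
    else
        echo "ok"           # neither error code was found
    fi
}
```

A result of expired or expiring suggests that the database user's password should be changed before the Network Processor is restarted.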

VMSize

The Network Processor may fail to start when the vmsize (defined in the startup script) is too small, that is, when the xmx value is set lower than 3.5 GB.
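
The 3.5 GB threshold can be checked mechanically. The sketch below assumes that the startup script passes the value as a JVM-style -Xmx option with a k, m, or g suffix (for example -Xmx4096m); that form, and the helper names xmx_mb and heap_ok, are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: read a JVM-style -Xmx value from a line of text and convert it
# to megabytes. Assumes a k, m, or g suffix; values without a suffix are
# not handled and make the function return 1.
xmx_mb() {
    # If the line has several -Xmx tokens, the last one is used.
    val=$(printf '%s\n' "$1" | sed -n 's/.*-Xmx\([0-9][0-9]*[kKmMgG]\).*/\1/p' | head -n 1)
    [ -n "$val" ] || return 1
    num=${val%?}            # numeric part
    unit=${val#"$num"}      # trailing unit letter
    case "$unit" in
        k|K) echo $((num / 1024)) ;;
        m|M) echo "$num" ;;
        g|G) echo $((num * 1024)) ;;
    esac
}

# Succeeds when the value is at least 3.5 GB (3584 MB); returns 2 when no
# -Xmx value can be found.
heap_ok() {
    mb=$(xmx_mb "$1") || return 2
    [ "$mb" -ge 3584 ]
}
```

For example, heap_ok "-Xmx2g" fails, while heap_ok "-Xmx4g" succeeds.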

If Concrete Rules Are Not Created for Abstract Rules

If no concrete rules are listed for an abstract rule, check that:

  • the correct device and interface roles have been associated with the rule and assigned to the relevant policy targets.

  • devices to which the rule should apply are managed by IP Service Activator.

Effects of Component Failures

This section provides an overview of the events that occur if an IP Service Activator software component fails, or the machine hosting one or more components fails. The effects of these failures on other system components are described.

Component failure can be divided into two areas:

  • Software failure – an IP Service Activator component fails

  • Hardware failure – the machine hosting one or more components fails

For information on backing up and restoring IP Service Activator data, see "Backing Up and Restoring Data".

Component Distribution and Communication

In a standard deployment, components are distributed across multiple host machines. Components communicate with each other using CORBA (Common Object Request Broker Architecture) with the CORBA naming service coordinating inter-component communication.

An Oracle Database stores the system, topology and policy data generated within the system. The policy server and user interface components communicate with the database.

Figure 9-1 outlines the distribution of major components and shows the connections between them. All connections except those to the Oracle Database use CORBA.

Figure 9-1 Standard Configuration of Component Distribution and Communication

Note:

All IP Service Activator components communicate with the naming service; these connections are therefore not shown.

Software Failure

This section describes the events that occur if an IP Service Activator component fails.

Naming Service

The naming service maintains CORBA name and location details for all other system components. If run as a service, it starts up automatically on installation or system restart.

As each component starts up, it registers with the naming service, passing its name and the port on which it will listen for communication from other components. When one component needs to contact another, it contacts the naming service for that component's contact details.

On Windows, an associated program called the naming service wrapper allows the naming service to be started as a system service and allows the correct environment to be set. On Solaris/Linux, these functions are performed when you run the ipsans script.

Note:

For a list of IP Service Activator process names on Solaris/Linux see "IP Service Activator Process Names on Solaris/Linux".

Once a component has retrieved contact details for a particular component from the naming service, it maintains that information and does not need to contact the naming service for subsequent communication. If a component fails and restarts, however, it registers a new listening port. Calling components retrieve the new contact details from the naming service the next time they try to communicate with the component.

The naming service is therefore only accessed when:

  • Components register with the naming service at startup

  • A registered component fails and restarts and re-registers with the naming service

  • One component contacts another component for the first time or after a component has failed and restarted

The naming service maintains a record of registered component details in the omninames-machine_name.log file in the WorkingData directory. On restarting after a failure, the naming service reads in the component details held in the file.

Effects of failure

If the naming service is run as a service, it may fail and restart automatically before any components have tried to make contact. In this case, the failure is imperceptible.

A component may try to retrieve another component's contact details while the naming service is down. To the calling component it will appear that the target component is down as its contact details are unavailable. The calling component makes periodic calls to the naming service, however, and, if the service returns, contact details will be obtained with minimal effect on the system.

Component Manager

The component manager starts all system components, monitoring and reporting their status and restarting any that fail. It raises faults on components and updates the system logger with component status information – that is, whether the component is up or down.

On restarting after a failure, the component manager checks the lock files in the WorkingData directory. These files record which processes were running when the component manager failed. The component manager tries to connect to these processes. If it is unable to make a connection, the component manager restarts the relevant components.

Effects of failure

If the component manager fails:

  • Any components that fail are not restarted automatically

  • Users cannot execute the Shutdown All or Startup command from the user interface to shut down or start up components on the host machine

  • Information on component states is not reported to the system logger

System Logger

The system logger receives messages from the policy server and proxy agent and details of component status from the component manager. It stores the information that it receives in the system database.

Effects of failure

If the system logger fails, proxy agent and policy server faults and component status details are not passed on to the database; they are silently dropped.

Policy Server

The policy server is the central component of IP Service Activator. It coordinates access to the database from the user interface components, performs network topology discovery, and validates and transmits policy configurations to the proxy agents.

The policy server maintains the current state of the system, stored in the database – operations with the database are transaction controlled to maintain consistency.

Effects of failure

If the policy server fails while saving a transaction to the database, when it restarts it returns to the system state as it was before the most recent successful save.

Note that, by default, data that is saved by a user is queued by the policy server before being saved to the database. If there is a large number of users accessing the database, users may lose changes that appeared to be saved in the user interface. Running the user interface in 'confirmed transaction mode' reduces the likelihood of lost data, but slows a user's progress in the user interface. In this mode, changes cannot be made in the user interface until the policy server notifies the component that a transaction's changes have been saved to the database. For more information on confirmed transaction mode, see IP Service Activator User's Guide.

If the policy server fails while propagation is in progress, on restart it re-propagates the most recent saved configuration to the proxy agents. This may take some time to complete. Proxy agents receive the re-sent configuration and only push information that they have not received before to the device drivers.

If components try to contact the policy server while it is down, they continue to attempt contact until the policy server is back online. The components log an error message reporting the loss of connectivity.

Users may enter changes via the user interface while the policy server is down but may not save or commit transactions. When connectivity is reinstated, the object model maintained by the user interface component is synchronized with that maintained by the policy server and changes may be saved to the database.

Proxy Agent

The proxy agent configures the individual devices within a policy domain with the policy configurations it receives from the policy server. It maintains state information in memory only, so in the event of failure, all state information is lost and must be retrieved from the policy server.

When the proxy agent restarts it contacts the policy server which sends the proxy agent the configuration information relevant to the proxy agent – for example, the currently-configured rules and PHB groups and details for the device drivers and devices managed by the proxy agent. The proxy agent pushes only the configuration that it has not encountered before to the device drivers.

Effects of failure

Depending on the system set-up, retrieval of the current configuration details from the policy server may take some time to complete. Significant factors include the size of the system being managed, the number of rules, PHB groups and VPN sites configured and the bandwidth of the connection to the policy server.

Device Driver

Device drivers configure the network devices, converting the policy configuration into the appropriate device configuration protocol.

Juniper M-series Device Driver

On restart, the device driver performs an SNMP query to check the device type and capabilities of each managed device. When a transaction is next committed, the driver checks whether the configuration must be updated and, if necessary, pushes the IP Service Activator configuration group to the device.

Effects of failure

While a device driver is down, configuration cannot be propagated to the devices it manages. In the case of the Cisco device driver, if a device's configuration was manually changed while the device driver was down, this change is not detected until a transaction is next committed from the user interface.

Network Processor

The network processor manages communication to and from devices through cartridges. Within the context of IP Service Activator, it performs the conversion of user changes to the configuration of IP Service Activator objects into a set of CLI commands for delivery to devices.

The network processor / cartridge architecture is extensible and scalable. Support for new services and devices can be added by creating and deploying new cartridges and cartridge components.

Within the context of Configuration Management, the network processor manages the flow of information from a configuration policy (or configlet) to a device, again by performing the conversion of configuration commands into a set of CLI commands for delivery to the device.

Effects of failure

When the Network Processor fails, the component manager raises a fault and attempts a restart.

In-process audits, backups, restores, and repairs are not carried forward and will not resolve; you will have to re-submit them.

Service changes that were not yet in the communication phase will eventually be pushed when the Network Processor comes back online.

If the failure occurred during a period when a device was actually being configured, the commands will be re-issued. For devices which ignore configuration commands that are re-applied, you might see faults indicating a write-failure to the device if the device reports back a warning that the configuration already existed. These can be ignored since the original commands were applied, and the re-application does not change any configuration settings on the device. On transaction-oriented devices (such as some Juniper devices) this behavior will not occur.

If there is not enough virtual memory allocated to the Network Processor, you could experience a series of failures and restarts. This can also be caused by an incorrect Network Processor configuration (for example, missing cartridge files or message files).

If you see repeated failures, check the logs.
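
One way to confirm that the Network Processor is cycling through restarts is to compare its process ID across two ipsaps listings. The following is a minimal sketch that assumes the column layout shown in the ipsaps sample output (USER PID PPID S RSS VSZ TIME COMMAND); the helper names np_pid and np_restarted are hypothetical.

```shell
#!/bin/sh
# Sketch: read a process listing on standard input and print the PID of
# the first process whose command name contains "networkprocessor".
# Assumes the command name is the eighth column, as in the ipsaps sample.
np_pid() {
    awk '$8 ~ /networkprocessor/ { print $2; exit }'
}

# Succeeds when both PIDs are present and differ, which indicates that
# the process was restarted between the two listings.
np_restarted() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Example usage (not run here):
#   before=$(ipsaps | np_pid)
#   sleep 300
#   after=$(ipsaps | np_pid)
#   np_restarted "$before" "$after" && echo "Network Processor restarted"
```

A changed PID only shows that a restart happened; the logs are still needed to find out why.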

If you are having trouble getting the affected Network Processor to restart, consider moving your devices to another Network Processor.

Log Reader

The log reader collects logging information from multiple IP Service Activator components and stores it in a normalized format in the Oracle Database. The log schema is compliant with log4j.

Any third-party product which supports log4j can be used to interact with the logging data.

Effects of failure

The log reader can scavenge logging information that was created while it was unavailable. As long as the individual component log files have not rolled over, there should be no loss of logging information.

User Interface

There is no component manager installed on the user interface host machine. Therefore, the user must restart the user interface if it fails. On restart, the user interface retrieves information about the object model from the policy server.

Effects of failure

Any transactions that were not saved before the component failed are lost.

Event Handler

The event handler collects, filters and delivers details of faults and other events occurring anywhere in the network managed by IP Service Activator and can notify external systems of these events. Examples of events include the discovery of a new device, the addition of a site to a VPN, or the loss of the ability to provision a device.

The event handler can be configured to output information to a file, through SNMP traps, or through the CORBA API.

The event handler processes each active event collector and checks the events that have occurred within the collector's scope against those that have already been forwarded to the relevant third-party systems. Any outstanding events are sent to the third-party systems.

Effects of failure

If the component fails, on restart it retrieves the EOM and details of subscribed events. This information is stored in the database or the SubscribedEvents.log file in the WorkingData directory, depending on whether a direct connection has been set up between the event handler and the database.

Events which occur while the event handler is down will not have their associated notifications sent to the third-party, external systems, nor will these messages be sent once the event handler is available again. This could have different implications depending on the external system and the type of event.

OSS Integration Manager (OIM)

The OIM provides an API to Operational Support Systems (OSSs), enabling IP Service Activator to be integrated with third-party software, such as billing, monitoring and fault management systems. The OIM enables automated or programmatic control of IP Service Activator's service provisioning facility.

If the OIM fails, the component manager automatically attempts to restart the component:

  • If the policy server is running (and it is the correct version), the OIM starts and connects to the policy server.

  • If the policy server is not running, the OIM attempts to connect to the policy server every five seconds.

  • If an incorrect version of the policy server is running, the OIM fails to start and does not make any further attempt to connect to the policy server. However, the component manager continues to restart the OIM and only stops its automatic restart of the component if the OIM fails three times.

Effects of failure

If the OIM fails, the Command Line Interface (CLI) gives no immediate indication of the failure. However, when the next command is entered, the CLI recognizes that the OIM session that it was connected to is no longer available. The CLI attempts to reconnect three times. On reconnection, the OIM prompts for the user name and password. If the CLI fails to reconnect after three attempts, the CLI closes.
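
The reconnect behavior described above follows a bounded retry pattern. The sketch below illustrates that pattern only; it is not the CLI's implementation. The helper name retry, the delay between attempts, and the connect_to_oim command in the usage comment are all assumptions.

```shell
#!/bin/sh
# Sketch: run a command up to a maximum number of times, pausing between
# attempts, and give up when every attempt has failed.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
    max="$1"
    delay="$2"
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0            # the command succeeded
        fi
        attempt=$((attempt + 1))
        if [ "$attempt" -le "$max" ]; then
            sleep "$delay"      # wait before the next attempt
        fi
    done
    return 1                    # every attempt failed
}

# Example usage (connect_to_oim is a hypothetical command):
#   retry 3 5 connect_to_oim || echo "giving up after three attempts"
```

A wrapper of this kind can also be useful around scripts that depend on the OIM, since such scripts stop when the OIM fails.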

Any scripts that are running when the OIM fails will stop.

If the measurement component is running as a client of the OIM, the measurement component fails if the OIM fails. The component manager automatically attempts to restart both components.

CTM Server Component

The CTM Server acts as a back-end repository for Configuration Templates for CTM clients.

When the CTM Server is unavailable, CTM clients cannot perform CTM work.

Any instantiations of templates that are already in the IP Service Activator system are still processed and their commands pushed onto the target devices.

Note:

The Configuration GUI includes a provision to manually set the CTM server port.

Discovery Module Server Components

Discovery modules include the Alcatel 5620 SAM and JUNOSe discovery modules. Each of these has a server component.

Discovery modules allow the discovery of devices using protocols other than SNMP-based MIB2. They are also used in cases where the standard discovery protocol can take too much time or use too many system resources (such as JUNOSe).

Effects of failure

Failure of these components prevents their use to discover new devices.

TACC Module Server Component

The TACC module server component acts as a centralized collector for all TACC requests and responses. Whenever a network-touching component (that is, a device driver or Network Processor/cartridge combination) crosses a threshold, it sends a confirmation request to the TACC module server.

The TACC server then relays information to all TACC clients that have subscribed to the server. Approval or denial of the request is passed from the client back to the server and then back to the originator to accept or deny the action.

Effects of failure

No TACC services (such as subscriptions) will be available while the server is not running. The TACC clients will raise errors.

Device drivers and network processors will be unable to contact the TACC server. They revert to thresholding functionality instead of using TACC functionality, and immediately fail the transaction that was in progress at the time of the TACC server failure.

Hardware Failure

In most cases, there are multiple IP Service Activator components on a host machine. The exception is the user interface host, where only the user interface component is installed. A hardware failure therefore generally causes a number of components to fail simultaneously. This raises the issue of component startup order.

About the Component Manager and the Restart of Managed Components

On all hosts, if set to start automatically upon system start, the component manager restarts when the host is restarted. The component manager will also restart any managed components on the host on which it resides.

If the component manager is not set to start automatically upon system start, it will have to be manually restarted. It will then restart its managed components.

Proxy Agent and Device Driver Host

The component manager will restart the proxy agent and device drivers. The order of startup is not significant, but situations may occur where one component attempts to contact the other before the other has restarted:

  • If the device driver restarts before the proxy agent, the device driver produces debug trace error output reporting that it cannot contact its proxy agent. It continues to do this until the proxy agent becomes available. On contact, the proxy agent sends the latest configuration to the device driver.

  • If the proxy agent restarts before the device driver, the proxy agent waits for a notification from the device driver before pushing the latest configuration to the driver.

Policy Server Host

The naming service and component manager restart automatically if run as services. Otherwise they must be started manually. The component manager subsequently restarts the policy server and system logger.

User Interface Host

The user interface component is the only IP Service Activator component running on the host machine. Therefore, if the host fails there are no component startup order issues to be considered. The user interface component's behavior is described in "User Interface".

OIM Host

The component manager, when restarted, restarts the OIM.

Event Handler Host

The component manager, when restarted, restarts the event handler.

Measurement Component Host

The component manager, when restarted, restarts the measurement component.

The measurement component should be installed on the same host machine as the OIM. If the OIM is not available when the measurement component restarts, the measurement component waits and attempts to reconnect with the OIM every five seconds until the OIM becomes available.