7 Recovering from Site Failures

This chapter describes and compares various disaster backup strategies and describes how to prepare for disaster recovery. For each disaster recovery strategy, this chapter also describes the recovery procedures and a list of assumptions.

Introduction

The LSMS system administrator needs to plan a recovery strategy for situations when both the LSMS active server and the standby server are unable to receive data from the NPAC. This occurs when the LSMS hardware is unable to operate, perhaps due to a fire or a natural disaster.

Choosing a Disaster Backup Strategy

Choose one of the following backup strategies, in which a shadow LSMS is defined to be an entire LSMS, with its own service provider ID, located in a separate geographical location from the main LSMS:

  • Active shadow
  • Inactive shadow
  • No shadow

The various backup strategies provide different methods for ensuring that the shadow LSMS contains the same data as the main LSMS.

Note:

Whenever you must manually enter locally provisioned data at the shadow LSMS, be sure that you use the same service provider identifier (SPID) that was used to enter the same locally provisioned data at the main LSMS. For more information, see Synchronizing Data Between the Main LSMS and Shadow LSMS.

The following sections provide an overview of each strategy. Detailed recovery procedures for each strategy are given in the sections Performing Disaster Recovery with an Active Shadow LSMS through Returning Operation from Shadow LSMS to Main LSMS.

Using an Active Shadow

Figure 7-1 shows the configuration of a main LSMS that uses an active shadow as its backup.

An active shadow LSMS is an entire LSMS that is active and has active associations with each NPAC from which the LSMS needs data (only one NPAC is shown in Figure 7-1).

Figure 7-1 Overview of Main LSMS and Active Shadow LSMS



The disaster recovery backup strategy for this configuration provides the least out-of-service time for the LSMS. The recovery procedures for this strategy are described in Performing Disaster Recovery with an Active Shadow LSMS.

Using an Inactive Shadow

Figure 7-2 shows the configuration of a main LSMS that uses an inactive shadow as its backup.

The shadow LSMS does not maintain active connections with the NPACs that supply data to the main LSMS. However, disaster recovery is still more feasible than using no shadow, especially for disaster situations in which the physical site of the main LSMS is damaged (such as fire or natural disaster).

Figure 7-2 Overview of Main LSMS and Inactive Shadow LSMS



With this configuration, during disaster recovery you need to restore all databases from the NPAC. The recovery procedures are described in Performing Disaster Recovery with an Inactive Shadow LSMS.

Using No Shadow

Figure 7-3 shows the configuration of a main LSMS that has no shadow as its backup.

Figure 7-3 Overview of Main LSMS without a Shadow LSMS



When no shadow LSMS exists, disaster recovery requires immediate repair of the main LSMS and its physical site, followed by restoration of all databases from the NPAC. The recovery procedures are described in Performing Disaster Recovery without a Shadow LSMS.

Synchronizing Data Between the Main LSMS and Shadow LSMS

Both NPAC data and locally provisioned data need to be synchronized between the main and shadow LSMS so that the shadow can take over when the main LSMS fails.

  • NPAC data synchronization occurs in one of the following ways:
    • With an active shadow, active connections from both main and active shadow to the NPACs allow transmission of the same NPAC data to both LSMSs.
    • With an inactive shadow, NPAC data is synchronized by loading files from a backup tape and/or downloading files from the NPAC to the inactive shadow LSMS.
  • Locally provisioned data must be manually entered at both the main LSMS and shadow LSMS.

    Note:

    When you log in to manually enter any locally provisioned data, always use the same service provider ID (SPID) at both the main LSMS and the shadow LSMS. Locally provisioned data is correlated with a SPID. In order for the data to be the same at the main LSMS and shadow LSMS, it must be entered with the same SPID at both LSMSs. The main LSMS and shadow LSMS must use different NPAC-assigned SPIDs for their association with the NPAC. You can create SPIDs used just for entering data, or you can use the main LSMS’s NPAC-assigned SPID for entering locally provisioned data at both the main LSMS and shadow LSMS.

    For information about manually entering locally provisioned data, refer to the Database Administrator's Guide.
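The SPID requirement in the note above lends itself to a simple consistency check. The following is a minimal sketch, assuming you can export locally provisioned records from each LSMS as a mapping of record key to the SPID used at entry. The export format, the record keys, and the function name are hypothetical and are not part of the LSMS software.

```python
from typing import Dict, List, Tuple

# Hypothetical export format: record key -> SPID used when the record was entered.
Records = Dict[str, str]


def find_spid_mismatches(main: Records, shadow: Records) -> List[Tuple[str, str, str]]:
    """Return (key, main_spid, shadow_spid) for every record whose SPID differs
    between the two systems. A record missing on one side is reported with an
    empty string for that side."""
    mismatches = []
    for key in sorted(set(main) | set(shadow)):
        main_spid = main.get(key, "")
        shadow_spid = shadow.get(key, "")
        if main_spid != shadow_spid:
            mismatches.append((key, main_spid, shadow_spid))
    return mismatches


# Illustrative data only; the keys and SPID values are made up.
main_records = {"gtt-group-1": "1234", "npa-split-919": "1234"}
shadow_records = {"gtt-group-1": "1234", "npa-split-919": "5678"}
print(find_spid_mismatches(main_records, shadow_records))
```

An empty result suggests that the locally provisioned data was entered with the same SPID at both sites; any entry in the list needs to be reentered with the matching SPID.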

Preparing for a Disaster Situation

For all recovery strategies, prepare for disaster situations by doing the following:

  • Make sure that the following conditions are true:
    • The main LSMS, any restored LSMS, and the shadow LSMS have the required software licenses. Use the procedure described in Verifying the Processes Running on the Active Server for each server on each LSMS; licenses are required for processes to run.
    • Hardware and software versions on the main and shadow LSMS are identical.
    • Any optional features are installed and configured on both the main and shadow LSMS.
  • Make sure the following items are always available and easy to locate:
    • The most recent database backup tape
    • TPD USB media
    • LSMS application USB media
    • Completed Disaster Recovery sheet, as shown in Recovery Preparation Worksheet.

In addition, if you use an active shadow LSMS, make sure the following conditions are true:

  • The shadow LSMS hardware has received the same required maintenance as the main LSMS.
  • You have the ability to connect to the shadow LSMS using the Secure Shell (ssh).
  • You have the ability to display LSMS applications on your workstation.
  • The network connections from the network elements to the shadow LSMS, which are critical during a disaster, have been periodically tested. Networks are often subject to frequent changes, and these changes can affect your connection between the shadow LSMS and the network elements.
  • Any data you have added, modified, or deleted on the main LSMS has also been added, modified, or deleted on the shadow LSMS.

At least annually, your site should prepare a drill in which the key personnel perform the disaster recovery procedure. This ensures that any potential problems or questions can be addressed in a non-emergency situation.

Determining When to Switch to Shadow LSMS

Switching to a shadow LSMS is the obvious solution in cases of fire or other destruction of the main LSMS site. In addition to these cases, some problems with the main LSMS may warrant switching to the shadow LSMS. These situations can be determined with the Surveillance feature.

If the Surveillance feature is active, it posts a notification every five minutes. If the Surveillance feature has detected an error, it posts a notification reporting the error. If no errors have been detected, the Surveillance feature posts the following “keep alive” message to indicate that the Surveillance feature is running, where <Host Name> indicates the host name of the server that is reporting the notification.


LSMS8000|14:58 Jun 22, 2000|<Host Name>|Keep alive

Absence of “keep alive” messages is an indication that a potential problem exists. Contact Oracle support for help in determining whether the problem warrants switching to the shadow LSMS.

For more information about the Surveillance feature, see Understanding the Surveillance Feature. For more information about Surveillance notifications, see Automatic Monitoring of Events.
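Detecting missing “keep alive” messages can be automated if the Surveillance notifications are captured as text. The following is a minimal sketch, assuming each notification is one pipe-delimited line in the format shown above. The 10-minute threshold (two missed five-minute intervals), the host name lsmspri, and the function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

TIMESTAMP_FORMAT = "%H:%M %b %d, %Y"  # matches "14:58 Jun 22, 2000"


def last_notification_time(lines: Iterable[str]) -> Optional[datetime]:
    """Return the latest timestamp found in Surveillance notification lines."""
    latest = None
    for line in lines:
        fields = line.strip().split("|")
        if len(fields) < 4 or not fields[0].startswith("LSMS"):
            continue  # not a Surveillance notification line
        try:
            stamp = datetime.strptime(fields[1], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # timestamp field not in the expected format
        if latest is None or stamp > latest:
            latest = stamp
    return latest


def surveillance_is_silent(lines: Iterable[str], now: datetime,
                           max_gap: timedelta = timedelta(minutes=10)) -> bool:
    """True if no notification was posted within max_gap before `now`.
    The default gap is an assumed threshold, not a documented value."""
    latest = last_notification_time(lines)
    return latest is None or now - latest > max_gap


# Hypothetical captured log with an assumed host name.
log = ["LSMS8000|14:58 Jun 22, 2000|lsmspri|Keep alive"]
print(surveillance_is_silent(log, datetime(2000, 6, 22, 15, 30)))
```

A silent result is only a prompt to investigate; whether to switch to the shadow LSMS remains a decision to make with support.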

Disaster Recovery Procedure Overview

Table 7-1 compares the procedures you should perform, and the order in which to perform them, for each disaster backup strategy. The following sections describe each disaster backup strategy in more detail and list any conditions assumed.

Table 7-1 Comparison of Recovery Procedures to Perform

Note: This table is for comparison; for detailed procedures by strategy, see Table 7-2 through Table 7-5.

| Recovery Procedure | Active Shadow (a) | Inactive Shadow (a) | No Shadow (b) | Restoring Operations to the Main LSMS After Running on Active Shadow (b) | Restoring Operations to the Main LSMS After Running on Inactive Shadow (b) |
| --- | --- | --- | --- | --- | --- |
| Repair or replace the LSMS | | | 1 | 1 | 1 |
| Recovery acceptance test | 1 | 1 | 2 | 2 | 2 |
| Contact each NPAC from which the LSMS needs data to request download files | | 2 | 3 | | 3 |
| Contact each NPAC from which the LSMS needs data to provide it with the IP address with which to establish association to the mate LSMS | | 3 | | 3 (c) | 4 |
| FTP data from NPAC and import it into the LSMS | | 4 | 4 (c) | 4 (c) | 5 (c) |
| Start LSMS GUI | | 5 | 5 | 5 | 6 |
| Add locally provisioned data that had been entered since last backup (or not already entered on mate LSMS) | 2 | 6 | 6 | * | * |
| Reconnect network elements | 3 | 7 | 7 | 6 | 7 |
| If the disaster outage has lasted 7 days or less, perform a time-range audit and reconcile to network elements and a full-range audit of DGTT, OGTT, and NPA Splits (otherwise perform a bulk download to network elements and then reassociate network elements) | 4 | 8 | 8 (c) | 7 (c) | 8 (c) |
| If query servers are installed, stop all directly connected query servers | 5 | 9 | | 8 | 9 |
| If query servers are installed, configure each directly connected query server to use the IP address of the mate LSMS for its master host | 6 | 10 | | 9 | 10 |
| If query servers are installed, reload each directly connected query server from the mate LSMS | 7 | 11 | 9 | 10 | 11 |
| Run on the shadow LSMS until main LSMS is restored | 8 | 12 | | | |
| Return operations to restored main LSMS | 9 (d) | 13 (d) | | | |

(a) Perform these procedures on the shadow LSMS.

(b) Perform these procedures on the main LSMS.

(c) Perform only as required.

(d) As described in Table 7-5 (and summarized in the rightmost columns of this table).

* Backups should always be scheduled immediately before switching from the shadow LSMS to the main LSMS; no additional data should have been locally provisioned.
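For scripting a checklist, the step order in Table 7-1 can be held in a small data structure. The following is a minimal sketch that encodes only the first three strategy columns (active shadow, inactive shadow, no shadow); the procedure names are shortened, and the structure and function name are illustrative assumptions, not part of the LSMS software.

```python
# Step labels copied from Table 7-1; None marks an empty cell.
# A trailing "c" or "d" is the table's footnote marker.
STRATEGIES = ("active", "inactive", "none")

TABLE_7_1 = [
    ("Repair or replace the LSMS", (None, None, "1")),
    ("Recovery acceptance test", ("1", "1", "2")),
    ("Request download files from each NPAC", (None, "2", "3")),
    ("Give each NPAC the IP address of the mate LSMS", (None, "3", None)),
    ("FTP data from NPAC and import it into the LSMS", (None, "4", "4c")),
    ("Start LSMS GUI", (None, "5", "5")),
    ("Add locally provisioned data", ("2", "6", "6")),
    ("Reconnect network elements", ("3", "7", "7")),
    ("Audit and reconcile, or bulk download", ("4", "8", "8c")),
    ("Stop directly connected query servers", ("5", "9", None)),
    ("Point query servers at the mate LSMS", ("6", "10", None)),
    ("Reload query servers from the mate LSMS", ("7", "11", "9")),
    ("Run on the shadow LSMS until main LSMS is restored", ("8", "12", None)),
    ("Return operations to restored main LSMS", ("9d", "13d", None)),
]


def steps_for(strategy):
    """Return (step label, procedure) pairs for one strategy, in step order."""
    col = STRATEGIES.index(strategy)
    rows = [(cells[col], name) for name, cells in TABLE_7_1 if cells[col]]
    # Sort on the numeric part of the label, ignoring footnote letters.
    return sorted(rows, key=lambda row: int(row[0].rstrip("cd")))


for label, procedure in steps_for("none"):
    print(label, procedure)
```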

Performing Disaster Recovery with an Active Shadow LSMS

In this configuration, an entire LSMS is active and has active associations with each NPAC from which the LSMS needs data. This disaster recovery backup strategy provides the least out-of-service time for the LSMS.

In addition to the assumptions listed in “Preparing for a Disaster Situation”, the following conditions are assumed:

  • Both the main LSMS and shadow LSMS are associated with each NPAC (up to eight) from which the LSMS needs data, and both the main LSMS and the shadow LSMS are receiving automatic updates. Each regional NPAC database at both LSMS sites is synchronized with the NPACs.

  • A network connection from each serviced network element to the shadow LSMS exists, but the network element is not associated with the shadow LSMS at the time the main LSMS fails.

  • Users, groups, and passwords are identically configured at the main LSMS and shadow LSMS.

  • Any data locally provisioned at the main LSMS is also locally provisioned at the shadow LSMS.

Perform the procedures shown in Table 7-2 on the shadow LSMS when a disaster occurs on the main LSMS.

Table 7-2 Recovery Procedures When LSMS Shadow Is Active

Active In the order shown, perform the following recovery procedures:

1

(Optional) Recovery acceptance test on active server of shadow LSMS:

  1. Verifying the State of the Servers

  2. Verifying the Processes Running on the Active Server (with primary server as active server)

  3. Verifying the GUI Operability on the Active Server (with primary server as active server)

Note:

Do not switch over to the shadow LSMS’s standby server until all EMSs have been resynchronized, because switching over immediately flushes all queued subscription data.

2

Add any locally provisioned data that may have been added to the main LSMS before it failed and has not yet been added to the active shadow.

3

Perform the procedures in “Reconnecting Network Elements” (start with step 4 and use the main LSMS as the source and the shadow LSMS as the destination).

4

For each network element, perform a time-range audit (specify the start time to be one hour before the outage occurred) and a full-range audit of DGTT, OGTT, and NPA Splits. For information about performing audits, refer to “Audit and Optional Reconcile from the LSMS GUI” in the LNP Database Synchronization User's Guide.

5, 6, 7

If any query servers are installed:

  1. Stop the directly connected query servers.

  2. Configure each directly connected query server to use the shadow LSMS as its master host (refer to the procedure described in “MySQL Replication Configuration for Query Servers” in the Configuration Guide).

  3. For each directly connected query server, perform the procedure in “Reload a Query Server Database from the LSMS”.

8

Run on the shadow LSMS until the main LSMS is restored.

9

After the main LSMS has been repaired, perform the procedures in “Returning Operation from Shadow LSMS to Main LSMS”.

Performing Disaster Recovery with an Inactive Shadow LSMS

In this disaster recovery strategy, you have a complete LSMS system installed at a geographically remote site, but it is not running and does not receive updates from the NPAC until you perform the procedures described in this section. This strategy requires a much longer recovery period than having an active shadow requires, but is still much safer than having no shadow. Having no shadow can result in a very long recovery period in serious disaster situations, such as fire or natural disaster.

In addition to the assumptions listed in “Preparing for a Disaster Situation”, the following conditions are assumed:

  • At the shadow site, all hardware and software components have already been installed and passed an acceptance test.
  • At the main LSMS, valid backups exist for all data. These backups are ready to be shipped to the shadow LSMS.
  • A network connection exists between the shadow LSMS and each network element and each NPAC. At the time of failure, the shadow LSMS is not associated with any of the network elements or NPACs.

Perform the procedures shown in Table 7-3 on the shadow LSMS when a disaster occurs on the main LSMS.

Table 7-3 Recovery Procedures When LSMS Shadow Is Inactive

Inactive In the order shown, perform the following recovery procedures:

1

Recovery acceptance test on inactive shadow LSMS:

  1. Verifying the State of the Servers

  2. Verifying the Processes Running on the Active Server (with primary server as active server)

  3. Verifying the GUI Operability on the Active Server (with primary server as active server)

  4. Manually Switching Over from the Active Server to the Standby Server

  5. Verifying the Processes Running on the Active Server (with secondary server as active server)

  6. Verifying the GUI Operability on the Active Server (with secondary server as active server)

  7. Manually Switching Over from the Active Server to the Standby Server

2, 3

Contact each NPAC from which the LSMS needs data to:

  • Provide them with the IP address with which to establish association to the shadow LSMS.

  • Request which files will be needed to download to the shadow LSMS. It is recommended that the request be for all NPAC files dated from one hour before the time shown on the backup tape.

4

FTP data from the NPAC and import it into the LSMS (see Downloading Files from an NPAC to the LSMS).

5

Start the LSMS GUI (association with each NPAC is automatically attempted).

6

At the shadow LSMS, manually enter any locally provisioned data that had been entered at the main LSMS since the last backup tape was made.

7

Perform the procedures described in “Reconnecting Network Elements”.

8

If the disaster outage has lasted for 7 days or less, for each network element, perform a time-range audit (specify the start time to be one hour before the outage occurred) and a full-range audit of DGTT, OGTT, and NPA Splits. For information about performing audits, refer to “Audit and Optional Reconcile from the LSMS GUI” in the LNP Database Synchronization User's Guide.

(If the disaster outage has lasted more than 7 days, perform a complete bulk download from the shadow LSMS to each network element. For information about performing bulk downloads to network elements, refer to the LNP Database Synchronization User's Guide.)

9, 10, 11

If any query servers are installed:

  1. Stop the directly connected query servers.

  2. Configure each directly connected query server to use the shadow LSMS as its master host (refer to the procedure described in “MySQL Replication Configuration for Query Servers” in the Configuration Guide).

  3. For each directly connected query server, perform the procedure in Reload a Query Server Database from the LSMS.

12

Run on the shadow LSMS until the main LSMS is restored.

13

After the main LSMS has been repaired, perform the procedures in “Returning Operation from Shadow LSMS to Main LSMS”.
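The choice in step 8 between auditing and bulk downloading depends only on how long the outage lasted. The following is a minimal sketch of that rule, using the 7-day limit and the one-hour margin stated in the step; the function name and the returned dictionary layout are illustrative assumptions.

```python
from datetime import datetime, timedelta

AUDIT_LIMIT = timedelta(days=7)      # outages of 7 days or less are audited
AUDIT_MARGIN = timedelta(hours=1)    # audit starts one hour before the outage


def plan_resync(outage_start: datetime, outage_end: datetime) -> dict:
    """Pick the network element resynchronization method for an outage."""
    if outage_end - outage_start <= AUDIT_LIMIT:
        return {
            "action": "audit",  # time-range audit plus full-range audit of DGTT, OGTT, NPA Splits
            "audit_start": outage_start - AUDIT_MARGIN,
        }
    # Longer outages call for a complete bulk download to each network element.
    return {"action": "bulk_download", "audit_start": None}


# Example dates are made up for illustration.
print(plan_resync(datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 3, 12, 0)))
```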

Performing Disaster Recovery without a Shadow LSMS

In this disaster backup strategy, you have no physical backup for the LSMS. In a disaster situation, you must restore the main LSMS. Having no shadow can result in a very long recovery period in serious disaster situations, such as fire or natural disaster.

In addition to the assumptions listed in “Preparing for a Disaster Situation”, the following conditions are assumed for this procedure:

  • The main LSMS is restored at the same physical site. If another site is used, you must perform site survey and preparation as you do for any initial LSMS installation. For more information about installing LSMS, refer to Application B Card Hardware and Installation Guide.

  • A network connection exists between the restored main LSMS and each NPAC and network element.

Perform the procedures shown in Table 7-4 to restore the main LSMS when a disaster occurs.

Table 7-4 Recovery Procedures When No LSMS Shadow Exists

No shadow In the order shown, perform the following recovery procedures:

1, 2

Contact Oracle support to arrange repair or replacement of the LSMS. Oracle will dispatch technicians who will perform repairs, return the LSMS to operational status, and perform recovery acceptance tests.

3

Contact each NPAC from which the LSMS needs data to request the files that need to be downloaded to the restored main LSMS. It is recommended that the request be for all NPAC files dated from one hour before the time shown on the backup tape.

4

FTP data from NPAC and import it into the LSMS (see Downloading Files from an NPAC to the LSMS).

5

Start the LSMS GUI (association with each NPAC is automatically attempted).

6

If any locally provisioned data needs to be added, add it.

7

Perform the procedures in “Reconnecting Network Elements”.

8

If the disaster outage has lasted for 7 days or less, for each network element, perform a time-range audit (specify the start time to be one hour before the outage occurred) and a full-range audit of DGTT, OGTT, and NPA Splits. For information about performing audits, refer to “Audit and Optional Reconcile from the LSMS GUI” in the LNP Database Synchronization User's Guide.

(If the disaster outage has lasted more than 7 days, perform a complete bulk download to each network element. For information about performing bulk downloads to network elements, refer to the LNP Database Synchronization User's Guide.)

9

If any query servers are installed, for each directly connected query server, perform the procedure in “Reload a Query Server Database from the LSMS”.

Returning Operation from Shadow LSMS to Main LSMS

Use the procedures described in this section to return operations from the shadow LSMS to the main LSMS after the main LSMS has been restored. Do not take the shadow LSMS out of service until you have completed this procedure, including the resynchronization of LNP data with the NPAC and network elements. If any problem occurs during the restoration of operations to the main LSMS, you can return to using the shadow LSMS.

In addition to the assumptions listed in “Preparing for a Disaster Situation”, the following conditions are assumed:

  • The main LSMS is restored at the same physical site. If another site is used, you must perform site survey and preparation as you do for any initial LSMS installation. For more information about installing LSMS, refer to Application B Card Hardware and Installation Guide.
  • A network connection exists between the restored main LSMS and each NPAC and network element.
  • Encryption keys have been exchanged between the NPAC and the restored main LSMS.
  • License keys are valid for the main LSMS.
  • At the main LSMS, valid backups exist for all data.
  • At a previously inactive shadow LSMS, valid backups exist for all data. A complete backup should be scheduled immediately before the scheduled return to the main LSMS, so that no locally provisioned data is entered after the switch back to the main LSMS.

Perform the procedures shown in Table 7-5 to restore the main LSMS.

Table 7-5 Procedures to Return Operations from Shadow LSMS to Main LSMS

Restoring Operations to the Main LSMS After Running on Active Shadow Main LSMS Restoring Operations to the Main LSMS After Running on Previously Inactive Shadow In the order shown in the appropriate column, perform the following recovery procedures:

1

1

Contact Oracle support to arrange repair or replacement of the LSMS. Oracle will dispatch technicians who will perform repairs and return the LSMS to operational status.

2

2

Recovery acceptance test or manufacturing acceptance test, depending on the severity of original failure (performed by technicians).

3

3

After Oracle support personnel have performed an acceptance test, you may wish to perform the following tests to verify that the restored main LSMS is fully functional:

  1. Verifying the State of the Servers

  2. Verifying the Processes Running on the Active Server (with primary server as active server)

  3. Verifying the GUI Operability on the Active Server (with primary server as active server)

  4. Manually Switching Over from the Active Server to the Standby Server

  5. Verifying the Processes Running on the Active Server (with secondary server as active server)

  6. Verifying the GUI Operability on the Active Server (with secondary server as active server)

  7. Manually Switching Over from the Active Server to the Standby Server

3

4

If any NPAC data may be updated during the period of time between when you plan to disconnect the shadow LSMS and connect with the main LSMS, contact each NPAC from which the LSMS needs data and request download files for that time period.

 

5

If returning from a shadow LSMS that was previously inactive, contact each NPAC from which the LSMS needs data to provide them with the IP address with which to establish association to the main LSMS.

4

6

If any download files were requested from any NPAC above, FTP the files and import them into the LSMS (see Downloading Files from an NPAC to the LSMS).

5

7

Start the LSMS GUI.

6

8

Perform the procedures in “Reconnecting Network Elements”, where the source LSMS is the shadow LSMS, and the destination LSMS is the main LSMS.

7

9

For each network element, perform a time-range audit (specify the start time to be one hour before the outage occurred) and a full-range audit of DGTT, OGTT, and NPA Splits. For information about performing audits, refer to “Audit and Optional Reconcile from the LSMS GUI” in the LNP Database Synchronization User's Guide.

8, 9, 10

10, 11, 12

If any query servers are installed:

  1. Stop the directly connected query servers.

  2. Configure each directly connected query server to use the main LSMS as its master host (refer to the procedure described in “MySQL Replication Configuration for Query Servers” in the Configuration Guide).

  3. For each directly connected query server, perform the procedure in “Reload a Query Server Database from the LSMS”.

Resynchronizing After an Outage Between an NPAC and the LSMS

When an outage between the LSMS and NPAC occurs, the LSMS attempts to resynchronize automatically as soon as the association is reestablished. The NPAC then resends to the LSMS all transactions that were missed by the LSMS.

Automatic Resynchronization between the NPAC and the LSMS

Whenever association is reestablished between the NPAC and the LSMS, the NPAC and the LSMS automatically resynchronize their databases. The time required for automatic resynchronization between an NPAC and the LSMS is directly proportional to the number of transactions that need to be sent. If you believe you have a large number of subscription version records, you can choose to perform a manual NPAC/LSMS recovery, as described in Downloading Files from an NPAC to the LSMS.

If the NPAC and the LSMS are unable to complete automatic recovery, one of the following notifications will display on the LSMS console window, where either PRIMARY or SECONDARY indicates the NPAC for which recovery is underway:


[Critical] 2018: 99-07-05 12:55:56 NPAC [<PRIMARY|SECONDARY>]  Recovery Failed

or


[Critical] 2019: 99-07-05 12:55:56 NPAC [<PRIMARY|SECONDARY>]  Recovery Partial Failure

If you receive one of these messages, perform the procedure described in Downloading Files from an NPAC to the LSMS using the example for performing a bulk download of files from the NPAC.
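If console output is captured to a file, the two recovery failure notifications can be picked out with a regular expression. The following is a minimal sketch, assuming that a real notification carries PRIMARY or SECONDARY inside the square brackets and otherwise matches the layout shown above; the function name and sample lines are illustrative.

```python
import re

# Matches notifications 2018 (Recovery Failed) and 2019 (Recovery Partial Failure).
RECOVERY_FAILURE = re.compile(
    r"\[Critical\]\s+(?P<code>2018|2019):\s+"
    r"(?P<date>\d{2}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"NPAC\s+\[(?P<npac>PRIMARY|SECONDARY)\]\s+(?P<text>Recovery .*)"
)


def npacs_needing_bulk_download(lines):
    """Return the NPACs (PRIMARY/SECONDARY) that reported a recovery failure."""
    failed = set()
    for line in lines:
        match = RECOVERY_FAILURE.search(line)
        if match:
            failed.add(match.group("npac"))
    return sorted(failed)


# Sample lines modeled on the notifications shown above.
console = [
    "[Critical] 2018: 99-07-05 12:55:56 NPAC [PRIMARY]  Recovery Failed",
    "[Critical] 2019: 99-07-05 12:55:56 NPAC [SECONDARY]  Recovery Partial Failure",
    "some unrelated console line",
]
print(npacs_needing_bulk_download(console))
```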

Reconnecting Network Elements

The following procedures explain how to reconnect the LSMS with network element software that manages database updates from the LSMS. Reconnecting is required in one of the following situations:

  • When you switch from the main LSMS to the shadow LSMS after a disaster has occurred

  • When you switch from the shadow LSMS back to the main LSMS after the main LSMS has been restored

  • When you restore an LSMS that had no shadow

Perform the procedures described in the following sections. (In these procedures, the “source LSMS” is the LSMS you switch from and the “destination LSMS” is the LSMS you switch to.)

  1. “Preparing to Reconnect Network Elements”

  2. “Reconnecting Network Elements Procedures”

    These procedures will be followed by automatic resynchronization as described in Automatic Resynchronization after Reconnect.

Preparing to Reconnect Network Elements

  1. Locate the completed Disaster Recovery Sheet, a current system backup tape, and a current database backup tape.
  2. Alert Oracle support that you are switching to the destination LSMS.
    Oracle support will remain online to provide support during this procedure.
  3. From the network element, enter the following command to verify that the destination LSMS is reachable, where <LSMS_IP_Address> is the IP address of the LSMS:
    > ping <LSMS_IP_Address>
  4. From the destination LSMS, enter the following command to verify that the network element (NE) is reachable:
    # ping <ELAP_IP_Address>
  5. If the destination LSMS is not already running, log in as a user in the lsmsadm group to the destination LSMS and start an LSMS GUI session.
    Verify that the destination LSMS is in stable condition by checking the following:
    1. Verify that there are no active alarm conditions.
      Because the destination LSMS is not connected with the EMS, there are always error messages regarding the network element queue level alarms and its connection with the LSMS. For a destination LSMS, these messages are normal. If the Surveillance feature is active, these normal messages will be notifications LSMS 0004 and LSMS 8003 or LSMS 8004. (For more information, see Automatic Monitoring of Events.)
    2. Verify that the NPACs are connected to the LSMS by examining the NPAC status area on a graphical user interface; verify that the NPAC icon for each supported NPAC displays green.
    3. Use the following method to verify that no LSMS hardware failure indications are present:
      If the Surveillance feature is active, verify that no hardware failure notifications (LSMS 4003, LSMS 2000, LSMS 0001, LSMS 4004, LSMS 4005, LSMS 4006, LSMS 4007, or LSMS 4009) have been posted. For more information about these notifications, see Automatic Monitoring of Events.
    4. Verify that the LSMS is not currently in recovery mode with any NPAC by ensuring that none of the following GUI notifications have been posted for any NPAC, where <PRIMARY|SECONDARY> indicates whether the NPAC to be connected is the primary NPAC or the secondary NPAC:
      
      [Critical]: <Timestamp> 2006: NPAC <PRIMARY|SECONDARY> Bind Timed Out - Auto retry after 2 min
      [Critical]: <Timestamp> 2007: NPAC <PRIMARY|SECONDARY> Connection Aborted by PEER - Auto retry same host after 2 min
      [Critical]: <Timestamp> 2008: NPAC <PRIMARY|SECONDARY> Connection Aborted by PEER - Auto retry other host after 2 min
      [Critical]: <Timestamp> 2009: NPAC <PRIMARY|SECONDARY> Connection Aborted by Provider - Auto retry same host after 2 min
      [Critical]: <Timestamp> 2010: NPAC <PRIMARY|SECONDARY> Connection Aborted due to recovery failure - Auto retry after 2 min
      [Critical]: <Timestamp> 2012: NPAC <PRIMARY|SECONDARY> Connection Attempt Failed : Access Control Failure
      [Critical]: <Timestamp> 2014: NPAC <PRIMARY|SECONDARY> Connection Attempt Failed : Access Denied
      [Critical]: <Timestamp> 2015: NPAC <PRIMARY|SECONDARY> Connection disconnected by NPAC
      [Critical]: <Timestamp> 2018: NPAC iiii Recovery Failed
      [Critical]: <Timestamp> 2019: NPAC iiii Recovery Partial Failure
      [Critical]: <Timestamp> 2020: NPAC iiii Security Violation. Association aborted
      
      Also, if the Surveillance feature is active, verify that none of the following Surveillance notifications have been posted for any NPAC, where xxxxxxx is the hostname of the server reporting the notification, <PRIMARY|SECONDARY> indicates the primary or secondary NPAC, <NPAC_cust_ID> is a numeric indicator for the NPAC region, and <NPAC_IP_address> is the IP address of the NPAC:
      
      LSMS2000|14:58 Jul 22, 1997|xxxxxxx|Notify:Sys Admin - NPAC interface failure
      LSMS2001|14:58 Jul 22, 1997|xxxxxxx|Notify:Sys Admin - NPAC= <PRIMARY|SECONDARY> - <NPAC_cust_ID>
      LSMS2002|14:58 Jul 22, 1997|xxxxxxx|Notify:Sys Admin - NPAC= <NPAC_IP_address>
      
      If any of these notifications has been posted, verify that the following GUI notifications have been posted for the same NPAC:
      
      [Cleared] 2025: <Timestamp>: NPAC <PRIMARY|SECONDARY> Connection Successfully established
      [Cleared] 8055: <Timestamp>: NPAC <PRIMARY|SECONDARY> Recovery Complete
      
Continue with the next procedure.
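The recovery-mode check in step 5 can be sketched as a small state tracker. The following is a minimal sketch, assuming the GUI notifications have already been reduced to (notification number, NPAC) pairs in the order they were posted. It treats a later pair of 2025 and 8055 notifications for the same NPAC as clearing the condition, which applies the check the text describes for Surveillance notifications to GUI notifications as well; that extension, the event format, and the function name are assumptions.

```python
# GUI notification numbers listed in step 5 as signs of recovery mode.
RECOVERY_CODES = {2006, 2007, 2008, 2009, 2010, 2012, 2014, 2015, 2018, 2019, 2020}
# Notifications that indicate the association is back and recovery is complete.
CLEARING_CODES = {2025, 8055}


def npacs_in_recovery(events):
    """Return NPACs that posted a recovery-mode notification not yet followed
    by both clearing notifications. `events` is an ordered iterable of
    (code, npac) pairs."""
    pending = {}  # npac -> clearing codes still awaited
    for code, npac in events:
        if code in RECOVERY_CODES:
            pending[npac] = set(CLEARING_CODES)
        elif code in CLEARING_CODES and npac in pending:
            pending[npac].discard(code)
            if not pending[npac]:
                del pending[npac]
    return sorted(pending)


# Illustrative sequence: PRIMARY recovers, SECONDARY does not.
events = [
    (2007, "PRIMARY"),
    (2025, "PRIMARY"),
    (2018, "SECONDARY"),
    (8055, "PRIMARY"),
]
print(npacs_in_recovery(events))
```

An empty result is consistent with the destination LSMS being out of recovery mode for every NPAC; it does not replace the alarm and hardware checks in the same step.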

Reconnecting Network Elements Procedures

Perform the following procedure:

  1. At the source LSMS, log in as lsmsadm on the active server.
  2. Enter the following command to display the status of all eagleagent processes: eagle status
    Scan the output for the CLLI of each active EAGLE agent, similar to the value 1190801 in the first column of the following example:
    
    CLLI     Pid    State     Resync    Conn A  Conn B   DCM   EBDA  Debug  Queue  Memory  CPU    Timestamp
    1190801  13622  A_ACTIVE  COMPLETE  ACTIVE  STANDBY  NONE  IDLE  OFF    0 %    71 M    0.1 %  13:00:40
    
  3. At the source LSMS, for each EAGLE agent process that is running, enter the following command to stop the EAGLE agent processes (<CLLI> is the Common Language Location Identifier for the EAGLE node):
    $LSMS_DIR/eagle stop <CLLI>

    For the example shown in step 2, you would enter the following command:

    $LSMS_DIR/eagle stop 1190801

  4. At the destination LSMS, for each network element serviced by the LSMS, do one of the following:
    • In an inactive shadow configuration, create the EMS for the given network element (refer to the Configuration Guide, “Creating an EMS Configuration Component”). When you finish creating the EMS, the sentryd process automatically starts the Eagle agent.
    • In an active shadow configuration, modify the EMS for the given network element (refer to the Configuration Guide, “Modifying an EMS Configuration Component”). Next, stop and restart the Eagle agent for the given CLLI using the following commands, then go to “Automatic Resynchronization after Reconnect”.

      $LSMS_DIR/eagle stop <CLLI>

      $LSMS_DIR/eagle start <CLLI>

    Next, the LSMS and the network elements will automatically resynchronize as described in “Automatic Resynchronization after Reconnect”.
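Steps 2 and 3 of this procedure can be scripted when many EAGLE agents are running. The following is a minimal sketch, assuming the status output has a header row that starts with CLLI and one whitespace-separated row per agent with the CLLI in the first column and the state in the third. Treating any state that ends in ACTIVE as a running agent is an assumption based on the single example row; the function names are illustrative, and the sketch only builds the command strings without running them.

```python
def active_cllis(status_output: str):
    """Extract the CLLI of each agent whose State column ends in ACTIVE."""
    cllis = []
    for line in status_output.splitlines():
        fields = line.split()
        # Skip the header row and lines too short to hold CLLI, Pid, and State.
        if len(fields) < 3 or fields[0] == "CLLI":
            continue
        if fields[2].endswith("ACTIVE"):
            cllis.append(fields[0])
    return cllis


def stop_commands(cllis):
    """Build the stop command from step 3 for each CLLI."""
    return [f"$LSMS_DIR/eagle stop {clli}" for clli in cllis]


# Sample output modeled on the example in step 2.
sample = (
    "CLLI     Pid    State     Resync    Conn A  Conn B   DCM   EBDA  Debug  Queue  Memory  CPU    Timestamp\n"
    "1190801  13622  A_ACTIVE  COMPLETE  ACTIVE  STANDBY  NONE  IDLE  OFF    0 %    71 M    0.1 %  13:00:40\n"
)
print(stop_commands(active_cllis(sample)))
```

Reviewing the generated commands before entering them keeps an operator in the loop, which matters because stopping an agent disconnects a network element.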

Automatic Resynchronization after Reconnect

When the LSMS and MPS are reconnected, the LSMS automatically starts resynchronizing the databases. For more information, see “Automatic Resynchronization Process” in the LNP Database Synchronization User's Guide. If the LSMS cannot complete automatic resynchronization, it posts a notification to the LSMS GUI. For more information, refer to “Notifications that Database Maintenance Is Required” in the LNP Database Synchronization User's Guide.

If the Surveillance feature is active, the following Surveillance notification is also posted, where <Host Name> is the hostname and <CLLI> is the 11-character CLLI code of the network element:


LSMS8001|14:58 Jul 22, 1997|<Host Name>|Notify:Sys Admin - NE CLLI=<CLLI>
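To find out which network element failed to resynchronize, the CLLI can be extracted from the notification text. The following is a minimal sketch, assuming the pipe-delimited layout shown above; the host name and CLLI in the sample line are made up, and the function name is illustrative.

```python
from typing import Optional

CLLI_MARKER = "NE CLLI="


def clli_from_notification(line: str) -> Optional[str]:
    """Return the CLLI from an LSMS8001 Surveillance notification, or None
    if the line is a different notification or has an unexpected layout."""
    fields = line.strip().split("|")
    if len(fields) != 4 or fields[0] != "LSMS8001":
        return None
    position = fields[3].find(CLLI_MARKER)
    if position < 0:
        return None
    return fields[3][position + len(CLLI_MARKER):].strip()


# Hypothetical host name (lsmspri) and CLLI (STPAEAGLE01).
line = "LSMS8001|14:58 Jul 22, 1997|lsmspri|Notify:Sys Admin - NE CLLI=STPAEAGLE01"
print(clli_from_notification(line))
```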