7 Recovering from Site Failures
This chapter describes and compares various disaster backup strategies and describes how to prepare for disaster recovery. For each disaster recovery strategy, this chapter also describes the recovery procedures and a list of assumptions.
Introduction
The LSMS system administrator needs to plan a recovery strategy for situations when both the LSMS active server and the standby server are unable to receive data from the NPAC. This occurs when the LSMS hardware is unable to operate, perhaps due to a fire or a natural disaster.
Choosing a Disaster Backup Strategy
Choose one of the following backup strategies. In these strategies, a shadow LSMS is an entire LSMS, with its own service provider ID, located in a separate geographical location from the main LSMS:
- Active shadow
- Inactive shadow
- No shadow
The various backup strategies provide different methods for ensuring that the shadow LSMS contains the same data as the main LSMS.
Note:
Whenever you must manually enter locally provisioned data at the shadow LSMS, be sure that you use the same service provider identifier (SPID) that was used to enter the same locally provisioned data at the main LSMS. For more information, see Synchronizing Data Between the Main LSMS and Shadow LSMS.
The following sections provide an overview of each strategy. Detailed recovery procedures for each strategy are described in the sections Performing Disaster Recovery with an Active Shadow LSMS through Returning Operation from Shadow LSMS to Main LSMS.
Using an Active Shadow
Figure 7-1 shows the configuration of a main LSMS that uses an active shadow as its backup.
An active shadow LSMS is an entire LSMS that is active and has active associations with each NPAC from which the LSMS needs data (only one NPAC is shown in Figure 7-1).
Figure 7-1 Overview of Main LSMS and Active Shadow LSMS

The disaster recovery backup strategy for this configuration provides the least out-of-service time for the LSMS. The recovery procedures for this strategy are described in Performing Disaster Recovery with an Active Shadow LSMS.
Using an Inactive Shadow
Figure 7-2 shows the configuration of a main LSMS that uses an inactive shadow as its backup.
The shadow LSMS does not maintain active connections with the NPACs that supply data to the main LSMS. However, disaster recovery is still more feasible than using no shadow, especially for disaster situations in which the physical site of the main LSMS is damaged (such as fire or natural disaster).
Figure 7-2 Overview of Main LSMS and Inactive Shadow LSMS

With this configuration, during disaster recovery you need to restore all databases from the NPAC. The recovery procedures are described in Performing Disaster Recovery with an Inactive Shadow LSMS.
Using No Shadow
Figure 7-3 shows the configuration of a main LSMS that has no shadow as its backup.
Figure 7-3 Overview of Main LSMS without a Shadow LSMS

When no shadow LSMS exists, disaster recovery requires immediate repair of the main LSMS and its physical site, followed by restoration of all databases from the NPAC. The recovery procedures are described in Performing Disaster Recovery without a Shadow LSMS.
Synchronizing Data Between the Main LSMS and Shadow LSMS
Both NPAC data and locally provisioned data need to be synchronized between the main and shadow LSMS so that the shadow can take over when the main LSMS fails.
- NPAC data synchronization occurs in one of the following ways:
- With an active shadow, active connections from both main and active shadow to the NPACs allow transmission of the same NPAC data to both LSMSs.
- With an inactive shadow, NPAC data is synchronized by loading files from a backup tape and/or downloading files from the NPAC to the inactive shadow LSMS.
- Locally provisioned data must be manually entered at both the main LSMS and shadow LSMS.
Note:
When you log in to manually enter any locally provisioned data, always use the same service provider ID (SPID) at both the main LSMS and the shadow LSMS. Locally provisioned data is correlated with a SPID. In order for the data to be the same at the main LSMS and shadow LSMS, it must be entered with the same SPID at both LSMSs. The main LSMS and shadow LSMS must use different NPAC-assigned SPIDs for their association with the NPAC. You can create SPIDs used just for entering data, or you can use the main LSMS’s NPAC-assigned SPID for entering locally provisioned data at both the main LSMS and shadow LSMS.
For information about manually entering locally provisioned data, refer to the Database Administrator's Guide.
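Because locally provisioned data is correlated with a SPID, a record entered under different SPIDs at the two sites does not count as the same data. The following is a minimal sketch of how such a mismatch could be detected, assuming the locally provisioned records from each LSMS have already been collected into memory. The `LocalRecord` fields, the `compare_local_data` function, and the SPID values are hypothetical and are not part of the LSMS software.

```python
from __future__ import annotations

from typing import NamedTuple


class LocalRecord(NamedTuple):
    """Hypothetical shape of one locally provisioned record."""
    spid: str   # SPID used when the record was entered
    kind: str   # record type label (illustrative only)
    key: str    # value that identifies the record
    value: str  # provisioned content


def compare_local_data(main: set[LocalRecord],
                       shadow: set[LocalRecord]) -> dict[str, set[LocalRecord]]:
    """Report records present on one LSMS but not on the other."""
    return {
        "missing_on_shadow": main - shadow,
        "missing_on_main": shadow - main,
    }


# Made-up sample data for illustration.
main_records = {
    LocalRecord("AB12", "example-type", "key-1", "value-1"),
    LocalRecord("AB12", "example-type", "key-2", "value-2"),
}
shadow_records = {
    LocalRecord("AB12", "example-type", "key-1", "value-1"),
    # Same data, but entered under a different SPID: counts as a mismatch.
    LocalRecord("CD34", "example-type", "key-2", "value-2"),
}
diff = compare_local_data(main_records, shadow_records)
```

Including the SPID in the comparison key means that data entered with the wrong SPID is reported as missing on both sides, which matches the requirement stated in the note.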
Preparing for a Disaster Situation
For all recovery strategies, prepare for disaster situations by doing the following:
- Make sure that the following conditions are true:
- The main LSMS, any restored LSMS, and the shadow LSMS have the required software licenses. Use the procedure described in Verifying the Processes Running on the Active Server for each server on each LSMS; licenses are required for processes to run.
- Hardware and software versions on the main and shadow LSMS are identical.
- Any optional features are installed and configured on both the main and shadow LSMS.
- Make sure the following items are always available and easy to locate:
- The most recent database backup tape
- TPD USB media
- LSMS application USB media
- Completed Disaster Recovery sheet, as shown in Recovery Preparation Worksheet.
In addition, if you use an active shadow LSMS, make sure the following conditions are true:
- The shadow LSMS hardware has received the same required maintenance as the main LSMS.
- You have the ability to connect to the shadow LSMS using the Secure Shell (ssh).
- You have the ability to display LSMS applications on your workstation.
- The network connections from the network elements to the shadow LSMS, which are critical during a disaster, have been periodically tested. Networks are often subject to frequent changes, and these changes can affect your connection between the shadow LSMS and the network elements.
- Any data you have added, modified, or deleted on the main LSMS has also been added, modified, and deleted on the shadow LSMS.
At least annually, your site should prepare a drill in which the key personnel perform the disaster recovery procedure. This ensures that any potential problems or questions can be addressed in a non-emergency situation.
Determining When to Switch to Shadow LSMS
Switching to a shadow LSMS is the obvious solution in cases of fire or other destruction of the main LSMS site. In addition to these cases, some problems with the main LSMS may warrant switching to the shadow LSMS. These situations can be determined with the Surveillance feature.
If the Surveillance feature is active, it posts a notification every five minutes. If the Surveillance feature has detected an error, it posts a notification reporting the error. If no errors have been detected, the Surveillance feature posts the following “keep alive” message to indicate that the Surveillance feature is running, where <Host Name> indicates the host name of the server that is reporting the notification.
LSMS8000|14:58 Jun 22, 2000|<Host Name>|Keep alive
Absence of “keep alive” messages indicates that a potential problem exists. Contact Oracle support for help in determining whether the problem warrants switching to the shadow LSMS.
For more information about the Surveillance feature, see Understanding the Surveillance Feature. For more information about Surveillance notifications, see Automatic Monitoring of Events.
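Since the Surveillance feature is described as posting a notification every five minutes, a gap noticeably longer than that in a captured notification stream suggests that messages are missing. The sketch below illustrates one way to look for such gaps, assuming the notifications have been saved as text lines in the pipe-delimited layout shown above. The function names, the one-minute tolerance, and the host name `lsmspri` are assumptions made for illustration.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Interval stated in this chapter for Surveillance notifications.
NOTIFICATION_INTERVAL = timedelta(minutes=5)


def parse_notification(line: str) -> tuple[str, datetime, str, str]:
    """Split 'EVENT|HH:MM Mon DD, YYYY|host|text' into its four fields."""
    event, stamp, host, text = line.strip().split("|", 3)
    # %b expects English month abbreviations (assumes an English/C locale).
    when = datetime.strptime(stamp, "%H:%M %b %d, %Y")
    return event, when, host, text


def find_gaps(lines: list[str],
              tolerance: timedelta = timedelta(minutes=1)
              ) -> list[tuple[str, datetime, datetime]]:
    """Return (host, previous, next) wherever notifications arrive too far apart."""
    last_seen: dict[str, datetime] = {}
    gaps = []
    for line in lines:
        _, when, host, _ = parse_notification(line)
        previous = last_seen.get(host)
        if previous is not None and when - previous > NOTIFICATION_INTERVAL + tolerance:
            gaps.append((host, previous, when))
        last_seen[host] = when
    return gaps


# Hypothetical captured notifications; 'lsmspri' is a made-up host name.
sample = [
    "LSMS8000|14:58 Jun 22, 2000|lsmspri|Keep alive",
    "LSMS8000|15:03 Jun 22, 2000|lsmspri|Keep alive",
    "LSMS8000|15:23 Jun 22, 2000|lsmspri|Keep alive",
]
gaps = find_gaps(sample)
```

A detected gap is only a prompt to investigate; whether it warrants switching to the shadow LSMS remains a decision to make with support personnel.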
Disaster Recovery Procedure Overview
Table 7-1 compares the procedures you should perform, and the order in which to perform them, according to the disaster backup strategy you are using. The sections that follow describe each disaster backup strategy in more detail and list any conditions assumed.
Table 7-1 Comparison of Recovery Procedures to Perform
Recovery Procedure (Note: This table is for comparison; for detailed procedures by strategy, see the sections that follow.) | Active Shadow (a) | Inactive Shadow (a) | No Shadow (b) | Restoring Operations to the Main LSMS After Running on Active Shadow (b) | Restoring Operations to the Main LSMS After Running on Inactive Shadow (b) |
---|---|---|---|---|---|
Repair or replace the LSMS |
1 |
1 |
1 |
||
Recovery acceptance test |
1 |
1 |
2 |
2 |
2 |
Contact each NPAC from which the LSMS needs data to request download files |
2 |
3 |
3 |
||
Contact each NPAC from which the LSMS needs data to provide it with the IP address with which to establish association to the mate LSMS |
3 |
3c |
4 |
||
FTP data from NPAC and import it into the LSMS |
4 |
4c |
4c |
5c |
|
Start LSMS GUI |
5 |
5 |
5 |
6 |
|
Add locally provisioned data that had been entered since last backup (or not already entered on mate LSMS) |
2 |
6 |
6 |
* |
* |
Reconnect network elements |
3 |
7 |
7 |
6 |
7 |
If the disaster outage has lasted 7 days or less, perform a time range audit and reconcile to network elements and a full-range audit of DGTT, OGTT, and NPA-Splits (otherwise perform a bulk download to network elements and then reassociate network elements) |
4 |
8 |
8c |
7c |
8c |
If query servers are installed, stop all directly connected query servers |
5 |
9 |
8 |
9 |
|
If query servers are installed, configure each directly connected query server to use the IP address of the mate LSMS for its master host |
6 |
10 |
9 |
10 |
|
If query servers are installed, reload each directly connected query server from the mate LSMS |
7 |
11 |
9 |
10 |
11 |
Run on the shadow LSMS until main LSMS is restored |
8 |
12 |
|||
Return operations to restored main LSMS |
9d |
13d |
|||
(a) Perform these procedures on the shadow LSMS.
(b) Perform these procedures on the main LSMS.
(c) Perform only as required.
(d) As described in Table 7-5 (and summarized in the rightmost columns of this table).
* Backups should always be scheduled immediately before switching from the shadow LSMS to the main LSMS; no additional data should have been locally provisioned.
Performing Disaster Recovery with an Active Shadow LSMS
In this configuration, an entire LSMS is active and has active associations with each NPAC from which the LSMS needs data. This disaster recovery backup strategy provides the least out-of-service time for the LSMS.
In addition to the assumptions listed in “Preparing for a Disaster Situation”, the following conditions are assumed:
- Both the main LSMS and shadow LSMS are associated with each NPAC (up to eight) from which the LSMS needs data, and both the main LSMS and the shadow LSMS are receiving automatic updates. Each regional NPAC database at both LSMS sites is synchronized with the NPACs.
- A network connection from each serviced network element to the shadow LSMS exists, but the network element is not associated with the shadow LSMS at the time the main LSMS fails.
- Users, groups, and passwords are identically configured at the main LSMS and shadow LSMS.
- Any data locally provisioned at the main LSMS is also locally provisioned at the shadow LSMS.
Perform the procedures shown in Table 7-2 on the shadow LSMS when a disaster occurs on the main LSMS.
Table 7-2 Recovery Procedures When LSMS Shadow Is Active
Active | In the order shown, perform the following recovery procedures: |
---|---|
1 | (Optional) Recovery acceptance test on active server of shadow LSMS. Note: Do not switch over to the shadow LSMS’s standby server until all EMSs have been resynchronized, because all queued subscription data would be immediately flushed. |
2 | Add any locally provisioned data that may have been added to the main LSMS before it failed and has not yet been added to the active shadow. |
3 | Perform the procedures in “Reconnecting Network Elements” (start with step 4 and use the main LSMS as the source and the shadow LSMS as the destination). |
4 | For each network element, perform a time-range audit (specify the start time to be one hour before the outage occurred) and a full-range audit of DGTT, OGTT, and NPA Splits. For information about performing audits, refer to “Audit and Optional Reconcile from the LSMS GUI” in the LNP Database Synchronization User's Guide. |
5, 6, 7 | If any query servers are installed: stop all directly connected query servers, configure each directly connected query server to use the IP address of the mate LSMS for its master host, and reload each directly connected query server from the mate LSMS. |
8 | Run on the shadow LSMS until the main LSMS is restored. |
9 | After the main LSMS has been repaired, perform the procedures in “Returning Operation from Shadow LSMS to Main LSMS”. |
Performing Disaster Recovery with an Inactive Shadow LSMS
In this disaster recovery strategy, you have a complete LSMS system installed at a geographically remote site, but it is not running and does not receive updates from the NPAC until you perform the procedures described in this section. This strategy requires a much longer recovery period than having an active shadow requires, but is still much safer than having no shadow. Having no shadow can result in a very long recovery period in serious disaster situations, such as fire or natural disaster.
In addition to the assumptions listed in “Preparing for a Disaster Situation”, the following conditions are assumed:
- At the shadow site, all hardware and software components have already been installed and passed an acceptance test.
- At the main LSMS, valid backups exist for all data. These backups are ready to be shipped to the shadow LSMS.
- A network connection exists between the shadow LSMS and each network element and each NPAC. At the time of failure, the shadow LSMS is not associated with any of the network elements or NPACs.
Perform the procedures shown in Table 7-3 on the shadow LSMS when a disaster occurs on the main LSMS.
Table 7-3 Recovery Procedures When LSMS Shadow Is Inactive
Inactive | In the order shown, perform the following recovery procedures: |
---|---|
1 | Recovery acceptance test on inactive shadow LSMS. |
2, 3 | Contact each NPAC from which the LSMS needs data to request download files and to provide it with the IP address with which to establish association to the mate LSMS. |
4 | FTP data from the NPAC and import it into the LSMS (see Downloading Files from an NPAC to the LSMS). |
5 | Start the LSMS GUI (association with each NPAC is automatically attempted). |
6 | At the shadow LSMS, manually enter any locally provisioned data that had been entered at the main LSMS since the last backup tape was made. |
7 | Perform the procedures described in “Reconnecting Network Elements”. |
8 | If the disaster outage has lasted for 7 days or less, for each network element, perform a time-range audit (specify the start time to be one hour before the outage occurred) and a full-range audit of DGTT, OGTT, and NPA Splits. For information about performing audits, refer to “Audit and Optional Reconcile from the LSMS GUI” in the LNP Database Synchronization User's Guide. (If the disaster outage has lasted more than 7 days, perform a complete bulk download from the shadow LSMS to each network element. For information about performing bulk downloads to network elements, refer to the LNP Database Synchronization User's Guide.) |
9, 10, 11 | If any query servers are installed: stop all directly connected query servers, configure each directly connected query server to use the IP address of the mate LSMS for its master host, and reload each directly connected query server from the mate LSMS. |
12 | Run on the shadow LSMS until the main LSMS is restored. |
13 | After the main LSMS has been repaired, perform the procedures in “Returning Operation from Shadow LSMS to Main LSMS”. |
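The choice in step 8 between an audit and a bulk download depends only on the length of the outage. The sketch below illustrates that rule, using the 7-day threshold and the one-hour lead time stated in the table. The function name `plan_resync` and the shape of the returned dictionary are assumptions made for illustration and do not correspond to any LSMS command.

```python
from datetime import datetime, timedelta

# Values taken from the recovery procedure tables in this chapter.
MAX_OUTAGE_FOR_AUDIT = timedelta(days=7)
AUDIT_LEAD_TIME = timedelta(hours=1)


def plan_resync(outage_start: datetime, recovered_at: datetime) -> dict:
    """Pick the network element resynchronization method for an outage."""
    if recovered_at < outage_start:
        raise ValueError("recovery time precedes outage start")
    if recovered_at - outage_start <= MAX_OUTAGE_FOR_AUDIT:
        return {
            "method": "audit",
            # Start the time-range audit one hour before the outage began.
            "time_range_audit_start": outage_start - AUDIT_LEAD_TIME,
            "full_range_audit": ("DGTT", "OGTT", "NPA Splits"),
        }
    # Longer outages call for a complete bulk download to each network element.
    return {"method": "bulk download"}


# Made-up dates for illustration.
start = datetime(2024, 3, 1, 12, 0)
plan = plan_resync(start, start + timedelta(days=2))
```

For the two-day outage in this example, the plan is an audit whose time range begins at 11:00 on the day of the outage.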
Performing Disaster Recovery without a Shadow LSMS
In this disaster backup strategy, you have no physical backup for the LSMS. In a disaster situation, you must restore the main LSMS. Having no shadow can result in a very long recovery period in serious disaster situations, such as fire or natural disaster.
In addition to the assumptions listed in “Preparing for a Disaster Situation”, the following conditions are assumed for this procedure:
- The main LSMS is restored at the same physical site. If another site is used, you must perform site survey and preparation as you do for any initial LSMS installation. For more information about installing LSMS, refer to Application B Card Hardware and Installation Guide.
- A network connection exists between the restored main LSMS and each NPAC and network element.
Perform the procedures shown in Table 7-4 to restore the main LSMS when a disaster occurs.
Table 7-4 Recovery Procedures When No LSMS Shadow Exists
No shadow | In the order shown, perform the following recovery procedures: |
---|---|
1, 2 | Contact Oracle support to arrange repair or replacement of the LSMS. Oracle will dispatch technicians who will perform repairs, return the LSMS to operational status, and perform recovery acceptance tests. |
3 | Contact each NPAC from which the LSMS needs data to request the files that need to be downloaded to the restored main LSMS. It is recommended that the request be for all NPAC files dated from one hour before the time shown on the backup tape. |
4 | FTP data from the NPAC and import it into the LSMS (see Downloading Files from an NPAC to the LSMS). |
5 | Start the LSMS GUI (association with each NPAC is automatically attempted). |
6 | If any locally provisioned data needs to be added, add it. |
7 | Perform the procedures in “Reconnecting Network Elements”. |
8 | If the disaster outage has lasted for 7 days or less, for each network element, perform a time-range audit (specify the start time to be one hour before the outage occurred) and a full-range audit of DGTT, OGTT, and NPA Splits. For information about performing audits, refer to “Audit and Optional Reconcile from the LSMS GUI” in the LNP Database Synchronization User's Guide. (If the disaster outage has lasted more than 7 days, perform a complete bulk download to each network element. For information about performing bulk downloads to network elements, refer to the LNP Database Synchronization User's Guide.) |
9 | If any query servers are installed, for each directly connected query server, perform the procedure in “Reload a Query Server Database from the LSMS”. |
Returning Operation from Shadow LSMS to Main LSMS
Use the procedures described in this section to return operations from the shadow LSMS to the main LSMS after the main LSMS has been restored. Do not take the shadow LSMS out of service until you have completed this procedure, including the resynchronization of LNP data with the NPAC and network elements. If any problem occurs during the restoration of operations to the main LSMS, you can return to using the shadow LSMS.
In addition to the assumptions listed in “Preparing for a Disaster Situation”, the following conditions are assumed:
- The main LSMS is restored at the same physical site. If another site is used, you must perform site survey and preparation as you do for any initial LSMS installation. For more information about installing LSMS, refer to Application B Card Hardware and Installation Guide.
- A network connection exists between the restored main LSMS and each NPAC and network element.
- Encryption keys have been exchanged between the NPAC and the restored main LSMS.
- License keys are valid for the main LSMS.
- At the main LSMS, valid backups exist for all data.
- At a previously inactive shadow LSMS, valid backups exist for all data. A complete backup should be scheduled immediately before the scheduled return to the main LSMS, so that no locally provisioned data is entered between the backup and the switch back to the main LSMS.
Perform the procedures shown in Table 7-5 to restore the main LSMS.
Table 7-5 Procedures to Return Operations from Shadow LSMS to Main LSMS
Restoring Operations to the Main LSMS After Running on Active Shadow | Restoring Operations to the Main LSMS After Running on Previously Inactive Shadow | In the order shown in the appropriate column, perform the following recovery procedures: |
---|---|---|
1 | 1 | Contact Oracle support to arrange repair or replacement of the LSMS. Oracle will dispatch technicians who will perform repairs and return the LSMS to operational status. |
2 | 2 | Recovery acceptance test or manufacturing acceptance test, depending on the severity of the original failure (performed by technicians). |
3 | 3 | After the Oracle support personnel have performed an acceptance test, customers may wish to perform the following tests to verify that the restored main LSMS is fully functional: |
3 | 4 | If any NPAC data may be updated during the period of time between when you plan to disconnect the shadow LSMS and connect with the main LSMS, contact each NPAC from which the LSMS needs data and request download files for that time period. |
n/a | 5 | If returning from a shadow LSMS that was previously inactive, contact each NPAC from which the LSMS needs data to provide them with the IP address with which to establish association to the main LSMS. |
4 | 6 | If any download files were requested from any NPAC above, FTP the files and import them into the LSMS (see Downloading Files from an NPAC to the LSMS). |
5 | 7 | Start the LSMS GUI. |
6 | 8 | Perform the procedures in “Reconnecting Network Elements”, where the source LSMS is the shadow LSMS, and the destination LSMS is the main LSMS. |
7 | 9 | For each network element, perform a time-range audit (specify the start time to be one hour before the outage occurred) and a full-range audit of DGTT, OGTT, and NPA Splits. For information about performing audits, refer to “Audit and Optional Reconcile from the LSMS GUI” in the LNP Database Synchronization User's Guide. |
8, 9, 10 | 10, 11, 12 | If any query servers are installed: stop all directly connected query servers, configure each directly connected query server to use the IP address of the mate LSMS for its master host, and reload each directly connected query server from the mate LSMS. |
Resynchronizing After an Outage Between an NPAC and the LSMS
When an outage between the LSMS and NPAC occurs, the LSMS attempts to resynchronize automatically as soon as the association is reestablished. The NPAC then resends to the LSMS all transactions that were missed by the LSMS.
Automatic Resynchronization between the NPAC and the LSMS
Whenever association is reestablished between the NPAC and the LSMS, the NPAC and the LSMS automatically resynchronize their databases. The time required for automatic resynchronization between an NPAC and the LSMS is directly proportional to the number of transactions that need to be sent. If you believe you have a lot of subscription version records, you can choose to perform a manual NPAC/LSMS recovery, as described in Downloading Files from an NPAC to the LSMS.
If the NPAC and the LSMS are unable to complete automatic recovery, one of the following notifications will display on the LSMS console window, where either PRIMARY or SECONDARY indicates the NPAC for which recovery is underway:
[Critical] 2018: 99-07-05 12:55:56 NPAC [<PRIMARY|SECONDARY>] Recovery Failed
or
[Critical] 2019: 99-07-05 12:55:56 NPAC [<PRIMARY|SECONDARY>] Recovery Partial Failure
If you receive one of these messages, perform the procedure described in Downloading Files from an NPAC to the LSMS using the example for performing a bulk download of files from the NPAC.
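If console output is captured to a log, the two notifications above can be recognized mechanically. The following sketch shows one possible approach, assuming that the square brackets in the message are literal and that the placeholder is replaced by the single word PRIMARY or SECONDARY. The function name `parse_recovery_failure` and the returned field names are hypothetical.

```python
from __future__ import annotations

import re

# Matches the two recovery notifications shown above (event numbers 2018 and 2019).
RECOVERY_FAILURE = re.compile(
    r"^\[(?P<severity>\w+)\]\s+(?P<number>\d+):\s+"
    r"(?P<date>\d{2}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"NPAC\s+\[(?P<npac>PRIMARY|SECONDARY)\]\s+"
    r"Recovery\s+(?P<result>Failed|Partial Failure)$"
)


def parse_recovery_failure(line: str) -> dict[str, str] | None:
    """Return the notification fields, or None if the line is something else."""
    match = RECOVERY_FAILURE.match(line.strip())
    return match.groupdict() if match else None


failed = parse_recovery_failure(
    "[Critical] 2018: 99-07-05 12:55:56 NPAC [PRIMARY] Recovery Failed")
partial = parse_recovery_failure(
    "[Critical] 2019: 99-07-05 12:55:56 NPAC [SECONDARY] Recovery Partial Failure")
```

A non-`None` result for either message indicates that the bulk download procedure referenced above applies to the NPAC named in the `npac` field.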
Reconnecting Network Elements
The following procedures explain how to reconnect the LSMS with network element software that manages database updates from the LSMS. Reconnecting is required in one of the following situations:
- When you switch from the main LSMS to the shadow LSMS after a disaster has occurred
- When you switch from the shadow LSMS back to the main LSMS after the main LSMS has been restored
- When you restore an LSMS that had no shadow
Perform the procedures described in the following sections. (In these procedures, the “source LSMS” is the LSMS you switch from and the “destination LSMS” is the LSMS you switch to.)
- “Reconnecting Network Elements Procedures”
These procedures are followed by automatic resynchronization, as described in Automatic Resynchronization after Reconnect.
Preparing to Reconnect Network Elements
Reconnecting Network Elements Procedures
Perform the following procedure:
Automatic Resynchronization after Reconnect
When the LSMS and MPS are reconnected, the LSMS automatically starts resynchronizing the databases. For more information, see “Automatic Resynchronization Process” in the LNP Database Synchronization User's Guide. If the LSMS cannot complete automatic resynchronization, it posts a notification to the LSMS GUI. For more information, refer to “Notifications that Database Maintenance Is Required” in the LNP Database Synchronization User's Guide.
If the Surveillance feature is active, the following Surveillance notification is also posted, where <Host Name> is the hostname and <CLLI> is the 11-character CLLI code of the network element:
LSMS8001|14:58 Jul 22, 1997|<Host Name>|Notify:Sys Admin - NE CLLI=<CLLI>
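To find out which network element needs attention, the CLLI code can be pulled from the last field of this notification. The sketch below is one way to do that, assuming the notification text is available as a string in the layout shown above. The function name `extract_clli`, the host name, and the sample CLLI value are made up for illustration.

```python
from typing import Optional

CLLI_LENGTH = 11  # Length stated above for the network element CLLI code.


def extract_clli(line: str) -> Optional[str]:
    """Return the CLLI from an LSMS8001 notification, or None if not present."""
    fields = line.strip().split("|", 3)
    if len(fields) != 4 or fields[0] != "LSMS8001":
        return None
    # The CLLI follows the 'NE CLLI=' marker in the message text.
    _, marker, clli = fields[3].partition("NE CLLI=")
    clli = clli.strip()
    return clli if marker and len(clli) == CLLI_LENGTH else None


# 'lsmspri' and 'STPAEXAMPLE' are hypothetical values.
clli = extract_clli(
    "LSMS8001|14:58 Jul 22, 1997|lsmspri|Notify:Sys Admin - NE CLLI=STPAEXAMPLE")
```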