59 Message Store Automatic Recovery On Startup

This chapter provides a conceptual overview of how the Oracle Communications Messaging Server stored daemon starts up and automatically recovers the message store, and of message store database snapshots. See "Administering Message Store Database Snapshots (Backups)" for task information.

Overview of Automatic Recovery on Startup

Message store data consists of the messages, index data, and the message store database. While this data is fairly robust, on rare occasions there may be message store data problems in the system. These problems are reported in the default log file and are almost always fixed transparently. In rare cases, an error message in the log file may indicate that you need to run the reconstruct utility. In addition, as a last resort, messages are protected by the backup and restore processes. See "Backing Up and Restoring the Message Store" for more information.

The message store automates many recovery operations that were previously the responsibility of the administrator. These operations are performed by the message store daemon, stored, during startup and include database snapshots and automatic fast recovery as necessary. stored thoroughly checks the message store's database and automatically initiates repairs if it detects a problem.

stored also provides a comprehensive analysis of the state of the database by writing status messages to the default log, reporting on repairs done to the message store and automatic attempts to bring it into operation.

Automatic Startup and Recovery Theory of Operations

The stored daemon starts before the other message store processes. It initializes and, if necessary, recovers the message store database. The message store database keeps folder, quota, subscription, and message flag information. The database is logging and transactional, so recovery is already built in. In addition, some database information is copied redundantly in the message index area for each folder.

Although the database is fairly robust, on the rare occasions that it breaks, in most cases stored recovers and repairs it transparently. However, whenever stored is restarted, you should check the default log files to make sure that additional administrative intervention is not required. Status messages in the log file indicate that reconstruct should be run if the database requires further rebuilding.

Before opening the message store database, stored analyzes its integrity and sends status messages to the default log under the category of warning. Some messages are useful to administrators, and some consist of coded data to be used for internal analysis. If stored detects any problems, it attempts to fix the database and tries to start it again.

When the database is opened, stored signals that the rest of the services may start. If the automatic fixes failed, messages in the default log specify what actions to take. See "Error Messages Signifying reconstruct Is Needed" for more information.

After most recoveries, the database is usually up-to-date and no further action is required. However, some recoveries require a reconstruct -m to synchronize redundant data in the message store. Again, this is stated in the default log, so it is important to monitor the default log after a startup. Even if the message store seems to be up and running normally, it is important to run any requested operations such as reconstruct.

Another reason for reading the log file is to determine what caused damage to the database in the first place. Although stored is designed to bring up the message store regardless of any problem on the system, you should ascertain the cause of the database damage, as it may be a sign of a larger hidden problem.

Error Messages Signifying reconstruct Is Needed

This section describes the types of error messages that require reconstruct to be run.

When the error message indicates a mailbox error, run reconstruct <mailbox>. Examples:

Invalid cache data for msg 102 in mailbox user/joe/INBOX. Needs reconstruct

Mailbox corrupted, missing fixed headers: user/joe/INBOX

Mailbox corrupted, start_offset beyond EOF: user/joe/INBOX

When the error message indicates a database error, run reconstruct -m. Examples:

Removing extra database logs. Run reconstruct -m soon after startup to resync redundant data

Recovering database from snapshot. Run reconstruct -m soon after startup to resync redundant data
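The mapping from log messages to reconstruct commands described above can be sketched as a small log scanner. This is only an illustration: the patterns are derived from the sample messages in this section, the function name suggest_reconstruct is hypothetical, and a real default log may contain other wordings (and line prefixes) that these patterns do not cover.

```python
import re

# Patterns derived from the sample messages in this section. They are
# illustrative only; a real log may use other wordings.
MAILBOX_PATTERNS = [
    re.compile(r"Invalid cache data for msg \d+ in mailbox (\S+?)\. Needs reconstruct"),
    re.compile(r"Mailbox corrupted, .*: (\S+)$"),
]
DATABASE_PATTERN = re.compile(r"Run reconstruct -m")


def suggest_reconstruct(log_lines):
    """Return an ordered, de-duplicated list of suggested reconstruct commands."""
    suggestions = []
    for line in log_lines:
        line = line.strip()
        command = None
        if DATABASE_PATTERN.search(line):
            # Database-level problem: resync redundant data.
            command = "reconstruct -m"
        else:
            for pattern in MAILBOX_PATTERNS:
                match = pattern.search(line)
                if match:
                    # Mailbox-level problem: rebuild only the named mailbox.
                    command = "reconstruct " + match.group(1)
                    break
        if command and command not in suggestions:
            suggestions.append(command)
    return suggestions
```

Passing the five sample messages from this section to suggest_reconstruct yields one suggestion for user/joe/INBOX and one for reconstruct -m. The output is a hint for the administrator, not a substitute for reading the log.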

Message Store Database Snapshot Theory of Operations

This section describes concepts about the message store database snapshot. See "Administering Message Store Database Snapshots (Backups)" for a description of related tasks.

A snapshot is a hot backup of the database and is used by stored to restore a broken database transparently. This is much quicker than using reconstruct to rebuild the entire database from scratch with the information stored in the message and index partitions.

Snapshots of the database are taken automatically by the scheduler. The default snapshot schedule consists of a full snapshot every day and incremental snapshots every 10 minutes. (Note that older versions of Messaging Server have a more frequent default incremental snapshot schedule.)

If the recovery process decides to remove the current database because it is determined to be bad, stored will move it into the removed directory if it can. This allows the database to be analyzed if desired.

Message Store Database Snapshot Interval and Location

Allow disk space of about five times the size of the database for the database and snapshots combined. It is highly recommended that the administrator reconfigure snapshots to be written to a separate disk and tune the snapshot configuration to the system's needs.

If stored detects a problem with the mboxlist database at startup, the most recent verified snapshot is automatically restored. Two snapshot options can be configured: the location of the snapshot files and the number of snapshots saved. See "Message Store Database Snapshot Options" for more information about these options.
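The rule "restore the most recent verified snapshot" can be illustrated with a short sketch. The Snapshot record and the pick_restore_candidate function are hypothetical names introduced for this example. How stored actually tracks and verifies snapshots is not described in this chapter, so this is a conceptual model only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical record; the real on-disk snapshot layout is not covered here.
    name: str
    taken_at: datetime
    verified: bool


def pick_restore_candidate(snapshots: Sequence[Snapshot]) -> Optional[Snapshot]:
    """Return the newest snapshot marked verified, or None if there is none."""
    verified = [s for s in snapshots if s.verified]
    return max(verified, key=lambda s: s.taken_at, default=None)
```

In this model a newer but unverified snapshot is skipped in favor of an older verified one, which is why keeping more than one snapshot is useful.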

A snapshot interval that is too small places a frequent burden on the system and increases the chance that a problem in the database is copied into a snapshot. A snapshot interval that is too large means that a restored database may hold the stale state it had when the last snapshot was taken.
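The trade-off can be made concrete with simple arithmetic. The sketch below assumes, for illustration only, that incremental snapshots run evenly around the clock and complete instantly, so that a restored database is at most one interval old. The function name snapshot_tradeoff is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def snapshot_tradeoff(interval_minutes: int) -> dict:
    """Estimate snapshot load and worst-case staleness for a given interval.

    Assumes evenly spaced, instantaneous incremental snapshots.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return {
        # How often the system pays the cost of taking a snapshot.
        "incrementals_per_day": MINUTES_PER_DAY // interval_minutes,
        # How old a restored database can be, in the worst case.
        "max_staleness_minutes": interval_minutes,
    }
```

Under these assumptions, a 10-minute interval gives 144 incremental snapshots per day and a restored state that is at most 10 minutes old. A 60-minute interval cuts the load to 24 snapshots per day but allows up to an hour of staleness.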

A snapshot updates all the snapshots with the current data in the mboxlist database.

The ultimate role of the snapshot is to bring the system as close to up-to-date as possible and to ease the burden on the rest of the system of rebuilding the data on the fly.

Message Store Database Snapshot Options

Table 59-1 shows the snapshot options that you set with the msconfig command.

Table 59-1 Message Store Database Snapshot Options

Option: store.snapshotpath

Description: Location of message store database snapshot files. Either an existing absolute path or a path relative to the store directory.

Default: dbdata/snapshots

Option: store.snapshotdirs

Description: Number of snapshots to maintain. Do not set this to more than 3.

Default: 3
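The semantics of the two options can be sketched as follows. This is not how Messaging Server reads its configuration. The function names, the example store directory, and the lower bound of 1 for the snapshot count are assumptions made for illustration. Only the defaults and the upper bound of 3 come from the table.

```python
from pathlib import PurePosixPath

MAX_SNAPSHOT_DIRS = 3  # upper bound stated in the table


def resolve_snapshot_path(store_dir: str,
                          snapshotpath: str = "dbdata/snapshots") -> PurePosixPath:
    """Resolve store.snapshotpath: absolute paths are used as-is,
    relative paths are taken relative to the store directory."""
    path = PurePosixPath(snapshotpath)
    return path if path.is_absolute() else PurePosixPath(store_dir) / path


def check_snapshotdirs(count: int = 3) -> int:
    """Validate store.snapshotdirs. The lower bound of 1 is an assumption."""
    if not 1 <= count <= MAX_SNAPSHOT_DIRS:
        raise ValueError(f"snapshot count must be between 1 and {MAX_SNAPSHOT_DIRS}")
    return count
```

For a hypothetical store directory of /opt/store, the default value resolves to /opt/store/dbdata/snapshots, while an absolute value such as /backup/snap is returned unchanged. Pointing the option at an absolute path on another disk matches the recommendation to keep snapshots on a separate disk.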