Sun Java System Messaging Server 6.3 Administration Guide Automatic Startup and Recovery—Theory of Operations

The stored daemon starts before the other message store processes. It initializes and, if necessary, recovers the message store database. The message store database keeps folder, quota, subscription, and message flag information. The database is logging and transactional, so recovery is already built in. In addition, some database information is copied redundantly in the message index area for each folder.

Although the database is fairly robust, on the rare occasions that it breaks, in most cases stored recovers and repairs it transparently. However, whenever stored is restarted, you should check the default log files to make sure that additional administrative intervention is not required. Status messages in the log file will tell you to run reconstruct if the database requires further rebuilding.

Before opening the message store database, stored analyzes its integrity and sends status messages to the default log under the category of warning. Some messages will be useful to administrators and some messages will consists of coded data to be used for internal analysis. If stored detects any problems, it will attempt to fix the database and try starting it again.

When the database is opened, stored will signal that the rest of the services may start. If the automatic fixes failed, messages in the default log will specify what actions to take. See Error Messages Signifying that reconstruct is Needed

In previous releases, stored could start a recovery process which would take a very long time leaving the administrator wondering if stored was “stuck.” This type of long recovery has been removed and stored should determine a final state in less than a minute. However, if stored needs to employ recovery techniques such as recovering from a snapshot, the process may take a few minutes.

After most recoveries, the database will usually be up-to-date and nothing else will be required. However, some recoveries will require a reconstruct -m in order to synchronize redundant data in the message store. Again, this will be stated in the default log, so it is important to monitor the default log after a startup. Even though the message store will seem to be up and running normally, it is important to run any requested operations such as reconstruct.

Another reason for reading the log file is to determine what caused damage to the database in the first place. Although stored is designed to bring up the message store regardless of any problem on the system, you will still want to try to ascertain cause of the database damage as this may be a sign of a larger hidden problem.

Error Messages Signifying that reconstruct is Needed

This section describes the type of error messages that require reconstruct to be run.

When the error message indicates mailbox error, run reconstruct <mailbox>. Example:

"Invalid cache data for msg 102 in mailbox user/joe/INBOX. Needs reconstruct"

"Mailbox corrupted, missing fixed headers: user/joe/INBOX"

"Mailbox corrupted, start_offset beyond EOF: user/joe/INBOX"

When the error message indicates a database error, run reconstruct -m. Example:

"Removing extra database logs. Run reconstruct -m soon after startup to resync redundant data"

"Recovering database from snapshot. Run reconstruct -m soon after startup to resync redundant data"

Database Snapshots

A snapshot is a hot backup of the database and is used by stored to restore a broken database transparently in a few minutes. This is much quicker than using reconstruct, which relies on the redundant information stored in other areas.

Message Store Database Snapshot—Theory of Operations

Snapshots of the database, located in the mboxlist directory, are taken automatically, by default, once every 24 hours. Snapshots are copied by default into a subdirectory of the store directory. By default, there are five snapshots kept at any given time: one live database, three snapshots, and one database/removed copy. The database/removed copy is newer and is an emergency copy of the database which is put into a subdirectory removed of the mboxlist database directory.

If the recovery process decides to remove the current database because it is determined to be bad, stored will move it into the removed directory if it can. This allows the database to be analyzed if desired.

The data move will only happen once a week. If there is already a copy of the database there, stored will not replace it every time the store comes up. It will only replace it if the data in the removed directory is older than a week. This will prevent the original database which had the problem from being replaced too soon by successive startups.

To Specify Message Store Database Snapshot Interval and Location

There should be five times as much space for the database and snapshots combined. It is highly recommended that the administrator reconfigure snapshots to run on a separate disk, and that it is tuned to the system’s needs.

If stored detects a problem with the database on startup, the best snapshot will automatically be recovered. Three snapshot variables can set the following parameters: the location of the snapshot file, the interval for taking snapshots, number of snapshots saved. These configutil parameters are shown in Table 20–13.

Having a snapshot interval which is too small will result in a frequent burden to the system and a greater chance that a problem in the database will be copied as a snapshot. Having a snapshot interval too large can create a situation where the database will hold the state it had back when the snapshot was taken.

A snapshot interval of a day is recommended and a week or more of snapshots can be useful if a problem remains on the system for a number of days and you wish to go back to a period prior to the point at which the problem existed.

stored monitors the database and is intelligent enough to refuse the latest snapshot if it suspects the database is not perfect. It will instead retrieve the latest most reliable snapshot. Despite the fact that a snapshot may be retrieved from a day ago, the system will use more up to date redundant data and override the older snapshot data, if available.

Thus, the ultimate role the snapshot plays is to get the system as close to up-to-date and ease the burden of the rest of the system trying to rebuild the data on the fly.

Table 20–13 Message Store Database Snapshot Parameters



Location of message store database snapshot files. Either existing absolute path or path relative to the store directory.

Default: dbdata/snapshots

Minutes between snapshots. Valid values: 1 - 46080 

Default: 1440 (1440 minutes = 1 day)

Number of different snapshots kept. Valid values: 2 -367 

Default: 3