65 Reducing Message Store Size Due to Duplicate Storage

This chapter describes how Oracle Communications Messaging Server uses the relinker feature to reduce message store size due to duplicate storage.

When a message is sent to multiple recipients, that message is placed in each recipient's mailbox. Some messaging systems store separate copies of the same message in each recipient's mailbox. By contrast, Messaging Server strives to retain a single copy of a message regardless of the number of mailboxes in which that message resides. It does this by creating hard links to that message in the mailboxes containing that message.

When other messaging systems are migrated to the Messaging Server, these multiple message copies may be copied over with the migration process. With a large message store, this means that a lot of messages are duplicated unnecessarily. In addition, multiple copies of the same message can be accumulated in normal server operation, for example, from IMAP append operations or other sources.

Messaging Server provides a command called relinker that removes the excess message copies and replaces them with hard links to a single copy.

Note:

The relinker feature is intended to repair the situation where the normal single-copy nature of the message store has become broken for some reason. You should only need to use the relinker if you have done something which could have caused duplicate messages to become individual copies instead of the normal single-copy. This feature is not the normal way the store normally accomplishes its single-copy feature. You should not need to keep the real-time relinker feature enabled for long periods of time. You should not need to use the relinker command on an ongoing basis. You should only need this feature if you have done (or will soon be doing) something which would break the single-copy feature of the store. See "How the Message Store Works" for more information about single-copy.

Relinker Overview

The relinking function can be run in the command or realtime mode. When the relinker command is run, it scans through the message store partitions, creates or updates the MD5 message digest repository (as hard links), deletes excess message files, and creates the necessary hard links.

The digest repository consists of hard links to the messages in the message store. It is stored in the directory hierarchy partition_path/=md5. This directory is parallel to the user mailbox hierarchy _ partition_path_/=user (see "Classic Message Store Directory Layout"). Messages in the digest repository are uniquely identified by their MD5 digest. For example, if the digest for fredb/00/1.msg is 4F92E5673E091B43415FFFA05D2E47EA, then partition/=user/hashdir/hashdir/=fredb/00/1.msg is linked to partition/=md5/_ hashdir_/hashdir/4F92E5673E091B43415FFFA05D2E47EA.msg. If another mailbox has this same message, for example, partition_path/=user/hashdir/hashdir/gregk/00/17.msg, that message will also be hard linked to partition_path/=md5/4F/92/4F92E5673E091B43415FFFA05D2E47EA.msg. This is shown in Figure 65-1.

Figure 65-1 Message Store Digest Repository

Description of Figure 65-1 follows
Description of ''Figure 65-1 Message Store Digest Repository''

For this message, the link count will be three. If both messages are deleted from the mailboxes of fredb and gregk, then the link count will be one and the message can be purged.

The relinker process can also be run in the realtime mode for similar functionality. See "Using Relinker in the Realtime Mode" for details.

Using relinker in the Command Line Mode

The relinker scans through a message store partition, creates or updates the MD5 message repository (as hard links) and deletes excess message files. After relinker scans a store partition, it outputs statistics on the number of unique messages and size of the partition before and after relinking. To run more quickly on an already hashed store, relinker only computes the digest of the messages not yet present in =md5. It also has an option to erase the entire digest repository (which doesn't affect the user mailboxes).

The syntax for the command is as follows:

relinker [-ppartitionname] [-d]

where partitionname specifies the partition to be processed (default: all partitions) and -d specifies that the digest repository be deleted. Sample output is shown below:

relinker
Processing partition: primary
Scanning digest repository...
Processing user directories..............................
---------------------------------------------------------
Partition statistics Before After
---------------------------------------------------------
Total messages 4531898 4531898
Unique messages 4327531 3847029
Message digests in repository 0 3847029
Space used 99210Mb 90481Mb
Space savings from single-copy 3911Mb 12640Mb
---------------------------------------------------------
relinker -d
Processing partition: primary
Purging digest repository...
---------------------------------------------------------
Partition statistics Before After
---------------------------------------------------------
Message digests in repository 3847029 0
---------------------------------------------------------

relinker can take a long time to run, especially for the first time if there are no messages are in the repository. This is because it has to calculate the digest for every message (if the relinker criteria is configured to include all messages-see "Configuring Relinker" for information on configuring relinker criteria.) For example, it could take six hours to process a 100 Gigabyte message store. However, see "Using Relinker in the Realtime Mode" if run-time relinking is enabled

If the relinker command line mode is used exclusively, and not the run-time option, it is necessary to purge the digest repository (=md5), otherwise messages purged in the store (=user) will not become available disk space since they still have a link in the digest repository (they become orphaned). If you are just performing a one-time optimization of the store-for example after a migration-you can run relinker once, then delete the entire repository with relinker -d. For repeated purging during migration, it is sufficient to just run the relinker command repeatedly, since each time it runs it also purges the expired or orphaned messages from the repository.

It is safe to run multiple instances of relinker in parallel with each processing a different partition (using the -p option). Messages are only relinked inside the same partition.

Using Relinker in the Realtime Mode

The relinker function can be enabled in the realtime mode by setting the msconfig option store.relinker.enable to 1. Using relinker in the realtime mode will compute the digest of every message delivered (or restored, IMAP appended, and so forth) which matches the configured relinker criteria ("Configuring Relinker"), then look in the repository to see if that digest is already present. If the digest is present, it creates a link to it in the destination mailbox instead of creating a new copy of the message. If there is no digest, it creates the message and adds a link to it in the repository afterwards.

stored scans the digest repositories of each partition and purges the messages having a link count of 1, or which don't match the relinker criteria. The scan is done one directory at a time over a configurable time period. This is so that the I/O load is evenly distributed and does not noticeably impact other server operations. By default the purge cycle is 24 hours, which means messages can still be present on the disk for up to 24 hours after they have been deleted from the store or have exceeded the configured maximum age. This task is enabled when the relinker realtime mode is enabled.

Configuring Relinker

Table 65-1 shows the options used to set relinker criteria.

Table 65-1 relinker msconfig options

Option Description

store.relinker.enable

Enables real-time relinking of messages in the append code and stored purge. The relinker command-line tool may be run even if this option is off. However since stored will not purge the repository, relinker -d must be used for this task. Turning this option on affects message delivery performance in exchange for the disk space savings.

Default: 0

store.relinker.maxage

Maximum age in hours for messages to be kept in the repository, or considered by the relinker command-line. -1 means no age limit, that is, only purge orphaned messages from the repository. For relinker it means process existing messages regardless of age. Shorter values keep the repository smaller thus allow relinker or stored purge to run faster and reclaim disk space faster, while longer values allow duplicate message relinking over a longer period of time, for example, when users copy the same message to the store several days apart, or when running a migration over several days or weeks.

Default: 24

store.relinker.minsize

Minimum size in kilobytes for messages to be considered by run-time or command-line relinker. Setting a non-zero value gives up the relinker benefits for smaller messages in exchange for a smaller repository.

Default: 0

store.relinker.purgecycle

Approximate duration in hours of an entire stored purge cycle. The actual duration depends on the time it takes to scan each directory in the repository. Smaller values will use more I/O and larger values will not reclaim disk space as fast. 0 means run purge continuously without any pause between directories. -1 means don't run purge in stored (then purge must be performed using the relinker -d command).

Default: 24