46 Handling Message Store Overload

This chapter describes how to handle message store overload in Oracle Communications Messaging Server.

Overview of Managing Message Store Load

An overloaded message store can suffer from degraded performance. The mboxlist database is particularly sensitive to overload conditions. When the database detects deadlocks, database operations that cannot acquire the locks they need must abort their transactions and retry, which reduces throughput. If this situation persists, the message store can become very inefficient. In extreme cases, you must restart the message store to recover.

The ability to control message store load is therefore crucial to preventing performance degradation. The message store uses transaction checkpoint time as its stress indicator. The stored daemon measures the transaction checkpoint duration, that is, the time it takes to sync the database pages from the memory pool to disk. When a transaction checkpoint takes longer than one minute, stored raises an alarm.
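
The following minimal Python sketch illustrates the stress-indicator rule described above; it is not the stored implementation, only a restatement of the one-minute threshold.

    # Illustrative only: stored treats a slow transaction checkpoint as a
    # sign of message store stress. One minute is the threshold named above.
    CHECKPOINT_ALARM_SECONDS = 60.0

    def checkpoint_indicates_stress(checkpoint_duration_seconds: float) -> bool:
        """Return True when a checkpoint sync took longer than one minute."""
        return checkpoint_duration_seconds > CHECKPOINT_ALARM_SECONDS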

Message Store Load Throttling

Message store throttling regulates short spikes of activity. When the ims_master program detects that the message store is stressed, it informs the Job Controller. The Job Controller responds by temporarily decreasing the number of ims_master processes for the ims-ms channel. Similarly, when the LMTP server detects the stressed status, it tells the LMTP client, which in turn tells the Job Controller to back off. By decreasing the number of delivery threads, the Job Controller gives the message store a chance to recover before performance begins to degrade.

Job Controller Stress Handling

Channel programs can now tell the Job Controller when they are being overwhelmed. When this occurs, the Job Controller first checks whether it has happened recently: it ignores stressed-channel messages received within job_controller.stressblackout seconds of a previous stressed message for the same channel. If the message is processed, the Job Controller multiplies the effective threaddepth option for the channel by job_controller.stressfactor and subtracts job_controller.stressjobs from the job limit for the channel. threaddepth never goes above 134,217,727, and the job limit never goes below 1. In addition, the Job Controller asks all current master programs for the channel to exit and, if the queue is not empty, starts an appropriate number of new processes.
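
The following Python sketch is a simplified illustration of the arithmetic described above, not the actual Job Controller code. The default option values are taken from the configuration listing later in this chapter.

    # Illustrative sketch of the stress adjustment; not the Job Controller
    # implementation.
    MAX_THREADDEPTH = 134_217_727   # threaddepth never rises above this
    MIN_JOB_LIMIT = 1               # the job limit never falls below this

    def apply_stress(threaddepth, job_limit, stressfactor=5, stressjobs=2):
        """Adjust a channel's effective values after a stress report is processed."""
        threaddepth = min(threaddepth * stressfactor, MAX_THREADDEPTH)
        job_limit = max(job_limit - stressjobs, MIN_JOB_LIMIT)
        return threaddepth, job_limit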

When job_controller.stresstime seconds have passed since the last stress change, the Job Controller divides threaddepth by job_controller.unstressfactor (never allowing the thread depth to drop below the originally configured threaddepth) and adds job_controller.unstressjobs to the job limit (never allowing the job limit to rise above the originally configured limit). A "stress change" is either an increase or a decrease in stress.

The unstresscount Job Controller option adds an additional criterion for lowering the stress level of a channel. The level is also lowered when the channel has processed unstresscount messages and stresstime seconds have elapsed without any indication of stress.
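
Correspondingly, the recovery step can be sketched as follows; again, this is only an illustration of the described arithmetic, not the actual implementation.

    # Illustrative sketch of the recovery (unstress) adjustment; not the
    # Job Controller implementation.
    def apply_unstress(threaddepth, job_limit,
                       configured_threaddepth, configured_job_limit,
                       unstressfactor=5, unstressjobs=2):
        """Adjust a channel's effective values once the unstress criteria are met."""
        threaddepth = max(threaddepth // unstressfactor, configured_threaddepth)
        job_limit = min(job_limit + unstressjobs, configured_job_limit)
        return threaddepth, job_limit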

Default Job Controller Configuration

These configuration options have the following default values:

job_controller.stressblackout=60

job_controller.stresstime=120

job_controller.stressfactor=5

job_controller.stressjobs=2

job_controller.unstressfactor=stressfactor

job_controller.unstressjobs=stressjobs

job_controller.unstresscount=10000
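
For example (an illustrative calculation using these defaults, not additional configuration): a channel configured with threaddepth 10 and a job limit of 4 that reports stress runs at an effective threaddepth of 50 (10 x 5) and a job limit of 2 (4 - 2). Further stress reports for that channel within 60 seconds are ignored. Once 120 seconds pass without another stress change, the channel returns to threaddepth 10 (50 / 5) and a job limit of 4 (2 + 2).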