Sun Java System Messaging Server 6.3 Administration Guide

8.7 The Job Controller

Each time a message is enqueued to a channel, the Job Controller ensures that there is a job running to deliver the message. This might involve starting a new job process, adding a thread, or simply noting that a job is already running. If a job cannot be started because the job limit for the channel or pool has been reached, the Job Controller waits until another job has exited. When the job limit is no longer exceeded, the Job Controller starts another job.

Channel jobs run inside processing pools within the Job Controller. A pool can be thought of as a “place” where channel jobs run. The pool provides a computing area where a set of jobs can operate without vying for resources with jobs outside of the pool. For more information on pools, see 10.4.8 Job Controller File and 12.5.4 Processing Pools for Channel Execution Jobs.

Job limits for the channel are determined by the maxjobs channel keyword. Job limits for the pool are determined by the JOB_LIMIT option for the pool.
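For example, a channel's job limit can be set in the MTA configuration file (imta.cnf) with the maxjobs keyword, and a pool's limit in the Job Controller configuration file (job_controller.cnf) with JOB_LIMIT. The channel block, pool name, and values below are illustrative only:

```
! imta.cnf (channel block): run at most 4 simultaneous jobs for this channel,
! in the SMTP_POOL processing pool
tcp_local smtp mx pool SMTP_POOL maxjobs 4
tcp-daemon

! job_controller.cnf: allow at most 10 jobs to run simultaneously in the pool
[POOL=SMTP_POOL]
job_limit=10
```

The effective limit for a channel is the smaller of its maxjobs value and the remaining capacity of its pool.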

Messaging Server normally attempts to deliver all messages immediately. If a message cannot be delivered on the first attempt, however, the message is delayed for a period of time determined by the appropriate backoff keyword. As soon as the time specified in the backoff keyword has elapsed, the delayed message is available for delivery, and if necessary, a channel job is started to process the message.
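As an illustration, a channel definition can carry a backoff keyword listing ISO 8601 duration values that set successive retry delays (the channel block and values below are illustrative):

```
! imta.cnf (channel block): retry a failed delivery after 5 minutes,
! then after 10 and 30 minutes, then after 1, 2, and 4 hours
tcp_local smtp mx backoff "pt5m" "pt10m" "pt30m" "pt1h" "pt2h" "pt4h"
tcp-daemon
```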

The Job Controller’s in-memory data structure of messages currently being processed and awaiting processing typically reflects the full set of message files stored on disk in the MTA queue area. However, if a backlog of message files on disk builds up enough to exceed the size limit of the Job Controller’s in-memory data structure, the Job Controller tracks in memory only a subset of the total number of message files on disk, and processes only those messages it is tracking in memory. After a sufficient number of messages have been delivered to free enough in-memory storage, the Job Controller automatically refreshes its in-memory store by scanning the MTA queue area to update its list of messages, and then begins processing the additional message files it just retrieved from disk.

Previously, the Job Controller read all the files in the queue directory in the order in which they were found. It now reads several channel queue directories at once, which makes for much more reasonable behavior on startup, on restart, and after MAX_MESSAGES has been exceeded. The number of directories to be read at once is controlled by the Job Controller option Rebuild_Parallel_Channel, which can take any value between 1 and 100. The default is 12.

If your site routinely experiences heavy message backlogs, you might want to tune the Job Controller by using the MAX_MESSAGES option. By increasing the MAX_MESSAGES option value to allow the Job Controller to use more memory, you can reduce the number of occasions when message backlogs overflow the Job Controller’s in-memory cache. This reduces the overhead involved when the Job Controller must scan the MTA queue directory. Keep in mind, however, that when the Job Controller does need to rebuild the in-memory cache, the process takes longer because the cache is larger. Note also that because the Job Controller must scan the MTA queue directory every time it is started or restarted, large message backlogs mean that starts or restarts of the Job Controller incur more overhead than starts or restarts when no such backlog exists.
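For example, MAX_MESSAGES is set in the global defaults section of the Job Controller configuration file (the value shown is illustrative; 100000 is the default):

```
! job_controller.cnf (global defaults section, before any [CHANNEL=] or
! [POOL=] section): track up to 200000 messages in memory
max_messages=200000
```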

You do not want to overwhelm the Job Controller by keeping information about huge numbers of messages in memory. For this reason, there are upper and lower limits. The number specified by MAX_MESSAGES is the maximum number of messages that the Job Controller holds in memory. The count can reach this maximum as new messages are delivered, for instance messages received by the tcp_smtp_server. Beyond this number, messages are still queued (written to disk) but are not put into the Job Controller’s in-memory structure. The Job Controller notices this condition and, when the number of messages in memory drops below half this maximum, starts scanning the disk queues for more messages. It always looks for untried messages (“ZZ...” files) first, then previously tried messages.

In addition, the Job Controller limits the number of messages reclaimed from disk. It reads from disk at most three-quarters of MAX_MESSAGES, leaving headroom for new messages. (If messages are being reclaimed from disk, they have already been delayed, which is an undesirable state.)

Furthermore, you want to avoid cluttering up the memory structure with delayed messages—those that cannot be processed yet. When a delivery attempt fails and the message is delayed, then if the number of messages the Job Controller knows about is greater than 5/8 of MAX_MESSAGES and the number of delayed messages is greater than 3/8 of MAX_MESSAGES, the delayed message is forgotten until the next sweep of the on-disk structures, which occurs when the number of messages in memory drops below 1/2 of MAX_MESSAGES.
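The thresholds described above can be summarized in a short sketch. This is illustrative Python only: the function names are hypothetical and simply mirror the fractions stated in the text, not the Job Controller’s actual internals.

```python
# Hypothetical sketch of the Job Controller's MAX_MESSAGES thresholds.
# The names here are invented for illustration; only the fractions
# (1/2, 3/4, 5/8, 3/8) come from the documentation.

MAX_MESSAGES = 100000  # job_controller.cnf option (default shown)

def accepts_new_message(in_memory_count: int) -> bool:
    """New messages are tracked in memory only below MAX_MESSAGES;
    beyond that they are written to disk but not tracked."""
    return in_memory_count < MAX_MESSAGES

def should_rescan_disk(in_memory_count: int) -> bool:
    """Disk queues are swept again once the in-memory count falls
    below half of MAX_MESSAGES."""
    return in_memory_count < MAX_MESSAGES // 2

def reclaim_limit() -> int:
    """At most three-quarters of MAX_MESSAGES are read back from disk,
    leaving headroom for newly arriving messages."""
    return (3 * MAX_MESSAGES) // 4

def forget_delayed(in_memory_count: int, delayed_count: int) -> bool:
    """A delayed message is dropped from memory when the total and the
    delayed counts exceed 5/8 and 3/8 of MAX_MESSAGES respectively."""
    return (in_memory_count > 5 * MAX_MESSAGES // 8
            and delayed_count > 3 * MAX_MESSAGES // 8)
```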

The only obvious problem with having MAX_MESSAGES too small is that the scheduling of jobs becomes suboptimal. The scanning of the disk queues is also somewhat simplistic: if you have huge numbers of messages backlogged in both the tcp_local and ims_ms queues, the rebuild thread finds all the messages for one channel first, then those for the next channel. This can result in alarmed administrators reporting that they have fixed one issue but see only one specific channel dequeuing.

Having MAX_MESSAGES large, by contrast, is not a problem. There is a memory cost of approximately 140 bytes for each message, so a message limit of 100000 limits the Job Controller data structures to about 20 megabytes (there are other data structures representing jobs, channels, destination hosts, and so on). This is insignificant on a big server.

All the named objects in the Job Controller are tracked in a hash table. This table is sized at the next power of 2 larger than MAX_MESSAGES and is never resized. Each entry in the hash table is a pointer, so memory usage is four times MAX_MESSAGES rounded up to a power of two. Because the hash function is intended to be random, the whole table tends to be resident in memory. This is another 0.5 megabytes in the default case.
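The figures above can be checked with simple arithmetic. The sketch below assumes the values stated in the text: roughly 140 bytes per message and, implied by the 0.5-megabyte figure, a 4-byte (32-bit) pointer per hash-table slot.

```python
# Back-of-the-envelope check of the memory figures in the text.
MAX_MESSAGES = 100000       # default message limit
PER_MESSAGE_BYTES = 140     # approximate per-message cost (from the text)
POINTER_BYTES = 4           # slot size implied by the 0.5 MB figure

# Per-message structures: 14,000,000 bytes (14 MB) for 100,000 messages.
# The guide's ~20 MB figure also counts jobs, channels, hosts, and so on.
message_bytes = MAX_MESSAGES * PER_MESSAGE_BYTES

# Hash table: next power of two >= MAX_MESSAGES, one pointer per slot.
slots = 1
while slots < MAX_MESSAGES:
    slots *= 2              # 131072 slots for the default limit

hash_table_bytes = slots * POINTER_BYTES  # 524288 bytes, about 0.5 MB
```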

For information about pools and configuring the Job Controller, see 10.4.8 Job Controller File and 12.5 Configuring Message Processing and Delivery.

8.7.1 To Start and Stop the Job Controller

To start the Job Controller, execute the command:

start-msg job_controller

To shut down the Job Controller, execute the command:

stop-msg job_controller

To restart the Job Controller, execute the command:

imsimta restart job_controller

Restarting the Job Controller has the effect of shutting down the currently running Job Controller, then immediately starting a new one.