Troubleshooting Misfires

The AIS log indicates when notifications and orchestrations are backing up, that is, when some jobs run long and cause others to be placed in an exception queue, from which the scheduler tries to pick them up later.

Note: A job misfire is logged only if the job is at least five minutes overdue.
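As a minimal sketch of this threshold rule, assuming that "overdue" means the actual start time minus the scheduled start time and that a delay of exactly five minutes counts, the hypothetical helper `would_log_misfire` below shows that a job starting three minutes late is delayed but not logged. A log without misfire messages therefore does not guarantee that every job started on time.

```python
from datetime import datetime, timedelta

# Assumption: "overdue" = actual start - scheduled start, and a delay of
# exactly five minutes is treated as meeting the threshold.
MISFIRE_THRESHOLD = timedelta(minutes=5)

def would_log_misfire(scheduled: datetime, actual_start: datetime) -> bool:
    """Hypothetical helper: True if the delay reaches the misfire threshold."""
    return actual_start - scheduled >= MISFIRE_THRESHOLD

scheduled = datetime(2018, 2, 13, 17, 45)
print(would_log_misfire(scheduled, scheduled + timedelta(minutes=3)))  # False: late, but not logged
print(would_log_misfire(scheduled, scheduled + timedelta(minutes=8)))  # True: logged as a misfire
```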

Here is an example of what the message might look like:

13 Feb 2018 17:53:18,007 [SEVERE] - [AIS]             Scheduler Trigger for Name: NTF_1712060007JDE, and Group SCH_1802080001JDE (Notification/Orchestration:  description , Schedule: description) MISFIRED. This condition is caused when notifications or orchestrations are not starting close to their scheduled start time. This may occur if other notifications or orchestrations have not yet completed, indicating that more jobs are scheduled than the system can complete within the schedule time. This can be fixed by scheduling the jobs to run less frequently, staggering the schedules, or by increasing the number of threads in the Scheduler, which may require additional system resources.
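To find which triggers misfire most often, misfire entries can be pulled out of the log with a regular expression. The following Python sketch is built only from the single sample message above, so the pattern is an assumption that may need adjusting for other releases or log configurations; `MISFIRE_RE` and `parse_misfires` are hypothetical names, not part of AIS.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern derived from the one sample message above (an assumption; the
# real AIS log format may differ).
MISFIRE_RE = re.compile(
    r"^(?P<ts>\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2},\d{3}) \[SEVERE\]"
    r".*?Scheduler Trigger for Name: (?P<name>[^,\s]+), and Group (?P<group>[^\s(]+)"
    r".*?MISFIRED"
)

def parse_misfires(lines):
    """Return (timestamp, trigger name, schedule group) for each misfire line."""
    results = []
    for line in lines:
        m = MISFIRE_RE.search(line)
        if m:
            # %b assumes English month abbreviations (C locale).
            ts = datetime.strptime(m.group("ts"), "%d %b %Y %H:%M:%S,%f")
            results.append((ts, m.group("name"), m.group("group")))
    return results

sample = ("13 Feb 2018 17:53:18,007 [SEVERE] - [AIS]             Scheduler Trigger "
          "for Name: NTF_1712060007JDE, and Group SCH_1802080001JDE "
          "(Notification/Orchestration:  description , Schedule: description) MISFIRED. "
          "This condition is caused when ...")

misfires = parse_misfires([sample])
# Count misfires per schedule group to see which schedules back up most often.
per_group = Counter(group for _, _, group in misfires)
print(misfires[0][1], per_group["SCH_1802080001JDE"])
```

In practice the list passed to `parse_misfires` would be the lines of the AIS log file, and `per_group.most_common()` would rank schedules by how often they misfire.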

Some possible solutions for this issue include:

  • Decrease Frequency. Some jobs that are scheduled to run every 5 minutes may be able to run every 30 minutes instead.

  • Staggering. Do not schedule too many jobs to run at the same time or at intervals that are multiples of the same number. Schedules set to run every 5, 10, 15, and 30 minutes all coincide at the 30-minute mark, so every notification and orchestration on those schedules attempts to run at once. Intervals such as 7, 17, 31, and 39 minutes, which share no common factors, coincide far less often and balance the load better.

  • More Threads. The scheduler uses one thread each time a job is triggered to run on its schedule. If more jobs are triggered than there are available threads, the scheduler picks up one job per thread, and the remaining jobs wait until a running job completes and returns its thread. Increasing the number of threads allows more jobs to run at one time, but the added activity might tax system resources.

  • Clustering. Additional schedulers can pick up jobs that might otherwise become misfires. If the first scheduler has 10 threads, misfires could result when more than 10 jobs are scheduled to run at one time. Adding a second scheduler with 10 threads doubles the number of jobs that can run simultaneously. There is no explicit load balancing between the two schedulers: the first one to examine the scheduled jobs runs as many as it can, and only the remaining jobs are picked up by the next scheduler. In the example of two schedulers with 10 threads each, if 12 jobs run on the same schedule, the first server to acquire a set of jobs might acquire 10, leaving the remaining 2 for the next scheduler. Reducing the threads on the first server to 8 might mean that 4 jobs are picked up by the next scheduler. Because multiple schedulers running at the same time do not know about each other, load balancing across them is an administrative task. Also, if the two AIS Servers point to the same HTML Server, little or no benefit will be gained.
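The effect of staggering can be checked with a little arithmetic: schedules with intervals a and b fire together every lcm(a, b) minutes. The Python sketch below counts, for each minute of a day, how many schedules fire, under the simplifying assumption that every schedule fires at each multiple of its interval from a common start time (a real scheduler may anchor each schedule to its own start time or reset at the top of the hour). `peak_overlap` is a hypothetical helper; `math.lcm` requires Python 3.9 or later.

```python
from math import lcm

def peak_overlap(intervals, window=1440):
    """Return (max schedules firing in the same minute, first minute it occurs).

    Assumes every schedule fires at each multiple of its interval, counted
    from a shared start at minute 0, over a window of `window` minutes.
    """
    best, best_minute = 0, None
    for minute in range(1, window + 1):
        firing = sum(1 for interval in intervals if minute % interval == 0)
        if firing > best:
            best, best_minute = firing, minute
    return best, best_minute

aligned = [5, 10, 15, 30]
staggered = [7, 17, 31, 39]
print(lcm(*aligned), peak_overlap(aligned))      # 30 (4, 30)
print(lcm(*staggered), peak_overlap(staggered))  # 143871 (2, 119)
```

Under these assumptions, the 5/10/15/30 set reaches its peak of four simultaneous schedules at minute 30 and every 30 minutes after that, while the 7/17/31/39 set never has more than two schedules firing together during the day, and all four coincide only once every 143,871 minutes (about 100 days).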
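The greedy, first-come acquisition described under More Threads and Clustering can be modeled in a few lines. This is a simplified Python sketch, not the AIS scheduler's actual algorithm: it assumes every thread is free when the jobs come due and that the schedulers check for due jobs in a fixed order. `distribute` is a hypothetical helper that returns how many jobs each scheduler takes and how many are left waiting.

```python
def distribute(jobs_due, scheduler_threads):
    """Greedy acquisition: each scheduler, in the order it checks for due
    jobs, takes as many as it has free threads; leftover jobs wait.

    Returns (jobs taken per scheduler, jobs still waiting for a thread).
    """
    remaining = jobs_due
    taken = []
    for threads in scheduler_threads:
        n = min(threads, remaining)  # cannot take more jobs than are left
        taken.append(n)
        remaining -= n
    return taken, remaining

print(distribute(12, [10]))      # ([10], 2): one scheduler, 2 jobs wait
print(distribute(12, [10, 10]))  # ([10, 2], 0): second scheduler takes the rest
print(distribute(12, [8, 10]))   # ([8, 4], 0): fewer threads on the first shifts load
```

The three calls reproduce the numbers from the Clustering example: one 10-thread scheduler leaves 2 of 12 jobs waiting, a second 10-thread scheduler picks up those 2, and reducing the first scheduler to 8 threads shifts 4 jobs to the second. The model ignores job duration and the shared HTML Server, so it shows only how jobs are split, not whether they finish on time.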