Execution of Restarted Map/Reduce Stages

A map/reduce script can involve many jobs. The input, shuffle, and summarize stages are each processed with a single job, but multiple jobs can participate in the map and reduce stages. Within a map stage or a reduce stage, jobs can run in parallel. Any of these jobs can be forcefully terminated at any moment. The impact of a termination depends on what the job was doing and the stage in which it was running.

For details, see the following topics:

Termination of getInput Stage

The work of a serial stage (getInput, shuffle, or summarize) is done in a single job. If the getInput stage job is forcefully terminated, it is later restarted. The getInput portion of the script can determine whether it is a restarted execution by examining the isRestarted attribute of the context argument (inputContext.isRestarted). The script is being restarted if and only if (context.isRestarted === true).

Note that the input for the next stage is computed from the return value of the getInput script. Next stage input is written after the getInput stage finishes. Therefore, even the restarted getInput script is expected to return the same data. The map/reduce framework helps to ensure that no data is written twice.

However, if the getInput script changes additional data (for example, by creating NetSuite records), it should contain code to handle duplicate processing. The script needs idempotent operations to ensure that these records are not created twice, if duplicates are undesired.
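The pattern above can be sketched as follows. This is a minimal illustration, not a complete SuiteScript entry point: the findStageLog and createStageLog helpers are hypothetical stand-ins for N/search and N/record calls, and existingLogs simulates state persisted in the account.

```javascript
// Simulated persistent state; in a real script this would be NetSuite records.
const existingLogs = new Set();

function findStageLog(scriptId) {
  // Hypothetical stand-in for an N/search lookup of a record this stage creates.
  return existingLogs.has(scriptId);
}

function createStageLog(scriptId) {
  // Hypothetical stand-in for record.create(...).save().
  existingLogs.add(scriptId);
}

function getInputData(inputContext) {
  // Only a restarted execution could have already created the record, so the
  // (simulated) search is needed only when inputContext.isRestarted is true.
  const alreadyCreated =
    inputContext.isRestarted === true && findStageLog('customscript_mr');
  if (!alreadyCreated) {
    createStageLog('customscript_mr');
  }
  // Restarted or not, return the same input data: the framework writes the
  // next stage's input only after this stage finishes, so it is not duplicated.
  return [{ id: 1 }, { id: 2 }, { id: 3 }];
}
```

Because the side effect is guarded by a lookup, running the stage a second time with isRestarted set leaves the simulated record store unchanged.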

Termination of Shuffle Stage

The shuffle stage does not contain any custom code, so if the shuffle stage job is forcefully terminated, it is later restarted and all the work is completely redone. There is no impact other than that the stage takes longer to finish.

Termination of Parallel Stages

Map and reduce stages can execute jobs in parallel, so they are considered parallel stages. A restart affects both parallel stages in the same way. The following example covers the impact of a restart during the map stage; termination of a reduce stage behaves similarly.

The purpose of a map stage is to execute a map function on each key-value pair supplied by the previous stage (getInput). Multiple jobs participate in the map stage. Map jobs will claim key-value pairs (or a specific number of key-value pairs) for which the map function was not executed yet. The job sets a flag for these key-value pairs so that no other job can execute the map function on them. Then, the job sequentially executes the map function on the key-value pairs it flagged. The map stage is finished when the map function is executed on all key-value pairs.

The number of jobs that can participate in the map stage is unlimited; only the maximum concurrency is limited. Initially, the number of map jobs is equal to the concurrency selected in the corresponding map/reduce script deployment. However, to prevent a single map/reduce task from monopolizing all computational resources in the account, each map job can yield itself to allow other jobs to execute. Each yield creates an additional map job, and the number of yields is unlimited.

Note:

This is a different type of yield compared to yield in a SuiteScript 1.0 scheduled script. In SuiteScript 1.0, the yield happens in the middle of a script execution. In a map job, the yield can happen only between two map function executions, and not in the middle of one.

If a map job is forcefully terminated, it is later restarted. First, the restarted job executes the map function on all key-value pairs that it claimed but did not mark as finished before termination. It is the only map job that can execute the map function on those pairs; they cannot be taken by another map job. After those key-value pairs are processed, the map job continues normally (it takes other unfinished key-value pairs and executes the map function on them).

In some cases, the map function can be re-executed on multiple key-value pairs. The number of pairs on which the map function may be re-executed depends on the buffer size selected on the deployment page. The buffer size determines the number of key-value pairs originally claimed in a batch, and the job marks the batch as finished only when the map function has been executed on all of them. Therefore, if the map job is forcefully terminated in the middle of a batch, the entire batch is processed from the beginning when the map job is restarted.

Note that the map/reduce framework deletes all key-value pairs written from a partially executed batch, so that they are not written out twice. Therefore, the map function does not need to check whether mapContext.write(options) has already been executed for a particular key-value pair. However, if the map function changes additional data, it must also be designed to use idempotent operations. For example, if a map function creates NetSuite records, the script should perform additional checks to ensure that these records are not created twice, if duplicates are undesired.

To check if a map function execution is a part of a restarted batch, the script must examine the isRestarted attribute in the context argument (mapContext.isRestarted). The map function is in the restarted batch if and only if (context.isRestarted === true).

Be aware that an isRestarted value of true is only an indication that some part of the script might have already been executed. Even if context.isRestarted === true, the map function could be running on a particular key-value pair for the first time. For example, the map job might have been forcefully terminated after it claimed the key-value pair for processing, but before it executed the map function on that pair. This is more likely to occur if a high buffer size is set on the map/reduce deployment.
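These two rules (framework-managed writes need no guard, external side effects do) can be sketched as follows. The recordExists and createRecord helpers are hypothetical stand-ins for N/search and N/record calls, and createdRecords simulates records in the account; the mapContext object is stubbed rather than supplied by the framework.

```javascript
// Simulated external state; in a real script this would be NetSuite records.
const createdRecords = new Set();

function recordExists(key) {
  // Hypothetical stand-in for an N/search lookup.
  return createdRecords.has(key);
}

function createRecord(key) {
  // Hypothetical stand-in for record.create(...).save().
  createdRecords.add(key);
}

function map(mapContext) {
  // External side effects need a guard. isRestarted === true only means this
  // batch MAY have run before; this particular pair might still be new, so
  // check the actual state instead of relying on the flag alone.
  const alreadyCreated =
    mapContext.isRestarted === true && recordExists(mapContext.key);
  if (!alreadyCreated) {
    createRecord(mapContext.key);
  }
  // mapContext.write needs no guard: the framework discards key-value pairs
  // written by a partially executed batch, so nothing is written twice.
  mapContext.write({ key: mapContext.key, value: mapContext.value });
}
```

Re-running the function on the same pair with isRestarted set creates no duplicate record, while the unguarded write is left to the framework's own de-duplication.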

Termination of Summarize Stage

If the summarize stage job is forcefully terminated, it is later restarted. The summarize portion of the script can find out whether it is the restarted execution by examining the isRestarted attribute of the summary argument (summaryContext.isRestarted).

The script is being restarted if and only if (summary.isRestarted === true).
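A minimal sketch of that check follows. The notifyAdmin helper is a hypothetical stand-in for a side effect such as an N/email call, and notifications simulates its external result; a real script would typically also consult persisted state, since the flag alone does not prove the side effect completed.

```javascript
// Simulated external effect; in a real script this might be a sent email.
const notifications = [];

function notifyAdmin(message) {
  // Hypothetical stand-in for an N/email call that should not fire twice.
  notifications.push(message);
}

function summarize(summaryContext) {
  // On a restarted execution, skip side effects the first run may have done.
  if (summaryContext.isRestarted !== true) {
    notifyAdmin('Map/reduce task completed');
  }
}
```

Calling the function once normally and once as a restart leaves exactly one simulated notification.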

Related Topics

Map/Reduce Script Error Handling
System Response After a Map/Reduce Interruption
Configuration Options for Handling Map/Reduce Interruptions
Logging Errors
Adding Logic to Handle Map/Reduce Restarts
