Map/Reduce Script Best Practices

The following are best practices for working with map/reduce scripts.

General

As described in Hard Limits on Function Invocations, NetSuite imposes governance limits on single invocations of map, reduce, getInputData, and summarize functions:

map: 1,000 usage units (the same as mass update scripts)
reduce: 5,000 usage units
getInputData: 10,000 usage units
summarize: 10,000 usage units

If you are concerned about potential issues with these limits, review your script to make sure that your map and reduce functions are relatively lightweight. Your map and reduce functions should not include a long or complex series of actions. For example, consider a situation in which a single invocation of your map or reduce function loads and saves multiple records. This approach might exceed the limits described above. If your getInputData function returns a list of record IDs, a better approach might be to have each invocation of the map function handle one record: load it, update its fields, and save it.
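The one-record-per-invocation approach described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the factory name createMapStage, the memo field update, and its value are assumptions made for the example. The N/record module is passed in as a parameter only so that the logic can be exercised outside NetSuite.

```javascript
// Sketch only: `createMapStage` and the 'memo' update are hypothetical.
// The N/record module is injected so the logic can be tested with a stub.
function createMapStage(record) {
    // Each invocation handles exactly one record ID supplied by getInputData,
    // which keeps the usage of a single map invocation small.
    return function map(context) {
        var invoice = record.load({
            type: record.Type.INVOICE,
            id: context.value // one record ID per key-value pair
        });
        invoice.setValue({ fieldId: 'memo', value: 'Reviewed by map/reduce' });
        invoice.save();
    };
}

// In a deployed script, the wiring might look like this:
// define(['N/record'], function (record) {
//     return { getInputData: getInputData, map: createMapStage(record) };
// });
```

Because each map invocation performs a single load and a single save, its usage stays roughly constant regardless of how many IDs getInputData returns.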

If you have a script that performs a significantly more complex series of operations within a single function (such as loading and saving multiple records, or transforming multiple records), consider using a different script type, such as a scheduled script.

Passing search data to getInputData

In the getInputData stage, your script must return an object that can be transformed into a list of key-value pairs. A common approach is to use a search. If you decide to use this technique, note that you should have your function return either a search.Search object or an object reference to a saved search. By contrast, if you execute a search within the getInputData function and return the results (for example, as an array), there is a greater risk that the search will time out.

Instead, you should use one of the following approaches:

Return a search object. That is, return an object created using search.create(options) or search.load(options).
Return a search object reference. That is, return an inputContext.ObjectRef object that references a saved search.

In both cases, the time limit available to the search is more generous than it would be for a search executed within the function.

The following snippet shows how to return a search object:

function getInputData()
{
    return search.create({
        type: record.Type.INVOICE,
        filters: [['status', search.Operator.IS, 'open']],
        columns: ['entity'],
        title: 'Open Invoice Search'
    });
}

And the following snippet shows how to return a search object reference:

...
function getInputData()
{
    // Reference a saved search with internal ID 1234.

    return {
        type: 'search',
        id: 1234
    };
}
...

For information about additional ways to return data, see getInputData(inputContext).

Minimizing risk of data duplication

A map/reduce script can be interrupted at any time by an application server disruption. Afterward, the script is restarted.

Depending on how the script is configured, when a map or reduce job starts again, it may attempt to retry processing for the same key-value pairs it had flagged for processing when the interruption occurred. Similarly, if an uncaught error disrupts the job, the system may retry processing for the pair that was being processed when the error occurred.

Note: For an overview of the system’s behavior following an interruption, see System Response After a Map/Reduce Interruption.

Handling restarts

When a job is restarted, there is an inherent risk of data duplication. Every map/reduce script should be written in such a way that each entry point function checks to see whether the function has been previously invoked. To do this, use the context.isRestarted property, which exists for every map/reduce entry point. If the function has been restarted, the script should provide any logic needed to avoid duplicate processing. For examples, see Adding Logic to Handle Map/Reduce Restarts.
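As a rough illustration of such a check, the sketch below skips a record on a restarted invocation if an earlier attempt already marked it as processed. The factory name createRestartSafeMap and the custom checkbox field custbody_mr_processed are assumptions made for this example; they do not come from NetSuite. The N/record module is again passed in as a parameter so that the logic can be tested with a stub.

```javascript
// Sketch only: `createRestartSafeMap` and the custom field
// 'custbody_mr_processed' are hypothetical names.
function createRestartSafeMap(record) {
    return function map(context) {
        var invoice = record.load({
            type: record.Type.INVOICE,
            id: context.value
        });

        // On a restarted invocation, the earlier attempt may already have
        // saved this record. If the marker is set, skip it.
        if (context.isRestarted &&
                invoice.getValue({ fieldId: 'custbody_mr_processed' })) {
            return;
        }

        // Set the marker in the same save as the update, so that a later
        // restart can detect that the work was completed.
        invoice.setValue({ fieldId: 'custbody_mr_processed', value: true });
        invoice.save();
    };
}
```

Writing the marker in the same save as the update is a design choice: the record is either fully processed and flagged, or neither, so a restart cannot observe a half-finished state.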

Buffer size

When you deploy a script, the deployment record includes a field called Buffer Size. The default value of this field is 1. In general, you should leave this value set to the default.

The purpose of the Buffer Size field is to control how many key-value pairs are flagged for processing at one time, and how frequently a map or reduce job saves data about its progress. Setting this field to a higher value may have a small performance advantage. However, the disadvantage is that, if the job is interrupted by an application server restart, there is a greater likelihood of one or more key-value pairs being processed twice. For that reason, you should leave this value set to 1, particularly if the script is processing records.

For more details on this field, see Buffer Size.
