Map/Reduce Key Concepts

Inspired by the map/reduce paradigm, the general idea behind a map/reduce script is as follows:

  1. Your script identifies some data that requires processing.

  2. This data is split into key-value pairs.

  3. Your script defines a function that the system invokes one time for each key-value pair.

  4. Optionally, your script can define a second function for an additional round of processing.

The system can create multiple jobs for each round of processing and process the data in parallel, depending on how you deploy the script.
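
The four steps above can be simulated in plain JavaScript, outside NetSuite, as a single-threaded sketch. The stage names mirror the entry points described later; all data here is made up for illustration:

```javascript
// Steps 1-2: identify data and split it into key-value pairs.
const inputPairs = [
  { key: '101', value: 'amount:40' },
  { key: '102', value: 'amount:25' },
  { key: '101', value: 'amount:10' },
];

// Step 3: a function the system would invoke once per key-value pair.
// Here, each value's amount is parsed into a number.
function mapStage(pairs) {
  return pairs.map(({ key, value }) => ({
    key,
    value: Number(value.split(':')[1]),
  }));
}

// Shuffle: group values by key so each key appears exactly once.
function shuffleStage(pairs) {
  const groups = new Map();
  for (const { key, value } of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Step 4: an optional second round of processing — sum each group.
function reduceStage(groups) {
  const totals = {};
  for (const [key, values] of groups) {
    totals[key] = values.reduce((a, b) => a + b, 0);
  }
  return totals;
}

const totals = reduceStage(shuffleStage(mapStage(inputPairs)));
console.log(totals); // { '101': 50, '102': 25 }
```

In a real map/reduce script, the system owns this pipeline and distributes the invocations across jobs; your script supplies only the per-pair functions.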

If you've worked with other SuiteScript 2.x scripts, you'll find that map/reduce scripts work differently. Before you start writing a map/reduce script, make sure you understand the differences. Keep the following in mind:

Map/reduce scripts are executed in stages

Most script types run as a single continuous process. In contrast, map/reduce scripts run in five stages, in a fixed order: getInputData, map, shuffle, reduce, and summarize.

You can control the script’s behavior in four of the five stages. That is, each of these four stages corresponds to an entry point. Your corresponding function defines the script’s behavior during that stage. For example:

  • For the getInputData stage, you write a function that returns an object that can be transformed into a list of key-value pairs. For example, if your function returns a search of NetSuite records, the system runs the search. The key-value pairs would be the results of the search, where each key is the internal ID of a record and each value is a JSON representation of the record’s field IDs and values.

  • For the map stage, you can optionally write a function that the system invokes one time for each key-value pair. If needed, your map function can generate new key-value pairs as output. If the script also uses a reduce function, this output data is sent as input to the shuffle and then the reduce stage. Otherwise, the new key-value pairs are sent directly to the summarize stage.

  • You do not write a function for the shuffle stage. In this stage, the system organizes key-value pairs for the reduce stage, if you've defined one. These pairs may have been provided by the map function, if you're using one. If a map function was not used, the shuffle stage uses data provided by the getInputData stage. The shuffle stage groups this data by key to form a new set of key-value pairs, where each key is unique and each value is an array.

    For example, suppose there are 100 key-value pairs. Let's say each key is an employee, and each value is a record they created. If there were only two unique employees, and one employee created 90 records, while the other created 10, then the shuffle stage would provide two key-value pairs. The keys are the employee IDs. One value is an array with 90 elements, and the other has 10.

  • For the reduce stage, you write a function that runs one time for each key-value pair from the shuffle stage. This function can also generate key-value pairs to send to the summarize stage.

  • In the summarize stage, your function can get and log stats about the script's work. It can also act on data from the reduce stage.
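
In a reduce function, each invocation receives one unique key together with the array of values the shuffle stage grouped under it. The sketch below is hedged: the context shape mirrors SuiteScript's reduceContext (key, values, write), where grouped values arrive as strings, but it is called here with a hand-built mock rather than by the system:

```javascript
// Hypothetical reduce entry point: count how many records each
// employee (the key) created, as in the shuffle example above.
function reduce(context) {
  // context.key    — one unique employee id from the shuffle stage
  // context.values — array of serialized values grouped under that key
  context.write({
    key: context.key,
    value: String(context.values.length),
  });
}
```

If one employee's key arrived with an array of 90 values, this function would run once for that key and write the pair { key, value: '90' }.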

Note that you can omit either the map or the reduce function, but not both. You can also skip the summarize function. For more details, review SuiteScript 2.x Map/Reduce Script Entry Points and API.
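
Putting the stages together, a minimal script body might look like the following. This is a hedged sketch: a deployed script wraps these four functions in define() with any N/* modules it needs and returns them as { getInputData, map, reduce, summarize }, and the plain-array input is only a stand-in for whatever getInputData actually returns (commonly a search):

```javascript
function getInputData() {
  // Return anything the system can turn into key-value pairs —
  // commonly a search object; here, a plain array for illustration.
  return [
    { employee: 'E1', amount: 40 },
    { employee: 'E2', amount: 25 },
    { employee: 'E1', amount: 10 },
  ];
}

function map(context) {
  // Invoked once per key-value pair; values arrive serialized.
  const record = JSON.parse(context.value);
  context.write({ key: record.employee, value: String(record.amount) });
}

function reduce(context) {
  // Invoked once per unique key, with all values grouped under it.
  const total = context.values.reduce((sum, v) => sum + Number(v), 0);
  context.write({ key: context.key, value: String(total) });
}

function summarize(summary) {
  // Invoked once after all reduce jobs complete.
  summary.output.iterator().each(function (key, value) {
    console.log('Employee ' + key + ' total: ' + value);
    return true; // continue iterating
  });
}
```

Each function stays small because the system, not your code, drives the iteration over key-value pairs and the movement of data between stages.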

The system supplements your logic

Most script types rely solely on the code in your script file. Map/reduce scripts work differently. In a map/reduce script, your logic is key, but the system also adds its own standardized logic. For instance, the system handles data transfer between stages. The system also calls your map and reduce functions multiple times. For this reason, think of the logic of the map and reduce functions as being similar to the logic you'd use in a loop. Each of these functions should perform a relatively small amount of work. For details about the system’s behavior during and between the stages, see Map/Reduce Script Stages.
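
To see why each invocation should stay small, compare a single-process loop with the per-invocation model. Both functions below are illustrative, not part of any API; in the map/reduce style, the system supplies the "loop" by calling your function once per pair:

```javascript
// Single-process style (e.g., a scheduled script): your code owns the loop
// and does all the work in one continuous run.
function processAllAtOnce(records) {
  const results = [];
  for (const record of records) {
    results.push(record.amount * 2);
  }
  return results;
}

// Map/reduce style: the system calls this once per key-value pair,
// so the function body is just one iteration's worth of work.
function processOne(context) {
  const record = JSON.parse(context.value);
  context.write({ key: context.key, value: String(record.amount * 2) });
}
```

Because each call to processOne is independent, the system is free to spread those calls across multiple jobs and processors.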

The system provides robust context objects

For each entry point function you write in your map/reduce script, the system provides a context object. The system provides context objects to most SuiteScript 2.x entry points, but the context objects for map/reduce entry points are especially robust. These objects contain data and properties that are critical to writing an effective map/reduce script. For instance, you can use these objects to get data from the previous stage and send output to the next one. These objects can also hold error data, usage stats, and other metrics. For details, see SuiteScript 2.x Map/Reduce Script Entry Points and API.
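
As an illustration of the kind of data these context objects carry, a summarize function can read usage metrics and iterate over errors saved from the map stage. The property names below (usage, concurrency, mapSummary.errors) follow the SuiteScript summarize context, but treat the exact shapes as assumptions to verify against the API reference:

```javascript
function summarize(summary) {
  // Metrics the system collects across all stages and jobs.
  console.log('Units used: ' + summary.usage);
  console.log('Peak concurrency: ' + summary.concurrency);

  // Errors thrown by individual map invocations, keyed by the
  // key that was being processed when the error occurred.
  summary.mapSummary.errors.iterator().each(function (key, error) {
    console.log('Map error for key ' + key + ': ' + error);
    return true; // keep iterating
  });
}
```

Logging errors this way is useful because a failed map invocation does not stop the other, independent invocations; the summarize stage is where those failures surface.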

Multiple jobs are used to execute one script

Map/reduce scripts run using SuiteCloud Processors, which handle work through a series of jobs. Each job is run by a processor, a virtual unit of processing power. SuiteCloud Processors also handle scheduled scripts. However, these two script types are handled differently. For instance, the system uses only one job for a scheduled script. In contrast, the system creates multiple jobs for a single map/reduce script. The system creates at least one job per stage. The system can also create multiple jobs for the map and reduce stages. When there are multiple map and reduce jobs, they work independently and can run in parallel across processors. For this reason, the map and reduce stages are considered parallel stages.

In contrast, the getInputData and summarize stages each use one job. In each case, that job calls your function only one time. These stages are serial stages. The shuffle stage is also a serial stage.

Map/reduce scripts permit yielding and other interruptions

The map and reduce stages can be easily split into multiple jobs because they consist of independent function invocations. The structure is naturally flexible. It enables parallel processing and lets map and reduce jobs manage their own resource usage to some extent.

If a job occupies a processor for too long, the system can end the job once the current function invocation completes. In this case, the system creates a new job to process the remaining key-value pairs. The new job starts either right after the original job finishes or later, depending on its priority and when it was submitted, to let higher-priority jobs run. For more details, see Map/Reduce Yielding.

Note that the system has usage limits for map/reduce scripts that aren't managed through yielding. For details, see Map/Reduce Governance.
