N/documentCapture Module Script Samples

Note:

The content in this help topic pertains to SuiteScript 2.1.

The following script samples demonstrate how to use the features of the N/documentCapture module.

Extract Text from a PDF File
Extract Feature Content from a Document Synchronously
Extract Content from a Document Asynchronously

Extract Text from a PDF File

The following code sample extracts text content from a PDF file using documentCapture.documentToText(options). The file is located in the NetSuite File Cabinet and specified using its internal ID, but you can also specify a file using its name and path.

After the content is extracted, it's passed to the llm.generateText(options) method of the N/llm module. This method lets you use a large language model (LLM) to analyze the content further by providing a suitable prompt. The data returned from documentCapture.documentToText(options) can be provided to llm.generateText(options) as a document, which helps you integrate these modules for more advanced use cases. For more information about the N/llm module, see N/llm Module.

In this example, the LLM is asked about the purpose of the provided invoice. The response includes citations, which indicate the content in the document that was used to generate the response. Finally, the LLM response and the citations are logged.

For instructions about how to run a SuiteScript 2.1 code snippet in the debugger, see On-Demand Debugging of SuiteScript 2.1 Scripts.

Note:

This sample script uses the require function so that you can copy it into the SuiteScript Debugger and test it. You must use the define function in an entry point script (the script you attach to a script record and deploy). For more information, see SuiteScript 2.x Script Basics and SuiteScript 2.x Script Types.

require(['N/file', 'N/documentCapture', 'N/llm'],
    function(file, documentCapture, llm) {
        // "14" is the unique ID of a PDF stored in the NetSuite File Cabinet
        const fileObj = file.load({
            id: "14"
        });
        const extractedData = documentCapture.documentToText({
            file: fileObj
        });

        const response = llm.generateText({
            prompt: "What is this invoice for?",
            documents: [{
                id: '14',
                data: extractedData
            }]
        });
      
        log.debug("Answer: ", response.text);
        log.debug("Citations: ", response.citations);
    }
);
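
The sample above loads the PDF by its internal ID. As a minimal sketch of the name-and-path alternative, the following helper passes a File Cabinet path to file.load(options) in place of the ID. The helper name loadByPath and the example path are hypothetical, and the N/file module is passed in as an argument so that the helper can be exercised outside NetSuite.

```javascript
// Sketch only: loadByPath is a hypothetical helper, not part of N/file.
// fileModule is the N/file module supplied by require or define.
function loadByPath(fileModule, path) {
    // options.id accepts a File Cabinet path as well as an internal ID
    return fileModule.load({ id: path });
}

// Example use inside a require callback (the path is a placeholder):
// const fileObj = loadByPath(file, "SuiteScripts/invoices/sample.pdf");
```

The returned file object can then be passed to documentCapture.documentToText(options) in the same way as the file loaded by internal ID.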

Extract Feature Content from a Document Synchronously

The following code sample extracts content from a document using documentCapture.documentToStructure(options). This method provides a synchronous way to extract content from documents up to five pages in length. For longer documents, you must submit an asynchronous task using the N/task module. For a code sample, see Extract Content from a Document Asynchronously.

The file is located in the NetSuite File Cabinet and specified using its internal ID, but you can also specify a file using its name and path. The sample specifies the features to extract (text, tables, and key-value pairs). If you don't specify the features to extract, text and tables are extracted by default. The sample also indicates that the provided document is an invoice. Specifying the document type is optional, but doing so can improve the quality and accuracy of the extracted content.

After the content is extracted, it's passed to the llm.chat(options) method, which is an alias of llm.generateText(options). This method lets you use a large language model (LLM) to analyze the content further by providing a suitable prompt. The data returned from documentCapture.documentToStructure(options) can be converted to a string and provided to llm.chat(options) (llm.generateText(options)) as a document, which helps you integrate these modules for more advanced use cases. For more information about the N/llm module, see N/llm Module.

In this example, the LLM is provided with a preamble, which contains additional instructions for the LLM to consider when generating its response. The LLM is asked about the purpose of the provided invoice, and the extracted content is converted to a string and provided as a document. The response includes citations, which indicate the content in the document that was used to generate the response. Finally, the LLM response and the citations are logged.

For instructions about how to run a SuiteScript 2.1 code snippet in the debugger, see On-Demand Debugging of SuiteScript 2.1 Scripts.

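Note:

This sample script uses the require function so that you can copy it into the SuiteScript Debugger and test it. You must use the define function in an entry point script (the script you attach to a script record and deploy). For more information, see SuiteScript 2.x Script Basics and SuiteScript 2.x Script Types.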

require(['N/file', 'N/documentCapture', 'N/llm'],
    function(file, documentCapture, llm) {
        // "14" is the internal ID of a file stored in the NetSuite File Cabinet
        const extractedData = documentCapture.documentToStructure({
            file: file.load({
                id: "14"
            }),
            features: [
                documentCapture.Feature.TEXT_EXTRACTION,
                documentCapture.Feature.TABLE_EXTRACTION,
                documentCapture.Feature.FIELD_EXTRACTION
            ],
            documentType: documentCapture.DocumentType.INVOICE
        });

        const documentText = extractedData.getText();
         
        const response = llm.chat({
            preamble: "Your task is to parse the provided document and answer questions about that document.",
            prompt: "What is this invoice for?",
            documents: [{
                id: "1",
                data: documentText
            }]
        });

        log.debug("Answer: ", response.text);
        log.debug("Citations: ", response.citations);
    }
);
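
Logging response.citations directly prints the raw citation objects. The following sketch turns them into one readable line each. It assumes that each citation exposes documentIds, start, end, and text properties; check those names against the N/llm reference before relying on them. The function name formatCitations is hypothetical.

```javascript
// Sketch only: formatCitations is a hypothetical helper, not part of N/llm.
// Assumption: each citation has documentIds (array), start, end, and text.
function formatCitations(citations) {
    return (citations || []).map(function (citation) {
        const ids = (citation.documentIds || []).join(", ");
        return "[" + ids + "] " + citation.start + "-" + citation.end + ": " + citation.text;
    });
}

// Example use after llm.chat or llm.generateText returns:
// formatCitations(response.citations).forEach(function (line) {
//     log.debug("Citation", line);
// });
```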

Extract Content from a Document Asynchronously

The following code samples show how to extract content from a document asynchronously using the N/task module, which lets you submit tasks that run asynchronously. For more information, see N/task Module. If you want to extract content from a document that's longer than five pages, you must submit an asynchronous task and can't use documentCapture.documentToStructure(options). For a synchronous example, see Extract Feature Content from a Document Synchronously.
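
The five-page limit can be expressed as a small routing decision. The following is a minimal sketch, assuming the caller already knows the page count of the document; it does not read the page count from the file. The names SYNC_PAGE_LIMIT and chooseExtractionMode are hypothetical.

```javascript
// Sketch only: chooseExtractionMode is a hypothetical helper.
// Assumption: the caller supplies the page count; it is not read from the file.
const SYNC_PAGE_LIMIT = 5;

function chooseExtractionMode(pageCount) {
    if (!Number.isInteger(pageCount) || pageCount < 1) {
        throw new RangeError("pageCount must be a positive integer");
    }
    // Up to five pages: documentCapture.documentToStructure(options)
    // More than five pages: a document capture task submitted through N/task
    return pageCount <= SYNC_PAGE_LIMIT ? "sync" : "async";
}
```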

The first sample creates a document capture task and populates its properties. The task accepts the same parameters as documentCapture.documentToStructure(options), such as document type, features to extract, and language. The sample loads a file by its internal ID, and it specifies the path of another file to contain the extracted content. The extracted content will be added to the specified file in JSON format. The sample also indicates that the provided document is an invoice.

Finally, the sample submits the document capture task and logs the submission ID.

For instructions about how to run a SuiteScript 2.1 code snippet in the debugger, see On-Demand Debugging of SuiteScript 2.1 Scripts.

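Note:

This sample script uses the require function so that you can copy it into the SuiteScript Debugger and test it. You must use the define function in an entry point script (the script you attach to a script record and deploy). For more information, see SuiteScript 2.x Script Basics and SuiteScript 2.x Script Types.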

require(['N/task', 'N/file', 'N/documentCapture'],
    function(task, file, documentCapture) {
        // Create the document capture task
        const docTask = task.create({
            taskType: task.TaskType.DOCUMENT_CAPTURE
        });

        // Specify the input file (by internal ID), the output file path, and the document type
        docTask.inputFile = file.load({
            id: "443"
        });
        docTask.outputFilePath = "SuiteScripts/result.json";
        docTask.documentType = documentCapture.DocumentType.INVOICE;

        // Submit the task
        const submissionId = docTask.submit();
        log.debug("Submission ID: ", submissionId);
    }
);
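
The submission ID can be used to check whether the task has finished before the result file is read. The sketch below assumes that task.checkStatus(options) accepts the ID returned by submit() for document capture tasks, and that the returned object exposes a status property comparable against task.TaskStatus, as it does for other task types. The helper name getCaptureState is hypothetical, and the N/task module is passed in as an argument so that the helper can be exercised outside NetSuite.

```javascript
// Sketch only: getCaptureState is a hypothetical helper, not part of N/task.
// Assumption: document capture tasks report status through task.checkStatus.
function getCaptureState(taskModule, submissionId) {
    const taskStatus = taskModule.checkStatus({ taskId: submissionId });

    if (taskStatus.status === taskModule.TaskStatus.COMPLETE) {
        return "complete";
    }
    if (taskStatus.status === taskModule.TaskStatus.FAILED) {
        return "failed";
    }
    // Any other status (for example, pending or processing) is still in progress
    return "in progress";
}

// Example use inside a require callback:
// log.debug("Capture state: ", getCaptureState(task, submissionId));
```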

The second sample is a script that's deployed in a NetSuite account and processes the result of the document capture task. The sample loads the result file and uses documentCapture.parseResult(options) to parse the JSON result data into a documentCapture.Document object. The sample then extracts the full text of the document.

The extracted text is passed to the llm.chat(options) method, which is an alias of llm.generateText(options). This method lets you use a large language model (LLM) to analyze the content further by providing a suitable prompt. You can provide text to llm.chat(options) (llm.generateText(options)) as a document, which helps you integrate these modules for more advanced use cases. For more information about the N/llm module, see N/llm Module.

In this example, the LLM is provided with a preamble, which contains additional instructions for the LLM to consider when generating its response. The LLM is asked about the purpose of the provided invoice, and the extracted text is provided as a document. The LLM response is logged, and the execute entry point function is returned.

Note:

This script sample uses the define function, which is required for an entry point script (a script you attach to a script record and deploy). You must use the require function if you want to copy the script into the SuiteScript Debugger and test it. For more information, see SuiteScript 2.x Global Objects.

/**
 * @NApiVersion 2.1
 * @NScriptType ScheduledScript
 * @NModuleScope SameAccount
 */
define(['N/documentCapture', 'N/file', 'N/llm'],
    function(documentCapture, file, llm) {
        function execute(scriptContext) {
            // Load the JSON result file written by the document capture task
            const resultFile = file.load({
                id: "SuiteScripts/result.json"
            });
            const resultDocument = documentCapture.parseResult(resultFile);
            const resultText = resultDocument.getText();

            const response = llm.chat({
                preamble: "Your task is to parse the provided document and answer questions about that document.",
                prompt: "What is this invoice for?",
                documents:[{
                    id: "1",
                    data: resultText
                }]
            });
            log.debug("Response: ", response.text);
        }
        return {
            execute: execute
        };
    }
);
