Extract Feature Content from a Document Synchronously

The following code sample extracts content from a document using documentCapture.documentToStructure(options). This method provides a synchronous way to extract content from documents up to five pages in length. For longer documents, you must submit an asynchronous task using the N/task module. For a code sample, see Extract Content from a Document Asynchronously.
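
The five-page limit described above can be expressed as a small routing check. This is only a minimal sketch: chooseExtractionMode and SYNC_PAGE_LIMIT are hypothetical names, not part of N/documentCapture, and the sketch assumes the caller already knows the page count of the document.

```javascript
// Page limit for synchronous extraction, as stated in the text above.
const SYNC_PAGE_LIMIT = 5;

/**
 * Hypothetical helper (not part of N/documentCapture): decides which
 * extraction path fits a document with the given number of pages.
 * Returns 'sync' for documentCapture.documentToStructure(options) and
 * 'async' when an N/task submission would be needed instead.
 */
function chooseExtractionMode(pageCount) {
    if (!Number.isInteger(pageCount) || pageCount < 1) {
        throw new RangeError('pageCount must be a positive integer');
    }
    return pageCount <= SYNC_PAGE_LIMIT ? 'sync' : 'async';
}
```

A script could call this check before deciding whether to run the synchronous sample on this page or to submit an asynchronous task.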

The file is located in the NetSuite File Cabinet and specified using its internal ID, but you can also specify a file using its name and path. The sample specifies the features to extract (text, tables, and key-value pairs). If you don't specify the features to extract, text and tables are extracted by default. The sample also indicates that the provided document is an invoice. Specifying the document type is optional, but doing so can improve the quality and accuracy of the extracted content.
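
Because the features and the document type are both optional, one way to assemble the options object is to include those keys only when they are supplied, so that the defaults described above apply otherwise. The following is a sketch under that assumption; buildExtractionOptions is a hypothetical helper, not part of the N/documentCapture API.

```javascript
/**
 * Hypothetical helper: assembles the options object passed to
 * documentCapture.documentToStructure(options).
 *
 * fileObj      - the object returned by file.load(...)
 * features     - optional array of documentCapture.Feature values
 * documentType - optional documentCapture.DocumentType value
 */
function buildExtractionOptions(fileObj, features, documentType) {
    const options = { file: fileObj };

    // Leave the key out entirely so the default features are used.
    if (Array.isArray(features) && features.length > 0) {
        options.features = features;
    }

    // The document type is optional; include it only when provided.
    if (documentType !== undefined && documentType !== null) {
        options.documentType = documentType;
    }

    return options;
}
```

In a script, the first argument might come from file.load({ id: "14" }) or from file.load with a File Cabinet path such as "SuiteScripts/invoices/sample.pdf" as the id (the path shown here is only an illustration).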

After the content is extracted, it's passed to the llm.chat(options) method, which is an alias of llm.generateText(options). This method lets you use a large language model (LLM) to analyze the content further by providing a suitable prompt. You can convert the data returned from documentCapture.documentToStructure(options) to a string and provide it to llm.chat(options) as a document, which lets you combine the two modules for more advanced use cases. For more information about the N/llm module, see N/llm Module.

In this example, the LLM is provided with a preamble, which contains additional instructions for the LLM to consider when generating its response. The LLM is asked about the purpose of the provided invoice, and the extracted content is converted to a string and provided as a document. The response includes citations, which indicate the content in the document that was used to generate the response. Finally, the LLM response and the citations are logged.

For instructions about how to run a SuiteScript 2.1 code snippet in the debugger, see On-Demand Debugging of SuiteScript 2.1 Scripts.

Note:

This sample script uses the require function so that you can copy it into the SuiteScript Debugger and test it. You must use the define function in an entry point script (the script you attach to a script record and deploy). For more information, see SuiteScript 2.x Script Basics and SuiteScript 2.x Script Types.

require(['N/file', 'N/documentCapture', 'N/llm'],
    function(file, documentCapture, llm) { 
        // "14" is the internal ID of a file stored in the NetSuite File Cabinet 
        const extractedData = documentCapture.documentToStructure({
            file: file.load({
                id: "14"
            }),
            features: [
                documentCapture.Feature.TEXT_EXTRACTION,
                documentCapture.Feature.TABLE_EXTRACTION,
                documentCapture.Feature.FIELD_EXTRACTION
            ],
            documentType: documentCapture.DocumentType.INVOICE
        });

        const documentText = extractedData.getText();
         
        const response = llm.chat({
            preamble: "Your task is to parse the provided document and answer questions about that document.",
            prompt: "What is this invoice for?",
            documents: [{
                id: "1",
                data: documentText
            }]
        });

        log.debug("Answer: ", response.text);
        log.debug("Citations: ", response.citations);
    }
);
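
The sample logs the citations as returned. To make the log output easier to read, each citation could first be turned into a single line of text. The sketch below assumes that every citation object exposes documentIds, start, end, and text properties; verify these names against the N/llm reference before relying on them. formatCitations is a hypothetical helper, not part of N/llm.

```javascript
/**
 * Hypothetical helper: turns an array of citation objects into
 * readable strings, one per citation.
 * Assumed shape of each citation: { documentIds, start, end, text }.
 */
function formatCitations(citations) {
    if (!Array.isArray(citations)) {
        return [];
    }
    return citations.map((citation, index) => {
        // Join the IDs of the source documents, if any were reported.
        const docs = Array.isArray(citation.documentIds)
            ? citation.documentIds.join(', ')
            : 'unknown';
        return `[${index + 1}] "${citation.text}" ` +
            `(documents: ${docs}, response chars ${citation.start}-${citation.end})`;
    });
}
```

In the sample, the result of formatCitations(response.citations) could be passed to log.debug in place of the raw citations array.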
