documentCapture.documentToText(options)

Note:

The content in this help topic pertains to SuiteScript 2.1.

Method Description

Extracts text content from a PDF file.

This method returns a string with the text of the specified PDF file located in the NetSuite File Cabinet. If you want to extract other content from a file, such as tables and fields (key-value pairs), or extract content from a JPG, PNG, or TIFF file, use documentCapture.documentToStructure(options) instead. Encrypted files are not supported.
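The choice between the two methods described above can be sketched as a small helper. This is only an illustration: chooseCaptureMethod is a hypothetical function, not part of the N/documentCapture module, and it assumes that the file type can be inferred from the file name extension (including the jpeg and tif spellings, which this topic does not mention).

```javascript
// Hypothetical helper (not part of N/documentCapture): suggests which method
// to call, based on the file name and on whether tables or fields are needed.
// Assumption: the file type can be inferred from the file name extension.
function chooseCaptureMethod(fileName, needsStructure) {
    const dotIndex = fileName.lastIndexOf('.');
    if (dotIndex === -1) {
        return null; // no extension, so the type cannot be inferred here
    }
    const extension = fileName.slice(dotIndex + 1).toLowerCase();
    const imageExtensions = ['jpg', 'jpeg', 'png', 'tif', 'tiff'];

    // Plain text from a PDF file: documentToText is sufficient.
    if (extension === 'pdf' && !needsStructure) {
        return 'documentToText';
    }
    // Tables or fields from a PDF file, or any content from an image file.
    if (extension === 'pdf' || imageExtensions.includes(extension)) {
        return 'documentToStructure';
    }
    return null; // file type not covered by this topic
}
```

A return value of null means that the helper could not match the file to either method, so the caller decides how to proceed.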

You can use the text returned from this method in calls to N/llm methods for further querying. For example, you can provide the returned text to llm.generateText(options) and ask questions about the data, as the following code sample shows:

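// Assumes that the N/file, N/documentCapture, and N/llm modules are loaded
// as file, documentCapture, and llm
// "14" is the unique ID of a PDF file stored in the NetSuite File Cabinet
const fileObj = file.load({
    id: "14"
});
const extractedData = documentCapture.documentToText({
    file: fileObj
});

// Ask a question about the extracted text
const response = llm.generateText({
    prompt: "What is this invoice for?",
    documents: [{
        id: '14',
        data: extractedData
    }]
});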

This method doesn't consume usage from the monthly pool of free requests that NetSuite provides. By contrast, documentCapture.documentToStructure(options) does consume usage from that pool.

Returns

string

Supported Script Types

Server scripts

For more information, see SuiteScript 2.x Script Types.

Governance

100

Module

N/documentCapture Module

Since

2025.2
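The governance cost listed above (100 units per call) can be used to estimate how many files a script can process before it runs out of usage. The following is a minimal sketch: maxDocumentToTextCalls is a hypothetical helper, and the estimate ignores the cost of any other calls the script makes, such as file.load(options).

```javascript
// Governance units consumed by one documentToText call, as listed in this topic.
const DOCUMENT_TO_TEXT_COST = 100;

// Hypothetical helper: estimates how many documentToText calls fit in the
// remaining usage, optionally keeping some units in reserve for other work.
function maxDocumentToTextCalls(remainingUsage, reserve = 0) {
    const available = Math.max(0, remainingUsage - reserve);
    return Math.floor(available / DOCUMENT_TO_TEXT_COST);
}
```

In a server script, the remainingUsage value could come from runtime.getCurrentScript().getRemainingUsage() in the N/runtime module.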

Parameters

Parameter

Type

Required / Optional

Description

Since

options.file

file.File

required

The PDF file to extract content from.

The specified file must be located in the NetSuite File Cabinet and be in PDF format. To obtain the file.File object, load the file with file.load(options), specifying either its internal ID or its path in the File Cabinet. For more information, see N/file Module. Encrypted files are not supported.

2025.2

options.timeout

number

optional

The timeout period, in milliseconds, to wait for a response from the service.

By default, the timeout period is 30,000 milliseconds (30 seconds). You can specify a longer timeout period, but you can't specify one that's shorter than 30,000 milliseconds. If you try to specify a shorter timeout period, the default value of 30,000 milliseconds is used instead.

2025.2
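The timeout rule described for options.timeout can be modeled as follows. This sketch only illustrates the documented behavior (values below 30,000 milliseconds fall back to the default). The effectiveTimeout function is hypothetical and is not how the module is implemented.

```javascript
// Default and minimum timeout, in milliseconds, as described in this topic.
const DEFAULT_TIMEOUT_MS = 30000;

// Hypothetical model of the documented rule: returns the timeout that
// applies for a given options.timeout value.
function effectiveTimeout(requestedTimeout) {
    // No timeout specified: the default applies.
    if (requestedTimeout === undefined || requestedTimeout === null) {
        return DEFAULT_TIMEOUT_MS;
    }
    // A shorter timeout is replaced by the default; a longer one is kept.
    return Math.max(requestedTimeout, DEFAULT_TIMEOUT_MS);
}
```

For example, a requested timeout of 10000 results in an effective timeout of 30000, while 40000 is used as specified.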

Errors

Error Code

Thrown If

FILE_CANNOT_BE_EMPTY

The specified file is empty.

FILE_CORRUPTED_OR_INVALID

The specified file couldn't be parsed. It could be corrupted or invalid.

UNSUPPORTED_ENCODING_EXCEPTION

The specified file is corrupted or contains invalid characters.

UNSUPPORTED_FILE_TYPE_1_USE_2

The specified file is not a PDF file.
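The error codes in the preceding table can be handled with a try/catch block. The following is a minimal sketch under two assumptions: the thrown error exposes its error code in the name property, as SuiteScript errors generally do, and the documentCapture module is passed in as an argument so that the function doesn't depend on how the module is loaded. The extractTextSafely function and the shape of its result are hypothetical.

```javascript
// Error codes listed in this topic, with a short reason for each.
const KNOWN_ERRORS = new Map([
    ['FILE_CANNOT_BE_EMPTY', 'The specified file is empty.'],
    ['FILE_CORRUPTED_OR_INVALID', 'The specified file could not be parsed.'],
    ['UNSUPPORTED_ENCODING_EXCEPTION', 'The specified file is corrupted or contains invalid characters.'],
    ['UNSUPPORTED_FILE_TYPE_1_USE_2', 'The specified file is not a PDF file.']
]);

// Hypothetical wrapper: returns the extracted text, or the error code and
// reason when one of the documented errors is thrown.
// Assumption: the error code is available in the error's name property.
function extractTextSafely(documentCapture, fileObj) {
    try {
        const text = documentCapture.documentToText({ file: fileObj });
        return { ok: true, text: text };
    } catch (e) {
        if (e && KNOWN_ERRORS.has(e.name)) {
            return { ok: false, code: e.name, reason: KNOWN_ERRORS.get(e.name) };
        }
        throw e; // not one of the documented errors, so rethrow it
    }
}
```

Errors that are not in the table are rethrown so that unexpected failures are not silently ignored.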

Syntax

Important:

The following code sample shows the syntax for this member. It is not a functional example. For a complete script example, see N/documentCapture Module Script Samples.

// Add additional code
...

// "14" is the unique ID of a PDF file stored in the NetSuite File Cabinet
const fileObj = file.load({
    id: "14"
});
const extractedData = documentCapture.documentToText({
    file: fileObj,
    timeout: 40000
});

...
// Add additional code
