Getting Started with the N/documentCapture Module

Note:

The content in this help topic pertains to SuiteScript 2.1.

The following sections help you get started with the N/documentCapture module:

Extracting Text from a PDF File

To extract text from a PDF file of any length, use documentCapture.documentToText(options). For a sample, see Extract Text from a PDF File.

Provide the following parameters:

  • options.file – The PDF file to extract text from. This file must be located in the NetSuite File Cabinet, and you can specify the file using its internal ID or file path.

  • options.timeout (optional) – The timeout period, in milliseconds, to wait for the service to return results. The default and minimum value is 30,000 milliseconds (30 seconds). You can specify a longer timeout period. If you specify a shorter one, the default of 30,000 milliseconds is used instead.
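The timeout rule above can be expressed as a small helper. This is a minimal sketch of the documented behavior only. effectiveTimeout and DEFAULT_TIMEOUT_MS are hypothetical names, not part of the N/documentCapture API, and you don't need such a helper to call the method.

```javascript
// Hypothetical helper that models the documented timeout rule.
// It is NOT part of the N/documentCapture API.
const DEFAULT_TIMEOUT_MS = 30000; // documented default and minimum (30 seconds)

function effectiveTimeout(requestedMs) {
  // No value, a non-numeric value, or a value below the minimum
  // falls back to the default
  if (!Number.isFinite(requestedMs) || requestedMs < DEFAULT_TIMEOUT_MS) {
    return DEFAULT_TIMEOUT_MS;
  }
  // Longer timeouts are used as specified
  return requestedMs;
}
```

For example, effectiveTimeout(10000) returns 30000, and effectiveTimeout(90000) returns 90000.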

The documentCapture.documentToText(options) method returns a string with the text of the PDF file. If you want to analyze the text further, you can provide the extracted text to the llm.generateText(options) method in the N/llm module, as the following example shows:

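// Assumes the N/file, N/documentCapture, and N/llm modules are loaded as
// file, documentCapture, and llm (for example, in a define() callback).

// "14" is the internal ID of a PDF stored in the NetSuite File Cabinet
const fileObj = file.load({
    id: "14"
});

// Extract the text of the PDF as a single string
const extractedData = documentCapture.documentToText({
    file: fileObj
});

// Pass the extracted text to the LLM as a source document for the prompt
const response = llm.generateText({
    prompt: "What is this invoice for?",
    documents: [{
        id: "14",
        data: extractedData
    }]
});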

Keep the following considerations in mind:

Extracting Feature Content from a Document

To extract specific feature content (such as tables and fields) from a file in PDF, JPG, PNG, or TIFF format, use documentCapture.documentToStructure(options). For a sample, see Extract Feature Content from a Document Synchronously.

Provide the following parameters:

  • options.file – The document file to extract content from. This file must be located in the NetSuite File Cabinet, and you can specify the file using its internal ID or file path.

  • options.documentType (optional) – The document type. Specifying the document type lets the service apply pretrained models that are optimized for that type, which can provide more accurate extraction results. Use values from the documentCapture.DocumentType enum to set this parameter. If you don't specify a value for this parameter, the DocumentType.OTHERS type is used by default.

  • options.features (optional) – The features to extract from the specified document. Use values from the documentCapture.Feature enum to set this parameter. If you don't specify a value for this parameter, the Feature.TEXT_EXTRACTION and Feature.TABLE_EXTRACTION features are used by default.

  • options.language (optional) – The language of the specified document. Use values from the documentCapture.Language enum to set this parameter. If you don't specify a value for this parameter, ENG (English) is used by default.

  • options.timeout (optional) – The timeout period, in milliseconds, to wait for the service to return results. The default and minimum value is 30,000 milliseconds (30 seconds). You can specify a longer timeout period. If you specify a shorter one, the default of 30,000 milliseconds is used instead.

  • options.ociConfig (optional) – Oracle Cloud Infrastructure (OCI) credentials that let you obtain usage beyond the free monthly usage pool. For more information about providing these credentials, see Using OCI Credentials to Obtain Additional Usage. If you don't specify these credentials, successful calls to documentCapture.documentToStructure(options) consume usage from the free monthly usage pool of requests that NetSuite provides.
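The following sketch shows a synchronous documentCapture.documentToStructure(options) call that sets every optional parameter except ociConfig, so successful calls consume the free monthly usage pool. It's written as a plain function that receives the N/documentCapture and N/file modules as arguments (for example, from a define() callback), which also lets you exercise it outside NetSuite with stand-in objects. The function name extractInvoiceStructure and the enum value DocumentType.INVOICE are assumptions; confirm the available values in the documentCapture.DocumentType enum.

```javascript
/**
 * Sketch: extract fields and tables from an invoice in the File Cabinet.
 * documentCapture and file are the N/documentCapture and N/file modules.
 * extractInvoiceStructure is a hypothetical name; DocumentType.INVOICE is
 * an assumed enum value.
 */
function extractInvoiceStructure(documentCapture, file, fileIdOrPath) {
  // file.load accepts an internal ID or a File Cabinet path
  const invoiceFile = file.load({ id: fileIdOrPath });

  return documentCapture.documentToStructure({
    file: invoiceFile,
    // A specific type lets the service apply a model pretrained for invoices
    documentType: documentCapture.DocumentType.INVOICE,
    // Request field and table extraction explicitly
    features: [
      documentCapture.Feature.FIELD_EXTRACTION,
      documentCapture.Feature.TABLE_EXTRACTION
    ],
    language: documentCapture.Language.ENG,
    // 60 seconds; values below 30,000 fall back to the default
    timeout: 60000
  });
}
```

Because FIELD_EXTRACTION is requested, the returned Document can include documentCapture.Field objects, which the default features don't produce.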

The documentCapture.documentToStructure(options) method returns a documentCapture.Document object with the following structure:

{
    mimeType: string,
    pages: [
        {
            fields: Field[],
            lines: Line[],
            tables: Table[],
            words: Word[]
        }
        // ...one object per page
    ]
}

The data that's available in this object depends on the features you specify when you call documentCapture.documentToStructure(options). For example, this object includes fields (as documentCapture.Field objects) only when you specify the Feature.FIELD_EXTRACTION feature.
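Because each array is populated only for the features you request, code that reads the result should treat every array as optional. The following sketch counts the extracted items on each page. It assumes doc.pages is an array of per-page objects (documentCapture.Page) with optional fields, lines, tables, and words arrays; summarizePages is a hypothetical helper, not part of the module.

```javascript
/**
 * Sketch: summarize how many items were extracted on each page of a
 * documentCapture.Document. Arrays for features you didn't request may be
 * absent or empty, so each one defaults to an empty array here.
 */
function summarizePages(doc) {
  return (doc.pages || []).map((page, index) => ({
    page: index + 1, // 1-based page number for readability
    fields: (page.fields || []).length,
    lines: (page.lines || []).length,
    tables: (page.tables || []).length,
    words: (page.words || []).length
  }));
}
```

A summary like this is a quick way to confirm which requested features actually returned data, for example by logging it with log.debug from the N/log module.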

Keep the following considerations in mind:

Using OCI Credentials to Obtain Additional Usage

NetSuite provides a free monthly usage pool of requests for the N/documentCapture module. Successful calls to documentCapture.documentToStructure(options) consume usage from this pool, and the pool is refreshed each month. You can track your current monthly usage on the AI Preferences page in NetSuite. For more information, see View SuiteScript AI Usage Limit and Usage. Calls to documentCapture.documentToText(options) don't consume usage from this pool.

Each SuiteApp installed in your account gets its own monthly usage pool for N/documentCapture methods, separate from the pool used by your regular (non-SuiteApp) scripts. For example, if you install two SuiteApps whose scripts use N/documentCapture methods, each SuiteApp draws from its own pool, so the two SuiteApps together have twice the usage of a single pool. Scripts outside of SuiteApps use a separate pool, and SuiteApp usage doesn't count against it. This setup ensures that SuiteApps can't use up your monthly allocation and block your own scripts from calling N/documentCapture methods.

If you need more monthly usage, you can provide the Oracle Cloud Infrastructure (OCI) credentials for an Oracle Cloud account that includes the OCI Document Understanding service. When you provide these credentials, usage is consumed from that OCI account instead of the free usage pool provided by NetSuite. You can provide OCI configuration parameters in two ways:

  • Using the fields on the Settings subtab of the AI Preferences page.

  • Using the options.ociConfig parameter when you call documentCapture.documentToStructure(options).

For a list of required OCI configuration parameters for synchronous requests, see documentCapture.documentToStructure(options). For asynchronous requests (those that use a document capture task in the N/task module), you must provide three additional OCI configuration parameters: objectStorageNamespace, outputBucketName, and inputBucketName. You can provide these additional parameters in two ways:

  • Using the fields in the Optional configuration for Asynchronous Document Capture section on the Settings subtab of the AI Preferences page.

    (Screenshot: OCI configuration fields required for asynchronous requests.)
  • Using the ociConfig property of the document capture task. When using this approach, the credentials you provide override any OCI credentials that are configured on the AI Preferences page. Here is the full list of parameters required in the object you provide for the ociConfig property:

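    // docTask is a document capture task created with the N/task module.
    // fingerprint and privateKey take the script IDs of API secrets
    // (custsecret_...), not the raw values.
    docTask.ociConfig = {
        userId: 'user-ocid',
        tenancyId: 'user-tenancy',
        compartmentId: 'user-compartment',
        fingerprint: 'custsecret_secret_fingerprint_id',
        privateKey: 'custsecret_secret_privatekey_id',
        objectStorageNamespace: 'oraclenetsuite',
        outputBucketName: 'out-bucket-name',
        inputBucketName: 'in-bucket-name'
    };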
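Because asynchronous requests need three parameters beyond those for synchronous requests, a quick check before you assign docTask.ociConfig can catch a missing value early. The following sketch is hypothetical: missingOciParams and the two arrays aren't part of any NetSuite module, and the synchronous list is inferred from this topic as the asynchronous list minus objectStorageNamespace, inputBucketName, and outputBucketName. See documentCapture.documentToStructure(options) for the authoritative list.

```javascript
// Hypothetical parameter lists inferred from this topic (not a NetSuite API)
const SYNC_OCI_PARAMS = ["userId", "tenancyId", "compartmentId", "fingerprint", "privateKey"];
const ASYNC_OCI_PARAMS = [
  ...SYNC_OCI_PARAMS,
  // Additional Object Storage parameters required for document capture tasks
  "objectStorageNamespace",
  "inputBucketName",
  "outputBucketName"
];

// Returns the names of required parameters that are missing or empty
function missingOciParams(ociConfig, isAsync) {
  const required = isAsync ? ASYNC_OCI_PARAMS : SYNC_OCI_PARAMS;
  // Treat undefined, null, and empty strings as missing
  return required.filter((name) => !ociConfig || !ociConfig[name]);
}
```

For example, an object with only the five synchronous parameters passes the synchronous check but reports the three Object Storage parameters as missing for a document capture task.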
    
                  
