Document Classification

Use Document Classification to identify the type of a document, such as an invoice or a receipt.

Document Understanding provides a list of possible document types for the analyzed document. Each document type has a confidence score, which is a decimal number between 0 and 1. Scores closer to 1 indicate higher confidence in the predicted document type, while lower scores indicate lower confidence. The possible document types are:
  • Invoice
  • Receipt
  • Resume or CV
  • Tax form
  • Driver's license
  • Passport
  • Bank statement
  • Check
  • Payslip
  • Other
Supported features are:
  • Classify document
  • Confidence score
  • Single request
  • Batch request
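
As a rough illustration of how the confidence scores described above might be consumed, the following Python sketch picks the highest-scoring document type and accepts it only when the score reaches a cutoff. The function name `pick_document_type`, the 0.8 threshold, and the sample scores are assumptions made for this example; they are not part of the service.

```python
from typing import Optional


def pick_document_type(candidates: list[dict], threshold: float = 0.8) -> Optional[str]:
    """Return the highest-confidence document type, or None if it falls below the threshold.

    Hypothetical helper: the threshold is an illustrative choice, not a service default.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c["confidence"])
    return best["documentType"] if best["confidence"] >= threshold else None


# Hypothetical scores, shaped like the per-label entries the service returns.
sample = [
    {"documentType": "RECEIPT", "confidence": 0.91},
    {"documentType": "TAX_FORM", "confidence": 0.07},
]
print(pick_document_type(sample))  # prints RECEIPT
```

A suitable threshold depends on the application; a stricter cutoff sends more documents to manual review.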

Document Classification Example

An example of using document classification in Document Understanding.

Input document
Figure 1. Document Classification Input
Receipt from a fictitious cafe, including two line items, tax, subtotal and total amounts.
API Request:
{
  "processorConfig": {
    "processorType": "GENERAL",
    "features": [
      {
        "featureType": "DOCUMENT_CLASSIFICATION",
        "maxResults": 5
      }
    ]
  },
  "inputLocation": {
    "sourceType": "OBJECT_STORAGE_LOCATIONS",
    "objectLocations": [
      {
        "source": "OBJECT_STORAGE",
        "namespaceName": "",
        "bucketName": "",
        "objectName": ""
      }
    ]
  },
  "compartmentId": "",
  "outputLocation": {
    "namespaceName": "",
    "bucketName": "",
    "prefix": ""
  }
}
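
The request body above can also be assembled programmatically. The Python sketch below builds the same structure as a dictionary and serializes it to JSON. The function name `build_classification_request` and all argument values are placeholders invented for this example, and writing the output to the same namespace and bucket as the input is an assumption made for brevity. Sending the request to the service is not shown.

```python
import json


def build_classification_request(namespace, bucket, object_name,
                                 compartment_id, output_prefix, max_results=5):
    """Build a DOCUMENT_CLASSIFICATION request body (hypothetical helper)."""
    return {
        "processorConfig": {
            "processorType": "GENERAL",
            "features": [
                {"featureType": "DOCUMENT_CLASSIFICATION", "maxResults": max_results}
            ],
        },
        "inputLocation": {
            "sourceType": "OBJECT_STORAGE_LOCATIONS",
            "objectLocations": [
                {
                    "source": "OBJECT_STORAGE",
                    "namespaceName": namespace,
                    "bucketName": bucket,
                    "objectName": object_name,
                }
            ],
        },
        "compartmentId": compartment_id,
        # Assumption: results are written to the same namespace and bucket as the input.
        "outputLocation": {
            "namespaceName": namespace,
            "bucketName": bucket,
            "prefix": output_prefix,
        },
    }


# Placeholder values; substitute your own namespace, bucket, object, and compartment.
body = build_classification_request(
    "my-namespace", "my-bucket", "receipt.jpg", "my-compartment-id", "results"
)
print(json.dumps(body, indent=2))
```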
Output:
API Response:
{
  "documentMetadata": {
    "pageCount": 1,
    "mimeType": "image/jpeg"
  },
  "pages": [
    {
      "pageNumber": 1,
      "dimensions": {
        "width": 361,
        "height": 600,
        "unit": "PIXEL"
      },
      "detectedDocumentTypes": [
        { "documentType": "RECEIPT", "confidence": 1 },
        { "documentType": "TAX_FORM", "confidence": 6.465067e-9 },
        { "documentType": "CHECK", "confidence": 6.031838e-9 },
        { "documentType": "BANK_STATEMENT", "confidence": 5.413888e-9 },
        { "documentType": "PASSPORT", "confidence": 1.5554872e-9 }
      ],
      ...
  "detectedDocumentTypes": [
    { "documentType": "RECEIPT", "confidence": 1 }
  ],
  ...