5 Document Verification Framework

This topic describes the document verification framework, which is set up after the Common Core applications are deployed on the WebLogic server.

Prerequisites

Python, Tesseract and other required libraries must be installed for running document verification APIs.

Python Installation:

The Oracle Linux yum server hosts software for Oracle Linux and compatible distributions. If the latest approved distribution of Python (Python 3.9.5) is not available on the yum server, follow the steps below to install Python manually. Make sure that you specify the Python version correctly.

  1. sudo yum install wget
  2. sudo yum install gcc readline readline-devel
  3. sudo yum install zlib zlib-devel
  4. sudo yum install libffi-devel openssl-devel
  5. wget https://www.python.org/ftp/python/3.9.5/Python-3.9.5.tgz
  6. tar xzf Python-3.9.5.tgz
  7. cd Python-3.9.5
  8. sudo ./configure --enable-optimizations --prefix=/scratch/software/python39
  9. sudo make altinstall (altinstall is used so that the default Python binary /usr/bin/python is not replaced.)
  10. sudo update-alternatives --install /usr/bin/python3 python3 /scratch/software/python39/bin/python3.9 1 (Adds a symlink for python3. The trailing 1 is the priority argument that update-alternatives requires.)
  11. sudo update-alternatives --config python3 (Lists the available symlinks. Select the required version, in this case Python 3.9.5.)

Note:

Steps 10 and 11 may fail with the error shown below. In that case, run these two steps as the root user.

Error: "sudo: update-alternatives: command not found"

To switch to the root user, run the command below, and then repeat steps 10 and 11:

sudo su - root
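
After step 11, it can be useful to confirm that python3 resolves to the intended interpreter. The following is a minimal sketch, not part of the product; the function name version_matches is hypothetical. Run it with python3 to compare the interpreter version against 3.9.5.

```python
import sys

# Version targeted by the installation steps above.
REQUIRED_VERSION = (3, 9, 5)


def version_matches(info=None, required=REQUIRED_VERSION):
    """Return True when major.minor.micro of the interpreter equals `required`."""
    info = sys.version_info if info is None else info
    return tuple(info[:3]) == tuple(required)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if version_matches() else "unexpected version"
    print(f"python3 resolves to {found}: {status}")
```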

Tesseract Installation

Tesseract is an optical character recognition (OCR) engine for various operating systems. The latest version must be installed on the machine to extract the text from the documents.

Refer to section 9.4, Tesseract Installation, in the Oracle Banking Microservices Platform Foundation Installation Guide to install the latest version of Tesseract manually.
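
Once Tesseract is installed, a quick check that the binary is reachable on the PATH can save troubleshooting later. The sketch below is illustrative only; the function name tesseract_version is hypothetical, and it assumes the executable is named tesseract.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line of `<executable> --version`, or None if it is not on the PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some Tesseract builds print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    version = tesseract_version()
    print(version if version is not None else "tesseract was not found on the PATH")
```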

Document Verification Application Installation

The app is shipped as a byte-compiled wheel (.whl) file. Installing the wheel file installs all the implementation files without the dependencies. The required dependencies are bundled in a python.zip file, which must be extracted and installed separately (refer to Step 4 below). It is recommended to install the wheel file and the dependencies with pip in a virtual environment so that the installation does not affect other operations or applications running on the system.

In addition, a Config.ini file is required for the Eureka server configuration. Create a file named Config.ini and paste the text below:

[DEFAULT]
eureka_server=http://<Host Name>:<Port Number>/plato-discovery-service/eureka

Edit the Eureka server address if needed. Do not change the app name, because Role-Based Access Control depends on it. Registering the app on Eureka is optional, but Config.ini is required in either case. If the app is not registered, the eureka_server variable can simply be set to a localhost URL.
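
The Config.ini file can also be generated and read back with the standard configparser module. This is a sketch under assumptions: the function names are hypothetical, the address in EXAMPLE_EUREKA_URL (including port 7001) is a placeholder, and the source does not state how the app itself parses the file.

```python
import configparser
import tempfile
from pathlib import Path

# Hypothetical address for illustration; substitute the real host and port.
EXAMPLE_EUREKA_URL = "http://localhost:7001/plato-discovery-service/eureka"


def write_config(path, eureka_server):
    """Write Config.ini with one eureka_server key in the [DEFAULT] section."""
    parser = configparser.ConfigParser()
    parser["DEFAULT"]["eureka_server"] = eureka_server
    with open(path, "w", encoding="utf-8") as handle:
        # No spaces around "=" so the file matches the layout shown above.
        parser.write(handle, space_around_delimiters=False)


def read_eureka_server(path):
    """Return the eureka_server value; raise if the file or the key is missing."""
    parser = configparser.ConfigParser()
    if not parser.read(path, encoding="utf-8"):
        raise FileNotFoundError(f"cannot read {path}")
    return parser["DEFAULT"]["eureka_server"]


if __name__ == "__main__":
    # A temporary directory stands in for root_dir in this demonstration.
    with tempfile.TemporaryDirectory() as root_dir:
        config_path = Path(root_dir) / "Config.ini"
        write_config(config_path, EXAMPLE_EUREKA_URL)
        print(config_path.read_text(encoding="utf-8"))
        print(read_eureka_server(config_path))
```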

The folder structure to be followed is:

root_dir
├── python
├── Config.ini
└── venv
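
Before starting the app, the layout can be checked with a few lines of Python. This is a minimal sketch; the function name missing_entries is hypothetical, and it assumes that python and venv are directories and that the virtual environment is actually named venv.

```python
import tempfile
from pathlib import Path

# Entries from the folder structure above. "venv" stands for whatever name
# was chosen for the virtual environment, so adjust it if yours differs.
EXPECTED_ENTRIES = {"python": "dir", "Config.ini": "file", "venv": "dir"}


def missing_entries(root_dir, expected=EXPECTED_ENTRIES):
    """Return the names of expected entries that are absent from root_dir."""
    root = Path(root_dir)
    missing = []
    for name, kind in expected.items():
        entry = root / name
        found = entry.is_dir() if kind == "dir" else entry.is_file()
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # A temporary directory stands in for root_dir in this demonstration.
    with tempfile.TemporaryDirectory() as root_dir:
        (Path(root_dir) / "python").mkdir()
        (Path(root_dir) / "Config.ini").write_text("[DEFAULT]\n", encoding="utf-8")
        print(missing_entries(root_dir))  # only venv has not been created here
```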

Note:

Make sure that the operating system is Linux, the installed Python version is 3.9.5, and the pip version is above 20.0.0. Run the following command to upgrade pip to the latest version:

    pip install --upgrade pip

Once pip is upgraded, follow the steps below to install the app and the dependencies:
  1. Create a Python virtual environment with Python 3.9.5 using the following command:

    python -m venv <venv_name>

  2. Activate the newly created virtual environment by using the following command.

    source <venv_name>/bin/activate

  3. Use the command below to install the provided application wheel package:

    pip install <wheel_package_name>.whl

  4. Install the dependencies. To do this, extract the provided python.zip file, go into the python folder (cd python/), and run the following commands. The trailing dot is the argument to --find-links and points pip at the current folder:
    pip install configparser --no-index --find-links .
    pip install connexion --no-index --find-links .
    pip install datefinder --no-index --find-links .
    pip install dateparser --no-index --find-links .
    pip install Flask --no-index --find-links .
    pip install importlib-metadata --no-index --find-links .
    pip install numpy --no-index --find-links .
    pip install opencv-python --no-index --find-links .
    pip install pdf2image --no-index --find-links .
    pip install Pillow --no-index --find-links .
    pip install pyap --no-index --find-links .
    pip install pybase64 --no-index --find-links .
    pip install pytesseract --no-index --find-links .
    pip install python-dateutil --no-index --find-links .
    pip install scipy --no-index --find-links .
    pip install six --no-index --find-links .
    pip install python-magic --no-index --find-links .
    (optional) pip install py-eureka-client --find-links .
    Installing py-eureka-client is optional, because it is needed only if you want to register the app on Eureka.
  5. After installing the wheel package and the dependencies, run the document verification server using the command below:

    python -m ofss_ml_document_verification_server

  6. By default, the command in Step 5 runs the app on port 8090 and does not register the app on Eureka. To change the port or register the app, use the command below:

    python -m ofss_ml_document_verification_server -p 5000 -r true
    The above command runs the app on port 5000 and registers it on the Eureka server. The -p and -r arguments can be used independently, and any available port number can be specified. By default, the system is configured with -r false.
    Once the service is registered on Eureka, it needs role-based access to send and receive requests.
    For example, if the app is registered on http://<Host Name>:<Port Number>/plato-discovery-service, obtain a bearer token from http://<Host Name>:<Port Number>/api-gateway/platojwtauth and then call http://<Host Name>:<Port Number>/api-gateway/ofss_ml_document_verification_server/extractInformation with the following headers:
    1. Authorization: Bearer <token>
    2. appid (for example, CMNCORE)
    3. branchCode
    4. Content-Type: application/json
    5. userId

    The userId and branchCode values are based on the Flyway script entries.

    SMS Scripts:

    Insert into SMS_TM_SERVICE_ACTIVITY
    (SERVICE_ACTIVITY_CODE,DESCRIPTION,CLASS_NAME,METHOD_NAME,APPLICATION_ID,SERVICE_TYPE,UI_ACTIVITY_CODE)
    values ('CMC_SA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION','Extracts meaningful details from an image',
    'OFSS_ML_DOCUMENT_VERIFICATION_SERVER','extractInformation','CMC','Service API',null);
    commit;
    Insert into SMS_TM_FUNCTIONAL_ACTIVITY
    (FUNCTIONAL_ACTIVITY_CODE, APPLICATION_ID, TYPE) values
    ('CMC_FA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION','CMC','O');
    commit;
    Insert into SMS_TM_FUNC_ACTIVITY_DETAIL
    (ID,FUNCTIONAL_ACTIVITY_CODE,SERVICE_ACTIVITY_CODE) values
    ('CMC_FD_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION',
    'CMC_FA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION',
    'CMC_SA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION');
    commit;
    Plato Scripts:
    Insert into PROPERTIES
    (ID,APPLICATION,PROFILE,LABEL,KEY,VALUE) values
    (PROPERTIES_ID_SEQ.NEXTVAL,'plato-api-gateway','jdbc','jdbc','zuul.routes.ofssmldoc.path','/ofss_ml_document_verification_server/**');
    Insert into PROPERTIES
    (ID,APPLICATION,PROFILE,LABEL,KEY,VALUE) values
    (PROPERTIES_ID_SEQ.NEXTVAL,'plato-api-gateway','jdbc','jdbc','zuul.routes.ofssmldoc.serviceId','ofss_ml_document_verification_server');
    Insert into PROPERTIES
    (ID,APPLICATION,PROFILE,LABEL,KEY,VALUE) values
    (PROPERTIES_ID_SEQ.NEXTVAL,'plato-api-gateway','jdbc','jdbc','zuul.routes.ofssmldoc.stripPrefix','false');
    commit;
    This procedure ensures that only authenticated users can use the API. However, developers running the app can skip registration on Eureka and test the API directly.
  7. To run the document verification server in the background, use the command below.

    nohup python -m ofss_ml_document_verification_server & tail -f nohup.out

    Note:

    After the above command is executed, all execution logs are written to nohup.out (a text file). The terminal can then be closed, and the app keeps running on its configured port.
  8. To terminate the app, use the netstat command to find the ID of the process listening on the app's port, and then pass that process ID to the kill command, as shown below.

    netstat -nlp | grep 8090
    kill -9 <process_id>
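
Step 6 lists the headers required when the app is called through the API gateway. The sketch below shows one way to assemble such a request with the standard library. It is illustrative only: the function name build_extract_request is hypothetical, and the POST method, the JSON payload, and the example host, port, user, and branch values are assumptions that the source does not specify. The request is built but not sent.

```python
import json
import urllib.request

# Path taken from the example URL in Step 6.
EXTRACT_PATH = "/api-gateway/ofss_ml_document_verification_server/extractInformation"


def build_extract_request(gateway_base, token, app_id, branch_code, user_id, payload):
    """Build, without sending, a gateway request carrying the five headers from Step 6."""
    headers = {
        "Authorization": f"Bearer {token}",
        "appid": app_id,
        "branchCode": branch_code,
        "Content-Type": "application/json",
        "userId": user_id,
    }
    body = json.dumps(payload).encode("utf-8")
    # Assumption: the endpoint accepts POST with a JSON body.
    return urllib.request.Request(
        gateway_base.rstrip("/") + EXTRACT_PATH,
        data=body,
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # All values below are placeholders, not real credentials or a real payload.
    request = build_extract_request(
        "http://localhost:8080", "<token>", "CMNCORE", "<branchCode>", "<userId>", {}
    )
    print(request.get_method(), request.full_url)
    for name, value in request.header_items():
        print(f"{name}: {value}")
```

Note that urllib.request normalizes the capitalization of header names when storing them. HTTP header names are case-insensitive, so this does not change the meaning of the request.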