5 Document Verification Framework

This topic describes how to set up the document verification framework after the Common Core applications are deployed on the WebLogic server.

Prerequisites

Python, Tesseract, and the other required libraries must be installed to run the document verification APIs.

OML4Py Installation

Oracle Machine Learning for Python (OML4Py) enables you to run Python commands for data transformations and for statistical, machine learning, and graphical analysis on data stored in or accessible through an Oracle database using a Python API.

OML4Py is available in the following Oracle database environments:
  • The Python interpreter in Oracle Machine Learning Notebooks in your Oracle Autonomous Database.
  • An OML4Py client connection to OML4Py in an on-premises Oracle Database instance.
This guide uses OML4Py with an on-premises Oracle Database. For this setup, you must install Python, the required Python libraries, and the OML4Py server components in the database, and you must install the OML4Py client, as described in the following steps.
  1. First, check the system requirements for setting up OML4Py.

    OML4Py On Premises System Requirements

  2. Python 3.9.5 is required to install and use OML4Py.

    Steps to install Python for OML4Py

  3. Both the OML4Py server and client installations for an on-premises Oracle database require installing a set of supporting Python packages.

    Steps to install Python supporting packages for OML4Py

  4. Follow the steps in the links below to install and set up the OML4Py server and the OML4Py client, respectively, for an on-premises Oracle database.

    Steps to install and setup OML4Py server

    Steps to install and setup OML4Py client

Note:

Installing and setting up the OML4Py server is optional for customers. Customers can install only the OML4Py client and the Python version supported by OML4Py.

Tesseract Installation

Tesseract is an optical character recognition (OCR) engine for various operating systems. The latest version must be installed on the machine to extract the text from the documents.

Refer to the Tesseract Installation section in the Oracle Banking Microservices Platform Foundation Installation Guide to manually install the latest version of Tesseract.
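
As a quick check after installation, a short script can confirm that the Tesseract executable is reachable. The following is a minimal sketch, assuming the executable is named tesseract and is available on the PATH. The helper names find_tesseract and tesseract_version are illustrative and are not part of the product.

```python
import shutil
import subprocess


def find_tesseract():
    """Return the full path of the tesseract executable on PATH, or None."""
    # Assumption: the executable is called "tesseract" on this machine.
    return shutil.which("tesseract")


def tesseract_version(executable):
    """Return the first line printed by `tesseract --version`."""
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some Tesseract releases print the version to stderr instead of stdout.
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH")
    else:
        print(path)
        print(tesseract_version(path))
```

If the script reports that tesseract was not found, revisit the installation section referenced above before installing the application.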

Document Verification Application Installation

The application is shipped as a byte-compiled wheel (.whl) file. The wheel installs all the implementation files but not the dependencies. The required dependencies are bundled in a python.zip file, which must be extracted and installed separately (refer to Step 2 below). It is recommended to install the wheel and the dependencies in a virtual environment using pip, so that the installation does not affect other operations or applications running on the system.

Applications that use a microservices-based architecture, and rely on it for security, need a Config.ini file. The same file is required for the Eureka server configuration. Create a file named Config.ini and paste the following text into it:

[DEFAULT]
eureka_server=http://<Host Name>:<Port Number>/plato-discovery-service/eureka

You can edit the Eureka server address if needed. The app name should not be changed, because it is used for Role-Based Access Control. Registering the app on Eureka is optional and can be skipped if it is not needed, but Config.ini is required in either case. If the app is not registered, the eureka_server variable can simply be set to a localhost URL.
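
Before starting the server, it can be useful to confirm that Config.ini parses and contains the expected key. The following is a minimal sketch using Python's standard configparser module. The function name read_eureka_server is illustrative, and how the application itself reads the file is not documented here, so treat this only as a sanity check.

```python
import configparser


def read_eureka_server(path="Config.ini"):
    """Return the eureka_server value from the [DEFAULT] section of the file."""
    parser = configparser.ConfigParser()
    # ConfigParser.read() returns the list of files it managed to parse.
    if not parser.read(path):
        raise FileNotFoundError("could not read " + path)
    return parser["DEFAULT"]["eureka_server"]


if __name__ == "__main__":
    print(read_eureka_server())
```

A KeyError from this function indicates that the eureka_server entry is missing or misspelled.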

The folder structure to be followed is:

├── root_dir
    ├── python
    ├── Config.ini

Note:

Make sure that the operating system is Linux, that the installed Python version is 3.9.5, and that the pip version is above 20.0.0. Run the following command to upgrade pip to the latest version.
pip install --upgrade pip
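
The version requirements in the note above can be checked with a short script. The following is a minimal sketch, assuming it is run with the same interpreter (for example, inside the same virtual environment) that will run the app. The function names are illustrative.

```python
import re
import sys
from importlib import metadata


def version_tuple(text):
    """Convert a version string such as '21.3.1' into a tuple of integers."""
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        numbers.append(int(match.group()))
    return tuple(numbers)


def python_matches(required=(3, 9, 5), current=None):
    """Return True when the interpreter version equals the required version."""
    if current is None:
        current = tuple(sys.version_info[:3])
    return tuple(current) == tuple(required)


def pip_is_newer(minimum=(20, 0, 0), installed=None):
    """Return True when the pip version is strictly above the minimum."""
    if installed is None:
        installed = metadata.version("pip")
    return version_tuple(installed) > tuple(minimum)


if __name__ == "__main__":
    print("Python 3.9.5:", python_matches())
    print("pip above 20.0.0:", pip_is_newer())
```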
Once pip is upgraded, follow the steps below to install the app and the dependencies:
  1. Use the following command to install the provided application wheel package, for example, ofss_ml_document_verification_server_without_req-{version}-py3-none-any.whl

    pip install <wheel_package_name>.whl

    Note:

    Refer to the OSDC file for the exact version number.
  2. Next, install the dependencies. Extract the provided python.zip file, change into the python folder (cd python/), and run the following commands:
    pip install configparser --no-index --find-links .
    pip install connexion --no-index --find-links .
    pip install datefinder --no-index --find-links .
    pip install dateparser --no-index --find-links .
    pip install Flask --no-index --find-links .
    pip install importlib-metadata --no-index --find-links .
    pip install opencv-python --no-index --find-links .
    pip install pdf2image --no-index --find-links .
    pip install Pillow --no-index --find-links .
    pip install pyap --no-index --find-links .
    pip install pybase64 --no-index --find-links .
    pip install pytesseract --no-index --find-links .
    pip install python-dateutil --no-index --find-links .
    pip install six --no-index --find-links .
    pip install pyxDamerauLevenshtein --no-index --find-links .
    pip install python-magic --no-index --find-links .
    (optional) pip install py-eureka-client --find-links .

    Note:

    A few dependencies, including numpy, pandas, and scikit-learn, are already installed during the OML4Py setup and are therefore skipped in the above step.

    Note:

    This application works only when the above libraries are installed with the required versions. Do not upgrade the libraries unless the documentation instructs you to do so.
    Installing py-eureka-client is optional, because it is needed only if you want to register the app on Eureka.
  3. After installing the wheel package and the dependencies, run the document verification server with the following command:

    python -m ofss_ml_document_verification_server

  4. By default, the above command runs the app on port 8090 and does not register the app with Eureka. To change the port and register the app, use the following command:

    python -m ofss_ml_document_verification_server -p 5000 -r true
    The above command runs the app on port 5000 and registers it with the Eureka server. The two arguments can be used independently of each other, and any free port number can be given. By default, -r is set to false.
    Once the service is registered on Eureka, requests to it require role-based access.
    For example, if the app is registered on http://<Host Name>:<Port Number>/plato-discovery-service, first obtain a bearer token from http://<Host Name>:<Port Number>/api-gateway/platojwtauth, and then call http://<Host Name>:<Port Number>/api-gateway/ofss_ml_document_verification_server/extractInformation with the following headers:
    1. Authorization – bearer <token>
    2. appid – for example, CMNCORE
    3. branchCode
    4. Content-Type – application/json
    5. userId

    The userId and branchCode values depend on the flyway script entries.

    SMS Scripts:

    Insert into SMS_TM_SERVICE_ACTIVITY
    (SERVICE_ACTIVITY_CODE,DESCRIPTION,CLASS_NAME,METHOD_NAME,APPLICATION_ID,SERVICE_TYPE,UI_ACTIVITY_CODE)
    values ('CMC_SA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION','Extracts meaningful details from an image',
    'OFSS_ML_DOCUMENT_VERIFICATION_SERVER','extractInformation','CMC','Service API',null);
    commit;
    Insert into SMS_TM_FUNCTIONAL_ACTIVITY
    (FUNCTIONAL_ACTIVITY_CODE, APPLICATION_ID, TYPE) values
    ('CMC_FA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION','CMC','O');
    commit;
    Insert into SMS_TM_FUNC_ACTIVITY_DETAIL
    (ID,FUNCTIONAL_ACTIVITY_CODE,SERVICE_ACTIVITY_CODE) values
    ('CMC_FD_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION',
    'CMC_FA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION',
    'CMC_SA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION');
    commit;
    Plato Scripts:
    Insert into PROPERTIES
    (ID,APPLICATION,PROFILE,LABEL,KEY,VALUE) values
    (PROPERTIES_ID_SEQ.NEXTVAL,'plato-api-gateway','jdbc','jdbc','zuul.routes.ofssmldoc.path','/ofss_ml_document_verification_server/**');
    Insert into PROPERTIES
    (ID,APPLICATION,PROFILE,LABEL,KEY,VALUE) values
    (PROPERTIES_ID_SEQ.NEXTVAL,'plato-api-gateway','jdbc','jdbc','zuul.routes.ofssmldoc.serviceId','ofss_ml_document_verification_server');
    Insert into PROPERTIES
    (ID,APPLICATION,PROFILE,LABEL,KEY,VALUE) values
    (PROPERTIES_ID_SEQ.NEXTVAL,'plato-api-gateway','jdbc','jdbc','zuul.routes.ofssmldoc.stripPrefix','false');
    commit;
    This procedure ensures that only authenticated users can use the API. However, developers running the app can disable registration on Eureka and test the API directly.
  5. To run the document verification server in the background, use the following command:

    nohup python -m ofss_ml_document_verification_server & tail -f nohup.out

    Note:

    After the above command is executed, all the execution logs are written to nohup.out (a text file). You can then close the terminal, and the app continues to run on its port.
  6. To terminate the app, use the netstat command to find the ID of the process that is listening on the app's port, and then pass that process ID to the kill command, as shown below.

    netstat -nlp | grep 8090
    kill -9 <process_id>
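
The gateway call described in step 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the HTTP method (POST) and the JSON payload are assumptions, since the request body of extractInformation is not described in this section, and the function name build_extract_request is hypothetical. Obtaining the bearer token from platojwtauth is not shown, because its request format is not covered here.

```python
import json
import urllib.request


def build_extract_request(gateway_base, token, app_id, branch_code, user_id, payload):
    """Build a request for the extractInformation endpoint behind the API gateway.

    gateway_base is the http://<Host Name>:<Port Number> part of the URL.
    payload is a placeholder dictionary; its real structure is not documented here.
    """
    url = (
        gateway_base.rstrip("/")
        + "/api-gateway/ofss_ml_document_verification_server/extractInformation"
    )
    # The five headers listed in step 4.
    headers = {
        "Authorization": "bearer " + token,
        "appid": app_id,
        "branchCode": branch_code,
        "Content-Type": "application/json",
        "userId": user_id,
    }
    body = json.dumps(payload).encode("utf-8")
    # Assumption: the endpoint accepts POST with a JSON body.
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request, timeout=30):
    """Send the prepared request and return the decoded response text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it makes it possible to inspect the URL and headers before anything goes over the network. The userId and branchCode values passed in must match the flyway script entries mentioned in step 4.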