5 Document Verification Framework
This topic describes how to set up the document verification framework after the Common Core applications are deployed in the WebLogic server.
Prerequisites
Python, Tesseract, and the other required libraries must be installed to run the document verification APIs.
Oracle Machine Learning for Python (OML4Py) enables you to run Python commands for data transformations and for statistical, machine learning, and graphical analysis on data stored in or accessible through an Oracle database using a Python API. OML4Py is available through:
- The Python interpreter in Oracle Machine Learning Notebooks in your Oracle Autonomous Database.
- An OML4Py client connection to OML4Py in an on-premises Oracle Database instance.
- First, check the system requirements for setting up OML4Py.
- Python 3.9.5 is required to install and use OML4Py.
- Both the OML4Py server and client installations for an on-premises Oracle database require installing a set of supporting Python packages.
- Follow the steps at the link below to install and set up the OML4Py server and the OML4Py client for an on-premises Oracle database.
Note:
Installing and setting up the OML4Py server is optional. Customers can install only the OML4Py client and the Python version supported by OML4Py.
Tesseract is an optical character recognition (OCR) engine available for various operating systems. The latest version must be installed on the machine to extract text from the documents.
Refer to the 9.4 Tesseract Installation section in the Oracle Banking Microservices Platform Foundation Installation Guide to manually install the latest version of Tesseract.
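Before running the APIs, it can help to confirm that Tesseract is reachable on the PATH. The following is a minimal sketch using only the Python standard library; the function name tesseract_version is hypothetical and is not part of the shipped application.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """Return the first line of `<executable> --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None  # not installed, or installed outside the PATH
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    # Older Tesseract releases print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version() or "Tesseract was not found on PATH")
```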
Document Verification Application Installation
The application is shipped as a byte-compiled wheel (.whl) file. This wheel file installs all the implementation files without the dependencies. All the required dependencies are bundled in a python.zip file, which must be extracted and installed separately (see the dependency installation step below). It is recommended to install the wheel file and the dependencies in a virtual environment using pip so that they do not affect any other operations or applications running on the system.
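A virtual environment can be created with Python's standard venv module, which is what python -m venv uses. This minimal sketch assumes the Linux layout (bin/python) that this guide targets; the helper name create_virtualenv and the directory name docverify-env are hypothetical.

```python
import venv
from pathlib import Path


def create_virtualenv(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment and return its interpreter path (Linux layout)."""
    # symlinks=True matches what `python -m venv` does by default on Linux.
    venv.create(env_dir, with_pip=with_pip, symlinks=True)
    return Path(env_dir) / "bin" / "python"
```

For example, create_virtualenv("docverify-env") creates the environment; activate it with source docverify-env/bin/activate before running the pip commands in this section, so that the wheel and its dependencies are installed into the environment rather than the system Python.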
Applications that use a microservices-based architecture, and rely on it for security, must create a Config.ini file; this file is also required for the Eureka server configuration. Create a configuration file named Config.ini and paste the following text:
[DEFAULT]
eureka_server=http://<Host Name>:<Port Number>/plato-discovery-service/eureka
You can edit the Eureka server address if needed. Do not change the app name, because it is important for Role-Based Access Control. Registering the app on Eureka is optional, and you can skip it if it is not needed; however, Config.ini is required in every case. For the eureka_server variable, you can simply provide a localhost URL.
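The Config.ini content above is a standard INI file with a single key in the [DEFAULT] section. As an illustration of how it parses, here is a small sketch using Python's standard configparser; it is not the application's own loader, and the function name read_eureka_server is hypothetical.

```python
import configparser


def read_eureka_server(config_path: str = "Config.ini") -> str:
    """Return eureka_server from the [DEFAULT] section of Config.ini."""
    parser = configparser.ConfigParser()
    if not parser.read(config_path):  # read() returns the list of files it parsed
        raise FileNotFoundError(f"{config_path} not found; the file is required even without Eureka")
    defaults = parser.defaults()  # keys defined under [DEFAULT]
    if "eureka_server" not in defaults:
        raise KeyError("eureka_server is missing from the [DEFAULT] section")
    return defaults["eureka_server"]
```

Running this against the file you created is a quick way to catch a misnamed section or key before starting the server.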
The folder structure to be followed is:
root_dir
├── python
└── Config.ini
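The expected layout can be checked programmatically. This is a minimal sketch assuming root_dir is the directory in which python.zip was extracted and Config.ini was created; the function name missing_layout_entries is hypothetical.

```python
from pathlib import Path
from typing import List


def missing_layout_entries(root_dir: str) -> List[str]:
    """List the expected entries that are absent from root_dir."""
    root = Path(root_dir)
    expected = {
        "python": (root / "python").is_dir(),        # folder extracted from python.zip
        "Config.ini": (root / "Config.ini").is_file(),
    }
    return [name for name, present in expected.items() if not present]
```

An empty list means both entries are in place.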
Note:
Make sure that you are using a Linux operating system, that the installed Python version is 3.9.5, and that the pip version is above 20.0.0. Run the following command to upgrade pip to the latest version:
pip install --upgrade pip
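The requirements in the note above (Linux, Python 3.9.5, pip above 20.0.0) can be checked with a short script. This is a sketch, not a supported tool: the function names are hypothetical, and the version parser simply ignores pre-release suffixes such as rc1.

```python
import re
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '21.2.4' into (21, 2, 4); a trailing suffix such as 'rc1' is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def requirement_problems(platform: str, python_version: Tuple[int, ...], pip_version: str) -> List[str]:
    """Compare an environment against the documented requirements; [] means all match."""
    problems = []
    if not platform.startswith("linux"):
        problems.append(f"expected Linux, found {platform}")
    if tuple(python_version[:3]) != (3, 9, 5):
        found = ".".join(str(part) for part in python_version[:3])
        problems.append(f"expected Python 3.9.5, found {found}")
    if parse_version(pip_version) <= (20, 0, 0):
        problems.append(f"expected pip above 20.0.0, found {pip_version}")
    return problems
```

On the target machine, pass in sys.platform, tuple(sys.version_info[:3]), and importlib.metadata.version("pip") to check the running interpreter.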
- Use the following command to install the provided application wheel package, for example ofss_ml_document_verification_server_without_req-8.2.0-py3-none-any.whl:
pip install <wheel_package_name>.whl
- Next, install all the dependencies. To do this, extract the provided python.zip file, go into the python folder (cd python/), and run the following commands:
pip install configparser --no-index --find-links .
pip install connexion --no-index --find-links .
pip install datefinder --no-index --find-links .
pip install dateparser --no-index --find-links .
pip install Flask --no-index --find-links .
pip install importlib-metadata --no-index --find-links .
pip install opencv-python --no-index --find-links .
pip install pdf2image --no-index --find-links .
pip install Pillow --no-index --find-links .
pip install pyap --no-index --find-links .
pip install pybase64 --no-index --find-links .
pip install pytesseract --no-index --find-links .
pip install python-dateutil --no-index --find-links .
pip install six --no-index --find-links .
pip install pyxDamerauLevenshtein --no-index --find-links .
pip install python-magic --no-index --find-links .
Optionally, also install py-eureka-client:
pip install py-eureka-client --find-links .
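Because the dependency commands differ only in the package name, they can be generated and run in a loop. This sketch assumes it runs with the virtual environment's interpreter (sys.executable -m pip then targets that environment) from the directory that contains the extracted python folder; OFFLINE_PACKAGES, install_commands, and run_all are hypothetical names.

```python
import subprocess
import sys
from typing import List

# Offline dependencies listed in this guide, installed from the extracted python/ folder.
OFFLINE_PACKAGES = [
    "configparser", "connexion", "datefinder", "dateparser", "Flask",
    "importlib-metadata", "opencv-python", "pdf2image", "Pillow", "pyap",
    "pybase64", "pytesseract", "python-dateutil", "six",
    "pyxDamerauLevenshtein", "python-magic",
]


def install_commands(include_eureka_client: bool = False) -> List[List[str]]:
    """Build one pip command per package, mirroring the manual commands above."""
    pip = [sys.executable, "-m", "pip", "install"]
    commands = [pip + [pkg, "--no-index", "--find-links", "."] for pkg in OFFLINE_PACKAGES]
    if include_eureka_client:
        # The guide installs py-eureka-client without --no-index.
        commands.append(pip + ["py-eureka-client", "--find-links", "."])
    return commands


def run_all(commands: List[List[str]], cwd: str = "python") -> None:
    """Run each command inside the extracted folder, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)
```

For example, run_all(install_commands(include_eureka_client=True)) installs everything, including the optional Eureka client. Keeping one pip call per package, as the manual steps do, makes it obvious which package failed.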
Note:
A few dependencies, including numpy, pandas, and scikit-learn, are already installed during the OML4Py setup and are therefore skipped in the above step.
Note:
This application works when the above libraries are installed with the required versions. Do not upgrade these libraries unless the documentation instructs you to.
- After installing the wheel package and the dependencies, run the document verification server using the following command:
python -m ofss_ml_document_verification_server
- By default, this command runs the app on port 8090 and does not register it with Eureka. To change the port or register the app with Eureka, use the following command:
python -m ofss_ml_document_verification_server -p 5000 -r true
This command runs the app on port 5000 and also registers it with the Eureka server. The two arguments can be used independently or together, and any available port number can be specified. By default, Eureka registration is disabled (-r false).
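When the server is launched from a script, the -p and -r arguments can be assembled conditionally so that anything omitted falls back to the defaults described above. This is a hedged sketch; server_command is a hypothetical helper, and it assumes the server accepts -r false as well as -r true, as the default setting suggests.

```python
import sys
from typing import List, Optional

MODULE = "ofss_ml_document_verification_server"


def server_command(port: Optional[int] = None, register: Optional[bool] = None) -> List[str]:
    """Build the launch command; omitted options use the server defaults (8090, -r false)."""
    cmd = [sys.executable, "-m", MODULE]
    if port is not None:
        cmd += ["-p", str(port)]
    if register is not None:
        cmd += ["-r", "true" if register else "false"]
    return cmd
```

For example, subprocess.run(server_command(5000, True)) is equivalent to the second command above and runs in the foreground until the server stops.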
Note that once the service is registered with Eureka, requests to and from it require role-based access.
For example, if the app is registered on http://<Host Name>:<Port Number>/plato-discovery-service, obtain a bearer token from http://<Host Name>:<Port Number>/api-gateway/platojwtauth, and then call http://<Host Name>:<Port Number>/api-gateway/ofss_ml_document_verification_server/extractInformation with the following headers:
1. Authorization – Bearer <token>
2. appid – for example, CMNCORE
3. branchCode
4. Content-Type – application/json
5. userId
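The call through the API gateway can be sketched with Python's standard urllib. This assumes a token has already been obtained from platojwtauth (that request is not described here, so it is not shown), and it assumes POST with a JSON body; the payload shape, the function name build_extract_request, and its parameters are hypothetical, so check the API documentation for the actual request format.

```python
import json
import urllib.request


def build_extract_request(gateway: str, token: str, app_id: str, branch_code: str,
                          user_id: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a request to the extractInformation endpoint."""
    url = f"{gateway.rstrip('/')}/api-gateway/ofss_ml_document_verification_server/extractInformation"
    headers = {
        "Authorization": f"Bearer {token}",
        "appid": app_id,              # for example, CMNCORE
        "branchCode": branch_code,    # based on the Flyway script entries
        "Content-Type": "application/json",
        "userId": user_id,            # based on the Flyway script entries
    }
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

Send it with urllib.request.urlopen(request) inside a with block. Note that urllib normalizes header-name capitalization (for example, branchCode is sent as Branchcode); HTTP header names are case-insensitive, so a standards-compliant gateway treats them the same.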
Note that the userId and branchCode values are based on the Flyway script entries.
SMS Scripts:
Insert into SMS_TM_SERVICE_ACTIVITY (SERVICE_ACTIVITY_CODE,DESCRIPTION,CLASS_NAME,METHOD_NAME,APPLICATION_ID,SERVICE_TYPE,UI_ACTIVITY_CODE) values ('CMC_SA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION','Extracts meaningful details from an image','OFSS_ML_DOCUMENT_VERIFICATION_SERVER','extractInformation','CMC','Service API',null);
commit;
Insert into SMS_TM_FUNCTIONAL_ACTIVITY (FUNCTIONAL_ACTIVITY_CODE, APPLICATION_ID, TYPE) values ('CMC_FA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION','CMC','O');
commit;
Insert into SMS_TM_FUNC_ACTIVITY_DETAIL (ID,FUNCTIONAL_ACTIVITY_CODE,SERVICE_ACTIVITY_CODE) values ('CMC_FD_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION', 'CMC_FA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION', 'CMC_SA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION');
commit;
Plato Scripts:
This procedure ensures that only authenticated users can use the API. However, developers running the app can skip Eureka registration and test the API directly.
Insert into PROPERTIES (ID,APPLICATION,PROFILE,LABEL,KEY,VALUE) values (PROPERTIES_ID_SEQ.NEXTVAL,'plato-api-gateway','jdbc','jdbc','zuul.routes.ofssmldoc.path','/ofss_ml_document_verification_server/**');
Insert into PROPERTIES (ID,APPLICATION,PROFILE,LABEL,KEY,VALUE) values (PROPERTIES_ID_SEQ.NEXTVAL,'plato-api-gateway','jdbc','jdbc','zuul.routes.ofssmldoc.serviceId','ofss_ml_document_verification_server');
Insert into PROPERTIES (ID,APPLICATION,PROFILE,LABEL,KEY,VALUE) values (PROPERTIES_ID_SEQ.NEXTVAL,'plato-api-gateway','jdbc','jdbc','zuul.routes.ofssmldoc.stripPrefix','false');
commit;
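The three PROPERTIES rows define a Zuul route: requests whose path matches /ofss_ml_document_verification_server/** are sent to the service registered as ofss_ml_document_verification_server, and stripPrefix=false means the matched prefix is kept in the forwarded path. The sketch below is a simplified model of that prefix behavior only, not Zuul's implementation (Zuul uses Ant-style pattern matching); it assumes /api-gateway is the gateway's context path and is therefore not part of the route path. The function name forwarded_path is hypothetical.

```python
from typing import Optional

ROUTE_PREFIX = "/ofss_ml_document_verification_server"


def forwarded_path(request_path: str, strip_prefix: bool) -> Optional[str]:
    """Return the path forwarded to the service, or None if the route does not match."""
    if not request_path.startswith(ROUTE_PREFIX + "/"):
        return None  # outside /ofss_ml_document_verification_server/**
    # With stripPrefix=false (as configured above) the full path is forwarded unchanged.
    return request_path[len(ROUTE_PREFIX):] if strip_prefix else request_path
```

With the configured stripPrefix=false, /ofss_ml_document_verification_server/extractInformation reaches the service unchanged; with true it would arrive as /extractInformation.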
- To run the document verification server in the background, use the following command:
nohup python -m ofss_ml_document_verification_server & tail -f nohup.out
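If the server is started from a Python script rather than a shell, a similar detached launch can be sketched with subprocess.Popen. This is an approximation of nohup, not a replacement for it: start_new_session=True (POSIX only) starts the child in a new session with no controlling terminal, so it does not receive the hang-up signal sent when the terminal closes. The helper name start_in_background is hypothetical.

```python
import subprocess
import sys
from typing import List


def start_in_background(cmd: List[str], log_path: str = "nohup.out") -> subprocess.Popen:
    """Start cmd detached from the terminal session, appending stdout and stderr to log_path."""
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT,
                                stdin=subprocess.DEVNULL, start_new_session=True)
    finally:
        log.close()  # the child keeps its own copy of the file descriptor
```

For example, start_in_background([sys.executable, "-m", "ofss_ml_document_verification_server"]) appends the server logs to nohup.out in the current directory, which you can follow with tail -f nohup.out.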
Note:
After you run the above command, all the execution logs are written to nohup.out (a text file). You can now close the terminal, and the app keeps running on its port.
- To terminate the app, use the netstat command to find the process ID of the process listening on the app's port, and then pass that process ID to the kill command, as shown below:
netstat -nlp | grep 8090
kill -9 <process_id>
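The lookup-and-kill steps above can also be scripted. This sketch parses the Linux netstat -nlp output format (the PID/Program name column is the seventh field of TCP listening rows, and shows "-" when you lack permission to see the owner); it matches the port against the local-address column only, which is stricter than grep 8090. The helper names pids_on_port and stop are hypothetical.

```python
import os
import re
import signal
from typing import List


def pids_on_port(netstat_output: str, port: int) -> List[int]:
    """Extract PIDs of TCP sockets listening on `port` from `netstat -nlp` output."""
    pids = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        if not fields[3].endswith(f":{port}"):
            continue
        match = re.match(r"(\d+)/", fields[6])
        if match:
            pids.add(int(match.group(1)))
    return sorted(pids)


def stop(pids: List[int], sig: int = signal.SIGTERM) -> None:
    """Send sig to each PID; pass signal.SIGKILL to match `kill -9`."""
    for pid in pids:
        os.kill(pid, sig)
```

Feed it the text output of subprocess.run(["netstat", "-nlp"], capture_output=True, text=True).stdout. The sketch defaults to SIGTERM, which lets the process shut down cleanly; kill -9 (SIGKILL) cannot be caught and is best kept as a fallback.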