4 Deploying and Configuring the Data Service
Learn how to deploy and configure the data service for Oracle Monetization Suite in both Oracle Cloud Infrastructure (OCI) and non-OCI environments.
Topics in this document:
Deploying Data Service in OCI and Non-OCI Environments
Configuring and Using the Data Service
You can deploy the data service in Oracle Cloud Infrastructure or in a non-OCI environment, depending on your requirements.
-
For deployments in Oracle Cloud Infrastructure:
Complete all installation prerequisites described in "Overview of Installation Tasks". Once the environment is prepared, install the data service Helm charts as described below.
-
For deployments in non-OCI environments:
Install the data service Helm charts as described in this section.
To deploy the data service using Helm charts:
-
Download the data service Helm charts from the provided deployment package.
For more information, see "Downloading Packages for the Cloud Native Helm Charts and Docker Files" and "Setting Up Prerequisite Software and Tools".
-
Create a new key and generate an SSL certificate:
openssl req -newkey rsa:2048 -nodes -keyout privateKeyName.key -x509 -days 365 -out certificateName.crt
-
Create a Kubernetes TLS secret using the generated certificate and private key:
kubectl create secret tls secretName --cert=certificateName.crt --key=privateKeyName.key
-
Install the NGINX Ingress Controller:
-
Add the ingress-nginx Helm repository:
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
-
Update the Helm repository:
helm repo update
-
Install the Ingress Controller:
helm install ingress-nginx ingress-nginx/ingress-nginx --namespace nginxNamespace --create-namespace
-
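Optionally, the controller rollout can be verified before proceeding. A sketch with kubectl; nginxNamespace is the placeholder namespace used above, and the label selector assumes the chart's standard labels:

```shell
# Check that the ingress controller pod is running.
# NS uses the nginxNamespace placeholder from the install command above.
NS=nginxNamespace
kubectl get pods --namespace "$NS" -l app.kubernetes.io/name=ingress-nginx
```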
-
Use the following command to attach your service to the NGINX controller (set appropriate values for parameters as required):
helm install ingress-nginx ingress-nginx/ingress-nginx --namespace nginxNamespace \
  --set controller.service.enableHttp=false \
  --set controller.service.enableHttps=true \
  --set controller.service.ports.https=443 \
  --set controller.service.nodePorts.https=nodePort \
  --set controller.config.ssl-redirect=true \
  --set controller.config.force-ssl-redirect=true \
  --set controller.ingressClassResource.name=ingressClassName \
  --set controller.ingressClass=ingressClassName \
  --create-namespace
-
Update the values.yaml file for your deployment:
-
Set mount paths:
-
dataFilesPath: Mount path for data files if tech_choice is spark and storage_type is pvc.
-
logFilesPath: Mount path for log files. If not set, it defaults to dataFilesPath.
Note:
Ensure you create the necessary StorageClass (SC), PersistentVolume (PV), and PersistentVolumeClaim (PVC). Update the PVC name in the deployment configuration as required.
-
artifactsPath: Mount path for customizable Python scripts and configuration files. You may leave this blank if these files are in the data files directory.
-
Create separate PVs and PVCs for logs (data-fetch-logs-pvc) and, if needed, artifacts (data-fetch-artifacts-pvc).
Note:
These are container paths and do not need to match the host path.
-
Set folder permissions:
groupadd -g 10001 oracle
useradd -mr -u 10001 -g oracle oracle
chown 10001:10001 -R mountPath
-
-
Configure imageRepository and image tags for dataFetchOrchestrator and dataFetchProcessor.
-
Specify the paths of the following files, placed in the mount path, under dataFetchProcessor.configurableFiles:
-
filterFile: Path to the file containing the Python script used to filter data for /fetch requests. Additional files can be imported as required.
-
dataSourceFile: Path to the configuration file with information about the data source and OCI connection.
-
ociConfigFile: Path to the configuration file with OCI connection details. Also, copy the private.pem file you downloaded while creating the API key to this folder and specify its path.
-
logConfigFile: Path to the configuration file used for any modifications required in the log level or format.
Note:
A sample file is available for each of them in the artifacts provided with the package: sample_filter_data.py (for filterFile), dnn_preprocess.py (for additional files), config.json (for dataSourceFile), oci_config (for ociConfigFile), and logging_config.py (for logConfigFile). You can configure these files as per your requirements.
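As a hedged illustration, the sample files could be copied into the configured mount path before editing them. The /mnt/datafetch mount path and the artifacts/ source directory are assumptions; adjust them to your environment:

```shell
# Illustrative only: stage the sample configurable files in the mount path.
# /mnt/datafetch and artifacts/ are assumed locations, not fixed names.
MOUNT_PATH=/mnt/datafetch
mkdir -p "$MOUNT_PATH/config"
cp artifacts/sample_filter_data.py "$MOUNT_PATH/config/"   # filterFile
cp artifacts/dnn_preprocess.py     "$MOUNT_PATH/config/"   # additional imported file
cp artifacts/config.json           "$MOUNT_PATH/config/"   # dataSourceFile
cp artifacts/oci_config            "$MOUNT_PATH/config/"   # ociConfigFile
cp artifacts/logging_config.py     "$MOUNT_PATH/config/"   # logConfigFile
```

Remember that the paths you then set in values.yaml are container paths, as noted above.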
-
-
Set the host value to your deployment host's name.
-
Set tlsSecretName to the name of the Kubernetes TLS secret you created earlier.
-
Set ingressClassName to your ingress class name.
-
Set identityURI, clientID, and clientSecret to your IDCS configuration. To disable IDCS, set dataFetchOrchestrator.idcs.enabled to false.
-
Set serviceMonitor.serviceNamespace to the namespace where you have deployed the services.
-
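The values.yaml settings above can also be supplied as --set overrides at install time. A minimal sketch, assuming the chart path and release name used later in this section and that the keys are top-level as named above; all values shown are placeholders:

```shell
# Placeholder values throughout; replace with your own host, secret, and namespace.
# Key names assume the values.yaml layout described in this section.
RELEASE=data-fetch-services
helm upgrade --install "$RELEASE" helm/cgiudatafetch/ \
  --namespace=data-fetch-service \
  --set host=myhost.example.com \
  --set tlsSecretName=secretName \
  --set ingressClassName=ingressClassName \
  --set dataFetchOrchestrator.idcs.enabled=false \
  --set serviceMonitor.serviceNamespace=data-fetch-service
```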
-
Create SC, PV, and PVC resources in your namespace. In pv-template.yaml, update the spec.hostPath.path to the host system path that needs to be mounted:
kubectl apply -f helm/sc.yaml
kubectl apply -f helm/pv-template.yaml
kubectl apply -f helm/pvcTemplate.yaml
-
Update the helm/cgiudatafetch/templates/deployment.yaml file as required, especially the spec.template.spec.volumes section.
-
Install the Data Service Helm charts:
helm upgrade --install data-fetch-services helm/cgiudatafetch/ --namespace=data-fetch-service
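After installation, the rollout can be checked with standard kubectl and helm commands. A sketch, using the release and namespace names from the command above:

```shell
# Verify that the data service pods, services, and ingress came up.
NAMESPACE=data-fetch-service
kubectl get pods --namespace "$NAMESPACE"
kubectl get svc --namespace "$NAMESPACE"
kubectl get ingress --namespace "$NAMESPACE"
helm status data-fetch-services --namespace "$NAMESPACE"
```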
You can schedule the data fetching process at regular intervals. To do this, you need to configure a cron job to run the provided shell script as per your requirements.
Note:
A sample script called sample_bash.sh is available for your reference in the artifacts provided with the package. This sample script sets delta_data to false for the first run and to true for all subsequent runs.
For example, you can set up a cron job to run the script once a month. You can do this by performing the following steps:
-
Make the script executable:
chmod +x /opt/scripts/sample_bash.sh
-
Edit the crontab:
crontab -e
-
Run the following command to schedule the script to run at 2 AM on the first day of each month:
0 2 1 * * /opt/scripts/sample_bash.sh >> /opt/scripts/sample_bash.log 2>&1
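As an alternative to editing the crontab interactively, the entry can be appended non-interactively. A sketch using the same schedule and script path as above:

```shell
# Append the monthly schedule to the current user's crontab without opening an editor.
CRON_LINE='0 2 1 * * /opt/scripts/sample_bash.sh >> /opt/scripts/sample_bash.log 2>&1'
( crontab -l 2>/dev/null; echo "$CRON_LINE" ) | crontab -
crontab -l   # confirm the entry was added
```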
Configuring and Using the Data Service
You interact with the data service using REST APIs and can connect other systems through API calls.
The data service provides three APIs that you need to use:
-
Collect API (/data/collect)
Collects data from specified sources, such as Billing and Revenue Management databases or your custom databases.
-
Fetch API (/data/fetch)
Retrieves collected data based on request parameters and delivers it to required clients.
-
Cache API (/data/cache)
Caches fetched data for fast, repeated access.
These APIs are designed to collect, cache, and make data available for downstream machine learning and analytical tasks.
For more information, see "About the REST APIs".
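A hedged sketch of calling these endpoints with curl; the host, bearer token, HTTP method, and empty request bodies are assumptions, and the actual request parameters are described in "About the REST APIs":

```shell
# Illustrative calls only; host, token, and payloads are placeholders.
HOST=https://myhost.example.com
TOKEN=yourAccessToken

# Collect data from the configured source.
curl -k -X POST "$HOST/data/collect" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'

# Fetch previously collected data.
curl -k -X POST "$HOST/data/fetch" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'

# Cache fetched data for repeated access.
curl -k -X POST "$HOST/data/cache" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'
```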