4 Monitoring PDC REST Services Manager
You can use logs, tracing, metrics, and system health data to monitor Oracle Communications Pricing Design Center (PDC) REST Services Manager.
Topics in this document:
- About Logging
- About Tracing
- About Metrics
- About Monitoring System Health
About Logging
You can review the PDC REST Services Manager logs to troubleshoot errors and monitor system activity. The logs record the following:
- Start up and shut down activity
- Interaction with other applications at integration points while processing publication events. This includes interactions with PDC, Oracle Identity Cloud Service, and your enterprise product catalog.
- Authorization requests
- Authentication requests
- Tracing (see "About Tracing")
You access the logs as the user who installed PDC REST Services Manager. See "Accessing the Logs".
The logs support the standard Log4j logging levels. By default, the framework log levels are set to INFO, and the application log levels are set to DEBUG. You can change the levels after installation. For example, setting the log levels to ALL allows you to log detailed authentication or authorization errors for Helidon security providers. See "Changing the Log Levels".
You can configure the format of the logs. See "Configuring the Log Format."
By default, PDC REST Services Manager routes Java logging to the Log4j log manager. After setting up PDC REST Services Manager, you can change the log manager. See "Changing the Default Log Manager".
For information about Java logging, see "Java Logging Overview" in Java Platform, Standard Edition Core Libraries. For information about Log4j, see https://logging.apache.org/log4j/2.x/manual/index.html.
Oracle recommends using automated log file rotation for PDC REST Services Manager logs. For information about configuring log file rotation, see My Oracle Support article 2087525.1 at https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2087525_1.html.
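On Linux, automated rotation is typically handled by logrotate. The following is a minimal sketch of a logrotate policy; the log path, rotation schedule, and retention count are illustrative assumptions, not values from this document, so adjust them to where your deployment actually writes its logs:

```
# /etc/logrotate.d/pdcrsm -- hypothetical rotation policy
/var/log/pdcrsm/*.log {
    daily            # rotate once per day
    rotate 14        # keep two weeks of rotated logs
    compress         # gzip rotated files
    missingok        # do not error if a log is absent
    notifempty       # skip rotation for empty files
    copytruncate     # rotate without restarting the service
}
```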
Accessing the Logs
You access the logs to monitor and troubleshoot your system. The following shows sample log entries:
2020-09-05T20:19:28.399-0700 START RestServicesManager.sh
pdcrsm-6f88869785-vtbw2 pdcrsm 2020-11-13T15:58:06.702Z | INFO | 9fcdb109-8682-4368-b4d5-b5b720a1af77 | 548aee87-5ef0-4c1a-b8c8-d2b8a8c6fb40 | 500FreeMinutes | 4ca071fde65d2a61 | pool-3-thread-1 | ctPublishEventServiceImpl | Processing Publish Event 548aee87-5ef0-4c1a-b8c8-d2b8a8c6fb40->500FreeMinutes
pdcrsm-6f88869785-vtbw2 pdcrsm 2020-11-13T15:58:07.303Z | INFO | 9fcdb109-8682-4368-b4d5-b5b720a1af77 | 548aee87-5ef0-4c1a-b8c8-d2b8a8c6fb40 | 500FreeMinutes | 4ca071fde65d2a61 | pool-3-thread-1 | ductOfferingServiceLaunch | Retrieving ProductOffering for ID OOO_DayTech201_OOO
pdcrsm-6f88869785-vtbw2 pdcrsm 2020-11-13T15:58:09.088Z | INFO | 9fcdb109-8682-4368-b4d5-b5b720a1af77 | 548aee87-5ef0-4c1a-b8c8-d2b8a8c6fb40 | 500FreeMinutes | 4ca071fde65d2a61 | pool-3-thread-1 | .c.b.i.d.PdcRmiConnection | Attempting to connect to PDC using t3s://pdc-service:8002 ...
pdcrsm-6f88869785-vtbw2 pdcrsm Handshake failed: TLSv1.3, error = No appropriate protocol (protocol is disabled or cipher suites are inappropriate)
pdcrsm-6f88869785-vtbw2 pdcrsm Handshake succeeded: TLSv1.2
pdcrsm-6f88869785-vtbw2 pdcrsm 2020-11-13T15:58:12.437Z | INFO | 9fcdb109-8682-4368-b4d5-b5b720a1af77 | 548aee87-5ef0-4c1a-b8c8-d2b8a8c6fb40 | 500FreeMinutes | 4ca071fde65d2a61 | pool-3-thread-1 | c.b.i.d.PdcDatasourceImpl | Checking if PDC object with the name "500FreeMinutes" exists
pdcrsm-6f88869785-vtbw2 pdcrsm 2020-11-13T15:58:12.479Z | INFO | 9fcdb109-8682-4368-b4d5-b5b720a1af77 | 548aee87-5ef0-4c1a-b8c8-d2b8a8c6fb40 | 500FreeMinutes | 4ca071fde65d2a61 | pool-3-thread-1 | o.c.b.i.s.PdcServiceImpl | Updating the PDC object "500FreeMinutes"
pdcrsm-6f88869785-vtbw2 pdcrsm 2020-11-13T15:58:16.134Z | INFO | 9fcdb109-8682-4368-b4d5-b5b720a1af77 | 548aee87-5ef0-4c1a-b8c8-d2b8a8c6fb40 | 500FreeMinutes | 4ca071fde65d2a61 | pool-3-thread-1 | o.c.b.i.s.PdcServiceImpl | PDC object successfully updated for "500FreeMinutes"
See "Configuring the Log Format" for information about the format of the logs.
Changing the Log Levels
You can change the global log level and the level for PDC REST Services Manager application-specific log entries.
To change the log levels:
- Open the PDC_RSM_home/apps/conf/logging-config.yaml file in a text editor, where PDC_RSM_home is the directory in which you installed PDC REST Services Manager.
- To change the global log level:
  - Search for the following property:
    - name: ROOT_LOG_LEVEL
  - Set the value to "${env:ROOT_LOG_LEVEL:-LEVEL}", where LEVEL is the new log level.
- To change the PDC REST Services Manager application log level:
  - Search for the following property:
    - name: PDC_RSM_LOG_LEVEL
  - Set the value to "${env:PDC_RSM_LOG_LEVEL:-LEVEL}", where LEVEL is the new log level.
- Save and close the file.
- Restart PDC REST Services Manager using the control script located in the bin directory:
  PDC_RSM_home/apps/bin/RestServicesManager.sh restart
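Because the defaults use Log4j environment-variable lookups, you can also override the levels without editing the file, assuming the control script passes your environment through to the JVM. A sketch:

```shell
# The "${env:ROOT_LOG_LEVEL:-INFO}" pattern in logging-config.yaml reads
# the ROOT_LOG_LEVEL environment variable and falls back to the default
# after the ":-" when the variable is not set.
export ROOT_LOG_LEVEL=WARN
export PDC_RSM_LOG_LEVEL=DEBUG
# Then restart so the new levels take effect, for example:
#   PDC_RSM_home/apps/bin/RestServicesManager.sh restart
echo "root=${ROOT_LOG_LEVEL} app=${PDC_RSM_LOG_LEVEL}"
```

This avoids keeping local edits to logging-config.yaml in sync across upgrades.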
The following shows sample entries in the logging-config.yaml file, with the relevant lines and default values in bold:
Configuration:
  name: Default
  Properties:
    Property:
      - name: ROOT_LOG_LEVEL
        value: "${env:ROOT_LOG_LEVEL:-INFO}"
      - name: PDC_RSM_LOG_LEVEL
        value: "${env:PDC_RSM_LOG_LEVEL:-INFO}"
  Appenders:
    Console:
      name: LogToConsole
      target: SYSTEM_OUT
      PatternLayout:
        Pattern: "%d{ISO8601_OFFSET_DATE_TIME_HHCMM} | %5p | %X{eventId} | %X{projectId} | %X{productOfferId} | %X{traceId} | %-20.20thread | %-25.25logger{25} | %m%n"
  Loggers:
    Root:
      level: "${ROOT_LOG_LEVEL}"
      AppenderRef:
        - ref: LogToConsole
    Logger:
      - name: io.jaegertracing.internal.JaegerSpan
        level: error
        AppenderRef:
          - ref: LogToConsole
      - name: io.jaegertracing.internal.reporters
        level: warn
        AppenderRef:
          - ref: LogToConsole
      - name: oracle.communications
        level: "${PDC_RSM_LOG_LEVEL}"
        additivity: false
        AppenderRef:
          - ref: LogToConsole
Note:
The Zipkin tracing logs that appear under Logger are filtered from the logs, so you do not need to adjust their levels.
Configuring the Log Format
Configure the log format to change the order and number of elements that appear in the logs.
The default log format is:
%d{ISO8601_OFFSET_DATE_TIME_HHCMM} | %5p | %X{eventId} | %X{projectId} | %X{productOfferId} | %X{traceId} | %-20.20thread | %-25.25logger{25} | %m%n
- %d is the date and time of the log, in ISO 8601 format.
- %5p is the log level. See "Changing the Log Levels".
- eventId, projectId, productOfferId, and traceId are tags added for tracing events and objects through the system. See "Using Trace Tags to Troubleshoot Issues".
- %-20.20thread is the thread that logged the event.
- %-25.25logger is the service logging the event.
- %m%n is the message associated with the event.
To configure the log format:
- Open the PDC_RSM_home/apps/conf/logging-config.yaml file in a text editor, where PDC_RSM_home is the directory in which you installed PDC REST Services Manager.
- Under Appenders, locate the PatternLayout property.
- In the value for Pattern, change the order of the elements or remove any unwanted elements.
Note:
Removing elements from the logs might reduce your ability to troubleshoot issues and trace messages.
- Save and close the file.
- Restart PDC REST Services Manager using the control script located in the bin directory:
  PDC_RSM_home/apps/bin/RestServicesManager.sh restart
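For example, a trimmed-down pattern that keeps only the timestamp, level, the two most commonly searched trace tags, and the message might look like the following. This is an illustrative sketch, not a shipped default:

```
PatternLayout:
  Pattern: "%d{ISO8601_OFFSET_DATE_TIME_HHCMM} | %5p | %X{eventId} | %X{traceId} | %m%n"
```

Keeping eventId and traceId in the pattern preserves the ability to correlate log entries with traces, as described in "Using Trace Tags to Troubleshoot Issues".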
Changing the Default Log Manager
By default, PDC REST Services Manager uses the Log4j log manager. You can change this after configuring PDC REST Services Manager.
To change the log manager, run the following command:
Java_home/bin/java -Djava.util.logging.manager=logManager
where:
- Java_home is the directory where you installed the latest compatible version of Java.
- logManager is the log manager you want to use. By default, this is set to org.apache.logging.log4j.jul.LogManager when you install PDC REST Services Manager. To use your system default, leave this empty.
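As a concrete illustration, the property as it would appear on a java command line is sketched below; the surrounding launch command is not shown in this document, so the invocation itself is only indicated in a comment:

```shell
# Route java.util.logging through Log4j (the default after installation):
LOG_MANAGER="org.apache.logging.log4j.jul.LogManager"
echo "-Djava.util.logging.manager=${LOG_MANAGER}"
# e.g. Java_home/bin/java -Djava.util.logging.manager=${LOG_MANAGER} ...
```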
About Tracing
You can trace the flow of messages through PDC REST Services Manager by using the Zipkin tracer integrated with the Helidon framework, or another transaction tracing tool of your choice.
Helidon is a collection of Java libraries used by PDC REST Services Manager. Zipkin is an open-source tracing system integrated with Helidon. You can use the Zipkin interface to monitor the PDC REST Services Manager traces.
Figure 4-1 shows an example of tracing an event through the system in the Zipkin tracer.
For more information about Helidon and Zipkin, see:
- The Helidon project website: https://helidon.io/
- The discussion of Zipkin tracing in the Helidon documentation: https://helidon.io/docs/v2/#/se/tracing/02_zipkin
- The Zipkin website: https://zipkin.io/
To set up tracing in PDC REST Services Manager:
- Do one of the following:
  - Install Zipkin. See the Zipkin Quickstart documentation: https://zipkin.io/pages/quickstart.html.
  - Integrate Zipkin tracing with Helidon SE. See the Helidon SE Tracing Guide documentation: https://helidon.io/docs/v2/#/se/guides/06_tracing.
- Enable Zipkin tracing in PDC REST Services Manager. See "Enabling Tracing in PDC REST Services Manager".
- Optionally, add trace tags to help troubleshoot and trace messages and objects through the system. See "Using Trace Tags to Troubleshoot Issues".
Afterward, you can start tracing the flow of messages by using the Zipkin UI or Zipkin API.
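If you are trying Zipkin locally, the quickstart's Docker image is the fastest route. The following sketch uses the image and port from the Zipkin quickstart; the UI and API paths shown are Zipkin's standard ones:

```shell
# Start a throwaway Zipkin instance (from the Zipkin quickstart):
#   docker run -d -p 9411:9411 openzipkin/zipkin
# Zipkin then serves its UI and query API on port 9411:
ZIPKIN_URL="http://localhost:9411"
echo "UI:  ${ZIPKIN_URL}/zipkin"
echo "API: ${ZIPKIN_URL}/api/v2/traces"
```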
Using Trace Tags to Troubleshoot Issues
Instead of reading through logs to identify and troubleshoot issues, you can use trace tags in PDC REST Services Manager to correlate logs and traces.
PDC REST Services Manager tags events with the following trace tags:
- publishId: A general tag for the event. In the example below, this is the first id.
- eventId: A tag for the event that is specific to PDC REST Services Manager. In the example below, this is the eventId.
- projectId: A tag for the project in the enterprise product catalog. In the example below, this is the ID under project.
- productOfferId: A tag for a product offering. The example below shows the ID under each entry in the projectItems array.
- productSpecificationId: A tag for product specifications. This does not appear in the example below but would appear in log messages. You use the productOfferId tag to filter logs and locate related productSpecificationId tags as needed.
The following shows an example event for publishing updates to two product offerings from an enterprise product catalog to PDC. To illustrate an error scenario, a URL in the payload for the testInit4Offer product offering has become corrupt. The IDs corresponding to trace tags are shown in bold.
{
"id": "d64066bd-2954-4f43-b8f2-69603c88c683",
"eventId": "ea09ae5a-8098-4fb2-b634-ee8048b9cc1d",
"eventTime": "2030-11-18T09:31:50.001Z",
"eventType": "projectPublishEvent",
"correlationId": "UC4Fcfc6a70f-60f5-456c-93d5-d8e038215201",
"domain": "productCatalogManagement",
"timeOccurred": "2030-11-18T09:31:50.001Z",
"event": {
"project": {
"id": "demopackage11",
"lifecycleStatus": "IN_DESIGN",
"name": "Project01",
"acknowledgementUrl": "http://host:port/mobile/custom/PublishingAPI",
"projectItems": [
{
"id": "55c8362b32d36b49",
"href": "http://host:port/mobile/custom/catalogManagement/productOffering/testSuccess",
"name": "testSuccess",
"version": "1.0",
"@referredType": "ProductOfferingOracle"
},
{
"id": "55c8362b32d36b55",
"href": "http://host:port/mobile/custom/CORRUPTDATA/productOffering/testInit4Offer",
"name": "100Minutes",
"version": "1.0",
"@referredType": "ProductOfferingOracle"
}
]
}
}
}
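The IDs that become trace tags can be pulled straight out of a publish event payload. The following sketch uses the id and eventId values from the example event above and relies only on sed, to avoid assuming a JSON tool is installed:

```shell
# Write an abbreviated copy of the example event to a temp file:
cat > /tmp/publish-event.json <<'EOF'
{
  "id": "d64066bd-2954-4f43-b8f2-69603c88c683",
  "eventId": "ea09ae5a-8098-4fb2-b634-ee8048b9cc1d"
}
EOF
# The top-level "id" becomes the publishId trace tag; "eventId" maps directly.
publish_id=$(sed -n 's/.*"id": "\([^"]*\)".*/\1/p' /tmp/publish-event.json | head -1)
event_id=$(sed -n 's/.*"eventId": "\([^"]*\)".*/\1/p' /tmp/publish-event.json)
echo "publishId=${publish_id}"
echo "eventId=${event_id}"
```

With these two values in hand you can search a tracer or the raw logs for the event's full path through the system.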
Trace Tags in Tracer Tools
After submitting the event, you can follow its progress and look for the trace tags in a tracer tool like Zipkin.
Figure 4-2 shows excerpts from a tracer. You can immediately see that the error occurred in the GET request of the getProductOfferingDetails operation. You can expand the trace spans to get the IDs for the event and the object in question, then search in the logs for those tags, as well as the span and trace IDs, to troubleshoot the issue.
{
"traceID": "55c8362b32d36b49",
"spanID": "bad2ef5f3ff26084",
"flags": 1,
"operationName": "listenToProjectPublishEvent",
"references": [
{
"refType": "CHILD_OF",
"traceID": "f2f902949ee8e661",
"spanID": "8ce5e8f8cda38d3b"
}
],
"startTime": 1605709909244000,
"duration": 18160,
"tags": [
{
"key": "eventId",
"type": "string",
"value": "ea09ae5a-8098-4fb2-b634-ee8048b9cc1d"
},
{
"key": "http.status_code",
"type": "int64",
"value": 201
},
{
"key": "component",
"type": "string",
"value": "jaxrs"
},
{
"key": "span.kind",
"type": "string",
"value": "server"
},
{
"key": "http.url",
"type": "string",
"value": "http://host:port/productCatalogManagement/v1/projectPublishEvent"
},
{
"key": "http.method",
"type": "string",
"value": "POST"
},
{
"key": "projectId",
"type": "string",
"value": "demopackage11"
},
{
"key": "publishId",
"type": "string",
"value": "d64066bd-2954-4f43-b8f2-69603c88c683"
},
{
"key": "internal.span.format",
"type": "string",
"value": "Zipkin"
}
],
"logs": [],
"processID": "p1",
"warnings": null
},
...
{
"traceID": "f2f902949ee8e661",
"spans": [
{
"traceID": "f2f902949ee8e661",
"spanID": "03031b1c18e679f2",
"flags": 1,
"operationName": "getProductOfferingDetails",
"references": [
{
"refType": "CHILD_OF",
"traceID": "f2f902949ee8e661",
"spanID": "528a32ac350706e2"
}
],
"startTime": 1605709909256000,
"duration": 688729,
"tags": [
{
"key": "productOfferId",
"type": "string",
"value": "testInit4Offer"
},
{
"key": "internal.span.format",
"type": "string",
"value": "Zipkin"
}
],
"logs": [],
"processID": "p1",
"warnings": null
},
{
"traceID": "f2f902949ee8e661",
"spanID": "303707dcd9c9d1ef",
"flags": 1,
"operationName": "getProductOfferingDetails",
"references": [
{
"refType": "CHILD_OF",
"traceID": "f2f902949ee8e661",
"spanID": "d1d2c068248a5542"
}
],
"startTime": 1605709909277000,
"duration": 529234,
"tags": [
{
"key": "error",
"type": "bool",
"value": true
},
{
"key": "productOfferId",
"type": "string",
"value": "testInit4Offer"
},
{
"key": "internal.span.format",
"type": "string",
"value": "Zipkin"
}
],
"logs": [
{
"timestamp": 1605709909807000,
"fields": [
{
"key": "event",
"type": "string",
"value": "error"
},
{
"key": "error.object",
"type": "string",
"value": "oracle.communications.brm.integration.exceptions.EccServiceException"
}
]
}
],
"processID": "p1",
"warnings": null
}
]
}
Trace Tags in Logs
After finding the trace tags in the tracer tool, you can search the logs for them. You can do simple searches in the raw log data or search and filter by the tags using a logging tool, such as Grafana Loki.
yyyy-MM-dd'T'HH:mm:ss.SSSXXX, UTC | level | eventId | projectId | productOfferId | traceId | thread | logging service | message
2030-10-11T11:34:36,231+05:30 | INFO | ea09ae5a-8098-4fb2-b634-ee8048b9cc1d | demopackage11 | testInit4Offer | 55c8362b32d36b49 | pool-4-thread-1 | ctPublishEventServiceImpl | Processing Publish Event ea09ae5a-8098-4fb2-b634-ee8048b9cc1d->testInit4Offer
For the testInit4Offer product, the following error log appears:
2020-11-18T14:31:49.814Z | ERROR | ea09ae5a-8098-4fb2-b634-ee8048b9cc1d | demopackage11 | testInit4Offer | f2f902949ee8e661 | pool-3-thread-4 | .s.LaunchPdcItemPublisher | Error calling API service 'Product Offering Service' for 'testInit4Offer'. Status Code: 404 Error: '
Based on this message and what you saw in the tracer, you would know that PDC REST Services Manager couldn't call the enterprise product catalog to request information about the testInit4Offer product offering. Expanding and inspecting the GET span in the tracer would reveal the corrupt URL. You could then review the message from your enterprise product catalog to confirm and make appropriate changes to resolve the issue.
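As a minimal sketch of the log-side search, once you have an eventId or traceId from the tracer, a plain grep over raw log output finds the related entries. The sample line below is the log entry shown earlier in this section:

```shell
# Write the sample log line from above to a temp file:
cat > /tmp/pdcrsm.log <<'EOF'
2030-10-11T11:34:36,231+05:30 | INFO | ea09ae5a-8098-4fb2-b634-ee8048b9cc1d | demopackage11 | testInit4Offer | 55c8362b32d36b49 | pool-4-thread-1 | ctPublishEventServiceImpl | Processing Publish Event ea09ae5a-8098-4fb2-b634-ee8048b9cc1d->testInit4Offer
EOF
# Count the entries that carry this eventId trace tag:
grep -c 'ea09ae5a-8098-4fb2-b634-ee8048b9cc1d' /tmp/pdcrsm.log
```

A logging tool such as Grafana Loki lets you do the same filtering by label instead of raw text.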
About Metrics
You can monitor the PDC REST Services Manager metrics by using the Metrics REST endpoint. The metrics count successful and failed messages passing through the PDC REST Services Manager integration points.
Use a monitoring tool that scrapes metrics data, such as Prometheus, to monitor the metrics available from the PDC REST Services Manager Metrics endpoint. You can get the metrics in plain text format, which is compatible with Prometheus, or JSON format. See "Checking Access to Metrics" for information about accessing the metrics endpoint and requesting different formats. For more information about Prometheus, see: https://prometheus.io/.
Table 4-1 shows the available metrics.
Table 4-1 PDC REST Services Manager Metrics
| Metric | Description | Example |
|---|---|---|
| http-request-duration | The total time taken executing an HTTP resource, by method type and status code. | For the total time taken executing the /productOffering resource on HTTP GET: http_request_duration("path=productOffering", "method=GET") |
| http-request-total | The total number of requests for an HTTP resource, by method type and status code. | For the total number of requests made for the /productOffering resource on HTTP GET with 200 status: http_request_total("path=productOffering", "method=GET", "statusCode=200") |
You can also use built-in Helidon metrics. See the Helidon documentation for more information: https://helidon.io/docs/latest/#/mp/metrics/01_introduction.
Installing Prometheus
Follow the instructions in this section to install Prometheus Operator. Prometheus Operator is an extension to Kubernetes that manages Prometheus monitoring instances in a more automated and effective way. It optimizes running Prometheus on Kubernetes while retaining Kubernetes-native configuration options. For more information on installing Prometheus, see "Installation" in the Prometheus documentation.
Topics in this section:
- Prerequisites Required before Installing Prometheus
- Downloading the Prometheus Operator Helm Chart
- Installing Prometheus Operator
Prerequisites Required before Installing Prometheus
Before you install Prometheus, your system must meet the following prerequisites:
- A Kubernetes cluster must be running before you start configuring Prometheus Operator.
- Helm must be installed and configured.
- You must create a namespace for monitoring. You can do this with the following command:
  kubectl create namespace monitoring
- You must set up an HTTP proxy on all cluster nodes. You can do this with the following commands:
  export HTTP_PROXY="proxy_host:proxy_port"
  export HTTPS_PROXY=$HTTP_PROXY
- Git must be installed on the client node from which the Helm chart will be run. If it is not, install it as a sudo user with yum install git-all.
Downloading the Prometheus Operator Helm Chart
Run the following commands to download the Prometheus Operator Helm chart:
helm repo add stable https://charts.helm.sh/stable
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm fetch prometheus-community/kube-prometheus-stack
Note:
Use the following commands to remove the proxy settings while installing Prometheus Operator with the Helm chart:
unset HTTP_PROXY
unset HTTPS_PROXY
Installing Prometheus Operator
- To install Prometheus Operator, you must create the override-values.yaml file, including the service modifications, alert rules, and Grafana alerts. The following are the contents of override-values.yaml:
additionalPrometheusRulesMap:
  rule-name: PDC-RSM-rule
  groups:
    - name: pdc-rsm-alert-rules
      rules:
        - alert: CPU_UsageWarning
          annotations:
            message: CPU has reached 80% utilization
          expr: avg without(cpu) (rate(node_cpu_seconds_total{job="node-exporter", instance="100.77.52.71:9100", mode!="idle"}[5m])) > 0.8
          for: 5m
          labels:
            severity: critical
        - alert: Memory_UsageWarning
          annotations:
            message: Memory has reached 80% utilization
          expr: node_memory_MemTotal_bytes{job="node-exporter", instance="100.77.52.71:9100"} - node_memory_MemFree_bytes{job="node-exporter", instance="100.77.52.71:9100"} - node_memory_Cached_bytes{job="node-exporter", instance="100.77.52.71:9100"} - node_memory_Buffers_bytes{job="node-exporter", instance="100.77.52.71:9100"} > 22322927872
          for: 5m
          labels:
            severity: critical
alertmanager:
  service:
    type: LoadBalancer
grafana:
  service:
    type: LoadBalancer
  grafana.ini:
    smtp:
      enabled: true
      host: internal-mail-router.example.com:25
      user: "xxxxx.xxxxx@example.com"
      password: "password"
      skip_verify: true
prometheus:
  service:
    type: LoadBalancer
- Install Prometheus Operator with the kube-prometheus-stack chart in the monitoring namespace with the following command:
helm install prometheus kube-prometheus-stack --values override-values.yaml -n monitoring
- Verify the installation and its components with the following command:
kubectl get all -n monitoring
Checking Access to Metrics
You can access the PDC REST Services Manager metrics from any tool that can access REST API endpoints using an OAuth token generated by Oracle Identity Cloud Service for PDC REST Services Manager. You can check whether you have access by using cURL commands.
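For instance, you might fetch the endpoint with curl, passing the OAuth token in an Authorization header. The following is a sketch: the host, port, and token variable are placeholders, and the sample data below illustrates the Prometheus text format rather than output captured from a live system:

```shell
# Fetching metrics from a live deployment would look roughly like:
#   curl -H "Authorization: Bearer $TOKEN" http://host:port/metrics
# (request the JSON format instead with -H "Accept: application/json").
# Below, a sample of the plain-text format is filtered for one counter:
cat > /tmp/metrics-sample.txt <<'EOF'
# TYPE http_request_total counter
http_request_total{path="productOffering",method="GET",statusCode="200"} 42
EOF
grep '^http_request_total' /tmp/metrics-sample.txt
```

The same plain-text format is what Prometheus scrapes when the ServiceMonitor described below is in place.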
Service Monitor Configuration
PDC REST Services Manager supports the Helidon-provided metrics endpoint. Prometheus can read directly from the /metrics endpoint, which PDC REST Services Manager exposes for system and application metrics. A ServiceMonitor describes and manages the monitoring targets to be scraped, so that Prometheus knows which applications to scrape.
To enable the ServiceMonitor configuration, follow these steps:
- Create a pdcrsm-sm.yaml file with the following content:
  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    annotations:
      meta.helm.sh/release-name: pdcrsm-(release version)
      meta.helm.sh/release-namespace: op
    labels:
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: pdcrsm-(release version)
      app.kubernetes.io/version: (release version)
      chart: pdcrsm-(release version)
      heritage: Helm
      release: prometheus
    name: pdcrsm-monitoring
    namespace: monitoring
  spec:
    endpoints:
      - path: /metrics
        port: ocpdcrsm-port
    namespaceSelector:
      matchNames:
        - op
    selector:
      matchLabels:
        app.kubernetes.io/name: pdcrsm
Note:
You must modify the release-name and namespace according to the PDC REST Services Manager deployment.
- Use the following command to apply your changes:
  kubectl apply -f pdcrsm-sm.yaml -n monitoring
- To verify that Prometheus is scraping the endpoints, navigate to the Prometheus user interface and click Targets under the Status drop-down menu. Be sure to refresh the page.
About Monitoring System Health
You can assess the health of the PDC REST Services Manager system by monitoring the process status and overall rates of failure in logs, traces, and metrics. To check the process status, run the following command:
RestServicesManager.sh status
To maintain an active system, Oracle recommends using a service from your operating system, such as systemd on Linux, to automatically start, monitor, and restart the PDC REST Services Manager system.
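On a systemd-based Linux host, such supervision might be sketched as the following unit file. The unit name, user, and installation path are assumptions to adapt to your environment, not values defined by this document:

```ini
# /etc/systemd/system/pdcrsm.service -- hypothetical unit
[Unit]
Description=PDC REST Services Manager
After=network.target

[Service]
Type=forking
User=pdcrsm
ExecStart=/opt/pdcrsm/apps/bin/RestServicesManager.sh start
ExecStop=/opt/pdcrsm/apps/bin/RestServicesManager.sh stop
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```

After installing a unit like this, systemd restarts the process automatically on failure, and `systemctl status pdcrsm` complements the control script's own status command.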