Connector Error Handling and Recovery

This page describes how to diagnose a failing connector configuration and recover the connector. A connector runs in the background on the task processing framework. It checks entities for changes and sends a message to an external system for every change. A connector works in three stages, which are active while the application runs:

Aggregating Change Events: Translates change events into aggregate events periodically.
Transforming Aggregate Events: Creates an activity for each unprocessed aggregate event. The activity transforms the event into a message and publishes that message.
Re-publishing Messages: Picks up messages that failed delivery and re-publishes them.

This process runs continuously, and an error in any of the stages can cause the connector to fail.

Check on System Properties

The following system properties control the connector:

Table 1. Check on System Properties
Stage	System Property	Default	Additional Information
Aggregating Change Events	ohi.connector.event.aggregation.activated	False	Set it to `True` to activate the creation of aggregate events.
Aggregating Change Events	ohi.connector.event.aggregation.interval	300	Interval in seconds. With the default value, the connector checks for new change events every 300 seconds to create aggregate events.
Re-publishing Messages	ohi.connector.message.republishing.activated	False	Set it to `True` to activate the republishing of failed messages.
Re-publishing Messages	ohi.connector.message.republishing.interval	300	Interval in seconds. With the default value, the connector checks for failed messages to republish every 300 seconds.
Message Publishing	ohi.connector.configuration.<0>.httpmethod	POST	HTTP method that the connector uses to publish messages. To send messages in PUT requests, change it to PUT. The method is specific to each connector configuration.
Message Publishing	ohi.connector.configuration.<0>.uri		URL of the endpoint to which the connector publishes messages. You must set it. The URL is specific to each connector configuration.
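The requirements in Table 1 can be checked mechanically. The following Python sketch assumes that you have collected the property values into a dictionary of strings. The function name `check_connector_properties`, the configuration key `MY_CONNECTOR` (standing in for the `<0>` part of the property name), and the endpoint URL are hypothetical, and the sketch accepts only the two HTTP methods that the table mentions.

```python
# Minimal sketch: validate connector system properties against Table 1.
# Assumption: property values are available as a dict of strings.

def check_connector_properties(props: dict, config_key: str) -> list:
    """Return a list of problems found in the connector property values."""
    problems = []

    # Both stages stay inactive unless their flag is set to true.
    for name in ("ohi.connector.event.aggregation.activated",
                 "ohi.connector.message.republishing.activated"):
        if props.get(name, "false").strip().lower() != "true":
            problems.append(f"{name} is not true")

    # Intervals are in seconds; 300 is the default from Table 1.
    for name in ("ohi.connector.event.aggregation.interval",
                 "ohi.connector.message.republishing.interval"):
        value = props.get(name, "300").strip()
        if not value.isdecimal() or int(value) == 0:
            problems.append(f"{name} must be a positive number of seconds")

    # Publishing settings are specific to one connector configuration.
    # config_key stands for the <0> part of the property name (hypothetical).
    prefix = f"ohi.connector.configuration.{config_key}"
    method = props.get(f"{prefix}.httpmethod", "POST").strip().upper()
    if method not in ("POST", "PUT"):  # assumption: only the methods Table 1 names
        problems.append(f"{prefix}.httpmethod should be POST or PUT")
    if not props.get(f"{prefix}.uri", "").strip():
        problems.append(f"{prefix}.uri is not set")

    return problems


# Usage with hypothetical values:
props = {
    "ohi.connector.event.aggregation.activated": "true",
    "ohi.connector.message.republishing.activated": "false",
    "ohi.connector.configuration.MY_CONNECTOR.uri": "https://example.com/messages",
}
print(check_connector_properties(props, "MY_CONNECTOR"))
# ['ohi.connector.message.republishing.activated is not true']
```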

Check the Lifespan of a Task

Each stage runs its own tasks. To check how long a task has been running, look at the creation date of the latest task and at the lifespan of the task. The connector creates a new task no later than five seconds after the lifespan of the previous task has elapsed. If a task runs longer than its lifespan, this can be the reason that a connector configuration fails.

Table 2. Check the Lifespan of a Task
Stage	Number of Tasks	Task Type	Subject ID	Lifespan
Aggregating Change Events	1	IntegrationEventAggregateCreator	-1	The value of the `ohi.connector.event.aggregation.interval` property.
Transforming Aggregate Events	1 per enabled connector	IntegrationTransformation	ID of the connector configuration	The time interval of the connector configuration.
Re-publishing Messages	1	RepublishingFailedMessages	-1	The value of the `ohi.connector.message.republishing.interval` property.
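The timing rule described above can be expressed as a small check. The following Python sketch assumes timezone-aware datetimes and the interpretation that a new task is due no later than five seconds after the lifespan of the latest task has elapsed. The function name `is_stage_overdue` and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Grace period: a new task is expected at most five seconds after the
# lifespan of the latest task has elapsed.
GRACE = timedelta(seconds=5)


def is_stage_overdue(latest_creation: datetime, lifespan_seconds: int,
                     now: datetime) -> bool:
    """Return True if no new task has appeared within lifespan + grace.

    Both datetimes must be timezone-aware, so a database time zone that
    differs from the local one cannot skew the comparison.
    """
    if latest_creation.tzinfo is None or now.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    return now - latest_creation > timedelta(seconds=lifespan_seconds) + GRACE


# Usage: aggregation task with the default interval of 300 seconds.
created = datetime(2024, 5, 1, 10, 0, 0, tzinfo=timezone.utc)
print(is_stage_overdue(created, 300, created + timedelta(seconds=304)))  # False
print(is_stage_overdue(created, 300, created + timedelta(seconds=306)))  # True
```

For the aggregation and republishing stages, pass the value of the corresponding interval property; for a connector configuration, pass its time interval.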

Check the Latest Task

Check the latest task of each stage to determine whether the stage is stuck. The connector must create new tasks periodically, so if no newer task appears after the lifespan of the latest task has elapsed, the stage is stuck.

The lifespan differs per task. For tasks in the aggregation and republishing stages, the lifespan is the value of a system property (see the Check on System Properties section). For connector configuration tasks, the lifespan is the value in the time-interval column of the connector configuration.

Check the Latest Task with Database Access

You can run the following query to fetch the last ten aggregate tasks:

```
select   *
from     ohi_tasks
where    taty_id = 8720     -- IntegrationEventAggregateCreator task type
order by creation_date desc
fetch first 10 rows only
;
```

You can run the following query to fetch the last ten tasks for each connector configuration:

```
select tasks.*
from
(
  select tasks.*,
         row_number() over (partition by tasks.subject_id
                            order by tasks.creation_date desc) rn
  from   ohi_tasks tasks
  where  tasks.subject_id in (select id
                              from   int_connector_configurations
                              where  ind_enabled = 'Y')
) tasks
where    tasks.rn <= 10
order by tasks.creation_date desc
;
```
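The window function in the query above numbers the tasks of each connector configuration (partitioned by `subject_id`) from newest to oldest and keeps the first ten. To illustrate that selection, the following Python sketch applies the same logic to tasks already loaded into memory. Representing each task as a dictionary with `subject_id` and `creation_date` keys, and the function name `latest_per_subject`, are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def latest_per_subject(tasks: list, n: int = 10) -> list:
    """Keep the n most recent tasks per subject_id, newest first overall.

    Mirrors row_number() over (partition by subject_id order by
    creation_date desc) combined with rn <= n.
    """
    by_subject = defaultdict(list)
    for task in tasks:
        by_subject[task["subject_id"]].append(task)

    selected = []
    for subject_tasks in by_subject.values():
        subject_tasks.sort(key=lambda t: t["creation_date"], reverse=True)
        selected.extend(subject_tasks[:n])  # rn <= n

    # Final ordering, as in "order by tasks.creation_date desc".
    selected.sort(key=lambda t: t["creation_date"], reverse=True)
    return selected


# Usage: six hypothetical tasks spread over two connector configurations.
tasks = [{"id": i, "subject_id": 100 + i % 2,
          "creation_date": datetime(2024, 5, 1, 10, 0, i)} for i in range(6)]
print([t["id"] for t in latest_per_subject(tasks, n=2)])  # [5, 4, 3, 2]
```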

You can run the following query to fetch the last ten republishing tasks:

```
select   *
from     ohi_tasks
where    taty_id = 8920     -- RepublishingFailedMessages task type
order by creation_date desc
fetch first 10 rows only
;
```

Check the Latest Task without Database Access

You can send the following request to fetch the last ten aggregate tasks:

```
http://[hostName]:[portNumber]/[api-context-root]/generic/tasks?q=taskType.id.eq(8720)&orderBy=creationDate:desc&limit=10
```

To fetch the last ten tasks for each connector configuration, first retrieve the IDs of the enabled connector configurations, and then fetch the tasks of each configuration to check their statuses:

```
http://[hostName]:[portNumber]/[api-context-root]/generic/connectorconfigurations?q=enabled.eq('true')
http://[hostName]:[portNumber]/[api-context-root]/generic/tasks?q=subjectId.eq('[CONNECTOR_CONFIGURATION_ID]')&orderBy=creationDate:desc&limit=10
```

You can send the following request to fetch the last ten republishing tasks:

```
http://[hostName]:[portNumber]/[api-context-root]/generic/tasks?q=taskType.id.eq(8920)&orderBy=creationDate:desc&limit=10
```
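If you call these endpoints from a script, a small helper can assemble the URLs. The following Python sketch assumes a base URL that already contains the host name, port number, and API context root (`http://localhost:8080/api` is a placeholder), and keeps the characters of the filter expressions literal, as in the URLs above. The helper names are hypothetical. The sketch only builds the URLs; it does not send requests or parse responses, because this page does not describe the response format.

```python
from typing import Optional
from urllib.parse import quote

# Characters kept literal in query values, so the URLs match the ones above.
SAFE_CHARS = "().':"


def generic_url(base: str, resource: str, q: str,
                order_by: Optional[str] = None,
                limit: Optional[int] = None) -> str:
    """Build a URL for a generic API resource with a q filter."""
    params = [("q", q)]
    if order_by:
        params.append(("orderBy", order_by))
    if limit is not None:
        params.append(("limit", str(limit)))
    query = "&".join(f"{key}={quote(value, safe=SAFE_CHARS)}"
                     for key, value in params)
    return f"{base.rstrip('/')}/generic/{resource}?{query}"


def latest_tasks_url(base: str, task_type_id: int, limit: int = 10) -> str:
    """Latest tasks of one task type, for example 8720 or 8920."""
    return generic_url(base, "tasks", f"taskType.id.eq({task_type_id})",
                       "creationDate:desc", limit)


def connector_tasks_url(base: str, configuration_id: str,
                        limit: int = 10) -> str:
    """Latest tasks of one connector configuration."""
    return generic_url(base, "tasks", f"subjectId.eq('{configuration_id}')",
                       "creationDate:desc", limit)


BASE = "http://localhost:8080/api"  # placeholder host, port, and context root
print(latest_tasks_url(BASE, 8720))
print(generic_url(BASE, "connectorconfigurations", "enabled.eq('true')"))
```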

Check the Task Status

Check the status of the connector tasks. A task is in the PROCESSING status while it is active and in the PENDING status while it waits to be processed. A task in the ERRORED or COMPLETED status is inactive.

You can use the TaskProcessing IP to reactivate an inactive task. For more information, see Recover from Application Failures.

A task can also remain in the PENDING status because it is stuck in Oracle Advanced Queuing (AQ). Run the following query to check whether the task is in the queue and how long it has been there:

```
select   otqt.*,
         otqt.user_data.priority,
         otqt.user_data.task_id,
         otqt.user_data.delay,
         otqt.user_data.ref_code,
         otqt.user_data.task_name,
         otqt.user_data.subject_code
from     ohi_task_queue_table otqt
where    otqt.user_data.task_id = '[TASK_ID]'
order by otqt.enq_time desc
;
```

A task must not stay in the queue for longer than its lifespan plus five seconds. The time zone of the database can differ from your local time zone, so take the time zone into account when you compare the time in the queue with the lifespan of the task.
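To account for the database time zone, convert the enqueue time to a timezone-aware timestamp before comparing it with the current time. The following Python sketch assumes that the `enq_time` value is read as a naive timestamp and that you know the UTC offset in which it is stored; determining that offset for your database is up to you. The function names and example values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def seconds_in_queue(enq_time: datetime, db_utc_offset_hours: float,
                     now_utc: datetime) -> float:
    """Seconds that a task has spent in the queue.

    enq_time is the naive timestamp read from the queue table, and
    db_utc_offset_hours is the UTC offset in which it is stored (assumed
    to be known for your database).
    """
    db_zone = timezone(timedelta(hours=db_utc_offset_hours))
    enqueued = enq_time.replace(tzinfo=db_zone)
    return (now_utc - enqueued).total_seconds()


def queue_wait_too_long(enq_time: datetime, db_utc_offset_hours: float,
                        lifespan_seconds: int, now_utc: datetime) -> bool:
    """Apply the rule: no longer in the queue than lifespan + 5 seconds."""
    waited = seconds_in_queue(enq_time, db_utc_offset_hours, now_utc)
    return waited > lifespan_seconds + 5


# Usage: a database that stores timestamps in UTC+2 (hypothetical values).
now = datetime(2024, 5, 1, 12, 10, 0, tzinfo=timezone.utc)
enq = datetime(2024, 5, 1, 14, 0, 0)  # 14:00 at UTC+2 is 12:00 UTC
print(seconds_in_queue(enq, 2, now))           # 600.0
print(queue_wait_too_long(enq, 2, 300, now))   # True
```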

The following table describes how to restart a task that is not active:

Table 3. Restarting Tasks
Status	Expected Behaviour	Possible Causes	Steps to Restart
PENDING	The task is in the queue and waits to be picked up. It must not remain in this status for longer than its lifespan plus five seconds.	Issues with Oracle AQ.	If the task is in the AQ, see the KM note: State of the Messages in Oracle AQ Are Not Changed from WAITING to READY After the Delay Expires. If the task is not in the AQ, restart the task using the Task Recovery IP.
PROCESSING	The task is active and normally completes within a few seconds.	Issues with the database connectivity.	Restart the latest task using the Task Recovery IP.
ERRORED		Unknown.	Restart the latest task using the Task Recovery IP.
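Table 3 can be read as a simple decision rule. The following Python sketch returns the restart step for a task that you have already identified as stuck or errored. The function name `restart_step` and the `in_queue` flag (the outcome of the AQ query for a PENDING task) are hypothetical, and the sketch returns `None` for statuses that the table does not cover.

```python
from typing import Optional


def restart_step(status: str, in_queue: bool = False) -> Optional[str]:
    """Return the restart step from Table 3 for a stuck or errored task.

    in_queue only matters for PENDING tasks: True if the AQ query found
    the task in the queue.
    """
    status = status.strip().upper()
    if status == "PENDING":
        if in_queue:
            return ("See the KM note: State of the Messages in Oracle AQ Are "
                    "Not Changed from WAITING to READY After the Delay Expires.")
        return "Restart the task using the Task Recovery IP."
    if status in ("PROCESSING", "ERRORED"):
        return "Restart the latest task using the Task Recovery IP."
    return None  # status not covered by Table 3


print(restart_step("PENDING", in_queue=False))
```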

Other Checks

Checks with Database Access

You can perform the following checks to debug errors in tasks:

You can run the following query with the ID of the changed entity to check whether the connector creates an aggregate event for the change:
```
select *
from   int_aggregate_change_events
where  aggregate_id = [entity_ID];
```
If there are no aggregate events, check the lifespan and status of the tasks, the system properties, and the change event rule of the message type.
You can run the following query with the ID of an aggregate event to check whether the connector creates an activity:
```
select acty.*
from   act_activities acty
join   int_assigned_aggr_events asae
on     acty.id = asae.acty_id
where  asae.acet_id = [aggregate_event_ID];
```
If the connector does not create the activity, check the lifespan of the tasks.
You can run the following query with the ID of an aggregate event to check whether the connector creates a message for the change:
```
select *
from   int_outbound_messages
where  acet_id = [aggregate_event_ID];
```
If the connector does not create an outbound message, check why the message transformation logic does not create a message for this change.
You can run the following query that uses the ID of an activity to check whether the connector is publishing messages:
```
select   *
from     int_message_publish_results
where    acty_id = [activity_ID];
```

The message publish results show whether a message was published successfully and, after a failed delivery, whether it was republished successfully. The results contain links to an incident file with more information.
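The checks above follow the path of a change through the connector, so the first check that returns no rows points to the stage to investigate. The following Python sketch encodes that order and the follow-up actions from this page. It assumes that you pass the number of rows returned by each query; the function name `next_step` and the check labels are hypothetical.

```python
# Checks in the order in which a change moves through the connector, each with
# the follow-up action that this page suggests when the check finds nothing.
CHECKS = [
    ("aggregate events",
     "Check the lifespan and status of the tasks, the system properties, "
     "and the change event rule of the message type."),
    ("activities", "Check the lifespan of the tasks."),
    ("outbound messages",
     "Check the message transformation logic for this change."),
]


def next_step(counts: dict) -> str:
    """Return advice based on the row counts of the checks.

    counts maps a check label to the number of rows its query returned;
    a missing label is treated as zero rows.
    """
    for label, action in CHECKS:
        if counts.get(label, 0) == 0:
            return f"No {label} found. {action}"
    return ("A message was created. Inspect the message publish results "
            "and any linked incident file.")


print(next_step({"aggregate events": 1, "activities": 0}))
```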

Checks without Database Access

You can send the following request with the ID of the changed entity to check whether the connector creates an aggregate event:
```
http://[hostName]:[portNumber]/[api-context-root]/generic/aggregatechangeevents?q=aggregateId.eq('[ENTITY_ID]')
```
If there are no aggregate events, check the lifespan and status of the tasks, the system properties, and the change event rule of the message type.
You can send the following request to fetch the latest activities of the connector activity type, to check whether the connector creates an activity:
```
http://[hostName]:[portNumber]/[api-context-root]/generic/activities?q=activityType.id.eq('1107420')&orderBy=creationDate:desc
```
The API does not filter on aggregate change events. Instead, compare the time of the change with the creation dates of the activities, taking the lifespan of the tasks into account. If the connector does not create an activity within that time, check the lifespan of the tasks.
You can run the following query that uses the ID of an aggregate event to check if the connector is creating messages:
```
http://[hostName]:[portNumber]/[api-context-root]/generic/outboundmessages?q=changeEvent.id.eq('[AGGREGATE_EVENT_ID]')
```
If the connector does not create outbound messages, check why the message transformation logic does not create a message for this change.
You can send the following request with the ID of an activity to check whether the connector publishes messages:
```
http://[hostName]:[portNumber]/[api-context-root]/generic/messagepublishingresults?q=activity.id.eq('[ACTIVITY_ID]')
```
The message publish results show whether a message was published successfully and, after a failed delivery, whether it was republished successfully. The results contain links to the incident file with more information.
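Because the activities API does not filter on aggregate change events, you can narrow the returned activities by time. The following Python sketch assumes that a change waits at most one aggregation interval and then at most one connector configuration interval, each followed by the five-second grace period, before the activity is created, and that all timestamps are in the same time zone. This window and the function names are assumptions for illustration, not documented behavior.

```python
from datetime import datetime, timedelta

GRACE_SECONDS = 5  # grace period after a task lifespan, from this page


def activity_window(change_time: datetime, aggregation_interval: int,
                    connector_interval: int) -> tuple:
    """Return (start, end) of the period in which an activity is expected.

    Assumption: the change waits at most one aggregation interval to become
    an aggregate event, then at most one connector configuration interval to
    be transformed, each followed by the grace period.
    """
    delay = aggregation_interval + connector_interval + 2 * GRACE_SECONDS
    return change_time, change_time + timedelta(seconds=delay)


def activities_in_window(change_time: datetime, activity_times: list,
                         aggregation_interval: int,
                         connector_interval: int) -> list:
    """Keep the activity creation times that fall inside the expected window."""
    start, end = activity_window(change_time, aggregation_interval,
                                 connector_interval)
    return [t for t in activity_times if start <= t <= end]


# Usage with hypothetical values (all timestamps in the same time zone):
change = datetime(2024, 5, 1, 10, 0, 0)
created = [datetime(2024, 5, 1, 9, 59, 0),
           datetime(2024, 5, 1, 10, 6, 0),
           datetime(2024, 5, 1, 10, 30, 0)]
matches = activities_in_window(change, created, 300, 60)
print(matches)
```

An empty result suggests that the connector did not create an activity for the change within the expected time.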