Connector Error Handling and Recovery

This page describes how to recover a connector from a failed state using connector configurations. A connector runs in the background on the task processing framework: it checks entities for changes and sends a message to an external system for every change. While the application is running, a connector works in three stages:

  • Aggregating Change Events: Translates change events into aggregate events periodically.

  • Transforming Aggregate Events: Creates an activity for each unprocessed aggregate event. The activity transforms the event into a message and publishes that message.

  • Re-publishing Messages: Picks up messages that failed delivery and re-publishes them.

This process is continuous and vulnerable to errors.

Check on System Properties

You can check the system properties of a connector:

Table 1. Check on System Properties

| Stage | System Property | Default | Additional Information |
| --- | --- | --- | --- |
| Aggregating Change Events | ohi.connector.event.aggregation.activated | False | Change it to True to create aggregate events. |
| Aggregating Change Events | ohi.connector.event.aggregation.interval | 300 | Every 300 seconds, the connector checks for new change events to create new aggregate events. |
| Re-publishing Messages | ohi.connector.message.republishing.activated | False | Change it to True to enable the republishing of messages. |
| Re-publishing Messages | ohi.connector.message.republishing.interval | 300 | Every 300 seconds, the connector checks for failed messages to republish. |
| Message publishing | ohi.connector.configuration.<0>.httpmethod | POST | If the connector sends messages in a PUT request, you must change the default operation to PUT. The operation is specific to the connector configuration. |
| Message publishing | ohi.connector.configuration.<0>.uri | | Set it to the URL endpoint where the messages are published. The URL is specific to the connector configuration. |
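The property checks in Table 1 can be sketched as a small helper. This is a minimal illustration, not part of the product: the function name, the dictionary input, and the `config_key` parameter (standing in for whatever replaces the `<0>` placeholder) are assumptions.

```python
# Hypothetical helper: the property names come from Table 1, everything
# else (function name, dict input, config_key) is an assumption.
REQUIRED_TRUE = (
    "ohi.connector.event.aggregation.activated",
    "ohi.connector.message.republishing.activated",
)


def find_property_problems(props, config_key):
    """Return a list of problems found in a dict of system property values."""
    problems = []
    for name in REQUIRED_TRUE:
        # Both flags default to False, so a missing key counts as disabled.
        if str(props.get(name, "False")).lower() != "true":
            problems.append(f"{name} is not True")
    # The uri property has no default and must point at the endpoint.
    uri_key = f"ohi.connector.configuration.{config_key}.uri"
    if not props.get(uri_key):
        problems.append(f"{uri_key} is not set")
    return problems
```

An empty list means the flags and the endpoint look plausible; it says nothing about whether the endpoint is reachable or the HTTP method is correct.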

Check the Lifespan of a Task

Different tasks run for each stage. To check how long a task has been running, compare the creation date of the latest task with the lifespan of the task. A connector creates a new task no later than five seconds after the lifespan of the previous task has passed. If a task lasts longer than its lifespan, it can be the reason that a connector configuration fails.

Table 2. Check the Lifespan of a Task

| Stage | Number of Tasks | Task Type | Subject ID | Lifespan |
| --- | --- | --- | --- | --- |
| Aggregating Change Events | 1 | IntegrationEventAggregateCreator | -1 | The value of the ohi.connector.event.aggregation.interval property. |
| Transforming Aggregate Events | 1 per enabled connector | IntegrationTransformation | ID of the connector configuration | The time interval of the connector configuration. |
| Re-publishing Messages | 1 | RepublishingFailedMessages | -1 | The value of the ohi.connector.message.republishing.interval property. |

Check the Latest Task

It is important to check for the latest task to determine whether the task is stuck anywhere. The connector must create new tasks periodically.

The lifespan varies for each task. For tasks in the aggregation and republishing stages, the lifespan is available in a system property (see the Check on System Properties section). For connector configuration tasks, the lifespan is available in the time-interval column of the connector configuration.
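The rule that a new task must appear within five seconds after the lifespan has passed can be expressed as a short check. This is a sketch under assumptions: the function name and arguments are illustrative, and the creation date is expected as a timezone-aware value so that a database in another time zone does not skew the result.

```python
from datetime import datetime, timedelta, timezone

# Grace period from the text: a new task within five seconds over the lifespan.
GRACE = timedelta(seconds=5)


def is_task_overdue(latest_creation, lifespan_seconds, now=None):
    """Return True when no newer task appeared within lifespan + 5 seconds.

    latest_creation: timezone-aware creation date of the latest task.
    lifespan_seconds: interval property value or connector time interval.
    """
    if latest_creation.tzinfo is None:
        raise ValueError("latest_creation must be timezone-aware")
    if now is None:
        now = datetime.now(timezone.utc)
    deadline = latest_creation + timedelta(seconds=lifespan_seconds) + GRACE
    return now > deadline
```

With the default interval of 300 seconds, a latest task created at 12:00:00 UTC is considered overdue once the current time passes 12:05:05 UTC.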

Check the Latest Task with Database Access

  • You can run the following query to fetch the last ten aggregate tasks:

    select   *
    from     ohi_tasks
    where    taty_id = 8720     -- IntegrationEventAggregateCreator task type
    order by creation_date desc
    fetch first 10 rows only
    ;

  • You can run the following query to fetch the last ten tasks for each connector configuration:

    select tasks.*
    from
    (
      select tasks.*,
             row_number() over(partition by tasks.subject_id
                               order by tasks.creation_date desc) rn
      from   ohi_tasks tasks
      where  tasks.subject_id in (select id from int_connector_configurations where ind_enabled = 'Y')
    ) tasks
    where    tasks.rn <= 10
    order by tasks.creation_date desc
    ;

  • You can run the following query to fetch the last ten republishing tasks:

    select   *
    from     ohi_tasks
    where    taty_id = 8920     -- RepublishingFailedMessages task type
    order by creation_date desc
    fetch first 10 rows only
    ;

Check the Latest Task without Database Access

  • You can send the following request to fetch the last ten aggregate tasks:

    http://[hostName]:[portNumber]/[api-context-root]/generic/tasks?q=taskType.id.eq(8720)&orderBy=creationDate:desc&limit=10

  • To fetch the last ten tasks for each connector configuration, first retrieve the IDs of the enabled connector configurations, and then request the tasks for each ID:

    http://[hostName]:[portNumber]/[api-context-root]/generic/connectorconfigurations?q=enabled.eq('true')
    http://[hostName]:[portNumber]/[api-context-root]/generic/tasks?q=subjectId.eq('[CONNECTOR_CONFIGURATION_ID]')&orderBy=creationDate:desc&limit=10

  • You can send the following request to fetch the last ten republishing tasks:

    http://[hostName]:[portNumber]/[api-context-root]/generic/tasks?q=taskType.id.eq(8920)&orderBy=creationDate:desc&limit=10
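The task requests above share one shape, so they can be assembled with a small helper. The sketch below only builds the URL strings; the function names are hypothetical, and `base_url` stands for `http://[hostName]:[portNumber]/[api-context-root]`. Authentication and any percent-encoding that your HTTP client requires are left out.

```python
def task_type_filter(task_type_id):
    """Filter on the task type, for example 8720 or 8920."""
    return f"taskType.id.eq({task_type_id})"


def subject_filter(connector_configuration_id):
    """Filter on the subject ID, which is the connector configuration ID."""
    return f"subjectId.eq('{connector_configuration_id}')"


def tasks_url(base_url, filter_expr, limit=10):
    """Build a generic tasks request that returns the newest tasks first."""
    return (
        f"{base_url.rstrip('/')}/generic/tasks"
        f"?q={filter_expr}&orderBy=creationDate:desc&limit={limit}"
    )
```

For example, `tasks_url(base_url, task_type_filter(8720))` produces the aggregate task request, and `tasks_url(base_url, subject_filter(config_id))` produces the request for one connector configuration.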

Check the Task Status

It is important to know the status of the tasks running in the connector. A task is in the PROCESSING status while it is active and in the PENDING status while it waits to be picked up. A task in the ERRORED or COMPLETED status is inactive.

  • You can use the TaskProcessing IP to reactivate an inactive task. See Recover from Application Failures for more information.

  • A task can remain in the PENDING status because it is stuck in the Oracle AQ. You can run the following query to check whether the task is in the queue and how long it has been there.

    The time zone of the database may differ from your local time zone. Take the time zone into account when you determine the lifespan of a task.
    A task must not stay in the queue for more than five seconds beyond the lifespan of the task.

    select otqt.*
    ,      otqt.user_data.priority
    ,      otqt.user_data.task_id
    ,      otqt.user_data.delay
    ,      otqt.user_data.ref_code
    ,      otqt.user_data.task_name
    ,      otqt.user_data.subject_code
    from   ohi_task_queue_table otqt
    where otqt.user_data.task_id = '[TASK_ID]'
    order by otqt.enq_time desc
    ;
  • The following table lists ways to restart a task that is not active:

    Table 3. Restarting Tasks

    | Status | Default Behaviour | Possible Causes | Steps to Restart |
    | --- | --- | --- | --- |
    | PENDING | The task is in the queue and waits to be picked up. This status must not last more than five seconds beyond the lifespan of the task. | Issues with the Oracle AQ. | If the task is in the AQ, check the KM note: State of the Messages in Oracle AQ Are Not Changed from WAITING to READY After the Delay Expires. If the task is not in the AQ, restart the task using the Task Recovery IP. |
    | PROCESSING | The task is active and lasts a few seconds. | Issues with the database connectivity. | Restart the latest task using the Task Recovery IP. |
    | ERRORED | | Unknown | Restart the latest task using the Task Recovery IP. |
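The restart steps in Table 3 amount to a lookup on the task status. The following sketch is illustrative only: the function name and the `in_queue` flag are assumptions, and it presumes you have already established that the task stayed in its status longer than expected.

```python
def restart_step(status, in_queue=False):
    """Return the restart step that Table 3 lists for a stuck task.

    in_queue: whether the queue table query found the task in the Oracle AQ.
    """
    status = status.upper()
    if status == "PENDING":
        if in_queue:
            return ("Check the KM note: State of the Messages in Oracle AQ Are Not "
                    "Changed from WAITING to READY After the Delay Expires")
        return "Restart the task using the Task Recovery IP"
    if status in ("PROCESSING", "ERRORED"):
        return "Restart the latest task using the Task Recovery IP"
    # COMPLETED and any other status have no restart step in the table.
    raise ValueError(f"No restart step documented for status {status!r}")
```

Raising an error for statuses outside the table keeps the helper from suggesting a step that the documentation does not describe.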

Other Checks

Checks with Database Access

Following is a list of steps you can perform to debug errors in tasks:

  • You can run the following query that uses the ID of the changed entity to check whether the connector creates an aggregate event for a change:

    select *
    from   int_aggregate_change_events
    where  aggregate_id = [entity_ID];

    If there are no aggregate events, check the task’s lifespan and status, system properties, and the change event rule of the message type.

  • You can run the following query with the ID of an aggregate event to check if the connector is creating an activity:

    select acty.*
    from   act_activities acty, int_assigned_aggr_events asae
    where  acty.id = asae.acty_id
    and    asae.acet_id = [aggregate_event_ID];

    If the connector is not creating the activity, check the duration of the tasks.

  • You can run the following query that uses the ID of an aggregate event to check whether the connector is creating a message for the change:

    select   *
    from     int_outbound_messages
    where    acet_id = [aggregate_event_ID];

    If the connector is not creating any outbound message, check why the message transformation logic is not creating messages for this change.

  • You can run the following query that uses the ID of an activity to check whether the connector is publishing messages:

    select   *
    from     int_message_publish_results
    where    acty_id = [activity_ID];

    The message publishing results show whether a message was published successfully and, after a failure, whether it was republished successfully. They link to an incident file that offers more information.
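The checks above follow the order in which the connector produces its records, so the first missing record points at the next thing to inspect. The sketch below captures that order; the function name and the keys of the `found` dictionary are hypothetical, and you would fill the dictionary from the results of the queries above.

```python
# Each entry: (record to look for, advice from the text when it is missing).
CHECKS = [
    ("aggregate_event",
     "Check the task's lifespan and status, the system properties, "
     "and the change event rule of the message type"),
    ("activity", "Check the duration of the tasks"),
    ("outbound_message", "Check the message transformation logic"),
]


def next_check(found):
    """Return the advice for the first record that the queries did not find."""
    for key, advice in CHECKS:
        if not found.get(key, False):
            return advice
    # All records exist, so the publishing outcome is the remaining evidence.
    return "Inspect the message publishing results and the linked incident file"
```

For example, `next_check({"aggregate_event": True})` returns the advice for a missing activity.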

Checks without Database Access

  • You can send the following request that uses the ID of a changed entity to check whether the connector is creating an aggregate event:

    http://[hostName]:[portNumber]/[api-context-root]/generic/aggregatechangeevents?q=aggregateId.eq('[ENTITY_ID]')

    If there are no aggregate events, check the task’s lifespan and status, the system properties, and the change event rule of the message type for this change.

  • You can send the following request to list the activities that the connector creates, most recent first:

    http://[hostName]:[portNumber]/[api-context-root]/generic/activities?q=activityType.id.eq('1107420')&orderBy=creationDate:desc

    If the connector is not creating an activity, check the duration of the task. The API cannot filter activities by aggregate change event, so compare the time of the change with the time at which the connector creates the activity, taking the lifespan of the tasks into account.

  • You can send the following request that uses the ID of an aggregate event to check whether the connector is creating messages:

    http://[hostName]:[portNumber]/[api-context-root]/generic/outboundmessages?q=changeEvent.id.eq('[AGGREGATE_EVENT_ID]')

    If the connector is not creating outbound messages, check why the message transformation logic is not creating a message for the change.

  • You can send the following request that uses the ID of an activity to check whether the connector is publishing messages:

    http://[hostName]:[portNumber]/[api-context-root]/generic/messagepublishingresults?q=activity.id.eq('[ACTIVITY_ID]')

    The message publishing results show whether a message was published successfully and, after a failure, whether it was republished successfully. They link to an incident file that offers more information.