Troubleshooting Service Connectors

This topic covers troubleshooting techniques for Service Connector Hub.

No data is being moved

This section provides troubleshooting information for service connectors that don't appear to be moving data. For examples of service connectors, see Service Connector Hub Scenarios.

Check these items: 

  • Error metrics: Determine whether errors are occurring at the source service, the target service, or the Service Connector Hub service.

    To view error metrics for a service connector
    1. Open the navigation menu and click Analytics & AI. Under Messaging, click Service Connector Hub.
    2. Choose the Compartment that contains the service connector you want to view, and then click the service connector's name.
    3. In the Resources menu, click Metrics (if necessary).

      The Metrics page displays a default set of charts for the current service connector.

    4. Review the following metric charts: 
      • Errors at Source
      • Errors at Target
      • Service Connector Hub Errors
  • Authorization to write to the target service: Make sure you have authorization, either through the default policy offered when creating or updating the service connector or through a group-based policy. See Access to Source, Task, and Target Services.
    Note

    Your accepted default policies might take a few minutes to propagate to regions that are not your home region. The service connector does not move data until the policies are propagated.

I can't view my query in Basic mode

Check these items: 

  • Query simplicity: Update the query so that it includes only elements supported in Basic mode:

    • Audit logs only: Type-based filters can use the OR operator. Other filters must use the AND operator.
      Example:
      ((type = value1 OR type = value2) AND field = value3 AND field1 = value4)
    • Any combination of logs (Service logs, custom logs, and Audit logs): Filters must be joined with the AND operator.
      Example:
      (field = value AND field1 != value1)
    Examples of query elements that are not supported in Basic mode:
      • OR operator (except with type-based filters when only Audit logs are used)
      • Functions (for example, isNull())
      • select
      • summarize
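The Basic mode rules above can be approximated with a rough check before you switch modes. The following Python sketch is a hypothetical heuristic, not the Logging service's own parser: it inspects only the filter expression, does not understand quoted strings, and assumes that a filter whose field is literally `type` is a type-based filter. The names `fits_basic_mode` and `audit_only` are illustrative.

```python
import re

# Hypothetical heuristic for the Basic mode rules; not the Logging service's parser.
_GROUP = re.compile(r"\(([^()]*)\)")              # innermost parenthesized group
_CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")      # identifier directly followed by "("
_OR = re.compile(r"\bOR\b", re.IGNORECASE)
_AND = re.compile(r"\bAND\b", re.IGNORECASE)
_TYPE_FILTER = re.compile(r"^\s*type\s*=", re.IGNORECASE)
_UNSUPPORTED_WORDS = re.compile(r"\b(select|summarize)\b", re.IGNORECASE)


def _flat_expr_ok(expr: str, audit_only: bool) -> bool:
    """Check an expression that contains no parentheses."""
    if not _OR.search(expr):
        return True  # a single filter or AND-joined filters
    if not audit_only or _AND.search(expr):
        return False  # OR is allowed only among type filters on Audit logs
    return all(_TYPE_FILTER.match(term) for term in _OR.split(expr))


def fits_basic_mode(expr: str, audit_only: bool = False) -> bool:
    """Return True if the filter expression appears to use only Basic mode elements."""
    if _UNSUPPORTED_WORDS.search(expr):
        return False
    # Any identifier followed by "(" other than AND/OR is treated as a function call.
    if any(m.group(1).upper() not in ("AND", "OR") for m in _CALL.finditer(expr)):
        return False
    # Check innermost groups first, then replace each with a neutral placeholder.
    while (m := _GROUP.search(expr)) is not None:
        if not _flat_expr_ok(m.group(1), audit_only):
            return False
        expr = expr[: m.start()] + " group " + expr[m.end():]
    return _flat_expr_ok(expr, audit_only)


print(fits_basic_mode(
    "((type = value1 OR type = value2) AND field = value3 AND field1 = value4)",
    audit_only=True,
))                                                          # True
print(fits_basic_mode("(field = value AND field1 != value1)"))  # True
print(fits_basic_mode("field = value OR field1 = value1"))      # False
print(fits_basic_mode("isNull(field)"))                         # False
```

The check is deliberately conservative: anything it cannot classify as an AND-joined filter or a type-only OR group on Audit logs is reported as unsupported.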

How do I know when issues occur?

Check these items: 

  • Data freshness: Look for unexpected gaps in time between data movements.
    To view data freshness for a service connector
    1. Open the navigation menu and click Analytics & AI. Under Messaging, click Service Connector Hub.
    2. Choose the Compartment that contains the service connector you want to view, and then click the service connector's name.
    3. In the Resources menu, click Metrics (if necessary).

      The Metrics page displays a default set of charts for the current service connector.

    4. Review the following metric chart:
      • Data Freshness
    To view data freshness for all service connectors in a compartment
    1. Open the navigation menu and click Observability & Management. Under Monitoring, click Service Metrics.
    2. Choose the Compartment that contains the service connectors you want to view data freshness for.
    3. For Metric namespace, select oci_service_connector_hub.

    4. Review the following metric chart:
      • Data Freshness
  • Logging source: If your service connector retrieves data from a log, it might be attempting to retrieve more than the maximum amount of data allowed per connector per hour (1 GB). If this issue continues for more than 24 hours (the maximum period over which the service connector can catch up on data missed in previous transmissions), the log data is not delivered to the target. To determine whether this issue is occurring, create alarms to monitor the following indicators.

    Note

    For steps to edit alarm queries in MQL, see To edit an alarm query using MQL.
    Each indicator below is followed by the metric it uses (in parentheses), its alarm query in MQL, and comments.

    Indicator: Data older than 12 hours (Data Freshness)
    DataFreshness[1h].mean() > 43200000

    Comments:

    • The value 43200000 is the number of milliseconds in 12 hours.
    • Alarm trigger delay can be 1 minute or more.

    Indicator: Error at source (any error) (Errors at Source)
    ErrorsAtSource[15m].groupby(errorCode,connectorId).min() > 0

    Comments:

    • Set the alarm trigger delay to 15 minutes. With a 15-minute trigger delay, the alarm checks for service connectors that have had no successful runs in the last 15 minutes.
    • Results are grouped by error code and service connector.

    Indicator: Internal errors at source that don't resolve after 15 minutes (5xx) (Errors at Source)
    ErrorsAtSource[15m]{errorCode =~ "5*"}.groupby(connectorId).sum() > 0 &&
    ErrorsAtSource[15m].groupby(connectorId).min() > 0

    Comments:

    • Internal errors might indicate an issue at the source, which could delay delivery of data.
    • To trigger the alarm at shorter intervals, change the interval ([15m]).
    • To delay the trigger, change the alarm trigger delay.
    • Alarm trigger delay can be 1 minute or more.

    Indicator: Throttling errors at source (429) (Errors at Source)
    ErrorsAtSource[15m]{errorCode = "429"}.groupby(connectorId).sum() > 0 &&
    ErrorsAtSource[15m].groupby(connectorId).min() > 0

    Comments:

    • For more information on throttling errors, see the documented limits for the relevant service.
    • For example, for throttling errors related to the Streaming source, see Limits on Streaming Resources. Throttling at the Streaming source occurs when a service connector attempts to read from a stream partition while other calls to the same partition are also occurring, and the number of calls exceeds service limits.
    • Alarm trigger delay can be 1 minute or more.

    Indicator: Service communication errors at source (-1) (Errors at Source)
    ErrorsAtSource[15m]{errorCode = "-1"}.groupby(connectorId).sum() > 0 &&
    ErrorsAtSource[15m].groupby(connectorId).min() > 0

    Indicator: 404 error at source (Errors at Source)
    ErrorsAtSource[15m]{errorCode = "404"}.groupby(connectorId).sum() > 0

    Indicator: Zero bytes read (when data is expected) (Bytes Read from Source)
    BytesReadFromSource[15m].groupby(connectorId).sum() == 0

    Comments:

    • If errors are not occurring at the source, target, or task, then the log might not exist. Confirm that the specified log exists by searching for it in Logging.
    • Alarm trigger delay can be 1 minute or more.
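The threshold in the Data Freshness query works out to 12 h × 60 min × 60 s × 1,000 ms = 43,200,000 ms. If you want a different threshold, a small helper like the following can build the query text for a given number of hours. This is a hypothetical sketch, not part of any OCI SDK; `data_freshness_query` is an illustrative name, and the sketch assumes you keep the [1h] interval and mean() statistic from the Data Freshness query in the table.

```python
MS_PER_HOUR = 60 * 60 * 1000  # 3,600,000 milliseconds in one hour


def data_freshness_query(hours: float) -> str:
    """Build the text of a Data Freshness alarm query whose threshold is `hours` hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    threshold_ms = round(hours * MS_PER_HOUR)  # the metric is expressed in milliseconds
    return f"DataFreshness[1h].mean() > {threshold_ms}"


print(data_freshness_query(12))  # DataFreshness[1h].mean() > 43200000
print(data_freshness_query(6))   # DataFreshness[1h].mean() > 21600000
```

Paste the generated text into the alarm's MQL editor as you would the original query.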
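The 5xx, 429, and -1 queries each combine two conditions: the matching errors summed over the interval are greater than zero, and the minimum error count over the interval is greater than zero (no successful run). The following Python sketch models that combination over datapoints from one 15-minute window to make the logic concrete. It is a simplified local model, not how the Monitoring service evaluates MQL: it assumes each datapoint is a hypothetical (connector ID, error code, count) tuple, that a successful run is reported as a datapoint with a count of 0, and it approximates MQL's =~ wildcard match with Python's fnmatch.

```python
from collections import defaultdict
from fnmatch import fnmatchcase
from typing import Iterable, Set, Tuple

# Hypothetical datapoint shape: (connector_id, error_code, error_count),
# all observed within the same 15-minute interval.
Datapoint = Tuple[str, str, int]


def firing_connectors(points: Iterable[Datapoint], code_pattern: str) -> Set[str]:
    """Simplified model of:
    ErrorsAtSource[15m]{errorCode =~ code_pattern}.groupby(connectorId).sum() > 0 &&
    ErrorsAtSource[15m].groupby(connectorId).min() > 0
    """
    matching_sum = defaultdict(int)  # errors whose code matches the pattern
    all_counts = defaultdict(list)   # every datapoint, regardless of code
    for connector_id, error_code, count in points:
        all_counts[connector_id].append(count)
        if fnmatchcase(error_code, code_pattern):
            matching_sum[connector_id] += count
    return {
        connector_id
        for connector_id, counts in all_counts.items()
        # Condition 1: at least one matching error in the interval.
        # Condition 2: no datapoint with zero errors, that is, no successful run.
        if matching_sum[connector_id] > 0 and min(counts) > 0
    }


window = [
    ("connector-a", "500", 2),
    ("connector-a", "503", 1),
    ("connector-b", "500", 1),
    ("connector-b", "", 0),  # a successful run: zero errors
    ("connector-c", "429", 4),
]
print(firing_connectors(window, "5*"))   # {'connector-a'}
print(firing_connectors(window, "429"))  # {'connector-c'}
```

In this model, connector-b does not fire even though it had a 5xx error, because it also had a successful run in the same window; that is the effect of the min() > 0 condition.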