Service Health Check

The system provides a plug-in spot on Installation called Health Check. Algorithms plugged in here check various system conditions and return details that help assess the health of the system.

What Information is Checked?

Each algorithm is responsible for checking one type of health check condition. The condition should be defined as a valid health component type (as defined in the HEALTH_COMP_FLG lookup). The algorithm may check the condition across many records (referred to as Health Components) and return a status and details for each health component. For example, if the algorithm checks the level of service for batch programs, each batch control is a health component. The algorithm gathers the level of service results for each batch program and returns the collection of information to the calling program.

The algorithms for the plug-in spot should return the following information for each health component.

  • The Health Component Detail should be populated with information specific to the object that was checked. For example, if the algorithm is checking batch control level of service, this holds the Batch Control code. In addition, to allow the user to display the details of each health component, the algorithm should populate the maintenance object code, the primary key field names and values, and the navigation option.

  • The Health Component Status Flag and Health Component Status Description should be populated with a valid status code and description appropriate for the condition being checked. These values can differ by health component type, because each health component type may define its own status values.

  • The Health Component Status Reason provides supporting information about the health component's status value. For example, if the component returns an error status, the status reason can explain the cause of the error.

  • The Health Component Response maps the status value of the health component to a standard value defined in the lookup HEALTH_RESPONSE_FLG. The values are a subset of HTTP response codes. The supported values are All Checks Successful (200), Non-Critical Function Degraded (203), No Content (204), and One or More Critical Functions Degraded (500).

  • The Response Details list is provided for health components that check several conditions as part of the health check, where each condition can return its own status. In this situation, the algorithm should use the response details list to record the individual responses and populate the overall Health Component Status Flag, Description and Reason with summary information. The recommended approach is:

    • Set the status flag and response based on the details of the individual responses. Refer to the batch level of service information below for an example of this logic.

    • When the response list is populated, the health check user interface shows the overall Status Reason text along with an icon to expand the details. It is recommended to populate the Status Reason with text such as "See Results for Details".
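To make the expected output more concrete, the following is a minimal Python sketch of the information an algorithm returns for one health component. The class, field and method names (`HealthComponent`, `ResponseDetail`, `summarize`) are hypothetical and only mirror the fields described above; they are not the product's actual data structures. The sketch assumes the overall status and response are decided by the caller from the individual details, since that rule is specific to each health component type.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional


class HealthResponse(IntEnum):
    """Values of the HEALTH_RESPONSE_FLG lookup (a subset of HTTP codes)."""
    ALL_CHECKS_SUCCESSFUL = 200
    NON_CRITICAL_FUNCTION_DEGRADED = 203
    NO_CONTENT = 204
    CRITICAL_FUNCTIONS_DEGRADED = 500


@dataclass
class ResponseDetail:
    """Result of one individual condition (hypothetical shape)."""
    status: str
    reason: str
    response: HealthResponse


@dataclass
class HealthComponent:
    """Output for one health component; all field names are illustrative."""
    detail: str                    # e.g. the batch control code
    maintenance_object: str        # enables drill-down into the record
    primary_key: Dict[str, str]    # primary key field name -> value
    navigation_option: str
    status_flag: str = ""
    status_description: str = ""
    status_reason: str = ""
    response: Optional[HealthResponse] = None
    response_details: List[ResponseDetail] = field(default_factory=list)

    def summarize(self, status_flag: str, status_description: str,
                  response: HealthResponse) -> None:
        """Record summary values when several conditions were checked.

        The caller derives the overall status and response from the
        individual response details; the reason points to the details list.
        """
        if not self.response_details:
            raise ValueError("summarize() requires at least one response detail")
        self.status_flag = status_flag
        self.status_description = status_description
        self.response = response
        self.status_reason = "See Results for Details"
```

Keeping the individual results in `response_details` and only a summary in the top-level fields matches the recommended approach: the user interface can show the short reason and expand the list on demand.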

The system provides an algorithm that checks the Batch Level of Service health component type. For this type, the algorithm finds all the batch controls that are configured with at least one level of service algorithm and invokes those algorithms for each batch control. The business service populates the output for each batch control as follows:

  • The Health Component Detail is populated with the Batch Control code and description. In addition, the navigation information needed to drill into the batch control is provided and is used to display the column as a hyperlink.

  • The Status is populated based on whether the batch control has one algorithm or multiple. If there is one algorithm, the Level of Service lookup value returned by the algorithm is returned. If there are multiple, the system determines an overall status based on the detailed status values from each algorithm. If any algorithm returns Error, that value is returned. Otherwise, if any return a Warning, that value is returned. Otherwise Normal is returned.

  • The Status Reason is populated based on whether the batch control has one algorithm or multiple. If there is one algorithm, the expanded text of the status reason returned by the algorithm is returned. If there are multiple, the text of message category 11002, message number 22001 ("See Results for Details") is returned.

  • The Response is populated based on the value of the overall Level of Service status. It is set to All Checks Successful (200) when the Level of Service is Normal or Disabled; Non-Critical Function Degraded (203) when the Level of Service is Warning; and One or More Critical Functions Degraded (500) when the Level of Service is Error.
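The status, status reason and response rules for the Batch Level of Service component can be sketched in Python as follows. The function and enum names are hypothetical, and each level of service algorithm's result is assumed to arrive as a (status, expanded reason text) pair; only the precedence and mapping rules come from the description above.

```python
from enum import Enum, IntEnum
from typing import List, Tuple


class LevelOfService(Enum):
    NORMAL = "Normal"
    WARNING = "Warning"
    ERROR = "Error"
    DISABLED = "Disabled"


class HealthResponse(IntEnum):
    ALL_CHECKS_SUCCESSFUL = 200
    NON_CRITICAL_FUNCTION_DEGRADED = 203
    CRITICAL_FUNCTIONS_DEGRADED = 500


# Text of message category 11002, message number 22001.
SEE_RESULTS = "See Results for Details"


def overall_status(results: List[Tuple[LevelOfService, str]]) -> Tuple[LevelOfService, str]:
    """Combine (status, reason) pairs from one batch control's algorithms."""
    if not results:
        raise ValueError("batch control has no level of service algorithms")
    if len(results) == 1:
        # One algorithm: its status and expanded reason text are used directly.
        return results[0]
    statuses = [status for status, _ in results]
    # Multiple algorithms: Error takes precedence, then Warning, else Normal.
    if LevelOfService.ERROR in statuses:
        return LevelOfService.ERROR, SEE_RESULTS
    if LevelOfService.WARNING in statuses:
        return LevelOfService.WARNING, SEE_RESULTS
    return LevelOfService.NORMAL, SEE_RESULTS


def to_response(status: LevelOfService) -> HealthResponse:
    """Map the overall Level of Service to a HEALTH_RESPONSE_FLG value."""
    if status is LevelOfService.ERROR:
        return HealthResponse.CRITICAL_FUNCTIONS_DEGRADED
    if status is LevelOfService.WARNING:
        return HealthResponse.NON_CRITICAL_FUNCTION_DEGRADED
    # Normal and Disabled both count as all checks successful.
    return HealthResponse.ALL_CHECKS_SUCCESSFUL
```

For example, a batch control whose algorithms return Normal and Warning gets an overall status of Warning, the reason "See Results for Details", and a response of 203.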

How are the Algorithms Called?

These algorithms are called by the product-provided business service F1-HealthCheck.

That service calculates an overall Health Response value based on all the details returned by all the algorithms, using the HEALTH_RESPONSE_FLG values described above.
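The exact rule F1-HealthCheck uses to combine the component responses is not described here. The Python sketch below assumes a "worst response wins" rule, with an assumed severity ordering of No Content (204), then All Checks Successful (200), then Non-Critical Function Degraded (203), then One or More Critical Functions Degraded (500). It also assumes 204 is returned when no component responses exist. Treat both assumptions as illustrative only.

```python
from enum import IntEnum
from typing import Iterable


class HealthResponse(IntEnum):
    ALL_CHECKS_SUCCESSFUL = 200
    NON_CRITICAL_FUNCTION_DEGRADED = 203
    NO_CONTENT = 204
    CRITICAL_FUNCTIONS_DEGRADED = 500


# ASSUMPTION: severity ranking used to pick the "worst" response.
# The documentation does not define this ordering.
_SEVERITY = {
    HealthResponse.NO_CONTENT: 0,
    HealthResponse.ALL_CHECKS_SUCCESSFUL: 1,
    HealthResponse.NON_CRITICAL_FUNCTION_DEGRADED: 2,
    HealthResponse.CRITICAL_FUNCTIONS_DEGRADED: 3,
}


def overall_health_response(responses: Iterable[HealthResponse]) -> HealthResponse:
    """Return the most severe component response (assumed rule).

    With no component responses, NO_CONTENT (204) is returned; this
    default is also an assumption.
    """
    return max(responses, key=_SEVERITY.__getitem__, default=HealthResponse.NO_CONTENT)
```

Ranking by an explicit table avoids comparing the raw HTTP codes numerically, which would wrongly treat 204 as more severe than 203.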

The system also provides inbound web services for this business service, for both SOAP and REST calls, allowing external systems to retrieve this information through a web service.