HeatWave Cluster Failure and Recovery

HeatWave triggers the recovery process when a HeatWave node fails or when the DB system undergoes maintenance or a planned restart.

HeatWave regularly monitors the status of each HeatWave node. If a node does not respond for 60 seconds, HeatWave considers it a node failure.
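The timeout rule can be illustrated with a small sketch. The `NodeMonitor` class and its methods are hypothetical and are not part of any HeatWave API. The sketch assumes that a node counts as failed once 60 or more seconds have passed since its last response, and it takes timestamps as plain seconds so that it is easy to reason about.

```python
from dataclasses import dataclass, field

# Timeout taken from the text: no response for 60 seconds means node failure.
# Treating exactly 60 seconds as a failure (>=) is an assumption of this sketch.
FAILURE_TIMEOUT_SECONDS = 60.0


@dataclass
class NodeMonitor:
    """Hypothetical tracker of the last response time of each HeatWave node."""

    timeout: float = FAILURE_TIMEOUT_SECONDS
    last_response: dict = field(default_factory=dict)

    def record_response(self, node_id: str, now: float) -> None:
        # Called whenever a node answers a status check.
        self.last_response[node_id] = now

    def failed_nodes(self, now: float) -> list:
        # A node is reported as failed once its silence reaches the timeout.
        return sorted(
            node
            for node, seen in self.last_response.items()
            if now - seen >= self.timeout
        )


monitor = NodeMonitor()
monitor.record_response("node-1", now=0.0)
monitor.record_response("node-2", now=30.0)
print(monitor.failed_nodes(now=70.0))  # ['node-1']
```

At time 70, node-1 has been silent for 70 seconds and is reported as failed, while node-2 has been silent for only 40 seconds.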

During the recovery process, HeatWave automatically attempts to bring the node online, reform the cluster, and reload data that was previously loaded. There are two ways of reloading data:

  • From the HeatWave storage layer: During recovery from a HeatWave node failure, HeatWave reloads data from the HeatWave storage layer, which is created when you enable the HeatWave cluster for the first time. To facilitate recovery, data is persisted to Object Storage when it is loaded into the HeatWave cluster and when data changes are propagated from the DB system to the HeatWave cluster. Loading data from Object Storage is faster because the data is already in the HeatWave storage format and does not need to be converted, as it does when loaded from the DB system.
    Note

    You can monitor the status of the HeatWave cluster with the HeatWave cluster metric named HeatWave cluster health status. See HeatWave Cluster Metrics. The metric shows 1 (RECOVERING) while recovery is in progress and changes to 0 (HEALTHY) when recovery completes.
  • From the DB system: During recovery from maintenance or a planned restart of MySQL Server, the HeatWave cluster reloads data from the DB system.
    Note

    You can monitor the reload progress with the HeatWave cluster metric named HeatWave cluster data load progress. See HeatWave Cluster Metrics. The reload can take a long time, especially for large data sets. Do not restart the DB system again while the reload is in progress, because each restart reloads all the data from the beginning.
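The health status values described in the first note (1 for RECOVERING, 0 for HEALTHY) can be interpreted with a short sketch. It assumes the metric has already been retrieved as a list of `(timestamp, value)` pairs ordered by time; how the samples are fetched from the monitoring service is outside the scope of this sketch, and the function name `recovery_completed_at` is hypothetical.

```python
# Metric values for "HeatWave cluster health status", as given in the text.
RECOVERING = 1
HEALTHY = 0


def recovery_completed_at(samples):
    """Return the timestamp at which the cluster returned from RECOVERING to
    HEALTHY, or None if the samples contain no completed recovery.

    `samples` is a hypothetical list of (timestamp, value) pairs, oldest first.
    """
    recovering = False
    for timestamp, value in samples:
        if value == RECOVERING:
            recovering = True
        elif value == HEALTHY and recovering:
            # First HEALTHY sample after a RECOVERING period.
            return timestamp
    return None


samples = [(0, 0), (60, 1), (120, 1), (180, 0), (240, 0)]
print(recovery_completed_at(samples))  # 180
```

A HEALTHY sample is only treated as a completion if a RECOVERING sample preceded it, so a cluster that was healthy throughout returns None.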

During recovery, the HeatWave cluster automatically reloads the data. However, if the MySQL Server is in SUPER_READ_ONLY mode, data cannot be loaded into the HeatWave cluster, and HeatWave recovery fails. Disable SUPER_READ_ONLY mode to load data. See Resolving SUPER_READ_ONLY and OFFLINE_MODE Issue.
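The choice of reload source, together with the SUPER_READ_ONLY restriction, can be summarized as a small decision function. This is an illustrative sketch only: `Trigger`, `RecoveryError`, and `reload_source` are hypothetical names, and the sketch assumes that the SUPER_READ_ONLY restriction applies regardless of which event triggered the recovery.

```python
from enum import Enum


class Trigger(Enum):
    """Hypothetical labels for the events that start recovery."""

    NODE_FAILURE = "node failure"
    PLANNED_RESTART = "maintenance or planned restart"


class RecoveryError(RuntimeError):
    """Raised when the reload cannot proceed."""


def reload_source(trigger: Trigger, super_read_only: bool) -> str:
    # Data cannot be loaded while SUPER_READ_ONLY is enabled, so recovery fails.
    if super_read_only:
        raise RecoveryError("SUPER_READ_ONLY is enabled; disable it to load data")
    if trigger is Trigger.NODE_FAILURE:
        # Data was persisted in HeatWave format, so no conversion is needed.
        return "HeatWave storage layer (Object Storage)"
    # After maintenance or a planned restart, data is reloaded from the DB system.
    return "DB system"


print(reload_source(Trigger.NODE_FAILURE, super_read_only=False))
print(reload_source(Trigger.PLANNED_RESTART, super_read_only=False))
```

Checking SUPER_READ_ONLY first mirrors the text: no reload path can succeed until the mode is disabled.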

When you unload a table, its data is removed from the HeatWave cluster and, in a background operation, also from the HeatWave storage layer in Object Storage.