2 DB Tier Metrics and Alerts

DB Tier generates metrics and alerts that end users can view in the Prometheus dashboard and use to take any necessary action. Prometheus is installed as part of the common services during the vCNE installation. The following DB Tier metrics and alerts are available to end users:

DB Tier Metrics

The following sections list the DB Tier metrics.

DB Tier Node Status Metrics

Table 2-1 DB Tier Node Status Metrics

| Metric Name | Parameters | Values | Description |
|---|---|---|---|
| db_tier_node_status | Node ID, Node Type, Node Version | Node ID: node_id of the DB node; Node Type: node_type of the DB node (Data Node, Management Node, or SQL Node); Node Version: node_version, the DB Tier Cluster software version | Status of the DB Tier node. The value of this metric is "0" if the node is DOWN and "1" if the node is UP. |
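The following minimal sketch shows one way to pick out the nodes that report DOWN from the result of a Prometheus instant query on db_tier_node_status. The sample response is hypothetical: it follows the general shape of a Prometheus HTTP API query result, and the label values (including the node_type strings) are assumptions made for illustration only.

```python
# Hypothetical sample shaped like a Prometheus instant-query result
# for db_tier_node_status. All label values are illustrative assumptions.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"node_id": "1", "node_type": "Data Node"},
             "value": [1700000000.0, "1"]},
            {"metric": {"node_id": "2", "node_type": "Data Node"},
             "value": [1700000000.0, "0"]},
            {"metric": {"node_id": "49", "node_type": "Management Node"},
             "value": [1700000000.0, "1"]},
        ],
    },
}


def down_nodes(response):
    """Return (node_id, node_type) for every series whose value is 0 (DOWN)."""
    down = []
    for series in response["data"]["result"]:
        labels = series["metric"]
        _timestamp, raw_value = series["value"]  # sample values arrive as strings
        if float(raw_value) == 0:
            down.append((labels.get("node_id"), labels.get("node_type")))
    return down


print(down_nodes(SAMPLE_RESPONSE))
```

A real deployment would obtain the response from the Prometheus server instead of a hard-coded dictionary.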

DB Tier Table Read Write Metrics

Table 2-2 DB Tier Table Read Write Metrics

| Metric Name | Parameters | Values | Description |
|---|---|---|---|
| db_tier_local_operations | Node ID, Block Name, Block Instance | Node ID: node_id; Block Name: block_name; Block Instance: block_instance | Total number of local operations in DB Tier for the node. |
| db_tier_transactions | Node ID, Block Name, Block Instance | Node ID: node_id; Block Name: block_name; Block Instance: block_instance | Total number of transactions in DB Tier for the node. |
| db_tier_commits | Node ID, Block Name, Block Instance | Node ID: node_id; Block Name: block_name; Block Instance: block_instance | Total number of commits in DB Tier for the node. |
| db_tier_reads | Node ID, Block Name, Block Instance | Node ID: node_id; Block Name: block_name; Block Instance: block_instance | Total number of reads in DB Tier for the node. |
| db_tier_local_reads | Node ID, Block Name, Block Instance | Node ID: node_id; Block Name: block_name; Block Instance: block_instance | Total number of local reads in DB Tier for the node. |
| db_tier_writes | Node ID, Block Name, Block Instance | Node ID: node_id; Block Name: block_name; Block Instance: block_instance | Total number of writes in DB Tier for the node. |
| db_tier_local_writes | Node ID, Block Name, Block Instance | Node ID: node_id; Block Name: block_name; Block Instance: block_instance | Total number of local writes in DB Tier for the node. |
| db_tier_aborts | Node ID, Block Name, Block Instance | Node ID: node_id; Block Name: block_name; Block Instance: block_instance | Total number of aborted transactions in DB Tier for the node. |
| db_tier_table_scans | Node ID, Block Name, Block Instance | Node ID: node_id; Block Name: block_name; Block Instance: block_instance | Total number of table scans in DB Tier for the node. |
| db_tier_range_scans | Node ID, Block Name, Block Instance | Node ID: node_id; Block Name: block_name; Block Instance: block_instance | Total number of range scans in DB Tier for the node. |
| db_tier_transporter_overload | Node ID, Block Name, Block Instance | Node ID: node_id; Block Name: block_name; Block Instance: block_instance | Transporter overload in DB Tier for the node. |
| db_tier_scan_slowdown | Node ID, Block Name, Block Instance | Node ID: node_id; Block Name: block_name; Block Instance: block_instance | Scan slowdown in DB Tier for the node. |
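Because most of these metrics are described as running totals, a per-second rate is often more useful than the raw value. The following sketch derives such a rate from two samples of the same series. It assumes the totals behave like monotonically increasing counters that restart from zero when a node restarts; this document does not state that behavior, and the function name is hypothetical.

```python
def per_second_rate(previous_total, current_total, interval_seconds):
    """Approximate the per-second rate between two samples of a running total.

    Assumes the total only increases, except when it restarts from zero.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current_total - previous_total
    if delta < 0:
        # The total went backwards: assume a reset and count only
        # what has accumulated since the restart.
        delta = current_total
    return delta / interval_seconds


# Example: db_tier_commits read twice, 60 seconds apart (illustrative values).
print(per_second_rate(1000, 1600, 60))
```

Within Prometheus itself, the built-in rate() function performs a similar calculation over a range of samples.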

DB Tier CPU Usage Metrics

Table 2-3 DB Tier CPU Usage Metrics

| Metric Name | Parameters | Values | Description |
|---|---|---|---|
| db_tier_cpu_os_user | Node ID, Thread | Node ID: node_id; Thread: thread ID | DB Tier user CPU usage for the node. |
| db_tier_cpu_os_system | Node ID, Thread | Node ID: node_id; Thread: thread ID | DB Tier system CPU usage for the node. |
| db_tier_cpu_os_idle | Node ID, Thread | Node ID: node_id; Thread: thread ID | Idle CPU statistics for the node. |
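The HIGH_CPU alert expression in Table 2-7 derives CPU utilization as 100 minus the average idle value per node. The following sketch reproduces that arithmetic on a handful of samples. It assumes that db_tier_cpu_os_idle is reported as a percentage between 0 and 100, as the alert expression implies, and it averages all samples of a node together, which is a simplification of averaging over time first and across threads second. The sample values and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cpu_utilization_by_node(idle_samples):
    """Return {node_id: utilization percent} from (node_id, thread, idle percent) samples."""
    per_node = defaultdict(list)
    for node_id, _thread, idle_percent in idle_samples:
        per_node[node_id].append(idle_percent)
    # Utilization is whatever is left after the average idle share.
    return {node: 100 - mean(values) for node, values in per_node.items()}


# Illustrative samples: node "1" has two threads, node "2" has one.
samples = [("1", "0", 20.0), ("1", "1", 10.0), ("2", "0", 90.0)]
utilization = cpu_utilization_by_node(samples)
high_cpu_nodes = [node for node, pct in utilization.items() if pct >= 85]
print(utilization, high_cpu_nodes)
```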

DB Tier Memory Usage Metrics

Table 2-4 DB Tier Memory Usage Metrics

| Metric Name | Parameters | Values | Description |
|---|---|---|---|
| db_tier_memory_used_bytes | Node ID, Memory Type | Node ID: node_id; Memory Type: memory_type | Memory used for the node. |
| db_tier_memory_total_bytes | Node ID, Memory Type | Node ID: node_id; Memory Type: memory_type | Total memory for the node. |
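The two memory metrics are typically combined into a utilization percentage per node and memory type, which is the quantity the LOW_MEMORY and OUT_OF_MEMORY alerts in Table 2-7 compare against 80 and 90. The following is a minimal sketch of that calculation. The memory_type value "Data memory" and the byte counts are illustrative assumptions, and the function name is hypothetical.

```python
def memory_utilization(used_bytes, total_bytes):
    """Return {(node_id, memory_type): percent used}.

    Both arguments map (node_id, memory_type) to a byte count. Series with
    no matching total, or a total of zero, are skipped.
    """
    result = {}
    for key, used in used_bytes.items():
        total = total_bytes.get(key)
        if not total:
            continue
        result[key] = used * 100 / total
    return result


used = {("1", "Data memory"): 800, ("2", "Data memory"): 950}
total = {("1", "Data memory"): 1000, ("2", "Data memory"): 1000}
print(memory_utilization(used, total))
```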

DB Tier Bin Log Usage Metrics

Table 2-5 DB Tier Bin Log Usage Metrics

| Metric Name | Parameters | Values | Description |
|---|---|---|---|
| db_tier_binlog_used_bytes_percentage | Node ID | Node ID: node_id | Percentage of total memory used by bin log in the SQL node. |

DB Tier Replication Metrics

Table 2-6 DB Tier Replication Metrics

| Metric Name | Parameters | Values | Description |
|---|---|---|---|
| db_tier_replication_status | Node ID, Source ID | Node ID: node_id; Source ID: source_uuid | Replication channel status of the local site: "0" means ON, "1" means OFF, and "2" means CONNECTING. |
| db_tier_replication_slave_delay | Channel ID, Master Node IP, Slave Node IP | Channel ID: channel_id; Master Node IP: master_node_ip; Slave Node IP: slave_node_ip | Number of seconds that the last record read by the slave is behind the latest record written by the master. |
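Because db_tier_replication_status is a numeric code, dashboards and scripts usually translate it into a readable state. The following sketch uses the value mapping given for this metric; the function name and the "UNKNOWN" fallback for unlisted values are assumptions added for illustration.

```python
# Mapping taken from the db_tier_replication_status description.
REPLICATION_STATES = {0: "ON", 1: "OFF", 2: "CONNECTING"}


def describe_replication_status(value):
    """Translate a db_tier_replication_status sample into a state name.

    Accepts numbers or numeric strings. Values outside the documented
    set map to "UNKNOWN" (an assumption, not documented behavior).
    """
    return REPLICATION_STATES.get(int(float(value)), "UNKNOWN")


for sample in ("0", "1", 2, "7"):
    print(sample, describe_replication_status(sample))
```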

DB Tier Alerts

Table 2-7 DB Tier Alerts

| Alert Name | Summary | Severity | Expression | For | SNMP Trap ID | Service Affecting? | Notes |
|---|---|---|---|---|---|---|---|
| NODE_DOWN | MySQL {{ $labels.node_type }} node having node id {{ $labels.node_id }} is down | major | db_tier_data_node_status == 0 | N/A | 2001 | Y | A db_tier_data_node_status value of "0" indicates that the node is DOWN; a value of "1" indicates that the node is UP. |
| HIGH_CPU | Node ID {{ $labels.node_id }} CPU utilization at {{ $value }} percent. | warning | (100 - (avg(avg_over_time(db_tier_cpu_os_idle[10m])) BY (node_id))) >= 85 | 1m | 2002 | N | Fired when the CPU usage of any node is >= 85%. |
| LOW_MEMORY | Node ID {{ $labels.node_id }} memory utilization at {{ $value }} percent. | major | (avg_over_time(db_tier_memory_used_bytes[1m]) BY (node_id, memory_type) / avg_over_time(db_tier_memory_total_bytes[1m]) BY (node_id, memory_type)) * 100 >= 80 | 1m | 2003 | N | Fired when the RAM usage of any node is >= 80%. |
| OUT_OF_MEMORY | Node ID {{ $labels.node_id }} out of memory. | critical | (avg_over_time(db_tier_memory_used_bytes[1m]) BY (node_id, memory_type) / avg_over_time(db_tier_memory_total_bytes[1m]) BY (node_id, memory_type)) * 100 >= 90 | N/A | 2004 | Y | Fired when the RAM usage of any node is >= 90%. |
| BINLOG_STORAGE_LOW | Disk storage on SQL node with node ID {{ $labels.node_id }} at {{ $value }} percent | minor | (avg_over_time(db_tier_binlog_used_bytes_percentage[5m]) >= 70) and (avg_over_time(db_tier_binlog_used_bytes_percentage[5m]) < 80) | 5m | 2007 | N | Fired with minor severity when the total BinLog size of the SQL node is >= 70% and < 80% of the total SQL node disk size. |
| BINLOG_STORAGE_LOW | Disk storage on SQL node with node ID {{ $labels.node_id }} at {{ $value }} percent | major | (avg_over_time(db_tier_binlog_used_bytes_percentage[5m]) >= 80) and (avg_over_time(db_tier_binlog_used_bytes_percentage[5m]) < 95) | 5m | 2007 | N | Fired with major severity when the total BinLog size of the SQL node is >= 80% and < 95% of the total SQL node disk size. |
| BINLOG_STORAGE_FULL | Disk storage on SQL node with node ID {{ $labels.node_id }} is full | critical | avg_over_time(db_tier_binlog_used_bytes_percentage[5m]) >= 95 | | 2008 | Y | Fired with critical severity when the total BinLog size of the SQL node is >= 95% of the total SQL node disk size. |
| SLAVE_REPLICATION_DELAY_HIGH | Slave replication on SQL node at {{ $labels.slave_node_ip }} is {{ $value }} seconds behind the master | major | avg(avg_over_time(db_tier_replication_slave_delay[5m])) by (master_node_ip, slave_node_ip) >= 5*60 and avg(avg_over_time(db_tier_replication_slave_delay[5m])) by (master_node_ip, slave_node_ip) < 48*3600 | | 2009 | N | The last record read by the slave is more than 5 minutes behind the latest record written by the master. |
| SLAVE_REPLICATION_FAILED | Slave replication has fallen more than 48 hours behind the master. Manual restore from backup may be required. | critical | avg(avg_over_time(db_tier_replication_slave_delay[5m])) by (master_node_ip, slave_node_ip) >= 48*3600 | | 2010 | Y | The last record read by the slave is more than 48 hours behind the latest record written by the master. |
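The bin log and replication delay alerts in Table 2-7 form threshold ladders, which the following sketch restates as plain comparisons. It is an illustration only: it works on a single, already averaged value and ignores the "For" duration that Prometheus applies before firing an alert. The function names are hypothetical.

```python
def binlog_alert(percent_used):
    """Return (alert name, severity) for a bin log usage percentage, or None."""
    if percent_used >= 95:
        return ("BINLOG_STORAGE_FULL", "critical")
    if percent_used >= 80:
        return ("BINLOG_STORAGE_LOW", "major")
    if percent_used >= 70:
        return ("BINLOG_STORAGE_LOW", "minor")
    return None


def replication_delay_alert(delay_seconds):
    """Return (alert name, severity) for a replication delay in seconds, or None."""
    if delay_seconds >= 48 * 3600:
        return ("SLAVE_REPLICATION_FAILED", "critical")
    if delay_seconds >= 5 * 60:
        return ("SLAVE_REPLICATION_DELAY_HIGH", "major")
    return None


print(binlog_alert(83.5))
print(replication_delay_alert(600))
```

Checking the highest threshold first keeps the ranges mutually exclusive, which mirrors the paired lower and upper bounds in the alert expressions.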