2 DB Tier Metrics and Alerts
DB Tier Metrics
The following sections list the DB Tier metrics.
DB Tier Node Status Metrics
Table 2-1 DB Tier Node Status Metrics
Metric Name | Parameters | Values | Description |
---|---|---|---|
db_tier_node_status | Node ID, Node Type, Node version | Node ID - Node Type - Node version: | Status of the DB Tier node. The value of this metric is "0" if the node is DOWN and "1" if the node is UP. |
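
The 0/1 encoding described above can be checked with a few lines of Python. This is a minimal sketch, not part of DB Tier: the `(labels, value)` sample shape is an assumption, the label keys `node_id` and `node_type` are assumed from the `{{ $labels.* }}` templates in Table 2-7, and the example data is invented.

```python
from typing import Dict, List, Tuple

# Hypothetical sample shape: (labels, value). The label keys "node_id" and
# "node_type" are assumed from the alert templates in Table 2-7.
Sample = Tuple[Dict[str, str], float]


def down_nodes(samples: List[Sample]) -> List[str]:
    """Return the node IDs whose db_tier_node_status value is 0 (DOWN)."""
    return [labels["node_id"] for labels, value in samples if value == 0]


# Invented example data, not real output from a DB Tier deployment.
example = [
    ({"node_id": "1", "node_type": "ndb"}, 1.0),  # UP
    ({"node_id": "2", "node_type": "ndb"}, 0.0),  # DOWN
]
print(down_nodes(example))  # ['2']
```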
DB Tier Table Read Write Metrics
Table 2-2 DB Tier Table Read Write Metrics
Metric Name | Parameters | Values | Description |
---|---|---|---|
db_tier_local_operations | Node ID, Block Name, Block Instance | Node ID: node_id Block Name: block_instance | Total number of local operations in DB Tier for the node. |
db_tier_transactions | Node ID, Block Name, Block Instance | Node ID: node_id Block Name: block_instance | Total number of transactions in DB Tier for the node. |
db_tier_commits | Node ID, Block Name, Block Instance | Node ID: node_id Block Name: block_instance | Total number of commits in DB Tier for the node. |
db_tier_reads | Node ID, Block Name, Block Instance | Node ID: node_id Block Name: block_instance | Total number of reads in DB Tier for the node. |
db_tier_local_reads | Node ID, Block Name, Block Instance | Node ID: node_id Block Name: block_instance | Total number of local reads in DB Tier for the node. |
db_tier_writes | Node ID, Block Name, Block Instance | Node ID: node_id Block Name: block_instance | Total number of writes in DB Tier cluster for the node. |
db_tier_local_writes | Node ID, Block Name, Block Instance | Node ID: node_id Block Name: block_instance | Total number of local writes in DB Tier for the node. |
db_tier_aborts | Node ID, Block Name, Block Instance | Node ID: node_id Block Name: block_instance | Total number of aborted transactions in DB Tier for the node. |
db_tier_table_scans | Node ID, Block Name, Block Instance | Node ID: node_id Block Name: block_instance | Total number of table scans in DB Tier for the node. |
db_tier_range_scans | Node ID, Block Name, Block Instance | Node ID: node_id Block Name: block_instance | Total number of range scans in DB Tier for the node. |
db_tier_transporter_overload | Node ID, Block Name, Block Instance | Node ID: node_id Block Name: block_instance | Transporter overload in DB Tier for the node. |
db_tier_scan_slowdown | Node ID, Block Name, Block Instance | Node ID: node_id Block Name: block_instance | Scan slowdown in DB Tier for the node. |
DB Tier CPU Usage Metrics
Table 2-3 DB Tier CPU Usage Metrics
Metric Name | Parameters | Values | Description |
---|---|---|---|
db_tier_cpu_os_user | Node ID, Thread | Node ID: node_id Thread: | DB Tier User CPU usage for the node. |
db_tier_cpu_os_system | Node ID, Thread | Node ID: node_id Thread: | DB Tier System CPU usage for the node. |
db_tier_cpu_os_idle | Node ID, Thread | Node ID: node_id Thread: | Idle CPU statistics for the node. |
DB Tier Memory Usage Metrics
Table 2-4 DB Tier Memory Usage Metrics
Metric Name | Parameters | Values | Description |
---|---|---|---|
db_tier_memory_used_bytes | Node ID, Memory Type | Node ID: node_id Memory Type: | Memory used for the node. |
db_tier_memory_total_bytes | Node ID, Memory Type | Node ID: node_id Memory Type: | Total memory for the node. |
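
The two memory metrics combine into a utilization percentage, which is what the LOW_MEMORY and OUT_OF_MEMORY expressions in Table 2-7 compare against 80 and 90. The sketch below illustrates that arithmetic only. The function names are hypothetical, it ignores the `avg_over_time` window and the "For" duration, and returning only the most severe alert is a simplification, since the LOW_MEMORY expression has no upper bound.

```python
from typing import Optional


def memory_utilization_percent(used_bytes: float, total_bytes: float) -> float:
    """Used memory as a percentage of total memory for one node and memory type."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def memory_alert(percent: float) -> Optional[str]:
    """Map a utilization percentage to the most severe matching alert name.

    Thresholds are taken from Table 2-7: >= 90 for OUT_OF_MEMORY,
    >= 80 for LOW_MEMORY.
    """
    if percent >= 90:
        return "OUT_OF_MEMORY"
    if percent >= 80:
        return "LOW_MEMORY"
    return None


# Invented values for illustration.
pct = memory_utilization_percent(used_bytes=850, total_bytes=1000)
print(pct, memory_alert(pct))  # 85.0 LOW_MEMORY
```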
DB Tier Bin Log Usage Metrics
Table 2-5 DB Tier Bin Log Usage Metrics
Metric Name | Parameters | Values | Description |
---|---|---|---|
db_tier_binlog_used_bytes_percentage | Node ID | Node ID: node_id | Percentage of total memory used by bin log in the SQL node. |
DB Tier Replication Metrics
Table 2-6 DB Tier Replication Metrics
Metric Name | Parameters | Values | Description |
---|---|---|---|
db_tier_replication_status | Node ID, Source ID | Node ID: node_id Source ID: | This metric's value is: |
db_tier_replication_slave_delay | Channel ID, Master Node IP, Slave Node IP | Channel ID: Master Node IP: Slave Node IP: | Number of seconds that the last record read by the slave is behind the latest record written by the master. |
DB Tier Alerts
Table 2-7 DB Tier Alerts
Alert Name | Summary | Severity | Expression | For | SNMP Trap ID | Service Affecting? | Notes |
---|---|---|---|---|---|---|---|
NODE_DOWN | MySQL {{ $labels.node_type }} node having node id {{ $labels.node_id }} is down | major | db_tier_data_node_status == 0 | N/A | 2001 | Y | db_tier_data_node_status value "0" indicates that a node is DOWN and value "1" indicates that the node is UP. |
HIGH_CPU | Node ID {{ $labels.node_id }} CPU utilization at {{ $value }} percent. | warning | (100 - (avg(avg_over_time(db_tier_cpu_os_idle[10m])) BY (node_id))) >= 85 | 1m | 2002 | N | The HIGH_CPU alert is fired when the CPU usage of any node is >= 85%. |
LOW_MEMORY | Node ID {{ $labels.node_id }} memory utilization at {{ $value }} percent. | major | (avg_over_time(db_tier_memory_used_bytes[1m])BY (node_id,memory_type) / avg_over_time(db_tier_memory_total_bytes[1m])BY (node_id, memory_type)) * 100 >= 80 | 1m | 2003 | N | The LOW_MEMORY alert is fired when the RAM usage of any node is >= 80%. |
OUT_OF_MEMORY | Node ID {{ $labels.node_id }} out of memory. | critical | (avg_over_time(db_tier_memory_used_bytes[1m])BY (node_id,memory_type) / avg_over_time(db_tier_memory_total_bytes[1m])BY (node_id, memory_type)) * 100>= 90 | N/A | 2004 | Y | OUT_OF_MEMORY alert is fired when RAM usage of any node >= 90% |
BINLOG_STORAGE_LOW | Disk storage on SQL node with node ID {{ $labels.node_id }} at {{ $value }} percent | minor | (avg_over_time(db_tier_binlog_used_bytes_percentage[5m]) >= 70) and (avg_over_time(db_tier_binlog_used_bytes_percentage[5m]) < 80) | 5m | 2007 | N | BINLOG_STORAGE_LOW alert is fired with Minor Severity when the total BinLog size of the SQL node is >=70% and <80% of Total SQL node Disk size. |
BINLOG_STORAGE_LOW | Disk storage on SQL node with node ID {{ $labels.node_id }} at {{ $value }} percent | major | (avg_over_time(db_tier_binlog_used_bytes_percentage[5m]) >= 80) and (avg_over_time(db_tier_binlog_used_bytes_percentage[5m])< 95) | 5m | 2007 | N | BINLOG_STORAGE_LOW alert is fired with Major Severity when the total BinLog size of the SQL node is >=80% and <95% of Total SQL node Disk size. |
BINLOG_STORAGE_FULL | Disk storage on SQL node with node ID {{ $labels.node_id }} is full | critical | avg_over_time(db_tier_binlog_used_bytes_percentage[5m]) >= 95 | | 2008 | Y | BINLOG_STORAGE_FULL alert is fired with Critical Severity when the total BinLog size of the SQL node is >=95% of Total SQL node Disk size. |
SLAVE_REPLICATION_DELAY_HIGH | Slave replication on SQL node at {{ $labels.slave_ip }} is {{ $value }} seconds behind the master | major | avg(avg_over_time(db_tier_replication_slave_delay[5m])) by (master_node_ip,slave_node_ip) >= 5*60 and avg(avg_over_time(db_tier_replication_slave_delay[5m])) by (master_node_ip,slave_node_ip) < 48*3600 | | 2009 | N | The last record read by the slave is more than 5 minutes behind the latest record written by the master. |
SLAVE_REPLICATION_FAILED | Slave replication has fallen more than 48 hours behind the master. Manual restore from backup may be required. | critical | avg(avg_over_time(db_tier_replication_slave_delay[5m])) by (master_node_ip,slave_node_ip) >= 48*3600 | | 2010 | Y | The last record read by the slave is more than 48 hours behind the latest record written by the master. |
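
The bin log and replication alerts in Table 2-7 partition their metric into non-overlapping threshold bands. The following sketch restates those bands in Python as a reading aid. The function names and return shape are hypothetical, and the sketch takes an already averaged value as input, so it does not model the `avg_over_time` window or the "For" duration.

```python
from typing import Optional, Tuple

Alert = Tuple[str, str]  # (alert name, severity)


def binlog_alert(used_percent: float) -> Optional[Alert]:
    """Band a db_tier_binlog_used_bytes_percentage value per Table 2-7."""
    if used_percent >= 95:
        return ("BINLOG_STORAGE_FULL", "critical")
    if used_percent >= 80:
        return ("BINLOG_STORAGE_LOW", "major")
    if used_percent >= 70:
        return ("BINLOG_STORAGE_LOW", "minor")
    return None


def replication_alert(delay_seconds: float) -> Optional[Alert]:
    """Band a db_tier_replication_slave_delay value per Table 2-7."""
    if delay_seconds >= 48 * 3600:  # 48 hours or more behind
        return ("SLAVE_REPLICATION_FAILED", "critical")
    if delay_seconds >= 5 * 60:  # 5 minutes or more behind
        return ("SLAVE_REPLICATION_DELAY_HIGH", "major")
    return None


# Invented values for illustration.
print(binlog_alert(82.5))      # ('BINLOG_STORAGE_LOW', 'major')
print(replication_alert(600))  # ('SLAVE_REPLICATION_DELAY_HIGH', 'major')
```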