SNMP Poller
Overview
The SNMP Poller microservice can perform SNMP device discovery for requests created through the discovery-service's discovery RESTful API, or execute periodic SNMP polling once the devices have been provisioned in Unified Assurance through the discovery-service's inventory RESTful API.
The microservice is designed using the Controller-Worker architecture and is composed of two components: a single coordinator and multiple worker instances. The coordinator is responsible for managing workers, publishing metrics, and calculating and coordinating SNMP workloads between its workers. The workers are responsible only for executing those workloads and publishing their results to the appropriate microservice pipelines through the Apache Pulsar bus.
The microservice is expected to run in a separate microservices cluster for each Device Zone alongside other mandatory microservices. See Part 3 of the Prerequisites section below for more details.
Prerequisites
- A microservices cluster must be set up. Refer to Microservice Cluster Setup.
- Apache Pulsar must be installed. Refer to Apache Pulsar microservice.
- The following core microservices must be installed as per the requirement
Setup
su - assure1
export NAMESPACE=a1-zone1-pri
export WEBFQDN=<Primary Presentation Web FQDN>
a1helm install snmp-poller assure1/snmp-poller -n $NAMESPACE --set global.imageRegistry=$WEBFQDN
Default Global Configuration
Name | Value | Possible Values | Notes |
---|---|---|---|
LOG_LEVEL | INFO | FATAL, ERROR, WARN, INFO, DEBUG | Global logging level for both the coordinator and the workers. |
PORT_COORDINATOR | 38890 | Integer | Internal port used by the coordinator service. |
PORT_WORKER | 38891 | Integer | Internal port used by the worker service. |
The above configurations can be changed by passing the values to a1helm install, prefixed with configData.
Example of setting the logging level to DEBUG for both the coordinator and the workers
a1helm install ... --set configData.LOG_LEVEL=DEBUG
Default Coordinator-Only Configuration
Name | Value | Possible Values | Notes |
---|---|---|---|
LOG_LEVEL | INFO | FATAL, ERROR, WARN, INFO, DEBUG | Coordinator logging level. This overrides the global configuration. |
GRPC_FALLBACK_USE_IP | true | Text (true/false) | Whether the coordinator communicates with workers using IP addresses instead of hostnames. |
WORKER_CONCURRENCY | 2000 | Integer (0 < value) | How many concurrent SNMP workloads a single worker instance can perform. |
DISCOVERY_WORKER_PERCENTAGE | 25 | Integer (0 <= value <= 100) | The percentage of workers allocated exclusively to discovery workloads. |
POLLER_RESYNC_PERIOD | 15m | Integer + Text ("ns", "us" (or "µs"), "ms", "s", "m", "h") | How frequently the coordinator re-synchronizes with the Unified Assurance database. |
PULSAR_SNMP_DISCOVERY_TOPIC_OVERRIDE | "" | Text | Override for the topic from which the coordinator listens for discovery workload requests. |
REDUNDANCY_INIT_DELAY | 20s | Integer + Text ("ns", "us" (or "µs"), "ms", "s", "m", "h") | How long the secondary microservice waits during startup for the primary to report an up status before becoming active. |
REDUNDANCY_POLL_PERIOD | 5s | Integer + Text ("ns", "us" (or "µs"), "ms", "s", "m", "h") | How frequently the secondary microservice checks whether the primary microservice has failed. |
REDUNDANCY_FAILOVER_THRESHOLD | 4 | Integer (0 < value) | The number of failed checks after which the secondary microservice becomes active. |
REDUNDANCY_FALLBACK_THRESHOLD | 1 | Integer (0 < value) | The number of successful checks after which the secondary microservice goes back to sleep. |
These configurations can be changed by passing the values to a1helm install, prefixed with coordinator.configData.
Example of setting the logging level to DEBUG only for the coordinator
a1helm install ... --set coordinator.configData.LOG_LEVEL=DEBUG
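The duration-valued settings in the table above (for example POLLER_RESYNC_PERIOD=15m) combine an integer with a unit suffix. The following is a minimal sketch, not part of the product, of how such a value could be sanity-checked and converted to seconds before it is passed to a1helm install. The function name duration_to_seconds is hypothetical, and the sketch assumes a single integer followed by a single unit; sub-second units are rejected for simplicity.

```shell
# Hypothetical helper: convert "<integer><unit>" (units: s, m, h) to seconds.
# The body runs in a subshell, so its variables do not leak to the caller.
duration_to_seconds() (
  value=$1
  case $value in
    # Sub-second units are valid for the microservice but not handled here.
    *ns|*us|*µs|*ms) echo "sub-second unit not handled: $value" >&2; exit 1 ;;
    *h) factor=3600; number=${value%h} ;;
    *m) factor=60;   number=${value%m} ;;
    *s) factor=1;    number=${value%s} ;;
    *)  echo "missing or unknown unit: $value" >&2; exit 1 ;;
  esac
  case $number in
    # Reject empty strings and anything that is not purely digits.
    ''|*[!0-9]*) echo "not an integer: $number" >&2; exit 1 ;;
  esac
  echo $(( number * factor ))
)

duration_to_seconds 15m   # default POLLER_RESYNC_PERIOD, in seconds
duration_to_seconds 20s   # default REDUNDANCY_INIT_DELAY, in seconds
```

A check like this catches a missing unit (for example 15 instead of 15m) before the value reaches the cluster.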
Default Worker-Only Configuration
Name | Value | Possible Values | Notes |
---|---|---|---|
LOG_LEVEL | INFO | FATAL, ERROR, WARN, INFO, DEBUG | Worker logging level. This overrides the global configuration. |
GRPC_GRACEFUL_CONN_TIME | 60s | Integer + Text ("ns", "us" (or "µs"), "ms", "s", "m", "h") | The maximum time the workers keep trying to connect to the coordinator before failing. |
STREAM_OUTPUT_METRIC | "" | Text | Override for the topic where performance polling workload results are published. |
STREAM_OUTPUT_AVAILABILITY | "" | Text | Override for the topic where availability polling workload results are published. |
PULSAR_DISCOVERY_CALLBACK_OVERRIDE | "" | Text | Override for the topic where discovery workload results are published. |
These configurations can be changed by passing the values to a1helm install, prefixed with worker.configData.
Example of setting the logging level to DEBUG only for workers
a1helm install ... --set worker.configData.LOG_LEVEL=DEBUG
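The three configuration scopes differ only in the prefix given to --set. As an illustrative sketch (the helper name set_flag is hypothetical and not part of a1helm), the mapping from scope to prefix can be written as:

```shell
# Hypothetical helper: print the --set argument for a given scope, key and value.
# Scopes: global -> configData, coordinator -> coordinator.configData,
#         worker -> worker.configData
set_flag() (
  scope=$1 key=$2 value=$3
  case $scope in
    global)             prefix=configData ;;
    coordinator|worker) prefix=$scope.configData ;;
    *) echo "unknown scope: $scope" >&2; exit 1 ;;
  esac
  echo "--set $prefix.$key=$value"
)

set_flag global LOG_LEVEL WARN
set_flag coordinator WORKER_CONCURRENCY 3000
set_flag worker LOG_LEVEL DEBUG
```

Because the component-level LOG_LEVEL overrides the global one, the first and third flags together would leave the coordinator at WARN and the workers at DEBUG.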
Autoscaling
The SNMP Poller Microservice uses the following formulae to determine the number of workers required to perform SNMP workloads:
polling workers = round up(unique devices being polled / worker concurrency)
discovery workers = round up(polling workers * discovery worker percentage / 100)
total workers required = polling workers + discovery workers
During re-synchronisation with the Unified Assurance database, the coordinator determines the number of unique polled devices and performs the above calculations.
The result is then exposed as the snmp_coordinator_metric_workers_required Prometheus metric, which KEDA ingests to make the scaling decision.
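The formulae above can be reproduced with integer arithmetic, which is useful when estimating the number of workers for a Device Zone ahead of installation. This is a sketch only; the function name workers_required is hypothetical and the coordinator's actual implementation is not shown in this document.

```shell
# Hypothetical helper implementing the three formulae.
# Usage: workers_required <unique polled devices> <worker concurrency> <discovery worker percentage>
workers_required() (
  devices=$1 concurrency=$2 percentage=$3
  # "round up" division with integers: (a + b - 1) / b
  polling=$(( (devices + concurrency - 1) / concurrency ))
  discovery=$(( (polling * percentage + 99) / 100 ))
  echo "polling=$polling discovery=$discovery total=$(( polling + discovery ))"
)

workers_required 100000 2000 25   # prints: polling=50 discovery=13 total=63
workers_required 250000 3000 33   # prints: polling=84 discovery=28 total=112
```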
Info:
Autoscaling is disabled by default.
Warn:
While the provided autoscaling works almost out of the box, you must manually configure the upper-bound autoscaling limit during installation.
Using the expected number of devices to be polled in each Device Zone, decide the percentage of discovery workers (or leave it at the default), apply the formulae, and configure the upper-bound limit.
For common microservice scaling configuration options, please refer to the autoscaling docs.
Examples
- For 100,000 polled devices with a worker concurrency of 2000 and 25% discovery workers, total required workers = 63
  - 50 workers will be assigned to perform polling workloads
  - 13 workers will be assigned to perform discovery workloads
- For 250,000 polled devices with a worker concurrency of 3000 and 33% discovery workers, total required workers = 112
  - 84 workers will be assigned to perform polling workloads
  - 28 workers will be assigned to perform discovery workloads
Modifying scaling triggers
By default, only a single autoscaling trigger is defined. You can define additional triggers during installation alongside the common configuration options.
autoscaling:
  ...
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-kube-prometheus-prometheus.a1-monitoring.svc.cluster.local:9090
        query: snmp_coordinator_metric_workers_required
        threshold: '1'
      metricType: Value
Microservice self-metrics
The SNMP Poller Microservice exposes the following self-metrics to Prometheus.
Coordinator metrics table
Note:
Each of the metrics below is prefixed with snmp_coordinator. Example of a full metric name: snmp_coordinator_metric_worker_count
Metric Name | Type | Labels | Description |
---|---|---|---|
metric_worker_count | Gauge | N/A | The number of workers currently enrolled with the coordinator. |
metric_workforce_count | Gauge | N/A | The number of workers multiplied by worker concurrency. |
metric_discovery_worker_count | Gauge | N/A | The number of discovery workers currently enrolled with the coordinator. |
metric_polling_worker_count | Gauge | N/A | The number of polling workers currently enrolled with the coordinator. |
metric_workers_required | Gauge | N/A | The number of workers required for polling and discovery. |
metric_discovery_requests_queued | Gauge | N/A | The number of discovery requests currently queued (real time). |
metric_discovery_requests_processing | Gauge | N/A | The number of discovery requests currently being processed (real time). |
metric_polling_requests_queued | Gauge | N/A | The number of polling requests currently queued (real time). |
metric_polling_requests_processing | Gauge | N/A | The number of polling requests currently being processed (real time). |
metric_polled_devices_count | GaugeVec | domain, cycle | The number of polled devices per domain and cycle. |
metric_polled_objects_count | GaugeVec | domain, cycle | The number of polled objects per domain and per cycle. |
metric_polling_duration | GaugeVec | domain, cycle | The total polling duration in seconds for last cycle per domain and per cycle. |
metric_polling_average | GaugeVec | domain, cycle | The average polling duration in seconds for last cycle per domain and per cycle. |
metric_polling_percentile95 | GaugeVec | domain, cycle | The 95th percentile average polling duration in seconds for last cycle per domain and per cycle. |
metric_polling_utilisation | GaugeVec | domain, cycle | The polling utilisation in percent for last cycle per domain and per cycle. |
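To inspect one of these metrics by hand, the short name from the table can be expanded to its full name and queried through the Prometheus HTTP API (/api/v1/query). The sketch below only builds the URL; the helper name prometheus_query_url is hypothetical, and the server address is the one used in the scaling trigger example, which may differ in your cluster.

```shell
# Hypothetical helper: build an instant-query URL for a coordinator metric.
# Usage: prometheus_query_url <server address> <short metric name>
prometheus_query_url() (
  server=$1 metric=$2
  # Every coordinator metric carries the snmp_coordinator prefix.
  echo "$server/api/v1/query?query=snmp_coordinator_$metric"
)

# Assumption: Prometheus is reachable at this in-cluster address.
SERVER=http://prometheus-kube-prometheus-prometheus.a1-monitoring.svc.cluster.local:9090
prometheus_query_url "$SERVER" metric_workers_required

# From a pod inside the cluster, the result could then be fetched with, for example:
#   curl -s "$(prometheus_query_url "$SERVER" metric_workers_required)"
```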
Microservice redundancy
Redundancy in the SNMP Poller Microservice controls which of the two microservices in a redundant pair is considered active to run periodic device polling.
Info:
Redundancy is disabled by default.
Example of enabling redundancy
a1helm install ... --set redundancy.enabled=true
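The interaction between REDUNDANCY_FAILOVER_THRESHOLD and REDUNDANCY_FALLBACK_THRESHOLD can be illustrated with a small simulation. This is a sketch under stated assumptions, not the microservice's actual logic: it assumes the thresholds count consecutive check results, that one check happens per REDUNDANCY_POLL_PERIOD, and it ignores REDUNDANCY_INIT_DELAY. The function name simulate_secondary is hypothetical.

```shell
# Hypothetical simulation of the secondary microservice's state.
# Usage: simulate_secondary <failover threshold> <fallback threshold> <check results: ok|fail ...>
# Prints the final state: "sleeping" or "active".
simulate_secondary() (
  failover_threshold=$1 fallback_threshold=$2
  shift 2
  state=sleeping failures=0 successes=0
  for result in "$@"; do
    # Assumption: only consecutive results count, so each result resets the opposite counter.
    if [ "$result" = fail ]; then
      failures=$(( failures + 1 )); successes=0
    else
      successes=$(( successes + 1 )); failures=0
    fi
    if [ "$state" = sleeping ] && [ "$failures" -ge "$failover_threshold" ]; then
      state=active
    elif [ "$state" = active ] && [ "$successes" -ge "$fallback_threshold" ]; then
      state=sleeping
    fi
  done
  echo "$state"
)

# Default thresholds: failover after 4 failed checks, fall back after 1 successful check.
simulate_secondary 4 1 fail fail fail        # prints: sleeping
simulate_secondary 4 1 fail fail fail fail   # prints: active
simulate_secondary 4 1 fail fail fail fail ok  # prints: sleeping
```

Under these assumptions and the default REDUNDANCY_POLL_PERIOD of 5s, the secondary would become active roughly 20 seconds (4 checks) after the primary stops responding.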