SNMP Poller

Overview

The SNMP Poller microservice performs SNMP device discovery for discovery jobs created through the discovery-service's discovery RESTful API, and executes periodic SNMP polling once devices have been provisioned in Unified Assurance through the discovery-service's inventory RESTful API.

The microservice is designed around a Controller-Worker architecture composed of two components: a single coordinator and multiple worker instances. The coordinator manages the workers, publishes metrics, and calculates and coordinates SNMP workloads among its workers, while the workers execute those workloads and publish their results to the appropriate microservice pipelines through the Apache Pulsar bus.

The microservice is expected to run in a separate microservices cluster for each Device Zone, alongside other mandatory microservices. See item 3 of the Prerequisites section below for more details.

Prerequisites

  1. A microservices cluster must be setup. Refer to Microservice Cluster Setup.

  2. Apache Pulsar must be installed. Refer to Apache Pulsar microservice.

  3. The core microservices required for the Device Zone cluster must be installed.

Setup

# Run as the assure1 user
su - assure1
# Set the target namespace and the image registry (Primary Presentation Web FQDN)
export NAMESPACE=a1-zone1-pri
export WEBFQDN=<Primary Presentation Web FQDN>
# Install the SNMP Poller microservice
a1helm install snmp-poller assure1/snmp-poller -n $NAMESPACE --set global.imageRegistry=$WEBFQDN
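
To verify the rollout, you can list the pods in the zone namespace. This is a minimal sketch that assumes kubectl access to the cluster from the installation host; the exact pod names depend on the chart.

# Hypothetical check: list the pods in the namespace and confirm the snmp-poller coordinator and worker pods are running
kubectl get pods -n $NAMESPACE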

Default Global Configuration

Name Default Value Possible Values Notes
LOG_LEVEL INFO FATAL, ERROR, WARN, INFO, DEBUG Global logging level applied to both the coordinator and workers.
PORT_COORDINATOR 38890 Integer Internal port used by the coordinator service.
PORT_WORKER 38891 Integer Internal port used by the worker service.

The above configurations can be changed by passing values to a1helm install prefixed with configData.

Example of setting the logging level to DEBUG for both coordinator and worker

a1helm install ... --set configData.LOG_LEVEL=DEBUG

Default Coordinator-Only Configuration

Name Default Value Possible Values Notes
LOG_LEVEL INFO FATAL, ERROR, WARN, INFO, DEBUG Coordinator logging level. This overrides the global configuration.
GRPC_FALLBACK_USE_IP true Text (true/false) Whether the coordinator communicates with workers using IP addresses instead of hostnames.
WORKER_CONCURRENCY 2000 Integer (0 < value) The maximum number of concurrent SNMP workloads a single worker instance can perform.
DISCOVERY_WORKER_PERCENTAGE 25 Integer (0 <= value <= 100) The percentage of workers allocated exclusively to discovery workloads.
POLLER_RESYNC_PERIOD 15m Integer + Text ("ns", "us" (or "µs"), "ms", "s", "m", "h") How frequently the coordinator re-synchronizes with the Unified Assurance database.
PULSAR_SNMP_DISCOVERY_TOPIC_OVERRIDE "" Text Override for the topic from which the coordinator listens for discovery workload requests.
REDUNDANCY_INIT_DELAY 20s Integer + Text ("ns", "us" (or "µs"), "ms", "s", "m", "h") How long the secondary microservice waits during startup for the primary to report an up status before taking over as active.
REDUNDANCY_POLL_PERIOD 5s Integer + Text ("ns", "us" (or "µs"), "ms", "s", "m", "h") How frequently the secondary microservice polls for failure of the primary microservice.
REDUNDANCY_FAILOVER_THRESHOLD 4 Integer (0 < value) The number of failed checks after which the secondary microservice becomes active.
REDUNDANCY_FALLBACK_THRESHOLD 1 Integer (0 < value) The number of successful checks after which the secondary microservice returns to standby.

These configurations can be changed by passing values to a1helm install prefixed with coordinator.configData.

Example of setting the logging level to DEBUG only for the coordinator

a1helm install ... --set coordinator.configData.LOG_LEVEL=DEBUG
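
Other coordinator settings can be tuned in the same way. The values in this sketch are illustrative only, not recommendations:

a1helm install ... --set coordinator.configData.WORKER_CONCURRENCY=1000 --set coordinator.configData.DISCOVERY_WORKER_PERCENTAGE=50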

Default Worker-Only Configuration

Name Default Value Possible Values Notes
LOG_LEVEL INFO FATAL, ERROR, WARN, INFO, DEBUG Worker logging level. This overrides the global configuration.
GRPC_GRACEFUL_CONN_TIME 60s Integer + Text ("ns", "us" (or "µs"), "ms", "s", "m", "h") The maximum time a worker attempts to connect to the coordinator before failing.
STREAM_OUTPUT_METRIC "" Text Override for the topic where performance polling workload results are published.
STREAM_OUTPUT_AVAILABILITY "" Text Override for the topic where availability polling workload results are published.
PULSAR_DISCOVERY_CALLBACK_OVERRIDE "" Text Override for the topic where discovery workload results are published.

These configurations can be changed by passing values to a1helm install prefixed with worker.configData.

Example of setting the logging level to DEBUG only for workers

a1helm install ... --set worker.configData.LOG_LEVEL=DEBUG
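
Other worker settings can be overridden in the same way. The value in this sketch is illustrative only:

a1helm install ... --set worker.configData.GRPC_GRACEFUL_CONN_TIME=120s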

Autoscaling

The SNMP Poller microservice uses the formulae below to determine the number of workers required to perform SNMP workloads:

polling workers = round up(unique devices being polled / worker concurrency)
discovery workers = round up(polling workers * discovery worker percentage / 100)
total workers required = polling workers + discovery workers
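
For example, assuming 5000 unique polled devices with the default WORKER_CONCURRENCY of 2000 and DISCOVERY_WORKER_PERCENTAGE of 25:

polling workers = round up(5000 / 2000) = 3
discovery workers = round up(3 * 25 / 100) = 1
total workers required = 3 + 1 = 4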

During re-synchronisation with the Unified Assurance database, the coordinator determines the number of unique polled devices and performs the above calculations.

The result is then exposed as the snmp_coordinator_metric_workers_required Prometheus metric, which KEDA ingests to make the scaling decision.

Info:

Autoscaling is disabled by default.

Warn:

While the provided autoscaling works almost out of the box, you will need to manually configure the upper-bound autoscaling limit during installation.

Using the expected number of devices to be polled in each Device Zone, decide the percentage of discovery workers (or keep the default), apply the formulae, and configure the upper-bound limit.
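
Continuing the worked example above (4 workers required), the upper-bound limit might be set at install time along the following lines. The autoscaling.enabled and autoscaling.maxReplicas value names are assumptions here; confirm the exact keys in the autoscaling documentation referenced below:

a1helm install ... --set autoscaling.enabled=true --set autoscaling.maxReplicas=4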

For common microservice scaling configuration options, please refer to the autoscaling docs.

Examples

Modifying scaling triggers

By default, only a single autoscaling trigger is defined. You can define additional triggers during installation alongside the common configuration options.

autoscaling:
  ...
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-kube-prometheus-prometheus.a1-monitoring.svc.cluster.local:9090
        query: snmp_coordinator_metric_workers_required
        threshold: '1'
        metricType: Value

Microservice self-metrics

The SNMP Poller Microservice exposes the following self-metrics to Prometheus.

Coordinator metrics table

Note:

Each of the metrics below is prefixed with snmp_coordinator. For example, the full name of metric_worker_count is snmp_coordinator_metric_worker_count.

Metric Name Type Labels Description
metric_worker_count Gauge N/A The number of workers currently enrolled with the coordinator.
metric_workforce_count Gauge N/A The number of workers multiplied by worker concurrency.
metric_discovery_worker_count Gauge N/A The number of discovery workers currently enrolled with the coordinator.
metric_polling_worker_count Gauge N/A The number of polling workers currently enrolled with the coordinator.
metric_workers_required Gauge N/A The number of workers required for polling and discovery.
metric_discovery_requests_queued Gauge N/A The number of discovery requests currently queued (real-time value).
metric_discovery_requests_processing Gauge N/A The number of discovery requests currently being processed (real-time value).
metric_polling_requests_queued Gauge N/A The number of polling requests currently queued (real-time value).
metric_polling_requests_processing Gauge N/A The number of polling requests currently being processed (real-time value).
metric_polled_devices_count GaugeVec domain, cycle The number of polled devices per domain and cycle.
metric_polled_objects_count GaugeVec domain, cycle The number of polled objects per domain and cycle.
metric_polling_duration GaugeVec domain, cycle The total polling duration in seconds for the last cycle, per domain and cycle.
metric_polling_average GaugeVec domain, cycle The average polling duration in seconds for the last cycle, per domain and cycle.
metric_polling_percentile95 GaugeVec domain, cycle The 95th percentile average polling duration in seconds for the last cycle, per domain and cycle.
metric_polling_utilisation GaugeVec domain, cycle The polling utilisation in percent for the last cycle, per domain and cycle.
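
For an ad-hoc look at these metrics, you can query Prometheus directly. This is a minimal sketch assuming the in-cluster Prometheus address shown in the autoscaling trigger example above, and that it is reachable from where the command runs (for example, via a port-forward):

# Hypothetical spot check of a coordinator metric through the Prometheus HTTP API
curl -s 'http://prometheus-kube-prometheus-prometheus.a1-monitoring.svc.cluster.local:9090/api/v1/query?query=snmp_coordinator_metric_workers_required'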

Microservice redundancy

Redundancy in the SNMP Poller microservice controls which of the two microservices in a redundant pair is considered active and runs the periodic device polling.

Info:

Redundancy is disabled by default.

Example of enabling redundancy

a1helm install ... --set redundancy.enabled=true
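
The redundancy timing parameters from the coordinator configuration above can be tuned in the same command. The values in this sketch are illustrative only:

a1helm install ... --set redundancy.enabled=true --set coordinator.configData.REDUNDANCY_POLL_PERIOD=10s --set coordinator.configData.REDUNDANCY_FAILOVER_THRESHOLD=6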