Monitor an Oracle Cloud Infrastructure Database with PostgreSQL using Prometheus and Grafana

Introduction

PostgreSQL is one of the most popular databases among application developers. Beyond eliminating management overhead, Oracle Cloud Infrastructure Database with PostgreSQL (OCI Database with PostgreSQL) also makes monitoring straightforward. Prometheus and Grafana are a popular pairing for monitoring application performance, handling time-series metric storage and visualization respectively.

In this tutorial, we will demonstrate how to monitor an OCI Database with PostgreSQL by extracting query-level metrics using the PostgreSQL Exporter, scraping them with Prometheus, visualizing them in Grafana, and setting up alerts for bloat metrics.

Architecture Overview

The following diagram depicts the overall approach and the components used:

Architecture view

Note: While the architecture displays three subnets for clarity, this tutorial uses only two, and the jump host is also the VM used to deploy Prometheus and Grafana.

Objectives

Prerequisites

Task 1: Enable Monitoring Extensions on OCI Database with PostgreSQL

  1. Enable the pg_stat_statements Extension.

    Add pg_stat_statements to the OCI Database with PostgreSQL configuration through the OCI Console. For more information, see Enabling Extensions for a Database.

  2. Create Monitoring User.

    Connect to the database through the jump host (for example, using DBeaver) and run the following queries to create the monitoring user.

    CREATE USER postgres_exporter WITH PASSWORD '<password>';
    GRANT CONNECT ON DATABASE postgres TO postgres_exporter;
    GRANT pg_monitor TO postgres_exporter;
    

    Note: The pg_stat_io view is available from OCI Database with PostgreSQL v16 onwards.
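    Optionally, you can confirm that the grants took effect by connecting as the new user and running a quick sanity check; a sketch (the second query should return true):

```sql
-- Run while connected as postgres_exporter.
SELECT count(*) FROM pg_stat_statements;                          -- readable thanks to pg_monitor
SELECT pg_has_role('postgres_exporter', 'pg_monitor', 'MEMBER');  -- expected: true
```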

Task 2: Install the PostgreSQL Exporter for Prometheus

Note: Run the following commands on your OCI Compute instance.

  1. Install PostgreSQL Exporter.

    wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.15.0/postgres_exporter-0.15.0.linux-amd64.tar.gz
    
    tar -xvf postgres_exporter-0.15.0.linux-amd64.tar.gz
    
    sudo mv postgres_exporter-0.15.0.linux-amd64/postgres_exporter /usr/local/bin/
    
  2. Create Custom queries.yaml.

    nano queries.yaml
    
    # Custom PostgreSQL Query Metrics for Postgres Exporter
    
    # 1. Top 5 slowest SQL statements
    top_sql_statements:
      query: >
        SELECT query, calls, total_exec_time + total_plan_time AS total_time
        FROM pg_stat_statements
        ORDER BY total_time DESC
        LIMIT 5;
      metrics:
        - query:
            usage: "LABEL"
            description: "SQL query text"
        - calls:
            usage: "COUNTER"
            description: "Number of executions"
        - total_time:
            usage: "GAUGE"
            description: "Total execution + plan time"

    # 2. Top users by total execution time
    top_sql_users:
      query: >
        SELECT usename, SUM(total_exec_time + total_plan_time) AS total_exec_time
        FROM pg_stat_statements
        JOIN pg_user ON pg_stat_statements.userid = pg_user.usesysid
        GROUP BY usename
        ORDER BY total_exec_time DESC
        LIMIT 5;
      metrics:
        - usename:
            usage: "LABEL"
            description: "PostgreSQL user"
        - total_exec_time:
            usage: "GAUGE"
            description: "Total execution time by user"

    # 3. Top client IPs by connection count
    top_clients:
      query: >
        SELECT client_addr, COUNT(*) AS connections
        FROM pg_stat_activity
        WHERE client_addr IS NOT NULL
        GROUP BY client_addr
        ORDER BY connections DESC
        LIMIT 5;
      metrics:
        - client_addr:
            usage: "LABEL"
            description: "Client IP address"
        - connections:
            usage: "GAUGE"
            description: "Number of active connections"

    # 4. Currently running queries by user and IP
    active_queries_by_user_ip:
      query: >
        SELECT usename, client_addr, COUNT(*) AS active_queries
        FROM pg_stat_activity
        WHERE state = 'active'
        GROUP BY usename, client_addr;
      metrics:
        - usename:
            usage: "LABEL"
            description: "User running queries"
        - client_addr:
            usage: "LABEL"
            description: "Client IP address"
        - active_queries:
            usage: "GAUGE"
            description: "Number of currently active queries"

    # 5. Query frequency (most called)
    frequent_sql:
      query: >
        SELECT query, calls
        FROM pg_stat_statements
        ORDER BY calls DESC
        LIMIT 5;
      metrics:
        - query:
            usage: "LABEL"
            description: "SQL query text"
        - calls:
            usage: "COUNTER"
            description: "Number of executions"

    # 6. Table bloat statistics
    table_bloat:
      query: >
        SELECT
          current_database() AS current_db,
          schemaname AS schema_name,
          tblname AS table_name,
          (bs * tblpages) / 1024 AS actual_table_size_kb,
          ((tblpages - est_tblpages) * bs) / 1024 AS wasted_space_kb,
          CASE
            WHEN tblpages > 0 AND tblpages - est_tblpages > 0 THEN 100 * (tblpages - est_tblpages) / tblpages::float
            ELSE 0
          END AS wasted_space_percent,
          fillfactor AS table_fill_factor,
          CASE
            WHEN tblpages - est_tblpages_ff > 0 THEN (tblpages - est_tblpages_ff) * bs / 1024
            ELSE 0
          END AS bloat_size_kb,
          CASE
            WHEN tblpages > 0 AND tblpages - est_tblpages_ff > 0 THEN 100 * (tblpages - est_tblpages_ff) / tblpages::float
            ELSE 0
          END AS bloat_size_percent
        FROM (
          SELECT
            ceil(reltuples / ((bs - page_hdr) / tpl_size)) + ceil(toasttuples / 4) AS est_tblpages,
            ceil(reltuples / ((bs - page_hdr) * fillfactor / (tpl_size * 100))) + ceil(toasttuples / 4) AS est_tblpages_ff,
            tblpages,
            fillfactor,
            bs,
            schemaname,
            tblname
          FROM (
            SELECT
              (4 + tpl_hdr_size + tpl_data_size + (2 * ma)
              - CASE WHEN tpl_hdr_size % ma = 0 THEN ma ELSE tpl_hdr_size % ma END
              - CASE WHEN ceil(tpl_data_size)::int % ma = 0 THEN ma ELSE ceil(tpl_data_size)::int % ma END
              ) AS tpl_size,
              (heappages + toastpages) AS tblpages,
              reltuples,
              toasttuples,
              bs,
              page_hdr,
              schemaname,
              tblname,
              fillfactor,
              ma,
              tpl_hdr_size,
              tpl_data_size
            FROM (
              SELECT
                ns.nspname AS schemaname,
                tbl.relname AS tblname,
                tbl.reltuples,
                tbl.relpages AS heappages,
                COALESCE(toast.relpages, 0) AS toastpages,
                COALESCE(toast.reltuples, 0) AS toasttuples,
                COALESCE(substring(array_to_string(tbl.reloptions, ' ') FROM 'fillfactor=([0-9]+)')::smallint, 100) AS fillfactor,
                current_setting('block_size')::numeric AS bs,
                CASE
                  WHEN version() ~ 'mingw32|64-bit|x86_64|ppc64|ia64|amd64' THEN 8
                  ELSE 4
                END AS ma,
                24 AS page_hdr,
                23 +
                CASE WHEN MAX(COALESCE(s.null_frac, 0)) > 0 THEN (7 + count(s.attname)) / 8 ELSE 0 END +
                CASE WHEN bool_or(att.attname = 'oid' AND att.attnum < 0) THEN 4 ELSE 0 END AS tpl_hdr_size,
                SUM((1 - COALESCE(s.null_frac, 0)) * COALESCE(s.avg_width, 0)) AS tpl_data_size
              FROM pg_attribute AS att
              JOIN pg_class AS tbl ON att.attrelid = tbl.oid
              JOIN pg_namespace AS ns ON ns.oid = tbl.relnamespace
              LEFT JOIN pg_stats AS s ON s.schemaname = ns.nspname AND s.tablename = tbl.relname AND s.attname = att.attname
              LEFT JOIN pg_class AS toast ON tbl.reltoastrelid = toast.oid
              WHERE NOT att.attisdropped
                AND tbl.relkind IN ('r', 'm')
              GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9
            ) AS inner_subquery
          ) AS mid_subquery
        ) AS bloat_subquery
        WHERE schemaname = 'public'
        ORDER BY schema_name, table_name;
      metrics:
        - schema_name:
            usage: "LABEL"
            description: "Schema name"
        - table_name:
            usage: "LABEL"
            description: "Table name"
        - bloat_size_kb:
            usage: "GAUGE"
            description: "Bloat size in KB"
        - bloat_size_percent:
            usage: "GAUGE"
            description: "Bloat size as a percentage"
    
  3. Set the Data Source Environment Variable.

    export DATA_SOURCE_NAME="postgresql://postgres_exporter:<password>@<POSTGRES_HOST>:5432/postgres?sslmode=require"
    

    Note: <POSTGRES_HOST> is the primary endpoint FQDN. You can find it in the PostgreSQL cluster details page in the OCI Console.
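    If the password contains special characters (@, /, :, and so on), they must be percent-encoded inside the connection URI or the exporter will fail to parse it. A small helper sketch, assuming python3 is available on the instance ('p@ss/word' is a made-up example value, not a real credential):

```shell
# Percent-encode a password for use inside DATA_SOURCE_NAME.
# 'p@ss/word' is a hypothetical example value.
encoded=$(python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1], safe=""))' 'p@ss/word')
echo "$encoded"   # p%40ss%2Fword
```

    Use the encoded value in place of <password> in DATA_SOURCE_NAME.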

  4. Start the PostgreSQL Exporter.

    Run the PostgreSQL Exporter with the custom queries.yaml file using the following command. This starts the exporter on port 9187 and logs output to exporter.log.

    nohup /usr/local/bin/postgres_exporter --extend.query-path=/home/opc/queries.yaml --web.listen-address=":9187" > exporter.log 2>&1 &
    

    Note: To keep the exporter running after a reboot, consider setting it up as a systemd service.
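    A minimal systemd unit sketch for this (the paths, the opc user, and the connection string mirror the steps above; adjust them to your setup, and the same pattern works for Prometheus in the next task). Save it as /etc/systemd/system/postgres_exporter.service, then run sudo systemctl daemon-reload && sudo systemctl enable --now postgres_exporter.

```ini
[Unit]
Description=Prometheus PostgreSQL Exporter
After=network-online.target

[Service]
User=opc
# Same connection string as the DATA_SOURCE_NAME export above.
Environment=DATA_SOURCE_NAME=postgresql://postgres_exporter:<password>@<POSTGRES_HOST>:5432/postgres?sslmode=require
ExecStart=/usr/local/bin/postgres_exporter --extend.query-path=/home/opc/queries.yaml --web.listen-address=:9187
Restart=on-failure

[Install]
WantedBy=multi-user.target
```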

  5. Verify the Exporter.

    curl http://localhost:9187/metrics
    

Task 3: Install and Configure Prometheus

  1. Install Prometheus.

    wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
    
    tar -xvf prometheus-2.52.0.linux-amd64.tar.gz
    
    sudo mv prometheus-2.52.0.linux-amd64/prometheus /usr/local/bin/
    sudo mv prometheus-2.52.0.linux-amd64/promtool /usr/local/bin/
    
  2. Create Prometheus Config.

    Create /etc/prometheus/prometheus.yml file with the following content.

    global:
      scrape_interval: 15s

    scrape_configs:
      - job_name: 'postgres_exporter'
        static_configs:
          - targets: ['127.0.0.1:9187']
    
  3. Start Prometheus.

    nohup prometheus --config.file=/etc/prometheus/prometheus.yml > prometheus.log 2>&1 &
    

    Note: To keep Prometheus running after a reboot, consider setting it up as a systemd service.

  4. Verify Targets.

    Go to http://<instance_ip>:9090/targets and verify that the postgres_exporter target shows as UP.

    Prometheus Dashboard

Task 4: Install and Configure Grafana

  1. Install Grafana.

    sudo yum install -y https://dl.grafana.com/oss/release/grafana-10.4.2-1.x86_64.rpm
    sudo systemctl enable grafana-server
    sudo systemctl start grafana-server
    
  2. Log in to Grafana.

    Go to the following URL: http://<instance_ip>:3000 and log in with the default credentials: Username admin, Password admin. Grafana prompts you to change the password on first login.

  3. Add the Data Source in Grafana.

    1. Go to Connections, Data Sources and click Add data source.

    2. Select Prometheus.

    3. Enter URL as http://localhost:9090.

    Adding Datasource

  4. Import PostgreSQL Dashboard.

    1. Select Import a dashboard.

    2. Import 9628 dashboard ID from Grafana dashboards.

    Import Postgres

Task 5: Explore OCI Database with PostgreSQL Metrics in Grafana

The following images show the final dashboards.

Dashboard Example

Dashboard Example

Task 6: Set Up Alert for OCI Database with PostgreSQL Bloat Metrics in Grafana

  1. Collect Bloat Metric.

    • A custom SQL query estimates table bloat based on PostgreSQL system catalog views such as pg_class, pg_namespace, pg_attribute, and pg_stats.

    • The query is exposed as a custom metric by the PostgreSQL Exporter through the queries.yaml file, as shown in Task 2.2, and is scraped by Prometheus.

  2. Configure Alert Rule.

    1. Navigate to Alerting and click Alert rules in Grafana.

    2. Click + New alert rule.

      New Alert Rule

    3. Enter a Name for your alert rule.

    4. Select Prometheus as the Data source.

    5. In the Define query and alert condition section, enter the following information.

      • Search for the table_bloat_bloat_size_percent metric.

      • In the Alert condition builder, enter the following information.

        • Set Input: Select A.

        • Condition: Select IS ABOVE 30.

        Note: You can click Preview to validate whether the alert would fire under current conditions.

        Define_query

    6. In the Set evaluation behavior section, enter the following information.

      1. Click + New evaluation group.

      2. Evaluation Group: Enter postgres-bloat-alerts.

      3. Evaluation interval: Select 5m (this defines how often Grafana evaluates the alert rule).

      4. Pending period: Set to 5m (the alert must stay true for 5 continuous minutes before it triggers. This should be equal to or longer than the evaluation interval).

    7. Click Save rule and exit to activate your alert.

      Evaluation Behavior

      Evaluation Group

  3. Configure Email Alerting through SMTP (Gmail).

    To enable email notifications for alerts in Grafana, configure the smtp section of your grafana.ini file.

    1. Edit the Grafana config file in the OCI Compute instance where you have installed Grafana.

      sudo nano /etc/grafana/grafana.ini
      
    2. Uncomment and update the smtp section with the following content.

      [smtp]
      enabled = true
      host = smtp.gmail.com:587
      user = your_email@gmail.com
      password = ***************
      skip_verify = true
      from_address = your_email@gmail.com
      from_name = Grafana
      ehlo_identity = localhost
      startTLS_policy = OpportunisticStartTLS
      enable_tracing = false
      

      Note: To use Gmail, you must enable 2-Step Verification and generate an App Password from your Google account settings. Do not use your regular Gmail password.

    3. After saving the config, restart Grafana to apply the changes using the following command.

      sudo systemctl restart grafana-server
      
  4. Set Up Contact Points for Alert Notifications.

    Contact points define where and how Grafana sends notifications when an alert rule fires, such as email, Slack, or a webhook.

    1. Navigate to Alerting and click Contact points in Grafana.

    2. Click + New contact point to create one, or select and edit an existing one.

    3. To configure an Email Contact Point, enter the following information.

      • Name: Enter the contact point name (for example, grafana-default-email).

      • Integration: Select Email.

      • Addresses: Enter one or more email addresses.

    4. Click Test to verify email delivery.

    5. Click Save contact point.

    Contact Points

  5. Configure Notification Policy.

    Notification policies in Grafana control how and when alerts are routed to contact points. After setting up a contact point, you must create a notification policy to connect alerts to that contact point.

    1. Navigate to Alerting and click Notification policies in Grafana.

    2. Click Edit on the default policy or + New policy.

    3. Under Default contact point, select your contact (grafana-default-email).

    4. Under Group by, you can group alerts based on label values such as:

      • grafana_folder

      • alertname

    5. Click Timing options and enter the following information.

      • Group wait: Select 30s, the time to wait before sending the first notification for a new group of alerts.

      • Group interval: Select 5m, the time between batches of grouped alerts.

      • Repeat interval: Select 30m, the time after which the same alert is resent if it is still firing.

    6. Click Update default policy to save.

    Notification Policy

  6. Trigger and Verify Alert.

    1. Simulate a threshold breach (optional).

      To test alert triggering, you can either:

      • Temporarily lower the alert threshold (for example, set bloat_size_percent > 0).

        Or,

      • Introduce bloat by performing repeated UPDATE or DELETE operations on a large test table.

    2. Wait for evaluation.

      Grafana checks alert conditions at your defined evaluation interval (for example, every 5 minutes). If the condition is true for the pending period (for example, 5 minutes), the alert will fire.

    3. Email Delivery Check.

      When triggered, the alert will be sent through the configured SMTP contact point (for example, Gmail). Check the inbox of the recipient address for a notification email from Grafana.

      Email Alert

    4. Grafana UI Feedback.

      Navigate to Alerting and click Alert rules to view the alert status. You will see status indicators like Firing in the Alert rules page.

      Email Alert
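    The second option in the simulation step above (introducing bloat with repeated updates) can be sketched as follows. The table and column names are illustrative, and autovacuum may reclaim the space, so check the metric soon after running it.

```sql
-- Create a disposable table, then rewrite every row to leave dead tuples behind.
CREATE TABLE bloat_test AS
SELECT g AS id, md5(g::text) AS payload
FROM generate_series(1, 100000) AS g;

UPDATE bloat_test SET payload = md5(payload);  -- each UPDATE leaves a dead tuple
UPDATE bloat_test SET payload = md5(payload);

ANALYZE bloat_test;  -- refresh statistics so the bloat estimate sees the new pages

-- Clean up afterwards:
-- DROP TABLE bloat_test;
```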

Acknowledgments

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.