C Creating a Backup of the Prometheus Time Series Database (TSDB) Using the Snapshot Utility

This section describes how to create a backup of the Prometheus Time Series Database (TSDB) using the snapshot utility.

Capturing TSDB Snapshots

Perform the following steps to capture the TSDB snapshots:

  1. Enable the web.enable-admin-api flag in the extraFlags list of ocoso_csar_25_2_100_0_0_0_prom_custom_values.yaml:
    extraFlags:
      - web.enable-lifecycle
      ## The web.enable-admin-api flag controls access to the administrative HTTP API, which includes functionality such as
      ## deleting time series. This flag is disabled by default.
      - web.enable-admin-api
  2. Install OSO as described in the Installing OSO Using CSAR section.
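    After the installation, you can optionally confirm that the Prometheus server container actually received the admin API flag. The following is a minimal sketch, assuming the chart renders each extraFlags entry as a --<flag> container argument (as the upstream prometheus-community chart does). The function names container_args and has_admin_api_flag are hypothetical; the placeholders are the ones used in the later steps.

```shell
#!/bin/sh
# Hypothetical helper: print the arguments of one container in a pod, one per line,
# using a kubectl JSONPath filter on the container name.
container_args() {
  local ns="$1" pod="$2" container="$3"
  kubectl -n "$ns" get pod "$pod" \
    -o jsonpath="{.spec.containers[?(@.name=='$container')].args[*]}" | tr ' ' '\n'
}

# Succeeds only if the argument list on stdin contains --web.enable-admin-api
# as a whole line (assumes the chart adds the "--" prefix to extraFlags entries).
has_admin_api_flag() {
  grep -qxF -- '--web.enable-admin-api'
}

# Usage:
#   container_args <oso-namespace> <oso-prom-pod-name> <name of Prometheus server container> \
#     | has_admin_api_flag && echo "admin API enabled"
```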
  3. Use an ephemeral (debug) container to capture the snapshot and export it from Prometheus. After the export completes successfully, the ephemeral container exits and stops.

    Note:

    • The backup contains the data that Prometheus holds at the time it is taken. If required, wait a couple of days for the Prometheus database to accumulate the required data, or until the OSO retention period (7 days by default) has elapsed, and then perform the following steps to take the backup.
    • Use the occne.io/occne/oso_snapshot:25.2.100 image for the ephemeral container. This image handles the creation and removal of TSDB snapshots. You must load this image and push it to the customer's central or system registry. The examples in this section reference the image with the tag 25_2_100; use the tag exactly as it appears in your registry.
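    Loading and pushing the image can be sketched as follows. This assumes Podman is available (docker accepts the same load, tag, and push subcommands) and that the image archive was saved with the reference occne.io/occne/oso_snapshot:<tag>. The archive file name and the function names registry_image_ref and push_snapshot_image are hypothetical. With occne-repo-host:5000 as the registry, the resulting reference matches the image URL used in the examples in this section.

```shell
#!/bin/sh
# Build the full image reference in the target registry from a registry host[:port]
# and the source reference, e.g. occne-repo-host:5000 + occne.io/occne/oso_snapshot:25_2_100.
registry_image_ref() {
  printf '%s/%s\n' "${1%/}" "$2"
}

# Hypothetical helper: load the image archive, retag it for the registry, and push it.
# $1: image archive (tar) file, $2: source reference inside the archive, $3: registry host[:port]
push_snapshot_image() {
  local archive="$1" src_ref="$2" registry="$3" dst_ref
  dst_ref=$(registry_image_ref "$registry" "$src_ref") || return 1
  podman load -i "$archive" &&
    podman tag "$src_ref" "$dst_ref" &&
    podman push "$dst_ref"
}

# Usage (the archive name is a placeholder):
#   push_snapshot_image <oso_snapshot_image_archive>.tar \
#     occne.io/occne/oso_snapshot:25_2_100 occne-repo-host:5000
```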
  4. Capture the snapshot by attaching the debug container to the Prometheus server pod.

    When OSO is configured with CLUSTER_NAME_PREFIX, run the following command:

    $ kubectl -n <oso-namespace> debug <oso-prom-pod-name> -it --image=<oso_snapshot_image_url> --target=<name of Prometheus server container> --env OSO_PROMETHEUS_SERVICE_NAME=<oso-prometheus-service-name> --env CLUSTER_NAME_PREFIX=<cluster-name-prefix> &

    When OSO is not configured with CLUSTER_NAME_PREFIX, provide only the OSO_PROMETHEUS_SERVICE_NAME and run the following command:

    $ kubectl -n <oso-namespace> debug <oso-prom-pod-name> -it --image=<oso_snapshot_image_url> --target=<name of Prometheus server container> --env OSO_PROMETHEUS_SERVICE_NAME=<oso-prometheus-service-name> &
    For example:

    $ kubectl -n oso debug oso-p-prom-svr-55f8d47c74-4vwfx -it --image=occne-repo-host:5000/occne.io/occne/oso_snapshot:25_2_100 --target=prom-svr --env OSO_PROMETHEUS_SERVICE_NAME=oso-p-prom-svr --env CLUSTER_NAME_PREFIX=occne3-n2 &

    Sample output:

    Targeting container "prom-svr". If you don't see processes from this container it may be because the container runtime doesn't support this feature.
    --profile=legacy is deprecated and will be removed in the future. It is recommended to explicitly specify a profile, for example "--profile=general".
    Defaulting debug container name to debugger-x25bq.
    If you don't see a command prompt, try pressing enter.
        
    [1]+  Stopped        kubectl -n oso debug oso-p-prom-svr-55f8d47c74-4vwfx -it --image=occne-repo-host:5000/occne.io/occne/oso_snapshot:25_2_100 --target=prom-svr --env OSO_PROMETHEUS_SERVICE_NAME=oso-p-prom-svr --env CLUSTER_NAME_PREFIX=occne3-n2

    Note:

    The ampersand (&) at the end of the command runs the process in the background and keeps the terminal available for other commands. Because the debug session is interactive, the shell stops the local kubectl process, as the "[1]+ Stopped" line in the sample output shows; the ephemeral container itself keeps running in the pod. In the line "Defaulting debug container name to debugger-x25bq.", debugger-x25bq is the name of the active ephemeral container. You need this name in the following steps to view the container logs and to copy the snapshot out of the pod.
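    The two command variants differ only in whether CLUSTER_NAME_PREFIX is passed. A minimal sketch of a wrapper that adds that variable only when a prefix is supplied is shown below. The function name run_snapshot_debug and its argument order are hypothetical; the kubectl flags are exactly those used in the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper around the "kubectl debug" command from step 4.
# Arguments: <oso-namespace> <oso-prom-pod-name> <oso_snapshot_image_url>
#            <name of Prometheus server container> <oso-prometheus-service-name> [<cluster-name-prefix>]
run_snapshot_debug() {
  local ns="$1" pod="$2" image="$3" target="$4" svc="$5" prefix="${6:-}"
  set -- -n "$ns" debug "$pod" -it "--image=$image" "--target=$target" \
    --env "OSO_PROMETHEUS_SERVICE_NAME=$svc"
  # Pass CLUSTER_NAME_PREFIX only when OSO is configured with one.
  if [ -n "$prefix" ]; then
    set -- "$@" --env "CLUSTER_NAME_PREFIX=$prefix"
  fi
  kubectl "$@"
}

# Usage, backgrounded with & as in step 4 (values from the example above):
#   run_snapshot_debug oso oso-p-prom-svr-55f8d47c74-4vwfx \
#     occne-repo-host:5000/occne.io/occne/oso_snapshot:25_2_100 \
#     prom-svr oso-p-prom-svr occne3-n2 &
```

    When started this way, the jobs entry shows the wrapper call instead of kubectl, but it still contains oso_snapshot through the image URL, so the fg command in step 7 still matches it.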
  5. While the snapshot process runs in the background, run the following command to view its logs. The output should resemble the following sample and include an "HTTP/1.1 200 OK" status line.
    kubectl -n <oso-namespace> logs <oso-prom-pod-name> -c <ephemeral-container-name>

    For example:

    $ kubectl -n oso logs oso-p-prom-svr-55f8d47c74-4vwfx -c debugger-x25bq

    Sample output:

    total 36K
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 chunks_head
    -rw-r--r--. 1 nobody nobody    0 Aug 19 17:12 lock
    drwxrws---. 2 root   nobody  16K Aug 19 17:12 lost+found
    -rw-r--r--. 1 nobody nobody  20K Aug 19 17:12 queries.active
    -rw-r--r--. 1 nobody nobody 7.2K Aug 20 16:53 snapshots.log
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 wal
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 10.233.11.145:80...
    * Connected to oso-p-prom-svr (10.233.11.145) port 80 (#0)
    > POST /occne3-n2/prometheus/api/v1/admin/tsdb/snapshot HTTP/1.1
    > Host: oso-p-prom-svr
    > User-Agent: curl/7.76.1
    > Accept: */*
    >
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < Content-Type: application/json
    < Date: Wed, 20 Aug 2025 16:53:24 GMT
    < Content-Length: 72
    <
    { [72 bytes data]
    100    72  100    72    0     0   1440      0 --:--:-- --:--:-- --:--:--  1440
    * Connection #0 to host oso-p-prom-svr left intact
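    Instead of reading the whole log, you can filter it. The sketch below (function names snapshot_request_ok and snapshot_names are hypothetical) succeeds only if the log contains the "< HTTP/1.1 200 OK" status line, and prints the snapshot name from the admin API's JSON reply, whose format ({"status":"success","data":{"name":"..."}}) appears in the full log in step 7. Because the full log shows the snapshot procedure running more than once, more than one name can be printed; the last one is the most recent.

```shell
#!/bin/sh
# Succeeds only if the curl -vvv output on stdin contains an HTTP 200 status line.
snapshot_request_ok() {
  grep -qF '< HTTP/1.1 200 OK'
}

# Prints every snapshot name returned by the TSDB snapshot API, one per line.
snapshot_names() {
  sed -n 's/.*"status":"success","data":{"name":"\([^"]*\)"}.*/\1/p'
}

# Usage:
#   kubectl -n <oso-namespace> logs <oso-prom-pod-name> -c <ephemeral-container-name> > snapshot.log
#   snapshot_request_ok < snapshot.log && snapshot_names < snapshot.log | tail -n 1
```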
  6. Confirm that the curl request that triggers snapshot creation inside the debug container succeeded by checking the log for a "< HTTP/1.1 200 OK" line, as in the sample output above. Then export the tarball (.tgz artifact) from the debug container using the following command. The path /proc/1/root/data reaches the Prometheus data directory through the process namespace that the debug container shares with the target container.
    $ kubectl cp <oso-namespace>/<oso-prom-pod-name>:/proc/1/root/data/snapshots.tgz -c <ephemeral-container-name> /tmp/<snapshot-folder-name>/snapshots.tgz

    Figure C-1 Exporting tgz artifact from Debug container
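    The export can be wrapped so that the local folder is created first and the copied archive is checked for readability. This is a minimal sketch; the function name export_snapshot and the destination folder are hypothetical, and tar tzf only confirms that the gzip tar archive can be listed, not that its contents are complete (step 10 covers that).

```shell
#!/bin/sh
# Hypothetical helper for step 6: copy snapshots.tgz out of the debug container
# into a local folder and confirm that the archive can be read.
# Arguments: <oso-namespace> <oso-prom-pod-name> <ephemeral-container-name> <local-folder>
export_snapshot() {
  local ns="$1" pod="$2" container="$3" dest="$4"
  mkdir -p "$dest" || return 1
  kubectl cp "$ns/$pod:/proc/1/root/data/snapshots.tgz" -c "$container" "$dest/snapshots.tgz" || return 1
  # Listing the archive fails if the copy is truncated or is not a gzip tar file.
  tar tzf "$dest/snapshots.tgz" > /dev/null
}

# Usage:
#   export_snapshot oso oso-p-prom-svr-55f8d47c74-4vwfx debugger-x25bq /tmp/<snapshot-folder-name>
```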



  7. Once the snapshots tarball is available on the local system, bring the snapshot process back to the foreground by running the following command:
    $ fg $(jobs | awk -F '[][]' '/oso_snapshot/{print $2}')

    Sample output:

    
    $ fg $(jobs | awk -F '[][]' '/oso_snapshot/{print $2}')
    kubectl -n oso debug oso-p-prom-svr-55f8d47c74-4vwfx -it --image=occne-repo-host:5000/occne.io/occne/oso_snapshot:25_2_100 --target=prom-svr --env OSO_PROMETHEUS_SERVICE_NAME=oso-p-prom-svr --env CLUSTER_NAME_PREFIX=occne3-n2
    

    The above step leaves the terminal in a waiting state.

    Press Enter in the terminal to finalize the process and to see the full log of the snapshot creation process.

    ------------------------------ Snapshot procedure started ----------------------------------
      
    STEP 1: CHECKING FOR EXISTING SNAPSHOT ARCHIVES AND CLEANING UP IF NECESSARY
      
    Listing current contents of the directory for snapshot archives:
      
    total 32K
    drwxrws---. 2 root   nobody  16K Aug 19 17:12 lost+found
    -rw-r--r--. 1 nobody nobody  20K Aug 19 17:12 queries.active
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 wal
    -rw-r--r--. 1 nobody nobody    0 Aug 19 17:12 lock
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 chunks_head
    -rw-r--r--. 1 nobody nobody  239 Aug 20 16:51 snapshots.log
      
    No previous snapshots.tgz found. All clear!
      
    No previous snapshots directory found. All clear!
      
    STEP 2: CAPTURING SNAPSHOT OF CURRENT PROMETHEUS DB USING ADMINISTRATIVE API
      
    Wed Aug 20 16:51:01 UTC 2025
      
    Sending POST request to Prometheus API to create a snapshot with Cluster Name Prefix...
      
    Executing: curl -vvv -XPOST "http://oso-p-prom-svr/occne3-n2/prometheus/api/v1/admin/tsdb/snapshot"
      
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 10.233.11.145:80...
    * Connected to oso-p-prom-svr (10.233.11.145) port 80 (#0)
    > POST /occne3-n2/prometheus/api/v1/admin/tsdb/snapshot HTTP/1.1
    > Host: oso-p-prom-svr
    > User-Agent: curl/7.76.1
    > Accept: */*
    >
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < Content-Type: application/json
    < Date: Wed, 20 Aug 2025 16:51:01 GMT
    < Content-Length: 72
    <
    { [72 bytes data]
    100    72  100    72    0     0    911      0 --:--:-- --:--:-- --:--:--   911
    * Connection #0 to host oso-p-prom-svr left intact
    {"status":"success","data":{"name":"20250820T165101Z-0eb533ff8d43ed64"}}
      
    ✅ Snapshot is successfully created!
      
     Directory contents of Prometheus DB AFTER successful Snapshot:
      
    total 36K
    drwxrws---. 2 root   nobody  16K Aug 19 17:12 lost+found
    -rw-r--r--. 1 nobody nobody  20K Aug 19 17:12 queries.active
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 wal
    -rw-r--r--. 1 nobody nobody    0 Aug 19 17:12 lock
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 chunks_head
    drwxr-sr-x. 3 nobody nobody 4.0K Aug 20 16:51 snapshots
    -rw-r--r--. 1 nobody nobody 2.0K Aug 20 16:51 snapshots.log
      
    STEP 3: CREATING EXPORTABLE ARCHIVE OF THE SNAPSHOT DATA
      
    ✅ Snapshots is successfully archived!
      
    total 40K
    drwxrws---. 2 root   nobody  16K Aug 19 17:12 lost+found
    -rw-r--r--. 1 nobody nobody  20K Aug 19 17:12 queries.active
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 wal
    -rw-r--r--. 1 nobody nobody    0 Aug 19 17:12 lock
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 chunks_head
    drwxr-sr-x. 3 nobody nobody 4.0K Aug 20 16:51 snapshots
    -rw-r--r--. 1 nobody nobody  162 Aug 20 16:51 snapshots.tgz
    -rw-r--r--. 1 nobody nobody 2.5K Aug 20 16:51 snapshots.log
    Non-interactive shell detected. Skipping interactive prompt...
      
    ----------------------------------- Snapshot creation procedure completed ---------------------------------------
      
    ------------------------------- Snapshot procedure started ----------------------------------
      
    STEP 1: CHECKING FOR EXISTING SNAPSHOT ARCHIVES AND CLEANING UP IF NECESSARY
      
    Listing current contents of the directory for snapshot archives:
      
    total 40K
    drwxrws---. 2 root   nobody  16K Aug 19 17:12 lost+found
    -rw-r--r--. 1 nobody nobody  20K Aug 19 17:12 queries.active
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 wal
    -rw-r--r--. 1 nobody nobody    0 Aug 19 17:12 lock
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 chunks_head
    drwxr-sr-x. 3 nobody nobody 4.0K Aug 20 16:51 snapshots
    -rw-r--r--. 1 nobody nobody  162 Aug 20 16:51 snapshots.tgz
    -rw-r--r--. 1 nobody nobody 3.3K Aug 20 16:53 snapshots.log
      
    Found snapshots.tgz archive. Deleting...
      
    ✅ Successfully deleted!
      
    Found snapshots directory. Deleting...
      
    ✅ Successfully deleted!
      
    STEP 2: CAPTURING SNAPSHOT OF CURRENT PROMETHEUS DB USING ADMINISTRATIVE API
      
    Wed Aug 20 16:53:01 UTC 2025
      
    Sending POST request to Prometheus API to create a snapshot with Cluster Name Prefix...
      
    Executing: curl -vvv -XPOST "http://oso-p-prom-svr/occne3-n2/prometheus/api/v1/admin/tsdb/snapshot"
      
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 10.233.11.145:80...
    * Connected to oso-p-prom-svr (10.233.11.145) port 80 (#0)
    > POST /occne3-n2/prometheus/api/v1/admin/tsdb/snapshot HTTP/1.1
    > Host: oso-p-prom-svr
    > User-Agent: curl/7.76.1
    > Accept: */*
    >
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < Content-Type: application/json
    < Date: Wed, 20 Aug 2025 16:53:01 GMT
    < Content-Length: 72
    <
    { [72 bytes data]
    100    72  100    72    0     0   2000      0 --:--:-- --:--:-- --:--:--  1945
    * Connection #0 to host oso-p-prom-svr left intact
    {"status":"success","data":{"name":"20250820T165301Z-0572f8dd893c2028"}}
      
    ✅ Snapshot is successfully created!
      
     Directory contents of Prometheus DB AFTER successful Snapshot:
      
    total 40K
    drwxrws---. 2 root   nobody  16K Aug 19 17:12 lost+found
    -rw-r--r--. 1 nobody nobody  20K Aug 19 17:12 queries.active
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 wal
    -rw-r--r--. 1 nobody nobody    0 Aug 19 17:12 lock
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 chunks_head
    drwxr-sr-x. 3 nobody nobody 4.0K Aug 20 16:53 snapshots
    -rw-r--r--. 1 nobody nobody 5.3K Aug 20 16:53 snapshots.log
      
    STEP 3: CREATING EXPORTABLE ARCHIVE OF THE SNAPSHOT DATA
      
    ✅ Snapshots is successfully archived!
      
    total 44K
    drwxrws---. 2 root   nobody  16K Aug 19 17:12 lost+found
    -rw-r--r--. 1 nobody nobody  20K Aug 19 17:12 queries.active
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 wal
    -rw-r--r--. 1 nobody nobody    0 Aug 19 17:12 lock
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 chunks_head
    drwxr-sr-x. 3 nobody nobody 4.0K Aug 20 16:53 snapshots
    -rw-r--r--. 1 nobody nobody  161 Aug 20 16:53 snapshots.tgz
    -rw-r--r--. 1 nobody nobody 5.8K Aug 20 16:53 snapshots.log
    Non-interactive shell detected. Skipping interactive prompt...
      
    ----------------------------------- Snapshot creation procedure completed ---------------------------------------
      
    ------------------------------- Snapshot procedure started ----------------------------------
      
    STEP 1: CHECKING FOR EXISTING SNAPSHOT ARCHIVES AND CLEANING UP IF NECESSARY
      
    Listing current contents of the directory for snapshot archives:
      
    total 44K
    drwxrws---. 2 root   nobody  16K Aug 19 17:12 lost+found
    -rw-r--r--. 1 nobody nobody  20K Aug 19 17:12 queries.active
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 wal
    -rw-r--r--. 1 nobody nobody    0 Aug 19 17:12 lock
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 chunks_head
    drwxr-sr-x. 3 nobody nobody 4.0K Aug 20 16:53 snapshots
    -rw-r--r--. 1 nobody nobody  161 Aug 20 16:53 snapshots.tgz
    -rw-r--r--. 1 nobody nobody 6.6K Aug 20 16:53 snapshots.log
      
    Found snapshots.tgz archive. Deleting...
      
    ✅ Successfully deleted!
      
    Found snapshots directory. Deleting...
      
    ✅ Successfully deleted!
      
    STEP 2: CAPTURING SNAPSHOT OF CURRENT PROMETHEUS DB USING ADMINISTRATIVE API
      
    Wed Aug 20 16:53:24 UTC 2025
      
    Sending POST request to Prometheus API to create a snapshot with Cluster Name Prefix...
      
    Executing: curl -vvv -XPOST "http://oso-p-prom-svr/occne3-n2/prometheus/api/v1/admin/tsdb/snapshot"
      
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 10.233.11.145:80...
    * Connected to oso-p-prom-svr (10.233.11.145) port 80 (#0)
    > POST /occne3-n2/prometheus/api/v1/admin/tsdb/snapshot HTTP/1.1
    > Host: oso-p-prom-svr
    > User-Agent: curl/7.76.1
    > Accept: */*
    >
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < Content-Type: application/json
    < Date: Wed, 20 Aug 2025 16:53:24 GMT
    < Content-Length: 72
    <
    { [72 bytes data]
    100    72  100    72    0     0   1440      0 --:--:-- --:--:-- --:--:--  1440
    * Connection #0 to host oso-p-prom-svr left intact
    {"status":"success","data":{"name":"20250820T165324Z-3e8dc6572924e398"}}
      
    ✅ Snapshot is successfully created!
      
     Directory contents of Prometheus DB AFTER successful Snapshot:
      
    total 44K
    drwxrws---. 2 root   nobody  16K Aug 19 17:12 lost+found
    -rw-r--r--. 1 nobody nobody  20K Aug 19 17:12 queries.active
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 wal
    -rw-r--r--. 1 nobody nobody    0 Aug 19 17:12 lock
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 chunks_head
    drwxr-sr-x. 3 nobody nobody 4.0K Aug 20 16:53 snapshots
    -rw-r--r--. 1 nobody nobody 8.5K Aug 20 16:53 snapshots.log
      
    STEP 3: CREATING EXPORTABLE ARCHIVE OF THE SNAPSHOT DATA
      
    ✅ Snapshots is successfully archived!
      
    total 48K
    drwxrws---. 2 root   nobody  16K Aug 19 17:12 lost+found
    -rw-r--r--. 1 nobody nobody  20K Aug 19 17:12 queries.active
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 wal
    -rw-r--r--. 1 nobody nobody    0 Aug 19 17:12 lock
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 chunks_head
    drwxr-sr-x. 3 nobody nobody 4.0K Aug 20 16:53 snapshots
    -rw-r--r--. 1 nobody nobody  163 Aug 20 16:53 snapshots.tgz
    -rw-r--r--. 1 nobody nobody 9.0K Aug 20 16:53 snapshots.log
      
    ----------------------------------- Snapshot creation procedure completed ---------------------------------------
      
    Session ended, the ephemeral container will not be restarted but may be reattached using 'kubectl attach oso-p-prom-svr-55f8d47c74-4vwfx -c debugger-x25bq -i -t' if it is still running
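    The fg command in this step works by parsing the output of jobs: awk treats both [ and ] as field separators, so the second field is the job number of the line that mentions oso_snapshot. The sketch below isolates that parsing in a hypothetical function, snapshot_job_id, so it can be checked against a sample jobs line. Unlike the one-liner above, it stops at the first match, so fg receives a single job number even if more than one oso_snapshot job exists.

```shell
#!/bin/sh
# Print the job number of the first job whose command line mentions oso_snapshot.
# -F '[][]' makes both "[" and "]" field separators, so "[1]+  Stopped ..." yields $2 = "1".
snapshot_job_id() {
  awk -F '[][]' '/oso_snapshot/ { print $2; exit }'
}

# Usage in the interactive shell that started the debug session:
#   fg $(jobs | snapshot_job_id)
```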
    
    
  8. (Mandatory) Clean up the snapshot archives. Run the following command to remove the snapshots created in the Prometheus pod.

    WARNING:

    If you skip this step, the snapshots directory and the snapshots.tgz archive remain in the Prometheus data directory and may fill up your system's storage.

    kubectl -n <oso-namespace> debug <oso-prom-pod-name> -it --image=<oso_snapshot_image_url> --target=<name of Prometheus server container> --env REMOVE=yes
  9. Check the output of the above command to confirm that the snapshots were removed.
    For example:
    $ kubectl -n oso debug oso-p-prom-svr-55f8d47c74-4vwfx -it --image=occne-repo-host:5000/occne.io/occne/oso_snapshot:25_2_100 --target=prom-svr --env REMOVE=yes

    Sample output:

    
    Targeting container "prom-svr". If you don't see processes from this container it may be because the container runtime doesn't support this feature.
    --profile=legacy is deprecated and will be removed in the future. It is recommended to explicitly specify a profile, for example "--profile=general".
    Defaulting debug container name to debugger-fc5q6.
      
    ---------------------------------- REMOVE flag is set to 'yes'. Cleaning up snapshot archives ------------------------------------
      
     Prometheus DB Directory contents BEFORE cleanup:
    total 48K
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 chunks_head
    -rw-r--r--. 1 nobody nobody    0 Aug 19 17:12 lock
    drwxrws---. 2 root   nobody  16K Aug 19 17:12 lost+found
    -rw-r--r--. 1 nobody nobody  20K Aug 19 17:12 queries.active
    drwxr-sr-x. 3 nobody nobody 4.0K Aug 20 16:53 snapshots
    -rw-r--r--. 1 nobody nobody 9.5K Aug 20 16:54 snapshots.log
    -rw-r--r--. 1 nobody nobody  163 Aug 20 16:53 snapshots.tgz
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 wal
      
    ✅ Snapshots archives removed successfully!
      
     Prometheus DB Directory contents AFTER cleanup:
    total 28K
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 chunks_head
    -rw-r--r--. 1 nobody nobody    0 Aug 19 17:12 lock
    drwxrws---. 2 root   nobody  16K Aug 19 17:12 lost+found
    -rw-r--r--. 1 nobody nobody  20K Aug 19 17:12 queries.active
    drwxr-sr-x. 2 nobody nobody 4.0K Aug 19 17:12 wal
      
    ---------------------------------- Snapshot cleanup process completed -------------------------------------
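    To check the cleanup result without reading the whole output, you can save the cleanup container's log (its name appears in the "Defaulting debug container name to ..." line, debugger-fc5q6 in the example) with the same kubectl logs command as in step 5, and filter it. The following is a sketch with a hypothetical function name, cleanup_succeeded: it succeeds only if the log reports that the archives were removed and the listing printed after "contents AFTER cleanup" no longer contains snapshots or snapshots.tgz.

```shell
#!/bin/sh
# Succeeds only if the cleanup log on stdin reports removal and the listing
# printed after "contents AFTER cleanup" has no snapshots or snapshots.tgz entry.
cleanup_succeeded() {
  awk '
    { sub(/\r$/, "") }                                  # tolerate CRLF from a TTY session
    /Snapshots archives removed successfully/ { removed = 1 }
    /contents AFTER cleanup/                  { after = 1; next }
    after && / snapshots(\.tgz)?$/            { leftover = 1 }
    END { if (removed && !leftover) exit 0; exit 1 }
  '
}

# Usage:
#   kubectl -n <oso-namespace> logs <oso-prom-pod-name> -c <cleanup-container-name> \
#     | cleanup_succeeded && echo "snapshot cleanup confirmed"
```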
      
    
  10. (Mandatory) Validate the contents of the exported snapshot archive by running the following command on the snapshots.tgz file:
    $ tar tvf snapshots.tgz

    For example:

    $ tar tvf snapshots.tgz
    drwxr-sr-x nobody/nobody      0 2025-08-25 21:11 snapshots/
    drwxr-sr-x nobody/nobody      0 2025-08-25 21:11 snapshots/20250825T211118Z-0aa128d1c1f37e71/
    drwxr-sr-x nobody/nobody      0 2025-08-25 21:11 snapshots/20250107T084350Z-2a422bf3ea6cee2c/01JGZYYEXF1DR08XFAHX99Z1EW/
    drwxr-sr-x nobody/nobody      0 2025-08-25 21:11 snapshots/20250107T084350Z-2a422bf3ea6cee2c/01JGZYYEXF1DR08XFAHX99Z1EW/chunks/
    -rw-r--r-- nobody/nobody  14004 2025-08-25 21:11 snapshots/20250107T084350Z-2a422bf3ea6cee2c/01JGZYYEXF1DR08XFAHX99Z1EW/chunks/000001
    -rw-r--r-- nobody/nobody      9 2025-08-25 21:11 snapshots/20250107T084350Z-2a422bf3ea6cee2c/01JGZYYEXF1DR08XFAHX99Z1EW/tombstones
    -rw-r--r-- nobody/nobody  69072 2025-08-25 21:11 snapshots/20250107T084350Z-2a422bf3ea6cee2c/01JGZYYEXF1DR08XFAHX99Z1EW/index
    -rw-r--r-- nobody/nobody    273 2025-08-25 21:11 snapshots/20250107T084350Z-2a422bf3ea6cee2c/01JGZYYEXF1DR08XFAHX99Z1EW/meta.json

    Note:

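    If your snapshot archive does not have the structure shown above (a snapshots/ directory containing TSDB block directories, each with chunks/, index, meta.json, and tombstones), delete the archive and repeat the procedure.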
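    The structure check can also be scripted. The sketch below (hypothetical function name check_snapshot_layout) reads the output of tar tzf, which lists only paths, and succeeds only if the archive contains at least one block directory under snapshots/<snapshot-name>/ and every such block has chunks/, index, meta.json, and tombstones, the entries shown in the example listing above.

```shell
#!/bin/sh
# Reads "tar tzf snapshots.tgz" output on stdin. Paths are split on "/":
#   snapshots/<snapshot-name>/<block-id>/<entry>...
# Succeeds only if at least one block exists and each block has all required entries.
check_snapshot_layout() {
  awk -F/ '
    BEGIN { nreq = split("chunks index meta.json tombstones", req, " ") }
    $1 == "snapshots" && NF >= 4 {
      block = $2 "/" $3
      blocks[block] = 1
      seen[block, $4] = 1
    }
    END {
      nblocks = 0
      for (b in blocks) {
        nblocks++
        for (i = 1; i <= nreq; i++)
          if (!((b, req[i]) in seen)) { print "missing " req[i] " in " b > "/dev/stderr"; exit 1 }
      }
      if (nblocks == 0) { print "no TSDB blocks found" > "/dev/stderr"; exit 1 }
    }
  '
}

# Usage:
#   tar tzf /tmp/<snapshot-folder-name>/snapshots.tgz | check_snapshot_layout \
#     && echo "snapshot layout looks valid"
```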