OCI Database with PostgreSQL Storage Best Practices

Learn best practices to identify, monitor, and resolve storage-related issues when using OCI Database with PostgreSQL.

OCI Database with PostgreSQL is a fully managed service that handles storage automatically. Unlike traditional PostgreSQL that requires you to manually allocate disk space, OCI PostgreSQL uses database-optimized storage that scales independently from compute resources.

OCI Database with PostgreSQL is built on standard PostgreSQL and follows community guidelines for security and disk layout, ensuring a familiar experience for PostgreSQL users. For more information, see the following blog: Introducing OCI Database with PostgreSQL: Completing Our Cloud Database Suite for Every Need.

Storage Types

OCI Database with PostgreSQL uses the following primary storage categories with different characteristics and management requirements:

Auto-Scale Shared Storage

Auto-scale shared storage handles your database data and write-ahead log (WAL) files. This storage type provides high availability through replication across availability domains.

Here are the key characteristics of auto-scale shared storage:

  • Dynamic scaling: Storage scales automatically as you create and drop database objects. No pre-provisioning is required and no downtime occurs during scaling.
  • Decoupled architecture: Compute and storage are separated, allowing each to scale independently based on your needs.
  • High availability: Data is replicated across availability domains, enabling fast failover with no data loss.
  • Hard limit: Maximum capacity is 32 TB per database system.

Auto-scale shared storage stores the following types of content:

  • Database data such as tables, indexes, and other database objects.
  • WAL files for transaction durability and replication.

Temporary Storage

Temporary storage is allocated per node and handles operational files that don't require long-term persistence. This storage doesn't auto-scale and is sized based on your compute shape, using the formula Max(50 GB, 2 GB + numOCPU × 32 GB).
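For example, on a hypothetical 8-OCPU shape the allocation works out as follows (the arithmetic can be evaluated directly in SQL; the OCPU count is illustrative):

-- Max(50 GB, 2 GB + 8 × 32 GB) = 258 GB for a hypothetical 8-OCPU node
SELECT GREATEST(50, 2 + 8 * 32) AS temp_storage_gb;

A 1-OCPU node, by contrast, would compute 2 + 32 = 34 GB and fall back to the 50 GB floor.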

Here are the key characteristics of temporary storage:

  • Doesn't auto-scale: To increase capacity, you must upgrade to a larger compute shape.
  • Isolated from critical data: If temporary storage fills up, your database data and WAL remain unaffected.

Temporary storage stores the following types of content:

  • Temporary (spill) files in pgsql_tmp/pg_tempdir, which are created when queries exceed available memory.
  • PostgreSQL log files for monitoring and debugging.
  • Replication slot spill files (for logical replication).

The remainder of this topic addresses best practices for handling storage exhaustion issues for these types of storage.

Shared Storage Exhaustion

Replication slots ensure that WAL files are retained until they've been consumed by subscribers. When a replication slot becomes inactive or falls behind, PostgreSQL continues accumulating WAL files, which can eventually exhaust shared storage.

Here are common scenarios where shared storage exhaustion can occur:

  • Abandoned slots: A subscriber was decommissioned but the replication slot wasn't dropped
  • Network issues: High latency or connectivity problems between primary and standby cause the standby to fall behind
  • Slow subscribers: Third-party CDC tools or federated subscribers that can't keep pace with write activity
  • Long-running transactions: Idle-in-transaction sessions block the reorder buffer, causing WAL to accumulate

The following are the key WAL retention parameters. WAL disk usage depends on these parameters combined with active or inactive replication slots.

WAL Retention Parameters

  • max_slot_wal_keep_size: WAL retention limit for replication slots (-1 = unlimited, which risks filling the disk). For example, max_slot_wal_keep_size = 20GB lets each slot retain up to 20 GB of WAL behind its restart_lsn; slots that fall further behind are invalidated.
  • max_wal_size: Soft limit on WAL size before a checkpoint is triggered. For example, max_wal_size = 5GB triggers a checkpoint when WAL reaches 5 GB.
  • wal_keep_size: Minimum amount of WAL to retain for standby safety without replication slots. For example, wal_keep_size = 10GB always keeps 10 GB of WAL behind the current LSN.
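You can check the current values of these parameters with SHOW, following the same pattern used elsewhere in this topic:

SHOW max_slot_wal_keep_size;
SHOW max_wal_size;
SHOW wal_keep_size;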

The following table lists the benefits and risks of WAL configurations.

Parameter Configuration Benefits and Risks

 Configuration                 | Benefit                                           | Risk
-------------------------------+---------------------------------------------------+---------------------------------------------------
 max_slot_wal_keep_size = -1   | Replication doesn't break because of WAL removal. | Primary disk can fill up.
 max_slot_wal_keep_size = 50GB | Primary is protected from disk exhaustion.        | Replication might break if standby lags too much.

Monitoring and Alerts

Use the following query to identify WAL retention by replication slots:

SELECT 
    slot_name,
    active,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS wal_retained,
    pg_size_pretty(safe_wal_size) AS safe_wal_remaining
FROM pg_replication_slots;

In the following example output, 15 GB of WAL is retained, and only 5 GB more can be retained before the slot is invalidated (when max_slot_wal_keep_size is configured). If max_slot_wal_keep_size = -1, safe_wal_remaining shows NULL.

 slot_name | active | wal_retained | safe_wal_remaining
-----------+--------+--------------+--------------------
 replica_1 | t      | 15 GB        | 5 GB

For instructions on setting up custom alarms, see Create PostgreSQL Custom OCI Monitoring Alarms and Dashboard using Container Instances.

We recommend you set up alerts if any of the following conditions occur:

  • wal_retained exceeds 50% of available capacity.
  • Any slot has active = false for more than one hour.
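As a starting point, these alert conditions can be approximated with a query such as the following. The 100 GB capacity figure is hypothetical; replace it with your system's actual storage budget:

-- Flag slots retaining more than 50% of a hypothetical 100 GB budget, or inactive slots
SELECT
    slot_name,
    active,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS wal_retained
FROM pg_replication_slots
WHERE pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 50 * 1024^3
   OR NOT active;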

Mitigation and Prevention

Use the following mitigation measures as needed:

  • Drop abandoned slots: If a subscriber is no longer needed or can be recreated, drop the replication slot:
    SELECT pg_drop_replication_slot('slot_name');
  • Set finite retention limit: Configure max_slot_wal_keep_size to a value that balances replication stability with primary protection.
We recommend you follow these prevention measures:
  • Select the appropriate max_slot_wal_keep_size based on the following requirements:
    • For critical replication: Set a higher value or -1, but monitor disk usage closely.
    • For primary protection: Set a finite value and accept that replication might break during extended outages.
    • For re-creatable subscribers: Use lower values because the subscriber can be rebuilt if needed.
  • Implement slot monitoring: Set up automated alerts before slots cause storage issues.
  • Document slot ownership: Maintain a record of which application or team owns each replication slot.
  • Plan for network issues: Ensure adequate bandwidth and low latency between primary and replicas.

Temporary Storage Exhaustion

Temporary storage can be exhausted by excessive logging, inadequate memory settings causing spill files, or replication slot spill files. This section covers each scenario with monitoring and resolution guidance.

Excessive PostgreSQL Logging

PostgreSQL provides extensive logging parameters useful for debugging. However, aggressive logging settings can generate excessive disk I/O during peak traffic, rapidly filling the temporary storage volume.

Here are common problematic configurations:

  • log_statement = 'all': Logs every SQL statement, generating massive log volumes on busy systems.
  • log_min_duration_statement = 0: Logs every statement with its duration.
  • log_connections/log_disconnections = on: On high-traffic systems, generates entries for every connection.
  • Debug-level logging left enabled: Parameters such as log_executor_stats, log_parser_stats, log_planner_stats generate verbose output.

Symptoms of excessive PostgreSQL logging include:

  • Temporary storage usage increasing rapidly.
  • High disk I/O on the logging volume.

Mitigation and Prevention

Follow these guidelines to mitigate and prevent issues related to excessive logging:

  • Review and tune logging parameters based on your application requirements.
  • Disable verbose logging after debugging is complete.
  • Avoid leaving debug-level logging enabled in production as this adds unnecessary load to the database server.
  • If legitimate logging needs require more space, increase the node shape or vertically scale the database system.

The following table lists logging parameters and related information.

Logging Parameters

 Parameter                   | Description                                              | Problematic Value | Recommended Value
-----------------------------+----------------------------------------------------------+-------------------+-------------------
 log_statement               | Controls which SQL statements are logged                 | all               | none, ddl, or mod
 log_min_duration_statement  | Logs statements exceeding specified milliseconds         | 0                 | -1 or ≥1000
 log_connections             | Logs each successful connection                          | on (high traffic) | off
 log_disconnections          | Logs session terminations, including duration            | on (high traffic) | off
 log_autovacuum_min_duration | Logs autovacuum actions exceeding specified milliseconds | 0                 | -1 or ≥60000
 log_error_verbosity         | Controls detail level of error messages                  | verbose           | default
 log_executor_stats          | Logs executor performance statistics                     | on                | off
 log_parser_stats            | Logs parser performance statistics                       | on                | off
 log_planner_stats           | Logs planner performance statistics                      | on                | off
 log_statement_stats         | Logs total statement performance statistics              | on                | off
 log_duration                | Logs duration of every completed statement               | on (high traffic) | off

Inadequate Memory Settings (Temporary/Spill Files)

PostgreSQL uses additional memory beyond shared_buffers for query processing. While shared_buffers handles data caching and in-memory DML operations, work_mem is used for query operations such as sorting, joining, and aggregating.

When a query involves JOIN, AGGREGATE, or SORT operations, PostgreSQL first checks if the relevant columns have presorted indexes. If indexes are unavailable, PostgreSQL tries to perform these operations in memory.

If the work_mem setting is insufficient for the data being processed, PostgreSQL generates temporary files and uses disk instead of memory.

Here are common scenarios where inadequate memory settings can occur:

  • Complex analytical queries: Queries with several JOINs, large sorts, or aggregations on big datasets.
  • Default work_mem too low: The default value of 4 MB might be insufficient for complex queries.
  • High concurrency with memory-intensive queries: Many simultaneous queries each consuming work_mem.
  • Maintenance operations: Operations such as CREATE INDEX, REINDEX, CLUSTER, or REFRESH MATERIALIZED VIEW use maintenance_work_mem and can generate temporary files if undersized.

The following example shows a sample log entry when the disk gets full:

2025-12-03 10:55:25.650 UTC [249610] ERROR: could not write block 196800198 of temporary file: No space left on device

Symptoms of inadequate memory settings include:

  • Temporary storage filling up during complex query execution.
  • Slow query performance on JOIN, SORT, or AGGREGATE operations.
  • Increased disk I/O during query processing.
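A spill to disk is also visible in EXPLAIN (ANALYZE) output. In this sketch the table and column names are hypothetical; the point is the Sort Method line:

-- Hypothetical: sort a large table with the default work_mem of 4 MB
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders ORDER BY customer_id;

An undersized work_mem produces a plan line such as "Sort Method: external merge  Disk: 102400kB", whereas an in-memory sort reports "Sort Method: quicksort  Memory: ...".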

Monitoring and Alerts

Use the following query to identify temporary file usage per database:

SELECT 
    datname,
    temp_files,
    pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE temp_bytes > 0
ORDER BY temp_bytes DESC;

Here is an example of what is returned:

 datname  | temp_files | temp_size  
----------+------------+------------
 app      |       1523 | 45 GB      
 postgres |         12 | 120 MB

Note

These values are cumulative. As the database continues generating temporary files, these values increase over time.

Use the following query to check live temporary file usage:

SELECT * FROM pg_ls_tmpdir();

Here is an example of what is returned:

       name        |   size   |      modification      
-------------------+----------+------------------------
 pgsql_tmp249610.3 | 14000000 | 2025-12-03 10:20:16+00
 pgsql_tmp249696.0 | 12394496 | 2025-12-03 10:20:16+00
 pgsql_tmp249610.2 | 13860864 | 2025-12-03 10:20:16+00
 pgsql_tmp249697.0 | 12910592 | 2025-12-03 10:20:16+00

Enter the following commands to check your current memory settings:

SHOW work_mem;
SHOW hash_mem_multiplier;
SHOW maintenance_work_mem;

For instructions on setting up custom alarms, see Create PostgreSQL Custom OCI Monitoring Alarms and Dashboard using Container Instances.

We recommend you set up alerts if any of the following conditions occur:

  • TempStorageUsedPercent exceeds defined thresholds.
  • Temp file usage (from pg_ls_tmpdir()) shows sustained growth.

Mitigation and Prevention

Follow these practices to mitigate and prevent issues associated with inadequate memory settings:

  • Identify Queries Generating Temporary Files

    Enable logging to capture queries that generate temporary files:

    • Set log_temp_files = 0 to log all temporary file creation
    • Set log_min_error_statement = log to include the SQL query text

    The following is a sample temporary log file entry:

    2025-12-03 10:45:17.650 UTC [249610] LOG: temporary file: path "pg_tempdir/pgsql_tmp/pgsql_tmp249610.5", size 14000000
    
  • Optimize Queries

    Review the logged queries and optimize where possible. Query tuning should be the first approach before increasing memory settings. Consider adding appropriate indexes to avoid in-memory sorts.

  • Adjust work_mem (If Query Optimization Isn't Possible)

    If queries can't be improved, consider increasing work_mem. Use the following formula as a starting point:

    work_mem <= (Available Memory) / (max_connections + max_parallel_workers)

    Where Available Memory is the total system memory minus shared_buffers and oci.pagecache_size.

    Note

    In worst-case scenarios, each session might consume this amount of memory. If all sessions require work_mem simultaneously, it could exhaust available system memory.

    Aim to reduce temporary file generation rather than eliminate it entirely. Eliminating all temporary files might require a larger compute shape or could consume all available memory.

  • Adjust hash_mem_multiplier (For Hash Operations)

    If queries continue generating temporary files and involve hash operations (not SORT), increase hash_mem_multiplier:

    SHOW hash_mem_multiplier;
     hash_mem_multiplier 
    ---------------------
     1
    (1 row)
    

    Increasing this value allocates more memory for hash operations. The effective hash memory is work_mem × hash_mem_multiplier.

  • Adjust maintenance_work_mem for Maintenance Operations

    Regular application queries aren't the only source of temporary files. Maintenance operations also generate temporary files:

    • REFRESH MATERIALIZED VIEW
    • CREATE INDEX
    • CLUSTER
    • REINDEX

    These operations use maintenance_work_mem instead of work_mem:

    SHOW maintenance_work_mem;

    If maintenance operations run on large datasets and maintenance_work_mem is insufficient, temporary files are generated.
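As an illustration of the work_mem sizing formula described earlier, the following calculation assumes a hypothetical node with 64 GB of memory, 16 GB of shared_buffers, 4 GB of oci.pagecache_size, max_connections = 400, and max_parallel_workers = 8:

-- (64 GB - 16 GB - 4 GB) / (400 + 8) ≈ 110 MB upper bound per operation
SELECT pg_size_pretty((((64 - 16 - 4) * 1024^3) / (400 + 8))::bigint) AS work_mem_upper_bound;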

The following table lists memory parameters and related information.

Memory Parameters Reference

 Parameter            | Description                                                           | Risk if Misconfigured                 | Recommended Value
----------------------+-----------------------------------------------------------------------+---------------------------------------+--------------------------------------
 work_mem             | Base memory per sort/hash operation before spilling to disk           | Too high with high concurrency        | Lower for OLTP, higher for analytics
 hash_mem_multiplier  | Multiplier for hash operations: hash_mem = work_mem × this value      | Too high (≥8.0) with high concurrency | 2.0 (default)
 maintenance_work_mem | Memory for VACUUM, CREATE INDEX, REFRESH MATERIALIZED VIEW operations | Too high with many autovacuum workers | Balance with autovacuum_max_workers

Replication Slot Spill Files (Logical Replication)

PostgreSQL logical replication uses logical decoding to transform raw WAL records into meaningful, structured row-level changes. Unlike physical replication that copies byte-for-byte disk blocks, logical decoding interprets transactional data and converts it into logical changes (INSERTs, UPDATEs, DELETEs) with actual column values.

Each logical replication slot has an associated WAL sender process responsible for streaming changes to target subscribers. Transactions are sent in commit order, and the subscriber receives, decodes, and applies these changes to the target system.

PostgreSQL supports native logical replication where the subscriber is another PostgreSQL instance. Starting from PostgreSQL 14, subscribers can handle large ongoing transactions efficiently with streaming=on mode enabled:

  • In-progress transactions (not yet committed) are streamed incrementally to the subscriber.
  • The subscriber applies changes to a temporary/pending state immediately.
  • After the transaction commits on the publisher, the subscriber commits those changes.
  • If the transaction rolls back, the subscriber discards the pending changes.

This approach prevents the publisher from accumulating large amounts of decoded data in memory or spill files while waiting for long-running transactions to complete.
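For native PostgreSQL subscribers, streaming is enabled when the subscription is created (or later with ALTER SUBSCRIPTION). The subscription, connection, and publication names in this sketch are hypothetical:

-- On the subscriber: stream large in-progress transactions instead of
-- letting the publisher spill them to disk (PostgreSQL 14 and later)
CREATE SUBSCRIPTION app_sub
    CONNECTION 'host=publisher.example.com dbname=app user=replicator'
    PUBLICATION app_pub
    WITH (streaming = on);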

Third-party or federated subscribers can also consume from PostgreSQL logical replication slots using the PostgreSQL replication protocol. However, unlike native PostgreSQL subscribers, federated subscribers typically don't support streaming=on mode because they can't control how the underlying target system behaves.

When a subscriber doesn't support streaming=on, the PostgreSQL WAL sender keeps decoded changes for in-progress transactions in spill files at:

pg_replslot/<slot_name>/xid-<xid>-lsn-<lsn>.spill

These spill files can grow because of PostgreSQL's strict commit-order guarantee. For example:

  • Transaction TRX1 remains open for a long time (idle in transaction or performing bulk operations).
  • Other independent transactions (TRX2, TRX3, TRX4, and so forth) continue to commit.
  • The WAL sender accumulates committed transaction data in spill files.
  • These transactions are only sent to the subscriber after TRX1 issues COMMIT or ROLLBACK.

This is expected behavior when working with federated subscribers that don't support streaming=on mode.

The following example shows spill file details in the log (logged at DEBUG2 level):

2025-12-03 10:28:14.650 UTC [249610] DEBUG: spill 4096 changes in XID 750 to disk

If the disk gets full, the log entries change to the following:

2025-12-03 10:32:15.550 UTC [249610] ERROR: could not write to data file for XID <xid>: No space left on device
2025-12-03 10:32:15.554 UTC [249610] ERROR: could not read from reorderbuffer spill file: No space left on device

Symptoms of replication slot spill files include:

  • Temporary storage usage increases rapidly.
  • Many .spill files are present in the pg_replslot folder.
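On PostgreSQL 15 and later, you can list the spill files held by a slot directly from SQL. The slot name here matches the example output later in this section:

-- List files (including .spill files) in a replication slot's directory
SELECT name, pg_size_pretty(size) AS file_size, modification
FROM pg_ls_replslotdir('federated_sub_1')
ORDER BY size DESC;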

Monitoring and Alerts

Use the following query to identify spill file statistics for replication slots:

SELECT 
    slot_name,
    spill_txns,
    spill_count,
    pg_size_pretty(spill_bytes) AS spill_size,
    stream_txns,
    stream_count,
    pg_size_pretty(stream_bytes) AS stream_size
FROM pg_stat_replication_slots;

The following is an example of what is returned:

    slot_name     | spill_txns | spill_count | spill_size | stream_txns | stream_count | stream_size 
------------------+------------+-------------+------------+-------------+--------------+-------------
 federated_sub_1  |        847 |       15234 | 2458 MB    |           0 |            0 | 0 bytes

In this example output:

  • spill_size of 2458 MB indicates significant disk usage from spill files.
  • stream_size of 0 bytes confirms the subscriber is nonnative and doesn't support streaming mode.
  • The WAL sender process is holding or spilling transactions to disk.

Use the following command to check current logical_decoding_work_mem setting:

SHOW logical_decoding_work_mem;
 logical_decoding_work_mem 
---------------------------
 64MB

For instructions on setting up custom alarms, see Create PostgreSQL Custom OCI Monitoring Alarms and Dashboard using Container Instances.

We recommend you set up alerts if any of the following conditions occur:

  • spill_bytes per slot exceeds defined thresholds.
  • Spill file usage shows rapid growth.

Mitigation and Prevention

Spilling transactions is unavoidable when nonnative subscribers are involved. However, you can reduce the frequency and size of spill files by following these practices:

  • Handle large transactions carefully: Perform bulk operations in smaller batches.
  • Avoid long-running idle transactions: These block the reorder buffer and cause spill file accumulation.
  • Increase logical_decoding_work_mem: Delays spilling by allowing more decoded data in memory.
  • Set idle_in_transaction_session_timeout: Ends sessions that remain idle in transaction for too long.
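Long-running idle transactions can be spotted before they cause spill growth. The 10-minute threshold in this query is a hypothetical starting point:

-- Find sessions that have been idle in transaction for more than 10 minutes
SELECT
    pid,
    usename,
    state,
    now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - xact_start > interval '10 minutes'
ORDER BY xact_age DESC;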

The following table lists logical decoding parameters and related information.

Logical Decoding Parameters Reference

 Parameter                           | Description                                                                                                | Risk If Misconfigured                | Recommended Value
-------------------------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------
 logical_decoding_work_mem           | Memory threshold before spilling decoded changes to pg_replslot/<slot>/. Applies to each replication slot. | Too low with large/long transactions | 256 MB–1 GB
 idle_in_transaction_session_timeout | Ends idle-in-transaction sessions that block reorder buffer delivery                                       | 0 (disabled)                         | 10 min–30 min