OCI Database with PostgreSQL Storage Best Practices

Learn best practices to identify, monitor, and resolve storage-related issues when using OCI Database with PostgreSQL.

OCI Database with PostgreSQL is a fully managed service that handles storage automatically. Unlike traditional PostgreSQL that requires you to manually allocate disk space, OCI PostgreSQL uses database-optimized storage that scales independently from compute resources.

OCI Database with PostgreSQL is built on standard PostgreSQL and follows community guidelines for security and disk layout, ensuring a familiar experience for PostgreSQL users. For more information, see the following blog: Introducing OCI Database with PostgreSQL: Completing Our Cloud Database Suite for Every Need.

Storage Types

OCI Database with PostgreSQL uses the following primary storage categories with different characteristics and management requirements:

Auto-Scale Shared Storage

Auto-scale shared storage handles your database data and write-ahead log (WAL) files. This storage type provides high availability through replication across availability domains.

Here are the key characteristics of auto-scale shared storage:

  • Dynamic scaling: Storage scales automatically as you create and drop database objects. No pre-provisioning is required and no downtime occurs during scaling.
  • Decoupled architecture: Compute and storage are separated, allowing each to scale independently based on your needs.
  • High availability: Data is replicated across availability domains, enabling fast failover with no data loss.
  • Hard limit: Maximum capacity is 32 TB per database system.

Auto-scale shared storage stores the following types of content:

  • Database data such as tables, indexes, and other database objects.
  • WAL files for transaction durability and replication.

Temporary Storage

Temporary storage is allocated per node and handles operational files that don't require long-term persistence. This storage doesn't auto-scale and is sized based on your compute shape, using the formula Max(50 GB, 2 GB + numOCPU × 32 GB).
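For example, on a hypothetical 8-OCPU shape the allocation works out as follows (the arithmetic can be evaluated directly in SQL; the OCPU count is illustrative):

-- Max(50 GB, 2 GB + 8 × 32 GB) = 258 GB for a hypothetical 8-OCPU node
SELECT GREATEST(50, 2 + 8 * 32) AS temp_storage_gb;

A 1-OCPU node, by contrast, would compute 2 + 32 = 34 GB and fall back to the 50 GB floor.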

Here are the key characteristics of temporary storage:

  • Doesn't auto-scale: To increase capacity, you must upgrade to a larger compute shape.
  • Isolated from critical data: If temporary storage fills up, your database data and WAL remain unaffected.

Temporary storage stores the following types of content:

  • Temporary (spill) files in pgsql_tmp/pg_tempdir, which are created when queries exceed available memory.
  • PostgreSQL log files for monitoring and debugging.
  • Replication slot spill files (for logical replication).

The remainder of this topic addresses best practices for handling storage exhaustion issues for these types of storage.

Shared Storage Exhaustion

Replication slots ensure that WAL files are retained until they've been consumed by subscribers. When a replication slot becomes inactive or falls behind, PostgreSQL continues accumulating WAL files, which can eventually exhaust shared storage.

Here are common scenarios where shared storage exhaustion can occur:

  • Abandoned slots: A subscriber was decommissioned but the replication slot wasn't dropped
  • Network issues: High latency or connectivity problems between primary and standby cause the standby to fall behind
  • Slow subscribers: Third-party CDC tools or federated subscribers that can't keep pace with write activity
  • Long-running transactions: Idle-in-transaction sessions block the reorder buffer, causing WAL to accumulate

The following are the key WAL retention parameters. WAL disk usage depends on these parameters combined with active or inactive replication slots.

WAL Retention Parameters

  • max_slot_wal_keep_size: WAL retention limit for replication slots (-1 = unlimited, which risks filling the disk). For example, max_slot_wal_keep_size = 20GB lets each slot retain up to 20 GB of WAL behind its restart_lsn; slots that fall further behind are invalidated.
  • max_wal_size: Soft limit on WAL size before a checkpoint is triggered. For example, max_wal_size = 5GB triggers a checkpoint when WAL reaches 5 GB.
  • wal_keep_size: Minimum amount of WAL to retain for standby safety without replication slots. For example, wal_keep_size = 10GB always keeps 10 GB of WAL behind the current LSN.
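You can check the current values of these parameters with SHOW, following the same pattern used elsewhere in this topic:

SHOW max_slot_wal_keep_size;
SHOW max_wal_size;
SHOW wal_keep_size;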

The following table lists the benefits and risks of WAL configurations.

Parameter Configuration Benefits and Risks

 Configuration                 | Benefit                                           | Risk
-------------------------------+---------------------------------------------------+---------------------------------------------------
 max_slot_wal_keep_size = -1   | Replication doesn't break because of WAL removal. | Primary disk can fill up.
 max_slot_wal_keep_size = 50GB | Primary is protected from disk exhaustion.        | Replication might break if standby lags too much.

Monitoring and Alerts

Use the following query to identify WAL retention by replication slots:

SELECT 
    slot_name,
    active,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS wal_retained,
    pg_size_pretty(safe_wal_size) AS safe_wal_remaining
FROM pg_replication_slots;

In the following example output, 15 GB of WAL is retained, and only 5 GB more can be retained before the slot is invalidated (when max_slot_wal_keep_size is configured). If max_slot_wal_keep_size = -1, safe_wal_remaining shows NULL.

 slot_name | active | wal_retained | safe_wal_remaining
-----------+--------+--------------+--------------------
 replica_1 | t      | 15 GB        | 5 GB

For instructions on setting up custom alarms, see Create PostgreSQL Custom OCI Monitoring Alarms and Dashboard using Container Instances.

We recommend you set up alerts if any of the following conditions occur:

  • wal_retained exceeds 50% of available capacity.
  • Any slot has active = false for more than one hour.
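As a starting point, these alert conditions can be approximated with a query such as the following. The 100 GB capacity figure is hypothetical; replace it with your system's actual storage budget:

-- Flag slots retaining more than 50% of a hypothetical 100 GB budget, or inactive slots
SELECT
    slot_name,
    active,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS wal_retained
FROM pg_replication_slots
WHERE pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 50 * 1024^3
   OR NOT active;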

Mitigation and Prevention

Use the following mitigation measures as needed:

  • Drop abandoned slots: If a subscriber is no longer needed or can be recreated, drop the replication slot:
    SELECT pg_drop_replication_slot('slot_name');
  • Set finite retention limit: Configure max_slot_wal_keep_size to a value that balances replication stability with primary protection.
We recommend you follow these prevention measures:
  • Select the appropriate max_slot_wal_keep_size based on the following requirements:
    • For critical replication: Set a higher value or -1, but monitor disk usage closely.
    • For primary protection: Set a finite value and accept that replication might break during extended outages.
    • For re-creatable subscribers: Use lower values because the subscriber can be rebuilt if needed.
  • Implement slot monitoring: Set up automated alerts before slots cause storage issues.
  • Document slot ownership: Maintain a record of which application or team owns each replication slot.
  • Plan for network issues: Ensure adequate bandwidth and low latency between primary and replicas.

Temporary Storage Exhaustion

Temporary storage can be exhausted by excessive logging, inadequate memory settings causing spill files, or replication slot spill files. This section covers each scenario with monitoring and resolution guidance.

Excessive PostgreSQL Logging

PostgreSQL provides extensive logging parameters useful for debugging. However, aggressive logging settings can generate excessive disk I/O during peak traffic, rapidly filling the temporary storage volume.

Here are common problematic configurations:

  • log_statement = 'all': Logs every SQL statement, generating massive log volumes on busy systems.
  • log_min_duration_statement = 0: Logs every statement with its duration.
  • log_connections/log_disconnections = on: On high-traffic systems, generates entries for every connection.
  • Debug-level logging left enabled: Parameters such as log_executor_stats, log_parser_stats, log_planner_stats generate verbose output.

Symptoms of excessive PostgreSQL logging include:

  • Temporary storage usage increasing rapidly.
  • High disk I/O on the logging volume.

Mitigation and Prevention

Follow these guidelines to mitigate and prevent issues related to excessive logging:

  • Review and tune logging parameters based on your application requirements.
  • Disable verbose logging after debugging is complete.
  • Avoid leaving debug-level logging enabled in production as this adds unnecessary load to the database server.
  • If legitimate logging needs require more space, increase the node shape or vertically scale the database system.

The following table lists logging parameters and related information.

Logging Parameters

 Parameter                   | Description                                              | Problematic Value | Recommended Value
-----------------------------+----------------------------------------------------------+-------------------+-------------------
 log_statement               | Controls which SQL statements are logged                 | all               | none, ddl, or mod
 log_min_duration_statement  | Logs statements exceeding specified milliseconds         | 0                 | -1 or ≥1000
 log_connections             | Logs each successful connection                          | on (high traffic) | off
 log_disconnections          | Logs session terminations, including duration            | on (high traffic) | off
 log_autovacuum_min_duration | Logs autovacuum actions exceeding specified milliseconds | 0                 | -1 or ≥60000
 log_error_verbosity         | Controls detail level of error messages                  | verbose           | default
 log_executor_stats          | Logs executor performance statistics                     | on                | off
 log_parser_stats            | Logs parser performance statistics                       | on                | off
 log_planner_stats           | Logs planner performance statistics                      | on                | off
 log_statement_stats         | Logs total statement performance statistics              | on                | off
 log_duration                | Logs duration of every completed statement               | on (high traffic) | off

Inadequate Memory Settings (Temporary/Spill Files)

PostgreSQL uses additional memory beyond shared_buffers for query processing. While shared_buffers handles data caching and in-memory DML operations, work_mem is used for query operations such as sorting, joining, and aggregating.

When a query involves JOIN, AGGREGATE, or SORT operations, PostgreSQL first checks if the relevant columns have presorted indexes. If indexes are unavailable, PostgreSQL tries to perform these operations in memory.

If the work_mem setting is insufficient for the data being processed, PostgreSQL generates temporary files and uses disk instead of memory.

Here are common scenarios where inadequate memory settings can occur:

  • Complex analytical queries: Queries with several JOINs, large sorts, or aggregations on big datasets.
  • Default work_mem too low: The default value of 4 MB might be insufficient for complex queries.
  • High concurrency with memory-intensive queries: Many simultaneous queries each consuming work_mem.
  • Maintenance operations: Operations such as CREATE INDEX, REINDEX, CLUSTER, or REFRESH MATERIALIZED VIEW use maintenance_work_mem and can generate temporary files if undersized.

The following example shows a sample log entry when the disk gets full:

2025-12-03 10:55:25.650 UTC [249610] ERROR: could not write block 196800198 of temporary file: No space left on device

Symptoms of inadequate memory settings include:

  • Temporary storage filling up during complex query execution.
  • Slow query performance on JOIN, SORT, or AGGREGATE operations.
  • Increased disk I/O during query processing.
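A spill to disk is also visible in EXPLAIN (ANALYZE) output. In this sketch the table and column names are hypothetical; the point is the Sort Method line:

-- Hypothetical: sort a large table with the default work_mem of 4 MB
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders ORDER BY customer_id;

An undersized work_mem produces a plan line such as "Sort Method: external merge  Disk: 102400kB", whereas an in-memory sort reports "Sort Method: quicksort  Memory: ...".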

Monitoring and Alerts

Use the following query to identify temporary file usage per database:

SELECT 
    datname,
    temp_files,
    pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE temp_bytes > 0
ORDER BY temp_bytes DESC;

Here is an example of what is returned:

 datname  | temp_files | temp_size  
----------+------------+------------
 app      |       1523 | 45 GB      
 postgres |         12 | 120 MB

Note

These values are cumulative. As the database continues generating temporary files, these values increase over time.

Use the following query to check live temporary file usage:

SELECT * FROM pg_ls_tmpdir();

Here is an example of what is returned:

       name        |   size   |      modification      
-------------------+----------+------------------------
 pgsql_tmp249610.3 | 14000000 | 2025-12-03 10:20:16+00
 pgsql_tmp249696.0 | 12394496 | 2025-12-03 10:20:16+00
 pgsql_tmp249610.2 | 13860864 | 2025-12-03 10:20:16+00
 pgsql_tmp249697.0 | 12910592 | 2025-12-03 10:20:16+00

Enter the following commands to check your current memory settings:

SHOW work_mem;
SHOW hash_mem_multiplier;
SHOW maintenance_work_mem;

For instructions on setting up custom alarms, see Create PostgreSQL Custom OCI Monitoring Alarms and Dashboard using Container Instances.

We recommend you set up alerts if any of the following conditions occur:

  • TempStorageUsedPercent exceeds defined thresholds.
  • Temp file usage (from pg_ls_tmpdir()) shows sustained growth.

Mitigation and Prevention

Follow these practices to mitigate and prevent issues associated with inadequate memory settings:

  • Identify Queries Generating Temporary Files

    Enable logging to capture queries that generate temporary files:

    • Set log_temp_files = 0 to log all temporary file creation
    • Set log_min_error_statement = log to include the SQL query text

    The following is a sample temporary log file entry:

    2025-12-03 10:45:17.650 UTC [249610] LOG: temporary file: path "pg_tempdir/pgsql_tmp/pgsql_tmp249610.5", size 14000000
    
  • Optimize Queries

    Review the logged queries and optimize where possible. Query tuning should be the first approach before increasing memory settings. Consider adding appropriate indexes to avoid in-memory sorts.

  • Adjust work_mem (If Query Optimization Isn't Possible)

    If queries can't be improved, consider increasing work_mem. Use the following formula as a starting point:

    work_mem <= (Available Memory) / (max_connections + max_parallel_workers)

    Where Available Memory is the total system memory minus shared_buffers and oci.pagecache_size.

    Note

    In worst-case scenarios, each session might consume this amount of memory. If all sessions require work_mem simultaneously, it could exhaust available system memory.

    Aim to reduce temporary file generation rather than eliminate it entirely. Eliminating all temporary files might require a larger compute shape or could consume all available memory.

  • Adjust hash_mem_multiplier (For Hash Operations)

    If queries continue generating temporary files and involve hash operations (not SORT), increase hash_mem_multiplier:

    SHOW hash_mem_multiplier;
     hash_mem_multiplier 
    ---------------------
     1
    (1 row)
    

    Increasing this value allocates more memory for hash operations. The effective hash memory is work_mem × hash_mem_multiplier.

  • Adjust maintenance_work_mem for Maintenance Operations

    Regular application queries aren't the only source of temporary files. Maintenance operations also generate temporary files:

    • REFRESH MATERIALIZED VIEW
    • CREATE INDEX
    • CLUSTER
    • REINDEX

    These operations use maintenance_work_mem instead of work_mem:

    SHOW maintenance_work_mem;

    If maintenance operations run on large datasets and maintenance_work_mem is insufficient, temporary files are generated.
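As an illustration of the work_mem sizing formula described earlier, the following calculation assumes a hypothetical node with 64 GB of memory, 16 GB of shared_buffers, 4 GB of oci.pagecache_size, max_connections = 400, and max_parallel_workers = 8:

-- (64 GB - 16 GB - 4 GB) / (400 + 8) ≈ 110 MB upper bound per operation
SELECT pg_size_pretty((((64 - 16 - 4) * 1024^3) / (400 + 8))::bigint) AS work_mem_upper_bound;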

The following table lists memory parameters and related information.

Memory Parameters Reference

 Parameter            | Description                                                           | Risk if Misconfigured                 | Recommended Value
----------------------+-----------------------------------------------------------------------+---------------------------------------+--------------------------------------
 work_mem             | Base memory per sort/hash operation before spilling to disk           | Too high with high concurrency        | Lower for OLTP, higher for analytics
 hash_mem_multiplier  | Multiplier for hash operations: hash_mem = work_mem × this value      | Too high (≥8.0) with high concurrency | 2.0 (default)
 maintenance_work_mem | Memory for VACUUM, CREATE INDEX, REFRESH MATERIALIZED VIEW operations | Too high with many autovacuum workers | Balance with autovacuum_max_workers

Replication Slot Spill Files (Logical Replication)

PostgreSQL logical replication uses logical decoding to transform raw WAL records into meaningful, structured row-level changes. Unlike physical replication that copies byte-for-byte disk blocks, logical decoding interprets transactional data and converts it into logical changes (INSERTs, UPDATEs, DELETEs) with actual column values.

Each logical replication slot has an associated WAL sender process responsible for streaming changes to target subscribers. Transactions are sent in commit order, and the subscriber receives, decodes, and applies these changes to the target system.

PostgreSQL supports native logical replication where the subscriber is another PostgreSQL instance. Starting from PostgreSQL 14, subscribers can handle large ongoing transactions efficiently with streaming=on mode enabled:

  • In-progress transactions (not yet committed) are streamed incrementally to the subscriber.
  • The subscriber applies changes to a temporary/pending state immediately.
  • After the transaction commits on the publisher, the subscriber commits those changes.
  • If the transaction rolls back, the subscriber discards the pending changes.

This approach prevents the publisher from accumulating large amounts of decoded data in memory or spill files while waiting for long-running transactions to complete.
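For native PostgreSQL subscribers, streaming is enabled when the subscription is created (or later with ALTER SUBSCRIPTION). The subscription, connection, and publication names in this sketch are hypothetical:

-- On the subscriber: stream large in-progress transactions instead of
-- letting the publisher spill them to disk (PostgreSQL 14 and later)
CREATE SUBSCRIPTION app_sub
    CONNECTION 'host=publisher.example.com dbname=app user=replicator'
    PUBLICATION app_pub
    WITH (streaming = on);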

Third-party or federated subscribers can also consume from PostgreSQL logical replication slots using the PostgreSQL replication protocol. However, unlike native PostgreSQL subscribers, federated subscribers typically don't support streaming=on mode because they can't control how the underlying target system behaves.

When a subscriber doesn't support streaming=on, the PostgreSQL WAL sender keeps decoded changes for in-progress transactions in spill files at:

pg_replslot/<slot_name>/xid-<xid>-lsn-<lsn>.spill

These spill files can grow because of PostgreSQL's strict commit-order guarantee. For example:

  • Transaction TRX1 remains open for a long time (idle in transaction or performing bulk operations).
  • Other independent transactions (TRX2, TRX3, TRX4, and so forth) continue to commit.
  • The WAL sender accumulates committed transaction data in spill files.
  • These transactions are only sent to the subscriber after TRX1 issues COMMIT or ROLLBACK.

This is expected behavior when working with federated subscribers that don't support streaming=on mode.

The following example shows spill file details in the log (logged at DEBUG2 level):

2025-12-03 10:28:14.650 UTC [249610] DEBUG: spill 4096 changes in XID 750 to disk

If the disk gets full, the log entries change to the following:

2025-12-03 10:32:15.550 UTC [249610] ERROR: could not write to data file for XID <xid>: No space left on device
2025-12-03 10:32:15.554 UTC [249610] ERROR: could not read from reorderbuffer spill file: No space left on device

Symptoms of replication slot spill files include:

  • Temporary storage usage increases rapidly.
  • Many .spill files are present in the pg_replslot folder.
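On PostgreSQL 15 and later, you can list the spill files held by a slot directly from SQL. The slot name here matches the example output later in this section:

-- List files (including .spill files) in a replication slot's directory
SELECT name, pg_size_pretty(size) AS file_size, modification
FROM pg_ls_replslotdir('federated_sub_1')
ORDER BY size DESC;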

Monitoring and Alerts

Use the following query to identify spill file statistics for replication slots:

SELECT 
    slot_name,
    spill_txns,
    spill_count,
    pg_size_pretty(spill_bytes) AS spill_size,
    stream_txns,
    stream_count,
    pg_size_pretty(stream_bytes) AS stream_size
FROM pg_stat_replication_slots;

The following is an example of what is returned:

    slot_name     | spill_txns | spill_count | spill_size | stream_txns | stream_count | stream_size 
------------------+------------+-------------+------------+-------------+--------------+-------------
 federated_sub_1  |        847 |       15234 | 2458 MB    |           0 |            0 | 0 bytes

In this example output:

  • spill_size of 2458 MB indicates significant disk usage from spill files.
  • stream_size of 0 bytes confirms the subscriber is nonnative and doesn't support streaming mode.
  • The WAL sender process is holding or spilling transactions to disk.

Use the following command to check current logical_decoding_work_mem setting:

SHOW logical_decoding_work_mem;
 logical_decoding_work_mem 
---------------------------
 64MB

For instructions on setting up custom alarms, see Create PostgreSQL Custom OCI Monitoring Alarms and Dashboard using Container Instances.

We recommend you set up alerts if any of the following conditions occur:

  • spill_bytes per slot exceeds defined thresholds.
  • Spill file usage shows rapid growth.

Mitigation and Prevention

Spilling transactions is unavoidable when nonnative subscribers are involved. However, you can reduce the frequency and size of spill files by following these practices:

  • Handle large transactions carefully: Perform bulk operations in smaller batches.
  • Avoid long-running idle transactions: These block the reorder buffer and cause spill file accumulation.
  • Increase logical_decoding_work_mem: Delays spilling by allowing more decoded data in memory.
  • Set idle_in_transaction_session_timeout: Ends sessions that remain idle in transaction for too long.
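Long-running idle transactions can be spotted before they cause spill growth. The 10-minute threshold in this query is a hypothetical starting point:

-- Find sessions that have been idle in transaction for more than 10 minutes
SELECT
    pid,
    usename,
    state,
    now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - xact_start > interval '10 minutes'
ORDER BY xact_age DESC;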

The following table lists logical decoding parameters and related information.

Logical Decoding Parameters Reference

 Parameter                           | Description                                                                                                | Risk If Misconfigured                | Recommended Value
-------------------------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------
 logical_decoding_work_mem           | Memory threshold before spilling decoded changes to pg_replslot/<slot>/. Applies to each replication slot. | Too low with large/long transactions | 256 MB–1 GB
 idle_in_transaction_session_timeout | Ends idle-in-transaction sessions that block reorder buffer delivery                                       | 0 (disabled)                         | 10 min–30 min