OCI Database with PostgreSQL Storage Best Practices
Learn best practices to identify, monitor, and resolve storage-related issues when using OCI Database with PostgreSQL.
OCI Database with PostgreSQL is a fully managed service that handles storage automatically. Unlike traditional PostgreSQL that requires you to manually allocate disk space, OCI PostgreSQL uses database-optimized storage that scales independently from compute resources.
OCI Database with PostgreSQL is built on standard PostgreSQL and follows community guidelines for security and disk layout, ensuring a familiar experience for PostgreSQL users. For more information, see the following blog: Introducing OCI Database with PostgreSQL: Completing Our Cloud Database Suite for Every Need.
Storage Types
OCI Database with PostgreSQL uses the following primary storage categories with different characteristics and management requirements:
Auto-Scale Shared Storage
Auto-scale shared storage handles your database data and write-ahead log (WAL) files. This storage type provides high availability through replication across availability domains.
Here are the key characteristics of auto-scale shared storage:
- Dynamic scaling: Storage scales automatically as you create and drop database objects. No pre-provisioning is required and no downtime occurs during scaling.
- Decoupled architecture: Compute and storage are separated, allowing each to scale independently based on your needs.
- High availability: Data is replicated across availability domains, enabling fast failover with no data loss.
- Hard limit: Maximum capacity is 32 TB per database system.
Auto-scale shared storage stores the following types of content:
- Database data such as tables, indexes, and other database objects.
- WAL files for transaction durability and replication.
Temporary Storage
Temporary storage is allocated per node and handles operational files that don't require long-term persistence. This storage doesn't auto-scale and is sized based on your compute shape, according to the formula max(50 GB, 2 GB + numOCPU × 32 GB).
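As a quick illustration, the sizing formula can be sketched as follows; the OCPU counts are hypothetical examples, not service guidance:

```python
def temp_storage_gb(num_ocpus: int) -> int:
    """Per-node temporary storage: max(50 GB, 2 GB + numOCPU x 32 GB)."""
    return max(50, 2 + num_ocpus * 32)

# Small shapes hit the 50 GB floor; larger shapes scale with OCPU count.
print(temp_storage_gb(1))  # 50  (2 + 32 = 34 is below the floor)
print(temp_storage_gb(4))  # 130 (2 + 4 * 32)
```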
Here are the key characteristics of temporary storage:
- Doesn't auto-scale: To increase capacity, you must upgrade to a larger compute shape.
- Isolated from critical data: If temporary storage fills up, your database data and WAL remain unaffected.
Temporary storage stores the following types of content:
- Temporary/spill files from `pgsql_tmp`/`pg_tempdir` and `pg_replslot`, which are created when queries exceed available memory.
- PostgreSQL log files for monitoring and debugging.
- Replication slot spill files (for logical replication).
The remainder of this topic addresses best practices for handling storage exhaustion issues for these types of storage.
Shared Storage Exhaustion
Replication slots ensure that WAL files are retained until they've been consumed by subscribers. When a replication slot becomes inactive or falls behind, PostgreSQL continues accumulating WAL files, which can eventually exhaust shared storage.
Here are common scenarios where shared storage exhaustion can occur:
- Abandoned slots: A subscriber was decommissioned but the replication slot wasn't dropped.
- Network issues: High latency or connectivity problems between the primary and standby cause the standby to fall behind.
- Slow subscribers: Third-party CDC tools or federated subscribers can't keep pace with write activity.
- Long-running transactions: Idle-in-transaction sessions block the reorder buffer, causing WAL to accumulate.
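The long-running transaction scenario can often be spotted directly. The following query is a hedged sketch using the standard `pg_stat_activity` view to list sessions that are idle inside an open transaction:

```sql
-- List sessions idle inside an open transaction, oldest first.
SELECT pid,
       usename,
       state,
       xact_start,
       now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;
```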
The following table shows the key WAL retention parameters. WAL disk usage depends on these parameters combined with active or inactive replication slots.
| Parameter | Description |
|---|---|
| `max_slot_wal_keep_size` | WAL retention limit for replication slots (`-1` = unlimited, which risks filling the disk). For example, `max_slot_wal_keep_size = 20GB` allows each slot to retain up to 20 GB of WAL from its `restart_lsn` before the slot is invalidated. |
| `max_wal_size` | Maximum WAL size before forcing a checkpoint. For example, `max_wal_size = 5GB` forces a CHECKPOINT when WAL reaches 5 GB. |
| `wal_keep_size` | Minimum WAL to retain for standby safety (without replication slots). For example, `wal_keep_size = 10GB` always keeps 10 GB of WAL from the current LSN. |
The following table lists the benefits and risks of WAL configurations.
| Parameter Configuration | Benefit | Risk |
|---|---|---|
| `max_slot_wal_keep_size = -1` | Replication doesn't break because of WAL removal. | Primary disk can fill up. |
| `max_slot_wal_keep_size = 50GB` | Primary is protected from disk exhaustion. | Replication might break if the standby lags too much. |
Monitoring and Alerts
Use the following query to identify WAL retention by replication slots:
SELECT
slot_name,
active,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS wal_retained,
pg_size_pretty(safe_wal_size) AS safe_wal_remaining
FROM pg_replication_slots;
In the following example output, 15 GB of WAL is retained and only 5 GB more can be retained before the slot is invalidated (when max_slot_wal_keep_size is configured). If max_slot_wal_keep_size = -1, safe_wal_remaining shows NULL.
slot_name | active | wal_retained | safe_wal_remaining
-----------+--------+--------------+--------------------
replica_1 | t | 15 GB | 5 GB
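The `wal_retained` column above comes from LSN arithmetic: a PostgreSQL LSN such as `4/C0000000` is a 64-bit byte position printed as two hexadecimal halves, and `pg_wal_lsn_diff` subtracts those positions. A minimal sketch of the same calculation (the LSN values are hypothetical):

```python
def lsn_to_int(lsn: str) -> int:
    """Parse an LSN of the form 'high/low' (hex) into a 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def wal_retained_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Rough equivalent of pg_wal_lsn_diff(current_lsn, restart_lsn)."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)

# A slot whose restart_lsn trails the current write position by 15 GiB:
print(wal_retained_bytes("4/C0000000", "1/00000000"))  # 16106127360 (15 GiB)
```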
For instructions on setting up custom alarms, see Create PostgreSQL Custom OCI Monitoring Alarms and Dashboard using Container Instances.
We recommend you set up alerts if any of the following conditions occur:
- `wal_retained` exceeds 50% of available capacity.
- Any slot has `active = false` for more than one hour.
Mitigation and Prevention
Use the following mitigation measures as needed:
- Drop abandoned slots: If a subscriber is no longer needed or can be recreated, drop the replication slot:
  SELECT pg_drop_replication_slot('slot_name');
- Set a finite retention limit: Configure `max_slot_wal_keep_size` to a value that balances replication stability with primary protection.
- Select the appropriate `max_slot_wal_keep_size` based on the following requirements:
  - For critical replication: Set a higher value or -1, but monitor disk usage closely.
  - For primary protection: Set a finite value and accept that replication might break during extended outages.
  - For re-creatable subscribers: Use lower values because the subscriber can be rebuilt if needed.
- Implement slot monitoring: Set up automated alerts before slots cause storage issues.
- Document slot ownership: Maintain a record of which application or team owns each replication slot.
- Plan for network issues: Ensure adequate bandwidth and low latency between the primary and replicas.
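As a sketch of the retention-limit measure, on a self-managed PostgreSQL instance the cap could be applied as follows; on OCI Database with PostgreSQL, change the parameter through the service's database configuration instead, and treat the 50GB value as a hypothetical example:

```sql
-- Cap WAL retained by any replication slot at 50 GB; slots that fall
-- further behind are invalidated instead of filling the disk.
ALTER SYSTEM SET max_slot_wal_keep_size = '50GB';
SELECT pg_reload_conf();
```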
Temporary Storage Exhaustion
Temporary storage can be exhausted by excessive logging, inadequate memory settings causing spill files, or replication slot spill files. This section covers each scenario with monitoring and resolution guidance.
Excessive PostgreSQL Logging
PostgreSQL provides extensive logging parameters useful for debugging. However, aggressive logging settings can generate excessive disk I/O during peak traffic, rapidly filling the temporary storage volume.
Here are common problematic configurations:
- log_statement = 'all': Logs every SQL statement, generating massive log volumes on busy systems.
- log_min_duration_statement = 0: Logs every statement with its duration.
- log_connections/log_disconnections = on: On high-traffic systems, generates entries for every connection.
- Debug-level logging left enabled: Parameters such as `log_executor_stats`, `log_parser_stats`, and `log_planner_stats` generate verbose output.
Symptoms of excessive PostgreSQL logging include:
- Temporary storage usage increasing rapidly.
- High disk I/O on the logging volume.
Mitigation and Prevention
Follow these guidelines to mitigate and prevent issues related to excessive logging:
- Review and tune logging parameters based on your application requirements.
- Disable verbose logging after debugging is complete.
- Avoid leaving debug-level logging enabled in production as this adds unnecessary load to the database server.
- If legitimate logging needs require more space, increase the node shape or vertically scale the database system.
The following table lists logging parameters and related information.
| Parameter | Description | Problematic Value | Recommended Value |
|---|---|---|---|
| `log_statement` | Controls which SQL statements are logged | `all` | `none`, `ddl`, or `mod` |
| `log_min_duration_statement` | Logs statements exceeding the specified duration in milliseconds | `0` | `-1` or ≥1000 |
| `log_connections` | Logs each successful connection | `on` (high traffic) | `off` |
| `log_disconnections` | Logs session terminations, including duration | `on` (high traffic) | `off` |
| `log_autovacuum_min_duration` | Logs autovacuum actions exceeding the specified duration in milliseconds | `0` | `-1` or ≥60000 |
| `log_error_verbosity` | Controls the detail level of error messages | `verbose` | `default` |
| `log_executor_stats` | Logs executor performance statistics | `on` | `off` |
| `log_parser_stats` | Logs parser performance statistics | `on` | `off` |
| `log_planner_stats` | Logs planner performance statistics | `on` | `off` |
| `log_statement_stats` | Logs total statement performance statistics | `on` | `off` |
| `log_duration` | Logs the duration of every completed statement | `on` (high traffic) | `off` |
Inadequate Memory Settings (Temporary/Spill Files)
PostgreSQL uses additional memory beyond shared_buffers for query processing. While shared_buffers handles data caching and in-memory DML operations, work_mem is used for query operations such as sorting, joining, and aggregating.
When a query involves JOIN, AGGREGATE, or SORT operations, PostgreSQL first checks if the relevant columns have presorted indexes. If indexes are unavailable, PostgreSQL tries to perform these operations in memory.
If the work_mem setting is insufficient for the data being processed, PostgreSQL generates temporary files and uses disk instead of memory.
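You can see this spilling directly in a query plan. The sketch below assumes a hypothetical `orders` table; `EXPLAIN (ANALYZE)` reports whether a sort ran in memory or spilled to disk:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, sum(amount)
FROM orders
GROUP BY customer_id
ORDER BY customer_id;

-- In the plan output, a line such as
--   Sort Method: external merge  Disk: 104857kB
-- shows the sort spilled to temporary files, while
--   Sort Method: quicksort  Memory: 2048kB
-- shows it fit within work_mem.
```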
Here are common scenarios where inadequate memory settings can occur:
- Complex analytical queries: Queries with several JOINs, large sorts, or aggregations on big datasets.
- Default work_mem too low: The default value of 4 MB might be insufficient for complex queries.
- High concurrency with memory-intensive queries: Many simultaneous queries each consuming work_mem.
- Maintenance operations: Operations such as `CREATE INDEX`, `REINDEX`, `CLUSTER`, or `REFRESH MATERIALIZED VIEW` use `maintenance_work_mem` and can generate temporary files if it's undersized.
The following example shows a sample log entry when the disk gets full:
2025-12-03 10:55:25.650 UTC [249610] ERROR: could not write block 196800198 of temporary file: No space left on device
Symptoms of inadequate memory settings include:
- Temporary storage filling up during complex query execution.
- Slow query performance on JOIN, SORT, or AGGREGATE operations.
- Increased disk I/O during query processing.
Monitoring and Alerts
Use the following query to identify temporary file usage per database:
SELECT
datname,
temp_files,
pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE temp_bytes > 0
ORDER BY temp_bytes DESC;
Here is an example of what is returned:
datname | temp_files | temp_size
----------+------------+------------
app | 1523 | 45 GB
postgres | 12 | 120 MB
These values are cumulative. As the database continues generating temporary files, these values increase over time.
Use the following query to check live temporary file usage:
SELECT * FROM pg_ls_tmpdir();
Here is an example of what is returned:
name | size | modification
-------------------+----------+------------------------
pgsql_tmp249610.3 | 14000000 | 2025-12-03 10:20:16+00
pgsql_tmp249696.0 | 12394496 | 2025-12-03 10:20:16+00
pgsql_tmp249610.2 | 13860864 | 2025-12-03 10:20:16+00
pgsql_tmp249697.0 | 12910592 | 2025-12-03 10:20:16+00
Enter the following commands to check your current memory settings:
SHOW work_mem;
SHOW hash_mem_multiplier;
SHOW maintenance_work_mem;
For instructions on setting up custom alarms, see Create PostgreSQL Custom OCI Monitoring Alarms and Dashboard using Container Instances.
We recommend you set up alerts if any of the following conditions occur:
- `TempStorageUsedPercent` exceeds defined thresholds.
- Temporary file usage (from `pg_ls_tmpdir()`) shows sustained growth.
Mitigation and Prevention
Follow these practices to mitigate and prevent issues associated with inadequate memory settings:
- Identify queries generating temporary files: Enable logging to capture the queries that create temporary files:
  - Set `log_temp_files = 0` to log all temporary file creation.
  - Set `log_min_error_statement = log` to include the SQL query text.

  The following is a sample log entry for a temporary file:

  2025-12-03 10:45:17.650 UTC [249610] LOG: temporary file: path "pg_tempdir/pgsql_tmp/pgsql_tmp249610.5", size 14000000

- Optimize queries: Review the logged queries and optimize where possible. Query tuning should be the first approach before increasing memory settings. Consider adding appropriate indexes so that sorts can be avoided.
- Adjust `work_mem` (if query optimization isn't possible): If queries can't be improved, consider increasing `work_mem`. Use the following formula as a starting point:

  work_mem <= (Available Memory) / (max_connections + max_parallel_workers)

  where Available Memory is the total system memory minus `shared_buffers` and `oci.pagecache_size`.

  Note: In worst-case scenarios, each session might consume this amount of memory. If all sessions require their full `work_mem` simultaneously, available system memory could be exhausted. The goal is to reduce temporary file generation, not to eliminate it entirely; eliminating all temporary files might require a larger compute shape or consume all available memory.
- Adjust `hash_mem_multiplier` (for hash operations): If queries continue generating temporary files and involve hash operations rather than sorts, increase `hash_mem_multiplier`:

  SHOW hash_mem_multiplier;
   hash_mem_multiplier
  ---------------------
   1
  (1 row)

  Increasing this value allocates more memory for hash operations. The effective hash memory is `work_mem` × `hash_mem_multiplier`.
- Adjust `maintenance_work_mem` for maintenance operations: Regular application queries aren't the only source of temporary files. The following maintenance operations use `maintenance_work_mem` instead of `work_mem` and can also generate temporary files:
  - `REFRESH MATERIALIZED VIEW`
  - `CREATE INDEX`
  - `CLUSTER`
  - `REINDEX`

  Check the current setting with:

  SHOW maintenance_work_mem;

  If maintenance operations run on large datasets and `maintenance_work_mem` is insufficient, temporary files are generated.
The following table lists memory parameters and related information.
| Parameter | Description | Risk if Misconfigured | Recommended Value |
|---|---|---|---|
| `work_mem` | Base memory per sort/hash operation before spilling to disk | Too high with high concurrency | Lower for OLTP, higher for analytics |
| `hash_mem_multiplier` | Multiplier for hash operations: hash memory = `work_mem` × this value | Too high (≥8.0) with high concurrency | 2.0 (default) |
| `maintenance_work_mem` | Memory for `VACUUM`, `CREATE INDEX`, and `REFRESH MATERIALIZED VIEW` operations | Too high with many autovacuum workers | Balance with `autovacuum_max_workers` |
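The starting-point formula from the steps above can be sketched numerically. All figures below (a 64 GB node, 16 GB `shared_buffers`, 4 GB `oci.pagecache_size`, 400 connections, 8 parallel workers) are hypothetical:

```python
def suggest_work_mem_mb(total_mb: int, shared_buffers_mb: int,
                        pagecache_mb: int, max_connections: int,
                        max_parallel_workers: int) -> int:
    """work_mem <= available memory / (max_connections + max_parallel_workers),
    where available memory excludes shared_buffers and oci.pagecache_size."""
    available = total_mb - shared_buffers_mb - pagecache_mb
    return available // (max_connections + max_parallel_workers)

work_mem = suggest_work_mem_mb(65536, 16384, 4096, 400, 8)
print(work_mem)        # 110 MB per sort/hash operation
print(work_mem * 2.0)  # 220.0 MB effective for hashes (hash_mem_multiplier = 2.0)
```

This is a worst-case budget: it only holds if every session runs one memory-hungry operation at once, so treat the result as a ceiling rather than a target.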
Replication Slot Spill Files (Logical Replication)
PostgreSQL logical replication uses logical decoding to transform raw WAL records into meaningful, structured row-level changes. Unlike physical replication that copies byte-for-byte disk blocks, logical decoding interprets transactional data and converts it into logical changes (INSERTs, UPDATEs, DELETEs) with actual column values.
Each logical replication slot has an associated WAL sender process responsible for streaming changes to target subscribers. Transactions are sent in commit order, and the subscriber receives, decodes, and applies these changes to the target system.
PostgreSQL supports native logical replication where the subscriber is another PostgreSQL instance. Starting from PostgreSQL 14, subscribers can handle large ongoing transactions efficiently with streaming=on mode enabled:
- In-progress transactions (not yet committed) are streamed incrementally to the subscriber.
- The subscriber applies changes to a temporary/pending state immediately.
- After the transaction commits on the publisher, the subscriber commits those changes.
- If the transaction rolls back, the subscriber discards the pending changes.
This approach prevents the publisher from accumulating large amounts of decoded data in memory or spill files while waiting for long-running transactions to complete.
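For native PostgreSQL subscribers, this mode is set on the subscription. The following is a hedged sketch; the subscription name, publication name, and connection string are hypothetical:

```sql
-- Create a subscription that streams large in-progress transactions
-- (requires PostgreSQL 14 or later).
CREATE SUBSCRIPTION app_sub
    CONNECTION 'host=publisher.example.com dbname=app user=replicator'
    PUBLICATION app_pub
    WITH (streaming = on);

-- Or enable it on an existing subscription:
ALTER SUBSCRIPTION app_sub SET (streaming = on);
```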
Third-party or federated subscribers can also consume from PostgreSQL logical replication slots using the PostgreSQL replication protocol. However, unlike native PostgreSQL subscribers, federated subscribers typically don't support streaming=on mode because they can't control how the underlying target system behaves.
When a subscriber doesn't support streaming=on, the PostgreSQL WAL sender keeps decoded changes for in-progress transactions in spill files at:
pg_replslot/<slot_name>/xid-<xid>-lsn-<lsn>.spill
These spill files can grow because of PostgreSQL's strict commit-order guarantee. For example:
- Transaction TRX1 remains open for a long time (idle in transaction or performing bulk operations).
- Other independent transactions (TRX2, TRX3, TRX4, and so forth) continue to commit.
- The WAL sender accumulates committed transaction data in spill files.
- These transactions are only sent to the subscriber after TRX1 issues COMMIT or ROLLBACK.
This is expected behavior when working with federated subscribers that don't support streaming=on mode.
The following example shows a spill file entry in the log (logged at the DEBUG2 level):
2025-12-03 10:28:14.650 UTC [249610] DEBUG: spill 4096 changes in XID 750 to disk
If the disk gets full, the log entries change to the following:
2025-12-03 10:32:15.550 UTC [249610] ERROR: could not write to data file for XID <xid>: No space left on device
2025-12-03 10:32:15.554 UTC [249610] ERROR: could not read from reorderbuffer spill file: No space left on device
Symptoms of replication slot spill files include:
- Temporary storage usage increases rapidly.
- Many `.spill` files are present in the `pg_replslot` folder.
Monitoring and Alerts
Use the following query to identify spill file statistics for replication slots:
SELECT
slot_name,
spill_txns,
spill_count,
pg_size_pretty(spill_bytes) AS spill_size,
stream_txns,
stream_count,
pg_size_pretty(stream_bytes) AS stream_size
FROM pg_stat_replication_slots;
In the following example output:

slot_name | spill_txns | spill_count | spill_size | stream_txns | stream_count | stream_size
------------------+------------+-------------+------------+-------------+--------------+-------------
federated_sub_1 | 847 | 15234 | 2458 MB | 0 | 0 | 0 bytes

- `spill_size` of 2458 MB indicates significant disk usage from spill files.
- `stream_size` of 0 bytes confirms the subscriber is nonnative and doesn't support streaming mode.
- The WAL sender process is holding or spilling transactions to disk.
Use the following command to check the current `logical_decoding_work_mem` setting:
SHOW logical_decoding_work_mem;
logical_decoding_work_mem
---------------------------
64MB
For instructions on setting up custom alarms, see Create PostgreSQL Custom OCI Monitoring Alarms and Dashboard using Container Instances.
We recommend you set up alerts if any of the following conditions occur:
- `spill_bytes` per slot exceeds defined thresholds.
- Spill file usage shows rapid growth.
Mitigation and Prevention
Spilling transactions is unavoidable when nonnative subscribers are involved. However, you can reduce the frequency and size of spill files by following these practices:
- Handle large transactions carefully: Perform bulk operations in smaller batches.
- Avoid long-running idle transactions: These block the reorder buffer and cause spill file accumulation.
- Increase logical_decoding_work_mem: Delays spilling by allowing more decoded data in memory.
- Set idle_in_transaction_session_timeout: Ends sessions that remain idle in transaction for too long.
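On a self-managed instance, the last two measures could be sketched as follows; on OCI Database with PostgreSQL, apply them through the service's database configuration, and treat the values as hypothetical starting points:

```sql
-- Allow more decoded changes to stay in memory before spilling to disk.
ALTER SYSTEM SET logical_decoding_work_mem = '256MB';
-- End sessions that sit idle inside a transaction for more than 15 minutes.
ALTER SYSTEM SET idle_in_transaction_session_timeout = '15min';
SELECT pg_reload_conf();
```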
The following table lists logical decoding parameters and related information.
| Parameter | Description | Risk If Misconfigured | Recommended Value |
|---|---|---|---|
| `logical_decoding_work_mem` | Memory threshold before spilling decoded changes to `pg_replslot/<slot>/`. This value applies to each logical replication slot. | Too low with large or long transactions | 256 MB–1 GB |
| `idle_in_transaction_session_timeout` | Ends idle-in-transaction sessions that block reorder buffer delivery | 0 (disabled) | 10–30 min |