Storage Services Issues

This section describes known issues and workarounds related to the functionality of the internal ZFS Storage Appliance and the different storage services: block volume storage, object storage, and file system storage.

Updating Terraform Changes File Storage Export Path

When you use Terraform to create a file system export, you must specify AUTOSELECT as the value of the path argument in the oci_file_storage_export resource definition.

You must also include the lifecycle stanza to ignore any updates to the path. If you do not ignore updates to the path, the path is automatically deleted and re-created whenever you update the Terraform configuration, even if you do not explicitly change the path. Re-creating the path can interrupt clients that have an active mount through the export.

Workaround: Set the path and include the lifecycle stanza as shown in the following example:

resource "oci_file_storage_export" "pcauserExport" {
  export_set_id  = local.Okit_MT_1702774958525ExportSet_id
  file_system_id = local.Okit_FS_1702774481898_id
  path           = "AUTOSELECT"
  lifecycle {
    ignore_changes = [
      path,
    ]
  }
}

Bug: 36116003

Version: 3.0.2

Creating Image from Instance Takes a Long Time

When you create a new compute image from an instance, its boot volume goes through a series of copy and conversion operations. In addition, the virtual disk copy is non-sparse, which means the full disk size is copied bit-for-bit. As a result, image creation time increases considerably with the size of the base instance's boot volume.

Workaround: Wait for the image creation job to complete. Check the work request status in the Compute Web UI, or use the work request ID to check its status in the CLI.
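
For example, assuming the work request ID was captured from the image creation response (the value below is a placeholder), its status can be checked with the OCI CLI:

# check the status of the image creation work request
oci work-requests work-request get --work-request-id <work_request_ocid>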

Bug: 33392755

Version: 3.0.1

Large Object Transfers Fail After ZFS Controller Failover

If a ZFS controller failover or failback occurs while a large file is being uploaded to or downloaded from an object storage bucket, the connection may be aborted, causing the data transfer to fail. Multipart uploads are affected in the same way. The issue occurs when you use a version of the OCI CLI that does not retry after a brief storage connection timeout; the retry functionality is available as of OCI CLI version 3.0.

Workaround: For a more reliable transfer of large objects and multipart uploads, use OCI CLI version 3.0 or newer.
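
To check which CLI version is installed, and to upgrade it if needed (assuming the CLI was installed with pip; other installation methods exist), you can run commands along these lines:

# display the installed OCI CLI version; 3.0 or newer includes the retry behavior
oci --version

# upgrade the CLI if it was installed through pip
pip install --upgrade oci-cli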

Bug: 33472317

Version: 3.0.1

Use Multipart Upload for Objects Larger than 100 MiB

Uploading very large files to object storage is susceptible to connection and performance issues. For maximum reliability of file transfers to object storage, use multipart uploads.

Workaround: Transfer files larger than 100 MiB to object storage using multipart uploads. This behavior is expected; it is not considered a bug.
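
As an illustration, the OCI CLI can upload an object in parts with an explicit part size and parallelism; the bucket name and file name below are placeholders:

# upload a large file in 128 MiB parts, with several parts in flight at once
oci os object put \
  --bucket-name my-bucket \
  --file ./large-image.qcow2 \
  --name large-image.qcow2 \
  --part-size 128 \
  --parallel-upload-count 5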

Bug: 33617535

Version: n/a

File System Export Temporarily Inaccessible After Large Export Options Update

When you update a file system export to add a large number of 'source'-type export options, the command returns a service error that suggests the export no longer exists ("code": "NotFound"). In reality, the export is only inaccessible until the configuration update has completed. If you try to access the export or display its stored information during that window, a similar error is displayed. This behavior is caused by the method used to update file system export options: the existing configuration is deleted and replaced with a new one containing the requested changes. It is only noticeable in the rare use case where dozens of export options are added at the same time.

Workaround: Wait for the update to complete and the file system export to become available again. The CLI command oci fs export get --export-id <fs_export_ocid> should return the information for the export in question.
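
One way to wait is to poll that command until it succeeds again, as in this minimal sketch (the export OCID is a placeholder):

# poll the export until the service returns its information again
until oci fs export get --export-id <fs_export_ocid> > /dev/null 2>&1; do
  sleep 10
done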

Bug: 33741386

Version: 3.0.1

Block Volume Stuck in Detaching State

Block volumes can be attached to several different compute instances, and can even have multiple attachments to the same instance. When simultaneous detach operations are run against the same volume, as can happen with automation tools, the processes may interfere with each other. For example, different work requests may try to update resources on the ZFS Storage Appliance simultaneously, resulting in stale data in a work request or in resource update conflicts on the appliance. When block volume detach operations fail in this manner, the block volume attachments in question may become stuck in the detaching state, even though the block volumes have already been detached from the instances.

Workaround: If you have instances with block volumes stuck in detaching state, the volumes have been detached, but further manual cleanup is required. The detaching state cannot be cleared, but the affected instances can be stopped and the block volumes can be deleted if that is the end goal.
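
If stopping the instance and deleting the volume is the intended outcome, the cleanup can be performed with the OCI CLI, for example (the OCIDs are placeholders):

# stop the instance that holds the stuck attachment
oci compute instance action --instance-id <instance_ocid> --action STOP

# delete the block volume once it is no longer needed
oci bv volume delete --volume-id <volume_ocid>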

Bug: 33750513

Version: 3.0.1

Fix available: Please apply the latest patches to your system.

Detaching Volume Using Terraform Fails Due To Timeout

When you use Terraform to detach a volume from an instance, the operation may fail with an error message indicating that the volume attachment was not destroyed and the volume remains in the attached state. This can occur when the storage service does not confirm that the volume was detached before Terraform stops polling the state of the volume attachment. The volume may still be detached successfully after Terraform has reported the error.

Workaround: Re-apply the Terraform configuration. If the error was the result of a timeout, the second run will be successful.
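
Before re-applying, you can optionally confirm whether the volume was in fact detached despite the error; the attachment OCID below is a placeholder:

# check the current lifecycle state of the volume attachment
oci compute volume-attachment get --volume-attachment-id <volume_attachment_ocid>

# re-run the same configuration; a timeout-related failure normally clears on the second run
terraform apply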

Bug: 35256335

Version: 3.0.2

Creating File System Export Fails Due To Timeout

When many file system operations are executed in parallel, timing becomes a critical factor and can lead to an occasional failure. More specifically, the creation of a file system export could time out because the file system is temporarily unavailable. The error returned in that case is: "Internal Server Error: No such filesystem to create the export on".

Workaround: Because this error is caused by a resource locking and timeout issue, the operation is expected to succeed when you execute it again. The error occurs only in rare cases.
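
As a sketch, automation that creates many exports could simply retry on this error; the OCIDs and export path below are placeholders:

# retry the export creation a few times if the file system is temporarily unavailable
for attempt in 1 2 3; do
  oci fs export create \
    --export-set-id <export_set_ocid> \
    --file-system-id <filesystem_ocid> \
    --path /myfs && break
  sleep 30
done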

Bug: 34778669

Version: 3.0.2

File System Access Lost When Another Export for Subset IP Range Is Deleted

A virtual cloud network (VCN) can contain only one file system mount target. All file systems made available to instances connected to the VCN must have exports defined within its mount target. File system exports can provide access to different file systems from overlapping subnets or IP address ranges. For example: filesys01 can be made available to IP range 10.25.4.0/23 and filesys02 to IP range 10.25.5.0/24. The latter IP range is a subset of the former. Due to the way the mount IP address is assigned, when you delete the export for filesys02, access to filesys01 is removed for the superset IP range as well.

Workaround: If your file system exports have overlapping source IP address ranges, and deleting one export causes access issues with another export, as in the example above, delete the affected exports and create them again within the VCN's mount target.
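
A hedged outline of that recovery, using placeholder OCIDs and an example path, could look like this:

# remove the export that lost access for the superset IP range
oci fs export delete --export-id <filesys01_export_ocid>

# re-create it within the mount target of the VCN
oci fs export create \
  --export-set-id <mount_target_export_set_ocid> \
  --file-system-id <filesys01_ocid> \
  --path /filesys01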

Bug: 33601987

Version: 3.0.2

File System Export UID/GID Cannot Be Modified

When creating a file system export, you can add extra NFS export options, such as access privileges for source IP addresses and identity squashing. Once you have set a user/group identity (UID/GID) squash value in the NFS export options, you can no longer modify that value. When you attempt to set a different ID, an error is returned: "Uid and Gid are not consistent with FS AnonId: <currentUID>".

Workaround: If you need to change the UID/GID mapping, delete the NFS export options and recreate them with the desired values. If you are using the OCI CLI, you must delete the entire file system export (not just the options) and recreate the export, specifying the desired values with the --export-options parameter.
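
For example, assuming the export is recreated with the OCI CLI (the OCIDs, path, and values below are placeholders; the exact JSON key names for --export-options can be generated with the CLI's --generate-param-json-input option):

# delete the entire export, not just the export options
oci fs export delete --export-id <fs_export_ocid>

# re-create the export with the desired squash settings
# (key names assumed to follow the camelCase API model)
oci fs export create \
  --export-set-id <export_set_ocid> \
  --file-system-id <filesystem_ocid> \
  --path /myfs \
  --export-options '[{"source": "10.25.4.0/23", "access": "READ_WRITE", "identitySquash": "ALL", "anonymousUid": 65534, "anonymousGid": 65534}]'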

Bug: 34877118

Version: 3.0.2

Block Volume Performance Level Not Preserved During Cloning

The block volumes provisioned on the ZFS Storage Appliance are located in either the standard or high-performance pool. The performance level is reflected in the properties of each block volume as volume performance units (VPU) per GB. However, when cloning a volume group or volume group backup, the performance level of all new block volumes produced by the clone operation is set to 0. CLI output will show the parameter "vpus-per-gb": 0 in the properties of the block volume clone.

Workaround: There is no workaround available. The block volume clones are placed in the correct storage pool, so their actual performance level is as intended; only the reported VPU value is incorrect.
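
If you want to confirm how a clone is reported, its properties can be inspected with the CLI (the OCID below is a placeholder):

# the clone's properties show "vpus-per-gb": 0 even though it resides in the intended pool
oci bv volume get --volume-id <cloned_volume_ocid>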

Bug: 35333587

Version: 3.0.2

Internal Backups for Instance Cloning Not Displayed

When you clone a compute instance, an internal backup of the boot and block volumes is created. In appliance software versions up to 3.0.2-b852928 those internal backups are visible to users. While not recommended, the backups could technically be used to create additional instances. Existing internal backups are not deleted during appliance upgrade or patching. However, in newer software versions the internal backups are no longer exposed.

Workaround: Do not create clones or new compute instances from the existing internal volume (group) backups. To remove old backups of storage volumes, ensure that all other backups and clones of the original source volume are terminated first.

Bug: 35406033

Version: 3.0.2

Limit for Volume Backups Not Enforced

The "Service Limits" chapter in the Oracle Private Cloud Appliance Release Notes specifies a limit of 100 volume backups per tenancy for a system with default storage capacity. This limit is not enforced: you can continue to create volume backups beyond the documented maximum.

Workaround: In theory, the maximum number of volume backups is limited by available storage on the ZFS Storage Appliance. The system is expected to handle thousands of volume backups across all tenancies. However, we recommend that an administrator monitor storage space consumption proactively if users create many volume backups.

Bug: 35509673

Version: 3.0.2

NFS Service Interruption During ZFS Storage Appliance Firmware Upgrade or Patching

When the ZFS Storage Appliance firmware is upgraded or patched, compute instances could experience an interruption of NFS connectivity. The service outage occurs when a failover or failback is performed between the storage appliance controllers, and it can take over 2 minutes to reestablish the NFS service. Multiple factors can contribute to the delay: the NFS server's 90-second grace period that allows NFSv4 clients to recover their locking state after an outage, the NFS protocol attempting to reconnect to the same TCP port, and the NFS client's kernel version.

Workaround: To reduce the NFS connectivity outage time, it is recommended to use the mount options described in the My Oracle Support note with Doc ID 359515.1. While that document describes optimizations for Oracle RAC and Oracle Clusterware, the mount options also improve NFS performance and stability in a Private Cloud Appliance environment.
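
The /etc/fstab entry below is only an illustration of the general shape of such mount options (hard mounts over TCP with a long retransmission timeout); take the authoritative option list for your client from the note itself:

# illustrative NFS mount entry; exact options should be taken from Doc ID 359515.1
<mount_target_ip>:/export/myfs  /mnt/myfs  nfs  rw,bg,hard,tcp,vers=3,timeo=600,rsize=32768,wsize=32768  0 0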

Bug: 36348165

Version: 3.0.2