OCIFS

Installation

Install using pip:

python3 -m pip install ocifs

Overview

ocifs is a Pythonic filesystem interface to Oracle Cloud Infrastructure (OCI) Object Storage. It builds on oci, the OCI Python SDK.

The top-level class OCIFileSystem holds connection information and allows typical file-system style operations like cp, mv, ls, du, glob, as well as put/get of local files to/from OCI.

The connection is authenticated using a configuration file (usually ~/.oci/config) or a Resource Principal.

Calling open() on an OCIFileSystem (typically using a context manager) provides an OCIFile for read or write access to a particular key. The object emulates the standard file object (read, write, tell, seek), so that functions expecting a file can work with OCI Object Storage. Only binary read and write modes are implemented, with blocked caching.

ocifs uses and is based upon fsspec.

Examples

Locate and read a file:

>>> import ocifs
>>> fs = ocifs.OCIFileSystem()
>>> fs.ls('my-bucket@my-namespace')
['my-bucket@my-namespace/my-file.txt']
>>> with fs.open('my-bucket@my-namespace/my-file.txt', 'rb') as f:
...     print(f.read())
b'Hello, world'

(see also walk and glob in the Unix Operations section)

Reading with delimited blocks:

>>> fs.read_block(path, offset=1000, length=10, delimiter=b'\n')
b'A whole line of text\n'

Learn more about read_block.
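
The delimiter behaviour can be easier to see on an in-memory buffer. The sketch below is a simplified model of the semantics documented for fsspec's read_block, not the library's code: read_block_sketch is a hypothetical helper that works on a bytes object, whereas the real method reads from Object Storage. With a delimiter, the block starts just after the first delimiter found from offset (or at zero when offset is zero) and ends just after the first delimiter found from offset + length.

```python
def _after_next_delimiter(data: bytes, pos: int, delimiter: bytes) -> int:
    """Return the index just past the first delimiter found at or after pos."""
    found = data.find(delimiter, pos)
    return len(data) if found == -1 else found + len(delimiter)


def read_block_sketch(data: bytes, offset: int, length: int, delimiter: bytes = None) -> bytes:
    """Hypothetical in-memory model of a delimiter-aligned block read."""
    if delimiter is None:
        # Plain byte range, no alignment.
        return data[offset:offset + length]
    # An offset of zero always starts at the beginning of the data.
    start = 0 if offset == 0 else _after_next_delimiter(data, offset, delimiter)
    end = _after_next_delimiter(data, offset + length, delimiter)
    return data[start:end]


data = b"Alice, 100\nBob, 200\nCharlie, 300"
print(read_block_sketch(data, 0, 13))                   # raw byte range
print(read_block_sketch(data, 0, 13, delimiter=b"\n"))  # extended to a line boundary
print(read_block_sketch(data, 10, 10, delimiter=b"\n")) # starts after the next newline
```

Aligning blocks to delimiters is what lets several workers each read a different byte range of a text file without splitting a line between them.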

Writing with blocked caching:

>>> fs = ocifs.OCIFileSystem(config="~/.oci/config")  # credentials from the config file
>>> with fs.open('mybucket@mynamespace/new-file', 'wb') as f:
...     f.write(2*2**20 * b'a')
...     f.write(2*2**20 * b'a')
...     # leaving the with block flushes the data and closes the file
>>> fs.du('mybucket@mynamespace/new-file')
{'mybucket@mynamespace/new-file': 4194304}

Learn more about fsspec’s caching system.

Because ocifs copies the Python file interface, it can be used with other projects that consume the file interface, such as gzip or pandas.

>>> import gzip
>>> import pandas as pd
>>> with fs.open('mybucket@mynamespace/my-file.csv.gz', 'rb') as f:
...     g = gzip.GzipFile(fileobj=f)  # Decompress data with gzip
...     df = pd.read_csv(g)           # Read CSV file with Pandas
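
The same composition can be tried without any cloud access. In the sketch below, io.BytesIO stands in for the binary file object that fs.open(..., 'rb') would return, and the standard-library csv module stands in for pandas; the sample data is made up for illustration.

```python
import csv
import gzip
import io

# Build a small gzip-compressed CSV in memory; this stands in for the bytes of
# an object such as 'my-file.csv.gz'.
raw = gzip.compress(b"name,score\nalice,1\nbob,2\n")

# io.BytesIO plays the role of the binary file object returned by fs.open(..., 'rb').
with io.BytesIO(raw) as f:
    with gzip.GzipFile(fileobj=f) as g:
        # Wrap the decompressed byte stream so csv can read text from it.
        text = io.TextIOWrapper(g, encoding="utf-8", newline="")
        rows = list(csv.DictReader(text))

print(rows)
```

Any consumer that only needs read() on a binary stream should compose in the same way.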

Adding content type while writing:

>>> fs = ocifs.OCIFileSystem(config="~/.oci/config")  # credentials from the config file
>>> # Example: pass content_type explicitly; ocifs uses this value for the object.
>>> with fs.open('oci://mybucket@mynamespace/path/new-file.txt', 'wb', content_type='text/plain') as f:
...     f.write(2*2**20 * b'a')
...     f.flush()  # push buffered data; the file is closed when the with block exits

>>> # Example: no content_type given; ocifs infers one from the '.txt' extension.
>>> with fs.open('oci://mybucket@mynamespace/path/new-file.txt', 'wb') as f:
...     f.write(2*2**20 * b'a')
...     f.flush()  # push buffered data; the file is closed when the with block exits

When uploading a file, you can also specify content_type (commonly referred to as MIME type). ocifs automatically infers the content type from the file extension, but if you specify content_type it will override the auto-detected type. If no content type is specified and the file doesn’t have a file extension, ocifs defaults to the type application/json.
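
The precedence described above (explicit value, then file extension, then the default) can be sketched with the standard library. This is an illustration only: resolve_content_type is a hypothetical helper, and the use of mimetypes is an assumption, since the text does not say how ocifs performs the inference.

```python
import mimetypes

DEFAULT_CONTENT_TYPE = "application/json"  # fallback stated in the text above


def resolve_content_type(path, content_type=None):
    """Hypothetical helper: explicit value first, then extension, then default."""
    if content_type is not None:
        return content_type
    # Assumption: extension-based guessing via the standard mimetypes module.
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or DEFAULT_CONTENT_TYPE


print(resolve_content_type("path/new-file.txt"))
print(resolve_content_type("path/new-file.txt", content_type="application/octet-stream"))
print(resolve_content_type("path/new-file"))
```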

Integration

The libraries intake, pandas and dask accept URLs with the prefix “oci://”, and will use ocifs to complete the IO operation in question. The IO functions take an argument storage_options, which will be passed verbatim to OCIFileSystem, for example:

df = pd.read_excel("oci://bucket@namespace/path/file.xls",
                   storage_options={"config": "~/.oci/config"})

Use the storage_options parameter to pass any additional arguments to ocifs, for example security credentials.
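
When many URLs are built programmatically, it can help to keep the bucket@namespace/key layout in one place. The helpers below are hypothetical and not part of ocifs; they only assume the URL shape used in the examples in this document.

```python
def oci_url(bucket, namespace, key=""):
    """Hypothetical helper: compose an oci:// URL from its parts."""
    base = f"oci://{bucket}@{namespace}"
    return f"{base}/{key.lstrip('/')}" if key else base


def split_oci_url(url):
    """Hypothetical helper: split a URL or bare path into (bucket, namespace, key)."""
    path = url[len("oci://"):] if url.startswith("oci://") else url
    location, _, key = path.partition("/")
    bucket, sep, namespace = location.partition("@")
    if not sep or not bucket or not namespace:
        raise ValueError(f"expected 'bucket@namespace[/key]', got {url!r}")
    return bucket, namespace, key


url = oci_url("bucket", "namespace", "path/file.xls")
storage_options = {"config": "~/.oci/config"}  # forwarded to OCIFileSystem by the IO function

print(url)
print(split_oci_url(url))
```

The resulting url and storage_options can then be handed to an IO function such as the pd.read_excel call shown above.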

Authentication and Credentials

An OCI config file (API Keys) may be provided through the config argument when creating an OCIFileSystem, either as a file path or as a config instance returned by oci.config.from_file. For example, both oci.config.from_file("~/.oci/config") and the string "~/.oci/config" are valid values for config. Specify the profile using the profile argument: OCIFileSystem(config="~/.oci/config", profile='PROFILE').

Alternatively, a signer may be passed to OCIFileSystem through the signer argument. A Resource Principal Signer can be created using oci.auth.signers.get_resource_principals_signer(). An Instance Principal Signer can be created using oci.auth.signers.InstancePrincipalsSecurityTokenSigner(). If neither config nor signer is provided, OCIFileSystem will attempt to create a Resource Principal, then an Instance Principal. However, passing a signer directly is always preferred.
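
The fallback order can be made explicit in application code, so that the chosen signer is visible rather than implicit. The sketch below is hypothetical and uses stand-in factories so that it runs without cloud access; it assumes that a factory raises an exception when its principal is not available.

```python
def first_available_signer(factories):
    """Try each (name, factory) pair in order; return the first signer created."""
    errors = []
    for name, factory in factories:
        try:
            return name, factory()
        except Exception as exc:  # assumption: an unavailable principal raises
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no signer could be created: " + "; ".join(errors))


# Stand-in factories (hypothetical) so the sketch runs anywhere.
def fake_resource_principal():
    raise EnvironmentError("not running with a Resource Principal")


def fake_instance_principal():
    return object()  # placeholder for a real signer object


name, signer = first_available_signer([
    ("resource_principal", fake_resource_principal),
    ("instance_principal", fake_instance_principal),
])
print(name)
```

With the oci SDK installed, the stand-ins would be replaced by oci.auth.signers.get_resource_principals_signer and oci.auth.signers.InstancePrincipalsSecurityTokenSigner, and the result passed as OCIFileSystem(signer=signer).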

Learn more about using signers with ocifs in the Getting Connected tab, or learn more about Resource Principals in the OCI documentation.

Logging

The logger named ocifs provides information about the operations of the file system. To see these messages, set the logging level to INFO:

import logging
logging.getLogger("ocifs").setLevel(logging.INFO)

Info mode will print messages to stderr. More advanced logging configuration is possible using Python’s standard logging framework.
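
Depending on how logging is set up in the host application, it may be necessary to attach a handler before INFO messages become visible. A minimal sketch using only the standard library follows; the format string is an arbitrary choice.

```python
import logging
import sys

logger = logging.getLogger("ocifs")
logger.setLevel(logging.INFO)

# Send records to stderr with a timestamp and the logger name.
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate lines if the root logger also has handlers

logger.info("handler attached")  # emitted by this script, not by ocifs
```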

Limitations

This project is meant for convenience, rather than feature completeness. The following are known current omissions:

  • file access is always binary (although reading files line by line as strings is possible)

  • no permissions/access-control (no chmod/chown methods)

  • ocifs only reads the latest version of a file. Support for object versions is on the roadmap for a future release.

A Note on Caching

To save time and bandwidth, ocifs caches the results of listing buckets and objects. To refresh the cache, pass refresh=True to the ls() method. In contrast, info(), and therefore exists(), ignores the cache and calls Object Storage directly. If the underlying bucket is modified after a call to ls(), the change will be reflected in calls to info(), but not in later calls to ls() unless refresh=True is passed.
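
The interaction between cached listings and direct lookups can be modelled in a few lines. The class below is a toy illustration, not the ocifs implementation: ListingCacheSketch and its dictionary backend are hypothetical stand-ins for the file system and for Object Storage.

```python
class ListingCacheSketch:
    """Toy model: ls() is cached per path, exists() always asks the backend."""

    def __init__(self, backend):
        self._backend = backend  # dict: bucket path -> set of object names
        self._listings = {}      # cached results of ls()

    def ls(self, path, refresh=False):
        if refresh or path not in self._listings:
            self._listings[path] = sorted(self._backend.get(path, ()))
        return list(self._listings[path])

    def exists(self, path, name):
        # Bypasses the listing cache, like info()/exists() described above.
        return name in self._backend.get(path, ())


bucket = "my-bucket@my-namespace"
store = {bucket: {"a.txt"}}
fs_model = ListingCacheSketch(store)

first = fs_model.ls(bucket)
store[bucket].add("b.txt")                     # bucket modified after the first listing
stale = fs_model.ls(bucket)                    # served from the cache
seen = fs_model.exists(bucket, "b.txt")        # direct lookup sees the change
fresh = fs_model.ls(bucket, refresh=True)      # cache rebuilt

print(first, stale, seen, fresh)
```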
