OCIFS
Installation
Install using pip:
python3 -m pip install ocifs
Overview
ocifs is a Pythonic filesystem interface to Oracle Cloud Infrastructure (OCI) Object Storage. It builds on top of oci.
The top-level class OCIFileSystem holds connection information and allows typical file-system style operations such as cp, mv, ls, du, and glob, as well as put/get of local files to/from OCI.
The connection gets validated via a configuration file (usually ~/.oci/config) or Resource Principal.
Calling open() on an OCIFileSystem (typically using a context manager) provides an OCIFile for read or write access to a particular key. The object emulates the standard file object (read, write, tell, seek), such that functions expecting a file can interact with OCI Object Storage. Only binary read and write modes are implemented, with blocked caching.
Examples
Locate and read a file:
>>> import ocifs
>>> fs = ocifs.OCIFileSystem()
>>> fs.ls('my-bucket@my-namespace')
['my-bucket@my-namespace/my-file.txt']
>>> with fs.open('my-bucket@my-namespace/my-file.txt', 'rb') as f:
...     print(f.read())
b'Hello, world'
(see also walk and glob in the Unix Operations section)
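The paths in these examples follow the pattern bucket@namespace/key, optionally prefixed with oci://. As a minimal sketch of that convention, the hypothetical helper below (split_oci_path is not part of ocifs) splits such a path into its parts using only the standard library:

```python
def split_oci_path(path):
    """Split 'bucket@namespace/key' into (bucket, namespace, key).

    Hypothetical helper for illustration only; it is not an ocifs function.
    """
    # Drop the optional URL scheme used by pandas, dask and intake.
    if path.startswith("oci://"):
        path = path[len("oci://"):]
    # Everything before the first '/' names the bucket and namespace.
    location, _, key = path.partition("/")
    bucket, _, namespace = location.partition("@")
    return bucket, namespace, key


print(split_oci_path("my-bucket@my-namespace/my-file.txt"))
```

A path that names only the bucket, such as 'my-bucket@my-namespace', yields an empty key.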
Reading with delimited blocks:
>>> fs.read_block(path, offset=1000, length=10, delimiter=b'\n')
b'A whole line of text\n'
Learn more about read_block.
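To show why a 10-byte request can return a whole line, here is a simplified sketch of delimiter-aligned block reading over an in-memory bytes value. It is an approximation of the idea, not the fsspec implementation; the function name read_delimited_block and the sample data are assumptions made for illustration.

```python
def read_delimited_block(data, offset, length, delimiter):
    """Return the block of `data` near [offset, offset + length),
    with both ends moved forward to the next delimiter boundary."""

    def next_boundary(pos):
        # The start of the data always counts as a boundary.
        if pos == 0:
            return 0
        idx = data.find(delimiter, pos)
        # No delimiter left: the end of the data is the boundary.
        return len(data) if idx == -1 else idx + len(delimiter)

    start = next_boundary(offset)
    end = max(start, next_boundary(offset + length))
    return data[start:end]


data = b"alpha\nbravo\ncharlie\ndelta\n"
blocks = [
    read_delimited_block(data, 0, 8, b"\n"),
    read_delimited_block(data, 8, 8, b"\n"),
    read_delimited_block(data, 16, 10, b"\n"),
]
print(blocks)
```

Because every boundary snaps forward to a delimiter, consecutive byte ranges produce blocks that contain only whole lines and together cover the data exactly once.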
Writing with blocked caching:
>>> fs = ocifs.OCIFileSystem(config="~/.oci/config")  # default config file location
>>> with fs.open('mybucket@mynamespace/new-file', 'wb') as f:
...     f.write(2*2**20 * b'a')
...     f.write(2*2**20 * b'a')  # data is flushed and the file closed when the block exits
>>> fs.du('mybucket@mynamespace/new-file')
{'mybucket@mynamespace/new-file': 4194304}
Learn more about fsspec’s caching system.
Because ocifs implements the Python file interface, it can be used with other projects that consume file objects, such as gzip or pandas:
>>> import gzip
>>> import pandas as pd
>>> with fs.open('mybucket@mynamespace/my-file.csv.gz', 'rb') as f:
...     g = gzip.GzipFile(fileobj=f)  # Decompress data with gzip
...     df = pd.read_csv(g)  # Read CSV file with pandas
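The same pattern can be tried without credentials. In the sketch below, an io.BytesIO object stands in for the binary file that fs.open(..., 'rb') would return, and the standard csv module stands in for pandas. Both substitutions are assumptions made so the example is self-contained.

```python
import csv
import gzip
import io

# Stand-in for the binary file object returned by fs.open(..., 'rb').
remote_file = io.BytesIO(gzip.compress(b"name,value\na,1\nb,2\n"))

with gzip.GzipFile(fileobj=remote_file) as g:
    # Decode the decompressed bytes as text so csv can parse them.
    text = io.TextIOWrapper(g, encoding="utf-8", newline="")
    rows = list(csv.DictReader(text))

print(rows)
```

Any consumer that only needs read and seek on a binary file object can be layered on top in the same way.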
Adding content type while writing:
>>> fs = ocifs.OCIFileSystem(config="~/.oci/config")  # default config file location
>>> # Example: pass content_type explicitly; ocifs uses this value as the object's content type.
>>> with fs.open('oci://mybucket@mynamespace/path/new-file.txt', 'wb', content_type='text/plain') as f:
...     f.write(2*2**20 * b'a')
...     f.flush()  # data is flushed; the file is closed when the block exits
>>> # Example: no content_type given; ocifs determines the best-suited content type for 'new-file.txt'.
>>> with fs.open('oci://mybucket@mynamespace/path/new-file.txt', 'wb') as f:
...     f.write(2*2**20 * b'a')
...     f.flush()  # data is flushed; the file is closed when the block exits
When uploading a file, you can also specify content_type (commonly referred to as MIME type). ocifs automatically infers the content type from the file extension, but if you specify content_type it will override the auto-detected type. If no content type is specified and the file doesn’t have a file extension, ocifs defaults to the type application/json.
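The precedence described above (explicit value, then file extension, then a default) can be sketched with the standard mimetypes module. This is an illustration of the rule, not the ocifs implementation: the function name resolve_content_type is hypothetical, and using mimetypes for the extension lookup is an assumption.

```python
import mimetypes


def resolve_content_type(path, content_type=None):
    """Pick a content type: explicit value, else extension, else default.

    Hypothetical helper; ocifs may infer types differently internally.
    """
    # An explicitly supplied content_type always wins.
    if content_type is not None:
        return content_type
    # Otherwise guess from the file extension (assumption: stdlib mimetypes).
    guessed, _encoding = mimetypes.guess_type(path)
    # Fall back to the documented default when nothing can be inferred.
    return guessed or "application/json"


print(resolve_content_type("path/new-file.txt"))
print(resolve_content_type("path/new-file"))
```

Note that this sketch also falls back to the default for unrecognized extensions, a case the text above does not cover.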
Integration
The libraries intake, pandas and dask accept URLs with the prefix “oci://”, and will use ocifs to complete the IO operation in question. The IO functions take an argument storage_options, which will be passed verbatim to OCIFileSystem, for example:
import pandas as pd
df = pd.read_excel("oci://bucket@namespace/path/file.xls",
                   storage_options={"config": "~/.oci/config"})
Use the storage_options parameter to pass any additional arguments to ocifs, for example security credentials.
Authentication and Credentials
An OCI config file (API Keys) may be provided as a filepath or a config instance returned from oci.config.from_file when creating an OCIFileSystem using the config argument. For example, two valid inputs to the config argument are: oci.config.from_file("~/.oci/config") and "~/.oci/config". Specify the profile using the profile argument: OCIFileSystem(config="~/.oci/config", profile='PROFILE').
Alternatively, a signer may be used to create an OCIFileSystem using the signer argument.
A Resource Principal Signer can be created using oci.auth.signers.get_resource_principals_signer().
An Instance Principal Signer can be created using oci.auth.signers.InstancePrincipalsSecurityTokenSigner().
If neither config nor signer is provided, OCIFileSystem will attempt to create a Resource Principal, then an Instance Principal. However, passing a signer directly is always preferred.
Learn more about using signers with ocifs in the Getting Connected tab. Resource Principals are described in the OCI documentation.
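The three options above can be collected into small factory functions. This is a sketch that uses only the calls named in this section; the function names are hypothetical, and the imports are deferred into each function so the definitions load even where oci and ocifs are not installed. Running them requires valid OCI credentials for the chosen method.

```python
def filesystem_from_config(path="~/.oci/config", profile="DEFAULT"):
    """API-key authentication from a config file and profile."""
    import ocifs  # deferred import: needs the ocifs package at call time

    return ocifs.OCIFileSystem(config=path, profile=profile)


def filesystem_from_resource_principal():
    """Resource Principal authentication via an explicit signer."""
    import oci
    import ocifs

    signer = oci.auth.signers.get_resource_principals_signer()
    return ocifs.OCIFileSystem(signer=signer)


def filesystem_from_instance_principal():
    """Instance Principal authentication via an explicit signer."""
    import oci
    import ocifs

    signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner()
    return ocifs.OCIFileSystem(signer=signer)
```

Passing the signer explicitly, as in the last two functions, avoids relying on the fallback order that applies when neither argument is given.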
Logging
The logger named ocifs provides information about the operations of the file system. To see all messages, set the logging level to INFO:
import logging
logging.getLogger("ocifs").setLevel(logging.INFO)
Info mode will print messages to stderr. More advanced logging configuration is possible using Python’s standard logging framework.
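As one example of more advanced configuration, the sketch below attaches a dedicated handler with a custom format to the ocifs logger. The helper name enable_ocifs_logging and the format string are assumptions; the message is emitted by hand as a stand-in for a real ocifs log record, and it is captured in a StringIO buffer only so the result can be inspected.

```python
import io
import logging


def enable_ocifs_logging(stream=None, level=logging.INFO):
    """Attach a formatted stream handler to the 'ocifs' logger.

    Hypothetical helper; stream=None sends output to sys.stderr.
    """
    logger = logging.getLogger("ocifs")
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return handler


buffer = io.StringIO()
handler = enable_ocifs_logging(buffer)

# Stand-in for a message that ocifs itself would emit.
logging.getLogger("ocifs").info("stand-in message")
captured = buffer.getvalue()
print(captured)
```

Calling enable_ocifs_logging() with no arguments sends the same formatted output to stderr.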
Limitations
This project is meant for convenience, rather than feature completeness. The following are known current omissions:
file access is always binary (although reading files line by line as strings is possible)
no permissions/access-control (no chmod/chown methods)
ocifs only reads the latest version of a file. Versioned support is on the roadmap for a future release.
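For the first limitation, one way to read a binary file line by line as strings is to wrap it in io.TextIOWrapper from the standard library. In this sketch an io.BytesIO object stands in for the file returned by fs.open(..., 'rb'), and the UTF-8 encoding is an assumption about the stored data.

```python
import io

# Stand-in for the binary file object returned by fs.open(..., 'rb').
binary_file = io.BytesIO(b"first line\nsecond line\n")

# TextIOWrapper decodes bytes on the fly and yields str lines.
with io.TextIOWrapper(binary_file, encoding="utf-8") as text_file:
    lines = [line.rstrip("\n") for line in text_file]

print(lines)
```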
A Note on Caching
To save time and bandwidth, ocifs caches the results from listing buckets and objects. To refresh the cache, set the refresh argument to True in the ls() method. However, info (and, as a result, exists) ignores the cache and makes a call to Object Storage directly. If the underlying bucket is modified after a call to ls, the change will be reflected in calls to info, but not in ls (unless refresh=True).
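The behaviour described above can be modelled with a toy class. This is not how ocifs is implemented: CachedListingModel and its in-memory store are hypothetical, and serve only to show how a cached listing can go stale while a direct existence check stays current.

```python
class CachedListingModel:
    """Toy model of a cached ls() alongside an uncached exists()."""

    def __init__(self, store):
        # `store` plays the role of Object Storage: bucket -> set of names.
        self.store = store
        self._listings = {}

    def ls(self, bucket, refresh=False):
        # Only query the store on the first call or when refresh=True.
        if refresh or bucket not in self._listings:
            self._listings[bucket] = sorted(self.store.get(bucket, ()))
        return list(self._listings[bucket])

    def exists(self, bucket, name):
        # Always consult the store directly, bypassing the cache.
        return name in self.store.get(bucket, ())


bucket = "my-bucket@my-namespace"
store = {bucket: {"a.txt"}}
fs_model = CachedListingModel(store)

first = fs_model.ls(bucket)
store[bucket].add("b.txt")  # the bucket changes after the listing
stale = fs_model.ls(bucket)
seen = fs_model.exists(bucket, "b.txt")
fresh = fs_model.ls(bucket, refresh=True)
print(first, stale, seen, fresh)
```

The second ls() call still reports the old contents even though exists() already sees the new object; only refresh=True brings the listing up to date.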