Initialization

init(engine='ray', engine_opts=None, logger='auto', loglevel=30, cache_directory=None, dataset_cache_enabled=True)

Initialize the AutoMLx framework's execution engine. AutoMLx can work with a variety of parallelization platforms.
Parameters

- engine (str, default='ray') –
  Name of the parallelization framework. Can be one of:
  - 'ray': Use the Ray multiprocessing framework.
  - 'local': Use Python's built-in multiprocessing framework.
  - 'threading': Use Python's built-in multithreading framework.
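For illustration, the engine choices above could be exercised as follows. This is a sketch: the `automlx.init` calls assume the AutoMLx package is installed and are shown commented out.

```python
# The engine parameter is a plain string naming the backend; hypothetical
# minimal calls are shown commented since they require the AutoMLx package.
# import automlx
# automlx.init(engine="ray")        # Ray multiprocessing (default)
# automlx.init(engine="local")      # Python's built-in multiprocessing
# automlx.init(engine="threading")  # Python's built-in multithreading

supported_engines = ("ray", "local", "threading")
```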
- engine_opts (dict or None, default=None) –
  Options for the parallelization framework. When engine is:
  - 'ray': a dictionary with the following keys:
    - "n_jobs" (int): the degree of inter-model parallelism.
    - "model_n_jobs" (int): the degree of intra-model parallelism.
    - "ray_setup" (dict): specifies the arguments to pass to ray.init.
    - "cluster_mode" (bool): specifies whether Ray should detect a running cluster on the node and connect to it. Needs to be set on both head and worker nodes.
    - "enable_object_spilling" (bool, default False): determines whether Ray object spilling is enabled. If object spilling is enabled and no further object spilling configuration is provided in ray_setup, the object spilling directory is automatically set to the secure AutoMLx caching directory.
  - 'local': engine_opts is of the form {'n_jobs': val1, 'model_n_jobs': val2}, where val1 is the degree of inter-model parallelism and val2 is the degree of intra-model parallelism.
  - 'threading': engine_opts is of the form {'n_jobs': val}, where val is the degree of parallelism.
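Putting the per-engine option shapes together, a hedged sketch (the dictionary keys follow the documentation above; the example values and the commented `automlx.init` call are illustrative, assuming the package is available):

```python
# Hedged sketch of the documented engine_opts shapes; the commented
# automlx.init call assumes the AutoMLx package is installed.
ray_opts = {
    "n_jobs": 4,                    # inter-model parallelism
    "model_n_jobs": 2,              # intra-model parallelism
    "ray_setup": {"num_cpus": 8},   # forwarded to ray.init
    "cluster_mode": False,          # no existing Ray cluster to attach to
    "enable_object_spilling": False,
}
local_opts = {"n_jobs": 4, "model_n_jobs": 2}
threading_opts = {"n_jobs": 4}

# import automlx
# automlx.init(engine="ray", engine_opts=ray_opts)
```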
- logger (logging.Logger, "auto", str or None, default="auto") –
  Logging mode. One of:
  - "auto": Log to the console with the specified loglevel (see loglevel).
  - None: No logger is initialized by the AutoMLx package; relies on the application importing automlx to initialize a logger with the logging.basicConfig() call.
  - str: Log to the provided file path and to the console.
  - logging.Logger: Use the existing Logger object.
- loglevel (int or None, default=logging.WARNING) –
  The log level is derived from the Python logging module and adjusts the logging verbosity in the following increasing order: logging.CRITICAL < logging.WARNING < logging.INFO < logging.DEBUG.
  - Set to None to avoid any logging initialization and use the current logging module configuration.
  - Setting the loglevel here does nothing if the root logger already has handlers configured. The parameter is also ignored if a logging.Logger object is passed to the logger parameter, or if the AutoMLx package has already been configured with a different loglevel.
  - The loglevel is ignored if the logger parameter is None or a logging.Logger.
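The documented logger/loglevel combinations can be sketched as follows; the `automlx.init` calls are commented, since they require the AutoMLx package, and the logger name is illustrative.

```python
import logging

# Sketch of the documented logger/loglevel combinations; the automlx.init
# calls are commented because they require the AutoMLx package.

# 1) Console logging with increased verbosity:
# automlx.init(logger="auto", loglevel=logging.DEBUG)

# 2) Reuse an application-managed logger (loglevel is then ignored):
app_logger = logging.getLogger("my_app.automlx")  # hypothetical logger name
app_logger.setLevel(logging.INFO)
# automlx.init(logger=app_logger)

# 3) Leave logging configuration entirely to the application:
# logging.basicConfig(level=logging.WARNING)
# automlx.init(logger=None, loglevel=None)
```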
- cache_directory (str or None, default=None) –
  Cache directory used to store intermediate results of AutoMLx.
  - If a path is provided here, the user is responsible for managing the directory.
  - If cache_directory is None, the cache is created as a temporary directory and cleaned up by AutoMLx.
  - The caching directory location may also be controlled by setting the TMPDIR environment variable, which will serve as a parent directory to the AutoMLx cache (please ensure the environment variable is set before AutoMLx is imported, for example by running your Python script as TMPDIR=/path/to/dir python3 run_automlx.py).
  - The caching directory is cleared at the end of the execution of the Python process, or when the AutoMLx engine is explicitly shut down via automlx.shutdown(). The cache may not be cleared if the process is terminated abruptly (for example, by a SIGTERM event).
  - If guaranteed cleanup of the temporary files and directories is desired, a cleanup EXIT trap may be used. For example, if the AutoMLx cache_directory is set to /tmp/mydir, a cleanup EXIT trap can be defined at the top of a shell script running the AutoMLx Python scripts as trap "rm -rf /tmp/mydir" EXIT.
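A minimal sketch of the user-managed case, where an explicit path is passed and the caller handles cleanup; the directory prefix and the commented `automlx` calls are illustrative.

```python
import os
import shutil
import tempfile

# Sketch: a user-managed cache directory, as documented for cache_directory.
# When a path is passed explicitly, AutoMLx does not clean it up for you.
cache_dir = tempfile.mkdtemp(prefix="automlx_cache_")  # hypothetical prefix

# import automlx
# automlx.init(cache_directory=cache_dir)
# ... run AutoMLx workloads ...
# automlx.shutdown()

# Explicit cleanup is the caller's responsibility:
shutil.rmtree(cache_dir, ignore_errors=True)
```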
- dataset_cache_enabled (bool, default=True) –
  If the dataset cache is enabled, transformed versions of the data may be stored to disk (in the AutoMLx cache directory) to speed up subsequent transformations of the same data.