Initialization

init(engine='ray', engine_opts=None, logger='auto', loglevel=30, cache_directory=None, dataset_cache_enabled=True)

Initialize the AutoMLx framework's execution engine. AutoMLx can work with a variety of parallelization platforms.
Parameters

- engine (str, default='ray') –
  Name of the parallelization framework. Can be one of:
  - 'ray': Use the Ray multiprocessing framework.
  - 'local': Use Python's built-in multiprocessing framework.
  - 'threading': Use Python's built-in multithreading framework.
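For illustration, the engine choices above could be exercised as follows. This is a sketch: the `automlx.init` calls assume the AutoMLx package is installed and are shown commented out.

```python
# The engine parameter is a plain string naming the backend; hypothetical
# minimal calls are shown commented since they require the AutoMLx package.
# import automlx
# automlx.init(engine="ray")        # Ray multiprocessing (default)
# automlx.init(engine="local")      # Python's built-in multiprocessing
# automlx.init(engine="threading")  # Python's built-in multithreading

supported_engines = ("ray", "local", "threading")
```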
- engine_opts (dict or None, default=None) –
  Options for the parallelization framework. When engine is:
  - 'ray': a dictionary with the following keys:
    - "n_jobs" (int): the degree of inter-model parallelism.
    - "model_n_jobs" (int): the degree of intra-model parallelism.
    - "ray_setup" (dict): specifies the arguments to pass to ray.init.
    - "cluster_mode" (bool): specifies whether Ray should detect a running cluster on the node and connect to it. Needs to be set on both head and worker nodes.
    - "enable_object_spilling" (bool, default False): determines whether Ray object spilling is enabled. If object spilling is enabled and no further object spilling configuration is provided in ray_setup, the object spilling directory is automatically set to the secure AutoMLx caching directory.
  - 'local': engine_opts is of the form {'n_jobs': val1, 'model_n_jobs': val2}, where val1 is the degree of inter-model parallelism and val2 is the degree of intra-model parallelism.
  - 'threading': engine_opts is of the form {'n_jobs': val}, where val is the degree of parallelism.
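Putting the per-engine option shapes together, a hedged sketch (the dictionary keys follow the documentation above; the example values and the commented `automlx.init` call are illustrative, assuming the package is available):

```python
# Hedged sketch of the documented engine_opts shapes; the commented
# automlx.init call assumes the AutoMLx package is installed.
ray_opts = {
    "n_jobs": 4,                    # inter-model parallelism
    "model_n_jobs": 2,              # intra-model parallelism
    "ray_setup": {"num_cpus": 8},   # forwarded to ray.init
    "cluster_mode": False,          # no existing Ray cluster to attach to
    "enable_object_spilling": False,
}
local_opts = {"n_jobs": 4, "model_n_jobs": 2}
threading_opts = {"n_jobs": 4}

# import automlx
# automlx.init(engine="ray", engine_opts=ray_opts)
```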
- logger (logging.Logger, "auto", str or None, default="auto") –
  Logging mode. One of:
  - "auto": Log to the console with the specified loglevel (see loglevel).
  - None: No logger is initialized by the AutoMLx package; relies on the application importing automlx to initialize a logger with the logging.basicConfig() call.
  - str: Log to the provided file path and to the console.
  - logging.Logger: Use the existing Logger object.
- loglevel (int or None, default=logging.WARNING) –
  The log level is derived from the Python logging module and adjusts the logging verbosity in the following increasing order: logging.CRITICAL < logging.WARNING < logging.INFO < logging.DEBUG.
  - Set to None to avoid any logging initialization and use the current logging module configuration.
  - Setting the loglevel here does nothing if the root logger already has handlers configured. The parameter is also ignored if a logging.Logger object is passed to the logger parameter, or if the AutoMLx package has already been configured with a different loglevel.
  - The loglevel is ignored if the logger parameter is None or a logging.Logger.
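The documented logger/loglevel combinations can be sketched as follows; the `automlx.init` calls are commented, since they require the AutoMLx package, and the logger name is illustrative.

```python
import logging

# Sketch of the documented logger/loglevel combinations; the automlx.init
# calls are commented because they require the AutoMLx package.

# 1) Console logging with increased verbosity:
# automlx.init(logger="auto", loglevel=logging.DEBUG)

# 2) Reuse an application-managed logger (loglevel is then ignored):
app_logger = logging.getLogger("my_app.automlx")  # hypothetical logger name
app_logger.setLevel(logging.INFO)
# automlx.init(logger=app_logger)

# 3) Leave logging configuration entirely to the application:
# logging.basicConfig(level=logging.WARNING)
# automlx.init(logger=None, loglevel=None)
```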
- cache_directory (str or None, default=None) –
  Cache directory used to store intermediate results of AutoMLx.
  - If a path is provided here, the user is responsible for managing the directory.
  - If cache_directory is None, the cache is created as a temporary directory and cleaned up by AutoMLx.
  - The caching directory location may also be controlled by setting the TMPDIR environment variable, which will serve as a parent directory to the AutoMLx cache (please ensure the environment variable is set before AutoMLx is imported, for example by running your Python script as TMPDIR=/path/to/dir python3 run_automlx.py).
  - The caching directory is cleared at the end of the execution of the Python process, or when the AutoMLx engine is explicitly shut down via automlx.shutdown(). The cache may not be cleared if the process is terminated abruptly (for example, by a SIGTERM event).
  - If guaranteed cleanup of the temporary files and directories is desired, a cleanup EXIT trap may be used. For example, if the AutoMLx cache_directory is set to /tmp/mydir, a cleanup EXIT trap can be defined at the top of a shell script running the AutoMLx Python scripts as trap "rm -rf /tmp/mydir" EXIT.
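A minimal sketch of the user-managed case, where an explicit path is passed and the caller handles cleanup; the directory prefix and the commented `automlx` calls are illustrative.

```python
import os
import shutil
import tempfile

# Sketch: a user-managed cache directory, as documented for cache_directory.
# When a path is passed explicitly, AutoMLx does not clean it up for you.
cache_dir = tempfile.mkdtemp(prefix="automlx_cache_")  # hypothetical prefix

# import automlx
# automlx.init(cache_directory=cache_dir)
# ... run AutoMLx workloads ...
# automlx.shutdown()

# Explicit cleanup is the caller's responsibility:
shutil.rmtree(cache_dir, ignore_errors=True)
```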
- dataset_cache_enabled (bool, default=True) –
  If the dataset cache is enabled, transformed versions of the data may be stored to disk (in the AutoMLx cache directory) to speed up subsequent transformations of the same data.