Builder Object

The builder object provides a core set of APIs with which the user can configure the behavior of their monitoring. By selecting which components and variants to run, all aspects of the monitoring task can be customised and configured. It is recommended to carefully go over each API the builder object provides to get a good understanding of the framework.

The builder object also provides an easy way to discover which components it supports (through its API) and the interface type for each component. We will go over each API the object provides in detail. For details on each of the components, please refer to the dedicated sections.

How to use it

Below are the steps the user needs to take to create their component and pass it to the builder; a minimal sketch follows the list.

  • Import the right class from the ML Insights library into your init code.

  • Construct the implementation of the component you want to use with the right parameters.

  • If the builder expects a list for the constructed object, create and insert the object(s) in a list.

  • Pass the constructed object or list to the corresponding builder API.
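
For illustration, the pattern looks roughly like this. The import paths and the CSVDataReader class below are assumptions made for this sketch; substitute the reader documented for your format and storage:

    # Step 1: import the classes you need (import paths are assumptions).
    from mlm_insights.builder.insights_builder import InsightsBuilder
    from my_readers import CSVDataReader  # hypothetical reader implementation

    # Step 2: construct the component with the right parameters.
    reader = CSVDataReader(file_location="data/input.csv")

    # Steps 3 and 4: wrap the object in a list if the builder API expects
    # one, then pass it to the corresponding builder API.
    builder = InsightsBuilder().with_reader(reader=reader)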

The next sections describe which builder API each kind of object should be passed to.

Hint

  • To learn more about how to build a specific component, please refer to the dedicated page for that component.

  • To view a tutorial, please check the tutorials page.

Providing Schema

The schema is a mandatory input (as of now) to the framework. It provides the necessary metadata for the framework to decide which columns to use for computation, the type of each column (whether input, output, or target), the data type (for example, int or string), and the variable type (continuous or categorical).

The schema also helps the framework decide what type of metric to inject automatically if a metric list is not passed by the user.

with_input_schema(self, input_schema: Dict[str, FeatureType]) -> "InsightsBuilder":
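
For illustration, a schema for a small data set might be built as follows. The import path, FeatureType constructor arguments, and enum values shown are assumptions; the schema page documents the real ones:

    # Assumed constructor and enums; see the schema page for the real fields.
    from mlm_insights.constants.types import DataType, FeatureType, VariableType

    input_schema = {
        "age": FeatureType(data_type=DataType.INTEGER,
                           variable_type=VariableType.CONTINUOUS),
        "gender": FeatureType(data_type=DataType.STRING,
                              variable_type=VariableType.NOMINAL),
    }

    builder = builder.with_input_schema(input_schema=input_schema)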

Note

While this is a mandatory field as of now for all input types, it may become optional in subsequent releases. The metadata will then be decided based on the information available from the input data (for example, for strongly typed formats) or using an Auto Schema Detection strategy.

More details on the feature type can be found on the schema page of the documentation.

Providing Input Data

The input data is a mandatory parameter: it is the data that the framework processes and evaluates. Input data can be provided in multiple ways, supporting multiple formats and storage options.

The first option is to provide a data reader component, which encapsulates all the logic needed to read a specific format from a specific storage option. The framework uses the DataReader interface to pull the data in a standardised format for further processing.

Note

When implementing a custom reader, it is recommended that a single reader tackle a single format. For example, if a user needs both a custom CSV reader and a custom Parquet reader, we recommend writing two readers, one for each variation. The out-of-the-box readers follow this design choice as well.

Note

Data readers are execution-engine aware, so each supported execution engine has its own versions of the readers.

with_reader(self, reader: DataReader) -> "InsightsBuilder":
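
Following the one-format-per-reader recommendation above, a user with both CSV and Parquet inputs would construct a reader per format (the reader class names here are hypothetical) and pass the one matching the current run:

    # Hypothetical reader classes, one per format as recommended above.
    csv_reader = MyCSVDataReader(file_location="data/train.csv")
    parquet_reader = MyParquetDataReader(file_location="data/train.parquet")

    builder = builder.with_reader(reader=csv_reader)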

The second API available for passing input data is the data frame API. It takes in the data frame (or variants such as a Spark data frame or a Dask data frame). This is often useful if the library is to be embedded within an existing application where the data frame is already created.

with_data_frame(self, data_frame: Any) -> "InsightsBuilder":
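
For illustration, when the library is embedded in an application that already holds a pandas data frame:

    import pandas as pd

    # A data frame that already exists in the host application.
    data_frame = pd.DataFrame({"age": [34, 51], "gender": ["F", "M"]})

    builder = builder.with_data_frame(data_frame=data_frame)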

Providing In-memory data transformation

The framework reads input data into a data frame and provides ways to transform the data in-memory. Users can run multiple transformations, for example, to sanitise their data or normalise a specific column. The transformers are run in the order they appear in the list.

with_transformers(self, transformers: List[Transformer]) -> "InsightsBuilder":
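
For illustration, two hypothetical Transformer implementations applied in list order, first sanitising the data and then normalising one column:

    # Hypothetical Transformer implementations; they run in list order.
    transformers = [
        SanitizeColumnsTransformer(),              # e.g. drop rows with nulls
        NormalizeColumnTransformer(column="age"),  # e.g. scale to [0, 1]
    ]

    builder = builder.with_transformers(transformers=transformers)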

Note

Transformers are meant to be used only for simple in-memory transformations. They should not be used to persist data back to any storage. By design, transformers are supposed to work on a small chunk of the overall data, so do not run a group-by on the entire data set.

One important transformer to note here is the conditional feature transformer. We discuss it in detail on its dedicated sub-page.

Defining metrics for features

The metrics API provides a way for users to explicitly define which metrics should be calculated for specific feature(s). Metrics come in two types within the framework: univariate metrics and data set metrics. Univariate metrics are defined for a single specific feature, while data set metrics can take multiple features of different column types, variable types, or data types, and certain variations might be mandatory (for example, a confusion matrix always expects prediction and target column types).

with_metrics(self, metrics: MetricDetail) -> "InsightsBuilder":
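
A sketch of defining metrics explicitly; the MetricDetail constructor arguments and the metric classes shown are assumptions, so refer to the metrics pages for the real names:

    # Assumed shape: univariate metrics keyed by feature name, plus data
    # set metrics that span multiple columns.
    metric_details = MetricDetail(
        univariate_metric={"age": [Mean(), StandardDeviation()]},
        dataset_metrics=[ConfusionMatrix()],  # expects prediction and target
    )

    builder = builder.with_metrics(metrics=metric_details)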

Taking post actions

Post processors are actions that can be run after the entire data set has been processed and all the metrics have been calculated. Any type of logic can be implemented here, for example, writing the metric result to storage, calling the API of any OCI service, or integrating with other tools (like Grafana).

Post processors don’t have access to the raw data. They only have access to the outputs of the framework, such as the profile (metric result output) and test results.

Note

For more on the profile, please refer to the dive deep sections.

with_post_processors(self, post_processors: List[PostProcessor]) -> "InsightsBuilder":
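
For illustration, a hypothetical post processor that writes the profile to object storage:

    # Hypothetical PostProcessor; it sees only framework outputs (profile,
    # test results), never the raw data.
    post_processors = [ObjectStorageWriterPostProcessor(bucket="monitoring-results")]

    builder = builder.with_post_processors(post_processors=post_processors)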

Providing the Execution Engine to Run on

Because the framework is agnostic to the execution engine, the code can be written once but run on different execution engines. For example, the same code can run on a Jupyter Notebook using native pandas, on ML Jobs using Dask with parallelization options, or on MLflow using Spark.

with_engine(self, engine: EngineDetail) -> "InsightsBuilder":
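
For illustration, selecting an engine by name; the engine_name values shown are assumptions, so consult the execution engine documentation for the supported names:

    # Assumed engine names: "native" (pandas), "dask", "spark".
    builder = builder.with_engine(engine=EngineDetail(engine_name="dask"))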

Declaring additional metadata

It is also possible to declare additional metadata and pass it on to the framework. This metadata is persisted along with the profile (if the profile is persisted). It is provided as free-form key-value pairs and is called Tags.

with_tags(self, tags: Tags) -> "InsightsBuilder":
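
For illustration, assuming Tags accepts free-form keyword arguments (an assumption; check the Tags documentation for the actual constructor):

    # Free-form key-value metadata, persisted alongside the profile.
    builder = builder.with_tags(tags=Tags(model_name="fraud-model", environment="prod"))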

Creating the runner object

Finally, when all the mandatory components and any optional components the user wants to use have been provided, the build API can be called, which returns the runner object. More about the runner object can be read on the next page.

Along with building the runner object, the build API also validates each of its components and checks that all the mandatory components have been provided.

Warning

Incorrectly constructing a component or providing insufficient components may cause the build API to raise an error.

build(self) -> Runner:
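
Putting it all together, a complete invocation might look like the sketch below; the component constructors are the assumptions noted in the previous sections:

    runner = (
        InsightsBuilder()
        .with_input_schema(input_schema=input_schema)
        .with_reader(reader=csv_reader)
        .with_metrics(metrics=metric_details)
        .with_engine(engine=EngineDetail(engine_name="native"))
        .with_tags(tags=Tags(model_name="fraud-model"))
        .build()  # validates the components and returns the Runner
    )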