Managing null values

The Manage null values transformation replaces null values for an attribute.

You can fill in the null values using either:
  • The most commonly used value
  • For numeric values, the mean of the attribute values
  • A value that you specify.

    For example, you could fill in all of the null values with "Not Provided" or "Not Applicable".
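
The three replacement options can be sketched in plain Python. This is only an illustration of the semantics, not how the product implements the transformation: the fill_nulls helper and its strategy labels are hypothetical names, and None stands in for a null value.

```python
from collections import Counter
from statistics import mean


def fill_nulls(values, strategy, static_value=None):
    """Return a copy of `values` with None replaced according to `strategy`.

    Hypothetical helper for illustration; the strategy labels
    ("most_frequent", "mean", "static") are assumptions, not product names.
    """
    present = [v for v in values if v is not None]

    if strategy == "static":
        # A value that you specify, such as "Not Provided".
        replacement = static_value
    elif not present:
        # Nothing to compute a most frequent value or a mean from.
        raise ValueError("attribute contains only null values")
    elif strategy == "most_frequent":
        # The most commonly used non-null value.
        replacement = Counter(present).most_common(1)[0][0]
    elif strategy == "mean":
        # Only meaningful for numeric attributes.
        replacement = mean(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    return [replacement if v is None else v for v in values]


ratings = [3, None, 6, 3, None]
print(fill_nulls(ratings, "most_frequent"))             # [3, 3, 6, 3, 3]
print(fill_nulls(ratings, "mean"))                      # [3, 4, 6, 3, 4]
print(fill_nulls(["a", None], "static", "Not Provided"))  # ['a', 'Not Provided']
```

Note that the most frequent value and the mean are computed from the non-null values only, which matches the idea of a pre-calculated replacement value.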

To fill in null values:

  1. In the Catalog, select a project.
  2. Select Transform.
  3. Locate an attribute that contains null values you want to modify and select the column.
    Remember that the data quality bar shows the percentage of null values in black. Hovering over the bar shows the total number of null values for the attribute.

  4. From the transform menu, select Advanced > Manage null values.
  5. Specify a replacement for the null values in one of the following ways:
    • Choose Replace with Static Value > Most Frequent Value. This option provides a pre-calculated value in grey.
    • Choose Replace with Static Value > Attribute Mean. This option provides a pre-calculated mean value in grey.
    • Choose Replace with Static Value > Null filling value and specify a value of your choice.
    • Choose Delete Records to delete all records that have a null value for the selected attribute.
  6. Optionally, select Create null indicator attribute to create a new attribute, based on the name of the column you selected, that is populated with either 1 or 0.
    A value of 1 means a null value was replaced. A value of 0 means the original value was not modified because it was not null.
  7. If you selected Replace with Static Value, either apply the change to the current attribute or specify a new attribute name.
  8. Click Preview to see the results of running the transformation, or click Add to Script to save the transformation step to the script.
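
The null indicator and Delete Records options from the procedure above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the product's implementation: records are modeled as dictionaries, None stands in for a null value, both function names are hypothetical, and the "_null_indicator" suffix is an invented naming scheme (the actual name that the product derives from the column name may differ).

```python
def add_null_indicator(records, attribute, replacement):
    """Replace nulls in `attribute` and add a 1/0 indicator attribute.

    Hypothetical helper; the indicator name below is an assumption.
    """
    indicator = f"{attribute}_null_indicator"
    result = []
    for record in records:
        was_null = record.get(attribute) is None
        updated = dict(record)  # copy, so the input records are not modified
        if was_null:
            updated[attribute] = replacement
        # 1 means a null value was replaced; 0 means the original value was kept.
        updated[indicator] = 1 if was_null else 0
        result.append(updated)
    return result


def delete_null_records(records, attribute):
    """Drop every record whose `attribute` is null (hypothetical helper)."""
    return [r for r in records if r.get(attribute) is not None]


rows = [{"city": "Boston"}, {"city": None}]
print(add_null_indicator(rows, "city", "Not Provided"))
print(delete_null_records(rows, "city"))
```

Keeping an indicator attribute preserves the information about which values were originally null, which is otherwise lost once the nulls are replaced.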

When you have finished making changes to the project data set, you can commit the changes. See Running the transformation script against a project data set.