Transformations
The following transformations can be performed from the UI.
Table 8-5 Transformations
No. | Transformation | Function |
---|---|---|
1. | Add New Feature | A new feature, derived from the existing features using Script, can be added to the dataset. Physical Feature Name and Feature Name are the names of the new feature. Script is used to create a pandas Series for the new feature. |
2. | Encode Categorical Features | This function performs One Hot Encoding on a categorical feature and replaces it with multiple numerical features in the dataset. |
3. | Encode Datetime Features | This function encodes a datetime feature and replaces it with multiple numerical features derived from it: year, month, week, day, hour, minute, and dayofweek. |
4. | Encode Cyclical Features | This function encodes a cyclical feature (for example, hour or minute data) and returns two features carrying the sine and cosine transformations of the cyclical data. 'fmax' denotes the maximum possible value of the cyclical feature data. |
5. | Impute Missing Data | This function imputes missing data within a feature. For numerical features, there are four methods: simple (imputes with the mean, median, or most_frequent value, as chosen in arg, using the SimpleImputer in sklearn); const (fills the missing values with the value given in arg); knn (imputes using the KNNImputer in sklearn with the k value given in arg); mice (imputes using the IterativeImputer in sklearn). For non-numerical data, missing values can be imputed using the 'const' method, which replaces all missing values with the value given in arg. |
6. | Feature Scaling | This function scales multiple selected numerical features using the StandardScaler in sklearn. |
7. | Dimensionality Reduction | This function performs PCA on selected numerical features to reduce the dimensionality, using the sklearn.decomposition.PCA module. The number of output features can be specified using the dim field. The names of the output features can be specified in the fields 'Physical Feature Name' and 'Feature Name'. |
8. | Outlier Removal | This function removes outliers present in a feature based on the specified zscore value. Non-numerical features are label encoded before the outliers are removed. |
9. | Duplicates Removal - Data Frame | This function removes all duplicate rows in the dataframe. |
10. | Duplicates Removal - Feature | This function identifies rows that are duplicates within a specified subset of features and removes those rows from the data frame. |
11. | Filter Features | This function filters the data frame based on conditions specified on features. Operations allowed: >, >=, =, !=, <, <=, isin. When the chosen operation is 'isin', the input to 'Filter Value' is a list of values that should be present in the output data frame. |
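
Row 4 (Encode Cyclical Features) describes a sine and cosine pair but does not give the formula. A common form is sin(2πx/fmax) and cos(2πx/fmax), and the sketch below assumes that form. The function name `encode_cyclical` and the output column suffixes are hypothetical, not the product's own. The example passes `fmax=24` for hour data so that hour 23 lands next to hour 0 on the circle; whether the UI expects 23 or 24 here is not stated in the table.

```python
import numpy as np
import pandas as pd


def encode_cyclical(values: pd.Series, fmax: float) -> pd.DataFrame:
    """Return sine and cosine features for a cyclical series (assumed formula)."""
    # Map each value to an angle on the unit circle.
    angle = 2.0 * np.pi * values / fmax
    return pd.DataFrame(
        {
            f"{values.name}_sin": np.sin(angle),
            f"{values.name}_cos": np.cos(angle),
        }
    )


hours = pd.Series([0, 6, 12, 18, 23], name="hour")
encoded = encode_cyclical(hours, fmax=24)
print(encoded.round(3))
```

With this encoding, values at opposite ends of the range (such as hour 23 and hour 0) get nearly identical coordinates, which a single numeric column cannot express.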
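
The four numerical imputation methods in row 5 (Impute Missing Data) map onto sklearn classes roughly as sketched below. The dictionary keys mirror the method names in the table, but the parameter choices (`strategy="mean"`, `fill_value=0`, `n_neighbors=2`, `random_state=0`) and the sample data are assumptions made for illustration; the UI passes whatever is entered in arg. For non-numerical data, the sketch uses pandas `fillna` as a stand-in for the 'const' method.

```python
import numpy as np
import pandas as pd
# IterativeImputer is experimental and must be enabled before it is imported.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer

df = pd.DataFrame(
    {
        "age": [25.0, np.nan, 35.0, 45.0],
        "income": [40.0, 50.0, np.nan, 80.0],
    }
)

# Hypothetical mapping from the table's method names to sklearn imputers.
imputers = {
    "simple": SimpleImputer(strategy="mean"),  # or "median", "most_frequent"
    "const": SimpleImputer(strategy="constant", fill_value=0),
    "knn": KNNImputer(n_neighbors=2),
    "mice": IterativeImputer(random_state=0),
}

results = {
    name: pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
    for name, imputer in imputers.items()
}

# Non-numerical data: replace every missing value with a constant.
cities = pd.Series(["Pune", None, "Oslo"], name="city")
cities_filled = cities.fillna("unknown")

for name, frame in results.items():
    print(name)
    print(frame.round(2))
print(cities_filled.tolist())
```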
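
Row 8 (Outlier Removal) can be illustrated with a minimal z-score filter. This is a sketch under stated assumptions rather than the product's implementation: the function name `remove_outliers` is hypothetical, the z-score uses the population standard deviation, rows are kept when the absolute z-score does not exceed the threshold, and `pandas.factorize` stands in for the label encoding step that the table mentions for non-numerical features.

```python
import pandas as pd


def remove_outliers(df: pd.DataFrame, feature: str, zscore: float) -> pd.DataFrame:
    """Drop rows whose value in `feature` lies more than `zscore` deviations from the mean."""
    values = df[feature]
    if not pd.api.types.is_numeric_dtype(values):
        # Assumed stand-in for label encoding: map each category to an integer code.
        values = pd.Series(pd.factorize(values)[0], index=df.index)
    std = values.std(ddof=0)
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return df.copy()
    z = (values - values.mean()) / std
    return df[z.abs() <= zscore]


orders = pd.DataFrame({"amount": [9, 10, 11, 10, 9, 11, 10, 10, 10, 100]})
cleaned = remove_outliers(orders, "amount", zscore=2.0)
print(len(orders), "rows before,", len(cleaned), "rows after")
```

Note that in a small sample the largest achievable z-score is limited by the sample size, so a threshold such as 3 may remove nothing from a ten-row frame.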
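
The operations listed in row 11 (Filter Features) correspond to ordinary pandas boolean masks. The sketch below shows one way to express them; the helper `filter_frame`, the `OPS` lookup, and the sample data are hypothetical and only illustrate the semantics described in the table, including 'isin' taking a list of values to keep.

```python
import operator

import pandas as pd

# Comparison operators from the table, mapped to Python's operator functions.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
}


def filter_frame(df: pd.DataFrame, feature: str, op: str, value) -> pd.DataFrame:
    """Keep only the rows where `feature` satisfies the given condition."""
    if op == "isin":
        # `value` is a list of values that should remain in the output.
        mask = df[feature].isin(value)
    else:
        mask = OPS[op](df[feature], value)
    return df[mask]


sales = pd.DataFrame({"city": ["Pune", "Oslo", "Lima"], "sales": [5, 12, 8]})
at_least_8 = filter_frame(sales, "sales", ">=", 8)
selected = filter_frame(sales, "city", "isin", ["Pune", "Lima"])
print(at_least_8)
print(selected)
```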