Extending the transform function library

You can create an external Groovy script that extends the transform function library, and then call the function in a custom transformation. This provides an extension point to the set of default Groovy-based transformation functions provided with BDD.

The custom Groovy script you create must be packaged in a JAR that you make available as a plug-in to both Studio and the Data Processing component. You can then run your custom transformation script in Studio with the runExternalPlugin function.

Note:

Extending the transform function library is not supported by Big Data Discovery Cloud Service. The feature is supported by Big Data Discovery.

The following Groovy script (named MyPlugin.groovy) is used as an example:
def pluginExec(Object[] args) {
    String input = args[0] //args[0] is the input field from the BDD Transformation Editor 

    //Return a lowercase version of the input.
    input.toLowerCase()
}

The script defines the pluginExec() method, which takes an Object array (named args) as its argument. args[0] corresponds to the string attribute to be transformed and is assigned to the input variable. The script then calls toLowerCase() to convert the input to lower case and returns the result, which is stored in a single-assign string attribute.

You can create more complex transformation scripts in Groovy, for example, scripts that import and use external Java libraries. Documentation for the Groovy language is available at: http://www.groovy-lang.org/documentation.html
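
As an illustration, a more involved plug-in might look like the following sketch. The third-party library shown here is an assumption made for the example (Apache Commons Lang); its JAR would also have to be copied into the custom plug-in directory created in Step 1 below so that Spark can load it:

import org.apache.commons.lang3.StringUtils //hypothetical third-party dependency; its JAR must also be placed in the plug-in directory

def pluginExec(Object[] args) {
    String input = args[0] //args[0] is the input field from the BDD Transformation Editor

    //Return the trimmed input, abbreviated to at most 20 characters.
    StringUtils.abbreviate(input?.trim(), 20)
}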

Note:

For security reasons, do not use System.xxx methods, such as System.getProperty(), in your Groovy script.

To extend the transform function library and run the custom transformation script:

  1. Create a directory for your custom transformation plug-ins:
    mkdir /opt/custom_lib
    

    This directory will also store any Java libraries that are imported by the Groovy script.

  2. In the custom plug-ins directory, create a Groovy script, such as the MyPlugin.groovy example shown above.
  3. Package the Groovy script into a JAR with the jar cf command:
    jar cf MyPlugin.jar MyPlugin.groovy
    
    The Groovy script must be located at the root of the JAR. Any additional files used by the script can also be included in the JAR.
  4. Go to the $BDD_HOME/dataprocessing/edp_cli/config directory and edit the sparkContext.properties file to add the spark.executor.extraClassPath property, pointing it to the directory you created in Step 1.
    Make sure you add an asterisk character at the end of the path to indicate that all JARs in the directory should be included.
    The edited file should look like this example:
    #########################################################
    # Spark additional runtime properties, see
    # https://spark.apache.org/docs/1.0.0/configuration.html
    # for examples
    #########################################################
    spark.executor.extraClassPath=/opt/custom_lib/*
    
    If the file already contains a spark.executor.extraClassPath property, move any libraries referenced by that property into the custom_lib directory so that they remain on the Spark class path.
  5. If you have a clustered (multi-node) Hadoop environment, repeat Steps 1-4 on each node that runs YARN. This ensures that your custom transformation script can be used regardless of which YARN node runs the transformation workflow.
  6. Run the new plug-in from Studio's Transform page:
    1. Make sure the data set is present in the Transform page.
    2. Open the Transformation Editor.
    3. From the Functions menu, select the runExternalPlugin function.
    4. In the function's signature, specify the name of the Groovy script (in quotes) and the attribute to be processed. For example:
      runExternalPlugin("MyPlugin.groovy", surveys)
      
    5. In Configure Output Settings, configure either Apply to existing attribute or Create New Attribute.
    6. Click Add to Script. (Do not select Preview, because previews do not work with custom transformation scripts.)
    7. Click Commit to Project.
    After the transformation script is applied to the data set, the new or updated attribute appears in the data set.