UDFs for OML Embedded Python Execution

Registering UDFs from Python

register_sai_scripts([is_global, overwrite])

Register Spatial AI UDFs so they can be executed through OML Embedded Python Execution for SQL and REST.

UDFs

User-defined functions to be executed through OML Embedded Python Execution for SQL and REST APIs.

Function compute_spatial_weights

Computes the spatial weights for the given spatial table. Stores the spatial weights object in the data store specified by the save_weights_as parameter.

| Parameter | Type | Description |
| --- | --- | --- |
| table | String | The name of a database table. |
| weights_def | Spatial Weights Definition JSON type | Specifies the type of spatial weights to be computed. |
| save_weights_as | Datastore Save Specification JSON type | Specifies how the computed spatial weights will be stored in the datastore. |
| spatial_col | String | (Optional) - The name of the spatial column for which the spatial weights will be computed. It can be omitted if the table contains only a single spatial column. |
| crs | String or number | (Optional) - The spatial coordinate system associated with the geometries of the spatial column. It can be specified as an SRID number, an authority string such as EPSG:4326, or a WKT string. |
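As a rough illustration of how the parameters above fit together, the following Python sketch assembles them into a single JSON document. The helper name `build_spatial_weights_args`, the table name `CITY_BLOCKS`, and the object name `knn_weights` are hypothetical, and how the resulting payload is handed to the SQL or REST interface is not covered here.

```python
import json


def build_spatial_weights_args(table, weights_def, save_weights_as,
                               spatial_col=None, crs=None):
    """Collect compute_spatial_weights parameters, skipping unset optional ones."""
    args = {
        "table": table,
        "weights_def": weights_def,
        "save_weights_as": save_weights_as,
    }
    if spatial_col is not None:
        args["spatial_col"] = spatial_col
    if crs is not None:
        args["crs"] = crs
    return args


args = build_spatial_weights_args(
    table="CITY_BLOCKS",  # hypothetical table name
    weights_def={"type": "KNN", "k": 5},
    save_weights_as={"ds_name": "datastore1", "obj_name": "knn_weights"},
    crs="EPSG:4326",  # an SRID number or a WKT string would also be accepted
)
payload = json.dumps(args)
print(payload)
```

Because spatial_col is left out, the sketch relies on the table having a single spatial column.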

Function compute_global_spatial_autocorrelation

Computes the Moran's I index for the given spatial table and column. Returns the following statistics: I, expected I, p value, z value.

| Parameter | Type | Description |
| --- | --- | --- |
| table | String | The name of a database table. |
| column | String | The name of the column for which the spatial autocorrelation is calculated. |
| weights | Spatial Weights Location JSON Type | (Optional) - An existing spatial weights object, previously calculated for the current spatial table’s geometries. If not specified, weights_def must be provided. |
| weights_def | Spatial Weights Definition JSON type | (Optional) - Specifies the type of spatial weights to be computed. If not specified, weights must be provided. |
| save_weights_as | Datastore Save Specification JSON type | (Optional) - Specifies how the computed spatial weights will be stored in the datastore. It is only used if weights_def is provided. |
| spatial_col | String | (Optional) - The name of the spatial column for which the spatial weights will be computed. It can be omitted if the table contains only a single spatial column. |
| crs | String or number | (Optional) - The spatial coordinate system associated with the geometries of the spatial column. It can be specified as an SRID number, an authority string such as EPSG:4326, or a WKT string. |
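The dependency between weights, weights_def and save_weights_as can be checked on the client side before a call is made. The sketch below is one possible way to do that; `resolve_weights_args` is a hypothetical helper, not part of the UDFs, and the datastore and object names are made up.

```python
def resolve_weights_args(weights=None, weights_def=None, save_weights_as=None):
    """Return only the weights-related arguments that make sense together."""
    if weights is None and weights_def is None:
        raise ValueError("Provide either weights or weights_def")
    args = {}
    if weights is not None:
        args["weights"] = weights
    if weights_def is not None:
        args["weights_def"] = weights_def
        # save_weights_as is only used when weights_def is provided
        if save_weights_as is not None:
            args["save_weights_as"] = save_weights_as
    return args


# Reuse weights already stored in a datastore (hypothetical names)
existing = resolve_weights_args(
    weights={"ds_name": "datastore1", "obj_name": "my_obj1"},
    save_weights_as={"ds_name": "datastore1", "obj_name": "ignored"},
)
print(existing)
```

In this example save_weights_as is dropped, because no weights_def was given.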

Returns

| FIELD | Type |
| --- | --- |
| I | NUMBER |
| expected_I | NUMBER |
| p_value | NUMBER |
| z_value | NUMBER |
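For readers unfamiliar with the statistic, the following pure-Python sketch computes I and expected_I from the textbook definition of Moran's I on a tiny made-up dataset. It is only meant to show what the two fields represent; the UDF's own implementation, and the way it derives p_value and z_value, are not described here and may differ.

```python
def morans_i(values, weights):
    """Return (I, expected_I) for n values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                # deviations from the mean
    s0 = sum(sum(row) for row in weights)         # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    denom = sum(d * d for d in z)
    i_stat = (n / s0) * (cross / denom)
    expected_i = -1.0 / (n - 1)                   # expectation under no autocorrelation
    return i_stat, expected_i


# Four locations on a line; each one neighbors the adjacent locations
values = [1.0, 2.0, 3.0, 4.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
i_stat, expected_i = morans_i(values, weights)
print(round(i_stat, 4), round(expected_i, 4))  # 0.3333 -0.3333
```

An I above expected_I, as here, indicates that similar values tend to sit next to each other.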

Function compute_local_spatial_autocorrelation

Computes the statistics for the local spatial autocorrelation of all the rows in the given spatial table using Local Moran. Returns a tabular result containing the statistics for each row of the input table. Returned statistics include: I, p value, z value, quadrant.

| Parameter | Type | Description |
| --- | --- | --- |
| table | String | The name of a database table. |
| column | String | The name of the column for which the local spatial autocorrelation is calculated. |
| result_table | String | (Optional) - If specified, the result will be stored in this table. |
| key_column | String | (Optional) - A column from the input table used to associate rows from the input table with the results of this operation. If not specified, ROWNUM from the table will be used. |
| weights | Spatial Weights Location JSON Type | (Optional) - An existing spatial weights object, previously calculated for the current spatial table’s geometries. If not specified, weights_def must be provided. |
| weights_def | Spatial Weights Definition JSON type | (Optional) - Specifies the type of spatial weights to be computed. If not specified, weights must be provided. |
| save_weights_as | Datastore Save Specification JSON type | (Optional) - Specifies how the computed spatial weights will be stored in the datastore. It is only used if weights_def is provided. |
| spatial_col | String | (Optional) - The name of the spatial column for which the spatial weights will be computed. It can be omitted if the table contains only a single spatial column. |
| crs | String or number | (Optional) - The spatial coordinate system associated with the geometries of the spatial column. It can be specified as an SRID number, an authority string such as EPSG:4326, or a WKT string. |

Returns

A resultset containing the same number of rows as the input table.

| FIELD | Type |
| --- | --- |
| value of key_column parameter or ‘id’ if no key_column param is provided | Depends on the type of the column referenced by key_column |
| local_moran_I | NUMBER |
| p_value | NUMBER |
| z_value | NUMBER |
| quadrant | NUMBER (HOTSPOT/high-high=1, DOUGHNUT/high-low=2, COLDSPOT/low-low=3, DIAMOND/low-high=4) |
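A consumer of this result will usually want to turn the numeric quadrant codes back into labels. The sketch below does so with the mapping listed above, keeping only rows whose p_value falls under a threshold. The sample rows are invented, the 0.05 cutoff is a common convention rather than something the UDF prescribes, and `label_significant` is a hypothetical helper.

```python
# Quadrant codes as documented for the quadrant field
QUADRANT_LABELS = {1: "HOTSPOT", 2: "DOUGHNUT", 3: "COLDSPOT", 4: "DIAMOND"}


def label_significant(rows, alpha=0.05):
    """Return (id, label) pairs for rows with p_value below alpha."""
    labelled = []
    for row in rows:
        if row["p_value"] < alpha:
            labelled.append((row["id"], QUADRANT_LABELS[row["quadrant"]]))
    return labelled


# Made-up rows shaped like the documented result set (no key_column given)
rows = [
    {"id": 1, "local_moran_I": 1.8, "p_value": 0.01, "z_value": 2.6, "quadrant": 1},
    {"id": 2, "local_moran_I": -0.4, "p_value": 0.30, "z_value": -0.9, "quadrant": 2},
    {"id": 3, "local_moran_I": 1.1, "p_value": 0.04, "z_value": 2.1, "quadrant": 3},
]
significant = label_significant(rows)
print(significant)  # [(1, 'HOTSPOT'), (3, 'COLDSPOT')]
```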

Function create_spatial_lag

Creates a spatial lag for the given column of the provided spatial table. Returns a tabular result which includes the calculated spatial lag for each row from the input table.

| Parameter | Type | Description |
| --- | --- | --- |
| table | String | The name of a database table. |
| column | String | The name of the column for which the spatial lag will be calculated. |
| result_table | String | (Optional) - If specified, the result will be stored in this table. |
| key_column | String | (Optional) - A column from the input table used to associate rows from the input table with the results of this operation. If not specified, ROWNUM from the table will be used. |
| weights | Spatial Weights Location JSON Type | (Optional) - An existing spatial weights object, previously calculated for the current spatial table’s geometries. If not specified, weights_def must be provided. |
| weights_def | Spatial Weights Definition JSON type | (Optional) - Specifies the type of spatial weights to be computed. If not specified, weights must be provided. |
| save_weights_as | Datastore Save Specification JSON type | (Optional) - Specifies how the computed spatial weights will be stored in the datastore. It is only used if weights_def is provided. |
| spatial_col | String | (Optional) - The name of the spatial column for which the spatial weights will be computed. It can be omitted if the table contains only a single spatial column. |
| crs | String or number | (Optional) - The spatial coordinate system associated with the geometries of the spatial column. It can be specified as an SRID number, an authority string such as EPSG:4326, or a WKT string. |

Returns

A resultset containing the spatial lag column, containing a value for each row from the input table.

| FIELD | Type |
| --- | --- |
| value of key_column parameter or ‘id’ if no key_column param is provided | Depends on the type of the column referenced by key_column |
| <column>_SLAG (Same name as column input param with the suffix _SLAG) | Depends on the type of the column input param |
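To make the notion of a spatial lag concrete, the sketch below computes it for three made-up rows as the average of each row's neighbor values, which corresponds to row-standardized binary weights. Whether the UDF standardizes weights this way is an assumption, and the column name `MEDIAN_INCOME` and the neighbor lists are hypothetical.

```python
def spatial_lag(values, neighbors):
    """Average the neighbor values of each key (row-standardized binary weights)."""
    lag = {}
    for key, nbrs in neighbors.items():
        # Rows without neighbors get a lag of 0.0 in this sketch
        lag[key] = sum(values[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
    return lag


column = "MEDIAN_INCOME"                 # hypothetical input column
values = {1: 10.0, 2: 20.0, 3: 60.0}     # key -> column value
neighbors = {1: [2], 2: [1, 3], 3: [2]}  # key -> neighboring keys

lag = spatial_lag(values, neighbors)
# Shape the output like the documented result set: key plus <column>_SLAG
result = [{"id": k, f"{column}_SLAG": lag[k]} for k in sorted(values)]
print(result)
```

Row 2 has neighbors 1 and 3, so its lag is the mean of 10.0 and 60.0.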

Function clustering

Performs clustering on the given spatial table, using the given columns or all the columns of the table if no columns parameter is provided. Available clustering methods are: DBSCAN, Agglomerative, KMeans.

| Parameter | Type | Description |
| --- | --- | --- |
| table | String | The name of a database table. |
| columns | String | The names of the columns to be considered as features by the clustering algorithm. |
| method | String | One of the supported clustering algorithms. Possible values: KMEANS, DBSCAN, AGGLOMERATIVE. |
| scale | Boolean | (Optional, default=true) - If true, all the values of the feature columns will be scaled. |
| result_table | String | (Optional) - If specified, the result will be stored in this table. |
| key_column | String | (Optional) - A column from the input table used to associate rows from the input table with the results of this operation. If not specified, ROWNUM from the table will be used. |
| weights | Spatial Weights Location JSON Type | (Optional) - An existing spatial weights object, previously calculated for the current spatial table’s geometries. If not specified, weights_def must be provided. |
| weights_def | Spatial Weights Definition JSON type | (Optional) - Specifies the type of spatial weights to be computed. If not specified, weights must be provided. |
| save_weights_as | Datastore Save Specification JSON type | (Optional) - Specifies how the computed spatial weights will be stored in the datastore. It is only used if weights_def is provided. |
| geometry_as_feature | Boolean | (Optional, default=false) - If true, and no spatial weights or spatial weights definition is provided, the spatial column will be used as a feature for the clustering. |
| spatial_col | String | (Optional) - The name of the spatial column for which the spatial weights will be computed when performing regionalization, or which is used as a clustering feature if geometry_as_feature is set to true; otherwise, it is ignored. It can be omitted if the table contains only a single spatial column. |
| crs | String or number | (Optional) - The spatial coordinate system associated with the geometries of the spatial column. It can be specified as an SRID number, an authority string such as EPSG:4326, or a WKT string. |
| plot | Plotting JSON Type | (Optional) - If provided, the clustering results will be plotted and an image will be returned. |

The following parameters are specific to clustering algorithms.

KMEANS Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| n_clusters | Number | (Optional) - The number of clusters to form as well as the number of centroids to generate. If not provided, the elbow init method is used. |
| init | String | (Optional, default=k-means++) - Method for cluster initialization. Possible values: k-means++, random. |
| n_init | Number | (Optional, default=10) - Number of times k-means will run with different centroid seeds. |
| max_iter | Number | (Optional, default=300) - Maximum number of iterations of the k-means algorithm for a single run. |
| tol | Float | (Optional, default=1e-4) - Relative tolerance with regard to the Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence. |
| random_state | Number | (Optional) - Determines random number generation for centroid initialization. Use an int to make the randomness deterministic. |
| algorithm | String | (Optional, default=auto) - K-means algorithm to use. The classical EM-style is “full”. The “elkan” variation is more efficient on data with well-defined clusters, by using the triangle inequality. However, it’s more memory intensive due to the allocation of an extra array of shape (n_samples, n_clusters). Possible values: auto, full, elkan. |
| init_method | String | (Optional, default=elbow) - Possible values: elbow, silhouette, gmeans. |

DBSCAN Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| eps | Float | (Optional) - The maximum distance between two samples for one to be considered as in the neighborhood of the other. If eps is None, the K-Distance method is used to estimate the best value for eps. |
| min_samples | Number | (Optional) - The number of samples in a neighborhood for a point to be considered as a core point. If min_samples is None, it is estimated using the number of features in the data. |
| metric | String | (Optional, default=euclidean) - The metric used to calculate the distance between instances in a feature array. Possible values: cityblock, cosine, euclidean, haversine, manhattan. |
| algorithm | String | (Optional, default=auto) - The algorithm to be used by the NearestNeighbors module to compute pointwise distances and find nearest neighbors. Possible values: auto, ball_tree, kd_tree, brute. |
| leaf_size | Number | (Optional, default=30) - Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem. |
| p | Float | (Optional) - The power of the Minkowski metric to be used to calculate distance between points. If None, then p=2 (equivalent to the Euclidean distance). |
| algorithm | String | (Optional, default=auto) - K-means algorithm to use. The classical EM-style is “full”. The “elkan” variation is more efficient on data with well-defined clusters, by using the triangle inequality. However, it’s more memory intensive due to the allocation of an extra array of shape (n_samples, n_clusters). Possible values: auto, full, elkan. |
| init_method | Boolean | (Optional, default=true) - If true, it will use the spatial weight matrix as distance. If false, it will set the distance to all neighbors to zero. |

AGGLOMERATIVE Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| n_clusters | Number | (Optional, default=2) - The number of clusters to form. |
| affinity | String | (Optional, default=euclidean) - The metric to use when calculating the distance between observations. Possible values: cityblock, cosine, euclidean, haversine, manhattan. |
| linkage | String | (Optional, default=ward) - Determines the distance to use. The algorithm merges pairs of clusters that minimize this criterion. Possible values: ward (minimizes the variance of the clusters); average (uses the average of the distances of each observation of the two clusters); complete (uses the maximum distances between all observations of the two clusters); single (uses the minimum distances between all observations of the two clusters). |
| distance_threshold | Float | (Optional) - The linkage distance threshold. If specified, then n_clusters must not be specified. |

Returns

A resultset containing the label assigned to each row from the input table.

Optionally, a plot image can be returned.

| FIELD | Type |
| --- | --- |
| value of key_column parameter or ‘id’ if no key_column param is provided | Depends on the type of the column referenced by key_column |
| label | Number |
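Two of the rules stated for clustering lend themselves to a client-side check: method must be one of the supported values, and for AGGLOMERATIVE, n_clusters and distance_threshold must not both be given. The sketch below shows one way to enforce them. `build_clustering_args` is a hypothetical helper, the table and column names are made up, and placing the algorithm-specific parameters at the same level as the general ones is an assumption about how they are passed.

```python
SUPPORTED_METHODS = {"KMEANS", "DBSCAN", "AGGLOMERATIVE"}


def build_clustering_args(table, columns, method, **algorithm_params):
    """Validate the method and merge general and algorithm-specific parameters."""
    method = method.upper()
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"Unsupported clustering method: {method}")
    if (method == "AGGLOMERATIVE"
            and algorithm_params.get("n_clusters") is not None
            and algorithm_params.get("distance_threshold") is not None):
        raise ValueError("Specify n_clusters or distance_threshold, not both")
    args = {"table": table, "columns": columns, "method": method}
    args.update(algorithm_params)  # assumption: passed alongside the general parameters
    return args


kmeans_args = build_clustering_args(
    "CITY_BLOCKS", "MEDIAN_INCOME,POPULATION",  # hypothetical table and columns
    "kmeans", n_clusters=4, random_state=42,
)
print(kmeans_args)
```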

JSON Types

The following JSON types are used across REST and SQL functions.

Spatial Weights Definition

Describes the spatial weights to be computed.

Fields:

  • type: can contain one of the following values: KNN, DistanceBand, Kernel, Queen, Rook.

  • [swdef_type_fields]: Properties from the equivalent SpatialWeightsDefinition classes. The fields are the same as the parameters taken by the constructors of the equivalent Python classes.

Examples:

{
    "type": "KNN",
    "k": 5
}

{
    "type": "DistanceBand",
    "threshold": 2000.0
}

{
    "type": "Queen"
}
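When these definitions are generated programmatically, the type field can be checked against the documented values before serializing. A minimal sketch follows, in which `make_weights_def` is a hypothetical helper and the extra fields are passed through unchecked, since their names depend on the corresponding SpatialWeightsDefinition class.

```python
import json

VALID_WEIGHT_TYPES = {"KNN", "DistanceBand", "Kernel", "Queen", "Rook"}


def make_weights_def(weights_type, **fields):
    """Build a Spatial Weights Definition dict after checking its type."""
    if weights_type not in VALID_WEIGHT_TYPES:
        raise ValueError(f"Unknown spatial weights type: {weights_type}")
    return {"type": weights_type, **fields}


knn_def = make_weights_def("KNN", k=5)
band_def = make_weights_def("DistanceBand", threshold=2000.0)
queen_def = make_weights_def("Queen")
print(json.dumps(knn_def))  # {"type": "KNN", "k": 5}
```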

Datastore Save Specification

Specifies how an object can be saved in an OML datastore.

Fields:

  • ds_name: Name of the datastore where the object will be saved

  • obj_name: The name used to save the object

  • append: If true, the object is appended to the datastore

  • overwrite: If an object exists with the same name and it is true, the object will be overwritten. Otherwise the operation will fail.

Example:

{
    "ds_name": "datastore1",
    "obj_name": "my_ob1",
    "append": true,
    "overwrite_obj": false
}

Datastore Object Location

Specifies the location of an object in a datastore.

Fields:

  • ds_name: Name of an existing datastore

  • obj_name: The name of an object in the datastore

Example:

{
    "ds_name": "datastore1",
    "obj_name": "my_obj1"
}

Cluster Plotting Parameters

Contains parameters used for plotting clustering results. In its simplest form, it can be empty; as long as the OML control parameter oml_graphics_flag is set to true, a plot will be generated.

Fields:

  • width: Width of the image

  • height: Height of the image

  • title: Title of the plot

  • with_noise: (default=false) if true and DBSCAN is used, noise points will be shown.

  • with_bounds: (default=false) if true, clusters will be drawn as polygons.

  • with_basemap: (default=false) if true, a basemap will be added to the background.

  • with_legend: (default=true) if true, a legend with the clusters labels is added to the plot.

Example:

{
    "width": 20,
    "height": 15,
    "title": "Clusters",
    "with_noise": true,
    "with_bounds": false,
    "with_basemap": true,
    "with_legend": true
}
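The defaults listed above mean that a partial or even empty plot object is enough. The sketch below merges a caller's overrides with those documented defaults to show the effective settings. `effective_plot_params` is a hypothetical helper, and the merge happens purely on the client side for illustration; the UDF presumably applies its own defaults.

```python
import json

# Defaults as documented for the boolean plotting fields
PLOT_DEFAULTS = {
    "with_noise": False,
    "with_bounds": False,
    "with_basemap": False,
    "with_legend": True,
}


def effective_plot_params(plot=None):
    """Overlay the caller's plot fields on the documented defaults."""
    merged = dict(PLOT_DEFAULTS)
    merged.update(plot or {})
    return merged


empty_form = effective_plot_params({})
custom = effective_plot_params({"title": "Clusters", "with_noise": True})
print(json.dumps(custom))
```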