UDFs for OML Embedded Python Execution

Registering UDFs from Python

register_sai_scripts([is_global, overwrite])

Register Spatial AI UDFs so they can be executed through OML Embedded Python Execution for SQL and REST.

UDFs

User-defined functions (UDFs) to be executed through OML Embedded Python Execution for the SQL and REST APIs.

Function compute_spatial_weights

Computes the spatial weights for the given spatial table. Stores the spatial weights object in the datastore specified by the save_weights_as parameter.

Parameter	Type	Description
table	String	The name of a database table.
weights_def	Spatial Weights Definition JSON type	Specifies the type of spatial weights to be computed.
save_weights_as	Datastore Save Specification JSON type	Specifies how the computed spatial weights will be stored in the datastore.
spatial_col	String	(Optional) - The name of the spatial column for which the spatial weights will be computed. If the table contains only one spatial column, this value does not need to be specified.
crs	String or number	(Optional) - The spatial coordinate system associated with the geometries of the spatial column. It can be specified as an SRID number, an authority string such as EPSG:4326, or a WKT string.
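
As a rough illustration of how the parameters above fit together, the following Python sketch assembles them into a single JSON document. How that document is handed to the UDF (through the SQL or the REST interface) is not shown here. The table name LA_BLOCK_GROUPS and the helper build_weights_args are hypothetical and are not part of the API described in this section.

```python
import json


def build_weights_args(table, weights_def, save_weights_as,
                       spatial_col=None, crs=None):
    """Collect compute_spatial_weights parameters, omitting unset optional ones.

    Hypothetical helper for illustration only.
    """
    args = {
        "table": table,
        "weights_def": weights_def,
        "save_weights_as": save_weights_as,
    }
    # spatial_col and crs are documented as optional, so only send them if set.
    if spatial_col is not None:
        args["spatial_col"] = spatial_col
    if crs is not None:
        args["crs"] = crs
    return args


args = build_weights_args(
    table="LA_BLOCK_GROUPS",  # hypothetical table name
    weights_def={"type": "KNN", "k": 5},
    save_weights_as={"ds_name": "datastore1", "obj_name": "my_obj1", "append": True},
    crs="EPSG:4326",
)
print(json.dumps(args, indent=4))
```

Leaving spatial_col out relies on the table having a single spatial column, as described in the parameter table.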

Function compute_global_spatial_autocorrelation

Computes the Moran's I index for the given spatial table and column. Returns the following statistics: I, expected I, p value, and z value.

Parameter	Type	Description
table	String	The name of a database table.
column	String	The name of the column to calculate the spatial autocorrelation.
weights	Spatial Weights Location JSON Type	(Optional) - An existing spatial weights object, previously calculated for the current spatial table’s geometries. If not specified, weights_def must be provided.
weights_def	Spatial Weights Definition JSON type	(Optional) - Specifies the type of spatial weights to be computed. If not specified, weights must be provided.
save_weights_as	Datastore Save Specification JSON type	(Optional) - Specifies how the computed spatial weights will be stored in the datastore. It is only used if weights_def is provided.
spatial_col	String	(Optional) - The name of the spatial column for which the spatial weights will be computed. If the table contains only one spatial column, this value does not need to be specified.
crs	String or number	(Optional) - The spatial coordinate system associated with the geometries of the spatial column. It can be specified as an SRID number, an authority string such as EPSG:4326, or a WKT string.

Returns

FIELD	Type
I	NUMBER
expected_I	NUMBER
p_value	NUMBER
z_value	NUMBER
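
To make the I and expected_I fields more concrete, here is a minimal pure-Python sketch of the textbook global Moran's I formula on a toy dataset. It is not the UDF's implementation: how the UDF derives p_value and z_value is not described in this section, and the function names, the values, and the dense weights matrix below are made up for illustration.

```python
def morans_i(values, weights):
    """Textbook global Moran's I for a dense n x n weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


def expected_i(n):
    """Expected value of Moran's I under no spatial autocorrelation."""
    return -1.0 / (n - 1)


# Four locations in a row; each one is a neighbor of the adjacent locations.
values = [1.0, 2.0, 3.0, 4.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

i_stat = morans_i(values, weights)
print(i_stat, expected_i(len(values)))
```

An I above the expected value suggests that similar values tend to sit next to each other; an I below it suggests that neighboring values tend to differ.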

Function compute_local_spatial_autocorrelation

Computes the Local Spatial Autocorrelation statistics for all the rows of the given spatial table using Local Moran. Returns a tabular result containing the statistics for each row of the input table. Returned statistics include: I, p value, z value, and quadrant.

Parameter	Type	Description
table	String	The name of a database table.
column	String	The name of the column to calculate the local spatial autocorrelation.
result_table	String	(Optional) - If specified, the result will be stored in this table.
key_column	String	(Optional) - A column from the input table used to associate rows from the input table with the results of this operation. If not specified, ROWNUM from the table will be used.
weights	Spatial Weights Location JSON Type	(Optional) - An existing spatial weights object, previously calculated for the current spatial table’s geometries. If not specified, weights_def must be provided.
weights_def	Spatial Weights Definition JSON type	(Optional) - Specifies the type of spatial weights to be computed. If not specified, weights must be provided.
save_weights_as	Datastore Save Specification JSON type	(Optional) - Specifies how the computed spatial weights will be stored in the datastore. It is only used if weights_def is provided.
spatial_col	String	(Optional) - The name of the spatial column for which the spatial weights will be computed. If the table contains only one spatial column, this value does not need to be specified.
crs	String or number	(Optional) - The spatial coordinate system associated with the geometries of the spatial column. It can be specified as an SRID number, an authority string such as EPSG:4326, or a WKT string.

Returns

A resultset containing the same number of rows as the input table.

FIELD	Type
value of key_column parameter or ‘id’ if no key_column param is provided	Depends on the type of the column referenced by key_column
local_moran_I	NUMBER
p_value	NUMBER
z_value	NUMBER
quadrant	NUMBER (HOTSPOT/high-high=1, DOUGHNUT/high-low=2, COLDSPOT/low-low=3, DIAMOND/low-high=4)
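
The numeric quadrant codes are easier to read once mapped back to their names. The sketch below does that for a few result rows and keeps only the rows below a significance cutoff. The sample rows are invented for illustration, and both the 0.05 cutoff and the helper label_significant are assumptions rather than part of the UDF.

```python
# Quadrant codes as listed in the Returns table above.
QUADRANT_LABELS = {
    1: "HOTSPOT (high-high)",
    2: "DOUGHNUT (high-low)",
    3: "COLDSPOT (low-low)",
    4: "DIAMOND (low-high)",
}


def label_significant(rows, alpha=0.05):
    """Return (id, label) pairs for rows whose p_value is below alpha.

    The alpha default is an illustrative assumption.
    """
    return [
        (row["id"], QUADRANT_LABELS[row["quadrant"]])
        for row in rows
        if row["p_value"] < alpha
    ]


# Invented rows shaped like the documented result set (no key_column given).
rows = [
    {"id": 1, "local_moran_I": 1.8, "p_value": 0.01, "z_value": 2.6, "quadrant": 1},
    {"id": 2, "local_moran_I": -0.4, "p_value": 0.30, "z_value": -0.9, "quadrant": 2},
    {"id": 3, "local_moran_I": 1.1, "p_value": 0.03, "z_value": 2.1, "quadrant": 3},
]

for key, label in label_significant(rows):
    print(key, label)
```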

Function create_spatial_lag

Creates a spatial lag for the given column of the provided spatial table. Returns a tabular result which includes the calculated spatial lag for each row from the input table.

Parameter	Type	Description
table	String	The name of a database table.
column	String	The name of the column for which the spatial lag will be calculated.
result_table	String	(Optional) - If specified, the result will be stored in this table.
key_column	String	(Optional) - A column from the input table used to associate rows from the input table with the results of this operation. If not specified, ROWNUM from the table will be used.
weights	Spatial Weights Location JSON Type	(Optional) - An existing spatial weights object, previously calculated for the current spatial table’s geometries. If not specified, weights_def must be provided.
weights_def	Spatial Weights Definition JSON type	(Optional) - Specifies the type of spatial weights to be computed. If not specified, weights must be provided.
save_weights_as	Datastore Save Specification JSON type	(Optional) - Specifies how the computed spatial weights will be stored in the datastore. It is only used if weights_def is provided.
spatial_col	String	(Optional) - The name of the spatial column for which the spatial weights will be computed. If the table contains only one spatial column, this value does not need to be specified.
crs	String or number	(Optional) - The spatial coordinate system associated with the geometries of the spatial column. It can be specified as an SRID number, an authority string such as EPSG:4326, or a WKT string.

Returns

A resultset containing the spatial lag column, containing a value for each row from the input table.

FIELD	Type
value of key_column parameter or ‘id’ if no key_column param is provided	Depends on the type of the column referenced by key_column
<column>_SLAG (Same name as column input param with the suffix _SLAG)	Depends on the type of the column input param
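
For readers unfamiliar with the term, a spatial lag is a weighted combination of the values at each row's neighbors. The following pure-Python sketch shows the general idea with a neighbor list. Whether the UDF row-standardizes the weights depends on the weights object in use and is not stated in this section, so the row_standardize flag, the function name, and the data are illustrative assumptions.

```python
def spatial_lag(values, neighbors, row_standardize=True):
    """Compute a spatial lag from a list of neighbor indices per row.

    With row_standardize=True the lag is the mean of the neighbors' values;
    otherwise it is their sum (binary weights).
    """
    lag = []
    for nbrs in neighbors:
        if not nbrs:
            # A row without neighbors gets a lag of 0.0 in this sketch.
            lag.append(0.0)
            continue
        total = sum(values[j] for j in nbrs)
        lag.append(total / len(nbrs) if row_standardize else total)
    return lag


# Three locations in a row: 0 - 1 - 2.
values = [10.0, 20.0, 30.0]
neighbors = [[1], [0, 2], [1]]

print(spatial_lag(values, neighbors))
print(spatial_lag(values, neighbors, row_standardize=False))
```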

Function clustering

Performs clustering on the given spatial table, using the given columns or all the columns of the table if no columns parameter is provided. Available clustering methods are: DBSCAN, Agglomerative, KMeans.

Parameter	Type	Description
table	String	The name of a database table.
columns	String	The names of the columns to be considered as features by the clustering algorithm.
method	String	One of the supported clustering algorithms. Possible values: KMEANS, DBSCAN, AGGLOMERATIVE.
scale	Boolean	(Optional, default=true) - If true, all the values for the feature columns will be scaled.
result_table	String	(Optional) - If specified, the result will be stored in this table.
key_column	String	(Optional) - A column from the input table used to associate rows from the input table with the results of this operation. If not specified, ROWNUM from the table will be used.
weights	Spatial Weights Location JSON Type	(Optional) - An existing spatial weights object, previously calculated for the current spatial table’s geometries. If not specified, weights_def must be provided.
weights_def	Spatial Weights Definition JSON type	(Optional) - Specifies the type of spatial weights to be computed. If not specified, weights must be provided.
save_weights_as	Datastore Save Specification JSON type	(Optional) - Specifies how the computed spatial weights will be stored in the datastore. It is only used if weights_def is provided.
geometry_as_feature	Boolean	(Optional, default=false) - If true, and no spatial weights or spatial weights definition is provided, the spatial column will be used as a feature for the clustering.
spatial_col	String	(Optional) - The name of the spatial column for which the spatial weights will be computed when performing regionalization, or which will be used as a clustering feature if geometry_as_feature is set to true. Otherwise, it is ignored. If the table contains only one spatial column, this value does not need to be specified.
crs	String or number	(Optional) - The spatial coordinate system associated with the geometries of the spatial column. It can be specified as an SRID number, an authority string such as EPSG:4326, or a WKT string.
plot	Plotting JSON Type	(Optional) - If provided, the clustering results will be plotted and an image will be returned.

The following parameters are specific to clustering algorithms.

KMEANS Parameters

Parameter	Type	Description
n_clusters	Number	(Optional) - The number of clusters to form as well as the number of centroids to generate. Elbow init method is used if not provided.
init	String	(Optional, default=k-means++) - Method for cluster initialization. Possible values: k-means++, random.
n_init	Number	(Optional, default=10) - Number of times k-means will run with different centroid seeds.
max_iter	Number	(Optional, default=300) - Maximum number of iterations of the k-means algorithm for a single run.
tol	Float	(Optional, default=1e-4) - Relative tolerance with regard to the Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence.
random_state	Number	(Optional) - Determines random number generation for centroid initialization. Use an int to make the randomness deterministic.
algorithm	String	(Optional, default=auto) - K-means algorithm to use. The classical EM-style is “full”. The “elkan” variation is more efficient on data with well-defined clusters, by using the triangle inequality. However it’s more memory intensive due to the allocation of an extra array of shape (n_samples, n_clusters). Possible values: auto, full, elkan.
init_method	String	(Optional, default=elbow) - Possible values: elbow, silhouette, gmeans.

DBSCAN Parameters

Parameter	Type	Description
eps	Float	(Optional) - The maximum distance between two samples for one to be considered as in the neighborhood of the other. If eps is None, the K-Distance method is used to estimate the best value for eps.
min_samples	Number	(Optional) - The number of samples in a neighborhood for a point to be considered as a core point. If min_samples is None, it is estimated using the number of features in the data.
metric	String	(Optional, default=euclidean) - The metric used to calculate the distance between instances in a feature array. Possible values: cityblock, cosine, euclidean, haversine, manhattan.
algorithm	String	(Optional, default=auto) - The algorithm to be used by the NearestNeighbors module to compute pointwise distances and find nearest neighbors. Possible values: auto, ball_tree, kd_tree, brute.
leaf_size	Number	(Optional, default=30) - Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
p	Float	(Optional) - The power of the Minkowski metric to be used to calculate distance between points. If None, then `p=2` (equivalent to the Euclidean distance).
init_method	Boolean	(Optional, default=true) - If true, it will use the spatial weight matrix as distance. If false, it will set the distance to all neighbors to zero

AGGLOMERATIVE Parameters

Parameter	Type	Description
n_clusters	Number	(Optional, default=2) - The number of clusters to form.
affinity	String	(Optional, default=euclidean) - The metric to use when calculating the distance between observations. Possible values: cityblock, cosine, euclidean, haversine, manhattan.
linkage	String	(Optional, default=ward) - Determines which distance to use between clusters. The algorithm merges the pairs of clusters that minimize this criterion. Possible values: ward (minimizes the variance of the clusters), average (uses the average of the distances of each observation of the two clusters), complete (uses the maximum distance between all observations of the two clusters), single (uses the minimum distance between all observations of the two clusters).
distance_threshold	Float	(Optional) - The linkage distance threshold. If specified, n_clusters must not be specified.
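
Some of the constraints in the tables above can be checked on the client side before calling the UDF. The sketch below checks three of them: that table is present, that method is one of the documented values, and that distance_threshold and n_clusters are not both given for AGGLOMERATIVE. It assumes the method-specific parameters are passed at the same level as the general ones, which this section does not confirm. The helper check_clustering_args and the table name are hypothetical.

```python
SUPPORTED_METHODS = {"KMEANS", "DBSCAN", "AGGLOMERATIVE"}


def check_clustering_args(args):
    """Return a list of problems found in a clustering parameter dict.

    Hypothetical client-side check; the UDF performs its own validation.
    """
    problems = []
    if "table" not in args:
        problems.append("table is required")
    method = args.get("method")
    if method not in SUPPORTED_METHODS:
        problems.append("method must be one of " + ", ".join(sorted(SUPPORTED_METHODS)))
    # distance_threshold and n_clusters are documented as mutually exclusive.
    if (method == "AGGLOMERATIVE"
            and "distance_threshold" in args
            and "n_clusters" in args):
        problems.append("distance_threshold and n_clusters cannot both be set")
    return problems


good = {"table": "LA_BLOCK_GROUPS", "method": "KMEANS", "n_clusters": 4}
bad = {
    "table": "LA_BLOCK_GROUPS",
    "method": "AGGLOMERATIVE",
    "n_clusters": 3,
    "distance_threshold": 1.5,
}

print(check_clustering_args(good))
print(check_clustering_args(bad))
```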

Returns

A resultset containing the label assigned to each row from the input table.

Optionally, a plot image can be returned.

FIELD	Type
value of key_column parameter or ‘id’ if no key_column param is provided	Depends on the type of the column referenced by key_column
label	Number

JSON Types

The following are JSON types used across REST and SQL functions.

Spatial Weights Definition

Describes the spatial weights to be computed.

Fields:

type: can contain one of the following values: KNN, DistanceBand, Kernel, Queen, Rook.
[swdef_type_fields]: Properties from the equivalent SpatialWeightsDefinition classes. The fields used are the same as the parameters taken by the constructors of the equivalent Python classes.

KNN fields: See oraclesai.weights.KNNWeightsDefinition

DistanceBand fields: See oraclesai.weights.DistanceBandWeightsDefinition

Kernel fields: See oraclesai.weights.KernelBasedWeightsDefinition

Queen fields: See oraclesai.weights.QueenWeightsDefinition

Rook fields: See oraclesai.weights.RookWeightsDefinition

Examples:

{
    "type": "KNN",
    "k": 5
}

{
    "type": "DistanceBand",
    "threshold": 2000.0
}

{
    "type": "Queen"
}
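
A definition like the ones above can be sanity-checked before it is sent. The following sketch parses the JSON text and checks only the type field against the values listed in this section. It does not check the type-specific fields, which are defined by the corresponding oraclesai.weights classes. The helper parse_weights_def is hypothetical.

```python
import json

# Values listed for the "type" field in this section.
WEIGHT_TYPES = {"KNN", "DistanceBand", "Kernel", "Queen", "Rook"}


def parse_weights_def(text):
    """Parse a Spatial Weights Definition and check its "type" field."""
    definition = json.loads(text)
    if definition.get("type") not in WEIGHT_TYPES:
        raise ValueError(
            "type must be one of: " + ", ".join(sorted(WEIGHT_TYPES))
        )
    return definition


knn = parse_weights_def('{"type": "KNN", "k": 5}')
print(knn)

try:
    parse_weights_def('{"type": "Bishop"}')
except ValueError as exc:
    print("rejected:", exc)
```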

Datastore Save Specification

Specifies how an object can be saved in an OML datastore.

Fields:

ds_name: Name of the datastore where the object will be saved
obj_name: The name used to save the object
append: If true, the object is appended to the datastore
overwrite: If true and an object with the same name already exists, the object is overwritten. Otherwise, the operation fails.

Example:

{
    "ds_name": "datastore1",
    "obj_name": "my_obj1",
    "append": true,
    "overwrite_obj": false
}

Datastore Object Location

Specifies the location of an object in a datastore.

Fields:

ds_name: Name of an existing datastore
obj_name: The name of an object in the datastore

Example:

{
    "ds_name": "datastore1",
    "obj_name": "my_obj1"
}
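
Both a Datastore Save Specification and a Datastore Object Location carry ds_name and obj_name. Assuming an object stored with a given save specification can later be referenced by those same two names (this section does not state it explicitly), a location can be derived from the save specification as sketched below. The helper location_from_save_spec is hypothetical.

```python
import json


def location_from_save_spec(save_spec):
    """Keep only the two fields that a Datastore Object Location uses."""
    return {
        "ds_name": save_spec["ds_name"],
        "obj_name": save_spec["obj_name"],
    }


save_spec = {"ds_name": "datastore1", "obj_name": "my_obj1", "append": True}
location = location_from_save_spec(save_spec)
print(json.dumps(location, indent=4))
```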

Cluster Plotting Parameters

Contains parameters used for plotting clustering results. In its simplest form, it can be empty; as long as the OML control parameter oml_graphics_flag is set to true, a plot will be generated.

Fields:

width: Width of the image
height: Height of the image
title: Title of the plot
with_noise: (default=false) if true and DBSCAN is used, noise points will be shown.
with_bounds: (default=false) if true, clusters will be drawn as polygons.
with_basemap: (default=false) if true, a basemap will be added to the background.
with_legend: (default=true) if true, a legend with the clusters labels is added to the plot.

Example:

{
    "width": 20,
    "height": 15,
    "title": "Clusters",
    "with_noise": true,
    "with_bounds": false,
    "with_basemap": true,
    "with_legend": true
}