This tutorial shows how to add user-defined functions (UDFs) to PGX and how to use them in PGQL and Green-Marl. The examples focus on Java and JavaScript UDFs.
First we need to set up the proper UDF configurations to make them available in PGX. For more information about UDF configs, please refer to the UDF config reference.
The following sections show how to specify the location of the UDF configuration files and example configurations for two different UDF use-cases.
We create a directory udf-configs, which will contain all UDF config files, and then set the udf_config_directory field in the PGX config to point to this directory.
Place the following contents in pgx.conf to specify that the UDF configuration files can be found in udf-configs:
{ "udf_config_directory": "udf-configs" }
Hint
PGX will pick up all config files in the directory, including those in sub-directories. This lets you structure the UDF config files according to your needs.
Note that a relative path is resolved relative to the PGX config in use. Refer to the PGX Engine config reference for more information.
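To make the relative-path rule concrete, the sketch below resolves udf-configs against the directory that contains a PGX config file, using only java.nio.file. It reads "relative to the PGX config" as "relative to the directory of the config file", which is an assumption. The location /opt/pgx/conf/pgx.conf and the helper resolveUdfConfigDirectory are also made up for illustration; this is not PGX's own resolution code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class UdfConfigPathSketch {

    // Hypothetical helper: resolves the udf_config_directory value against the
    // directory that contains the PGX config file. Absolute values are kept.
    static Path resolveUdfConfigDirectory(Path pgxConfigFile, String udfConfigDirectory) {
        Path configured = Paths.get(udfConfigDirectory);
        if (configured.isAbsolute()) {
            return configured.normalize();
        }
        Path configDirectory = pgxConfigFile.toAbsolutePath().getParent();
        return configDirectory.resolve(configured).normalize();
    }

    public static void main(String[] args) {
        Path pgxConfig = Paths.get("/opt/pgx/conf/pgx.conf"); // assumed location
        // On a Unix-like system this resolves to /opt/pgx/conf/udf-configs.
        System.out.println(resolveUdfConfigDirectory(pgxConfig, "udf-configs"));
    }
}
```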
A typical config for a Java UDF has the following fields:
The required namespace field can be set freely and can be used to logically bundle different UDFs.
function_name is the name of the UDF as it will be used in PGX. UDFs are looked up using the combination of namespace and function name.
By default, PGX assumes that the UDF has the same name as the method in the UDF source. If they differ, the name of the method in the source can be set using source_function_name.
The language field specifies the language the UDF is implemented in, and the implementation_reference field requires the fully-qualified name of the class that implements the method.
We need to define the return type of the UDF in return_type and the name and type of each argument it expects in the arguments list.
Note that the name field of an argument can be chosen freely, as it is only used for logging purposes.
In this example we will add the Double.toHexString method from the Java standard library as a UDF to PGX.
We use it for this example since it is present on the classpath of every Java application.
This process works for any static Java method as long as the implementation class is present on the PGX classpath.
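For orientation, this is what Double.toHexString returns when called directly from Java, outside PGX. The class name HexStringDemo is only for illustration.

```java
class HexStringDemo {

    public static void main(String[] args) {
        // Double.toHexString is a public static method of java.lang.Double
        // that takes a double and returns its hexadecimal string form.
        System.out.println(Double.toHexString(1.0));    // prints 0x1.0p0
        System.out.println(Double.toHexString(1000.0)); // prints 0x1.f4p9
    }
}
```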
Java UDF Requirements
Java UDFs need to be accessible to PGX, i.e. the methods must be both static and public.
Place the following contents in udf-configs/hex.json:
{
  "namespace": "udfTutorial",
  "function_name": "toHexString",
  "language": "java",
  "implementation_reference": "java.lang.Double",
  "return_type": "string",
  "arguments": [
    { "name": "arg", "type": "double" }
  ]
}
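The config fields map onto a plain Java method. As one way to sanity-check such a config before starting PGX, the sketch below uses standard reflection to look up the method named by implementation_reference and function_name, verify that it is public and static, and invoke it once. The class UdfReferenceCheck and its lookup helper are hypothetical and not part of the PGX API; how PGX itself resolves the method is not covered here.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class UdfReferenceCheck {

    // Hypothetical helper: finds className.methodName(parameterTypes) and
    // rejects it unless it is both public and static.
    static Method lookup(String className, String methodName, Class<?>... parameterTypes)
            throws ReflectiveOperationException {
        Class<?> implementation = Class.forName(className);
        // getMethod only returns public methods; others raise NoSuchMethodException.
        Method method = implementation.getMethod(methodName, parameterTypes);
        int modifiers = method.getModifiers();
        if (!Modifier.isPublic(modifiers) || !Modifier.isStatic(modifiers)) {
            throw new IllegalArgumentException(methodName + " must be public and static");
        }
        return method;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // Values taken from udf-configs/hex.json; mapping the "double" argument
        // type to the Java primitive double is an assumption of this sketch.
        Method toHex = lookup("java.lang.Double", "toHexString", double.class);
        // Static methods are invoked with a null receiver.
        System.out.println(toHex.invoke(null, 1000.0)); // prints 0x1.f4p9
    }
}
```

An instance method such as Double.isNaN() (the no-argument variant) would be rejected by this check because it is not static.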
For this second example we will load a UDF from the JavaScript source file format.js shown below:
const fun = function(name, country) {
  if (country == null) return name;
  else return name + " (" + country + ")";
};
module.exports = {stringFormat: fun};
JavaScript UDF Requirements
The JavaScript source must contain all dependencies in order to be usable in PGX since PGX cannot resolve any dependencies. Tools like Browserify can be used to bundle all dependencies into a single JavaScript file. The source must also contain at least one valid export.
We then add the following UDF config to the directory mentioned above.
Note that in this case the name of the UDF and the name of the implementing method differ, which is why we need to set the source_function_name field.
{
  "namespace": "udfTutorial",
  "function_name": "format",
  "language": "javascript",
  "source_location": "format.js",
  "source_function_name": "stringFormat",
  "return_type": "string",
  "arguments": [
    { "name": "name", "type": "string" },
    { "name": "country", "type": "string" }
  ]
}
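For comparison, the same formatting logic could also be written as a Java UDF. The sketch below is only an illustration: the class name NameFormatUdf is hypothetical, and registering it would mean setting implementation_reference to its fully-qualified name and making sure the class is on the PGX classpath, as with the toHexString example.

```java
public class NameFormatUdf {

    // Public and static, as required for Java UDFs.
    // Mirrors stringFormat in format.js: a null country yields the bare name.
    public static String stringFormat(String name, String country) {
        if (country == null) {
            return name;
        }
        return name + " (" + country + ")";
    }

    public static void main(String[] args) {
        System.out.println(stringFormat("Amazon", "United States")); // prints Amazon (United States)
        System.out.println(stringFormat("Amazon", null));            // prints Amazon
        System.out.println(stringFormat("Amazon", ""));              // prints Amazon ()
    }
}
```

Note that an empty but non-null country still produces empty parentheses. The JavaScript version behaves the same way, since an empty string does not compare equal to null.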
Once configured, the UDFs can be used in PGX. The next two sections show how to use them in PGQL and Green-Marl (not available in all releases).
PGX 24.4.0 limitations
Note that all UDFs have to be present at PGX startup; loading UDFs dynamically is not possible.
Furthermore PGX generally assumes that all UDFs are stateless, side-effect free and can safely be executed in parallel.
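To illustrate what this assumption means in practice, the sketch below contrasts a stateless function with one that depends on shared mutable state. All class and method names are hypothetical, and the parallel stream only stands in for concurrent execution; it says nothing about how PGX schedules UDF calls.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class StatelessUdfSketch {

    // Stateless: the result depends only on the argument, so concurrent
    // calls cannot interfere with each other.
    public static String toHex(double value) {
        return Double.toHexString(value);
    }

    // Anti-pattern: shared mutable state makes the result depend on call
    // order, and the unsynchronized update is a data race under concurrency.
    private static long callCount = 0;

    public static String numberedHex(double value) {
        callCount++;
        return callCount + ":" + Double.toHexString(value);
    }

    // Applies toHex to 0..n-1, optionally in parallel; results keep input order.
    static List<String> hexAll(int n, boolean parallel) {
        IntStream range = IntStream.range(0, n);
        if (parallel) {
            range = range.parallel();
        }
        return range.mapToObj(i -> toHex(i)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // For the stateless function, parallel and sequential runs agree.
        System.out.println(hexAll(10_000, true).equals(hexAll(10_000, false))); // prints true
    }
}
```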
First we start PGX in local mode:
cd $PGX_HOME
./bin/pgx-jshell
// starting the shell will create an implicit session

import oracle.pgx.api.*;
...
PgxSession session = Pgx.createSession("my-session");

import pypgx
session = pypgx.get_session(session_name="my-session")
Next, let us load a graph into memory that we can use in the subsequent examples.
pgx> var connectionsGraph = session.readGraphWithProperties("examples/graphs/connections.csv.json")
import oracle.pgx.api.*;
...
PgxGraph connectionsGraph = session.readGraphWithProperties("examples/graphs/connections.csv.json");
connections_graph = session.read_graph_with_properties("examples/graphs/connections.csv.json")
PGQL can call UDFs directly using the namespace and function_name.
Note that the namespace is required; otherwise the function will be interpreted as a built-in function.
UDFs can be used in any part of a PGQL query, e.g. the following query uses them in the SELECT clause.
pgx> var resultSet = connectionsGraph.queryPgql("SELECT DISTINCT udfTutorial.format(v.name, v.country), udfTutorial.toHexString(e.weight) FROM MATCH (v) -[e]-> (u)").print(10)
+---------------------------------------------------------------------------+
| udfTutorial.format(v.name, v.country) | udfTutorial.toHexString(e.weight) |
+---------------------------------------------------------------------------+
| ABC (United States)                   | 0x1.0p0                           |
| ABC (United States)                   | 0x1.f4p9                          |
| Abdel Fattah eL-Sisi (Egypt)          | 0x1.f4p9                          |
| Abdullah Gul (Turkey)                 | 0x1.f4p9                          |
| Alfonso Cuaron (Mexico)               | 0x1.0p0                           |
| Alibaba (China)                       | 0x1.f4p9                          |
| Aliko Dangote ()                      | 0x1.0p0                           |
| Amazon (United States)                | 0x1.f4p9                          |
| Amy Adams (United States)             | 0x1.0p0                           |
| Angela Merkel (Germany)               | 0x1.0p0                           |
+---------------------------------------------------------------------------+
==> null

import oracle.pgx.api.*;
...
PgqlResultSet resultSet = connectionsGraph.queryPgql("SELECT DISTINCT udfTutorial.format(v.name, v.country), udfTutorial.toHexString(e.weight) FROM MATCH (v) -[e]-> (u)");
resultSet = connections_graph.query_pgql("SELECT DISTINCT udfTutorial.format(v.name, v.country), udfTutorial.toHexString(e.weight) FROM MATCH (v) -[e]-> (u)")
UDFs can be made available in Green-Marl procedures by importing them using the import statement.
The following Green-Marl file imports the two UDFs defined above and uses them in the procedure.
import udfTutorial.format
import udfTutorial.toHexString

procedure udfDemo(graph g, nodeProp<string> name, nodeProp<string> country, edgeProp<float> weight;
    nodeProp<string> formattedName, edgeProp<string> hexStrings) {
  foreach(n: g.nodes) {
    n.formattedName = format(n.name, n.country);
    foreach(e: n.outEdges) {
      e.hexStrings = toHexString(e.weight);
    }
  }
}
With the UDFs configured we can now compile and run this Green-Marl program.