3 Using the In-Memory Graph Server (PGX)
The in-memory Graph server of Oracle Graph supports a set of analytical functions.
This chapter provides examples using the in-memory Graph Server (also referred to as Property Graph In-Memory Analytics, and often abbreviated as PGX in the Javadoc, command line, path descriptions, error messages, and examples). It contains the following major topics.
- PGX User Authentication and Authorization
The Oracle Graph server (PGX) uses an Oracle database as identity manager by default.
- Reading Data from Oracle Database into Memory
When data is in PGX (that is, when data has been read into memory), you can run any of the built-in algorithms against your data, compile and execute your own custom algorithms, and use PGQL to query the results.
- Keeping the Graph in Oracle Database Synchronized with the Graph Server
You can use the FlashbackSynchronizer API to automatically apply changes made to the graph in the database to the corresponding PgxGraph object in memory, thus keeping both synchronized.
- Configuring the In-Memory Analyst
You can configure the in-memory analyst engine and its run-time behavior by assigning a single JSON file to the in-memory analyst at startup.
- Storing a Graph Snapshot on Disk
After reading a graph into memory using either Java or the Shell, if you make some changes to the graph, such as running the PageRank algorithm and storing the values as vertex properties, you can store this snapshot of the graph on disk.
- Executing Built-in Algorithms
The in-memory analyst contains a set of built-in algorithms that are available as Java APIs.
- Using Custom PGX Graph Algorithms
A custom PGX graph algorithm allows you to write a graph algorithm in Java and have it automatically compiled to an efficient parallel implementation.
- Creating Subgraphs
You can create subgraphs based on a graph that has been loaded into memory. You can use filter expressions or create bipartite subgraphs based on a vertex (node) collection that specifies the left set of the bipartite graph.
- Using Automatic Delta Refresh to Handle Database Changes
You can automatically refresh (auto-refresh) graphs periodically to keep the in-memory graph synchronized with changes to the property graph stored in the property graph tables in Oracle Database (VT$ and GE$ tables).
- Starting the In-Memory Analyst Server
A preconfigured version of Apache Tomcat is bundled, which allows you to start the in-memory analyst server by running a script.
- Deploying to Apache Tomcat
The example in this topic shows how to deploy the graph server as a web application with Apache Tomcat.
- Deploying to Oracle WebLogic Server
The example in this topic shows how to deploy the graph server as a web application with Oracle WebLogic Server.
- Connecting to the In-Memory Analyst Server
After the property graph in-memory analyst is installed in a computer running Oracle Database -- or on a client system without Oracle Database server software as a web application on Apache Tomcat or Oracle WebLogic Server -- you can connect to the in-memory analyst server.
- Managing Property Graph Snapshots
You can manage property graph snapshots.
- User-Defined Functions (UDFs) in PGX
User-defined functions (UDFs) allow users of PGX to add custom logic to their PGQL queries or custom graph algorithms, to complement built-in functions with custom requirements.
3.1 PGX User Authentication and Authorization
The Oracle Graph server (PGX) uses an Oracle database as identity manager by default.
This means that you log into the graph server using existing Oracle Database credentials (user name and password), and the actions you are allowed to perform on the graph server are determined by the roles granted to you in the Oracle database.
Basic Steps for Using an Oracle Database for Authentication
- Use an Oracle Database version that is supported by Oracle Graph Server and Client: version 12.2 or later, including Autonomous Database.
- Be sure that you have SYSDBA access (or ADMIN access for Autonomous Database) to grant and revoke users access to the graph server (PGX).
- Be sure that all existing users to which you plan to grant access to the graph server have at least the CREATE SESSION privilege granted.
- Be sure that the database is accessible via JDBC from the host where the Graph Server runs.
- As SYSDBA (or ADMIN on Autonomous Database), create the following roles:
CREATE ROLE graph_developer
CREATE ROLE graph_administrator
- Assign roles to all the database developers who should have access to the graph server (PGX). For example:
GRANT graph_developer TO <graphuser>
where <graphuser> is a user in the database.
- Assign the administrator role to users who should have administrative access. For example:
GRANT graph_administrator TO <administratoruser>
where <administratoruser> is a user in the database.
- Prepare the Graph Server for Database Authentication
Locate the pgx.conf file of your installation.
- Connect to the Server from JShell with Database Authentication
You can use the JShell client to connect to the server in remote mode, using database authentication.
- Generate and Use a Token
Generate and use a token for making authenticated remote requests to the graph server.
- Read Data from the Database
If you have a valid authentication token, you can read data from the database into the graph server without specifying any connection information in the graph configuration, for as long as the token is valid.
- Token Expiration
By default, tokens are valid for 4 hours. If the token expires, requests to the graph server will be rejected.
- Advanced Access Configuration
You can customize fields in the pgx.conf realm options to control login behavior.
- Examples of Custom Authorization Rules
You can define custom authorization rules for developers.
- Revoking Access to the Graph Server
To revoke a user's ability to access the graph server, either drop the user from the database or revoke the corresponding roles from the user, depending on how you defined the access rules in your pgx.conf file.
Parent topic: Using the In-Memory Graph Server (PGX)
3.1.1 Prepare the Graph Server for Database Authentication
Locate the pgx.conf file of your installation.
If you installed the graph server via RPM, the file is located at: /etc/oracle/graph/pgx.conf
If you use the webapps package to deploy into Tomcat or WebLogic Server, the pgx.conf file is located inside the web application archive file (WAR file) at: WEB-INF/classes/pgx.conf
Tip: On Linux, you can use vim to edit the file directly inside the WAR file without unzipping it first. For example: vim pgx-webapp-20.3.0.war
Inside the pgx.conf file, locate the jdbc_url line of the realm options:
...
"pgx_realm": {
"implementation": "oracle.pg.identity.DatabaseRealm",
"options": {
"jdbc_url": "<REPLACE-WITH-DATABASE-URL-TO-USE-FOR-AUTHENTICATION>",
"token_expiration_seconds": 3600,
...
Replace the placeholder text with the JDBC URL pointing to the database that you configured in the previous step. For example:
...
"pgx_realm": {
"implementation": "oracle.pg.identity.DatabaseRealm",
"options": {
"jdbc_url": "jdbc:oracle:thin:@myhost:1521/myservice",
"token_expiration_seconds": 3600,
...
If you are using an Autonomous Database, specify the JDBC URL like this:
...
"pgx_realm": {
"implementation": "oracle.pg.identity.DatabaseRealm",
"options": {
"jdbc_url": "jdbc:oracle:thin:@my_identifier_low?TNS_ADMIN=/etc/oracle/graph/wallet",
"token_expiration_seconds": 3600,
...
where /etc/oracle/graph/wallet is an example path to the unzipped wallet file that you downloaded from your Autonomous Database service console, and my_identifier_low is one of the connect identifiers specified in /etc/oracle/graph/wallet/tnsnames.ora.
Now, start the graph server. If you installed via RPM, this can be done using:
systemctl start pgx
Parent topic: PGX User Authentication and Authorization
3.1.2 Connect to the Server from JShell with Database Authentication
You can use the JShell client to connect to the server in remote mode, using database authentication.
./bin/opg-jshell --base_url https://localhost:7007 --username <database_user>
You will be prompted for the database password.

In a Java application, you can instead authenticate directly with database credentials using the GraphServer API:
import oracle.pg.rdbms.*
import oracle.pgx.api.*
...
ServerInstance instance = GraphServer.getInstance("https://localhost:7007", "<database user>", "<database password>");
PgxSession session = instance.createSession("my-session");
...
Parent topic: PGX User Authentication and Authorization
3.1.3 Generate and Use a Token
Generate and use a token for making authenticated remote requests to the graph server.
If the graph server listens on https://localhost:7007, you can run the following command to sign in to the graph server as SCOTT:
curl -X POST -H 'Content-Type: application/json' -d '{"username": "scott", "password": "<password>"}' https://localhost:7007/auth/token
The preceding example uses cURL to make an HTTP request to the server; you can use any HTTP client of your choice.
If the user exists and has one of the graph roles assigned, the server will reply with something like the following:
{
  "token_type": "Bearer",
  "expires_in": 3600,
  "access_token": "eyJraWQiOiJEYXRhYmFzZVJlYWxtIiwiYWxnIjoiUlMyNTYifQ.eyJzdWIiOiJwZ0FkbWluIiwicm9sZXMiOlsicmVzb3VyY2UiLCJjb25uZWN0IiwiZ3JhcGhfYWRtaW5pc3RyYXRvciJdLCJpc3MiOiJvcmFjbGUucGcuaWRlbnRpdHkucmVzdC5BdXRoZW50aWNhdGlvblNlcnZpY2UiLCJleHAiOjE1OTAxMDU1MDV9.D14yGwvzW7zlyjxdiagknjB_wU3VSXnWKHFSYcLDkF2JclyMNE0MmtgJQ958BNFpvB-ha0ODxn_H1mnIlk3Cq7aoLiXN9V2WoxpYPQSdTu1pU2cKo-NfKOJF_MaqnS-USw0XozovqtrEsnaWid8uF8vAS0WHt0Wm8nTtijoe99K__tvDgpYQH3cqicERPlBMRov9oOg-Rfuyg1o6CoGdgrNEMYG44RRJBXOBFCD15yJ2aUfMHU5fukAbh6aWmrkKbwwueUmgTdjhWlxooEwyF-C_LjksVPba2M5zRX-WOC6Zp8Lqxr6uqhxR1W4XpuQLLaD2Vw4OwpP7M5AldsS2MQ"
}
You can now use the token to make authenticated remote requests to the graph server.
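The access_token value is a JSON Web Token (JWT): its middle segment is base64url-encoded JSON carrying the user, roles, and expiry. The following is a minimal sketch for inspecting a token's claims without verifying the signature; the token it decodes is a constructed stand-in with the same claim structure as the server response above, not a real server token.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class TokenInspector {
    // Decode the payload (claims) segment of a JWT without verifying the
    // signature. Useful only for debugging, e.g. to check roles and expiry.
    public static String decodePayload(String token) {
        String[] segments = token.split("\\.");
        byte[] json = Base64.getUrlDecoder().decode(segments[1]);
        return new String(json, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Constructed stand-in token (header.payload.signature) mirroring the
        // claim structure of the server response shown above.
        String claims = "{\"sub\":\"scott\",\"roles\":[\"graph_developer\"],\"exp\":1590105505}";
        String payload = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(claims.getBytes(StandardCharsets.UTF_8));
        String token = "eyJhbGciOiJSUzI1NiJ9." + payload + ".c2ln";
        System.out.println(decodePayload(token));
    }
}
```

Never treat a decoded payload as trusted; only the graph server, which verifies the signature, can rely on the claims.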
If you use the JShell client, you will be prompted by the shell to provide the token when you start the shell in remote mode. For example:
./bin/opg-jshell --base_url https://localhost:7007
Enter the authentication token: <token>
If you are using a Java client program, you can connect using the following:
import oracle.pgx.api.*
...
ServerInstance instance = Pgx.getInstance("https://localhost:7007", "<token>");
PgxSession session = instance.createSession("my-session");
...
Parent topic: PGX User Authentication and Authorization
3.1.4 Read Data from the Database
If you have a valid authentication token, you can now read data from the database into the graph server without specifying any connection information in the graph configuration for as long as the token is valid.
Your database user must exist and have read access on the graph data in the database.
For example, the following graph configuration will read a property graph named hr into memory, authenticating as scott/<password> with the database:
GraphConfig config = GraphConfigBuilder.forPropertyGraphRdbms()
.setName("hr")
.addVertexProperty("FIRST_NAME", PropertyType.STRING)
.addVertexProperty("LAST_NAME", PropertyType.STRING)
.addVertexProperty("EMAIL", PropertyType.STRING)
.addVertexProperty("CITY", PropertyType.STRING)
.setPartitionWhileLoading(PartitionWhileLoading.BY_LABEL)
.setLoadVertexLabels(true)
.setLoadEdgeLabel(true)
.build();
PgxGraph hr = session.readGraphWithProperties(config);
The following example is a graph configuration in JSON format that reads from relational tables into the graph server, without any connection information being provided in the configuration file itself:
{
"name":"hr",
"vertex_id_strategy":"no_ids",
"vertex_providers":[
{
"name":"Employees",
"format":"rdbms",
"database_table_name":"EMPLOYEES",
"key_column":"EMPLOYEE_ID",
"key_type":"string",
"props":[
{
"name":"FIRST_NAME",
"type":"string"
},
{
"name":"LAST_NAME",
"type":"string"
}
]
},
{
"name":"Departments",
"format":"rdbms",
"database_table_name":"DEPARTMENTS",
"key_column":"DEPARTMENT_ID",
"key_type":"string",
"props":[
{
"name":"DEPARTMENT_NAME",
"type":"string"
}
]
},
{
"name":"Jobs",
"format":"rdbms",
"database_table_name":"JOBS",
"key_column":"JOB_ID",
"key_type":"string",
"props":[
{
"name":"JOB_TITLE",
"type":"string"
}
]
}
],
"edge_providers":[
{
"name":"WorksFor",
"format":"rdbms",
"database_table_name":"EMPLOYEES",
"key_column":"EMPLOYEE_ID",
"source_column":"EMPLOYEE_ID",
"destination_column":"EMPLOYEE_ID",
"source_vertex_provider":"Employees",
"destination_vertex_provider":"Employees"
},
{
"name":"WorksAs",
"format":"rdbms",
"database_table_name":"EMPLOYEES",
"key_column":"EMPLOYEE_ID",
"source_column":"EMPLOYEE_ID",
"destination_column":"JOB_ID",
"source_vertex_provider":"Employees",
"destination_vertex_provider":"Jobs"
}
]
}
For more information about how to read data from the database into the graph server, see Reading Data from Oracle Database into Memory.
Parent topic: PGX User Authentication and Authorization
3.1.5 Token Expiration
By default, tokens are valid for 4 hours. If the token expires, requests to the graph server will be rejected.
If that happens, you can generate a new token by logging in again and asking the server for a handle to your previous session by using the ServerInstance#getSession("<session-id>") API. For example:
opg> var sessionId = session.getId() // remember session ID in variable
opg> var graph = session.readGraphWithProperties(config) // fails because token expired
// obtain new token (see above for example)
opg> var newToken = ...
// get reference to previous session back
opg> session = Pgx.getInstance(instance.getBaseUrl(), newToken).getSession(sessionId)
opg> var graph = session.readGraphWithProperties(config) // works now
Parent topic: PGX User Authentication and Authorization
3.1.6 Advanced Access Configuration
You can customize the following fields in the pgx.conf realm options to control login behavior.
Table 3-1 Advanced Access Configuration Options
Field Name | Explanation | Default
---|---|---
token_expiration_seconds | Number of seconds after which the generated bearer token expires. | 14400 (4 hours)
connect_timeout_milliseconds | Number of milliseconds after which a connection attempt to the specified JDBC URL times out, resulting in the login attempt being rejected. | 10000
max_pool_size | Maximum number of JDBC connections allowed per user. If the number is reached, attempts to read from the database will fail for the current user. | 64
max_num_users | Maximum number of active, signed-in users to allow. If this number is reached, the graph server will reject login attempts. | 512
Note:
The preceding options work only if the realm implementation is configured to be oracle.pg.identity.DatabaseRealm.
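Putting these options together with the realm configuration shown earlier, a pgx_realm block in pgx.conf might look like the following; all values here are illustrative, not requirements:

```json
"pgx_realm": {
  "implementation": "oracle.pg.identity.DatabaseRealm",
  "options": {
    "jdbc_url": "jdbc:oracle:thin:@myhost:1521/myservice",
    "token_expiration_seconds": 14400,
    "connect_timeout_milliseconds": 10000,
    "max_pool_size": 64,
    "max_num_users": 512
  }
}
```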
- Customizing Roles and Permissions
By default, the graph server maps the following roles to the following permissions in pgx.conf.
- Adding and Removing Roles
You can add new role permission mappings or remove existing mappings by modifying the authorization list.
- Defining Permissions for Individual Users
In addition to defining permissions for roles, you can define permissions for individual users.
Parent topic: PGX User Authentication and Authorization
3.1.6.1 Customizing Roles and Permissions
By default, the graph server maps the following roles to the following permissions in pgx.conf.
"authorization": [{
"pgx_role": "GRAPH_ADMINISTRATOR",
"pgx_permissions": [{
"grant": "PGX_SESSION_CREATE"
}, {
"grant": "PGX_SERVER_GET_INFO"
}, {
"grant": "PGX_SERVER_MANAGE"
}]
}, {
"pgx_role": "GRAPH_DEVELOPER",
"pgx_permissions": [{
"grant": "PGX_SESSION_CREATE"
}, {
"grant": "PGX_SESSION_NEW_GRAPH"
}, {
"grant": "PGX_SESSION_GET_PUBLISHED_GRAPH"
}]
}]
You can fully customize this mapping by adding and removing roles and specifying permissions to which a role maps. You can also authorize individual users instead of roles. This topic includes examples of how to customize the permission mapping.
To change the authorization mappings, you can:
- Modify the pgx.conf file and then restart the server, or
- Change the configuration at run time by using the ServerInstance#updatePgxConfig() API. You need the PGX_SERVER_MANAGE permission to do this. Note that using this API will not persist those changes.
Parent topic: Advanced Access Configuration
3.1.6.2 Adding and Removing Roles
You can add new role permission mappings or remove existing mappings by modifying the authorization list.
For example:
"authorization": [{
"pgx_role": "MY_CUSTOM_ROLE_1",
"pgx_permissions": [...]
}, {
"pgx_role": "MY_CUSTOM_ROLE_2",
"pgx_permissions": [...]
}, {
"pgx_role": "MY_CUSTOM_ROLE_3",
"pgx_permissions": [...]
}]
Note that role and user names in PGX are case-sensitive, whereas in the database they are case-insensitive (if not quoted). This means that if you perform CREATE ROLE my_custom_role_1 in the database, you must reference it in the pgx.conf file with "pgx_role": "MY_CUSTOM_ROLE_1". On the other hand, if you perform CREATE ROLE "my_custom_role_2" in the database, you must reference it in the pgx.conf file with "pgx_role": "my_custom_role_2".
Parent topic: Advanced Access Configuration
3.1.6.3 Defining Permissions for Individual Users
In addition to defining permissions for roles, you can define permissions for individual users.
For example:
"authorization": [{
"pgx_user": "SCOTT",
"pgx_permissions": [...]
}, {
"pgx_user": "JANE",
"pgx_permissions": [...]
}, {
"pgx_role": "GRAPH_DEVELOPER",
"pgx_permissions": [...]
}]
Parent topic: Advanced Access Configuration
3.1.7 Examples of Custom Authorization Rules
You can define custom authorization rules for developers.
Example 3-1 Allowing Developers to Use Custom Graph Algorithms
To allow developers to compile custom graph algorithms (see Using Custom PGX Graph Algorithms), add the following static permission to the list of permissions:
...
"authorization": [{
"pgx_role": "GRAPH_DEVELOPER",
"pgx_permissions": [{
"grant": "PGX_SESSION_COMPILE_ALGORITHM"
},
...
Example 3-2 Allowing Developers to Publish Graphs
Allowing graph server users to publish or share graphs that originate from the Oracle Database breaks the database authorization model. If you work with graphs in the database, use GRANT statements in the database instead. See the OPG_APIS.GRANT_ACCESS API for examples of how to do this for PG graphs. When reading from relational tables, use normal GRANT READ statements on the tables.
To allow developers to publish graphs, add the following static permission to the list of permissions:
...
"authorization": [{
"pgx_role": "GRAPH_DEVELOPER",
"pgx_permissions": [{
"grant": "PGX_SESSION_ADD_PUBLISHED_GRAPH"
},
...
Publishing graphs alone does not give others access to the graph. You must also specify the type of access. There are three levels of permissions for graphs:
- READ: allows reading the graph data via the PGX API or in PGQL queries, running Analyst or custom algorithms on the graph, and creating a subgraph or clone of the graph.
- EXPORT: allows exporting the graph via the PgxGraph#store() APIs, and includes the READ permission. Note that in addition to the EXPORT permission, users also need WRITE permission on a file system location in order to export the graph.
- MANAGE: allows publishing the graph or a snapshot of it and granting or revoking permissions on the graph, and includes the EXPORT permission.
The creator of the graph automatically gets the MANAGE permission granted on the graph. If you have the MANAGE permission, you can grant other roles or users READ or EXPORT permission on the graph. You cannot grant MANAGE on a graph. The following example shows how a user named userA publishes a graph with READ access for the GRAPH_DEVELOPER role, and how a second user (userB) then accesses it:
import oracle.pgx.api.*
import oracle.pgx.common.auth.*

// userA reads the graph, grants READ to the GRAPH_DEVELOPER role, and publishes it
PgxSession session = Pgx.getInstance("<base-url>", "<auth-token-of-userA>").createSession("userA")
PgxGraph g = session.readGraphWithProperties("examples/sample-graph.json", "sample-graph")
g.grantPermission(new PgxRole("GRAPH_DEVELOPER"), PgxResourcePermission.READ)
g.publish()

// userB (who has the GRAPH_DEVELOPER role) can now look up and query the published graph
PgxSession session = Pgx.getInstance("<base-url>", "<auth-token-of-userB>").createSession("userB")
PgxGraph g = session.getGraph("sample-graph")
g.queryPgql("select count(*) from match (v)").print().close()
Similarly, graphs can be shared with individual users instead of roles, as shown in the following example:
g.grantPermission(new PgxUser("OTHER_USER"), PgxResourcePermission.EXPORT)
where OTHER_USER is the user name of the user that will receive the EXPORT permission on graph g.
Example 3-3 Allowing Developers to Access Preloaded Graphs
To allow developers to access preloaded graphs (graphs loaded during graph server startup), grant the read permission on the preloaded graph. For example:
"preload_graphs": [{
"path": "/data/my-graph.json",
"name": "global_graph"
}],
"authorization": [{
"pgx_role": "GRAPH_DEVELOPER",
"pgx_permissions": [{
"preloaded_graph": "global_graph",
"grant": "read"
},
...
You can grant READ, EXPORT, or MANAGE permission.
Example 3-4 Allowing Developers Access to the Hadoop Distributed Filesystem (HDFS) or the Local File System
To allow developers to read files from HDFS, you must first declare the HDFS directory and then map it to a read or write permission. For example:
...
"file_locations": [{
"name": "my_hdfs_graph_data",
"location": "hdfs:/data/graphs"
}],
"authorization": [{
"pgx_role": "GRAPH_DEVELOPER",
"pgx_permissions": [{
"file_location": "my_hdfs_graph_data",
"grant": "read"
},
...
Similarly, you can add another permission with "grant": "write" to allow write access. Write access is required in order to export graphs.
Access to the local file system (where the graph server runs) can be granted the same way. The only difference is that the location would be an absolute file path without the hdfs: prefix. For example:
"location": "/opt/oracle/graph/data"
Note that in addition to the preceding configuration, the operating system user that runs the graph server process must have the corresponding directory privileges to actually read or write into those directories.
Parent topic: PGX User Authentication and Authorization
3.1.8 Revoking Access to the Graph Server
To revoke a user's ability to access the graph server, either drop the user from the database or revoke the corresponding roles from the user, depending on how you defined the access rules in your pgx.conf file.
For example:
REVOKE graph_developer FROM scott
Revoking Graph Permissions
If you have the MANAGE permission on a graph, you can revoke graph access from users or roles using the PgxGraph#revokePermission API. For example:
PgxGraph g = ...
g.revokePermission(new PgxRole("GRAPH_DEVELOPER")) // revokes previously granted role access
g.revokePermission(new PgxUser("SCOTT")) // revokes previously granted user access
Parent topic: PGX User Authentication and Authorization
3.2 Reading Data from Oracle Database into Memory
When data is in PGX (that is, when data has been read into memory), you can run any of the built-in algorithms against your data, compile and execute your own custom algorithms, and use PGQL to query the results.
Depending on your needs, there are two different approaches to reading data from Oracle Database into PGX.
- Graph database use case

You store your data as a graph in the database and manage that data in the database via graph APIs. You only need PGX as an accelerator for expensive queries or to run graph algorithms on the entire graph.

For this use case, you should store the data in the property graph format in the Oracle Database (VT$ and GE$ tables), use PGQL on RDBMS to manage the data in the database, and then read from that property graph format into PGX. You can also configure PGX to periodically fetch updates from the database automatically in the background to keep the data synchronized.

Note that the use of PGX is optional in this use case. For some applications, the capabilities available in the database alone are sufficient.

- Analytics-only use case

Your data is stored in relational form and you want to keep managing that data using standard PL/SQL. You are not interested in a "graph database" but still want to benefit from the analytical capabilities of PGX, which exploit the connectivity of your data for specific analytical use cases.
Note:
This topic applies to both user-managed and Autonomous Databases. However, the examples shown are for user-managed databases. For extra configuration steps required for Autonomous Databases, see Using Oracle Graph with the Autonomous Database.
Subtopics:
- Store the database password in a keystore
- Either, Write the PGX graph configuration file to load from the property graph schema
- Or, Write the PGX graph configuration file to load a graph directly from relational tables
- Read the data
- Secure coding tips for graph client applications
Store the database password in a keystore
Regardless of your use case, PGX requires a database account to read data from the database into memory. The account should be a low-privilege account (see Security Best Practices with Graph Data).
As described in Read Data from the Database, you can read data from the database into the graph server without specifying additional authentication as long as the token is valid for that database user. If you want to access a graph as a different user, you can do so as long as that user's password is stored in a Java keystore file for protection.
You can use the keytool command bundled with the JDK to generate such a keystore file on the command line. See the following script as an example:
# Add a password for the 'database1' connection
keytool -importpass -alias database1 -keystore keystore.p12
# 1. Enter the password for the keystore
# 2. Enter the password for the database
# Add another password (for the 'database2' connection)
keytool -importpass -alias database2 -keystore keystore.p12
# List what's in the keystore using the keytool
keytool -list -keystore keystore.p12
If you are using Java version 8 or lower, you should pass the additional parameter -storetype pkcs12 to the keytool commands in the preceding example.
You can store more than one password into a single keystore file. Each password can be referenced using the alias name provided.
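If you want to check programmatically what ends up in such a keystore, the JDK KeyStore API can round-trip a secret through a PKCS12 file. The sketch below is illustrative only: it stores the secret as a generic AES-typed entry, whereas keytool -importpass creates a PBE-typed entry, and the file name and alias are examples. For actual PGX use, pass the keystore to registerKeystore() or --secret_store as described later instead of reading passwords back yourself.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.KeyStore;
import javax.crypto.spec.SecretKeySpec;

public class KeystoreRoundTrip {
    // Store a secret string under an alias in a PKCS12 keystore file,
    // then load the file again and read the secret back.
    public static String roundTrip(File file, char[] storePass,
                                   String alias, String secret) throws Exception {
        KeyStore ks = KeyStore.getInstance("PKCS12");
        ks.load(null, null); // initialize an empty keystore
        SecretKeySpec key = new SecretKeySpec(
                secret.getBytes(StandardCharsets.UTF_8), "AES");
        ks.setEntry(alias, new KeyStore.SecretKeyEntry(key),
                new KeyStore.PasswordProtection(storePass));
        try (FileOutputStream out = new FileOutputStream(file)) {
            ks.store(out, storePass);
        }
        // Reload from disk and recover the secret bytes
        KeyStore reloaded = KeyStore.getInstance("PKCS12");
        try (FileInputStream in = new FileInputStream(file)) {
            reloaded.load(in, storePass);
        }
        KeyStore.SecretKeyEntry entry = (KeyStore.SecretKeyEntry) reloaded.getEntry(
                alias, new KeyStore.PasswordProtection(storePass));
        return new String(entry.getSecretKey().getEncoded(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("keystore", ".p12");
        f.deleteOnExit();
        System.out.println(roundTrip(f, "changeit".toCharArray(),
                "database1", "mydbpassword1234"));
    }
}
```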
Either, Write the PGX graph configuration file to load from the property graph schema
Next, write a PGX graph configuration file in JSON format. The file tells PGX where to load the data from, what the data looks like, and which keystore alias to use. The following example shows a graph configuration for reading data stored in the Oracle property graph format.
{
"format": "pg",
"db_engine": "rdbms",
"name": "hr",
"jdbc_url": "jdbc:oracle:thin:@myhost:1521/orcl",
"username": "hr",
"keystore_alias": "database1",
"vertex_props": [{
"name": "COUNTRY_NAME",
"type": "string"
}, {
"name": "DEPARTMENT_NAME",
"type": "string"
}, {
"name": "SALARY",
"type": "double"
}],
"partition_while_loading": "by_label",
"loading": {
"load_vertex_labels": true,
"load_edge_label": true
}
}
(For the full list of available configuration fields, including their meanings and default values, see https://docs.oracle.com/cd/E56133_01/latest/reference/loader/database/pg-format.html.)
Or, Write the PGX graph configuration file to load a graph directly from relational tables
The following example loads a subset of the HR sample data from relational tables directly into PGX as a graph. The configuration file specifies a mapping from relational to graph format by using the concept of vertex and edge providers.
Note:
Specifying the vertex_providers and edge_providers properties loads the data into an optimized representation of the graph.
{
"name":"hr",
"jdbc_url":"jdbc:oracle:thin:@myhost:1521/orcl",
"username":"hr",
"keystore_alias":"database1",
"vertex_id_strategy": "no_ids",
"vertex_providers":[
{
"name":"Employees",
"format":"rdbms",
"database_table_name":"EMPLOYEES",
"key_column":"EMPLOYEE_ID",
"key_type": "string",
"props":[
{
"name":"FIRST_NAME",
"type":"string"
},
{
"name":"LAST_NAME",
"type":"string"
},
{
"name":"EMAIL",
"type":"string"
},
{
"name":"SALARY",
"type":"long"
}
]
},
{
"name":"Jobs",
"format":"rdbms",
"database_table_name":"JOBS",
"key_column":"JOB_ID",
"key_type": "string",
"props":[
{
"name":"JOB_TITLE",
"type":"string"
}
]
},
{
"name":"Departments",
"format":"rdbms",
"database_table_name":"DEPARTMENTS",
"key_column":"DEPARTMENT_ID",
"key_type": "string",
"props":[
{
"name":"DEPARTMENT_NAME",
"type":"string"
}
]
}
],
"edge_providers":[
{
"name":"WorksFor",
"format":"rdbms",
"database_table_name":"EMPLOYEES",
"key_column":"EMPLOYEE_ID",
"source_column":"EMPLOYEE_ID",
"destination_column":"EMPLOYEE_ID",
"source_vertex_provider":"Employees",
"destination_vertex_provider":"Employees"
},
{
"name":"WorksAs",
"format":"rdbms",
"database_table_name":"EMPLOYEES",
"key_column":"EMPLOYEE_ID",
"source_column":"EMPLOYEE_ID",
"destination_column":"JOB_ID",
"source_vertex_provider":"Employees",
"destination_vertex_provider":"Jobs"
},
{
"name":"WorkedAt",
"format":"rdbms",
"database_table_name":"JOB_HISTORY",
"key_column":"EMPLOYEE_ID",
"source_column":"EMPLOYEE_ID",
"destination_column":"DEPARTMENT_ID",
"source_vertex_provider":"Employees",
"destination_vertex_provider":"Departments",
"props":[
{
"name":"START_DATE",
"type":"local_date"
},
{
"name":"END_DATE",
"type":"local_date"
}
]
}
]
}
Note about vertex and edge IDs:
PGX enforces by default the existence of a unique identifier for each vertex and edge in a graph, so that they can be retrieved by using PgxGraph.getVertex(ID id) and PgxGraph.getEdge(ID id) or by PGQL queries using the built-in id() method.
The default strategy to generate the vertex IDs is to use the keys provided during loading of the graph. In that case, each vertex should have a vertex key that is unique across all the types of vertices. For edges, by default no keys are required in the edge data, and edge IDs will be automatically generated by PGX. Note that the generation of edge IDs is not guaranteed to be deterministic. If required, it is also possible to load edge keys as IDs.
However, because it may be cumbersome for partitioned graphs to define such identifiers, it is possible to disable that requirement for vertices and/or edges by setting the vertex_id_strategy and edge_id_strategy graph configuration fields to the value no_ids, as shown in the preceding example. When disabling vertex (or edge) IDs, the implication is that PGX forbids calls to APIs that use vertex (or edge) IDs, including the ones indicated previously.
If you want to call those APIs but do not have globally unique IDs in your relational tables, you can specify the unstable_generated_ids generation strategy, which generates new IDs for you. As the name suggests, there is no correlation to the original IDs in your input tables and there is no guarantee that those IDs are stable. As with edge IDs, loading the same input tables twice may yield two different generated IDs for the same vertex.
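For instance, a partitioned graph configuration that disables both vertex and edge IDs would set both strategy fields as follows; this is a fragment only, with the provider lists elided:

```json
{
  "name": "hr",
  "vertex_id_strategy": "no_ids",
  "edge_id_strategy": "no_ids",
  "vertex_providers": [ ... ],
  "edge_providers": [ ... ]
}
```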
Read the data
Now you can instruct PGX to connect to the database and read the data by passing in both the keystore and the configuration file to PGX, using one of the following approaches:
- Interactively in the graph shell

If you are using the graph shell, start it with the --secret_store option. It will prompt you for the keystore password and then attach the keystore to your current session. For example:

cd /opt/oracle/graph
./bin/opg-jshell --secret_store /etc/my-secrets/keystore.p12
enter password for keystore /etc/my-secrets/keystore.p12:

Inside the shell, you can then use normal PGX APIs to read the graph into memory by passing the JSON file you just wrote into the readGraphWithProperties API:

opg-jshell-rdbms> var graph = session.readGraphWithProperties("config.json")
graph ==> PgxGraph[name=hr,N=215,E=415,created=1576882388130]
- As a PGX preloaded graph
As a server administrator, you can instruct PGX to load graphs into memory upon server startup. To do so, modify the PGX configuration file at
/etc/oracle/graph/pgx.conf
and add the path the graph configuration file to thepreload_graphs
section. For example:{ ... "preload_graphs": [{ "name": "hr", "path": "/path/to/config.json" }], ... }
Now, when you start up the server using the start-server script, provide the path to the keystore file which will prompt for the keystore password before server startup. For example:
./pgx/bin/start-server --secret-store /etc/my-secrets/keystore.p12
enter password for keystore /etc/my-secrets/keystore.p12:
- In a Java application
To register a keystore in a Java application, use the
registerKeystore()
API on the PgxSession
object. For example:

import oracle.pgx.api.*;

class Main {
  public static void main(String[] args) throws Exception {
    String baseUrl = args[0];
    String keystorePath = "/etc/my-secrets/keystore.p12";
    char[] keystorePassword = args[1].toCharArray();
    String graphConfigPath = args[2];
    ServerInstance instance = Pgx.getInstance(baseUrl);
    try (PgxSession session = instance.createSession("my-session")) {
      session.registerKeystore(keystorePath, keystorePassword);
      PgxGraph graph = session.readGraphWithProperties(graphConfigPath);
      System.out.println("N = " + graph.getNumVertices() + " E = " + graph.getNumEdges());
    }
  }
}
You can compile and run the preceding sample program using the Oracle Graph Client package. For example:

cd $GRAPH_CLIENT
// create Main.java with the above contents
javac -cp 'lib/*' Main.java
java -cp '.:conf:lib/*' Main http://myhost:7007 MyKeystorePassword path/to/config.json
Secure coding tips for graph client applications
When writing graph client applications, make sure to never store any passwords or other secrets in clear text in any files or in any of your code.
Do not accept passwords or other secrets through command-line arguments either. Instead, use Console.readPassword() from the JDK.
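The advice above can be sketched as follows (a minimal illustration; the class name and prompt text are invented, and the commented-out keystore call stands in for your actual use of the secret):

```java
import java.io.Console;
import java.util.Arrays;

// Sketch: read a secret interactively instead of via argv or a clear-text file.
public class SecretPrompt {

    // Overwrite the secret in memory once it is no longer needed, so it
    // does not linger on the heap until garbage collection.
    static void wipe(char[] secret) {
        if (secret != null) {
            Arrays.fill(secret, '\0');
        }
    }

    public static void main(String[] args) {
        Console console = System.console();
        if (console == null) {
            // No interactive terminal (for example, running under an IDE or a pipe):
            // refuse to fall back to an insecure mechanism such as argv.
            System.err.println("No interactive console available.");
            return;
        }
        char[] password = console.readPassword("enter password for keystore: ");
        try {
            // ... use the secret here, e.g. session.registerKeystore(path, password) ...
        } finally {
            wipe(password);
        }
    }
}
```

Because char arrays (unlike Strings) are mutable, the secret can be erased deterministically in the finally block.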
Parent topic: Using the In-Memory Graph Server (PGX)
3.3 Keeping the Graph in Oracle Database Synchronized with the Graph Server
You can use the FlashbackSynchronizer
API to automatically apply changes made to a graph in the database to the corresponding PgxGraph
object in memory, thus keeping both synchronized.
This API uses Oracle's Flashback Technology to fetch the changes in the database since the last fetch and then push those changes into the graph server using the ChangeSet
API. After the changes are applied, the usual snapshot semantics of the graph server apply: each delta fetch application creates a new in-memory snapshot. Any queries or algorithms that are executing concurrently to snapshot creation are unaffected by the changes until the corresponding session refreshes its PgxGraph
object to the latest state by calling the session.setSnapshot(graph, PgxSession.LATEST_SNAPSHOT)
procedure.
For detailed information about Oracle Flashback technology, see the Database Development Guide.
Prerequisites for Synchronizing
The Oracle database must have Flashback enabled and the database user that you use to perform synchronization must have:
- Read access to all tables that need to be kept synchronized.
- Permission to use flashback APIs. For example:
grant execute on dbms_flashback to <user>
The database must also be configured to retain changes for the amount of time needed by your use case.
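For example, on a database using automatic undo management, a DBA might grant the flashback privilege and extend the undo retention window as follows (a sketch only; the user name and retention value are illustrative, and system settings should be changed in consultation with your DBA):

```sql
GRANT EXECUTE ON DBMS_FLASHBACK TO scott;
ALTER SYSTEM SET UNDO_RETENTION = 86400;  -- retain roughly 24 hours of changes
```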
Limitations for Synchronizing
The synchronizer API currently has the following limitations:
- Only partitioned graph configurations with all providers being database tables are supported.
- Both the vertex and edge ID strategy must be set as follows:
"vertex_id_strategy": "keys_as_ids", "edge_id_strategy": "keys_as_ids"
- Each edge/vertex provider must be configured to create a key mapping. In each provider block of the graph configuration, add the following loading section:
"loading": { "create_key_mapping": true }
This implies that vertices and edges must have globally unique ID columns.
- Each edge/vertex provider must specify the owner of the table by setting the username field. For example, if user SCOTT owns the table, then set the username accordingly in the provider block of that table:
"username": "scott"
- In the root loading block, the snapshot source must be set to
change_set
:

"loading": {
  "snapshots_source": "change_set"
}
For a detailed example, including some options, see the following topic.
- Example of Synchronizing
As an example of performing synchronization, assume you have the following Oracle Database tables, PERSONS and FRIENDSHIPS.
Parent topic: Using the In-Memory Graph Server (PGX)
3.3.1 Example of Synchronizing
As an example of performing synchronization, assume you have the following Oracle Database tables, PERSONS and FRIENDSHIPS.
CREATE TABLE PERSONS (
PERSON_ID NUMBER GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
NAME VARCHAR2(200),
BIRTHDATE DATE,
HEIGHT FLOAT DEFAULT ON NULL 0,
INT_PROP INT DEFAULT ON NULL 0,
CONSTRAINT person_pk PRIMARY KEY (PERSON_ID)
);
CREATE TABLE FRIENDSHIPS (
FRIENDSHIP_ID NUMBER GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
PERSON_A NUMBER,
PERSON_B NUMBER,
MEETING_DATE DATE,
TS_PROP TIMESTAMP,
CONSTRAINT fk_PERSON_A_ID FOREIGN KEY (PERSON_A) REFERENCES persons(PERSON_ID),
CONSTRAINT fk_PERSON_B_ID FOREIGN KEY (PERSON_B) REFERENCES persons(PERSON_ID)
);
You add some sample data into these tables:
INSERT INTO PERSONS (NAME, HEIGHT, BIRTHDATE) VALUES ('John', 1.80, to_date('13/06/1963', 'DD/MM/YYYY'));
INSERT INTO PERSONS (NAME, HEIGHT, BIRTHDATE) VALUES ('Mary', 1.65, to_date('25/09/1982', 'DD/MM/YYYY'));
INSERT INTO PERSONS (NAME, HEIGHT, BIRTHDATE) VALUES ('Bob', 1.75, to_date('11/03/1966', 'DD/MM/YYYY'));
INSERT INTO PERSONS (NAME, HEIGHT, BIRTHDATE) VALUES ('Alice', 1.70, to_date('01/02/1987', 'DD/MM/YYYY'));
INSERT INTO FRIENDSHIPS (PERSON_A, PERSON_B, MEETING_DATE) VALUES (1, 3, to_date('01/09/1972', 'DD/MM/YYYY'));
INSERT INTO FRIENDSHIPS (PERSON_A, PERSON_B, MEETING_DATE) VALUES (2, 4, to_date('19/09/1992', 'DD/MM/YYYY'));
INSERT INTO FRIENDSHIPS (PERSON_A, PERSON_B, MEETING_DATE) VALUES (4, 2, to_date('19/09/1992', 'DD/MM/YYYY'));
INSERT INTO FRIENDSHIPS (PERSON_A, PERSON_B, MEETING_DATE) VALUES (3, 2, to_date('10/07/2001', 'DD/MM/YYYY'));
Synchronizing Using Connection Information in the Graph Configuration
You then want to synchronize using connection information in the graph configuration. You have the following sample graph configuration (KeystoreGraphConfigExample
), which reads those tables as a graph:
{
"name": "PeopleFriendships",
"optimized_for": "updates",
"edge_id_strategy": "keys_as_ids",
"edge_id_type": "long",
"vertex_id_type": "long",
"jdbc_url": "<jdbc_url>",
"username": "<username>",
"keystore_alias": "<keystore_alias>",
"vertex_providers": [
{
"format": "rdbms",
"username": "<username>",
"key_type": "long",
"name": "person",
"database_table_name": "PERSONS",
"key_column": "PERSON_ID",
"props": [
...
],
"loading": {
"create_key_mapping": true
}
}
],
"edge_providers": [
{
"format": "rdbms",
"username": "<username>",
"name": "friendOf",
"source_vertex_provider": "person",
"destination_vertex_provider": "person",
"database_table_name": "FRIENDSHIPS",
"source_column": "PERSON_A",
"destination_column": "PERSON_B",
"key_column": "FRIENDSHIP_ID",
"key_type":"long",
"props": [
...
],
"loading": {
"create_key_mapping": true
}
}
],
"loading": {
"snapshots_source": "change_set"
}
}
(In the preceding example, replace the values <jdbc_url>
, <username>
, and <keystore_alias>
with the values for connecting to your database.)
Open the Oracle Property Graph JShell (be sure to register the keystore containing the database password when starting it), and load the graph into memory:
var pgxGraph = session.readGraphWithProperties("persons_graph.json");
The following output line shows that the example graph has four vertices and four edges:
pgxGraph ==> PgxGraph[name=PeopleFriendships,N=4,E=4,created=1594754376861]
Now, back in the database, insert a few new rows:
INSERT INTO PERSONS (NAME, BIRTHDATE, HEIGHT) VALUES ('Mariana',to_date('21/08/1996','DD/MM/YYYY'),1.65);
INSERT INTO PERSONS (NAME, BIRTHDATE, HEIGHT) VALUES ('Francisco',to_date('13/06/1963','DD/MM/YYYY'),1.75);
INSERT INTO FRIENDSHIPS (PERSON_A, PERSON_B, MEETING_DATE) VALUES (1, 6, to_date('13/06/2013','DD/MM/YYYY'));
COMMIT;
Back in JShell, you can now use the FlashbackSynchronizer
API to automatically fetch and apply those changes:
var synchronizer = new Synchronizer.Builder<FlashbackSynchronizer>().setType(FlashbackSynchronizer.class).setGraph(pgxGraph).build()
pgxGraph = synchronizer.sync()
As you can see from the output, the new vertices and edges have been applied:
pgxGraph ==> PgxGraph[name=PeopleFriendships,N=6,E=5,created=1594754376861]
Note that pgxGraph = synchronizer.sync()
is equivalent to calling the following:
synchronizer.sync()
session.setSnapshot(pgxGraph, PgxSession.LATEST_SNAPSHOT)
Splitting the Fetching and Applying of Changes
The synchronizer.sync()
invocation fetches the changes and applies them in one call. However, you can encode a more complex update logic by splitting this process into separate fetch()
and apply()
invocations. For example:
synchronizer.fetch() // fetches changes from the database
if (synchronizer.getGraphDelta().getTotalNumberOfChanges() > 100) {
// only create snapshot if there have been more than 100 changes
synchronizer.apply()
}
Synchronizing Using an Explicit Oracle Connection
The synchronizer API fetches the changes on the client side. That means the client needs to connect to the database. In the preceding example, it did that by reading the connection information available in the graph configuration of the loaded PgxGraph
object. However, there can be situations in which connection information cannot be obtained from the PgxGraph
object, such as when:
- The associated graph configuration does not contain any database connection information, and the graph was loaded using credentials of a logged in user; or
- The associated graph configuration contains a datasource ID corresponding to a connection stored on the server side.
In these cases, you can pass an Oracle connection object when building the synchronizer object; that connection is then used to fetch the changes. For example:
String jdbcUrl = "<JDBC URL>";
String username = "<USERNAME>";
String password = "<PASSWORD>";
Connection connection = DriverManager.getConnection(jdbcUrl, username, password);
Synchronizer synchronizer = new Synchronizer.Builder<FlashbackSynchronizer>()
.setType(FlashbackSynchronizer.class)
.setGraph(pgxGraph)
.setConnection(connection)
.build();
3.4 Configuring the In-Memory Analyst
You can configure the in-memory analyst engine and its run-time behavior by assigning a single JSON file to the in-memory analyst at startup.
This file can include the parameters shown in the following table. Some examples follow the table.
To specify the configuration file, see Specifying the Configuration File to the In-Memory Analyst.
Note:
-
Relative paths in parameter values are always resolved relative to the parent directory of the configuration file in which they are specified. For example, if the configuration file is
/pgx/conf/pgx.conf
, then the file pathgraph-configs/my-graph.bin.json
inside that file would be resolved to/pgx/conf/graph-configs/my-graph.bin.json
. -
The parameter default values are optimized to deliver the best performance across a wide set of algorithms. Depending on your workload, you may be able to improve performance further by experimenting with different strategies, sizes, and thresholds.
Table 3-2 Configuration Parameters for the In-Memory Analyst
| Parameter | Type | Description | Default |
|---|---|---|---|
| admin_request_cache_timeout | integer | After how many seconds admin request results get removed from the cache. Requests which are not done or not yet consumed are excluded from this timeout. Note: This is only relevant if PGX is deployed as a webapp. | 60 |
| allow_idle_timeout_overwrite | boolean | If true, sessions can overwrite the default idle timeout. | true |
| allow_local_filesystem | boolean | Allow loading from the local file system in client/server mode. If set to true, additionally specify the datasource_dir_whitelist property to list the directories; the server can only read from the directories listed there. | false |
| allow_override_scheduling_information | boolean | If true, allow all users to override scheduling information like task weight, task priority, and number of threads. | true |
| allow_task_timeout_overwrite | boolean | If true, sessions can overwrite the default task timeout. | true |
| allow_user_auto_refresh | boolean | If true, users may enable auto refresh for graphs they load. If false, only graphs preloaded via the server configuration can have auto refresh enabled. | false |
| allowed_remote_loading_locations | array of string | (This option may reduce security; use it only if you know what you are doing!) Allow loading graphs into the PGX engine from remote locations (http, https, ftp, ftps, s3, hdfs). If empty, as by default, no remote location is allowed. If "*" is specified in the array, all remote locations are allowed; only the value "*" is currently supported. Note that preloaded graphs are loaded from any location, regardless of the value of this setting. | [] |
| basic_scheduler_config | object | Configuration parameters for the fork-join pool back end. | null |
| bfs_iterate_que_task_size | integer | Task size for BFS iterate QUE phase. | 128 |
| bfs_threshold_parent_read_based | number | Threshold of BFS traversal level items to switch to parent-read-based visiting strategy. | 0.05 |
| bfs_threshold_read_based | integer | Threshold of BFS traversal level items to switch to read-based visiting strategy. | 1024 |
| bfs_threshold_single_threaded | integer | Until what number of BFS traversal level items vertices are visited single-threaded. | 128 |
| character_set | string | Standard character set to use throughout PGX. UTF-8 is the default. Note: Some formats may not be compatible. | utf-8 |
| cni_diff_factor_default | integer | Default diff factor value used in the common neighbor iterator implementations. | 8 |
| cni_small_default | integer | Default value used in the common neighbor iterator implementations, indicating below which threshold a subarray is considered small. | 128 |
| cni_stop_recursion_default | integer | Default value used in the common neighbor iterator implementations, indicating the minimum size where the binary search approach is applied. | 96 |
| datasource_dir_whitelist | array of string | If allow_local_filesystem is set, the list of directories from which files may be read. | [] |
| dfs_threshold_large | integer | Value that determines at which number of visited vertices the DFS implementation switches to data structures optimized for larger numbers of vertices. | 4096 |
| enable_csrf_token_checks | boolean | If true, the PGX webapp verifies that the Cross-Site Request Forgery (CSRF) token cookie and request parameters sent by the client exist and match, to prevent CSRF attacks. | true |
| enable_gm_compiler | boolean | [relevant when profiling with Solaris Studio] When enabled, label experiments using the 'er_label' command. | false |
| enable_shutdown_cleanup_hook | boolean | If true, PGX adds a JVM shutdown hook that automatically shuts down PGX at JVM shutdown. Note: Deactivating the shutdown hook and not explicitly shutting down PGX may result in pollution of your temp directory. | true |
| enterprise_scheduler_config | object | Configuration parameters for the enterprise scheduler. | null |
| enterprise_scheduler_flags | object | [relevant for enterprise_scheduler] Enterprise scheduler-specific settings. | null |
| explicit_spin_locks | boolean | If true, spin explicitly in a loop until the lock becomes available. If false, use JDK locks, which rely on the JVM to decide whether to context-switch or spin. Setting this value to true usually results in better performance. | true |
| graph_algorithm_language | enum[GM_LEGACY, GM, JAVA] | Front-end compiler to use. | gm |
| graph_validation_level | enum[low, high] | Level of validation performed on newly loaded or created graphs. | low |
| ignore_incompatible_backend_operations | boolean | If true, only log when encountering incompatible operations and configuration values in RTS or FJ pool. If false, throw exceptions. | false |
| in_place_update_consistency | enum[allow_inconsistencies, cancel_tasks] | Consistency model used when in-place updates occur. Only relevant if in-place updates are enabled. Currently, updates are applied in place only if they are not structural (that is, they only modify properties). Two models are currently implemented: one only delays new tasks when an update occurs; the other also delays running tasks. | allow_inconsistencies |
| init_pgql_on_startup | boolean | If true, PGQL is initialized directly on start-up of PGX. Otherwise, it is initialized during the first use of PGQL. | true |
| interval_to_poll_max | integer | Exponential backoff upper bound (in ms); once reached, the job status polling interval stays fixed at this value. | 1000 |
| java_home_dir | string | The path to Java's home directory. If set to <system-java-home-dir>, use the java.home system property. | null |
| large_array_threshold | integer | Threshold above which an array is too big to use a normal Java array. This depends on the JVM used. (Defaults to Integer.MAX_VALUE - 3.) | 2147483644 |
| max_active_sessions | integer | Maximum number of sessions allowed to be active at a time. | 1024 |
| max_distinct_strings_per_pool | integer | [only relevant if string_pooling_strategy is indexed] Number of distinct strings per property after which to stop pooling. If the limit is reached, an exception is thrown. | 65536 |
| max_off_heap_size | integer | Maximum amount of off-heap memory (in megabytes) that PGX is allowed to allocate before an OutOfMemoryError is thrown. Note: This limit is not guaranteed to never be exceeded, because of rounding and synchronization trade-offs; it only serves as the threshold at which PGX starts rejecting new memory allocation requests. | <available-physical-memory> |
| max_queue_size_per_session | integer | The maximum number of pending tasks allowed in the queue, per session. If a session reaches the maximum, new incoming requests of that session are rejected. A negative value means no limit. | -1 |
| max_snapshot_count | integer | Number of snapshots that may be loaded in the engine at the same time. New snapshots can be created via auto or forced update. If the number of snapshots of a graph reaches this threshold, no more auto-updates are performed, and a forced update results in an exception until one or more snapshots are removed from memory. A value of zero means an unlimited number of snapshots is supported. | 0 |
| memory_allocator | enum[basic_allocator, enterprise_allocator] | The memory allocator to use. | basic_allocator |
| memory_cleanup_interval | integer | Memory cleanup interval in seconds. | 600 |
| ms_bfs_frontier_type_strategy | enum[auto_grow, short, int] | The type strategy to use for MS-BFS frontiers. | auto_grow |
| num_spin_locks | integer | Number of spin locks each generated app creates at instantiation. Trade-off: a small number implies less memory consumption; a large number implies faster execution (if the algorithm uses spin locks). | 1024 |
| parallelism | integer | Number of worker threads to use in the thread pool. Note: If the caller thread is part of another thread pool, this value is ignored and the parallelism of the parent pool is used. | <number-of-cpus> |
| pattern_matching_supernode_cache_threshold | integer | Minimum number of neighbors a node must have to be considered a supernode. This is for the pattern matching engine. | 1000 |
| pooling_factor | number | [only relevant if string_pooling_strategy is on_heap] This value prevents the string pool from growing as big as the property size, which could render the pooling ineffective. | 0.25 |
| preload_graphs | array of object | List of graph configs to be registered at start-up. Each item includes the path to a graph config, the name of the graph, and whether it should be published. | [] |
| random_generator_strategy | enum[non_deterministic, deterministic] | Method of generating random numbers in PGX. | non_deterministic |
| random_seed | long | [relevant for the deterministic random number generator only] Seed for the deterministic random number generator used in the in-memory analyst. | -24466691093057031 |
| release_memory_threshold | double | Threshold percentage (decimal fraction) of used memory after which the engine starts freeing unused graphs. For example, a value of 0.0 means graphs are freed as soon as their reference count becomes zero (that is, all sessions which loaded that graph were destroyed or timed out). A value of 1.0 means graphs are never freed, and the engine throws OutOfMemoryErrors as soon as a graph is needed that no longer fits in memory. A value of 0.7 means the engine keeps all graphs in memory as long as total memory consumption is below 70% of total available memory, even if no session is currently using them; when consumption exceeds 70% and another graph needs to be loaded, unused graphs are freed until memory consumption is below 70% again. | 0.85 |
| revisit_threshold | integer | Maximum number of matched results from a node to be reached. | 4096 |
| scheduler | enum[basic_scheduler, enterprise_scheduler] | The scheduler to use. basic_scheduler uses a scheduler with basic features. enterprise_scheduler uses a scheduler with advanced enterprise features for running multiple tasks concurrently and providing better performance. | enterprise_scheduler |
| session_idle_timeout_secs | integer | Timeout of idling sessions in seconds. Zero (0) means no timeout. | 0 |
| session_task_timeout_secs | integer | Timeout in seconds to interrupt long-running tasks submitted by sessions (algorithms, I/O tasks). Zero (0) means no timeout. | 0 |
| small_task_length | integer | Task length if the total amount of work is smaller than the default task length (only relevant for task-stealing strategies). | 128 |
| spark_streams_interface | string | The name of the network interface to be used for Spark data communication. | null |
| strict_mode | boolean | If true, exceptions are thrown and logged at ERROR level whenever the engine encounters configuration problems, such as invalid keys, mismatches, and other potential errors. If false, the engine logs problems at ERROR/WARN level (depending on severity), makes best guesses, and uses sensible defaults instead of throwing exceptions. | true |
| string_pooling_strategy | enum[indexed, on_heap, none] | [only relevant if use_string_pool is enabled] The string pooling strategy to use. | on_heap |
| task_length | integer | Default task length (only relevant for task-stealing strategies). Should be between 100 and 10000. Trade-off: a small number implies more fine-grained tasks and higher stealing throughput; a large number implies less memory consumption and GC activity. | 4096 |
| tmp_dir | string | Temporary directory to store compilation artifacts and other temporary data. If set to <system-tmp-dir>, uses the standard tmp directory of the underlying system (/tmp on Linux). | <system-tmp-dir> |
| udf_config_directory | string | Directory path containing UDF files. | null |
| use_memory_mapper_for_reading_pgb | boolean | If true, use memory-mapped files for reading graphs in PGB format if possible; if false, always use a stream-based implementation. | true |
| use_memory_mapper_for_storing_pgb | boolean | If true, use memory-mapped files for storing graphs in PGB format if possible; if false, always use a stream-based implementation. | true |
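As an illustration of how a few of these parameters combine, the following pgx.conf fragment enables loading from the local file system restricted to one directory and pins the worker-thread count (the directory path and thread count are illustrative values, not recommendations):

```json
{
  "allow_local_filesystem": true,
  "datasource_dir_whitelist": ["/data/graphs"],
  "parallelism": 16
}
```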
Enterprise Scheduler Parameters
The following parameters are relevant only if the enterprise scheduler is used. (They are ignored if the basic scheduler is used.)
-
analysis_task_config
Configuration for analysis tasks. Type: object. Default:
priority: medium, max_threads: <no-of-CPUs>, weight: <no-of-CPUs>
-
fast_analysis_task_config
Configuration for fast analysis tasks. Type: object. Default:
priority: high, max_threads: <no-of-CPUs>, weight: 1
-
maxnum_concurrent_io_tasks
Maximum number of concurrent I/O tasks. Type: integer. Default: 3
-
num_io_threads_per_task
Number of I/O threads to use per task. Type: integer. Default: <no-of-cpus>
Basic Scheduler Parameters
The following parameters are relevant only if the basic scheduler is used. (They are ignored if the enterprise scheduler is used.)
-
num_workers_analysis
Number of worker threads to use for analysis tasks. Type: integer. Default:
<no-of-CPUs>
-
num_workers_fast_track_analysis
Number of worker threads to use for fast-track analysis tasks. Type: integer. Default: 1
-
num_workers_io
Number of worker threads to use for I/O tasks (load/refresh/write from/to disk). This value will not affect file-based loaders, because they are always single-threaded. Database loaders will open a new connection for each I/O worker. Default:
<no-of-CPUs>
Example 3-5 Minimal In-Memory Analyst Configuration
The following example causes the in-memory analyst to initialize its analysis thread pool with 32 workers. (Default values are used for all other parameters.)
{
  "enterprise_scheduler_config": {
    "analysis_task_config": {
      "max_threads": 32
    }
  }
}
Example 3-6 Two Pre-loaded Graphs
The following example sets more fields and specifies two graphs to be loaded into memory during PGX startup.
{
  "enterprise_scheduler_config": {
    "analysis_task_config": {
      "max_threads": 32
    },
    "fast_analysis_task_config": {
      "max_threads": 32
    }
  },
  "memory_cleanup_interval": 600,
  "max_active_sessions": 1,
  "release_memory_threshold": 0.2,
  "preload_graphs": [
    {
      "path": "graph-configs/my-graph.bin.json",
      "name": "my-graph"
    },
    {
      "path": "graph-configs/my-other-graph.adj.json",
      "name": "my-other-graph",
      "publish": false
    }
  ]
}
Parent topic: Using the In-Memory Graph Server (PGX)
3.4.1 Specifying the Configuration File to the In-Memory Analyst
The in-memory analyst configuration file is parsed by the in-memory analyst at startup-time whenever ServerInstance#startEngine
(or any of its variants) is called. You can provide the path to your configuration file to the in-memory analyst, or specify the configuration programmatically. This topic identifies several ways to specify the file.
Programmatically
All configuration fields exist as Java enums. Example:
Map<PgxConfig.Field, Object> pgxCfg = new HashMap<>();
pgxCfg.put(PgxConfig.Field.MEMORY_CLEANUP_INTERVAL, 600);

ServerInstance instance = ...
instance.startEngine(pgxCfg);
All parameters not explicitly set will get default values.
Explicitly Using a File
Instead of a map, you can provide the path to an in-memory analyst configuration JSON file. Example:
instance.startEngine("path/to/pgx.conf");           // file on local file system
instance.startEngine("classpath:/path/to/pgx.conf"); // file on current classpath
For all other protocols, you can pass an input stream to a JSON file directly. Example:
InputStream is = ...
instance.startEngine(is);
Implicitly Using a File
If startEngine()
is called without an argument, the in-memory analyst looks for a configuration file at the following places, stopping when it finds the file:
-
File path found in the Java system property
pgx_conf
. Example:java -Dpgx_conf=conf/my.pgx.config.json ...
-
A file named
pgx.conf
in the root directory of the current classpath -
A file named
pgx.conf
in the root directory relative to the currentSystem.getProperty("user.dir")
directory
Note: Providing a configuration is optional. A default value for each field will be used if the field cannot be found in the given configuration file, or if no configuration file is provided.
Using the Local Shell
To change how the shell configures the local in-memory analyst instance, edit $PGX_HOME/conf/pgx.conf
. Changes will be reflected the next time you invoke $PGX_HOME/bin/pgx
.
You can also change the location of the configuration file as in the following example:
./bin/opg --pgx_conf path/to/my/other/pgx.conf
Setting System Properties
Any parameter can be set using Java system properties by passing -Dpgx.<FIELD>=<VALUE>
arguments to the JVM that the in-memory analyst is running on. Note that setting system properties will overwrite any other configuration. The following example sets the maximum off-heap size to 256 GB, regardless of any other configuration:
java -Dpgx.max_off_heap_size=256000 ...
Setting Environment Variables
Any parameter can also be set using environment variables, by prefixing the parameter name with PGX_ in the environment of the JVM in which the in-memory analyst is executed. Note that setting environment variables will overwrite any other configuration; however, if a system property and an environment variable are set for the same parameter, the system property value is used. The following example sets the maximum off-heap size to 256 GB using an environment variable:
PGX_MAX_OFF_HEAP_SIZE=256000 java ...
Parent topic: Configuring the In-Memory Analyst
3.5 Storing a Graph Snapshot on Disk
After reading a graph into memory using either Java or the Shell, if you make some changes to the graph such as running the PageRank algorithm and storing the values as vertex properties, you can store this snapshot of the graph on disk.
This is helpful if you want to save the state of the graph in memory, such as if you must shut down the in-memory analyst server to migrate to a newer version, or if you must shut it down for some other reason.
(Storing graphs over HTTP/REST is currently not supported.)
A snapshot of a graph can be saved as a file in a binary format (called a PGB file).
In general, we recommend that you record the graph queries and analytics APIs that were executed, and that after the in-memory analyst has been restarted, you reload the graph and re-execute those APIs. But if you must save the state of the graph, you can use the logic in the following example to save the graph snapshot from the shell.
In a three-tier deployment, the file is written on the server-side file system. You must also ensure that the file location to write is specified in the in-memory analyst server. (As explained in Three-Tier Deployments of Oracle Graph with Autonomous Database, in a three-tier deployment, access to the PGX server file system requires a list of allowed locations to be specified.)
opg-jshell> var graph = session.createGraphBuilder().addVertex(1).addVertex(2).addVertex(3).addEdge(1,2).addEdge(2,3).addEdge(3, 1).build()
graph ==> PgxGraph[name=anonymous_graph_1,N=3,E=3,created=1581623669674]
opg-jshell> analyst.pagerank(graph)
$3 ==> VertexProperty[name=pagerank,type=double,graph=anonymous_graph_1]
// Now save the state of this graph
opg-jshell> graph.store(Format.PGB, "/tmp/snapshot.pgb")
$4 ==> {"edge_props":[],"vertex_uris":["/tmp/snapshot.pgb"],"loading":{},"attributes":{},"edge_uris":[],"vertex_props":[{"name":"pagerank","dimension":0,"type":"double"}],"error_handling":{},"vertex_id_type":"integer","format":"pgb"}
// reload from disk
opg-jshell> var graphFromDisk = session.readGraphFile("/tmp/snapshot.pgb")
graphFromDisk ==> PgxGraph[name=snapshot,N=3,E=3,created=1581623739395]
// previously computed properties are still part of the graph and can be queried
opg-jshell> graphFromDisk.queryPgql("select x.pagerank match (x)").print().close()
The following example is essentially the same as the preceding one, but it uses partitioned graphs. Note that in the case of partitioned graphs, multiple PGB files are being generated, one for each vertex/edge partition in the graph.
opg-jshell> analyst.pagerank(graph)
$3 ==> VertexProperty[name=pagerank,type=double,graph=anonymous_graph_1]
// Now save the state of this graph
opg-jshell> var storedPgbConfig = graph.store(ProviderFormat.PGB, "/tmp/snapshot")
$4 ==> {"edge_props":[],"vertex_uris":["/tmp/snapshot.pgb"],"loading":{},"attributes":{},"edge_uris":[],"vertex_props":[{"name":"pagerank","dimension":0,"type":"double"}],"error_handling":{},"vertex_id_type":"integer","format":"pgb"}
// Reload from disk
opg-jshell> var graphFromDisk = session.readGraphWithProperties(storedPgbConfig)
graphFromDisk ==> PgxGraph[name=snapshot,N=3,E=3,created=1581623739395]
// Previously computed properties are still part of the graph and can be queried
opg-jshell> graphFromDisk.queryPgql("select x.pagerank match (x)").print().close()
Parent topic: Using the In-Memory Graph Server (PGX)
3.6 Executing Built-in Algorithms
The in-memory analyst contains a set of built-in algorithms that are available as Java APIs.
This topic describes the use of the in-memory analyst using Triangle Counting and PageRank analytics as examples.
Parent topic: Using the In-Memory Graph Server (PGX)
3.6.1 About the In-Memory Analyst
The in-memory analyst contains a set of built-in algorithms that are available as Java APIs. The details of the APIs are documented in the Javadoc that is included in the product documentation library. Specifically, see the Method Summary of the BuiltinAlgorithms interface for a list of the supported in-memory analyst methods.
For example, this is the PageRank procedure signature:
/**
 * Classic pagerank algorithm. Time complexity: O(E * K) with E = number of
 * edges, K is a given constant (max iterations)
 *
 * @param graph graph
 * @param e     maximum error for terminating the iteration
 * @param d     damping factor
 * @param max   maximum number of iterations
 * @return Vertex Property holding the result as a double
 */
public <ID extends Comparable<ID>> VertexProperty<ID, Double> pagerank(PgxGraph graph, double e, double d, int max);
Parent topic: Executing Built-in Algorithms
3.6.2 Running the Triangle Counting Algorithm
For triangle counting, the sortByDegree boolean parameter of countTriangles() allows you to control whether the graph should first be sorted by degree (true) or not (false). If true, more memory will be used, but the algorithm will run faster; however, if your graph is very large, you might want to turn this optimization off to avoid running out of memory.
Using the Shell to Run Triangle Counting
opg> analyst.countTriangles(graph, true) ==> 1
Using Java to Run Triangle Counting
import oracle.pgx.api.*;

Analyst analyst = session.createAnalyst();
long triangles = analyst.countTriangles(graph, true);
The algorithm finds one triangle in the sample graph.
Tip:
When using the in-memory analyst shell, you can increase the amount of log output during execution by changing the logging level. See information about the :loglevel command with :h :loglevel.
Parent topic: Executing Built-in Algorithms
3.6.3 Running the PageRank Algorithm
PageRank computes a rank value between 0 and 1 for each vertex (node) in the graph and stores the values in a double property. The algorithm therefore creates a vertex property of type double for the output.
In the in-memory analyst, there are two types of vertex and edge properties:
- Persistent Properties: Properties that are loaded with the graph from a data source are fixed, in-memory copies of the data on disk, and are therefore persistent. Persistent properties are read-only, immutable, and shared between sessions.
- Transient Properties: Values can only be written to transient properties, which are private to a session. You can create transient properties by calling createVertexProperty and createEdgeProperty on PgxGraph objects, or by copying existing properties using clone() on property objects. Transient properties hold the results of computation by algorithms. For example, the PageRank algorithm computes a rank value between 0 and 1 for each vertex in the graph and stores these values in a transient property named pg_rank. Transient properties are destroyed when the Analyst object is destroyed.
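As a sketch of the calls mentioned above, a transient vertex property can be created and an existing one copied as follows. This assumes a PgxGraph already loaded in a session; the property name "myProp" is illustrative, and "pagerank" is taken from the earlier PageRank example, not guaranteed to exist on your graph.

```java
import oracle.pgx.api.*;
import oracle.pgx.common.types.PropertyType;

public class TransientPropertyExample {
  static void example(PgxGraph graph) throws Exception {
    // Create a new, writable transient vertex property of type double
    // ("myProp" is a hypothetical name chosen for this sketch)
    VertexProperty<Integer, Double> myProp =
        graph.createVertexProperty(PropertyType.DOUBLE, "myProp");

    // Copy an existing (possibly persistent, read-only) property into a
    // writable transient copy
    VertexProperty<Integer, Double> copy =
        graph.<Integer, Double>getVertexProperty("pagerank").clone();
  }
}
```

Because persistent properties are read-only, cloning into a transient copy is the usual way to obtain a modifiable version of loaded data.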
This example obtains the top three vertices with the highest PageRank values. It uses a transient vertex property of type double to hold the computed PageRank values. The PageRank algorithm uses the following default values for the input parameters: error (tolerance) = 0.001, damping factor = 0.85, and maximum number of iterations = 100.
Using the Shell to Run PageRank
opg> rank = analyst.pagerank(graph, 0.001, 0.85, 100);
==> ...
opg> rank.getTopKValues(3)
==> 128=0.1402019732468347
==> 333=0.12002296283541904
==> 99=0.09708583862990475
Using Java to Run PageRank
import java.util.Map.Entry;
import oracle.pgx.api.*;

Analyst analyst = session.createAnalyst();
VertexProperty<Integer, Double> rank = analyst.pagerank(graph, 0.001, 0.85, 100);
for (Entry<Integer, Double> entry : rank.getTopKValues(3)) {
  System.out.println(entry.getKey() + "=" + entry.getValue());
}
Parent topic: Executing Built-in Algorithms
3.7 Using Custom PGX Graph Algorithms
A custom PGX graph algorithm allows you to write a graph algorithm in Java and have it automatically compiled to an efficient parallel implementation.
For more detailed information than appears in the following subtopics, see the PGX Algorithm specification at https://docs.oracle.com/cd/E56133_01/latest/PGX_Algorithm_Language_Specification.pdf.
- Writing a Custom PGX Algorithm
- Compiling and Running a PGX Algorithm
- Example Custom PGX Algorithm: PageRank
Parent topic: Using the In-Memory Graph Server (PGX)
3.7.1 Writing a Custom PGX Algorithm
A PGX algorithm is a regular .java file with a single class definition that is annotated with @GraphAlgorithm. For example:
import oracle.pgx.algorithm.annotations.GraphAlgorithm;
@GraphAlgorithm
public class MyAlgorithm {
...
}
A PGX algorithm class must contain exactly one public method, which is used as the entry point. For example:
import oracle.pgx.algorithm.PgxGraph;
import oracle.pgx.algorithm.VertexProperty;
import oracle.pgx.algorithm.annotations.GraphAlgorithm;
import oracle.pgx.algorithm.annotations.Out;
@GraphAlgorithm
public class MyAlgorithm {
public int myAlgorithm(PgxGraph g, @Out VertexProperty<Integer> distance) {
System.out.println("My first PGX Algorithm program!");
return 42;
}
}
As with normal Java methods, a PGX algorithm method can return a value (an integer in this example). More interesting is the @Out annotation, which marks the vertex property distance as an output parameter. The caller passes output parameters by reference. This way, the caller has a reference to the modified property after the algorithm terminates.
Parent topic: Using Custom PGX Graph Algorithms
3.7.1.1 Collections
To create a collection, you call the .create() function. For example, a VertexProperty<Integer> is created as follows:
VertexProperty<Integer> distance = VertexProperty.create();
To get the value of a property at a certain vertex v:
distance.get(v);
Similarly, to set the property of a certain vertex v to a value e:
distance.set(v, e);
You can even create properties of collections:
VertexProperty<VertexSequence> path = VertexProperty.create();
However, PGX Algorithm assignments are always by value (as opposed to by reference). To make this explicit, you must call .clone() when assigning a collection:
VertexSequence sequence = path.get(v).clone();
Another consequence of assignment being by value is that you can check for equality using the == operator instead of the Java method .equals(). For example:
PgxVertex v1 = G.getRandomVertex();
PgxVertex v2 = G.getRandomVertex();
System.out.println(v1 == v2);
Parent topic: Writing a Custom PGX Algorithm
3.7.1.2 Iteration
The most common operations in PGX algorithms are iterations (such as looping over all vertices, and looping over a vertex's neighbors) and graph traversal (such as breadth-first or depth-first). All collections expose a forEach and a forSequential method by which you can iterate over the collection in parallel and in sequence, respectively.
For example:
- To iterate over a graph's vertices in parallel:
G.getVertices().forEach(v -> { ... });
- To iterate over a graph's vertices in sequence:
G.getVertices().forSequential(v -> { ... });
- To traverse a graph's vertices from r in breadth-first order:
import oracle.pgx.algorithm.Traversal;

Traversal.inBFS(G, r).forward(n -> { ... });
Inside the forward (or backward) lambda you can access the current level of the BFS (or DFS) traversal by calling currentLevel().
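To illustrate (a hedged sketch, not from the original text), a small PGX algorithm that records each vertex's BFS level from a root vertex could look like the following. It assumes currentLevel() is available as a static method on Traversal, as described above; the class and method names are hypothetical.

```java
import oracle.pgx.algorithm.PgxGraph;
import oracle.pgx.algorithm.PgxVertex;
import oracle.pgx.algorithm.Traversal;
import oracle.pgx.algorithm.VertexProperty;
import oracle.pgx.algorithm.annotations.GraphAlgorithm;
import oracle.pgx.algorithm.annotations.Out;

@GraphAlgorithm
public class BfsLevels {
  public void bfsLevels(PgxGraph G, PgxVertex root, @Out VertexProperty<Integer> level) {
    // Mark every vertex as unreached
    level.setAll(-1);
    // Visit vertices in breadth-first order from root, storing their depth
    Traversal.inBFS(G, root).forward(n -> level.set(n, Traversal.currentLevel()));
  }
}
```

After compilation with session.compileProgram(), vertices unreachable from root keep the value -1, and all others hold their distance in hops from root.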
Parent topic: Writing a Custom PGX Algorithm
3.7.1.3 Reductions
Within these parallel blocks it is common to atomically update, or reduce to, a variable defined outside the lambda. These atomic reductions are available as methods on Scalar<T>: reduceAdd, reduceMul, reduceAnd,
and so on. For example, to count the number of vertices in a graph:
public int countVertices(PgxGraph G) {
Scalar<Integer> count = Scalar.create(0);
G.getVertices().forEach(n -> {
count.reduceAdd(1);
});
return count.get();
}
Sometimes you want to update multiple values atomically. For example, you might want to find the smallest property value as well as the vertex whose property value attains this smallest value. Due to the parallel execution, two separate reduction statements might get you in an inconsistent state.
To solve this problem the Reductions
class provides argMin
and argMax
functions. The first argument to argMin
is the current value and the second argument is the potential new minimum. Additionally, you can chain andUpdate
calls on the ArgMinMax
object to indicate other variables and the values that they should be updated to (atomically). For example:
VertexProperty<Integer> rank = VertexProperty.create();
int minRank = Integer.MAX_VALUE;
PgxVertex minVertex = PgxVertex.NONE;
G.getVertices().forEach(n ->
argMin(minRank, rank.get(n)).andUpdate(minVertex, n)
);
Parent topic: Writing a Custom PGX Algorithm
3.7.2 Compiling and Running a PGX Algorithm
To be able to compile and run a custom PGX algorithm, you must perform several actions:
- Set two configuration parameters in the conf/pgx.conf file:
  - Set the graph_algorithm_language option to JAVA.
  - Set the java_home_dir option to the path to your Java home (use <system-java-home-dir> to have PGX infer Java home from the system properties).
- Create a session (either implicitly in the shell or explicitly in Java). For example:
cd $PGX_HOME
./bin/opg
- Compile a PGX Algorithm. For example:
myAlgorithm = session.compileProgram("/path/to/MyAlgorithm.java")
- Run the algorithm. For example:
graph = session.readGraphWithProperties("/path/to/config.edge.json")
property = graph.createVertexProperty(PropertyType.INTEGER)
myAlgorithm.run(graph, property)
Parent topic: Using Custom PGX Graph Algorithms
3.7.3 Example Custom PGX Algorithm: PageRank
The following is an implementation of pagerank
as a PGX algorithm:
import oracle.pgx.algorithm.PgxGraph;
import oracle.pgx.algorithm.Scalar;
import oracle.pgx.algorithm.VertexProperty;
import oracle.pgx.algorithm.annotations.GraphAlgorithm;
import oracle.pgx.algorithm.annotations.Out;
@GraphAlgorithm
public class Pagerank {
public void pagerank(PgxGraph G, double tol, double damp, int max_iter, boolean norm, @Out VertexProperty<Double> rank) {
Scalar<Double> diff = Scalar.create();
int cnt = 0;
double N = G.getNumVertices();
rank.setAll(1 / N);
do {
diff.set(0.0);
Scalar<Double> dangling_factor = Scalar.create(0d);
if (norm) {
dangling_factor.set(damp / N * G.getVertices().filter(v -> v.getOutDegree() == 0).sum(rank::get));
}
G.getVertices().forEach(t -> {
double in_sum = t.getInNeighbors().sum(w -> rank.get(w) / w.getOutDegree());
double val = (1 - damp) / N + damp * in_sum + dangling_factor.get();
diff.reduceAdd(Math.abs(val - rank.get(t)));
rank.setDeferred(t, val);
});
cnt++;
} while (diff.get() > tol && cnt < max_iter);
}
}
Parent topic: Using Custom PGX Graph Algorithms
3.8 Creating Subgraphs
You can create subgraphs based on a graph that has been loaded into memory. You can use filter expressions or create bipartite subgraphs based on a vertex (node) collection that specifies the left set of the bipartite graph.
For information about reading a graph into memory, see Reading Data from Oracle Database into Memory.
- About Filter Expressions
- Using a Simple Filter to Create a Subgraph
- Using a Complex Filter to Create a Subgraph
- Using a Vertex Set to Create a Bipartite Subgraph
Parent topic: Using the In-Memory Graph Server (PGX)
3.8.1 About Filter Expressions
Filter expressions are expressions that are evaluated for each edge. The expression can define predicates that an edge must fulfil to be contained in the result, in this case a subgraph.
Consider an example graph that consists of four vertices (nodes) and four edges. For an edge to match the filter expression src.prop == 10, the source vertex prop property must equal 10. Two edges match that filter expression, as shown in the following figure.
The following figure shows the graph that results when the filter is applied. The filter excludes the edges associated with vertex 333, and the vertex itself.
Figure 3-2 Graph Created by the Simple Filter
Description of "Figure 3-2 Graph Created by the Simple Filter"
Using filter expressions to select a single vertex or a set of vertices is difficult. For example, selecting only the vertex with the property value 10 is impossible, because the only way to match the vertex is to match an edge where 10 is either the source or destination property value. However, when you match an edge you automatically include the source vertex, destination vertex, and the edge itself in the result.
Parent topic: Creating Subgraphs
3.8.2 Using a Simple Filter to Create a Subgraph
The following examples create the subgraph described in About Filter Expressions.
Using the Shell to Create a Subgraph
subgraph = graph.filter(new EdgeFilter("src.prop == 10"))
Using Java to Create a Subgraph
import oracle.pgx.api.*;
import oracle.pgx.api.filter.*;

PgxGraph graph = session.readGraphWithProperties(...);
PgxGraph subgraph = graph.filter(new EdgeFilter("src.prop == 10"));
Parent topic: Creating Subgraphs
3.8.3 Using a Complex Filter to Create a Subgraph
This example uses a slightly more complex filter. It uses the outDegree function, which calculates the number of outgoing edges for an identifier (source src or destination dst). The following filter expression matches all edges with a cost property value greater than 50 and a destination vertex (node) with an outDegree greater than 1.
dst.outDegree() > 1 && edge.cost > 50
One edge in the sample graph matches this filter expression, as shown in the following figure.
Figure 3-3 Edges Matching the outDegree Filter
Description of "Figure 3-3 Edges Matching the outDegree Filter"
The following figure shows the graph that results when the filter is applied. The filter excludes the edges associated with vertices 99 and 1908, and so excludes those vertices also.
Figure 3-4 Graph Created by the outDegree Filter
Description of "Figure 3-4 Graph Created by the outDegree Filter"
Parent topic: Creating Subgraphs
3.8.4 Using a Vertex Set to Create a Bipartite Subgraph
You can create a bipartite subgraph by specifying a set of vertices (nodes), which are used as the left side. A bipartite subgraph has edges only between the left set of vertices and the right set of vertices. There are no edges within those sets, such as between two nodes on the left side. In the in-memory analyst, vertices that are isolated because all incoming and outgoing edges were deleted are not part of the bipartite subgraph.
The following figure shows a bipartite subgraph. No properties are shown.
The following examples create a bipartite subgraph from the simple graph shown in About Filter Expressions. They create a vertex collection and fill it with the vertices for the left side.
Using the Shell to Create a Bipartite Subgraph
opg> s = graph.createVertexSet()
==> ...
opg> s.addAll([graph.getVertex(333), graph.getVertex(99)])
==> ...
opg> s.size()
==> 2
opg> bGraph = graph.bipartiteSubGraphFromLeftSet(s)
==> PGX Bipartite Graph named sample-sub-graph-4
Using Java to Create a Bipartite Subgraph
import oracle.pgx.api.*;

VertexSet<Integer> s = graph.createVertexSet();
s.addAll(graph.getVertex(333), graph.getVertex(99));
BipartiteGraph bGraph = graph.bipartiteSubGraphFromLeftSet(s);
When you create a subgraph, the in-memory analyst automatically creates a Boolean vertex (node) property that indicates whether the vertex is on the left side. You can specify a unique name for the property.
The resulting bipartite subgraph looks like this:
Vertex 1908 is excluded from the bipartite subgraph. The only edge that connected that vertex extended from 128 to 1908. The edge was removed, because it violated the bipartite properties of the subgraph. Vertex 1908 had no other edges, and so was removed also.
Parent topic: Creating Subgraphs
3.9 Using Automatic Delta Refresh to Handle Database Changes
You can automatically refresh (auto-refresh) graphs periodically to keep the in-memory graph synchronized with changes to the property graph stored in the property graph tables in Oracle Database (VT$ and GE$ tables).
Note that the auto-refresh feature is not supported when loading a graph into PGX in memory directly from relational tables.
- Configuring the In-Memory Server for Auto-Refresh
- Configuring Basic Auto-Refresh
- Reading the Graph Using the In-Memory Analyst or a Java Application
- Checking Out a Specific Snapshot of the Graph
- Advanced Auto-Refresh Configuration
Parent topic: Using the In-Memory Graph Server (PGX)
3.9.1 Configuring the In-Memory Server for Auto-Refresh
Because auto-refresh can create many snapshots and therefore lead to high memory usage, by default the option to enable auto-refresh for graphs is available only to administrators.
To allow all users to auto-refresh graphs, you must include the following line in the in-memory analyst configuration file (located in $ORACLE_HOME/md/property_graph/pgx/conf/pgx.conf):
{
"allow_user_auto_refresh": true
}
Parent topic: Using Automatic Delta Refresh to Handle Database Changes
3.9.2 Configuring Basic Auto-Refresh
Auto-refresh is configured in the loading section of the graph configuration. The example in this topic sets up auto-refresh to check for updates every minute, and to create a new snapshot when the data source has changed.
The following block (JSON format) enables the auto-refresh feature in the configuration file of the sample graph:
{
"format": "pg",
"jdbc_url": "jdbc:oracle:thin:@mydatabaseserver:1521/dbName",
"username": "scott",
"password": "<password>",
"name": "my_graph",
"vertex_props": [{
"name": "prop",
"type": "integer"
}],
"edge_props": [{
"name": "cost",
"type": "double"
}],
"separator": " ",
"loading": {
"auto_refresh": true,
"update_interval_sec": 60
}
}
Notice the additional loading section containing the auto-refresh settings. You can also use the Java APIs to construct the same graph configuration programmatically:
GraphConfig config = GraphConfigBuilder.forPropertyGraphRdbms()
.setJdbcUrl("jdbc:oracle:thin:@mydatabaseserver:1521/dbName")
.setUsername("scott")
.setPassword("<password>")
.setName("my_graph")
.addVertexProperty("prop", PropertyType.INTEGER)
.addEdgeProperty("cost", PropertyType.DOUBLE)
.setAutoRefresh(true)
.setUpdateIntervalSec(60)
.build();
Parent topic: Using Automatic Delta Refresh to Handle Database Changes
3.9.3 Reading the Graph Using the In-Memory Analyst or a Java Application
After creating the graph configuration, you can load the graph into the in-memory analyst using the regular APIs.
opg> G = session.readGraphWithProperties("graphs/my-config.pg.json")
After the graph is loaded, a background task is started automatically, and it periodically checks the data source for updates.
Parent topic: Using Automatic Delta Refresh to Handle Database Changes
3.9.4 Checking Out a Specific Snapshot of the Graph
The database is queried every minute for updates. If the graph has changed in the database after the time interval passed, the graph is reloaded and a new snapshot is created in-memory automatically.
You can "check out" (move a pointer to a different version of) the available in-memory snapshots of the graph using the getAvailableSnapshots() method of PgxSession. Example output is as follows:
opg> session.getAvailableSnapshots(G)
==> GraphMetaData [getNumVertices()=4, getNumEdges()=4, memoryMb=0, dataSourceVersion=1453315103000, creationRequestTimestamp=1453315122669 (2016-01-20 10:38:42.669), creationTimestamp=1453315122685 (2016-01-20 10:38:42.685), vertexIdType=integer, edgeIdType=long]
==> GraphMetaData [getNumVertices()=5, getNumEdges()=5, memoryMb=3, dataSourceVersion=1452083654000, creationRequestTimestamp=1453314938744 (2016-01-20 10:35:38.744), creationTimestamp=1453314938833 (2016-01-20 10:35:38.833), vertexIdType=integer, edgeIdType=long]
The preceding example output contains two entries, one for the originally loaded graph with 4 vertices and 4 edges, and one for the graph created by auto-refresh with 5 vertices and 5 edges.
To check out a specific snapshot of the graph, use the setSnapshot() method of PgxSession and give it the creationTimestamp of the snapshot you want to load.
For example, if G is pointing to the newer graph with 5 vertices and 5 edges, but you want to analyze the older version of the graph, you need to set the snapshot to 1453315122685. In the in-memory analyst shell:
opg> G.getNumVertices()
==> 5
opg> G.getNumEdges()
==> 5
opg> session.setSnapshot( G, 1453315122685 )
==> null
opg> G.getNumVertices()
==> 4
opg> G.getNumEdges()
==> 4
You can also load a specific snapshot of a graph directly using the readGraphAsOf() method of PgxSession. This is a shortcut for loading a graph with readGraphWithProperties() followed by a setSnapshot(). For example:
opg> G = session.readGraphAsOf( config, 1453315122685 )
If you do not know or care which snapshots are currently available in memory, you can instead specify how old a snapshot may acceptably be, by giving a maximum allowed age. For example, to specify a maximum snapshot age of 60 minutes, you can use the following:
opg> G = session.readGraphWithProperties( config, 60l, TimeUnit.MINUTES )
If there are one or more snapshots in memory younger (newer) than the specified maximum age, the youngest (newest) of those snapshots will be returned. If all the available snapshots are older than the specified maximum age, or if there is no snapshot available at all, then a new snapshot will be created automatically.
Parent topic: Using Automatic Delta Refresh to Handle Database Changes
3.9.5 Advanced Auto-Refresh Configuration
You can specify advanced options for auto-refresh configuration.
Internally, the in-memory analyst fetches the changes since the last check from the database and creates a new snapshot by applying the delta (changes) to the previous snapshot. There are two timers: one for fetching and caching the deltas from the database, the other for actually applying the deltas and creating a new snapshot.
Additionally, you can specify a threshold for the number of cached deltas. If the number of cached changes grows above this threshold, a new snapshot is created automatically. The number of cached changes is a simple sum of the number of vertex changes plus the number of edge changes.
The deltas are fetched periodically and cached on the in-memory analyst server for two reasons:
- To speed up the actual snapshot creation process
- To account for the case that the database can "forget" changes after a while
You can specify both a threshold and an update timer, in which case both conditions are checked before a new snapshot is created. At least one of these parameters (threshold or update timer) must be specified to prevent the delta cache from becoming too large. The interval at which the source is queried for changes must not be omitted.
The following parameters show a configuration where the data source is queried for new deltas every 5 minutes. New snapshots are created every 20 minutes or if the cached deltas reach a size of 1000 changes.
{
"format": "pg",
"jdbc_url": "jdbc:oracle:thin:@mydatabaseserver:1521/dbName",
"username": "scott",
"password": "<your_password>",
"name": "my_graph",
"loading": {
"auto_refresh": true,
"fetch_interval_sec": 300,
"update_interval_sec": 1200,
"update_threshold": 1000,
"create_edge_id_index": true,
"create_edge_id_mapping": true
}
}
Parent topic: Using Automatic Delta Refresh to Handle Database Changes
3.10 Starting the In-Memory Analyst Server
A preconfigured version of Apache Tomcat is bundled, which allows you to start the in-memory analyst server by running a script.
If you need to configure the server before starting it, see Configuring the In-Memory Analyst Server.
You can start the server by running the following script: /opt/oracle/graph/pgx/bin/start-server
Note that running the start-server
script does not start the server as a daemon, and the terminal will not return until you stop the server (for example, by pressing Ctrl+C to interrupt the process). This also means that the server will stop running if you close the terminal in which you started the script.
PGX is integrated with systemd
to run it as a Linux service in the background. To start the PGX server as a daemon process, use the following command (you must have root privileges):
systemctl start pgx
To stop the server, use:
systemctl stop pgx
If the server does not start up, you can see if there are any errors by running:
journalctl -u pgx.service
For more information about how to interact with systemd
on Oracle Linux, see the Oracle Linux administrator's documentation.
3.10.1 Configuring the In-Memory Analyst Server
You can configure the in-memory analyst server by modifying the /etc/oracle/graph/server.conf
file. The following table shows the valid configuration options, which can be specified in JSON format.
Table 3-3 Configuration Options for In-Memory Analyst Server
Option | Type | Description | Default
---|---|---|---
authorization | string | File that maps clients to roles for authorization. | server.auth.conf
ca_certs | array of string | List of trusted certificates (PEM format). If 'enable_tls' is set to false, this option has no effect. | [See information after this table.]
enable_client_authentication | boolean | If true, the client is authenticated during the TLS handshake. See the TLS protocol for details. This flag has no effect if 'enable_tls' is false. | true
enable_tls | boolean | If true, the server enables transport layer security (TLS). | true
port | integer | Port that the PGX server should listen on. | 7007
server_cert | string | The path to the server certificate to be presented to TLS clients (PEM format). If 'enable_tls' is set to false, this option has no effect. | null
server_private_key | string | The private key of the server (PKCS#8, PEM format). If 'enable_tls' is set to false, this option has no effect. | null
The in-memory analyst web server enables two-way SSL/TLS (Transport Layer Security) by default. The server enforces TLS 1.2 and disables certain cipher suites known to be vulnerable to attacks. Upon a TLS handshake, both the server and the client present certificates to each other, which are used to validate the authenticity of the other party. Client certificates are also used to authorize client applications.
The following is an example server.conf
configuration file:
{
  "port": 7007,
  "server_cert": "certificates/server_certificate.pem",
  "server_private_key": "certificates/server_key.pem",
  "ca_certs": [ "certificates/ca_certificate.pem" ],
  "authorization": "auth/server.auth.conf",
  "enable_tls": true,
  "enable_client_authentication": true
}
The following is an example server.auth.conf configuration file, which maps clients (applications), identified by their certificate DN string, to roles:
{
  "authorization": [{
    "dn": "CN=Client, OU=Development, O=Oracle, L=Belmont, ST=California, C=US",
    "admin": false
  }, {
    "dn": "CN=Admin, OU=Development, O=Oracle, L=Belmont, ST=California, C=US",
    "admin": true
  }]
}
You can turn off client-side authentication or SSL/TLS authentication entirely in the server configuration. However, we recommend having two-way SSL/TLS enabled for any production usage.
Parent topic: Starting the In-Memory Analyst Server
3.11 Deploying to Apache Tomcat
The example in this topic shows how to deploy the graph server as a web application with Apache Tomcat.
The graph server will work with Apache Tomcat 9.0.x and higher.
- Download the Oracle Graph Webapps zip file from Oracle Software Delivery Cloud. This file contains ready-to-deploy Java web application archives (.war files). The file name will be similar to this:
oracle-graph-webapps-<version>.zip
- Unzip the file into a directory of your choice.
- Locate the .war file for Tomcat. It follows the naming pattern:
graph-server-<version>-pgx<version>-tomcat.war
- Configure the graph server.
- Modify authentication and other server settings by modifying the
WEB-INF/classes/pgx.conf
file inside the web application archive. - Optionally, change logging settings by modifying the
WEB-INF/classes/log4j2.xml
file inside the web application archive. - Optionally, change other servlet specific deployment descriptors by modifying the
WEB-INF/web.xml
file inside the web application archive.
- Modify authentication and other server settings by modifying the
- Copy the .war file into the Tomcat webapps directory. For example:
cp graph-server-<version>-pgx<version>-tomcat.war $CATALINA_HOME/webapps/pgx.war
- Configure Tomcat-specific settings, like the correct use of TLS/encryption.
- Ensure that port 8080 is not already in use.
- Start Tomcat:
cd $CATALINA_HOME
./bin/startup.sh
The graph server will now listen on localhost:8080/pgx. You can connect to the server from JShell by running the following command:
$ <client_install_dir>/bin/opg-jshell --base_url https://localhost:8080/pgx -u <graphuser>
Parent topic: Using the In-Memory Graph Server (PGX)
3.11.1 About the Authentication Mechanism
The in-memory analyst web deployment uses BASIC Auth
by default. You should change to a more secure authentication mechanism for a production deployment.
To change the authentication mechanism, modify the security-constraint
element of the web.xml
deployment descriptor in the web application archive (WAR) file.
Parent topic: Deploying to Apache Tomcat
3.12 Deploying to Oracle WebLogic Server
The example in this topic shows how to deploy the graph server as a web application with Oracle WebLogic Server. The graph server supports WebLogic Server versions 12.1.x and 12.2.x.
- Download the Oracle Graph Webapps zip file from Oracle Software Delivery Cloud. This file contains ready-to-deploy Java web application archives (.war files). The file name will be similar to this:
oracle-graph-webapps-<version>.zip
- Unzip the file into a directory of your choice.
- Locate the .war file for Weblogic server.
- For Weblogic Server version 12.1.x, use this web application archive:
graph-server-<version>-pgx<version>-wls121x.war
- For Weblogic Server version 12.2.x, use this web application archive:
graph-server-<version>-pgx<version>-wls122x.war
- For Weblogic Server version 12.1.x, use this web application archive:
- Configure the graph server.
- Modify authentication and other server settings by modifying the WEB-INF/classes/pgx.conf file inside the web application archive.
- Optionally, change logging settings by modifying the WEB-INF/classes/log4j2.xml file inside the web application archive.
- Optionally, change other servlet-specific deployment descriptors by modifying the WEB-INF/web.xml file inside the web application archive.
- Optionally, change WebLogic Server-specific deployment descriptors by modifying the WEB-INF/weblogic.xml file inside the web application archive.
- Configure WebLogic-specific settings, such as the correct use of TLS/encryption.
- Deploy the .war file to WebLogic Server. The following example shows how to do this from the command line:

. $MW_HOME/user_projects/domains/mydomain/bin/setDomainEnv.sh
. $MW_HOME/wlserver/server/bin/setWLSEnv.sh
java weblogic.Deployer -adminurl http://localhost:7001 -username <username> -password <password> -deploy -source <path-to-war-file>
3.12.1 Installing Oracle WebLogic Server
To download and install the latest version of Oracle WebLogic Server, see http://www.oracle.com/technetwork/middleware/weblogic/documentation/index.html.
Parent topic: Deploying to Oracle WebLogic Server
3.13 Connecting to the In-Memory Analyst Server
After the property graph in-memory analyst is installed on a computer running Oracle Database, or on a client system without Oracle Database server software as a web application on Apache Tomcat or Oracle WebLogic Server, you can connect to the in-memory analyst server.
Parent topic: Using the In-Memory Graph Server (PGX)
3.13.1 Connecting with the In-Memory Analyst Shell
The simplest way to connect to an in-memory analyst instance is to specify the base URL of the server. The following base URL can connect the SCOTT user to the local instance listening on port 8080:
http://scott:<password>@localhost:8080/pgx
To start the in-memory analyst shell with this base URL, use the --base_url command line argument:

cd $PGX_HOME
./bin/opg-jshell --base_url http://scott:<password>@localhost:8080/pgx
You can connect to a remote instance the same way. However, the in-memory analyst currently does not provide remote support for the Control API.
3.13.1.1 About Logging HTTP Requests
The in-memory analyst shell suppresses all debugging messages by default. To see which HTTP requests are executed, set the log level for oracle.pgx to DEBUG, as shown in this example:
opg> /loglevel oracle.pgx DEBUG
===> log level of oracle.pgx logger set to DEBUG
opg> session.readGraphWithProperties("sample_http.adj.json", "sample")
10:24:25,056 [main] DEBUG RemoteUtils - Requesting POST http://scott:<password>@localhost:8080/pgx/core/session/session-shell-6nqg5dd/graph HTTP/1.1 with payload {"graphName":"sample","graphConfig":{"uri":"http://path.to.some.server/pgx/sample.adj","separator":" ","edge_props":[{"type":"double","name":"cost"}],"node_props":[{"type":"integer","name":"prop"}],"format":"adj_list"}}
10:24:25,088 [main] DEBUG RemoteUtils - received HTTP status 201
10:24:25,089 [main] DEBUG RemoteUtils - {"futureId":"87d54bed-bdf9-4601-98b7-ef632ce31463"}
10:24:25,091 [pool-1-thread-3] DEBUG PgxRemoteFuture$1 - Requesting GET http://scott:<password>@localhost:8080/pgx/future/session/session-shell-6nqg5dd/result/87d54bed-bdf9-4601-98b7-ef632ce31463 HTTP/1.1
10:24:25,300 [pool-1-thread-3] DEBUG RemoteUtils - received HTTP status 200
10:24:25,301 [pool-1-thread-3] DEBUG RemoteUtils - {"stats":{"loadingTimeMillis":0,"estimatedMemoryMegabytes":0,"numEdges":4,"numNodes":4},"graphName":"sample","nodeProperties":{"prop":"integer"},"edgeProperties":{"cost":"double"}}
This example requires that the graph URI points to a file that the in-memory analyst server can access using HTTP or HDFS.
Parent topic: Connecting with the In-Memory Analyst Shell
3.13.2 Connecting with Java
You can specify the base URL when you initialize the in-memory analyst using Java. In the following example, a URL to an in-memory analyst server is provided to the getInMemAnalyst API call.
import oracle.pg.rdbms.*;
import oracle.pgx.api.*;

PgRdbmsGraphConfig cfg = GraphConfigBuilder.forPropertyGraphRdbms()
    .setJdbcUrl("jdbc:oracle:thin:@127.0.0.1:1521:orcl")
    .setUsername("scott").setPassword("<password>")
    .setName("mygraph")
    .setMaxNumConnections(2)
    .setLoadEdgeLabel(false)
    .addVertexProperty("name", PropertyType.STRING, "default_name")
    .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")
    .build();
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(cfg);
ServerInstance remoteInstance = Pgx.getInstance("http://scott:<password>@hostname:port/pgx");
PgxSession session = remoteInstance.createSession("my-session");
PgxGraph graph = session.readGraphWithProperties(opg.getConfig());
Parent topic: Connecting to the In-Memory Analyst Server
3.13.3 Connecting with the PGX REST API
You can connect to an in-memory analyst instance using the PGX REST API endpoints. This enables you to interact with the in-memory analyst from a language other than Java and to implement your own client.
The examples in this topic assume that:
- You are on Linux with curl installed. (curl is a simple command-line utility for interacting with REST endpoints.)
- The PGX server is up and running on http://localhost:7007.
- The PGX server has authentication/authorization disabled; that is, $ORACLE_HOME/md/property_graph/pgx/conf/server.conf contains "enable_tls": false. (This is a non-default setting and is not recommended for production.)
- PGX allows reading graphs from the local file system; that is, $ORACLE_HOME/md/property_graph/pgx/conf/pgx.conf contains "allow_local_filesystem": true. (This is a non-default setting and is not recommended for production.)
You can see a full list of supported endpoints as a Swagger (JSON) specification by opening http://localhost:7007/swagger.json in your browser.
- Step 1: Obtain a CSRF token
- Step 2: Create a session
- Step 3: Read a graph
- Step 4: Create a property
- Step 5: Run the PageRank algorithm on the loaded graph
- Step 6: Execute a PGQL query
Step 1: Obtain a CSRF token
Request a CSRF token:
curl -v http://localhost:7007/token
The response will look like this:
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 7007 (#0)
> GET /token HTTP/1.1
> Host: localhost:7007
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 201
< SET-COOKIE: _csrf_token=9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0;Version=1; HttpOnly
< Content-Length: 0
As you can see in the response, this sets a cookie _csrf_token to a token value; 9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0 is used as an example token in the following requests. For any write request, the PGX server requires the same token to be present in both the cookie and the payload.
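This double-submit pattern means a client must read the token out of the Set-Cookie header value and echo it back in the request payload. The following is a minimal sketch of extracting a named cookie value in plain Java (the class and helper names are illustrative, not a PGX API):

```java
public class CsrfCookieParser {
    // Extracts the value of a named cookie from a Set-Cookie header value,
    // e.g. "_csrf_token=9bf5...;Version=1; HttpOnly" yields "9bf5...".
    static String cookieValue(String setCookieHeader, String name) {
        for (String part : setCookieHeader.split(";")) {
            String trimmed = part.trim();
            if (trimmed.startsWith(name + "=")) {
                return trimmed.substring(name.length() + 1);
            }
        }
        return null; // cookie not present in this header
    }

    public static void main(String[] args) {
        String header = "_csrf_token=9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0;Version=1; HttpOnly";
        System.out.println(cookieValue(header, "_csrf_token"));
        // → 9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0
    }
}
```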
Step 2: Create a session
To create a new session, send a JSON payload:
curl -v --cookie '_csrf_token=9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0' -H 'content-type: application/json' -X POST http://localhost:7007/core/v1/sessions -d '{"source":"my-application", "idleTimeout":0, "taskTimeout":0, "timeUnitName":"MILLISECONDS", "_csrf_token":"9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0"}'
Replace my-application with a value describing the application that you are running. Server administrators can use this value to map sessions to their applications. Setting the idle and task timeouts to 0 means the server determines when the session and submitted tasks time out. You must provide the same CSRF token in both the cookie header and the JSON payload.
The response will look similar to the following:
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 7007 (#0)
> POST /core/v1/sessions HTTP/1.1
> Host: localhost:7007
> User-Agent: curl/7.47.0
> Accept: */*
> Cookie: _csrf_token=9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0
> content-type: application/json
> Content-Length: 159
>
* upload completely sent off: 159 out of 159 bytes
< HTTP/1.1 201
< SET-COOKIE: SID=abae2811-6dd2-48b0-93a8-8436e078907d;Version=1; HttpOnly
< Content-Length: 0
The response sets a cookie to the session ID value that was created. Session ID abae2811-6dd2-48b0-93a8-8436e078907d is used as an example in subsequent requests.
Step 3: Read a graph
Note:
If you want to analyze a pre-loaded graph, or a graph that is already published by another session, you can skip this step. All you need to access pre-loaded or published graphs is the name of the graph.
To read a graph, send the graph configuration as JSON to the server as shown in the following example (replace <graph-config> with the JSON representation of an actual PGX graph config):
curl -v -X POST --cookie '_csrf_token=9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0;SID=abae2811-6dd2-48b0-93a8-8436e078907d' http://localhost:7007/core/v1/loadGraph -H 'content-type: application/json' -d '{"graphConfig":<graph-config>,"graphName":null,"csrf_token":"9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0"}'
Here is an example of a graph config that reads a property graph from the Oracle database:
{
"format": "pg",
"db_engine": "RDBMS",
"jdbc_url":"jdbc:oracle:thin:@127.0.0.1:1521:orcl122",
"username":"scott",
"password":"tiger",
"max_num_connections": 8,
"name": "connections",
"vertex_props": [
{"name":"name", "type":"string"},
{"name":"role", "type":"string"},
{"name":"occupation", "type":"string"},
{"name":"country", "type":"string"},
{"name":"political party", "type":"string"},
{"name":"religion", "type":"string"}
],
"edge_props": [
{"name":"weight", "type":"double", "default":"1"}
],
"edge_label": true,
"loading": {
"load_edge_label": true
}
}
Passing "graphName": null tells the server to generate a name.
The server replies with something like the following:
* upload completely sent off: 315 out of 315 bytes
< HTTP/1.1 202
< Location: http://localhost:7007/core/v1/futures/8a46ef65-01a9-4bd0-87d3-ffe9dfd2ce3c/status
< Content-Type: application/json;charset=utf-8
< Content-Length: 51
< Date: Mon, 05 Nov 2018 17:22:22 GMT
<
* Connection #0 to host localhost left intact
{"futureId":"8a46ef65-01a9-4bd0-87d3-ffe9dfd2ce3c"}
About Asynchronous Requests
Most of the PGX REST endpoints are asynchronous. Instead of keeping the connection open until the result is ready, the PGX server submits a task and immediately returns a future ID (with status code 202), which the client can then use to periodically request the status of the task, or to request the result value once the task is done.
From the preceding response, you can request the future status like this:
curl -v --cookie 'SID=abae2811-6dd2-48b0-93a8-8436e078907d' http://localhost:7007/core/v1/futures/8a46ef65-01a9-4bd0-87d3-ffe9dfd2ce3c/status
This returns something like:
< HTTP/1.1 200
< Content-Type: application/json;charset=utf-8
< Content-Length: 730
< Date: Mon, 05 Nov 2018 17:35:19 GMT
<
* Connection #0 to host localhost left intact
{"id":"eb17f75b-e4c1-4a66-81a0-4ff0f8b4cb92","links":[{"href":"http://localhost:7007/core/v1/futures/eb17f75b-e4c1-4a66-81a0-4ff0f8b4cb92/status","rel":"self","method":"GET","interaction":["async-polling"]},{"href":"http://localhost:7007/core/v1/futures/eb17f75b-e4c1-4a66-81a0-4ff0f8b4cb92","rel":"abort","method":"DELETE","interaction":["async-polling"]},{"href":"http://localhost:7007/core/v1/futures/eb17f75b-e4c1-4a66-81a0-4ff0f8b4cb92/status","rel":"canonical","method":"GET","interaction":["async-polling"]},{"href":"http://localhost:7007/core/v1/futures/eb17f75b-e4c1-4a66-81a0-4ff0f8b4cb92/value","rel":"related","method":"GET","interaction":["async-polling"]}],"progress":"succeeded","completed":true,"intervalToPoll":1}
Besides the status (succeeded in this case), this output also includes links to cancel the task (DELETE) and to retrieve the result of the task once completed (GET <future-id>/value):
curl -X GET --cookie 'SID=abae2811-6dd2-48b0-93a8-8436e078907d' http://localhost:7007/core/v1/futures/cdc15a38-3422-42a1-baf4-343c140cf95d/value
This returns details about the loaded graph, including the name that was generated by the server (sample):
{"id":"sample","links":[{"href":"http://localhost:7007/core/v1/graphs/sample","rel":"self","method":"GET","interaction":["async-polling"]},{"href":"http://localhost:7007/core/v1/graphs/sample","rel":"canonical","method":"GET","interaction":["async-polling"]}],"nodeProperties":{"prop1":{"id":"prop1","links":[{"href":"http://localhost:7007/core/v1/graphs/sample/properties/prop1","rel":"self","method":"GET","interaction":["async-polling"]},{"href":"http://localhost:7007/core/v1/graphs/sample/properties/prop1","rel":"canonical","method":"GET","interaction":["async-polling"]}],"dimension":0,"name":"prop1","entityType":"vertex","type":"integer","transient":false}},"vertexLabels":null,"edgeLabel":null,"metaData":{"id":null,"links":null,"numVertices":4,"numEdges":4,"memoryMb":0,"dataSourceVersion":"1536029578000","config":{"format":"adj_list","separator":" ","edge_props":[{"type":"double","name":"cost"}],"error_handling":{},"vertex_props":[{"type":"integer","name":"prop1"}],"vertex_uris":["PATH_TO_FILE"],"vertex_id_type":"integer","loading":{}},"creationRequestTimestamp":1541242100335,"creationTimestamp":1541242100774,"vertexIdType":"integer","edgeIdType":"long","directed":true},"graphName":"sample","edgeProperties":{"cost":{"id":"cost","links":[{"href":"http://localhost:7007/core/v1/graphs/sample/properties/cost","rel":"self","method":"GET","interaction":["async-polling"]},{"href":"http://localhost:7007/core/v1/graphs/sample/properties/cost","rel":"canonical","method":"GET","interaction":["async-polling"]}],"dimension":0,"name":"cost","entityType":"edge","type":"double","transient":false}},"ageMs":0,"transient":false}
For simplicity, the remaining steps omit the additional requests to request the status or value of asynchronous tasks.
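The status-polling loop described above can be sketched as follows. This is a plain-Java illustration, not a PGX API: the Supplier stands in for the HTTP GET against the <future-id>/status endpoint, and the class and method names are hypothetical.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class FuturePoller {
    // Polls a status source until it reports a terminal state, mirroring
    // the GET <future-id>/status pattern. In a real client, statusSource
    // would issue the HTTP request and return the "progress" field.
    static String pollUntilDone(Supplier<String> statusSource, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            String status = statusSource.get();
            if (status.equals("succeeded") || status.equals("failed")) {
                return status;
            }
            // In real code, wait "intervalToPoll" seconds (from the status
            // response) before the next poll instead of spinning.
        }
        throw new IllegalStateException("future did not complete in time");
    }

    public static void main(String[] args) {
        // Simulate a task that reports "running" twice, then "succeeded".
        Iterator<String> fake = List.of("running", "running", "succeeded").iterator();
        System.out.println(pollUntilDone(fake::next, 10));
        // → succeeded
    }
}
```

Once the loop observes "succeeded", the client fetches GET <future-id>/value; on "failed" it can inspect or abort via the DELETE link.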
Step 4: Create a property
Before you can run the PageRank algorithm on the loaded graph, you must create a vertex property of type DOUBLE on the graph, which can hold the computed ranking values:
curl -v -X POST --cookie '_csrf_token=9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0;SID=abae2811-6dd2-48b0-93a8-8436e078907d' http://localhost:7007/core/v1/graphs/sample/properties -H 'content-type: application/json' -d '{"entityType":"vertex","type":"double","name":"pagerank", "hardName":false,"dimension":0,"_csrf_token":"9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0"}'
Requesting the result of the returned future will return something like:
{"id":"pagerank","links":[{"href":"http://localhost:7007/core/v1/graphs/sample/properties/pagerank","rel":"self","method":"GET","interaction":["async-polling"]},{"href":"http://localhost:7007/core/v1/graphs/sample/properties/pagerank","rel":"canonical","method":"GET","interaction":["async-polling"]}],"dimension":0,"name":"pagerank","entityType":"vertex","type":"double","transient":true}
Step 5: Run the PageRank algorithm on the loaded graph
The following example shows how to run an algorithm (PageRank in this case). The algorithm ID is part of the URL, and the parameters to be passed into the algorithm are part of the JSON payload:
curl -v -X POST --cookie '_csrf_token=9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0;SID=abae2811-6dd2-48b0-93a8-8436e078907d' http://localhost:7007/core/v1/analyses/pgx_builtin_k1a_pagerank/run -H 'content-type: application/json' -d '{"args":[{"type":"GRAPH","value":"sample"},{"type":"DOUBLE_IN","value":0.001},{"type":"DOUBLE_IN","value":0.85},{"type":"INT_IN","value":100},{"type":"BOOL_IN","value":true},{"type":"NODE_PROPERTY","value":"pagerank"}],"expectedReturnType":"void","workloadCharacteristics":["PARALLELISM.PARALLEL"],"_csrf_token":"9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0"}'
Once the future is completed, the result will look something like this:
{"success":true,"canceled":false,"exception":null,"returnValue":null,"executionTimeMs":50}
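For intuition about what pgx_builtin_k1a_pagerank computes with the arguments above (tolerance 0.001, damping 0.85, at most 100 iterations), here is a minimal power-iteration PageRank sketch in plain Java. This is an illustration only, not the PGX implementation; the graph in main is a made-up 4-vertex example.

```java
public class PageRankSketch {
    // Minimal power-iteration PageRank over an out-adjacency list.
    // out[v] lists the destinations of v's outgoing edges.
    static double[] pageRank(int[][] out, double damping, double tol, int maxIter) {
        int n = out.length;
        double[] rank = new double[n];
        java.util.Arrays.fill(rank, 1.0 / n); // start from a uniform distribution
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[n];
            java.util.Arrays.fill(next, (1.0 - damping) / n); // teleport term
            for (int v = 0; v < n; v++) {
                for (int w : out[v]) {
                    // v spreads its rank evenly across its out-edges
                    next[w] += damping * rank[v] / out[v].length;
                }
            }
            double diff = 0;
            for (int v = 0; v < n; v++) diff += Math.abs(next[v] - rank[v]);
            rank = next;
            if (diff < tol) break; // converged within tolerance
        }
        return rank;
    }

    public static void main(String[] args) {
        // Example graph: 0->1, 1->2, 2->0, 2->3, 3->0 (no dangling vertices)
        int[][] out = { {1}, {2}, {0, 3}, {0} };
        double[] r = pageRank(out, 0.85, 0.001, 100);
        double sum = 0;
        for (double x : r) sum += x;
        System.out.printf("ranks sum to %.3f%n", sum);
    }
}
```

In PGX, the computed values land in the "pagerank" vertex property created in Step 4, which is why that property must exist before the algorithm runs.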
Step 6: Execute a PGQL query
To query the results of the PageRank algorithm, you can run a PGQL query as shown in the following example:
curl -v -X POST --cookie '_csrf_token=9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0;SID=abae2811-6dd2-48b0-93a8-8436e078907d' http://localhost:7007/core/v1/pgql/run -H 'content-type: application/json' -d '{"pgqlQuery":"SELECT x.pagerank MATCH (x) WHERE x.pagerank > 0","semantic":"HOMOMORPHISM", "schemaStrictnessMode":true, "graphName" : "sample", "_csrf_token":"9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0"}'
The result is a set of links you can use to interact with the result set of the query:
{"id":"pgql_1","links":[{"href":"http://localhost:7007/core/v1/pgqlProxies/pgql_1","rel":"self","method":"GET","interaction":["sync"]},{"href":"http://localhost:7007/core/v1/pgqlResultProxies/pgql_1/elements","rel":"related","method":"GET","interaction":["sync"]},{"href":"http://localhost:7007/core/v1/pgqlResultProxies/pgql_1/results","rel":"related","method":"GET","interaction":["sync"]},{"href":"http://localhost:7007/core/v1/pgqlProxies/pgql_1","rel":"canonical","method":"GET","interaction":["async-polling"]}],"exists":true,"graphName":"sample","resultSetId":"pgql_1","numResults":4}
To request the first 2048 elements of the result set, send:
curl -X GET --cookie 'SID=abae2811-6dd2-48b0-93a8-8436e078907d' http://localhost:7007/core/v1/pgqlProxies/pgql_1/results?size=2048
The response looks something like this:
{"id":"/pgx/core/v1/pgqlProxies/pgql_1/results","links":[{"href":"http://localhost:7007/core/v1/pgqlProxies/pgql_1/results","rel":"self","method":"GET","interaction":["sync"]},{"href":"http://localhost:7007/core/v1/pgqlProxies/pgql_1/results","rel":"canonical","method":"GET","interaction":["async-polling"]}],"count":4,"totalItems":4,"items":[[0.3081206521195582],[0.21367103988538017],[0.21367103988538017],[0.2645372681096815]],"hasMore":false,"offset":0,"limit":4,"showTotalResults":true}
Parent topic: Connecting to the In-Memory Analyst Server
3.14 Managing Property Graph Snapshots
You can manage property graph snapshots.
Note:
Managing property graph snapshots is intended for advanced users.
You can persist different versions of a property graph as binary snapshots in the database. The binary snapshots represent a subgraph of graph data computed at runtime that may be needed for a future use. The snapshots can be read back later as input for the in-memory analytics, or as an output stream that can be used by the parallel property graph data loader.
You can store binary snapshots in the <graph_name>SS$ table of the property graph using the Java API OraclePropertyGraphUtils.storeBinaryInMemGraphSnapshot. This operation requires a connection to the Oracle database holding the property graph instance, the name of the graph and its owner, the ID of the snapshot, and an input stream from which the binary snapshot can be read. You can also specify the time stamp of the snapshot and the degree of parallelism to be used when storing the snapshot in the table.
You can read a stored binary snapshot using OraclePropertyGraphUtils.readBinaryInMemGraphSnapshot. This operation requires a connection to the Oracle database holding the property graph instance, the name of the graph and its owner, the ID of the snapshot to read, and an output stream to which the binary snapshot will be written. You can also specify the degree of parallelism to be used when reading the snapshot binary file from the table.
The following code snippet creates a property graph from the data file in Oracle Flat-file format, adds a new vertex, and exports the graph into an output stream using GraphML format. This output stream represents a binary file snapshot, and it is stored in the property graph snapshot table. Finally, this example reads back the file from the snapshot table and creates a second graph from its contents.
String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args, szGraphName);
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, 2 /* dop */, 1000, true,
               "PDML=T,PDDL=T,NO_DUP=T,");
// Add a new vertex
Vertex v = opg.addVertex(Long.valueOf("1000"));
v.setProperty("name", "Alice");
opg.commit();
System.out.println("Graph " + szGraphName + " total vertices: " +
    opg.countVertices(dop));
System.out.println("Graph " + szGraphName + " total edges: " +
    opg.countEdges(dop));
// Get a snapshot of the current graph as a file in graphML format.
OutputStream os = new ByteArrayOutputStream();
OraclePropertyGraphUtils.exportGraphML(opg,
os /* output stream */,
System.out /* stream to show progress */);
// Save the snapshot into the SS$ table
InputStream is = new ByteArrayInputStream(os.toByteArray());
OraclePropertyGraphUtils.storeBinaryInMemGraphSnapshot(szGraphName,
szGraphOwner /* owner of the
property graph */,
conn /* database connection */,
is,
(long) 1 /* snapshot ID */,
1 /* dop */);
os.close();
is.close();
// Read the snapshot back from the SS$ table
OutputStream snapshotOS = new ByteArrayOutputStream();
OraclePropertyGraphUtils.readBinaryInMemGraphSnapshot(szGraphName,
szGraphOwner /* owner of the
property graph */,
conn /* database connection */,
new OutputStream[] {snapshotOS},
(long) 1 /* snapshot ID */,
1 /* dop */);
InputStream snapshotIS = new ByteArrayInputStream(snapshotOS.toByteArray());
String szGraphNameSnapshot = szGraphName + "_snap";
OraclePropertyGraph opgSnap = OraclePropertyGraph.getInstance(args, szGraphNameSnapshot);
OraclePropertyGraphUtils.importGraphML(opgSnap,
                                       snapshotIS /* input stream */,
                                       System.out /* stream to show progress */);
snapshotOS.close();
snapshotIS.close();
System.out.println("Graph " + szGraphNameSnapshot + " total vertices: " +
    opgSnap.countVertices(dop));
System.out.println("Graph " + szGraphNameSnapshot + " total edges: " +
    opgSnap.countEdges(dop));
The preceding example produces output similar to the following:
Graph test total vertices: 79
Graph test total edges: 164
Graph test_snap total vertices: 79
Graph test_snap total edges: 164
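The example above hands the snapshot between APIs through an in-memory byte buffer: exportGraphML writes to an OutputStream, and storeBinaryInMemGraphSnapshot later reads the same bytes from an InputStream. The following self-contained sketch (plain JDK, no PGX dependency; the payload string is a placeholder) isolates that OutputStream-to-InputStream handoff:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamHandoff {
    // Writes a payload into an in-memory buffer, then reads it back through
    // an InputStream over the same bytes -- the pattern used to pass the
    // GraphML snapshot from the exporter to the snapshot store.
    static String roundTrip(String payload) {
        try {
            ByteArrayOutputStream os = new ByteArrayOutputStream();
            os.write(payload.getBytes(StandardCharsets.UTF_8)); // producer side
            InputStream is = new ByteArrayInputStream(os.toByteArray());
            return new String(is.readAllBytes(), StandardCharsets.UTF_8); // consumer side
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen for in-memory streams
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<graphml>...</graphml>"));
        // → <graphml>...</graphml>
    }
}
```

For large graphs, a piped or file-backed stream would avoid holding the whole snapshot in memory, but the byte-array approach keeps the example simple.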
Parent topic: Using the In-Memory Graph Server (PGX)
3.15 User-Defined Functions (UDFs) in PGX
User-defined functions (UDFs) allow users of PGX to add custom logic to their PGQL queries or custom graph algorithms, to complement built-in functions with custom requirements.
Caution:
UDFs enable running arbitrary code in the PGX server, possibly accessing sensitive data. Additionally, any PGX session can invoke any of the UDFs that are enabled on the PGX server. The application administrator who enables UDFs is responsible for checking the following:
- All the UDF code can be trusted.
- The UDFs are stored in a secure location that cannot be tampered with.
How to Use UDFs
The following simple example shows how to register a UDF at the PGX server and invoke it.
- Create a class with a public static method. For example:
package my.udfs;

public class MyUdfs {
    public static String concat(String a, String b) {
        return a + b;
    }
}
- Compile the class and compress into a JAR file. For example:
mkdir ./target
javac -d ./target *.java
cd target
jar cvf MyUdfs.jar *
- Copy the JAR file into /opt/oracle/graph/pgx/server/lib.
- Create a UDF JSON configuration file. For example, assume that /path/to/my/udfs/dir/my_udfs.json contains the following (implementation_reference is the fully qualified class name, my.udfs.MyUdfs in this example):

{
  "user_defined_functions": [
    {
      "namespace": "my",
      "language": "java",
      "implementation_reference": "my.udfs.MyUdfs",
      "function_name": "concat",
      "return_type": "string",
      "arguments": [
        { "name": "a", "type": "string" },
        { "name": "b", "type": "string" }
      ]
    }
  ]
}
- Point to the directory containing the UDF configuration file in /etc/oracle/graph/pgx.conf. For example:

"udf_config_directory": "/path/to/my/udfs/dir/"
- Restart the PGX server. For example:
sudo systemctl restart pgx
- Try to invoke the UDF from within a PGQL query. For example:
graph.queryPgql("SELECT my.concat(my.concat(n.firstName, ' '), n.lastName) FROM MATCH (n:Person)")
- Try to invoke the UDF from within a PGX algorithm. For example:
import oracle.pgx.algorithm.annotations.Udf;
....

@GraphAlgorithm
public class MyAlgorithm {
    public void bomAlgorithm(PgxGraph g, VertexProperty<String> firstName,
            VertexProperty<String> lastName, @Out VertexProperty<String> fullName) {
        ...
        fullName.set(v, concat(firstName.get(v), lastName.get(v)));
        ...
    }

    @Udf(namespace = "my")
    abstract String concat(String a, String b);
}
UDF Configuration File Information
A UDF configuration file is a JSON file containing an array of user_defined_functions
. (An example of such a file is in the step to "Create a UDF JSON configuration file" in the preceding "How to Use UDFs" subsection.)
Each user-defined function supports the fields shown in the following table.
Table 3-4 Fields for Each UDF
Field | Data Type | Description | Required? |
---|---|---|---|
function_name | string | Name of the function used as identifier in PGX | Required |
language | enum[java, javascript] | Source language for the function (java or javascript) | Required |
return_type | enum[boolean, integer, long, float, double, string] | Return type of the function | Required |
arguments | array of object | Array of arguments. For each argument: type, argument name, required? | [] |
implementation_reference | string | Reference to the function name on the classpath | null |
namespace | string | Namespace of the function in PGX | null |
source_function_name | string | Name of the function in the source language | null |
source_location | string | Local file path to the function's source code | null |
All configured UDFs must be unique with regard to the combination of the following fields:
- namespace
- function_name
- arguments
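In other words, a UDF's identity is the triple (namespace, function_name, argument types): two UDFs may share a name as long as their argument lists differ. The following sketch (hypothetical helper, not a PGX API) illustrates that rule by building an identity key and using a set to detect collisions:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UdfRegistry {
    // Illustrative identity key for a UDF: namespace + function_name +
    // argument types. Two UDFs with equal keys would violate uniqueness.
    static String key(String namespace, String functionName, List<String> argTypes) {
        return namespace + "::" + functionName + "(" + String.join(",", argTypes) + ")";
    }

    public static void main(String[] args) {
        Set<String> registered = new HashSet<>();
        // my.concat(string, string) registers once...
        System.out.println(registered.add(key("my", "concat", List.of("string", "string"))));
        // ...a second UDF with the same namespace, name and argument types collides...
        System.out.println(registered.add(key("my", "concat", List.of("string", "string"))));
        // ...but the same name with different argument types is a distinct UDF.
        System.out.println(registered.add(key("my", "concat", List.of("string", "integer"))));
        // → true / false / true
    }
}
```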
Parent topic: Using the In-Memory Graph Server (PGX)