3 Using the In-Memory Graph Server (PGX)

The in-memory Graph server of Oracle Graph supports a set of analytical functions.

This chapter provides examples using the in-memory Graph Server (also referred to as Property Graph In-Memory Analytics, and often abbreviated as PGX in the Javadoc, command line, path descriptions, error messages, and examples). It contains the following major topics.

3.1 Overview of the In-Memory Graph Server (PGX)

The In-Memory Graph Server (PGX) is an in-memory graph server for fast, parallel graph query and analytics. The server uses light-weight in-memory data structures to enable fast execution of graph algorithms.

There are multiple options to load a graph into the graph server, either from Oracle Database or from files.

The graph server can be deployed standalone (it includes an embedded Apache Tomcat instance), or deployed in Oracle WebLogic Server or Apache Tomcat.

3.1.1 Connecting to the In-Memory Graph Server (PGX)

Multiple graph clients can connect to the in-memory graph server at the same time. Client requests are processed by the graph server asynchronously: requests are queued first and processed later, when resources are available. The client can poll the server to check whether a request has finished. Internally, the server maintains its own engine (thread pools) for running parallel graph algorithms. The engine tries to process each analytics request concurrently with as many threads as possible.

Each client has its own private workspace, called session. Sessions are isolated from each other. Each client can load a graph instance into its own session, independently from other clients.

Note:

  • If multiple clients load the same graph instance, the graph server can share one graph instance between multiple clients under the hood.
  • Each client can add additional vertex or edge properties to a loaded graph in its own session. Such properties are transient: they are private to each session and not visible to other sessions. If a client creates a mutated version of a graph, the graph server creates a private graph instance for that client.

3.2 User Authentication and Authorization

The Oracle Graph server (PGX) uses an Oracle Database as identity manager.

This means that you log into the graph server using existing Oracle Database credentials (user name and password), and the actions which you are allowed to do on the graph server are determined by the roles that have been granted to you in the Oracle database.

Basic Steps for Using an Oracle Database for Authentication

  1. Use an Oracle Database version that is supported by Oracle Graph Server and Client: version 12.2 or later, including Autonomous Database.
  2. Be sure that you have ADMIN access (or SYSDBA access for non-autonomous databases) to grant and revoke user access to the graph server (PGX).
  3. Be sure that all existing users to whom you plan to grant access to the graph server have at least the CREATE SESSION privilege.
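For example, to grant the CREATE SESSION privilege to an existing user (the user name graphuser is illustrative):

```sql
GRANT CREATE SESSION TO graphuser;
```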
  4. Be sure that the database is accessible via JDBC from the host where the Graph Server runs.
  5. As ADMIN (or SYSDBA on non-autonomous databases), run the following procedure to create the roles required by the graph server:

    Note:

    If you install the PL/SQL packages of the Oracle Graph Server and Client distribution on the target Oracle Database, this step is not necessary. All the roles shown in the following code are created as part of the PL/SQL installation automatically. You cannot install the PL/SQL packages on Autonomous Database, so if you use the graph server with Autonomous Database, it is recommended to execute the following statements using SQL Developer Web.
    
    DECLARE
      PRAGMA AUTONOMOUS_TRANSACTION;
      role_exists EXCEPTION;
      PRAGMA EXCEPTION_INIT(role_exists, -01921);
      TYPE graph_roles_table IS TABLE OF VARCHAR2(50);
      graph_roles graph_roles_table;
    BEGIN
      graph_roles := graph_roles_table(
        'GRAPH_DEVELOPER',
        'GRAPH_ADMINISTRATOR',
        'PGX_SESSION_CREATE',
        'PGX_SERVER_GET_INFO',
        'PGX_SERVER_MANAGE',
        'PGX_SESSION_READ_MODEL',
        'PGX_SESSION_MODIFY_MODEL',
        'PGX_SESSION_NEW_GRAPH',
        'PGX_SESSION_GET_PUBLISHED_GRAPH',
        'PGX_SESSION_COMPILE_ALGORITHM',
        'PGX_SESSION_ADD_PUBLISHED_GRAPH');
      FOR elem IN 1 .. graph_roles.count LOOP
        BEGIN
          dbms_output.put_line('create_graph_roles: ' || elem || ': CREATE ROLE ' || graph_roles(elem));
          EXECUTE IMMEDIATE 'CREATE ROLE ' || graph_roles(elem);
        EXCEPTION
          WHEN role_exists THEN
            dbms_output.put_line('create_graph_roles: role already exists. continue');
          WHEN OTHERS THEN
            RAISE;
        END;
      END LOOP;
    EXCEPTION
      WHEN OTHERS THEN
        dbms_output.put_line('create_graph_roles: hit error ');
        RAISE;
    END;
    /
  6. Grant the individual PGX_ permissions to the roles GRAPH_DEVELOPER and GRAPH_ADMINISTRATOR, so that multiple permissions are grouped under a single role.

    Note:

    If you install the PL/SQL packages of the Oracle Graph Server and Client distribution on the target Oracle Database, this step is not necessary. All the grants shown in the following code are executed as part of the PL/SQL installation automatically. You cannot install the PL/SQL packages on Autonomous Database, so if you use the graph server with Autonomous Database, it is recommended to execute the following statements using SQL Developer Web.
    
    GRANT PGX_SESSION_CREATE TO GRAPH_ADMINISTRATOR;
    GRANT PGX_SERVER_GET_INFO TO GRAPH_ADMINISTRATOR;
    GRANT PGX_SERVER_MANAGE TO GRAPH_ADMINISTRATOR;
    GRANT PGX_SESSION_CREATE TO GRAPH_DEVELOPER;
    GRANT PGX_SESSION_NEW_GRAPH TO GRAPH_DEVELOPER;
    GRANT PGX_SESSION_GET_PUBLISHED_GRAPH TO GRAPH_DEVELOPER;
    GRANT PGX_SESSION_MODIFY_MODEL TO GRAPH_DEVELOPER;
    GRANT PGX_SESSION_READ_MODEL TO GRAPH_DEVELOPER;
  7. Assign roles to all the database developers who should have access to the graph server (PGX). For example:
    GRANT graph_developer TO <graphuser>

    where <graphuser> is a user in the database. You can also assign individual permissions (roles prefixed with PGX_) to users directly.

  8. Assign the administrator role to users who should have administrative access. For example:
    GRANT graph_administrator to <administratoruser>

    where <administratoruser> is a user in the database.

3.2.1 Prepare the Graph Server for Database Authentication

Locate the pgx.conf file of your installation.

If you installed the graph server via RPM, the file is located at: /etc/oracle/graph/pgx.conf

If you use the webapps package to deploy into Tomcat or WebLogic Server, the pgx.conf file is located inside the web application archive file (WAR file) at: WEB-INF/classes/pgx.conf

Tip: On Linux, you can use vim to edit the file directly inside the WAR file without unzipping it first. For example: vim pgx-webapp-21.1.0.war

Inside the pgx.conf file, locate the jdbc_url line of the realm options:

...
"pgx_realm": {
  "implementation": "oracle.pg.identity.DatabaseRealm",
  "options": {
    "jdbc_url": "<REPLACE-WITH-DATABASE-URL-TO-USE-FOR-AUTHENTICATION>",
    "token_expiration_seconds": 3600,
...

Replace the text with the JDBC URL pointing to your database that you configured in the previous step. For example:

...
"pgx_realm": {
  "implementation": "oracle.pg.identity.DatabaseRealm",
  "options": {
    "jdbc_url": "jdbc:oracle:thin:@myhost:1521/myservice",
    "token_expiration_seconds": 3600,
...

If you are using an Autonomous Database, specify the JDBC URL like this:

...
"pgx_realm": {
  "implementation": "oracle.pg.identity.DatabaseRealm",
  "options": {
    "jdbc_url": "jdbc:oracle:thin:@my_identifier_low?TNS_ADMIN=/etc/oracle/graph/wallet",
    "token_expiration_seconds": 3600,
...

where /etc/oracle/graph/wallet is an example path to the unzipped wallet file that you downloaded from your Autonomous Database service console, and my_identifier_low is one of the connect identifiers specified in /etc/oracle/graph/wallet/tnsnames.ora.

Now, start the graph server. If you installed via RPM, execute the following command as a root user or with sudo:

sudo systemctl start pgx

3.2.2 Connect to the Server from JShell with Database Authentication

You can use the JShell client to connect to the server in remote mode, using database authentication.

To connect to the server in remote mode:
./bin/opg-jshell --base_url https://localhost:7007 --username <database_user>

You will be prompted for the database password.

If you are using a Java client program, you can connect to the server as shown in the following example:
import oracle.pg.rdbms.*;
import oracle.pgx.api.*;
 
...
 
ServerInstance instance = GraphServer.getInstance("https://localhost:7007", "<database user>", "<database password>");
PgxSession session = instance.createSession("my-session");

...

Internally, users are authenticated with the graph server using JSON Web Tokens (JWT). See Token Expiration for more details about token expiration.
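A JWT consists of three base64url-encoded segments (header, payload, signature). As an illustration using only the JDK (this is not a graph server API, and the token below is fabricated for the demo), the following sketch decodes the payload of a token to inspect claims such as exp, the expiration time. It performs no signature verification:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class JwtPeek {

    // Decode the payload (second segment) of a JWT.
    // For inspection only: this does NOT verify the token's signature.
    public static String payload(String jwt) {
        String[] parts = jwt.split("\\.");
        byte[] decoded = Base64.getUrlDecoder().decode(parts[1]);
        return new String(decoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Build a fabricated, unsigned token for demonstration purposes
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String header = enc.encodeToString("{\"alg\":\"none\"}".getBytes(StandardCharsets.UTF_8));
        String claims = enc.encodeToString("{\"sub\":\"scott\",\"exp\":1700003600}".getBytes(StandardCharsets.UTF_8));
        String jwt = header + "." + claims + ".";
        System.out.println(payload(jwt)); // prints {"sub":"scott","exp":1700003600}
    }
}
```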

3.2.3 Read Data from the Database

Once logged in, you can read data from the database into the graph server without specifying any connection information in the graph configuration.

Your database user must exist and have read access on the graph data in the database.

For example, the following graph configuration will read a property graph named hr into memory, authenticating as <database user>/<database password> with the database:

GraphConfig config = GraphConfigBuilder.forPropertyGraphRdbms()
    .setName("hr")
    .addVertexProperty("FIRST_NAME", PropertyType.STRING)
    .addVertexProperty("LAST_NAME", PropertyType.STRING)
    .addVertexProperty("EMAIL", PropertyType.STRING)
    .addVertexProperty("CITY", PropertyType.STRING)
    .setPartitionWhileLoading(PartitionWhileLoading.BY_LABEL)
    .setLoadVertexLabels(true)
    .setLoadEdgeLabel(true)
    .build();
PgxGraph hr = session.readGraphWithProperties(config);

The following example is a graph configuration in JSON format that reads from relational tables into the graph server, without any connection information being provided in the configuration file itself:

{
    "name":"hr",
    "vertex_id_strategy":"no_ids",
    "vertex_providers":[
        {
            "name":"Employees",
            "format":"rdbms",
            "database_table_name":"EMPLOYEES",
            "key_column":"EMPLOYEE_ID",
            "key_type":"string",
            "props":[
                {
                    "name":"FIRST_NAME",
                    "type":"string"
                },
                {
                    "name":"LAST_NAME",
                    "type":"string"
                }
            ]
        },
        {
            "name":"Departments",
            "format":"rdbms",
            "database_table_name":"DEPARTMENTS",
            "key_column":"DEPARTMENT_ID",
            "key_type":"string",
            "props":[
                {
                    "name":"DEPARTMENT_NAME",
                    "type":"string"
                }
            ]
        }
    ],
    "edge_providers":[
        {
            "name":"WorksFor",
            "format":"rdbms",
            "database_table_name":"EMPLOYEES",
            "key_column":"EMPLOYEE_ID",
            "source_column":"EMPLOYEE_ID",
            "destination_column":"EMPLOYEE_ID",
            "source_vertex_provider":"Employees",
            "destination_vertex_provider":"Employees"
        },
        {
            "name":"WorksAs",
            "format":"rdbms",
            "database_table_name":"EMPLOYEES",
            "key_column":"EMPLOYEE_ID",
            "source_column":"EMPLOYEE_ID",
            "destination_column":"JOB_ID",
            "source_vertex_provider":"Employees",
            "destination_vertex_provider":"Jobs"
        }
    ]
}

For more information about how to read data from the database into the graph server, see Store the Database Password in a Keystore.

3.2.4 Store the Database Password in a Keystore

PGX requires a database account to read data from the database into memory. The account should be a low-privilege account (see Security Best Practices with Graph Data).

As described in Read Data from the Database, you can read data from the database into the graph server without specifying additional authentication, as long as the token is valid for that database user. But if you want to access a graph as a different database user, you can do so, provided that user's password is stored in a Java keystore file for protection.

You can use the keytool command that is bundled together with the JDK to generate such a keystore file on the command line. See the following script as an example:

# Add a password for the 'database1' connection
keytool -importpass -alias database1 -keystore keystore.p12
# 1. Enter the password for the keystore
# 2. Enter the password for the database
 
# Add another password (for the 'database2' connection)
keytool -importpass -alias database2 -keystore keystore.p12
 
# List what's in the keystore using the keytool
keytool -list -keystore keystore.p12

If you are using Java version 8 or lower, you should pass the additional parameter -storetype pkcs12 to the keytool commands in the preceding example.

You can store more than one password into a single keystore file. Each password can be referenced using the alias name provided.
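The keytool -importpass command stores each password as a password-based-encryption (PBE) secret-key entry in the PKCS12 file. As a sketch of what that means at the Java level (this is plain JDK code, not a graph server API; the file name, alias, and passwords are examples), the following program creates such a keystore programmatically and reads the stored password back:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.security.KeyStore;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class KeystoreDemo {
    public static void main(String[] args) throws Exception {
        char[] storePass = "keystorePass".toCharArray(); // protects the keystore file
        char[] dbPass = "databasePass".toCharArray();    // the secret being stored

        // Create an empty PKCS12 keystore in memory
        KeyStore ks = KeyStore.getInstance("PKCS12");
        ks.load(null, storePass);

        // Wrap the database password in a PBE secret key and store it
        // under the alias 'database1'
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBEWithHmacSHA256AndAES_128");
        SecretKey secret = factory.generateSecret(new PBEKeySpec(dbPass));
        ks.setEntry("database1", new KeyStore.SecretKeyEntry(secret),
                new KeyStore.PasswordProtection(storePass));

        // Write the keystore out, then load it again and recover the password
        try (FileOutputStream out = new FileOutputStream("keystore.p12")) {
            ks.store(out, storePass);
        }
        KeyStore loaded = KeyStore.getInstance("PKCS12");
        try (FileInputStream in = new FileInputStream("keystore.p12")) {
            loaded.load(in, storePass);
        }
        KeyStore.SecretKeyEntry entry = (KeyStore.SecretKeyEntry) loaded.getEntry(
                "database1", new KeyStore.PasswordProtection(storePass));
        PBEKeySpec spec = (PBEKeySpec) factory.getKeySpec(entry.getSecretKey(), PBEKeySpec.class);
        System.out.println(new String(spec.getPassword())); // the recovered database password
    }
}
```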

Either: write the PGX graph configuration file to load from the property graph schema

Next, write a PGX graph configuration file in JSON format. The file tells PGX where to load the data from, what the data looks like, and which keystore alias to use. The following example shows a graph configuration for reading data stored in the Oracle property graph format.

{
  "format": "pg",
  "db_engine": "rdbms",
  "name": "hr",
  "jdbc_url": "jdbc:oracle:thin:@myhost:1521/orcl",
  "username": "hr",
  "keystore_alias": "database1",
  "vertex_props": [{
    "name": "COUNTRY_NAME",
    "type": "string"
  }, {
    "name": "DEPARTMENT_NAME",
    "type": "string"
  }, {
    "name": "SALARY",
    "type": "double"
  }],
  "partition_while_loading": "by_label",
  "loading": {
    "load_vertex_labels": true,
    "load_edge_label": true
  }
}

(For the full list of available configuration fields, including their meanings and default values, see https://docs.oracle.com/cd/E56133_01/latest/reference/loader/database/pg-format.html.)

Or: write the PGX graph configuration file to load a graph directly from relational tables

The following example loads a subset of the HR sample data from relational tables directly into PGX as a graph. The configuration file specifies a mapping from relational to graph format by using the concept of vertex and edge providers.

Note:

Specifying the vertex_providers and edge_providers properties loads the data into an optimized representation of the graph.

{
    "name":"hr",
    "jdbc_url":"jdbc:oracle:thin:@myhost:1521/orcl",
    "username":"hr",
    "keystore_alias":"database1",
    "vertex_id_strategy": "no_ids",
    "vertex_providers":[
        {
            "name":"Employees",
            "format":"rdbms",
            "database_table_name":"EMPLOYEES",
            "key_column":"EMPLOYEE_ID",
            "key_type": "string",
            "props":[
                {
                    "name":"FIRST_NAME",
                    "type":"string"
                },
                {
                    "name":"LAST_NAME",
                    "type":"string"
                },
                {
                    "name":"EMAIL",
                    "type":"string"
                },
                {
                    "name":"SALARY",
                    "type":"long"
                }
            ]
        },
        {
            "name":"Jobs",
            "format":"rdbms",
            "database_table_name":"JOBS",
            "key_column":"JOB_ID",
            "key_type": "string",
            "props":[
                {
                    "name":"JOB_TITLE",
                    "type":"string"
                }
            ]
        },
        {
            "name":"Departments",
            "format":"rdbms",
            "database_table_name":"DEPARTMENTS",
            "key_column":"DEPARTMENT_ID",
            "key_type": "string",
            "props":[
                {
                    "name":"DEPARTMENT_NAME",
                    "type":"string"
                }
            ]
        }
    ],
    "edge_providers":[
        {
            "name":"WorksFor",
            "format":"rdbms",
            "database_table_name":"EMPLOYEES",
            "key_column":"EMPLOYEE_ID",
            "source_column":"EMPLOYEE_ID",
            "destination_column":"EMPLOYEE_ID",
            "source_vertex_provider":"Employees",
            "destination_vertex_provider":"Employees"
        },
        {
            "name":"WorksAs",
            "format":"rdbms",
            "database_table_name":"EMPLOYEES",
            "key_column":"EMPLOYEE_ID",
            "source_column":"EMPLOYEE_ID",
            "destination_column":"JOB_ID",
            "source_vertex_provider":"Employees",
            "destination_vertex_provider":"Jobs"
        },
        {
            "name":"WorkedAt",
            "format":"rdbms",
            "database_table_name":"JOB_HISTORY",
            "key_column":"EMPLOYEE_ID",
            "source_column":"EMPLOYEE_ID",
            "destination_column":"DEPARTMENT_ID",
            "source_vertex_provider":"Employees",
            "destination_vertex_provider":"Departments",
            "props":[
                {
                    "name":"START_DATE",
                    "type":"local_date"
                },
                {
                    "name":"END_DATE",
                    "type":"local_date"
                }
            ]
        }
    ]
}

Note about vertex and edge IDs:

By default, PGX enforces the existence of a unique identifier for each vertex and edge in a graph, so that they can be retrieved by using PgxGraph.getVertex(ID id) and PgxGraph.getEdge(ID id), or by PGQL queries using the built-in id() method.

The default strategy to generate the vertex IDs is to use the keys provided during loading of the graph. In that case, each vertex should have a vertex key that is unique across all the types of vertices. For edges, by default no keys are required in the edge data, and edge IDs will be automatically generated by PGX. Note that the generation of edge IDs is not guaranteed to be deterministic. If required, it is also possible to load edge keys as IDs.

However, because it may be cumbersome for partitioned graphs to define such identifiers, it is possible to disable that requirement for vertices and/or edges by setting the vertex_id_strategy and edge_id_strategy graph configuration fields to the value no_ids, as shown in the preceding example. When vertex (or edge) IDs are disabled, PGX forbids calls to APIs that use vertex (or edge) IDs, including the ones indicated previously.

If you want to call those APIs but do not have globally unique IDs in your relational tables, you can specify the unstable_generated_ids generation strategy, which generates new IDs for you. As the name suggests, these IDs have no correlation to the original IDs in your input tables and are not guaranteed to be stable: as with generated edge IDs, loading the same input tables twice may yield two different generated IDs for the same vertex.
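As an illustrative fragment (not a complete graph configuration), the corresponding fields would be set as follows:

```json
{
    "vertex_id_strategy": "unstable_generated_ids",
    "edge_id_strategy": "unstable_generated_ids"
}
```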

Read the data

Now you can instruct PGX to connect to the database and read the data by passing in both the keystore and the configuration file to PGX, using one of the following approaches:

  • Interactively in the graph shell

    If you are using the graph shell, start it with the --secret_store option. It will prompt you for the keystore password and then attach the keystore to your current session. For example:

    cd /opt/oracle/graph
    ./bin/opg-jshell --secret_store /etc/my-secrets/keystore.p12
     
     enter password for keystore /etc/my-secrets/keystore.p12:
    

    Inside the shell, you can then use normal PGX APIs to read the graph into memory by passing the JSON file you just wrote into the readGraphWithProperties API:

    opg-jshell-rdbms> var graph = session.readGraphWithProperties("config.json")
    graph ==> PgxGraph[name=hr,N=215,E=415,created=1576882388130]
    
  • As a PGX preloaded graph
    As a server administrator, you can instruct PGX to load graphs into memory upon server startup. To do so, modify the PGX configuration file at /etc/oracle/graph/pgx.conf and add the path to the graph configuration file to the preload_graphs section. For example:
    {
      ...
      "preload_graphs": [{
        "name": "hr", 
        "path": "/path/to/config.json"
      }],
      ...
    }
    As root user, edit the service file at /etc/systemd/system/pgx.service and change the ExecStart command to specify the location of the keystore containing the password:
    
    ExecStart=/bin/bash start-server --secret-store /etc/keystore.p12

    Note:

    Please note that /etc/keystore.p12 must not be password protected for this to work. Instead, protect the file via file system permissions so that it is readable only by the oraclegraph user.
    After the file is edited, reload the changes using:
    sudo systemctl daemon-reload
    Finally start the server:
    sudo systemctl start pgx
  • In a Java application

    To register a keystore in a Java application, use the registerKeystore() API on the PgxSession object. For example:

    import oracle.pgx.api.*;
     
    class Main {
       
      public static void main(String[] args) throws Exception {
        String baseUrl = args[0];
        String keystorePath = "/etc/my-secrets/keystore.p12";
        char[] keystorePassword = args[1].toCharArray();
        String graphConfigPath = args[2];
        ServerInstance instance = Pgx.getInstance(baseUrl);
        try (PgxSession session = instance.createSession("my-session")) {
          session.registerKeystore(keystorePath, keystorePassword);
          PgxGraph graph = session.readGraphWithProperties(graphConfigPath);
          System.out.println("N = " + graph.getNumVertices() + " E = " + graph.getNumEdges());
        }
      }
    }
    
    You can compile and run the preceding sample program using the Oracle Graph Client package. For example:
    cd $GRAPH_CLIENT
    # create Main.java with the above contents
    javac -cp 'lib/*' Main.java
    java -cp '.:conf:lib/*' Main http://myhost:7007 MyKeystorePassword path/to/config.json
    

Secure coding tips for graph client applications

When writing graph client applications, make sure to never store any passwords or other secrets in clear text in any files or in any of your code.

Do not accept passwords or other secrets through command line arguments either. Instead, use java.io.Console#readPassword() from the JDK.
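As a minimal sketch of this advice (plain JDK code; the class name and prompt text are illustrative), read the password with java.io.Console and wipe it from memory once it is no longer needed:

```java
import java.io.Console;
import java.util.Arrays;

public class PasswordPrompt {

    // Overwrite a secret so it does not linger in memory
    static void wipe(char[] secret) {
        Arrays.fill(secret, '\0');
    }

    public static void main(String[] args) {
        Console console = System.console();
        if (console == null) {
            System.out.println("No interactive console available");
            return;
        }
        char[] password = console.readPassword("Database password: ");
        try {
            // ... authenticate, e.g. pass the password to a login API
        } finally {
            wipe(password); // clear the secret as soon as it has been used
        }
    }
}
```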

3.2.5 Token Expiration

By default, tokens are valid for 1 hour.

Internally, the graph client automatically renews tokens that are about to expire in less than 30 minutes; this interval is configurable. By default, tokens can be automatically renewed at most 24 times; after that, you need to log in again.

If the maximum number of auto-renewals is reached, you can log in again without losing any of your session data by using the GraphServer#reauthenticate(instance, "<user>", "<password>") API. For example:


opg> var graph = session.readGraphWithProperties(config) // fails because token cannot be renewed anymore
opg> GraphServer.reauthenticate(instance, "<user>", "<password>") // log in again
opg> var graph = session.readGraphWithProperties(config)          // works now 
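The renewal policy just described can be sketched as a small model. This is an illustrative class, not part of the PGX API; the numbers are the defaults from this section (1-hour validity, renewal starting 30 minutes before expiry, at most 24 renewals):

```java
import java.time.Instant;

// Hypothetical model of the client-side token renewal policy (not a PGX class)
public class TokenRenewalModel {
    final long tokenExpirationSeconds; // server-side token_expiration_seconds
    final long refreshBeforeSeconds;   // client-side refresh time before expiry
    final int maxRefreshes;            // server-side max_num_token_refresh

    TokenRenewalModel(long tokenExpirationSeconds, long refreshBeforeSeconds, int maxRefreshes) {
        this.tokenExpirationSeconds = tokenExpirationSeconds;
        this.refreshBeforeSeconds = refreshBeforeSeconds;
        this.maxRefreshes = maxRefreshes;
    }

    // The client starts renewing once remaining validity drops below refreshBeforeSeconds
    Instant firstRefreshAt(Instant issuedAt) {
        return issuedAt.plusSeconds(tokenExpirationSeconds - refreshBeforeSeconds);
    }

    // A token can only be auto-renewed maxRefreshes times before a new login is required
    boolean canAutoRenew(int renewalsSoFar) {
        return renewalsSoFar < maxRefreshes;
    }

    public static void main(String[] args) {
        TokenRenewalModel m = new TokenRenewalModel(3600, 1800, 24);
        Instant issued = Instant.parse("2021-01-01T00:00:00Z");
        System.out.println(m.firstRefreshAt(issued)); // 2021-01-01T00:30:00Z
        System.out.println(m.canAutoRenew(24));       // false: must log in again
    }
}
```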
 

3.2.6 Advanced Access Configuration

You can customize login behavior by setting the following fields in the realm options of the pgx.conf file.

Table 3-1 Advanced Access Configuration Options

token_expiration_seconds
    Number of seconds after which a generated bearer token expires. Default: 3600 (1 hour)

connect_timeout_milliseconds
    Number of milliseconds after which a connection attempt to the specified JDBC URL times out, causing the login attempt to be rejected. Default: 10000

max_pool_size
    Maximum number of JDBC connections allowed per user. If this number is reached, attempts to read from the database fail for the current user. Default: 64

max_num_users
    Maximum number of active, signed-in users to allow. If this number is reached, the graph server rejects login attempts. Default: 512

max_num_token_refresh
    Maximum number of times a token can be automatically refreshed before a new login is required. Default: 24

To configure the refresh time on the client side before token expiration, use the following API to log in:

int refreshTimeBeforeTokenExpiry = 900; // in seconds, default is 1800 (30 minutes)
ServerInstance instance = GraphServer.getInstance("https://localhost:7007", "<database user>", "<database password>",
    refreshTimeBeforeTokenExpiry);

Note:

The preceding options work only if the realm implementation is configured to be oracle.pg.identity.DatabaseRealm.

3.2.6.1 Customizing Roles and Permissions

You can fully customize the permissions to roles mapping by adding and removing roles and specifying permissions for a role. You can also authorize individual users instead of roles.

This topic includes examples of how to customize the permission mapping.

3.2.6.1.1 Adding and Removing Roles

You can add new role permission mappings or remove existing mappings by modifying the authorization list.

For example:


CREATE ROLE MY_CUSTOM_ROLE_1
GRANT PGX_SESSION_CREATE TO MY_CUSTOM_ROLE_1 
GRANT PGX_SERVER_GET_INFO TO MY_CUSTOM_ROLE_1 
GRANT MY_CUSTOM_ROLE_1 TO SCOTT

3.2.6.1.2 Defining Permissions for Individual Users

In addition to defining permissions for roles, you can define permissions for individual users.

For example:

GRANT PGX_SESSION_CREATE TO SCOTT 
GRANT PGX_SERVER_GET_INFO TO SCOTT 

3.2.7 Revoking Access to the Graph Server

To revoke a user's ability to access the graph server, either drop the user from the database or revoke the corresponding roles from the user, depending on how you defined the access rules in your pgx.conf file.

For example:

REVOKE graph_developer FROM scott

Revoking Graph Permissions

If you have the MANAGE permission on a graph, you can revoke graph access from users or roles using the PgxGraph#revokePermission API. For example:

PgxGraph g = ...
g.revokePermission(new PgxRole("GRAPH_DEVELOPER")) // revokes previously granted role access
g.revokePermission(new PgxUser("SCOTT")) // revokes previously granted user access

3.2.8 Examples of Custom Authorization Rules

You can define custom authorization rules for developers.

Example 3-1 Allowing Developers to Use Custom Graph Algorithms

To allow developers to compile custom graph algorithms (see Using Custom PGX Graph Algorithms), add the following static permission to the list of permissions:

GRANT PGX_SESSION_COMPILE_ALGORITHM TO GRAPH_DEVELOPER

Example 3-2 Allowing Developers to Publish Graphs

Allowing graph server users to publish or share graphs that originate from the Oracle Database breaks the database authorization model. If you work with graphs in the database, use GRANT statements in the database instead. See the OPG_APIS.GRANT_ACCESS API for examples of how to do this for PG graphs. When reading from relational tables, use normal GRANT READ statements on the tables.
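For example, to give a graph user read access to the underlying relational tables (the schema, table, and user names here are illustrative):

```sql
GRANT READ ON hr.EMPLOYEES TO graphuser;
GRANT READ ON hr.DEPARTMENTS TO graphuser;
```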

To allow developers to publish graphs, add the following static permission to the list of permissions:

GRANT PGX_SESSION_ADD_PUBLISHED_GRAPH TO GRAPH_DEVELOPER

Publishing graphs alone does not give others access to the graph. You must also specify the type of access. There are three levels of permissions for graphs:

  1. READ: allows reading the graph data via the PGX API or in PGQL queries, running Analyst or custom algorithms on the graph, and creating a subgraph or clone of the graph.
  2. EXPORT: allows exporting the graph via the PgxGraph#store() APIs; includes the READ permission. Note that in addition to the EXPORT permission, users also need write access on a file system location in order to export the graph.
  3. MANAGE: allows publishing the graph or a snapshot of it, and granting or revoking permissions on the graph; includes the EXPORT permission.

The creator of a graph is automatically granted the MANAGE permission on it. If you have the MANAGE permission, you can grant other roles or users the READ or EXPORT permission on the graph. You cannot grant MANAGE on a graph. The following example for a user named userA shows how:

import oracle.pgx.api.*
import oracle.pgx.common.auth.*
 
 
PgxSession session = GraphServer.getInstance("<base-url>", "<userA>", "<password-of-userA>").createSession("userA")
PgxGraph g = session.readGraphWithProperties("examples/sample-graph.json", "sample-graph")
g.grantPermission(new PgxRole("GRAPH_DEVELOPER"), PgxResourcePermission.READ)
g.publish()

Now other users with the GRAPH_DEVELOPER role can access this graph and have READ access on it, as shown in the following example of userB:

PgxSession session = GraphServer.getInstance("<base-url>", "<userB>", "<password-of-userB>").createSession("userB")
PgxGraph g = session.getGraph("sample-graph")
g.queryPgql("select count(*) from match (v)").print().close()

Similarly, graphs can be shared with individual users instead of roles, as shown in the following example:

g.grantPermission(new PgxUser("OTHER_USER"), PgxResourcePermission.EXPORT)

where OTHER_USER is the user name of the user that will receive the EXPORT permission on graph g.

Example 3-3 Allowing Developers to Access Preloaded Graphs

To allow developers to access preloaded graphs (graphs loaded during graph server startup), grant the read permission on the preloaded graph in the pgx.conf file. For example:

"preload_graphs": [{
  "path": "/data/my-graph.json",
  "name": "global_graph"
}],
"authorization": [{
  "pgx_role": "GRAPH_DEVELOPER",
  "pgx_permissions": [{
    "preloaded_graph": "global_graph"
    "grant": "read"
  },
...

You can grant READ, EXPORT, or MANAGE permission.

Example 3-4 Allowing Developers Access to the Hadoop Distributed Filesystem (HDFS) or the Local File System

To allow developers to read files from HDFS, you must first declare the HDFS directory and then map it to a read or write permission. For example:


CREATE OR REPLACE DIRECTORY pgx_file_location AS 'hdfs:/data/graphs'
GRANT READ ON DIRECTORY pgx_file_location TO GRAPH_DEVELOPER

Similarly, you can add another permission with GRANT WRITE to allow write access. Such write access is required in order to export graphs.

Access to the local file system (where the graph server runs) can be granted the same way. The only difference is that the location is an absolute file path without the hdfs: prefix. For example:

CREATE OR REPLACE DIRECTORY pgx_file_location AS '/opt/oracle/graph/data'

Note that in addition to the preceding configuration, the operating system user that runs the graph server process must have the corresponding directory privileges to actually read or write into those directories.

3.3 Keeping the Graph in Oracle Database Synchronized with the Graph Server

You can use the FlashbackSynchronizer API to automatically apply changes made to a graph in the database to the corresponding PgxGraph object in memory, thus keeping both synchronized.

This API uses Oracle's Flashback Technology to fetch the changes in the database since the last fetch and then push those changes into the graph server using the ChangeSet API. After the changes are applied, the usual snapshot semantics of the graph server apply: each delta fetch creates a new in-memory snapshot. Any queries or algorithms that execute concurrently with snapshot creation are unaffected by the changes until the corresponding session refreshes its PgxGraph object to the latest state by calling the session.setSnapshot(graph, PgxSession.LATEST_SNAPSHOT) method.

For detailed information about Oracle Flashback technology, see the Database Development Guide.

Prerequisites for Synchronizing

The Oracle database must have Flashback enabled and the database user that you use to perform synchronization must have:

  • Read access to all tables that need to be kept synchronized.
  • Permission to use flashback APIs. For example:
    grant execute on dbms_flashback to <user>

The database must also be configured to retain changes for the amount of time needed by your use case.
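How you configure that retention depends on your database setup. On databases with automatic undo management, one common knob is the undo retention target. For example, to request that undo (and therefore flashback) data be retained for 24 hours (86400 seconds; the value is only illustrative, and DBA privileges are required):

```
ALTER SYSTEM SET UNDO_RETENTION = 86400;
```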

Limitations for Synchronizing

The synchronizer API currently has the following limitations:

  • Only partitioned graph configurations with all providers being database tables are supported.
  • Both the vertex and edge ID strategy must be set as follows:
    "vertex_id_strategy": "keys_as_ids",
    "edge_id_strategy": "keys_as_ids"
    
  • Each edge/vertex provider must be configured to create a key mapping. In each provider block of the graph configuration, add the following loading section:
    "loading": {
      "create_key_mapping": true
    }
    

    This implies that vertices and edges must have globally unique ID columns.

  • Each edge/vertex provider must specify the owner of the table by setting the username field. For example, if user SCOTT owns the table, then set the username accordingly in the provider block of that table:
    "username": "scott"
  • In the root loading block, the snapshot source must be set to change_set:
    "loading": {
      "snapshots_source": "change_set"
    }
    

For a detailed example, including some options, see the following topic.
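Taken together, the synchronizer-related settings appear in a graph configuration roughly as follows (the provider names and table owner are placeholders):

```
{
  ...,
  "vertex_id_strategy": "keys_as_ids",
  "edge_id_strategy": "keys_as_ids",
  "vertex_providers": [{
    ...,
    "username": "<table_owner>",
    "loading": { "create_key_mapping": true }
  }],
  "edge_providers": [{
    ...,
    "username": "<table_owner>",
    "loading": { "create_key_mapping": true }
  }],
  "loading": { "snapshots_source": "change_set" }
}
```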

3.3.1 Example of Synchronizing

As an example of performing synchronization, assume you have the following Oracle Database tables, PERSONS and FRIENDSHIPS.

CREATE TABLE PERSONS (
  PERSON_ID NUMBER GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
  NAME VARCHAR2(200),
  BIRTHDATE DATE,
  HEIGHT FLOAT DEFAULT ON NULL 0,
  INT_PROP INT DEFAULT ON NULL 0,
  CONSTRAINT person_pk PRIMARY KEY (PERSON_ID)
);
 
CREATE TABLE FRIENDSHIPS ( 
  FRIENDSHIP_ID NUMBER GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
  PERSON_A NUMBER,
  PERSON_B NUMBER,
  MEETING_DATE DATE,
  TS_PROP TIMESTAMP,
  CONSTRAINT fk_PERSON_A_ID FOREIGN KEY (PERSON_A) REFERENCES persons(PERSON_ID),
  CONSTRAINT fk_PERSON_B_ID FOREIGN KEY (PERSON_B) REFERENCES persons(PERSON_ID)
);

You add some sample data into these tables:

INSERT INTO PERSONS (NAME, HEIGHT, BIRTHDATE) VALUES ('John', 1.80, to_date('13/06/1963', 'DD/MM/YYYY'));
INSERT INTO PERSONS (NAME, HEIGHT, BIRTHDATE) VALUES ('Mary', 1.65, to_date('25/09/1982', 'DD/MM/YYYY'));
INSERT INTO PERSONS (NAME, HEIGHT, BIRTHDATE) VALUES ('Bob', 1.75, to_date('11/03/1966', 'DD/MM/YYYY'));
INSERT INTO PERSONS (NAME, HEIGHT, BIRTHDATE) VALUES ('Alice', 1.70, to_date('01/02/1987', 'DD/MM/YYYY'));
 
INSERT INTO FRIENDSHIPS (PERSON_A, PERSON_B, MEETING_DATE) VALUES (1, 3, to_date('01/09/1972', 'DD/MM/YYYY'));
INSERT INTO FRIENDSHIPS (PERSON_A, PERSON_B, MEETING_DATE) VALUES (2, 4, to_date('19/09/1992', 'DD/MM/YYYY'));
INSERT INTO FRIENDSHIPS (PERSON_A, PERSON_B, MEETING_DATE) VALUES (4, 2, to_date('19/09/1992', 'DD/MM/YYYY'));
INSERT INTO FRIENDSHIPS (PERSON_A, PERSON_B, MEETING_DATE) VALUES (3, 2, to_date('10/07/2001', 'DD/MM/YYYY'));

Synchronizing Using Connection Information in the Graph Configuration

You then want to synchronize using connection information in the graph configuration. You have the following sample graph configuration (KeystoreGraphConfigExample), which reads those tables as a graph:

{
  "name": "PeopleFriendships",
  "optimized_for": "updates",
  "edge_id_strategy": "keys_as_ids",
  "edge_id_type": "long",
  "vertex_id_type": "long",
  "jdbc_url": "<jdbc_url>",
  "username": "<username>",
  "keystore_alias": "<keystore_alias>",
  "vertex_providers": [
    {
      "format": "rdbms",
      "username": "<username>",
      "key_type": "long",
      "name": "person",
      "database_table_name": "PERSONS",
      "key_column": "PERSON_ID",
      "props": [
        ...
      ],
      "loading": {
        "create_key_mapping": true
      }
    }
  ],
  "edge_providers": [
    {
      "format": "rdbms",
      "username": "<username>",
      "name": "friendOf",
      "source_vertex_provider": "person",
      "destination_vertex_provider": "person",
      "database_table_name": "FRIENDSHIPS",
      "source_column": "PERSON_A",
      "destination_column": "PERSON_B",
      "key_column": "FRIENDSHIP_ID",
      "key_type":"long",
      "props": [
        ...
      ],
      "loading": {
        "create_key_mapping": true
      }
    }
  ],
  "loading": {
    "snapshots_source": "change_set"
  }
}

(In the preceding example, replace the values <jdbc_url>, <username>, and <keystore_alias> with the values for connecting to your database.)

Open the Oracle Property Graph JShell (be sure to register the keystore containing the database password when starting it), and load the graph into memory:

var pgxGraph = session.readGraphWithProperties("persons_graph.json");

The following output line shows that the example graph has four vertices and four edges:

pgxGraph ==> PgxGraph[name=PeopleFriendships,N=4,E=4,created=1594754376861]

Now, back in the database, insert a few new rows:

INSERT INTO PERSONS (NAME, BIRTHDATE, HEIGHT) VALUES ('Mariana',to_date('21/08/1996','DD/MM/YYYY'),1.65);
INSERT INTO PERSONS (NAME, BIRTHDATE, HEIGHT) VALUES ('Francisco',to_date('13/06/1963','DD/MM/YYYY'),1.75);
INSERT INTO FRIENDSHIPS (PERSON_A, PERSON_B, MEETING_DATE) VALUES (1, 6, to_date('13/06/2013','DD/MM/YYYY'));
COMMIT;

Back in JShell, you can now use the FlashbackSynchronizer API to automatically fetch and apply those changes:

var synchronizer = new Synchronizer.Builder<FlashbackSynchronizer>().setType(FlashbackSynchronizer.class).setGraph(pgxGraph).build()
pgxGraph = synchronizer.sync()

As you can see from the output, the new vertices and edges have been applied:

pgxGraph ==> PgxGraph[name=PeopleFriendships,N=6,E=5,created=1594754376861]

Note that pgxGraph = synchronizer.sync() is equivalent to calling the following:

synchronizer.sync()
session.setSnapshot(pgxGraph, PgxSession.LATEST_SNAPSHOT)

Splitting the Fetching and Applying of Changes

The synchronizer.sync() invocation fetches the changes and applies them in one call. However, you can encode a more complex update logic by splitting this process into separate fetch() and apply() invocations. For example:

synchronizer.fetch() // fetches changes from the database
if (synchronizer.getGraphDelta().getTotalNumberOfChanges() > 100) {
  // only create snapshot if there have been more than 100 changes
  synchronizer.apply() 
}

Synchronizing Using an Explicit Oracle Connection

The synchronizer API fetches the changes on the client side. That means the client needs to connect to the database. In the preceding example, it did that by reading the connection information available in the graph configuration of the loaded PgxGraph object. However, there can be situations in which connection information cannot be obtained from the PgxGraph object, such as when:

  • The associated graph configuration does not contain any database connection information, and the graph was loaded using credentials of a logged in user; or
  • The associated graph configuration contains a datasource ID corresponding to a connection stored on the server side.

In these cases, you can pass in an Oracle connection object when building the synchronizer object; that connection is then used to fetch changes. For example:

String jdbcUrl = "<JDBC URL>";
String username = "<USERNAME>";
String password = "<PASSWORD>";
Connection connection = DriverManager.getConnection(jdbcUrl, username, password);
Synchronizer synchronizer = new Synchronizer.Builder<FlashbackSynchronizer>()
    .setType(FlashbackSynchronizer.class)
    .setGraph(pgxGraph)
    .setConnection(connection)
    .build();

3.4 Configuring the In-Memory Analyst

You can configure the in-memory analyst engine and its run-time behavior by providing a single JSON configuration file to the in-memory analyst at startup.

This file can include the parameters shown in the following table. Some examples follow the table.

To specify the configuration file, see Specifying the Configuration File to the In-Memory Analyst.

Note:

  • Relative paths in parameter values are always resolved relative to the parent directory of the configuration file in which they are specified. For example, if the configuration file is /pgx/conf/pgx.conf, then the file path graph-configs/my-graph.bin.json inside that file would be resolved to /pgx/conf/graph-configs/my-graph.bin.json.

  • The parameter default values are optimized to deliver the best performance across a wide set of algorithms. Depending on your workload, you may be able to improve performance further by experimenting with different strategies, sizes, and thresholds.

Table 3-2 Configuration Parameters for the In-Memory Analyst

Parameter Type Description Default

admin_request_cache_timeout

integer

Number of seconds after which admin request results are removed from the cache. Requests that are not done or not yet consumed are excluded from this timeout. Note: This is only relevant if PGX is deployed as a webapp.

60

allow_idle_timeout_overwrite

boolean

If true, sessions can overwrite the default idle timeout.

true

allow_local_filesystem

boolean

Allow loading from the local file system in client/server mode. Default is false. If set to true, additionally specify the property datasource_dir_whitelist to list the directories. The server can only read from the directories that are listed here.

false

allow_override_scheduling_information

boolean

If true, allow all users to override scheduling information like task weight, task priority, and number of threads

true

allowed_remote_loading_locations

array of string

Allow loading of graphs into the PGX engine from remote locations (http, https, ftp, ftps, s3, hdfs). If empty, as by default, no remote location is allowed. Currently the only supported value is "*" (asterisk), which allows all remote locations. Note that pre-loaded graphs are loaded from any location, regardless of the value of this setting. WARNING: Specify * (asterisk) only if you want to explicitly allow users of the PGX remote interface to access files on the local file system.

[]

allow_task_timeout_overwrite

boolean

If true, sessions can overwrite the default task timeout.

true

allow_user_auto_refresh

boolean

If true, users may enable auto refresh for graphs they load. If false, only graphs mentioned in preload_graphs can have auto refresh enabled.

false

basic_scheduler_config

object

Configuration parameters for the fork-join pool backend.

null

bfs_iterate_que_task_size

integer

Task size for BFS iterate QUE phase.

128

bfs_threshold_parent_read_based

number

Threshold of BFS traversal level items to switch to parent-read-based visiting strategy.

0.05

bfs_threshold_read_based

integer

Threshold of BFS traversal level items to switch to read-based visiting strategy.

1024

bfs_threshold_single_threaded

integer

Number of BFS traversal level items up to which vertices are visited single-threaded.

128

character_set

string

Standard character set to use throughout PGX. UTF-8 is the default. Note: Some formats may not be compatible.

utf-8

cni_diff_factor_default

integer

Default diff factor value used in the common neighbor iterator implementations.

8

cni_small_default

integer

Default value used in the common neighbor iterator implementations, to indicate below which threshold a subarray is considered small.

128

cni_stop_recursion_default

integer

Default value used in the common neighbor iterator implementations, to indicate the minimum size where the binary search approach is applied.

96

datasource_dir_whitelist

array of string

If allow_local_filesystem is set, the list of directories from which it is allowed to read files.

[]

dfs_threshold_large

integer

Value that determines at which number of visited vertices the DFS implementation will switch to data structures that are optimized for larger numbers of vertices.

4096

enable_csrf_token_checks

boolean

If true, the PGX webapp will verify the Cross-Site Request Forgery (CSRF) token cookie and request parameters sent by the client exist and match. This is to prevent CSRF attacks.

true

enable_gm_compiler

boolean

[relevant when profiling with solaris studio] When enabled, label experiments using the 'er_label' command.

false

enable_shutdown_cleanup_hook

boolean

If true, PGX will add a JVM shutdown hook that will automatically shutdown PGX at JVM shutdown. Notice: Having the shutdown hook deactivated and not explicitly shutting down PGX may result in pollution of your temp directory.

true

enterprise_scheduler_config

object

Configuration parameters for the enterprise scheduler.

null

enterprise_scheduler_flags

object

[relevant for enterprise_scheduler]  Enterprise scheduler-specific settings.

null

explicit_spin_locks

boolean

true means spin explicitly in a loop until lock becomes available. false means using JDK locks which rely on the JVM to decide whether to context switch or spin. Setting this value to true usually results in better performance.

true

graph_algorithm_language

enum[GM_LEGACY, GM, JAVA]

Front-end compiler to use.

gm

graph_validation_level

enum[low, high]

Level of validation performed on newly loaded or created graphs.

low

ignore_incompatible_backend_operations

boolean

If true, only log when encountering incompatible operations and configuration values in RTS or FJ pool. If false, throw exceptions.

false

in_place_update_consistency

enum[ALLOW_INCONSISTENCIES, CANCEL_TASKS]

Consistency model used when in-place updates occur. Only relevant if in-place updates are enabled. Currently, updates are only applied in place if they are not structural (that is, they only modify properties). Two models are currently implemented: one only delays new tasks when an update occurs; the other also delays running tasks.

allow_inconsistencies

init_pgql_on_startup

boolean

If true, PGQL is initialized directly on start-up of PGX. Otherwise, it is initialized during the first use of PGQL.

true

interval_to_poll_max

integer

Exponential backoff upper bound (in ms); once reached, the job status polling interval stays fixed at this value.

1000

java_home_dir

string

The path to Java's home directory. If set to <system-java-home-dir>, use the java.home system property.

null

large_array_threshold

integer

Threshold above which the size of an array is too big to use a normal Java array. This depends on the JVM used. (Defaults to Integer.MAX_VALUE - 3)

2147483644

max_active_sessions

integer

Maximum number of sessions allowed to be active at a time.

1024

max_distinct_strings_per_pool

integer

[only relevant if string_pooling_strategy is indexed] Number of distinct strings per property after which to stop pooling. If the limit is reached, an exception is thrown.

65536

max_off_heap_size

integer

Maximum amount of off-heap memory (in megabytes) that PGX is allowed to allocate before an OutOfMemoryError will be thrown. Note: this limit is not guaranteed to never be exceeded, because of rounding and synchronization trade-offs. It only serves as threshold when PGX starts to reject new memory allocation requests.

<available-physical-memory>

max_queue_size_per_session

integer

The maximum number of pending tasks allowed to be in the queue, per session. If a session reaches the maximum, new incoming requests of that session get rejected. A negative value means no limit.

-1

max_snapshot_count

integer

Number of snapshots that may be loaded in the engine at the same time. New snapshots can be created via auto or forced update. If the number of snapshots of a graph reaches this threshold, no more auto-updates will be performed, and a forced update will result in an exception until one or more snapshots are removed from memory. A value of zero means an unlimited number of snapshots is supported.

0

memory_allocator

enum[basic_allocator, enterprise_allocator]

The memory allocator to use.

basic_allocator

memory_cleanup_interval

integer

Memory cleanup interval in seconds.

600

ms_bfs_frontier_type_strategy

enum[auto_grow, short, int]

The type strategy to use for MS-BFS frontiers.

auto_grow

num_spin_locks

integer

Number of spin locks each generated app will create at instantiation. Trade-off: a small number implies less memory consumption; a large number implies faster execution (if algorithm uses spin locks).

1024

parallelism

integer

Number of worker threads to be used in the thread pool. Note: If the caller thread is part of another thread pool, this value is ignored and the parallelism of the parent pool is used.

<number-of-cpus>

pattern_matching_supernode_cache_threshold

integer

Minimum number of neighbors for a vertex to be considered a supernode. This is used by the pattern matching engine.

1000

pooling_factor

number

[only relevant if string_pooling_strategy is on_heap] This value prevents the string pool from growing as big as the property size, which could render the pooling ineffective.

0.25

preload_graphs

array of object

List of graph configs to be registered at start-up. Each item includes the path to a graph config, the name of the graph, and whether it should be published.

[]

random_generator_strategy

enum[non_deterministic, deterministic]

Method of generating random numbers in PGX.

non_deterministic

random_seed

long

[relevant for deterministic random number generator only] Seed for the deterministic random number generator used in the in-memory analyst. The default is -24466691093057031.

-24466691093057031

release_memory_threshold

double

Threshold percentage (decimal fraction) of used memory after which the engine starts freeing unused graphs. Examples: A value of 0.0 means graphs get freed as soon as their reference count becomes zero. That is, all sessions which loaded that graph were destroyed/timed out. A value of 1.0 means graphs never get freed, and the engine will throw OutOfMemoryErrors as soon as a graph is needed which does not fit in memory anymore. A value of 0.7 means the engine keeps all graphs in memory as long as total memory consumption is below 70% of total available memory, even if there is currently no session using them. When consumption exceeds 70% and another graph needs to get loaded, unused graphs get freed until memory consumption is below 70% again.

0.85

revisit_threshold

integer

Maximum number of matched results from a node to be reached.

4096

scheduler

enum[basic_scheduler, enterprise_scheduler]

The scheduler to use. basic_scheduler uses a scheduler with basic features. enterprise_scheduler uses a scheduler with advanced enterprise features for running multiple tasks concurrently and providing better performance.

enterprise_scheduler

session_idle_timeout_secs

integer

Timeout of idling sessions in seconds. Zero (0) means no timeout.

0

session_task_timeout_secs

integer

Timeout in seconds to interrupt long-running tasks submitted by sessions (algorithms, I/O tasks). Zero (0) means no timeout.

0

small_task_length

integer

Task length if the total amount of work is smaller than default task length (only relevant for task-stealing strategies).

128

spark_streams_interface

string

The name of the network interface to be used for Spark data communication.

null

strict_mode

boolean

If true, exceptions are thrown and logged with ERROR level whenever the engine encounters configuration problems, such as invalid keys, mismatches, and other potential errors. If false, the engine logs problems with ERROR/WARN level (depending on severity) and makes best guesses and uses sensible defaults instead of throwing exceptions.

true

string_pooling_strategy

enum[indexed, on_heap, none]

[only relevant if use_string_pool is enabled] The string pooling strategy to use.

on_heap

task_length

integer

Default task length (only relevant for task-stealing strategies). Should be between 100 and 10000. Trade-off: a small number implies more fine-grained tasks are generated, higher stealing throughput; a large number implies less memory consumption and GC activity.

4096

tmp_dir

string

Temporary directory to store compilation artifacts and other temporary data. If set to <system-tmp-dir>, uses the standard tmp directory of the underlying system (/tmp on Linux).

<system-tmp-dir>

udf_config_directory

string

Directory path containing UDF files.

null

use_memory_mapper_for_reading_pgb

boolean

If true, use memory-mapped files for reading graphs in PGB format if possible; if false, always use a stream-based implementation.

true

use_memory_mapper_for_storing_pgb

boolean

If true, use memory-mapped files for storing graphs in PGB format if possible; if false, always use a stream-based implementation.

true

Enterprise Scheduler Parameters

The following parameters are relevant only if the enterprise scheduler is used. (They are ignored if the basic scheduler is used.)

  • analysis_task_config

    Configuration for analysis tasks. Type: object. Default: {"priority": "MEDIUM", "max_threads": <no-of-CPUs>, "weight": <no-of-CPUs>}

  • fast_analysis_task_config

    Configuration for fast analysis tasks. Type: object. Default: {"priority": "HIGH", "max_threads": <no-of-CPUs>, "weight": 1}

  • maxnum_concurrent_io_tasks

    Maximum number of concurrent I/O tasks. Type: integer. Default: 3

  • num_io_threads_per_task

    Number of I/O threads to use per task. Type: integer. Default: <no-of-cpus>

Basic Scheduler Parameters

The following parameters are relevant only if the basic scheduler is used. (They are ignored if the enterprise scheduler is used.)

  • num_workers_analysis

    Number of worker threads to use for analysis tasks. Type: integer. Default: <no-of-CPUs>

  • num_workers_fast_track_analysis

    Number of worker threads to use for fast-track analysis tasks. Type: integer. Default: 1

  • num_workers_io

    Number of worker threads to use for I/O tasks (load/refresh/write from/to disk). This value will not affect file-based loaders, because they are always single-threaded. Database loaders will open a new connection for each I/O worker. Type: integer. Default: <no-of-CPUs>
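For example, a configuration that selects the basic scheduler and sizes its worker pools might look as follows (the worker counts are only illustrative):

```
{
  "scheduler": "basic_scheduler",
  "basic_scheduler_config": {
    "num_workers_analysis": 16,
    "num_workers_fast_track_analysis": 2,
    "num_workers_io": 4
  }
}
```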

Example 3-5 Minimal In-Memory Analyst Configuration

The following example causes the in-memory analyst to initialize its analysis thread pool with 32 workers. (Default values are used for all other parameters.)

{
  "enterprise_scheduler_config": {
    "analysis_task_config": {
      "max_threads": 32
    }
  }
}

Example 3-6 Two Pre-loaded Graphs

The following example sets more fields and specifies two graphs to be loaded into memory during PGX startup.

{ 
  "enterprise_scheduler_config": {
    "analysis_task_config": {
      "max_threads": 32
    },
    "fast_analysis_task_config": {
      "max_threads": 32
    }
  }, 
  "memory_cleanup_interval": 600,
  "max_active_sessions": 1, 
  "release_memory_threshold": 0.2, 
  "preload_graphs": [
    {
      "path": "graph-configs/my-graph.bin.json",
      "name": "my-graph"
    },
    {
      "path": "graph-configs/my-other-graph.adj.json",
      "name": "my-other-graph",
      "publish": false
    }
  ]
}

3.4.1 Specifying the Configuration File to the In-Memory Analyst

The in-memory analyst configuration file is parsed by the in-memory analyst at startup time whenever ServerInstance#startEngine (or any of its variants) is called. You can pass the path to your configuration file to the in-memory analyst or specify the configuration programmatically. This topic identifies several ways to specify the file.

Programmatically

All configuration fields exist as Java enums. Example:

Map<PgxConfig.Field, Object> pgxCfg = new HashMap<>();
pgxCfg.put(PgxConfig.Field.MEMORY_CLEANUP_INTERVAL, 600);

ServerInstance instance = ...
instance.startEngine(pgxCfg);

All parameters not explicitly set will get default values.

Explicitly Using a File

Instead of a map, you can pass the path to an in-memory analyst configuration JSON file. Example:

instance.startEngine("path/to/pgx.conf"); // file on local file system
instance.startEngine("classpath:/path/to/pgx.conf"); // file on current classpath

For all other protocols, you can pass an input stream that reads a JSON file directly. Example:

InputStream is = ...
instance.startEngine(is);

Implicitly Using a File

If startEngine() is called without an argument, the in-memory analyst looks for a configuration file at the following places, stopping when it finds the file:

  • File path found in the Java system property pgx_conf. Example: java -Dpgx_conf=conf/my.pgx.config.json ...

  • A file named pgx.conf in the root directory of the current classpath

  • A file named pgx.conf in the root directory relative to the current System.getProperty("user.dir") directory

Note: Providing a configuration is optional. A default value for each field will be used if the field cannot be found in the given configuration file, or if no configuration file is provided.

Using the Local Shell

To change how the shell configures the local in-memory analyst instance, edit $PGX_HOME/conf/pgx.conf. Changes will be reflected the next time you invoke $PGX_HOME/bin/pgx.

You can also change the location of the configuration file as in the following example:

./bin/opg --pgx_conf path/to/my/other/pgx.conf

Setting System Properties

Any parameter can be set using Java system properties by passing -Dpgx.<FIELD>=<VALUE> arguments to the JVM that the in-memory analyst is running on. Note that setting system properties will override any other configuration. The following example sets the maximum off-heap size to 256 GB, regardless of what any other configuration says:

java -Dpgx.max_off_heap_size=256000 ...

Setting Environment Variables

Any parameter can also be set using environment variables, by setting an environment variable of the form PGX_<FIELD> for the JVM in which the in-memory analyst is executed. Note that setting environment variables will override any other configuration, except that if a system property and an environment variable are set for the same parameter, the system property value is used. The following example sets the maximum off-heap size to 256 GB using an environment variable:

PGX_MAX_OFF_HEAP_SIZE=256000 java ...

3.5 Storing a Graph Snapshot on Disk

After reading a graph into memory using either Java or the Shell, if you make some changes to the graph such as running the PageRank algorithm and storing the values as vertex properties, you can store this snapshot of the graph on disk.

This is helpful if you want to save the state of the graph in memory, such as if you must shut down the in-memory analyst server to migrate to a newer version, or if you must shut it down for some other reason.

(Storing graphs over HTTP/REST is currently not supported.)

A snapshot of a graph can be saved as a file in a binary format (called a PGB file).

In general, we recommend that you keep a record of the graph queries and analytics APIs that were executed, and that after the in-memory analyst has been restarted, you reload the graph and re-execute them. But if you must save the state of the graph, you can use the logic in the following example to save the graph snapshot from the shell.

In a three-tier deployment, the file is written on the server-side file system. You must also ensure that the file location to write is specified in the in-memory analyst server. (As explained in Three-Tier Deployments of Oracle Graph with Autonomous Database, in a three-tier deployment, access to the PGX server file system requires a list of allowed locations to be specified.)

opg-jshell> var graph = session.createGraphBuilder().addVertex(1).addVertex(2).addVertex(3).addEdge(1,2).addEdge(2,3).addEdge(3, 1).build()
graph ==> PgxGraph[name=anonymous_graph_1,N=3,E=3,created=1581623669674]

opg-jshell> analyst.pagerank(graph)
$3 ==> VertexProperty[name=pagerank,type=double,graph=anonymous_graph_1]

// Now save the state of this graph

opg-jshell> graph.store(Format.PGB, "/tmp/snapshot.pgb")
$4 ==> {"edge_props":[],"vertex_uris":["/tmp/snapshot.pgb"],"loading":{},"attributes":{},"edge_uris":[],"vertex_props":[{"name":"pagerank","dimension":0,"type":"double"}],"error_handling":{},"vertex_id_type":"integer","format":"pgb"}

// reload from disk 
opg-jshell> var graphFromDisk = session.readGraphFile("/tmp/snapshot.pgb")
graphFromDisk ==> PgxGraph[name=snapshot,N=3,E=3,created=1581623739395]

// previously computed properties are still part of the graph and can be queried
opg-jshell> graphFromDisk.queryPgql("select x.pagerank match (x)").print().close()

The following example is essentially the same as the preceding one, but it uses partitioned graphs. Note that in the case of partitioned graphs, multiple PGB files are generated, one for each vertex/edge partition in the graph.

opg-jshell> analyst.pagerank(graph)
$3 ==> VertexProperty[name=pagerank,type=double,graph=anonymous_graph_1]
// Now save the state of this graph, including all properties
opg-jshell> var storedPgbConfig = graph.store(ProviderFormat.PGB, "/tmp/snapshot")
$4 ==> {"edge_props":[],"vertex_uris":["/tmp/snapshot.pgb"],"loading":{},"attributes":{},"edge_uris":[],"vertex_props":[{"name":"pagerank","dimension":0,"type":"double"}],"error_handling":{},"vertex_id_type":"integer","format":"pgb"}
// Reload from disk 
opg-jshell> var graphFromDisk = session.readGraphWithProperties(storedPgbConfig)
graphFromDisk ==> PgxGraph[name=snapshot,N=3,E=3,created=1581623739395]
// Previously computed properties are still part of the graph and can be queried
opg-jshell> graphFromDisk.queryPgql("select x.pagerank match (x)").print().close()

3.6 Executing Built-in Algorithms

The in-memory graph server (PGX) contains a set of built-in algorithms that are available as Java APIs.

The following table provides an overview of the available algorithms, grouped by category.

Note:

These algorithms can be invoked through the Analyst interface. See the Analyst Class in Javadoc for more details.

Table 3-3 Overview of Built-In Algorithms

Category Algorithms
Classic graph algorithms Prim's Algorithm
Community detection Conductance Minimization (Soman and Narang Algorithm), Infomap, Label Propagation, Louvain
Connected components Strongly Connected Components, Weakly Connected Components (WCC)
Link prediction WTF (Whom To Follow) Algorithm
Matrix factorization Matrix Factorization
Other Graph Traversal Algorithms
Path finding All Vertices and Edges on Filtered Path, Bellman-Ford Algorithms, Bidirectional Dijkstra Algorithms, Compute Distance Index, Compute High-Degree Vertices, Dijkstra Algorithms, Enumerate Simple Paths, Fast Path Finding, Fattest Path, Filtered Fast Path Finding, Hop Distance Algorithms
Ranking and walking Closeness Centrality Algorithms, Degree Centrality Algorithms, Eigenvector Centrality, Hyperlink-Induced Topic Search (HITS), PageRank Algorithms, Random Walk with Restart, Stochastic Approach for Link-Structure Analysis (SALSA) Algorithms, Vertex Betweenness Centrality Algorithms
Structure evaluation Adamic-Adar index, Bipartite Check, Conductance, Cycle Detection Algorithms, Degree Distribution Algorithms, Eccentricity Algorithms, K-Core, Local Clustering Coefficient (LCC), Modularity, Partition Conductance, Reachability Algorithms, Topological Ordering Algorithms, Triangle Counting Algorithms

The following topics describe the use of the in-memory graph server (PGX), using Triangle Counting and PageRank analytics as examples.

3.6.1 About the In-Memory Analyst

The in-memory analyst contains a set of built-in algorithms that are available as Java APIs. The details of the APIs are documented in the Javadoc that is included in the product documentation library. Specifically, see the BuiltinAlgorithms interface Method Summary for a list of the supported in-memory analyst methods.

For example, this is the PageRank procedure signature:

/**
   * Classic pagerank algorithm. Time complexity: O(E * K) with E = number of edges, K is a given constant (max
   * iterations)
   *
   * @param graph
   *          graph
   * @param e
   *          maximum error for terminating the iteration
   * @param d
   *          damping factor
   * @param max
   *          maximum number of iterations
   * @return Vertex Property holding the result as a double
   */
  public <ID extends Comparable<ID>> VertexProperty<ID, Double> pagerank(PgxGraph graph, double e, double d, int max);

3.6.2 Running the Triangle Counting Algorithm

For triangle counting, the sortByDegree boolean parameter of countTriangles() allows you to control whether the graph should first be sorted by degree (true) or not (false). If true, more memory will be used, but the algorithm will run faster; however, if your graph is very large, you might want to turn this optimization off to avoid running out of memory.

Using the Shell to Run Triangle Counting

opg> analyst.countTriangles(graph, true)
==> 1

Using Java to Run Triangle Counting

import oracle.pgx.api.*;
 
Analyst analyst = session.createAnalyst();
long triangles = analyst.countTriangles(graph, true);

The algorithm finds one triangle in the sample graph.
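Outside the graph server, the quantity that countTriangles() returns can be illustrated with a small plain-Java sketch. This is a toy over an undirected adjacency list, not the PGX implementation, and it ignores the sortByDegree optimization discussed above:

```java
import java.util.HashSet;
import java.util.Set;

// Toy illustration (not the PGX implementation) of what countTriangles
// computes: the number of vertex triples that are mutually connected.
public class TriangleCountSketch {
    // adj[v] lists the undirected neighbors of vertex v
    public static long countTriangles(int[][] adj) {
        long triangles = 0;
        for (int u = 0; u < adj.length; u++) {
            for (int v : adj[u]) {
                if (v <= u) continue;            // visit each edge once
                Set<Integer> nu = new HashSet<>();
                for (int x : adj[u]) nu.add(x);
                for (int w : adj[v]) {
                    if (w > v && nu.contains(w)) {
                        triangles++;             // u < v < w closes a triangle
                    }
                }
            }
        }
        return triangles;
    }

    public static void main(String[] args) {
        // Vertices 0, 1, 2 form a triangle; vertex 3 hangs off vertex 0.
        int[][] adj = { {1, 2, 3}, {0, 2}, {0, 1}, {0} };
        System.out.println(countTriangles(adj)); // prints 1
    }
}
```

As in the shell example above, a graph containing one triangle yields a count of 1.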

Tip:

When using the in-memory analyst shell, you can increase the amount of log output during execution by changing the logging level. See information about the :loglevel command with :h :loglevel.

3.6.3 Running the PageRank Algorithm

PageRank computes a rank value between 0 and 1 for each vertex (node) in the graph and stores the values in a double property. The algorithm therefore creates a vertex property of type double for the output.

In the in-memory analyst, there are two types of vertex and edge properties:

  • Persistent Properties: Properties that are loaded with the graph from a data source are fixed, in-memory copies of the data on disk, and are therefore persistent. Persistent properties are read-only, immutable and shared between sessions.

  • Transient Properties: Values can only be written to transient properties, which are private to a session. You can create transient properties by calling createVertexProperty and createEdgeProperty on PgxGraph objects, or by copying existing properties using clone() on Property objects.

    Transient properties hold the results of computation by algorithms. For example, the PageRank algorithm computes a rank value between 0 and 1 for each vertex in the graph and stores these values in a transient property named pg_rank. Transient properties are destroyed when the Analyst object is destroyed.

This example obtains the top three vertices with the highest PageRank values. It uses a transient vertex property of type double to hold the computed PageRank values. The PageRank algorithm uses the following default values for the input parameters: error (tolerance = 0.001), damping factor = 0.85, and maximum number of iterations = 100.

Using the Shell to Run PageRank

opg> rank = analyst.pagerank(graph, 0.001, 0.85, 100);
==> ...
opg> rank.getTopKValues(3)
==> 128=0.1402019732468347
==> 333=0.12002296283541904
==> 99=0.09708583862990475

Using Java to Run PageRank

import java.util.Map.Entry;
import oracle.pgx.api.*;
 
Analyst analyst = session.createAnalyst();
VertexProperty<Integer, Double> rank = analyst.pagerank(graph, 0.001, 0.85, 100);
for (Entry<Integer, Double> entry : rank.getTopKValues(3)) {
 System.out.println(entry.getKey() + "=" + entry.getValue());
}

3.7 Using Custom PGX Graph Algorithms

A custom PGX graph algorithm allows you to write a graph algorithm in Java and have it automatically compiled to an efficient parallel implementation.

For more detailed information than appears in the following subtopics, see the PGX Algorithm specification at https://docs.oracle.com/cd/E56133_01/latest/PGX_Algorithm_Language_Specification.pdf.

3.7.1 Writing a Custom PGX Algorithm

A PGX algorithm is a regular .java file with a single class definition that is annotated with @GraphAlgorithm. For example:

import oracle.pgx.algorithm.annotations.GraphAlgorithm;

@GraphAlgorithm
public class MyAlgorithm {
    ...
}

A PGX algorithm class must contain exactly one public method, which is used as the entry point. For example:

import oracle.pgx.algorithm.PgxGraph;
import oracle.pgx.algorithm.VertexProperty;
import oracle.pgx.algorithm.annotations.GraphAlgorithm;
import oracle.pgx.algorithm.annotations.Out;

@GraphAlgorithm
public class MyAlgorithm {
    public int myAlgorithm(PgxGraph g, @Out VertexProperty<Integer> distance) {
        System.out.println("My first PGX Algorithm program!");

        return 42;
    }
}

As with normal Java methods, a PGX algorithm method can return a value (an integer in this example). More interesting is the @Out annotation, which marks the vertex property distance as an output parameter. The caller passes output parameters by reference. This way, the caller has a reference to the modified property after the algorithm terminates.

3.7.1.1 Collections

To create a collection you call the .create() function. For example, a VertexProperty<Integer> is created as follows:

VertexProperty<Integer> distance = VertexProperty.create();

To get the value of a property at a certain vertex v:

distance.get(v);

Similarly, to set the property of a certain vertex v to a value e:

distance.set(v, e);

You can even create properties of collections:

VertexProperty<VertexSequence> path = VertexProperty.create();

However, PGX Algorithm assignments are always by value (as opposed to by reference). To make this explicit, you must call .clone() when assigning a collection:

VertexSequence sequence = path.get(v).clone();

Another consequence of pass-by-value semantics is that you can check for equality using the == operator instead of the Java method .equals(). For example:

PgxVertex v1 = G.getRandomVertex();
PgxVertex v2 = G.getRandomVertex();
System.out.println(v1 == v2);
3.7.1.2 Iteration

The most common operations in PGX algorithms are iterations (such as looping over all vertices, and looping over a vertex's neighbors) and graph traversal (such as breadth-first/depth-first). All collections expose forEach and forSequential methods by which you can iterate over the collection in parallel and in sequence, respectively.

For example:

  • To iterate over a graph's vertices in parallel:
    G.getVertices().forEach(v -> {
        ...
    });
    
  • To iterate over a graph's vertices in sequence:
    G.getVertices().forSequential(v -> {
        ...
    });
    
  • To traverse a graph's vertices from r in breadth-first order:
    import oracle.pgx.algorithm.Traversal;
    
    Traversal.inBFS(G, r).forward(n -> {
        ...
    });
    

    Inside the forward (or backward) lambda you can access the current level of the BFS (or DFS) traversal by calling currentLevel().

3.7.1.3 Reductions

Within these parallel blocks it is common to atomically update, or reduce to, a variable defined outside the lambda. These atomic reductions are available as methods on Scalar<T>: reduceAdd, reduceMul, reduceAnd, and so on. For example, to count the number of vertices in a graph:

public int countVertices() {
    Scalar<Integer> count = Scalar.create(0);

    G.getVertices().forEach(n -> {
        count.reduceAdd(1);
    });

    return count.get();
}

Sometimes you want to update multiple values atomically. For example, you might want to find the smallest property value as well as the vertex whose property value attains this smallest value. Due to the parallel execution, two separate reduction statements might get you in an inconsistent state.

To solve this problem the Reductions class provides argMin and argMax functions. The first argument to argMin is the current value and the second argument is the potential new minimum. Additionally, you can chain andUpdate calls on the ArgMinMax object to indicate other variables and the values that they should be updated to (atomically). For example:

VertexProperty<Integer> rank = VertexProperty.create();
int minRank = Integer.MAX_VALUE;
PgxVertex minVertex = PgxVertex.NONE;

G.getVertices().forEach(n ->
    argMin(minRank, rank.get(n)).andUpdate(minVertex, n)
);
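The need for a combined reduction can be seen outside the PGX runtime as well. The following is a hypothetical plain-Java analogy (not the PGX Reductions API): keeping the value and its index together in a single reduction step means the pair stays consistent even under parallel execution, whereas two independent updates could interleave:

```java
import java.util.stream.IntStream;

// Plain-Java analogy of argMin: find the index of the smallest value while
// keeping value and index consistent in one associative reduction.
public class ArgMinSketch {
    public static int argMin(double[] rank) {
        return IntStream.range(0, rank.length)
                .boxed()
                .parallel()
                // comparing indices through their values keeps the
                // (value, index) pair consistent in each combine step
                .reduce((a, b) -> rank[a] <= rank[b] ? a : b)
                .orElse(-1); // -1 signals an empty input
    }

    public static void main(String[] args) {
        double[] rank = {0.4, 0.1, 0.7, 0.2};
        System.out.println(argMin(rank)); // prints 1
    }
}
```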

3.7.2 Compiling and Running a PGX Algorithm

To be able to compile and run a custom PGX algorithm, you must perform several actions:

  1. Set two configuration parameters in the conf/pgx.conf file:
    • Set the graph_algorithm_language option to JAVA.
    • Set the java_home_dir option to the path to your Java home (use <system-java-home-dir> to have PGX infer Java home from the system properties).
  2. Create a session (either implicitly in the shell or explicitly in Java). For example:
    cd $PGX_HOME
    ./bin/opg
    
  3. Compile a PGX Algorithm. For example:
    myAlgorithm = session.compileProgram("/path/to/MyAlgorithm.java")
  4. Run the algorithm. For example:
    graph = session.readGraphWithProperties("/path/to/config.edge.json")
    property = graph.createVertexProperty(PropertyType.INTEGER)
    myAlgorithm.run(graph, property)
    
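Step 1 above corresponds to a conf/pgx.conf fragment like the following. This is a sketch showing only the two options named in step 1; your file will contain other settings as well:

```json
{
  "graph_algorithm_language": "JAVA",
  "java_home_dir": "<system-java-home-dir>"
}
```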

3.7.3 Example Custom PGX Algorithm: PageRank

The following is an implementation of PageRank as a PGX algorithm:

import oracle.pgx.algorithm.PgxGraph;
import oracle.pgx.algorithm.Scalar;
import oracle.pgx.algorithm.VertexProperty;
import oracle.pgx.algorithm.annotations.GraphAlgorithm;
import oracle.pgx.algorithm.annotations.Out;

@GraphAlgorithm
public class Pagerank {
  public void pagerank(PgxGraph G, double tol, double damp, int max_iter, boolean norm, @Out VertexProperty<Double> rank) {
    Scalar<Double> diff = Scalar.create();
    int cnt = 0;
    double N = G.getNumVertices();

    rank.setAll(1 / N);
    do {
      diff.set(0.0);
      Scalar<Double> dangling_factor = Scalar.create(0d);

      if (norm) {
        dangling_factor.set(damp / N * G.getVertices().filter(v -> v.getOutDegree() == 0).sum(rank::get));
      }

      G.getVertices().forEach(t -> {
        double in_sum = t.getInNeighbors().sum(w -> rank.get(w) / w.getOutDegree());
        double val = (1 - damp) / N + damp * in_sum + dangling_factor.get();
        diff.reduceAdd(Math.abs(val - rank.get(t)));
        rank.setDeferred(t, val);
      });
      cnt++;
    } while (diff.get() > tol && cnt < max_iter);
  }
}
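To see the same update rule outside the PGX runtime, here is a self-contained plain-Java sketch. It mirrors the damping, tolerance, and maximum-iteration parameters of the listing above, but for brevity it omits the dangling-vertex normalization (the norm branch), so it assumes every vertex has at least one out-edge:

```java
import java.util.Arrays;

// Plain-Java sketch of the PageRank update rule shown above (no PGX runtime,
// no dangling-vertex normalization).
public class PagerankSketch {
    // adj[v] lists the out-neighbors of vertex v; every vertex needs out-edges
    public static double[] pagerank(int[][] adj, double tol, double damp, int maxIter) {
        int n = adj.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);             // rank.setAll(1 / N)
        double diff;
        int cnt = 0;
        do {
            double[] next = new double[n];
            Arrays.fill(next, (1 - damp) / n);  // teleport term
            for (int v = 0; v < n; v++) {
                for (int w : adj[v]) {
                    // push rank along out-edges, split by out-degree
                    next[w] += damp * rank[v] / adj[v].length;
                }
            }
            diff = 0;
            for (int v = 0; v < n; v++) {
                diff += Math.abs(next[v] - rank[v]);
            }
            rank = next;                        // setDeferred: swap in new values
            cnt++;
        } while (diff > tol && cnt < maxIter);
        return rank;
    }

    public static void main(String[] args) {
        // A 3-cycle is symmetric, so every vertex converges to rank 1/3.
        int[][] cycle = { {1}, {2}, {0} };
        System.out.println(Arrays.toString(pagerank(cycle, 0.001, 0.85, 100)));
    }
}
```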

3.8 Creating Subgraphs

You can create subgraphs based on a graph that has been loaded into memory. You can use filter expressions or create bipartite subgraphs based on a vertex (node) collection that specifies the left set of the bipartite graph.

For information about reading a graph into memory, see Store the Database Password in a Keystore.

3.8.1 About Filter Expressions

Filter expressions are expressions that are evaluated for each edge. The expression can define predicates that an edge must fulfil to be contained in the result, in this case a subgraph.

Consider an example graph that consists of four vertices (nodes) and four edges. For an edge to match the filter expression src.prop == 10, the source vertex prop property must equal 10. Two edges match that filter expression, as shown in the following figure.

Figure 3-1 Edges Matching src.prop == 10

Description of Figure 3-1 follows
Description of "Figure 3-1 Edges Matching src.prop == 10"

The following figure shows the graph that results when the filter is applied. The filter excludes the edges associated with vertex 333, and the vertex itself.

Figure 3-2 Graph Created by the Simple Filter

Description of Figure 3-2 follows
Description of "Figure 3-2 Graph Created by the Simple Filter"

Using filter expressions to select a single vertex or a set of vertices is difficult. For example, selecting only the vertex with the property value 10 is impossible, because the only way to match the vertex is to match an edge where 10 is either the source or destination property value. However, when you match an edge you automatically include the source vertex, destination vertex, and the edge itself in the result.

3.8.2 Using a Simple Filter to Create a Subgraph

The following examples create the subgraph described in About Filter Expressions.

Using the Shell to Create a Subgraph

subgraph = graph.filter(new EdgeFilter("src.prop == 10"))

Using Java to Create a Subgraph

import oracle.pgx.api.*;
import oracle.pgx.api.filter.*;

PgxGraph graph = session.readGraphWithProperties(...);
PgxGraph subgraph = graph.filter(new EdgeFilter("src.prop == 10"));

3.8.3 Using a Complex Filter to Create a Subgraph

This example uses a slightly more complex filter. It uses the outDegree function, which calculates the number of outgoing edges for an identifier (source src or destination dst). The following filter expression matches all edges with a cost property value greater than 50 and a destination vertex (node) with an outDegree greater than 1.

dst.outDegree() > 1 && edge.cost > 50

One edge in the sample graph matches this filter expression, as shown in the following figure.

Figure 3-3 Edges Matching the outDegree Filter

Description of Figure 3-3 follows
Description of "Figure 3-3 Edges Matching the outDegree Filter"

The following figure shows the graph that results when the filter is applied. The filter excludes the edges associated with vertices 99 and 1908, and so excludes those vertices also.

Figure 3-4 Graph Created by the outDegree Filter

Description of Figure 3-4 follows
Description of "Figure 3-4 Graph Created by the outDegree Filter"

3.8.4 Using a Vertex Set to Create a Bipartite Subgraph

You can create a bipartite subgraph by specifying a set of vertices (nodes), which are used as the left side. A bipartite subgraph has edges only between the left set of vertices and the right set of vertices. There are no edges within those sets, such as between two nodes on the left side. In the in-memory analyst, vertices that are isolated because all incoming and outgoing edges were deleted are not part of the bipartite subgraph.

The following figure shows a bipartite subgraph. No properties are shown.

The following examples create a bipartite subgraph from the simple graph shown in About Filter Expressions. They create a vertex collection and fill it with the vertices for the left side.

Using the Shell to Create a Bipartite Subgraph

opg> s = graph.createVertexSet()
==> ...
opg> s.addAll([graph.getVertex(333), graph.getVertex(99)])
==> ...
opg> s.size()
==> 2
opg> bGraph = graph.bipartiteSubGraphFromLeftSet(s)
==> PGX Bipartite Graph named sample-sub-graph-4

Using Java to Create a Bipartite Subgraph

import oracle.pgx.api.*;
 
VertexSet<Integer> s = graph.createVertexSet();
s.addAll(graph.getVertex(333), graph.getVertex(99));
BipartiteGraph bGraph = graph.bipartiteSubGraphFromLeftSet(s);

When you create a subgraph, the in-memory analyst automatically creates a Boolean vertex (node) property that indicates whether the vertex is on the left side. You can specify a unique name for the property.

The resulting bipartite subgraph looks like this:

Vertex 1908 is excluded from the bipartite subgraph. The only edge that connected that vertex extended from 128 to 1908. The edge was removed, because it violated the bipartite properties of the subgraph. Vertex 1908 had no other edges, and so was removed also.

3.9 Using Automatic Delta Refresh to Handle Database Changes

You can automatically refresh (auto-refresh) graphs periodically to keep the in-memory graph synchronized with changes to the property graph stored in the property graph tables in Oracle Database (VT$ and GE$ tables).

Note that the auto-refresh feature is not supported when loading a graph into PGX in memory directly from relational tables.

3.9.1 Configuring the In-Memory Server for Auto-Refresh

Because auto-refresh can create many snapshots and therefore may lead to high memory usage, by default the option to enable auto-refresh for graphs is available only to administrators.

To allow all users to auto-refresh graphs, you must add the following line to the in-memory analyst configuration file (located in $ORACLE_HOME/md/property_graph/pgx/conf/pgx.conf):

{
  "allow_user_auto_refresh": true
}

3.9.2 Configuring Basic Auto-Refresh

Auto-refresh is configured in the loading section of the graph configuration. The example in this topic sets up auto-refresh to check for updates every minute, and to create a new snapshot when the data source has changed.

The following block (JSON format) enables the auto-refresh feature in the configuration file of the sample graph:

{
  "format": "pg",
  "jdbc_url": "jdbc:oracle:thin:@mydatabaseserver:1521/dbName",
  "username": "scott",
  "password": "<password>",
  "name": "my_graph",
  "vertex_props": [{
    "name": "prop",
    "type": "integer"
  }],
  "edge_props": [{
    "name": "cost",
    "type": "double"
  }],
  "separator": " ",
  "loading": {
    "auto_refresh": true,
    "update_interval_sec": 60
  }
}

Notice the additional loading section containing the auto-refresh settings. You can also use the Java APIs to construct the same graph configuration programmatically:

GraphConfig config = GraphConfigBuilder.forPropertyGraphRdbms()
  .setJdbcUrl("jdbc:oracle:thin:@mydatabaseserver:1521/dbName")
  .setUsername("scott")
  .setPassword("<password>")
  .setName("my_graph")
  .addVertexProperty("prop", PropertyType.INTEGER)
  .addEdgeProperty("cost", PropertyType.DOUBLE)
  .setAutoRefresh(true)
  .setUpdateIntervalSec(60)
  .build();

3.9.3 Reading the Graph Using the In-Memory Analyst or a Java Application

After creating the graph configuration, you can load the graph into the in-memory analyst using the regular APIs.

opg> G = session.readGraphWithProperties("graphs/my-config.pg.json")

After the graph is loaded, a background task is started automatically, and it periodically checks the data source for updates.

3.9.4 Checking Out a Specific Snapshot of the Graph

The database is queried every minute for updates. If the graph has changed in the database after the time interval passed, the graph is reloaded and a new snapshot is created in-memory automatically.

You can "check out" (move a pointer to a different version of) the available in-memory snapshots of the graph using the getAvailableSnapshots() method of PgxSession. Example output is as follows:

opg> session.getAvailableSnapshots(G)
==> GraphMetaData [getNumVertices()=4, getNumEdges()=4, memoryMb=0, dataSourceVersion=1453315103000, creationRequestTimestamp=1453315122669 (2016-01-20 10:38:42.669), creationTimestamp=1453315122685 (2016-01-20 10:38:42.685), vertexIdType=integer, edgeIdType=long]
==> GraphMetaData [getNumVertices()=5, getNumEdges()=5, memoryMb=3, dataSourceVersion=1452083654000, creationRequestTimestamp=1453314938744 (2016-01-20 10:35:38.744), creationTimestamp=1453314938833 (2016-01-20 10:35:38.833), vertexIdType=integer, edgeIdType=long]

The preceding example output contains two entries, one for the originally loaded graph with 4 vertices and 4 edges, and one for the graph created by auto-refresh with 5 vertices and 5 edges.

To check out a specific snapshot of the graph, use the setSnapshot() method of PgxSession and give it the creationTimestamp of the snapshot you want to load.

For example, if G is pointing to the newer graph with 5 vertices and 5 edges, but you want to analyze the older version of the graph, you need to set the snapshot to 1453315122685. In the in-memory analyst shell:

opg> G.getNumVertices()
==> 5
opg> G.getNumEdges()
==> 5

opg> session.setSnapshot( G, 1453315122685 )
==> null

opg> G.getNumVertices()
==> 4
opg> G.getNumEdges()
==> 4

You can also load a specific snapshot of a graph directly using the readGraphAsOf() method of PgxSession. This is a shortcut for loading a graph with readGraphWithProperties() followed by a setSnapshot() call. For example:

opg> G = session.readGraphAsOf( config, 1453315122685 )

If you do not know or care which snapshots are currently available in memory, you can instead specify how "old" a snapshot is acceptable by giving a maximum allowed age. For example, to specify a maximum snapshot age of 60 minutes, you can use the following:

opg> G = session.readGraphWithProperties( config, 60l, TimeUnit.MINUTES )

If there are one or more snapshots in memory younger (newer) than the specified maximum age, the youngest (newest) of those snapshots will be returned. If all the available snapshots are older than the specified maximum age, or if there is no snapshot available at all, then a new snapshot will be created automatically.
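The snapshot-selection rule just described can be sketched as follows. This is a hypothetical illustration of the rule, not the PgxSession implementation; returning null stands in for "no suitable snapshot, create a new one":

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the max-age snapshot selection rule: return the
// creation timestamp of the newest snapshot within the allowed age, or
// null to signal that a fresh snapshot must be created.
public class SnapshotPick {
    public static Long pick(List<Long> creationTimestamps,
                            long maxAge, TimeUnit unit, long nowMillis) {
        long cutoff = nowMillis - unit.toMillis(maxAge);
        return creationTimestamps.stream()
                .filter(ts -> ts >= cutoff)   // young enough?
                .max(Long::compare)           // youngest (newest) of those
                .orElse(null);                // none: caller loads a new snapshot
    }
}
```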

3.9.5 Advanced Auto-Refresh Configuration

You can specify advanced options for auto-refresh configuration.

Internally, the in-memory analyst fetches the changes since the last check from the database and creates a new snapshot by applying the delta (changes) to the previous snapshot. There are two timers: one for fetching and caching the deltas from the database, the other for actually applying the deltas and creating a new snapshot.

Additionally, you can specify a threshold for the number of cached deltas. If the number of cached changes grows above this threshold, a new snapshot is created automatically. The number of cached changes is a simple sum of the number of vertex changes plus the number of edge changes.

The deltas are fetched periodically and cached on the in-memory analyst server for two reasons:

  • To speed up the actual snapshot creation process

  • To account for the case that the database can "forget" changes after a while

You can specify both a threshold and an update timer, which means that both conditions are checked before a new snapshot is created. At least one of these parameters (threshold or update timer) must be specified to prevent the delta cache from becoming too large. The interval at which the source is queried for changes must not be omitted.

The following parameters show a configuration where the data source is queried for new deltas every 5 minutes. New snapshots are created every 20 minutes or if the cached deltas reach a size of 1000 changes.

{
  "format": "pg",
  "jdbc_url": "jdbc:oracle:thin:@mydatabaseserver:1521/dbName",
  "username": "scott",
  "password": "<your_password>",
  "name": "my_graph",

  "loading": {
    "auto_refresh": true,
    "fetch_interval_sec": 300,
    "update_interval_sec": 1200,
    "update_threshold": 1000,
    "create_edge_id_index": true,
    "create_edge_id_mapping": true
  }
}

3.10 Starting the In-Memory Analyst Server

A preconfigured version of Apache Tomcat is bundled, which allows you to start the in-memory analyst server by running a script.

If you need to configure the server before starting it, see Configuring the In-Memory Analyst Server.

PGX is integrated with systemd so that it can run as a Linux service in the background.

To start the PGX server as a daemon process, run the following command as a root user or with sudo:

sudo systemctl start pgx

To stop the server, run the following command as a root user or with sudo:

sudo systemctl stop pgx

If the server does not start up, you can see if there are any errors by running:

journalctl -u pgx.service

For more information about how to interact with systemd on Oracle Linux, see the Oracle Linux administrator's documentation.

3.10.1 Configuring the In-Memory Analyst Server

You can configure the in-memory analyst server by modifying the /etc/oracle/graph/server.conf file. The following table shows the valid configuration options, which can be specified in JSON format.

Table 3-4 Configuration Options for In-Memory Analyst Server

Option Type Description Default
authorization string File that maps clients to roles for authorization. server.auth.conf
ca_certs array of string List of trusted certificates (PEM format). If 'enable_tls' is set to false, this option has no effect. [See information after this table.]
enable_client_authentication boolean If true, the client is authenticated during the TLS handshake. See the TLS protocol for details. This flag has no effect if 'enable_tls' is false. true
enable_tls boolean If true, the server enables transport layer security (TLS). true
port integer Port that the PGX server should listen on. 7007
server_cert string The path to the server certificate to be presented to TLS clients (PEM format). If 'enable_tls' is set to false, this option has no effect. null
server_private_key string The private key of the server (PKCS#8, PEM format). If 'enable_tls' is set to false, this option has no effect. null

The in-memory analyst web server enables two-way SSL/TLS (Transport Layer Security) by default. The server enforces TLS 1.2 and disables certain cipher suites known to be vulnerable to attacks. Upon a TLS handshake, both the server and the client present certificates to each other, which are used to validate the authenticity of the other party. Client certificates are also used to authorize client applications.

The following is an example server.conf configuration file:

{ 
  "port": 7007, 
  "server_cert": "certificates/server_certificate.pem", 
  "server_private_key": "certificates/server_key.pem", 
  "ca_certs": [ "certificates/ca_certificate.pem" ], 
  "authorization": "auth/server.auth.conf",
  "enable_tls": true,
  "enable_client_authentication": true
}

The following is an example server.auth.conf configuration file, mapping clients (applications) identified by their certificate DN strings to roles:

{ 
  "authorization": [{
    "dn": "CN=Client, OU=Development, O=Oracle, L=Belmont, ST=California, C=US", 
   "admin": false
  }, {
    "dn": "CN=Admin, OU=Development, O=Oracle, L=Belmont, ST=California, C=US", 
   "admin": true
  }]
}

You can turn off client-side authentication or SSL/TLS authentication entirely in the server configuration. However, we recommend having two-way SSL/TLS enabled for any production usage.

3.11 Deploying to Apache Tomcat

The example in this topic shows how to deploy the graph server as a web application with Apache Tomcat.

The graph server will work with Apache Tomcat 9.0.x and higher.

  1. Download the Oracle Graph Webapps zip file from Oracle Software Delivery Cloud. This file contains ready-to-deploy Java web application archives (.war files). The file name will be similar to this: oracle-graph-webapps-<version>.zip
  2. Unzip the file into a directory of your choice.
  3. Locate the .war file for Tomcat. It follows the naming pattern: graph-server-<version>-pgx<version>-tomcat.war
  4. Configure the graph server.
    1. Modify authentication and other server settings by modifying the WEB-INF/classes/pgx.conf file inside the web application archive.
    2. Optionally, change logging settings by modifying the WEB-INF/classes/log4j2.xml file inside the web application archive.
    3. Optionally, change other servlet specific deployment descriptors by modifying the WEB-INF/web.xml file inside the web application archive.
  5. Copy the .war file into the Tomcat webapps directory. For example:
    cp graph-server-<version>-pgx<version>-tomcat.war $CATALINA_HOME/webapps/pgx.war
  6. Configure Tomcat-specific settings, such as the correct use of TLS/encryption.
  7. Ensure that port 8080 is not already in use.
  8. Start Tomcat:
    cd $CATALINA_HOME 
    ./bin/startup.sh 

    The graph server will now listen on localhost:8080/pgx.

    You can connect to the server from JShell by running the following command:
    $ <client_install_dir>/bin/opg-jshell --base_url https://localhost:8080/pgx -u <graphuser>

3.11.1 About the Authentication Mechanism

The in-memory analyst web deployment uses BASIC Auth by default. You should change to a more secure authentication mechanism for a production deployment.

To change the authentication mechanism, modify the security-constraint element of the web.xml deployment descriptor in the web application archive (WAR) file.
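For orientation, a security-constraint and login-config pair in web.xml follows the standard servlet deployment descriptor shape shown below. This is a generic sketch, not the descriptor shipped in the WAR; the resource name, URL pattern, role, and realm are placeholder values:

```xml
<security-constraint>
  <web-resource-collection>
    <web-resource-name>PGX endpoints</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>users</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <!-- Replace BASIC with a stronger mechanism (for example FORM or
       CLIENT-CERT) for a production deployment -->
  <auth-method>BASIC</auth-method>
  <realm-name>PGX</realm-name>
</login-config>
```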

3.12 Deploying to Oracle WebLogic Server

The example in this topic shows how to deploy the graph server as a web application with Oracle WebLogic Server.

The graph server supports Oracle WebLogic Server versions 12.1.x and 12.2.x.

  1. Download the Oracle Graph Webapps zip file from Oracle Software Delivery Cloud. This file contains ready-to-deploy Java web application archives (.war files). The file name will be similar to this: oracle-graph-webapps-<version>.zip
  2. Unzip the file into a directory of your choice
  3. Locate the .war file for WebLogic Server.
    1. For WebLogic Server version 12.1.x, use this web application archive: graph-server-<version>-pgx<version>-wls121x.war
    2. For WebLogic Server version 12.2.x, use this web application archive: graph-server-<version>-pgx<version>-wls122x.war
  4. Configure the graph server.
    1. Modify authentication and other server settings by modifying the WEB-INF/classes/pgx.conf file inside the web application archive.
    2. Optionally, change logging settings by modifying the WEB-INF/classes/log4j2.xml file inside the web application archive.
    3. Optionally, change other servlet specific deployment descriptors by modifying the WEB-INF/web.xml file inside the web application archive.
    4. Optionally, change WebLogic Server-specific deployment descriptors by modifying the WEB-INF/weblogic.xml file inside the web application archive.
  5. Configure WebLogic specific settings, like the correct use of TLS/encryption.
  6. Deploy the .war file to WebLogic Server. The following example shows how to do this from the command line:
    . $MW_HOME/user_projects/domains/mydomain/bin/setDomainEnv.sh
    . $MW_HOME/wlserver/server/bin/setWLSEnv.sh
    java weblogic.Deployer -adminurl http://localhost:7001 -username <username> -password <password> -deploy -source <path-to-war-file>
    

3.12.1 Installing Oracle WebLogic Server

To download and install the latest version of Oracle WebLogic Server, see

http://www.oracle.com/technetwork/middleware/weblogic/documentation/index.html

3.13 Connecting to the In-Memory Analyst Server

After the property graph in-memory analyst is installed on a computer running Oracle Database, or on a client system without Oracle Database server software as a web application on Apache Tomcat or Oracle WebLogic Server, you can connect to the in-memory analyst server.

3.13.1 Connecting with the In-Memory Analyst Shell

The simplest way to connect to an in-memory analyst instance is to specify the base URL of the server. The following base URL can connect the SCOTT user to the local instance listening on port 8080:

http://scott:<password>@localhost:8080/pgx

To start the in-memory analyst shell with this base URL, use the --base_url command-line argument:

cd $PGX_HOME
./bin/opg-jshell --base_url http://scott:<password>@localhost:8080/pgx

You can connect to a remote instance the same way. However, the in-memory analyst currently does not provide remote support for the Control API.

3.13.1.1 About Logging HTTP Requests

The in-memory analyst shell suppresses all debugging messages by default. To see which HTTP requests are executed, set the log level for oracle.pgx to DEBUG, as shown in this example:

opg> /loglevel oracle.pgx DEBUG
===> log level of oracle.pgx logger set to DEBUG
opg> session.readGraphWithProperties("sample_http.adj.json", "sample")
10:24:25,056 [main] DEBUG RemoteUtils - Requesting POST http://scott:<password>@localhost:8080/pgx/core/session/session-shell-6nqg5dd/graph HTTP/1.1 with payload {"graphName":"sample","graphConfig":{"uri":"http://path.to.some.server/pgx/sample.adj","separator":" ","edge_props":[{"type":"double","name":"cost"}],"node_props":[{"type":"integer","name":"prop"}],"format":"adj_list"}}
10:24:25,088 [main] DEBUG RemoteUtils - received HTTP status 201
10:24:25,089 [main] DEBUG RemoteUtils - {"futureId":"87d54bed-bdf9-4601-98b7-ef632ce31463"}
10:24:25,091 [pool-1-thread-3] DEBUG PgxRemoteFuture$1 - Requesting GET http://scott:<password>@localhost:8080/pgx/future/session/session-shell-6nqg5dd/result/87d54bed-bdf9-4601-98b7-ef632ce31463 HTTP/1.1
10:24:25,300 [pool-1-thread-3] DEBUG RemoteUtils - received HTTP status 200
10:24:25,301 [pool-1-thread-3] DEBUG RemoteUtils - {"stats":{"loadingTimeMillis":0,"estimatedMemoryMegabytes":0,"numEdges":4,"numNodes":4},"graphName":"sample","nodeProperties":{"prop":"integer"},"edgeProperties":{"cost":"double"}}

This example requires that the graph URI points to a file that the in-memory analyst server can access using HTTP or HDFS.

3.13.2 Connecting with Java

You can specify the base URL when you initialize the in-memory analyst using Java. In the following example, the URL of an in-memory analyst server is passed to the Pgx.getInstance API call.

import oracle.pg.rdbms.*;
import oracle.pgx.api.*;
 
PgRdbmsGraphConfig cfg = GraphConfigBuilder.forPropertyGraphRdbms()
   .setJdbcUrl("jdbc:oracle:thin:@127.0.0.1:1521:orcl")
   .setUsername("scott").setPassword("<password>")
   .setName("mygraph")
   .setMaxNumConnections(2)
   .setLoadEdgeLabel(false)
   .addVertexProperty("name", PropertyType.STRING, "default_name")
   .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")
   .build();
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(cfg);
ServerInstance remoteInstance = Pgx.getInstance("http://scott:<password>@hostname:port/pgx");
PgxSession session = remoteInstance.createSession("my-session");
 
PgxGraph graph = session.readGraphWithProperties(opg.getConfig());

3.13.3 Connecting with the PGX REST API

You can connect to an in-memory analyst instance using the PGX REST API endpoints. This enables you to interact with the in-memory analyst from a language other than Java and to implement your own client.

The examples in this topic assume that:

  • Linux with curl installed. (curl is a simple command-line utility for interacting with REST endpoints.)
  • The PGX server is up and running on http://localhost:7007.
  • The PGX server has authentication/authorization disabled; that is, $ORACLE_HOME/md/property_graph/pgx/conf/server.conf contains "enable_tls": false. (This is a non-default setting and not recommended for production).
  • PGX allows reading graphs from the local file system; that is, $ORACLE_HOME/md/property_graph/pgx/conf/pgx.conf contains "allow_local_filesystem": true. (This is a non-default setting and not recommended for production).

You can see a full list of supported endpoints in JSON by opening the Swagger specification at http://localhost:7007/swagger.json in your browser.

Step 1: Obtain a CSRF token

Request a CSRF token:

curl -v http://localhost:7007/token

The response will look like this:

*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 7007 (#0)
> GET /token HTTP/1.1
> Host: localhost:7007
> User-Agent: curl/7.47.0
> Accept: */*
> 
< HTTP/1.1 201
< SET-COOKIE: _csrf_token=9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0;Version=1; HttpOnly
< Content-Length: 0

As you can see in the response, the request sets the cookie _csrf_token to a token value. The token 9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0 is used as an example in the following requests. For any write request, the PGX server requires the same token to be present in both the cookie and the payload.
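Extracting the token from the SET-COOKIE header is straightforward in any client language. The following is a minimal, dependency-free Java sketch; the helper class and method names are illustrative, not part of the PGX API, and the header layout follows the example response above.

```java
// Illustrative helper (not a PGX API class): pull the _csrf_token value out of
// a Set-Cookie header such as "_csrf_token=<uuid>;Version=1; HttpOnly".
public class CsrfCookieParser {
    public static String extractCsrfToken(String setCookieHeader) {
        // Cookie attributes are separated by semicolons; the token is the
        // name=value pair whose name is _csrf_token.
        for (String part : setCookieHeader.split(";")) {
            String p = part.trim();
            if (p.startsWith("_csrf_token=")) {
                return p.substring("_csrf_token=".length());
            }
        }
        return null; // no CSRF token cookie present
    }
}
```

A real client would read the header from its HTTP library's response object and then send the same value back in both the Cookie header and the JSON payload of write requests.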

Step 2: Create a session

To create a new session, send a JSON payload:

curl -v --cookie '_csrf_token=9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0' -H 'content-type: application/json' -X POST http://localhost:7007/core/v1/sessions -d '{"source":"my-application", "idleTimeout":0, "taskTimeout":0, "timeUnitName":"MILLISECONDS", "_csrf_token":"9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0"}'

Replace my-application with a value describing the application that you are running. This value can be used by server administrators to map sessions to their applications. Setting idle and task timeouts to 0 means the server will determine when the session and submitted tasks time out. You must provide the same CSRF token in both the cookie header and the JSON payload.

The response will look similar to the following:

*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 7007 (#0)
> POST /core/v1/sessions HTTP/1.1
> Host: localhost:7007
> User-Agent: curl/7.47.0
> Accept: */*
> Cookie: _csrf_token=9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0
> content-type: application/json
> Content-Length: 159
> 
* upload completely sent off: 159 out of 159 bytes
< HTTP/1.1 201
< SET-COOKIE: SID=abae2811-6dd2-48b0-93a8-8436e078907d;Version=1; HttpOnly
< Content-Length: 0

The response sets a cookie to the ID of the newly created session. The session ID abae2811-6dd2-48b0-93a8-8436e078907d is used as an example in subsequent requests.

Step 3: Read a graph

Note:

If you want to analyze a pre-loaded graph or a graph that is already published by another session, you can skip this step. All you need to access pre-loaded or published graphs is the name of the graph.

To read a graph, send the graph configuration as JSON to the server as shown in the following example (replace <graph-config> with the JSON representation of an actual PGX graph config).

curl -v -X POST --cookie '_csrf_token=9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0;SID=abae2811-6dd2-48b0-93a8-8436e078907d'  http://localhost:7007/core/v1/loadGraph -H 'content-type: application/json'  -d  '{"graphConfig":<graph-config>,"graphName":null,"_csrf_token":"9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0"}'

Here is an example of a graph config that reads a property graph from the Oracle database:

{
  "format": "pg",
  "db_engine": "RDBMS",
  "jdbc_url":"jdbc:oracle:thin:@127.0.0.1:1521:orcl122",
  "username":"scott",
  "password":"tiger",
  "max_num_connections": 8,
  "name": "connections",
  "vertex_props": [
    {"name":"name", "type":"string"},
    {"name":"role", "type":"string"},
    {"name":"occupation", "type":"string"},
    {"name":"country", "type":"string"},
    {"name":"political party", "type":"string"},
    {"name":"religion", "type":"string"}
  ],
  "edge_props": [
    {"name":"weight", "type":"double", "default":"1"}
  ],
  "edge_label": true,
  "loading": {
    "load_edge_label": true
  }
}

Passing "graphName": null tells the server to generate a name.
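The loadGraph payload above can be assembled programmatically. The following Java sketch builds the request body as a plain string to stay dependency-free; the class and method names are hypothetical, and a real client would use a JSON library instead.

```java
// Illustrative helper (not a PGX API class): build the loadGraph request body.
// Passing a null graphName emits the JSON literal null, which asks the server
// to generate a graph name.
public class LoadGraphPayload {
    public static String build(String graphConfigJson, String graphName, String csrfToken) {
        String nameField = (graphName == null) ? "null" : "\"" + graphName + "\"";
        return "{\"graphConfig\":" + graphConfigJson
             + ",\"graphName\":" + nameField
             + ",\"_csrf_token\":\"" + csrfToken + "\"}";
    }
}
```

For example, build(config, null, token) produces a body of the same shape as the curl -d argument shown in this step.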

The server will reply with something like the following:

* upload completely sent off: 315 out of 315 bytes
< HTTP/1.1 202
< Location: http://localhost:7007/core/v1/futures/8a46ef65-01a9-4bd0-87d3-ffe9dfd2ce3c/status
< Content-Type: application/json;charset=utf-8
< Content-Length: 51
< Date: Mon, 05 Nov 2018 17:22:22 GMT
<
* Connection #0 to host localhost left intact
{"futureId":"8a46ef65-01a9-4bd0-87d3-ffe9dfd2ce3c"}

About Asynchronous Requests

Most of the PGX REST endpoints are asynchronous. Instead of keeping the connection open until the result is ready, the PGX server submits a task and immediately returns a future ID with status code 202, which the client can then use to periodically request the status of the task, or to request the result value once the task is done.
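The client-side polling loop can be sketched as follows. This is a minimal illustration, not PGX client code: the statusCheck supplier stands in for an HTTP GET against the /core/v1/futures/<future-id>/status endpoint, and the status strings mirror the "progress" values shown in the responses below.

```java
import java.util.function.Supplier;

// Illustrative polling loop: call the status endpoint until the future reaches
// a terminal state, sleeping between attempts (the status response suggests an
// intervalToPoll). The supplier abstracts away the actual HTTP request.
public class FuturePoller {
    public static String pollUntilDone(Supplier<String> statusCheck,
                                       long intervalMillis, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            String status = statusCheck.get();
            if ("succeeded".equals(status) || "failed".equals(status)) {
                return status; // terminal: fetch <future-id>/value or report the error
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        throw new IllegalStateException("future did not complete in time");
    }
}
```

A real client would parse the "progress" field from the status JSON and follow the "related" link to retrieve the value once completed.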

From the preceding response, you can request the future status like this:

curl -v --cookie 'SID=abae2811-6dd2-48b0-93a8-8436e078907d'  http://localhost:7007/core/v1/futures/8a46ef65-01a9-4bd0-87d3-ffe9dfd2ce3c/status

This returns something like the following:

< HTTP/1.1 200
< Content-Type: application/json;charset=utf-8
< Content-Length: 730
< Date: Mon, 05 Nov 2018 17:35:19 GMT
< 
* Connection #0 to host localhost left intact
{"id":"eb17f75b-e4c1-4a66-81a0-4ff0f8b4cb92","links":[{"href":"http://localhost:7007/core/v1/futures/eb17f75b-e4c1-4a66-81a0-4ff0f8b4cb92/status","rel":"self","method":"GET","interaction":["async-polling"]},{"href":"http://localhost:7007/core/v1/futures/eb17f75b-e4c1-4a66-81a0-4ff0f8b4cb92","rel":"abort","method":"DELETE","interaction":["async-polling"]},{"href":"http://localhost:7007/core/v1/futures/eb17f75b-e4c1-4a66-81a0-4ff0f8b4cb92/status","rel":"canonical","method":"GET","interaction":["async-polling"]},{"href":"http://localhost:7007/core/v1/futures/eb17f75b-e4c1-4a66-81a0-4ff0f8b4cb92/value","rel":"related","method":"GET","interaction":["async-polling"]}],"progress":"succeeded","completed":true,"intervalToPoll":1}

Besides the status (succeeded in this case), this output also includes links to cancel the task (DELETE) and to retrieve the result of the task once completed (GET <future-id>/value):

curl -X GET --cookie 'SID=abae2811-6dd2-48b0-93a8-8436e078907d' http://localhost:7007/core/v1/futures/cdc15a38-3422-42a1-baf4-343c140cf95d/value

This returns details about the loaded graph, including the name generated by the server (sample):

{"id":"sample","links":[{"href":"http://localhost:7007/core/v1/graphs/sample","rel":"self","method":"GET","interaction":["async-polling"]},{"href":"http://localhost:7007/core/v1/graphs/sample","rel":"canonical","method":"GET","interaction":["async-polling"]}],"nodeProperties":{"prop1":{"id":"prop1","links":[{"href":"http://localhost:7007/core/v1/graphs/sample/properties/prop1","rel":"self","method":"GET","interaction":["async-polling"]},{"href":"http://localhost:7007/core/v1/graphs/sample/properties/prop1","rel":"canonical","method":"GET","interaction":["async-polling"]}],"dimension":0,"name":"prop1","entityType":"vertex","type":"integer","transient":false}},"vertexLabels":null,"edgeLabel":null,"metaData":{"id":null,"links":null,"numVertices":4,"numEdges":4,"memoryMb":0,"dataSourceVersion":"1536029578000","config":{"format":"adj_list","separator":" ","edge_props":[{"type":"double","name":"cost"}],"error_handling":{},"vertex_props":[{"type":"integer","name":"prop1"}],"vertex_uris":["PATH_TO_FILE"],"vertex_id_type":"integer","loading":{}},"creationRequestTimestamp":1541242100335,"creationTimestamp":1541242100774,"vertexIdType":"integer","edgeIdType":"long","directed":true},"graphName":"sample","edgeProperties":{"cost":{"id":"cost","links":[{"href":"http://localhost:7007/core/v1/graphs/sample/properties/cost","rel":"self","method":"GET","interaction":["async-polling"]},{"href":"http://localhost:7007/core/v1/graphs/sample/properties/cost","rel":"canonical","method":"GET","interaction":["async-polling"]}],"dimension":0,"name":"cost","entityType":"edge","type":"double","transient":false}},"ageMs":0,"transient":false}

For simplicity, the remaining steps omit the additional requests to request the status or value of asynchronous tasks.

Step 4: Create a property

Before you can run the PageRank algorithm on the loaded graph, you must create a vertex property of type DOUBLE on the graph, which can hold the computed ranking values:

curl -v -X POST --cookie '_csrf_token=9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0;SID=abae2811-6dd2-48b0-93a8-8436e078907d'  http://localhost:7007/core/v1/graphs/sample/properties -H 'content-type: application/json'  -d '{"entityType":"vertex","type":"double","name":"pagerank", "hardName":false,"dimension":0,"_csrf_token":"9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0"}'

Requesting the result of the returned future will return something like:

{"id":"pagerank","links":[{"href":"http://localhost:7007/core/v1/graphs/sample/properties/pagerank","rel":"self","method":"GET","interaction":["async-polling"]},{"href":"http://localhost:7007/core/v1/graphs/sample/properties/pagerank","rel":"canonical","method":"GET","interaction":["async-polling"]}],"dimension":0,"name":"pagerank","entityType":"vertex","type":"double","transient":true}

Step 5: Run the PageRank algorithm on the loaded graph

The following example shows how to run an algorithm (PageRank in this case). The algorithm ID is part of the URL, and the parameters to be passed into the algorithm are part of the JSON payload:

curl -v -X POST --cookie '_csrf_token=9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0;SID=abae2811-6dd2-48b0-93a8-8436e078907d' http://localhost:7007/core/v1/analyses/pgx_builtin_k1a_pagerank/run -H  'content-type: application/json' -d  '{"args":[{"type":"GRAPH","value":"sample"},{"type":"DOUBLE_IN","value":0.001},{"type":"DOUBLE_IN","value":0.85},{"type":"INT_IN","value":100},{"type":"BOOL_IN","value":true},{"type":"NODE_PROPERTY","value":"pagerank"}],"expectedReturnType":"void","workloadCharacteristics":["PARALLELISM.PARALLEL"],"_csrf_token":"9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0"}'

Once the future is completed, the result will look something like this:

{"success":true,"canceled":false,"exception":null,"returnValue":null,"executionTimeMs":50}

Step 6: Execute a PGQL query

To query the results of the PageRank algorithm, you can run a PGQL query as shown in the following example:

curl -v -X POST --cookie '_csrf_token=9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0;SID=abae2811-6dd2-48b0-93a8-8436e078907d' http://localhost:7007/core/v1/pgql/run -H 'content-type: application/json'  -d '{"pgqlQuery":"SELECT x.pagerank MATCH (x) WHERE x.pagerank > 0","semantic":"HOMOMORPHISM", "schemaStrictnessMode":true, "graphName" : "sample", "_csrf_token":"9bf51c8f-1c75-455e-9b57-ec3ca1c63cc0"}'

The result is a set of links you can use to interact with the result set of the query:

{"id":"pgql_1","links":[{"href":"http://localhost:7007/core/v1/pgqlProxies/pgql_1","rel":"self","method":"GET","interaction":["sync"]},{"href":"http://localhost:7007/core/v1/pgqlResultProxies/pgql_1/elements","rel":"related","method":"GET","interaction":["sync"]},{"href":"http://localhost:7007/core/v1/pgqlResultProxies/pgql_1/results","rel":"related","method":"GET","interaction":["sync"]},{"href":"http://localhost:7007/core/v1/pgqlProxies/pgql_1","rel":"canonical","method":"GET","interaction":["async-polling"]}],"exists":true,"graphName":"sample","resultSetId":"pgql_1","numResults":4}

To request the first 2048 elements of the result set, send:

curl -X GET  --cookie 'SID=abae2811-6dd2-48b0-93a8-8436e078907d' http://localhost:7007/core/v1/pgqlProxies/pgql_1/results?size=2048

The response looks something like this:

{"id":"/pgx/core/v1/pgqlProxies/pgql_1/results","links":[{"href":"http://localhost:7007/core/v1/pgqlProxies/pgql_1/results","rel":"self","method":"GET","interaction":["sync"]},{"href":"http://localhost:7007/core/v1/pgqlProxies/pgql_1/results","rel":"canonical","method":"GET","interaction":["async-polling"]}],"count":4,"totalItems":4,"items":[[0.3081206521195582],[0.21367103988538017],[0.21367103988538017],[0.2645372681096815]],"hasMore":false,"offset":0,"limit":4,"showTotalResults":true}

3.14 Managing Property Graph Snapshots

You can manage property graph snapshots.

Note:

Managing property graph snapshots is intended for advanced users.

You can persist different versions of a property graph as binary snapshots in the database. The binary snapshots represent a subgraph of graph data computed at runtime that may be needed for future use. The snapshots can be read back later as input for the in-memory analytics, or as an output stream that can be used by the parallel property graph data loader.

You can store binary snapshots in the <graph_name>SS$ table of the property graph using the Java API OraclePropertyGraphUtils.storeBinaryInMemGraphSnapshot. This operation requires a connection to the Oracle database holding the property graph instance, the name of the graph and its owner, the ID of the snapshot, and an input stream from which the binary snapshot can be read. You can also specify the time stamp of the snapshot and the degree of parallelism to be used when storing the snapshot in the table.

You can read a stored binary snapshot using OraclePropertyGraphUtils.readBinaryInMemGraphSnapshot. This operation requires a connection to the Oracle database holding the property graph instance, the name of the graph and its owner, the ID of the snapshot to read, and an output stream into which the binary snapshot will be written. You can also specify the degree of parallelism to be used when reading the snapshot binary file from the table.

The following code snippet creates a property graph from the data file in Oracle Flat-file format, adds a new vertex, and exports the graph into an output stream using GraphML format. This output stream represents a binary file snapshot, and it is stored in the property graph snapshot table. Finally, this example reads back the file from the snapshot table and creates a second graph from its contents.

String szOPVFile = "../../data/connections.opv"; 
String szOPEFile = "../../data/connections.ope"; 
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args, szGraphName); 
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance(); 
opgdl.loadData(opg, szOPVFile, szOPEFile, 2 /* dop */, 1000, true, 
               "PDML=T,PDDL=T,NO_DUP=T,"); 

// Add a new vertex
Vertex v = opg.addVertex(Long.valueOf("1000"));
v.setProperty("name", "Alice");
opg.commit();

System.out.println("Graph " + szGraphName + " total vertices: " + 
                   opg.countVertices(dop));
System.out.println("Graph " + szGraphName + " total edges: " + 
                   opg.countEdges(dop));


// Get a snapshot of the current graph as a file in graphML format. 
OutputStream os = new ByteArrayOutputStream();
OraclePropertyGraphUtils.exportGraphML(opg, 
                                       os /* output stream */, 
                                       System.out /* stream to show progress */);

// Save the snapshot into the SS$ table
InputStream is = new ByteArrayInputStream(os.toByteArray());
OraclePropertyGraphUtils.storeBinaryInMemGraphSnapshot(szGraphName, 
                                                     szGraphOwner /* owner of the 
                                                                   property graph */,
                                                     conn /* database connection */, 
                                                     is,
                                                     (long) 1 /* snapshot ID */,
                                                     1 /* dop */);
os.close();
is.close();

// Read the snapshot back from the SS$ table
OutputStream snapshotOS = new ByteArrayOutputStream();
OraclePropertyGraphUtils.readBinaryInMemGraphSnapshot(szGraphName, 
                                                    szGraphOwner /* owner of the 
                                                                   property graph */,
                                                    conn /* database connection */, 
                                                    new OutputStream[] {snapshotOS},
                                                    (long) 1 /* snapshot ID */,
                                                    1 /* dop */);

InputStream snapshotIS = new ByteArrayInputStream(snapshotOS.toByteArray());
String szGraphNameSnapshot = szGraphName + "_snap";
OraclePropertyGraph opgSnap = OraclePropertyGraph.getInstance(args, szGraphNameSnapshot); 

OraclePropertyGraphUtils.importGraphML(opgSnap, 
                                       snapshotIS /* input stream */, 
                                       System.out /* stream to show progress */);

snapshotOS.close();
snapshotIS.close();


System.out.println("Graph " + szGraphNameSnapshot + " total vertices: " + 
                   opgSnap.countVertices(dop));
System.out.println("Graph " + szGraphNameSnapshot + " total edges: " + 
                   opgSnap.countEdges(dop));

The preceding example will produce output similar to the following:

Graph test total vertices: 79
Graph test total edges: 164
Graph test_snap total vertices: 79
Graph test_snap total edges: 164

3.15 User-Defined Functions (UDFs) in PGX

User-defined functions (UDFs) allow users of PGX to add custom logic to their PGQL queries or custom graph algorithms, to complement built-in functions with custom requirements.

Caution:

UDFs enable running arbitrary code in the PGX server, which may access sensitive data. Additionally, any PGX session can invoke any of the UDFs that are enabled on the PGX server. The application administrator who enables UDFs is responsible for checking the following:

  • All the UDF code can be trusted.
  • The UDFs are stored in a secure location that cannot be tampered with.

How to Use UDFs

The following simple example shows how to register a UDF at the PGX server and invoke it.

  1. Create a class with a public static method. For example:
    package my.udfs;
     
    public class MyUdfs {
      public static String concat(String a, String b) {
        return a + b;
      }
    }
    
  2. Compile the class and compress into a JAR file. For example:
    mkdir ./target
    javac -d ./target *.java
    cd target
    jar cvf MyUdfs.jar *
    
  3. Copy the JAR file into /opt/oracle/graph/pgx/server/lib.
  4. Create a UDF JSON configuration file. For example, assume that /path/to/my/udfs/dir/my_udfs.json contains the following:
    {
      "user_defined_functions": [
        {
          "namespace": "my",
          "language": "java",
          "implementation_reference": "my.udfs.MyUdfs",
          "function_name": "concat",
          "return_type": "string",
          "arguments": [
             {
               "name": "a",
               "type": "string"
             },
             {
               "name": "b",
               "type": "string"
             }
           ]
        }
      ]
    }
  5. Point to the directory containing the UDF configuration file in /etc/oracle/graph/pgx.conf. For example:
    "udf_config_directory": "/path/to/my/udfs/dir/"
  6. Restart the PGX server. For example:
    sudo systemctl restart pgx
  7. Try to invoke the UDF from within a PGQL query. For example:
    graph.queryPgql("SELECT my.concat(my.concat(n.firstName, ' '), n.lastName) FROM MATCH (n:Person)")
  8. Try to invoke the UDF from within a PGX algorithm. For example:
    import oracle.pgx.algorithm.annotations.Udf;
    ....
     
    @GraphAlgorithm
    public class MyAlgorithm {
      public void bomAlgorithm(PgxGraph g, VertexProperty<String> firstName, VertexProperty<String> lastName, @Out VertexProperty<String> fullName) {
     
     
      ... fullName.set(v, concat(firstName.get(v), lastName.get(v))); ...
     
      }
     
      @Udf(namespace = "my")
      abstract String concat(String a, String b);
    }

UDF Configuration File Information

A UDF configuration file is a JSON file containing an array of user_defined_functions. (An example of such a file is in the step to "Create a UDF JSON configuration file" in the preceding "How to Use UDFs" subsection.)

Each user-defined function supports the fields shown in the following table.

Table 3-5 Fields for Each UDF

Field Data Type Description Default
function_name string Name of the function used as identifier in PGX Required
language enum[java, javascript] Source language for the function (java or javascript) Required
return_type enum[boolean, integer, long, float, double, string] Return type of the function Required
arguments array of object Array of arguments. For each argument: type, argument name, required? []
implementation_reference string Reference to the function name on the classpath null
namespace string Namespace of the function in PGX null
source_function_name string Name of the function in the source language null
source_location string Local file path to the function's source code null

All configured UDFs must be unique with regard to the combination of the following fields:

  • namespace
  • function_name
  • arguments
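The uniqueness rule above can be sketched as a simple validation pass over the configured UDFs. The following Java sketch is illustrative only: the Udf record and field names are stand-ins, not the actual PGX configuration classes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative check (not PGX code): UDFs must be unique with regard to the
// combination of namespace, function_name, and argument types.
public class UdfUniquenessCheck {
    record Udf(String namespace, String functionName, List<String> argumentTypes) {}

    public static boolean allUnique(List<Udf> udfs) {
        Set<String> seen = new HashSet<>();
        for (Udf u : udfs) {
            // Build a composite key from the three distinguishing fields.
            String key = u.namespace() + "/" + u.functionName() + "/" + u.argumentTypes();
            if (!seen.add(key)) {
                return false; // duplicate signature found
            }
        }
        return true;
    }
}
```

Under this rule, two UDFs named my.concat may coexist only if they differ in their argument lists, for example concat(string, string) versus concat(string).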