Reading Graphs from Oracle Database into the Graph Server (PGX)

16.2.1 Reading Graphs from Oracle Database into the Graph Server (PGX)

Once logged into the graph server (PGX), you can read graphs from the database into the graph server.

Your database user must exist and have read access on the graph data in the database.

The following options are supported for loading a graph:

SQL Property Graphs

Using the readGraphByName API - see Loading a SQL Property Graph Using the readGraphByName API for more details.
Using the PgSqlSubgraphReader API to create and load a subgraph - see Loading a Subgraph Using PGQL Queries for more details.
Using the PGQL CREATE PROPERTY GRAPH statement - see Creating a SQL Property Graph Using PGQL for more details.

PGQL Property Graphs

Using the readGraphByName API - see Loading a PGQL Property Graph Using the readGraphByName API for more details.
Using the PGQL CREATE PROPERTY GRAPH statement - see Creating a Property Graph Using PGQL for more details.
Using the PgViewSubgraphReader#fromPgView API to create and load a subgraph - see Loading a Subgraph from a PGQL Property Graph for more details.
Using a PGX graph configuration file in JSON format - see Loading a Graph Using a JSON Configuration File for more details.
Using the GraphConfigBuilder class to create Oracle RDBMS graph configurations programmatically through Java methods - see Loading a Graph by Defining a Graph Configuration Object for more details.

Also, refer to the following sections for additional information:

Reading Entity Providers at the Same SCN
If you have a graph which consists of multiple vertex or edge tables or both, then you can read all the vertices and edges at the same System Change Number (SCN).
Progress Reporting and Estimation for Graph Loading
Loading a large graph into the graph server(PGX) can be a long running operation. However, if you load the graph using an asynchronous action, then you can monitor the progress of the graph loading operation.
Graph Configuration Options
Learn about the graph configuration options.
Data Loading Security Best Practices
Loading a graph from the database requires authentication and it is therefore important to adhere to certain security guidelines when configuring access to this kind of data source.
Data Format Support Matrix
Learn about the different data formats supported in the graph server (PGX).
Immutability of Loaded Graphs
Once the graph is loaded into the graph server (PGX), the graph and its properties are automatically marked as immutable.

Parent topic: Graph Management in the Graph Server (PGX)

16.2.1.1 Reading Entity Providers at the Same SCN

If you have a graph which consists of multiple vertex or edge tables or both, then you can read all the vertices and edges at the same System Change Number (SCN).

This helps to overcome issues such as reading edge providers at a later SCN than the SCN at which the vertices were read, as some edges may reference missing vertices.

Note that reading a graph from the database is still possible even if Flashback is not enabled on Oracle Database. In case of multiple databases, SCN can be used to maintain consistency for entity providers belonging to the same database only.

You can use the as_of flag in the graph configuration to specify at what SCN an entity provider must be read. The valid values for the as_of flag are as follows:

Table 16-1 Valid values for "as_of" Key in Graph Configuration

Value	Description
A positive `long` value	This is a parseable SCN value.
`"<current-scn>"`	The current SCN is determined at the beginning of the graph loading.
`"<no-scn>"`	This is to disable SCN at the time of graph loading.
`null`	This defaults to `"<current-scn>"` behavior.

If "as_of" is omitted for a vertex or an edge provider in the graph configuration file, then this follows the same behavior as "as_of": null.

Example 16-1 Graph Configuration Using "as_of" for Vertex and Edge Providers in the Same Database

The following example configuration has three vertex providers and one edge provider pointing to the same database.

{
  "name": "employee_graph",
  "vertex_providers": [
    {
      "name": "Department",
      "as_of": "<current-scn>",
      "format": "rdbms",
      "database_table_name": "DEPARTMENTS",
      "key_column": "DEPARTMENT_ID",
      "props": [
        {
          "name": "DEPARTMENT_NAME",
          "type": "string"
        }
      ]
    },
    {
      "name": "Location",
      "as_of": "28924323",
      "format": "rdbms",
      "database_table_name": "LOCATIONS",
      "key_column": "LOCATION_ID",
      "props": [
        {
          "name": "CITY",
          "type": "string"
        }
      ]
    },
    {
      "name": "Region",
      "as_of": "<no-scn>",
      "format": "rdbms",
      "database_table_name": "REGIONS",
      "key_column": "REGION_ID",
      "props": [
        {
          "name": "REGION_NAME",
          "type": "string"
        }
      ]
    }
  ],
  "edge_providers": [
    {
      "name": "LocatedAt",
      "format": "rdbms",
      "database_table_name": "DEPARTMENTS",
      "key_column": "DEPARTMENT_ID",
      "source_column": "DEPARTMENT_ID",
      "destination_column": "LOCATION_ID",
      "source_vertex_provider": "Department",
      "destination_vertex_provider": "Location"
    }
  ]
}

When reading the employee_graph using the preceding configuration file, the graph is read at the same SCN for the Department and LocatedAt entity providers. This is explained in the following table:

Table 16-2 Example Scenario Using "as_of"

Entity Provider	`"as_of"`	SCN Value
`Department`	`"<current-scn>"`	SCN determined automatically
`Location`	`"28924323"`	`"28924323"` used as SCN
`Region`	`"<no-scn>"`	No SCN used
`LocatedAt`	`"as_of"` flag is omitted	SCN determined automatically

The current SCN value of the database can be determined using one of the following options:

Querying V$DATABASE view:
```
SELECT CURRENT_SCN FROM V$DATABASE;
```

Using DBMS_FLASHBACK package:

SELECT DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER FROM DUAL;

If you do not have the required privileges to perform either of the preceding operations, then you can use:

SELECT TIMESTAMP_TO_SCN(SYSDATE) FROM DUAL;

However, note that this option is less precise than the earlier two options.

You can then read the graph into the graph server using the JSON configuration file as shown:

opg4j> var g = session.readGraphWithProperties("employee_graph.json")

PgxGraph g = session.readGraphWithProperties("employee_graph.json");

g = session.read_graph_with_properties("employee_graph.json")

Parent topic: Reading Graphs from Oracle Database into the Graph Server (PGX)

16.2.1.2 Progress Reporting and Estimation for Graph Loading

Loading a large graph into the graph server(PGX) can be a long running operation. However, if you load the graph using an asynchronous action, then you can monitor the progress of the graph loading operation.

The following table shows the asynchronous graph loading APIs supported for the following formats:

Table 16-3 Asynchronous Graph Loading APIs

Data Format	API
PGQL Property Graph	session.readGraphByNameAsync()
PG SCHEMA	session.readGraphWithPropertiesAsync()
CSV	session.readGraphFileAsync()

These supported APIs return a PgxFuture object.

You can then use the PgxFuture.getProgress() method to collect the following statistics:

Report on the progress of the graph loading operation
Estimate of the remaining vertices and edges that need to be loaded into memory

For example, the following code shows the steps to load a PGQL property graph graph asynchronously and subsequently obtain the FutureProgress object to report and estimate the loading progress. However, note that the graph loading estimate (for example, the number of loaded entities and providers or the number of total entities and providers) can be obtained only until the graph loading operation is in progress. Also, the system internally computes the graph loading progress for every 10000 entries of entities that are loaded into the graph server (PGX).

JShell
Java

opg4j>  var graphLoadingFuture = session.readGraphByNameAsync("BANK_GRAPH_VIEW", GraphSource.PG_VIEW)
readGraphFuture ==> oracle.pgx.api.PgxFuture@6106dfb6[Not completed]

opg4j> while (!graphLoadingFuture.isDone()) {
...>   var progress = graphLoadingFuture.getProgress();
...>   var graphLoadingProgress = progress.asGraphLoadingProgress();
...>   if (graphLoadingProgress.isPresent()) {
...>     var numLoadedVertices = graphLoadingProgress.get().getNumLoadedVertices();
...>   }
...>   Thread.sleep(1000);
...> }

opg4j> var graph = graphLoadingFuture.get();
graph ==> PgxGraph[name=BANK_GRAPH_VIEW_3,N=999,E=4993,created=1664289985985]

PgxFuture<PgxGraph> graphLoadingFuture = session.readGraphByNameAsync("BANK_GRAPH_VIEW", GraphSource.PG_VIEW);
while (!graphLoadingFuture.isDone()) {
  FutureProgress progress = graphLoadingFuture.getProgress();
  Optional < GraphLoadingProgress > graphLoadingProgress = progress.asGraphLoadingProgress();
  if (graphLoadingProgress.isPresent()) {
    long numLoadedVertices = graphLoadingProgress.get().getNumLoadedVertices();
  }
  Thread.sleep(1000);
}
PgxGraph graph = graphLoadingFuture.get();

It is recommended that you do not use the FutureProgress object in a chain of asynchronous operations.

Parent topic: Reading Graphs from Oracle Database into the Graph Server (PGX)

16.2.1.3 Graph Configuration Options

Learn about the graph configuration options.

The following table lists the JSON fields that are common to all graph configurations:

Table 16-4 Graph Config JSON Fields

Field	Type	Description	Default
`name`	`string`	Name of the graph.	Required
`array_compaction_threshold`	`number`	[only relevant if the graph is optimized for updates] Threshold used to determined when to compact the delta-logs into a new array. If lower than the engine `min_array_compaction_threshold value`, `min_array_compaction_threshold` will be used instead	`0.2`
`attributes`	`object`	Additional attributes needed to read and write the graph data.	`null`
`data_source_id`	`string`	Data source id to use to connect to an RDBMS instance.	`null`
`edge_id_strategy`	`enum[no_ids, keys_as_ids, unstable_generated_ids]`	Indicates what ID strategy should be used for the edges of this graph. If not specified (or set to `null`), the strategy will be determined during loading or using a default value.	`null`
`edge_id_type`	`enum[long]`	Type of the edge ID. Setting it to `long` requires the IDs in the edge providers to be unique across the graphs; those IDs will be used as global IDs. Setting it to `null` (or omitting it) will allow repeated IDs across different edge providers and PGX will automatically generate globally-unique IDs for the edges.	`null`
`edge_providers`	`array of object`	List of edge providers in this graph.	`[]`
`error_handling`	`object`	Error handling configuration.	`null`
`external_stores`	`array of object`	Specification of the external stores where external string properties reside.	`[]`
`jdbc_url`	`string`	JDBC URL pointing to an RDBMS instance	`null`
`keystore_alias`	`string`	Alias to the keystore to use when connecting to database.	`null`
`loading`	`object`	Loading-specific configuration to use.	`null`
`local_date_format`	`array of string`	`array of local_date` formats to use when loading and storing `local_date` properties. See DateTimeFormatter for more details of the format string	`[]`
`max_prefetched_rows`	`integer`	Maximum number of rows prefetched during each round trip resultset-database.	`10000`
`num_connections`	`integer`	Number of connections to read and write data from or to the RDBMS table.	`<no-of-cpus>`
`optimized_for`	`enum[read, updates]`	Indicates if the graph should use data-structures optimized for read-intensive scenarios or for fast updates.	`read`
`password`	`string`	Password to use when connecting to database.	`null`
`point2d`	`string`	Longitude and latitude as floating point values separated by a space.	`0.0 0.0`
`prepared_queries`	`array of object`	An additional list of prepared queries with arguments, working in the same way as 'queries'. Data matching at least one those queries will also be loaded.	`[]`
`queries`	`array of string`	A list of queries used to determine which data to load from the database. Data matching at least one of the queries will be loaded. Not setting any query will load the entire graph.	[]
`redaction_rules`	`array of object`	Array of redaction rules.	`[]`
`rules_mapping`	`array of object`	Mapping for redaction rules to users and roles.	`[]`
`schema`	`string`	Schema to use when reading or writing RDBMS objects	`null`
`source_name`	`string`	Name of the database graph, if the graph is loaded from a database.	`null`
`source_type`	`enum[pg_view]`	Source type for database graphs.	`null`
`time_format`	`array of string`	The `time` format to use when loading and storing time properties. See DateTimeFormatter for a documentation of the format string.	`[]`
`time_with_timezone_format`	`array of string`	The `time with timezone` format to use when loading and storing time with timezone properties. Please see DateTimeFormatter for more information of the format string.	`[]`
`timestamp_format`	`array of string`	The `timestamp` format to use when loading and storing timestamp properties. See DateTimeFormatter for more information of the format string.	`[]`
`timestamp_with_timezone_format`	`array of string`	The `timestamp with timezone` format to use when loading and storing `timestamp with timezone` properties. See DateTimeFormatter for more information of the format string.	`[]`
`username`	`string`	Username to use when connecting to an RDBMS instance.	`null`
`vector_component_delimiter`	`character`	Delimiter for the different components of vector properties.	`;`
`vertex_id_strategy`	`enum[no_ids, keys_as_ids, unstable_generated_ids]`	Indicates what ID strategy should be used for the vertices of this graph. If not specified (or set to `null`), the strategy will be automatically detected.	`null`
`vertex_id_type`	`enum[int, integer, long, string]`	Type of the vertex ID. For homogeneous graphs, if not specified (or set to `null`), it will default to a specific value (depending on the origin of the data).	`null`
`vertex_providers`	`array of object`	List of vertex providers in this graph.	`[]`

Note:

Database connection fields specified in the graph configuration will be used as default in case underlying data provider configuration does not specify them.

Provider Configuration JSON file Options

You can specify the meta-information about each provider's data using provider configurations. Provider configurations include the following information about the provider data:

Location of the data: a file, multiple files or database providers
Information about the properties: name and type of the property

Table 16-5 Provider Configuration JSON file Options

Field	Type	Description	Default
`format`	`enum[pgb, csv, rdbms]`	Provider format.	Required
`name`	`string`	Entity provider name.	Required
`attributes`	`object`	Additional attributes needed to read and write the graph data.	`null`
`destination_vertex_provider`	`string`	Name of the destination vertex provider to be used for this edge provider.	`null`
`error_handling`	`object`	Error handling configuration.	`null`
`has_keys`	`boolean`	Indicates if the provided entities data have keys.	`true`
`key_type`	`enum[int, integer, long, string]`	Type of the keys.	`long`
`keystore_alias`	`string`	Alias to the keystore to use when connecting to database.	`null`
`label`	`string`	label for the entities loaded from this provider.	`null`
`loading`	`object`	Loading-specific configuration.	`null`
`local_date_format`	`array of string`	Array of local_date formats to use when loading and storing local_date properties. See DateTimeFormatter for a documentation of the format string.	`[]`
`password`	`string`	Password to use when connecting to database.	`null`
`point2d`	`string`	Longitude and latitude as floating point values separated by a space.	`0.0 0.0`
`props`	`array of object`	Specification of the properties associated with this entity provider.	`[]`
`source_vertex_provider`	`string`	Name of the source vertex provider to be used for this edge provider.	`null`
`time_format`	`array of string`	The time format to use when loading and storing time properties. See DateTimeFormatter for a documentation of the format string.	`[]`
`time_with_timezone_format`	`array of string`	The time with timezone format to use when loading and storing time with timezone properties. See DateTimeFormatter for a documentation of the format string.	`[]`
`timestamp_format`	`array of string`	The timestamp format to use when loading and storing timestamp properties. See DateTimeFormatter for a documentation of the format string.	`[]`
`timestamp_with_timezone_format`	`array of string`	The timestamp with timezone format to use when loading and storing timestamp with timezone properties. See DateTimeFormatter for a documentation of the format string.	`[]`
`vector_component_delimiter`	`character`	Delimiter for the different components of vector properties.	`;`

Provider Labels

The label field in the provider configuration can be used to set a label for the entities loaded from the provider. If no label is specified, all entities from the provider are labeled with the name of the provider. It is only possible to set the same label for two different providers if they have exactly the same properties (same names and same types).

Property Configuration

The props entry in the Provider configuration is an object with the following JSON fields:

Table 16-6 Property Configuration

Field	Type	Description	Default
`name`	`string`	Name of the property.	Required
`type`	`enum[boolean, integer, vertex, edge, float, long, double, string, date, local_date, time, timestamp, time_with_timezone, timestamp_with_timezone, point2d]`	Type of the property . Note: `date` is deprecated, use one of `local_date` / `time` / `timestamp` / `time_with_timezone` / `timestamp_with_timezone` instead). `vertex`/`edge` are place-holders for the type specified in `vertex_id_type`/`edge_id_type` fields.	Required
`aggregate`	`enum[identity, group_key, min, max, avg, sum, concat, count]`	[currently unsupported] which aggregation function to use, aggregation always happens by vertex key.	`null`
`column`	`value`	Name or index (starting from 0) of the column holding the property data. If it is not specified, the loader will try to use the property name as column name (for CSV format only).	`null`
`default`	`value`	Default value to be assigned to this property if datasource does not provide it. In case of `date` type: string is expected to be formatted with `yyyy-MM-dd HH:mm:ss`. If no default is present (`null`), non-existent properties will contain default Java types (primitives) or empty string (`string`) or `01.01.1970 00:00` (`date`).	`null`
`dimension`	`integer`	Dimension of property.	`0`
`drop_after_loading`	`boolean`	[currently unsupported] indicating helper properties only used for aggregation, which are dropped after loading	`false`
`field`	`value`	Name of the JSON field holding the property data. Nesting is denoted by dot - separation. Field names containing dots are possible, in this case the dots need to be escaped using backslashes to resolve ambiguities. Only the exactly specified object are loaded, if they are non existent, the default value is used.	`null`
`format`	`array of string`	Array of formats of property.	`[]`
`group_key`	`string`	[currently unsupported] can only be used if the property / key is part of the grouping expression.	`null`
`max_distinct_strings_per_pool`	`integer`	[only relevant if `string_pooling_strategy` is `indexed`] Amount of distinct strings per property after which to stop pooling. If the limit is reached an exception is thrown. If set to null, the default value from the global PGX configuration will be used.	`null`
`stores`	`array of object`	A list of storage identifiers that indicate where this property resides.	`[]`
`string_pooling_strategy`	`enum[indexed, on_heap, none]`	Indicates which string pooling strategy to use. If set to `null`, the default value from the global PGX configuration will be used.	`null`

Loading Configuration

The loading entry is a JSON object with the following fields:

Table 16-7 Loading Configuration

Field	Type	Description	Default
`create_key_mapping`	`boolean`	If `true`, a mapping between entity keys and internal IDs is prepared during loading.	`true`
`filter`	`string`	[currently unsupported] the filter expression	`null`
`grouping_by`	`array of string`	[currently unsupported] array of edge properties used for aggregator. For Vertices, only the ID can be used (default)	`[]`
`load_labels`	`boolean`	Whether or not to load the entity label if it is available.	`false`
`strict_mode`	`boolean`	If `true`, exceptions are thrown and logged with `ERROR` level whenever loader encounters problems with input file, such as invalid format, repeated keys, missing fields, mismatches and other potential errors. If `false`, loader may use less memory during loading phase, but behave unexpectedly with erratic input files.	`true`

Error Handling Configuration

The error_handling entry is a JSON object with the following fields:

Table 16-8 Error Handling Configuration

Field	Type	Description	Default
`on_missed_prop_key`	`enum[silent, log_warn, log_warn_once, error]`	Error handling for a missing property key.	`log_warn_once`
`on_missing_vertex`	`enum[ignore_edge, ignore_edge_log, ignore_edge_log_once, create_vertex, create_vertex_log, create_vertex_log_once, error]`	Error handling for a missing source or destination vertex of an edge in a vertex data source.	`error`
`on_parsing_issue`	`enum[silent, log_warn, log_warn_once, error]`	Error handling for incorrect data parsing. If set to silent, `log_warn` or `log_warn_once`, will attempt to continue loading. Some parsing issues may not be recoverable and provoke the end of loading.	`error`
`on_prop_conversion`	`enum[silent, log_warn, log_warn_once, error]`	Error handling when encountering a different property type other than the one specified, but coercion is possible.	`log_warn_once`
`on_type_mismatch`	`enum[silent, log_warn, log_warn_once, error]`	Error handling when encountering a different property type other than the one specified, but coercion is not possible.	`error`
`on_vector_length_mismatch`	`enum[silent, log_warn, log_warn_once, error]`	Error handling for a vector property that does not have the correct dimension.	`error`

Note:

The only supported setting for the on_missing_vertex error handling configuration is ignore_edge.

Parent topic: Reading Graphs from Oracle Database into the Graph Server (PGX)

16.2.1.4 Data Loading Security Best Practices

Loading a graph from the database requires authentication and it is therefore important to adhere to certain security guidelines when configuring access to this kind of data source.

The following guidelines are recommended:

The user or role used to access the data should be a read-only account that only has access to the required graph data.
The graph data should be marked as read-only, for example, with non-updateable views in the case of the database.

Parent topic: Reading Graphs from Oracle Database into the Graph Server (PGX)

16.2.1.5 Data Format Support Matrix

Learn about the different data formats supported in the graph server (PGX).

The following table illustrates how the different data formats differ in the way IDs, labels and vector properties are handled.

Note:

The table refers to limitations of the PGX implementation of the format and not necessarily to limitations of the format itself.

Table 16-9 Data Format Support Matrix

Format	Vertex IDs	Edge IDs	Vertex Labels	Edge Labels	Vector properties
`PGB`	`int, long, string`	`long`	multiple	single	supported (vectors can be of type `integer`, `long`, `float` or `double`)
`CSV`	`int, long, string`	`long`	multiple	single	supported (vectors can be of type `integer`, `long`, `float` or `double`)
`ADJ_LIST`	`int, long, string`	Not supported	Not supported	Not supported	supported (vectors can be of type `integer`, `long`, `float` or `double`)
`EDGE_LIST`	`int, long, string`	Not supported	multiple	single	supported (vectors can be of type `integer`, `long`, `float` or `double`)
`GRAPHML`	`int, long, string`	Not supported	Not supported	Not supported	Not supported

Parent topic: Reading Graphs from Oracle Database into the Graph Server (PGX)

16.2.1.6 Immutability of Loaded Graphs

Once the graph is loaded into the graph server (PGX), the graph and its properties are automatically marked as immutable.

The immutability of loaded graphs is due to the following design choices:

Typical graph analyses happen on a snapshot of a graph instance, and therefore they do not require mutations of the graph instance.
Immutability allows PGX to use an internal graph representation optimized for fast analysis.
In remote mode, the graph instance might be shared among multiple clients.

However, the graph server (PGX) also provides methods to customize and mutate graph instances for the purpose of analysis. See Graph Mutation and Subgraphs for more information.

Parent topic: Reading Graphs from Oracle Database into the Graph Server (PGX)