About Vertex and Edge IDs

Generating vertex and edge IDs when loading from database tables into PGX

By default, PGX requires every vertex and edge in a graph to have a unique identifier, so that they can be retrieved with PgxGraph.getVertex(ID id) and PgxGraph.getEdge(ID id), or in PGQL queries through the built-in id() function.

The ID generation strategies can be selected through the configuration parameters vertex_id_strategy and edge_id_strategy.

Using keys to generate IDs

The default strategy for vertex IDs is keys_as_ids, which uses the keys provided when the graph is loaded. In that case, each vertex should have a vertex key that is unique across all providers.

For edges, no keys are required in the edge data by default; PGX generates the edge IDs automatically (unstable_generated_ids). Note that this generation is not guaranteed to be deterministic. If required, edge keys can be loaded as IDs instead.
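The defaults described above can be sketched as a small lookup. This is an illustration only: the function name effective_id_strategies is hypothetical and not part of the PGX API, and the sketch assumes that a configuration which omits a strategy field falls back to the default named in the text.

```python
import json

# Defaults as described in the text: vertex keys are used as IDs,
# edge IDs are generated by PGX.
DEFAULT_VERTEX_ID_STRATEGY = "keys_as_ids"
DEFAULT_EDGE_ID_STRATEGY = "unstable_generated_ids"


def effective_id_strategies(config_text):
    """Return (vertex_strategy, edge_strategy) for a JSON graph configuration.

    Hypothetical helper: it only reads the two strategy fields and applies
    the documented defaults when a field is absent.
    """
    config = json.loads(config_text)
    vertex = config.get("vertex_id_strategy", DEFAULT_VERTEX_ID_STRATEGY)
    edge = config.get("edge_id_strategy", DEFAULT_EDGE_ID_STRATEGY)
    return vertex, edge


# No strategy fields set: both defaults apply.
print(effective_id_strategies('{"name": "bank_graph_analytics"}'))
# Only the edge strategy is overridden.
print(effective_id_strategies('{"edge_id_strategy": "partitioned_ids"}'))
```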

The partitioned_ids strategy requires keys to be unique only within a vertex or edge provider (data source). The keys do not have to be globally unique. Globally unique IDs are derived from a combination of the provider name and the key inside the provider, as <provider_name>(<unique_key_within_provider>). For example, Account(1).
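The ID notation above can be illustrated with a short sketch. The helper names make_partitioned_id and parse_partitioned_id are hypothetical and not part of the PGX API; the code only mirrors the <provider_name>(<unique_key_within_provider>) notation and assumes that provider names contain no parentheses.

```python
import re

# Hypothetical helpers that mirror the documented notation; not PGX API.
_ID_PATTERN = re.compile(r"^(?P<provider>[^()]+)\((?P<key>.*)\)$")


def make_partitioned_id(provider_name, key):
    """Combine a provider name and a provider-local key into one string."""
    return f"{provider_name}({key})"


def parse_partitioned_id(partitioned_id):
    """Split 'Provider(key)' into (provider, key); the key is returned as text."""
    match = _ID_PATTERN.match(partitioned_id)
    if match is None:
        raise ValueError(f"not a partitioned ID: {partitioned_id!r}")
    return match.group("provider"), match.group("key")


# The same key 1 in two different providers yields two distinct global IDs.
ids = {make_partitioned_id("Account", 1), make_partitioned_id("Transfer", 1)}
print(sorted(ids))
print(parse_partitioned_id("Account(1)"))
```

Because the provider name is part of the ID, two providers can reuse the same key values without colliding.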

The partitioned_ids strategy can be set through the configuration fields vertex_id_strategy and edge_id_strategy. For example:

{
  "name": "bank_graph_analytics",
  "optimized_for": "updates",
  "vertex_id_strategy": "partitioned_ids",
  "edge_id_strategy": "partitioned_ids",
  "vertex_providers": [
    {
      "name": "Accounts",
      "format": "rdbms",
      "database_table_name": "BANK_NODES",
      "key_column": "ID",
      "key_type": "integer",
      "props": [
        {
          "name": "keyProp",
          "type": "long",
          "column": 1
        },
        {
          "name": "number",
          "type": "long",
          "column": 2
        }
      ],
      "loading": {
        "create_key_mapping" : true
      }
    }
  ],
  "edge_providers": [
    {
      "name": "Transfers",
      "format": "rdbms",
      "database_table_name": "BANK_EDGES_AMT",
      "key_column": "ID",
      "source_column": "SRC_ID",
      "destination_column": "DEST_ID",
      "source_vertex_provider": "Accounts",
      "destination_vertex_provider": "Accounts",
      "props": [
        {
          "name": "keyProp",
          "type": "long",
          "column": 1
        },
        {
          "name": "amount",
          "type": "double",
          "column": 4
        }
      ],
      "loading": {
        "create_key_mapping" : true
      }
    }
  ]
}

Note:

All available key types are supported in combination with partitioned IDs.

After the graph is loaded, PGX keeps track of which property of each provider corresponds to that provider's key. In the preceding example, the vertex property keyProp corresponds to the vertex key ("column": 1), and the edge property keyProp likewise corresponds to the edge key (again, "column": 1). Each provider can have at most one such "key property", and that property can have any name.

Key properties are used for internal optimizations and for supplying the key of a vertex or edge when new entities are inserted. Key properties are currently not updatable; trying to update one results in an error. For example:

vertex key property ID cannot be updated

Using an auto-incrementer to generate IDs

It is recommended to always set create_key_mapping to true to benefit from performance optimizations. However, if the edges have no single-column key, create_key_mapping can be set to false for the edge provider. It can likewise be set to false for vertex providers. In that case, IDs are generated by an auto-incrementer, for example Accounts(1), Accounts(2), Accounts(3).
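The shape of auto-incremented IDs can be sketched with a toy counter. The class AutoIncrementIds is hypothetical and does not reflect PGX internals; it assumes that each provider has its own counter and that numbering starts at 1, as the Accounts(1), Accounts(2), Accounts(3) example suggests.

```python
from collections import defaultdict
from itertools import count


class AutoIncrementIds:
    """Toy per-provider counter; illustrates the ID shape, not PGX internals."""

    def __init__(self):
        # Assumption: one independent counter per provider name, starting at 1.
        self._counters = defaultdict(lambda: count(1))

    def next_id(self, provider_name):
        """Return the next ID for the provider in 'Provider(n)' form."""
        return f"{provider_name}({next(self._counters[provider_name])})"


ids = AutoIncrementIds()
print([ids.next_id("Accounts") for _ in range(3)])
```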