4.3 About Vertex and Edge IDs

Generating vertex and edge IDs when loading from database tables into PGX

PGX enforces by default the existence of a unique identifier for each vertex and edge in a graph, so that they can be retrieved by using PgxGraph.getVertex(ID id) and PgxGraph.getEdge(ID id) or by PGQL queries using the built-in id() method.

The following supported ID generation strategies can be selected through the configuration parameters vertex_id_strategy and edge_id_strategy:
  • keys_as_ids: This is the default strategy to generate vertex IDs.
  • partitioned_ids: This is the recommended strategy for partitioned graphs.
  • unstable_generated_ids: This results in system generated vertex or edge IDs.
  • no_ids: This strategy disables vertex or edge IDs and therefore prevents you from calling APIs using vertex or edge IDs.

Using keys to generate IDs

The default strategy to generate the vertex IDs is to use the keys provided during loading of the graph (keys_as_ids). In that case, each vertex should have a vertex key that is unique across all providers.

For edges, by default no keys are required in the edge data, and edge IDs will be automatically generated by PGX (unstable_generated_ids). This automatic ID generation can be applied for vertex IDs also. Note that the generation of vertex or edge IDs is not guaranteed to be deterministic. If required, it is also possible to load edge keys as IDs.

The partitioned_ids strategy requires keys to be unique only within a vertex or edge provider (data source). The keys do not have to be globally unique. Globally unique IDs are derived from a combination of the provider name and the key inside the provider, as <provider_name>(<unique_key_within_provider>). For example, Account(1).

The partititioned_ids strategy can be set through the configuration fields vertex_id_strategy and edge_id_strategy. For example,

{
  "name": "bank_graph_analytics",
  "optimized_for": "updates",
  "vertex_id_strategy" : "partitioned_ids",
  "edge_id_strategy" : "partitioned_ids",
  "vertex_providers": [
    {
      "name": "Accounts",
      "format": "rdbms",
      "database_table_name": "BANK_ACCOUNTS",
      "key_column": "ID",
      "key_type": "integer",
      "props": [
        {
          "name": "ID",
          "type": "integer"
        },
        {
          "name": "NAME",
          "type": "string"
        }
      ],
      "loading": {
        "create_key_mapping" : true
      }
    }
  ],
  "edge_providers": [
    {
      "name": "Transfers",
      "format": "rdbms",
      "database_table_name": "BANK_TXNS",
      "key_column": "ID",
      "source_column": "FROM_ACCT_ID",
      "destination_column": "TO_ACCT_ID",
      "source_vertex_provider": "Accounts",
      "destination_vertex_provider": "Accounts",
      "props": [
         {
          "name": "ID",
          "type": "integer"
        },
        {
          "name": "AMOUNT",
          "type": "double"
        }
      ],
      "loading": {
        "create_key_mapping" : true
      }
    }
  ]
}

Note:

All available key types are supported in combination with partitioned IDs.

After the graph is loaded, PGX maintains information about which property of a provider corresponds to the key of the provider. In the preceding example, the vertex property ID happens to correspond to the vertex key and also the edge property ID happens to correspond to the edge key. Each provider can have at most one such "key property" and the property can have any name.

Key properties are used for internal optimizations as well as for providing keys for the vertex or edge or both when inserting new entities. Key properties are currently non-updatable. Trying to update a key property will result in an error. For example,
vertex key property ID cannot be updated

Using an auto-incrementer to generate partitioned IDs

It is recommended to always set create_key_mapping to true to benefit from performance optimizations. But if there are no single-column keys for edges, create_key_mapping can be set to false. Similarly, create_key_mapping can be set to false for vertex providers also. IDs will be generated via an auto-incrementer, for example Accounts(1), Accounts(2), Accounts(3).

See PGQL Queries with Partitioned IDs for more information on executing PGQL queries with partitioned IDs.