PGX 21.1.1
Documentation

Elasticsearch Format

PGX can load graph data from Elasticsearch indices. To read from Elasticsearch, specify the following additional fields in the graph config:

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| format | enum[pgb, csv, rdbms, es] | provider format | required |
| name | string | entity provider name | required |
| attributes | object | additional attributes needed to read/write the graph data | null |
| destination_field | string | name of the field in the Elasticsearch index containing the keys of the destination vertices | dvkey |
| destination_vertex_provider | string | name of the destination vertex provider to be used for this edge provider | null |
| error_handling | object | error handling configuration | null |
| es_index_name | string | name of the index on the Elasticsearch server from which graph data is loaded | null |
| es_query | string | Elasticsearch query expressed as an escaped JSON string | null |
| es_url | string | URL pointing to an Elasticsearch instance | null |
| has_keys | boolean | indicates whether the provided entity data has keys | true |
| key_field | string | name of the field in the Elasticsearch index containing the keys of the entity | key |
| key_type | enum[int, integer, long, string] | type of the keys | long |
| keystore_alias | string | alias to the keystore to use when connecting to database | null |
| label | string | label for the entities loaded from this provider | null |
| loading | object | loading-specific configuration | null |
| local_date_format | array of string | array of local_date formats to use when loading and storing local_date properties. See DateTimeFormatter for documentation of the format string | [] |
| max_batch_size | integer | maximum batch size of Elasticsearch response objects | 10000 |
| password | string | password to use when connecting to database | null |
| point2d | string | longitude and latitude as floating point values separated by a space | 0.0 0.0 |
| props | array of object | specification of the properties associated with this entity provider | [] |
| proxy_url | string | proxy server URL to be used for connection to es_url | null |
| scroll_time | string | time to keep the Elasticsearch scroll alive; each batch must be received and processed within this time window. Uses the format [number][time unit], where the time unit is d for day, h for hour, m for minute, etc. | 1m |
| source_field | string | name of the field in the Elasticsearch index containing the keys of the source vertices | svkey |
| source_vertex_provider | string | name of the source vertex provider to be used for this edge provider | null |
| time_format | array of string | the time format to use when loading and storing time properties. See DateTimeFormatter for documentation of the format string | [] |
| time_with_timezone_format | array of string | the time with timezone format to use when loading and storing time with timezone properties. See DateTimeFormatter for documentation of the format string | [] |
| timestamp_format | array of string | the timestamp format to use when loading and storing timestamp properties. See DateTimeFormatter for documentation of the format string | [] |
| timestamp_with_timezone_format | array of string | the timestamp with timezone format to use when loading and storing timestamp with timezone properties. See DateTimeFormatter for documentation of the format string | [] |
| username | string | username to use when connecting to an Elasticsearch instance | null |
| vector_component_delimiter | character | delimiter for the different components of vector properties | ; |
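
The scroll_time value follows the [number][time unit] pattern. The Python sketch below shows one way to sanity-check such a value before placing it in a config. The helper name parse_scroll_time is hypothetical, and the unit list is taken from the time units Elasticsearch documents; which of these PGX accepts is an assumption, not something stated here.

```python
import re

# Time units documented by Elasticsearch. Whether PGX accepts all of them
# for scroll_time is an assumption made for this sketch.
TIME_UNITS = ("nanos", "micros", "ms", "s", "m", "h", "d")
_SCROLL_TIME = re.compile(r"^(\d+)(" + "|".join(TIME_UNITS) + r")$")


def parse_scroll_time(value):
    """Split a value such as '1m' into (amount, unit); hypothetical helper."""
    match = _SCROLL_TIME.match(value)
    if match is None:
        raise ValueError(f"not in [number][time unit] form: {value!r}")
    return int(match.group(1)), match.group(2)


print(parse_scroll_time("1m"))   # the default listed for scroll_time
print(parse_scroll_time("30s"))
```

A value such as "1 minute" is rejected by this check because it contains a space and a spelled-out unit.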

Example

The following example illustrates how to configure PGX to load a graph from an Elasticsearch server.

Example Graph Configuration

{
  "name": "example_graph_from_es",
  "es_url": "http://elastic_domain:9200",
  "es_index_name": "nested",
  "username": "john",
  "keystore_alias": "elastic_domain_user",
  "vertex_providers": [
    {
      "name": "vProvider_1",
      "max_batch_size": 5000,
      "format": "es",
      "es_query": "{ \"range\": { \"vid\": {\"gte\": 5,\"lte\": 9} } }", 
      "key_field": "name",
      "key_type": "string",
      "props": [
        {
          "name": "name_pgx",
          "type": "string",
          "field": "name"
        },
        {
          "name": "vid_pgx",
          "type": "integer",
          "field": "vid",
          "default": "0"
        },
        {
          "name": "nested_escaping",
          "type": "integer",
          "field": "a.b\\.c.d\\.e\\.f.g",
          "default": 10
        },
        {
          "name": "ambiguous_escaping",
          "type": "integer",
          "field": "a\\.b.c.d\\.e\\.f.g", 
          "default": 12
        }
      ]
    }
  ],
  "edge_providers": [
    {
      "name": "eProvider_1",
      "format": "es",
      "es_url": "http://localhost:9200",
      "es_index_name": "nested",
      "es_query": "",
      "source_vertex_provider": "vProvider_1",
      "destination_vertex_provider": "vProvider_1",
      "source_field":"name",
      "destination_field": "name",
      "props": [
        {
          "name": "name",
          "field": "name",
          "type": "string"
        }
      ]
    }
  ]
}

Handling Sparse Values and Nesting

When a specified field is empty in an object, the default value of the property is used. Nesting is denoted by dot-separated field names. When a field name itself contains a dot, that dot is escaped with a backslash so that it is not read as a nesting separator; inside the JSON configuration the backslash must itself be escaped, so \. is written as \\. there. Only the explicitly specified fields are loaded. For the properties of vProvider_1 in the example above, the objects in the index are expected to be structured as follows:

nested_escaping: field: "a.b\.c.d\.e\.f.g"

{"a": {"b.c": {"d.e.f": {"g": 7}}}}

ambiguous_escaping: field: "a\.b.c.d\.e\.f.g"

{"a.b": {"c": {"d.e.f": {"g": 10}}}}

Elasticsearch Queries

Elasticsearch queries are specified in the es_query field of a provider as escaped JSON strings. Each string contains an explicit Elasticsearch query that is sent directly to the Elasticsearch server with the first request.
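
Writing the escaped string by hand is error-prone, so one option is to generate it. The following Python sketch, which uses only the standard json module, serializes the range query from the example configuration once to obtain the es_query string and a second time to embed it in a provider object. The provider dictionary is a trimmed illustration, not a complete configuration.

```python
import json

# The query as a plain object, taken from vProvider_1 in the example above.
query = {"range": {"vid": {"gte": 5, "lte": 9}}}

# First serialization: the query becomes a JSON string value.
es_query = json.dumps(query)

# Trimmed provider used for illustration only.
provider = {"name": "vProvider_1", "format": "es", "es_query": es_query}

# Second serialization: the inner quotes are escaped as \" in the config text.
config_text = json.dumps(provider, indent=2)
print(config_text)

# Round trip: parsing the config and then the es_query string recovers the query.
assert json.loads(json.loads(config_text)["es_query"]) == query
```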