PGX 20.1.1
Documentation

Detecting Graph Configurations

PGX can automatically detect the graph configuration for GraphML (non-partitioned graphs only), PGB and CSV files. Both loading a graph directly and generating a graph configuration object for a graph file are supported. If the format is not explicitly specified, PGX tries to detect it from the file extension and by looking for known magic words inside the file.
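The extension-based half of this detection can be illustrated with a small, self-contained sketch. It is not PGX code: the enum `GuessedFormat`, the method `guessFromExtension`, and the mapping of `.graphml`, `.pgb` and `.csv` to formats are all assumptions made for illustration. The magic-word check is not shown because the specific magic words are not documented here.

```java
import java.util.Locale;
import java.util.Optional;

public class FormatGuess {
    // Hypothetical enum for this sketch; it is not a PGX type.
    enum GuessedFormat { GRAPHML, PGB, CSV }

    // Guesses a format from the file extension alone (case-insensitive).
    // The extension-to-format mapping below is an assumption.
    static Optional<GuessedFormat> guessFromExtension(String path) {
        int sep = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        int dot = path.lastIndexOf('.');
        if (dot <= sep || dot == path.length() - 1) {
            // No extension in the file name: a real detector would inspect the content instead.
            return Optional.empty();
        }
        switch (path.substring(dot + 1).toLowerCase(Locale.ROOT)) {
            case "graphml": return Optional.of(GuessedFormat.GRAPHML);
            case "pgb":     return Optional.of(GuessedFormat.PGB);
            case "csv":     return Optional.of(GuessedFormat.CSV);
            default:        return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(guessFromExtension("/data/people_2.csv")); // Optional[CSV]
        System.out.println(guessFromExtension("graph.PGB"));          // Optional[PGB]
        System.out.println(guessFromExtension("v1.dir/README"));      // Optional.empty
    }
}
```

Returning an empty `Optional` rather than throwing leaves room for a content-based fallback, which is the role the magic-word check plays in the description above.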

Property names may be assigned automatically

Note that PGB files do not contain any names for the vertex and edge properties. Therefore, PGX will generate names for all properties found in the file.

CSV format detection

Format detection for CSV files requires each file to contain a header that follows the syntax described in CSV Header Format.

Loading Graphs Directly

The following PgxSession methods can be used to load a graph without an explicit graph configuration:

PgxFuture<PgxGraph> readGraphFileAsync(String path)
PgxFuture<PgxGraph> readGraphFilesAsync(List<String> paths)

If the vertices and edges are stored in separate files, use the following methods instead:

PgxFuture<PgxGraph> readGraphFilesAsync(String vertexPath, String edgePath)
PgxFuture<PgxGraph> readGraphFilesAsync(List<String> vertexPaths, List<String> edgePaths)

Generating Graph Configurations

The following PgxSession methods can be used to generate a graph configuration for one or more graph files:

PgxFuture<GraphConfig> describeGraphFileAsync(String path)
PgxFuture<GraphConfig> describeGraphFilesAsync(List<String> paths)

As above, use the following methods when vertices and edges are stored in separate files:

PgxFuture<GraphConfig> describeGraphFilesAsync(String vertexPath, String edgePath)
PgxFuture<GraphConfig> describeGraphFilesAsync(List<String> vertexPaths, List<String> edgePaths)

Variants of these methods are available. See the Javadocs for a complete overview.

CSV Header Format

When loading a graph from CSV files with automatic configuration detection, the files have to conform to the following syntax.

Special Columns

Headers for special columns (ID, labels, edge source and destination) have the form name:KEYWORD(arg1;arg2;...). The name can be omitted; if it is present, the column is also loaded as a property with that name, in addition to serving its special purpose.

Property Columns

Columns holding property data have a header of the form property_name:property_type. If no property type is specified, the type defaults to string. The loader skips any column whose header uses the :IGNORE keyword.
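The two header forms can be told apart mechanically. The sketch below is a hypothetical parser, not the PGX implementation: the class `HeaderCell`, its fields, and the rule that the name ends at the first `:` are assumptions. It recognizes the keywords listed for vertex and edge tables in this section, splits keyword arguments on `;`, and falls back to a property column with type string when no type is given.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeaderCell {
    // Keywords listed for vertex and edge tables in this section.
    static final List<String> KEYWORDS =
            Arrays.asList("VID", "LABELS", "SRC", "DST", "EID", "LABEL", "IGNORE");
    // KEYWORD or KEYWORD(arg1;arg2;...)
    static final Pattern SPECIAL = Pattern.compile("([A-Z]+)(?:\\((.*)\\))?");

    final String name;       // empty when a special column has no property name
    final String keyword;    // null for ordinary property columns
    final List<String> args; // keyword arguments, split on ';'
    final String type;       // property type; null for special columns

    HeaderCell(String name, String keyword, List<String> args, String type) {
        this.name = name;
        this.keyword = keyword;
        this.args = args;
        this.type = type;
    }

    static HeaderCell parse(String cell) {
        // Assumption: the name ends at the first ':' (names containing ':' are not handled).
        int colon = cell.indexOf(':');
        String name = colon < 0 ? cell : cell.substring(0, colon);
        String rest = colon < 0 ? "" : cell.substring(colon + 1);
        Matcher m = SPECIAL.matcher(rest);
        if (m.matches() && KEYWORDS.contains(m.group(1))) {
            String rawArgs = m.group(2);
            List<String> args = (rawArgs == null || rawArgs.isEmpty())
                    ? Collections.<String>emptyList()
                    : Arrays.asList(rawArgs.split(";"));
            return new HeaderCell(name, m.group(1), args, null);
        }
        // Property column: the type defaults to string when omitted.
        return new HeaderCell(name, null, Collections.<String>emptyList(),
                rest.isEmpty() ? "string" : rest);
    }

    @Override
    public String toString() {
        if (keyword == null) {
            return "property " + name + " : " + type;
        }
        return "special " + keyword + args + (name.isEmpty() ? "" : " (also property " + name + ")");
    }

    public static void main(String[] args) {
        // Naive comma split: a header using ',' inside :LABELS(...) would need a smarter tokenizer.
        String header = ":IGNORE,id:EID(relationships),:SRC(people),:DST(people)";
        for (String cell : header.split(",")) {
            System.out.println(parse(cell));
        }
    }
}
```

Keywords are matched case-sensitively in upper case, so a property declared as prop1:string is never mistaken for a special column.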

Vertex Tables

Here is an example of a vertex table header:

:VID(integer),prop1:string,prop2:integer,:LABELS

This header defines four columns:
1. the vertex ID column, of type integer
2. a property named prop1 of type string
3. a property named prop2 of type integer
4. the vertex labels column

The following keywords are recognized for vertex tables:

:VID(type;table_name): vertex ID
  * type: type of the vertex ID (integer, long or string)
  * table_name (only for partitioned graphs, optional): if omitted, the table name is generated from the file name by removing the extension and an optional _n suffix, where n is an integer partition number (e.g., a table in people_2.csv is called people). Files with the same table name are loaded into the same vertex table; in that case, their structure must be identical.
:LABELS(separator) (only for non-partitioned graphs, optional): vertex labels
  * separator (optional): specifies a character other than the default ; to separate the labels in the column.
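The table-name and label-separator rules above can be sketched as two small helpers. This is an illustration under assumptions, not PGX's code: the names `tableNameFromFile` and `splitLabels` are hypothetical, and the sketch assumes only the last extension is removed before stripping the _n suffix.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class VertexTableRules {
    // Optional "_n" partition suffix, where n is an integer.
    private static final Pattern PARTITION_SUFFIX = Pattern.compile("_\\d+$");

    // Derives a table name: drop directories, the last extension, then an optional _n suffix.
    static String tableNameFromFile(String path) {
        String base = path.substring(Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\')) + 1);
        int dot = base.lastIndexOf('.');
        if (dot > 0) {
            base = base.substring(0, dot);
        }
        return PARTITION_SUFFIX.matcher(base).replaceFirst("");
    }

    // Splits a labels cell on ';' unless :LABELS(separator) specified another character.
    static List<String> splitLabels(String cell, String separator) {
        if (cell.isEmpty()) {
            return Collections.emptyList();
        }
        String sep = (separator == null) ? ";" : separator;
        // Pattern.quote treats the separator literally, so '|' or '.' work as expected.
        return Arrays.asList(cell.split(Pattern.quote(sep)));
    }

    public static void main(String[] args) {
        System.out.println(tableNameFromFile("people_2.csv"));     // people
        System.out.println(tableNameFromFile("/data/people.csv")); // people
        System.out.println(splitLabels("Person;Employee", null));  // [Person, Employee]
        System.out.println(splitLabels("Person|Employee", "|"));   // [Person, Employee]
    }
}
```

Because people_1.csv and people_2.csv both map to people, they would end up in the same vertex table, which is why their structure has to be identical.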

Edge Tables

Here is an example of an edge table header:

:IGNORE,id:EID(relationships),:SRC(people),:DST(people)

This header defines four columns:
1. a column whose contents are ignored when loading the table
2. the edge ID column, which sets the edge table name to relationships and is also loaded as a property named id (of type long)
3. the source vertex ID column, which points to the vertex table people
4. the destination vertex ID column, which also points to the vertex table people

The following keywords are recognized for edge tables:

:SRC(source_table): source vertex ID
  * source_table (only for partitioned graphs): the name of the table containing the source vertices of the edges in this table.
:DST(destination_table): destination vertex ID
  * destination_table (only for partitioned graphs): the name of the table containing the destination vertices of the edges in this table.
:EID(table_name): edge ID
  * table_name (only for partitioned graphs, optional): if it is omitted, or if the :EID column is not present in the header, the table name is generated the same way as for vertex tables.
:LABEL (only for non-partitioned graphs, optional): edge label
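The edge table naming rule can be sketched as follows. The class `EdgeTableName` and its methods are hypothetical; the file-name fallback assumes the same derivation as the vertex rule (strip directories, the last extension, and an optional _n suffix), and the header is split naively on commas.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EdgeTableName {
    // Optional property name, then :EID with an optional (table_name) argument.
    private static final Pattern EID = Pattern.compile("[^:]*:EID(?:\\(([^)]*)\\))?");
    // Optional "_n" partition suffix, where n is an integer.
    private static final Pattern PARTITION_SUFFIX = Pattern.compile("_\\d+$");

    // Assumed to match the vertex-table rule: strip directories, extension, _n suffix.
    static String fromFileName(String path) {
        String base = path.substring(Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\')) + 1);
        int dot = base.lastIndexOf('.');
        if (dot > 0) {
            base = base.substring(0, dot);
        }
        return PARTITION_SUFFIX.matcher(base).replaceFirst("");
    }

    // Uses the :EID argument when present and non-empty; otherwise falls back to the file name.
    static String resolve(String headerLine, String path) {
        for (String cell : headerLine.split(",")) {
            Matcher m = EID.matcher(cell.trim());
            if (m.matches() && m.group(1) != null && !m.group(1).isEmpty()) {
                return m.group(1);
            }
        }
        return fromFileName(path);
    }

    public static void main(String[] args) {
        System.out.println(resolve(":IGNORE,id:EID(relationships),:SRC(people),:DST(people)", "knows_1.csv")); // relationships
        System.out.println(resolve(":EID,:SRC(people),:DST(people)", "knows_1.csv"));                          // knows
        System.out.println(resolve(":SRC(people),:DST(people)", "knows_1.csv"));                               // knows
    }
}
```

The second and third calls cover both fallback cases named above: an :EID column without a table name, and a header with no :EID column at all.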

Additional Support

PGX can also load graphs from CSV files whose headers follow the formats used by Neo4j and Amazon Neptune, such as files exported from or created for those systems. Properties of type point and duration (Neo4j) and array properties (both formats) are currently not supported.

See Also

More examples are provided in this tutorial