15.1.4 Plain Text Formats

The graph server (PGX) supports the following plain-text formats:

Comma-Separated Values (CSV)
Adjacency List (ADJ_LIST)
Edge List (EDGE_LIST)
Two Tables (TWO_TABLES)
Flat File (FLAT_FILE)

Parsing of Vertices

PGX supports three types of vertex identifiers (IDs): integer, long, and string. The type defaults to integer but can be configured through the vertex_id_type option in the graph configuration.
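As a rough illustration of what the three ID types imply, the sketch below converts a raw ID token according to a configured vertex_id_type. The helper name parse_vertex_id and the range checks (32-bit for integer, 64-bit for long, mirroring Java's int and long) are assumptions made for this example; this is not PGX's actual parser.

```python
INT_RANGE = (-2**31, 2**31 - 1)    # Java int range (assumed for "integer")
LONG_RANGE = (-2**63, 2**63 - 1)   # Java long range (assumed for "long")


def parse_vertex_id(token, vertex_id_type="integer"):
    """Convert a raw ID token to the configured vertex ID type (hypothetical helper)."""
    if vertex_id_type == "string":
        return token
    if vertex_id_type not in ("integer", "long"):
        raise ValueError(f"unsupported vertex_id_type: {vertex_id_type}")
    low, high = INT_RANGE if vertex_id_type == "integer" else LONG_RANGE
    value = int(token)
    if not low <= value <= high:
        raise ValueError(f"{token} does not fit in {vertex_id_type}")
    return value
```

With the default type, an ID such as 3000000000 would be rejected in this sketch, which is the kind of case where switching vertex_id_type to long becomes necessary.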

Parsing of Edges

Of the various formats and protocols supported by the graph server (PGX), only CSV and flat file parsing support edge identifiers. For all other data sources, the ID of an edge is PGX's internal ID, which is an integer from zero to num_edges - 1.

Parsing of Properties

String properties, spatial properties (currently only point2d), and temporal properties (date, local_date, time, timestamp, time_with_timezone and timestamp_with_timezone) must be quoted ("<string>") only if they contain a separator character (usually , for CSV and ' ' for Edge List and Adjacency List), a double quote ("), or a newline (\n).
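The quoting rule can be expressed as a small predicate. The following is a minimal sketch; the function name needs_quoting is hypothetical, and the sketch only decides whether quotes are required. It does not attempt to escape embedded quotes, since the text above does not specify how that is done.

```python
def needs_quoting(value, separator=","):
    """Return True if a string, spatial, or temporal value must be quoted.

    A value needs quotes only when it contains the separator character,
    a double quote, or a newline.
    """
    return separator in value or '"' in value or "\n" in value
```

For example, the value New York needs no quotes in a CSV file, but it does in an Edge List or Adjacency List file where the separator is a space.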

date properties are parsed using Java's SimpleDateFormat utility, instantiated with the format string yyyy-MM-dd HH:mm:ss unless specified otherwise in the graph configuration. All other types of temporal properties are parsed using Java's DateTimeFormatter utility.
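To make the default pattern concrete, the sketch below parses a value shaped like yyyy-MM-dd HH:mm:ss. It uses Python's datetime.strptime with an equivalent pattern purely as an analogue; PGX itself uses Java's SimpleDateFormat, whose leniency and time zone handling may differ from what is shown here. The sample value is made up for illustration.

```python
from datetime import datetime

# Python analogue of the Java pattern "yyyy-MM-dd HH:mm:ss" (assumed equivalent
# for well-formed input only).
DEFAULT_DATE_PATTERN = "%Y-%m-%d %H:%M:%S"


def parse_date(text, pattern=DEFAULT_DATE_PATTERN):
    """Parse a date property value using the given pattern."""
    return datetime.strptime(text, pattern)


parsed = parse_date("2024-03-15 13:45:30")
```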

point2d can be specified by its longitude followed by its latitude, separated by a space. Both longitude and latitude are doubles. For example, "-74.0445 40.6892" is the representation of a point2d instance representing the location of the Statue of Liberty.
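A minimal sketch of reading such a value follows. The helper name parse_point2d is hypothetical, and stripping the surrounding quotes inside the helper is a simplification made for this example.

```python
def parse_point2d(text):
    """Parse 'longitude latitude' into a (longitude, latitude) tuple of floats."""
    parts = text.strip('"').split(" ")
    if len(parts) != 2:
        raise ValueError(f"expected two space-separated doubles, got: {text!r}")
    longitude, latitude = float(parts[0]), float(parts[1])
    return longitude, latitude


statue_of_liberty = parse_point2d('"-74.0445 40.6892"')
```

Note the order: longitude comes first, which is the reverse of the latitude-first convention used by many mapping tools.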

Boolean values are interpreted as true if the value is true (ignoring case), Y (ignoring case) or 1, and as false otherwise. The suggested notation for false is false (ignoring case), N (ignoring case) or 0. All other types are parsed using the parseXXX() functions of their corresponding Java types, for example, Integer.parseInt(...) for integer types.
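The Boolean rule is easy to misread, because any unrecognized value silently becomes false. The sketch below mirrors the rule as stated; the function name parse_boolean is hypothetical, and the token is assumed to carry no surrounding whitespace.

```python
def parse_boolean(token):
    """Return True for 'true' or 'Y' (any case) or '1'; False for anything else."""
    return token.lower() in ("true", "y", "1")
```

Under this rule a value such as yes is interpreted as false, which is why the suggested notations are worth sticking to.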

Vector properties are supported in the Adjacency List (ADJ_LIST), Comma-Separated Values (CSV), Edge List (EDGE_LIST), and Two Tables text (TWO_TABLES) formats. Vector properties with components of type integer, long, float and double can be loaded from these formats. To specify that a vertex or edge property is a vector property, set the dimension field of the graph property configuration to the dimension of the vector, which must be a strictly positive integer. In the supported text formats, a vector value is written as the list of its component values separated by the vector component delimiter. The default delimiter is ;, and it can be changed through the vector_component_delimiter graph configuration entry. For example, with ; as the delimiter, a 3-dimensional vector of doubles could look like 0.1;0.0004;3.14 in the text file.
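The vector representation can be sketched as follows. The helper name parse_vector is hypothetical, and rejecting a value whose component count differs from the configured dimension is an assumption of this example, not documented PGX behavior.

```python
def parse_vector(text, dimension, component_type=float, delimiter=";"):
    """Split a vector value on the delimiter and convert each component."""
    if dimension <= 0:
        raise ValueError("dimension must be a strictly positive integer")
    components = [component_type(part) for part in text.split(delimiter)]
    if len(components) != dimension:
        # Assumed check: the text must supply exactly `dimension` components.
        raise ValueError(f"expected {dimension} components, got {len(components)}")
    return components


vector = parse_vector("0.1;0.0004;3.14", dimension=3)
```

Passing component_type=int covers the integer and long cases in this sketch, since Python does not distinguish the two.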

Separators

With single-file formats, IDs and properties are separated by a tab or a single space ("\t ") by default. With multiple-file formats, a comma (",") is used instead. In both cases, PGX lets you configure the separator string.
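The difference between the two defaults can be illustrated with a small splitter that treats every character of the separator string as a valid field separator. This is a sketch under that assumption; the helper name split_fields is hypothetical, and quoted values that contain a separator are deliberately not handled.

```python
import re


def split_fields(line, separator_chars="\t "):
    """Split a line on any single character in separator_chars (no quote handling)."""
    pattern = "[" + re.escape(separator_chars) + "]"
    return re.split(pattern, line.rstrip("\n"))


single_file_fields = split_fields("1\t2 0.5\n")          # default: tab or space
multi_file_fields = split_fields("1,2,0.5", separator_chars=",")
```

Because each separator character ends a field, two consecutive spaces produce an empty field in this sketch, consistent with the "single space" wording above.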

Parallel Loading

The following formats support parallel loading from multiple files:

CSV (specify multiple files in vertex_uris and/or edge_uris)
Adjacency List (specify multiple files in uris)
Edge List (specify multiple files in uris)
Two Tables (specify multiple files in vertex_uris and/or edge_uris)
Flat File (specify multiple files in vertex_uris and/or edge_uris)
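As a sketch of how the two styles of file lists differ, the example below builds two graph configurations as Python dictionaries and prints one as JSON. Only the uris, vertex_uris and edge_uris entries come from the list above; the format key, its values, and all file names are assumptions made for illustration.

```python
import json

# Single-file-style format: all partitions go in "uris".
# The "format" key and the file names are hypothetical.
adj_list_config = {
    "format": "adj_list",
    "uris": ["graph_part1.adj", "graph_part2.adj"],
}

# Multiple-file-style format: vertices and edges are listed separately.
csv_config = {
    "format": "csv",
    "vertex_uris": ["vertices_part1.csv", "vertices_part2.csv"],
    "edge_uris": ["edges_part1.csv", "edges_part2.csv"],
}

print(json.dumps(csv_config, indent=2))
```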

Legend

The following abbreviations are used to specify text formats:

V = Vertex Key
VG = Neighbor Vertex
VL = Vertex Labels
VP = Vertex Property
VPK = Vertex Property Key
VPT = Vertex Property Type
EL = Edge Label
EP = Edge Property
EPK = Edge Property Key
EPT = Edge Property Type

For example, <V-2, VG-4> denotes the 4th neighbor of the 2nd vertex.
