16.1.4.5 Flat File (FLAT_FILE)

The Flat File format is a text format made up of two description files, one for vertices and one for edges. Each file is a list of properties, one property per line, in the following format:

vertices.opv

vertex_ID, key_name, value_type, value, value, value

<V-1> <V-1, VPK-1> <V-1, VPT-1> [<V-1, VP-1> <V-1, VP-1> <V-1, VP-1>]
...
<V-1> <V-1, VPK-N> <V-1, VPT-N> [<V-1, VP-N> <V-1, VP-N> <V-1, VP-N>]
<V-2> <V-2, VPK-1> <V-2, VPT-1> [<V-2, VP-1> <V-2, VP-1> <V-2, VP-1>]
...
<V-2> <V-2, VPK-N> <V-2, VPT-N> [<V-2, VP-N> <V-2, VP-N> <V-2, VP-N>]
...
<V-V> <V-V, VPK-N> <V-V, VPT-N> [<V-V, VP-N> <V-V, VP-N> <V-V, VP-N>]

edges.ope

edge_ID, source_vertex_ID, destination_vertex_ID, edge_label, key_name, value_type, value, value, value


<E-1> <V-1, VG-1> <E-1, EL-1> <E-1, EPK-1> <E-1, EPT-1> [<E-1, EP-1> <E-1, EP-1> <E-1, EP-1>]
...
<E-1> <V-N, VG-N> <E-1, EL-N> <E-1, EPK-N> <E-1, EPT-N> [<E-1, EP-N> <E-1, EP-N> <E-1, EP-N>]
<E-2> <V-1, VG-1> <E-2, EL-1> <E-2, EPK-1> <E-2, EPT-1> [<E-2, EP-1> <E-2, EP-1> <E-2, EP-1>]
...
<E-2> <V-N, VG-N> <E-2, EL-N> <E-2, EPK-N> <E-2, EPT-N> [<E-2, EP-N> <E-2, EP-N> <E-2, EP-N>]
...
<E-E> <V-N, VG-N> <E-E, EL-N> <E-E, EPK-N> <E-E, EPT-N> [<E-E, EP-N> <E-E, EP-N> <E-E, EP-N>]
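
The record layouts above can be read with a plain comma split, because commas inside values are percent-encoded. The following is a minimal sketch rather than a PGX API: the names `VertexRecord`, `EdgeRecord`, `parse_vertex_line`, and `parse_edge_line` are hypothetical, and IDs and type codes are kept as raw text.

```python
from typing import NamedTuple, Tuple


class VertexRecord(NamedTuple):
    vertex_id: str
    key_name: str
    value_type: str
    values: Tuple[str, str, str]  # the three value fields, in file order


class EdgeRecord(NamedTuple):
    edge_id: str
    source_id: str
    destination_id: str
    label: str
    key_name: str
    value_type: str
    values: Tuple[str, str, str]  # the three value fields, in file order


def parse_vertex_line(line: str) -> VertexRecord:
    """Split one vertices.opv line into its 6 fields."""
    fields = line.rstrip("\n").split(",")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    return VertexRecord(fields[0], fields[1], fields[2], tuple(fields[3:6]))


def parse_edge_line(line: str) -> EdgeRecord:
    """Split one edges.ope line into its 9 fields."""
    fields = line.rstrip("\n").split(",")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    return EdgeRecord(*fields[:6], tuple(fields[6:9]))
```

Keeping every field as text defers type conversion to a later step, where the value_type code can decide how each value is interpreted.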

Special Considerations when Using Flat File Format

  • When no properties are defined for a certain vertex or edge, %20 is used instead of the key name:
    Vertices: 1,%20,,,, 
    Edges: 1,2,1,"label",%20,,,,
  • Values that are neither numeric nor dates go in the first value field; numeric values go in the second, and dates in the third.
  • The following shows the mapping between PGX property type and flat file value_type:

    Table 16-3 Mapping between PGX Property Type and Flat File value_type

    PGX property type          Flat file value_type
    STRING                     1
    INTEGER                    2
    FLOAT                      3
    DOUBLE                     4
    DATE                       5
    LOCAL_DATE                 5
    TIME                       5
    TIMESTAMP                  5
    TIME_WITH_TIMEZONE         5
    TIMESTAMP_WITH_TIMEZONE    5
    BOOLEAN                    6
    LONG                       7
    POINT2D                    200

    Note:

    When loading a graph in flat file format into PGX, the graph configuration is used to find the right temporal or spatial type.
  • The standard for the flat file format defines comma as the only valid delimiter. Any delimiter set in the graph configuration is therefore ignored and comma is used instead.
  • Strings must not be quoted. However, the following characters must be encoded:
    • '%' -> '%25'
    • '\t' -> '%09'
    • ' ' -> '%20'
    • '\n' -> '%0A'
    • ',' -> '%2C'
  • When storing a graph into flat file format, vertex labels will be ignored. Also, when a graph has no edge label, an empty string ("") will be stored instead.
  • When loading a graph in parallel using flat file format, all information regarding a specific vertex or edge must be contained in the same partition; otherwise, unexpected behavior might occur.
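
The considerations above (character encoding, the value_type codes, value placement, and the %20 placeholder) can be combined into a small row writer. This is a sketch under stated assumptions, not part of PGX: `encode`, `decode`, `vertex_row`, and `edge_row` are hypothetical helpers, values are passed in as already-formatted text, key names and labels are assumed to be encoded the same way as string values, and POINT2D is left out because its field placement is not spelled out here.

```python
import re

# Type codes copied from Table 16-3 (POINT2D intentionally omitted).
VALUE_TYPE = {
    "STRING": 1, "INTEGER": 2, "FLOAT": 3, "DOUBLE": 4,
    "DATE": 5, "LOCAL_DATE": 5, "TIME": 5, "TIMESTAMP": 5,
    "TIME_WITH_TIMEZONE": 5, "TIMESTAMP_WITH_TIMEZONE": 5,
    "BOOLEAN": 6, "LONG": 7,
}
NUMERIC_TYPES = {"INTEGER", "FLOAT", "DOUBLE", "LONG"}

# Character encodings listed in the considerations above.
ESCAPES = {"%": "%25", "\t": "%09", " ": "%20", "\n": "%0A", ",": "%2C"}
UNESCAPES = {code: char for char, code in ESCAPES.items()}


def encode(text: str) -> str:
    """Percent-encode the five reserved characters."""
    return "".join(ESCAPES.get(char, char) for char in text)


def decode(text: str) -> str:
    """Reverse encode() in a single pass over the text."""
    return re.sub(r"%(?:25|09|20|0A|2C)", lambda m: UNESCAPES[m.group(0)], text)


def value_slots(pgx_type: str, value_text: str) -> list:
    """Place a value in the first (other), second (numeric), or third (date) field."""
    if pgx_type in NUMERIC_TYPES:
        return ["", value_text, ""]
    if VALUE_TYPE[pgx_type] == 5:
        return ["", "", encode(value_text)]
    return [encode(value_text), "", ""]


def vertex_row(vertex_id, key=None, pgx_type=None, value_text=None) -> str:
    """Format one vertices.opv line; with no key, emit the %20 placeholder row."""
    if key is None:
        return f"{vertex_id},%20,,,,"
    fields = [str(vertex_id), encode(key), str(VALUE_TYPE[pgx_type])]
    return ",".join(fields + value_slots(pgx_type, value_text))


def edge_row(edge_id, source_id, destination_id, label,
             key=None, pgx_type=None, value_text=None) -> str:
    """Format one edges.ope line; with no key, emit the %20 placeholder row."""
    head = [str(edge_id), str(source_id), str(destination_id), encode(label)]
    if key is None:
        return ",".join(head) + ",%20,,,,"
    tail = [encode(key), str(VALUE_TYPE[pgx_type])]
    return ",".join(head + tail + value_slots(pgx_type, value_text))
```

For example, `edge_row(1, 2, 1, "label", "dateProp", "DATE", "1985-10-18 10:00:00")` yields `1,2,1,label,dateProp,5,,,1985-10-18%2010:00:00`, where the space in the date is written as `%20`.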

Example 16-6 Graph in Flat File Text format

The following example shows a graph of 4 vertices (1, 2, 3 and 4), each having a double and a string property, and 3 edges, each having a boolean and a date property, encoded in Flat File Text format:

vertices.opv:

1,doubleProp,4,,8.0,
1,stringProp,1,foo,,
2,doubleProp,4,,4.3,
2,stringProp,1,bar,,
3,doubleProp,4,,6.1,
3,stringProp,1,bax,,
4,doubleProp,4,,17.78,
4,stringProp,1,f00,,

edges.ope:

1,2,1,label,boolProp,6,false,,
1,2,1,label,dateProp,5,,,1985-10-18%2010:00:00
2,3,2,label,boolProp,6,true,,
2,3,2,label,dateProp,5,,,1961-12-30%2014:45:14
3,3,4,label,boolProp,6,false,,
3,3,4,label,dateProp,5,,,2001-01-15%2007:00:43
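
To illustrate how the example files could be read back, the sketch below groups the vertices.opv lines by vertex ID and converts each value according to its value_type code. It is an illustration only and not how PGX loads the files: `load_vertex_properties` and `read_value` are hypothetical names, booleans are assumed to be written as `true` or `false`, and temporal values (code 5) are returned as decoded text because the concrete temporal type comes from the graph configuration.

```python
import re

# The vertices.opv content from Example 16-6.
VERTICES_OPV = """\
1,doubleProp,4,,8.0,
1,stringProp,1,foo,,
2,doubleProp,4,,4.3,
2,stringProp,1,bar,,
3,doubleProp,4,,6.1,
3,stringProp,1,bax,,
4,doubleProp,4,,17.78,
4,stringProp,1,f00,,
"""

UNESCAPES = {"%25": "%", "%09": "\t", "%20": " ", "%0A": "\n", "%2C": ","}


def decode(text: str) -> str:
    """Undo the percent-encoding of the five reserved characters."""
    return re.sub(r"%(?:25|09|20|0A|2C)", lambda m: UNESCAPES[m.group(0)], text)


def read_value(value_type: int, slots):
    """Pick the populated value field and convert it by type code."""
    text, number, date = slots
    if value_type == 1:          # STRING
        return decode(text)
    if value_type in (2, 7):     # INTEGER, LONG
        return int(number)
    if value_type in (3, 4):     # FLOAT, DOUBLE
        return float(number)
    if value_type == 5:          # temporal types: left as text in this sketch
        return decode(date)
    if value_type == 6:          # BOOLEAN (assumed "true"/"false")
        return decode(text).lower() == "true"
    raise ValueError(f"unsupported value_type {value_type}")


def load_vertex_properties(opv_text: str) -> dict:
    """Return {vertex_id: {key_name: value}} from vertices.opv text."""
    vertices = {}
    for line in opv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        vertex_id, key, value_type, *slots = line.split(",")
        properties = vertices.setdefault(vertex_id, {})
        if key == "%20":         # vertex without properties
            continue
        properties[decode(key)] = read_value(int(value_type), slots)
    return vertices
```

Calling `load_vertex_properties(VERTICES_OPV)` returns one dictionary entry per vertex, each holding its `doubleProp` and `stringProp` values. The same grouping would apply to edges.ope with the three extra leading fields for source, destination, and label.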