15.1.4.1 Comma-Separated Values (CSV)

The CSV format is a text file format in which vertices and edges are stored in separate files. Each line of a file represents one vertex or one edge. The vertex key and labels, the edge key, source, destination, and label, and the attached properties appear in the order specified by the file header (first line), if present, and by the configuration.

A graph with V vertices, each having N vertex properties and K neighbors, and E = V*K edges, each having M edge properties, is represented in CSV as shown:

vertices.csv

<V-1>,<VL-1>,<V-1, NP-1>,...,<V-1, NP-N>
<V-2>,<VL-2>,<V-2, NP-1>,...,<V-2, NP-N>
...
<V-V>,<VL-V>,<V-V, NP-1>,...,<V-V, NP-N>

edges.csv

<E-1>,<V-1>,<V-1, VG-1>,<EL-1>,<E-1, EP-1>,...,<E-1, EP-M>
...
<E-K>,<V-1>,<V-1, VG-K>,<EL-K>,<E-K, EP-1>,...,<E-K, EP-M>
<E-K+1>,<V-2>,<V-2, VG-1>,<EL-K+1>,<E-K+1, EP-1>,...,<E-K+1, EP-M>
...
<E-V*K>,<V-V>,<V-V, VG-K>,<EL-V*K>,<E-V*K, EP-1>,...,<E-V*K, EP-M>
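
The row numbering in this layout can be sketched in a few lines of Python. This is only an illustration of the placeholder notation, not part of any loader: the function name `layout_rows` is hypothetical, and it assumes that each edge label carries the same running number as its edge.

```python
def layout_rows(V, K, N, M):
    """Build the placeholder rows of vertices.csv and edges.csv.

    V: vertices, K: neighbors per vertex, N: vertex properties,
    M: edge properties. Returns (vertex_rows, edge_rows).
    """
    vertex_rows = []
    for v in range(1, V + 1):
        props = [f"<V-{v}, NP-{n}>" for n in range(1, N + 1)]
        vertex_rows.append(",".join([f"<V-{v}>", f"<VL-{v}>"] + props))

    edge_rows = []
    for v in range(1, V + 1):
        for k in range(1, K + 1):
            # Running edge number: vertex v owns edges (v-1)*K+1 .. v*K.
            e = (v - 1) * K + k
            props = [f"<E-{e}, EP-{m}>" for m in range(1, M + 1)]
            fields = [f"<E-{e}>", f"<V-{v}>", f"<V-{v}, VG-{k}>", f"<EL-{e}>"]
            edge_rows.append(",".join(fields + props))
    return vertex_rows, edge_rows
```

Calling `layout_rows(2, 2, 1, 1)` yields two vertex rows and four edge rows, which matches the E = V*K edge count implied by the last row of the layout.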

Example 15-1 Loading graph from a CSV file with header details

The following example shows the CSV files for a graph with two vertices and two edges. The first line of each file is a header that names the columns:

vertices.csv

key,integer_prop,string_prop
1,33,"Alice"
2,42,"Bob"

edges.csv

source,dest,integer_prop,string_prop
1,2,0,"baz"
2,2,-12,"bat"

The corresponding graph configuration file is as shown:

{
    "format": "csv",
    "header": true,
    "vertex_id_column": "key",
    "edge_source_column": "source",
    "edge_destination_column": "dest",
    "vertex_uris": ["vertices.csv"],
    "edge_uris": ["edges.csv"],
    "vertex_props": [
        {
            "name": "integer_prop",
            "type": "integer"
        },
        {
            "name": "string_prop",
            "type": "string"
        }
    ],
    "edge_props": [
        {
            "name": "integer_prop",
            "type": "integer"
        },
        {
            "name": "string_prop",
            "type": "string"
        }
    ]
}
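
To make the role of the header concrete, the following Python sketch resolves the column names from the configuration against the header line of each file. It is a minimal illustration that uses only the Python standard library and is not the loader's actual implementation. The names `load`, `read_props`, and `CASTS` are hypothetical, and vertex IDs are kept as strings because the configuration above does not declare an ID type.

```python
import csv
import io
import json

# The two files from the example, embedded as strings for self-containment.
VERTICES_CSV = 'key,integer_prop,string_prop\n1,33,"Alice"\n2,42,"Bob"\n'
EDGES_CSV = 'source,dest,integer_prop,string_prop\n1,2,0,"baz"\n2,2,-12,"bat"\n'

# The fields of the configuration that this sketch uses.
CONFIG = json.loads("""
{
  "header": true,
  "vertex_id_column": "key",
  "edge_source_column": "source",
  "edge_destination_column": "dest",
  "vertex_props": [{"name": "integer_prop", "type": "integer"},
                   {"name": "string_prop", "type": "string"}],
  "edge_props": [{"name": "integer_prop", "type": "integer"},
                 {"name": "string_prop", "type": "string"}]
}
""")

# Assumed mapping from configuration type names to Python converters.
CASTS = {"integer": int, "string": str}

def read_props(row, prop_defs):
    """Pick each declared property out of a row by its header name."""
    return {p["name"]: CASTS[p["type"]](row[p["name"]]) for p in prop_defs}

def load(config, vertices_text, edges_text):
    """Return ({vertex_id: props}, [(source, destination, props), ...])."""
    vertices = {}
    for row in csv.DictReader(io.StringIO(vertices_text)):
        vertices[row[config["vertex_id_column"]]] = read_props(
            row, config["vertex_props"])
    edges = []
    for row in csv.DictReader(io.StringIO(edges_text)):
        edges.append((row[config["edge_source_column"]],
                      row[config["edge_destination_column"]],
                      read_props(row, config["edge_props"])))
    return vertices, edges

vertices, edges = load(CONFIG, VERTICES_CSV, EDGES_CSV)
```

Because the lookup is by name, the order of the columns in the files does not matter in this sketch, as long as each header matches the names used in the configuration.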

Example 15-2 Loading graph from a CSV file without header details

The following example shows the CSV files for a graph with two vertices and two edges. The files have no header line:

vertices.csv

1,33,"Alice"
2,42,"Bob"

edges.csv

1,2,0,"baz"
2,2,-12,"bat"

The corresponding graph configuration file is as shown:

Note:

The column indices, which start at 1, are given in place of the column names.

{
    "format": "csv",
    "header": false,
    "vertex_id_column": 1,
    "edge_source_column": 1,
    "edge_destination_column": 2,
    "vertex_uris": ["vertices.csv"],
    "edge_uris": ["edges.csv"],
    "vertex_props": [
        {
            "name": "integer_prop",
            "type": "integer",
            "column": 2
        },
        {
            "name": "string_prop",
            "type": "string",
            "column": 3
        }
    ],
    "edge_props": [
        {
            "name": "integer_prop",
            "type": "integer",
            "column": 3
        },
        {
            "name": "string_prop",
            "type": "string",
            "column": 4
        }
    ]
}

If no column indices are set in the configuration file, the columns are assumed to be in the following order:

  • For vertex files:
      - Vertex ID
      - Vertex labels (if present)
      - Vertex properties, in the order they are declared in the configuration
  • For edge files:
      - Edge ID (if present)
      - Edge source
      - Edge destination
      - Edge label (if present)
      - Edge properties, in the order they are declared in the configuration

Therefore, the earlier configuration is equivalent to the following one, which omits the column indices:

{
    "format": "csv",
    "header": false,
    "vertex_uris": ["vertices.csv"],
    "edge_uris": ["edges.csv"],
    "vertex_props": [
        {
            "name": "integer_prop",
            "type": "integer"
        },
        {
            "name": "string_prop",
            "type": "string"
        }
    ],
    "edge_props": [
        {
            "name": "integer_prop",
            "type": "integer"
        },
        {
            "name": "string_prop",
            "type": "string"
        }
    ]
}
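
The default column order can be written out as a small Python sketch that assigns 1-based indices to the fields of a headerless file. The function names and the boolean parameters `has_labels`, `has_id`, and `has_label` are hypothetical stand-ins for whether the optional columns are present. They are not configuration fields.

```python
def default_vertex_columns(prop_names, has_labels=False):
    """Map each vertex field to its 1-based column under the default order."""
    fields = ["vertex_id"]
    if has_labels:
        fields.append("vertex_labels")
    fields.extend(prop_names)  # properties follow in declaration order
    return {name: index for index, name in enumerate(fields, start=1)}


def default_edge_columns(prop_names, has_id=False, has_label=False):
    """Map each edge field to its 1-based column under the default order."""
    fields = []
    if has_id:
        fields.append("edge_id")
    fields.extend(["edge_source", "edge_destination"])
    if has_label:
        fields.append("edge_label")
    fields.extend(prop_names)  # properties follow in declaration order
    return {name: index for index, name in enumerate(fields, start=1)}


props = ["integer_prop", "string_prop"]
vertex_columns = default_vertex_columns(props)
edge_columns = default_edge_columns(props)
```

For the two-property example, `vertex_columns` places the vertex ID in column 1 and the properties in columns 2 and 3, and `edge_columns` places the source and destination in columns 1 and 2 and the properties in columns 3 and 4. These are the same indices that the explicit configuration in Example 15-2 sets by hand.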