PGX 20.1.1
Documentation

Load Your Own Custom Graph Data

In this guide, you will learn how to load your own custom graph data. We will create a graph, alter it in several steps, and explain how to make sure it loads properly. Although PGX supports a handful of different graph formats, this guide only uses a two-table layout: one CSV file for vertex data and one CSV file for edge data, which corresponds to the csv format in the graph configs shown in this guide. For more information about graph formats, please refer to the file format reference and the database format reference pages.

Preloading graphs

You can instruct the PGX server to preload graphs into memory with the graphs configuration entry in pgx.conf (refer to the PGX configuration documentation for all available options). For example, suppose your pgx.conf contains:

{
    "graphs": [ "${PATH_TO_GRAPH0_JSON}", "${PATH_TO_GRAPH1_JSON}", ... ]
}

Those graphs are then loaded when the PGX server starts and are immediately available for analysis, with no loading time. To get a handle to a preloaded graph in a PGX session, call the usual readGraph() APIs with exactly the same graph configuration that is set in pgx.conf.

Create a Simple Graph File

First we will create a small, simple graph in csv format, with no vertex or edge properties. We will create two csv files: one for vertices and one for edges.

Each line in the vertices file has one number which corresponds to the vertex ID.

1
2
3
4

Each line in the edges file corresponds to one directed edge. The first number in each line is the source vertex ID, and the second number is the ID of the destination vertex that the edge points to. The individual tokens (here, the vertex IDs) are separated by commas.

1,2
2,3
2,4
3,4
4,2

We have 4 vertices and 5 edges. Note: Vertices 2 and 4 are connected in both directions, by the edges 2,4 and 4,2.
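As a quick sanity check on the edge data above, the following sketch counts the edges, collects the vertex IDs they reference, and finds edges whose reverse is also present. It is plain Java and does not use PGX. The class and method names are hypothetical, and it assumes every line holds exactly two comma-separated tokens.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SimpleGraphCheck {

    /** Collects every distinct vertex ID that appears as a source or destination. */
    static Set<String> referencedVertices(List<String> edgeLines) {
        Set<String> ids = new TreeSet<>();
        for (String line : edgeLines) {
            String[] tokens = line.trim().split(",");
            ids.add(tokens[0].trim());
            ids.add(tokens[1].trim());
        }
        return ids;
    }

    /** Returns the edges whose reverse edge is also present in the data. */
    static Set<String> reciprocalEdges(List<String> edgeLines) {
        Set<String> edges = new HashSet<>();
        for (String line : edgeLines) {
            edges.add(line.trim());
        }
        Set<String> reciprocal = new TreeSet<>();
        for (String edge : edges) {
            String[] tokens = edge.split(",");
            if (edges.contains(tokens[1] + "," + tokens[0])) {
                reciprocal.add(edge);
            }
        }
        return reciprocal;
    }

    public static void main(String[] args) {
        // The five sample edges from this guide.
        List<String> edges = Arrays.asList("1,2", "2,3", "2,4", "3,4", "4,2");
        System.out.println("edges: " + edges.size());
        System.out.println("vertices referenced: " + referencedVertices(edges));
        System.out.println("reciprocal edges: " + reciprocalEdges(edges));
    }
}
```

For the sample data, the check reports 5 edges that reference vertices 1 through 4, and the reciprocal pair 2,4 and 4,2.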


To load this graph into PGX, we need a graph config that specifies the data files to load, the format used, and the names and types of any properties. The config can be written as a JSON file or, when using the Java API, built with a GraphConfigBuilder object. Below are examples of both methods:

{
    "format": "csv",
    "vertex_uris": [
        "sample.vertices.csv"
    ],
    "edge_uris": [
        "sample.edges.csv"
    ]
}
FileGraphConfig config = GraphConfigBuilder.forFileFormat(Format.CSV)
    .addVertexUri("sample.vertices.csv")
    .addEdgeUri("sample.edges.csv")
    .build();

With the graph file and the graph config, we can load the graph into PGX.
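If you would rather generate the sample files than type them by hand, the sketch below writes the two CSV files and the JSON config from this section into a directory. It uses only the Java standard library. The class name, the method name, and the config file name sample.csv.json are assumptions made for illustration, not names required by PGX.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class SampleGraphFiles {

    /** Writes the sample vertices, edges, and JSON config into dir; returns the config path. */
    static Path writeSampleFiles(Path dir) {
        try {
            Files.createDirectories(dir);
            // One vertex ID per line.
            Files.write(dir.resolve("sample.vertices.csv"),
                    Arrays.asList("1", "2", "3", "4"), StandardCharsets.UTF_8);
            // One "source,destination" pair per line.
            Files.write(dir.resolve("sample.edges.csv"),
                    Arrays.asList("1,2", "2,3", "2,4", "3,4", "4,2"), StandardCharsets.UTF_8);
            // Same content as the JSON config shown above.
            String config = String.join("\n",
                    "{",
                    "    \"format\": \"csv\",",
                    "    \"vertex_uris\": [\"sample.vertices.csv\"],",
                    "    \"edge_uris\": [\"sample.edges.csv\"]",
                    "}");
            Path configPath = dir.resolve("sample.csv.json"); // hypothetical file name
            Files.write(configPath, config.getBytes(StandardCharsets.UTF_8));
            return configPath;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pgx-sample");
        System.out.println("Wrote graph config to " + writeSampleFiles(dir));
    }
}
```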

Add a Vertex Property

The previous graph consists only of vertices and edges, without any vertex or edge properties. When we add a double vertex property to our graph, the vertex data looks like this:

1,0.1
2,2.0
3,0.3
4,4.56789

The vertex properties are positioned directly after the vertex ID in each line. Note: PGX supports non-partitioned graphs, in which all vertices must have the same number and types of properties, as well as partitioned graphs, in which vertices can have different properties.
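Because every vertex in a non-partitioned graph must carry the same properties, it can help to check the vertex file before loading it. The sketch below is a stand-alone check written for this guide, not PGX's own parser. It assumes a comma separator and that every column after the ID is a double property, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class VertexFileCheck {

    /**
     * Returns the 1-based number of the first line that does not have exactly
     * expectedColumns tokens, or whose property tokens do not parse as doubles.
     * Returns 0 if every line passes.
     */
    static int firstInvalidLine(List<String> lines, int expectedColumns) {
        for (int i = 0; i < lines.size(); i++) {
            // The -1 limit keeps trailing empty tokens, so "3," counts as two columns.
            String[] tokens = lines.get(i).split(",", -1);
            if (tokens.length != expectedColumns) {
                return i + 1;
            }
            try {
                // Token 0 is the vertex ID; the remaining tokens are properties.
                for (int c = 1; c < tokens.length; c++) {
                    Double.parseDouble(tokens[c].trim());
                }
            } catch (NumberFormatException e) {
                return i + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        List<String> vertices = Arrays.asList("1,0.1", "2,2.0", "3,0.3", "4,4.56789");
        int bad = firstInvalidLine(vertices, 2);
        System.out.println(bad == 0 ? "all lines valid" : "first invalid line: " + bad);
    }
}
```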

For PGX to read the modified file, we have to declare this vertex property in our config file or the builder code. We can choose a descriptive name for the property and set the type to double:

{
    "format": "csv",
    "vertex_uris": [
        "sample.vertices.csv"
    ],
    "edge_uris": [
        "sample.edges.csv"
    ],
    "vertex_props":[{
        "name":"double-prop",
        "type":"double"
    }]
}
FileGraphConfig config = GraphConfigBuilder.forFileFormat(Format.CSV)
    .addVertexUri("sample.vertices.csv")
    .addEdgeUri("sample.edges.csv")
    .addVertexProperty("double-prop", PropertyType.DOUBLE)
    .setSeparator(",")
    .build();

Use Strings as Vertex Identifiers

The previous examples all used integer vertex IDs to identify a vertex. Integer vertex IDs are the default in PGX, but you can change the graph so that it uses string vertex IDs instead.

The vertex data will look like this:

"vertex 1",0.1
"vertex 2",2.0
"vertex 3",0.3
"vertex 4",4.56789

While the edges data will look like this:

"vertex 1","vertex 2"
"vertex 2","vertex 3"
"vertex 2","vertex 4"
"vertex 3","vertex 4"
"vertex 4","vertex 2"

Again, we need to modify the graph config file or the corresponding Java code to match the graph file:

{
    "format": "csv",
    "vertex_uris": [
        "sample.vertices.csv"
    ],
    "edge_uris": [
        "sample.edges.csv"
    ],
    "vertex_props":[{
        "name":"double-prop",
        "type":"double"
    }],
    "vertex_id_type":"string"
}
FileGraphConfig config = GraphConfigBuilder.forFileFormat(Format.CSV)
    .addVertexUri("sample.vertices.csv")
    .addEdgeUri("sample.edges.csv")
    .addVertexProperty("double-prop", PropertyType.DOUBLE)
    .setVertexIdType(IdType.STRING)
    .build();
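With string IDs, a typo in the edge file can silently point an edge at a vertex that does not exist in the vertex file. The sketch below looks for such edges. It is a simplified stand-alone check, not PGX's CSV parser: it assumes that IDs contain no commas and that quoting is a single pair of surrounding double quotes. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StringIdCheck {

    /** Strips one pair of surrounding double quotes, if present. */
    static String unquote(String token) {
        String t = token.trim();
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            return t.substring(1, t.length() - 1);
        }
        return t;
    }

    /** Returns the edge lines whose source or destination ID is missing from the vertex lines. */
    static List<String> danglingEdges(List<String> vertexLines, List<String> edgeLines) {
        Set<String> ids = new HashSet<>();
        for (String line : vertexLines) {
            ids.add(unquote(line.split(",")[0])); // first token is the vertex ID
        }
        List<String> dangling = new ArrayList<>();
        for (String line : edgeLines) {
            String[] tokens = line.split(",");
            if (tokens.length < 2
                    || !ids.contains(unquote(tokens[0]))
                    || !ids.contains(unquote(tokens[1]))) {
                dangling.add(line);
            }
        }
        return dangling;
    }

    public static void main(String[] args) {
        List<String> vertices = Arrays.asList(
                "\"vertex 1\",0.1", "\"vertex 2\",2.0", "\"vertex 3\",0.3", "\"vertex 4\",4.56789");
        List<String> edges = Arrays.asList(
                "\"vertex 1\",\"vertex 2\"", "\"vertex 2\",\"vertex 3\"", "\"vertex 2\",\"vertex 4\"",
                "\"vertex 3\",\"vertex 4\"", "\"vertex 4\",\"vertex 2\"");
        System.out.println("dangling edges: " + danglingEdges(vertices, edges));
    }
}
```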

String Memory Consumption

Note: String vertex IDs consume much more memory than integer vertex IDs. Refer to the memory requirements documentation for more details.

Add an Edge Property

Finally, we will add an edge property to our small graph. We want it to be of type string. Note: The edge properties are positioned after the destination vertex ID:

"vertex 1","vertex 2","edge_prop_1_2"
"vertex 2","vertex 3","edge_prop_2_3"
"vertex 2","vertex 4","edge_prop_2_4"
"vertex 3","vertex 4","edge_prop_3_4"
"vertex 4","vertex 2","edge_prop_4_2"

Declaring the edge property in both the config and the builder code is just as easy as for a vertex property:

{
    "format": "csv",
    "vertex_uris": [
        "sample.vertices.csv"
    ],
    "edge_uris": [
        "sample.edges.csv"
    ],
    "vertex_props":[{
        "name":"double-prop",
        "type":"double"
    }],
    "vertex_id_type":"string",
    "edge_props":[{
        "name":"edge-prop",
        "type":"string"
    }]
}
FileGraphConfig config = GraphConfigBuilder.forFileFormat(Format.CSV)
    .addVertexUri("sample.vertices.csv")
    .addEdgeUri("sample.edges.csv")
    .addVertexProperty("double-prop", PropertyType.DOUBLE)
    .setVertexIdType(IdType.STRING)
    .addEdgeProperty("edge-prop", PropertyType.STRING)
    .setSeparator(",")
    .build();
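To make the column order of the final edge file concrete, the following sketch splits one edge line into its source ID, destination ID, and edge property. It is an illustration only and does not reflect how PGX parses CSV internally. It assumes three comma-separated columns with no embedded commas, and the EdgeLine class is hypothetical.

```java
class EdgeLine {
    final String source;
    final String destination;
    final String property;

    EdgeLine(String source, String destination, String property) {
        this.source = source;
        this.destination = destination;
        this.property = property;
    }

    /** Strips one pair of surrounding double quotes, if present. */
    static String unquote(String token) {
        String t = token.trim();
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            return t.substring(1, t.length() - 1);
        }
        return t;
    }

    /** Parses a line of the form source,destination,property. */
    static EdgeLine parse(String line) {
        String[] tokens = line.split(",", -1);
        if (tokens.length != 3) {
            throw new IllegalArgumentException("Expected 3 columns but found " + tokens.length + ": " + line);
        }
        return new EdgeLine(unquote(tokens[0]), unquote(tokens[1]), unquote(tokens[2]));
    }

    public static void main(String[] args) {
        EdgeLine edge = parse("\"vertex 1\",\"vertex 2\",\"edge_prop_1_2\"");
        System.out.println(edge.source + " -> " + edge.destination + " [" + edge.property + "]");
    }
}
```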

Feel free to try out other options with the sample graph, for example, by adding additional properties.