Loading Graph Data in Parallel from Multiple Files

You can load a graph in parallel using multiple files.

The following graph configuration loads the graph data from four vertex files (two for each vertex provider) and two edge files into the same graph. Note that all the URIs are specified in the `uris` arrays inside the JSON graph configuration.

{
  "name": "parallelLoadingExampleGraph",
  "vertex_providers": [
    {
      "name": "Person",
      "format": "csv",
      "uris": [
        "../person1.csv",
        "../person2.csv"
      ],
      "props": [
        {
          "name": "name",
          "type": "string"
        },
        {
          "name": "age",
          "type": "integer"
        }
      ]
    },
    {
      "name": "House",
      "format": "csv",
      "uris": [
        "../house1.csv",
        "../house2.csv"
      ],
      "props": [
        {
          "name": "sqm",
          "type": "float"
        }
      ]
    }
  ],
  "edge_providers": [
    {
      "name": "PersonLivesInHouse",
      "format": "csv",
      "uris": [
        "../personLivesInHouse1.csv",
        "../personLivesInHouse2.csv"
      ],
      "source_vertex_provider": "Person",
      "destination_vertex_provider": "House",
      "props": [
        {
          "name": "movedSince",
          "type": "timestamp"
        }
      ]
    }
  ]
}
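As a quick sanity check, the number of files listed per provider kind can be counted directly from the JSON. The following is a minimal sketch using only the Python standard library; it is not part of the PGX API. The helper name `count_files` is hypothetical, and the embedded configuration is a trimmed copy of the one above with the `props` sections omitted.

```python
import json

# Trimmed copy of the configuration above; "props" sections are omitted
# because only the "uris" arrays matter for counting files.
CONFIG_TEXT = """
{
  "name": "parallelLoadingExampleGraph",
  "vertex_providers": [
    {"name": "Person", "format": "csv",
     "uris": ["../person1.csv", "../person2.csv"]},
    {"name": "House", "format": "csv",
     "uris": ["../house1.csv", "../house2.csv"]}
  ],
  "edge_providers": [
    {"name": "PersonLivesInHouse", "format": "csv",
     "uris": ["../personLivesInHouse1.csv", "../personLivesInHouse2.csv"],
     "source_vertex_provider": "Person",
     "destination_vertex_provider": "House"}
  ]
}
"""


def count_files(config):
    """Return the total number of URIs listed for vertex and edge providers."""
    counts = {}
    for kind in ("vertex_providers", "edge_providers"):
        providers = config.get(kind, [])
        counts[kind] = sum(len(p.get("uris", [])) for p in providers)
    return counts


config = json.loads(CONFIG_TEXT)
counts = count_files(config)
print(counts)
```

For this configuration the count is four vertex files and two edge files, which matches the description above.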

The graph server (PGX) automatically loads the graph in parallel, using one thread for each file. A graph can therefore be loaded with as many threads as there are files, subject to the parallelism configured for the graph server (PGX) instance.
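The relationship between file count and thread count can be illustrated with a generic thread pool. This is a conceptual sketch only, not how the graph server (PGX) is implemented: the function `load_files`, its `parallelism` parameter, and the placeholder `load_one` step are all hypothetical, and the sketch assumes that the configured parallelism acts as an upper bound on the number of loader threads.

```python
from concurrent.futures import ThreadPoolExecutor


def load_files(uris, parallelism):
    """Handle each file as its own task, using at most `parallelism` threads.

    Returns the number of worker threads chosen and the list of handled URIs.
    """
    # Assumption: one thread per file, capped by the configured parallelism.
    workers = max(1, min(len(uris), parallelism))

    def load_one(uri):
        # Placeholder for reading and parsing a single file.
        return uri

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input URIs.
        loaded = list(pool.map(load_one, uris))
    return workers, loaded


uris = [
    "../person1.csv", "../person2.csv",
    "../house1.csv", "../house2.csv",
    "../personLivesInHouse1.csv", "../personLivesInHouse2.csv",
]
workers_capped, loaded = load_files(uris, parallelism=4)
workers_uncapped, _ = load_files(uris, parallelism=16)
print(workers_capped, workers_uncapped)
```

Under this assumption, splitting the data into more files only helps up to the configured parallelism; beyond that point the additional files wait for a free thread.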

Because the same graph configuration is applied to all of the specified files, it is crucial that these files share the same format: they must use the same separator, define the same properties, and comply with the same format specification.
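A mismatch between files, such as a different separator, can be caught before loading with a simple pre-check. The sketch below is an assumption-laden illustration using the Python standard library, not a PGX feature: the helpers `field_counts` and `is_consistent` are hypothetical, the sample rows are invented for the example, and the column layout (an ID followed by `name` and `age`) is assumed rather than taken from the configuration above. File contents are given as in-memory strings so that the example is self-contained.

```python
import csv
import io


def field_counts(files, separator=","):
    """Map each file name to the set of distinct field counts in its rows."""
    result = {}
    for name, text in files.items():
        reader = csv.reader(io.StringIO(text), delimiter=separator)
        # Blank lines yield empty rows and are skipped.
        result[name] = {len(row) for row in reader if row}
    return result


def is_consistent(files, separator=","):
    """Return True if every row of every file has the same number of fields."""
    counts = field_counts(files, separator)
    distinct = set().union(*counts.values()) if counts else set()
    return len(distinct) <= 1


# Hypothetical sample data: both files use a comma as the separator.
good_files = {
    "person1.csv": "1,Alice,30\n2,Bob,27\n",
    "person2.csv": "3,Carol,41\n",
}

# Hypothetical sample data: the second file uses a semicolon instead.
bad_files = {
    "person1.csv": "1,Alice,30\n2,Bob,27\n",
    "person2.csv": "3;Carol;41\n",
}

print(is_consistent(good_files))
print(is_consistent(bad_files))
```

A check like this only detects differences in the number of fields; it does not verify that the property types match those declared in the graph configuration.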
