Configless Loading
PGX can load graphs from certain file formats without a configuration file. This guide shows how to load a graph from CSV files this way.
Vertex table
Consider a CSV file containing personal data, one record per row. The first column contains each person's social security number, the second their age, the third their name, and the fourth the vertex label. This file will be the vertices file.
To load the vertices into PGX, the file needs a header. The header must mark the column holding the vertex IDs with the :VID keyword.
The other columns are loaded as properties, so their names must be suffixed with the type of the data they contain. The annotated file looks as follows:
ssn:string,age:integer,name:VID(string),:LABEL
555-55-5555,45,"John Doe",Person
666-66-6666,29,"Jane Smith",Person
...
The :VID keyword must be parameterized with the type of the data in the column. Because name is specified before the colon, the column is also loaded as a property with that name. :LABEL marks the column containing the vertex label.
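To make the header rules concrete, the following sketch splits each header column into an optional property name, an optional keyword, and an optional type. It is an illustration only, not PGX's own parser: the Column tuple, the parse_column and parse_header helpers, and the regular expression are all hypothetical names introduced here.

```python
import re
from typing import List, NamedTuple, Optional

# Illustrative pattern only, not PGX's parser: an optional name, then an
# optional ":" part that is either a keyword (VID, LABEL) with an optional
# parenthesised argument, or a plain property type such as "integer".
_COLUMN = re.compile(
    r"^(?P<name>[^:]*)(?::(?P<spec>[A-Za-z_]+)(?:\((?P<arg>[^)]*)\))?)?$"
)

KEYWORDS = {"VID", "LABEL"}  # keywords used in the vertex header above


class Column(NamedTuple):
    name: Optional[str]     # property name, or None if absent
    keyword: Optional[str]  # "VID", "LABEL", or None for a plain property
    type: Optional[str]     # declared data type, or None if not given


def parse_column(text: str) -> Column:
    match = _COLUMN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised header column: {text!r}")
    name = match["name"] or None
    spec, arg = match["spec"], match["arg"]
    if spec in KEYWORDS:
        # For a keyword column, the parenthesised argument carries the type.
        return Column(name, spec, arg)
    # Otherwise the part after the colon is the property type itself.
    return Column(name, None, spec)


def parse_header(line: str) -> List[Column]:
    return [parse_column(part) for part in line.split(",")]


header = parse_header("ssn:string,age:integer,name:VID(string),:LABEL")
for column in header:
    print(column)
```

Under these assumptions, the name column is reported both with the property name name and the keyword VID, which mirrors the statement that the ID column is also loaded as a property.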
Edge table
Now consider relationship data between people, stored in another CSV file. The first and third columns contain the people involved in the relationship, and the second column holds the relationship type. This file will be the edge table.
The two name columns become the source and destination vertex columns, and the type column becomes the edge label. The source and destination columns are marked with the :SRC and :DST keywords, and the label with the :LABEL keyword. The :EID keyword marks the column holding the edge IDs. The resulting file is:
:SRC,:LABEL,:DST,:EID
"John Doe",friendsWith,"Jane Smith",1
"Jane Smith",friendsWith,"John Doe",2
"John Doe",employs,"Jack Brown",3
...
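Because the :SRC and :DST columns refer to vertices by their :VID value, it can be useful to check, before loading, that every edge endpoint matches a vertex ID. The sketch below does this with Python's standard csv module. The helper names column_index and dangling_endpoints are hypothetical, and the sample data contains only the rows shown in this guide, not the elided ones.

```python
import csv
import io

# Only the rows shown in this guide; the elided rows ("...") are not included.
VERTEX_CSV = '''ssn:string,age:integer,name:VID(string),:LABEL
555-55-5555,45,"John Doe",Person
666-66-6666,29,"Jane Smith",Person
'''

EDGE_CSV = ''':SRC,:LABEL,:DST,:EID
"John Doe",friendsWith,"Jane Smith",1
"Jane Smith",friendsWith,"John Doe",2
"John Doe",employs,"Jack Brown",3
'''


def column_index(header, keyword):
    """Return the position of the first header column containing :keyword."""
    for index, column in enumerate(header):
        if f":{keyword}" in column:
            return index
    raise ValueError(f"no :{keyword} column in header {header}")


def dangling_endpoints(vertex_text, edge_text):
    """Return edge endpoints that do not appear as a vertex ID."""
    vertex_rows = list(csv.reader(io.StringIO(vertex_text)))
    edge_rows = list(csv.reader(io.StringIO(edge_text)))

    vid = column_index(vertex_rows[0], "VID")
    vertex_ids = {row[vid] for row in vertex_rows[1:]}

    src = column_index(edge_rows[0], "SRC")
    dst = column_index(edge_rows[0], "DST")
    endpoints = {row[i] for row in edge_rows[1:] for i in (src, dst)}
    return endpoints - vertex_ids


print(dangling_endpoints(VERTEX_CSV, EDGE_CSV))  # {'Jack Brown'}
```

Here "Jack Brown" is reported only because the sample omits the elided vertex rows. On complete files, an empty result means every endpoint has a matching vertex.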
Loading the graph
Assuming the vertex data file is named people.csv, the edge data file is named relationships.csv, and both files are in the current directory, the graph can be loaded from the PGX shell with the following API call:
# Paths are relative to the current directory, as assumed above.
people_csv = "people.csv"
relationship_csv = "relationships.csv"

# `session` is an open PGX session.
session.read_graph_files(
    people_csv,
    edge_file_paths=relationship_csv,
    graph_name="tutorial"
)
The graph_name argument specifies the name of the loaded graph.
Partitioned graph example
It is also possible to load graphs with multiple vertex tables and edge tables.
Consider another vertex file, universities.csv. The file contains the name, location, and founding year of several universities.
The header for this file is very similar to the one for non-partitioned graphs.
The only difference is that :VID takes a second argument to specify the table name.
name:VID(string;universities),location,founding_year:integer
"MIT","Boston, MA",1861
"Carnegie Mellon","Pittsburgh, PA",1900
"Stanford","Stanford, CA",1891
"UC Berkeley","Berkeley, CA",1868
...
The header doesn't specify a property type for location, so it defaults to str at loading. The people.csv file used above can be used as is in a partitioned graph. The table name is inferred from the file name, and the data is loaded into a table named people.
The edges of the partitioned graph are in the studiesAt.csv file. The file records who attends which university, along with the respective student ID numbers. These numbers are not loaded into the graph; they are skipped with the :IGNORE keyword.
Unlike for non-partitioned graphs, the edge table header must specify which table each end of an edge belongs to, by passing the table name as an argument to the :SRC and :DST keywords.
studentId:IGNORE,:SRC(people),:DST(universities)
792,"John Doe","MIT"
4289,"Jane Smith","Stanford"
...
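The table-naming rules for partitioned graphs can be summarised in a short sketch: a vertex table takes its name from the second :VID argument when present and otherwise from the file name, and each :SRC or :DST argument should name one of those tables. The code below is illustrative and not part of PGX. The helpers vertex_table_name and edge_endpoint_tables are hypothetical, and using the file stem as the inferred name is an assumption based on people.csv being loaded as people.

```python
import re
from pathlib import Path

_VID = re.compile(r":VID\(([^;)]*)(?:;([^)]*))?\)")
_ENDPOINT = re.compile(r":(SRC|DST)\(([^)]*)\)")


def vertex_table_name(file_path, header):
    """Table name from the second :VID argument, else from the file name."""
    match = _VID.search(header)
    if match and match.group(2):
        return match.group(2)
    # Assumption: the inferred name is the file name without its extension.
    return Path(file_path).stem


def edge_endpoint_tables(header):
    """Map "SRC" and "DST" to the table named in the edge header."""
    return {keyword: table for keyword, table in _ENDPOINT.findall(header)}


vertex_headers = {
    "people.csv": "ssn:string,age:integer,name:VID(string),:LABEL",
    "universities.csv":
        "name:VID(string;universities),location,founding_year:integer",
}
tables = {
    vertex_table_name(path, header)
    for path, header in vertex_headers.items()
}

endpoints = edge_endpoint_tables(
    "studentId:IGNORE,:SRC(people),:DST(universities)"
)
# Endpoint tables that no vertex file provides; empty for the files above.
unknown = {table for table in endpoints.values() if table not in tables}
print(sorted(tables), endpoints, unknown)
```

A non-empty unknown set would point to a typo in a :SRC or :DST argument or to a missing vertex file.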
The graph is then loaded as follows, assuming, as above, that the files are in the current directory:
# people_csv is the path defined in the previous example.
universities_csv = "universities.csv"
studiesat_csv = "studiesAt.csv"

session.read_graph_files(
    [people_csv, universities_csv],
    edge_file_paths=[studiesat_csv]
)