******************
Configless Loading
******************

PGX can load graphs from certain file formats without requiring a configuration file. This guide illustrates how to load a graph from CSV files in this way.

Vertex table
------------

Consider a CSV file containing personal data. Each row contains one record. The first column contains the person's social security number, the second their age, and the third their name. This file will be the vertices file.

To load the vertices into PGX, this file needs a header. The header must mark the column holding the vertex IDs with the ``:VID`` keyword. The other columns are loaded as properties, so their names are suffixed with the type of the data they contain. The annotated file looks as follows:

.. code-block:: none

    ssn:string,age:integer,name:VID(string),:LABEL
    555-55-5555,45,"John Doe",Person
    666-66-6666,29,"Jane Smith",Person
    ...

The ``:VID`` keyword must be parameterized with the type of the data contained in the column. Because ``name`` is specified before the colon, the column is also loaded as a property with this name. ``:LABEL`` marks the column containing the vertex label.

Edge table
----------

Consider also relationship data between people, stored in another CSV file. The first and third columns contain the people involved in the relationship, and the second column holds the relationship type. This file will be the edge table.

The two name columns are the source and destination vertex columns, respectively, and the type column is the edge label. The source and destination columns are specified with the ``:SRC`` and ``:DST`` keywords, and the label with the ``:LABEL`` keyword. The ``:EID`` keyword marks the column with the edge IDs. The resulting file is:

.. code-block:: none

    :SRC,:LABEL,:DST,:EID
    "John Doe",friendsWith,"Jane Smith",1
    "Jane Smith",friendsWith,"John Doe",2
    "John Doe",employs,"Jack Brown",3
    ...

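
To make the header convention more concrete, the following sketch splits annotated header cells into a property name, an annotation (a type or a keyword) and the keyword arguments. It only illustrates the notation described above and is not the parser PGX uses; the helper ``parse_header_cell`` and the layout of the tuple it returns are assumptions made for this example.

.. code-block:: python

    import csv
    import io
    import re

    # Hypothetical helper, not part of PGX: matches "name", "name:type",
    # ":KEYWORD" or "name:KEYWORD(arg1;arg2)".
    _CELL = re.compile(
        r"^(?P<name>[^:]*)(?::(?P<ann>[^()]+)(?:\((?P<args>.*)\))?)?$"
    )


    def parse_header_cell(cell):
        """Split one header cell into (name, annotation, arguments)."""
        match = _CELL.match(cell.strip())
        if match is None:
            raise ValueError("malformed header cell: %r" % cell)
        args = match.group("args")
        return (
            match.group("name"),  # empty string when no property name is given
            match.group("ann"),   # None when neither type nor keyword is given
            args.split(";") if args else [],
        )


    vertex_csv = (
        'ssn:string,age:integer,name:VID(string),:LABEL\n'
        '555-55-5555,45,"John Doe",Person\n'
    )
    edge_csv = (
        ':SRC,:LABEL,:DST,:EID\n'
        '"John Doe",friendsWith,"Jane Smith",1\n'
    )

    # Read only the header row of each file and show how each cell decomposes.
    for text in (vertex_csv, edge_csv):
        header = next(csv.reader(io.StringIO(text)))
        for cell in header:
            print(cell, "->", parse_header_cell(cell))

In this sketch, a cell such as ``name:VID(string)`` yields both a property name and a keyword, which mirrors the rule that the ID column is also loaded as a property.
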

Loading the graph
-----------------

Assuming the vertex data file is named ``people.csv`` and the edge data file is named ``relationships.csv``, the graph is loaded with the following API call. The snippet builds both file paths from ``self.pgx_test_resources``, the base directory under which the files are stored in this example:

.. code-block:: python
    :linenos:

    people_csv = self.pgx_test_resources + "/documentation-graphs/people.csv"
    relationship_csv = self.pgx_test_resources + \
        "/documentation-graphs/relationships.csv"
    session.read_graph_files(
        people_csv, edge_file_paths=relationship_csv, graph_name="tutorial"
    )

The ``graph_name`` argument specifies the name of the loaded graph.

Partitioned graph example
-------------------------

It is also possible to load graphs with multiple vertex tables and edge tables. Consider another vertex file, ``universities.csv``. The file contains the name, location and founding year of several universities. The header for this file is very similar to the one for non-partitioned graphs. The only difference is that ``:VID`` takes a second argument that specifies the table name.

.. code-block:: none

    name:VID(string;universities),location,founding_year:integer
    "MIT","Boston, MA",1861
    "Carnegie Mellon","Pittsburgh, PA",1900
    "Stanford","Stanford, CA",1891
    "UC Berkeley","Berkeley, CA",1868
    ...

The header doesn't specify a property type for ``location``, so it defaults to ``str`` at loading.

The ``people.csv`` file used above can be used as is in a partitioned graph. The table name is inferred from the file name, so the data is loaded into a table named ``people``.

The edges of the partitioned graph are in the ``studiesAt.csv`` file. The file records who attends which university, along with the respective student ID numbers. These numbers are not loaded into the graph, because their column is skipped with the ``:IGNORE`` keyword.
Unlike for non-partitioned graphs, the edge table header must specify which table each end of an edge belongs to, by passing the table name as an argument to the ``:SRC`` and ``:DST`` keywords.

.. code-block:: none

    studentId:IGNORE,:SRC(people),:DST(universities)
    792,"John Doe","MIT"
    4289,"Jane Smith","Stanford"
    ...

Loading the graph is then done as follows, with the file paths built in the same way as for ``people.csv`` above:

.. code-block:: python
    :linenos:

    universities_csv = self.pgx_test_resources + \
        "/documentation-graphs/universities.csv"
    studiesat_csv = self.pgx_test_resources + "/documentation-graphs/studiesAt.csv"
    session.read_graph_files(
        [people_csv, universities_csv], edge_file_paths=[studiesat_csv]
    )
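
Since each edge in a partitioned graph refers to vertices in two named tables, it can help to check the files before loading them. The following is a minimal sketch using only the Python standard library. The helpers ``vertex_ids`` and ``dangling_edges`` are hypothetical names introduced for this example and are not part of the PGX API, and the sketch assumes the table names ``people`` and ``universities`` used above.

.. code-block:: python

    import csv
    import io


    def vertex_ids(csv_text):
        """Collect the values of the column whose header carries ``:VID``."""
        rows = csv.reader(io.StringIO(csv_text))
        header = next(rows)
        vid_index = next(i for i, cell in enumerate(header) if ":VID" in cell)
        return {row[vid_index] for row in rows}


    def dangling_edges(edge_csv_text, tables):
        """Return (source, destination) pairs with an ID missing from its table."""
        rows = csv.reader(io.StringIO(edge_csv_text))
        header = next(rows)
        # Locate the :SRC(table) and :DST(table) columns and the tables they name.
        src = next(i for i, c in enumerate(header) if c.startswith(":SRC("))
        dst = next(i for i, c in enumerate(header) if c.startswith(":DST("))
        src_table = header[src][len(":SRC("):-1]
        dst_table = header[dst][len(":DST("):-1]
        return [
            (row[src], row[dst])
            for row in rows
            if row[src] not in tables[src_table]
            or row[dst] not in tables[dst_table]
        ]


    # Shortened copies of the example files from this guide.
    people = (
        'ssn:string,age:integer,name:VID(string),:LABEL\n'
        '555-55-5555,45,"John Doe",Person\n'
        '666-66-6666,29,"Jane Smith",Person\n'
    )
    universities = (
        'name:VID(string;universities),location,founding_year:integer\n'
        '"MIT","Boston, MA",1861\n'
        '"Stanford","Stanford, CA",1891\n'
    )
    studies_at = (
        'studentId:IGNORE,:SRC(people),:DST(universities)\n'
        '792,"John Doe","MIT"\n'
        '4289,"Jane Smith","Stanford"\n'
    )

    tables = {
        "people": vertex_ids(people),
        "universities": vertex_ids(universities),
    }
    print(dangling_edges(studies_at, tables))

An empty list means every edge endpoint was found in the table named by its ``:SRC`` or ``:DST`` argument. How PGX itself reacts to a missing endpoint is not covered by this sketch.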