*********************************
PgxFrame (Tabular Data-Structure)
*********************************

Overview
--------

:class:`PgxFrame` is a data structure for loading, storing, and manipulating tabular data. It is made of rows and columns. Each column has a name and holds elements of a single data type. The list of columns with their names and data types defines the schema of the frame. (The number of rows in the :class:`PgxFrame` is not part of the schema.)

:class:`PgxFrame` provides operations that return another :class:`PgxFrame` (described later in this tutorial). These operations can be performed in place, meaning that the frame is mutated during the operation, in order to save memory. Prefer in-place operations whenever possible. Out-of-place variants, which create a new frame instead of mutating the input, are also provided. The following table lists each in-place operation next to its out-of-place counterpart. The table uses camelCase operation names; the Python examples in this tutorial use the snake_case spelling, for example :meth:`flatten_all` for ``flattenAll``.

+----------------------+-------------------------+
| In-place operations  | Out-of-place operations |
+======================+=========================+
| headInPlace          | head                    |
+----------------------+-------------------------+
| tailInPlace          | tail                    |
+----------------------+-------------------------+
| flattenAllInPlace    | flattenAll              |
+----------------------+-------------------------+
| renameColumnInPlace  | renameColumn            |
+----------------------+-------------------------+
| renameColumnsInPlace | renameColumns           |
+----------------------+-------------------------+
| selectInPlace        | select                  |
+----------------------+-------------------------+

Functionalities
---------------

This section walks through the current functionalities of :class:`PgxFrame` using some toy examples.

Loading a PgxFrame (with multiple data types) from some specified path
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First, create a session:

.. code-block:: python
    :linenos:

    import pypgx

    session = pypgx.get_session(session_name="my-session")

The next examples use the following sample data (in ``CSV`` format, with a space as separator instead of a comma):

.. code-block:: none
    :linenos:

    "John" 27 4133300.0 true 11.0 123456782 "1985-10-18"
    "Albert" 23 5813000.5 false 12.0 124343142 "2000-01-14"
    "Heather" 28 1.0130302E7 true 10.5 827520917 "1985-10-18"
    "Emily" 24 9380080.5 false 13.0 128973221 "1910-07-30"
    """D'Juan""" 27 1582093.0 true 11.0 92384 "1955-12-01"

A frame schema is necessary to load a :class:`PgxFrame`. An example frame schema with various data types can be defined as follows:

.. code-block:: python
    :linenos:

    # Each tuple is a column descriptor: (column name, column type)
    example_frame_schema = [
        ("name", "STRING_TYPE"),
        ("age", "INTEGER_TYPE"),
        ("salary", "DOUBLE_TYPE"),
        ("married", "BOOLEAN_TYPE"),
        ("tax_rate", "FLOAT_TYPE"),
        ("random", "LONG_TYPE"),
        ("date_of_birth", "LOCAL_DATE_TYPE")
    ]

The ``CSV`` file can then be loaded with this schema as follows:

.. code-block:: python
    :linenos:

    # simple_frame_csv holds the path of the CSV file shown above
    example_frame = session.read_frame()
    example_frame = example_frame.name("simple frame")
    example_frame = example_frame.columns(example_frame_schema)
    example_frame = example_frame.csv()
    example_frame = example_frame.separator(' ')
    example_frame = example_frame.load(simple_frame_csv)

Loading a PgxFrame from client-side data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A :class:`PgxFrame` can also be loaded directly from client-side data. As before, a frame schema is necessary. An example frame schema with various data types can be defined as follows:

.. code-block:: python
    :linenos:

    example_frame_schema = [
        ("name", "STRING_TYPE"),
        ("age", "INTEGER_TYPE"),
        ("salary", "DOUBLE_TYPE"),
        ("married", "BOOLEAN_TYPE"),
        ("tax_rate", "FLOAT_TYPE"),
        ("random", "LONG_TYPE"),
        ("date_of_birth", "LOCAL_DATE_TYPE")
    ]

Once the schema is defined, we define the data as a dictionary that maps each column name to the list of its values:

.. code-block:: python
    :linenos:

    from datetime import date

    example_frame_data = {
        "name": ["Alice", "Bob", "Charlie"],
        "age": [25, 27, 29],
        "salary": [10000.0, 15000.0, 20000.0],
        "married": [False, False, True],
        "tax_rate": [0.21, 0.26, 0.32],
        "random": [2394293898324, 45640604960495, 12312323409087654],
        "date_of_birth": [
            date(1990, 9, 15),
            date(1991, 11, 4),
            date(1993, 10, 4)
        ]
    }

We can now load the frame as follows:

.. code-block:: python
    :linenos:

    example_frame = session.create_frame(
        example_frame_schema,
        example_frame_data,
        'example frame'
    )

We can also load the frame incrementally as we receive more data:

.. code-block:: python
    :linenos:

    example_frame_builder = session.create_frame_builder(
        example_frame_schema)
    example_frame_builder.add_rows(example_frame_data)

    # A second batch of rows that uses the same schema
    example_frame_data_part_2 = {
        "name": ["Dave"],
        "age": [26],
        "salary": [18000.0],
        "married": [True],
        "tax_rate": [0.30],
        "random": [456783423423],
        "date_of_birth": [date(1989, 9, 15)]
    }
    example_frame_builder.add_rows(example_frame_data_part_2)

    example_frame2 = example_frame_builder.build("example_frame")

Finally, we can also load a frame from a pandas DataFrame:

.. code-block:: python
    :linenos:

    import pandas as pd

    example_pandas_dataframe = pd.DataFrame(data=example_frame_data)
    example_frame = session.pandas_to_pgx_frame(
        example_pandas_dataframe,
        "pandas frame"
    )

Printing the content of a PgxFrame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The content of a frame can be inspected with the :meth:`print` method:

.. code-block:: python
    :linenos:

    example_frame.print()

For the frame loaded from the ``CSV`` file above, the output looks like:

+----------+-----+-------------+---------+----------+-----------+---------------+
| name     | age | salary      | married | tax_rate | random    | date_of_birth |
+==========+=====+=============+=========+==========+===========+===============+
| John     | 27  | 4133300.0   | true    | 11.0     | 123456782 | 1985-10-18    |
+----------+-----+-------------+---------+----------+-----------+---------------+
| Albert   | 23  | 5813000.5   | false   | 12.0     | 124343142 | 2000-01-14    |
+----------+-----+-------------+---------+----------+-----------+---------------+
| Heather  | 28  | 1.0130302E7 | true    | 10.5     | 827520917 | 1985-10-18    |
+----------+-----+-------------+---------+----------+-----------+---------------+
| Emily    | 24  | 9380080.5   | false   | 13.0     | 128973221 | 1910-07-30    |
+----------+-----+-------------+---------+----------+-----------+---------------+
| "D'Juan" | 27  | 1582093.0   | true    | 11.0     | 92384     | 1955-12-01    |
+----------+-----+-------------+---------+----------+-----------+---------------+

Destroying a PgxFrame
~~~~~~~~~~~~~~~~~~~~~

A :class:`PgxFrame` with many rows or columns can take a lot of memory on the PGX server, so it may be necessary to release it with the :meth:`close()` operation. After this operation, the content of the :class:`PgxFrame` is no longer available.

.. code-block:: python
    :linenos:

    example_frame.close()

For the rest of this tutorial, we reload the :class:`PgxFrame` as described in the loading sub-sections above.

Storing a PgxFrame to some specified path
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can store the :class:`PgxFrame` in ``CSV`` format as follows:

.. code-block:: python
    :linenos:

    path = "/tmp/stored_simple_frame.csv"
    example_frame2.store(path, file_format="csv", overwrite=True)

A :class:`PgxFrame` can also be stored in the ``PGB`` binary format by passing ``pgb`` instead of ``csv`` as the file format:

.. code-block:: python
    :linenos:

    pgb_path = "/tmp/stored_simple_frame.pgb"
    example_frame2.store(pgb_path, file_format="pgb", overwrite=True)

Flattening vector properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some use cases it is useful to split vector properties into multiple columns, one column per vector element. The :meth:`flatten_all()` operation does this. Flattening a frame whose vector columns are ``vectProp`` (three elements) and ``vectProp2`` (two elements) yields the following :class:`PgxFrame`:

+---------+----------+------------+------------+------------+------------+-------------+-------------+
| intProp | intProp2 | vectProp_0 | vectProp_1 | vectProp_2 | stringProp | vectProp2_0 | vectProp2_1 |
+=========+==========+============+============+============+============+=============+=============+
| 0       | 2        | 0.1        | 0.2        | 0.3        | testProp0  | 0.1         | 0.2         |
+---------+----------+------------+------------+------------+------------+-------------+-------------+
| 1       | 1        | 0.1        | 0.2        | 0.3        | testProp10 | 0.1         | 0.2         |
+---------+----------+------------+------------+------------+------------+-------------+-------------+
| 1       | 2        | 0.1        | 0.2        | 0.3        | testProp20 | 0.1         | 0.2         |
+---------+----------+------------+------------+------------+------------+-------------+-------------+
| 2       | 3        | 0.1        | 0.2        | 0.3        | testProp30 | 0.1         | 0.2         |
+---------+----------+------------+------------+------------+------------+-------------+-------------+
| 3       | 1        | 0.1        | 0.2        | 0.3        | testProp40 | 0.1         | 0.2         |
+---------+----------+------------+------------+------------+------------+-------------+-------------+

One use case for flattening is in our MLlib, where embeddings are exported through this operation as classical features in a ``CSV`` file that can easily be post-processed in PGX or in other frameworks.

Union of PGX Frames
~~~~~~~~~~~~~~~~~~~

Two :class:`PgxFrame` instances that have compatible columns (i.e. the same types in the same order) can be combined with a union.
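
The compatibility condition can be made concrete with a small client-side check. The sketch below is plain Python and does not call PGX. It assumes that schemas are written as lists of ``(name, type)`` tuples like ``example_frame_schema``; the helper ``union_compatible`` and the sample schemas are hypothetical names, not part of the pypgx API. It treats two schemas as compatible when they have the same number of columns and the same type at each position. Whether PGX also compares column names is not stated in this tutorial, so names are ignored here.

.. code-block:: python
    :linenos:

    def union_compatible(schema_a, schema_b):
        """Return True if both schemas list the same column types in the same order."""
        if len(schema_a) != len(schema_b):
            return False
        return all(
            type_a == type_b
            for (_, type_a), (_, type_b) in zip(schema_a, schema_b)
        )


    # Hypothetical schemas used only to exercise the helper
    people_schema = [("name", "STRING_TYPE"), ("age", "INTEGER_TYPE")]
    renamed_schema = [("full_name", "STRING_TYPE"), ("years", "INTEGER_TYPE")]
    swapped_schema = [("age", "INTEGER_TYPE"), ("name", "STRING_TYPE")]

    # Same types in the same order: compatible under the assumption above
    print(union_compatible(people_schema, renamed_schema))
    # Same columns but in a different order: not compatible
    print(union_compatible(people_schema, swapped_schema))

Running such a check on the client before requesting a union can give an early and readable error, at the cost of duplicating a rule that the server enforces anyway.
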
Let's say we have another frame, ``second_example_frame``, in addition to the ``example_frame`` described above. It is loaded with the same schema:

.. code-block:: python
    :linenos:

    # second_example_frame_path holds the path of the second CSV file
    second_example_frame = session.read_frame()
    second_example_frame = second_example_frame.name("another simple frame")
    second_example_frame = second_example_frame.columns(
        example_frame_schema)
    second_example_frame = second_example_frame.csv()
    second_example_frame = second_example_frame.separator(' ')
    second_example_frame = second_example_frame.load(
        second_example_frame_path)

It has the following content:

+------+-----+-----------+---------+----------+-----------+---------------+
| name | age | salary    | married | tax_rate | random    | date_of_birth |
+======+=====+===========+=========+==========+===========+===============+
| Mary | 25  | 6821092.0 | false   | 11.0     | 88231223  | 1995-12-23    |
+------+-----+-----------+---------+----------+-----------+---------------+
| Anca | 23  | 5813000.5 | false   | 12.0     | 124343142 | 2000-01-14    |
+------+-----+-----------+---------+----------+-----------+---------------+

To create the union of ``example_frame`` with ``second_example_frame``, we only need to execute the following:

.. code-block:: python
    :linenos:

    example_frame.union(second_example_frame).print()

The output looks like:

+----------+-----+-------------+---------+----------+-----------+---------------+
| name     | age | salary      | married | tax_rate | random    | date_of_birth |
+==========+=====+=============+=========+==========+===========+===============+
| John     | 27  | 4133300.0   | true    | 11.0     | 123456782 | 1985-10-18    |
+----------+-----+-------------+---------+----------+-----------+---------------+
| Albert   | 23  | 5813000.5   | false   | 12.0     | 124343142 | 2000-01-14    |
+----------+-----+-------------+---------+----------+-----------+---------------+
| Heather  | 28  | 1.0130302E7 | true    | 10.5     | 827520917 | 1985-10-18    |
+----------+-----+-------------+---------+----------+-----------+---------------+
| Emily    | 24  | 9380080.5   | false   | 13.0     | 128973221 | 1910-07-30    |
+----------+-----+-------------+---------+----------+-----------+---------------+
| "D'Juan" | 27  | 1582093.0   | true    | 11.0     | 92384     | 1955-12-01    |
+----------+-----+-------------+---------+----------+-----------+---------------+
| Mary     | 25  | 6821092.0   | false   | 11.0     | 88231223  | 1995-12-23    |
+----------+-----+-------------+---------+----------+-----------+---------------+
| Anca     | 23  | 5813000.5   | false   | 12.0     | 124343142 | 2000-01-14    |
+----------+-----+-------------+---------+----------+-----------+---------------+

The rows of the resulting :class:`PgxFrame` are the union of the rows of the two original frames. Note that the :meth:`union` operation does not remove duplicate rows: a row that appears in both inputs appears twice in the result.

Joining PGX Frames
~~~~~~~~~~~~~~~~~~

Sometimes two frames have rows that are related through one of their columns. This is the case in many machine learning problems where embeddings coming from different sources have to be joined. For this, the :meth:`join` operation combines two frames by checking rows for equality on a specific column.
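
The equality-based matching described above can be illustrated without a PGX server. The following is a minimal sketch in plain Python that works on lists of dictionaries; the function ``equality_join``, its default prefixes, and the sample rows are hypothetical and only mimic the idea, they are not the pypgx implementation. It keeps a pair of rows only when both rows hold the same value in the join column, and it prefixes column names so that columns from the two inputs cannot collide.

.. code-block:: python
    :linenos:

    def equality_join(left_rows, right_rows, column,
                      left_prefix="left", right_prefix="right"):
        """Pair up rows whose values in ``column`` are equal."""
        joined = []
        for left in left_rows:
            for right in right_rows:
                if left[column] == right[column]:
                    # Prefix the column names to keep both sides apart
                    row = {f"{left_prefix}_{k}": v for k, v in left.items()}
                    row.update(
                        {f"{right_prefix}_{k}": v for k, v in right.items()})
                    joined.append(row)
        return joined


    # Hypothetical sample rows
    left_rows = [{"name": "John", "age": 27}, {"name": "Heather", "age": 28}]
    right_rows = [{"name": "John", "reports": 5}]

    result = equality_join(left_rows, right_rows, "name")
    print(result)

In this sketch the row for ``Heather`` has no partner on the right side, so it is absent from the result. The nested loop is quadratic and is meant only to show the semantics, not how a server would evaluate a join.
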
Let's say we have another frame, ``more_info_frame``, that contains additional information about some of the people in ``example_frame``.

.. code-block:: python
    :linenos:

    more_info_frame.print()

+--------+------------------------------+---------+
| name   | title                        | reports |
+========+==============================+=========+
| John   | Software Engineering Manager | 5       |
+--------+------------------------------+---------+
| Albert | Sales Manager                | 10      |
+--------+------------------------------+---------+
| Emily  | Operations Manager           | 20      |
+--------+------------------------------+---------+

To combine this frame with ``example_frame`` on the ``name`` column, we only need to call the :meth:`join` method.

.. code-block:: python
    :linenos:

    example_frame\
        .join(more_info_frame, "name", left_prefix="leftFrame",
              right_prefix="rightFrame")\
        .print()

+----------------+---------------+------------------+-------------------+--------------------+------------------+-------------------------+-----------------+------------------------------+--------------------+
| leftFrame_name | leftFrame_age | leftFrame_salary | leftFrame_married | leftFrame_tax_rate | leftFrame_random | leftFrame_date_of_birth | rightFrame_name | rightFrame_title             | rightFrame_reports |
+================+===============+==================+===================+====================+==================+=========================+=================+==============================+====================+
| John           | 27            | 4133300.0        | true              | 11.0               | 123456782        | 1985-10-18              | John            | Software Engineering Manager | 5                  |
+----------------+---------------+------------------+-------------------+--------------------+------------------+-------------------------+-----------------+------------------------------+--------------------+
| Albert         | 23            | 5813000.5        | false             | 12.0               | 124343142        | 2000-01-14              | Albert          | Sales Manager                | 10                 |
+----------------+---------------+------------------+-------------------+--------------------+------------------+-------------------------+-----------------+------------------------------+--------------------+
| Emily          | 24            | 9380080.5        | false             | 13.0               | 128973221        | 1910-07-30              | Emily           | Operations Manager           | 20                 |
+----------------+---------------+------------------+-------------------+--------------------+------------------+-------------------------+-----------------+------------------------------+--------------------+

The joined frame contains the columns of both input frames, and only the rows whose ``name`` appears in both frames. Rows of ``example_frame`` without a match in ``more_info_frame`` (``Heather`` and ``"D'Juan"``) are not part of the result. Also note the column prefixes specified in the call, ``leftFrame`` and ``rightFrame``.

PgxFrame helpers
~~~~~~~~~~~~~~~~

:class:`PgxFrame` also supports helper operations such as :meth:`head`, :meth:`tail`, and :meth:`select`.

**Head operation**

The :meth:`head` operation keeps only the first rows of a :class:`PgxFrame`. (The result is deterministic only for an ordered :class:`PgxFrame`.) Here, we apply the :meth:`head` operation and print the result:

.. code-block:: python
    :linenos:

    example_frame.head(2).print()

The sample output below was produced from a frame with vector properties (the columns ``intProp``, ``intProp2``, ``vectProp``, ``stringProp``, and ``vectProp2``) rather than from the people frame; the call is the same for any frame:

+---------+----------+-------------+------------+-----------+
| intProp | intProp2 | vectProp    | stringProp | vectProp2 |
+=========+==========+=============+============+===========+
| 0       | 2        | 0.1;0.2;0.3 | testProp0  | 0.1;0.2   |
+---------+----------+-------------+------------+-----------+
| 1       | 1        | 0.1;0.2;0.3 | testProp10 | 0.1;0.2   |
+---------+----------+-------------+------------+-----------+

**Tail operation**

The :meth:`tail` operation keeps only the last rows of a :class:`PgxFrame`. (The result is deterministic only for an ordered :class:`PgxFrame`.) Next, we apply the :meth:`tail` operation and print the result:

.. code-block:: python
    :linenos:

    example_frame.tail(2).print()

For the same frame with vector properties, the output looks as follows:

+---------+----------+-------------+------------+-----------+
| intProp | intProp2 | vectProp    | stringProp | vectProp2 |
+=========+==========+=============+============+===========+
| 2       | 3        | 0.1;0.2;0.3 | testProp30 | 0.1;0.2   |
+---------+----------+-------------+------------+-----------+
| 3       | 1        | 0.1;0.2;0.3 | testProp40 | 0.1;0.2   |
+---------+----------+-------------+------------+-----------+

**Select operation**

The :meth:`select` operation keeps only a specified list of columns of an input :class:`PgxFrame`, in the order in which they are listed. We now apply the :meth:`select` operation:

.. code-block:: python
    :linenos:

    vec_frame_selected = example_frame.select(
        "name", "age", "date_of_birth")

The selected :class:`PgxFrame` can be printed with ``vec_frame_selected.print()``. The sample output below shows the same operation applied to the frame with vector properties, with the columns ``vectProp2``, ``vectProp``, and ``stringProp`` selected:

+-----------+-------------+------------+
| vectProp2 | vectProp    | stringProp |
+===========+=============+============+
| 0.1;0.2   | 0.1;0.2;0.3 | testProp0  |
+-----------+-------------+------------+
| 0.1;0.2   | 0.1;0.2;0.3 | testProp10 |
+-----------+-------------+------------+
| 0.1;0.2   | 0.1;0.2;0.3 | testProp20 |
+-----------+-------------+------------+
| 0.1;0.2   | 0.1;0.2;0.3 | testProp30 |
+-----------+-------------+------------+
| 0.1;0.2   | 0.1;0.2;0.3 | testProp40 |
+-----------+-------------+------------+

PgxFrame-PgqlResultSet conversions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section explains the conversions between :class:`PgxFrame` and :class:`PgqlResultSet`.

**PgxFrame to PgqlResultSet**

We convert a :class:`PgxFrame` to a :class:`PgqlResultSet` as follows:

.. code-block:: python
    :linenos:

    result_set = example_frame.to_pgql_result_set()

Printing the content of ``result_set`` with ``result_set.print()`` gives:

+----------+-----+-------------+---------+----------+-----------+---------------+
| name     | age | salary      | married | tax_rate | random    | date_of_birth |
+==========+=====+=============+=========+==========+===========+===============+
| John     | 27  | 4133300.0   | true    | 11.0     | 123456782 | 1985-10-18    |
+----------+-----+-------------+---------+----------+-----------+---------------+
| Albert   | 23  | 5813000.5   | false   | 12.0     | 124343142 | 2000-01-14    |
+----------+-----+-------------+---------+----------+-----------+---------------+
| Heather  | 28  | 1.0130302E7 | true    | 10.5     | 827520917 | 1985-10-18    |
+----------+-----+-------------+---------+----------+-----------+---------------+
| Emily    | 24  | 9380080.5   | false   | 13.0     | 128973221 | 1910-07-30    |
+----------+-----+-------------+---------+----------+-----------+---------------+
| "D'Juan" | 27  | 1582093.0   | true    | 11.0     | 92384     | 1955-12-01    |
+----------+-----+-------------+---------+----------+-----------+---------------+

The content of the result set can be accessed through the usual :class:`PgqlResultSet` APIs.

**PgqlResultSet to PgxFrame**

We convert a :class:`PgqlResultSet` to a :class:`PgxFrame` as follows:

.. code-block:: python
    :linenos:

    query = "SELECT v.age FROM MATCH (v)"
    # pgql_graph is the configuration of a graph with an ``age`` vertex
    # property; it is defined elsewhere
    graph = session.read_graph_with_properties(pgql_graph)
    result_set = graph.query_pgql(query)
    frame = result_set.to_frame()

Creating a graph from multiple PgxFrame instances
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can create a :class:`PgxGraph` from one or more vertex :class:`PgxFrame` instances and one or more edge :class:`PgxFrame` instances.
Given the following :class:`PgxFrame` instances:

people:

+----+---------+
| id | name    |
+====+=========+
| 1  | Alice   |
+----+---------+
| 2  | Bob     |
+----+---------+
| 3  | Charlie |
+----+---------+

houses:

+----------------+----------+
| identification | location |
+================+==========+
| 1              | Road 1   |
+----------------+----------+
| 2              | Street 5 |
+----------------+----------+
| 3              | Avenue 4 |
+----------------+----------+

knows:

+-----+-----+
| src | dst |
+=====+=====+
| 1   | 1   |
+-----+-----+
| 2   | 3   |
+-----+-----+
| 3   | 2   |
+-----+-----+

lives:

+--------+-------------+
| source | destination |
+========+=============+
| 1      | 2           |
+--------+-------------+
| 2      | 1           |
+--------+-------------+
| 3      | 3           |
+--------+-------------+

We can now create a :class:`PgxGraph` as follows:

.. code-block:: python
    :linenos:

    vertex_providers_from_frames = [
        session.vertex_provider_from_frame(
            "person",
            people_frame
        ),
        session.vertex_provider_from_frame(
            "house",
            frame=houses_frame,
            vertex_key_column="identification"
        )
    ]

    edge_providers_from_frames = [
        session.edge_provider_from_frame(
            "person_knows_person",
            source_provider="person",
            destination_provider="person",
            frame=knows_frame),
        session.edge_provider_from_frame(
            "person_lives_at_house",
            source_provider="person",
            destination_provider="house",
            frame=lives_frame,
            source_vertex_column="source",
            destination_vertex_column="destination"
        )
    ]

    graph = session.graph_from_frames(
        "example graph",
        vertex_providers_from_frames,
        edge_providers_from_frames,
        partitioned=True
    )
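
Since the edge frames refer to vertices by key, it can help to check on the client that every edge endpoint names an existing vertex key before building the graph. The sketch below is plain Python and does not call PGX. It assumes that the data of the four frames above is available as column dictionaries (the same layout as ``example_frame_data``); the helper ``dangling_endpoints`` and the ``problems`` dictionary are hypothetical names, not part of the pypgx API.

.. code-block:: python
    :linenos:

    # Client-side copies of the four frames shown above
    people = {"id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"]}
    houses = {"identification": [1, 2, 3],
              "location": ["Road 1", "Street 5", "Avenue 4"]}
    knows = {"src": [1, 2, 3], "dst": [1, 3, 2]}
    lives = {"source": [1, 2, 3], "destination": [2, 1, 3]}


    def dangling_endpoints(edge_values, vertex_keys):
        """Return the edge endpoint values that match no vertex key."""
        known = set(vertex_keys)
        return [value for value in edge_values if value not in known]


    # One entry per edge column, checked against the matching vertex keys
    problems = {
        "knows.src": dangling_endpoints(knows["src"], people["id"]),
        "knows.dst": dangling_endpoints(knows["dst"], people["id"]),
        "lives.source": dangling_endpoints(lives["source"], people["id"]),
        "lives.destination": dangling_endpoints(
            lives["destination"], houses["identification"]),
    }
    print(problems)

For the sample data every list in ``problems`` is empty, so all edges point at existing people and houses. A non-empty list would name the endpoint values to fix before the frames are handed to the graph builder.
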