1 Spatial and Graph Property Graph Support Overview

This chapter provides an overview of Oracle Spatial and Graph support for property graph features.

1.1 About Property Graph Support

Oracle Spatial and Graph delivers advanced spatial and graph analytic capabilities in Oracle Database.

The property graph features support graph operations, indexing, queries, search, and in-memory analytics.

1.2 Property Graph Prerequisites

Although no special configuration operations are required, the requirements for using the property graph features are the following.

  • Oracle Spatial and Graph must be installed.

  • max_string_size must be enabled,

  • AL16UTF16 (instead of UTF8) must be specified as the NLS_NCHAR_CHARACTERSET.

    AL32UTF8 (UTF8) should be the default character set,  but  AL16UTF16 must be the NLS_NCHAR_CHARACTERSET.

  • Java 8 or higher must be installed.

    However, there is no need for a separate installation because JDK 8 is installed under $ORACLE_HOME. Just make sure that the correct version of Java is used to run property graph-based applications.

  • An Oracle Partitioning license is required if the property graph schema is used as described in Property Graph Schema Objects for Oracle Database. Note that Oracle Partitioning is optional if the two-tables schema is used as described in Handling Property Graphs Using a Two-Tables Schema.

  • If text indexing with SolrCloud is used, SolrCloud must be downloaded separately and configured manually.

    For information about text indexes managed using SolrCloud and other search engines, see Managing Text Indexing for Property Graph Data.

    For information about downloading and using Apache Solr, https://lucene.apache.org/solr/guide/.

1.3 Property Graph Features

Graphs manage networks of linked data as vertices, edges, and properties of the vertices and edges.

Graphs are commonly used to model, store, and analyze relationships found in social networks, cyber security, utilities and telecommunications, life sciences and clinical data, and knowledge networks.

Typical graph analyses encompass graph traversal, recommendations, finding communities and influencers, and pattern matching. Industries including, telecommunications, life sciences and healthcare, security, media and publishing can benefit from graphs.

The property graph features of Oracle Special and Graph support those use cases with the following capabilities:

  • A scalable graph database

  • Developer-based APIs based upon Tinkerpop Blueprints, and Java graph APIs

  • Text search and query through integration with Apache Lucene, SolrCloud, and Oracle Text

  • Scripting languages support for Groovy and Python

  • A parallel, in-memory graph analytics engine

  • A fast, scalable suite of social network analysis functions that include ranking, centrality, recommender, community detection, path finding

  • Parallel bulk load and export of property graph data in Oracle-defined flat files format

  • Manageability through a Groovy-based console to execute Java and Tinkerpop Gremlin APIs

1.3.1 Property Graph Sizing Recommendations

The following are recommendations for property graph installation.

Table 1-1 Property Graph Sizing Recommendations

Graph Size Recommended Physical Memory to be Dedicated Recommended Number of CPU Processors

10 to 100M edges

Up to 14 GB RAM

2 to 4 processors, and up to 16 processors for more compute-intensive workloads

100M to 1B edges

14 GB to 100 GB RAM

4 to 12 processors, and up to 16 to 32 processors for more compute-intensive workloads

Over 1B edges

Over 100 GB RAM

12 to 32 processors, or more for especially compute-intensive workloads

1.4 Spatial Support in Property Graphs

The property graph support in the Oracle Spatial and Graph option is integrated with the spatial support.

The integration has the following aspects: representing spatial data in a property Graph, creating a spatial index on that spatial data, and querying that spatial data.

1.4.1 Representing Spatial Data in a Property Graph

Spatial data can be used as values of vertex properties and edge properties.

For example, an entity can have a point (longitude/latitude) as the value of a property named location. As another example, an edge may have a polygon as the value of a property, and this property can represent the location at which this link (relationship) was established.

The following shows some example syntax for encoding spatial data in a property graph.

  • Point: '-122.230 37.560'

  • Point: 'POINT(-122.241 37.567)'

  • Point with SRID specified: 'srid/8307 POINT(-122.246 37.572)'

  • Polygon: 'POLYGON((-83.6  34.1, -83.6 34.3, -83.4 34.3, -83.4 34.1, -83.6 34.1))'

  • Polygon with SRID specified: 'srid/8307 POLYGON((-83.6 34.1, -83.6 34.3, -83.4 34.3, -83.4 34.1, -83.6 34.1))'

  • Line string: 'LINESTRING (30 10, 10 30, 40 40)'

  • Multiline string: 'MULTILINESTRING ((10 10, 20 20, 10 40), (40 40, 30 30, 40 20, 30 10))'

Assume a test property graph named test. The following statements add a set of vertices with coordinates (longitude and latitude) spacified for each.

insert into testVT$(vid, k, t, v) values(100, 'geoloc', 20, '-122.230 37.560'); 
insert into testVT$(vid, k, t, v) values(101, 'geoloc', 20, '-122.231 37.561');
insert into testVT$(vid, k, t, v) values(102, 'geoloc', 20, '-122.236 37.562914');
insert into testVT$(vid, k, t, v) values(103, 'geoloc', 20, '-122.241 37.567');
insert into testVT$(vid, k, t, v) values(104, 'geoloc', 20, '-122.246 37.572');
insert into testVT$(vid, k, t, v) values(105, 'geoloc', 20, '-122.251 37.577');
insert into testVT$(vid, k, t, v) values(200, 'geoloc', 20, '-122.256 37.582');
insert into testVT$(vid, k, t, v) values(201, 'geoloc', 20, '-122.261 37.587');

The Spatial data in the property graph can be used to construct SDO_GEOMETRY objects. For example, the OPG_APIS.GET_GEOMETRY_FROM_V_T_COLS function can be used to read spatial data from the V column for all T of a specified value (such as 20), and return SDO_GEOMETRY objects. This function attempts to parse the value as coordinates if the value appears to be two numbers, and it uses the SDO_GEOMETRY constructor if the value is not a simple point. Finally, if a SRID is provided, it uses the SDO_CS_TRANSFORM procedure to transform using the given coordinate system.

The following example uses the OPG_APIS.GET_GEOMETRY_FROM_V_T_COLS function to get geometries from the test property graph. It includes some of the output.

SQL> select vid, k, opg_apis.get_geometry_from_v_t_cols 
      from testVT$     
        order by vid, k;
   . . .
       100 geoloc SDO_GEOMETRY(2001, 8307, SDO_POINT_TYPE(-122.23, 37.56, NULL), NULL, NULL)
       101 geoloc SDO_GEOMETRY(2001, 8307, SDO_POINT_TYPE(-122.231, 37.561, NULL), NULL, NULL)
       102 geoloc SDO_GEOMETRY(2001, 8307, SDO_POINT_TYPE(-122.236, 37.562914, NULL), NULL, NULL)
       103 geoloc SDO_GEOMETRY(2001, 8307, SDO_POINT_TYPE(-122.241, 37.567, NULL), NULL, NULL)
   . . .

You can generate SDO_GEOMETRY objects from WKT literals.  The following example inserts WKT literals, and then uses the OPG_APIS.GET_WKTGEOMETRY_FROM_V_T_COLS function to construct SDO_GEOMETRY objects from the V, T columns.

truncate table testGE$;
truncate table testVT$;
insert into testVT$(vid, k, t, v) values(101, 'geoloc', 20, 'POLYGON((-83.6  34.1, -83.6 34.3, -83.4 34.3, -83.4 34.1, -83.6 34.1))');
insert into testVT$(vid, k, t, v) values(103, 'geoloc', 20, 'POINT(-122.241 37.567)');
insert into testVT$(vid, k, t, v) values(105, 'geoloc', 20, 'POINT(-122.251 37.577)');
insert into testVT$(vid, k, t, v) values(200, 'geoloc', 20, 'MULTILINESTRING ((10 10, 20 20, 10 40), (40 40, 30 30, 40 20, 30 10))');
insert into testVT$(vid, k, t, v) values(201, 'geoloc', 20, 'LINESTRING (30 10, 10 30, 40 40)');

prompt show the geometry info
SQL> select vid, k, opg_apis.get_wktgeometry_from_v_t_cols(v,t) 
       from testVT$
      order by vid, k;
   . . .
       101 geoloc SDO_GEOMETRY(2003, 8307, NULL, SDO_ELEM_INFO_ARRAY(1, 1003, 1), SDO_ORDINATE_ARRAY(-83.6, 34.1, -83.6, 34.3, -83.4, 34.3, -83.4, 34.1, -83.6, 34.1))
       103 geoloc SDO_GEOMETRY(2001, 8307, SDO_POINT_TYPE(-122.241, 37.567, NULL), NULL, NULL)
       105 geoloc SDO_GEOMETRY(2001, 8307, SDO_POINT_TYPE(-122.251, 37.577, NULL), NULL, NULL)
       200 geoloc SDO_GEOMETRY(2006, 8307, NULL, SDO_ELEM_INFO_ARRAY(1, 2, 1, 7, 2, 1), SDO_ORDINATE_ARRAY(10, 10, 20, 20, 10, 40, 40, 40, 30, 30, 40, 20, 30, 10))
       201 geoloc SDO_GEOMETRY(2002, 8307, NULL, SDO_ELEM_INFO_ARRAY(1, 2, 1), SDO_ORDINATE_ARRAY(30, 10, 10, 30, 40, 40))

1.4.2 Creating a Spatial Index on Property Graph Data

After adding spatial data to a property graph, you can use OPG_APIS package subprograms to construct SDO_GEOMETRY objects, and then you can create a function-based spatial index on the vertices (VT$) or the edges (VT$) table.

Using the example property graph named test, the following statements add the necessary metadata and create a function-based spatial index.

SQL> -- In the schema that owns the property graph TEST:
SQL> --
SQL> insert into user_sdo_geom_metadata values('TESTVT$',
 'mdsys.opg_apis.get_geometry_from_v_t_cols(v,t)',
 sdo_dim_array(
   sdo_dim_element('Longitude', -180, 180, 0.005),
   sdo_dim_element('Latitude', -90, 90, 0.005)), 8307);

commit;

SQL> -- Create a function-based spatial index
SQL> create index testVTXGEO$
       on testVT$(mdsys.opg_apis.get_geometry_from_v_t_cols(v, t))
       indextype is mdsys.spatial_index_v2
       parameters ('tablespace=USERS')
       parallel 4
       local;

(To create a spatial index on your own property graph, replace the graph name test with the name of your graph.)

If the WKT literals are used in the V column, then replace mdsys.opg_apis.get_geometry_from_v_t_cols with mdsys.opg_apis.get_wktgeometry_from_v_t_cols in the preceding two SQL statements.

Note that the preceding SQL spatial index creation steps are wrapped in convenient Java methods in the OraclePropertyGraph class defined in the oracle.pg.rdbms package:

  /**
   * This API creates a default Spatial index on edges. It assumes that
   * the mdsys.opg_apis.get_geometry_from_v_t_cols(v,t) PL/SQL is going to be used
   * to create a function-based Spatial index. In addition, it adds a predefined
   * value into user_sdo_geom_metadata. To customize, please refer to the dev
   * guide for adding a row to user_sdo_geom_metadata and then creating a 
   * Spatial index manually.
   * Note that, a DDL will be executed so expect an implict commit. If you 
   * have changes that do not want to be persisted, run a rollback before calling
   * this method.
   * @param dop degree of parallelism used to create the Spatial index
   */
  public void createDefaultSpatialIndexOnEdges(int dop);

  /**
   * This API creates a default Spatial index on vertices. It assumes that
   * the mdsys.opg_apis.get_geometry_from_v_t_cols(v,t) PL/SQL is going to be used
   * to create a function-based Spatial index. In addition, it adds a predefined
   * value into user_sdo_geom_metadata. To customize, please refer to the dev
   * guide for adding a row to user_sdo_geom_metadata and then creating a 
   * Spatial index manually.
   * Note that a DDL will be executed so expect an implict commit. If you 
   * have changes that do not want to be persisted, run a rollback before calling
   * this method.
   * @param dop degree of parallelism used to create the Spatial index
   */
  public void createDefaultSpatialIndexOnVertices(int dop);

1.4.3 Querying Spatial Data in a Property Graph

Oracle Spatial and Graph geospatial query functions can be applied to spatial data in a property graph. This topic provides some examples.

Note that a query based on spatial information can be combined with navigation and pattern matching.

The following example finds entities (vertices) that are within a specified distance (here, 1 mile) of a location (point geometry).

SQL> -- use SDO_WITHIN_DISTANCE to filter vertices
SQL>  select vid, k, t, v 
        from testvt$
       where sdo_within_distance(mdsys.opg_apis.get_geometry_from_v_t_cols(v, t),
            mdsys.sdo_geometry(2001, 8307, mdsys.sdo_point_type(-122.23, 37.56, null), null, null), 
            'distance=1 unit=mile') = 'TRUE'
       order by vid, k;

The output and execution plan may include the following. Notice that a newly created domain indexTESTVTXGEO$ is used in the execution.

       100 geoloc       20  -122.230 37.560
       101 geoloc       20  -122.231 37.561
       ..  ...              ...

----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |             |     1 | 18176 |     2  (50)| 00:00:01 |       |       |        |      |            |
|   1 |  PX COORDINATOR                         |             |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (ORDER)                    | :TQ10001    |     1 | 18176 |     2  (50)| 00:00:01 |       |       |  Q1,01 | P->S | QC (ORDER) |
|   3 |    SORT ORDER BY                        |             |     1 | 18176 |     2  (50)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE                          |             |     1 | 18176 |     1   (0)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   5 |      PX SEND RANGE                      | :TQ10000    |     1 | 18176 |     1   (0)| 00:00:01 |       |       |  Q1,00 | P->P | RANGE      |
|   6 |       PX PARTITION HASH ALL             |             |     1 | 18176 |     1   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |            |
|*  7 |        TABLE ACCESS BY LOCAL INDEX ROWID| TESTVT$     |     1 | 18176 |     1   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |            |
|*  8 |         DOMAIN INDEX (SEL: 0.000000 %)  | TESTVTXGEO$ |       |       |     1   (0)| 00:00:01 |       |       |  Q1,00 |      |            |
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - filter(INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V"))
   8 - access("MDSYS"."SDO_WITHIN_DISTANCE"("OPG_APIS"."GET_GEOMETRY_FROM_V_T_COLS"("V","T"),"MDSYS"."SDO_GEOMETRY"(2001,8307,"MDSYS"."SDO_P
              OINT_TYPE"((-122.23),37.56,NULL),NULL,NULL),'distance=1 unit=mile')='TRUE')

The following example sorts entities (vertices) based on their distance from a location.

-- Sort based on distance in miles
SQL> select vid, dist from (
       select vid, k, t, v, 
          sdo_geom.sdo_distance(mdsys.opg_apis.get_geometry_from_v_t_cols(v, t),
              mdsys.sdo_geometry(2001, 8307, mdsys.sdo_point_type(-122.23, 37.56, null), null, null), 1.0, 'unit=mile') dist
         from testvt$
        where t = 20
      ) order by dist asc
;

The output and execution plan may include the following.

        ...
       101  .088148935
       102  .385863422
       103  .773127682
       104 1.2068052
       105 1.64421947
       200 2.08301065
       ...

-----------------------------------------------------------------------------------------------
| Id  | Operation           | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |     1 | 15062 |  1366   (1)| 00:00:01 |       |       |
|   1 |  SORT ORDER BY      |         |     1 | 15062 |  1366   (1)| 00:00:01 |       |       |
|   2 |   PARTITION HASH ALL|         |     1 | 15062 |  1365   (1)| 00:00:01 |     1 |     8 |
|*  3 |    TABLE ACCESS FULL| TESTVT$ |     1 | 15062 |  1365   (1)| 00:00:01 |     1 |     8 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("T"=20 AND INTERNAL_FUNCTION("V"))