Changes in This Release for Oracle Big Data Spatial and Graph User's Guide and Reference

This preface describes significant new features and changes in Oracle Big Data Spatial and Graph User's Guide and Reference for Oracle Big Data Spatial and Graph Release 2.1.

Property Graph Query Language (PGQL) Enhancements

Information about the Property Graph Query Language (PGQL) has been significantly expanded, and new PGQL capabilities have been added, including:

  • Arithmetic expressions in SELECT, ORDER BY, GROUP BY: Previously, arithmetic expressions were allowed only in the WHERE clause, while the SELECT, GROUP BY, and ORDER BY clauses allowed only basic expressions. Now, all of these clauses fully support arithmetic expressions.

  • MIN, MAX, and COUNT aggregates in PGQL now work with String values.

For information about PGQL and examples of how to use it, see the following:

Zeppelin Notebook Interpreter

The in-memory analyst now provides an interpreter for Zeppelin notebooks (https://zeppelin.apache.org/). You can add graph analysis features to a Zeppelin notebook server, which enables collaborative graph analysis and the production of reports that can be edited and updated live.

For information, see Using the In-Memory Analyst Zeppelin Interpreter.

Apache Spark Integration Enhancements

Apache Spark integration with property graph support has been enhanced. For more information, see Using Apache Spark with Property Graph Data and Using the In-Memory Analyst to Analyze Graph Data in Apache Spark.

Vector Analysis API for Apache Spark

A new Vector Analysis API is available for Apache Spark and Apache Spark SQL that allows you to create spatial RDDs (Resilient Distributed Datasets) for performing spatial transformations and actions, as explained in Oracle Big Data Spatial Vector Analysis for Spark.

In-Memory Analyst Shell Syntax Enhancements

Arguments to Java methods (such as those on the built-in analyst object) can now be passed in abbreviated parameterized form (someMethod(g : var1, v : var2)), in addition to the previously supported traditional form (someMethod(param1, param2)) and standard parameterized form (someMethod(graph : var1, vertex : var2)). Example of abbreviated parameterized form:

```
analyst.whomToFollow(g: g, v: g.pickRandomVertex(), maxIt: 100, sDF: 0.85, sMD: 0.01)
```

Enterprise Scheduler Added

An enterprise scheduler has been added, which by default manages the following:

  • Concurrent execution of tasks from multiple sessions

  • Detailed configuration for thread pools with pool weight, priority, and maximum number of threads

  • Dynamically sizing the I/O thread pool

  • Detailed per-task settings for each thread pool

For information, see Using the In-Memory Analyst Enterprise Scheduler.

Manipulation API Enhancements

Reading (loading), undirecting, and simplifying graphs now support filtering. Filtering a graph while reading it can significantly reduce memory requirements. For example, if every vertex has 10 properties but the analysis you want to run needs only one of them, you do not need to load all 10. With a very large graph, this can determine whether the requested data fits in memory at all.
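The memory effect of loading only the properties you need can be illustrated with a small, self-contained Python sketch. It does not use the in-memory analyst API; the names load_vertices, synthetic_records, and wanted_properties are hypothetical and exist only for this illustration.

```python
def load_vertices(records, wanted_properties=None):
    """Build {vertex_id: {property: value}}, keeping only the wanted properties.

    records yields (vertex_id, properties) pairs. When wanted_properties is
    None, every property is kept.
    """
    vertices = {}
    for vertex_id, properties in records:
        if wanted_properties is None:
            kept = dict(properties)
        else:
            kept = {k: v for k, v in properties.items() if k in wanted_properties}
        vertices[vertex_id] = kept
    return vertices


def synthetic_records(count, property_count=10):
    """Yield `count` vertices that each carry `property_count` properties."""
    for vertex_id in range(count):
        yield vertex_id, {f"p{i}": vertex_id * i for i in range(property_count)}


# Load the same 1000 vertices twice: once with all 10 properties, once with one.
full = load_vertices(synthetic_records(1000))
slim = load_vertices(synthetic_records(1000), wanted_properties={"p3"})

stored_full = sum(len(props) for props in full.values())
stored_slim = sum(len(props) for props in slim.values())
print(stored_full, stored_slim)  # prints "10000 1000"
```

Because the unwanted properties are dropped as each record is read, they are never held in memory, which is the same idea as filtering at load time rather than after loading.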

Simplifying and undirecting graphs can now use a MutationStrategy to choose or merge properties, or to apply edge reduction strategies.

For details, see the PgxGraph information in Oracle Big Data Spatial and Graph In-Memory Analyst Java API Reference (Javadoc).

Reading from Files in Parallel

The in-memory analyst can now read from multiple files in adjacency list or edge list format in parallel.

Reading Custom Graph Data provides an extended example where the graph uses the adjacency list format.

Arithmetic Expression Support in Filter Expressions

You can now use arithmetic expressions in filter expressions when reading a subgraph from NoSQL or Apache HBase, running filtered algorithms, or creating a subgraph in memory.

Creating Subgraphs explains filter expressions and provides examples of using filter expressions to create subgraphs.
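As a rough illustration of what an arithmetic filter does when creating a subgraph, the following Python sketch keeps only the edges for which an arithmetic condition holds. It is not the in-memory analyst API or its filter expression syntax; Edge, filter_subgraph, and the cost and budget properties are hypothetical names chosen for this example, and keeping only the endpoints of surviving edges is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    cost: float


def filter_subgraph(vertex_props, edges, edge_predicate):
    """Return (vertices, edges) of the subgraph whose edges satisfy the predicate.

    The predicate receives the edge plus the property maps of its source and
    destination vertices, so it can combine them arithmetically.
    """
    kept_edges = [
        e for e in edges
        if edge_predicate(e, vertex_props[e.src], vertex_props[e.dst])
    ]
    kept_vertices = {v for e in kept_edges for v in (e.src, e.dst)}
    return kept_vertices, kept_edges


vertex_props = {"a": {"budget": 10}, "b": {"budget": 4}, "c": {"budget": 1}}
edges = [Edge("a", "b", 3.0), Edge("b", "c", 2.5), Edge("c", "a", 0.6)]

# Arithmetic in the filter: keep an edge when twice its cost fits the source budget.
kept_vertices, kept_edges = filter_subgraph(
    vertex_props, edges, lambda e, src, dst: e.cost * 2 <= src["budget"]
)
print(sorted(kept_vertices), kept_edges)
```

Only the edge from a to b passes the condition here, so the resulting subgraph contains the vertices a and b.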

Deprecated Features for Release 2.1

Several in-memory analyst APIs and features have been deprecated for this release. They will continue to work in this release, but may not work in future releases. You are encouraged to avoid them in new development and to change existing code to use the alternatives.

In the GraphConfigBuilder class, the methods forSingleFileFormats() and forMultipleFileFormats() are now deprecated and have been unified into a new forFileFormats() method, which accepts all of the following:

  • A single file using the setUri() method

  • Multiple files using the addUri() method

  • Multiple vertex and edge files using the addVertexUri() and addEdgeUri() methods

In the GraphConfigBuilder class, the methods forSingleFileFormat(Format) and forMultipleFileFormat(Format) are now deprecated and have been unified into the new forFileFormat(Format) method.

With the introduction of the enterprise scheduler, the following in-memory analyst configuration fields are deprecated:

  • num_workers_analysis, num_workers_fast_track_analysis, and num_workers_io on the top level are now deprecated. Instead, they must be placed into the new, nested basic_scheduler_config field. For example, the following configuration file segment:

    {
      ...
      "num_workers_analysis": 64,
      "num_workers_fast_track_analysis": 1,
      "num_workers_io": 72,
      ...
    }
    

    must now be expressed as:

    {
      ...
      "basic_scheduler_config": {
        "num_workers_analysis": 64,
        "num_workers_fast_track_analysis": 1,
        "num_workers_io": 72
      },
      ...
    }
    
  • parallelization_strategy is now deprecated and has been replaced by a new field, scheduler. For example, the following configuration file segment:

    {
      ...
      "parallelization_strategy": "task_stealing_counted",
      ...
    }
    

    must now be expressed as:

    {
      ...
      "scheduler": "basic_scheduler",
      ...
    }
    
  • The previous value rts for parallelization_strategy is mapped to the scheduler value enterprise_scheduler.

  • The strategies task_stealing and segmented are deprecated and no longer have any effect. Setting either one is treated like setting the scheduler value to basic_scheduler.
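The configuration changes listed above can be applied mechanically. The following Python sketch is one possible way to rewrite an existing configuration; it is not an Oracle-provided tool, and migrate_config is a hypothetical helper name. It assumes the configuration has already been parsed into a dictionary, and it uses the nested field name basic_scheduler_config as shown in the configuration example above.

```python
import copy
import json

# Deprecated top-level worker fields named in this section.
WORKER_FIELDS = (
    "num_workers_analysis",
    "num_workers_fast_track_analysis",
    "num_workers_io",
)

# Deprecated parallelization_strategy values and the scheduler values
# this section says they correspond to.
STRATEGY_TO_SCHEDULER = {
    "task_stealing_counted": "basic_scheduler",
    "rts": "enterprise_scheduler",
    "task_stealing": "basic_scheduler",
    "segmented": "basic_scheduler",
}


def migrate_config(old):
    """Return a copy of `old` with the deprecated fields rewritten."""
    new = copy.deepcopy(old)

    # Move the worker counts into the nested scheduler configuration.
    moved = {name: new.pop(name) for name in WORKER_FIELDS if name in new}
    if moved:
        nested = new.setdefault("basic_scheduler_config", {})
        for name, value in moved.items():
            nested.setdefault(name, value)  # an existing nested value wins

    # Replace parallelization_strategy with scheduler, unless scheduler is set.
    strategy = new.pop("parallelization_strategy", None)
    if strategy is not None and "scheduler" not in new:
        if strategy not in STRATEGY_TO_SCHEDULER:
            raise ValueError(f"unknown parallelization_strategy: {strategy!r}")
        new["scheduler"] = STRATEGY_TO_SCHEDULER[strategy]

    return new


old_config = {
    "num_workers_analysis": 64,
    "num_workers_fast_track_analysis": 1,
    "num_workers_io": 72,
    "parallelization_strategy": "task_stealing_counted",
}
print(json.dumps(migrate_config(old_config), indent=2))
```

The helper works on a deep copy so that the original dictionary is left untouched, and it leaves an explicitly configured scheduler value alone rather than overwriting it.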