This preface describes significant new features and changes in Oracle Big Data Spatial and Graph User's Guide and Reference for Oracle Big Data Spatial and Graph Release 2.1.
Information about the Property Graph Query Language (PGQL) has been significantly expanded, and new PGQL capabilities have been added, including:
Arithmetic expressions in SELECT, ORDER BY, GROUP BY: Previously, arithmetic expressions were allowed only in the WHERE clause, while the SELECT, GROUP BY, and ORDER BY clauses allowed only basic expressions. Now, all of these clauses fully support arithmetic expressions.
MIN, MAX, and COUNT aggregates in PGQL now work with String values.
For information about PGQL and examples of how to use it, see the following:
The in-memory analyst now provides an interpreter for Zeppelin Notebook (https://zeppelin.apache.org/). You can add graph analysis features to a Zeppelin notebook server, which enables collaborative editing of graph analyses and the production of reports that can be live-edited and updated.
For information, see Using the In-Memory Analyst Zeppelin Interpreter.
Apache Spark integration with property graph support has been enhanced. For more information, see Using Apache Spark with Property Graph Data and Using the In-Memory Analyst to Analyze Graph Data in Apache Spark.
A new Vector Analysis API is available for Apache Spark and Apache Spark SQL that allows you to create spatial RDDs (Resilient Distributed Datasets) for performing spatial transformations and actions, as explained in Oracle Big Data Spatial Vector Analysis for Spark.
Arguments to Java methods (such as those on the analyst built-in object) can now be passed in abbreviated parameterized form (someMethod(g : var1, v : var2)), in addition to the previously supported traditional form (someMethod(param1, param2)) and standard parameterized form (someMethod(graph : var1, vertex : var2)). Example of abbreviated parameterized form:
```
analyst.whomToFollow(g: g, v: g.pickRandomVertex(), maxIt: 100, sDF: 0.85, sMD: 0.01)
```
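The abbreviations in this example (maxIt, sDF, sMD) appear to be formed from the leading letters of each camelCase segment of a parameter name. The exact matching rule is not stated here, so the following Java sketch shows one plausible rule under stated assumptions: each camelCase segment of the abbreviation must be a prefix of the corresponding segment of the full name, and the match must be unique. The class AbbreviatedParams and the parameter names (graph, vertex, maxIterations, salsaDampingFactor, salsaMaxDiff) are hypothetical, chosen only to mirror the abbreviations above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch: resolve an abbreviated parameter name such as "sDF"
 * to a full name such as "salsaDampingFactor". Assumed rule: split both
 * names at camelCase boundaries; the abbreviation matches when it has the
 * same number of segments and each segment is a prefix of the full one.
 */
class AbbreviatedParams {

    /** Splits a camelCase identifier: "maxIterations" -> [max, Iterations]. */
    static List<String> segments(String name) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < name.length(); i++) {
            if (Character.isUpperCase(name.charAt(i))) {
                parts.add(name.substring(start, i));
                start = i;
            }
        }
        parts.add(name.substring(start));
        return parts;
    }

    /** True if every abbreviation segment is a prefix of the matching full segment. */
    static boolean matches(String abbrev, String full) {
        List<String> a = segments(abbrev);
        List<String> f = segments(full);
        if (a.size() != f.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!f.get(i).startsWith(a.get(i))) {
                return false;
            }
        }
        return true;
    }

    /** Returns the unique full name matched, or empty if there is no match or it is ambiguous. */
    static Optional<String> resolve(String abbrev, List<String> fullNames) {
        List<String> hits = new ArrayList<>();
        for (String full : fullNames) {
            if (matches(abbrev, full)) {
                hits.add(full);
            }
        }
        return hits.size() == 1 ? Optional.of(hits.get(0)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical parameter names mirroring the whomToFollow example.
        List<String> params = List.of("graph", "vertex", "maxIterations",
                "salsaDampingFactor", "salsaMaxDiff");
        for (String abbrev : List.of("g", "v", "maxIt", "sDF", "sMD")) {
            System.out.println(abbrev + " -> " + resolve(abbrev, params).orElse("?"));
        }
    }
}
```

Requiring a unique match means an abbreviation such as "s" against both "salsa" and "size" is rejected rather than silently bound to the first candidate.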
An enterprise scheduler has been added, which by default manages the following:
Concurrent execution of tasks from multiple sessions
Detailed configuration for thread pools with pool weight, priority, and maximum number of threads
Dynamically sizing the I/O thread pool
Detailed per-task settings for each thread pool
For information, see Using the In-Memory Analyst Enterprise Scheduler.
Reading (loading), undirecting, and simplifying graphs now allow filtering. The ability to filter a graph while reading it can significantly reduce memory requirements. For example, if every vertex has 10 properties but you know you need only one property for the analysis that you want to do, you do not need to load all 10. With a very large graph, this could make the difference between being able and being unable to load all the requested data.
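To make the memory argument concrete, the following Java sketch is a hypothetical illustration, not the in-memory analyst's loader. It assumes a simple comma-separated vertex file in the form id,prop0,prop1,...,prop9 and keeps only one requested property per vertex, so the other properties are never stored. The class FilteredVertexReader and the file format are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical loader that keeps one property per vertex.
 * Assumed line format: "id,prop0,prop1,...,prop9".
 */
class FilteredVertexReader {

    /** Reads all lines, storing only property number {@code wanted} (0-based) for each vertex. */
    static Map<Long, String> readOneProperty(BufferedReader in, int wanted) throws IOException {
        Map<Long, String> values = new LinkedHashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",", -1);
            long id = Long.parseLong(fields[0].trim());
            // fields[0] is the vertex ID, so property k is in fields[k + 1];
            // the other property fields are never copied into the result.
            values.put(id, fields[wanted + 1]);
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        String data = "1,red,10,x\n2,blue,20,y\n";
        BufferedReader in = new BufferedReader(new StringReader(data));
        System.out.println(readOneProperty(in, 1)); // prints {1=10, 2=20}
    }
}
```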
Simplifying and undirecting graphs can now use a MutationStrategy to choose or merge properties, or to apply edge reduction strategies.
For details, see the PgxGraph information in Oracle Big Data Spatial and Graph In-Memory Analyst Java API Reference (Javadoc).
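As a rough illustration of what merging properties can mean when a graph is undirected, the sketch below collapses the directed edges (u, v) and (v, u), along with any parallel edges, into one undirected edge and combines their weights with a caller-supplied merge function. It does not use the PgxGraph or MutationStrategy APIs; the class UndirectSketch, its Edge type, the weight property, and the undirect method are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Hypothetical sketch: undirect an edge list, merging duplicate edges' weights. */
class UndirectSketch {

    static final class Edge {
        final long src;
        final long dst;
        final double weight;

        Edge(long src, long dst, double weight) {
            this.src = src;
            this.dst = dst;
            this.weight = weight;
        }

        @Override
        public String toString() {
            return src + "-" + dst + " (" + weight + ")";
        }
    }

    /**
     * Treats (u, v) and (v, u) as the same undirected edge and merges the
     * weights of all edges that collapse together using mergeWeights.
     */
    static List<Edge> undirect(List<Edge> edges, BinaryOperator<Double> mergeWeights) {
        Map<String, Edge> merged = new LinkedHashMap<>();
        for (Edge e : edges) {
            long lo = Math.min(e.src, e.dst); // canonical endpoint order
            long hi = Math.max(e.src, e.dst);
            String key = lo + ":" + hi;
            Edge existing = merged.get(key);
            double w = (existing == null) ? e.weight : mergeWeights.apply(existing.weight, e.weight);
            merged.put(key, new Edge(lo, hi, w));
        }
        return new ArrayList<>(merged.values());
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(new Edge(1, 2, 1.0), new Edge(2, 1, 3.0), new Edge(2, 3, 5.0));
        System.out.println(undirect(edges, Double::sum));              // prints [1-2 (4.0), 2-3 (5.0)]
        System.out.println(undirect(edges, (a, b) -> Math.max(a, b))); // prints [1-2 (3.0), 2-3 (5.0)]
    }
}
```

Passing the merge rule as a function keeps the collapsing logic independent of whether a given property should be summed, maximized, or simply taken from one of the edges.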
The in-memory analyst can now read from multiple files in adjacency list or edge list format in parallel.
Reading Custom Graph Data provides an extended example where the graph uses the adjacency list format.
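The idea of loading several files in parallel can be sketched with Java parallel streams. This is a minimal sketch assuming a whitespace-separated adjacency list format in which each line holds a source vertex ID followed by the IDs of its neighbors; the class ParallelAdjListReader is hypothetical, and the formats the in-memory analyst actually accepts are described in Reading Custom Graph Data.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical reader: parses several adjacency list files in parallel into {src, dst} pairs. */
class ParallelAdjListReader {

    /** Parses "src dst1 dst2 ..." into one {src, dst} pair per neighbor. */
    static Stream<long[]> parseLine(String line) {
        String[] tok = line.trim().split("\\s+");
        if (tok.length < 2) {
            return Stream.empty(); // blank line, or a vertex with no neighbors
        }
        long src = Long.parseLong(tok[0]);
        return Arrays.stream(tok, 1, tok.length)
                .map(t -> new long[] {src, Long.parseLong(t)});
    }

    /** Each file is read and parsed by the common fork/join pool; encounter order is preserved. */
    static List<long[]> readEdges(List<Path> files) {
        return files.parallelStream()
                .flatMap(p -> {
                    try {
                        return Files.readAllLines(p).stream();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .flatMap(ParallelAdjListReader::parseLine)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("part-a", ".adj");
        Path b = Files.createTempFile("part-b", ".adj");
        Files.write(a, List.of("1 2 3", "2 3"));
        Files.write(b, List.of("3 1", "4"));
        for (long[] e : readEdges(List.of(a, b))) {
            System.out.println(e[0] + " -> " + e[1]);
        }
        Files.delete(a);
        Files.delete(b);
    }
}
```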
You can now use arithmetic expressions in filter expressions when reading a subgraph from NoSQL or Apache HBase, running filtered algorithms, or creating a subgraph in memory.
Creating Subgraphs explains filter expressions and provides examples of using filter expressions to create subgraphs.
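The effect of an arithmetic filter on subgraph creation can be approximated in plain Java. In the sketch below, a Predicate stands in for a filter expression that tests a cost-to-distance ratio. The class ArithmeticEdgeFilter, the property names cost and distance, and the rule that the subgraph keeps every vertex incident to a retained edge are assumptions for illustration; they are not the in-memory analyst's filter syntax or semantics.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch: build an edge-filtered subgraph using an arithmetic condition. */
class ArithmeticEdgeFilter {

    static final class Edge {
        final long src;
        final long dst;
        final double cost;
        final double distance;

        Edge(long src, long dst, double cost, double distance) {
            this.src = src;
            this.dst = dst;
            this.cost = cost;
            this.distance = distance;
        }
    }

    static final class Subgraph {
        final Set<Long> vertices = new LinkedHashSet<>();
        final List<Edge> edges = new ArrayList<>();
    }

    /** Keeps edges that satisfy the predicate, plus the endpoints of those edges. */
    static Subgraph filter(List<Edge> edges, Predicate<Edge> keep) {
        Subgraph sub = new Subgraph();
        for (Edge e : edges) {
            if (keep.test(e)) {
                sub.edges.add(e);
                sub.vertices.add(e.src);
                sub.vertices.add(e.dst);
            }
        }
        return sub;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(
                new Edge(1, 2, 10.0, 40.0),  // ratio 0.25 -> kept
                new Edge(2, 3, 30.0, 40.0),  // ratio 0.75 -> dropped
                new Edge(3, 4, 5.0, 20.0));  // ratio 0.25 -> kept
        // Java analogue of an arithmetic filter such as "cost / distance < 0.5".
        Subgraph sub = filter(edges, e -> e.cost / e.distance < 0.5);
        System.out.println(sub.vertices + ", " + sub.edges.size() + " edges"); // prints [1, 2, 3, 4], 2 edges
    }
}
```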
Several in-memory analyst APIs and features have been deprecated for this release. They will continue to work in this release, but may not work in future releases. You are encouraged to avoid them in new development and to change existing code to use the alternatives.
In the GraphConfigBuilder class, the methods forSingleFileFormats() and forMultipleFileFormats() are now deprecated and have been unified into a new forFileFormats() method, which accepts all of the following:
A single file using the setUri() method
Multiple files using the addUri() method
Multiple vertex and edge files using the addVertexUri() and addEdgeUri() methods
In the GraphConfigBuilder class, the methods forSingleFileFormat(Format) and forMultipleFileFormat(Format) are now deprecated and have been unified into the new forFileFormat(Format) method.
With the introduction of the enterprise scheduler, the following in-memory analyst configuration fields are deprecated:
num_workers_analysis, num_workers_fast_track_analysis, and num_workers_io on the top level are now deprecated. Instead, they must be placed into the new, nested basic_scheduler_config field. For example, the following configuration file segment:
```
{
  ...
  "num_workers_analysis": 64,
  "num_workers_fast_track_analysis": 1,
  "num_workers_io": 72,
  ...
}
```
must now be expressed as:
```
{
  ...
  "basic_scheduler_config": {
    "num_workers_analysis": 64,
    "num_workers_fast_track_analysis": 1,
    "num_workers_io": 72
  },
  ...
}
```
parallelization_strategy is now deprecated and has been replaced by a new field, scheduler. For example, the following configuration file segment:
```
{
  ...
  "parallelization_strategy": "task_stealing_counted",
  ...
}
```
must now be expressed as:
```
{
  ...
  "scheduler": "basic_scheduler",
  ...
}
```
The previous value rts for parallelization_strategy is mapped to the scheduler value enterprise_scheduler.
The strategies task_stealing and segmented are deprecated and no longer have any effect; they are treated like setting the scheduler value to basic_scheduler.
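The configuration changes described above can be applied mechanically. The following Java sketch is a hypothetical helper, not an Oracle-provided tool: it rewrites a parsed top-level configuration (modeled as a Map so that no JSON library is needed), moving the three num_workers_* fields into basic_scheduler_config and replacing parallelization_strategy with the scheduler value given by the mapping above. The class SchedulerConfigMigration is an assumption, and the sketch assumes the input does not already contain a basic_scheduler_config entry.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that rewrites deprecated in-memory analyst configuration fields. */
class SchedulerConfigMigration {

    static final List<String> WORKER_FIELDS = List.of(
            "num_workers_analysis", "num_workers_fast_track_analysis", "num_workers_io");

    /** Maps a deprecated parallelization_strategy value to a scheduler value. */
    static String schedulerFor(String strategy) {
        switch (strategy) {
            case "rts":
                return "enterprise_scheduler";
            case "task_stealing_counted": // per the example above
            case "task_stealing":         // no longer has any effect
            case "segmented":             // no longer has any effect
                return "basic_scheduler";
            default:
                throw new IllegalArgumentException("unknown parallelization_strategy: " + strategy);
        }
    }

    /** Returns a rewritten copy; assumes no basic_scheduler_config entry exists yet. */
    static Map<String, Object> migrate(Map<String, Object> config) {
        Map<String, Object> out = new LinkedHashMap<>(config);
        Map<String, Object> basic = new LinkedHashMap<>();
        for (String field : WORKER_FIELDS) {
            if (out.containsKey(field)) {
                basic.put(field, out.remove(field)); // move into the nested object
            }
        }
        if (!basic.isEmpty()) {
            out.put("basic_scheduler_config", basic);
        }
        Object strategy = out.remove("parallelization_strategy");
        if (strategy != null) {
            out.put("scheduler", schedulerFor(strategy.toString()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> old = new LinkedHashMap<>();
        old.put("num_workers_analysis", 64);
        old.put("num_workers_fast_track_analysis", 1);
        old.put("num_workers_io", 72);
        old.put("parallelization_strategy", "task_stealing_counted");
        System.out.println(migrate(old));
        // prints {basic_scheduler_config={num_workers_analysis=64, num_workers_fast_track_analysis=1, num_workers_io=72}, scheduler=basic_scheduler}
    }
}
```

Rejecting unknown strategy values, rather than defaulting them, surfaces configuration files that use a value this release does not document.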