Sharding JSON Document Collections

Overview of Sharding JSON Documents

Oracle Sharding allows JSON documents to scale to massive data and transactions volume, provide fault isolation, and support data sovereignty. Oracle Database has support for native JSON objects. Applications can interact with the sharded database using the SODA (Simple Oracle Document Access) API, which allows you to access data using JSON document attributes.

In Oracle Database, JSON documents can be stored in a database table. The database tables act as JSON collections, and each row is a JSON document. JSON documents are stored in the database as type JSON, which is backed by a highly optimized binary JSON format called OSON.

Although Oracle provides support for JSON operators to create, work with, and retrieve JSON documents, the SODA interface is also supported. SODA provides a more intuitive interface for working with JSON documents.

SODA is an API for NoSQL-style JSON (and not only JSON) document collections in Oracle Database. Using SODA APIs, application can perform CRUD operations on documents in collections. Collections are backed by regular Oracle tables (or views).

Typically, to create a collection, one would use SODA API. That creates the underlying table backing the collection. In order to create a sharded collection, however, a shared table has to be created first. Then, a collection can be created on top of a sharded table, by using a mapped collection feature of SODA.

Working with JSON documents in an Oracle Sharding environment introduces the notion of a sharding key. JSON documents are distributed to the individual database table shards according to the sharding key. The sharding key can either be a field from within the JSON document or an external column such as the ID assigned by the SODA API.

For further reading about JSON and SODA, see JSON in Oracle Database and Overview of SODA.

The topics that follow provide details about how to shard JSON objects in Oracle Database. The high level steps are:

Deploy a sharded database
Identify a sharding key that the application can use to fetch data
Define a data store for JSON in Oracle Database by creating sharded tables
Map the sharded table with SODA

Then life cycle management tasks detailed are:

Add documents to the sharded JSON collection in the application
Fetch document data from the sharded JSON collection in the application

Preparing the Oracle Sharding Environment

Before you begin configuring Oracle Sharding with SODA, deploy a sharding configuration and start the global services.

A Oracle Sharding sharded database configuration, including shard directors, shard catalog, and shard databases, and any replicas must be deployed. After deploying the sharded database, you must create and start global database services on the shards to service incoming connection requests from your application.

See Sharded Database Deployment for information about creating and deploying a sharded database configuration.

Creating an All-Shards User with SODA Privileges

Create a user on the shard catalog that has the privileges to create schema objects in the sharded database, and also has the necessary execute privileges on the DBMS_SODA PL/SQL package.

For the purposes of this document, the user is referred to as the Sharding/SODA user, and the user name is app_schema in the examples.

To create the Sharding/SODA user:

Connect to the shard catalog database (for example, as SYSDBA).
Enable SHARD DDL.
Run CREATE USER command, granting the permissions shown in the example below.
Note that the Sharding/SODA user is created on the PDB, not the CDB.

The following is an example Sharding/SODA user creation script.

-- Set the container and create the sharded user
alter session set container=SDBPDB;
alter session enable shard ddl;
create user app_schema identified by password;
 
-- Grant basic privileges
grant connect, resource, alter session to app_schema;
grant execute on dbms_crypto to app_schema;

-- All privileges below are required. User can also be granted all privileges
grant create table, create procedure, create tablespace, create
materialized view to app_schema;
grant unlimited tablespace to app_schema;
grant select_catalog_role to app_schema;

-- Grant soda_app for this user
grant soda_app to app_schema;

-- Specific grants on shard plsql
grant execute on exec_shard_plsql to app_schema;
grant gsmadmin_role to app_schema;
grant gsm_pooladmin_role to app_schema;

Note the standard database schema privileges and the standard SODA privileges granted to the user. The exec_shard_plsql grant, which gives the user the ability to run PL/SQL procedures on a sharded database, is a sharding-specific privilege required for the Sharding/SODA user.

For more information about Oracle Sharding schema design, including sharding user creation and running PL/SQL, see Sharded Database Schema Design.

Choosing a Sharding Key

SODA collections are backed by regular Oracle tables. One of the columns in these tables is the ID column, which contains unique keys for the documents in the collection. This column can be used as the sharding key. Alternatively, you can choose a JSON field in the document content to be the sharding key.

The choice of sharding key is application dependent.

The advantages and disadvantages of each sharding key choice are listed in the sections below.

Using the SODA ID as the Sharding Key

The SODA API automatically manages a unique ID for each SODA document. This ID is used by the SODA API to create and retrieve documents within a collection.

The SODA ID must be provided manually by the application when it is used as a sharding key. This is because when creating a new document on a specific shard, the sharding key is required beforehand in order to connect to the appropriate shard. The SODA API allows for this manual (also known as CLIENT key) assignment of a SODA ID on document creation. Examples are provided in the code samples in Using SODA ID as the Sharding Key.

It is up to the application to decide if this SODA ID represents something meaningful (for example, a Customer ID) or is merely a unique Document ID. In any case, the ID must be unique. This is not a requirement imposed by Oracle Sharding but by the SODA API.

A summary of using the SODA ID as the sharding key:

The sharding key must be unique.
The sharding key is a document ID, which can be independent of the contents of the JSON fields.
Whenever a new document is inserted, this ID must be provided by the application.

Using a JSON field as the Sharding Key

A JSON field can be used as the sharding key. This key does not need to be unique.

In this case, each document in a collection has a separate SODA ID (as required by SODA), but it is managed automatically by the SODA API as a separate document ID.

A summary of using a JSON field as the sharding key:

The sharding key does not need to be unique.
The sharding key is a field within the JSON of each document.
The SODA ID does not need to be specified when inserting a new document.

Considerations in choosing a Sharding Key method

Note that in both cases, a sharding key is a field which rarely or never changes. This might be a uniquely assigned Customer or Document ID. It can also be a non-unique ID such as a customer birth date, with day, month and year, or a postal code.

For system-managed sharding, either sharding key method is appropriate for distributing documents across shards.

For user-defined sharding, SODA ID as shard key only makes sense if the ID has a meaningful value and it makes sense to partition this by range, for example.

Given no other constraints, using a JSON field as the sharding key offers greater flexibility and allows the sharding key to be stored naturally as part of the JSON.

System-managed vs. User-defined Sharding

Although similar in many ways, user-defined sharding gives you greater control over where data resides. This can be useful when data needs to be separated geographically, or other reasons arise so that data also requires a physical mapping.

Much of the procedures and examples in later topics apply to both sharding methods. There are two exceptions:

On creation of the sharded table which underlies the SODA collection, the physical mapping for user-defined sharding must be specified. You can find an example in which a range of ZIP codes must reside on specific shards in Using a JSON Field as a Sharding Key.
SODA queries (QBEs) can rely on this data grouping to be able to perform queries on one shard which includes a range of sharding keys.

How to Implement a Solution

After choosing which type of sharding key to use, refer to the following use cases to see examples of how to create a sharded table for the JSON collection, and how to interact with the sharded table from an application.

Using SODA ID as the Sharding Key

You can designate the SODA ID as the sharding key when creating the sharded database schema.

The following examples show you how to create a sharded table for the JSON collection, create the SODA mapping, and access the sharded table from an application with Java and Python code samples.

Creating a Sharded Table for the JSON Collection

To create a sharded table that uses the SODA ID as the sharding key:

Connect to the shard catalog as the Sharding/SODA user.
Enable SHARD DDL.
Create a tablespace set.
Run CREATE SHARDED TABLE, as shown in the example below.

The following example creates a sharded table (Customers) for a JSON collection of customer profile documents (CUSTPROFILE).

A column for the SODA ID (ID) identifies the JSON entries, and is also used as the primary key and sharding key. When creating a JSON entry in the table with SODA, the application populates the ID column with a unique value.

The other columns are the default column names given when SODA creates a table to hold an underlying collection. You can see this for yourself when creating a SODA collection and then examining the created table.

Creating a Sharded Table: System-Managed

/* Enable shard DDL */
ALTER SESSION ENABLE SHARD DDL;

/* Create a tablespace set */
CREATE TABLESPACE SET TSP_SET_1 USING TEMPLATE
 (datafile size 100m autoextend on next 10M maxsize unlimited
 extent management local segment space management auto); 

/* Create the sharded table */
CREATE SHARDED TABLE CUSTOMERS
(
"ID" VARCHAR2(255) NOT NULL,
"CREATED_ON" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"LAST_MODIFIED" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"VERSION" varchar2(255) NOT NULL,
"CUSTPROFILE" JSON,
PRIMARY KEY (ID),
)
TABLESPACE SET TSP_SET_1
PARTITION BY CONSISTENT HASH (ID) PARTITIONS AUTO;

Creating a Sharded Table: User-Defined

If the SODA ID has a meaningful value, then the database can be sharded with the user-defined method, and you can create a sharded table using the example below.

Before creating the sharded table in a user-defined sharded database, ensure that the necessary tablespaces and shardspaces have been created. See User-Defined Sharding and Configure the Sharded Database Topology for details about creating sharded database objects.

/* Enable shard DDL */
ALTER SESSION ENABLE SHARD DDL;

/* Create the sharded table */
CREATE SHARDED TABLE CUSTOMERS
(
"ID" VARCHAR2(255) NOT NULL,
"CREATED_ON" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"LAST_MODIFIED" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"VERSION" varchar2(255) NOT NULL,
"CUSTPROFILE" JSON,
PRIMARY KEY (ID),
)
PARTITION BY RANGE (ID)
(PARTITION p1 VALUES LESS THAN ('5000') TABLESPACE ts1,
PARTITION p2 VALUES LESS THAN ('10000') TABLESPACE ts2)

Creating a Mapped SODA Collection on the Sharded Table

Create a mapped SODA collection to let SODA know which columns to use when working with the sharded table.

In this task, you first run a procedure to create the mapped collection, which creates the metadata necessary for SODA to recognize the previously created table as a SODA collection.

Afterwards you run an additional procedure, sys.exec_shard_plsql(), which ensures that the map collection is created on all shards and all future shards.

Creating a SODA Mapped Collection Across All Shards

As the Sharding/SODA user and with SHARD DDL enabled, run the following commands on the shard catalog. The shard catalog propagates the procedure to all of the shards to be processed automatically.

GRANT SODA_APP TO PROCEDURE APP_SCHEMA.COLLECTION_PROC_CUSTOMERS;

create or replace procedure COLLECTION_PROC_CUSTOMERS AS 
METADATA varchar2(8000);
 COL SODA_COLLECTION_T;
 begin METADATA := '{"tableName":"CUSTOMERS",
 "keyColumn":{"name":"ID","assignmentMethod" : "CLIENT"},
 "contentColumn":{"name":"CUSTPROFILE","sqlType":"JSON"},
 "versionColumn":{"name":"VERSION","method":"UUID"},
 "lastModifiedColumn":{"name":"LAST_MODIFIED"},
 "creationTimeColumn":{"name":"CREATED_ON"},
 "readOnly":false}'; 
-- Create a collection using "map" mode, based on 
-- the table you've created above and specified in 
-- the custom metadata under "tableName" field. 
COL := dbms_soda.create_collection('CUSTOMERS',METADATA,DBMS_SODA.CREATE_MODE_MAP); 
end ; 
/ 

exec sys.exec_shard_plsql('app_schema.collection_proc_customers()',4+1);

Note that the keyColumn is mapped as ID, which holds the unique ID of each document. It is designated as CLIENT here because the application will supply a unique key for each document on insert.

At this point, a new collection has been created.

You can run PL/SQL to list the collections. On the shard catalog, run the following commands, and verify that the output lists the CUSTOMERS collection as shown here.

SET SERVEROUTPUT ON
DECLARE
l_coll_list SODA_COLLNAME_LIST_T;
BEGIN
l_coll_list := DBMS_SODA.list_collection_names;
 
IF l_coll_list.COUNT > 0 THEN
FOR i IN 1 .. l_coll_list.COUNT LOOP
DBMS_OUTPUT.put_line(i || ' : ' || l_coll_list(i));
END LOOP;
END IF;
END;
/
1 : CUSTOMERS
 
PL/SQL procedure successfully completed.

Code Samples

The following code samples in Java and Python show you how to connect to a shard using the sharding key and insert a new document.

Note that when using SODA in a sharded database environment, new documents should be created by connecting to specific shards, and not using the shard catalog.

Java Code Sample

These Java code samples are created for the "Using SODA ID as the Sharding Key" use case.

The Java sample below shows you how to connect to a shard and insert a JSON document into the collection.

import java.sql.Connection;
import java.util.Properties;
import java.util.List;
 
// SODA specific imports
import oracle.soda.rdbms.OracleRDBMSClient;
import oracle.soda.OracleDatabase;
import oracle.soda.OracleCursor;
import oracle.soda.OracleCollection;
import oracle.soda.OracleDocument;
import oracle.soda.OracleException;
 
// Sharding and UCP imports
import oracle.jdbc.OracleShardingKey;
import oracle.jdbc.OracleType;
import oracle.jdbc.pool.OracleDataSource;
import oracle.ucp.jdbc.PoolDataSourceFactory;
import oracle.ucp.jdbc.PoolDataSource;
 
 
/*
* The sample demonstrates connecting to a Sharded Database using
* Oracle JDBC driver and UCP as a client side connection pool.
*/
public class QuickInsertShard {
 
 public static void main(String args[]) throws Exception {
 
// TNS_ADMIN - Should be the path where the tnsnames.ora file resides
// dbshard_rw - It is the TNS alias present in tnsnames.ora.
// Note that the connection is to the Shard Director (GSM) and the service name is the shard RW service
final String DB_URL="jdbc:oracle:thin:@dbshard_rw?TNS_ADMIN=/home/opc/dbhome/";
 
// Update the Database Username and Password to the Shard User
final String DB_USER = "app_schema";
String DB_PASSWORD = "<user_password>" ;
 
 
// Get the PoolDataSource for UCP
PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();
 
// Set the connection factory first before all other properties
pds.setConnectionFactoryClassName(OracleDataSource.class.getName());
pds.setURL(DB_URL);
pds.setUser(DB_USER);
pds.setPassword(DB_PASSWORD);
pds.setConnectionPoolName("JDBC_UCP_POOL");
 
// Default is 0. Set the initial number of connections to be created
// when UCP is started.
pds.setInitialPoolSize(10);
// Default is 0. Set the minimum number of connections
// that is maintained by UCP at runtime.
pds.setMinPoolSize(10);
// Instead of Max Pool Size, we can set the number of max connections per shard
pds.setMaxConnectionsPerShard(20);
 
// We cannot get the connection until we have the Shard key which is part of the SQL
//We first set the sharding key or document id explicitly
String shardingKeyVal="10";
 
// Now we build the connection using this shard key
OracleShardingKey sdkey = pds.createShardingKeyBuilder().subkey(shardingKeyVal, OracleType.VARCHAR2).build();
System.out.println("Initiating UCP and Creating Connection...");
Connection conn = pds.createConnectionBuilder().shardingKey(sdkey).build();
 
// Enable the SODA Shared Metadata cache
Properties props = new Properties();
props.put("oracle.soda.sharedMetadataCache", "true");
OracleRDBMSClient cl = new OracleRDBMSClient(props);
 
// Get a DB Connection for use in SODA
OracleDatabase db = cl.getDatabase(conn);
 
// Print all the Collections in this DB
List<String> names =  db.admin().getCollectionNames();
for (String name : names)
 System.out.println ("Collection name: " + name);
 
// Open up the CUSTOMERS Collection
OracleCollection col = db.openCollection("CUSTOMERS");
 
//For a collection configured with client-assigned document keys,
//you must provide the key for the input document. Build a document with JSON.
OracleDocument cKeyDoc = db.createDocumentFromString(shardingKeyVal, "{\"name\": \"Matilda\", \"State\": \"CA\", \"ZIP\":\"94065\"}");
 
// Insert the document above
//If the key  already identifies a document in the collection
//then this will replace the existing doc.
OracleDocument savedDoc = col.saveAndGet(cKeyDoc);
 
// Get the document back assuming we only know the key
// We are still connected to the same shard
OracleDocument doc = col.find().key(shardingKeyVal).getOne();
String content = doc.getContentAsString();
System.out.println("Retrieved content is: " + content);
 
// We are done, so close the connection to the shard
conn.close();
 
// At this point we could open up a new shard connection using a different sharding key
 
 
 }} // End of QuickInsertShard

This Java sample shows how you would perform a multi-shard query.

import java.sql.Connection;
import java.util.Properties;
import java.util.List;
 
// SODA specific imports
import oracle.soda.rdbms.OracleRDBMSClient;
import oracle.soda.OracleDatabase;
import oracle.soda.OracleCursor;
import oracle.soda.OracleCollection;
import oracle.soda.OracleDocument;
import oracle.soda.OracleException;
 
// Sharding and UCP imports
import oracle.jdbc.OracleShardingKey;
import oracle.jdbc.OracleType;
import oracle.jdbc.pool.OracleDataSource;
import oracle.ucp.jdbc.PoolDataSourceFactory;
import oracle.ucp.jdbc.PoolDataSource;
 
 
/*
* The sample demonstrates connecting to a Sharded Database using
* Oracle JDBC driver and UCP as a client side connection pool.
*/
public class QuickQueryCat {
 
 public static void main(String args[]) throws Exception {
 
// TNS_ADMIN - Should be the path where the tnsnames.ora file resides
// dbshard_rw - It is the TNS alias present in tnsnames.ora.
// This connection is to the shard director using the catalog service name.
final String DB_URL="jdbc:oracle:thin:@dbcat?TNS_ADMIN=/home/opc/dbhome/";
 
// Update the Database Username and Password to the Shard User
final String DB_USER = "app_schema";
String DB_PASSWORD = "<user_password>" ;
 
 
// Get the PoolDataSource for UCP
PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();
 
// Set the connection factory first before all other properties
pds.setConnectionFactoryClassName(OracleDataSource.class.getName());
pds.setURL(DB_URL);
pds.setUser(DB_USER);
pds.setPassword(DB_PASSWORD);
pds.setConnectionPoolName("JDBC_UCP_POOL");
 
// Now we get a direct connection to the shard catalog
System.out.println("Initiating UCP and Creating Connection...");
Connection conn = pds.getConnection();
 
// Enable the SODA Shared Metadata cache
Properties props = new Properties();
props.put("oracle.soda.sharedMetadataCache", "true");
OracleRDBMSClient cl = new OracleRDBMSClient(props);
 
// Get a DB Connection
OracleDatabase db = cl.getDatabase(conn);
 
// Print all the Collections in this DB
List<String> names =  db.admin().getCollectionNames();
for (String name : names)
 System.out.println ("Collection name: " + name);
 
// Open up the CUSTOMERS Collection
OracleCollection col = db.openCollection("CUSTOMERS");
 
// Do a search across ALL Shards. In this case all users named Matilda
// Setup the specification and open a cursor
OracleDocument filterSpec = db.createDocumentFromString("{ \"name\" : \"Matilda\"}");
 
OracleCursor c = col.find().filter(filterSpec).getCursor();
 
// Print the results of the query
while (c.hasNext()) {
  OracleDocument resultDoc = c.next();
 
  // Print the document key and document content
  System.out.println ("Document key: " + resultDoc.getKey() + "\n" +
                        " document content: " + resultDoc.getContentAsString());
}
 
// Close the cursor
c.close();
 
// Here, we could initiate another multi-shard query if desired
 
// We are done, so close the connection
conn.close();
 
 
 }} // End of QuickQueryCat

Python Code Sample

This Python sample shows how you can actually work with JSON objects using SODA in a sharded database environment.

To use this sample code in your environment, follow the instructions to install the cx_Oracle module for Python: https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html

This example shows how to connect to a shard using the sharding key and insert a new document.

Note that when using SODA in a sharded database environment, new documents should be created by connecting to specific shards and not using the shard catalog.

# import the cx_Oracle module for Python
import cx_Oracle

# Create a connection pool that will be used for connecting to all shards
# The components of the dsn are hostname (shard director),
# port (usually 1522), global service (created with GDSCTL)
# The pool is then created and SODA metadata caching is enabled.
dsn=cx_Oracle.makedsn("shard_director_host",1522,service_name="service_name")
pool=cx_Oracle.SessionPool("app_schema","password",dsn, soda_metadata_cache=True)

# Connect to a specific shard by using the sharding key, which in this example is
# set explicitly with "sodaid", but this might be passed in or part of a loop 
# You must know beforehand if you are creating or working with a document for a specific Customer
# 
sodaid="2468"
connection=pool.acquire(shardingkey=[sodaid])

# Set autocommit and open the CUSTOMERS collection
connection.autocommit = True
soda = connection.getSodaDatabase()
collection = soda.openCollection("CUSTOMERS")

# Insert a document
# Because you are specifying the shard key, you must pass that in with the document (key=custid)
# The value can be a UUID for example but it need not have any relation to the JSON Content.

content = {'name': 'Matilda', 'State': 'CA', 'ZIP':'94065'}
idcontent=soda.createDocument(content, key=sodaid)
doc = collection.insertOneAndGet(idcontent)

# Fetch the document back by key
doc = collection.find().key(sodaid).getOne()
content = doc.getContent()
print('Retrieved SODA document dictionary is:')
print(content)

# After you have finished, release this connection back into the pool
pool.release(connection)
  
# If you want to add or work with more customers, start with another connection
# For example: connection=pool.acquire(shardingkey=["123"]) and so on.
  
#When you are completely finished working with customers you can shut down the pool
pool.close()

This code sample shows you how to run a multi-shard query to return all customer names whose names begin with an "M".

import cx_Oracle
  
# Create an unpooled connection to the shard catalog
# In general, pooled connections should be used for all connections. This is shown here only as an example.
# The connect string connects to the shard director, but uses the catalog service, e.g. GD$catalog.oradbcloud
connection = cx_Oracle.connect("app_schema","password","db_connect_string")
  
# Open the CUSTOMERS collection
connection.autocommit = True
soda = connection.getSodaDatabase()
collection = soda.openCollection("CUSTOMERS")
  
# Now query the collection
# It is important to note that this is a query across ALL shards
# In other words, you will get ALL users whose names start with M
documents = collection.find().filter({'name': {'$like': 'M%'}}).getDocuments()
for d in documents:
    content = d.getContent()
    print(content["name"])
  
# Close the connection
connection.close()

Using a JSON Field as a Sharding Key

You can designate a JSON field to be the sharding key when creating your sharded database schema.

The examples in the topics that follow show you how to create a sharded table for the JSON collection, create the SODA mapping, trigger the sharding key column population, and access the sharded table from an application with Java and Python code samples.

Creating a Sharded Table for the JSON Collection

To create a sharded table that uses a sharding key other than the SODA ID:

Connect to the shard catalog as the Sharding/SODA user.
Enable SHARD DDL.
Create a tablespace set.
Run CREATE SHARDED TABLE, as shown in the example below.

The following examples create a sharded table (Customers) for a JSON collection of customer profile documents (CUSTPROFILE).

A column for the SODA ID (ID) identifies the JSON entries. When creating a JSON entry in the table with SODA, the application populates the ID column with a unique value.

A sharding key column (ZIP) is the ZIP code value extracted from the JSON document.

The other columns are the default column names given when SODA creates a table to hold an underlying collection. You can see this for yourself when creating a SODA collection and then examining the created table.

Note that the ID column by itself cannot be the primary key. The PK must be or must include the sharding key, in this case ZIP. In the application examples, both ID and ZIP are used to work with the sharded data. In the example above the PK consists of the sharding key and the SODA ID (ZIP, ID), because ZIP will not be a unique value by itself.

Note that in Oracle 21c, you can use either (ZIP, ID) or (ID, ZIP) as the combined Primary Key. In general, you should expect access to this table to be for these values individually, not as a combination. SODA access for these examples looking for ID and customer queries might be using the JSON field (ZIP in this case), so you will create individual indexes in any case. .

Choosing a good sharding key depends on the usage and application requirements. You can use a unique sharding key, for example a Customer ID, but in that case you could also use the SODA ID to store the sharding key.

Creating a Sharded Table: System-Managed

/* Enable shard DDL */
ALTER SESSION ENABLE SHARD DDL;

/* Create a tablespace set */
CREATE TABLESPACE SET TSP_SET_1 USING TEMPLATE
 (datafile size 100m autoextend on next 10M maxsize unlimited
 extent management local segment space management auto); 

/* Create the sharded table */
CREATE SHARDED TABLE CUSTOMERS
(
"ID" VARCHAR2(255) NOT NULL,
"CREATED_ON" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"LAST_MODIFIED" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"VERSION" varchar2(255) NOT NULL,
"ZIP" VARCHAR2(60) NOT NULL,
"CUSTPROFILE" JSON,
PRIMARY KEY (ID,ZIP))
TABLESPACE SET TSP_SET_1
PARTITION BY CONSISTENT HASH (ZIP) PARTITIONS AUTO;

Creating a Sharded Table: User-Defined

Ensure that all of the necessary tablespaces and shardspaces have been created.

/* Enable shard DDL */
ALTER SESSION ENABLE SHARD DDL;

/* Create the sharded table */
CREATE SHARDED TABLE CUSTOMERS
(
"ID" VARCHAR2(255) NOT NULL,
"CREATED_ON" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"LAST_MODIFIED" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"VERSION" varchar2(255) NOT NULL,
"ZIP" VARCHAR2(60) NOT NULL,
"CUSTPROFILE" JSON,
PRIMARY KEY (ID,ZIP))
PARTITION BY RANGE (ZIP)
(PARTITION p1 VALUES LESS THAN ('50000') TABLESPACE ts1,
PARTITION p2 VALUES LESS THAN ('99999') TABLESPACE ts2)

Creating a Mapped SODA Collection on the Sharded Table

Create a map to let SODA know which columns to use when working with the sharded table, and add the sharded table to the list of collections.

You can run a procedure to create the map, but this procedure also must be run on ALL of the shards in the sharded database. The procedure also needs to be run on any shards added in the future. You can accomplish both of these requirements using a sharding-specific PL/SQL procedure, sys.exec_shard_plsql().

To create a SODA map across all shards:

As the Sharding/SODA user and with SHARD DDL enabled, run the following commands on the shard catalog. The shard catalog propagates the procedure to all of the shards to be processed automatically.

create or replace procedure COLLECTION_PROC_CUSTOMERS AS
METADATA varchar2(8000);
COL SODA_COLLECTION_T;
begin
METADATA := '{"tableName":"CUSTOMERS",
"keyColumn":{"name":"ID"},
"contentColumn":{"name":"CUSTPROFILE","sqlType":"JSON"},
"versionColumn":{"name":"VERSION","method":"UUID"},
"lastModifiedColumn":{"name":"LAST_MODIFIED"},
"creationTimeColumn":{"name":"CREATED_ON"},
"readOnly":false}';
 -- Create a collection using "map" mode, based on
 -- the table you've created above and specified in 
 -- the custom metadata under "tableName" field.
COL := dbms_soda.create_collection('CUSTOMERS',METADATA,DBMS_SODA.CREATE_MODE_MAP);
end ;
/
 
exec sys.exec_shard_plsql('app_schema.collection_proc_customers()',4+1);

Note that the keyColumn is ID, the key used by SODA to insert and retrieve collections. There is no reference to the ZIP column because it is not used by SODA in the mapping.

At this point, a new collection has been created just as if you had run a CREATE COLLECTION command.

You can run some PL/SQL to list out the collections. On the shard catalog, run the following command, and verify that the output lists the Customers table.

SET SERVEROUTPUT ON
DECLARE
l_coll_list SODA_COLLNAME_LIST_T;
BEGIN
l_coll_list := DBMS_SODA.list_collection_names;
 
IF l_coll_list.COUNT > 0 THEN
FOR i IN 1 .. l_coll_list.COUNT LOOP
DBMS_OUTPUT.put_line(i || ' : ' || l_coll_list(i));
END LOOP;
END IF;
END;
/
1 : CUSTOMERS
 
PL/SQL procedure successfully completed.
 
SQL>

Creating a Trigger to Populate the Sharding Key

When SODA inserts or updates the document, it automatically populates the underlying table columns described in the collection metadata (that is ID, CUSTPROFILE, LAST_MODIFIED, CREATED_ON, and VERSION). However, you also need to populate the ZIP column, and the value must come from within the JSON document. This is accomplished using a trigger.

Note that this is a BEFORE trigger, which allows you to populate a column even when that column is the primary key.

Run the following statements on the shard catalog as the application schema user. The procedure sys.exec_shard_plsql ensures that it is also run on all shards and all future shards.

alter session enable shard ddl
 
create or replace procedure COLLECTION_BF_ZIP_CUSTOMERS AS
begin
EXECUTE IMMEDIATE 'alter session enable shard operations';
EXECUTE IMMEDIATE q'%
Create or Replace TRIGGER CUST_BF_TRIG
BEFORE INSERT or UPDATE on CUSTOMERS
FOR EACH ROW
begin
:new.ZIP := JSON_VALUE(:NEW.CUSTPROFILE, '$.ZIP' error on error error on empty);
end;
%';
end;
/

exec sys.exec_shard_plsql('app_schema.collection_bf_zip_customers()',4+1+2);

In the example above, ZIP is assumed to be a top-level field in the JSON document. If the value is in a nested field, for example under an ADDRESS field, you must include the field hierarchy, for example '$.ADDRESS.ZIP'.

Code Samples

The Java and Python code samples for "Using a JSON Field as the Sharding Key" demonstrate how you can actually work with JSON objects using SODA in a sharded database environment.

In these examples, you connect to a shard using the sharding key and insert a new document.

Note that when using SODA in a sharded database environment, new documents should be created by connecting to specific shards and not using the shard catalog.

Java Code Sample

The Java code sample below shows you how to insert JSON documents in a collection where the data is sharded by a JSON field, ZIP code in this example.

import java.sql.Connection;
import java.util.Properties;
import java.util.List;
 
// SODA specific imports
import oracle.soda.rdbms.OracleRDBMSClient;
import oracle.soda.OracleDatabase;
import oracle.soda.OracleCursor;
import oracle.soda.OracleCollection;
import oracle.soda.OracleDocument;
import oracle.soda.OracleException;
 
// Sharding and UCP imports
import oracle.jdbc.OracleShardingKey;
import oracle.jdbc.OracleType;
import oracle.jdbc.pool.OracleDataSource;
import oracle.ucp.jdbc.PoolDataSourceFactory;
import oracle.ucp.jdbc.PoolDataSource;
 
 
/*
* The sample demonstrates connecting to a Sharded Database using
* Oracle JDBC driver and UCP as a client side connection pool.
*/
public class QuickInsertShardJSONField {
 
 public static void main(String args[]) throws Exception {
 
// TNS_ADMIN - Should be the path where the tnsnames.ora file resides
// dbshard_rw - It is the TNS alias present in tnsnames.ora.
// Note that the connection is to the Shard Director (GSM) and the service name is the shard RW service
final String DB_URL="jdbc:oracle:thin:@dbshard_rw?TNS_ADMIN=/home/opc/dbhome/";
 
// Update the Database Username and Password to the Shard User
final String DB_USER = "app_schema";
String DB_PASSWORD = "<user_password>" ;
 
 
// Get the PoolDataSource for UCP
PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();
 
// Set the connection factory first before all other properties
pds.setConnectionFactoryClassName(OracleDataSource.class.getName());
pds.setURL(DB_URL);
pds.setUser(DB_USER);
pds.setPassword(DB_PASSWORD);
pds.setConnectionPoolName("JDBC_UCP_POOL");
 
// Default is 0. Set the initial number of connections to be created
// when UCP is started.
pds.setInitialPoolSize(10);
// Default is 0. Set the minimum number of connections
// that is maintained by UCP at runtime.
pds.setMinPoolSize(10);
// Instead of Max Pool Size, we can set the number of max connections per shard
pds.setMaxConnectionsPerShard(20);
 
// We cannot get the connection until we have the Shard key which is part of the SQL
//We first set the sharding key which in our case is the value of the ZIP code field
String shardingKeyVal="94065";
 
// Now we build the connection using this shard key
OracleShardingKey sdkey = pds.createShardingKeyBuilder().subkey(shardingKeyVal, OracleType.VARCHAR2).build();
System.out.println("Initiating UCP and Creating Connection...");
Connection conn = pds.createConnectionBuilder().shardingKey(sdkey).build();
 
// Enable the SODA Shared Metadata cache
Properties props = new Properties();
props.put("oracle.soda.sharedMetadataCache", "true");
OracleRDBMSClient cl = new OracleRDBMSClient(props);
 
// Get a DB Connection
OracleDatabase db = cl.getDatabase(conn);
 
// Print all the Collections in this DB
List<String> names =  db.admin().getCollectionNames();
for (String name : names)
 System.out.println ("Collection name: " + name);
 
// Open up the CUSTOMERS Collection
OracleCollection col = db.openCollection("CUSTOMERS");
 
//We do not provide an SODA ID column.
//This is provided by SODA when the document is created
// Note that the ZIP field MUST match what we have specified as the key
OracleDocument cDoc = db.createDocumentFromString("{\"name\": \"Matilda\", \"State\": \"CA\", \"ZIP\":\"94065\"}");
 
// Insert the document above
OracleDocument insertedDoc = col.insertAndGet(cDoc);
 
// Get the document key
String dockey = insertedDoc.getKey();
 
// Get the document back by key
// We are still connected to the same shard
OracleDocument doc = col.find().key(dockey).getOne();
String content = doc.getContentAsString();
System.out.println("Retrieved content is: " + content);
 
// We are done, so close the connection to the shard
conn.close();
 
// At this point we could open up a new shard connection using a different sharding key
 
 
 }} // End of QuickInsertShardJSONField

Python Code Sample

This code sample in Python shows how you can actually work with JSON objects using SODA in a sharded database environment.

To use this sample code in your environment, follow the instructions to install the cx_Oracle module for Python: https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html

In this example, you connect to a shard using the sharding key and insert a new document.

Note that when using SODA in a sharded database environment, new documents should be created by connecting to specific shards and not using the shard catalog.

# import the cx_Oracle module for Python
import cx_Oracle
 
# Create a connection pool that will be used for connecting to all shards
# The components of the dsn are hostname (shard director),
# port (usually 1522), global service (created using GDSCTL)
# We also enable SODA metadata caching
dsn=cx_Oracle.makedsn("shard_director_host",1522,service_name="service_name")
pool=cx_Oracle.SessionPool("app_schema","password",dsn,soda_metadata_cache=True)

# Connect to a specific shard by using the shard key, a ZIP code. which in this
# example is set explicitly as '94065', but this might be passed in or part of a loop 
# You must know beforehand whether you are creating or working with a document
# with a specific ZIP code value.
connection=pool.acquire(shardingkey=["94065"])
 
# set autocommit and open the CUSTOMERS collection
connection.autocommit = True
soda = connection.getSodaDatabase()
collection = soda.openCollection("CUSTOMERS")
 
# Insert a document
# A system generated SODA key is created by default.
content = {'name': 'Matilda', 'STATE': 'CA', 'ZIP': '94065'}
doc = collection.insertOneAndGet(content)
 
# The SODA key can now be used to work with this document directly
# We can retrieve it immediately
key = doc.key
print('The key of the new SODA document is: ', key)
 
# Fetch the document back by this same SODA key. 
# This only works because we are still connected to the same shard
doc = collection.find().key(key).getOne()
content = doc.getContent()
print('Retrieved SODA document dictionary is:')
print(content)
 

# Next, add another customer. We are in the shard containing 94065,
# so we can add a customer with the same ZIP code '94065'
content = {'name': 'Mildred', 'STATE': 'CA', 'ZIP: '94065'}
doc = collection.insertOneAndGet(content)
 
# Now do a query.
# It is important to note that this query is ONLY executed within this one shard,
# the shard which contains the part of the sharded table with 94065 ZIP codes.
# In other words, the actual query has the additional bound of customers whose
# names start with 'M' in 94065
# and any other ZIPs stored on this shard. This is unlikely to be a useful query
# for system-managed sharding.
documents = collection.find().filter({'name': {'$like': 'M%'}}).getDocuments()
for d in documents:
    content = d.getContent()
    print(content["name"])
 
# After you have finished, release this connection back into the pool
pool.release(connection)
 
# If you want to add or work with more customers with a different
# shard key start with another connection
# For example: connection=pool.acquire(shardingkey=["10012"]) and so on.
 
# When you are completely finished working with customers, shut down the pool.
pool.close()

This code sample shows you how to run a multi-shard query to return all customer names in all shards whose names begin with an "M".

import cx_Oracle
 
# Create an unpooled connection to the shard catalog
# The connect string connects to the shard director, but uses the catalog service,
# e.g. GD$catalog.oradbcloud
connection = cx_Oracle.connect("app_schema","password","db_connect_string")

# Open the CUSTOMERS collection
connection.autocommit = True
soda = connection.getSodaDatabase()
collection = soda.openCollection("CUSTOMERS")
  
# Now query the collection
# It is important to note that this is a query across ALL shards
# In other words, you will get ALL users whose name starts with M across ALL Zip codes
documents = collection.find().filter({'name': {'$like': 'M%'}}).getDocuments()
for d in documents:
    content = d.getContent()
    print(content["name"])
  
#Close the connection
connection.close()

Additional Information About Sharding with SODA

Performance Tuning

Metadata and Statement Caching

For all implementations, statement caching should be turned on the connection pool. This avoids unnecessary round trips to the database.

To turn on SODA metadata caching:

In Java:

Properties props = new Properties();
props.put("oracle.soda.sharedMetadataCache", "true");
OracleRDBMSClient cl = new OracleRDBMSClient(props);

More information is available at SODA Collection Metadata Caching.

In Python:

# Create the session pool
pool = cx_Oracle.SessionPool(user="hr", password=userpwd,
               dsn="dbhost.example.com/orclpdb1",soda_metadata_cache=True)

More information is available at Using the SODA Metadata Cache

Threading

For optimal use of resources, an instantiation of OracleClient is only required once as it is shared among threads.

The objects obtained from it, such as OracleDatabase and consequently OracleCollection are not thread-safe and do need to be instantiated when creating new requests.

Index Creation and Management

Oracle Sharding requires that the shard key be part of the Primary Key. There are no restrictions on creating additional indexes.

All of the guidelines provided by the SODA documentation on creating and managing indexes continue to apply.

Scaling Out Shards

When adding a new shard to the database configuration, all of the DDL, including the SODA metadata and triggers, are automatically available on the new shard.

No extra configuration is required for SODA/JSON Sharding.