Schema Flexibility in Oracle NoSQL Database

Unlike the relational database world with purely fixed schemas, NoSQL Database is largely about schema flexibility – that is the ability to easily change how data is organized and stored.

Schema flexibility in Oracle NoSQL Database mostly takes the form of non-scalar data types. These non-scalar data types can be used to embed flexible structures inside your tables.

Non-scalar data types:

Oracle NoSQL database supports the following non-scalar data types:
  • JSON – JSON is a map of key/value pairs that can be used as a datatype of a column in the Oracle NoSQL Database. The JSON datatype gives you the ability to dynamically read and write attributes having no prior knowledge of what is and what is not stored in the JSON document. You can introspect into the document by reading from Oracle NoSQL Database as a JSON string, or you can specify path expressions as deep as you like into a hierarchy of JSON. As an example, you can create a JSON document that represents the variable terms and conditions of a contract. The document attribute names (or keys) can represent the tag or name of the contractual term or condition and the value of the attribute can represent the text of that term. Using a JSON brings you ultimate flexibility in your data model.
  • Records – Records containing scalar or non-scalar values can be used as a datatype for a column in an Oracle NoSQL Database table. You can think of a record as a document with a fixed set of attributes, however noting that one or more attributes of the record can be a non-fixed array or JSON document, giving you the flexibility to extend a fixed document without modifying the schema. Records present an interesting intermediate step between the benefits of the fixed schema world (single copy of a schema) and the ultimate flexibility of the JSON world.
  • Arrays – Arrays of scalar or non-scalar values can be used as a datatype for a column in an Oracle NoSQL Database table. Arrays can be convenient for storing a collection of event values. For example, you may wish to collect a list of behavioral segments for users as they browse web pages.

Trade-offs while using Flexible Schema:

Some guidelines that you can follow while considering flexible schema are listed below.

The Flexibility/Cost of scale Tradeoff :

When thinking about how flexible you want your schema to be, it’s important to understand that the more flexible you make your schema, the bigger the challenge is for scaling your solution. For example, let’s say that you are storing information on user behavior. And you want to store this information as the users access your website. You can implement one of the two options here. You can choose to model the solution using fixed columns for the required user attributes that you will need to track. Alternatively, you can choose to model this using a JSON document, giving you the flexibility to add and remove attributes for users without having to evolve your schema.

The second option may work quite well for small numbers of users; however, if you will need to scale this solution to large numbers of users, then you need extra storage. You also need additional compute overhead for processing the key/value pairs (attribute names and their values) in the JSON document. This could make the cost of scaling your solution prohibitive. Extra storage is needed to store the metadata along with the data (e.g. the attribute names) and extra compute is needed to serialize and de-serialize these documents. If you are using a replication factor of more than one, then that adds additional overhead for each tracked user. If a large scale is a major requirement for you then you’ll most likely want to trade off flexibility for storage efficiency and consider using a more fixed schema.

The Flexibility/Latency Tradeoff :

In many NoSQL applications, low latency data access is a key requirement. In these situations, it’s important to understand the potential tradeoff with respect to the I/O latency of using one data modeling method over another. In this respect, using a non-scalar data type such as a record, array, or JSON document will entail a read followed by an update.

For example, when you add a new value to an array your application must read the record from NoSQL Database first, add the value to the array, and then write it back to NoSQL Database. Even if performing this operation using the SQL UPDATE operator of Oracle NoSQL Database (which executes in the replication node), the record must still be read from persistent storage, de-serialized, modified, serialized, and written back. On a system with a spinning disk, this could cost anywhere from fifteen to thirty milliseconds (or more). For certain applications like online advertising, this may be beyond the latency SLA that can be tolerated. If you are faced with similar stringent latency SLAs then you should consider favoring a child table approach which eliminates the read and will allow you to simply perform a write of the new value. Of course, the tradeoff here is one of flexibility for low latency.

For more information on when to use parent-child tables, see Using Parent-Child tables in Oracle NoSQL Database.

Updates to Non-Scalars versus Inserts :

The Oracle NoSQL Database storage engine is based on an append-only architecture, also known as log-structured storage. Log structured storage systems perform extremely well for insert operations where the new records to be inserted are simply appended to the end of the log. Update operations involve appending the updated record to the log and then marking the old record for deletion. Records marked for deletion are regularly cleaned from NoSQL Database’s logs to free up disk space by a background process called the cleaner. Although the cleaner is highly optimized, it will add some CPU and I/O overhead to the replication node. The more updates performed by your application, the more log cleaning activity there will be.

As a guideline, if you have extreme performance goals for your application (or for a specific table), you should strongly consider trying to craft your data model by using parent/child tables versus non-scalar columns, giving you the potential for replacing updates with inserts. For more information on when to use parent-child tables, see Using Parent-Child tables in Oracle NoSQL Database.

Static Vs Dynamic Data :

In many applications, it’s possible to identify portions of the data that are somewhat static and change relatively slowly, and other portions of the data which are highly dynamic and change frequently, even at millisecond granularity. For example, in online advertising, campaigns are a relatively slow-moving piece of data while the budget spent (impressions or clicks delivered) can change every few milliseconds as millions of users load web pages that have ads associated with the campaign. The data pertaining to budgets is a case of highly dynamic data. This is an example of a scenario that has a high velocity of write operations. Oracle NoSQL database is a log-structured, append-only storage architecture, where inserts are more optimal than an update operation.

For the more static portions of your data, the flexibility of the non-scalar datatypes may be an attractive option for your application. Using a JSON document could provide an extensible way for your application to interact with this data without undue sacrifice to performance. On the flip side, for data that is changing rapidly or being inserted rapidly, you’ll want to consider trading off flexibility for this data and use a parent table with a fixed schema and a child table with a fixed schema. Whether or not you choose to model the rapidly changing data as a parent or child table will depend largely on how you wish to access it. For more information on when to use parent-child tables, see Using Parent-Child tables in Oracle NoSQL Database.

JSON Collection Tables:

Oracle NoSQL Database supports JSON Collection tables. This is particularly useful for applications that store and retrieve their data purely as documents. Such tables contain primary key fields and a document. The schema of the JSON Collection table can't be altered to add typed fields. JSON Collection tables are created to simplify the management and manipulation of documents. The JSON collection table eliminates the need to declare fields as type JSON during table creation. When you insert data into the table, each row is inserted as a single document containing any number of JSON fields. You can add JSON fields through INSERT or UPSERT operations.