Table Design

Learn how to design and configure tables in Oracle NoSQL Database Cloud Service.

Table

A table is a collection of rows, where each row holds a data record. Each table row consists of key and data fields, which are defined when a table is created. In addition, a table has a specified storage, can support a defined maximum read and write throughput, and has a maximum size.

Learn about the following concepts before designing a table in Oracle NoSQL Database Cloud Service:

  • Table Fields: Tables are created using DDL (Data Definition Language) that defines the data types and primary keys used for the table. Oracle NoSQL Database Cloud Service supports multiple data types, including several numeric types, string, binary, timestamp, maps, arrays, records, and a special JSON type which can hold valid JSON data. An application can choose data types for the table fields based on the data model. See Supported Data Types for a complete list of data types supported in Oracle NoSQL Database Cloud Service.

    See Table Fields to learn more about table fields.

  • Primary and Shard Keys: Every table must have one or more fields designated as the primary key. This designation occurs at the time of table creation, and cannot be changed after the table is created. A primary key uniquely identifies every row in the table. In the simplest case, a primary key is used to retrieve a specific row to examine or modify.

    Shard keys identify which primary key fields are meaningful in terms of shard storage. That is, rows which contain the same values for all the shard keys are guaranteed to be stored on the same shard. This shard storage matters for some operations that promise atomicity of the results.

    See Primary Keys and Shard Keys to learn more.

  • Indexes: Indexes represent an alternative way of retrieving table rows. Normally you retrieve table rows using the primary key. By creating an index, you can more efficiently retrieve rows based on fields that are not part of the primary key. Indexes provide more querying capability, but consume both storage and throughput resources.

    See Creating Tables and Indexes to learn how to create an index.

  • Capacity: When you create a table, you also specify throughput and storage resources available for the table. Read and write operations to the table are limited by the read and write throughput capacity that you define. The amount of space that the table can use is limited by the storage capacity.

    See Estimating Capacity for information on how to estimate the capacity that you should specify for your table.

    See Handling Capacity to learn how to handle the capacity of your table.

  • Time to Live (TTL): Time to Live allows you to automatically expire table rows. TTL is expressed as the time data is allowed to live in the table. Data which has reached the expiration timeout value can no longer be retrieved, and does not appear in any read operations.

    See Time to Live to learn more.

  • Identity Columns: Identity columns are a special type of columns that get their values automatically assigned by Oracle NoSQL Database Cloud Service. These values are generated from a sequence generator. See Identity Column from SQL Reference for Oracle NoSQL Database to learn more.
  • Table Life cycles: When tables are created, modified, or deleted, they transition through different table life cycle states.

    See Table States and Life Cycles to learn about the life cycle of a table.

Supported Data Types

Oracle NoSQL Database Cloud Service supports many common data types.

Data Type Description

BINARY

A sequence of zero or more bytes.

FIXED_BINARY A fixed-size byte array.

BOOLEAN

A data type with one of two possible values: TRUE or FALSE.

DOUBLE

A 64 bit (8 bytes) long floating-point number.

LONG

A 64-bit long integer number.

INTEGER

A 32 bit (4 bytes) long integer number.

STRING

A sequence of Unicode characters.

NUMBER

An arbitrary-precision signed decimal number.

TIMESTAMP

A point in time with a precision. The precision affects the storage size and usage. Timestamp is stored and managed in UTC (Coordinated Universal Time).

ENUM

An enumeration, represented as an array of strings. ENUM values are symbolic identifiers (tokens) and are stored as a small integer value representing an ordered position in the enumeration.

ARRAY

An ordered collection of zero of more typed items. Arrays that are not defined as JSON cannot contain NULL values.

Arrays declared as JSON can contain any valid JSON, including the special value, null, which is relevant to JSON.

MAP

An unordered collection of zero or more key-item pairs, where all keys are strings and all items are the same type. All keys must be unique. The key-item pairs are called fields, the keys are field names, and the associated items are field values. Field values can have different types, but maps cannot contain NULL field values.

RECORD

A fixed collection of one or more key-item pairs, where all keys are strings. All keys in a record must be unique.

JSON

Any valid JSON data.

Table Fields

Learn how to design and configure data using table fields.

An application may choose to use schemaless tables, where a row consists of key fields and a single JSON data field. A schemaless table offers flexibility in what can be stored in a row.

Alternatively, the application can choose to use fixed schema tables, where all of the table fields are defined as specific types.

Fixed schema tables with typed data are safer to use from an enforcement and storage efficiency standpoint. Even though the schema of fixed schema tables can be modified, their table structure cannot easily be changed. A schemaless table is flexible and the table structure can be easily modified.

Finally, an application can also use a hybrid data model approach where a table can have typed data and JSON data fields.

The following examples demonstrate how to design and configure data for all three approaches.

Example 1: Designing a Schemaless Table

You have multiple options to store information about browsing patterns in your table. One option is to define a table that uses a cookie ID as a key and keeps audience segmentation data as a single JSON field.

// schema less, data is stored in a JSON field
CREATE TABLE audience_info (
       cookie_id LONG,
       audience_data JSON,
       PRIMARY KEY(cookie_id))

In this case, the audience_info table can hold a JSON object such as:

{
  "cookie_id": "",
  "audience_data": {
    "ipaddr" : "10.0.00.xxx",
    "audience_segment: {
       "sports_lover" : "2018-11-30",
       "book_reader" :  "2018-12-01"
    }
  }
}

Your application will have a key field and a data field for this table. You have flexibility in what you chose to store as information in your audience_data field. Therefore, you can easily change the types of information available.

Example 2: Designing a Fixed Schema Table

You can store information about browsing patterns by creating your table with more explicitly declared fields:

// fixed schema, data is stored in typed fields.
CREATE TABLE audience_info(
       cookie_id LONG,
       ipaddr STRING,
       audience_segment RECORD(sports_lover TIMESTAMP(9),
                               book_reader TIMESTAMP(9)),
       PRIMARY KEY(cookie_id));

In this example, your table has a key field and two data fields. Your data is more compact, and you are able to ensure that all data fields are accurate.

Example 3: Designing a Hybrid Table

You can store information about browsing patterns by using both typed and JSON data fields in your table.

// mixed, data is stored in both typed and JSON fields.
CREATE TABLE audience_info (
       cookie_id LONG,
       ipaddr STRING,
       audience_segment JSON,
       PRIMARY KEY(cookie_id));

Primary Keys and Shard Keys

Learn the purpose of primary keys and shard keys while designing your application.

Primary keys and shard keys are important elements in your schema and help you access and distribute data efficiently. You create primary keys and shard keys only when you create a table. They remain in place for the life of the table, and cannot be changed or dropped.

Primary Keys

You must designate one or more primary key columns when you create your table. A primary key uniquely identifies every row in the table. For simple CRUD operations, Oracle NoSQL Database Cloud Service uses the primary key to retrieve a specific row to read or modify. For example, consider a table has the following fields:

  • productName

  • productType

  • productLine

From experience, you know that the product name is important as well as unique to each row, so you set the productName as the primary key. Then, you retrieve rows of interest based on the productName. In such a case, use a statement like this, to define the table.

/* Create a new table called users. */
CREATE TABLE if not exists myProducts 
(
  productName STRING,
  productType STRING,
  productLine INTEGER,
  PRIMARY KEY (productName)
)";

Shard Keys

The main purpose of shard keys is to distribute data across the Oracle NoSQL Database Cloud Service cluster for increased efficiency, and to position records that share the shard key locally for easy reference and access. Records that share the shard key are stored in the same physical location and can be accessed atomically and efficiently.

Your Primary and shard key design has implications on scaling and achieving provisioned throughput. For example, when records share shard keys, you can delete multiple table rows in an atomic operation, or retrieve a subset of rows in your table in a single atomic operation. In addition to enabling scalability, well-designed shard keys can improve performance by requiring fewer cycles to put data to, or get data from, a single shard.

For example, suppose that you designate three primary key fields:

PRIMARY KEY (productName, productType, productLine)

Because you know that your application frequently makes queries using the productName and productType columns, specifying those fields as shard keys is appropriate. The shard key designation guarantees that all rows for these two columns are stored on the same shard. If these two fields are not shard keys, the most frequently queried columns could be stored on any shard. Then, locating all rows for both fields requires scanning all data storage, rather than one shard.

Shard keys designate storage on the same shard to facilitate efficient queries for key values. However, because you want your data be distributed across the shards for best performance, you must avoid shard keys that have few unique values.

Note:

If you do not designate shard keys when creating a table, Oracle NoSQL Database Cloud Service uses the primary keys for shard organization.

Important factors to consider when choosing a shard key

  • Cardinality: Low cardinality fields, such as a user home country, group records together on a few shards. In turn, those shards require frequent data rebalancing, increasing the likelihood of hot shard issues. Instead, each shard key should have high cardinality, where the shard key can express an even slice of records in the data set. For example, identity numbers such as customerID, userID, or productID are good candidates for a shard key.

  • Atomicity: Only objects that share the shard key can participate in a transaction. If you require ACID transactions that span multiple records, choose a shard key that lets you meet that requirement.

What are the best practices to follow?

  • Uniform distribution of shard keys: When shard keys are uniformly distributed, no single shard limits the capacity of the system.

  • Query Isolation: Queries should be targeted to a specific shard to maximize efficiency and performance. If queries are not isolated to a single shard, the query is applied to all shards which is less efficient and increases query latency.

See Creating Tables and Indexes to learn how to assign primary and shard keys using the TableRequest object.

Time to Live

Learn how to specify expiration times for tables and rows using the Time-to-Live (TTL) feature.

Many applications handle data that has a limited useful lifetime. Time-to-Live (TTL) is a mechanism that allows you to set a time frame on table rows, after which the rows expire automatically, and are no longer available. It is the amount of time data is allowed to remain in the Oracle NoSQL Database Cloud Service. Data that reaches expiration time can no longer be retrieved, and does not appear in any storage statistics.

By default, every table that you create has a TTL value of zero, indicating no expiration time. You can declare a TTL value when you create a table, specifying the TTL with a number, followed by either HOURS or DAYS. Table rows inherit the TTL value of the table in which they reside, unless you explicitly set a TTL value for table rows. Setting a row's TTL value overrides the table's TTL value. If you change the table's TTL value after the row has a TTL value, the row's TTL value persists.

You can update the TTL value for a table row at any time before the row reaches the expiration time. Expired data can no longer be accessed. Therefore, using TTL values is more efficient than manually deleting rows, because the overhead of writing a database log entry for the data deletion is avoided. Expired data is purged from the disk after expiration date.

Table States and Life Cycles

Learn about the different table states and their significance (table life cycle process).

Each table passes through a series of different states from table creation to deletion (drop). For example, a table in the DROPPING state cannot proceed to the ACTIVE state, while a table in the ACTIVE state can change to the UPDATING state. You can track the different table states by monitoring the table life cycle. This section describes the various table states.

Description of table-state.png follows
Description of the illustration table-state.png

Table State Description

CREATING

The table is in the process of being created. It is not ready to use.

UPDATING

Updating the table is in process. Further table modifications are not possible while the table is in this state.

A table is in the UPDATING state when:

  • The table limits are being changed
  • The table schema is evolving
  • Adding or dropping a table index

ACTIVE

The table can be used in the current state. The table may have been recently created, or modified, but the table state is now stable.

DROPPING

The table is being dropped and cannot be accessed for any purpose.

DROPPED

The table has been dropped and no longer exists for read, write, or query activities.

Note:

Once dropped, a table with the same name can be created again.