Chapter 3. Record Design Considerations

Table of Contents

Keys
What is a Key Component?
Values

Oracle NoSQL Database KVStores offer storage of key-value pairs. Each such pair can be thought of as a single record in a database, where the key is used to locate the value. Both the key and the value are application-defined, given some loose restrictions imposed by Oracle NoSQL Database.

Every key in the KVStore is a list of strings. All keys must have one or more major components. Keys can also optionally have one or more minor components.

The value portion of the record can be simply a byte array, or it can use Avro to identify its schema. (See Avro Schemas for more information.) The value portion can be as simple or complex as you want it to be.

Note

Avro is deprecated. If you want a fixed schema to define the value portion of a record, it is better to use the Table API. That API offers advantages that the Key/Value API with Avro does not, such as secondary indexes.

As a very simple example, suppose you wanted your store to contain information about people. You might then decide to do this:

This is a very simple example of what you might choose to store in Oracle NoSQL Database. However, from a performance point of view, this example might not be the best way for you to organize your data. How you design both your keys and your values can have important performance implications.

The remainder of this chapter describes the performance issues surrounding Oracle NoSQL Database schema design.

Keys

Oracle NoSQL Database organizes records using keys. All records have one or more major key components and, optionally, one or more minor key components. If minor key components are in use, the combination of the major and minor components uniquely identifies a single record in the store.

Keys are distributed evenly across partitions by hashing the key's major component(s). Every key must have at least one major component, but you can optionally use a list of major components. Because only the major components are hashed, records that share the same combination of major key components are guaranteed to be in the same partition, which means they can be queried efficiently. In addition, multiple operations on records with identical major key components can be executed together as a single atomic unit.
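The placement rule described above can be sketched in a few lines of Java. This is a minimal illustration only: the class name PartitionSketch, the partition count of 10, and the use of List.hashCode() as the hash are all assumptions made for the example, and none of them reflects the hash function or configuration the store actually uses. The point it shows is that the minor components play no part in choosing the partition.

```java
import java.util.List;

/** Illustrative only: not the hash function or partition count the store uses. */
final class PartitionSketch {
    // Hypothetical partition count chosen for the example.
    static final int NUM_PARTITIONS = 10;

    /** Maps a key to a partition number using only its major components. */
    static int partitionFor(List<String> majorPath, List<String> minorPath) {
        // The minor path is deliberately ignored when choosing a partition.
        return Math.floorMod(majorPath.hashCode(), NUM_PARTITIONS);
    }

    public static void main(String[] args) {
        List<String> major = List.of("Smith", "Bob");
        int p1 = partitionFor(major, List.of("birthdate"));
        int p2 = partitionFor(major, List.of("phonenumber"));
        // Same major path, so both records map to the same partition.
        System.out.println(p1 == p2); // prints true
    }
}
```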

Remember that major key components are used to identify which partition contains a record, and that every partition is stored in a single shard. This means that major key components are used to identify which shard stores a given record. The combination of the major key components, plus the data access operation that you want performed, is used to identify which node within the shard will service the request. Be aware that you cannot control which physical machine, or even which shard, will be used to store any given piece of data. That is all decided for you by the KV driver.

However, the fact that records are placed on the same physical node based on their major key components means that keys which share major key components can be queried efficiently in a single operation. This is because, conceptually, you are operating on a single physical database when you operate on keys stored together in a single partition. (In reality, a single shard uses multiple physical databases, but that level of complexity is hidden from you when interacting with the store.)

Remember that every partition is placed in a single shard, and that your store will have multiple shards. This is good, because it improves both read and write throughput. But in order to take full advantage of that performance enhancement, you need at least as many different major key components as you have partitions. In other words, do not create all your records under a single major key component, or even under a small number of major key components, because doing so will create performance bottlenecks as the number of records in your store grows large.
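To make the bottleneck concrete, the following sketch counts how many partitions receive at least one record for a given set of major key components. It is a rough model under stated assumptions: the class name KeySpreadSketch, the partition count, the sample key names, and the use of String.hashCode() as a stand-in hash are all hypothetical and are not taken from the store's implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: models key spread with a stand-in hash function. */
final class KeySpreadSketch {
    /** Counts how many partitions receive at least one record. */
    static int partitionsUsed(List<String> majorComponents, int numPartitions) {
        Set<Integer> used = new HashSet<>();
        for (String major : majorComponents) {
            // Stand-in hash; the store's real hash function is not described here.
            used.add(Math.floorMod(major.hashCode(), numPartitions));
        }
        return used.size();
    }

    public static void main(String[] args) {
        int numPartitions = 10; // hypothetical partition count

        // Every record filed under one major key component.
        List<String> single = List.of("people", "people", "people");

        // One distinct major key component per record.
        List<String> many = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            many.add("user" + i);
        }

        System.out.println(partitionsUsed(single, numPartitions)); // prints 1
        System.out.println(partitionsUsed(many, numPartitions));
    }
}
```

With a single major key component, only one partition is ever used no matter how many records are written, which is the situation the paragraph above warns against.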

Minor key components also offer performance improvements if used correctly, but in order to understand how, you first need to understand the performance issues surrounding the value portion of your records. We will discuss those issues a little later in this chapter.

What is a Key Component?

A key component is a Java String. Issues of comparison can be answered by examining how Java Strings are compared using your preferred encoding.
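As a small illustration of how Java compares Strings, the sketch below sorts a few sample components with the natural String ordering (String.compareTo). The class name ComponentOrderSketch is hypothetical. The example shows only Java's own comparison rules; this section does not state that the store orders stored keys in exactly this way, so treat that as an assumption to verify.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only: demonstrates Java's natural String ordering. */
final class ComponentOrderSketch {
    /** Returns a copy of the components sorted by String.compareTo. */
    static List<String> sorted(List<String> components) {
        List<String> copy = new ArrayList<>(components);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Uppercase letters sort before lowercase letters.
        System.out.println(sorted(List.of("Wong", "Smith", "smith"))); // [Smith, Wong, smith]
        // Digits are compared character by character, not numerically.
        System.out.println(sorted(List.of("9", "10", "2")));           // [10, 2, 9]
    }
}
```

The second result is the usual reason for zero-padding numeric components (for example "002", "009", "010") when the order of keys matters to the application.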

Because it is a String, a key component can be anything you want it to be. Typically, some naming scheme is adopted for the application so as to logically organize records.

It helps to think of key components as being locations in a file system path. You can write out a record's components as if they were a file system path delimited by a forward slash ("/"). For example, suppose you used multiple major components to identify a record, and one such record used the following major components: "Smith" and "Bob". Another record might use "Smith" and "Patricia". And a third might use "Wong" and "Bill". Then the major components for those records could be written as:

/Smith/Bob
/Smith/Patricia
/Wong/Bill 

Further, suppose you had different kinds of information about each user that you want to store. Then the different types of information could be further identified using minor components such as "birthdate", "image", "phonenumber", "userID", and so forth. The minor components of a key are separated from the major components by a special slash-hyphen-slash delimiter (/-/).
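The written form of a key can be sketched as a small helper that joins the major components, adds the delimiter, and then joins the minor components. The class KeyPathSketch and its toPath method are hypothetical names introduced for this illustration; they are not part of the product's Java driver, and the sketch makes no attempt to escape components that themselves contain a slash.

```java
import java.util.List;

/** Illustrative only: renders key components in the written path form. */
final class KeyPathSketch {
    /** Renders major and minor components as a slash-delimited path. */
    static String toPath(List<String> major, List<String> minor) {
        if (major.isEmpty()) {
            throw new IllegalArgumentException("at least one major component is required");
        }
        StringBuilder sb = new StringBuilder();
        for (String component : major) {
            sb.append('/').append(component);
        }
        if (!minor.isEmpty()) {
            // "/-" followed by "/component" yields the "/-/" delimiter.
            sb.append("/-");
            for (String component : minor) {
                sb.append('/').append(component);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPath(List.of("Smith", "Bob"), List.of()));            // /Smith/Bob
        System.out.println(toPath(List.of("Smith", "Bob"), List.of("birthdate"))); // /Smith/Bob/-/birthdate
    }
}
```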

By separating keys into major and minor key components, we could potentially store and operate upon the following records. Those that share the same complete set of major components (for example, every record under /Smith/Bob) can be operated upon in a single atomic operation:

/Smith/Bob/-/birthdate
/Smith/Bob/-/phonenumber
/Smith/Bob/-/image
/Smith/Bob/-/userID 
/Smith/Patricia/-/birthdate
/Smith/Patricia/-/phonenumber
/Smith/Patricia/-/image
/Smith/Patricia/-/userID 
/Wong/Bill/-/birthdate
/Wong/Bill/-/phonenumber
/Wong/Bill/-/image
/Wong/Bill/-/userID
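To see which of the keys listed above could take part in the same atomic operation, the following sketch groups written key paths by everything that precedes the /-/ delimiter. The class MajorPathGroupingSketch and its methods are hypothetical helpers for illustration; they operate on plain strings and do not call the store.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: groups written key paths by their major path. */
final class MajorPathGroupingSketch {
    /** Returns the part of a written key path that precedes the "/-/" delimiter. */
    static String majorPathOf(String keyPath) {
        int i = keyPath.indexOf("/-/");
        return i < 0 ? keyPath : keyPath.substring(0, i);
    }

    /** Groups key paths by major path, preserving first-seen order. */
    static Map<String, List<String>> groupByMajorPath(List<String> keyPaths) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String keyPath : keyPaths) {
            groups.computeIfAbsent(majorPathOf(keyPath), m -> new ArrayList<>()).add(keyPath);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
            "/Smith/Bob/-/birthdate", "/Smith/Bob/-/image",
            "/Smith/Patricia/-/birthdate", "/Wong/Bill/-/userID");
        // Each printed line is one candidate group for a single atomic operation.
        groupByMajorPath(keys).forEach((major, members) ->
            System.out.println(major + " -> " + members));
    }
}
```

Note that /Smith/Bob and /Smith/Patricia land in different groups even though they share the component "Smith", because the complete major path must match.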

Note that the above keys might not represent the most efficient way to organize your data. We discuss this issue in the next section.