Primary keys and shard keys are important concepts for your table design. Your choice of primary and shard keys determines how you can read multiple rows at a time. Beyond that, your key design has important performance implications.
Every table must have one or more fields designated as the primary key. This designation occurs at the time that the table is created, and cannot be changed after the fact. A table's primary key uniquely identifies every row in the table. In the simplest case, it is used to retrieve a specific row so that it can be examined and/or modified.
For example, a table might have five fields: productName, productType, color, size, and inventoryCount. To retrieve individual rows from the table, it might be enough to just know the product's name. In this case, you would set the primary key field as productName and then retrieve rows based on the product name that you want to examine/manipulate.
In this case, the CLI script that you would use to create this table might be:
    ## Enter into table creation mode
    table create -name myProducts
    ## Now add the fields
    add-field -type STRING -name productName
    add-field -type STRING -name productType
    add-field -type ENUM -name color -enum-values blue,green,red
    add-field -type ENUM -name size -enum-values small,medium,large
    add-field -type INTEGER -name inventoryCount
    ## A primary key must be defined for every table
    ## Here, we will define field 'productName' as the primary key.
    primary-key -field productName
    ## Exit table creation mode
    exit
    ## Add the table to the store. Use the -wait flag to
    ## force the script to wait for the plan to complete
    ## before doing anything else.
    plan add-table -name myProducts -wait
However, you can use multiple fields for your primary key. On a functional level, doing this allows you to delete multiple rows in your table in a single atomic operation. In addition, a multi-field primary key allows you to retrieve a subset of the rows in your table in a single atomic operation.
We describe how to retrieve multiple rows from your table in Reading Table Rows. We show how to delete multiple rows at a time in Using multiDelete().
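To make the multi-row behavior concrete, here is a minimal Python sketch. It is an in-memory model only, not the store's API: the rows dictionary, the select_rows and delete_rows helpers, and the sample data are all hypothetical. It assumes a table whose primary key is (productName, productType) and shows how supplying only the leading key field selects, or deletes, a group of rows at once.

```python
# In-memory model only; these names are illustrative, not store API calls.
# Each row is stored under its full primary key: (productName, productType).
rows = {
    ("Hat", "Beanie"): {"color": "blue", "inventoryCount": 12},
    ("Hat", "Fedora"): {"color": "red", "inventoryCount": 3},
    ("Shirt", "Polo"): {"color": "green", "inventoryCount": 7},
}


def select_rows(table, leading_values):
    """Return every row whose primary key starts with leading_values."""
    n = len(leading_values)
    prefix = tuple(leading_values)
    return {key: row for key, row in table.items() if key[:n] == prefix}


def delete_rows(table, leading_values):
    """Delete every row whose primary key starts with leading_values.

    Returns the number of rows removed.
    """
    doomed = list(select_rows(table, leading_values))
    for key in doomed:
        del table[key]
    return len(doomed)


hats = select_rows(rows, ["Hat"])
print(sorted(hats))           # [('Hat', 'Beanie'), ('Hat', 'Fedora')]

deleted = delete_rows(rows, ["Hat"])
print(deleted, sorted(rows))  # 2 [('Shirt', 'Polo')]
```

With a single-field primary key, every lookup names exactly one row, so there is no group to select. This model says nothing about atomicity, which is a guarantee provided by the store itself.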
Fields can be designated as primary keys only if they are declared to be one of the following types:
Integer
Long
Float
Double
String
Enum
Some of the methods you use to perform multi-row operations allow, or even require, a partial primary key. A partial primary key is, simply, a key where only some of the fields comprising the row's primary key are specified.
For example, the following script specifies three fields for the table's primary key:
    ## Enter into table creation mode
    table create -name myProducts
    ## Now add the fields
    add-field -type STRING -name productName
    add-field -type STRING -name productType
    add-field -type STRING -name productClass
    add-field -type ENUM -name color -enum-values blue,green,red
    add-field -type ENUM -name size -enum-values small,medium,large
    add-field -type INTEGER -name inventoryCount
    ## A primary key must be defined for every table
    primary-key -field productName -field productType -field productClass
    ## Exit table creation mode
    exit
    ## Add the table to the store. Use the -wait flag to
    ## force the script to wait for the plan to complete
    ## before doing anything else.
    plan add-table -name myProducts -wait
In this case, a full primary key would be one where you provide values for all three primary key fields: productName, productType, and productClass. A partial primary key would be one where you provide values for only one or two of those fields.
Note that order matters when specifying a partial key. The partial key must be a subset of the full key, starting with the first field specified and then adding fields in order. So the following partial keys are valid:
    productName
    productName, productType
But a partial key composed of productType and productClass is not valid, because it skips productName.
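The ordering rule can be expressed as a simple prefix check. The following Python sketch is an illustration only and is not part of the product; the function name is_leading_subset is hypothetical. It assumes the three-field primary key defined above and treats an empty field list as invalid, which is a simplification made for this sketch. Note that the full key also passes the check, since it is a leading run of itself.

```python
# Illustrative model of the ordering rule; not part of any store API.
PRIMARY_KEY = ["productName", "productType", "productClass"]


def is_leading_subset(fields, primary_key=PRIMARY_KEY):
    """True if fields is a non-empty leading run of primary_key, in order."""
    fields = list(fields)
    if not fields or len(fields) > len(primary_key):
        return False
    return fields == primary_key[:len(fields)]


print(is_leading_subset(["productName"]))                  # True
print(is_leading_subset(["productName", "productType"]))   # True
print(is_leading_subset(["productType", "productClass"]))  # False
```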
Shard keys identify which primary key fields are meaningful in terms of shard storage. That is, rows which contain the same values for all the shard key fields are guaranteed to be stored on the same shard. This matters for some operations that promise atomicity of the results. (See Executing a Sequence of Operations for more information.)
For example, suppose you set the following primary key:
primary-key -field productType -field productName -field productClass
You can guarantee that rows are placed on the same shard using the values set for the productType and productName fields like this:
shard-key -field productType -field productName
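The effect of this shard key can be pictured with a toy placement model in Python. Everything here is an assumption made for illustration: the shard count, the use of SHA-256, and the shard_for helper are hypothetical and do not describe how the store actually assigns rows to shards. The only point is that placement depends on the shard key values and nothing else, so two rows that differ only in productClass land on the same shard.

```python
import hashlib

# Toy placement model. The hash function and shard count are assumptions,
# not the store's real placement algorithm.
NUM_SHARDS = 3
SHARD_KEY = ["productType", "productName"]


def shard_for(row, shard_key=SHARD_KEY, num_shards=NUM_SHARDS):
    """Pick a shard from the shard key values only; other fields are ignored."""
    material = "\x00".join(str(row[field]) for field in shard_key)
    digest = hashlib.sha256(material.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


a = {"productType": "Hat", "productName": "Beanie", "productClass": "Winter"}
b = {"productType": "Hat", "productName": "Beanie", "productClass": "Summer"}

# Same shard key values, so the model places both rows on the same shard.
print(shard_for(a) == shard_for(b))  # True
```

Rows with different shard key values may or may not share a shard in this model; only rows with identical shard key values are guaranteed to.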
Shard key fields must be a first-to-last subset of the primary key fields: they must begin with the first primary key field and be specified in the same order as the primary key fields. For the primary key defined above, the following would result in an error:
shard-key -field productClass
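As a rough illustration of why that definition is rejected, the Python sketch below checks a proposed shard key against the primary key from this example. It is a hypothetical helper written for this explanation, not the validation code used by the CLI, and the error messages are invented.

```python
# Illustrative check only; not part of the CLI or any store API.
PRIMARY_KEY = ["productType", "productName", "productClass"]


def validate_shard_key(shard_key, primary_key=PRIMARY_KEY):
    """Return shard_key as a list, or raise ValueError if it is not a
    leading run of primary_key in the same order."""
    shard_key = list(shard_key)
    if not shard_key:
        raise ValueError("shard key must name at least one field")
    if shard_key != primary_key[:len(shard_key)]:
        raise ValueError(
            f"shard key {shard_key} does not match the start of {primary_key}"
        )
    return shard_key


validate_shard_key(["productType", "productName"])  # accepted

try:
    validate_shard_key(["productClass"])  # skips the first two key fields
except ValueError as err:
    print("rejected:", err)
```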