Iterating over Table Rows

Iterating over Table Rows
Prev	Chapter 6. Reading Table Rows	Next

Store.table_iterator() provides non-atomic table iteration.

Store.table_iterator() does not return the entire set of rows all at once. Instead, it batches the fetching of rows in the iterator, to minimize the number of network round trips, while not monopolizing the available bandwidth. Also, the rows returned by this method are in unsorted order.

Note that this method does not result in a single atomic operation. Because the retrieval is batched, the return set can change over the course of the entire retrieval operation. As a result, you lose the atomicity of the operation when you use this method.

This method provides for an unsorted traversal of rows in your table. If you do not provide a key, then this method will iterate over all of the table's rows.

When using this method, you can optionally specify:

A MultiRowOptions object that lets you specify:
- A FieldRange object, which defines a range of values to be retrieved for the specified key.
- A list of parent and ancestor tables to include in the iteration.
A TableIteratorOptions object, which allows you to specify an iteration direction, the maximum number of results to return for each retrieval batch, and a ReadOptions class. This class allows you specify a consistency policy for the operation, as well as an upper bound on the amount of time that the operation is allowed to take. Consistency policies are described in Consistency Guarantees.

For example, suppose you have a table that stores information about products, which is designed like this:

CREATE TABLE myTable (
    itemType STRING,
    itemCategory STRING,
    itemClass STRING,
    itemColor STRING,
    itemSize STRING,
    price FLOAT,
    inventoryCount INTEGER,
    PRIMARY KEY (SHARD(itemType, itemCategory, itemClass), itemColor,
    itemSize)
)

With tables containing data like this:

Row 1:

itemType: Hats

itemCategory: baseball

itemClass: longbill

itemColor: red

itemSize: small

price: 12.07

inventoryCount: 127

Row 2:

itemType: Hats

itemCategory: baseball

itemClass: longbill

itemColor: red

itemSize: medium

price: 13.07

inventoryCount: 201

Row 3:

itemType: Hats

itemCategory: baseball

itemClass: longbill

itemColor: red

itemSize: large

price: 14.07

inventoryCount: 39

Row n:

itemType: Coats

itemCategory: Casual

itemClass: Winter

itemColor: red

itemSize: large

price: 247.99

inventoryCount: 9

Then in the simplest case, you can retrieve all of the rows related to 'Hats' using Store.table_iterator() as follows. Note that this simple example can also be accomplished using the Store.multi_get() method. If you have a complete shard key, and if the entire results set will fit in memory, then multi_get() will perform much better than table_iterator(). However, if the results set cannot fit entirely in memory, or if you do not have a complete shard key, then table_iterator() is the better choice. Note that reads performed using table_iterator() are non-atomic, which may have ramifications if you are performing a long-running iteration over records that are being updated.

def display_row(row):
    try:
            print "Retrieved row:"
            print "\tType: %s" % row['itemType']
            print "\tCategory: %s" % row['itemCategory']
            print "\tClass: %s" % row['itemClass']
            print "\tColor: %s" % row['itemColor']
            print "\tSize: %s" % row['itemSize']
            print "\tPrice: %s" % row['price']
            print "\tInventory Count: %s" % row['inventoryCount']
            print "\n"
    except KeyError, ke:
        logging.error("Row display failed. Bad key: %s" % ke.message)


def do_store_ops(store):

    key_d = {'itemType' : 'Hats'}

    try:
        row_list = store.table_iterator("myTable", key_d, False)
        if not row_list:
            logging.debug("Table retrieval failed")
        else:
            logging.debug("Table retrieval succeeded.")
            for r in row_list:
                display_row(r)
    except IllegalArgumentException, iae:
        logging.error("Table retrieval failed.")
        logging.error(iae.message)

Prev	Up	Next
Using multi_get()	Home	Specifying Field Ranges