Store.table_iterator() provides non-atomic table iteration. It does not return the entire set of rows at once. Instead, it fetches rows in batches as you iterate, which minimizes the number of network round trips without monopolizing the available bandwidth. The rows returned by this method are in unsorted order.
Note that this method is not a single atomic operation. Because retrieval is batched, the result set can change while the retrieval is in progress, for example if rows are written or deleted between batches.
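The effect of batching on atomicity can be illustrated with a small self-contained simulation. This is only a sketch: batched_iterator() is a hypothetical stand-in written for this illustration, not part of the driver, and a plain Python list plays the role of the table.

```python
def batched_iterator(table, batch_size):
    """Yield rows from `table` (a list) one batch at a time.

    Hypothetical stand-in for a batched read: each batch is copied when
    it is fetched, so later batches reflect later writes.
    """
    position = 0
    while position < len(table):
        # One simulated "round trip" fetches the next batch of rows.
        batch = list(table[position:position + batch_size])
        position += len(batch)
        for row in batch:
            yield row


table = [{'itemSize': 'small', 'inventoryCount': 127},
         {'itemSize': 'medium', 'inventoryCount': 201},
         {'itemSize': 'large', 'inventoryCount': 39}]

seen = []
for row in batched_iterator(table, batch_size=2):
    seen.append((row['itemSize'], row['inventoryCount']))
    if row['itemSize'] == 'small':
        # Simulated concurrent writer: update a row not yet fetched.
        table[2] = {'itemSize': 'large', 'inventoryCount': 0}
```

The first two rows come from the table as it was when the first batch was fetched, while the third row reflects the later update. The iteration therefore never observes a single consistent snapshot of the table.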
This method provides for an unsorted traversal of rows in your table. If you do not provide a key, then this method will iterate over all of the table's rows.
When using this method, you can optionally specify:
  - A MultiRowOptions object that lets you specify:
      - A FieldRange object, which defines a range of values to be retrieved for the specified key.
      - A list of parent and ancestor tables to include in the iteration.
  - A TableIteratorOptions object, which allows you to specify an iteration direction, the maximum number of results to return for each retrieval batch, and a ReadOptions object. ReadOptions allows you to specify a consistency policy for the operation, as well as an upper bound on the amount of time that the operation is allowed to take. Consistency policies are described in Consistency Guarantees.
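To make the idea of a field range concrete, the following sketch models its effect on a handful of local rows. It does not use the driver's FieldRange class: in_field_range() is a hypothetical helper, and treating both bounds as inclusive is an assumption made for this illustration.

```python
def in_field_range(row, field, start, end):
    """Return True if row[field] lies in the inclusive range [start, end].

    Hypothetical helper: it only models what restricting an iteration
    to a range of key values means, using ordinary string comparison.
    """
    return start <= row[field] <= end


rows = [
    {'itemType': 'Hats', 'itemColor': 'blue'},
    {'itemType': 'Hats', 'itemColor': 'green'},
    {'itemType': 'Hats', 'itemColor': 'red'},
    {'itemType': 'Hats', 'itemColor': 'yellow'},
]

# Keep only the rows whose itemColor falls between 'green' and 'red'.
selected = [r['itemColor'] for r in rows
            if in_field_range(r, 'itemColor', 'green', 'red')]
```

Here selected contains only 'green' and 'red', because string values are compared lexicographically.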
For example, suppose you have a table that stores information about products, which is designed like this:
CREATE TABLE myTable (
    itemType STRING,
    itemCategory STRING,
    itemClass STRING,
    itemColor STRING,
    itemSize STRING,
    price FLOAT,
    inventoryCount INTEGER,
    PRIMARY KEY (SHARD(itemType, itemCategory, itemClass), itemColor, itemSize)
)
With the table containing data like this:
Row 1:
    itemType: Hats
    itemCategory: baseball
    itemClass: longbill
    itemColor: red
    itemSize: small
    price: 12.07
    inventoryCount: 127
Row 2:
    itemType: Hats
    itemCategory: baseball
    itemClass: longbill
    itemColor: red
    itemSize: medium
    price: 13.07
    inventoryCount: 201
Row 3:
    itemType: Hats
    itemCategory: baseball
    itemClass: longbill
    itemColor: red
    itemSize: large
    price: 14.07
    inventoryCount: 39
Row n:
    itemType: Coats
    itemCategory: Casual
    itemClass: Winter
    itemColor: red
    itemSize: large
    price: 247.99
    inventoryCount: 9
Then in the simplest case, you can retrieve all of the rows related to 'Hats' using Store.table_iterator() as follows. Note that this simple example can also be accomplished using the Store.multi_get() method. If you have a complete shard key, and if the entire result set will fit in memory, then multi_get() will perform much better than table_iterator().
However, if the result set cannot fit entirely in memory, or if you do not have a complete shard key, then table_iterator() is the better choice. Note that reads performed using table_iterator() are non-atomic, which may have ramifications if you are performing a long-running iteration over records that are being updated.
import logging

from nosqldb import IllegalArgumentException


def display_row(row):
    """Print every field of one retrieved row."""
    try:
        print("Retrieved row:")
        print("\tType: %s" % row['itemType'])
        print("\tCategory: %s" % row['itemCategory'])
        print("\tClass: %s" % row['itemClass'])
        print("\tColor: %s" % row['itemColor'])
        print("\tSize: %s" % row['itemSize'])
        print("\tPrice: %s" % row['price'])
        print("\tInventory Count: %s" % row['inventoryCount'])
        print("\n")
    except KeyError as ke:
        logging.error("Row display failed. Bad key: %s" % ke)


def do_store_ops(store):
    # Partial key: match every row whose itemType is 'Hats'.
    key_d = {'itemType': 'Hats'}
    try:
        # Rows are fetched in batches as the loop below consumes them.
        row_list = store.table_iterator("myTable", key_d, False)
        if not row_list:
            logging.debug("Table retrieval failed")
        else:
            logging.debug("Table retrieval succeeded.")
            for r in row_list:
                display_row(r)
    except IllegalArgumentException as iae:
        logging.error("Table retrieval failed.")
        logging.error(iae.message)
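The rule of thumb given above for choosing between multi_get() and table_iterator() can be written down as a small decision helper. This is a sketch only: choose_read_method() and SHARD_KEY_FIELDS are hypothetical names introduced here, and whether a result set fits in memory is something the caller must estimate.

```python
# Shard key fields of myTable, taken from its PRIMARY KEY definition.
SHARD_KEY_FIELDS = ('itemType', 'itemCategory', 'itemClass')


def choose_read_method(key, fits_in_memory):
    """Return the name of the read method suggested by the guidance above.

    Hypothetical helper: multi_get needs a complete shard key and a
    result set that fits in memory; otherwise fall back to table_iterator.
    """
    has_full_shard_key = all(field in key for field in SHARD_KEY_FIELDS)
    if has_full_shard_key and fits_in_memory:
        return 'multi_get'
    return 'table_iterator'


full_key = {'itemType': 'Hats',
            'itemCategory': 'baseball',
            'itemClass': 'longbill'}
partial_key = {'itemType': 'Coats', 'itemCategory': 'Casual'}

small_full = choose_read_method(full_key, fits_in_memory=True)
large_full = choose_read_method(full_key, fits_in_memory=False)
small_partial = choose_read_method(partial_key, fits_in_memory=True)
```

With these inputs, only the complete shard key combined with a result set that fits in memory selects multi_get; the other two cases select table_iterator.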