Runtime Analysis

Default Program Run
Varying the Node Size
Using Wholedoc Storage
Using Read Committed Isolation
Read Committed with Wholedoc Storage

The examples presented in this chapter allow you to manipulate certain runtime characteristics that will affect the number of deadlocks the program encounters. You can modify:

* The number of nodes in each document that the program creates (the -n option).
* Whether the documents are written to a node storage or a wholedoc container (the -w option).
* Whether the writes are performed using read committed isolation (the -2 option).

The point of the application is to measure the number of deadlocks encountered for a given program run. By counting the number of deadlocks, we can get a sense of the overall amount of lock contention occurring in our application. Remember that deadlocks represent a race condition that your application lost. In order for one to occur, two or more threads had to have attempted to lock database pages in such a way that the threads blocked waiting for locks that will never be released (see Locks, Blocks, and Deadlocks for a more complete description). So by examining the number of deadlocks that we see, we can indirectly get a sense of the amount of lock contention that the application encountered. Roughly speaking, the more deadlocks seen, the more lock contention that was going on during the application run.
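
To make the counting concrete, here is a minimal sketch, in the spirit of the example's writer threads, of a retry loop that counts deadlocks. It assumes the BDB XML Java API; the doWrite() helper, the MAX_RETRIES limit, and the cause-based deadlock check are illustrative assumptions, and the shipped example differs in its details.

import com.sleepycat.db.DeadlockException;
import com.sleepycat.dbxml.XmlException;
import com.sleepycat.dbxml.XmlManager;
import com.sleepycat.dbxml.XmlTransaction;
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySketch {
    // Shared across all writer threads; this is the number the
    // program reports at the end of a run.
    static final AtomicInteger deadlockCount = new AtomicInteger();
    static final int MAX_RETRIES = 20; // illustrative limit, not from the example

    static void writeOneDocument(XmlManager mgr) throws XmlException {
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            XmlTransaction txn = mgr.createTransaction();
            try {
                doWrite(txn); // hypothetical helper: put one document
                txn.commit();
                return;       // success; stop retrying
            } catch (XmlException xe) {
                txn.abort();  // release all locks before retrying
                // Assumption: a deadlock surfaces with the core library's
                // DeadlockException attached as the cause.
                if (xe.getCause() instanceof DeadlockException) {
                    deadlockCount.incrementAndGet(); // we lost the race
                } else {
                    throw xe; // some other failure; do not retry
                }
            }
        }
    }

    // Placeholder for the container write the example performs.
    static void doWrite(XmlTransaction txn) throws XmlException { }
}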

Note that as you modify these constraints, the program will encounter differing numbers of deadlocks per program run. No two program runs will report exactly the same number of deadlocks, but changing these constraints can, on average, increase or decrease the number of deadlocks reported by the application.

The reason this application sees deadlocks is because of what BDB XML does under the hood. Recall that BDB XML writes XML documents to underlying Berkeley DB databases. Also recall that Berkeley DB databases are usually organized in pages; multiple database entries will exist on any given page. Finally, Berkeley DB uses page-level locking. The result is that multiple XML documents (or portions of XML documents) can and will be stored on the same database page. When multiple threads attempt to lock a database page, you get lock contention. When multiple database pages are in use and the threads of control lock them out of order, you can see deadlocks.
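
To see why out-of-order page locking produces deadlocks, consider the following toy Java program, which uses plain object monitors to stand in for database page locks. It is not BDB XML code, just an illustration of the cycle described above.

public class DeadlockToy {
    // Two monitors standing in for two database pages.
    private static final Object pageA = new Object();
    private static final Object pageB = new Object();

    public static void main(String[] args) {
        // Thread 1 locks page A first, then wants page B.
        Thread t1 = new Thread(() -> {
            synchronized (pageA) {
                pause();
                synchronized (pageB) {
                    System.out.println("t1 got both pages");
                }
            }
        });
        // Thread 2 locks the same pages in the opposite order.
        Thread t2 = new Thread(() -> {
            synchronized (pageB) {
                pause();
                synchronized (pageA) {
                    System.out.println("t2 got both pages");
                }
            }
        });
        t1.start();
        t2.start();
        // If both threads reach their inner synchronized block, each is
        // waiting on a lock the other holds. Berkeley DB detects this
        // cycle and aborts one locker with a deadlock error; plain Java
        // monitors simply hang forever.
    }

    private static void pause() {
        try { Thread.sleep(100); } catch (InterruptedException ignored) { }
    }
}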

Therefore, the things that will immediately affect the amount of lock contention our application encounters are:

* The number of threads simultaneously attempting to write to the containers.
* The size of the documents being written. Larger entries fill more of each database page, so fewer lockers compete for any given page.
* Whether node storage or wholedoc storage is used. This determines how many database entries each document produces.
* The isolation level used by the writing transactions. This determines how long locks are held.

Default Program Run

By default, the program makes the following choices:

> java dbxml.txn.TxnGuide -h myEnvironmentDirectory
Number of threads:              5
Number of doc nodes:            1
Using node storage:             true
Using read committed:           false 

This represents a worst-case situation for the application in all ways but one: it uses small documents that are just one node in size. Running the example three times in a row results in 370, 317, and 382 reported deadlocks, for an average of 356.333 deadlocks. Note that your own test results will likely differ depending on the number and speed of your CPUs and the speed of your hard drive. For the record, these test results were taken using a single-CPU PowerPC G3 system with a slow (4200 RPM) laptop hard drive.

Varying the Node Size

With a default node size of 1, we saw an average of 356.333 reported deadlocks over three runs of the program. Now let's try increasing the size of the document to see what it does to the number of reported deadlocks:

> java dbxml.txn.TxnGuide -h myEnvironmentDirectory -n 10
Number of threads:              5
Number of doc nodes:            10
Using node storage:             true
Using read committed:           false 

This results in 894, 854, and 861 deadlocks for an average of 869.667 reported deadlocks. Clearly the amount of lock contention that we are seeing has increased, but why?

Remember that larger documents should fill up database pages, which should result in less lock contention because there are fewer lockers per database page. However, we are using node storage, which means that the additional nodes result in additional small database entries. Given the way our application writes documents, adding nine more nodes per document simply increases the chance that nodes from even more documents will land on any given page.
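
For illustration, a document builder along the following lines would produce the behavior just described; the element names and document shape are hypothetical, not taken from the shipped example.

public class DocBuilder {
    static String makeDocument(int numNodes) {
        StringBuilder sb = new StringBuilder("<testDoc>");
        for (int i = 0; i < numNodes; i++) {
            // With node storage, each of these elements becomes one
            // more small database entry on some page.
            sb.append("<node>some text</node>");
        }
        return sb.append("</testDoc>").toString();
    }
}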

Notice that there is a limit to the amount of lock contention that this application will see by simply adding nodes to the documents it creates. For example, suppose we created documents with 100 nodes:

> java dbxml.txn.TxnGuide -h myEnvironmentDirectory -n 100
Number of threads:              5
Number of doc nodes:            100
Using node storage:             true
Using read committed:           false 

In this case, we see an average of 316 deadlocks, which is less than even the single-node case. Why? First, the documents are now very large, so there is a good chance that each document fills entire pages, even though we are still using node-level storage. In addition, each thread is now busy creating documents and then writing them to the containers, where they are deconstructed into individual nodes. All of this is CPU-intensive activity that is likely helping to prevent lock contention, because each thread spends more time on document handling than it does with the smaller document sizes.

Using Wholedoc Storage

In the previous section we saw that specifying a document node size of 10 resulted in an average of 869.667 deadlocks across three program runs. This indicates a fairly high level of lock contention. It also indicates that the program is not operating particularly efficiently.

One way we could improve the write throughput for our application is to use whole document storage instead of node-level storage. This will result in fewer, but larger, database entries. The result should be fewer threads of control fighting for locks on a given page because fewer individual documents will be held on any given page.
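
As a sketch, switching to whole document storage through the Java API might look like the following. It assumes that XmlManager.setDefaultContainerType() governs the storage model used by subsequent createContainer() calls; verify the details against the javadoc for your release.

import com.sleepycat.dbxml.XmlContainer;
import com.sleepycat.dbxml.XmlException;
import com.sleepycat.dbxml.XmlManager;

public class WholedocSketch {
    static XmlContainer openWholedoc(XmlManager mgr, String name)
            throws XmlException {
        // XmlContainer.WholedocContainer selects whole document storage;
        // XmlContainer.NodeContainer selects node storage.
        mgr.setDefaultContainerType(XmlContainer.WholedocContainer);
        return mgr.createContainer(name);
    }
}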

Specifying a node size of 10 with whole document storage:

> java dbxml.txn.TxnGuide -h myEnvironmentDirectory -n 10 -w
Number of threads:              5
Number of doc nodes:            10
Using node storage:             false
Using read committed:           false 

gives us an average deadlock count of 556 across three program runs. That's certainly a significant improvement over node-level storage, although for many workloads you will pay for it in the form of the higher query times that wholedoc storage can cost you.

Using Read Committed Isolation

Another way we can modestly improve our write performance is by using read committed isolation. This causes our transactions to release their read locks immediately, instead of holding them until the transaction is resolved. Using read committed isolation does not give us the dramatic improvement in write performance that wholedoc storage does (see the previous section), but it is still an improvement.
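
As a sketch, read committed could be requested for these transactions as shown below. It assumes that XmlManager.createTransaction() can adopt a transaction begun through the core com.sleepycat.db API; the shipped example may arrange this differently.

import com.sleepycat.db.Environment;
import com.sleepycat.db.Transaction;
import com.sleepycat.db.TransactionConfig;
import com.sleepycat.dbxml.XmlManager;
import com.sleepycat.dbxml.XmlTransaction;

public class ReadCommittedSketch {
    static XmlTransaction beginReadCommitted(Environment env, XmlManager mgr)
            throws Exception {
        TransactionConfig config = new TransactionConfig();
        // Degree 2 isolation: read locks are dropped as soon as each
        // read completes, rather than held until commit or abort.
        config.setReadCommitted(true);
        Transaction dbTxn = env.beginTransaction(null, config);
        // Assumption: wrap the core transaction in an XmlTransaction.
        return mgr.createTransaction(dbTxn);
    }
}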

> java dbxml.txn.TxnGuide -h myEnvironmentDirectory -n 10 -2
Number of threads:              5
Number of doc nodes:            10
Using node storage:             true
Using read committed:           true 

The average number of deadlocks seen across three runs with these settings is 724, down from 869.667. This is a modest improvement to be sure, but then you do not have to pay the query penalty that wholedoc containers might cost you.

Read Committed with Wholedoc Storage

Finally, the best improvement we can hope to see for this application, using 10 node documents and 5 writer threads, is to combine read committed isolation with whole document containers, as sketched below.
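
Combining the two hypothetical helpers sketched in the previous sections:

public class CombinedSketch {
    static void setUp(com.sleepycat.db.Environment env,
                      com.sleepycat.dbxml.XmlManager mgr) throws Exception {
        // Whole document storage plus read committed transactions.
        com.sleepycat.dbxml.XmlContainer container =
                WholedocSketch.openWholedoc(mgr, "test.dbxml");
        com.sleepycat.dbxml.XmlTransaction txn =
                ReadCommittedSketch.beginReadCommitted(env, mgr);
        // ... perform the writes with txn, then txn.commit() ...
    }
}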

> java dbxml.txn.TxnGuide -h myEnvironmentDirectory -n 10 -w -2
Number of threads:              5
Number of doc nodes:            10
Using node storage:             false
Using read committed:           true 

For three runs of the program with these settings, we observe an average of 228.333 deadlocks, a remarkable improvement over the worst-case average of 869.667 that we saw for 10 nodes and 5 writer threads!