Tuesday, November 15, 2005

Design choices - flexibility versus performance part 2

When designing the interface for the BTree module, there are some conflicting priorities.

An important goal is to ensure that the interface is reusable and makes as little assumptions about the content of "keys" and "values" as possible. Some of the points to consider are:

a) Should the BTree module be aware of the type system available in the system?
b) Should it be aware of how multiple attribute keys are handled?

I chose to keep the level of abstraction higher. Therefore in my implementation of the BTree module, the module has no knowledge of what is contained in the key, whether it is made up of single or multiple attributes, etc. The advantage is that a key can be anything that satisfies a relatively simple interface defined by the system. The disadvantages are:

1) The system cannot perform specific performance optimisations that would be dependent upon the knowledge of the attribute structure and type system. For example, the system cannot perform compression on the key structure.
2) The system cannot support search operators at an attribute level. When searching and fetching keys and values, only one composite operator is supported - >=. That is, the fetch algorithm knows how to obtain the next key that is equal or greater than the specified key.

On the whole, I am happy with the interface as it provides greater reusability. It allows the BTree module to be independently useful.

A problem I have been grappling with recently is how to support data driven construction of keys, without breaking the high level interface defined so far. In a DBMS system, the definition of the keys is stored in System Catalogs. There has to be a way by which the key definition can be used to generate keys. In the current design, the BTree module expects to be given a factory class for instantiating keys. I thought of creating a Key factory that would read a key definition and then generate keys based upon the definition. However, the problem is that such a key factory would be capable of generating many different types of keys, whereas the current BTree interface expects a one to one relationship between the key factory and the type of key. I have come to the conclusion that I need to enhance the system in two ways:

Firstly, I need to support one to many relationship between the key factory and the key. Since the BTree instance is tied to a specific type of key, it therefore needs to be enhanced to supply an "id" that enables the key factory to generate the specific type of key required by the BTree instance. This means that instead of invoking:

key = keyFactory.getNewInstance()

the BTree instance would invoke:

key = keyFactory.getNewInstance(myid)

The key factory would be able to use the "id" to determine the type of key required.

The second enhancement I need is to do with performance. At present, the Object Registry always generates new instances of objects, which would be inefficient for a key factory that needs to maintain its own internal registry. I therefore need to enhance the registry to support Singletons - key factories need to be cached in the registry so that the same factory can be used by multiple BTree instances.

My final point about the tradeoffs between flexibility and performance is to do with storage structure of keys and log records. I have tried to make the storage structures self describing. This means that the keys and values, as well as the log records, must persist sufficient information to be able to reconstruct their state when required. In a multi-attribute key, for example, this means that each attribute must store its type information, data length, etc. along with the data. This naturally increases the storage space requirement of entities. The benefit is that the system does not require external support to determine how to read data back from persistent storage. For example, the BTree module does not require the System Catalalog to be available, it has sufficient information to be able to read and write keys to persistent storage.

Thursday, November 10, 2005


In a major refactoring exercise I have split the SimpleDBM modules into two higher level packages. The Latch Manager, the Object Registry and the Util packages all go under the package org.simpledbm.common. The rest of the packages go under org.simpledbm.rss. I decided to use the acronym RSS for the low level data management API in honour of System R. System R called its low level API Research Storage System or RSS in short. See the paper Morton M. Astrahan, Mike W. Blasgen, Donald D. Chamberlin, Kapali P. Eswaran, Jim Gray, Patricia P. Griffiths, W. Frank King III, Raymond A. Lorie, Paul R. McJones, James W. Mehl, Gianfranco R. Putzolu, Irving L. Traiger, Bradford W. Wade, Vera Watson: System R: Relational Approach to Database Management. ACM Trans. Database Syst. 1(2): 97-137(1976).

Of course, all this refactoring has left CVS in a mess, as now there are many empty subdirectories under src/org/simpledbm.

I have also been working on a Developer's Reference Manual. This will hopefully contain sufficient information for interested hackers and database technology enthusiasts to play around with various modules. You can access the document here. I would welcome any feedback.

Finally, the SimpleDBM project has now graduated out of the incubator at www.java.net. This should lead to greater exposure and interest in the project.

Monday, November 07, 2005

BTree Implementation Progress Report Part 4

BTree scans are now available as well. This means that the BTree implementation is feature complete, although, there are still some areas that need more work.

For the rest of November, I am going to concentrate on improving the code, refactoring bits that I don't like, updating documentation, and generally cleaning up the code. All this in preparation for a release of the code towards the end of the month.

From December onwards, work can start on the remaining bits in the Data Manager layer of SimpleDBM, i.e, tables. I have not yet decided whether knowledge about data types should be part of this layer or whether it is best left to a higher-level layer. I am tempted to keep the data layer as low level as possible; this will make it more reusable.

Wednesday, November 02, 2005

BTree Implementation Progress Report Part 3

Both insert key and delete key operations are now available. More test cases are being written; in order to test lock conflicts when the same key is concurrently inserted/deleted by different transactions, the new test cases have to use multiple threads. It is harder to debug such test cases, but Eclipse makes it easy. I simply run the JUnit tests in debug mode and set break points at appropriate places.

By end of this week I shall be working on the BTree scan operations. There aren't many research publications that deal with the inner workings of an index scan; the only paper that seems to discuss a real implementation is the paper by Mohan, C. An Efficient Method for Performing Record Deletions and Updates Using Index Scans, Proc. 28th International Conference on Very Large Databases, Hong Kong, August 2002. The paper discusses many of the performance issues with implementing scans. My implementation will suffer from the problems Mohan describes, as I want to do a basic implementation first before looking at optimisations.

To ensure that the BTree implementation is tested thoroughly, I have started to use the code coverage tool Clover. The vendor has graciously provided me a free license to use this tool. Using this tool I am able to determine which code paths have not been tested, and then write test cases to exercise those.