Friday, October 28, 2005

BTree Implementation Progress Report Part 2

Currently I am busy testing the code for inserting keys into the BTree. My testing methodology is:

  1. Write a JUnit test case.
  2. Debug the code, stepping through the entire test.
  3. Fix any bugs and retest until test runs okay.

Out of the 16 test cases written so far, only 4 are on insert - with many more to come.

To facilitate testing I need to construct trees with specific properties. The way I have chosen to do this is to specify the data in an XML file and then load it to create the desired tree. Just goes to show the amount of effort required to construct proper test cases. My current estimate is that I won't have a fully tested BTree implementation until end of November.

Thursday, October 13, 2005

BTree Implementation Progress Report

I have been working hard on getting the BTree implementation completed. It is about 75% complete, however, delivery will take longer as I need to ensure that adequate test cases are available to verify the correctness of the algorithms. Already, this is probably the largest module. It will also be one of the most complex.

As mentioned before, I am implementing the B-link tree algorithm as published in the paper Ibrahim Jaluta, Seppo Sippu and Eljas Soisalon-Soininen. Concurrency control and recovery for balanced B-link trees. The VLDB Journal, Volume 14, Issue 2 (April 2005), Pages: 257 - 277, ISSN:1066-8888. I think this will probably be one of the few BTree implementations that fully supports merging of pages when keys are deleted.

I have written a short paper discussing some space management issues in the implementation. The paper can be found here.

Sunday, October 02, 2005

Database Technology camps

Mainstream Relational Database technology is split into two camps.

On the one hand are the systems that are directly or indirectly based upon techniques and algorithms derived from IBM's System R project. Most published research also falls into this group.

In the second camp are systems that have been inspired by Oracle. Oracle has a very interesting history. Although Oracle started with the specifications of System R, it actually did not use any of the System R ideas. A good summary of how Oracle evolved can be found here.

SimpleDBM is based upon the System R heritage. The main reason for this is that I do not work in a company that makes DBMSes. Hence I do not have access to information other than published research. Since Oracle does not publish details of its algorithms, it would be much harder for me to implement a system based upon Oracle. I have, however, given much thought to some of the techniques used in Oracle, and hope to discuss some of these in future blogs and papers.