Saturday, August 23, 2008

A History of SimpleDBM

The SimpleDBM project started as the DM1 project in August 2001. Initially, I started writing the database code in C. A year later, I ported the code to C++. Here is what I wrote at the time about my switch to C++:
I decided to port the product to C++ primarily for the following reasons:
  • I wanted to use the automatic construction and destruction facility in C++. I think this is one of the coolest features of C++. As an aside, I like Java's finally solution better than C++ destructors because with finally, I am able to control the destruction of objects better.

  • I was apprehensive about using C++, because I find that C++ is a large and complicated language. In order to keep the DM1 Threads code simple, I made some decisions that are sure to be controversial:

  • I do not use the C++ standard library, including the STL, at all. This is because I do not like the code bloat that results from its use. I also do not like the idea of using the heap for creating simple objects like strings. I think that C programs are generally more efficient than C++ programs, because they are much more circumspect about using the heap. One advantage of not using the C++ standard library is that it makes DM1 Threads more portable.

  • I do not use templates other than in a very small way.

  • I use certain features of the C++ language, such as multiple inheritance, operator overloading, new style typecast operators, etc., sparingly or not at all. This is because I feel that none of these features are essential (Java does without them nicely).

  • I think that Java is a better language than C++, but the lack of certain features (efficient array handling, pointers, etc.) makes it unsuitable for a project like DM1. In many ways, I use C++ as a better Java. C# would have been a good choice, but due to its lack of availability on UNIX platforms, I was unable to use it.
Progress over the next few years was slow. This was partly because there were long periods when I wrote nothing, and partly because I had to do research for the project. I am a self-taught programmer with no formal degree in Computer Science. I have also never worked for a database vendor. Implementing a database has been an ambition for a long time. I had already implemented a single-user persistent storage system in C++, but this was nowhere near a real database system with transactions.

I was helped tremendously by reading the code of other open source systems such as Shore and PostgreSQL. Another source of information was the research published by ACM and also by IBM researcher C. Mohan. I became a member of ACM in 2001 just so that I could access all the research.

There weren't and still aren't any books that go into the nitty-gritty of implementing relational databases. The only book that came remotely close to discussing some of the details of a database engine was the book Transaction Processing: Concepts and Techniques, by Jim Gray and Andreas Reuter. This was a life saver.

By 2005 I had become very frustrated with C++. I found that the IDEs available for C++ were no match for the Java IDEs. Even simple refactoring of code was pretty onerous.

IBM had announced in 2004 that it was open sourcing Cloudscape. I was very excited because here was a database that was written in Java, and that performed quite well. Another key event was the release of Java 1.5 (Java 5.0). Finally there was a version of Java that had the locking primitives that I needed for my database project. I procrastinated for a while because I did not fancy porting all the code I had written in C++ to Java.

Fortunately (for the project), the company I was working for at the time got taken over. There was a period of inactivity as we awaited the fate of our IT department. Having not much to do at work meant that I had spare time, and I was able to use this time to port the code to Java. This was in late 2005, which was the most fruitful six months I spent on the project.

I completed most of the core modules of SimpleDBM in the six months between July and December 2005. Since then I have spent more time testing and refactoring, and have added the TypeSystem and Database API modules. Since I changed jobs in 2006, progress has been sporadic. I had a period of a year and a half when I was so busy at work that I was taking work home every day. I was unable to spend time on SimpleDBM at all.

Early in 2008, I was approached by an ex-database entrepreneur who suggested that I should do some work on Apache Derby. I thought of ditching SimpleDBM to work on Derby. Unfortunately, this relationship did not work out. Now I think that if I am working in my own free time, then it is more fulfilling to work on SimpleDBM, as everything I create is my own. I have considerable freedom, and the ability to work on areas that interest me. I would very much like to work on Apache Derby but cannot afford to do so unless it is a paid job.