Wednesday, November 29, 2006

Why another database manager?

A friend asked me recently why I spent my time implementing a DBMS when there are already a number of open source databases. It is a good question, because clearly I am not breaking any new ground here. In fact, I find myself incapable of inventing great new technology. Most of what I am implementing is well-known stuff. I am a software engineer rather than a scientist, going by C. A. R. Hoare's definitions of engineers and scientists. It seems that this project is pure self-indulgence, when I could be spending my time more fruitfully, either contributing to projects like Apache Derby or working on something more relevant.

I guess that if I am honest with myself, I have to admit that there is an element of self-indulgence here. But there is also some utility. The knowledge I gain in the process benefits me in my work life. Apart from that, I think that my DBMS implementation is better documented and easier to understand than other open source implementations. This is for a couple of reasons:
  1. I use well-established algorithms, which are well documented in the computer science literature. I am also putting more effort into documentation than is typical of many open source projects.
  2. The system is decomposed into well-defined modules which are loosely coupled. I find most other implementations are far more integrated, and therefore difficult to understand. I have traded off performance and efficiency in favour of ease of understanding.

There are still no books that describe how to build a real DBMS and also show you, with real code, how DBMS features such as transactions, locking, recovery, B-trees, etc. work. The only book that comes close is Transaction Processing: Concepts and Techniques, but the sample code contained in this book is not complete. In some ways, my project provides a sample implementation of many of the techniques described in this book.

Finally, there is the question of pride. When I started this project many years ago in C++, I never thought I could do it. I had no training in this field, and no access to people who do this type of stuff at work. Having come this far, it seems a shame to give up.

3 comments:

Anonymous said...

Hi Dibyendu

Just to tell you, I did the same in C++ many years ago (between '92 and '96) to build a product that rehosted the Bull mainframe TP monitor on Unix. (I could send you the slides if you want, for a historical perspective)...

So I am very pleased to see that you have redone your own C++ work in Java, because I was doing the same... and you are totally right, it is a shame to give up having come so far... maybe we can join forces some day to complete the big picture...

Regards

my sourceforge id: francisandre

Anonymous said...

How is this different from Oracle Berkeley DB Java Edition? Were you unaware of its existence? Was the license too restrictive for some reason (if so, how/why)? Was there some technical reason not to choose/use it? Was this purely a learning exercise? Did you study Berkeley DB or Berkeley DB Java Edition when you considered your implementation strategy/techniques?

I'm just curious to know how we could have found you and given you a "wheel" before you re-invented it. Unless it was purely for the joy of making your own wheel from scratch, which I can understand.

:-)

-greg

Greg Burd | Senior Product Manager | Oracle Berkeley DB

Dibyendu Majumdar said...

Hi Greg,

SimpleDBM and Oracle Berkeley DB Java Edition have a lot in common. I think that when I started porting SimpleDBM to Java in 2005, I wasn't aware of Berkeley DB Java Edition.

I studied many open source databases, including Berkeley DB, PostgreSQL, MySQL, Apache Derby, Shore ... and for a time even considered abandoning SimpleDBM in favor of Apache Derby ... but having started it and invested time into it, it seemed a waste to let it go. Enjoyed the challenge as well, to be honest.

Regards