Sunday, April 25, 2010

Proposed license boilerplate

Given below is the boilerplate license notice that will be added to SimpleDBM source files from version 2 onwards. It is based on the boilerplate used by Mozilla.org. Note that I decided to add the LGPL to the mix as well, so that SimpleDBM V2 will be triple licensed. Hopefully that will ensure compatibility with the vast majority of Open Source licenses.

/**
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS HEADER.
 *
 * Contributor(s):
 *
 * The Original Software is SimpleDBM (www.simpledbm.org).
 * The Initial Developer of the Original Software is Dibyendu Majumdar.
 *
 * Portions Copyright 2005-2010 Dibyendu Majumdar. All Rights Reserved.
 *
 * The contents of this file are subject to the terms of the
 * Apache License Version 2 (the "APL"). You may not use this
 * file except in compliance with the License. A copy of the
 * APL may be obtained from:
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Alternatively, the contents of this file may be used under the terms of
 * either the GNU General Public License Version 2 or later (the "GPL"), or
 * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
 * in which case the provisions of the GPL or the LGPL are applicable instead
 * of those above. If you wish to allow use of your version of this file only
 * under the terms of either the GPL or the LGPL, and not to allow others to
 * use your version of this file under the terms of the APL, indicate your
 * decision by deleting the provisions above and replace them with the notice
 * and other provisions required by the GPL or the LGPL. If you do not delete
 * the provisions above, a recipient may use your version of this file under
 * the terms of any one of the APL, the GPL or the LGPL.
 *
 * Copies of GPL and LGPL may be obtained from:
 * http://www.gnu.org/licenses/license-list.html
 */

Monday, April 19, 2010

Multi-licensing

The next version of SimpleDBM will be available under the GPLv2 as now, as well as the Apache License. Dual licensing will allow people to use SimpleDBM in more flexible ways.

Sunday, April 18, 2010

Network client server API released

I am pleased to finally publish the 1.0.18-ALPHA release of SimpleDBM. This release has the following changes:
  • A network client server implementation that allows SimpleDBM to run as a standalone database server to which clients can connect remotely.
  • A sample application that demonstrates the use of the network API. The sample implements a simple discussion forum; the front end has been created using Google Web Toolkit.
The version 1.x codebase is now going into maintenance phase, as I am not going to add any new features to this version. I will start work on version 2.x, which will allow me to refactor some of the modules as previously blogged.

Licensing revisited

In a previous post I wrote about why I preferred the GPL license for SimpleDBM. But I am no longer sure; my intention was always to ensure that SimpleDBM can be used by anyone without worrying about licensing issues, and I have no desire to put restrictions on other people's work. So if someone enhanced SimpleDBM, they should be free to do whatever they like with their enhancement, and although it would be nice if they contributed back, I don't insist on it. But this philosophy is very different to the GPL, which asserts that any enhancements should also be GPL.

I also did not fully understand the restrictions that the GPL imposes on linking with another library. Users of SimpleDBM should definitely not have to change their license or adopt the GPL just to be able to use SimpleDBM in their applications.

I am seriously considering changing the SimpleDBM license to another one, probably the Apache License Version 2.

Saturday, April 10, 2010

Roadmap

I have been thinking about how SimpleDBM should evolve. When I started the project my intention was to eventually add support for SQL, but now I can't see this happening in the near term. SimpleDBM is not aimed at competing with other SQL databases; SQL is nice because of the ease with which tables can be queried, joined etc., but implementing an SQL layer is quite a lot of work, which I am not able to put in right now.

Another subject that has interested me right from the beginning is multi-version concurrency. Unfortunately, I have not really found a way of implementing this which is satisfactory. The two main approaches are those taken by Oracle and PostgreSQL - which I have previously compared in a short paper. The Oracle approach is problematic because it requires page level redo/undo in the transaction system; SimpleDBM's BTree implementation uses logical undo, allowing for undo to be applied to a different page from the original. I do not like the PostgreSQL approach to MVCC either, as it does not support versioning in indexes.

Instead of adding large features such as those above, I shall perhaps focus on the many smaller changes that I have been mulling over for some time now:
  • Refactor the code so that the modularity of SimpleDBM can be exploited better. This involves separately packaging the API from the implementation, and allowing implementations of individual modules to be easily swapped in.
  • Add statistics gathering so that useful metrics can be captured. Some work has been done in this area.
  • Improve performance and scalability of the lock manager.
  • Add support for sequences.
  • Add support for reverse indexes.
  • Add JMX monitoring capability.
  • Refactor the type system - and make the type system a first class component of the core engine. This needs a bit of explanation. When I first started SimpleDBM, my strategy was that the core database engine should be typeless, and should allow a type system to be plugged in. The core engine should treat records and keys as blobs of data, and not worry about their internal structure. This strategy allowed me to develop the core engine without first having to define a type system. However, it has meant that some things are less efficient - for instance, row updates cause the entire before and after images of the row to be logged. Another area of concern is the ability to compress data within pages, which is hard to do without some knowledge of the structure of the data inside the records.
  • Carry forward the work of making most types immutable, including row types. A row builder class can be provided to create rows, but once constructed the row should be immutable.
  • Carry on improving the documentation.
  • Improve the test cases, and the test coverage.
  • Create a single threaded version which can run on small devices.
  • Add support for nested readonly transactions; these are useful for carrying out foreign key checks, should they be added in future.
  • Ensure the embedded and network API are interchangeable, and that clients can swap between the two without having to change any code. At present, the network API is completely separate from the embedded API.
  • Create a full blown sample application - work is ongoing to create this.
  • Try to raise awareness about SimpleDBM and build a community of users and developers.
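
To make the immutable row idea from the list above a little more concrete, here is a minimal sketch of a row that can only be created through a builder. The names ImmutableRow, Builder, add and with are made up for this illustration and are not SimpleDBM's API; the sketch also assumes that the column values themselves (Long, String and so on) are immutable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical immutable row; not part of the SimpleDBM API. */
final class ImmutableRow {
    private final List<Object> columns;

    // Takes a defensive copy so later changes to the builder cannot leak in.
    private ImmutableRow(List<Object> columns) {
        this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }

    int size() {
        return columns.size();
    }

    Object get(int index) {
        return columns.get(index);
    }

    /** An "update" produces a new row; the original row is left untouched. */
    ImmutableRow with(int index, Object value) {
        List<Object> copy = new ArrayList<>(columns);
        copy.set(index, value);
        return new ImmutableRow(copy);
    }

    static Builder builder() {
        return new Builder();
    }

    /** Mutable helper that collects column values before the row is frozen. */
    static final class Builder {
        private final List<Object> columns = new ArrayList<>();

        Builder add(Object value) {
            columns.add(value);
            return this;
        }

        ImmutableRow build() {
            return new ImmutableRow(columns);
        }
    }
}
```

A row would then be created with something like ImmutableRow.builder().add(1L).add("hello").build(), and a changed copy obtained with row.with(1, "world").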

Tuesday, April 06, 2010

Sample Network Application

It is taking longer than I anticipated to create a sample application. The main hurdle has been mastering Google Web Toolkit well enough to create the user interface. I started by hacking the sample mail application available in the GWT distribution, but my code is diverging more and more from it.

First, here is a screen shot from the web UI. Apologies for the rough edges; I am not a UI developer, and building user interfaces is a chore to me.

The basic UI is working - I have a stub server application waiting to be hooked up with the backend.

The UI is built using the MVP paradigm, except that I don't use an EventBus; as I am the sole developer, the added complexity of a bus and its associated event mechanisms is not warranted. I have a RequestProcessor class that handles the presentation logic.

I have been thinking about how to create the primary keys of some of the tables. I have settled on a special table that will hold sequences; each sequence has a name and a long value. As reverse indexes are not yet supported in SimpleDBM, I came up with the idea of a decreasing sequence, so that by accessing the data in increasing key order, I can ensure that newer data appears before older data. This goes to show that we can live with almost any limitation; a bit of thinking gives a solution to the problem!

As sequences never need to be rolled back, the sequence generator can execute its own small transaction whenever the sequence needs decrementing. To make things efficient, we can allocate chunks of sequence values at a time, but for now, I will simply decrement one at a time.
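
As a rough illustration of the decreasing sequence and the chunk allocation idea, here is a minimal sketch. It is not the code in the sample application: SequenceStore is a hypothetical in-memory stand-in for the sequences table, and the class and method names are invented for this example. In a real implementation, reserve() is where the small independent transaction against the table would run.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the sequences table: name -> next unreserved value. */
final class SequenceStore {
    private final Map<String, Long> values = new HashMap<>();

    /**
     * Reserves count values from the named sequence and returns the highest
     * value reserved. A real store would do this in its own small transaction
     * that is committed immediately and never rolled back.
     */
    synchronized long reserve(String name, int count) {
        long top = values.getOrDefault(name, Long.MAX_VALUE);
        values.put(name, top - count);
        return top;
    }
}

/** Hands out strictly decreasing values, fetching chunkSize values at a time. */
final class DecreasingSequence {
    private final SequenceStore store;
    private final String name;
    private final int chunkSize;
    private long next;      // next value to hand out from the current chunk
    private int remaining;  // values left in the current chunk

    DecreasingSequence(SequenceStore store, String name, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        this.store = store;
        this.name = name;
        this.chunkSize = chunkSize;
    }

    synchronized long nextValue() {
        if (remaining == 0) {
            // Current chunk is used up; go back to the store for another one.
            next = store.reserve(name, chunkSize);
            remaining = chunkSize;
        }
        remaining--;
        return next--;
    }
}
```

With a chunk size of 1 this behaves like the simple decrement-one-at-a-time approach. Larger chunks reduce trips to the store, at the cost of leaving gaps in the sequence if a generator is discarded before its chunk is used up.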

There is also nothing like really using a system to discover bugs. I found that the Long column type was missing the functionality to set a Long value!