Wednesday, May 22, 2013

SimpleDBM now available in Maven Central

I am pleased to announce that I have finally managed to get SimpleDBM (v1.0.22) uploaded to Maven Central. This should make it much easier for people to use SimpleDBM in their projects.

I am working on fixing the build systems of the sample projects; the source repository already has fixes for the btreedemo and tupledemo samples.

Monday, July 09, 2012

Performance optimisations in typesystem

The new typesystem in v2 will only support the following types:
  • bool 
  • int 
  • long
  • double
  • utf8 string
  • varbyte (raw)
The older version had types like BigInteger, BigDecimal, Date, and VarChar.
The new typesystem attempts to be closer to native types, both for performance reasons and so that ports to C-like languages are easier.

Another major change is the way the row data is serialised. The older typesystem did not store the metadata associated with each column, therefore a separate data dictionary was required to deserialise the data. The new typesystem encodes the types in the serialised format, so a row can be reconstructed from the serialised data without reference to a data dictionary.

What is unchanged is the status byte per column. Previously this stored only the value type in the column, i.e., Null, PlusInfinity, MinusInfinity or Value. In the new version I am hoping to expand the use of the status byte to encode the following:
  • value type - only Null or Value, taking 1 bit
  • If int or long or double, then the number of bytes used to store the value - encoded in 3 bits
  • If bool, then the bool value encoded in 1 bit (overlaid with the above; 2 bits unused)
  • If utf-8 string or varbyte, then 1 bit to encode whether the data is zero length (overlaid with the above; 2 bits unused)
  • The remaining 4 bits to encode the type of the data.
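For illustration, a layout along these lines could be packed and unpacked with simple bit masks. All names below are my own inventions, not the actual SimpleDBM encoding, which may allocate the bits differently:

```java
// Hypothetical status-byte layout: bit 0 = Null/Value flag, bits 1-3 = number
// of bytes used for numeric values, bits 4-7 = column type code.
public final class StatusByte {
    private static final int VALUE_BIT = 0x01;    // 0 = Null, 1 = Value
    private static final int LENGTH_SHIFT = 1;    // bits 1-3: stored length (0-7)
    private static final int LENGTH_MASK = 0x07;
    private static final int TYPE_SHIFT = 4;      // bits 4-7: type code (0-15)
    private static final int TYPE_MASK = 0x0F;

    public static byte encode(boolean isValue, int numBytes, int typeCode) {
        int b = 0;
        if (isValue) b |= VALUE_BIT;
        b |= (numBytes & LENGTH_MASK) << LENGTH_SHIFT;
        b |= (typeCode & TYPE_MASK) << TYPE_SHIFT;
        return (byte) b;
    }

    public static boolean isNull(byte status) { return (status & VALUE_BIT) == 0; }
    public static int numBytes(byte status)  { return (status >> LENGTH_SHIFT) & LENGTH_MASK; }
    public static int typeCode(byte status)  { return (status >> TYPE_SHIFT) & TYPE_MASK; }
}
```

The masks on decode also take care of sign extension when the high bit of the byte is set.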
One of the performance killers in the old version is the complete deserialisation of data whenever a row is read into memory; the overhead of parsing types such as String, BigInteger, or BigDecimal is huge. The new version will try to avoid parsing the data whenever possible.

We all know these days that immutable objects are good for multi-threaded applications as they allow us to share data without synchronisation. The old typesystem relies heavily on immutable objects, which is one reason for parsing the row immediately upon deserialization. The problem with lazy parsing is that state must be maintained, and fields initialised upon first access - this makes the row itself mutable, and hence thread unsafe. The solution I am adopting for this is to create separate types. The Row type is immutable, but cannot directly access the bytestream of serialised data. A new type called RowReader is designed to mirror the access methods of Row, but do this over the bytestream. This type is not thread-safe - the caller must ensure that the type is not shared between threads in an unprotected manner. We shall also have a RowBuilder for constructing Row objects incrementally; the RowBuilder is also not thread-safe, and access to it must not be shared across threads in an unprotected manner.
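To sketch the split described above (with illustrative names and a made-up length-prefixed column format, not the actual SimpleDBM API): the reader caches each column as it is first parsed, and that lazily initialised cache is precisely the mutable state that makes it thread-unsafe.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative wire format: each column is a 2-byte length followed by UTF-8 bytes.
final class RowReader {
    private final byte[] data;
    private final String[] cache;  // lazily parsed columns -> mutable, NOT thread-safe

    RowReader(byte[] serialised, int columns) {
        this.data = serialised;
        this.cache = new String[columns];
    }

    // Parse a column on first access only; later reads hit the cache.
    String getString(int column) {
        if (cache[column] == null) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            for (int c = 0; c <= column; c++) {
                int len = buf.getShort();
                if (c == column) {
                    byte[] b = new byte[len];
                    buf.get(b);
                    cache[column] = new String(b, StandardCharsets.UTF_8);
                } else {
                    buf.position(buf.position() + len);  // skip earlier columns
                }
            }
        }
        return cache[column];
    }

    // Helper to build serialised test data in the same format.
    static byte[] serialise(String... cols) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        for (String s : cols) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            buf.putShort((short) b.length);
            buf.put(b);
        }
        byte[] out = new byte[buf.position()];
        buf.rewind();
        buf.get(out);
        return out;
    }
}
```

A caller that confines each RowReader to a single thread pays the parsing cost only for the columns it actually touches.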

Sunday, January 30, 2011

Next version of SimpleDBM

After a break, I have resumed work on creating SimpleDBM V2. This version is mostly a refactoring of the existing codebase to ensure:
  1. Better project structure - break up the project into smaller modules.
  2. Simple IOC container for auto wiring the modules.
  3. TypeSystem is now an integral part of the core, so some modules can take advantage of the Row structure; in the current version, the TypeSystem is an add-on component.
  4. Multi-licensed - SimpleDBM V2 will be available under Apache as well as GPL licenses.
  5. Magic number and version info in the SimpleDBM files.
Rather than making enhancements at the same time as refactoring the codebase, I am going to keep changes to a minimum, as I want to get the project restructuring completed by Q1 2011.

Moving the type system into the core is the biggest change in SimpleDBM. I think this will allow some modules to exploit the knowledge about rows and column types, and optimise for performance. Some things that may be possible are:
  • Compression of data in pages
  • Smaller redo log records for data updates
Note that the SimpleDBM V2 code is in a separate Mercurial repository. Changes can be viewed at http://code.google.com/p/simpledbm/source/list?repo=v2.

Sunday, May 02, 2010

A Simple IOC Container

All of the SimpleDBM modules are designed with constructor-based dependency injection in mind. But so far, these dependencies have been manually coded. I want to move away from manually setting up dependencies and was therefore looking for a small IOC container that would serve my needs. Unfortunately, all the available dependency injection frameworks appear to be bloated: PicoContainer used to be small, but is now a 308k jar; SpringFramework with all its dependencies is huge; Google Guice has an Android edition that is 403k. Considering that SimpleDBM itself is about 632k in size, I don't fancy adding external libraries that double or treble its size.

As my requirements are tiny (I only need support for singletons and constructor-based dependency injection) I decided to roll my own. The core of the IOC container is implemented in just three files, consisting of less than 300 lines of code. Here are the links to these files:
Of course it would be nice to reuse other libraries and not have to write my own, but on the plus side, you can't beat the home made solution when it comes to size. As I have also removed the dependency on Log4J, SimpleDBM is now totally self contained with no external dependencies other than the standard libraries that are shipped with JDK 5. I find this liberating.
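To illustrate how small such a container can be, here is a hedged sketch of constructor-based singleton injection. The class and method names are my own, not the actual SimpleDBM files, and the real container surely differs in detail:

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

final class TinyContainer {
    private final Map<Class<?>, Class<?>> bindings = new HashMap<>();
    private final Map<Class<?>, Object> singletons = new HashMap<>();

    // Bind an interface to an implementation (optional; concrete classes
    // resolve to themselves).
    <T> void register(Class<T> iface, Class<? extends T> impl) {
        bindings.put(iface, impl);
    }

    @SuppressWarnings("unchecked")
    synchronized <T> T resolve(Class<T> type) {
        Object existing = singletons.get(type);
        if (existing != null) return (T) existing;
        Class<?> impl = bindings.getOrDefault(type, type);
        // Choose the constructor with the most parameters and satisfy each
        // parameter recursively - classic constructor-based injection.
        Constructor<?> best = null;
        for (Constructor<?> c : impl.getDeclaredConstructors()) {
            if (best == null || c.getParameterCount() > best.getParameterCount()) {
                best = c;
            }
        }
        try {
            best.setAccessible(true);
            Class<?>[] params = best.getParameterTypes();
            Object[] args = new Object[params.length];
            for (int i = 0; i < params.length; i++) {
                args[i] = resolve(params[i]);
            }
            Object instance = best.newInstance(args);
            singletons.put(type, instance);
            return (T) instance;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Cannot construct " + impl, e);
        }
    }
}

// Two hypothetical modules wired purely through their constructors:
class LogManager { }
class LockManager {
    final LogManager logMgr;
    LockManager(LogManager logMgr) { this.logMgr = logMgr; }
}
```

Resolving LockManager transparently constructs and caches LogManager first, so both modules end up as shared singletons.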

Sunday, April 25, 2010

Proposed license boilerplate

Given below is the boilerplate license notice that will be added to SimpleDBM source files from version 2 onwards. It is based upon the boilerplate used by Mozilla.org. Note that I decided to add the LGPL to the mix as well, so that SimpleDBM V2 will be triple licensed. Hopefully that will ensure compatibility with the vast majority of Open Source licenses.

/**
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS HEADER.
 *
 * Contributor(s):
 *
 * The Original Software is SimpleDBM (www.simpledbm.org).
 * The Initial Developer of the Original Software is Dibyendu Majumdar.
 *
 * Portions Copyright 2005-2010 Dibyendu Majumdar. All Rights Reserved.
 *
 * The contents of this file are subject to the terms of the
 * Apache License Version 2 (the "APL"). You may not use this
 * file except in compliance with the License. A copy of the
 * APL may be obtained from:
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Alternatively, the contents of this file may be used under the terms of
 * either the GNU General Public License Version 2 or later (the "GPL"), or
 * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
 * in which case the provisions of the GPL or the LGPL are applicable instead
 * of those above. If you wish to allow use of your version of this file only
 * under the terms of either the GPL or the LGPL, and not to allow others to
 * use your version of this file under the terms of the APL, indicate your
 * decision by deleting the provisions above and replace them with the notice
 * and other provisions required by the GPL or the LGPL. If you do not delete
 * the provisions above, a recipient may use your version of this file under
 * the terms of any one of the APL, the GPL or the LGPL.
 *
 * Copies of GPL and LGPL may be obtained from:
 * http://www.gnu.org/licenses/license-list.html
 */

Monday, April 19, 2010

Multi-licensing

The next version of SimpleDBM will be available under the GPLv2 as now, as well as the Apache License. Dual licensing will allow people to use SimpleDBM in more flexible ways.

Sunday, April 18, 2010

Network client server API released

I am pleased to finally publish the 1.0.18-ALPHA release of SimpleDBM. This release has the following changes:
  • A network client server implementation that allows SimpleDBM to run as a standalone database server to which clients can connect remotely.
  • A sample application that demonstrates the use of the network API. The sample implements a simple discussion forum; front end has been created using Google Web Toolkit.
The version 1.x codebase is now going into maintenance mode, as I am not going to add any new features to it. I will start work on version 2.x, which will allow me to refactor some of the modules as previously blogged.

Licensing revisited

In a previous post I wrote about why I preferred GPL license for SimpleDBM. But I am no longer sure; my intention was always to ensure that SimpleDBM can be used by anyone without worrying about licensing issues, and I have no desire to put restrictions on other people's work. So if someone enhanced SimpleDBM, they should be free to do whatever they like with their enhancement, and although it would be nice if they contributed back, I don't insist on it. But this philosophy is very different to the GPL, which asserts that any enhancements should also be GPL.

I also did not fully understand the restrictions that GPL poses on linking with another library. Users of SimpleDBM should definitely not have to change their license or adopt GPL just to be able to use SimpleDBM in their applications.

I am seriously considering changing the SimpleDBM license to some other; probably Apache Version 2.

Saturday, April 10, 2010

Roadmap

I have been thinking about how SimpleDBM should evolve. When I started the project my intention was to eventually add support for SQL, but now I can't see this happening in the near term. SimpleDBM is not aimed at competing with other SQL databases; SQL is nice because of the ease with which tables can be queried, joined, etc., but implementing an SQL layer is quite a lot of work, which I am not able to put in right now.

Another subject that has interested me right from the beginning is multi-version concurrency. Unfortunately, I have not really found a way of implementing this which is satisfactory. The two main approaches are those taken by Oracle and PostgreSQL - which I have previously compared in a short paper. The Oracle approach is problematic because it requires page level redo/undo in the transaction system; SimpleDBM's BTree implementation uses logical undo, allowing for undo to be applied to a different page from the original. I do not like the PostgreSQL approach to MVCC either, as it does not support versioning in indexes.

Instead of adding large features such as above, I shall perhaps focus on the many smaller changes that I have been mulling over for some time now:
  • Refactor the code so that the modularity of SimpleDBM can be exploited better. This involves separately packaging the API from the implementation, and allowing implementations of individual modules to be easily swapped in.
  • Add statistics gathering so that useful metrics can be captured. Some work has been done in this area.
  • Improve performance and scalability of the lock manager.
  • Add support for sequences.
  • Add support for reverse indexes.
  • Add JMX monitoring capability.
  • Refactor the type system - and make the type system a first class component of the core engine. This needs a bit of explanation. When I first started SimpleDBM, my strategy was that the core database engine should be typeless, and should allow a type system to be plugged in. The core engine should treat records and keys as blobs of data, and not worry about their internal structure. This strategy allowed me to develop the core engine without first having to define a type system. However, it has meant that some things are less efficient - for instance, row updates cause the entire before and after images of the row to be logged. Another area of concern is the ability to compress data within pages, which is hard to do without some knowledge of the structure of the data inside the records.
  • Carry forward the work of making most types immutable, include row types. A row builder class can be provided to create rows, but once constructed the row should be immutable. 
  • Carry on improving the documentation.
  • Improve the test cases, and the test coverage.
  • Create a single threaded version which can run on small devices.
  • Add support for nested readonly transactions; these are useful for carrying out foreign key checks, should they be added in future.
  • Ensure the embedded and network API are interchangeable, and that clients can swap between the two without having to change any code. At present, the network API is completely separate from the embedded API.
  • Create a full blown sample application - work is ongoing to create this.
  • Try to raise awareness about SimpleDBM and build a community of users and developers.

    Tuesday, April 06, 2010

    Sample Network Application

    It is taking longer than I anticipated to create a sample application. The main hurdle has been mastering Google Web Toolkit well enough to create the user interface. I started from the sample mail application in the GWT distribution, but the code is increasingly diverging from it.

    First, here is a screenshot from the web UI. Apologies for the rough edges; I am not a UI developer, and building user interfaces is a chore for me.


    The basic UI is working - I have a stub server application waiting to be hooked up with the backend.

    The UI is built using the MVP paradigm, except that I don't use an EventBus, as I am the sole developer, and the added complexity of a bus, and associated event mechanisms is not warranted. I have a RequestProcessor class that handles the presentation logic.

    I have been thinking about how to create the primary keys of some of the tables. I have settled on a special table that will hold sequences; each sequence has a name and a long value. As reverse indexes are not yet supported in SimpleDBM, I came up with the idea of a decreasing sequence: by accessing the data in increasing order, newer data appears before older data. It goes to show that we can live with almost any limitation; a bit of thinking yields a solution!

    As sequences do not need to be rolled back ever, the sequence generator can execute its own small transaction whenever the sequence needs decrementing. To make things efficient, we can allocate chunks of sequences at a time, but for now, I will simply decrement one at a time.
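The decreasing-sequence trick can be sketched as follows. The class name is my own, and the in-memory AtomicLong is a simplification: in the real application the current value would live in the sequences table and each decrement would run in its own short transaction.

```java
import java.util.concurrent.atomic.AtomicLong;

// Keys are allocated by counting DOWN from a large starting value, so an
// ascending index scan visits the newest rows first.
final class DescendingSequence {
    private final AtomicLong next;

    DescendingSequence(long startValue) {
        this.next = new AtomicLong(startValue);
    }

    // Gaps after a crash are harmless, since sequences never need to be
    // rolled back.
    long nextKey() {
        return next.getAndDecrement();
    }
}
```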

    There is also nothing like really using a system to discover bugs. I found that the Long column type was missing functionality to set a Long value!

    Sunday, March 21, 2010

    Network client/server update

    I am still testing the new network client server implementation. Apologies for the delay in publishing this enhancement.

    I am also working on a simple sample web application to illustrate the network API. Learning all about GWT (Google Web Toolkit) which is a lot of fun.

    Sunday, December 20, 2009

    Network client server SimpleDBM

    I am happy to report that an initial implementation of the network client server has been checked in. This will allow SimpleDBM clients to use a thin client library that connects to a remote SimpleDBM server.

    The client interface is a simplified version of the Database API.

    The network protocol is based on a simple request/reply model. The messages are passed in the SimpleDBM serialization format.

    I looked at the possibility of using a third-party NIO framework for SimpleDBM. However, all the existing frameworks seemed unnecessarily complicated, owing to their desire to be generic. In the end, I wrote my own NIO server. It was easier than I thought it would be, though it is early days and I have not tested all the failure scenarios yet.

    There is no attempt to optimise the network traffic in the current implementation. The main goal right now is to prototype the client interface and get it to a satisfactory level. Optimisation can happen once the interface is stable.

    The current implementation does not have any security constraints; security must be handled by the client application.

    An example of how the client interacts with the server is shown in the snippet below:


    Properties properties = parseProperties("test.properties");
    // start a session
    SessionManager sessionManager = new SessionManager(properties, "localhost", 8000);
    TypeFactory ff = sessionManager.getTypeFactory();
    Session session = sessionManager.openSession();
    try {
        // create a table definition
        TypeDescriptor[] employee_rowtype = {
            ff.getIntegerType(),   /* pk */
            ff.getVarcharType(20), /* name */
            ff.getVarcharType(20), /* surname */
            ff.getVarcharType(20), /* city */
            ff.getVarcharType(45), /* email address */
            ff.getDateTimeType(),  /* date of birth */
            ff.getNumberType(2)    /* salary */
        };
        TableDefinition tableDefinition = sessionManager.newTableDefinition(
            "employee", 1, employee_rowtype);
        tableDefinition.addIndex(2, "employee1.idx", new int[] { 0 }, true, true);
        tableDefinition.addIndex(3, "employee2.idx", new int[] { 2, 1 }, false, false);
        tableDefinition.addIndex(4, "employee3.idx", new int[] { 5 }, false, false);
        tableDefinition.addIndex(5, "employee4.idx", new int[] { 6 }, false, false);
        // create table
        session.createTable(tableDefinition);
        // now lets insert/update a row
        session.startTransaction(IsolationMode.READ_COMMITTED);
        boolean success = false;
        try {
            Table table = session.getTable(1);
            Row tableRow = table.getRow();
            tableRow.setInt(0, 1);
            tableRow.setString(1, "Joe");
            tableRow.setString(2, "Blogg");
            tableRow.setDate(5, getDOB(1930, 12, 31));
            tableRow.setString(6, "500.00");
            table.addRow(tableRow);
            TableScan scan = table.openScan(0, null, false);
            try {
                Row row = scan.fetchNext();
                while (row != null) {
                    System.out.println("Fetched row " + row);
                    row.setString(6, "501.00");
                    scan.updateCurrentRow(row);
                    row = scan.fetchNext();
                }
            } finally {
                scan.close();
            }
            success = true;
        } finally {
            if (success) {
                session.commit();
            } else {
                session.rollback();
            }
        }

        // now delete all the rows
        session.startTransaction(IsolationMode.READ_COMMITTED);
        success = false;
        try {
            Table table = session.getTable(1);
            TableScan scan = table.openScan(0, null, false);
            try {
                Row row = scan.fetchNext();
                while (row != null) {
                    System.out.println("Deleting row " + row);
                    scan.deleteRow();
                    row = scan.fetchNext();
                }
            } finally {
                scan.close();
            }
            success = true;
        } finally {
            if (success) {
                session.commit();
            } else {
                session.rollback();
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        session.close();
    }

    Sunday, October 11, 2009

    Update on Data Dictionary implementation

    I described the implementation of the data dictionary in a previous post. I have updated the implementation to include support for dropping tables.

    I stated in my last post that the creation of the table definitions would be handled as post-commit actions. In the end this wasn't necessary, and the table creation is logged against page 0 of the virtual container 0. This container is pre-created with a single page when a new database is created, which allows arbitrary actions to be logged against page (0,0). Logging as a post-commit action would not have worked, as the definitions must be created before any container that depends on them is created.

    I also fixed issues with the logging of the table and container creation so that these are now logged as undoable log records, and in the event the transaction aborts, the table creation is undone. I will describe the changes in a separate post.

    The action to drop the table definition is handled as a post commit action. This is because we want to avoid dropping any object until we are sure that the transaction will be committed.

    Saturday, July 04, 2009

    From Subversion to Mercurial

    The SimpleDBM source code repository has been migrated from Subversion to Mercurial. For a somewhat rude and over-the-top presentation on the benefits of distributed version control systems over traditional centralized systems, see Linus Torvalds' presentation on Git.

    Sunday, April 12, 2009

    What next?

    The SimpleDBM core database engine has been out for a while, and although I am still working on improving the robustness of the code, writing more test cases, etc., it is also time to start thinking about what next.

    A couple of areas interest me.

    A simple SQL layer so that it becomes easier to write code against the database. Unlike other projects, I am not interested in a full-blown SQL implementation, just enough to cut down the procedural code one must write. My current thoughts are towards a very basic set of SQL statements to manipulate data from just one table. No joins, etc.

    A network server/client so that instead of embedding SimpleDBM inside the application it can be accessed remotely. Embedded implementations are hard to make use of in a multi-user environment. I am looking at projects like Netty and MINA as the underlying TCP/IP framework so that I can concentrate on the functional aspects rather than writing all the network stuff.

    Tuesday, March 31, 2009

    Dwarfs standing on the shoulders of giants

    As I mentioned in a previous post, when building SimpleDBM I have used design ideas that I have picked from various technical research papers as well as other OpenSource projects. I would like to list some of the influences here, and acknowledge my indebtedness.

    Papers from IBM's System-R research project have influenced the role of the database engine in SimpleDBM. I have named the core engine as RSS in honor of the System-R Relational Storage System (RSS).

    A general reference that has proved invaluable is the classic Transaction Processing: Concepts and Techniques, by Jim Gray and Andreas Reuter. The Lock Manager and the Buffer Manager modules are based upon descriptions in this book.

    The Lock Manager also uses ideas from the OpenSource project Shore. The handling of lock conversions and the deadlock detector are based upon code from this project.

    The Transaction Manager is based upon the algorithms published in the ARIES series of papers by C. Mohan of IBM. I am also indebted to Mohan for the implementation of lock isolation modes, and implementation of free space management in containers.

    The BTree implementation is based upon the algorithms described by Ibrahim Jaluta et al in the paper Concurrency control and recovery for balanced B-link trees.

    The idea of using the write ahead log records for normal execution of forward actions is taken from Apache Derby. This is a neat idea that ensures that both normal and recovery processes use the same code for performing updates to the database. Essentially all updates are executed through log records, even during normal execution.
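A toy illustration of the idea, with invented names rather than Derby's or SimpleDBM's actual interfaces: the same redo() method mutates the page during normal execution and again during recovery replay, so both paths exercise identical code.

```java
import java.util.ArrayList;
import java.util.List;

// A log record that knows how to apply its own change to a page.
interface Redoable {
    void redo(byte[] page);
}

final class SetByteLogRecord implements Redoable {
    final int offset;
    final byte value;

    SetByteLogRecord(int offset, byte value) {
        this.offset = offset;
        this.value = value;
    }

    public void redo(byte[] page) {
        page[offset] = value;
    }
}

final class MiniLog {
    private final List<Redoable> records = new ArrayList<>();

    // Normal execution: write the log record first, then apply it to the page.
    void logAndApply(Redoable rec, byte[] page) {
        records.add(rec);
        rec.redo(page);
    }

    // Recovery: replay the same records against a fresh copy of the page.
    void recover(byte[] page) {
        for (Redoable rec : records) {
            rec.redo(page);
        }
    }
}
```

Because every update goes through a log record's redo(), recovery cannot drift out of sync with the normal update path.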

    Projects that I have learned from but that haven't directly influenced the implementation include:

    Delayed release

    The next BETA release of SimpleDBM has been delayed due to my desire to re-factor some of the code. The re-factoring has been focused on the following:
    • Removing all singletons - even Loggers. Unfortunately, Log4J and other logging packages all use singletons, so this is not completely possible unless I replace the entire logging package.
    • Ensuring that the serialization mechanism uses constructor based object initialization rather than retrieving state after construction.
    • The above enables the use of final fields in many objects, allowing them to be immutable.
    All of these changes are designed to increase robustness, and also to ensure that multiple instances of SimpleDBM can co-exist within the same JVM, and even the same Class Loader, without conflict.

    Saturday, August 23, 2008

    A History of SimpleDBM

    The SimpleDBM project started as the DM1 project in August 2001. Initially, I started writing the database code in C. A year later, I ported the code to C++. Here is what I wrote at the time about my switch to C++:
    I decided to port the product to C++ primarily for following reasons:
    • I wanted to use the automatic construction and destruction facility in C++. I think this is one of the coolest features of C++. As an aside, I like Java's finally solution better than C++ destructors because with finally, I am able to control the destruction of objects better.

    • I was apprehensive about using C++, because I find that C++ is a large and complicated language. In order to keep the DM1 Threads code simple, I made some decisions that are sure to be controversial:

    • I do not use the C++ standard library, including the STL, at all. This is because I do not like the code bloat that results from its use. I also do not like the idea of using the heap for creating simple objects like strings. I think that C programs are generally more efficient than C++ programs, because they are much more circumspect about using the heap. One advantage of not using the C++ standard library is that it makes DM1 Threads more portable.

    • I do not use templates other than in a very small way.

    • I use sparingly or not at all, certain features of C++ language, such as multiple inheritance, operator overloading, new style typecast operators, etc. This is because I feel that none of these features are essential (Java does without them nicely).

    • I think that Java is a better language than C++, but the lack of certain features (efficient array handling, pointers, etc.) makes it unsuitable for a project like DM1. In many ways, I use C++ as a better Java. C# would have been a good choice, but due to its lack of availability on UNIX platforms, I was unable to use it.
    Progress over the next few years was slow. This was partly because there were long periods when I wrote nothing, and partly because I had to do research for the project. I am a self-taught programmer with no formal degree in Computer Science. I have also never worked for a database vendor. Implementing a database had been an ambition of mine for a long time. I had already implemented a single-user persistent storage system in C++, but it was nowhere near a real database system with transactions.

    I was helped tremendously by reading code of other Open Source systems such as Shore and PostgreSQL. Another source of information was the research published by ACM and also by IBM researcher C. Mohan. I became a member of ACM in 2001 just so that I could access all the research.

    There weren't and still aren't any books that go into the nitty-gritty of implementing relational databases. The only book that came remotely close to discussing some of the details of a database engine was the book Transaction Processing: Concepts and Techniques, by Jim Gray and Andreas Reuter. This was a life saver.

    By 2005 I had become very frustrated with C++. I found that the IDEs available for C++ were no match for the Java IDEs. Even simple refactoring of code was pretty onerous.

    Then IBM announced in 2004 that they were open sourcing Cloudscape. I was very excited because here was a database that was written in Java, and that performed quite well. Another key event was the release of Java 1.5 (Java 5.0). Finally there was a version of Java that had the locking primitives that I needed for my database project. I procrastinated for a while because I did not fancy porting all the code I had written in C++ to Java.

    Fortunately (for the project), the company I was working for at the time, got taken over. There was a period of inactivity as we awaited the fate of our IT department. Having not much to do at work meant that I had spare time and was able to use this time to port the code to Java. This was in late 2005, which was the most fruitful six months I spent on the project.

    I completed most of the core modules of SimpleDBM in the six months between July and Dec 2005. Since then I have spent more time testing and refactoring, and added the TypeSystem and Database API modules. After I changed jobs in 2006, progress has been sporadic. There was a period of a year and a half when I was so busy at work that I was taking work home every day; I was unable to spend any time on SimpleDBM.

    Early in 2008, I was approached by an ex-database entrepreneur who suggested that I should do some work on Apache Derby. I thought of ditching SimpleDBM to work on Derby. Unfortunately, this relationship did not work out. Now I think that if I am working in my own free time, it is more fulfilling to work on SimpleDBM, as everything I create is my own. I have considerable freedom, and the ability to work on areas that interest me. I would very much like to work on Apache Derby but cannot afford to do so unless it is a paid job.



    Wednesday, July 09, 2008

    SimpleDBM release 1.0.8 BETA is out

    At long last, I have put together the 1.0.8 BETA release. This contains the experimental TypeSystem and Database API module.

    Thursday, July 03, 2008

    Update on SimpleDBM's Data Dictionary

    I have chosen to implement option 3 as described in a previous post. For each table, a special container is created, which is used to store not only the table structure but also structures of all associated indexes. Special containers are created for indexes as well but these contain pointers to the table structures. (The reason we need the index definition containers is that during recovery an index update may be performed prior to a table update.)

    A memory cache is maintained of the entire data dictionary. When a new table is created, its structure is immediately saved into the memory cache. The structure is also persisted using a PostCommitAction - a type of redo-only log record that is executed once the transaction commits.
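A minimal sketch of the post-commit mechanism, with invented names (not the actual SimpleDBM interfaces): actions accumulate during the transaction and run only if the commit succeeds, which is why it suits redo-only work such as persisting dictionary entries.

```java
import java.util.ArrayList;
import java.util.List;

final class MiniTransaction {
    private final List<Runnable> postCommitActions = new ArrayList<>();

    // Called during the transaction; nothing is executed yet.
    void schedulePostCommit(Runnable action) {
        postCommitActions.add(action);
    }

    // Actions run only once the commit has succeeded.
    void commit() {
        for (Runnable a : postCommitActions) {
            a.run();
        }
    }

    // On rollback the scheduled actions are simply discarded.
    void abort() {
        postCommitActions.clear();
    }
}
```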

    At system restart, the storage system is scanned for containers that have a specific naming convention and are in a well known location. This way, SimpleDBM is able to initialize the memory cache at startup. Note that this initialization occurs post recovery. The list of open containers is consulted to ensure that only those definitions are loaded that are relevant.

    There is a problem that during recovery, any redo/undo operation on a table or index requires access to the table/index definition. Since the data dictionary memory cache is not available at this stage (it is populated after recovery), the solution is to retrieve the definitions on demand.

    The table/index creation process uses a standalone transaction. During this time the container IDs are exclusively locked. This ensures that by the time any subsequent transaction performs a data manipulation operation against the table, the data dictionary memory cache has been updated to contain the table definition.

    At present tables cannot be dropped, as this has not yet been implemented.

    Along with the Data Dictionary implementation, there is also a high-level Database API available. This hasn't been released yet as it is still being tested. I am hoping that a fully tested version will be available towards the latter part of this year.