Sunday, August 28, 2005

More on Transaction Manager

After many hours of debugging, I finally have a version of the Transaction Manager that successfully handles the BitMgr Test. The BitMgr is modeled after the OneBit Resource Manager described in Transaction Processing: Concepts and Techniques. It uses a single data page, containing a set of bits that can either be on or off. Despite the simple design, the One BitMgr allows many of the features of a transactional system to tested.

Although the ARIES algorithm is described in great detail by its inventors, there are still a few areas that are alluded to but not fully described in the paper. For example:

  1. A post commit action is something that must be done after a successful commit. An example of a post commit action is the deleting of a container after it has been dropped. Since deleting a container is not recoverable, it is necessary to defer this action until it is definitely known that the transaction will commit. The challenge is how to ensure that post commit actions are properly executed, despite failures that may disrupt the commit processing.
  2. ARIES uses the LSN within a page to track the status of an update. This does not map very well to actions that are not necessarily related to a page, and also when they need to be redone unconditionally at system restart. As an example, when a container is created, the system should log it in such a way that at restart the container is recreated if it was not successfully created before. The way to do this is to maintain the status of the container in a page. However, this page also must reside in some container, so we have a recursive situation. Another example is the action of opening a container, which must be redone at system restart unconditionally.
  3. If a transaction has rolled back to a Savepoint, then any locks acquired after the Savepoint was established can be released. However, how can we determine which locks are safe to delete? ARIES suggests using the LSN as the Savepoint marker, however, this cannot be used to determine which locks are safe to remove. For instance, read locks are not related to LSNs.
  4. The transaction manager interacts with many other modules, and it is a challenge to ensure high concurrency, and deadlock avoidance between it and other modules.

Tuesday, August 23, 2005

SimpleDBM's Transaction Manager

This week I am working on the Transaction Manager for SimpleDBM. The Transaction Manager is based upon the well known algorithm known as ARIES. There is a freely available paper at ARIES inventor C.Mohan's website that describes ARIES in quite a lot of detail.

I wrote a short description of ARIES, which can be found at Apache Derby website.

ARIES is based upon the recovery algorithms used in IBM's original Database 2 (DB2) product. A good description of the DB2 recovery method can be found in the paper Data Recovery in IBM Database 2, by R.A.Crus. The main contribution of ARIES is that during system restart, failed transactions are also redone, as opposed to redoing only active/committed transactions. This simplifies the subsequent recovery.

Confusion with terminology and Latch implementation

The terms Lock and Log have very specific meaning in a DBMS. A DBMS Lock is a dynamically allocated synchronisation primitive, supporting multiple lock modes, used to manage concurrent access to Table rows. Locks are allocated as part of a transaction and retained until the transaction commits. Unfortunately, the term Lock is also used in a programming language like Java to represent Monitors and Mutexes, for example, Java 5.0 uses names like ReadLock, WriteLock, ReentrantLock, and ReentrantReadWriteLock. In DBMS terms, these are not locks. The preferred DBMS term for such primitives is the Latch.

In a DBMS, the term Log means the Write Ahead Log, used for logging all changes within the DBMS. To confuse matters, programming languages uses logging toolkits for generating trace messages. A well-known example is the Log4J toolkit.

Today, I came across a paper called Concurrency control and recovery for balanced B-link trees, by Ibrahim Jaluta, Seppo Sippu, and Eljas Soisalon-Soininen. I am pretty excited about this paper as I am going to work on the BTree implementation for SimpleDBM soon, and have been considering various alternative algorithms. I notice that this paper assumes that Latches support three different lock modes - Shared, Update and Exclusive. Another requirement is to support upgrades from Update to Exclusive and downgrades from Exclusive to Update latches. These requirements have led me to create my own Latch implementation, as the Java ReentrantReadWriteLock I have been using until now only supports Shared and Exclusive lock modes. My implementation uses the new LockSupport class available in Java 5.0, which allows a Thread to be safely suspended and then resumed. I toyed with the idea of building my Latch implementation using the AbstractQueuedSynchronizer class, but must admit that the working of this class is beyond my comprehension.

Sunday, August 21, 2005

Fine grained locking and Linked Lists

The Buffer Manager in SimpleDBM uses a number of different locks in order to obtain fine-grained locking. The assumption is that with fine-grained locking, concurrency will be higher and the system will scale better with larger volumes and greater number of CPUs. However, whether this is actually achieved depends largely on the efficiency of the locks. It would be interesting to create a different version of the Buffer Manager that uses coarse locking, and then compare the performance of the two. I suspect that the benefits of fine-grained locking will only show when the buffer pool is large.

Fine grained locking does have couple of issues. It makes your code more complicated and difficult to maintain and debug. It also increases memory usage. To give you an example, in the Buffer Manager, we have following locks.

  1. A ReentrantReadWriteLock to protect access to LRU chain.
  2. A ReentrantReadWriteLock for each cached page.
  3. A ReentrantLock for each Buffer Control Block.
  4. A ReentrantReadWriteLock for each bucket in the Hash Table.

If we have a buffer pool of 500 pages, and a hash table of 193 buckets, then the total number of lock objects would be:

1 + 500 + 193 = 694 ReentrantReadWriteLocks
plus 500 ReentrantLocks.

I hope that the java.util.concurrent.lock package is upto this type of usage.

One problem in the Buffer Manager implementation is the single global LRU lock. It would be a good idea to split the global LRU list to multiple lists, each protected by its own lock.

Linked Lists

Linked Lists are used heavily in the Buffer Manager. The LRU list is a linked list. Each of the hash buckets has its own linked list.

The problem with the standard linked lists is that the remove operation requires a search within the list. This did not strike me until recently, when I was considering how to implement LRU placement hints. The larger the list grows, the longer it will take to search for the item to be removed. Since hash chains are expected to be small, maybe it is not an issue there. The LRU chain, however, is expected to be long. A possible solution (ugly) would be to implement some custom linked list code for the LRU list. Sigh.

Friday, August 19, 2005

Buffer Manager implementation

Just completed the initial port of the Buffer Manager. Also checked in all the source code in the UNSTABLE_V100 branch.

Sunday, August 14, 2005

Buffer Manager coming soon

This week I am working on porting the Buffer Manager module. I expect to upload the code for it by the end of the week.

Why Java?

I had started the DBMS project DM1 back in 2002 using C++. I said then that I had chosen to code DM1 in C++ because of lack of certain features in Java, such as pointers and efficient array handling. It seemed to me that a DBMS needed a language that allowed full control of memory.

SimpleDBM is a pure Java project. So what has triggered the change from C++ to Java?
  1. Firstly, I no longer think that in the scale of things, absolute control of memory is that important, unless the DBMS is being targetted at an embedded environment and must use as little memory as possible. The trade-off is much simpler memory management. In SimpleDBM, I rely completely on Java's garbage collection facilities.
  2. My productivity in Java is far higher. This is not only due to a better designed language, but also because of the availability of free IDEs such as Eclipse, which make development so much faster and pleasurable.
  3. I tried using the C++ try/catch construct and destructors for proper cleanup of resources. My experience is that it is unworkable in large projects. Java's concept of finally block combined with garbage collection of memory, is a better and neater option. The finally block provides control of the resource cleanup, whereas C++ destructors don't.
  4. The biggest problem I had with Java was the lack of adequate primitives for concurrency control. The default synchronisation mechanisms were simply not adequate for the type of fine-grained concurrency I wanted to have in the DBMS. Fortunately, due to wonderful efforts of Doug Lea and others, Java 5.0 has new concurrency packages that solve my problem. There are still a few issues, I think, but on the whole, if I had to pick the single most important factor in enabling my shift to Java, I would say this is it.
  5. Java is simply a better language than C++. Why I think this is so will have to the subject of another post, but in a few words, Java's strengths are smaller and simpler language, support for interfaces (a feature that was proposed for C++ and rejected), built-in support for Threads and concurrency, garbage collection, finally blocks, dynamic loading and binding, checked Exceptions.
  6. A testamount to the superiority of Java is that the ported version is superior to the original C++ version, with better separation of modules, greater concurrency.

Welcome

Welcome to the official weblog for the SimpleDBM project. SimpleDBM is a new project that aims to implement relational database manager in Java. Please visit the SimpleDBM website for further details about the project.