Tuesday, November 15, 2005

Design choices - flexibility versus performance part 2

When designing the interface for the BTree module, there are some conflicting priorities.

An important goal is to ensure that the interface is reusable and makes as little assumptions about the content of "keys" and "values" as possible. Some of the points to consider are:

a) Should the BTree module be aware of the type system available in the system?
b) Should it be aware of how multiple attribute keys are handled?

I chose to keep the level of abstraction higher. Therefore in my implementation of the BTree module, the module has no knowledge of what is contained in the key, whether it is made up of single or multiple attributes, etc. The advantage is that a key can be anything that satisfies a relatively simple interface defined by the system. The disadvantages are:

1) The system cannot perform specific performance optimisations that would be dependent upon the knowledge of the attribute structure and type system. For example, the system cannot perform compression on the key structure.
2) The system cannot support search operators at an attribute level. When searching and fetching keys and values, only one composite operator is supported - >=. That is, the fetch algorithm knows how to obtain the next key that is equal or greater than the specified key.

On the whole, I am happy with the interface as it provides greater reusability. It allows the BTree module to be independently useful.

A problem I have been grappling with recently is how to support data driven construction of keys, without breaking the high level interface defined so far. In a DBMS system, the definition of the keys is stored in System Catalogs. There has to be a way by which the key definition can be used to generate keys. In the current design, the BTree module expects to be given a factory class for instantiating keys. I thought of creating a Key factory that would read a key definition and then generate keys based upon the definition. However, the problem is that such a key factory would be capable of generating many different types of keys, whereas the current BTree interface expects a one to one relationship between the key factory and the type of key. I have come to the conclusion that I need to enhance the system in two ways:

Firstly, I need to support one to many relationship between the key factory and the key. Since the BTree instance is tied to a specific type of key, it therefore needs to be enhanced to supply an "id" that enables the key factory to generate the specific type of key required by the BTree instance. This means that instead of invoking:

key = keyFactory.getNewInstance()

the BTree instance would invoke:

key = keyFactory.getNewInstance(myid)

The key factory would be able to use the "id" to determine the type of key required.

The second enhancement I need is to do with performance. At present, the Object Registry always generates new instances of objects, which would be inefficient for a key factory that needs to maintain its own internal registry. I therefore need to enhance the registry to support Singletons - key factories need to be cached in the registry so that the same factory can be used by multiple BTree instances.

My final point about the tradeoffs between flexibility and performance is to do with storage structure of keys and log records. I have tried to make the storage structures self describing. This means that the keys and values, as well as the log records, must persist sufficient information to be able to reconstruct their state when required. In a multi-attribute key, for example, this means that each attribute must store its type information, data length, etc. along with the data. This naturally increases the storage space requirement of entities. The benefit is that the system does not require external support to determine how to read data back from persistent storage. For example, the BTree module does not require the System Catalalog to be available, it has sufficient information to be able to read and write keys to persistent storage.

No comments: