Alan Gutierrez

Alan Gutierrez blogs on software, social networks, and himself.

Subscrive Via RSS Feed
« Writing and Programming memorizable.org »

B-Tree

An oak at The University of South Carolina-Columbia by David.

I’m implementing a B-Tree in Java. The project is called Strata. You can view the source, is is in the strata subdirectory of my subversion repository.

I’ve yet to choose a license.

This tree is the first indexing option in Momento, which is a large file XML database that I wrote last year, when I was in Ann Arbor.

Getting things done in Java is a challenge. There are many ways to get destracted.

Strata will index the normalized text of a specific node, and do so without concern about range queires. The only query supported at the outset is equality.

Then I can use Lucene to do full text indexing. In order for Lucene to be effective, you see, it must be a simple matter to retrieve the document. Indexing unique atom identifers, would give a key to fetch the document resulting from a Lucene query.

The research that I’m doing on B-Tree implemetnation can be tracked with my del.icio.us account, under the tag strata.

Thus, the objectives for Strata are as follows.

  • Leaf nodes only to store references to file positions, not store the data itself.
  • Branches also store only references to file positions, and hold a list of objects read from those file positions to use as keys.
  • Thread-safe.
  • File backed, paged.
  • Supports efficent insertion and deletion, stays balanced.
  • Equality comparison query.
  • Compares byte buffer ranges, not objects.

At some point, it may be better to keep partial keys in the tree itself.

Update: I’ve written a description of the tree structure at my user page on the New Orleans Wiki, under a Software Development page.

Leave a Reply