author: Sophie Lathouwers
title: Automatic Maintenance and Storage Optimization for Lazy Functional Databases
keywords: laziness, functional programming, graph rewriting, databases, parallelism, concurrency
topics: Algorithms and Data Structures
committee: Lesley Wevers
started: February 2015
end: July 2016

Description

We are investigating the use of lazy evaluation to reduce the blocking behaviour of transactions in database systems. The main idea is that a transaction can be committed before it is executed, and its execution can be deferred until the data that it affects is required (for instance, to show a user the result of a query). One application of this approach is deferring constraint checking on database updates, to allow faster updates. However, one of our main goals is to use laziness to eliminate blocking in large bulk operations. The idea is to split large transactions into many independent parts, which can each can be executed on demand to allow fast access to small parts of the database. This can for instance be used to perform online non-blocking schema transformations, which are blocking operations in current database systems.

In conjunction to laziness, we use the idea of purity from functional programming. I.e., a database is a value that cannot change. Instead, an update creates a new version of the database from an old version. The benefit of this approach is that database states are immutable, which allows read transactions to operate on database states without interfering with write transactions. This approach can be implemented efficiently using a graph reduction approach, which allows sharing of memory and computations between different versions of the database.

To perform lazy evaluation of large bulk operations in a purely functional setting, we perform lazy operations by 'pushing' them down a tree-structured index. For instance, map(f, branch(left, right)) rewrites to branch(map(f, left), map(f, right)). Every rewrite returns a tree node that the next transaction can use to begin processing. In the leaf nodes of the tree, we store data in bulk data structures, this allows efficient storage, and allows fast bulk processing of data. 

Challenges

Over time, as a result of transactions, the physical layout of data in memory can deteriorate, and lazy operations can build up in the tree without ever being executed. We would like to automatically optimize the data layout using idle CPU time. In particular, we would like to: - Force the execution of operations in the tree, to avoid a large build-up of lazy operations. - Merge (nearly) empty leaf nodes, which can be produced for instance after a filter operation. - Re-order record identifiers to optimize for storage in dense arrays.

The main challenge is to do all of this without blocking access to the database, and where possible, do this in parallel. An additional complication is that we don't want to waste time maintaining versions of the database that are not used anymore, e.g., one time query results.

Goal and Tasks

The goal of your project is to find an efficient approach to maintain lazy databases: - Investigate laziness, graph rewriting and the problems of lazy database maintenace. - You have to come up with a solution that satisfies the requirements. For instance, a potential solution could be to perform maintenance tasks in small batches. - Build an implementation of your solution. - Perform experiments to see how well your solution works, and where there is room for improvement. - Investigate how the maintenance operations can be executed in parallel using idle CPU time.

Other Projects

If you are interesting in this area, but not in this particular project, or you have your own idea for a project, feel free to contact me.

References

Additional Resources

  1. The paper