Berkeley DB Reference Guide:
Berkeley DB Transactional Data Store Applications


Architecting Transactional Data Store applications

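When building transactionally protected applications, there are some special issues that must be considered. The most important one is that if any thread of control exits for any reason while holding Berkeley DB resources, recovery must be performed to recover the Berkeley DB resources, to release any locks or mutexes that may have been held so that the remaining threads of control do not starve waiting behind them, and to resolve any partially completed operations that may have left a database in an inconsistent or corrupted state.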

The Berkeley DB library cannot determine whether recovery is required; the application must make that decision. Furthermore, recovery must be single-threaded; that is, one thread of control or process must perform recovery before any other thread of control or process attempts to join the Berkeley DB environment.

It greatly simplifies matters that recovery may be performed regardless of whether recovery needs to be performed; that is, it is not an error to recover a database environment for which recovery is not strictly necessary. For this reason, applications should not try to determine if the database environment was active when a thread of control failed or the system rebooted. Instead, applications should run recovery each time a thread of control accessing a database environment fails, as well as before accessing any database environment after system reboot.

There are three ways to architect transactional Berkeley DB applications. The choice usually depends on whether the application consists of a single process or a group of processes descended from a single process (for example, a server started when the system first boots), or of unrelated processes (for example, processes started by users logged into the system).

  1. The first, and simplest, way to architect transactional applications is as a single, usually multithreaded, process.

    When the application starts, it opens and potentially creates the database environment, runs recovery, and then opens its databases. From then on, the application can create new threads of control as it chooses. The threads of control can either share Berkeley DB DB_ENV and DB handles, or have their own. In this model, databases are rarely opened or closed when more than a single thread of control is running; that is, they are opened when only a single thread is running, and closed after all threads but one have exited. The last thread of control to exit closes the databases and the environment.

    This architecture is simplest because it requires no monitoring of other threads of control. No cleanup is required if the process or system fails; the application can simply be restarted, because it runs recovery every time it starts.

  2. The second way to architect transactional applications is as a group of related processes (the processes may or may not be multithreaded).

    Because recovery must be single-threaded, this architecture requires controlling the order in which threads of control are created and subsequently access the Berkeley DB environment. The first thread of control to access the environment must run recovery, and no other thread should attempt to access the environment until recovery is complete.

    In addition, this architecture requires that threads of control be monitored. If any thread of control holding Berkeley DB resources exits without first cleanly discarding those resources, recovery should be performed. Before performing recovery, all threads using the Berkeley DB environment must relinquish all of their Berkeley DB resources (it does not matter if they do so gracefully or because they are forced to exit). Then, recovery can be run and the threads of control continued or restarted.

    The easiest way to structure a group of related processes is to first create a single process (often a shell script) that opens or creates the Berkeley DB environment and runs recovery, and then creates the processes or threads that will actually perform work. The initial process has no further responsibilities other than monitoring the threads of control it has created to ensure that none of them exits unexpectedly. If one does exit, the initial process kills all of the threads of control using the Berkeley DB environment, runs recovery, and restarts the working threads of control.

  3. The third way to architect transactional applications is as a group of unrelated processes (the processes may or may not be multithreaded).

    If it is not practical to have a single parent for the processes sharing a Berkeley DB environment, each process sharing the database environment should record its connection to and exit from the environment in a way that allows a monitoring process to detect whether a thread of control might have acquired Berkeley DB resources and never released them.

    Berkeley DB supports this architecture with the DB_REGISTER flag to the DB_ENV->open method. If the DB_REGISTER flag is set, each process opening the database environment first checks that no other process has failed while holding an open DB_ENV handle. If failure is detected and either DB_RECOVER or DB_RECOVER_FATAL is also specified, recovery will be performed and then the open will proceed normally. If failure is detected and no recovery flag is specified, DB_RUNRECOVERY will be returned. If no failure is detected, DB_RECOVER and DB_RECOVER_FATAL will be ignored.

    For this architecture to work, all applications using the database environment must specify the DB_REGISTER flag when opening the environment. However, there is no additional requirement that the application choose a single process to recover the environment, as the first process to open the database environment will know to perform recovery.

    Specifying the DB_REGISTER flag adds a performance cost to the DB_ENV->open method of roughly one system call per existing open DB_ENV handle. Because DB_ENV handles are relatively long-lived, this should not be a problem for most applications.

    One final consideration when using the DB_REGISTER flag is the type of mutex being used. If the database environment fails while test-and-set mutexes are in use, threads of control waiting on a mutex will eventually notice that the environment has failed and will return an error from the Berkeley DB API. If the environment fails while blocking mutexes are in use, and the underlying system mutex implementation does not unblock mutex waiters when the thread of control holding the mutex dies, threads of control waiting on a mutex can hang forever. Applications using DB_REGISTER on these systems should incorporate some form of long-running timer that kills the process if it fails to finish, or fails to return from the Berkeley DB API, within a reasonable length of time.

    Alternatively, applications can implement their own monitoring. For example, an initial "watcher" process would open or create the Berkeley DB environment, run recovery, and then create a sentinel file. Any other process wanting to use the Berkeley DB environment checks for the sentinel file; if the sentinel file exists, the process registers its process ID with the watcher and joins the database environment. When the process finishes with the environment, it unregisters its process ID with the watcher. The watcher periodically checks to ensure that no process has failed while using the environment. If a process does fail while using the environment, the watcher removes the sentinel file, kills all processes currently using the environment, runs recovery, and re-creates the sentinel file.
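For the first architecture, what matters is the ordering of startup and shutdown. The following C sketch uses POSIX threads to show it. The functions open_env_with_recovery, open_databases, and close_all are hypothetical stand-ins that only record an event; a real application would instead call DB_ENV->open with DB_RECOVER (plus DB_THREAD if threads share handles), DB->open, and the matching close methods. Recovery runs while only one thread exists, and the handles are closed only after every worker thread has been joined.

```c
#include <assert.h>
#include <pthread.h>

/* Events recorded so the ordering of the lifecycle can be inspected. */
enum event { EV_RECOVER, EV_OPEN_DBS, EV_WORK, EV_CLOSE };
enum { MAX_EVENTS = 64, MAX_THREADS = 16 };

static enum event events[MAX_EVENTS];
static int nevents = 0;
static pthread_mutex_t events_lock = PTHREAD_MUTEX_INITIALIZER;

static void record(enum event ev) {
    pthread_mutex_lock(&events_lock);
    assert(nevents < MAX_EVENTS);
    events[nevents++] = ev;
    pthread_mutex_unlock(&events_lock);
}

/* Hypothetical stand-ins for the Berkeley DB open/recover/close calls. */
static void open_env_with_recovery(void) { record(EV_RECOVER); }
static void open_databases(void) { record(EV_OPEN_DBS); }
static void close_all(void) { record(EV_CLOSE); }

/* A worker would use shared (or its own) DB_ENV and DB handles here. */
static void *worker(void *arg) {
    (void)arg;
    record(EV_WORK);
    return NULL;
}

static int run_single_process(int nthreads) {
    pthread_t tids[MAX_THREADS];
    int created = 0, rc = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    /* Only one thread exists here, so recovery is single-threaded. */
    open_env_with_recovery();
    open_databases();

    for (; created < nthreads; created++) {
        if (pthread_create(&tids[created], NULL, worker, NULL) != 0) {
            rc = -1;
            break;
        }
    }
    /* Join every thread that was actually started. */
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);

    /* All workers have exited; the last thread closes everything. */
    close_all();
    return rc;
}
```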
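The three documented outcomes of opening an environment with DB_REGISTER can be modeled as a small decision function. This sketch is not Berkeley DB's implementation of DB_REGISTER: it assumes a hypothetical registry that is simply an array of process IDs, and it treats a registered process as failed if kill(pid, 0) reports that the process no longer exists. The enum and function names are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical outcome codes for the three documented cases. */
enum open_outcome {
    OPEN_NORMAL,         /* no failure: recovery flags are ignored */
    OPEN_AFTER_RECOVERY, /* failure and a recovery flag: recover, then open */
    OPEN_RUNRECOVERY     /* failure and no recovery flag: DB_RUNRECOVERY */
};

/* kill(pid, 0) sends no signal. ESRCH means the process is gone; EPERM
 * means it exists but belongs to another user, so it counts as alive. */
static bool pid_is_dead(pid_t pid) {
    return kill(pid, 0) == -1 && errno == ESRCH;
}

/* One kill() system call per registered process. */
static bool registry_has_failure(const pid_t *pids, size_t n) {
    assert(pids != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (pid_is_dead(pids[i]))
            return true;
    return false;
}

static enum open_outcome register_open(const pid_t *pids, size_t n,
                                       bool recover_flag) {
    if (!registry_has_failure(pids, n))
        return OPEN_NORMAL;
    return recover_flag ? OPEN_AFTER_RECOVERY : OPEN_RUNRECOVERY;
}
```

A real registry must also cope with process IDs being reused by unrelated processes, which a bare kill(pid, 0) check cannot detect.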
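For the blocking-mutex case, one simple form of timer on POSIX systems is alarm(): arm it before a Berkeley DB call and disarm it afterward. If the call hangs, the default action of SIGALRM terminates the process, and the next open with DB_REGISTER can then detect the failure. The wrapper names below are hypothetical, and the sketch assumes the application does not use SIGALRM for anything else.

```c
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Arm a watchdog: if it is not disarmed within `seconds`, SIGALRM is
 * delivered and its default action terminates the process. */
static int arm_watchdog(unsigned seconds) {
    assert(seconds > 0); /* alarm(0) would cancel rather than arm */
    if (signal(SIGALRM, SIG_DFL) == SIG_ERR)
        return -1;
    alarm(seconds);
    return 0;
}

/* Disarm the watchdog; returns the seconds that were left on it. */
static unsigned disarm_watchdog(void) {
    return alarm(0);
}

/* Hypothetical usage around a Berkeley DB call:
 *     arm_watchdog(300);
 *     ret = dbp->get(dbp, txn, &key, &data, 0);
 *     disarm_watchdog();
 */
```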

Obviously, when implementing the second and third architectures, it is important that any process monitoring other threads of control be as simple and well-tested as possible, because there is no recourse if it fails.
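As an illustration of how small such a monitor can be, here is a sketch of the second architecture's parent process, written in C with fork and waitpid rather than as a shell script. recover_environment is a hypothetical stand-in for opening the environment with DB_RECOVER, and demo_worker is a placeholder that fails once so that the restart path is exercised. Any abnormal exit counts as a failure: the parent kills every remaining worker with SIGKILL so that all of them relinquish their Berkeley DB resources, runs recovery, and starts a new generation. For simplicity, a worker exiting with status 0 is treated as having finished its work.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { MAX_WORKERS = 16 };

/* Hypothetical stand-in for opening the environment with DB_RECOVER. */
static int recovery_runs = 0;
static void recover_environment(void) { recovery_runs++; }

/* Placeholder worker: worker 0 fails in the first generation only. */
static int demo_worker(int id, int generation) {
    return (id == 0 && generation == 0) ? 1 : 0;
}

/* Returns 0 once a generation finishes cleanly, -1 on error or when
 * max_generations is exhausted. */
static int supervise(int (*work)(int, int), int nworkers, int max_generations) {
    assert(nworkers > 0 && nworkers <= MAX_WORKERS);
    for (int gen = 0; gen < max_generations; gen++) {
        pid_t pids[MAX_WORKERS];
        bool live[MAX_WORKERS];
        int nlive = 0;
        bool failed = false;

        /* No workers exist yet, so recovery is single-threaded. */
        recover_environment();

        for (int i = 0; i < nworkers; i++) {
            pid_t pid = fork();
            if (pid == -1) {
                /* Could not start the group: tear down what was started. */
                for (int j = 0; j < i; j++) {
                    kill(pids[j], SIGKILL);
                    waitpid(pids[j], NULL, 0);
                }
                return -1;
            }
            if (pid == 0)
                _exit(work(i, gen)); /* child: do the work, never return */
            pids[i] = pid;
            live[i] = true;
            nlive++;
        }

        while (nlive > 0) {
            int status;
            pid_t pid = waitpid(-1, &status, 0);
            if (pid == -1)
                return -1;
            bool ours = false;
            for (int i = 0; i < nworkers; i++) {
                if (live[i] && pids[i] == pid) {
                    live[i] = false;
                    nlive--;
                    ours = true;
                }
            }
            if (!ours || failed)
                continue;
            if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
                /* A worker died: force all others to release resources. */
                failed = true;
                for (int i = 0; i < nworkers; i++)
                    if (live[i])
                        kill(pids[i], SIGKILL);
            }
        }
        if (!failed)
            return 0;
    }
    return -1;
}
```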
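The sentinel-file protocol of the "watcher" alternative can be sketched with plain POSIX file calls. Everything here is hypothetical: the function names, the sentinel path used in practice, and recover_environment, which stands in for running recovery. Process registration and the killing of registered processes are left as a comment.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for running recovery on the environment. */
static int recovery_count = 0;
static void recover_environment(void) { recovery_count++; }

/* Watcher: announce that the environment is recovered and open for use. */
static int sentinel_create(const char *path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    return close(fd);
}

/* Watcher: stop new processes from joining. A missing file is not an error. */
static int sentinel_remove(const char *path) {
    if (unlink(path) == -1 && errno != ENOENT)
        return -1;
    return 0;
}

/* Worker: join the environment only while the sentinel exists. */
static bool sentinel_present(const char *path) {
    assert(path != NULL);
    return access(path, F_OK) == 0;
}

/* Watcher: called when a registered process is found to have failed. */
static int watcher_handle_failure(const char *path) {
    if (sentinel_remove(path) == -1)
        return -1;
    /* Kill every registered process here (registration omitted). */
    recover_environment();
    return sentinel_create(path);
}
```

One design point the sketch leaves open: if a process checks the sentinel and is descheduled before registering, the watcher could remove the file and run recovery without knowing about it. Having each process register first and then check the sentinel is one way to close that window.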



Copyright (c) 1996-2004 Sleepycat Software, Inc. - All rights reserved.