Interested in improving this site? Please check the To Do page.

Database File Format

Basics

Frontier's database file format is based on the “boundary tag method of storage allocation”, described in Donald Knuth, “The Art of Computer Programming, Volume 1: Fundamental Algorithms”, section 2.5 of the 3rd edition.

The low-level database routines are implemented in db.c, the window.dbstats verb is implemented in dbstats.c, and the builtins.db verbs are implemented in dbverbs.c.

The first 88 bytes of the file contain the database header, see the definition of tydatabaserecord in db.h.

The rest of the file is made up of contiguous database blocks, also referred to as nodes in the code.

Every database block has an 8-byte header and a 4-byte trailer. The most significant bit of the first four bytes of the header and the four bytes of the trailer indicate whether the block is free (1) or in use (0). The other 31 bits indicate the size of the database block in bytes, excluding the header and trailer. If the block is in use, the remaining four bytes of the header indicate the number of bytes belonging to the block that are actually in use right now. (The difference between the total size and the bytes actually in use is referred to in the code as the variance.) If the block is free, the remaining four bytes of the header contain the address of the next block in the linked list of free blocks (or nil if it's the last block in this list).

Addresses in the database file are specified as zero-based offsets in bytes from the beginning of the file, i.e. the first data block of the database is located at the db address 88.

Frontier uses the Mac's native byte order in the database file. On Windows, the byte order is swapped as neccessary by calling the shortswap or longswap macros defined in standard.h.

Due to the conventions used in the block headers and trailers, the size of the database is limited to 2 GB. This limit is strictly enforced by dbseteof. It might be possible to extend this to 4 GB by carefully checking all the function that deal with db addresses and block sizes.

Opening A Database

Opening a database is handled by dbopen. It reads the database header into a newly allocated handle named databasedata (a global) and checks the version number specified in the header.

Next, it calls dbshadowavaillist. In Frontier 6.1, this function would build an in-memory array – the shadow avail list – containing pointers to all the free blocks in the database and their sizes. This was done by jumping to the first free block in the database whose address is stored in the availlist field of the database header and then following the link to the next free block that is stored as a db address in the second four bytes of a free block.

In 6.2, we optimized Frontier to save this shadow avail list in its own database block and when opening the database, we now just read that block into memory instead of walking the avail list on disk.

dbopen is usually called from ccloadfile in cancoon.c. This function calls loadversion2cancoonfile, also in cancoon.c, to load the initial data from the file. This includes the root table of the database. The address of the database block containing the root table is stored in the views[0].adrroottable field of the database header.

Loading Objects

Frontier differentiates between scalar objects (boolean, number, date, string, etc.) and so-called external objects (table, wptext, outline, script, menubar). An external object is stored in its own database block while scalars are stored as part of the parent table, i.e. in the parent table's database block. langexternal.c implements an abstraction layer for external objects. It appears to be written to make adding new types of external objects relatively easy.

Before an external object is loaded into memory, the data field of the tyvaluerecord of its hashnode in the parent table is a handle to a struct of type tyexternalvariable [see langexternal.h], the flinmemory flag in that struct is set to false, and the variabledata field contains the db address of the database block containing the actual data.

When the external object gets loaded into memory, the flinmemory flag is set to true, the db address is stored in the oldaddress field, and the variabledata field is turned into a handle to the in-memory representation of the external datatype, e.g. a tyoutlinerecord [see op.h]. The type of the external object is indicated in the id field of the tyexternalvariable struct, see the definition of tyexternalid in langexternal.h for possible values.

The only exception to the rule mentioned in the first paragraph are scalars that are larger than 1kB (strings, binaries, compiled code, lists, and records). These are also stored in their own database blocks. Before a scalar larger than 1kB is loaded into memory, the fldiskval flag of the tyvaluerecord of its hashnode in the parent table is set to true and the value of the data field of the tyvaluerecord is the db address of the database block containing the actual value. When the object is loaded into the database, the fldiskval flag is cleared and the data field becomes a handle to the actual data. Note that in this case, there's no place to store the db address that the value was loaded from.

Deleting Objects

When you delete an external object from the in-memory database structure, its associated database block on disk is not immediately marked as free. Instead, the address of the database block as stored in the oldaddress field of the tyexternalvalue struct gets pushed onto the release stack of the database, a dynamical in-memory-only array.

Since there's no room for scalar objects larger than 1kB to remember the db address that they were loaded from, the address of the block on disk can't be pushed onto the release stack when the object is deleted from the in-memory database structure. Consequently, the address has to pushed onto the release stack when the object is loaded from the database.

The database file itself is not modified until you actually save the database.

Saving A Database

The top-level function that implements saving of databases is ccsavefile [in cancoon.c]. After gathering some info about the database to be saved, it calls tablesavesystemtable [in tablestructure.c].

tablesavesystemtable first calls tablepreflightsubsdirtyflag to update the flsubsdirty flag in all hashtables belonging to the database to be saved and currently loaded into memory.

Next, tablesavesystemtable calls tableverbpack to pack the root table of the database. Any subtables or external objects that are currently loaded into memory will recursively be packed and, if neccessary, saved to disk. If the root table needs saving itself, the packed table will be written to disk by calling dbsavehandle which takes two parameters: the handle returned by tablepacktable containing the packed table and the address of the database block where the root table had been stored before.

dbsavehandle locks the packed handle and then calls dballocate it the table has never been saved before (in that case the db address is nil), otherwise it calls dbassign.

dbassign first calls dbreadheader to determine the size of the block in the database file and how much of it is currently in use. If the block is big enough to hold the packed handle, the header and trailer of the block are updated by calling dbsetsize and the data is written to disk by calling dbmove. Otherwise, the block is released by calling dbrelease (described below) and finally the data is written to another block by calling dballocate.

dballocate first loops over the whole in-memory copy of the list of available blocks in order to locate a free block that is big enough to hold the data to be saved.

If a suitable block has been found, it will be split into two blocks without the remaining free block ending up smaller than 32 usable bytes. Otherwise, the whole block gets used. The shadow avail list is updated accordingly. If the avail list doesn't contain a block that is big enough, a new block will be allocated at the end of the file by calling dbwritedatablock after the file has been grown to accomodate the block by calling dbseteof.

After recursively saving all the changed objects in this way, tablesavesystemtable calls dbflushreleasestack which loops over all db addresses on the release stack and releases the blocks by calling dbrelease for every one of them.

dbrelease first calls dbreadheader to determine the size of the block in the database file, so that it can locate the blocks immediately to the left and to the right. Then it calls dbmergeright and dbmergeleft which attempt to merge the block to be released with the block to the right or to the left, if they are free as well. If both the block to the right and to the left are in use, the block to be released is marked as free, pushed onto the linked list of free blocks, and added to theshadow avail list.

After all the blocks have been released, dbflushreleasestack calls dbwriteshadowavaillist to write an updated copy of the in-memory shadow avail list to a block in the database.

Saving A Copy Of A Database

Saving a copy shares most of the code with the normal save routine. Basically, all objects are written to the new database, possibly after they have been temporarily loaded from the old database if neccessary. The end result is a database without any free blocks and consequentially a completely empty avail list.

Closing A Database

When closing a database file, dbclose disposes the memory currently allocated for the release stack and flushes the database header out to disk.


Personal Tools