Interested in improving this site? Please check the To Do page.

Database Structure

Basics

The kernel representation of a database table (hashtable) is a struct of type “tyhashtable”, the table elements (hashnodes) are structs of type “tyhashnode”. [see lang.h for the definitions]

The “tyhashnode” struct contains:

  • two fields of type hdlhashnode for linking to another hashnode (details below)
  • a field named “val” of type tyvaluerecord for storing the node's value
  • several status flags
  • a field named “hashkey” of variable length containing the name of the node as a pascal string

The “tyhashtable” struct contains:

  • a field named “hashbucket” that is an array of eleven hdlhashnodes that point to the heads of linked lists of hashnodes
  • a field named “hfirstsort” of type hdlhashnode that points to the head of a linked list of hashnodes representing the current table sortorder (details below)
  • a field named “prevhashtable” of type hdlhashnode that points to the previous hashtable in the stack of local tables created by evaluatelist
  • a field named “parenthashtable” of type hdlhashtable for linking to the table's parent table
  • a field named “thistableshashnode” of type hdlhashnode for linking to the hashnode of this table's parent table that contains this table
  • two fields named “timecreated” and “timelastsave” containing the creation and modification dates
  • a field named “tmpstack” that is an array of tyvaluerecords whose current size is given by the “cttmpstack” field (used for keeping track of temporary values created during execution of UserTalk code)
  • several status flags and other fields that I'm not going to cover here

Locating Table Elements By Name

When you add a new item to a table, Frontier first applies a hash function [hashfunction in langhash.c] to the item name. This hash function takes the first and last character of the item name, converts them to lowercase, and returns the sum of the two characters' ASCII values modulo 11. The result n is an integer between 0 and 10 inclusive.

Next, Frontier creates a new hashnode and sets its value. The new node is then made the head element of the n th hashbucket (a linked list of hashnodes, see definition above). See hashtableassign [in langhash.c] for a high level function that also handles assignment over an existing table item with the same name.

When a table element is to be located by name, Frontier again applies the hashfunction to the given name. The result determines which of the eleven hashbuckets contains the node we are looking for. Frontier then chains thru the linked list of hashnodes in that hashbucket until a case-insensitive string comparision between a hashnode's hashkey and the name we're looking for succeeds. For details, see hashtablelookup [in langhash.c] or, for a function that also throws a scripterror if the item can't be found, see langhashtablelookup [in langvalue.c].

The aim of this hashfunction business is to reduce the average number of string comparisons that need to be made to locate a table item. The hashfunction and the number of hashbuckets (currently eleven) should be chosen to minimize the average number of comparisons. This is the case if the hashnodes are distributed evenly between the hashbuckets. The worst case is for all hashnodes to end up in the same hashbucket, which is what you get when the first and last character of all table items are identical.

Locating Table Elements By Index

When you add a new item to a table, Frontier will also add it to a linked list that contains all hashnodes in the table. The “hfirstsort” field of the “tyhashtable” struct points to the first hashnode in that list, the “sortedlink” field of the “tyhashnode” structs always points to the next node in the list. The order of hashnodes in the list reflects the current sort order of the table, so if a new item is added to the table, the hashnode has to be linked into the proper position in that sorted list.

When you ask for the nth item in a table, Frontier has to loop thru the linked list of hashnodes until it gets to the nth item. See hashgetnthnode [in langhash.c].

When a script calls table.sort, it basically asks the kernel to rebuild this linked list from scratch according to the specified sort order. See hashquicksort [in langhash.c].

Tables and UserTalk

As mentioned in Compiling and Executing UserTalk Scripts, tables are used to keep track of local variables and local handlers (“modules” in kernel-speak) when executing UserTalk code.

These local tables are created by evaluatelist [in langevaluate.c] if the list of statements to be evaluated contains a local() statement or a local handler or if it is the body of a function definition. A handle to the most recently created local table is stored in the global currenthashtable variable, or, put another way, currenthashtable defines the most local scope. If a new local table is created, it becomes the new currenthashtable and the “prevhashtable” field of the “tyhashtable” struct is set to the previous value of currenthashtable.

When a variable name needs to be resolved during script execution, we start in currenthashtable. If the name isn't found in that table, we follow currenthashtable's “prevhashtable” link and look in that table, etc. until we arrive at the outermost local table. If the name still hasn't been found, we then proceed to check the tables listed in system.paths, and finally the top-levels of all open databases. See langsearchpathlookup [in langvalue.c].

Addresses

There are two kernel representations of addresses understood by the kernel. The first is usually referred to as an unresolved address and basically consists of the full address as a pascal string.

Addresses are resolved as needed at runtime, see hashresolvevalue [in langhash.c]. Resolved addresses consist of the handle pointing to the table that contains the referenced value and a string containing the name of the referenced hashnode in that table. (Note that resolving an address requires that the containing table already exist while a hashnode with the given name doesn't actually have to exist in that table yet.)

In-Memory Values vs. Disk Values

When a table is loaded from disk into memory, not all values in the table are loaded into memory. Scalar values, including strings and binaries as long as they are smaller than 1kB, are part of the table's block in the database file and they are loaded into memory right away. External values (subtables, wptexts, outlines, scripts) as well as strings and binaries with a size of 1kB or larger are stored in their own blocks in the database file. These objects are only loaded into memory when needed.

I believe the reason why tables currently can't be unloaded from memory is that if we did, any existing resolved addresses to items in the table would become invalid. The handle pointing to the hash table would become stale and you would crash if you attempted to access it. There does not appear to be a reliably way for the kernel to tell whether a handle to a hashtable is still valid without risking a crash.


Personal Tools