DB(3) 1991 DB(3)
NAME
btree_open, hash_open, recno_open - database access meth-
ods
SYNOPSIS
#include <sys/types.h>
#include <db.h>
DB *
btree_open(const char *file, int flags, int mode,
const BTREEINFO * openinfo);
DB *
hash_open(const char *file, int flags, int mode,
const HASHINFO * openinfo);
DB *
recno_open(const char *file, int flags, int mode,
const RECNOINFO * openinfo);
DESCRIPTION
Btree_open, hash_open, and recno_open are access method
interfaces to database files in btree, hashed, and flat-
file formats, respectively. The btree format is a repre-
sentation of a sorted, balanced tree structure. The
hashed format is an extensible, dynamic hashing scheme.
The flat-file format is a UNIX file with fixed or variable
length lines. These formats are described in more detail
below.
Access to all file types is based on key/data pairs.
Each routine opens file for reading and/or writing.
Databases never intended to be preserved on disk may be
created by setting the file parameter to NULL. The flags
and mode arguments are as specified to the open(2) rou-
tine, however, only the O_CREAT, O_EXCL, O_RDONLY, O_RDWR,
O_TRUNC and O_WRONLY flags are meaningful. The argument
openinfo is a pointer to an access method specific struc-
ture described below.
The open routines return a pointer to a DB structure on
success and NULL on error. The DB structure contains at
least the following fields:
typedef struct {
int (*close)(const DB *db);
int (*sync)(const DB *db);
int (*del)(const DB *db, const DBT *key, u_int flags);
int (*get)(const DB *db, DBT *key, DBT *data, u_int flags);
int (*put)(const DB *db, const DBT *key, const DBT *data,
u_int flags);
int (*seq)(const DB *db, DBT *key, DBT *data, u_int flags);
int type;
2, April 1
DB(3) 1991 DB(3)
void *openinfo;
} DB;
The elements of this structure consist of a pointer to an
access method specific structure and a set of routines
which perform various functions. All of these routines
take a pointer to a structure as returned by one of the
open routines, one or more pointers to key/data struc-
tures, and, optionally, a flag value.
openinfo
A pointer to an internal structure specific to the
access method.
type The type of the underlying access method; either
DB_BTREE, DB_HASH or DB_RECNO.
close A pointer to a routine to flush any cached informa-
tion to disk, free any allocated resources, and
close the database file. Since key/data pairs may
be cached in memory, failing to close the file with
a close routine may result in inconsistent or lost
information. Close routines return -1 on error
(setting errno) and 0 on success.
del A pointer to a routine to remove key/data pairs
from the database. Delete routines return -1 on
error (setting errno), 0 on success, and 1 if the
specified key was not in the file.
get A pointer to a routine which is the interface for
keyed retrieval from the database. The address and
length of the data associated with the specified
key are returned in the structure referenced by
data. Get routines return -1 on error (setting
errno), 0 on success, and 1 if the key was not in
the file.
put A pointer to a routine to store key/data pairs in
the database.
The parameter flag must be set to one of the fol-
lowing values:
R_IAFTER
Append the data immediately after the data
referenced by key, creating a new key/data
pair. (This implies that the access method
is able to create new keys, i.e. the keys
are ordered and independent, for example,
record numbers. Applicable only to the
RECNO access method.)
2, April 2
DB(3) 1991 DB(3)
R_IBEFORE
Insert the data immediately before the data
referenced by key, creating a new key/data
pair. (This implies that the access method
is able to create new keys, i.e. the keys
are ordered and independent, for example,
record numbers. Applicable only to the
RECNO access method.)
R_NOOVERWRITE
Enter the new key/data pair only if the key
does not previously exist.
R_PUT Enter the new key/data pair and replace any
previously existing key.
Put routines return -1 on error (setting errno), 0
on success, and 1 if the R_NOOVERWRITE flag was set
and the key already exists in the file.
seq A pointer to a routine which is the interface for
sequential retrieval from the database. The
address and length of the key are returned in the
structure referenced by key, and the address and
length of the data are returned in the structure
referenced by data.
Sequential key/data pair retrieval may begin at any
time, and the position of the ``cursor'' is not
affected by calls to the del, get, put, or sync
routines. Modifications to the database during a
sequential scan will be reflected in the scan, i.e.
records inserted behind the cursor will not be
returned while records inserted in front of the
cursor will be returned.
The flag value must be set to one of the following
values:
R_CURSOR
The data associated with the specified key
is returned. This differs from the get rou-
tines in that it sets the ``cursor'' to the
location of the key as well. (This implies
that the access method has a implicit order
which does not change. Applicable only to
the BTREE and RECNO access methods.)
R_FIRST
The first key/data pair of the database is
returned.
R_LAST The last key/data pair of the database is
returned. (This implies that the access
2, April 3
DB(3) 1991 DB(3)
method has a implicit order which does not
change. Applicable only to the BTREE and
RECNO access methods.)
R_NEXT Retrieve the key/data pair immediately after
the key/data pair most recently retrieved
using the seq routine. The cursor is moved
to the returned key/data pair. If flag is
set to R_NEXT the first time the seq routine
is called, the first key/data pair of the
database is returned.
R_PREV Retrieve the key/data pair immediately
before the key/data pair most recently
retrieved using the seq routine. The cursor
is moved to the returned key/data pair. If
flag is set to R_PREV the first time the seq
routine is called, the last key/data pair of
the database is returned. (This implies
that the access method has a implicit order
which does not change. Applicable only to
the BTREE and RECNO access methods.)
Seq routines return -1 on error (setting errno), 0
on success, 1 if there are no more key/data pairs
available. If the RECNO access method is being
used, and if the database file is a character spe-
cial file and no complete key/data pairs are cur-
rently available, the seq routines return 2.
sync A pointer to a routine to flush any cached informa-
tion to disk. If the database is in memory only,
the sync routine has no effect and will always suc-
ceed. Sync routines return -1 on error (setting
errno) and 0 on success.
KEY/DATA PAIRS
Access to all file types is based on key/data pairs. Both
keys and data are represented by the following data struc-
ture:
typedef struct {
void *data;
size_t size;
} DBT;
The elements of the DBT structure are defined as follows:
data A pointer to a byte string.
size The length of the byte string.
Key/data strings must fit into available memory.
2, April 4
DB(3) 1991 DB(3)
BTREE
One of the access methods is a btree: a sorted, balanced
tree structure with associated key/data pairs.
The access method specific data structure provided to
btree_open is as follows:
typedef struct {
u_long flags;
u_int psize;
u_int cachesize;
int (*compare)(const void *, const void *);
int lorder;
} BTREEINFO;
The elements of this structure are defined as follows:
flags The flag value is specified by or'ing any of the
following values:
R_DUP On insertion, if the key to be inserted
already exists, permit insertion anyway.
This flag permits duplicate keys in the
tree. By default, duplicates are not per-
mitted, and attempts to insert them will
fail. Note, the order of retrieval of
key/data pairs with duplicate keys is unde-
fined.
cachesize
A suggested maximum size, in bytes, of the memory
cache. Setting this value to zero specifies that
an appropriate amount of memory should be used.
Since every search examines the root page of the
tree, caching the most recently used pages substan-
tially improves access time. In addition, physical
writes are delayed as long as possible, so a moder-
ate cache can reduce the number of I/O operations
significantly. Obviously, using a cache increases
the likelihood of corruption or lost data if the
system crashes while a tree is being modified.
However, caching 10 pages decreases the creation
time of a large tree by between two and three
orders of magnitude.
compare
Compare is a user defined comparison function. It
must return an integer less than, equal to, or
greater than zero if the first argument is consid-
ered to be respectively less than, equal to, or
greater than the second. The same comparison func-
tion must be used on a given tree every time it is
opened. If no comparison function is specified,
strcmp(3) is used.
2, April 5
DB(3) 1991 DB(3)
lorder The byte order for 4-byte integers in the stored
database metadata. The number should represent the
order as an integer; for example, big endian order
would be the number 4,321. If lorder is 0 (no
order is specified) the current host order is used.
If the file already exists, the specified value is
ignored and the value specified when the tree was
created is used. (Obviously, portability of the
data forming the key/data pairs is the concern of
the application program.)
psize Page size is the size in bytes of the pages used
for nodes in the tree. If the file already
exists, the specified value is ignored and the
value specified when the tree was created is used.
If psize is zero, an appropriate page size is cho-
sen (based on the system memory and/or file system
constraints), but will never be less than 512
bytes.
If the pointer to the openinfo data structure is NULL, the
btree_open routine will use appropriate values.
If the database file already exists, and the O_TRUNC flag
is not specified to btree_open, the parameter psize
ignored.
Key structures may reference byte strings of slightly less
than one-half the tree's page size only (see psize). Data
structures may reference byte strings of essentially
unlimited length.
Searches, insertions, and deletions in a btree will all
complete in O lg N.
Forward sequential scans of a tree are from the least key
to the greatest.
Space freed up by deleting key/data pairs from a btree is
never reclaimed, although it is normally made available
for reuse. The exception to this is that space occupied
by large data items (those greater than one quarter the
size of a page) is neither reclaimed nor reused. This
means that the btree storage structure is grow-only. The
only solutions are to avoid excessive deletions, or to
create a fresh tree periodically from a scan of an exist-
ing one.
HASH
One of the access methods is hashed access and storage.
The access method specific data structure provided to
hash_open is as follows:
typedef struct {
2, April 6
DB(3) 1991 DB(3)
u_long (*hash)(const void *, const size_t);
u_int cachesize;
int bsize;
int ffactor;
int lorder;
int nelem;
} HASHINFO;
The elements of this structure are defined as follows:
bsize Bsize defines the hash table bucket size, and is,
by default, 256 bytes. It may be preferable to
increase the page size for disk-resident tables and
tables with large data items.
cachesize
A suggested maximum size, in bytes, of the memory
cache. Setting this value to zero specifies that
an appropriate amount of memory should be used.
ffactor
Ffactor indicates a desired density within the hash
table. It is an approximation of the number of
keys allowed to accumulate in any one bucket,
determining when the hash table grows or shrinks.
The default value is 8.
hash Hash is a user defined hash function. Since no
hash function performs equally well on all possible
data, the user may find that the built-in hash
function does poorly on a particular data set.
User specified hash functions must take two argu-
ments (a pointer to a byte string and a length) and
return an u_long to be used as the hash value.
lorder The byte order for 4-byte integers in the stored
database metadata. The number should represent the
order as an integer; for example, big endian order
would be the number 4,321. If lorder is 0 (no
order is specified) the current host order is used.
If the file already exists, the specified value is
ignored and the value specified when the tree was
created is used. (Obviously, portability of the
data forming the key/data pairs is the concern of
the application program.)
nelem Nelem is an estimate of the final size of the hash
table. If not set, the default value is 1. If not
set or set too low, hash tables will expand grace-
fully as keys are entered, although a slight per-
formance degradation may be noticed.
If the pointer to the openinfo data structure is NULL, the
hash_open routine will use appropriate values.
2, April 7
DB(3) 1991 DB(3)
If the hash table already exists, and the O_TRUNC flag is
not specified to hash_open, the parameters bsize, ffactor,
and nelem are ignored.
If a hash function is specified, hash_open will attempt to
determine if the hash function specified is the same as
the one with which the database was created, and will fail
if it is not.
Both key and data structures may reference byte strings of
essentially unlimited length.
Backward compatible interfaces to the routines described
in dbm(3), hsearch(3), and ndbm(3) are provided, however,
these interfaces are not compatible with previous file
formats.
RECNO
One of the access methods is either variable or fixed-
length records, the former delimited by a specific byte
value. The access method specific data structure provided
to recno_open is as follows:
typedef struct {
u_long flags;
u_int cachesize;
size_t reclen;
u_char bval;
} RECNOINFO;
The elements of this structure are defined as follows:
flags The flag value is specified by or'ing any of the
following values:
R_FIXEDLEN
The records are fixed-length, not byte
delimited. The structure element reclen
specifies the length of the record, and the
structure element bval is used as the pad
character.
R_SNAPSHOT
This flag requires that a snapshot of the
file be taken when recno_open is called,
instead of permitting any unmodified records
to be read from the original file.
cachesize
A suggested maximum size, in bytes, of the memory
cache. Setting this value to zero specifies that
an appropriate amount of memory should be used.
reclen The length of a fixed-length record.
2, April 8
DB(3) 1991 DB(3)
bval The delimiting byte to be used to mark the end of a
record for variable-length records, and the pad
character for fixed-length records.
Variable-length and fixed-length data files require key
structures to reference the following structure:
typedef struct {
u_long length;
u_long number;
u_long offset;
u_char valid;
} RECNOKEY;
The elements of this structure are defined as follows:
length The length of the record.
number The record number.
offset The offset in the file at which the record is
located.
valid A flag value which indicates the validity of the
other fields in the structure. The flag value is
specified by or'ing one or more of the following
values:
R_LENGTH
The record length is valid.
R_NUMBER
The record number is valid.
R_OFFSET
The byte offset is valid.
If the record retrieval is successful, the record number,
byte offset and record length are set in the RECNOKEY
structure referenced by the caller's key structure.
Data structures may reference byte strings of essentially
unlimited length.
ERRORS
The open routines may fail and set errno for any of the
errors specified for the library routines open(2) and mal-
loc(3) or the following:
[EFTYPE]
A file used by one of the open routines is incor-
rectly formatted.
2, April 9
DB(3) 1991 DB(3)
[EINVAL]
A parameter has been specified (hash function, pad
byte etc.) that is incompatible with the current
file specification or there is a mismatch between
the version number of file and the software.
The close routines may fail and set errno for any of the
errors specified for the library routines close(2),
read(2), write(2), free(3), or fsync(2).
The del, get, put and seq routines may fail and set errno
for any of the errors specified for the library routines
read(2), write(2), free(3) or malloc(3).
The sync routines may fail and set errno for any of the
errors specified for the library routine fsync(2).
SEE ALSO
Dynamic Hash Tables, Per-Ake Larson, Communications of the
ACM, April 1988.
A New Hash Package for UNIX, Margo Seltzer, USENIX Pro-
ceedings, Winter 1991.
BUGS
The typedef DBT is a mnemonic for ``data base thang'', and
was used because noone could think of a reasonable name
that wasn't already used.
None of the access methods provide any form of concurrent
access, locking, or transactions.
Only big and little endian byte order is supported.
2, April 10