There were several problems with rbt hashtable implementation:
1. Our internal hashing function returns uint64_t value, but it was
silently truncated to unsigned int in dns_name_hash() and
dns_name_fullhash() functions. As the SipHash 2-4 higher bits are
more random, we need to use the upper half of the return value.
2. The hashtable implementation in rbt.c was using modulo to pick the
slot number for the hash table. This has several problems because
modulo is: a) slow, b) oblivious to patterns in the input data. This
could lead to very uneven distribution of the hashed data in the
hashtable. Combined with the single-linked lists we use, it could
really hog-down the lookup and removal of the nodes from the rbt
tree[a]. The Fibonacci Hashing is much better fit for the hashtable
function here. For longer description, read "Fibonacci Hashing: The
Optimization that the World Forgot"[b] or just look at the Linux
kernel. Also this will make Diego very happy :).
3. The hashtable would rehash every time the number of nodes in the rbt
tree would exceed 3 * (hashtable size). The overcommit will make the
uneven distribution in the hashtable even worse, but the main problem
lies in the rehashing - every time the database grows beyond the
limit, each subsequent rehashing will be much slower. The mitigation
here is letting the rbt know how big the cache can grown and
pre-allocate the hashtable to be big enough to actually never need to
rehash. This will consume more memory at the start, but since the
size of the hashtable is capped to `1 << 32` (e.g. 4 mio entries), it
will only consume maximum of 32GB of memory for hashtable in the
worst case (and max-cache-size would need to be set to more than
4TB). Calling the dns_db_adjusthashsize() will also cap the maximum
size of the hashtable to the pre-computed number of bits, so it won't
try to consume more gigabytes of memory than available for the
database.
FIXME: What is the average size of the rbt node that gets hashed? I
chose the pagesize (4k) as initial value to precompute the size of
the hashtable, but the value is based on feeling and not any real
data.
For future work, there are more places where we use result of the hash
value modulo some small number and that would benefit from Fibonacci
Hashing to get better distribution.
Notes:
a. A doubly linked list should be used here to speedup the removal of
the entries from the hashtable.
b. https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/
(cherry picked from commit e24bc324b4)
To use the Dynamic DB sample driver, run named and check the log.
$ cd testing
$ named -gc named.conf
You should be able to see something like:
zone test/IN: loaded serial 0
zone arpa/IN: loaded serial 0
This means that the sample driver created empty zones "test." and
"arpa." as defined by "arg" parameters in named.conf.
$ dig @localhost test.
should work as usual and you should be able to see the dummy zone with
NS record pointing to the zone apex and A record with 127.0.0.1:
;; ANSWER SECTION:
test. 86400 IN A 127.0.0.1
test. 86400 IN NS test.
test. 86400 IN SOA test. test. 0 28800 7200 604800 86400
This driver creates two empty zones and allows query/transfer/update to
all IP addresses for demonstration purposes.
The driver wraps the RBT database implementation used natively by BIND,
and modifies the addrdataset() and substractrdataset() functions to do
additional work during dynamic updates.
A dynamic update modifies the target zone as usual. After that, the
driver detects whether the modified RR was of type A or AAAA, and if so,
attempts to appropriately generate or delete a matching PTR record in
one of the two zones managed by the driver.
E.g.:
$ nsupdate
> update add a.test. 300 IN A 192.0.2.1
> send
will add the A record
a.test. 300 IN A 192.0.2.1
and also automatically generate the PTR record
1.2.0.192.in-addr.arpa. 300 IN PTR a.test.
AXFR and RR deletion via dynamic updates should work as usual. Deletion
of a type A or AAAA record should delete the corresponding PTR record
too.
The zone is stored only in memory, and all changes will be lost on
reload/reconfig.
Hints for code readers:
- Driver initialization starts in driver.c: dyndb_init() function.
- New database implementation is registered by calling dns_db_register()
and passing a function pointer to it. This sample uses the function
create_db() to initialize the database.
- Zones are created later in instance.c: load_sample_instance_zones().
- Database entry points are in structure db.c: dns_dbmethods_t
sampledb_methods
- sampledb_methods points to an implementation of the database interface.
See the db.c: addrdataset() implementation and look at how the RBT
database instance is wrapped into an additional layer of logic.