After backporting #1870 to 9.11-S I saw that the condition check there
is different than in the main branch. In 9.11-S "stale" can mean
stale and serve-stale, or not active (awaiting cleanup). In 9.16 and
later versions, "stale" is stale and serve-stale, and "ancient" means
not active (awaiting cleanup). An "ancient" RRset is one that is not
active (TTL expired) and is not eligble for serve-stale.
Update the condition for rndc dumpdb -expired to closer match what is
in 9.11-S.
(cherry picked from commit 5614454c3b)
By changing the check in 'rdatasetiter_first' and 'rdatasetiter_next'
from "now > header->rdh_ttl" to "now - RBDTB_VIRTUAL > header->rdh_ttl"
we include expired rdataset entries so that they can be used for
"rndc dumpdb -expired".
(cherry picked from commit 17d5bd4493)
The rbtdb version glue_table has been refactored similarly to rbt.c hash
table, so it does use 32-bit hash function return values and apply
Fibonacci Hashing to lookup the index to the hash table instead of
modulo. For more details, see the lib/dns/rbt.c commit log.
(cherry picked from commit 01684cc219)
When a received RRSet has TTL 0, they would be preserved for
serve-stale (default `max-stale-cache` is 12 hours) rather than expiring
them quickly from the cache database.
This commit makes sure the RRSet didn't have TTL 0 before marking the
entry in the database as "stale".
(cherry picked from commit 6ffa2ddae0)
Created isc_refcount_decrement_expect macro to test conditionally
the return value to ensure it is in expected range. Converted
unchecked isc_refcount_decrement to use isc_refcount_decrement_expect.
Converted INSIST(isc_refcount_decrement()...) to isc_refcount_decrement_expect.
(cherry picked from commit bde5c7632a)
There were several problems with rbt hashtable implementation:
1. Our internal hashing function returns uint64_t value, but it was
silently truncated to unsigned int in dns_name_hash() and
dns_name_fullhash() functions. As the SipHash 2-4 higher bits are
more random, we need to use the upper half of the return value.
2. The hashtable implementation in rbt.c was using modulo to pick the
slot number for the hash table. This has several problems because
modulo is: a) slow, b) oblivious to patterns in the input data. This
could lead to very uneven distribution of the hashed data in the
hashtable. Combined with the single-linked lists we use, it could
really hog-down the lookup and removal of the nodes from the rbt
tree[a]. The Fibonacci Hashing is much better fit for the hashtable
function here. For longer description, read "Fibonacci Hashing: The
Optimization that the World Forgot"[b] or just look at the Linux
kernel. Also this will make Diego very happy :).
3. The hashtable would rehash every time the number of nodes in the rbt
tree would exceed 3 * (hashtable size). The overcommit will make the
uneven distribution in the hashtable even worse, but the main problem
lies in the rehashing - every time the database grows beyond the
limit, each subsequent rehashing will be much slower. The mitigation
here is letting the rbt know how big the cache can grown and
pre-allocate the hashtable to be big enough to actually never need to
rehash. This will consume more memory at the start, but since the
size of the hashtable is capped to `1 << 32` (e.g. 4 mio entries), it
will only consume maximum of 32GB of memory for hashtable in the
worst case (and max-cache-size would need to be set to more than
4TB). Calling the dns_db_adjusthashsize() will also cap the maximum
size of the hashtable to the pre-computed number of bits, so it won't
try to consume more gigabytes of memory than available for the
database.
FIXME: What is the average size of the rbt node that gets hashed? I
chose the pagesize (4k) as initial value to precompute the size of
the hashtable, but the value is based on feeling and not any real
data.
For future work, there are more places where we use result of the hash
value modulo some small number and that would benefit from Fibonacci
Hashing to get better distribution.
Notes:
a. A doubly linked list should be used here to speedup the removal of
the entries from the hashtable.
b. https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/
(cherry picked from commit e24bc324b4)
The ThreadSanitizer found a data race when updating the stale header.
Instead of trying to acquire the write lock and failing occasionally
which would skew the statistics, the dns_rdatasetheader_t.attributes
field has been promoted to use stdatomics. Updating the attributes in
the mark_header_ancient() and mark_header_stale() now uses the cmpxchg
to update the attributes forfeiting the need to hold the write lock on
the tree. Please note that mark_header_ancient() still needs to hold
the lock because .dirty is being updated in the same go.
(cherry picked from commit 81d4230e60)
RBTDB node can now appear on the deadnodes lists following the changes
to decrement_reference in 176b23b6cd to
defer checking of node->down when the tree write lock is not held. The
node should be unlinked instead.
(cherry picked from commit 569cc155b8680d8ed12db1fabbe20947db24a0f9)
in addition to being more efficient, this prevents a possible crash by
looking up the node name before the tree sructure can be changed when
cleaning up dead nodes in addrdataset().
(cherry picked from commit db9d10e3c1)
Save 'i' to 'locknum' and use that rather than using
'header->node->locknum' when performing the deferred
unlock as 'header->node->locknum' can theoretically be
different to 'i'.
(cherry picked from commit 8dd8d48c9f)
"max-journal-size" is set by default to twice the size of the zone
database. however, the calculation of zone database size was flawed.
- change the size calculations in dns_db_getsize() to more accurately
represent the space needed for a journal file or *XFR message to
contain the data in the database. previously we returned the sizes
of all rdataslabs, including header overhead and offset tables,
which resulted in the database size being reported as much larger
than the equivalent journal transactions would have been.
- map files caused a particular problem here: the full name can't be
determined from the node while a file is being deserialized, because
the uppernode pointers aren't set yet. so we store "full name length"
in the dns_rbtnode structure while serializing, and clear it after
deserialization is complete.
Updating LRU requires write-locking the node, which causes contention.
Update LRU only if time difference is large enough.
(cherry picked from commit fe584c01cc)
Start enforcing the clang-format rules on changed files
Closes#46
See merge request isc-projects/bind9!3063
(cherry picked from commit a04cdde45d)
d2b5853b Start enforcing the clang-format rules on changed files
618947c6 Switch AlwaysBreakAfterReturnType from TopLevelDefinitions to All
654927c8 Add separate .clang-format files for headers
5777c44a Reformat using the new rules
60d29f69 Don't enforce copyrights on .clang-format
adjust clang-format options to get closer to ISC style
See merge request isc-projects/bind9!3061
(cherry picked from commit d3b49b6675)
0255a974 revise .clang-format and add a C formatting script in util
e851ed0b apply the modified style
Add curly braces using uncrustify and then reformat with clang-format back
Closes#46
See merge request isc-projects/bind9!3057
(cherry picked from commit 67b68e06ad)
36c6105e Use coccinelle to add braces to nested single line statement
d14bb713 Add copy of run-clang-tidy that can fixup the filepaths
056e133c Use clang-tidy to add curly braces around one-line statements
Reformat source code with clang-format
Closes#46
See merge request isc-projects/bind9!2156
(cherry picked from commit 7099e79a9b)
4c3b063e Import Linux kernel .clang-format with small modifications
f50b1e06 Use clang-format to reformat the source files
11341c76 Update the definition files for Windows
df6c1f76 Remove tkey_test (which is no-op anyway)
6412 cleanup:
6413 dns_rdataset_disassociate(&neg);
6414 dns_rdataset_disassociate(&negsig);
CID 1452700 (#1 of 1): Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking closest suggests that it
may be null, but it has already been dereferenced on all
paths leading to the check.
6415 if (closest != NULL)
6416 free_noqname(mctx, &closest);
6367cleanup:
6368 dns_rdataset_disassociate(&neg);
6369 dns_rdataset_disassociate(&negsig);
CID 1452704 (#1 of 1): Dereference before null check
(REVERSE_INULL) check_after_deref: Null-checking noqname
suggests that it may be null, but it has already been
dereferenced on all paths leading to the check.
6370 if (noqname != NULL)
6371 free_noqname(mctx, &noqname);
This commit simplifies the cachedb rrset statistics in two ways:
- Introduce new rdtypecounter arithmetics, allowing bitwise
operations.
- Remove the special DLV statistic counter.
New rdtypecounter arithmetics
-----------------------------
"The rdtypecounter arithmetics is a brain twister". Replace the
enum counters with some defines. A rdtypecounter is now 8 bits for
RRtypes and 3 bits for flags:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | S |NX| RRType |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
If the 8 bits for RRtype are all zero, this is an Other RRtype.
Bit 7 is the NXRRSET (NX) flag and indicates whether this is a
positive (0) or a negative (1) RRset.
Then bit 5 and 6 mostly tell you if this counter is for an active,
stale, or ancient RRtype:
S = 0x00 means Active
S = 0x01 means Stale
S = 0x10 means Ancient
Since a counter cannot be stale and ancient at the same time, we
treat S = 0x11 as a special case to deal with NXDOMAIN counters.
S = 0x11 indicates an NXDOMAIN counter and in this case the RRtype
field signals the expiry of this cached item:
RRType = 0 means Active
RRType = 1 means Stale
RRType = 2 means Ancient
The isc_buffer_allocate() function now cannot fail with ISC_R_MEMORY.
This commit removes all the checks on the return code using the semantic
patch from previous commit, as isc_buffer_allocate() now returns void.
This commits removes superfluous checks when using the isc_refcount API.
Examples of superfluous checks:
1. The isc_refcount_decrement function ensures there was not underflow,
so this check is superfluous:
INSIST(isc_refcount_decrement(&r) > 0);
2 .The isc_refcount_destroy() includes check whether the counter
is zero, therefore this is superfluous:
INSIST(isc_refcount_decrement(&r) == 1 && isc_refcount_destroy(&r));
In decrement_reference only test node->down if the tree lock
is held. As node->down is not always tested in
decrement_reference we need to test that it is non NULL in
cleanup_dead_nodes prior to removing the node from the rbt
tree. Additionally it is not always possible to aquire the
node lock and reactivate a node when adding parent nodes.
Reactivate such nodes in cleanup_dead_nodes if required.
This commit was done by hand to add the RUNTIME_CHECK() around stray
dns_name_copy() calls with NULL as third argument. This covers the edge cases
that doesn't make sense to write a semantic patch since the usage pattern was
unique or almost unique.
This second commit uses second semantic patch to replace the calls to
dns_name_copy() with NULL as third argument where the result was stored in a
isc_result_t variable. As the dns_name_copy(..., NULL) cannot fail gracefully
when the third argument is NULL, it was just a bunch of dead code.
Couple of manual tweaks (removing dead labels and unused variables) were
manually applied on top of the semantic patch.
This commit add RUNTIME_CHECK() around all simple dns_name_copy() calls where
the third argument is NULL using the semantic patch from the previous commit.