Commit Graph

554 Commits

Author SHA1 Message Date
Ondřej Surý
5674f76590 Use the Fibonacci Hashing for the RBTDB glue table
The rbtdb version glue_table has been refactored similarly to rbt.c hash
table, so it does use 32-bit hash function return values and apply
Fibonacci Hashing to lookup the index to the hash table instead of
modulo.  For more details, see the lib/dns/rbt.c commit log.

(cherry picked from commit 01684cc219)
2020-08-26 21:49:59 +10:00
Mark Andrews
511747307f rbtversion->glue_table_size must be read when holding a lock
(cherry picked from commit 33d0e8d168)
2020-08-26 21:49:59 +10:00
Ondřej Surý
f9711481ad Expire the 0 TTL RRSet quickly rather using them for serve-stale
When a received RRSet has TTL 0, they would be preserved for
serve-stale (default `max-stale-cache` is 12 hours) rather than expiring
them quickly from the cache database.

This commit makes sure the RRSet didn't have TTL 0 before marking the
entry in the database as "stale".

(cherry picked from commit 6ffa2ddae0)
2020-08-05 09:09:16 +02:00
Mark Andrews
14fe6e77a7 Always check the return from isc_refcount_decrement.
Created isc_refcount_decrement_expect macro to test conditionally
the return value to ensure it is in expected range.  Converted
unchecked isc_refcount_decrement to use isc_refcount_decrement_expect.
Converted INSIST(isc_refcount_decrement()...) to isc_refcount_decrement_expect.

(cherry picked from commit bde5c7632a)
2020-07-31 12:54:47 +10:00
Ondřej Surý
aa72c31422 Fix the rbt hashtable and grow it when setting max-cache-size
There were several problems with rbt hashtable implementation:

1. Our internal hashing function returns uint64_t value, but it was
   silently truncated to unsigned int in dns_name_hash() and
   dns_name_fullhash() functions.  As the SipHash 2-4 higher bits are
   more random, we need to use the upper half of the return value.

2. The hashtable implementation in rbt.c was using modulo to pick the
   slot number for the hash table.  This has several problems because
   modulo is: a) slow, b) oblivious to patterns in the input data.  This
   could lead to very uneven distribution of the hashed data in the
   hashtable.  Combined with the single-linked lists we use, it could
   really hog-down the lookup and removal of the nodes from the rbt
   tree[a].  The Fibonacci Hashing is much better fit for the hashtable
   function here.  For longer description, read "Fibonacci Hashing: The
   Optimization that the World Forgot"[b] or just look at the Linux
   kernel.  Also this will make Diego very happy :).

3. The hashtable would rehash every time the number of nodes in the rbt
   tree would exceed 3 * (hashtable size).  The overcommit will make the
   uneven distribution in the hashtable even worse, but the main problem
   lies in the rehashing - every time the database grows beyond the
   limit, each subsequent rehashing will be much slower.  The mitigation
   here is letting the rbt know how big the cache can grown and
   pre-allocate the hashtable to be big enough to actually never need to
   rehash.  This will consume more memory at the start, but since the
   size of the hashtable is capped to `1 << 32` (e.g. 4 mio entries), it
   will only consume maximum of 32GB of memory for hashtable in the
   worst case (and max-cache-size would need to be set to more than
   4TB).  Calling the dns_db_adjusthashsize() will also cap the maximum
   size of the hashtable to the pre-computed number of bits, so it won't
   try to consume more gigabytes of memory than available for the
   database.

   FIXME: What is the average size of the rbt node that gets hashed?  I
   chose the pagesize (4k) as initial value to precompute the size of
   the hashtable, but the value is based on feeling and not any real
   data.

For future work, there are more places where we use result of the hash
value modulo some small number and that would benefit from Fibonacci
Hashing to get better distribution.

Notes:
a. A doubly linked list should be used here to speedup the removal of
   the entries from the hashtable.
b. https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/

(cherry picked from commit e24bc324b4)
2020-07-30 11:57:24 +02:00
Ondřej Surý
0279cc76a7 Update STALE and ANCIENT header attributes atomically
The ThreadSanitizer found a data race when updating the stale header.
Instead of trying to acquire the write lock and failing occasionally
which would skew the statistics, the dns_rdatasetheader_t.attributes
field has been promoted to use stdatomics.  Updating the attributes in
the mark_header_ancient() and mark_header_stale() now uses the cmpxchg
to update the attributes forfeiting the need to hold the write lock on
the tree.  Please note that mark_header_ancient() still needs to hold
the lock because .dirty is being updated in the same go.

(cherry picked from commit 81d4230e60)
2020-07-08 12:01:46 +10:00
Witold Kręcicki
000c7d1340 rbtdb: cleanup_dead_nodes should ignore alive nodes on the deadlist
(cherry picked from commit c8f2d55acf)
2020-07-01 15:35:21 +02:00
Mark Andrews
6964a21fa6 Remove INSIST from from new_reference
RBTDB node can now appear on the deadnodes lists following the changes
to decrement_reference in 176b23b6cd to
defer checking of node->down when the tree write lock is not held.  The
node should be unlinked instead.

(cherry picked from commit 569cc155b8680d8ed12db1fabbe20947db24a0f9)
2020-06-18 10:18:42 +02:00
Mark Andrews
69c43a03d0 Ensure tree lock is always held when dns_rbt_fullnamefromnode is called
(cherry picked from commit eded3efb79)
2020-05-29 15:02:09 -07:00
Evan Hunt
8b154d0f9f pass the nodename to add32() instead of calling dns_rbt_fullnamefromnode()
in addition to being more efficient, this prevents a possible crash by
looking up the node name before the tree sructure can be changed when
cleaning up dead nodes in addrdataset().

(cherry picked from commit db9d10e3c1)
2020-05-29 15:02:09 -07:00
Evan Hunt
1ccfadefe0 don't bother checking for empty nodes when searching the NSEC3 tree
this avoids a time-wasting search that could occur during an
IXFR that replaced an NSEC3 chain.

(cherry picked from commit 7192edf9c2)
2020-05-12 13:36:55 -07:00
Mark Andrews
cf6b2a6c18 Silence missing unlock from Coverity.
Save 'i' to 'locknum' and use that rather than using
'header->node->locknum' when performing the deferred
unlock as 'header->node->locknum' can theoretically be
different to 'i'.

(cherry picked from commit 8dd8d48c9f)
2020-03-13 13:17:46 +11:00
Evan Hunt
c5405c2700 improve calculation of database size
"max-journal-size" is set by default to twice the size of the zone
database. however, the calculation of zone database size was flawed.

- change the size calculations in dns_db_getsize() to more accurately
  represent the space needed for a journal file or *XFR message to
  contain the data in the database. previously we returned the sizes
  of all rdataslabs, including header overhead and offset tables,
  which resulted in the database size being reported as much larger
  than the equivalent journal transactions would have been.
- map files caused a particular problem here: the full name can't be
  determined from the node while a file is being deserialized, because
  the uppernode pointers aren't set yet. so we store "full name length"
  in the dns_rbtnode structure while serializing, and clear it after
  deserialization is complete.
2020-03-12 00:38:37 -07:00
Witold Kręcicki
1e1c3f2e7f Increase nodelock count for both cache and regular db.
(cherry picked from commit 0344684385)
2020-02-28 10:05:25 +01:00
Witold Kręcicki
1af2ca358e Don't update LRU if the node was recently used.
Updating LRU requires write-locking the node, which causes contention.
Update LRU only if time difference is large enough.

(cherry picked from commit fe584c01cc)
2020-02-28 10:05:25 +01:00
Mark Andrews
761f66e2e3 sort RRSIG(SOA) to be last of RRSIGs with a common re-resign time
(cherry picked from commit a24fd55836)
2020-02-28 10:05:44 +11:00
Evan Hunt
11a0d771f9 fix spelling errors reported by Fossies.
(cherry picked from commit ba0313e649)
2020-02-21 07:05:31 +00:00
Ondřej Surý
829b461c54 Merge branch '46-enforce-clang-format-rules' into 'master'
Start enforcing the clang-format rules on changed files

Closes #46

See merge request isc-projects/bind9!3063

(cherry picked from commit a04cdde45d)

d2b5853b Start enforcing the clang-format rules on changed files
618947c6 Switch AlwaysBreakAfterReturnType from TopLevelDefinitions to All
654927c8 Add separate .clang-format files for headers
5777c44a Reformat using the new rules
60d29f69 Don't enforce copyrights on .clang-format
2020-02-14 08:45:59 +00:00
Ondřej Surý
cdef20bb66 Merge branch 'each-style-tweak' into 'master'
adjust clang-format options to get closer to ISC style

See merge request isc-projects/bind9!3061

(cherry picked from commit d3b49b6675)

0255a974 revise .clang-format and add a C formatting script in util
e851ed0b apply the modified style
2020-02-14 05:35:29 +00:00
Ondřej Surý
2e55baddd8 Merge branch '46-add-curly-braces' into 'master'
Add curly braces using uncrustify and then reformat with clang-format back

Closes #46

See merge request isc-projects/bind9!3057

(cherry picked from commit 67b68e06ad)

36c6105e Use coccinelle to add braces to nested single line statement
d14bb713 Add copy of run-clang-tidy that can fixup the filepaths
056e133c Use clang-tidy to add curly braces around one-line statements
2020-02-13 21:28:35 +00:00
Ondřej Surý
c931d8e417 Merge branch '46-just-use-clang-format-to-reformat-sources' into 'master'
Reformat source code with clang-format

Closes #46

See merge request isc-projects/bind9!2156

(cherry picked from commit 7099e79a9b)

4c3b063e Import Linux kernel .clang-format with small modifications
f50b1e06 Use clang-format to reformat the source files
11341c76 Update the definition files for Windows
df6c1f76 Remove tkey_test (which is no-op anyway)
2020-02-12 14:51:18 +00:00
Mark Andrews
0312e73e16 'closest' must be non NULL, remove test.
6412 cleanup:
6413        dns_rdataset_disassociate(&neg);
6414        dns_rdataset_disassociate(&negsig);

	CID 1452700 (#1 of 1): Dereference before null check (REVERSE_INULL)
	check_after_deref: Null-checking closest suggests that it
	may be null, but it has already been dereferenced on all
	paths leading to the check.

6415        if (closest != NULL)
6416                free_noqname(mctx, &closest);
2020-02-05 18:37:17 +11:00
Mark Andrews
1b1a94ea6d 'noqname' must be non NULL, remove test.
6367cleanup:
6368        dns_rdataset_disassociate(&neg);
6369        dns_rdataset_disassociate(&negsig);

	CID 1452704 (#1 of 1): Dereference before null check
	(REVERSE_INULL) check_after_deref: Null-checking noqname
	suggests that it may be null, but it has already been
	dereferenced on all paths leading to the check.

6370        if (noqname != NULL)
6371                free_noqname(mctx, &noqname);
2020-02-05 18:37:17 +11:00
Matthijs Mekking
37b41ff693 Simplify cachedb rrset statistic counters
This commit simplifies the cachedb rrset statistics in two ways:
- Introduce new rdtypecounter arithmetics, allowing bitwise
  operations.
- Remove the special DLV statistic counter.

New rdtypecounter arithmetics
-----------------------------
"The rdtypecounter arithmetics is a brain twister".  Replace the
enum counters with some defines.  A rdtypecounter is now 8 bits for
RRtypes and 3 bits for flags:

      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |  |  |  |  |  |  S  |NX|         RRType        |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

If the 8 bits for RRtype are all zero, this is an Other RRtype.

Bit 7 is the NXRRSET (NX) flag and indicates whether this is a
positive (0) or a negative (1) RRset.

Then bit 5 and 6 mostly tell you if this counter is for an active,
stale, or ancient RRtype:

    S = 0x00 means Active
    S = 0x01 means Stale
    S = 0x10 means Ancient

Since a counter cannot be stale and ancient at the same time, we
treat S = 0x11 as a special case to deal with NXDOMAIN counters.

S = 0x11 indicates an NXDOMAIN counter and in this case the RRtype
field signals the expiry of this cached item:

    RRType = 0 means Active
    RRType = 1 means Stale
    RRType = 2 means Ancient
2020-02-04 11:58:34 +01:00
Ondřej Surý
c73e5866c4 Refactor the isc_buffer_allocate() usage using the semantic patch
The isc_buffer_allocate() function now cannot fail with ISC_R_MEMORY.
This commit removes all the checks on the return code using the semantic
patch from previous commit, as isc_buffer_allocate() now returns void.
2020-02-03 08:29:00 +01:00
Ondřej Surý
6afa99362a Remove duplicate INSIST checks for isc_refcount API
This commits removes superfluous checks when using the isc_refcount API.

Examples of superfluous checks:

1. The isc_refcount_decrement function ensures there was not underflow,
   so this check is superfluous:

    INSIST(isc_refcount_decrement(&r) > 0);

2 .The isc_refcount_destroy() includes check whether the counter
   is zero, therefore this is superfluous:

    INSIST(isc_refcount_decrement(&r) == 1 && isc_refcount_destroy(&r));
2020-01-14 13:12:13 +01:00
Ondřej Surý
7c3e342935 Use isc_refcount_increment0() where appropriate 2020-01-14 13:12:13 +01:00
Mark Andrews
fd52417f71 address deadlock introduced in cd2469d3cd 2019-12-10 12:08:57 +00:00
Mark Andrews
c6efc0e50f Add is_leaf and send_to_prune_tree.
Add is_leaf and send_to_prune_tree to make the logic easier
to understand in cleanup_dead_nodes and decrement_reference.
2019-12-09 17:43:54 +00:00
Mark Andrews
176b23b6cd Testing node->down requires the tree lock to be held.
In decrement_reference only test node->down if the tree lock
is held.  As node->down is not always tested in
decrement_reference we need to test that it is non NULL in
cleanup_dead_nodes prior to removing the node from the rbt
tree.  Additionally it is not always possible to aquire the
node lock and reactivate a node when adding parent nodes.
Reactivate such nodes in cleanup_dead_nodes if required.
2019-12-09 17:43:54 +00:00
Mark Andrews
cd2469d3cd r/w of rbtdb->current_version requires that rbtdb->lock be held 2019-12-02 23:48:41 +00:00
Ondřej Surý
edd97cddc1 Refactor dns_name_dup() usage using the semantic patch 2019-11-29 14:00:37 +01:00
Mark Andrews
637b2c4e51 rdataset_setownercase and rdataset_getownercase need to obtain a node lock 2019-11-28 13:37:56 +01:00
Mark Andrews
8f6aaa7230 add comments 'tree_lock(write) must be held' 2019-11-27 09:58:15 +00:00
Mark Andrews
7cad3b2e91 rbtnode->nsec needs to be read while holding the tree lock 2019-11-27 09:58:15 +00:00
Mark Andrews
7d4d64340e use update_recordsandbytes in rbt_datafixer 2019-11-20 00:17:51 +08:00
Mark Andrews
0cda448248 always obtain write lock when updating version->{records,bytes} 2019-11-20 00:17:51 +08:00
Mark Andrews
fcb6dbcdd7 make header->count atomic 2019-11-19 17:29:20 +11:00
Ondřej Surý
d508ce4036 lib/dns/rbtdb.c: Add DbC check to safely dereference rbtdb in rbt_datafixer() 2019-10-03 09:04:26 +02:00
Ondřej Surý
69ecc711ac Move the failure handling block closer to the only place where it could fail 2019-10-01 10:43:26 +10:00
Ondřej Surý
c2dad0dcb2 Replace RUNTIME_CHECK(dns_name_copy(..., NULL)) with dns_name_copynf()
Use the semantic patch from the previous commit to replace all the calls to
dns_name_copy() with NULL as third argument with dns_name_copynf().
2019-10-01 10:43:26 +10:00
Ondřej Surý
5efa29e03a The final round of adding RUNTIME_CHECK() around dns_name_copy() calls
This commit was done by hand to add the RUNTIME_CHECK() around stray
dns_name_copy() calls with NULL as third argument.  This covers the edge cases
that doesn't make sense to write a semantic patch since the usage pattern was
unique or almost unique.
2019-10-01 10:43:26 +10:00
Ondřej Surý
89b269b0d2 Add RUNTIME_CHECK() around result = dns_name_copy(..., NULL) calls
This second commit uses second semantic patch to replace the calls to
dns_name_copy() with NULL as third argument where the result was stored in a
isc_result_t variable.  As the dns_name_copy(..., NULL) cannot fail gracefully
when the third argument is NULL, it was just a bunch of dead code.

Couple of manual tweaks (removing dead labels and unused variables) were
manually applied on top of the semantic patch.
2019-10-01 10:43:26 +10:00
Ondřej Surý
35bd7e4da0 Add RUNTIME_CHECK() around plain dns_name_copy(..., NULL) calls using spatch
This commit add RUNTIME_CHECK() around all simple dns_name_copy() calls where
the third argument is NULL using the semantic patch from the previous commit.
2019-10-01 10:43:26 +10:00
Ondřej Surý
e307273307 Fix unprotected access to rbtnode in lib/dns/rbtdb.c:add32() 2019-09-25 12:29:13 +02:00
Ondřej Surý
50e109d659 isc_event_allocate() cannot fail, remove the fail handling blocks
isc_event_allocate() calls isc_mem_get() to allocate the event structure.  As
isc_mem_get() cannot fail softly (e.g. it never returns NULL), the
isc_event_allocate() cannot return NULL, hence we remove the (ret == NULL)
handling blocks using the semantic patch from the previous commit.
2019-08-30 08:55:34 +02:00
Evan Hunt
c48979e6c5 simplify dns_rbtnodechain_init() by removing unnecessary 'mctx' parameter 2019-08-29 10:03:36 -07:00
Matthijs Mekking
4c0b0fa6a5 Simplify do_stats logic in rbtdb.c 2019-08-12 10:16:08 +02:00
Matthijs Mekking
a3af2c57e7 Make rbtdb maintain stale counters
When updating the statistics for RRset types, if a header is marked
stale or ancient, the appropriate statistic counters are decremented,
then incremented.

Also fix some out of date comments.
2019-08-12 10:16:08 +02:00
Ondřej Surý
ae83801e2b Remove blocks checking whether isc_mem_get() failed using the coccinelle 2019-07-23 15:32:35 -04:00