37815 Commits

Author SHA1 Message Date
Michal Nowak
8193c9b862 Update BIND version for release v9.18.16 2023-06-09 15:38:48 +02:00
Michal Nowak
7488850de1 Add a CHANGES marker 2023-06-09 15:38:05 +02:00
Michal Nowak
1264eb6da8 Merge branch 'mnowak/prepare-documentation-for-bind-9.18.16' into 'security-bind-9.18'
Prepare documentation for BIND 9.18.16

See merge request isc-private/bind9!543
2023-06-09 13:36:40 +00:00
Michał Kępień
7e2c9d2747 Tweak and reword release notes 2023-06-09 15:17:23 +02:00
Michal Nowak
ddfa545022 Prepare release notes for BIND 9.18.16 2023-06-09 15:17:23 +02:00
Michał Kępień
d3fa7e1930 Re-add a code comment to the "hooks" system test
Commit da74157440 removed a useful code comment from the "hooks" system
test.  Add it back to prevent confusion.
2023-06-09 15:17:23 +02:00
Michal Nowak
2d69829850 Merge branch '4089-confidential-stale-query-loop-bind-9.18' into 'security-bind-9.18'
[9.18][CVE-2023-2911] Fix stale-answer-client-timeout 0 crash

See merge request isc-private/bind9!523
2023-06-09 13:15:41 +00:00
Evan Hunt
10ac503a94 CHANGES and release notes for [GL #4089] 2023-06-09 14:58:22 +02:00
Matthijs Mekking
ff5bacf17c Fix serve-stale hang at shutdown
The 'refresh_rrset' variable is used to determine if we can detach from
the client. This can cause a hang on shutdown. To fix this, move setting
of the 'nodetach' variable up to where 'refresh_rrset' is set (in
query_lookup(), and thus not in ns_query_done()), and set it to false
when actually refreshing the RRset, so that when this lookup is
completed, the client will be detached.
2023-06-09 14:54:48 +02:00
Evan Hunt
240caa32b9 Stale answer lookups could loop when over recursion quota
When a query was aborted because of the recursion quota being exceeded,
but triggered a stale answer response and a stale data refresh query,
it could cause named to loop back where we are iterating and following
a delegation. Having no good answer in cache, we would fall back to
using serve-stale again, use the stale data, try to refresh the RRset,
and loop back again, without ever terminating until crashing due to
stack overflow.

This happens because in the functions 'query_notfound()' and
'query_delegation_recurse()', we check whether we can fall back to
serving stale data. We shouldn't do so if we are already refreshing
an RRset due to having prioritized stale data in cache.

In other words, we need to add an extra check to 'query_usestale()' to
disallow serving stale data if we are currently refreshing a stale
RRset.

As an additional mitigation to prevent looping, we now use the result
code ISC_R_ALREADYRUNNING rather than ISC_R_FAILURE when a recursion
loop is encountered, and we check for that condition in
'query_usestale()' as well.
2023-06-09 14:54:48 +02:00
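
The two extra checks described in the commit above can be sketched as follows. This is only an illustration of the guard logic, not the code in query.c: the `Query` class, its `refreshing_stale_rrset` flag, and the `may_use_stale()` helper are hypothetical names, and `Result.ALREADYRUNNING` merely stands in for ISC_R_ALREADYRUNNING.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Result(Enum):
    SUCCESS = auto()
    FAILURE = auto()
    ALREADYRUNNING = auto()  # stands in for ISC_R_ALREADYRUNNING


@dataclass
class Query:
    # Hypothetical fields; the real client/query state is far richer.
    stale_enabled: bool = True
    refreshing_stale_rrset: bool = False


def may_use_stale(query: Query, result: Result) -> bool:
    """Decide whether falling back to stale data is allowed."""
    if not query.stale_enabled:
        return False
    # Check 1: never serve stale data while already refreshing a stale
    # RRset, otherwise the refresh could trigger itself again.
    if query.refreshing_stale_rrset:
        return False
    # Check 2: a detected recursion loop must not fall back to stale data.
    if result is Result.ALREADYRUNNING:
        return False
    return True
```

With both guards in place, a refresh lookup that fails cannot re-enter the serve-stale path, which is what breaks the cycle in this simplified model.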
Michal Nowak
e9af3d15d8 Merge branch '4055-improve-the-overmem-cache-cleaning-9.18' into 'security-bind-9.18'
[9.18] Improve RBT overmem cache cleaning

See merge request isc-private/bind9!527
2023-06-09 12:54:04 +00:00
Michal Nowak
ec72e11ee4 Set max-cache-size expectations for low values 2023-06-08 11:47:04 +02:00
Ondřej Surý
09fcd8f88a Add CHANGES and release note for [GL #4055] 2023-06-08 11:47:04 +02:00
Ondřej Surý
e9d5219fca Improve RBT overmem cache cleaning
When cache memory usage is over the configured cache size (overmem) and
we are cleaning unused entries, it might not be enough to clean just two
entries if the entries to be expired are smaller than the newly added
rdata.  This could be abused by an attacker to cause a remote Denial of
Service by possibly running out of the operating system memory.

Currently, addrdataset() tries to do a single TTL-based cleaning
considering the serve-stale TTL and then optionally moves to overmem
cleaning if we are in that condition.  Then overmem_purge() tries to
do another single TTL-based cleaning from the TTL heap and then continues
with LRU-based cleaning, with up to 2 entries cleaned.

Squash the TTL-cleaning mechanism into a single call from addrdataset(),
but ignore the serve-stale TTL if we are currently overmem.

Then, instead of having a fixed number of entries to clean, pass the size
of the newly added rdatasetheader to the overmem_purge() function and
clean up at least the size of the newly added data.  This prevents the
cache from going over the configured memory limit (`max-cache-size`).

Additionally, refactor the overmem_purge() function to reduce for-loop
nesting for readability.
2023-06-08 11:43:18 +02:00
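
The difference between purging a fixed number of entries and purging at least the size of the incoming data can be shown with a toy LRU cache. This is a sketch under simplifying assumptions, not the RBT database code: the cache is an `OrderedDict` mapping names to sizes in bytes, and `purge_at_least()` and `add_entry()` are hypothetical helpers.

```python
from collections import OrderedDict


def purge_at_least(cache: OrderedDict, needed: int) -> int:
    """Evict least-recently-used entries until `needed` bytes are freed."""
    freed = 0
    while cache and freed < needed:
        _, size = cache.popitem(last=False)  # oldest entry first
        freed += size
    return freed


def add_entry(cache: OrderedDict, name: str, size: int, max_size: int) -> None:
    """Add an entry, purging by size when the cache would go over the limit."""
    used = sum(cache.values())
    if used + size > max_size:  # the "overmem" condition in this toy model
        purge_at_least(cache, size)
    cache[name] = size


# Three 10-byte entries, a 30-byte limit, and a new 25-byte entry.
# Evicting a fixed two entries would leave 10 + 25 = 35 bytes (over the
# limit); evicting at least 25 bytes removes all three old entries.
cache = OrderedDict([("a", 10), ("b", 10), ("c", 10)])
add_entry(cache, "d", 25, max_size=30)
```

After the call, only the new entry remains and total usage is back under the limit.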
Arаm Sаrgsyаn
36d019ffce Merge branch '4105-QryDropped-stats-counter-documentation-update-9.18' into 'bind-9.18'
[9.18] QryDropped stats counter documentation update

See merge request isc-projects/bind9!8011
2023-06-07 15:13:17 +00:00
Aram Sargsyan
dd2973996f QryDropped stats counter documentation update
Document which dropped queries are counted by the QryDropped
statistics counter.

(cherry picked from commit 27c30fe8a4)
2023-06-07 14:01:46 +00:00
Arаm Sаrgsyаn
5206a06e11 Merge branch '4074-fix-stale-answer-client-timeout-with-clients-per-query-9.18' into 'bind-9.18'
[9.18] Fix a clients-per-query miscalculation bug

See merge request isc-projects/bind9!7997
2023-06-06 12:47:28 +00:00
Aram Sargsyan
545a3fe089 Add CHANGES and release notes for [GL #4074]
(cherry picked from commit 466a7d9b5f)
2023-06-06 12:46:18 +00:00
Aram Sargsyan
d91edda639 Fix a clients-per-query miscalculation bug
The number of clients per query is calculated using the pending
fetch responses in the list. The dns_resolver_createfetch() function
includes every item in the list when deciding whether the limit is
reached (i.e. fctx->spilled is true). Then, when the limit is reached,
there is another calculation in fctx_sendevents(), when deciding
whether the limit needs to be increased, but this time the TRYSTALE
responses are not included in the calculation (because of an early break
from the loop), and because of that the limit is never increased.

A single client can have more than one associated response/event in the
list (currently max. two), and calculating them as separate "clients"
is unexpected. E.g. if 'stale-answer-enable' is enabled and
'stale-answer-client-timeout' is enabled and is larger than 0, then
each client will have two events, which will effectively halve the
clients-per-query limit.

Fix the dns_resolver_createfetch() function to calculate only the
regular FETCHDONE responses/events.

Change the fctx_sendevents() function to also calculate only FETCHDONE
responses/events. Currently, this second change doesn't have any impact,
because the TRYSTALE events were already skipped, but having the same
condition in both places will help prevent similar bugs in the future
if a new type of response/event is ever added.

(cherry picked from commit 2ae5c4a674)
2023-06-06 12:45:00 +00:00
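
The halving effect described in the commit above can be shown with a small counting example. This is an illustrative sketch only: the `Response` tuple and `count_clients()` are hypothetical, and the strings "FETCHDONE" and "TRYSTALE" stand in for the event types named in the commit message.

```python
from collections import namedtuple

# Hypothetical stand-in for a pending fetch response/event.
Response = namedtuple("Response", ["client", "kind"])


def count_clients(responses):
    """Count clients by counting only the regular FETCHDONE responses."""
    return sum(1 for r in responses if r.kind == "FETCHDONE")


# With serve-stale and a positive stale-answer-client-timeout, each
# client has two events on the list.
pending = []
for client in ("c1", "c2", "c3"):
    pending.append(Response(client, "FETCHDONE"))
    pending.append(Response(client, "TRYSTALE"))

naive = len(pending)            # counts every list item
fixed = count_clients(pending)  # counts each client once
```

Counting every list item reports twice as many clients as actually exist, so the limit is reached at half the configured value; filtering on a single event type in both places keeps the two calculations consistent.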
Aram Sargsyan
f82aaedbdc Add clients-per-query checks for the fetchlimit system test
Check if clients-per-query quota works as expected with or without
a positive stale-answer-client-timeout value and serve-stale answers
enabled.

(cherry picked from commit 3bb2babcd0)
2023-06-06 12:45:00 +00:00
Aram Sargsyan
71a27a2848 Light refactoring of the fetchlimit system test
Prepare the fetchlimit system test for adding a clients-per-query
check. Change some functions and commands to accept a destination
NS IP address instead of using the hardcoded 10.53.0.3.

(cherry picked from commit 7ebd055c78)
2023-06-06 12:45:00 +00:00
Aram Sargsyan
17e09d8a10 Fix fetchlimit system test issues
1. Fix the numbering.
2. Fix an artifacts rewriting issue.
3. Add missing checks of 'ret' after some checks.
4. Fix extracting the quota value from the ADB dump.

(cherry picked from commit 101d829b02)
2023-06-06 12:45:00 +00:00
Ondřej Surý
449124c56d Merge branch '4038-resize-send-buffers-to-avoid-excessive-memory-allocation-9.18' into 'bind-9.18'
[9.18] Use appropriately sized send buffers for DNS messages over TCP

See merge request isc-projects/bind9!8005
2023-06-06 12:21:56 +00:00
Artem Boldariev
2c145b1862 Update CHANGES and release note [GL #4038]
Mention that memory usage was reduced by allocating properly sized
send buffers for stream-based transports.

(cherry picked from commit 8672d54847)
2023-06-06 14:04:01 +02:00
Artem Boldariev
285e75b3b0 Use appropriately sized send buffers for DNS messages over TCP
This commit changes the send buffer allocation strategy for stream-based
transports. Before this change, we would allocate dynamic buffers
sized at 64 KB even when we did not need that much. That could lead to
high memory usage on the server. Now we resize the send buffer to match
the size of the actual data, freeing the memory at the end of the
buffer so that it can be reused later.

(cherry picked from commit d8a5feb556)
2023-06-06 14:04:01 +02:00
Arаm Sаrgsyаn
e72c92c497 Merge branch '4106-lock-order-inversion-in-resolver.c' into 'bind-9.18'
[9.18] Fix a lock-order-inversion bug in resolver.c

See merge request isc-projects/bind9!8000
2023-06-06 11:56:01 +00:00
Aram Sargsyan
db45cab546 Fix a lock-order-inversion bug in resolver.c
There is a lock-order-inversion (potential deadlock) in resolver.c:
dns_resolver_shutdown() locks a resolver bucket lock while the
resolver lock itself is already locked, whereas fctx_sendevents()
locks the resolver lock while a bucket lock is held (the bucket lock
is taken in fctx__done_detach() before calling that function).

The resolver lock/unlock in dns_resolver_shutdown() was added back in
the 317e36d47e commit to make sure that the function is finished before
the resolver object is destroyed.

Since res->exiting is atomic, it should be possible to remove the
resolver locking in dns_resolver_shutdown() and add it to the
send_shutdown_events() function which requires it.

Also, since 'res->exiting' is now set while unlocked, the 'INSIST'
in spillattimer_countdown() is wrong, and is removed.
2023-06-06 11:02:24 +00:00
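
A lock-order inversion of this kind can be detected mechanically by comparing the order in which each code path acquires its locks. The checker below is a minimal sketch over a simplified model: `find_inversions()` is a hypothetical helper, the lock names are just labels, and the "after" paths are an assumption about the effect of the fix rather than a trace of the real call graph in resolver.c.

```python
def find_inversions(paths):
    """Return lock pairs that are acquired in opposite orders.

    `paths` is a list of lock-name sequences, each in acquisition order.
    """
    ordered = set()
    for path in paths:
        for i, outer in enumerate(path):
            for inner in path[i + 1:]:
                ordered.add((outer, inner))  # outer held while taking inner
    return sorted((a, b) for (a, b) in ordered if (b, a) in ordered and a < b)


# Before the fix: one path takes resolver -> bucket, another bucket -> resolver.
before = [["resolver", "bucket"], ["bucket", "resolver"]]

# After the fix (assumed model): the shutdown path no longer holds the
# resolver lock when it takes a bucket lock.
after = [["bucket"], ["bucket", "resolver"]]

inversions_before = find_inversions(before)
inversions_after = find_inversions(after)
```

Tools such as ThreadSanitizer report the same class of problem at runtime; the sketch only illustrates why two opposite acquisition orders are enough for a potential deadlock.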
Arаm Sаrgsyаn
ff3d25a47f Merge branch 'aram/statschannel-spilled-clients-counter-9.18' into 'bind-9.18'
[9.18] Add ClientQuota statistics channel counter

See merge request isc-projects/bind9!7993
2023-05-31 14:51:08 +00:00
Aram Sargsyan
9a3e00478f Add a CHANGES note for [GL !7978]
(cherry picked from commit fa9172d996)
2023-05-31 11:07:08 +00:00
Aram Sargsyan
b6eec9ee51 Update the documentation of the resolver statistics counters
The reference manual doesn't document all the available resolver
statistics counters. Add information about the missing counters.

(cherry picked from commit 08ebf39d1e)
2023-05-31 11:07:08 +00:00
Aram Sargsyan
cd47429365 Add ClientQuota statistics channel counter
This counter indicates the number of the resolver's spilled
queries due to reaching the clients per query quota.

(cherry picked from commit 04648d7c2f)
2023-05-31 11:07:08 +00:00
Michal Nowak
56ae462f21 Merge branch 'mnowak/alpine-3.18-9.18' into 'bind-9.18'
[9.18] Add Alpine Linux 3.18

See merge request isc-projects/bind9!7994
2023-05-31 10:16:09 +00:00
Michal Nowak
46e98810d7 Add Alpine Linux 3.18
(cherry picked from commit ddb846454d)
2023-05-31 12:03:52 +02:00
Michal Nowak
b751e2b4be Merge branch 'mnowak/look-for-core-files-in-TOP_BUILDDIR-9.18' into 'bind-9.18'
[9.18] Look for core files in $TOP_BUILDDIR

See merge request isc-projects/bind9!7986
2023-05-30 20:27:33 +00:00
Michal Nowak
2476d43acf Look for core files in $TOP_BUILDDIR
The get_core_dumps.sh script couldn't find and process core files of
out-of-tree configurations because it looked for them in the source
instead of the build directory.

(cherry picked from commit a13448a769)
2023-05-30 21:31:41 +02:00
Michal Nowak
99e910e4b9 Merge branch 'mnowak/custom-userspace-rcu-library-9.18' into 'bind-9.18'
[9.18] Change images for TSAN jobs

See merge request isc-projects/bind9!7987
2023-05-30 19:30:02 +00:00
Michal Nowak
44fff18b68 Change images for TSAN jobs
Fedora 38 and Debian "bullseye" images were "forked" to images used only
for TSAN CI jobs. The new images contain TSAN-aware liburcu that does
not fit well with ASAN CI jobs for which original images were also used.

liburcu is not used in this branch, but images are shared among
branches, and their use needs to be consistent in all maintained
branches.

(cherry picked from commit 04dda8661f)
2023-05-30 20:35:12 +02:00
Tom Krizek
4f3cfba6c0 Merge branch 'tkrizek-fix-pytest-base-port-9.18' into 'bind-9.18'
[9.18] Fix base_port calculation in pytest runner

See merge request isc-projects/bind9!7983
2023-05-30 15:37:37 +00:00
Tom Krizek
1b8f0711f2 Fix base_port calculation in pytest runner
The selected base port should be in the range <port_min, port_max);
the formula was incorrect.

Credit for discovering this fault goes to Ondrej Sury.

(cherry picked from commit e8ea6b610b)
2023-05-30 15:37:29 +02:00
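
One way to guarantee that a derived base port stays inside the half-open range <port_min, port_max) is to divide the range into fixed-size slots and pick a slot by modulo. The function below is a hedged sketch and not the formula used in the pytest runner: `pick_base_port()`, `seed`, and `ports_per_test` are hypothetical names introduced for illustration.

```python
def pick_base_port(seed: int, port_min: int, port_max: int,
                   ports_per_test: int) -> int:
    """Map an arbitrary non-negative seed to a base port in [port_min, port_max).

    The whole block of `ports_per_test` ports also fits below port_max.
    """
    slots = (port_max - port_min) // ports_per_test
    if slots <= 0:
        raise ValueError("port range too small for one block of ports")
    return port_min + (seed % slots) * ports_per_test


# Example values are assumptions, chosen only to exercise the function.
ports = [pick_base_port(seed, 5001, 32767, 20) for seed in range(5000)]
```

Because the slot index is always smaller than the number of slots, the result can never reach `port_max`, regardless of the seed.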
Matthijs Mekking
076b8363fc Merge branch '3950-serve-stale-strikes-again-9.18' into 'bind-9.18'
[9.18] Fix serve-stale bug when cache has no data

See merge request isc-projects/bind9!7909
2023-05-30 13:15:13 +00:00
Matthijs Mekking
cbe0cddcd4 Add release note and changes for #3950
Fixing another serve-stale bug is still news.

(cherry picked from commit 23dbb6ba72)
2023-05-30 13:46:34 +02:00
Matthijs Mekking
b90bad93cb Fix serve-stale bug when cache has no data
We recently fixed a bug where in some cases (when following an
expired CNAME for example), named could return SERVFAIL if the target
record is still valid (see isc-projects/bind9#3678, and
isc-projects/bind9!7096). We fixed this by considering non-stale
RRsets as well during the stale lookup.

However, this triggered a new bug because despite the answer from
cache not being stale, the lookup may be triggered by serve-stale.
If the answer from database is not stale, the fix in
isc-projects/bind9!7096 erroneously skips the serve-stale logic.

Add 'answer_found' checks to the serve-stale logic to fix this issue.

(cherry picked from commit bbd163acf6)
2023-05-30 13:46:00 +02:00
Matthijs Mekking
ad5d447348 Add serve-stale test case for GL #3950
Add a test case in which, when the cache is primed with a slow
authoritative resolver, the stale-answer-client-timeout option should not
return a delegation to the client (it should wait until an applicable
answer is found, if no entry is found in the cache).

(cherry picked from commit c3d4fd3449)
2023-05-30 13:45:54 +02:00
Ondřej Surý
2a498d944a Merge branch '4098-remove-cruft-epoll-kqueue-configure-options-9.18' into 'bind-9.18'
[9.18] Remove obsolete epoll/kqueue/devpoll configure options

See merge request isc-projects/bind9!7975
2023-05-29 06:07:16 +00:00
Ondřej Surý
4fb2c9568d Add CHANGES note for [GL #4098]
(cherry picked from commit 0266760fdd)
2023-05-29 07:58:51 +02:00
Ondřej Surý
6b6076c882 Remove obsolete epoll/kqueue/devpoll configure options
Since we don't use networking directly but rather via libuv, these
configure options were no-op.  Remove the configure checks for epoll
(Linux), kqueue (BSDs) and /dev/poll (Solaris).

(cherry picked from commit 051f3d612f)
2023-05-29 07:58:03 +02:00
Mark Andrews
aca974dc29 Merge branch '4090-corrected-bad-insist-logic-in-isc_radix_remove-bind-9.18' into 'bind-9.18'
[9.18] Resolve "Corrected bad INSIST logic in isc_radix_remove()"

See merge request isc-projects/bind9!7974
2023-05-29 04:42:09 +00:00
Mark Andrews
eb52c30524 Add regression test for [GL # 4090]
These insertions are added to produce a radix tree that will trigger
the INSIST reported in [GL #4090].  Due to fixes added since BIND 9.9,
an extra insert is needed to ensure node->parent is non-NULL.

(cherry picked from commit 03ebe96110)
2023-05-29 13:27:51 +10:00
Mark Andrews
27eb8ed20f Move isc_mem_put to after node is checked for equality
isc_mem_put() sets the pointer to the memory being freed to NULL.
The equality test 'parent->r == node' was accidentally being turned
into a test against NULL.

(cherry picked from commit ac2e0bc3ff)
2023-05-29 13:27:51 +10:00
Evan Hunt
9a1d565f07 Merge branch '3905-deprecate-tkey-dhkey-v9_18' into 'bind-9.18'
mark 'tkey-dhkey' as deprecated

See merge request isc-projects/bind9!7956
2023-05-28 08:07:25 +00:00