There were cases in resolver.c when the `max-recursion-queries` quota was ineffective. It was possible to craft zones that would cause a resolver to waste resources by sending excessive queries while attempting to resolve a name. This has been addressed by correcting errors in the implementation of `max-recursion-queries`, and by reducing the default value from 100 to 32.
In addition, a new `max-query-restarts` option has been added which limits the number of times a recursive server will follow CNAME or DNAME records before terminating resolution. This was previously a hard-coded limit of 16, and now defaults to 11.
Closes #4741
Backport of MR !9281
Merge branch 'backport-4741-reclimit-restarts-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9282
max-query-restarts and max-recursion-queries values can now be set
on the command line in delv for testing purposes.
(cherry picked from commit 0d010ddebe)
implement, document, and test the 'max-query-restarts' option
which specifies the query restart limit - the number of times
we can follow CNAMEs before terminating resolution.
(cherry picked from commit 104f3b82fb)
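A minimal sketch of a restart limit, assuming a hypothetical in-memory alias table (`alias_t`) and helper (`follow_aliases`) that are not part of BIND. It only illustrates the idea of counting each followed CNAME as one restart and giving up once the configured limit is reached; the exact boundary behaviour in named may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table entry: 'name' is an alias (CNAME) for 'target'. */
typedef struct alias {
	const char *name;
	const char *target;
} alias_t;

typedef enum { RESOLVED, TOO_MANY_RESTARTS } result_t;

/*
 * Follow aliases starting at 'name'. Each alias followed counts as one
 * restart; the lookup is abandoned when another restart would exceed
 * 'max_restarts'. On success, *final is the name with no further alias.
 */
static result_t
follow_aliases(const alias_t *table, size_t n, const char *name,
	       unsigned int max_restarts, const char **final) {
	unsigned int restarts = 0;

	for (;;) {
		const char *next = NULL;

		/* Look for an alias record owned by the current name. */
		for (size_t i = 0; i < n; i++) {
			if (strcmp(table[i].name, name) == 0) {
				next = table[i].target;
				break;
			}
		}
		if (next == NULL) {
			*final = name;
			return RESOLVED;
		}
		if (restarts >= max_restarts) {
			return TOO_MANY_RESTARTS;
		}
		restarts++;
		name = next;
	}
}
```

With a three-step chain a -> b -> c -> d, a limit of 3 or more resolves to "d", while a limit of 2 stops the lookup early.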
MAX_RESTARTS is no longer hard-coded; ns_server_setmaxrestarts()
and dns_client_setmaxrestarts() can now be used to modify the
max-restarts value at runtime. in both cases, the default is 11.
(cherry picked from commit c5588babaf)
the number of steps that can be followed in a CNAME chain
before terminating the lookup has been reduced from 16 to 11.
(this is a hard-coded value, but will be made configurable later.)
(cherry picked from commit 05d78671bb)
previously, validator queries for DNSKEY and DS records were
not counted toward the quota for max-recursion-queries; they
are now.
(cherry picked from commit af7db89513)
there were cases in resolver.c when queries for NS records were
started without passing a pointer to the parent fetch's query counter;
as a result, the max-recursion-queries quota for those queries started
counting from zero, instead of sharing the limit for the parent fetch,
making the quota ineffective in some cases.
(cherry picked from commit d3b7e92783)
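A minimal sketch of why the counter pointer matters. The names here (`counter_t`, `fetch_t`, `fetch_init`, `fetch_send_query`) are hypothetical stand-ins, not the resolver.c structures: a child fetch created without the parent's counter gets a fresh budget, while one that receives the pointer draws from the same budget as its parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the resolver.c types. */
typedef struct counter {
	unsigned int used;
	unsigned int limit; /* plays the role of max-recursion-queries */
} counter_t;

typedef struct fetch {
	counter_t *qc; /* counter this fetch charges its queries to */
	counter_t own; /* used only when no parent counter is supplied */
} fetch_t;

/* Initialize a fetch. Passing parent_qc == NULL starts a new budget. */
static void
fetch_init(fetch_t *f, counter_t *parent_qc, unsigned int limit) {
	assert(f != NULL);
	f->own.used = 0;
	f->own.limit = limit;
	f->qc = (parent_qc != NULL) ? parent_qc : &f->own;
}

/* Charge one outgoing query; returns false once the quota is used up. */
static bool
fetch_send_query(fetch_t *f) {
	assert(f != NULL && f->qc != NULL);
	if (f->qc->used >= f->qc->limit) {
		return false;
	}
	f->qc->used++;
	return true;
}
```

In this sketch, a child initialized with `parent.qc` fails as soon as the combined query count reaches the limit, whereas a child initialized with `NULL` can send a full quota of its own, which is the ineffective behaviour described above.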
The new Fedora 40 TSAN images use libuv, urcu and OpenSSL libraries compiled with ThreadSanitizer. This (in theory) should enable better detection of memory races in those (most important) libraries.
Backport of MR !9264
Merge branch 'backport-ondrej/test-new-tsan-images-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9276
The TSAN-enabled libraries are installed to /usr/local; pass
PKG_CONFIG_PATH and a few additional CFLAGS options in the configure
arguments.
(cherry picked from commit ed766efc15)
When the SSL object was destroyed, it would invalidate all SSL_SESSION
objects including the cached, but not yet used, TLS session objects.
Properly disassociate the SSL object from the SSL_SESSION before we
store it in the TLS session cache, so we can later destroy it without
invalidating the cached TLS sessions.
Closes #4834
Backport of MR !9271
Merge branch 'backport-4834-detach-SSL-from-cached-SSL_SESSION-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9274
When the SSL object was destroyed, it would invalidate all SSL_SESSION
objects including the cached, but not yet used, TLS session objects.
Properly disassociate the SSL object from the SSL_SESSION before we
store it in the TLS session cache, so we can later destroy it without
invalidating the cached TLS sessions.
Co-authored-by: Ondřej Surý <ondrej@isc.org>
Co-authored-by: Artem Boldariev <artem@isc.org>
Co-authored-by: Aram Sargsyan <aram@isc.org>
(cherry picked from commit c11b736e44)
When a TLS (TLSstream) connection was accepted, the child listening
socket was not attached to sock->server and thus it could have been
freed before all the accepted connections were actually closed.
In turn, this would cause us to call isc_tls_free() too soon, causing
cascading errors in pending SSL_read_ex() calls in the accepted
connections.
Properly attach and detach the child listening socket when accepting
and closing the server connections.
Closes #4833
Backport of MR !9270
Merge branch 'backport-4833-tlssock-needs-to-attach-to-child-tlslistener-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9273
When a TLS (TLSstream) connection was accepted, the child listening
socket was not attached to sock->server and thus it could have been
freed before all the accepted connections were actually closed.
In turn, this would cause us to call isc_tls_free() too soon, causing
cascading errors in pending SSL_read_ex() calls in the accepted
connections.
Properly attach and detach the child listening socket when accepting
and closing the server connections.
(cherry picked from commit 684f3eb8e6)
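The attach/detach pattern described above can be sketched with a plain reference count. Everything here (`listener_t`, `conn_t`, and the four helpers) is hypothetical and much simpler than the netmgr code; it only shows that the listener outlives its owner for as long as an accepted connection still holds a reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical listener with a reference count; not the netmgr socket. */
typedef struct listener {
	unsigned int refs;
	bool freed; /* stands in for releasing the TLS state */
} listener_t;

typedef struct conn {
	listener_t *server; /* reference held while the connection is open */
} conn_t;

static void
listener_attach(listener_t *l, listener_t **targetp) {
	assert(l != NULL && targetp != NULL && *targetp == NULL);
	l->refs++;
	*targetp = l;
}

static void
listener_detach(listener_t **lp) {
	listener_t *l;

	assert(lp != NULL && *lp != NULL);
	l = *lp;
	*lp = NULL;
	assert(l->refs > 0);
	if (--l->refs == 0) {
		l->freed = true; /* last reference gone: safe to free */
	}
}

/* Accepting a connection takes a reference on the listener. */
static void
conn_accept(conn_t *c, listener_t *l) {
	c->server = NULL;
	listener_attach(l, &c->server);
}

/* Closing the connection drops that reference. */
static void
conn_close(conn_t *c) {
	listener_detach(&c->server);
}
```

If the owner detaches first, the listener is freed only when the last accepted connection is closed.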
The missing file util/dtrace.sh prevented builds on systems without the dtrace utility.
This has been corrected.
Fixes: #4835
Backport of MR !9262
Merge branch 'backport-pspacek/gitattribute-fixes-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9272
Ensure that system tests can be executed without the Python hypothesis
package.
Closes #4831
Backport of MR !9265
Merge branch 'backport-4831-isctest-make-hypothesis-optional-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9267
Query responses should contain the question section, with some exceptions. Dig did not report when the question section was missing.
Closes #4808
Backport of MR !9233
Merge branch 'backport-4808-have-dig-report-missing-question-section-in-axfr-response-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9269
The question section should be present in the first AXFR/IXFR
response and in other QUERY responses unless no question was sent.
Issue a warning if the question section is not present.
(cherry picked from commit 327e890910)
Ensure that the selected algorithms remain stable throughout the entire test session. Crypto support detection was rewritten in Python and simplified.
Closes #4202
Closes #4422
Related #3810
Backport of MR !8803
Merge branch 'backport-4202-algorithm-detection-pytest-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9201
When attempting to run the system tests using v9.20.0 code, the test
setup will fail with ERROR because setup.sh calls conf.sh, which
attempts to call the get_algorithms.py script that was deleted in this MR.
This should be reverted once v9.20.1 with the updated code is released.
Ensure all the variables are initialized when running the main function
of isctest module. This enables proper environment variables during test
script development when only conf.sh is sourced, rather than the script
being executed by the pytest runner.
(cherry picked from commit d7ace928b5)
Run the crypto support checks when initializing the isctest package and
save the results in environment variables. This removes the need to
repeatedly check for crypto operation support, as it's not something
that would change at test runtime.
(cherry picked from commit 25cb39b7fc)
Instead of invoking the get_algorithms.py script repeatedly (which may
yield different results), move the algorithm configuration to an isctest
module. This ensures the variables are consistent across the entire test
run.
(cherry picked from commit 8302db407c)
Fix an assertion failure that could happen as a result of a data race between free_gluetable() and addglue() on the same headers.
Closes #4691
Backport of MR !9126
Merge branch 'backport-4691-fix-data-race-between-free_gluetable-and-addglue-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9256
When adding glue to the header, we add the header to the wait-free stack
to be cleaned up later, which sets wfc_node->next to a non-NULL value.
When the actual cleaning happens, we would only clean up the .glue_list,
but since the database isn't locked at that time, the headers could be
reused while cleaning the existing glue entries, which creates a data
race between database versions.
Revert the code back to use a per-database-version hashtable where keys
are the node pointers. This allows each database version to have an
independent glue cache table that doesn't affect nodes or headers that
could already "belong" to the future database version.
(cherry picked from commit 5beae5faf9)
when searching the cache for a node so that we can delete an rdataset, it isn't necessary to set the 'create' flag. if the node doesn't exist yet, we won't be able to delete anything from it anyway.
Backport of MR !9158
Merge branch 'backport-each-minor-findnode-refactor-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9253
when searching the cache for a node so that we can delete an
rdataset, it is not necessary to set the 'create' flag. if the
node doesn't exist yet, then we won't be able to delete
anything from it anyway.
(cherry picked from commit 6b720bfe1a)
When a priming query is complete, it's currently logged at level ISC_LOG_DEBUG(1), regardless of success or failure. We are now raising it to ISC_LOG_NOTICE in the case of failure. [GL #3516]
Closes #3516
Backport of MR !9121
Merge branch 'backport-3516-log-priming-errors-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9250
when a priming query is complete, it's currently logged at
level ISC_LOG_DEBUG(1), regardless of success or failure. we
are now raising it to ISC_LOG_NOTICE in the case of failure.
(cherry picked from commit a84d54c6ff)
The previous work in this area was led by the belief that we might be
calling call_rcu() from within call_rcu() callbacks. After carefully
checking all the current callbacks, it became evident that this is not
the case and that the problem isn't too few rcu_barrier() calls, but
something else entirely.
Call rcu_barrier() just once, as that is enough; without the multiple
rcu_barrier() calls hiding the real problem, we will be able to
find it.
Backport of MR !9134
Merge branch 'backport-4607-call-a-single-rcu_barrier-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9247
The previous work in this area was led by the belief that we might be
calling call_rcu() from within call_rcu() callbacks. After carefully
checking all the current callbacks, it became evident that this is not
the case and that the problem isn't too few rcu_barrier() calls, but
something else entirely.
Call rcu_barrier() just once, as that is enough; without the multiple
rcu_barrier() calls hiding the real problem, we will be able to
find it.
(cherry picked from commit 13941c8ca7)
Checking the version of `named-checkconf` would end with an assertion failure. This has been fixed.
Closes #4827
Backport of MR !9243
Merge branch 'backport-4827-cleanup-dst-only-if-initialized-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9246
dst_lib_destroy() should be called only if dst_lib_init() was called
before. In named-checkconf, that is guarded by the dst_cleanup variable,
which was erroneously set to 'true' by default. Set dst_cleanup to
'false' by default.
(cherry picked from commit c54880e3fa)
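A minimal sketch of the guard, using hypothetical stand-ins (`example_lib_init`, `example_lib_destroy`, `run_tool`) rather than the dst library or the named-checkconf source. The cleanup flag starts out false and is set only after initialization has actually run, so an early path such as printing the version never calls the destroy function.

```c
#include <stdbool.h>

/* Hypothetical stand-ins that only record calls for illustration. */
static int lib_initialized = 0;
static int destroy_calls = 0;

static void
example_lib_init(void) {
	lib_initialized = 1;
}

static void
example_lib_destroy(void) {
	destroy_calls++;
	lib_initialized = 0;
}

/* Mimics a tool's main path; 'print_version_only' skips initialization. */
static void
run_tool(bool print_version_only) {
	bool cleanup = false; /* becomes true only once init has run */

	if (!print_version_only) {
		example_lib_init();
		cleanup = true;
	}

	/* Common exit path: destroy only what was initialized. */
	if (cleanup) {
		example_lib_destroy();
	}
}
```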
An assertion failure was triggered when a TSIG had a valid cryptographic signature but an invalid time. This can happen when the clocks of the primary and secondary servers are not synchronised.
Closes #4811
Backport of MR !9234
Merge branch 'backport-4811-fix-isc_buffer_putuint48-buffer-size-requirement-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9245
Add a system test that sets the TSIG fudge to 0, waits three seconds and
then sends a signed message to the server. This tests the path where the
time difference between the client and the server is outside of the TSIG
fudge value.
(cherry picked from commit 8def0c3b12)
The tsig unit test was only testing if everything went ok, but it was
not testing whether the error paths work. Add two more unit tests - one
uses the time outside of the TSIG skew, and the second trashes the
signature with random data.
(cherry picked from commit 3835d75f00)
When putting a 48-bit number into a fixed-size buffer that's exactly 6
bytes, an assertion failure would occur because the 48-bit number is
internally represented as a 64-bit number and the code was checking
whether there is enough space for `sizeof(val)`. This causes an assertion
failure when an otherwise valid TSIG signature has bad timing information.
Specify the size of the argument explicitly, so the 48-bit number
doesn't require an 8-byte buffer.
(cherry picked from commit 37dbd57c16)
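The size mismatch can be sketched as follows. `buf_t` and `buf_putuint48` are hypothetical and far simpler than the isc_buffer API; the point is only that the space check uses the 6 bytes actually written rather than `sizeof(val)`, which is 8 for a `uint64_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal buffer; not the isc_buffer_t API. */
typedef struct buf {
	uint8_t *base;
	size_t length; /* total capacity in bytes */
	size_t used;   /* bytes already written */
} buf_t;

/*
 * Append the low 48 bits of 'val' in network byte order. The required
 * space is stated explicitly as 6 bytes; checking against sizeof(val)
 * would demand 8 and reject a buffer that is exactly large enough.
 */
static bool
buf_putuint48(buf_t *b, uint64_t val) {
	const size_t need = 6;

	assert(b != NULL && b->base != NULL && b->used <= b->length);
	if (b->length - b->used < need) {
		return false;
	}
	for (size_t i = 0; i < need; i++) {
		/* Most significant of the six bytes goes first. */
		b->base[b->used + i] = (uint8_t)(val >> (8 * (need - 1 - i)));
	}
	b->used += need;
	return true;
}
```

A 6-byte buffer accepts one 48-bit value and then reports that it is full, instead of failing the first write.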
When automatic-interface-scan is disabled, the route socket was still being opened. Add a new API to connect to and disconnect from the route socket only as needed.
Additionally, move the block that disables periodic interface rescans to a place where it actually has access to the configuration values. Previously, the values were being checked before the configuration was loaded.
Backport of https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/9122
Merge branch '4757-dont-open-routing-socket-if-not-needed-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9239
When automatic-interface-scan is disabled, the route socket was still
being opened. Add a new API to connect to and disconnect from the route
socket only as needed.
Additionally, move the block that disables periodic interface rescans to
a place where it actually has access to the configuration values.
Previously, the values were being checked before the configuration was
loaded.
(cherry picked from commit b26079fdaf)
The fcount_incr() was incorrectly skipping the accounting for the fetches-per-zone if the force argument was set to true. We want to skip the accounting only when the fetches-per-zone is completely disabled, but for individual names we need to do the accounting even if we are forcing the result to be success.
Backport of https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/9115
Merge branch 'backport-4786-forced-fcount_incr-should-still-increment-count-and-allowed-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9241
The fcount_incr() was incorrectly skipping the accounting for the
fetches-per-zone if the force argument was set to true. We want to skip
the accounting only when the fetches-per-zone is completely disabled,
but for individual names we need to do the accounting even if we are
forcing the result to be success.
(cherry picked from commit a513d4c07f)
PTHREAD_MUTEX_ADAPTIVE_NP and PTHREAD_MUTEX_ERRORCHECK_NP are usually not defines, but enum values, so a simple preprocessor check doesn't work.
Check for PTHREAD_MUTEX_ADAPTIVE_NP from the autoconf AC_COMPILE_IFELSE block and define HAVE_PTHREAD_MUTEX_ADAPTIVE_NP. This should enable adaptive mutexes on Linux and FreeBSD.
As PTHREAD_MUTEX_ERRORCHECK actually comes from POSIX and Linux glibc does define it when compatibility macros are being set, we can just use PTHREAD_MUTEX_ERRORCHECK instead of PTHREAD_MUTEX_ERRORCHECK_NP.
Backport of https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/9111
Merge branch 'backport-ondrej/fix-adaptive-mutex-use-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!9240
PTHREAD_MUTEX_ADAPTIVE_NP and PTHREAD_MUTEX_ERRORCHECK_NP are
usually not defines, but enum values, so a simple preprocessor check
doesn't work.
Check for PTHREAD_MUTEX_ADAPTIVE_NP from the autoconf AC_COMPILE_IFELSE
block and define HAVE_PTHREAD_MUTEX_ADAPTIVE_NP. This should enable
adaptive mutexes on Linux and FreeBSD.
As PTHREAD_MUTEX_ERRORCHECK actually comes from POSIX and Linux glibc
does define it when compatibility macros are being set, we can just use
PTHREAD_MUTEX_ERRORCHECK instead of PTHREAD_MUTEX_ERRORCHECK_NP.
(cherry picked from commit cc4f99bc6d)
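A small sketch of why the preprocessor check fails for enum values. `EXAMPLE_MUTEX_ADAPTIVE` and `HAVE_EXAMPLE_MUTEX_ADAPTIVE` are hypothetical names standing in for PTHREAD_MUTEX_ADAPTIVE_NP and the macro a configure-time compile test would define; no pthread header is used so that the example behaves the same on every platform.

```c
#include <stdbool.h>

/* Stand-in for a header that exposes the constant as an enumerator. */
enum { EXAMPLE_MUTEX_ADAPTIVE = 3 };

/*
 * The preprocessor knows nothing about enumerators, so this #ifdef is
 * false even though EXAMPLE_MUTEX_ADAPTIVE is usable in C code.
 */
#ifdef EXAMPLE_MUTEX_ADAPTIVE
#define SEEN_BY_PREPROCESSOR true
#else
#define SEEN_BY_PREPROCESSOR false
#endif

/*
 * Assumption: a configure-time compile test that successfully builds a
 * snippet using the enumerator would emit a define like this one.
 */
#define HAVE_EXAMPLE_MUTEX_ADAPTIVE 1

/* Reports what a naive #ifdef on the enumerator concluded. */
static bool
adaptive_seen_by_ifdef(void) {
	return SEEN_BY_PREPROCESSOR;
}

/* Code guarded by the configure-provided macro can use the enumerator. */
static bool
adaptive_enabled(void) {
#ifdef HAVE_EXAMPLE_MUTEX_ADAPTIVE
	return EXAMPLE_MUTEX_ADAPTIVE == 3;
#else
	return false;
#endif
}
```

Here the naive check reports the feature as absent while the macro-guarded path uses it, which is the situation the compile test is meant to resolve.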