when nsupdate sends an SOA query to a resolver, if it fails
with REFUSED, nsupdate will now try the next server rather than
aborting the update completely.
(cherry picked from commit 2100331307)
This check intermittently failed:
I:serve-stale:check not in cache longttl.example times out...
I:serve-stale:failed
This corresponds to this query in the test:
$DIG -p ${PORT} +tries=1 +timeout=3 @10.53.0.3 longttl.example TXT
Looking at the dig output for a failed test, the query actually got a
response from the authoritative server (in one specific example the
query time was 2991 msec, close to 3 seconds).
After doing the query for the test, we enable the authoritative
server after a sleep of three seconds. If we bump this sleep to 4
seconds, the race will be more in favor of the query timing out,
making it unlikely that this test will fail intermittently.
Bump the subsequent wait_for_log checks also with one second.
(cherry picked from commit 05e73a24f0)
Add three more test cases that detect a configuration error if the
key-directory is inherited but has the same value for a zone in a
different view with a deviating DNSSEC policy.
(cherry picked from commit 84cfd95e95722191195cd4b09ce6f19960868597)
Previously, we would set the locale on a global level and that could
possibly lead to different behaviour in underlying functions. In this
commit, we change to code to use the system locale only when calling the
libidn2 functions and reset the locale back to "POSIX" when exiting the
libidn2 code.
(cherry picked from commit 0d35b3f1a9)
The built-in "_bind" view does not allow recursion and therefore does
not need a large cache database. However, as "max-cache-size" is not
explicitly set for that view in the default configuration, it inherits
that setting from global options. Set "max-cache-size" for the built-in
"_bind" view to a fixed value (2 MB, i.e. the smallest allowed value) to
prevent needlessly preallocating memory for its cache RBT hash table.
(cherry picked from commit 86698ded32)
Currently the implicit default for the "max-cache-size" option is "90%".
As this option is inherited by all configured views, using multiple
views can lead to memory exhaustion over time due to overcommitment.
The "max-cache-size 90%;" default also causes cache RBT hash tables to
be preallocated for every configured view, which does not really make
sense for views which do not allow recursion.
To limit this problem's potential for causing operational issues, use a
minimal-sized cache for views which do not allow recursion and do not
have "max-cache-size" explicitly set (either in global configuration or
in view configuration).
For configurations which include multiple views allowing recursion,
adjusting "max-cache-size" appropriately is still left to the operator.
(cherry picked from commit 86541b39d3)
The timeout originally picked for "rndc status" invocations (2 seconds)
in the test attempting to reproduce a deadlock caused by running
multiple "rndc addzone", "rndc modzone", and "rndc delzone" commands
concurrently causes intermittent failures of the "addzone" system test
in GitLab CI. Increase the timeout to 10 seconds to make such failures
less probable. Adjust code comments accordingly.
(cherry picked from commit ac4c58e8ce)
Add a set of system tests which check the contents of the AUTHORITY
section for signed, insecure delegation responses constructed from CNAME
records and wildcards, both for zones using NSEC and NSEC3.
(cherry picked from commit 26ec4b9a89)
Add a test case where 'named' is restarted and ensure that an already
signed zone does not change its NSEC3 parameters.
The test case first tests the current zone and saves the used salt
value. Then after restart it checks if the salt (and other parameters)
are the same as before the restart.
This test case changes 'set_nsec3param'. This will now reset the salt
value, and when checking for NSEC3PARAM we will store the salt and
use it when testing the NXDOMAIN response. This does mean that for
every test case we now have to call 'set_nsec3param' explicitly (and
can not omit it because it is the same as the previous zone).
Finally, slightly changed some echo output to make debugging friendlier.
(cherry picked from commit 08a9e7add1)
Make sure an incoming IXFR containing an SOA record which is not placed
at the apex of the transferred zone does not result in a broken version
of the zone being served by named and/or a subsequent crash.
(cherry picked from commit 5547003a3d)
Previously, dumping the zones to the files were quantized, so it doesn't
slow down network IO processing. With the introduction of network
manager asynchronous threadpools, we can move the IO intensive work to
use that API and we don't have to quantize the work anymore as it the
file IO won't block anything except other zone dumping processes.
(cherry picked from commit 8a5c62de83)
Ensure that if prefetch is triggered as a result of a query
restart, it won't have the TRYSTALE_ONTIMEOUT flag set.
(cherry picked from commit 8c047feb3a)
Add a test case where a client request is received and the stale
timeout occurs, but it is not served stale data because there is no entry
in the cache, then is served an authoritative answer once the background
fetch completes. This ensures that a stale timeout only affects a
subsequent response if the client was answered.
(cherry picked from commit c64589bf46)
- send a query for an AAAA which will be resolved as a mapped A
- disable authoritative responses
- wait for the negative AAAA response to become stale
- send another query, wait for the stale answer
- re-enable authorative responses so that a real answer arrives
- currently, this triggers an assertion in query.c
(cherry picked from commit 453e905d7e)
In the shutdown system test multiple queries are sent to a resolver
instance, in the meantime we terminate the same resolver process for
which the queries were sent to, either via rndc stop or a SIGTERM
signal, that means the resolver may not be able to answer all those
queries, since it has initiated the shutdown process.
The dnspython library raises a dns.resolver.NoNameservers exception when
a resolver object fails to receive an answer from the specified list
of nameservers (resolver.nameservers list), we need to handle this
exception as this is something that may happen since we asked the
resolver to terminate, as a result it may not answer clients even if
an answer is available, as the operation will be canceled.
(cherry picked from commit b19cd2d83b)
dns_message_gettempname() now returns a pointer to an initialized
name associated with a dns_fixedname_t object. it is no longer
necessary to allocate a buffer for temporary names associated with
the message object.
(cherry picked from commit e31cc1eeb4)
We should also lock kasp when reading key files, because at the same
time the zone in another view may be updating the key file.
(cherry picked from commit 252a1ae0a1)
Also, add "set -e" to all shell scripts of the views test to exit when
any command fails or is unknown, e.g., this on OpenBSD:
tests.sh[174]: seq: not found
(cherry picked from commit a4b7eb7188)
The seq command is not defined in the POSIX standard and is missing on
OpenBSD. Given that the system test code is meant to be POSIX-compliant
replace it with a shell construct.
(cherry picked from commit a08487ec3d)
Add two tests to make sure named-checkconf catches key-directory issues
where a zone in multiple views uses the same directory but has
different dnssec-policies. One test sets the key-directory specifically,
the other inherits the default key-directory (NULL, aka the working
directory).
Also update the good.conf test to allow zones in different views
with the same key-directory if they use the same dnssec-policy.
Also allow zones in different views with different key-directories if
they use different dnssec-policies.
Also allow zones in different views with the same key-directories if
only one view uses a dnssec-policy (the other is set to "none").
Also allow zones in different views with the same key-directories if
no views uses a dnssec-policy (zone in both views has the dnssec-policy
set to "none").
(cherry picked from commit df1aecd5ff)
PyLint 2.8.2 reports the following suggestions for two Python scripts
used in the system test suite:
************* Module tests_rndc_deadlock
bin/tests/system/addzone/tests_rndc_deadlock.py:71:4: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
************* Module tests-shutdown
bin/tests/system/shutdown/tests-shutdown.py:68:4: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
bin/tests/system/shutdown/tests-shutdown.py:157:8: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
Implement the above suggestions by using
concurrent.futures.ThreadPoolExecutor() and subprocess.Popen() as
context managers.
(cherry picked from commit a8163551ed)
FreeBSD 13.0 replaced GNU grep with BSD grep and removed support for
"redundant escapes for most ordinary characters" from regex(3) library,
therefore the matching sequence in digdelv/tests.sh needs to be
rewritten otherwise it fails with:
grep: trailing backslash (\)
- memory tracing failed if the driver didn't have access
to the isc_mem_debugging variable.
- remove RTLD_DEEPBIND from dlopen() flags as it causes
shared libraries to be unable to access thread-local storage,
which is needed when enqueuing tasks.
this rolls up numerous changes that have been applied to the
main branch, including moving isc_task operations into the
netmgr event loops, and other general stabilization.
This commit adds POSIX nanosleep() and usleep() shim implementation for
Windows to help implementors use less #ifdef _WIN32 in the code.
(cherry picked from commit c37ff5d188)
When looking for key files, we could use isdigit rather than checking
if the character is within the range [0-9].
Use (unsigned char) cast to ensure the value is representable in the
unsigned char type (as suggested by the isdigit manpage).
Change " & 0xff" occurrences to the recommended (unsigned char) type
cast.
(cherry picked from commit 1998ad6c776a9c17c27788b17765dee90d9e25df)
Just like with dynamic and/or inline-signing zones, check if no two
or more zone configurations set the same filename. In these cases,
the zone files are not read-only and named-checkconf should catch
a configuration where multiple zone statements write to the same file.
Add some bad configuration tests where KASP zones reference the same
zone file.
Update the good-kasp test to allow for two zones configure the same
file name, dnssec-policy none.
(cherry picked from commit 0b5fc0afcfd1a0bb7c1f16b63872b7ee26fb2777)
Make sure no DNSSEC contents are added to the zonefile if dnssec-policy
is set to "none" (and no .state files exist for the zone).
(cherry picked from commit 5246c16f43e6fda7587193a4dd801951cf87db14)
Add a test for default.kasp that if we remove the private key file,
no successor key is created for it. We need to update the kasp script
to deal with a missing private key. If this is the case, skip checks
for private key files.
Add a test with a zone for which the private key of the ZSK is missing.
Add a test with a zone for which the private key of the KSK is missing.
(cherry picked from commit 4a8ad0a77f)
The rndc command 'dnssec -status' only considered keys from
'dns_dnssec_findmatchingkeys' which only includes keys with accessible
private keys. Change it so that offline keys are also listed in the
status.
(cherry picked from commit b3a5859a9b)
When the feature was backported, we should have leave it disabled by
default, it turns out the default `100%` is producing some unexpected
results (under investigation), so for the time being, we are going to to
disable the max-ixfr-ratio.
When the `named` would hang on startup it would be killed with SIGKILL
leaving us with no information about the state the process was in.
This commit changes the start.pl script to send SIGABRT instead, so we
can properly collect and process the coredump from the hung named
process.
(cherry picked from commit 861a236937)
The kasp system test performs for each zone a couple of checks to make
sure the zone is signed correctly. To avoid test failures caused by
timing issues, there is first a check to ensure the zone is done
signing, 'wait_for_done_signing'. This function waits with the DNSSEC
checks until a "zone_rekey done" log message is seen for a specific
key.
Unfortunately this is not sufficient to avoid test failures due to
timing issues, because there is a small amount of time in between this
log message and the newly signed zone actually being served.
Therefore, in 'check_apex', retry for three seconds the DNSKEY query
check. After that, additional checks should pass without retries,
because at that point we know for sure the zone has been resigned with
the expected keys.
Also reduce the number of redundant 'check_signatures'
(cherry picked from commit 572f421df4)
The nsupdate system test did not record failures from the
'update_test.pl' Perl script. This was because the 'ret' value was
not being saved outside the '{ $PERL ... || ret=1 } cat_i' scope.
Change this piece to store the output in a separate file and then
cat its contents. Now the 'ret' value is being saved.
Also record failures in 'update_test.pl' if sending the update
failed.
Add missing 'n' incrementals to 'nsupdate/test.sh' to keep track of
test numbers.
(cherry picked from commit 5b31811b5f)