The previous commit failed some tests because we expect that if a
fetch fails and we have stale candidates in cache, the
stale-refresh-time window is started. This means that if we hit a stale
entry in cache and answering stale data is allowed, we don't bother
resolving it again for as long we are within the stale-refresh-time
window.
This is useful for two reasons:
- If we failed to fetch the RRset that we are looking for, we are not
hammering the authoritative servers.
- Successor clients don't need to wait for stale-answer-client-timeout
to get their DNS response, only the first one to query will take
the latency penalty.
The latter is not useful when stale-answer-client-timeout is 0 though.
So this exception code only to make sure we don't try to refresh the
RRset again if it failed to do so recently.
dohpath is specfied in draft-ietf-add-svcb-dns and has a value
of 7. It must be a relative path (start with a /), be encoded
as UTF8 and contain the variable dns ({?dns}).
When the TCP test is run on the busy server, the server might take a
while to wind the server down because it might still be processing all
that 300k invalid XFR requests.
Increate the rncd wait time to 120 seconds, the SIGTERM time to 300
seconds, and reduce the time to wait for ans servers from 1200 second
to just 120 seconds.
Because the dns_zonemgr_create() was run before the loopmgr was started,
the isc_ratelimiter API was more complicated that it had to be. Move
the dns_zonemgr_create() to run_server() task which is run on the main
loop, and simplify the isc_ratelimiter API implementation.
The isc_timer is now created in the isc_ratelimiter_create() and
starting the timer is now separate async task as is destroying the timer
in case it's not launched from the loop it was created on. The
ratelimiter tick now doesn't have to create and destroy timer logic and
just stops the timer when there's no more work to do.
This should also solve all the races that were causing the
isc_ratelimiter to be left dangling because the timer was stopped before
the last reference would be detached.
Add several test cases in the 'upforwd' system test to make sure
that different scenarios of Dynamic DNS update forwarding are
tested, in particular when both the original and forwarded requests
are over Do53, or DoT, or they use different transports.
There's a known memory leak in the engine_pkcs11 at the time of writing
this and it interferes with the named ability to check for memory leaks
in the OpenSSL memory context by default.
Add an autoconf option to explicitly enable the memory leak detection,
and use it in the CI except for pkcs11 enabled builds. When this gets
fixed in the engine_pkc11, the option can be enabled by default.
As we can't check the deallocations done in the library memory contexts
by default because it would always fail on non-clean exit (that happens
on error or by calling exit() early), we just want to enable the checks
to be done on normal exit.
The libxml2 library provides a way to replace the default allocator with
user supplied allocator (malloc, realloc, strdup and free).
Create a memory context specifically for libxml2 to allow tracking the
memory usage that has originated from within libxml2. This will provide
a separate memory context for libxml2 to track the allocations and when
shutting down the application it will check that all libxml2 allocations
were returned to the allocator.
Additionally, move the xmlInitParser() and xmlCleanupParser() calls from
bin/named/main.c to library constructor/destructor in libisc library.
The HMACs and GSSAPI are just using unallocated values.
Moving them around shouldn't cause issues.
Only the dnssec system test knew the internal number in use for hmacmd5.
Switch the primary to require 'next_key' for zone transfers then
update the catalog zone to say to use 'next_key'. Next update the
zones contents then check that those changes are seen on the
secondary.
Add a simple test PKI based on the existing one in the doth test.
Check ephemeral, forward-secrecy, and forward-secrecy-mutual-tls
TLS configurations with different scenarios.
The comments in CA.cfg file serve as a good tutorial for setting up
a simple PKI for a system test. There is a typo in one of the presented
commands, which results in openssl not exiting with an error message
instead of generating a certificate.
Fix the typo.
If an NS RRset at the parent side of a delegation point only contains
in-bailiwick NS records, at least one glue record should be included in
every referral response sent for such a delegation point or else clients
will need to send follow-up queries in order to determine name server
addresses. In certain edge cases (when the total size of a referral
response without glue records was just below to the UDP packet size
limit), named failed to adhere to that rule by sending non-truncated,
glueless referral responses.
Add tests attempting to trigger that bug in several different scenarios,
covering all possible combinations of the following factors:
- type of zone (signed, unsigned),
- glue record type (A, AAAA, both).
Bring the "glue" system test up to speed with other system tests: add
check numbering, ensure test artifacts are preserved upon failure,
improve error reporting, make the test fail upon unexpected errors,
address ShellCheck warnings.
Instead of expecting that telemetry check has already finished,
wait for it for maximum of three seconds, because named is run with
-tat=3, so the telemetry check must happen with 3 second window.
Co-authored-by: Evan Hunt <each@isc.org>
This change prepares ground for sending DNS requests using DoT,
which, in particular, will be used for forwarding dynamic updates
to TLS-enabled primaries.
The tcp Pytest on OpenBSD fairly reliably fails when receive_tcp()
on a socket is attempted:
> (response, rtime) = dns.query.receive_tcp(sock, timeout())
tests-tcp.py:50:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.9/site-packages/dns/query.py:659: in receive_tcp
ldata = _net_read(sock, 2, expiration)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sock = <socket.socket [closed] fd=-1, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6>
count = 2, expiration = 1662719959.8106785
def _net_read(sock, count, expiration):
"""Read the specified number of bytes from sock. Keep trying until we
either get the desired amount, or we hit EOF.
A Timeout exception will be raised if the operation is not completed
by the expiration time.
"""
s = b''
while count > 0:
try:
> n = sock.recv(count)
E socket.timeout: timed out
This is because the socket is already closed.
Bump the socket connection timeout to 10 seconds.
sd_notify() may be called by a service to notify the service manager
about state changes. It can be used to send arbitrary information,
encoded in an environment-block-like string. Most importantly, it can be
used for start-up completion notification.
Add libsystemd check to autoconf script and when the library is detected
add calls to sd_notify() around the server->reload_status changes.
Co-authored-by: Petr Špaček <pspacek@isc.org>
If there was a collision of key id across algorithms it was not
possible to determine where counter applies to which algorithm for
xml statistics while for json only one of the values was emitted.
The key names are now "<algorithm-number>+<id>" (e.g. "8+54274").
the 'resolve' binary was added for testing dns_client as part of
the export library. the export libraries are no longer supported,
and tests using 'delv' provide the same coverage, so 'resolve' can
be removed now.
dns_request_create() was a front-end to dns_request_createvia() that
was only used by test binaries. dns_request_createvia() has been
renamed to dns_request_create(), and the test programs that formerly
used dns_request_create() have been updated to use the new parameters.
The test expected `xn--ah-` to be treated as a syntax error (punycode
requires letters after the last hyphen) but libidn2 on buster
converted the label to `ah` instead. To avoid this bug, change the
invalid label to `xn--0000h` which translates to an out-of-range
unicode codepoint (beyond the maximum value) which is corectly
trated as invalid in older libidn2.
The change to `testsock.pl` in commit 258a896a broke the system
tests in out-of-tree builds because `ifconfig.sh.in` is not
copied to the worktree. Use `ifconfig.sh` instead.
In rndc_recvdone(), if 'sends' was not 0, then 'recvs' was not
decremented, in which case isc_loopmgr_shutdown() was never reached,
which could cause a hang. (This has not been observed to happen, but
the code was incorrect on examination.)
Reduce the number of places that know about the number of IP addresses
required by the system tests, by changing `testsock.pl` to read the
`max` from `ifconfig.sh.in`. This should make the test runner fail
early with a clear message when the interfaces have been set up by an
obsolete script.
Add comments to cross-reference `ifconfig.sh.in`, `testsock.pl`, and
`org.isc.bind.system` to make it easier to remember what needs
updating when an IP address is added.
If there are any problems with IDN processing, DiG will now quietly
handle the name as if IDN were disabled. This means that international
query names are rendered verbatim on the wire, and ACE names are
printed raw without conversion to UTF8.
If you want to check the syntax of international domain names,
use the `idn2` utility.
There was a ubsan error reporting an invalid value for interface_auto
(a boolean value cannot be 190) because it was not initialized. To
avoid this problem happening again, ensure the whole of the server
structure is initialized to zero before setting the (relatively few)
non-zero elements.
It is possible to bypass Response Rate Limiting (RRL)
`responses-per-second` limitation using specially crafted wildcard
names, because the current implementation, when encountering a found
DNS name generated from a wildcard record, just strips the leftmost
label of the name before making a key for the bucket.
While that technique helps with limiting random requests like
<random>.example.com (because all those requests will be accounted
as belonging to a bucket constructed from "example.com" name), it does
not help with random names like subdomain.<random>.example.com.
The best solution would have been to strip not just the leftmost
label, but as many labels as necessary until reaching the suffix part
of the wildcard record from which the found name is generated, however,
we do not have that information readily available in the context of RRL
processing code.
Fix the issue by interpreting all valid wildcard domain names as
the zone's origin name concatenated to the "*" name, so they all will
be put into the same bucket.
The zone 'retransfer3.' tests whether zones that 'rndc signing
-nsec3param' requests are queued even if the zone is not loaded.
The test assumes that if 'rndc signing -list' shows that the zone is
done signing with two keys, and there are no NSEC3 chains pending, the
zone is done handling the '-nsec3param' queued requests. However, it
is possible that the 'rndc signing -list' command is received before
the corresponding privatetype records are added to the zone (the records
that are used to retrieve the signing status with 'rndc signing').
This is what happens in test failure
https://gitlab.isc.org/isc-projects/bind9/-/jobs/2722752.
The 'rndc signing -list retransfer3' is thus an unreliable check.
It is simpler to just remove the check and wait for a certain amount
of time and check whether ns3 has re-signed the zone using NSEC3.