Call 'dns_zone_rekey' after a 'rndc dnssec -checkds' or 'rndc dnssec
-rollover' command is received, because such a command may influence
the next key event. Updating the keys immediately avoids unnecessary
rollover delays.
The kasp system test no longer needs to call 'rndc loadkeys' after
a 'rndc dnssec -checkds' or 'rndc dnssec -rollover' command.
CDS/CDNSKEY DELETE records are only useful if they are signed,
otherwise the parent cannot verify these RRsets anyway. So once the DS
has been removed (and signaled to BIND), we can remove the DNSKEY and
RRSIG records, and at this point we can also remove the CDS/CDNSKEY
records.
Change the 'check_keys' function to try three times. Some intermittent
kasp test failures are because we are inspecting the key files
before the actual change has happen. The 'retry_quiet' approach allows
for a bit more time to let the write operation finish.
This MR introduces a new system test 'keymgr2kasp' to test
migration to 'dnssec-policy'. It moves some existing tests from
the 'kasp' system test to here.
Also a common script 'kasp.sh', to be used in kasp specific tests,
is introduced.
The 'keymgr_key_init()' function initializes key states if they have
not been set previously. It looks at the key timing metadata and
determines using the given times whether a state should be set to
RUMOURED or OMNIPRESENT.
However, the DNSKEY and ZRRSIG states were mixed up: When looking
at the Activate timing metadata we should set the ZRRSIG state, and
when looking at the Published timing metadata we should set the
DNSKEY state.
Add two test zones that migrate to dnssec-policy. Test if the key
states are set accordingly given the timing metadata.
The rumoured.kasp zone has its Publish/Active/SyncPublish times set
not too long ago so the key states should be set to RUMOURED. The
omnipresent.kasp zone has its Publish/Active/SyncPublish times set
long enough to set the key states to OMNIPRESENT.
Slightly change the init_migration_keys function to set the
key lifetime to "none" (legacy keys don't have lifetime). Then in the
test case set the expected key lifetime explicitly.
This commit is somewhat editorial as it does not introduce something
new nor fixes anything.
The layout in keymgr2kasp/tests.sh has been changed, with the
intention to make more clear where a test scenario ends and begins.
The publication time of some ZSKs has been changed. It makes a more
clear distinction between publication time and activation time.
The kasp system test was getting pretty large, and more tests are on
the way. Time to split up. Move tests that are related to migrating
to dnssec-policy to a separate directory 'keymgr2kasp'.
The named-checkzone tool can also be invoked as named-compilezone. Make
sure a man page is installed for that alias. Move and rename the
"man_named-checkzone" label to prevent a Sphinx duplicate label warning
from being raised (see commit 84862e96c1
for more information).
The named-nzd2nzf utility is only built and installed for LMDB-enabled
builds. Adjust the relevant Makefile.am file to make sure the
named-nzd2nzf.1 man page is also only built and installed for
LMDB-enabled builds.
The dnstap-read utility is only built and installed for dnstap-enabled
builds. Adjust the relevant Makefile.am file to make sure the
dnstap-read.1 man page is also only built and installed for
dnstap-enabled builds.
Issue #2575 was merged to 9.16 only as change 5603, but a placeholder
was not added to CHANGES in the main branch. This commit adds the
placeholder and renumbers the two subsequent changes.
Resolve "dig -u is extremely inaccurate, especially on machines with the kernel timer tick set at 100Hz"
Closes#2592
See merge request isc-projects/bind9!4826
The TIME_NOW macro calls isc_time_now which uses CLOCK_REALTIME_COARSE
for getting the current time. This is perfectly fine for millisecond,
however when the user request microsecond resolutiuon, they are going
to get very inaccurate results. This is especially true on a server
class machine where the clock ticks may be set to 100HZ.
This changes dig to use the new TIME_NOW_HIRES macro that uses the
CLOCK_MONOTONIC_RAW that is more expensive, but gets the *actual*
current time rather than the at the last kernel time tick.
The current isc_time_now uses CLOCK_REALTIME_COARSE which only updates
on a timer tick. This clock is generally fine for millisecond accuracy,
but on servers with 100hz clocks, this clock is nowhere near accurate
enough for microsecond accuracy.
This commit adds a new isc_time_now_hires function that uses
CLOCK_REALTIME, which gives the current time, though it is somewhat
expensive to call. When microsecond accuracy is required, it may be
required to use extra resources for higher accuracy.
In CMocka versions << 1.1.3, the skip() function would cause the whole
unit test to abort when CMOCKA_TEST_ABORT is set. As this is problem
only in Debian 9 Stretch and Ubuntu 16.04 Xenial, we just require the
CMocka >= 1.1.3 and disable the unit testing on Debian 9 Stretch until
we can pull the libcmocka-dev from stretch-backports and remove the
Ubuntu 16.04 Xenial from the CI as it is reaching End of Standard
Support at the end of April 2021.
When NETMGR_TRACE(_VERBOSE) is enabled, the build would fail on some
non-Linux non-glibc platforms because:
* Use <stdint.h> print macros because uint_fast32_t is not always
unsigned long
* The header <execinfo.h> is not available on non-glibc, thus commit
adds dummy backtrace() and backtrace_symbols_fd() functions for
platforms without HAVE_BACKTRACE
The netmgr unit tests were designed to push the system limits to maximum
by sending as many queries as possible in the busy loop from multiple
threads. This mostly works with UDP, but in the stateful protocol where
establishing the connection takes more time, it failed quite often in
the CI. On FreeBSD, this happened more often, because the socket() call
would fail spuriosly making the problem even worse.
This commit does several things to improve reliability:
* return value of isc_nm_<proto>connect() is always checked and retried
when scheduling the connection fails
* The busy while loop has been slowed down with usleep(1000); so the
netmgr threads could schedule the work and get executed.
* The isc_thread_yield() was replaced with usleep(1000); also to allow
the other threads to do any work.
* Instead of waiting on just one variable, we wait for multiple
variables to reach the final value
* We are wrapping the netmgr operations (connects, reads, writes,
accepts) with reference counting and waiting for all the callbacks to
be accounted for.
This has two effects:
a) the isc_nm_t is always clean of active sockets and handles when
destroyed, so it will prevent the spurious INSIST(references == 1)
from isc_nm_destroy()
b) the unit test now ensures that all the callbacks are always called
when they should be called, so any stuck test means that there was
a missing callback call and it is always a real bug
These changes allows us to remove the workaround that would not run
certain tests on systems without port load-balancing.
In tls_error(), we now call isc__nm_tlsdns_failed_read() instead of just
stopping timer and reading from the socket. This allows us to properly
cleanup any pending operation on the socket.
When shutting down, calling the isc__nm_failed_connect_cb() was delayed
until the connect callback would be called. It turned out that the
connect callback might not get called at all when the socket is being
shut down. Call the failed_connect_cb() directly in the
tlsdns_shutdown() instead of waiting for the connect callback to call it.
After a partial write the tls.senddata buffer would be rearranged to
contain only the data tha wasn't sent and the len part would be made
shorter, which would lead to attempt to free only part of a socket's
tls.senddata buffer.
The tlsdns_cycle() might call uv_write() to write data to the socket,
when this happens and the socket is shutdown before the callback
completes, the uvreq structure was not freed because the callback would
be called with non-zero status code.
The RFC7828 specifies the keepalive interval to be 16-bit, specified in
units of 100 milliseconds and the configuration options tcp-*-timeouts
are following the suit. The units of 100 milliseconds are very
unintuitive and while we can't change the configuration and presentation
format, we should not follow this weird unit in the API.
This commit changes the isc_nm_(get|set)timeouts() functions to work
with milliseconds and convert the values to milliseconds before passing
them to the function, not just internally.
The udp, tcpdns and tlsdns contained lot of cut&paste code or code that
was very similar making the stack harder to maintain as any change to
one would have to be copied to the the other protocols.
In this commit, we merge the common parts into the common functions
under isc__nm_<foo> namespace and just keep the little differences based
on the socket type.
After the TCPDNS refactoring the initial and idle timers were broken and
only the tcp-initial-timeout was always applied on the whole TCP
connection.
This broke any TCP connection that took longer than tcp-initial-timeout,
most often this would affect large zone AXFRs.
This commit changes the timeout logic in this way:
* On TCP connection accept the tcp-initial-timeout is applied
and the timer is started
* When we are processing and/or sending any DNS message the timer is
stopped
* When we stop processing all DNS messages, the tcp-idle-timeout
is applied and the timer is started again
The system tests were missing a test that would test tcp-initial-timeout
and tcp-idle-timeout.
This commit adds new "timeouts" system test that adds:
* Test that waits longer than tcp-initial-timeout and then checks
whether the socket was closed
* Test that sends and receives DNS message then waits longer than
tcp-initial-timeout but shorter time than tcp-idle-timeout than
sends DNS message again than waits longer than tcp-idle-timeout
and checks whether the socket was closed
* Similar test, but bursting 25 DNS messages than waiting longer than
tcp-initial-timeout and shorter than tcp-idle-timeout than do second
25 DNS message burst
* Check whether transfer longer than tcp-initial-timeout succeeds