This restores functionality that was removed in commit 03be5a6b4e,
allowing named to search in authoritative zone databases outside the
current zone for additional data, if and only if recursion is allowed
and minimal-responses is disabled.
This commit adds a lengthy test where the ZSK is rolled but the
KSK is offline (except for when the DNSKEY RRset is changed). The
specific scenario has the `dnskey-kskonly` configuration option set,
meaning the DNSKEY RRset should only be signed with the KSK.
A new zone, `updatecheck-kskonly.secure`, is added to test against;
it can be dynamically updated, and its DNSSEC keys can be loaded
with rndc.
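As an illustration, driving such a zone from a test script could look
like the following sketch; the zone name comes from the commit text,
while the server address and ports are assumptions modeled on the
conventions used elsewhere in the system test framework:

    # Add a record via dynamic update, then tell named to re-read the
    # zone's DNSSEC keys (address and ports are assumptions).
    printf '%s\n' \
        'server 10.53.0.3 5300' \
        'update add txt.updatecheck-kskonly.secure. 300 TXT "added"' \
        'send' | nsupdate
    rndc -s 10.53.0.3 -p 9953 loadkeys updatecheck-kskonly.secure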
There are some pre-checks for this test to make sure everything is
fine before the ZSK roll, after the new ZSK is published, and after
the old ZSK is deleted. Note that there are actually two ZSK rolls
in quick succession.
When the most recently added ZSK becomes active and its predecessor
becomes inactive, the KSK is offline. However, the DNSKEY RRset has
not changed and it has a good signature that is valid for long enough.
The expected behavior is that the DNSKEY RRset stays signed with
the KSK only (the signature does not need to change). However, the
test fails because, after the keys for the zone are reconfigured,
named wants to add re-signing tasks for the newly active keys (in
sign_apex).
Because the KSK is offline, named determines that the only other
active key, the latest ZSK, will be used to re-sign the DNSKEY RRset,
in addition to keeping the RRSIG of the KSK.
The question is: why does the DNSKEY RRset need to be re-signed
immediately when a new key becomes active? This is not required;
only once the next re-signing task is triggered should the new active
key replace signatures that are due for a refresh.
Some system tests assume dig's default settings are in effect. While
these defaults may only be silently overridden (because of specific
options set in /etc/resolv.conf) for BIND releases using liblwres for
parsing /etc/resolv.conf (i.e. BIND 9.11 and older), it is arguably
prudent to make sure that tests relying on specific +timeout and +tries
settings specify these explicitly in their dig invocations, in order to
prevent test failures from being triggered by any potential changes to
current defaults.
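A sketch of an explicit invocation (the server address, port, and
query are placeholders):

    # Spell out the retransmission behavior instead of relying on
    # dig's built-in defaults: +timeout is in seconds, +tries counts
    # all attempts (including the first one).
    dig @10.53.0.1 -p 5300 +timeout=5 +tries=1 example. SOA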
Simply looking for the key ID surrounded by spaces in the tested
dnssec-signzone output file is not a precise enough method of checking
for signatures prepared using a given key ID: it can be tripped up by
cross-algorithm key ID collisions and certain low key IDs (e.g. 60, the
TTL specified in bin/tests/system/dnssec/signer/example.db.in), which
trigger false positives for the "dnssec" system test. Make key ID
extraction precise by using an awk script which operates on specific
fields.
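A minimal sketch of the field-based approach; it assumes one record
per line (e.g. a zone signed with "dnssec-signzone -O full"), in which
case the key tag occupies a fixed field of each RRSIG record:

    # $4 is the RR type, $6 the algorithm, and $11 the key tag;
    # matching on exact fields avoids accidental hits on TTLs and
    # other numbers in the record.
    awk '$4 == "RRSIG" { print $6, $11 }' example.db.signed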
The "mirror" system test expects all dig queries (including recursive
ones) to be responded to within 1 second, which turns out to be overly
optimistic in certain cases and leads to false positives being
triggered. Increase dig query timeout used throughout the "mirror"
system test to 2 seconds in order to alleviate the issue.
Currently, ns3 in the "mirror" system test sends trust anchor telemetry
queries every second as it is started with "-T tat=1". Given the number
of trust anchors configured on ns3 (9), TAT-related traffic clutters up
log files, hindering troubleshooting efforts. Increase TAT query
interval to 3 seconds in order to alleviate the issue.
Note that the interval chosen cannot be much higher if intermittent test
failures are to be avoided: TAT queries are only sent after the
configured number of seconds passes since resolver startup. Quick
experiments show that even on contemporary hardware, ns3 should be
running for at least 5 seconds before it is first shut down, so a
3-second TAT query interval seems to be a reasonable, future-proof
compromise. Ensure the relevant check is performed before ns3 is
first shut down, both to emphasize this trade-off and to make it
clearer by what time TAT queries are expected to be sent.
"rndc dumpdb" works asynchronously, i.e. the requested dump may not yet
be fully written to disk by the time "rndc" returns. Prevent false
positives for the "serve-stale" system test by only checking the
dump's contents after the line indicating that it is complete has
been written.
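The waiting pattern can be sketched as follows; the dump file name
follows named's default, and the completion marker is the comment
line named appends to a finished dump:

    # Wait (up to 10 seconds) for named to finish writing the dump
    # before inspecting its contents.
    for try in 1 2 3 4 5 6 7 8 9 10; do
        grep '^; Dump complete$' ns1/named_dump.db > /dev/null && break
        sleep 1
    done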
This tests both the case where the DLV trust anchor uses an
unsupported or disabled algorithm and the case where the DLV zone
contains a key with an unsupported or disabled algorithm.
Some values returned by dstkey_fromconfig() indicate that key loading
should be interrupted, while others do not. Certain checks also need
to be made after parsing a key from configuration, and their results
likewise affect the key loading process. All of this complicates the
key loading logic.
In order to make the relevant parts of the code easier to follow, reduce
the body of the inner for loop in load_view_keys() to a single call to a
new function, process_key(). Move dstkey_fromconfig() error handling to
process_key() as well and add comments to clearly describe the effects
of various key loading errors.
More specifically: ignore configured trusted and managed keys that
match a disabled algorithm. The behavioral change is that the
associated responses no longer result in SERVFAIL; instead, they are
treated as insecure.
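The new behavior can be observed with a check along these lines (the
server address, port, and query name are placeholders):

    # With the disabled-algorithm trust anchor ignored, the response
    # should be NOERROR without the AD bit (insecure), not SERVFAIL.
    dig @10.53.0.4 -p 5300 example. SOA > dig.out
    grep 'status: NOERROR' dig.out > /dev/null || echo 'unexpected status'
    grep 'flags:.* ad' dig.out > /dev/null && echo 'unexpected AD bit'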
bin/tests/system/stop.pl only waits for the PID file to be cleaned
up, but named cleans up the lock file after the PID file. Thus, the
aforementioned script may consider a named instance to be fully shut
down when in fact it is not.
Fix by also checking whether the lock file exists when determining a
given instance's shutdown status. This change assumes that if a named
instance uses a lock file, it is called "named.lock".
Also rename clean_pid_file() to pid_file_exists(), so that the name
matches the behavior: the function does not clean up the PID file
itself, it only returns the server's identifier if its PID file has
not yet been cleaned up.
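Expressed as a shell sketch, the corrected shutdown test checks both
files (the lock file name matches the assumption stated above):

    # A named instance only counts as fully shut down once both its
    # PID file and its lock file are gone.
    while [ -f ns1/named.pid ] || [ -f ns1/named.lock ]; do
        sleep 1
    done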
MR !1141 broke the way stop.pl is invoked when start.pl fails:
- start.pl changes the working directory to $testdir/$server before
attempting to start $server,
- commit 27ee629e6b causes the $testdir
variable in stop.pl to be determined using the $SYSTEMTESTTOP
environment variable, which is set to ".." by all tests.sh scripts,
- commit e227815af5 makes start.pl pass
$test (the test's name) rather than $testdir (the path to the test's
directory) to stop.pl when a given server fails to start.
Thus, when a server is restarted from within a tests.sh script and such
a restart fails, stop.pl attempts to look for the server directory in a
nonexistent location ($testdir/$server/../$test, i.e. $testdir/$test,
instead of $testdir/../$test). Fix the issue by changing the working
directory before stop.pl is invoked in the scenario described above.
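In shell terms, the fix amounts to something like the following
sketch (the exact arguments passed to stop.pl are summarized from the
commit text):

    # Run stop.pl from the parent of the test directory so that its
    # $testdir/../$test path computation resolves correctly.
    ( cd "$testdir/.." && perl stop.pl "$test" "$server" )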
Remove another remnant of shared secret HMAC-MD5 support.
Explain that with currently recommended setups, DNSKEY records are
inserted automatically, but that $INCLUDE can still be used in other
cases.
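For setups that still manage keys manually, the documented alternative
looks like this sketch (the key file name is hypothetical):

    # Pull a DNSKEY into the zone file via $INCLUDE instead of relying
    # on automatic insertion.
    echo '$INCLUDE Kexample.com.+013+12345.key' >> example.com.db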
On Unix systems, the CYGWIN environment variable is not set at all when
BIND system tests are run. If a named instance crashes on shutdown or
otherwise fails to clean up its PID file and the CYGWIN environment
variable is not set, stop.pl will print an uninitialized value warning
on standard error. Prevent this by using defined().
ifconfig.sh depends on config.guess to determine the platform; the
result is used to choose between the ifconfig and ip tools when
configuring interfaces. If the local copy of config.guess cannot be
found but a system-wide copy (as installed by automake) is available,
fall back to the latter. This should work on virtually any sane
platform: the local copy is still preferred, but the script no longer
fails when it cannot find one.
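A sketch of the resulting lookup order (the location of the local
copy is an assumption):

    # Prefer the config.guess shipped in the source tree; fall back to
    # a system-wide copy (as installed by automake) when it is absent.
    if test -f ./config.guess; then
        sysname=$(sh ./config.guess)
    elif command -v config.guess > /dev/null 2>&1; then
        sysname=$(config.guess)
    fi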
When a zone is converted from NSEC to NSEC3, the private record at zone
apex indicating that NSEC3 chain creation is in progress may be removed
during a different (later) zone_nsec3chain() call than the one which
adds the NSEC3PARAM record. The "delzsk.example" zone check only waits
for the NSEC3PARAM record to start appearing in dig output while private
records at zone apex directly affect "rndc signing -list" output. This
may trigger false positives for the "autosign" system test as the output
of the "rndc signing -list" command used for checking ZSK deletion
progress may contain extra lines which are not accounted for. Ensure
the private record is removed from zone apex before triggering ZSK
deletion in the aforementioned check.
Also future-proof the ZSK deletion progress check by making it only look
at lines it should care about.
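The added synchronization step can be sketched as follows (the server
address, port, and the exact text emitted by rndc are assumptions):

    # Before triggering ZSK deletion, wait until the private record
    # tracking NSEC3 chain creation is gone from the zone apex.
    until rndc -s 10.53.0.3 -p 9953 signing -list delzsk.example |
            grep 'No signing records found' > /dev/null; do
        sleep 1
    done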
For checks querying a named instance with "dnssec-accept-expired yes;"
set, authoritative responses have a TTL of 300 seconds. Assuming empty
resolver cache, TTLs of RRsets in the ANSWER section of the first
response to a given query will always match their authoritative
counterparts. Also note that for a DNSSEC-validating named resolver,
validated RRsets replace any existing non-validated RRsets with the same
owner name and type, e.g. cached from responses received while resolving
CD=1 queries. TTL capping happens before a validated RRset is
inserted into the cache, and when "dnssec-accept-expired yes;" is
set, RRSIG expiry time does not impose an upper TTL bound; instead,
the TTLs of the affected RRsets are capped at 120 seconds. Since, as
pointed out above, the original TTLs of the relevant RRsets equal 300
seconds, the TTLs of the RRsets in the ANSWER section of the
responses to expiring.example/SOA and expired.example/SOA queries
sent with CD=0 should always be exactly 120 seconds, never a lower
value. Make the relevant TTL checks stricter to reflect that.
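The stricter check might look like this sketch (the server address
and port are placeholders; the 120-second value is the cap discussed
above):

    # The first CD=0 answer for the expired zone must carry a TTL of
    # exactly 120 seconds, not merely a value of at most 120 seconds.
    ttl=$(dig @10.53.0.4 -p 5300 +noall +answer expired.example. SOA |
            awk '{ print $2 }')
    test "$ttl" -eq 120 || echo "unexpected TTL: $ttl"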
Always expecting a TTL of exactly 300 seconds for RRsets found in the
ADDITIONAL section of responses received for CD=1 queries sent during
TTL capping checks is too strict since these responses will contain
records cached from multiple DNS messages received during the resolution
process.
In responses to queries sent with CD=1, ns.expiring.example/A in the
ADDITIONAL section will come from a delegation returned by ns2 while the
ANSWER section will come from an authoritative answer returned by ns3.
If the queries to ns2 and ns3 happen at different Unix timestamps,
RRsets cached from the older response will have a different TTL by the
time they are returned to dig, triggering a false positive.
Allow a safety margin of 60 seconds for checks inspecting the ADDITIONAL
section of responses to queries sent with CD=1 to fix the issue. A
safety margin this large is likely overkill, but it is used nevertheless
for consistency with similar safety margins used in other TTL capping
checks.
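The relaxed check could be sketched as follows (addresses, ports, and
field positions are assumptions; the accepted range reflects the
60-second margin below the original 300-second TTL):

    # Accept any TTL within 60 seconds of the authoritative value for
    # the glue A record in the ADDITIONAL section of a CD=1 response.
    ttl=$(dig @10.53.0.4 -p 5300 +cd +noall +additional expiring.example. SOA |
            awk '$4 == "A" { print $2 }')
    test "$ttl" -ge 240 && test "$ttl" -le 300 || echo "unexpected TTL: $ttl"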
Commit c032c54dda inadvertently changed
the DNS message section inspected by one of the TTL capping checks from
ADDITIONAL to ANSWER, introducing a discrepancy between that check's
description and its actual meaning. Revert to inspecting the ADDITIONAL
section in the aforementioned check.
Changes introduced by commit 6b8e4d6e69
were incomplete, as not all time-sensitive checks were updated to
match the revised "nta-lifetime" and "nta-recheck" values. Prevent rare false
positives by updating all NTA-related checks so that they work reliably
with "nta-lifetime 12s;" and "nta-recheck 9s;". Update comments as well
to prevent confusion.