An IXFR containing SOA records with owner names different than the
transferred zone's origin can result in named serving a version of that
zone without an SOA record at the apex. This causes a RUNTIME_CHECK
assertion failure the next time such a zone is refreshed. Fix by
immediately rejecting a zone transfer (either an incremental or
non-incremental one) upon detecting an SOA record not placed at the apex
of the transferred zone.
5625. [func] Reduce the supported maximum number of iterations
that can be configured in an NSEC3 zones to 150.
[GL #2642]
(cherry picked from commit e04f06873f)
While working on the serve-stale backports, I noticed the following
oddities:
1. In the serve-stale system test, in one case we keep track of the
time how long it took for dig to complete. In commit
aaed7f9d8c, the code removed the
exception to check for result == ISC_R_SUCCESS on stale found
answers, and adjusted the test accordingly. This failed to update
the time tracking accordingly. Move the t1/t2 time track variables
back around the two dig commands to ensure the lookups resolved
faster than the resolver-query-timeout.
2. We can remove the setting of NS_QUERYATTR_STALEOK and
DNS_RDATASETATTR_STALE_ADDED on the "else if (stale_timeout)"
code path, because they are added later when we know we have
actually found a stale answer on a stale timeout lookup.
3. We should clear the NS_QUERYATTR_STALEOK flag from the client
query attributes instead of DNS_RDATASETATTR_STALE_ADDED (that
flag is set on the rdataset attributes).
4. In 'bin/named/config.c' we should set the configuration options
in alpabetical order.
5. In the ARM, in the backports we have added "(stale)" between
"cached" and "RRset" to make more clear a stale RRset may be
returned in this scenario.
(cherry picked from commit 104b676235)
Exerting excessive I/O load on the host running system tests should be
avoided in order to limit the number of false positives reported by the
system test suite. In some cases, running named with "-d 99" (which is
the default for system tests) results in a massive amount of logs being
generated, most of which are useless. Implement a log file size check
to draw developers' attention to overly verbose named instances used in
system tests. The warning threshold of 200,000 lines was chosen
arbitrarily.
(cherry picked from commit 241e85ef0c)
The regression test for CVE-2020-8620 causes a lot of useless messages
to be logged. However, globally decreasing the log level for the
affected named instance would be a step too far as debugging information
may be useful for troubleshooting other checks in the "tcp" system test.
Starting a separate named instance for a single check should be avoided
when possible and thus is also not a good solution. As a compromise,
run "rndc trace 1" for the affected named instance before starting the
regression test for CVE-2020-8620.
(cherry picked from commit 17e5c2a50e)
The system test framework starts all named instances with the "-d 99"
command line option (unless it is overridden by a named.args file in a
given instance's working directory). This causes a lot of log messages
to be written to named.run files - currently over 5 million lines for a
single test suite run. While debugging information preserved in the log
files is essential for troubleshooting intermittent test failures, some
system tests involve sending hundreds or even thousands of queries,
which causes the relevant log files to explode in size. When multiple
tests (or even multiple test suites) are run in parallel, excessive
logging contributes considerably to the I/O load on the test host,
increasing the odds of intermittent test failures getting triggered.
Decrease the debug level for the seven most verbose named instances:
- use "-d 3" for ns2 in the "cacheclean" system test (it is the lowest
logging level at which the test still passes without the need to
apply any changes to tests.sh),
- use "-d 1" for the other six named instances.
This roughly halves the number of lines logged by each test suite run
while still leaving enough information in the logs to allow at least
basic troubleshooting in case of test failures.
This approach was chosen as it results in a greater decrease in the
number of lines logged than running all named instances with "-d 3",
without causing any test failures.
(cherry picked from commit 4a8d404876)
The test spawns 4 parallel workers that keep adding, modifying and
deleting zones, the main thread repeatedly checks wheter rndc
status responds within a reasonable period.
While environment and timing issues may affect the test, in most
test cases the deadlock that was taking place before the fix used to
trigger in less than 7 seconds in a machine with at least 2 cores.
It follows a description of the steps that were leading to the deadlock:
1. `do_addzone` calls `isc_task_beginexclusive`.
2. `isc_task_beginexclusive` waits for (N_WORKERS - 1) halted tasks,
this blocks waiting for those (no. workers -1) workers to halt.
...
isc_task_beginexclusive(isc_task_t *task0) {
...
while (manager->halted + 1 < manager->workers) {
wake_all_queues(manager);
WAIT(&manager->halt_cond, &manager->halt_lock);
}
```
3. It is possible that in `task.c / dispatch()` a worker is running a
task event, if that event blocks it will not allow this worker to
halt.
4. `do_addzone` acquires `LOCK(&view->new_zone_lock);`,
5. `rmzone` event is called from some worker's `dispatch()`, `rmzone`
blocks waiting for the same lock.
6. `do_addzone` calls `isc_task_beginexclusive`.
7. Deadlock triggered, since:
- `rmzone` is wating for the lock.
- `isc_task_beginexclusive` is waiting for (no. workers - 1) to
be halted
- since `rmzone` event is blocked it won't allow the worker to halt.
To fix this, we updated do_addzone code to call isc_task_beginexclusive
before the lock is acquired, we postpone locking to the nearest required
place, same for isc_task_beginexclusive.
The same could happen with rndc modzone, so that was addressed as well.
Add a check to the "dnssec" system test which ensures that RRSIG(SOA)
RRsets present anywhere else than at the zone apex are automatically
removed after a zone containing such RRsets is loaded.
(cherry picked from commit 24bf4b946a)
If there happens to be a RRSIG(SOA) that is not at the zone apex
for any reason it should not be considered as a stopping condition
for incremental zone signing.
(cherry picked from commit b7cdc3583e)
When the keymgr needs to create new keys, it is possible it needs to
create multiple keys. The keymgr checks for keyid conflicts with
already existing keys, but it should also check against that it just
created.
(cherry picked from commit 668301f138)
GitLab CI pipelines do not currently include a Linux job that would have
GSSAPI support disabled. Add the "--without-gssapi" option to the
./configure invocation on Debian 9 to address that deficiency and also
to continuously test that build-time switch.
(cherry picked from commit a3957af864)
If "tkey-gssapi-credential" is set in the configuration and GSSAPI
support is not available, named will refuse to start. As the test
system framework does not support starting named instances
conditionally, ensure that "tkey-gssapi-credential" is only present in
named.conf if GSSAPI support is available.
(cherry picked from commit 6feac68b50)
If we don't wait for named to finish starting, 'rndc stop' may
fail due to the listen limit being reached in named leading
to a false negative test
(cherry picked from commit 8d5870f9df)
Four named instances in the "nsupdate" system test have GSS-TSIG support
enabled. All of them currently use "tkey-gssapi-keytab". Configure two
of them with "tkey-gssapi-credential" to test that option.
As "tkey-gssapi-keytab" and "tkey-gssapi-credential" both provide the
same functionality, no test modifications are required. The difference
between the two options is that the value of "tkey-gssapi-keytab" is an
explicit path to the keytab file to acquire credentials from, while the
value of "tkey-gssapi-credential" is the name of the principal whose
credentials should be used; those credentials are looked up in the
keytab file expected by the Kerberos library, i.e. /etc/krb5.keytab by
default. The path to the default keytab file can be overridden using by
setting the KRB5_KTNAME environment variable. Utilize that variable to
use existing keytab files with the "tkey-gssapi-credential" option.
The KRB5_KTNAME environment variable should not interfere with the
"tkey-gssapi-keytab" option. Nevertheless, rename one of the keytab
files used with "tkey-gssapi-keytab" to something else than the contents
of the KRB5_KTNAME environment variable in order to make sure that both
"tkey-gssapi-keytab" and "tkey-gssapi-credential" are actually tested.
(cherry picked from commit 1746d2e84a)
Previously, the taskmgr, timermgr and socketmgr had a constructor
variant, that would create the mgr on top of existing appctx. This was
no longer true and isc_<*>mgr was just calling isc_<*>mgr_create()
directly without any extra code.
This commit just cleans up the extra function.
(cherry picked from commit 3388ef36b3)
Since all the libraries are internal now, just cleanup the ISCAPI remnants
in isc_socket, isc_task and isc_timer APIs. This means, there's one less
layer as following changes have been done:
* struct isc_socket and struct isc_socketmgr have been removed
* struct isc__socket and struct isc__socketmgr have been renamed
to struct isc_socket and struct isc_socketmgr
* struct isc_task and struct isc_taskmgr have been removed
* struct isc__task and struct isc__taskmgr have been renamed
to struct isc_task and struct isc_taskmgr
* struct isc_timer and struct isc_timermgr have been removed
* struct isc__timer and struct isc__timermgr have been renamed
to struct isc_timer and struct isc_timermgr
* All the associated code that dealt with typing isc_<foo>
to isc__<foo> and back has been removed.
(cherry picked from commit 16fe0d1f41)
When resolve.c was moved from lib/samples to bin/tests/system, the
resolve.vcxproj.in would still contain old paths to the directory
root. This commit adds one more ..\ to match the directory depth.
Additionally, fixup the path in BINDInstall.vcxproj.in to be
bin/tests/system and not bin/tests/samples.
(cherry picked from commit f14e678624)
"resolve" is used by the resolver system tests, and I'm not
certain whether delv exercises the same code, so rather than
remove it, I moved it to bin/tests/system.
(cherry picked from commit d0ec7d1f33)