The resolve binary is affected by GL#4119 which occassionally makes it
hand during system tests when running with TSAN. This is a workaround to
avoid wasting resources caused by a CI timeout for the system test tsan
jobs.
The conditions that trigger the crash:
- a stale record is in cache
- stale-answer-client-timeout is 0
- multiple clients query for the stale record, enough of them to exceed
the recursive-clients quota
- the response from the authoritative is sufficiently delayed so that
recursive-clients quota is exceeded first
The reproducer attempts to simulate this situation. However, it hasn't
proven to be 100 % reproducible, especially in CI. When reproducing
locally, the priming query also seems to sometimes interfere and prevent
the crash. When the reproducer is ran twice, it appears to be more
reliable in reproducing the issue.
(cherry picked from commit f617512d37)
The keys directory should be cleaned up in clean.sh. Doing that in the
test itself isn't reliable which may lead to failing mkdir which causes
the test to fail with set -e.
(cherry picked from commit 062dfac28e)
Because of a typo, the fetch.pl script tries to extract the server
address from the input parameter 'a' instead of 's'. Fix the typo.
(cherry picked from commit aa7538fd38)
The "uname -o" command is harmful on OpenBSD because this platform does
not know about the "-o" option. It is a permanent failure since system
tests are started with "set -e".
(cherry picked from commit ad3efede4d)
To improve the compatibility of the inline test with the `set -e`
option, ensure all commands which are expected to pass are explicitly
checked for return code and non-zero return codes are handled.
(cherry picked from commit e5f2addcaa)
The changes were mostly done with sed:
find . -name '*.sh' | xargs sed -i 's/`\([^`]*\)`/$(\1)/g'
There have been a few manual changes where the regex wasn't sufficient
(e.g. backslashes inside the `...`) or wrong (`...` referring to docs or
in comments).
(manually picked from commit 05baf7206b)
Ensure handling of return code from previous command doesn't cause the
script to halt if that code is non-zero when running with `set -e`.
(cherry picked from commit 837c190d9e)
Change the way arithmetic operations are performed in system test shell
scripts from using `expr` to `$(())`. This ensures that updating the
variable won't end up with a non-zero exit code, which would case the
script to exit prematurely when `set -e` is in effect.
The following replacements were performed using sed in all text files
(git grep -Il '' | xargs sed -i):
s/status=`expr $status + $ret`/status=$((status + ret))/g
s/n=`expr $n + 1`/n=$((n + 1))/g
s/t=`expr $t + 1`/t=$((t + 1))/g
s/status=`expr $status + 1`/status=$((status + 1))/g
s/try=`expr $try + 1`/try=$((try + 1))/g
(manually picked from commit 4d42bdc245)
Ensure all shell system tests are executed with the errexit option set.
This prevents unchecked return codes from commands in the test from
interfering with the tests, since any failures need to be handled
explicitly.
(cherry picked from commit 01bc805f89)
With the pytest runner, when BIND crashed during test runtime, the
get_core_dumps.sh script hasn't been run, and core dumps were not
detected.
(cherry picked from commit 89c77daddb)
Add this test scenario for a bug fixed a while ago. When a third key is
introduced while the previous rollover hasn't finished yet, the keymgr
could decide to remove the first two keys, because it was not checking
for an indirect dependency on the keys.
In other words, the previous bug behavior was that the first two keys
were removed from the zone too soon.
This test case checks that all three keys stay in the zone, and no keys
are removed premature after another new key has been introduced.
(cherry picked from commit 9c40cf0566)
In the kasp script, if one expected key is not found, continue checking
the other key ids, even if there is no match for the first one. This
provides a bit more information which keys mismatch and makes for
easier debugging test failures.
(cherry picked from commit 674249f66a)
In HTTP/1.0 and HTTP/1.1, RFC 9112 section 9.6 says the last response
in a connection should include a `Connection: close` header, but the
statschannel server omitted it.
In an HTTP/1.0 response, the statschannel server can sometimes send a
`Connection: keep-alive` header when it is about to close the
connection. There are two ways:
If the first request on a connection is keep-alive and the second
request is not, then _both_ responses have `Connection: keep-alive`
but the connection is (correctly) closed after the second response.
If a single request contains
Connection: close
Connection: keep-alive
then RFC 9112 section 9.3 says the keep-alive header is ignored, but
the statschannel sends a spurious keep-alive in its response, though
it correctly closes the connection.
To fix these bugs, make it more clear that the `httpd->flags` are part
of the per-request-response state. The Connection: flags are now
described in terms of the effect they have instead of what causes them
to be set.
(manually picked from commit e18ca83a3b)
Pass 5 second timeout to the rndc status command(s) to avoid hitting the
hard 10 second timeout from subprocess.call, which would result in an
unwanted exception that would only mask the real issue: if the rndc
status times out in this test, it is likely due to the server not
stopping as it should.
(cherry picked from commit ceed694659)
The shutdown test attempts to shut down the server using two different
methods - rndc and sigterm. Use pytest.mark.parametrize to run these as
separate test cases for easier identification of failures.
(cherry picked from commit 603c58ee28)
Make the cds/setup.sh compatible with the workaround which relies on
testing the TSAN_OPTIONS variable which may not be set.
(cherry picked from commit 76d9873ef6)
Surround the variables which are checked whether they're executable in
double quotes. Without them, empty paths won't be properly interpreted
as not executable.
(manually picked from commit 06056c44a7)