Commit Graph

5420 Commits

Author SHA1 Message Date
Tom Krizek
37d14c69c0 Export env variables in system tests
Certain variables have to be exported in order for the system tests to
work. It makes little sense to export the variables in one place/script
while they're defined in another place.

Since it makes no harm, export all the variables to make the behaviour
more predictable and consistent. Previously, some variables were
exported as environment variables, while others were just shell
variables which could be used once the configuration was sourced from
another script. However, they wouldn't be exposed to spawned processes.

For simplicity sake (and for the upcoming effort to run system tests
with pytest), export all variables that are used. TESTS, PARALLEL_UNIX
and SUBDIRS variables are automake-specific, aren't used anywhere else
and thus not exported.
2022-10-27 12:14:29 +02:00
Tom Krizek
bb1c6bbdc7 Support testcrypto.sh usage without including conf.sh
The only variable really needed for the script to work is the path to
the $KEYGEN binary. Allow setting this via an environment variable to
avoid loading conf.sh (and causing a chicken-egg problem). Also make
testcrypto.sh executable to allow its use from conf.sh.
2022-10-27 12:14:29 +02:00
Tom Krizek
01b293b055 Unify indentation level in testcrypto.sh 2022-10-27 12:14:27 +02:00
Matthijs Mekking
622a499027 Add xfer system test case
Add a test case that if the first primary fails, the fallback of a
second primary on plain DNS works. This is mainly to test that the port
configuration inheritance works correctly.
2022-10-27 11:39:34 +02:00
Tom Krizek
6295572b05 Remove misleading comment from serve-stale test
The stale-answer-client-timeout option is not set to 0 in the config
neither is it the default value. This was probably caused by a
copy-paste error.
2022-10-24 14:23:27 +02:00
Tom Krizek
a4d72a57f9 Test serve stale cache with timeout 0 and CNAME
Add a couple of tests that verify the serve-stale behavior when
stale-answer-client-timeout is set to 0 and a (stale) CNAME record is
queried.

Related #3517
2022-10-24 14:23:26 +02:00
Aram Sargsyan
89fa9a6592 Add another prefetch check in the resolver system test
The test triggers a prefetch, but fails to check if it acutally
happened, which prevented it from catching a bug when the record's
TTL value matches the configured prefetch eligibility value.

Check that prefetch happened by comparing the TTL values.
2022-10-21 10:17:03 +00:00
Ondřej Surý
cd0e5c5784 Replace some raw nc usage in statschannel system test with curl
For tests where the TCP connection might get interrupted abruptly,
replace the nc with curl as the data sent from server to client might
get lost because of abrupt TCP connection.  This happens when the TCP
connection gets closed during sending the large request to the server.

As we already require curl for other system tests, replace the nc usage
in the statschannel test with curl that actually understands the
HTTP/1.1 protocol, so the same connection is reused for sending the
consequtive requests, but without client-side "pipelining".

For the record, the server doesn't support parallel processing of the
pipelined request, so it's a bit misnomer here, because what we are
actually testing is that we process all requests received in a single
TCP read callback.
2022-10-20 12:23:34 +02:00
Evan Hunt
575a924b1a add a test with CD=1 query for pending data
this is a regression test for [GL #3247].
2022-10-19 11:36:11 -07:00
Ondřej Surý
0f56a53d66 Remove the time requirement for the statschannel truncated test
The 5 seconds requirement to finish the 'pipelined with truncated
stream' was causing spurious failures in the CI because the job runners
might be very busy and sending 128k of data might simply take some time.

Remove the time requirement altogether, there's actually no reason why
the test SHOULD or even MUST finish under 5 seconds.
2022-10-19 14:08:24 +02:00
Tom Krizek
cbd0355328 Remove generated controls.conf file from system tests
The controls.conf file shouldn't be used directly without templating it
first. Remove this no longer used hard-coded file to avoid confusion.
2022-10-19 12:59:27 +02:00
Tom Krizek
cb0a2ae1dd Revive dupsigs system test
Correctly source conf.sh in dupsigs test scripts (fix issue introduced
by 093af1c00a).

Update dupsigs test for dnssec-dnskey-kskonly default. Since v9.17.20,
the dnssec-dnskey-kskonly is set to yes. Update the test to not expect
the additional RRSIG with ZSK for DNSKEY.

Speed up the test from 20 minutes to 2.5 minutes and make it part of the
default test suite executed in CI.
- decrease number of records to sign from 2000 to 500
- decrease the signing interval by a factor of 6
- shorten the final part of the test after last signing (since nothing
  new happens there)

Finally, clarify misleading comments about (in)sufficient time for zone
re-signing. The time used in the test is in fact sufficient for the
re-signing to happen. If it wasn't, the previous ZSK would end up being
deleted while its signatures would still be present, which is a
situation where duplicate signatures can still happen.
2022-10-19 12:59:27 +02:00
Tom Krizek
7495deea3e Revive the stress system test
Ensure the port numbers are dynamically filled in with copy_setports.

Clarify test fail condition.

Make the stress test part of the default test suite since it doesn't
seem to run too long or interfere with other tests any more (the
original note claiming so is more than 20 years old).

Related !6883
2022-10-19 12:59:27 +02:00
Tom Krizek
235ae5f344 Revive dialup system test
Properly template the port number in config files with copy_setports.

The test takes two minutes on my machine which doesn't seem like a
proper justification to exclude it from the test suite, especially
considering we run these tests in parallel nowadays. The resource usage
doesn't seems significantly increased so it shouldn't interfere with
other system tests.

There also exists a precedent for longer running system tests that are
already part of the default system test suite (e.g. serve-stale takes
almost three minutes on the same machine).
2022-10-19 12:59:27 +02:00
Tom Krizek
1e7d832342 Make digdelv test work in different network envs
When a target server is unreachable, the varying network conditions may
cause different ICMP message (or no message). The host unreachable
message was discovered when attempting to run the test locally while
connected to a VPN network which handles all traffic.

Extend the dig output check with "host unreachable" message to avoid a
false negative test result in certain network environments.
2022-10-19 12:59:25 +02:00
Michał Kępień
604d8f0b96 Add tests for CVE-2022-2795
Add a test ensuring that the amount of work fctx_getaddresses() performs
for any encountered delegation is limited: delegate example.net to a set
of 1,000 name servers in the redirect.com zone, the names of which all
resolve to IP addresses that nothing listens on, and query for a name in
the example.net domain, checking the number of times the findname()
function gets executed in the process; fail if that count is excessively
large.

Since the size of the referral response sent by ans3 is about 20 kB, it
cannot be sent back over UDP (EMSGSIZE) on some operating systems in
their default configuration (e.g. FreeBSD - see the
net.inet.udp.maxdgram sysctl).  To enable reliable reproduction of
CVE-2022-2795 (retry patterns vary across BIND 9 versions) and avoid
false positives at the same time (thread scheduling - and therefore the
number of fetch context restarts - vary across operating systems and
across test runs), extend bin/tests/system/resolver/ans3/ans.pl so that
it also listens on TCP and make "ns1" in the "resolver" system test
always use TCP when communicating with "ans3".

Also add a test (foo.bar.sub.tld1/TXT) that ensures the new limitations
imposed on the resolution process by the mitigation for CVE-2022-2795 do
not prevent valid, glueless delegation chains from working properly.
2022-10-19 11:53:08 +02:00
Evan Hunt
3c11fafadf test for growth of compressed pipelined responses
add a test to compare the Content-Length of successive compressed
messages on a single HTTP connection that should contain the same
data; fail if the size grows by more than 100 bytes from one query
to the next.
2022-10-18 17:16:00 +02:00
Petr Špaček
c3e7bed1ab Fix cookie system test for builds without --enable-developer
The "connecting via TCP" message comes from FCTXTRACE which is not
available on some builds.
2022-10-18 13:54:45 +02:00
Petr Špaček
ddf46056ca Allow system tests to run under root user when inside CI
https://docs.gitlab.com/ee/ci/variables/predefined_variables.html
says variable CI_SERVER="yes" is available in all versions of Gitlab.
2022-10-18 13:30:16 +02:00
Tony Finch
26ed03a61e Include the function name when reporting unexpected errors
I.e. print the name of the function in BIND that called the system
function that returned an error. Since it was useful for pthreads
code, it seems worthwhile doing so everywhere.
2022-10-17 13:43:59 +01:00
Michal Nowak
212c4de043 Replace fgrep and egrep with grep -F/-E
GNU Grep 3.8 reports the following warnings:

    egrep: warning: egrep is obsolescent; using grep -E
    fgrep: warning: fgrep is obsolescent; using grep -F
2022-10-17 09:08:15 +02:00
Michal Nowak
65e91ef5e6 Remove stray backslashes
GNU Grep 3.8 reports several instances of stray backslashes in matching
patterns:

    grep: warning: stray \ before /
    grep: warning: stray \ before :
2022-10-17 09:08:15 +02:00
Tony Finch
45b2d8938b Simplify and speed up DNS name compression
All we need for compression is a very small hash set of compression
offsets, because most of the information we need (the previously added
names) can be found in the message using the compression offsets.

This change combines dns_compress_find() and dns_compress_add() into
one function dns_compress_name() that both finds any existing suffix,
and adds any new prefix to the table. The old split led to performance
problems caused by duplicate names in the compression context.

Compression contexts are now either small or large, which the caller
chooses depending on the expected size of the message. There is no
dynamic resizing.

There is a behaviour change: compression now acts on all the labels in
each name, instead of just the last few.

A small benchmark suggests this is about 2x faster.
2022-10-17 08:45:44 +02:00
Tom Krizek
05180154d9 Remove system test delzone
There are multiple reasons to remove this test as obsolete:

- The test may not possibly work for over 2.5 years, since
  98b3b93791 removed the rndc.py python
  tool on which this test relies.
- It isn't part of the test suite either in CI or locally unless it is
  explicitly enabled. As a result, there are many issues which prevent
  the test from being executed caused by various refactoring efforts
  accumulated over time.
- Even if the test could be executed, it has no clear failure condition.
  If the python script(s) fail, the test still passes.
2022-10-14 16:35:20 +02:00
Ondřej Surý
cad2706cce Replace the statschannel truncated tests with two new tests
Now that the artificial limit on the recv buffer has been removed, the
current system test always fails because it tests if the truncation has
happened.

Add test that sending more than 10 headers makes the connection to
closed; and add test that sending huge HTTP request makes the connection
to be closed.
2022-10-14 11:26:54 +02:00
Artem Boldariev
95a551de7b doth system test: increase transfers-in/out limits
Sometimes doth test could intermittently fail shortly after start due
to inability to complete a zone transfer in time. As it turned out, it
could happen due to transfers-in/out limits. Initially the defaults
were fine, but over time, especially when adding Strict/Mutual TLS, we
added more than 10 zones so it became possible to hit the limits.

This commit takes care of that by bumping the limits.
2022-10-12 21:52:52 +03:00
Artem Boldariev
354494cd10 doth system test - decrease HTTP listener quota size
This commit reduces the size of HTTP listener quota from 300 (default)
to 100 so that it would make hitting any global limits in case of
running multiple tests in parallel in multiple containers unlikely.

This way the need in opening many file descriptors of different
kinds (e.g. client side connections and pipes) gets significantly
reduced while the required code paths are still verified.
2022-10-12 21:46:39 +03:00
Michał Kępień
18e20f95f6 Fix startup detection after restart in start.pl
The bin/tests/system/start.pl script waits until a "running" message is
logged by a given name server instance before attempting to send a
version.bind/CH/TXT query to it.  The idea behind this was to make the
script wait until named loads all the zones it is configured to serve
before telling the system test framework that a given server is ready to
use; this prevents the need to add boilerplate code that waits for a
specific zone to be loaded to each test expecting that.

The problem is that when it looks for "running" messages, the
bin/tests/system/start.pl script assumes that the existence of any such
message in the named.run file indicates that a given named instance has
already finished loading all zones.  Meanwhile, some system tests
restart all the named instances they use throughout their lifetime (some
even do that a few times), for example to run Python-based tests.  The
bin/tests/system/start.pl script handles such a scenario incorrectly: as
soon as it finds any "running" message in the named.run file it inspects
and it gets a response to a version.bind/CH/TXT query, it tells the
system test framework that a given server is ready to use, which might
not be true - it is possible that only the "version.bind" zone is loaded
at that point and the "running" message found was logged by a
previously-shutdown named instance. This triggers intermittent failures
for Python-based tests.

Fix by improving the logic that the bin/tests/system/start.pl script
uses to detect server startup: check how many "running" lines are
present in a given named.run file before attempting to start a named
instance and only proceed with version.bind/CH/TXT queries when the
number of "running" lines found in that named.run file increases after
the server is started.
2022-10-11 11:54:57 +02:00
Michał Kępień
9146b956ae Do not truncate ns2 logs in the "rrsetorder" test
In the "rrsetorder" system test, the ns2 named instance is restarted
without passing the --restart option to bin/tests/system/start.pl.  This
causes the log file for that named instance to be needlessly truncated.
Prevent this from happening by restarting the affected named instance
in the same way as all the other named instances used in system tests.
2022-10-11 11:54:57 +02:00
Petr Špaček
058c1744ba Clarify error message about missing inline-signing & dnssec-policy 2022-10-06 10:26:30 +02:00
Mark Andrews
491a8cfe96 Add sleeps to ixfr system test
ensure that at least a second has passed since a zone was last loaded
to prevent it accidentally being skipped as up to date.
2022-10-06 08:18:03 +11:00
Michal Nowak
f5d9fa6ea4 Drop flake8 ignore lists
flake8 is not used in BIND 9 CI and inline ignore lists are not needed
anymore.
2022-10-05 17:56:24 +02:00
Mark Andrews
285351d4b2 Add additional forensics to zero system test 2022-10-05 07:46:01 +00:00
Matthijs Mekking
0681b15225 If refresh stale RRset times out, start stale-refresh-time
The previous commit failed some tests because we expect that if a
fetch fails and we have stale candidates in cache, the
stale-refresh-time window is started. This means that if we hit a stale
entry in cache and answering stale data is allowed, we don't bother
resolving it again for as long we are within the stale-refresh-time
window.

This is useful for two reasons:
- If we failed to fetch the RRset that we are looking for, we are not
  hammering the authoritative servers.

- Successor clients don't need to wait for stale-answer-client-timeout
  to get their DNS response, only the first one to query will take
  the latency penalty.

The latter is not useful when stale-answer-client-timeout is 0 though.

So this exception code only to make sure we don't try to refresh the
RRset again if it failed to do so recently.
2022-10-05 08:20:48 +02:00
Mark Andrews
6d561d3886 Add support for 'dohpath' to SVCB (and HTTPS)
dohpath is specfied in draft-ietf-add-svcb-dns and has a value
of 7.  It must be a relative path (start with a /), be encoded
as UTF8 and contain the variable dns ({?dns}).
2022-10-04 14:21:41 +11:00
Ondřej Surý
d971472321 Be more patient when stopping servers in the system tests
When the TCP test is run on the busy server, the server might take a
while to wind the server down because it might still be processing all
that 300k invalid XFR requests.

Increate the rncd wait time to 120 seconds, the SIGTERM time to 300
seconds, and reduce the time to wait for ans servers from 1200 second
to just 120 seconds.
2022-09-30 17:12:44 +02:00
Aram Sargsyan
ae4296729c Test dynamic update forwarding when using a TLS-enabled primary
Add several test cases in the 'upforwd' system test to make sure
that different scenarios of Dynamic DNS update forwarding are
tested, in particular when both the original and forwarded requests
are over Do53, or DoT, or they use different transports.
2022-09-28 09:36:24 +00:00
Mark Andrews
432064f63c Suffix may be used before it is assigned a value
CID 350722 (#5 of 7): Bad use of null-like value (FORWARD_NULL)
        12. invalid_operation: Invalid operation on null-like value suffix.
    145        r.authority.append(
    146            dns.rrset.from_text(
    147                "icky.ptang.zoop.boing." + suffix,
    148                1,
    149                IN,
    150                NS,
    151                "a.bit.longer.ns.name." + suffix,
    152            )
    153        )
2022-09-27 23:47:12 +00:00
Mark Andrews
09f7e0607a Convert DST_ALG defines to enum and group HMAC algorithms
The HMACs and GSSAPI are just using unallocated values.
Moving them around shouldn't cause issues.
Only the dnssec system test knew the internal number in use for hmacmd5.
2022-09-27 16:54:36 +02:00
Mark Andrews
176e172210 Check that changing the TSIG key is successful
Switch the primary to require 'next_key' for zone transfers then
update the catalog zone to say to use 'next_key'.  Next update the
zones contents then check that those changes are seen on the
secondary.
2022-09-27 21:54:02 +10:00
Aram Sargsyan
f2bb80d6ae Extend the nsupdate system test with DoT-related checks
Add a simple test PKI based on the existing one in the doth test.

Check ephemeral, forward-secrecy, and forward-secrecy-mutual-tls
TLS configurations with different scenarios.
2022-09-23 13:23:49 +00:00
Aram Sargsyan
60f1a73754 Fix a typo in doth system test's CA.cfg
The comments in CA.cfg file serve as a good tutorial for setting up
a simple PKI for a system test. There is a typo in one of the presented
commands, which results in openssl not exiting with an error message
instead of generating a certificate.

Fix the typo.
2022-09-23 13:23:49 +00:00
Petr Špaček
c46ad4aec2 Fix JUnit test status generator for out-of-tree system tests
- Use separate paths for tests results and test script
- For tarball tests include the conversion script in the `make dist`
2022-09-22 15:20:23 +02:00
Michał Kępień
1814349374 Add tests for broken glueless referrals
If an NS RRset at the parent side of a delegation point only contains
in-bailiwick NS records, at least one glue record should be included in
every referral response sent for such a delegation point or else clients
will need to send follow-up queries in order to determine name server
addresses.  In certain edge cases (when the total size of a referral
response without glue records was just below to the UDP packet size
limit), named failed to adhere to that rule by sending non-truncated,
glueless referral responses.

Add tests attempting to trigger that bug in several different scenarios,
covering all possible combinations of the following factors:

  - type of zone (signed, unsigned),
  - glue record type (A, AAAA, both).
2022-09-22 14:03:17 +02:00
Michał Kępień
791d26b99e Clean up the "glue" system test
Bring the "glue" system test up to speed with other system tests: add
check numbering, ensure test artifacts are preserved upon failure,
improve error reporting, make the test fail upon unexpected errors,
address ShellCheck warnings.
2022-09-22 14:03:17 +02:00
Ondřej Surý
6797ca49cf Wait for the telemetry check to finish
Instead of expecting that telemetry check has already finished,
wait for it for maximum of three seconds, because named is run with
-tat=3, so the telemetry check must happen with 3 second window.

Co-authored-by: Evan Hunt <each@isc.org>
2022-09-22 09:45:54 +02:00
Aram Sargsyan
90959f6166 Implement TLS transport support for dns_request and dns_dispatch
This change prepares ground for sending DNS requests using DoT,
which, in particular, will be used for forwarding dynamic updates
to TLS-enabled primaries.
2022-09-19 16:36:28 +00:00
Michal Nowak
658cae9fad Bump socket.create_connection() timeout to 10 seconds
The tcp Pytest on OpenBSD fairly reliably fails when receive_tcp()
on a socket is attempted:

    >           (response, rtime) = dns.query.receive_tcp(sock, timeout())

    tests-tcp.py:50:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    /usr/local/lib/python3.9/site-packages/dns/query.py:659: in receive_tcp
        ldata = _net_read(sock, 2, expiration)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    sock = <socket.socket [closed] fd=-1, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6>
    count = 2, expiration = 1662719959.8106785

        def _net_read(sock, count, expiration):
            """Read the specified number of bytes from sock.  Keep trying until we
            either get the desired amount, or we hit EOF.
            A Timeout exception will be raised if the operation is not completed
            by the expiration time.
            """
            s = b''
            while count > 0:
                try:
    >               n = sock.recv(count)
    E               socket.timeout: timed out

This is because the socket is already closed.

Bump the socket connection timeout to 10 seconds.
2022-09-15 11:13:36 +02:00
Evan Hunt
a2bbe578bf Add tests for the new log messages with refusal reason
Update the allow-query test to check for the new log messages.
2022-09-15 06:50:57 +02:00
Mark Andrews
b1ef1ded69 Emit key algorithm + key id in dnssec signing statsistics
If there was a collision of key id across algorithms it was not
possible to determine where counter applies to which algorithm for
xml statistics while for json only one of the values was emitted.
The key names are now "<algorithm-number>+<id>" (e.g. "8+54274").
2022-09-15 08:42:45 +10:00