Commit Graph

10146 Commits

Author SHA1 Message Date
Matthijs Mekking
96583e7c40 Fix inline test with missing $INCLUDE
The test case for a zone with a missing include file was wrong for two
reasons:
1. It was loading the wrong file (master5 instead of master6)
2. It did actually not set the $ret variable to 1 if the test failed
   (it should default to ret=1 and clear the variable if the
   appropriate log is found).
2021-04-14 10:04:40 +02:00
Matthijs Mekking
6463ee0f40 Add inline-signing with $INCLUDE test
Add a test case for inline-signing for a zone with an $INCLUDE
statement. There is already a test for a missing include file, this
one adds a test for a zone with an include file that does exist.

Test if the record in the included file is loaded.
2021-04-14 10:04:40 +02:00
Matthijs Mekking
9af8caa733 Implement draft-vandijk-dnsop-nsec-ttl
The draft says that the NSEC(3) TTL must have the same TTL value
as the minimum of the SOA MINIMUM field and the SOA TTL. This was
always the intended behaviour.

Update the zone structure to also track the SOA TTL. Whenever we
use the MINIMUM value to determine the NSEC(3) TTL, use the minimum
of MINIMUM and SOA TTL instead.

There is no specific test for this, however two tests need adjusting
because otherwise they failed: They were testing for NSEC3 records
including the TTL. Update these checks to use 600 (the SOA TTL),
rather than 3600 (the SOA MINIMUM).
2021-04-13 11:26:26 +02:00
Matthijs Mekking
a83c8cb0af Use stale TTL as RRset TTL in dumpdb
It is more intuitive to have the countdown 'max-stale-ttl' as the
RRset TTL, instead of 0 TTL. This information was already available
in a comment "; stale (will be retained for x more seconds", but
Support suggested to put it in the TTL field instead.
2021-04-13 09:48:20 +02:00
Matthijs Mekking
debee6157b Check staleness in bind_rdataset
Before binding an RRset, check the time and see if this record is
stale (or perhaps even ancient). Marking a header stale or ancient
happens only when looking up an RRset in cache, but binding an RRset
can also happen on other occasions (for example when dumping the
database).

Check the time and compare it to the header. If according to the
time the entry is stale, but not ancient, set the STALE attribute.
If according to the time is ancient, set the ANCIENT attribute.

We could mark the header stale or ancient here, but that requires
locking, so that's why we only compare the current time against
the rdh_ttl.

Adjust the test to check the dump-db before querying for data. In the
dumped file the entry should be marked as stale, despite no cache
lookup happened since the initial query.
2021-04-13 09:48:20 +02:00
Matthijs Mekking
2a5e0232ed Fix nonsensical stale TTL values in cache dump
When introducing change 5149, "rndc dumpdb" started to print a line
above a stale RRset, indicating how long the data will be retained.

At that time, I thought it should also be possible to load
a cache from file. But if a TTL has a value of 0 (because it is stale),
stale entries wouldn't be loaded from file. So, I added the
'max-stale-ttl' to TTL values, and adjusted the $DATE accordingly.

Since we actually don't have a "load cache from file" feature, this
is premature and is causing confusion at operators. This commit
changes the 'max-stale-ttl' adjustments.

A check in the serve-stale system test is added for a non-stale
RRset (longttl.example) to make sure the TTL in cache is sensible.

Also, the comment above stale RRsets could have nonsensical
values. A possible reason why this may happen is when the RRset was
marked a stale but the 'max-stale-ttl' has passed (and is actually an
RRset awaiting cleanup). This would lead to the "will be retained"
value to be negative (but since it is stored in an uint32_t, you would
get a nonsensical value (e.g. 4294362497).

To mitigate against this, we now also check if the header is not
ancient. In addition we check if the stale_ttl would be negative, and
if so we set it to 0. Most likely this will not happen because the
header would already have been marked ancient, but there is a possible
race condition where the 'rdh_ttl + serve_stale_ttl' has passed,
but the header has not been checked for staleness.
2021-04-13 09:48:20 +02:00
Mark Andrews
38449de93b Update named's usage description 2021-04-12 12:07:44 +10:00
Michał Kępień
c3718b926b Use the same port selection method on all systems
When system tests are run on Windows, they are assigned port ranges that
are 100 ports wide and start from port number 5000.  This is a different
port assignment method than the one used on Unix systems.  Drop the "-p"
command line option from bin/tests/system/run.sh invocations used for
starting system tests on Windows to unify the port assignment method
used across all operating systems.
2021-04-08 11:12:37 +02:00
Michał Kępień
31e5ca4bd9 Rework get_ports.sh to make it not use a lock file
The get_ports.sh script is used for determining the range of ports a
given system test should use.  It first determines the start of the port
range to return (the base port); it can either be specified explicitly
by the caller or chosen randomly.  Subsequent ports are picked
sequentially, starting from the base port.  To ensure no single port is
used by multiple tests, a state file (get_ports.state) containing the
last assigned port is maintained by the script.  Concurrent access to
the state file is protected by a lock file (get_ports.lock); if one
instance of the script holds the lock file while another instance tries
to acquire it, the latter retries its attempt to acquire the lock file
after sleeping for 1 second; this retry process can be repeated up to 10
times before the script returns an error.

There are some problems with this approach:

  - the sleep period in case of failure to acquire the lock file is
    fixed, which leads to a "thundering herd" type of problem, where
    (depending on how processes are scheduled by the operating system)
    multiple system tests try to acquire the lock file at the same time
    and subsequently sleep for 1 second, only for the same situation to
    likely happen the next time around,

  - the lock file is being locked and then unlocked for every single
    port assignment made, not just once for the entire range of ports a
    system test should use; in other words, the lock file is currently
    locked and unlocked 13 times per system test; this increases the
    odds of the "thundering herd" problem described above preventing a
    system test from getting one or more ports assigned before the
    maximum retry count is reached (assuming multiple system tests are
    run in parallel); it also enables the range of ports used by a given
    system test to be non-sequential (which is a rather cosmetic issue,
    but one that can make log interpretation harder than necessary when
    test failures are diagnosed),

  - both issues described above cause unnecessary delays when multiple
    system tests are started in parallel (due to high lock file
    contention among the system tests being started),

  - maintaining a state file requires ensuring proper locking, which
    complicates the script's source code.

Rework the get_ports.sh script so that it assigns non-overlapping port
ranges to its callers without using a state file or a lock file:

  - add a new command line switch, "-t", which takes the name of the
    system test to assign ports for,

  - ensure every instance of get_ports.sh knows how many ports all
    system tests which form the test suite are going to need in total
    (based on the number of subdirectories found in bin/tests/system/),

  - in order to ensure all instances of get_ports.sh work on the same
    global port range (so that no port range collisions happen), a
    stable (throughout the expected run time of a single system test
    suite) base port selection method is used instead of the random one;
    specifically, the base port, unless specified explicitly using the
    "-p" command line switch, is derived from the number of hours which
    passed since the Unix Epoch time,

  - use the name of the system test to assign ports for (passed via the
    new "-t" command line switch) as a unique index into the global
    system test range, to ensure all system tests use disjoint port
    ranges.
2021-04-08 11:12:37 +02:00
Michal Nowak
cd0a34df1b Move fromhex.pl script to bin/tests/system/
The fromhex.pl script needs to be copied from the source directory to
the build directory before any test is run, otherwise the out-of-tree
fails to find it. Given that the script is used only in system test,
move it to bin/tests/system/.
2021-04-08 11:04:26 +02:00
Mark Andrews
bb6f0faeed Check that upgrade of managed-keys.bind.jnl succeeded
Update the system to include a recoverable managed.keys journal created
with <size,serial0,serial1,0> transactions and test that it has been
updated as part of the start up process.
2021-04-07 20:27:22 +02:00
Evan Hunt
d2ea8f4245 Ensure dig lookup is detached on UDP connect failure
dig could hang when UDP connect failed due to a dangling lookup object.
2021-04-07 15:36:59 +02:00
Ondřej Surý
86f4872dd6 isc_nm_*connect() always return via callback
The isc_nm_*connect() functions were refactored to always return the
connection status via the connect callback instead of sometimes returning
the hard failure directly (for example, when the socket could not be
created, or when the network manager was shutting down).

This commit changes the connect functions in all the network manager
modules, and also makes the necessary refactoring changes in places
where the connect functions are called.
2021-04-07 15:36:59 +02:00
Evan Hunt
a70cd026df move UDP connect retries from dig into isc_nm_udpconnect()
dig previously ran isc_nm_udpconnect() three times before giving
up, to work around a freebsd bug that caused connect() to return
a spurious transient EADDRINUSE. this commit moves the retry code
into the network manager itself, so that isc_nm_udpconnect() no
longer needs to return a result code.
2021-04-07 15:36:59 +02:00
Matthijs Mekking
e443279bbf Change default stale-answer-client-timeout to off
Using "stale-answer-client-timeout" turns out to have unforeseen
negative consequences, and thus it is better to disable the feature
by default for the time being.
2021-04-07 14:10:31 +02:00
Matthijs Mekking
aaed7f9d8c Remove result exception on staleonly lookup
When implementing "stale-answer-client-timeout", we decided that
we should only return positive answers prematurely to clients. A
negative response is not useful, and in that case it is better to
wait for the recursion to complete.

To do so, we check the result and if it is not ISC_R_SUCCESS, we
decide that it is not good enough. However, there are more return
codes that could lead to a positive answer (e.g. CNAME chains).

This commit removes the exception and now uses the same logic that
other stale lookups use to determine if we found a useful stale
answer (stale_found == true).

This means we can simplify two test cases in the serve-stale system
test: nodata.example is no longer treated differently than data.example.
2021-04-02 10:02:40 +02:00
Mark Andrews
35e8f56b49 Test dynamic libraries should not be installed
Tag the libraries with check_ to prevent them being installed
by "make install".  Additionally make check requires .so to be
create which requires .lai files to be constructed which, in
turn, requires -rpath <dir> as part of "linking" the .la file.
2021-04-01 19:11:54 +11:00
Diego Fronza
3b98c4d311 Update dig's man page
Adjusted man page entries for +tries and +retry options to reflect the
fact that now those options apply to TCP as well.
2021-03-25 14:08:40 -03:00
Diego Fronza
4f82cc41cc Added tests for tries=1 and retry=0 on TCP EOF
Added tests to ensure that dig won't retry sending a query over tcp
(+tcp) when a TCP connection is closed prematurely (EOF is read) if
either +tries=1 or retry=0 is specified on the command line.
2021-03-25 14:08:40 -03:00
Diego Fronza
e680896003 Adjusted dig system tests
Now that premature EOF on tcp connections take +tries and +retry into
account, the dig system tests handling TCP EOF with +tries=1 were
expecting dig to do a second attempt in handling the tcp query, which
doesn't happen anymore.

To make the test work as expected +tries value was adjusted to 2, to
make it behave as before after the new update on dig.
2021-03-25 14:08:40 -03:00
Diego Fronza
78f6ead480 Don't retry +tcp queries on failure if tries=1 or retries=0
Before this commit, a premature EOF (connection closed) on tcp queries
was causing dig to automatically attempt to send the query again, even
if +tries=1 or +retries=0 was provided on command line.

This commit fix the problem by taking into account the no. of retries
specified by the user when processing a premature EOF on tcp
connections.
2021-03-25 14:08:39 -03:00
Matthijs Mekking
93ed215065 Add kasp.sh to run.sh.in script
Add kasp.sh to the list of scripts copied from the source directory to
the build directory before any test is run. This will fix
the out-of-tree test failures introduced in commit
ecb073bdd6 on the 'main' branch.
2021-03-24 08:55:24 +01:00
Matthijs Mekking
82d667e1d5 Fix some intermittent kasp failures
When calling "rndc dnssec -checkds", it may take some milliseconds
before the appropriate changes have been written to the state file.
Add retry_quiet mechanisms to allow the write operation to finish.

Also retry_quiet the check for the next key event. A "rndc dnssec"
command may trigger a zone_rekey event and this will write out
a new "next key event" log line, but it may take a bit longer than
than expected in the tests.
2021-03-22 11:58:26 +01:00
Matthijs Mekking
82f72ae249 Rekey immediately after rndc checkds/rollover
Call 'dns_zone_rekey' after a 'rndc dnssec -checkds' or 'rndc dnssec
-rollover' command is received, because such a command may influence
the next key event. Updating the keys immediately avoids unnecessary
rollover delays.

The kasp system test no longer needs to call 'rndc loadkeys' after
a 'rndc dnssec -checkds' or 'rndc dnssec -rollover' command.
2021-03-22 11:58:26 +01:00
Matthijs Mekking
6f31f62d69 Delete CDS/CDNSKEY records when zone is unsigned
CDS/CDNSKEY DELETE records are only useful if they are signed,
otherwise the parent cannot verify these RRsets anyway. So once the DS
has been removed (and signaled to BIND), we can remove the DNSKEY and
RRSIG records, and at this point we can also remove the CDS/CDNSKEY
records.
2021-03-22 10:30:59 +01:00
Matthijs Mekking
f211c7c2a1 Allow CDS/CDNSKEY DELETE records in unsigned zone
While not useful, having a CDS/CDNSKEY DELETE record in an unsigned
zone is not an error and "named-checkzone" should not complain.
2021-03-22 10:25:30 +01:00
Matthijs Mekking
d5531df79a Retry quiet check keys
Change the 'check_keys' function to try three times. Some intermittent
kasp test failures are because we are inspecting the key files
before the actual change has happen. The 'retry_quiet' approach allows
for a bit more time to let the write operation finish.
2021-03-22 09:50:05 +01:00
Matthijs Mekking
c40c1ebcb1 Test keymgr2kasp state from timing metadata
Add two test zones that migrate to dnssec-policy. Test if the key
states are set accordingly given the timing metadata.

The rumoured.kasp zone has its Publish/Active/SyncPublish times set
not too long ago so the key states should be set to RUMOURED. The
omnipresent.kasp zone has its Publish/Active/SyncPublish times set
long enough to set the key states to OMNIPRESENT.

Slightly change the init_migration_keys function to set the
key lifetime to "none" (legacy keys don't have lifetime). Then in the
test case set the expected key lifetime explicitly.
2021-03-22 09:50:05 +01:00
Matthijs Mekking
f6fa254256 Editorial commit keymgr2kasp test
This commit is somewhat editorial as it does not introduce something
new nor fixes anything.

The layout in keymgr2kasp/tests.sh has been changed, with the
intention to make more clear where a test scenario ends and begins.

The publication time of some ZSKs has been changed. It makes a more
clear distinction between publication time and activation time.
2021-03-22 09:50:05 +01:00
Matthijs Mekking
ecb073bdd6 Introduce kasp.sh
Add a script similar to conf.sh to include common functions and
variables for testing KASP. Currently used in kasp, keymgr2kasp, and
nsec3.
2021-03-22 09:50:05 +01:00
Matthijs Mekking
5389172111 Move kasp migration tests to different directory
The kasp system test was getting pretty large, and more tests are on
the way. Time to split up. Move tests that are related to migrating
to dnssec-policy to a separate directory 'keymgr2kasp'.
2021-03-22 09:50:05 +01:00
Michał Kępień
185a1a5643 Install man page for named-compilezone
The named-checkzone tool can also be invoked as named-compilezone.  Make
sure a man page is installed for that alias.  Move and rename the
"man_named-checkzone" label to prevent a Sphinx duplicate label warning
from being raised (see commit 84862e96c1
for more information).
2021-03-22 09:36:48 +01:00
Patrick McLean
56cef1495f dig: Use high resolution clocks when microsecond accuracy is requested
The TIME_NOW macro calls isc_time_now which uses CLOCK_REALTIME_COARSE
for getting the current time. This is perfectly fine for millisecond,
however when the user request microsecond resolutiuon, they are going
to get very inaccurate results. This is especially true on a server
class machine where the clock ticks may be set to 100HZ.

This changes dig to use the new TIME_NOW_HIRES macro that uses the
CLOCK_MONOTONIC_RAW that is more expensive, but gets the *actual*
current time rather than the at the last kernel time tick.
2021-03-20 11:25:55 -07:00
treysis
6b2ea00621 Add filter-a plugin for IPv6-dominant environments
(cherry picked from commit 78f6cd57e1cc166823415438fe2d19a324cf7a67)
2021-03-19 08:06:55 +01:00
Ondřej Surý
36ddefacb4 Change the isc_nm_(get|set)timeouts() to work with milliseconds
The RFC7828 specifies the keepalive interval to be 16-bit, specified in
units of 100 milliseconds and the configuration options tcp-*-timeouts
are following the suit.  The units of 100 milliseconds are very
unintuitive and while we can't change the configuration and presentation
format, we should not follow this weird unit in the API.

This commit changes the isc_nm_(get|set)timeouts() functions to work
with milliseconds and convert the values to milliseconds before passing
them to the function, not just internally.
2021-03-18 16:37:57 +01:00
Ondřej Surý
64cff61c02 Add TCP timeouts system test
The system tests were missing a test that would test tcp-initial-timeout
and tcp-idle-timeout.

This commit adds new "timeouts" system test that adds:

  * Test that waits longer than tcp-initial-timeout and then checks
    whether the socket was closed

  * Test that sends and receives DNS message then waits longer than
    tcp-initial-timeout but shorter time than tcp-idle-timeout than
    sends DNS message again than waits longer than tcp-idle-timeout
    and checks whether the socket was closed

  * Similar test, but bursting 25 DNS messages than waiting longer than
    tcp-initial-timeout and shorter than tcp-idle-timeout than do second
    25 DNS message burst

  * Check whether transfer longer than tcp-initial-timeout succeeds
2021-03-18 16:37:57 +01:00
Matthijs Mekking
0cae3249e3 Add test for thaw dynamic kasp zone
Add a test for freezing, manually updating, and then thawing a dynamic
zone with "dnssec-policy". In the kasp system test we add parameters
to the "update_is_signed" check to signal the indicated IP addresses
for the labels "a" and "d". If set to '-', the test is skipped.

After nsupdating the dynamic.kasp zone, we revert the update (with
nsupdate) and update the zone again, but now with the freeze/thaw
approach.
2021-03-17 08:24:17 +01:00
Matthijs Mekking
ee0835d977 Fix a XoT crash
The transport should also be detached when we skip a master, otherwise
named will crash when sending a SOA query to the next master over TLS,
because the transport must be NULL when we enter
'dns_view_gettransport'.
2021-03-16 10:11:12 +01:00
Mark Andrews
25d1276170 Ignore the actual error code returned by getaddrinfo
when testing if interactive mode continues or not on
invalid hostname.  We only need to detect that getaddrinfo
failed and that we continued or not.
2021-03-16 10:20:28 +11:00
Matthijs Mekking
87591de6f7 Fix servestale fetchlimits crash
When we query the resolver for a domain name that is in the same zone
for which is already one or more fetches outstanding, we could
potentially hit the fetch limits. If so, recursion fails immediately
for the incoming query and if serve-stale is enabled, we may try to
return a stale answer.

If the resolver is also is authoritative for the parent zone (for
example the root zone), first a delegation is found, but we first
check the cache for a better response.

Nothing is found in the cache, so we try to recurse to find the
answer to the query.

Because of fetch-limits 'dns_resolver_createfetch()' returns an error,
which 'ns_query_recurse()' propagates to the caller,
'query_delegation_recurse()'.

Because serve-stale is enabled, 'query_usestale()' is called,
setting 'qctx->db' to the cache db, but leaving 'qctx->version'
untouched. Now 'query_lookup()' is called to search for stale data
in the cache database with a non-NULL 'qctx->version'
(which is set to a zone db version), and thus we hit an assertion
in rbtdb.

This crash was introduced in 'main' by commit
8bcd7fe69e.
2021-03-11 12:16:14 +01:00
Mark Andrews
af0ee2c718 Rename 'yield' to 'waitforsignal' due to namespace clash 2021-03-11 11:34:15 +11:00
Mark Andrews
926b9056b7 add journal to conf.sh.common 2021-03-08 11:36:00 +11:00
Evan Hunt
dbffb212ce add basic DoH system tests
- rename dot to doth, as it now covers both dot and doh.
- merge xot into doth as it's closely related.
- added long-lived key and cert files (expiring 2121).
- add tests with https-get, https-post, http-plain, alternate
  endpoints, and both static and ephemeral TLS configuration.
- incidentally fixed a memory leak in dig that occurred if +https
  was specified more than once.
2021-03-05 18:09:42 +02:00
Artem Boldariev
ca9a15e3bc DoH: call send callbacks after data was actually sent 2021-03-05 13:29:32 +02:00
Evan Hunt
88752b1121 refactor outgoing HTTP connection support
- style, cleanup, and removal of unnecessary code.
- combined isc_nm_http_add_endpoint() and isc_nm_http_add_doh_endpoint()
  into one function, renamed isc_http_endpoint().
- moved isc_nm_http_connect_send_request() into doh_test.c as a helper
  function; remove it from the public API.
- renamed isc_http2 and isc_nm_http2 types and functions to just isc_http
  and isc_nm_http, for consistency with other existing names.
- shortened a number of long names.
- the caller is now responsible for determining the peer address.
  in isc_nm_httpconnect(); this eliminates the need to parse the URI
  and the dependency on an external resolver.
- the caller is also now responsible for creating the SSL client context,
  for consistency with isc_nm_tlsdnsconnect().
- added setter functions for HTTP/2 ALPN. instead of setting up ALPN in
  isc_tlsctx_createclient(), we now have a function
  isc_tlsctx_enable_http2client_alpn() that can be run from
  isc_nm_httpconnect().
- refactored isc_nm_httprequest() into separate read and send functions.
  isc_nm_send() or isc_nm_read() is called on an http socket, it will
  be stored until a corresponding isc_nm_read() or _send() arrives; when
  we have both halves of the pair the HTTP request will be initiated.
- isc_nm_httprequest() is renamed isc__nm_http_request() for use as an
  internal helper function by the DoH unit test. (eventually doh_test
  should be rewritten to use read and send, and this function should
  be removed.)
- added implementations of isc__nm_tls_settimeout() and
  isc__nm_http_settimeout().
- increased NGHTTP2 header block length for client connections to 128K.
- use isc_mem_t for internal memory allocations inside nghttp2, to
  help track memory leaks.
- send "Cache-Control" header in requests and responses. (note:
  currently we try to bypass HTTP caching proxies, but ideally we should
  interact with them: https://tools.ietf.org/html/rfc8484#section-5.1)
2021-03-05 13:29:26 +02:00
Ondřej Surý
9c8b7a5c45 add preliminary DoH client support to dig
add options "+https", "+https-get" and "+http-plain" to
allow dig to connect over HTTP/2 channels.
2021-03-05 13:28:17 +02:00
Mark Andrews
4015af02d8 Move cleanup of queries to later in the shutdown sequence
to avoid TSAN report

    WARNING: ThreadSanitizer: data race
      Write of size 8 at 0x000000000001 by main thread:
        #0 free <null>
        #1 default_memfree lib/isc/mem.c:440
        #2 mem_put lib/isc/mem.c:363
        #3 isc__mem_free lib/isc/mem.c:1012
        #4 main bin/tools/mdig.c:2231

      Previous read of size 1 at 0x000000000005 by thread T1:
        #0 dns_name_fromtext lib/dns/name.c:1121
        #1 sendquery bin/tools/mdig.c:596
        #2 sendqueries bin/tools/mdig.c:779
        #3 dispatch lib/isc/task.c:1153
        #4 run lib/isc/task.c:1345
        #5 isc__trampoline_run lib/isc/trampoline.c:184
        #6 <null> <null>

      Thread T1 (running) created by main thread at:
        #0 pthread_create <null>
        #1 isc_thread_create pthreads/thread.c:79
        #2 isc_taskmgr_create lib/isc/task.c:1435
        #3 main bin/tools/mdig.c:2148

    SUMMARY: ThreadSanitizer: data race in __interceptor_free
2021-03-04 13:21:56 +01:00
Ondřej Surý
8153729d3a Use int type to store result from isc_commandline_parse()
The C standard actually doesn't define char as signed or unsigned, and
it could be either according to underlying architecture.  It turns out
that while it's usually signed type, it isn't on arm64 where it's
unsigned.

isc_commandline_parse() return int, just use that instead of the char.
2021-03-04 10:43:00 +01:00
Evan Hunt
a0aefa1de6 create 'journal' system test
tests that version 1 journal files containing version 1 transaction
headers are rolled forward correctly on server startup, then updated
into version 2 journals. also checks journal file consistency and
'max-journal-size' behavior.
2021-03-03 17:54:47 -08:00
Mark Andrews
fb2d0e2897 extend named-journalprint to be able to force the journal version
named-journalprint can now upgrade or downgrade a journal file
in place; the '-u' option upgrades and the '-d' option downgrades.
2021-03-03 17:54:47 -08:00