10039 Commits

Author SHA1 Message Date
Matthijs Mekking
d8c6655d7d Rewrap comments to 80 char width serve-stale test 2021-01-25 10:48:16 -03:00
Diego Fronza
6ab9070457 Add documentation for stale-answer-client-timeout 2021-01-25 10:47:14 -03:00
Diego Fronza
35fd039d03 Add system tests for stale-answer-client-timeout
This commit adds 4 tests for the new option:
	1. Test default configuration of stale-answer-client-timeout, a
	   value of 1.8 seconds, with stale-refresh-time disabled.

	2. Test disabling of stale-answer-client-timeout.

	3. Test stale-answer-client-timeout with a value of zero, in this
	   case we take advantage of a log entry which shows that a stale
	   answer was promptly used before an attempt to refresh the RRset
	   is made. We also check, by activating a disabled authoritative
	   server, that the RRset was successfully refreshed after that.

	4. Test stale-answer-client-timeout 0 with stale-refresh-time 4, in
	   this test we want to ensure a couple things:

	   - If we have a stale RRSet entry in cache, a request must be
		 promptly answered with this data, while BIND must also attempt
		 to refresh the RRSet in background.

	   - If the attempt to refresh the RRSet times out, the RRSet must
		 have its stale-refresh-time window activated.

	   - If a new request for the same RRSet arrives, it must be
		 promptly answered with stale data due to stale-refresh-time
		 being active for this RRSet, in this case no attempt to refresh
		 the RRSet is made.

	   - Enable authoritative server, ensure that the RRSet was not
		 refreshed, to honor stale-refresh-time.

	   - Wait for the stale-refresh-time window to pass, send another request
		 for the same RRSet, this time we expect the answer to be the
		 stale entry in cache being hit due to
		 stale-answer-client-timeout 0.

	    - Send another request, this time we expect the answer to be an
		  active RRSet, since it must have been refreshed during the
		  previous request.
2021-01-25 10:47:14 -03:00
Diego Fronza
0ad6f594f6 Added option for disabling stale-answer-client-timeout
This commit allows specifying "disabled" or "off" in the
stale-answer-client-timeout statement. The logic to support this
behavior will be added in subsequent commits.

This commit also enforces an upper bound on stale-answer-client-timeout,
equal to one second less than 'resolver-query-timeout'.
2021-01-25 10:47:14 -03:00
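
A minimal sketch of the upper bound described in 0ad6f594f6, assuming both values are held in milliseconds. The function name and parameters are hypothetical and are not taken from the BIND source; the commit message does not say whether an out-of-range value is clamped or rejected, so this sketch simply clamps.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a BIND API): cap the requested
 * stale-answer-client-timeout at one second less than
 * resolver-query-timeout. Both values are in milliseconds.
 */
uint32_t
clamp_stale_client_timeout(uint32_t requested_ms,
			   uint32_t resolver_query_timeout_ms) {
	/* Assumption: the resolver timeout is at least one second. */
	assert(resolver_query_timeout_ms >= 1000);

	uint32_t upper_ms = resolver_query_timeout_ms - 1000;

	return (requested_ms > upper_ms) ? upper_ms : requested_ms;
}
```

With a resolver-query-timeout of 10000 ms, a requested value of 1800 ms is kept as-is and a requested value of 15000 ms is reduced to 9000 ms.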
Diego Fronza
a12bf4b61b Adjusted serve-stale test
After the addition of stale-answer-client-timeout a test was broken due
to the following behavior expected by the test.

1. Prime cache data.example txt.
2. Disable authoritative server.
3. Send a query for data.example txt.
4. Recursive server will timeout and answer from cache with stale RRset.
5. Recursive server will activate stale-refresh-time due to the previous
   failure in attempting to refresh the RRset.
6. Send a query for data.example txt.
7. Expect stale answer from cache due to stale-refresh-time
   window being active, even if authoritative server is up.

Problem is that in step 4, due to the new option
stale-answer-client-timeout, recursive server will answer with stale
data before the actual fetch completes.

Since the original fetch is still running in background, if we re-enable
the authoritative server during that time, the RRset will actually be
successfully refreshed, and stale-refresh-window will not be activated.

The next queries will fail because they expect the TTL of the RRset to
match the one in the stale cache, not the one just refreshed.

To solve this, we explicitly disable stale-answer-client-timeout for
this test, as it's not the feature we are interested in testing here
anyways.
2021-01-25 10:47:14 -03:00
Diego Fronza
171a5b7542 Add stale-answer-client-timeout option
The general logic behind the addition of this new feature works as
follows:

When a client query arrives, the basic path (query.c / ns_query_recurse)
was to create a fetch, waiting for completion in fetch_callback.

With the introduction of stale-answer-client-timeout, a new event of
type DNS_EVENT_TRYSTALE may invoke fetch_callback, whenever stale
answers are enabled and the fetch took longer than
stale-answer-client-timeout to complete.

When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we
must ensure that the following happens:

1. Set up a new query context with the sole purpose of looking up
   stale RRset data only; for that purpose a new flag,
   'DNS_DBFIND_STALEONLY', was added for use in database lookups.

    . If a stale RRset is found, mark the original client query as
      answered (with a new query attribute named NS_QUERYATTR_ANSWERED),
      so when the fetch completion event is received later, we avoid
      answering the client twice.

    . If a stale RRset is not found, clean up and wait for the normal
      fetch completion event.

2. In ns_query_done, we must change this part:
	/*
	 * If we're recursing then just return; the query will
	 * resume when recursion ends.
	 */
	if (RECURSING(qctx->client)) {
		return (qctx->result);
	}

   To this:

	if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) {
		return (qctx->result);
	}

   Otherwise we would not proceed to answer the client if it happened
   that a stale answer was found when looking up stale-only data.

When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we
proceed as before, resuming query, updating stats, etc, but a few
exceptions had to be added, most important of which are two:

1. Before answering the client (ns_client_send), check if the query
   wasn't already answered before.

2. Before detaching a client, e.g.
   isc_nmhandle_detach(&client->reqhandle), ensure that this is the
   fetch completion event, and not the one triggered due to
   stale-answer-client-timeout, so a correct call would be:
   if (!QUERY_STALEONLY(client)) {
        isc_nmhandle_detach(&client->reqhandle);
   }

Other than these notes, comments were added to the code in an attempt
to make these updates easier to follow.
2021-01-25 10:47:14 -03:00
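
The two-event flow described in 171a5b7542 can be sketched as a toy state machine. Everything below is a hypothetical stand-in (the enum, the struct, and the function are not BIND code); it only illustrates the two rules from the message: answer the client at most once, and release the request handle only on the fetch completion event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for DNS_EVENT_TRYSTALE and DNS_EVENT_FETCHDONE. */
typedef enum { EV_TRYSTALE, EV_FETCHDONE } toy_event_t;

/* Toy client state; the real client object is far richer. */
typedef struct {
	bool answered;  /* plays the role of NS_QUERYATTR_ANSWERED */
	bool detached;  /* request handle has been released */
	int  responses; /* number of replies sent to the client */
} toy_client_t;

/*
 * Toy version of the callback. 'have_stale' says whether a stale RRset
 * was found in cache when the stale-only lookup ran.
 */
void
toy_fetch_callback(toy_client_t *client, toy_event_t ev, bool have_stale) {
	assert(client != NULL);
	assert(!client->detached);

	if (ev == EV_TRYSTALE) {
		if (have_stale && !client->answered) {
			client->responses++;
			client->answered = true;
		}
		/* The fetch is still running: keep the handle attached. */
		return;
	}

	/* EV_FETCHDONE: answer only if the stale path did not already. */
	if (!client->answered) {
		client->responses++;
		client->answered = true;
	}
	client->detached = true;
}
```

When a stale answer is available, the timeout event sends the single reply and the later completion event only releases the handle; when no stale answer is available, the completion event sends the reply.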
Ondřej Surý
e493e04c0f Refactor TLSDNS module to work with libuv/ssl directly
* Following the example set in 634bdfb16d, the tlsdns netmgr
  module now uses libuv and SSL primitives directly, rather than
  opening a TLS socket which opens a TCP socket, as the previous
  model was difficult to debug.  Closes #2335.

* Remove the netmgr tls layer (we will have to re-add it for DoH)

* Add isc_tls API to wrap the OpenSSL SSL_CTX object into libisc
  library; move the OpenSSL initialization/deinitialization from dstapi
  needed for OpenSSL 1.0.x to the isc_tls_{initialize,destroy}()

* Add couple of new shims needed for OpenSSL 1.0.x

* When LibreSSL is used, require at least version 2.7.0 that
  has the best OpenSSL 1.1.x compatibility and auto init/deinit

* Enforce OpenSSL 1.1.x usage on Windows

* Added a TLSDNS unit test and implemented a simple TLSDNS echo
  server and client.
2021-01-25 09:19:22 +01:00
Evan Hunt
a8a49bb783 check whether taskset works before running cpu test
the taskset command used for the cpu system test seems
to be failing under vmware, causing a test failure. we
can try the taskset command and skip the test if it doesn't
work.
2021-01-20 13:37:52 -08:00
Matthijs Mekking
437d271483 Special case tests for lmdb
When compiling BIND 9 without lmdb, this is promoted from
'not operational' to 'not configured', resulting in a failure (and no
longer a warning) if lmdb-related configuration options are set.

Special case certain system tests to avoid test failures on systems
that do not have lmdb.
2021-01-19 10:12:40 +01:00
Matthijs Mekking
c6c3e2d074 Update doc files
Run make doc after all the code changes related to #1086.
2021-01-19 10:12:40 +01:00
Matthijs Mekking
87744f218d Remove a lot of obsoleted options
These options were ancient or made obsolete a long time ago, it is
safe to remove them.

Also stop printing ancient options, they should be treated the same as
unknown options.

Removed options: lwres, geoip-use-ecs, sit-secret, use-ixfr,
acache-cleaning-interval, acache-enable, additional-from-auth,
additional-from-cache, allow-v6-synthesis, dnssec-enable,
max-acache-size, nosit-udp-size, queryport-pool-ports,
queryport-pool-updateinterval, request-sit, use-queryport-pool, and
support-ixfr.
2021-01-19 10:12:40 +01:00
Matthijs Mekking
df435fc7da Remove the option 'dnssec-lookaside'
Obsoleted in 9.15, we can remove the option in 9.17.
2021-01-19 10:12:40 +01:00
Matthijs Mekking
a9828dd170 Update documentation on -E option
The -E option does not default to pkcs11 if --with-pkcs11 is set,
but always needs to be set explicitly.
2021-01-19 09:05:28 +01:00
Matthijs Mekking
8df629d0b2 Fix control flow issue CID 314969 in zoneconf.c
Coverity Scan identified the following issue in bin/named/zoneconf.c:

    *** CID 314969:  Control flow issues  (DEADCODE)
    /bin/named/zoneconf.c: 2212 in named_zone_inlinesigning()

    if (!inline_signing && !zone_is_dynamic &&
        cfg_map_get(zoptions, "dnssec-policy", &signing) == ISC_R_SUCCESS &&
        signing != NULL)
    {
        if (strcmp(cfg_obj_asstring(signing), "none") != 0) {
            inline_signing = true;
    >>>     CID 314969:  Control flow issues  (DEADCODE)
    >>>     Execution cannot reach the expression ""no"" inside this statement: "dns_zone_log(zone, 1, "inli...".
            dns_zone_log(
                zone, ISC_LOG_DEBUG(1), "inline-signing: %s",
                inline_signing
                ? "implicitly through dnssec-policy"
                : "no");
        } else {
                ...
        }
    }

This is because we first set 'inline_signing = true' and then check
its value in 'dns_zone_log'.
2021-01-18 11:48:09 +01:00
Matthijs Mekking
3be65246f8 Update serve-stale system test with new defaults 2021-01-11 11:13:45 +01:00
Matthijs Mekking
e15a433b23 Update serve-stale config defaults
Change the serve-stale configuration defaults so that they match the
recommendations from RFC 8767.
2021-01-11 11:13:45 +01:00
Mark Andrews
c36bd83822 Fix dnssec-signzone and dnssec-verify logging
The newlines need to be appended to the messages generated by report
in an atomic manner.
2021-01-04 03:59:10 +00:00
Matthijs Mekking
2fc42b598b Fix a quirky mkeys test failure
The mkeys system test started to fail after introducing support for
zones transitioning to unsigned without going bogus. This is because
there was actually a bug in the code: if you reconfigure a zone and
remove the "auto-dnssec" option, the zone is actually still DNSSEC
maintained. This is because in zoneconf.c there is no call
to 'dns_zone_setkeyopt()' if the configuration option is not used
(cfg_map_get(zoptions, "auto-dnssec", &obj) will return an error).

The mkeys system test implicitly relied on this bug: initially the
root zone is being DNSSEC maintained, then at some point it needs to
reset the root zone in order to prepare for some tests with bad
signatures. Because it needs to inject a bad signature, 'auto-dnssec'
is removed from the configuration.

The test passes, but for the wrong reasons:

I:mkeys:reset the root server
I:mkeys:reinitialize trust anchors
I:mkeys:check positive validation (18)

The 'check positive validation' test works because the zone is still
DNSSEC maintained: The DNSSEC records in the signed root zone file on
disk are being ignored.

After fixing the bug/introducing graceful transition to insecure,
the root zone is no longer DNSSEC maintained after the reconfig.

The zone now explicitly needs to be reloaded because otherwise the
'check positive validation' test works against an old version of the
zone (the one with all the revoked keys), and the test will obviously
fail.
2020-12-23 09:02:11 +01:00
Matthijs Mekking
cf420b2af0 Treat dnssec-policy "none" as a builtin zone
Configure "none" as a builtin policy. Change the 'cfg_kasp_fromconfig'
api so that the 'name' will determine what policy needs to be
configured.

When transitioning a zone from secure to insecure, there will be
cases when a zone with no DNSSEC policy (dnssec-policy none) should
be using KASP. When there are key state files available, this is an
indication that the zone once was DNSSEC signed but is reconfigured
to become insecure.

If we would not run the keymgr, named would abruptly remove the
DNSSEC records from the zone, making the zone bogus. Therefore,
change the code such that a zone will use kasp if there is a valid
dnssec-policy configured, or if there are state files available.
2020-12-23 09:02:11 +01:00
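
The decision rule stated at the end of cf420b2af0 can be sketched as a small predicate. The function name and parameters are hypothetical, and treating every policy name other than "none" as a valid policy is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical predicate (not a BIND API): a zone uses KASP when it has
 * a dnssec-policy other than "none", or when key state files exist,
 * which indicates the zone was signed before and is going insecure.
 */
bool
zone_uses_kasp(const char *policy_name, bool have_key_state_files) {
	/* "none" is a builtin policy, so a name is always expected. */
	assert(policy_name != NULL);

	bool has_real_policy = (strcmp(policy_name, "none") != 0);

	return has_real_policy || have_key_state_files;
}
```

A zone with policy "none" and leftover key state files still goes through the key manager, which is what allows the DNSSEC records to be withdrawn gradually instead of abruptly.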
Matthijs Mekking
756674f6d1 Small adjustments to kasp rndc_checkds function
Slightly better test output, and only call 'load keys' if the
'rndc checkds' call succeeded.
2020-12-23 09:02:11 +01:00
Matthijs Mekking
fa2e4e66b0 Add tests for going from secure to insecure
Add two test zones that will be reconfigured to go insecure, by
setting the 'dnssec-policy' option to 'none'.

One zone was using inline-signing (implicitly through dnssec-policy),
the other is a dynamic zone.

Two tweaks to the kasp system test are required: we need to set
when to expect the CDS/CDS Delete Records, and we need to know
when we are dealing with a dynamic zone (because the logs to look for
are slightly different, inline-signing prints "(signed)" after the
zone name, dynamic zones do not).
2020-12-23 09:02:11 +01:00
Mark Andrews
09f00ad5dd PYTHON may be null
When Python is not present, PYTHON=$(command -v "@PYTHON@") will exit
the script with status 1; prevent that by adding "|| true".
2020-12-23 09:16:26 +11:00
Matthijs Mekking
f1a097964c Add test for cpu affinity
Add a test to check BIND 9 honors CPU affinity mask. This requires
some changes to the start script, to construct the named command.
2020-12-23 09:16:26 +11:00
Mark Andrews
77372e9e24 Handle shared library platforms that don't support inter-library dependencies 2020-12-21 01:09:45 +00:00
Mark Andrews
08df4f420a Reorder in library dependency order 2020-12-21 01:09:45 +00:00
Michal Nowak
befcbcac28 Fix a reference to rndc(8) in named(8) manual page 2020-12-14 13:10:10 +01:00
Mark Andrews
eb1b29b19e Update dnssec-signzone -N soa-serial-format description
document the autoincrement when the serial would go backwards.
2020-12-11 10:48:28 +01:00
Ondřej Surý
ef685bab5c Print warning when falling back to increment soa serial method
When using the `unixtime` or `date` method to update the SOA serial,
`named` and `dnssec-signzone` would silently fall back to the `increment`
method to prevent the new serial number from being smaller than the old
serial number (using serial number arithmetic).  Add a warning
message when such a fallback happens.
2020-12-11 10:48:28 +01:00
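
A minimal sketch of the fallback described in ef685bab5c, using the serial number comparison from RFC 1982. The function names are hypothetical, and skipping serial 0 on wrap-around is an assumption of this sketch rather than something stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1982 style comparison: true when 'a' is "greater than" 'b'. */
static bool
serial_gt(uint32_t a, uint32_t b) {
	return (int32_t)(a - b) > 0;
}

/*
 * Hypothetical helper: use 'candidate' (e.g. the current Unix time or a
 * date-based value) when it moves the serial forward; otherwise fall
 * back to incrementing the old serial and report that via 'fell_back',
 * which is where a caller would log the warning.
 */
uint32_t
next_soa_serial(uint32_t old_serial, uint32_t candidate, bool *fell_back) {
	assert(fell_back != NULL);

	if (serial_gt(candidate, old_serial)) {
		*fell_back = false;
		return candidate;
	}

	*fell_back = true;
	uint32_t next = old_serial + 1;
	if (next == 0) {
		next = 1; /* assumption: skip serial 0 on wrap-around */
	}
	return next;
}
```

For example, a zone whose serial is the date-style value 2020121101 cannot switch to a `unixtime` candidate of 1607680000, because that would move the serial backwards; the sketch returns 2020121102 and sets the fallback flag.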
Mark Andrews
5684c21bcf Generate PTR records for DNS64 mapped ipv4only.arpa reverses.
Rather than generating CNAME records pointing into IN-ADDR.ARPA,
generate PTR records directly as the names are known as per RFC 8880.
2020-12-11 14:17:52 +11:00
Mark Andrews
cdfe660326 Checking synthesis of AAAA of builtin ipv4only.arpa 2020-12-11 14:17:47 +11:00
Mark Andrews
c51ef23c22 Implement ipv4only.arpa forward and reverse zones as per RFC 8880. 2020-12-11 14:16:40 +11:00
Ondřej Surý
7ba18870dc Reformat sources using clang-format-11 2020-12-08 18:36:23 +01:00
Ondřej Surý
151852f428 Fix datarace when UDP/TCP connect fails and we are in nmthread
When we were in nmthread, the isc__nm_async_<proto>connect() function
executed in the same thread as the isc__nm_<proto>connect() and on a
failure, it would block indefinitely because the failure branch was
setting sock->active to false before the condition around the wait had a
chance to skip the WAIT().

This also fixes the zero system test being stuck on FreeBSD 11, so we
re-enable the test in the commit.
2020-12-03 13:56:34 +01:00
Michał Kępień
6697f6f066 Temporarily disable the "legacy" test on Windows
The current issues with the way dig handles TCP "connection refused"
errors cause the "legacy" system test to consistently fail on Windows
due to the expected strings not being present in dig output.
Temporarily disable the "legacy" system test on Windows by moving it
from the PARALLEL_COMMON list to the PARALLEL_UNIX list until the
situation is rectified.
2020-12-03 12:48:43 +01:00
Ondřej Surý
94afea9325 Don't use stack allocated buffer for uv_write()
On FreeBSD, the stack is destroyed more aggressively than on Linux and
that revealed a bug where we were allocating the 16-bit len for the
TCPDNS message on the stack and the buffer got garbled before the
uv_write() sendback was executed.  Now, the len is part of the uvreq, so
we can safely pass it to the uv_write() as the req gets destroyed after
the sendcb is executed.
2020-12-03 08:58:16 +01:00
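
The idea behind 94afea9325 can be sketched without libuv: the two-byte length prefix lives inside the heap-allocated request object, so it stays valid until the request itself is freed after the send callback has run. The struct and function names below are hypothetical and do not reflect the actual netmgr types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical request object that owns the length prefix. */
typedef struct {
	uint8_t        lenbuf[2]; /* 16-bit message length, network order */
	const uint8_t *msg;       /* message body, owned by the caller */
	uint16_t       msglen;
} toy_uvreq_t;

/*
 * Allocate a request and store the length prefix in it. A pointer to
 * req->lenbuf can be handed to an asynchronous write safely, unlike a
 * pointer to a local variable in a function that returns first.
 */
toy_uvreq_t *
toy_uvreq_new(const uint8_t *msg, uint16_t msglen) {
	assert(msg != NULL);

	toy_uvreq_t *req = malloc(sizeof(*req));
	if (req == NULL) {
		return NULL;
	}
	req->lenbuf[0] = (uint8_t)(msglen >> 8);
	req->lenbuf[1] = (uint8_t)(msglen & 0xff);
	req->msg = msg;
	req->msglen = msglen;
	return req;
}

/* To be called from the send callback, once the write has completed. */
void
toy_uvreq_free(toy_uvreq_t *req) {
	free(req);
}
```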
Ondřej Surý
79c196fc77 Change the default value for nocookie-udp-size back to 4096
The DNS Flag Day 2020 reduced all the EDNS buffer sizes to 1232.  In
this commit, we revert the default value for nocookie-udp-size back to
4096 because the option is too obscure and most people don't realize
that they also need to change this configuration option in addition to
max-udp-size.
2020-12-02 11:06:42 +01:00
Ondřej Surý
0f57732d13 Skip the zero, xfer and ixfr tests on non-Linux platforms
Due to the platform differences, on non-Linux platforms, the xfer and
ixfr tests fail and the zero test gets stuck.

This commit will get reverted when we add support for netmgr
multi-threading.
2020-12-01 17:24:06 +01:00
Ondřej Surý
634bdfb16d Refactor netmgr and add more unit tests
This is a part of the work that intends to make the netmgr stable,
testable, maintainable and tested.  It contains numerous changes to
the netmgr code and unfortunately, it was not possible to split this
into smaller chunks as the work here needs to be committed as a complete
whole.

NOTE: There's quite a lot of duplicated code between udp.c, tcp.c and
tcpdns.c and it should be a subject of refactoring in the future.

The changes that are included in this commit are listed here
(extensively, but not exclusively):

* The netmgr_test unit test was split into individual tests (udp_test,
  tcp_test, tcpdns_test and newly added tcp_quota_test)

* The udp_test and tcp_test have been extended to allow programmatic
  failures from the libuv API.  Unfortunately, we can't use cmocka
  mock() and will_return(), so we emulate the behaviour with #define and
  including the netmgr/{udp,tcp}.c source file directly.

* The netievents that we put on the nm queue have a variable number of
  members; of these, the isc_nmsocket_t and isc_nmhandle_t always
  need to be attached before enqueueing the netievent_<foo> and
  detached after we have called the isc_nm_async_<foo> to ensure that
  the socket (handle) doesn't disappear between scheduling the event and
  actually executing the event.

* Cancelling the in-flight TCP connection using libuv requires calling
  uv_close() on the original uv_tcp_t handle, which just breaks too many
  assumptions we have in the netmgr code.  Instead of using uv_timer for
  TCP connection timeouts, we use a platform-specific socket option.

* Fix the synchronization between {nm,async}_{listentcp,tcpconnect}

  When isc_nm_listentcp() or isc_nm_tcpconnect() was called, it was
  waiting for the socket to either end up with an error (that path was
  fine) or to be listening or connected, using a condition variable and
  mutex.

  Several things could happen:

    0. everything is ok

    1. the waiting thread would miss the SIGNAL() - because the enqueued
       event would be processed faster than we could start WAIT()ing.
       In case the operation would end up with error, it would be ok, as
       the error variable would be unchanged.

    2. the waiting thread would miss sock->{connected,listening} being
       `true`, as it would be set back to `false` in the
       tcp_{listen,connect}close_cb() because the connection would be so
       short-lived that the socket would be closed before we could even
       start WAIT()ing

* The tcpdns has been converted to using libuv directly.  Previously,
  the tcpdns protocol used the tcp protocol from netmgr; this proved to
  be very complicated to understand, fix and make changes to.  The new
  tcpdns protocol is modeled similarly to the tcp netmgr protocol.
  Closes: #2194, #2283, #2318, #2266, #2034, #1920

* The tcp and tcpdns are no longer using isc_uv_import/isc_uv_export to
  pass accepted TCP sockets between netthreads, but instead (similar to
  UDP) uses per netthread uv_loop listener.  This greatly reduces the
  complexity as the socket is always run in the associated nm and uv
  loops, and we are also not touching the libuv internals.

  There's an unfortunate side effect though, the new code requires
  support for load-balanced sockets from the operating system for both
  UDP and TCP (see #2137).  If the operating system doesn't support the
  load balanced sockets (either SO_REUSEPORT on Linux or SO_REUSEPORT_LB
  on FreeBSD 12+), the number of netthreads is limited to 1.

* The netmgr now has two debugging #ifdefs:

  1. The already existing NETMGR_TRACE prints any dangling nmsockets and
     nmhandles before triggering an assertion failure.  This option would
     reduce performance when enabled, but in theory, it could be enabled
     on low-performance systems.

  2. New NETMGR_TRACE_VERBOSE option has been added that enables
     extensive netmgr logging that allows the software engineer to
     precisely track any attach/detach operations on the nmsockets and
     nmhandles.  This is not suitable for any kind of production
     machine, only for debugging.

* The tlsdns netmgr protocol has been split from the tcpdns and it still
  uses the old method of stacking the netmgr boxes on top of each other.
  We will have to refactor the tlsdns netmgr protocol to use the same
  approach - build the stack using only libuv and openssl.

* Limit but not assert the tcp buffer size in tcp_alloc_cb
  Closes: #2061
2020-12-01 16:47:07 +01:00
Mark Andrews
ab0bf49203 Adjust default value of "max-recursion-queries"
Since the queries sent towards root and TLD servers are now included in
the count (as a result of the fix for CVE-2020-8616),
"max-recursion-queries" has a higher chance of being exceeded by
non-attack queries.  Increase its default value from 75 to 100.
2020-12-01 23:47:23 +11:00
Michal Nowak
9567cefd39 Drop bin/tests/headerdep_test.sh.in
The bin/tests/headerdep_test.sh script has not been updated since it was
first created and it cannot be used as-is with the current BIND source
code.  Better tools (e.g. "include-what-you-use") emerged since the
script was committed back in 2000, so instead of trying to bring it up
to date, remove it from the source repository.
2020-11-27 13:11:41 +01:00
Mark Andrews
bd9155590e Check that missing cookies are handled 2020-11-26 20:48:46 +00:00
Michal Nowak
6428fc26af Write traceback file to the same directory as core file
The traceback files could overwrite each other on systems which do not
use different core dump file names for different processes.  Prevent
that by writing the traceback file to the same directory as the core
dump file.

These changes still do not prevent the operating system from overwriting
a core dump file if the same binary crashes multiple times in the same
directory and core dump files are named identically for different
processes.
2020-11-26 18:01:34 +01:00
Mark Andrews
0f0a006c7e Unify whitespace in bin/tests/system/run.sh.in
Replace tabs with spaces to make whitespace consistent across the entire
bin/tests/system/run.sh.in script.
2020-11-26 18:01:33 +01:00
Matthijs Mekking
6b5d7357df Detect NSEC3 salt collisions
When generating a new salt, compare it with the previous NSEC3
parameters to ensure the new parameters are different from the
previous ones.

This moves the salt generation call from 'bin/named/*.s' to
'lib/dns/zone.c'. When setting new NSEC3 parameters, you can set a new
function parameter 'resalt' to enforce a new salt to be generated. A
new salt will also be generated if 'salt' is set to NULL.

Logging salt with zone context can now be done with 'dnssec_log',
removing the need for 'dns_nsec3_log_salt'.
2020-11-26 10:43:59 +01:00
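
The collision check described in 6b5d7357df can be sketched as "generate, compare with the previous salt, retry if equal". All names below are hypothetical, the retry limit is an assumption, and the demo generator is a plain counter used only to make the sketch deterministic; it is not a substitute for a real random source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generator signature: fill 'buf' with 'len' salt bytes. */
typedef void (*salt_gen_fn)(uint8_t *buf, size_t len);

/* True when both salts have the same length and the same bytes. */
bool
salt_collides(const uint8_t *a, size_t alen, const uint8_t *b, size_t blen) {
	return alen == blen && (alen == 0 || memcmp(a, b, alen) == 0);
}

/*
 * Write a salt of 'len' bytes to 'out' that differs from the previous
 * salt. Returns false if every one of 'max_tries' attempts collided.
 */
bool
generate_fresh_salt(const uint8_t *prev, size_t prevlen, uint8_t *out,
		    size_t len, salt_gen_fn gen, unsigned int max_tries) {
	assert(out != NULL && gen != NULL);

	for (unsigned int i = 0; i < max_tries; i++) {
		gen(out, len);
		if (!salt_collides(prev, prevlen, out, len)) {
			return true;
		}
	}
	return false;
}

/* Demo generator for this sketch only: a counter, NOT random. */
void
demo_counter_gen(uint8_t *buf, size_t len) {
	static uint8_t counter = 0;

	memset(buf, counter++, len);
}
```

Passing the generator as a parameter keeps the comparison logic testable: with the counter generator and a previous salt of all zero bytes, the first attempt collides and the second attempt is accepted.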
Matthijs Mekking
3b4c764b43 Add zone context to "generated salt" logs 2020-11-26 10:43:59 +01:00
Matthijs Mekking
7878f300ff Move logging of salt in separate function
There may be a desire to log the salt without losing the context
of log module, level, and category.
2020-11-26 10:43:59 +01:00
Matthijs Mekking
6f97bb6b1f Change nsec3param salt config to saltlen
Upon request from Mark, change the configuration of salt to salt
length.

Introduce a new function 'dns_zone_checknsec3aram' that can be used
upon reconfiguration to check if the existing NSEC3 parameters are
in sync with the configuration. If a salt is used that matches the
configured salt length, don't change the NSEC3 parameters.
2020-11-26 10:43:59 +01:00
Matthijs Mekking
00c5dabea3 Add check for NSEC3 and key algorithms
NSEC3 is not backwards compatible with key algorithms that existed
before the RFC 5155 specification was published.
2020-11-26 10:43:59 +01:00
Matthijs Mekking
f10790b02d Disable one nsec3 test due to GL #2216
This known bug makes the test fail. There is no trivial fix so disable
test case for now.
2020-11-26 10:43:59 +01:00
Matthijs Mekking
a5b45bdd03 Add some NSEC3 optout tests
Make sure that just changing the optout value recreates the chain.
2020-11-26 10:43:27 +01:00