Commit Graph

32690 Commits

Author SHA1 Message Date
Diego Fronza
bea63000db Add system tests for stale-answer-client-timeout
This commit add 4 tests for the new option:
	1. Test default configuration of stale-answer-client-timeout, a
	   value of 1.8 seconds, with stale-refresh-time disabled.

	2. Test disabling of stale-answer-client-timeout.

	3. Test stale-answer-client-timeout with a value of zero, in this
	   case we take advantage of a log entry which shows that a stale
	   answer was promptly used before an attempt to refresh the RRset
	   is made. We also check, by activating a disabled authoritative
	   server, that the RRset was successfully refreshed after that.

	4. Test stale-answer-client-timeout 0 with stale-refresh-time 4, in
	   this test we want to ensure a couple things:

	   - If we have a stale RRSet entry in cache, a request must be
		 promptly answered with this data, while BIND must also attempt
		 to refresh the RRSet in background.

	   - If the attempt to refresh the RRSet times out, the RRSet must
		 have its stale-refresh-time window activated.

	   - If a new request for the same RRSet arrives, it must be
		 promptly answered with stale data due to stale-refresh-time
		 being active for this RRSet, in this case no attempt to refresh
		 the RRSet is made.

	   - Enable authoritative server, ensure that the RRSet was not
		 refreshed, to honor stale-refresh-time.

	   - Wait for stale-refresh-window time pass, send another request
		 for the same RRSet, this time we expect the answer to be the
		 stale entry in cache being hit due to
		 stale-answer-client-timeout 0.

	    - Send another request, this time we expect the answer to be an
		  active RRSet, since it must have been refreshed during the
		  previous request.

(cherry picked from commit 35fd039d03)
2021-01-29 10:39:20 +01:00
Diego Fronza
8324c3ddfe Allow stale data to be used before name resolution
This commit allows stale RRset to be used (if available) for responding
a query, before an attempt to refresh an expired, or otherwise resolve
an unavailable RRset in cache is made.

For that to work, a value of zero must be specified for
stale-answer-client-timeout statement.

To better understand the logic implemented, there are three flags being
used during database lookup and other parts of code that must be
understood:

. DNS_DBFIND_STALEOK: This flag is set when BIND fails to refresh a
  RRset due to timeout (resolver-query-timeout), its intent is to
  try to look for stale data in cache as a fallback, but only if
  stale answers are enabled in configuration.

  This flag is also used to activate stale-refresh-time window, since it
  is the only way the database knows that a resolution has failed.

. DNS_DBFIND_STALEENABLED: This flag is used as a hint to the database
  that it may use stale data. It is always set during query lookup if
  stale answers are enabled, but only effectively used during
  stale-refresh-time window. Also during this window, the resolver will
  not try to resolve the query, in other words no attempt to refresh the
  data in cache is made when the stale-refresh-time window is active.

. DNS_DBFIND_STALEONLY: This new introduced flag is used when we want
  stale data from the database, but not due to a failure in resolution,
  it also doesn't require stale-refresh-time window timer to be active.
  As long as there is a stale RRset available, it should be returned.
  It is mainly used in two situations:

    1. When stale-answer-client-timeout timer is triggered: in that case
       we want to know if there is stale data available to answer the
       client.
    2. When stale-answer-client-timeout value is set to zero: in that
       case, we also want to know if there is some stale RRset available
       to promptly answer the client.

We must also discern between three situations that may happen when
resolving a query after the addition of stale-answer-client-timeout
statement, and how to handle them:

	1. Are we running query_lookup() due to stale-answer-client-timeout
       timer being triggered?

       In this case, we look for stale data, making use of
       DNS_DBFIND_STALEONLY flag. If a stale RRset is available then
       respond the client with the data found, mark this query as
       answered (query attribute NS_QUERYATTR_ANSWERED), so when the
       fetch completes the client won't be answered twice.

       We must also take care of not detaching from the client, as a
       fetch will still be running in background, this is handled by the
       following snippet:

       if (!QUERY_STALEONLY(&client->query)) {
           isc_nmhandle_detach(&client->reqhandle);
       }

       Which basically tests if DNS_DBFIND_STALEONLY flag is set, which
       means we are here due to a stale-answer-client-timeout timer
       expiration.

    2. Are we running query_lookup() due to resolver-query-timeout being
       triggered?

       In this case, DNS_DBFIND_STALEOK flag will be set and an attempt
       to look for stale data will be made.
       As already explained, this flag is algo used to activate
       stale-refresh-time window, as it means that we failed to refresh
       a RRset due to timeout.
       It is ok in this situation to detach from the client, as the
       fetch is already completed.

    3. Are we running query_lookup() during the first time, looking for
       a RRset in cache and stale-answer-client-timeout value is set to
       zero?

       In this case, if stale answers are enabled (probably), we must do
       an initial database lookup with DNS_DBFIND_STALEONLY flag set, to
       indicate to the database that we want stale data.

       If we find an active RRset, proceed as normal, answer the client
       and the query is done.

       If we find a stale RRset we respond to the client and mark the
       query as answered, but don't detach from the client yet as an
       attempt in refreshing the RRset will still be made by means of
       the new introduced function 'query_resolve'.

       If no active or stale RRset is available, begin resolution as
       usual.

(cherry picked from commit e219422575)
2021-01-29 10:39:09 +01:00
Diego Fronza
0aebad96b5 Added option for disabling stale-answer-client-timeout
This commit allows to specify "disabled" or "off" in
stale-answer-client-timeout statement. The logic to support this
behavior will be added in the subsequent commits.

This commit also ensures an upper bound to stale-answer-client-timeout
which equals to one second less than 'resolver-query-timeout'.

(cherry picked from commit 0ad6f594f6)
2021-01-29 10:38:58 +01:00
Diego Fronza
93ca968644 Adjusted serve-stale test
After the addition of stale-answer-client-timeout a test was broken due
to the following behavior expected by the test.

1. Prime cache data.example txt.
2. Disable authoritative server.
3. Send a query for data.example txt.
4. Recursive server will timeout and answer from cache with stale RRset.
5. Recursive server will activate stale-refresh-time due to the previous
   failure in attempting to refresh the RRset.
6. Send a query for data.example txt.
7. Expect stale answer from cache due to stale-refresh-time
window being active, even if authoritative server is up.

Problem is that in step 4, due to the new option
stale-answer-client-timeout, recursive server will answer with stale
data before the actual fetch completes.

Since the original fetch is still running in background, if we re-enable
the authoritative server during that time, the RRset will actually be
successfully refreshed, and stale-refresh-window will not be activated.

The next queries will fail because they expect the TTL of the RRset to
match the one in the stale cache, not the one just refreshed.

To solve this, we explicitly disable stale-answer-client-timeout for
this test, as it's not the feature we are interested in testing here
anyways.

(cherry picked from commit a12bf4b61b)
2021-01-29 10:38:47 +01:00
Diego Fronza
3478794a5d Add stale-answer-client-timeout option
The general logic behind the addition of this new feature works as
folows:

When a client query arrives, the basic path (query.c / ns_query_recurse)
was to create a fetch, waiting for completion in fetch_callback.

With the introduction of stale-answer-client-timeout, a new event of
type DNS_EVENT_TRYSTALE may invoke fetch_callback, whenever stale
answers are enabled and the fetch took longer than
stale-answer-client-timeout to complete.

When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we
must ensure that the folowing happens:

1. Setup a new query context with the sole purpose of looking up for
   stale RRset only data, for that matters a new flag was added
   'DNS_DBFIND_STALEONLY' used in database lookups.

    . If a stale RRset is found, mark the original client query as
      answered (with a new query attribute named NS_QUERYATTR_ANSWERED),
      so when the fetch completion event is received later, we avoid
      answering the client twice.

    . If a stale RRset is not found, cleanup and wait for the normal
      fetch completion event.

2. In ns_query_done, we must change this part:
	/*
	 * If we're recursing then just return; the query will
	 * resume when recursion ends.
	 */
	if (RECURSING(qctx->client)) {
		return (qctx->result);
	}

   To this:

	if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) {
		return (qctx->result);
	}

   Otherwise we would not proceed to answer the client if it happened
   that a stale answer was found when looking up for stale only data.

When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we
proceed as before, resuming query, updating stats, etc, but a few
exceptions had to be added, most important of which are two:

1. Before answering the client (ns_client_send), check if the query
   wasn't already answered before.

2. Before detaching a client, e.g.
   isc_nmhandle_detach(&client->reqhandle), ensure that this is the
   fetch completion event, and not the one triggered due to
   stale-answer-client-timeout, so a correct call would be:
   if (!QUERY_STALEONLY(client)) {
        isc_nmhandle_detach(&client->reqhandle);
   }

Other than these notes, comments were added in code in attempt to make
these updates easier to follow.

(cherry picked from commit 171a5b7542)
2021-01-29 10:38:32 +01:00
Diego Fronza
7bf8950a0a Added dns_view_staleanswerenabled() function
Since it takes a couple lines of code to check whether stale answers
are enabled for a given view, code was extracted out to a proper
function.

(cherry picked from commit 74840ec50b)
2021-01-29 10:35:26 +01:00
Diego Fronza
f3bd27373d Avoid iterating name twice when constructing fctx->info
This is a minor performance improvement, we store the result of the
first call to strlcat to use as an offset in the next call when
constructing fctx->info string.

(cherry picked from commit 49c40827f6)
2021-01-29 10:35:17 +01:00
Mark Andrews
443e743156 Merge branch '2420-xmlfreetextwriter-could-be-called-twice-v9_16' into 'v9_16'
Resolve "CID 316510: Memory - corruptions (USE_AFTER_FREE)"

See merge request isc-projects/bind9!4617
2021-01-28 22:40:11 +00:00
Mark Andrews
73abc2e09c Add CHANGES entry for [GL #2420]
(cherry picked from commit 95114f7d60)
2021-01-28 21:42:44 +00:00
Mark Andrews
f217a0cbae Stop xmlFreeTextWriter being called twice
xmlFreeTextWriter could be called twice if xmlDocDumpFormatMemoryEnc
failed.

(cherry picked from commit b5cf54252a)
2021-01-28 21:42:44 +00:00
Michał Kępień
c2856d8ee4 Merge branch '1515-fix-backported-changes-entry' into 'v9_16'
Fix backported CHANGES entry for GL #1515

See merge request isc-projects/bind9!4619
2021-01-28 13:41:51 +00:00
Michał Kępień
085704d780 Fix backported CHANGES entry for GL #1515
Backported changes should reuse the original CHANGES entry rather than
add a new one.  As the change addressing GL #1515 was originally
described in CHANGES entry 5362, make its backport reuse that entry.
2021-01-28 14:38:21 +01:00
Mark Andrews
0e6c0df91b Merge branch 'marka-changes-line-length-v9_16' into 'v9_16'
Marka changes line length v9 16

See merge request isc-projects/bind9!4615
2021-01-28 05:16:53 +00:00
Mark Andrews
bcb5af3666 fix overly long line
(cherry picked from commit 28449acded)
2021-01-28 15:08:41 +11:00
Mark Andrews
0dd5483cb0 Detect overly long CHANGES lines
(cherry picked from commit b1ecab6383)
2021-01-28 15:08:09 +11:00
Mark Andrews
5647c7a86b Merge branch '2413-after-upgrade-to-bind9-9-16-11-named-is-killed-with-status-11-segv-v9_16' into 'v9_16'
Resolve "after upgrade to bind9 9.16.11 named is killed with status=11/SEGV"

See merge request isc-projects/bind9!4614
2021-01-28 03:09:17 +00:00
Mark Andrews
4d08f4aa4f Add release note for [GL #2413]
(cherry picked from commit 79fad620a2)
2021-01-28 13:43:48 +11:00
Mark Andrews
cc1d9c6e7e Add CHANGES for [GL #2413]
(cherry picked from commit 5ec9999b28)
2021-01-28 13:43:47 +11:00
Mark Andrews
abb2eb9d97 Add a named acl example
(cherry picked from commit 3dee62cfa5)
2021-01-28 13:43:47 +11:00
Mark Andrews
6a0b751555 Require 'ctx' to be non-NULL in cfg_acl_fromconfig{,2}
(cherry picked from commit a8b55992a8)
2021-01-28 13:43:47 +11:00
Mark Andrews
85318b521d Pass an afg_aclconfctx_t structure to cfg_acl_fromconfig
in named_zone_inlinesigning.  A NULL pointer does not work.

(cherry picked from commit 2b3fcd7156)
2021-01-28 13:43:47 +11:00
Mark Andrews
9cd1271c2f Merge branch '2391-check-nsupdate-y-for-all-hmac-algorithms-v9_16' into 'v9_16'
Check that 'nsupdate -y' works for all HMAC algorithms

See merge request isc-projects/bind9!4612
2021-01-28 02:33:53 +00:00
Mark Andrews
a30675998b Check that 'nsupdate -y' works for all HMAC algorithms
(cherry picked from commit 4b01ba44ea)
2021-01-28 12:57:03 +11:00
Mark Andrews
2e28d06e18 Merge branch '2073-dnssec-verify-tries-all-keys-which-results-in-poor-performance-v9_16' into 'v9_16'
Resolve "dnssec-verify tries all keys which results in poor performance"

See merge request isc-projects/bind9!4611
2021-01-28 01:52:29 +00:00
Mark Andrews
63d22d4a57 Add CHANGES note
(cherry picked from commit 3f0859d223)
2021-01-28 12:18:31 +11:00
Mark Andrews
afc75de0cc Optimise dnssec-verify
dns_dnssec_keyfromrdata() only needs to be called once per DNSKEY
rather than once per verification attempt.

(cherry picked from commit c75b325832)
2021-01-28 12:18:31 +11:00
Mark Andrews
e5229a9c36 Merge branch '2342-rndc-retransfer-issues-misleading-diagnostic-on-primary-zone-v9_16' into 'v9_16'
Resolve "rndc retransfer issues misleading diagnostic on primary zone"

See merge request isc-projects/bind9!4610
2021-01-28 00:06:10 +00:00
Mark Andrews
008de80f5a Add CHANGES
(cherry picked from commit 1f55f49f21)
2021-01-28 09:44:26 +11:00
Mark Andrews
52b73db20b Check 'rndc retransfer' of primary error message
(cherry picked from commit 8f36b8567a)
2021-01-28 09:44:26 +11:00
Mark Andrews
b416d8fcdf Improve the diagnostic 'rndc retransfer' error message
(cherry picked from commit dd3520ae41)
2021-01-28 09:44:26 +11:00
Matthijs Mekking
59d78bf729 Merge branch '2178-dnssec-fromlabel-ec_crash-v9_16' into 'v9_16'
Resolve "dnssec-keyfromlabel  ECDSAP256SHA256 error on AEP Keypers HSM"

See merge request isc-projects/bind9!4602
2021-01-26 15:08:07 +00:00
Matthijs Mekking
4a36b6d918 Make opensslecdsa_parse use fromlabel
When 'opensslecdsa_parse()' encounters a label tag in the private key
file, load the private key with 'opensslecdsa_fromlabel()'. Otherwise
load it from the private structure.

This was attempted before with 'load_privkey()' and 'uses_engine()',
but had the same flaw as 'opensslecdsa_fromlabel()' had previously,
that is getting the private and public key separately, juggling with
pointers between EC_KEY and EVP_PKEY, did not create a valid
cryptographic key that could be used for signing.

(cherry picked from commit 57ac70ad46)
2021-01-26 15:04:59 +01:00
Matthijs Mekking
97185ecac2 Simplify opensslecdsa_fromlabel
The 'opensslecdsa_fromlabel()' function does not need to get the
OpenSSL engine twice to load the private and public key. Also no need
to call 'dst_key_to_eckey()' as the EC_KEY can be derived from the
loaded EVP_PKEY's.

Add some extra checks to ensure the key has the same base id and curve
(group nid) as the dst key.

Since we already have the EVP_PKEY, no need to call 'finalize_eckey()',
instead just set the right values in the key structure.

(cherry picked from commit 393052d6ff)
2021-01-26 15:04:51 +01:00
Matthijs Mekking
f555cec0af Replace EVP_DigestFinal with EVP_DigestFinal_ex
The openssl docs claim that EVP_DigestFinal() is obsolete and that
one should use EVP_DigestFinal_ex() instead.

(cherry picked from commit 1fcd0ef8bd)
2021-01-26 15:04:38 +01:00
Matthijs Mekking
56b0861049 Add notes and changes for [#2178]
(cherry picked from commit 37d11f5be0)
2021-01-26 15:04:30 +01:00
Matthijs Mekking
9e2ea5efb1 Don't set pubkey if eckey already has public key
The 'ecdsa_check()' function tries to correctly set the public key
on the eckey, but this should be skipped if the public key is
retrieved via the private key.

(cherry picked from commit 06b9724152)
2021-01-26 15:04:21 +01:00
Matthijs Mekking
e3acfb44d5 ECDSA code should not use RSA label
The 'opensslecdsa_tofile()' function tags the label as an RSA label,
that is a copy paste error and should be of course an ECDSA label.

(cherry picked from commit 46afeca8bf)
2021-01-26 15:04:11 +01:00
Matthijs Mekking
8b25d3ab57 Correctly update pointers to pubkey and privkey
The functions 'load_pubkey_from_engine()' and
'load_privkey_from_engine()' did not correctly store the pointers.

Update both functions to add 'EC_KEY_set_public_key()' and
'EC_KEY_set_private_key()' respectively, so that the pointers to
the public and private keys survive the "load from engine" functions.

(cherry picked from commit 01239691a1)
2021-01-26 15:04:03 +01:00
Matthijs Mekking
f66df9f1b7 load_pubkey_from_engine() should load public key
The 'function load_pubkey_from_engine()' made a call to the libssl
function 'ENGINE_load_private_key'.  This is a copy paste error and
should be 'ENGINE_load_public_key'.

(cherry picked from commit 370285a62d)
2021-01-26 15:03:43 +01:00
Ondřej Surý
d77ac5c767 Merge branch '2403-dig-has-a-fit-with-option-multi-typo-on-multi-v9_16' into 'v9_16'
Report unknown dash option during the pre-parse phase (v9.16)

See merge request isc-projects/bind9!4600
2021-01-26 13:26:47 +00:00
Mark Andrews
77ae42b68c Add CHANGES note for [GL #2403]
(cherry picked from commit 0b6da18f31)
2021-01-26 14:19:12 +01:00
Mark Andrews
702a00d10e Report unknown dash option during the pre-parse phase
(cherry picked from commit 3361c0d6f8)
2021-01-26 14:18:54 +01:00
Ondřej Surý
2bcb00919f Merge branch '2349-backport-max-ixfr-ration-v9_16' into 'v9_16'
Backport max-ixfr-ratio to BIND 9.16

See merge request isc-projects/bind9!4598
2021-01-26 12:10:22 +00:00
Evan Hunt
f5362ed135 CHANGES and release note 2021-01-26 12:38:32 +01:00
Evan Hunt
62202b0e6d prevent ixfr/ns1 being removed 2021-01-26 12:38:32 +01:00
Evan Hunt
077e2c2a74 add serial number to "transfer ended" log messages 2021-01-26 12:38:32 +01:00
Evan Hunt
9529d1ed0d add a system test for AXFR fallback when max-ixfr-ratio is exceeded
also cleaned up the ixfr system test:

- use retry_quiet when applicable
- use scripts to generate test zones
- improve consistency
2021-01-26 12:38:32 +01:00
Evan Hunt
2df6ffc051 check size ratio when responding to IXFR requests 2021-01-26 12:38:32 +01:00
Evan Hunt
9950247c78 improve calculation of database transfer size
- change name of 'bytes' to 'xfrsize' in dns_db_getsize() parameter list
  and related variables; this is a more accurate representation of what
  the function is doing
- change the size calculations in dns_db_getsize() to more accurately
  represent the space needed for a *XFR message or journal file to contain
  the data in the database. previously we returned the sizes of all
  rdataslabs, including header overhead and offset tables, which
  resulted in the database size being reported as much larger than the
  equivalent *XFR or journal.
- map files caused a particular problem here: the fullname can't be
  determined from the node while a file is being deserialized, because
  the uppernode pointers aren't set yet. so we store "full name length"
  in the dns_rbtnode structure while serializing, and clear it after
  deserialization is complete.
2021-01-26 12:38:32 +01:00
Evan Hunt
70df95e9f5 dns_journal_iter_init() can now return the size of the delta
the call initailizing a journal iterator can now optionally return
to the caller the size in bytes of an IXFR message (not including
DNS header overhead, signatures etc) containing the differences from
the beginning to the ending serial number.

this is calculated by scanning the journal transaction headers to
calculate the transfer size. since journal file records contain a length
field that is not included in IXFR messages, we subtract out the length
of those fields from the overall transaction length.

this necessitated adding an "RR count" field to the journal transaction
header, so we know how many length fields to subract. NOTE: this will
make existing journal files stop working!
2021-01-26 12:38:32 +01:00