32737 Commits

Author SHA1 Message Date
Matthijs Mekking
70fbbedc24 Adjust serve-stale test
The number of queries to use in the burst can be reduced, as we have
a very low fetch limit of 1.

The dig command in 'wait_for_fetchlimits()' should time out sooner as
we expect a SERVFAIL to be returned promptly.

Enabling serve-stale can be done before hitting fetch-limits. This
reduces the chance that the resolver queries time out and fetch count
is reset. The chance of that happening is already slim because
'resolver-query-timeout' is 10 seconds, but better to first let the
data become stale rather than doing that while attempting to resolve.

(cherry picked from commit 00f575e7ef)
2021-02-08 16:10:12 +01:00
Matthijs Mekking
2afaff75ed Use stale on error also when unable to recurse
The 'query_usestale()' function was only called when in
'query_gotanswer()' and an unexpected error occurred. This may have
been "quota reached", and thus we were in some cases returning
stale data on fetch-limits (and if serve-stale enabled of course).

But we can also hit fetch-limits when recursing because we are
following a referral (in 'query_notfound()' and
'query_delegation_recurse()'). Here we should also check for using
stale data in case an error occurred.

Specifically don't check for using stale data when refetching a
zero TTL RRset from cache.

Move the setting of DNS_DBFIND_STALESTART into the 'query_usestale()'
function to avoid code duplication.

(cherry picked from commit 8bcd7fe69e)
2021-02-08 16:10:03 +01:00
Matthijs Mekking
e02ce9e833 Add notes and change entry for [#2434]
This concludes the serve-stale improvements.

(cherry picked from commit ed8421693c)
2021-02-08 16:09:36 +01:00
Matthijs Mekking
2e28e5587e Add test for serve-stale with fetch-limits
Add a test case when fetch-limits are reached and we have stale data
in cache.

This test starts with a positive answer for 'data.example/TXT' in
cache.

1. Reload named.conf to set fetch limits.
2. Disable responses from the authoritative server.
3. Now send a batch of queries to the resolver, until hitting the
   fetch limits. We can detect this by looking at the response RCODE:
   at some point we will see SERVFAIL responses.
4. At that point we will turn on serve-stale.
5. Clients should see stale answers now.
6. An incoming query should not set the stale-refresh-time window,
   so a following query should still get a stale answer because of a
   resolver failure (and not because it was in the stale-refresh-time
   window).

(cherry picked from commit 11b74fc176)
2021-02-08 16:07:43 +01:00
Matthijs Mekking
dbf5428629 Only start stale refresh window when resuming
If we did not attempt a fetch due to fetch-limits, we should not start
the stale-refresh-time window.

Introduce a new flag DNS_DBFIND_STALESTART to differentiate between
a resolver failure and an unexpected error. If we are resuming, this
indicates a resolver failure, so start the stale-refresh-time window.
Otherwise don't start the stale-refresh-time window, but still fall
back to using stale data.
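
The distinction described above can be sketched as follows. This is a minimal illustration only: the `SKETCH_*` flag names, their bit values, and both helper functions are hypothetical and are not taken from the BIND source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; names and values are illustrative, not BIND's. */
#define SKETCH_STALEOK    0x1u /* fall back to stale data */
#define SKETCH_STALESTART 0x2u /* also start the stale-refresh-time window */

/* Build the lookup flags used for the stale fallback after an error. */
unsigned int
sketch_stale_flags(bool resuming) {
	unsigned int flags = SKETCH_STALEOK;

	if (resuming) {
		/* A fetch was attempted and failed: a resolver failure. */
		flags |= SKETCH_STALESTART;
	}
	return flags;
}

/* Tell whether a lookup with these flags starts the refresh window. */
bool
sketch_starts_window(unsigned int flags) {
	/* In this sketch STALESTART is only ever set together with STALEOK. */
	assert((flags & SKETCH_STALESTART) == 0 ||
	       (flags & SKETCH_STALEOK) != 0);
	return (flags & SKETCH_STALESTART) != 0;
}
```

In this sketch, a query that was never fetched because of fetch-limits still gets the fallback bit but never starts the window.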

(This commit also wraps some docstrings to 80 characters width)

(cherry picked from commit aabdedeae3)
2021-02-08 16:07:43 +01:00
Matthijs Mekking
809ec0a224 Use stale data also if we are not resuming
Before this change, BIND would only fall back to using stale data if
there was an actual attempt to resolve the query. Then, on a timeout,
the stale data from cache became eligible.

This commit changes this so that on any unexpected error stale data
becomes eligible (you would still have to have 'stale-answer-enable'
enabled of course).

If there is no stale data, this may result in an error again, so don't
loop on stale data lookup attempts. If the DNS_DBFIND_STALEOK flag is
set, this means we already tried to look up stale data, so if that is
the case, don't use stale data again.
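
The loop guard can be illustrated with a small sketch. The flag name, its value, and `sketch_try_stale()` are assumptions made for this example; the real logic lives in BIND's query code and is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit meaning "a stale lookup was already attempted". */
#define SKETCH_STALEOK 0x1u

/*
 * Decide whether an unexpected error may trigger a stale-data lookup.
 * When it may, the flag is set so that a second error cannot retry.
 */
bool
sketch_try_stale(unsigned int *flags, bool stale_answers_enabled) {
	assert(flags != NULL);

	if (!stale_answers_enabled) {
		return false; /* stale answers are switched off */
	}
	if ((*flags & SKETCH_STALEOK) != 0) {
		return false; /* already tried once: do not loop */
	}
	*flags |= SKETCH_STALEOK;
	return true;
}
```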

(cherry picked from commit c6fd02aed5)
2021-02-08 16:07:43 +01:00
Mark Andrews
6642078698 Merge branch '2468-cid-318094-null-pointer-dereferences-reverse_inull-v9_16' into 'v9_16'
Remove redundant 'version == NULL' check

See merge request isc-projects/bind9!4663
2021-02-08 05:39:01 +00:00
Mark Andrews
8092b7eec6 Remove redundant 'version == NULL' check
*** CID 318094:  Null pointer dereferences  (REVERSE_INULL)
    /lib/dns/rbtdb.c: 1389 in newversion()
    1383     	version->xfrsize = rbtdb->current_version->xfrsize;
    1384     	RWUNLOCK(&rbtdb->current_version->rwlock, isc_rwlocktype_read);
    1385     	rbtdb->next_serial++;
    1386     	rbtdb->future_version = version;
    1387     	RBTDB_UNLOCK(&rbtdb->lock, isc_rwlocktype_write);
    1388
       CID 318094:  Null pointer dereferences  (REVERSE_INULL)
       Null-checking "version" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
    1389     	if (version == NULL) {
    1390     		return (result);
    1391     	}
    1392
    1393     	*versionp = version;
    1394

(cherry picked from commit 456d53d1fb)
2021-02-08 16:17:52 +11:00
Mark Andrews
6781bf3eac Merge branch '1697-isc_rwlock_init-can-no-longer-fail-in-master-clean-up-calls-v9_16' into 'v9_16'
Cleanup redundant isc_rwlock_init() result checks

See merge request isc-projects/bind9!4662
2021-02-08 05:13:40 +00:00
Mark Andrews
a900d79ea8 Cleanup redundant isc_rwlock_init() result checks
(cherry picked from commit 3b11bacbb7)
2021-02-08 15:13:49 +11:00
Mark Andrews
7358a58db4 Merge branch '2469-cid-281461-untrusted-loop-bound-v9_16' into 'v9_16'
Attempt to silence untrusted loop bound

See merge request isc-projects/bind9!4661
2021-02-08 03:59:47 +00:00
Mark Andrews
c2a5b88275 Attempt to silence untrusted loop bound
Assign hit_len + key_len to len and test the result
rather than recomputing and letting the compiler simplify.

    213        isc_region_consume(&region, 2); /* hit length + algorithm */
        9. tainted_return_value: Function uint16_fromregion returns tainted data.
        10. tainted_data_transitive: Call to function uint16_fromregion with tainted argument *region.base returns tainted data.
        11. tainted_return_value: Function uint16_fromregion returns tainted data.
        12. tainted_data_transitive: Call to function uint16_fromregion with tainted argument *region.base returns tainted data.
        13. var_assign: Assigning: key_len = uint16_fromregion(&region), which taints key_len.
    214        key_len = uint16_fromregion(&region);
        14. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound.
        15. Condition key_len == 0, taking false branch.
    215        if (key_len == 0) {
    216                RETERR(DNS_R_FORMERR);
    217        }
        16. Condition !!(_r->length >= _l), taking true branch.
        17. Condition !!(_r->length >= _l), taking true branch.
    218        isc_region_consume(&region, 2);
        18. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound.
        19. Condition region.length < (unsigned int)(hit_len + key_len), taking false branch.
    219        if (region.length < (unsigned)(hit_len + key_len)) {
    220                RETERR(DNS_R_FORMERR);
    221        }
    222
        20. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound.
        21. Condition _r != 0, taking false branch.
    223        RETERR(mem_tobuffer(target, rr.base, 4 + hit_len + key_len));
        22. lower_bounds: Casting narrower unsigned key_len to wider signed type int effectively tests its lower bound.
        23. var_assign_var: Compound assignment involving tainted variable 4 + hit_len + key_len to variable source->current taints source->current.
    224        isc_buffer_forward(source, 4 + hit_len + key_len);
    225
    226        dns_decompress_setmethods(dctx, DNS_COMPRESS_NONE);

    CID 281461 (#1 of 1): Untrusted loop bound (TAINTED_SCALAR)
        24. tainted_data: Using tainted variable source->active - source->current as a loop boundary.
    Ensure that tainted values are properly sanitized, by checking that their values are within a permissible range.
    227        while (isc_buffer_activelength(source) > 0) {
    228                dns_name_init(&name, NULL);
    229                RETERR(dns_name_fromwire(&name, source, dctx, options, target));
    230        }
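
The idea of computing the combined length once and testing that single value can be sketched as below. This is not the code from the commit: the function name, the parameter types (a one-octet HIT length and a two-octet key length), and the output parameter are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if a region of 'region_len' octets can hold a HIT of
 * 'hit_len' octets followed by a non-empty key of 'key_len' octets.
 * On success the combined length is stored in '*lenp' for later use.
 */
bool
sketch_lengths_fit(size_t region_len, uint8_t hit_len, uint16_t key_len,
		   size_t *lenp) {
	assert(lenp != NULL);

	/* Compute the sum once, in a type wide enough not to wrap. */
	size_t len = (size_t)hit_len + (size_t)key_len;

	if (key_len == 0 || region_len < len) {
		return false;
	}
	*lenp = len;
	return true;
}
```

Reusing the checked `len` for later copies and buffer advances, rather than recomputing the sum, keeps the bound that was tested and the bound that is used visibly identical.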

(cherry picked from commit 2f946c831a)
2021-02-08 14:05:11 +11:00
Michal Nowak
aa98a53cc0 Merge branch 'mnowak/check-arm-pdf-validity-v9_16' into 'v9_16'
[v9_16] Check PDF file structure with QPDF

See merge request isc-projects/bind9!4651
2021-02-03 17:01:43 +00:00
Michal Nowak
9f7669cab3 Check PDF file structure with QPDF
"qpdf --check" checks file structure of generated ARM PDF.

(cherry picked from commit 359708b9d6)
2021-02-03 17:50:08 +01:00
Matthijs Mekking
241c4bd613 Merge branch '2377-allow-a-records-below-an-_spf-label-as-a-check-names-exception-v9_16' into 'v9_16'
Resolve "Allow A records below an '_spf' label as a check-names exception"

See merge request isc-projects/bind9!4650
2021-02-03 16:48:58 +00:00
Mark Andrews
4bd8bcf236 Add release note entry
(cherry picked from commit 1294918702)
2021-02-03 16:32:43 +01:00
Mark Andrews
86f7b64408 Add CHANGES
(cherry picked from commit 2b5091ac17)
2021-02-03 16:26:40 +01:00
Mark Andrews
7976a7264a Check that A record is accepted with _spf label present
(cherry picked from commit a3b2b86e7f)
2021-02-03 16:26:32 +01:00
Mark Andrews
6da9f238d4 Allow A records below '_spf' labels as recommended by RFC 7208
(cherry picked from commit 63c16c8506)
2021-02-03 16:26:25 +01:00
Matthijs Mekking
5710207191 Merge branch '2375-dnssec-policy-three-is-a-crowd-rollover-bug-v9_16' into 'v9_16'
Resolve "three is a crowd" dnssec-policy key rollover bug (9.16)

See merge request isc-projects/bind9!4649
2021-02-03 15:11:10 +00:00
Matthijs Mekking
76f7f598b3 Add kasp test todo for [#2375]
This bugfix has been manually verified but is missing a unit test.
Created GL #2471 to track this.

(cherry picked from commit 189f5a3f28)
2021-02-03 15:48:29 +01:00
Matthijs Mekking
ce2a37a990 Use NUM_KEYSTATES constant where appropriate
We use the number 4 a lot when working on key states. Better to use
the NUM_KEYSTATES constant instead.

(cherry picked from commit 98ace6d97d)
2021-02-03 15:48:20 +01:00
Matthijs Mekking
c0e98d8adb Add change and release note for [#2375]
Newsworthy.

(cherry picked from commit 7947f7f9c6)
2021-02-03 15:48:09 +01:00
Matthijs Mekking
a8fba11da9 Cleanup keymgr.c
Three small cleanups:

1. Remove an unused keystr/dst_key_format.
2. Initialize a dst_key_state_t state with NA.
3. Update false comment about local policy (local policy only adds
   barrier on transitions to the RUMOURED state, not the UNRETENTIVE
   state).

(cherry picked from commit 189d9a2d21)
2021-02-03 15:47:40 +01:00
Matthijs Mekking
ceac392e19 Fix DS/DNSKEY hidden or chained functions
There was a bug in function 'keymgr_ds_hidden_or_chained()'.

The function 'keymgr_ds_hidden_or_chained()' implements (3e) of rule2
as defined in the "Flexible and Robust Key Rollover" paper. The rule
says: All DS records need to be in the HIDDEN state, or if one is not,
there must be a key with its DNSKEY and KRRSIG in OMNIPRESENT, and
its DS in the same state as the key in question. In human language,
if all keys have their DS in HIDDEN state you can do what you want,
but if a DS record is available to some validators, there must be
a chain of trust for it.

Note that the barriers on transitions first check if the current
state is valid, and then if the next state is valid too. But
here we falsely updated the 'dnskey_omnipresent' (now 'dnskey_chained')
with the next state. The next state applies to 'key' not to the state
to be checked. Updating the state here leads to (true) always, because
the key that will move its state will match the falsely updated
expected state. This could lead to the assumption that Key 2 would be
a valid chain of trust for Key 1, while clearly the presence of any
DS is uncertain.

The fix here is to check if the DNSKEY and KRRSIG are in OMNIPRESENT
state for the key that does not have its DS in the HIDDEN state, and
only if that is not the case, ensure that there is a key with the same
algorithm, that provides a valid chain of trust, that is, has its
DNSKEY, KRRSIG, and DS in OMNIPRESENT state.
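
A simplified sketch of the fixed check follows. It is only an illustration of the rule as described in this message: the struct, the enum, and the function name are hypothetical, and the sketch ignores the "next state" handling that the real function performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical key states and key record for this sketch. */
enum sk_state { SK_HIDDEN, SK_RUMOURED, SK_OMNIPRESENT, SK_UNRETENTIVE };

struct sk_key {
	int alg;              /* algorithm number */
	enum sk_state dnskey; /* DNSKEY record state */
	enum sk_state krrsig; /* key RRSIG state */
	enum sk_state ds;     /* DS record state */
};

/* True if every DS is HIDDEN or is backed by a valid chain of trust. */
bool
sketch_ds_hidden_or_chained(const struct sk_key *keys, size_t n) {
	assert(keys != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (keys[i].ds == SK_HIDDEN) {
			continue;
		}
		/* The DS may be visible: the key itself may be chained. */
		if (keys[i].dnskey == SK_OMNIPRESENT &&
		    keys[i].krrsig == SK_OMNIPRESENT) {
			continue;
		}
		/* Otherwise another key of the same algorithm must be. */
		bool chained = false;
		for (size_t j = 0; j < n; j++) {
			if (j != i && keys[j].alg == keys[i].alg &&
			    keys[j].dnskey == SK_OMNIPRESENT &&
			    keys[j].krrsig == SK_OMNIPRESENT &&
			    keys[j].ds == SK_OMNIPRESENT) {
				chained = true;
			}
		}
		if (!chained) {
			return false;
		}
	}
	return true;
}
```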

The changes in 'keymgr_dnskey_hidden_or_chained()' are only cosmetic,
renaming 'rrsig_omnipresent' to 'rrsig_chained' and removing the
redundant initialization of the DST_KEY_DNSKEY expected state to NA.

(cherry picked from commit 291bcc3721)
2021-02-03 15:47:30 +01:00
Matthijs Mekking
6ff0e99fa7 Update keymgr_key_is_successor() calls
The previous commit changed the function definition of
'keymgr_key_is_successor()', this commit updates the code where
this function is called.

In 'keymgr_key_exists_with_state()' the logic is also updated slightly
to become more readable. First handle the easy cases:
- If the key does not match the state, continue with the next key.
- If we found a key with matching state, and there is no need to
  check the successor relationship, return (true).
- Otherwise check the successor relationship.

In 'keymgr_key_has_successor()' it is enough to check if a key has
a direct successor, so instead of calling 'keymgr_key_is_successor()',
we can just check 'keymgr_direct_dep()'.

In 'dns_keymgr_run()', we want to make sure that there is no
dependency on the keys before retiring excess keys, so replace
'keymgr_key_is_successor()' with 'keymgr_dep()'.

(cherry picked from commit 600915d1b2)
2021-02-03 15:47:23 +01:00
Matthijs Mekking
5e40515671 Implement Equation(2) of "Flexible Key Rollover"
So far the key manager could only deal with two keys in a rollover,
because it used a simplified version of the successor relationship
equation from "Flexible and Robust Key Rollover" paper. The simplified
version assumes only two keys take part in the key rollover, and
for that it is enough to check the direct relationship between two
keys (is key x the direct predecessor of key z and is key z the direct
successor of key x?).

But when a third key (or more keys) comes into the equation, the key
manager would assume that one key (or more) is redundant and remove
it from the zone prematurely.

Fix by implementing Equation(2) correctly, where we check for
dependencies on keys:

z ->T x: Dep(x, T) = ∅ ∧
         (x ∈ Dep(z, T) ∨
          ∃ y ∈ Dep(z, T)(y != z ∧ y ->T x ∧ DyKyRySy = DzKzRzSz))

This says: key z is a successor of key x if:
- key x depends on key z (that is, z is a direct successor of x),
- or if there is another key y that depends on key z that has identical
  key states as key z and key y is a successor of key x.
- Also, key x may not have any other keys depending on it.

This is still a simplified version of Equation(2) (but at least much
better), because the paper allows for a set of keys to depend on a
key. This is defined as the set Dep(x, T). Keys in the set Dep(x, T)
have a dependency on key x for record type T. The BIND implementation
can only have one key in the set Dep(x, T). The function
'keymgr_dep()' stores this key in 'uint32_t *dep' if there is a
dependency.
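
The single-dependency form of the successor check can be sketched as below. This is an illustration under stated assumptions, not the BIND implementation: the struct layout, the `sk_*` names, and the use of key id 0 for "none" are hypothetical, and the chain is walked iteratively (bounded by the number of keys) where the commit describes a recursive implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SK_NSTATES 4 /* one state each for the D, K, R and S records */

struct sk_key {
	unsigned int id;   /* non-zero key id */
	unsigned int pred; /* id of the predecessor, 0 if none */
	unsigned int succ; /* id of the successor, 0 if none */
	int state[SK_NSTATES];
};

/* Return the index of the one key that depends on keys[k], or -1. */
int
sk_dep(const struct sk_key *keys, size_t n, size_t k) {
	for (size_t d = 0; d < n; d++) {
		if (d != k && keys[d].succ == keys[k].id &&
		    keys[k].pred == keys[d].id) {
			return (int)d;
		}
	}
	return -1;
}

/* Is keys[z] a (possibly indirect) successor of keys[x]? */
bool
sk_is_successor(const struct sk_key *keys, size_t n, size_t x, size_t z) {
	assert(x < n && z < n);

	if (sk_dep(keys, n, x) >= 0) {
		return false; /* Dep(x, T) must be empty */
	}
	for (size_t steps = 0; steps < n; steps++) {
		int y = sk_dep(keys, n, z);
		if (y < 0) {
			return false; /* nothing depends on z */
		}
		if ((size_t)y == x) {
			return true; /* x is in Dep(z, T) */
		}
		if (memcmp(keys[y].state, keys[z].state,
			   sizeof(keys[y].state)) != 0) {
			return false; /* y and z differ in key states */
		}
		z = (size_t)y; /* continue with "y ->T x" */
	}
	return false; /* chain longer than the key set: give up */
}
```

With three keys chained 1, 2, 3, where keys 2 and 3 have identical states, key 3 is reported as a successor of key 1 even though the two are not directly related.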

There are two scenarios where multiple keys can depend on a single key:

1. Rolling keys is faster than the time required to finish the
   rollover procedure. This scenario is covered by the recursive
   implementation, and checking for a chain of direct dependencies
   will suffice.

2. Changing the policy, when a zone is requested to be signed with
   a different key length for example. BIND 9 will not mark successor
   relationships in this case, but tries to move towards the new
   policy. Since there is no successor relationship, the rules are
   even more strict, and the DNSSEC reconfiguration is actually slower
   than required.

Note: this commit breaks the build, because the function definition
of 'keymgr_key_is_successor' changed. This will be fixed in the
following commit.

(cherry picked from commit cc38527b63)
2021-02-03 15:47:14 +01:00
Michał Kępień
84911f53f1 Merge branch '2448-tweak-sphinx-build-invocations-v9_16' into 'v9_16'
[v9_16] Tweak sphinx-build invocations

See merge request isc-projects/bind9!4646
2021-02-03 12:04:36 +00:00
Michał Kępień
36ea46fbe0 Explicitly specify sphinx-build cache directories
When sphinx-build is invoked without the -d command line switch, the
default path to the directory in which cached environment and doctree
files are placed is OUTPUTDIR/.doctrees.  This causes the contents of
such cache directories to needlessly be included in BIND release
directories.  Avoid that by employing the -d command line switch to make
each sphinx-build process use a cache directory outside the output
directory.  Make sure these cache directories are separate from each
other as well, to prevent multiple sphinx-build processes running in
parallel from interfering with each other.
2021-02-03 12:18:45 +01:00
Michał Kępień
7e0c374d83 Make sphinx-build warnings fatal
In order to prevent documentation building issues from being glossed
over, pass the -W command line switch to all sphinx-build invocations.
This causes the latter to return with a non-zero exit code whenever any
Sphinx warnings are triggered.

(cherry picked from commit 51479ed9a3)
2021-02-03 12:18:45 +01:00
Matthijs Mekking
ffc5e54fd8 Merge branch '2406-kasp-init-inactive-delete-metadata-v9_16' into 'v9_16'
Resolve "kasp: look at Inactive/Delete when initializing state files" (9.16)

See merge request isc-projects/bind9!4643
2021-02-03 08:55:18 +00:00
Matthijs Mekking
3f6dafe1f4 Remove initialize goal code
Since keys now have their goals initialized in 'keymgr_key_init()',
remove this redundant piece of code in 'keymgr_key_run()'.

(cherry picked from commit 82632fa6d9)
2021-02-03 08:42:51 +01:00
Matthijs Mekking
4170288a91 Correctly initialize old key with state file
The 'key_init()' function is used to initialize a state file for keys
that don't have one yet. This can happen if you are migrating from a
'auto-dnssec' or 'inline-signing' to a 'dnssec-policy' configuration.

It did not look at the "Inactive" and "Delete" timing metadata and so
old keys left behind in the key directory would also be considered as
a possible active key. This commit fixes this and now explicitly sets
the key goal to OMNIPRESENT for keys that have their "Active/Publish"
timing metadata in the past, but their "Inactive/Delete" timing
metadata in the future. If the "Inactive/Delete" timing metadata is
also in the past, the key goal is set to HIDDEN.

If the "Inactive/Delete" timing metadata is in the past, also the
key states are adjusted to either UNRETENTIVE or HIDDEN, depending on
how far in the past the metadata is set.
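
The goal selection described above can be sketched as a small function. The names are hypothetical, a time of 0 stands for "metadata not set", and returning HIDDEN for a key whose start time is unset or in the future is an assumption of this sketch, not something this message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

enum sk_goal { SK_GOAL_HIDDEN, SK_GOAL_OMNIPRESENT };

/*
 * Pick the initial goal for a key without a state file.
 * 'start' stands for the "Active/Publish" time and 'end' for the
 * "Inactive/Delete" time; 0 means the metadata is not set.
 */
enum sk_goal
sk_initial_goal(time_t now, time_t start, time_t end) {
	assert(now > 0);

	bool started = (start != 0 && start <= now);
	bool ended = (end != 0 && end <= now);

	if (started && !ended) {
		return SK_GOAL_OMNIPRESENT; /* in use right now */
	}
	return SK_GOAL_HIDDEN; /* old key left behind, or not yet started */
}
```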

(cherry picked from commit 76cf72e65a)
2021-02-03 08:42:32 +01:00
Matthijs Mekking
314accf7ef Update legacy-keys kasp test
The 'legacy-keys.kasp' test checks that a zone with key files but not
yet state files is signed correctly. This test is expanded to cover
the case where old key files still exist in the key directory. This
covers bug #2406 where keys with the "Delete" timing metadata are
picked up by the keymgr as active keys.

Fix the 'legacy-keys.kasp' test, by creating the right key files
(for zone 'legacy-keys.kasp', not 'legacy,kasp').

Use a unique policy for this zone, using shorter lifetimes.

Create two more keys for the zone, and use 'dnssec-settime' to set
the timing metadata in the past, long enough ago so that the keys
should not be considered by the keymgr.

Update the 'key_unused()' test function, and consider keys with
their "Delete" timing metadata in the past as unused.

Extend the test to ensure that the keys to be used are not the old
predecessor keys (with their "Delete" timing metadata in the past).

Update the test so that the checks performed are consistent with the
newly configured policy.

(cherry picked from commit d4b2b7072d)
2021-02-03 08:41:00 +01:00
Mark Andrews
9448cb15c9 Merge branch '2093-tsan-files-are-not-being-captured-by-unit-tests-2' into 'v9_16'
Resolve "tsan files are not being captured by unit tests"

See merge request isc-projects/bind9!4470
2021-02-02 13:02:06 +00:00
Mark Andrews
48715f6ad4 Look for tsan files in the top build directory
2021-02-02 12:33:19 +00:00
Michal Nowak
5fd71c560e Merge branch 'mnowak/add-rsabigexponent-README-v9_16' into 'v9_16'
[v9_16] Add README.md file to rsabigexponent system test

See merge request isc-projects/bind9!4631
2021-01-29 14:54:57 +00:00
Michal Nowak
7aa33bf5e4 Add README.md file to rsabigexponent system test
This README.md describes why bigkey is needed.

(cherry picked from commit a247f24dfa)
2021-01-29 15:54:07 +01:00
Matthijs Mekking
49f6324f2c Merge branch '2442-tsan-error-lib-dns-rbtdb-c-v9_16' into 'v9_16'
Fix race condition on check_stale_header

See merge request isc-projects/bind9!4630
2021-01-29 14:50:22 +00:00
Diego Fronza
51663408bc Fix race condition on check_stale_header
This commit fixes a race that could happen when two or more threads
have failed to refresh the same RRset: the threads could simultaneously
attempt to update the header->last_refresh_fail_ts field in
check_stale_header, a field used to implement stale-refresh-time.

By making this field atomic we avoid such a race.
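
As an illustration of the approach, the sketch below uses C11 `<stdatomic.h>` rather than BIND's own atomics wrappers. The struct and function names are hypothetical; only the field name `last_refresh_fail_ts` comes from the message above, and the window test is an assumption about how such a timestamp might be used.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down header holding only the shared timestamp. */
struct sk_header {
	atomic_uint_fast32_t last_refresh_fail_ts;
};

void
sk_header_init(struct sk_header *h) {
	assert(h != NULL);
	atomic_init(&h->last_refresh_fail_ts, 0);
}

/* May be called by several threads at once without a data race. */
void
sk_mark_refresh_failed(struct sk_header *h, uint32_t now) {
	atomic_store(&h->last_refresh_fail_ts, now);
}

/* True if a failed refresh was recorded less than 'window' seconds ago. */
bool
sk_in_refresh_window(struct sk_header *h, uint32_t now, uint32_t window) {
	uint_fast32_t ts = atomic_load(&h->last_refresh_fail_ts);

	return ts != 0 && now >= ts && now - ts < window;
}
```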

(cherry picked from commit c75575e350)
2021-01-29 15:29:00 +01:00
Matthijs Mekking
5d83066455 Merge branch '2247-add-serve-stale-option-to-set-client-timeout-v9_16' into 'v9_16'
Resolve "Add serve-stale option to set client timeout"

See merge request isc-projects/bind9!4625
2021-01-29 10:31:06 +00:00
Matthijs Mekking
dba01187e3 Rewrap comments to 80 char width serve-stale test
(cherry picked from commit d8c6655d7d)
2021-01-29 10:43:51 +01:00
Matthijs Mekking
99c72bf5da Update code flow in query.c wrt stale data
First of all, there was a flaw in the code related to the
'stale-refresh-time' option. If stale answers are enabled, and we
returned stale data, then it was assumed that it was because we were
in the 'stale-refresh-time' window. But now we could also have returned
stale data because of a 'stale-answer-client-timeout'. To fix this,
introduce a rdataset attribute DNS_RDATASETATTR_STALE_WINDOW to
indicate whether the stale cache entry was returned because the
'stale-refresh-time' window is active.

Second, remove the special case handling when the result is
DNS_R_NCACHENXRRSET. This can be done more generically in the code
block that deals with stale data.

Putting all stale case handling in the code block that deals with
stale data makes the code easier to follow.

Update documentation to be more verbose and to match the new code
flow.

(cherry picked from commit fa0c9280d2)
2021-01-29 10:43:41 +01:00
Diego Fronza
0e62c53c5b Extracted common function from query_lookup and query_refresh_rrset
Both functions employed the same code lines to allocate query context
buffers, which are used to store query results, so this shared portion
of code was extracted out to a new function, qctx_prepare_buffers.

Also, this commit uses qctx_init to initialize the query context within
the query_refresh_rrset function.

(cherry picked from commit 966060c03b)
2021-01-29 10:43:27 +01:00
Diego Fronza
5cbb28a40e Small optimization in query_usestale
This commit makes the code in query_usestale easier to follow; it also
avoids attaching to and detaching from the database if stale answers are
not enabled.

(cherry picked from commit f89ac07b28)
2021-01-29 10:41:39 +01:00
Diego Fronza
fe4e0b889c Add CHANGES note for [GL #2247]
(cherry picked from commit 42c789c763)
2021-01-29 10:41:28 +01:00
Diego Fronza
b89fc52cd1 Add documentation for stale-answer-client-timeout
(cherry picked from commit 6ab9070457)
2021-01-29 10:39:31 +01:00
Diego Fronza
bea63000db Add system tests for stale-answer-client-timeout
This commit adds four tests for the new option:
	1. Test default configuration of stale-answer-client-timeout, a
	   value of 1.8 seconds, with stale-refresh-time disabled.

	2. Test disabling of stale-answer-client-timeout.

	3. Test stale-answer-client-timeout with a value of zero, in this
	   case we take advantage of a log entry which shows that a stale
	   answer was promptly used before an attempt to refresh the RRset
	   is made. We also check, by activating a disabled authoritative
	   server, that the RRset was successfully refreshed after that.

	4. Test stale-answer-client-timeout 0 with stale-refresh-time 4, in
	   this test we want to ensure a couple things:

	   - If we have a stale RRSet entry in cache, a request must be
		 promptly answered with this data, while BIND must also attempt
		 to refresh the RRSet in background.

	   - If the attempt to refresh the RRSet times out, the RRSet must
		 have its stale-refresh-time window activated.

	   - If a new request for the same RRSet arrives, it must be
		 promptly answered with stale data due to stale-refresh-time
		 being active for this RRSet, in this case no attempt to refresh
		 the RRSet is made.

	   - Enable authoritative server, ensure that the RRSet was not
		 refreshed, to honor stale-refresh-time.

	   - Wait for the stale-refresh-time window to pass, then send
		 another request for the same RRSet; this time we expect the
		 answer to be the stale entry in cache being hit due to
		 stale-answer-client-timeout 0.

	    - Send another request, this time we expect the answer to be an
		  active RRSet, since it must have been refreshed during the
		  previous request.

(cherry picked from commit 35fd039d03)
2021-01-29 10:39:20 +01:00
Diego Fronza
8324c3ddfe Allow stale data to be used before name resolution
This commit allows a stale RRset to be used (if available) to respond
to a query, before an attempt to refresh an expired RRset, or otherwise
resolve an RRset unavailable in cache, is made.

For that to work, a value of zero must be specified for the
stale-answer-client-timeout statement.

To better understand the logic implemented, there are three flags being
used during database lookup and other parts of code that must be
understood:

. DNS_DBFIND_STALEOK: This flag is set when BIND fails to refresh a
  RRset due to timeout (resolver-query-timeout), its intent is to
  try to look for stale data in cache as a fallback, but only if
  stale answers are enabled in configuration.

  This flag is also used to activate stale-refresh-time window, since it
  is the only way the database knows that a resolution has failed.

. DNS_DBFIND_STALEENABLED: This flag is used as a hint to the database
  that it may use stale data. It is always set during query lookup if
  stale answers are enabled, but only effectively used during
  stale-refresh-time window. Also during this window, the resolver will
  not try to resolve the query, in other words no attempt to refresh the
  data in cache is made when the stale-refresh-time window is active.

. DNS_DBFIND_STALEONLY: This newly introduced flag is used when we want
  stale data from the database, but not due to a failure in resolution,
  it also doesn't require stale-refresh-time window timer to be active.
  As long as there is a stale RRset available, it should be returned.
  It is mainly used in two situations:

    1. When stale-answer-client-timeout timer is triggered: in that case
       we want to know if there is stale data available to answer the
       client.
    2. When stale-answer-client-timeout value is set to zero: in that
       case, we also want to know if there is some stale RRset available
       to promptly answer the client.

We must also discern between three situations that may happen when
resolving a query after the addition of stale-answer-client-timeout
statement, and how to handle them:

	1. Are we running query_lookup() due to stale-answer-client-timeout
       timer being triggered?

       In this case, we look for stale data, making use of the
       DNS_DBFIND_STALEONLY flag. If a stale RRset is available then
       respond to the client with the data found and mark this query as
       answered (query attribute NS_QUERYATTR_ANSWERED), so when the
       fetch completes the client won't be answered twice.

       We must also take care of not detaching from the client, as a
       fetch will still be running in background, this is handled by the
       following snippet:

       if (!QUERY_STALEONLY(&client->query)) {
           isc_nmhandle_detach(&client->reqhandle);
       }

       Which basically tests if DNS_DBFIND_STALEONLY flag is set, which
       means we are here due to a stale-answer-client-timeout timer
       expiration.

    2. Are we running query_lookup() due to resolver-query-timeout being
       triggered?

       In this case, DNS_DBFIND_STALEOK flag will be set and an attempt
       to look for stale data will be made.
       As already explained, this flag is also used to activate
       stale-refresh-time window, as it means that we failed to refresh
       a RRset due to timeout.
       It is ok in this situation to detach from the client, as the
       fetch is already completed.

    3. Are we running query_lookup() during the first time, looking for
       a RRset in cache and stale-answer-client-timeout value is set to
       zero?

       In this case, if stale answers are enabled (probably), we must do
       an initial database lookup with DNS_DBFIND_STALEONLY flag set, to
       indicate to the database that we want stale data.

       If we find an active RRset, proceed as normal, answer the client
       and the query is done.

       If we find a stale RRset we respond to the client and mark the
       query as answered, but don't detach from the client yet as an
       attempt to refresh the RRset will still be made by means of
       the newly introduced function 'query_resolve'.

       If no active or stale RRset is available, begin resolution as
       usual.
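
The three situations can be summarized in a small decision sketch. Everything here is hypothetical: the `SK_*` names and bit values merely mirror the three DNS_DBFIND_* flags described above, and the functions are not part of BIND.

```c
#include <assert.h>
#include <stdbool.h>

/* Why query_lookup() is running, per the three situations above. */
enum sk_reason { SK_CLIENT_TIMEOUT, SK_RESOLVER_TIMEOUT, SK_FIRST_LOOKUP };

/* Hypothetical bits mirroring the three DNS_DBFIND_* flags. */
#define SK_STALEOK      0x1u
#define SK_STALEENABLED 0x2u
#define SK_STALEONLY    0x4u

/* Choose the database lookup flags for one call. */
unsigned int
sk_lookup_flags(enum sk_reason reason, bool stale_enabled,
		unsigned int client_timeout_ms) {
	unsigned int flags = 0;

	if (!stale_enabled) {
		return flags; /* never ask the database for stale data */
	}
	flags |= SK_STALEENABLED; /* always set when stale answers are on */

	switch (reason) {
	case SK_CLIENT_TIMEOUT: /* situation 1 */
		flags |= SK_STALEONLY;
		break;
	case SK_RESOLVER_TIMEOUT: /* situation 2 */
		flags |= SK_STALEOK;
		break;
	case SK_FIRST_LOOKUP: /* situation 3 */
		if (client_timeout_ms == 0) {
			flags |= SK_STALEONLY;
		}
		break;
	default:
		assert(0 && "unknown lookup reason");
	}
	return flags;
}

/* A fetch may still be running while the stale-only flag is set. */
bool
sk_may_detach(unsigned int flags) {
	return (flags & SK_STALEONLY) == 0;
}
```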

(cherry picked from commit e219422575)
2021-01-29 10:39:09 +01:00
Diego Fronza
0aebad96b5 Added option for disabling stale-answer-client-timeout
This commit allows "disabled" or "off" to be specified in the
stale-answer-client-timeout statement. The logic to support this
behavior will be added in the subsequent commits.

This commit also enforces an upper bound on stale-answer-client-timeout
equal to one second less than 'resolver-query-timeout'.
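
The upper bound can be sketched as a clamp. The function name, the use of milliseconds for both values, and the `SK_TIMEOUT_DISABLED` sentinel for "disabled"/"off" are assumptions of this sketch, not details taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel for "disabled" / "off". */
#define SK_TIMEOUT_DISABLED UINT32_MAX

/*
 * Clamp stale-answer-client-timeout to one second less than
 * resolver-query-timeout. Both values are in milliseconds here.
 */
uint32_t
sk_clamp_client_timeout(uint32_t timeout_ms, uint32_t resolver_timeout_ms) {
	assert(resolver_timeout_ms >= 1000);

	if (timeout_ms == SK_TIMEOUT_DISABLED) {
		return timeout_ms; /* nothing to clamp when disabled */
	}

	uint32_t max_ms = resolver_timeout_ms - 1000;

	return timeout_ms > max_ms ? max_ms : timeout_ms;
}
```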

(cherry picked from commit 0ad6f594f6)
2021-01-29 10:38:58 +01:00