Commit Graph

761 Commits

Author SHA1 Message Date
Mark Andrews
7a703244ed Address theoretical buffer overrun in recent change
The strlcat() call was wrong.

    *** CID 316608:  Memory - corruptions  (OVERRUN)
    /lib/dns/resolver.c: 5017 in fctx_create()
    5011     	 * Make fctx->info point to a copy of a formatted string
    5012     	 * "name/type".
    5013     	 */
    5014     	dns_name_format(name, buf, sizeof(buf));
    5015     	dns_rdatatype_format(type, typebuf, sizeof(typebuf));
    5016     	p = strlcat(buf, "/", sizeof(buf));
    >>>     CID 316608:  Memory - corruptions  (OVERRUN)
    >>>     Calling "strlcat" with "buf + p" and "1036UL" is suspicious because "buf" points into a buffer of 1036 bytes and the function call may access "(char *)(buf + p) + 1035UL". [Note: The source code implementation of the function has been overridden by a builtin model.]
    5017     	strlcat(buf + p, typebuf, sizeof(buf));
    5018     	fctx->info = isc_mem_strdup(mctx, buf);
    5019
    5020     	FCTXTRACE("create");
    5021     	dns_name_init(&fctx->name, NULL);
    5022     	dns_name_dup(name, mctx, &fctx->name);

(cherry picked from commit 59bf6e71e2)
2021-03-03 10:55:38 +01:00
Mark Andrews
a900d79ea8 Cleanup redundant isc_rwlock_init() result checks
(cherry picked from commit 3b11bacbb7)
2021-02-08 15:13:49 +11:00
Diego Fronza
3478794a5d Add stale-answer-client-timeout option
The general logic behind the addition of this new feature works as
folows:

When a client query arrives, the basic path (query.c / ns_query_recurse)
was to create a fetch, waiting for completion in fetch_callback.

With the introduction of stale-answer-client-timeout, a new event of
type DNS_EVENT_TRYSTALE may invoke fetch_callback, whenever stale
answers are enabled and the fetch took longer than
stale-answer-client-timeout to complete.

When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we
must ensure that the folowing happens:

1. Setup a new query context with the sole purpose of looking up for
   stale RRset only data, for that matters a new flag was added
   'DNS_DBFIND_STALEONLY' used in database lookups.

    . If a stale RRset is found, mark the original client query as
      answered (with a new query attribute named NS_QUERYATTR_ANSWERED),
      so when the fetch completion event is received later, we avoid
      answering the client twice.

    . If a stale RRset is not found, cleanup and wait for the normal
      fetch completion event.

2. In ns_query_done, we must change this part:
	/*
	 * If we're recursing then just return; the query will
	 * resume when recursion ends.
	 */
	if (RECURSING(qctx->client)) {
		return (qctx->result);
	}

   To this:

	if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) {
		return (qctx->result);
	}

   Otherwise we would not proceed to answer the client if it happened
   that a stale answer was found when looking up for stale only data.

When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we
proceed as before, resuming query, updating stats, etc, but a few
exceptions had to be added, most important of which are two:

1. Before answering the client (ns_client_send), check if the query
   wasn't already answered before.

2. Before detaching a client, e.g.
   isc_nmhandle_detach(&client->reqhandle), ensure that this is the
   fetch completion event, and not the one triggered due to
   stale-answer-client-timeout, so a correct call would be:
   if (!QUERY_STALEONLY(client)) {
        isc_nmhandle_detach(&client->reqhandle);
   }

Other than these notes, comments were added in code in attempt to make
these updates easier to follow.

(cherry picked from commit 171a5b7542)
2021-01-29 10:38:32 +01:00
Diego Fronza
f3bd27373d Avoid iterating name twice when constructing fctx->info
This is a minor performance improvement, we store the result of the
first call to strlcat to use as an offset in the next call when
constructing fctx->info string.

(cherry picked from commit 49c40827f6)
2021-01-29 10:35:17 +01:00
Mark Andrews
5c10b5a4e8 Adjust default value of "max-recursion-queries"
Since the queries sent towards root and TLD servers are now included in
the count (as a result of the fix for CVE-2020-8616),
"max-recursion-queries" has a higher chance of being exceeded by
non-attack queries.  Increase its default value from 75 to 100.

(cherry picked from commit ab0bf49203)
2020-12-02 00:53:49 +11:00
Mark Andrews
df5f076a02 Tighten DNS COOKIE response handling
Fallback to TCP when we have already seen a DNS COOKIE response
from the given address and don't have one in this UDP response. This
could be a server that has turned off DNS COOKIE support, a
misconfigured anycast server with partial DNS COOKIE support, or a
spoofed response. Falling back to TCP is the correct behaviour in
all 3 cases.

(cherry picked from commit 0e3b1f5a25)
2020-11-27 08:15:11 +11:00
Mark Andrews
e554daa76c fctx->id was not initalised 2020-11-09 21:48:22 +00:00
Mark Andrews
84922b2dc7 Restore the dns_message_reset() call before the dns_dispatch_getnext()
This was accidentally lost in the process of moving rmessage from fctx
to query.  Without this dns_message_setclass() will fail.

(cherry picked from commit 1f63bb15b3)
2020-10-08 16:27:10 +11:00
Ondřej Surý
f0989bdf03 The dns_message_create() cannot fail, change the return to void
The dns_message_create() function cannot soft fail (as all memory
allocations either succeed or cause abort), so we change the function to
return void and cleanup the calls.

(cherry picked from commit 33eefe9f85)
2020-09-30 14:26:26 +02:00
Diego Fronza
f557681472 Properly handling dns_message_t shared references
This commit fix the problems that arose when moving the dns_message_t
object from fetchctx_t to the query structure.

Since the lifetime of query objects are different than that of a
fetchctx and the dns_message_t object held by the query may be being
used by some external module, e.g. validator, even after the query
may have been destroyed, propery handling of the references to the
message were added in this commit to avoid accessing an already
destroyed object.

Specifically, in rctx_done(), a reference to the message is attached
at the beginning of the function and detached at the end, since a
possible call to fctx_cancelquery() would release the dns_message_t
object, and in the next lines of code a call to rctx_nextserver()
or rctx_chaseds() would require a valid pointer to the same object.

In valcreate() a new reference is attached to the message object,
this ensures that if the corresponding query object is destroyed
before the validator attempts to access it, no invalid pointer
access occurs.

In validated() we have to attach a new reference to the message,
since we destroy the validator object at the beginning of the
function, and we need access to the message in the next lines of
the same function.

rctx_nextserver() and rctx_chaseds() functions were adapted to
receive a new parameter of dns_message_t* type, this was so they
could receive a valid reference to a dns_message_t since using the
response context respctx_t to access the message through
rctx->query->rmessage could lead to an already released reference
due to the query being canceled.

(cherry picked from commit cde6227a68)
2020-09-30 11:35:11 +10:00
Diego Fronza
dfa2b7a247 Fix invalid dns message state in resolver's logic
The assertion failure REQUIRE(msg->state == DNS_SECTION_ANY), caused
by calling dns_message_setclass within function resquery_response()
in resolver.c, was happening due to wrong management of dns message_t
objects used to process responses to the queries issued by the
resolver.

Before the fix, a resolver's fetch context (fetchctx_t) would hold
a pointer to the message, this same reference would then be used
over all the attempts to resolve the query, trying next server,
etc... for this to work the message object would have it's state
reset between each iteration, marking it as ready for a new processing.

The problem arose in a scenario with many different forwarders
configured, managing the state of the dns_message_t object was
lacking better synchronization, which have led it to a invalid
dns_message_t state in resquery_response().

Instead of adding unnecessarily complex code to synchronize the
object, the dns_message_t object was moved from fetchctx_t structure
to the query structure, where it better belongs to, since each query
will produce a response, this way whenever a new query is created
an associated dns_messate_t is also created.

This commit deals mainly with moving the dns_message_t object from
fetchctx_t to the query structure.

(cherry picked from commit 02f9e125c1)
2020-09-30 11:34:57 +10:00
Diego Fronza
da84f8d1fd Refactored dns_message_t for using attach/detach semantics
This commit will be used as a base for the next code updates in
order to have a better control of dns_message_t objects' lifetime.

(cherry picked from commit 12d6d13100)
2020-09-30 11:34:42 +10:00
Evan Hunt
df698d73f4 update all copyright headers to eliminate the typo 2020-09-14 16:50:58 -07:00
Mark Andrews
f6ba3ec731 Address lock-order-inversion
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=12714)
  Cycle in lock order graph: M100252 (0x7b7c00010a08) => M1171 (0x7b7400000dc8) => M100252

  Mutex M1171 acquired here while holding mutex M100252 in thread T1:
    #0 pthread_mutex_lock <null> (delv+0x4483a6)
    #1 dns_resolver_createfetch3 /builds/isc-projects/bind9/lib/dns/resolver.c:9585:2 (libdns.so.1110+0x1769fd)
    #2 dns_resolver_createfetch /builds/isc-projects/bind9/lib/dns/resolver.c:9504:10 (libdns.so.1110+0x174e17)
    #3 create_fetch /builds/isc-projects/bind9/lib/dns/validator.c:1156:10 (libdns.so.1110+0x1c1e5f)
    #4 validatezonekey /builds/isc-projects/bind9/lib/dns/validator.c:2124:13 (libdns.so.1110+0x1c3b6d)
    #5 start_positive_validation /builds/isc-projects/bind9/lib/dns/validator.c:2301:10 (libdns.so.1110+0x1bfde9)
    #6 validator_start /builds/isc-projects/bind9/lib/dns/validator.c:3647:12 (libdns.so.1110+0x1bef62)
    #7 dispatch /builds/isc-projects/bind9/lib/isc/task.c:1157:7 (libisc.so.1107+0x507d5)
    #8 run /builds/isc-projects/bind9/lib/isc/task.c:1331:2 (libisc.so.1107+0x4d729)

  Mutex M100252 previously acquired by the same thread here:
    #0 pthread_mutex_lock <null> (delv+0x4483a6)
    #1 validator_start /builds/isc-projects/bind9/lib/dns/validator.c:3628:2 (libdns.so.1110+0x1bee31)
    #2 dispatch /builds/isc-projects/bind9/lib/isc/task.c:1157:7 (libisc.so.1107+0x507d5)
    #3 run /builds/isc-projects/bind9/lib/isc/task.c:1331:2 (libisc.so.1107+0x4d729)

  Mutex M100252 acquired here while holding mutex M1171 in thread T1:
    #0 pthread_mutex_lock <null> (delv+0x4483a6)
    #1 dns_validator_destroy /builds/isc-projects/bind9/lib/dns/validator.c:3912:2 (libdns.so.1110+0x1bf788)
    #2 validated /builds/isc-projects/bind9/lib/dns/resolver.c:4916:2 (libdns.so.1110+0x18fdfd)
    #3 dispatch /builds/isc-projects/bind9/lib/isc/task.c:1157:7 (libisc.so.1107+0x507d5)
    #4 run /builds/isc-projects/bind9/lib/isc/task.c:1331:2 (libisc.so.1107+0x4d729)

  Mutex M1171 previously acquired by the same thread here:
    #0 pthread_mutex_lock <null> (delv+0x4483a6)
    #1 validated /builds/isc-projects/bind9/lib/dns/resolver.c:4907:2 (libdns.so.1110+0x18fc3d)
    #2 dispatch /builds/isc-projects/bind9/lib/isc/task.c:1157:7 (libisc.so.1107+0x507d5)
    #3 run /builds/isc-projects/bind9/lib/isc/task.c:1331:2 (libisc.so.1107+0x4d729)

  Thread T1 'isc-worker0000' (tid=12729, running) created by main thread at:
    #0 pthread_create <null> (delv+0x42afdb)
    #1 isc_thread_create /builds/isc-projects/bind9/lib/isc/pthreads/thread.c:60:8 (libisc.so.1107+0x726d8)
    #2 isc__taskmgr_create /builds/isc-projects/bind9/lib/isc/task.c:1468:7 (libisc.so.1107+0x4d635)
    #3 isc_taskmgr_createinctx /builds/isc-projects/bind9/lib/isc/task.c:2091:11 (libisc.so.1107+0x4f4ac)
    #4 main /builds/isc-projects/bind9/bin/delv/delv.c:1639:2 (delv+0x4b7f96)

SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) (/builds/isc-projects/bind9/bin/delv/.libs/delv+0x4483a6) in pthread_mutex_lock
(cherry picked from commit 992a79a14b)
2020-09-09 16:22:39 +10:00
Ondřej Surý
56d2cf6f1e Print diagnostics on dns_name_issubdomain() failure in fctx_create()
Log diagnostic message when dns_name_issubdomain() in the fctx_create()
when the resolver is qname minimizing and forwarding at the same time.

(cherry picked from commit 0a22024c27)
2020-09-02 18:29:01 +02:00
Diego Fronza
eb9d8e9e10 Fix resolution of unusual ip6.arpa names
Before this commit, BIND was unable to resolve ip6.arpa names like
the one reported in issue #1847 when using query minimization.

As reported in the issue, an attempt to resolve a name like
'rec-test-dom-158937817846788.test123.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.3.4.3.5.4.0.8.2.6.0.1.0.0.2.ip6.arpa'
using default settings would fail.

The reason was that query minimization algorithm in 'fctx_minimize_qname'
would divide any ip6.arpa names in increasing number of labels,
7,11, ... up to 35, thus limiting the destination name (minimized) to a number
of 35 labels.

In case the last query minimization attempt (with 35 labels) would fail with
NXDOMAIN, BIND would attempt the query mininimization again with the exact
same QNAME, limited on the 35 labels, and that in turn would fail again.

This fix avoids this fail loop by considering the extra labels that may appear
in the leftmost part of an ip6.arpa name, those after the IPv6 part.

(cherry picked from commit 230d79c191)
2020-09-02 16:52:39 +02:00
Evan Hunt
81514ff925 permanently disable QNAME minimization in a fetch when forwarding
QNAME minimization is normally disabled when forwarding. if, in the
course of processing a fetch, we switch back to normal recursion at
some point, we can't safely start minimizing because we may have
been left in an inconsistent state.
2020-08-05 15:44:18 +02:00
Mark Andrews
14fe6e77a7 Always check the return from isc_refcount_decrement.
Created isc_refcount_decrement_expect macro to test conditionally
the return value to ensure it is in expected range.  Converted
unchecked isc_refcount_decrement to use isc_refcount_decrement_expect.
Converted INSIST(isc_refcount_decrement()...) to isc_refcount_decrement_expect.

(cherry picked from commit bde5c7632a)
2020-07-31 12:54:47 +10:00
Michał Kępień
b6c33087b0 Fix idle timeout for connected TCP sockets
When named acting as a resolver connects to an authoritative server over
TCP, it sets the idle timeout for that connection to 20 seconds.  This
fixed timeout was picked back when the default processing timeout for
each client query was hardcoded to 30 seconds.  Commit
000a8970f8 made this processing timeout
configurable through "resolver-query-timeout" and decreased its default
value to 10 seconds, but the idle TCP timeout was not adjusted to
reflect that change.  As a result, with the current defaults in effect,
a single hung TCP connection will consistently cause the resolution
process for a given query to time out.

Set the idle timeout for connected TCP sockets to half of the client
query processing timeout configured for a resolver.  This allows named
to handle hung TCP connections more robustly and prevents the timeout
mismatch issue from resurfacing in the future if the default is ever
changed again.

(cherry picked from commit 953d704bd2)
2020-07-30 11:16:09 +02:00
Witold Kręcicki
03e583ffa8 Fix assertion failure during startup when the server is under load.
When we're coming back from recursion fetch_callback does not accept
DNS_R_NXDOMAIN as an rcode - query_gotanswer calls query_nxdomain in
which an assertion fails on qctx->is_zone. Yet, under some
circumstances, qname minimization will return an DNS_R_NXDOMAIN - when
root zone mirror is not yet loaded. The fix changes the DNS_R_NXDOMAIN
answer to DNS_R_SERVFAIL.
2020-07-01 12:55:12 +02:00
Witold Kręcicki
c3dcab5f13 Fix a data access race in resolver
We were passing client address to dns_resolver_createfetch as a pointer
and it was saved as a pointer. The client (with its address) could be
gone before the fetch is finished, and in a very odd scenario
log_formerr would call isc_sockaddr_format() which first checks if the
address family is valid (and at this point it still is), then the
sockaddr is cleared, and then isc_netaddr_fromsockaddr is called which
fails an assertion as the address family is now invalid.

(cherry picked from commit 175c4d9055)
2020-06-05 18:58:13 -07:00
Mark Andrews
39bb741927 Count queries to the root and TLD servers as well 2020-05-19 13:57:07 +02:00
Mark Andrews
b9c4f1b648 Reduce the number of fetches we make when looking up addresses
If there are more that 5 NS record for a zone only perform a
maximum of 4 address lookups for all the name servers.  This
limits the amount of remote lookup performed for server
addresses at each level for a given query.
2020-05-19 13:57:07 +02:00
Diego Fronza
bba353d512 Fixed rebinding protection bug when using forwarder setups
BIND wasn't honoring option "deny-answer-aliases" when configured to
forward queries.

Before the fix it was possible for nameservers listed in "forwarders"
option to return CNAME answers pointing to unrelated domains of the
original query, which could be used as a vector for rebinding attacks.

The fix ensures that BIND apply filters even if configured as a forwarder
instance.

(cherry picked from commit af6a4de3d5ad6c1967173facf366e6c86b3ffc28)
2020-04-08 08:52:58 +02:00
Diego Fronza
277581c5a1 Fixed disposing of resolver->references in destroy() function 2020-03-06 13:37:07 -03:00
Diego Fronza
341b69aa7e Fixed potential-lock-inversion
This commit simplifies a bit the lock management within dns_resolver_prime()
and prime_done() functions by means of turning resolver's attribute
"priming" into an atomic_bool and by creating only one dependent object on the
lock "primelock", namely the "primefetch" attribute.

By having the attribute "priming" as an atomic type, it save us from having to
use a lock just to test if priming is on or off for the given resolver context
object, within "dns_resolver_prime" function.

The "primelock" lock is still necessary, since dns_resolver_prime() function
internally calls dns_resolver_createfetch(), and whenever this function
succeeds it registers an event in the task manager which could be called by
another thread, namely the "prime_done" function, and this function is
responsible for disposing the "primefetch" attribute in the resolver object,
also for resetting "priming" attribute to false.

It is important that the invariant "priming == false AND primefetch == NULL"
remains constant, so that any thread calling "dns_resolver_prime" knows for sure
that if the "priming" attribute is false, "primefetch" attribute should also be
NULL, so a new fetch context could be created to fulfill this purpose, and
assigned to "primefetch" attribute under the lock protection.

To honor the explanation above, dns_resolver_prime is implemented as follow:
	1. Atomically checks the attribute "priming" for the given resolver context.
	2. If "priming" is false, assumes that "primefetch" is NULL (this is
           ensured by the "prime_done" implementation), acquire "primelock"
	   lock and create a new fetch context, update "primefetch" pointer to
	   point to the newly allocated fetch context.
	3. If "priming" is true, assumes that the job is already in progress,
	   no locks are acquired, nothing else to do.

To keep the previous invariant consistent, "prime_done" is implemented as follow:
	1. Acquire "primefetch" lock.
	2. Keep a reference to the current "primefetch" object;
	3. Reset "primefetch" attribute to NULL.
	4. Release "primefetch" lock.
	5. Atomically update "priming" attribute to false.
	6. Destroy the "primefetch" object by using the temporary reference.

This ensures that if "priming" is false, "primefetch" was already reset to NULL.

It doesn't make any difference in having the "priming" attribute not protected
by a lock, since the visible state of this variable would depend on the calling
order of the functions "dns_resolver_prime" and "prime_done".

As an example, suppose that instead of using an atomic for the "priming" attribute
we employed a lock to protect it.
Now suppose that "prime_done" function is called by Thread A, it is then preempted
before acquiring the lock, thus not reseting "priming" to false.
In parallel to that suppose that a Thread B is scheduled and that it calls
"dns_resolver_prime()", it then acquires the lock and check that "priming" is true,
thus it will consider that this resolver object is already priming and it won't do
any more job.
Conversely if the lock order was acquired in the other direction, Thread B would check
that "priming" is false (since prime_done acquired the lock first and set "priming" to false)
and it would initiate a priming fetch for this resolver.

An atomic variable wouldn't change this behavior, since it would behave exactly the
same, depending on the function call order, with the exception that it would avoid
having to use a lock.

There should be no side effects resulting from this change, since the previous
implementation employed use of the more general resolver's "lock" mutex, which
is used in far more contexts, but in the specifics of the "dns_resolver_prime"
and "prime_done" it was only used to protect "primefetch" and "priming" attributes,
which are not used in any of the other critical sections protected by the same lock,
thus having zero dependency on those variables.
2020-03-06 13:37:07 -03:00
Evan Hunt
11a0d771f9 fix spelling errors reported by Fossies.
(cherry picked from commit ba0313e649)
2020-02-21 07:05:31 +00:00
Ondřej Surý
829b461c54 Merge branch '46-enforce-clang-format-rules' into 'master'
Start enforcing the clang-format rules on changed files

Closes #46

See merge request isc-projects/bind9!3063

(cherry picked from commit a04cdde45d)

d2b5853b Start enforcing the clang-format rules on changed files
618947c6 Switch AlwaysBreakAfterReturnType from TopLevelDefinitions to All
654927c8 Add separate .clang-format files for headers
5777c44a Reformat using the new rules
60d29f69 Don't enforce copyrights on .clang-format
2020-02-14 08:45:59 +00:00
Ondřej Surý
cdef20bb66 Merge branch 'each-style-tweak' into 'master'
adjust clang-format options to get closer to ISC style

See merge request isc-projects/bind9!3061

(cherry picked from commit d3b49b6675)

0255a974 revise .clang-format and add a C formatting script in util
e851ed0b apply the modified style
2020-02-14 05:35:29 +00:00
Ondřej Surý
2e55baddd8 Merge branch '46-add-curly-braces' into 'master'
Add curly braces using uncrustify and then reformat with clang-format back

Closes #46

See merge request isc-projects/bind9!3057

(cherry picked from commit 67b68e06ad)

36c6105e Use coccinelle to add braces to nested single line statement
d14bb713 Add copy of run-clang-tidy that can fixup the filepaths
056e133c Use clang-tidy to add curly braces around one-line statements
2020-02-13 21:28:35 +00:00
Ondřej Surý
c931d8e417 Merge branch '46-just-use-clang-format-to-reformat-sources' into 'master'
Reformat source code with clang-format

Closes #46

See merge request isc-projects/bind9!2156

(cherry picked from commit 7099e79a9b)

4c3b063e Import Linux kernel .clang-format with small modifications
f50b1e06 Use clang-format to reformat the source files
11341c76 Update the definition files for Windows
df6c1f76 Remove tkey_test (which is no-op anyway)
2020-02-12 14:51:18 +00:00
Ondřej Surý
bc1d4c9cb4 Clear the pointer to destroyed object early using the semantic patch
Also disable the semantic patch as the code needs tweaks here and there because
some destroy functions might not destroy the object and return early if the
object is still in use.
2020-02-09 18:00:17 -08:00
Witold Kręcicki
d708370db4 Fix atomics usage for mutexatomics 2020-02-08 12:34:19 -08:00
Ondřej Surý
a9bd6f6ea6 Fix comparison between type uint16_t and wider type size_t in a loop
Found by LGTM.com (see below for description), and while it should not
happen as EDNS OPT RDLEN is uint16_t, the fix is easy.  A little bit
of cleanup is included too.

> In a loop condition, comparison of a value of a narrow type with a value
> of a wide type may result in unexpected behavior if the wider value is
> sufficiently large (or small). This is because the narrower value may
> overflow. This can lead to an infinite loop.
2020-02-05 01:41:13 +00:00
Ondřej Surý
7dfc092f06 Use C11 atomics for nfctx, kill unused dns_resolver_nrunning() 2020-01-14 13:12:13 +01:00
Mark Andrews
62abb6aa82 make resolver->zspill atomic to prevent potential deadlock 2019-12-12 08:26:59 +00:00
Mark Andrews
13aaeaa06f Note bucket lock requirements and move REQUIRE inside locked section. 2019-12-10 22:16:15 +00:00
Mark Andrews
5589748eca lock access to fctx->nqueries 2019-12-10 22:16:15 +00:00
Mark Andrews
912ce87479 Make fctx->attributes atomic.
FCTX_ATTR_SHUTTINGDOWN needs to be set and tested while holding the node
lock but the rest of the attributes don't as they are task locked. Making
fctx->attributes atomic allows both behaviours without races.
2019-12-03 08:58:53 +11:00
Mark Andrews
9ca6ad6311 Assign fctx->client when fctx is created rather when the join happens.
This prevents races on fctx->client whenever a new fetch joins a existing
fetch (by calling fctx_join) as it is now invariant for the active life of
fctx.
2019-12-02 06:01:46 +00:00
Ondřej Surý
edd97cddc1 Refactor dns_name_dup() usage using the semantic patch 2019-11-29 14:00:37 +01:00
Ondřej Surý
a5189eefa5 lib/dns/resolver.c: Call dns_adb_endudpfetch() only for UDP queries
The dns_adb_beginudpfetch() is called only for UDP queries, but
the dns_adb_endudpfetch() is called for all queries, including
TCP.  This messages the quota counting in adb.c.
2019-11-19 02:53:56 +08:00
Michał Kępień
fce3c93ea2 Prevent TCP failures from affecting EDNS stats
EDNS mechanisms only apply to DNS over UDP.  Thus, errors encountered
while sending DNS queries over TCP must not influence EDNS timeout
statistics.
2019-10-31 09:54:05 +01:00
Michał Kępień
6cd115994e Prevent query loops for misbehaving servers
If a TCP connection fails while attempting to send a query to a server,
the fetch context will be restarted without marking the target server as
a bad one.  If this happens for a server which:

  - was already marked with the DNS_FETCHOPT_EDNS512 flag,
  - responds to EDNS queries with the UDP payload size set to 512 bytes,
  - does not send response packets larger than 512 bytes,

and the response for the query being sent is larger than 512 byes, then
named will pointlessly alternate between sending UDP queries with EDNS
UDP payload size set to 512 bytes (which are responded to with truncated
answers) and TCP connections until the fetch context retry limit is
reached.  Prevent such query loops by marking the server as bad for a
given fetch context if the advertised EDNS UDP payload size for that
server gets reduced to 512 bytes and it is impossible to reach it using
TCP.
2019-10-31 08:48:35 +01:00
Mark Andrews
622bef6aec reset fctx->qmindcname and fctx->qminname after processing a delegation 2019-10-01 22:09:04 -07:00
Evan Hunt
488cb4da10 SERVFAIL if a prior qmin fetch has not been canceled when a new one starts 2019-10-01 20:41:53 -07:00
Ondřej Surý
c2dad0dcb2 Replace RUNTIME_CHECK(dns_name_copy(..., NULL)) with dns_name_copynf()
Use the semantic patch from the previous commit to replace all the calls to
dns_name_copy() with NULL as third argument with dns_name_copynf().
2019-10-01 10:43:26 +10:00
Ondřej Surý
5efa29e03a The final round of adding RUNTIME_CHECK() around dns_name_copy() calls
This commit was done by hand to add the RUNTIME_CHECK() around stray
dns_name_copy() calls with NULL as third argument.  This covers the edge cases
that doesn't make sense to write a semantic patch since the usage pattern was
unique or almost unique.
2019-10-01 10:43:26 +10:00
Ondřej Surý
89b269b0d2 Add RUNTIME_CHECK() around result = dns_name_copy(..., NULL) calls
This second commit uses second semantic patch to replace the calls to
dns_name_copy() with NULL as third argument where the result was stored in a
isc_result_t variable.  As the dns_name_copy(..., NULL) cannot fail gracefully
when the third argument is NULL, it was just a bunch of dead code.

Couple of manual tweaks (removing dead labels and unused variables) were
manually applied on top of the semantic patch.
2019-10-01 10:43:26 +10:00
Ondřej Surý
35bd7e4da0 Add RUNTIME_CHECK() around plain dns_name_copy(..., NULL) calls using spatch
This commit add RUNTIME_CHECK() around all simple dns_name_copy() calls where
the third argument is NULL using the semantic patch from the previous commit.
2019-10-01 10:43:26 +10:00