893 Commits

Author SHA1 Message Date
Ondřej Surý
fa7b7973e3 Limit the additional processing for large RDATA sets
When answering queries, don't add data to the additional section if
the answer has more than 13 names in the RDATA.  This limits the
number of lookups into the database(s) during a single client query,
reducing query processing load.

Also, don't append any additional data to type=ANY queries. The
answer to ANY is already big enough.

(cherry picked from commit a1982cf1bb)
2025-01-15 14:13:45 +01:00
Matthijs Mekking
413ba531f5 Remove unused maxquerycount
While implementing the global limit 'max-query-count', initially I
thought adding the variable to the resolver structure. But the limit
is per client request so it was moved to the view structure (and
counter in ns_query structure). However, I forgot to remove the
variable from the resolver structure again. This commit fixes that.

(cherry picked from commit 397ca34e34)
2024-12-06 15:19:01 +00:00
Matthijs Mekking
a0ce89bc15 Implement global limit for outgoing queries
This global limit is not reset on query restarts and is a hard limit
for any client request.

Note: This commit has been significantly modified because of many
merge conflicts due to the dns_resolver_createfetch api changes.

(cherry picked from commit 16b3bd1cc7)
2024-12-06 15:17:53 +00:00
Matthijs Mekking
5a806910a8 Implement 'max-query-count'
Add another option to configure how many outgoing queries per
client request is allowed. The existing 'max-recursion-queries' is
per restart, this one is a global limit.

(cherry picked from commit bbc16cc8e6)
2024-12-06 15:17:53 +00:00
Ondřej Surý
c5bac96fd0 Remove redundant parentheses from the return statement
(cherry picked from commit 0258850f20)
2024-11-19 16:06:16 +01:00
Mark Andrews
b1cf7997a7 Store static-stub addresses seperately in the adb
Static-stub address and addresses from other sources where being
mixed together resulting in static-stub queries going to addresses
not specified in the configuration or alternatively static-stub
addresses being used instead of the real addresses.

(cherry picked from commit b3a2c790f3)
2024-10-01 15:30:17 +10:00
Aram Sargsyan
ef344cbd5e Fix a 'serverquota' counter calculation bug
The 'all_spilled' local variable in resolver.c:fctx_getaddresses()
is 'true' by default, and only becomes false when there is at least
one successfully found NS address. However, when a 'forward only;'
configuration is used, the code jumps over the part where it looks
for NS addresses and doesn't reset the 'all_spilled' to false, which
results in incorretly increased 'serverquota' statistics variable,
and also in invalid return error code from the function. The result
code error didn't make any differences, because all codes other than
'ISC_R_SUCCESS' or 'DNS_R_WAIT' were treated in the same way, and
the result code was never logged anywhere.

Set the default value of 'all_spilled' to 'false', and only make it
'true' before actually starting to look up NS addresses.

(cherry picked from commit e430ce7039)
2024-09-18 01:25:01 +00:00
Evan Hunt
a11367ade3 reduce the max-recursion-queries default to 32
the number of iterative queries that can be sent to resolve a
name now defaults to 32 rather than 100.

(cherry picked from commit 7e3b425dc2)
2024-08-07 15:36:15 -07:00
Evan Hunt
14bce7e275 add debug logging when creating or attaching to a query counter
fctx_create() now logs at debug level 9 when the fctx attaches
to an existing counter or creates a new one.

(cherry picked from commit 825f3d68c5)
2024-08-07 15:36:14 -07:00
Evan Hunt
18e39d989f apply max-recursion-queries quota to validator queries
previously, validator queries for DNSKEY and DS records were
not counted toward the quota for max-recursion-queries; they
are now.

(cherry picked from commit af7db89513)
2024-08-07 15:36:09 -07:00
Evan Hunt
5ab4cae4ed attach query counter to NS fetches
there were cases in resolver.c when queries for NS records were
started without passing a pointer to the parent fetch's query counter;
as a result, the max-recursion-queries quota for those queries started
counting from zero, instead of sharing the limit for the parent fetch,
making the quota ineffective in some cases.

(cherry picked from commit d3b7e92783)
2024-08-07 14:51:44 -07:00
Evan Hunt
18d7be118f raise the log level of priming failures
when a priming query is complete, it's currently logged at
level ISC_LOG_INFO, regardless of success or failure. we
are now changing it to ISC_LOG_NOTICE in the case of failure
and ISC_LOG_DEBUG(1) in case of success.

(cherry picked from commit a84d54c6ff)
2024-08-05 15:31:38 +02:00
Mark Andrews
4704c28cab Cleanup old clang-format string splitting
(cherry picked from commit 6d1c7beb15)
2024-08-01 15:58:16 +10:00
Mark Andrews
7643a0322f Remove false positive qname minimisation error
Don't report qname minimisation NXDOMAIN errors when the result is
NXDOMAIN.

(cherry picked from commit f78beca942)
2024-08-01 15:58:16 +10:00
Mark Andrews
6455527830 Clear DNS_FETCHOPT_TRYSTALE_ONTIMEOUT
When calling dns_resolver_createfetch in resolver.c with a callback
of resume_dslookup, clear DNS_FETCHOPT_TRYSTALE_ONTIMEOUT from
options as DNS_EVENT_TRYSTALE is not an expected event type and
triggers a REQUIRE.
2024-06-06 07:48:49 +02:00
Mark Andrews
d3f708ba56 Update resquery_senddone handling of ISC_R_TIMEDOUT
Treat timed out as an address specific error.

(cherry picked from commit 56c3dcc5d7)
2024-06-04 07:38:40 +00:00
Mark Andrews
99d2b4079f Update resquery_senddone handling of ISC_R_CONNECTIONRESET
Treat connection reset as an address specific error.

(cherry picked from commit 4e3dd85b8d)
2024-06-04 07:38:40 +00:00
Mark Andrews
e87a5e7bff Handle ISC_R_HOSTDOWN and ISC_R_NETDOWN in resolver.c
These error codes should be treated like other unreachable error
codes.

(cherry picked from commit 180b1e7939)
2024-06-04 07:38:40 +00:00
Ondřej Surý
1b3b0cef22 Split fast and slow task queues
Change the taskmgr (and thus netmgr) in a way that it supports fast and
slow task queues.  The fast queue is used for incoming DNS traffic and
it will pass the processing to the slow queue for sending outgoing DNS
messages and processing resolver messages.

In the future, more tasks might get moved to the slow queues, so the
cached and authoritative DNS traffic can be handled without being slowed
down by operations that take longer time to process.
2024-02-01 21:47:29 +01:00
Aram Sargsyan
92e5173a9f Don't use an uninitialized link on an error path
Move the block on the error path, where the link is checked, to a place
where it makes sense, to avoid accessing an unitialized link when
jumping to the 'cleanup_query' label from 4 different places. The link
is initialized only after those jumps happen.

In addition, initilize the link when creating the object, to avoid
similar errors.

(cherry picked from commit fb7bbbd1be)
2023-09-28 10:30:42 +00:00
Evan Hunt
995b78ea4e clean up numbering of FETCHOPT and ADDRINFO flags
in the past there was overlap between the fields used
as resolver fetch options and ADB addrinfo flags. this has
mostly been eliminated; now we can clean up the rest of
it and remove some confusing comments.

(cherry picked from commit 0955cf1af5)
2023-07-04 11:58:09 -07:00
Mark Andrews
5739b4817a In rctx_answer return DNS_R_DELEGATION on NOFOLLOW
When DNS_FETCHOPT_NOFOLLOW is set DNS_R_DELEGATION needs to be
returned to restart the resolution process rather than converting
it to ISC_R_SUCCESS.

(cherry picked from commit ea11650376)
2023-06-28 12:32:26 +02:00
Mark Andrews
7f2eeb60ee Skip some QNAME mininisation queries if possible
If we know that the NS RRset for an intermediate label doesn't exist
on cache contents don't query using that name when looking for a
referral.

(cherry picked from commit 80bc0ee075)
2023-06-28 12:32:23 +02:00
Mark Andrews
b3a97da7a7 Use NS rather than A records for qname-minimization relaxed
Remove all references to DNS_FETCHOPT_QMIN_USE_A and adjust
the expected tests results in the qmin system test.

(cherry picked from commit dd00b3c50b)
2023-06-28 12:31:49 +02:00
Mark Andrews
0d3693f08f Remove unnecessary REQUIRE in dns_resolver_attach
There is no harm in aquiring an additional reference to the resolver
after it has started shutting down.  All the REQUIRE was doing was
introducing a point of failure when shutting down the server.
2023-06-27 05:19:56 +00:00
Mark Andrews
e7e29278a8 Handle FORMERR on unknown EDNS option that are echoed
If the resolver received a FORMERR response to a request with
an DNS COOKIE option present that echoes the option back, resend
the request without an DNS COOKIE option present.

(cherry picked from commit f3b24ba789)
2023-06-26 16:36:11 +02:00
Aram Sargsyan
d91edda639 Fix a clients-per-query miscalculation bug
The number of clients per query is calculated using the pending
fetch responses in the list. The dns_resolver_createfetch() function
includes every item in the list when deciding whether the limit is
reached (i.e. fctx->spilled is true). Then, when the limit is reached,
there is another calculation in fctx_sendevents(), when deciding
whether it is needed to increase the limit, but this time the TRYSTALE
responses are not included in the calculation (because of early break
from the loop), and because of that the limit is never increased.

A single client can have more than one associated response/event in the
list (currently max. two), and calculating them as separate "clients"
is unexpected. E.g. if 'stale-answer-enable' is enabled and
'stale-answer-client-timeout' is enabled and is larger than 0, then
each client will have two events, which will effectively halve the
clients-per-query limit.

Fix the dns_resolver_createfetch() function to calculate only the
regular FETCHDONE responses/events.

Change the fctx_sendevents() function to also calculate only FETCHDONE
responses/events. Currently, this second change doesn't have any impact,
because the TRYSTALE events were already skipped, but having the same
condition in both places will help prevent similar bugs in the future
if a new type of response/event is ever added.

(cherry picked from commit 2ae5c4a674)
2023-06-06 12:45:00 +00:00
Aram Sargsyan
db45cab546 Fix a lock-order-inversion bug in resolver.c
There is a lock-order-inversion (potential deadlock) in resolver.c,
because in dns_resolver_shutdown() a resolver bucket lock is locked
while the resolver lock itself is already locked, while in
fctx_sendevents() the resolver lock is locked while a bucket lock
is locked before calling that function in fctx__done_detach().

The resolver lock/unlock in dns_resolver_shutdown() was added back in
the 317e36d47e commit to make sure that
the function is finished before the resolver object is destroyed.

Since res->exiting is atomic, it should be possible to remove the
resolver locking in dns_resolver_shutdown() and add it to the
send_shutdown_events() function which requires it.

Also, since 'res->exiting' is now set while unlocked, the 'INSIST'
in spillattimer_countdown() is wrong, and is removed.
2023-06-06 11:02:24 +00:00
Aram Sargsyan
cd47429365 Add ClientQuota statistics channel counter
This counter indicates the number of the resolver's spilled
queries due to reaching the clients per query quota.

(cherry picked from commit 04648d7c2f)
2023-05-31 11:07:08 +00:00
Evan Hunt
960e4dadd2 check for invalid protocol when dispatch fails
treat ISC_R_INVALIDPROTO as a networking error when it occurs.

(cherry picked from commit 2269a3e6fb)
2023-04-21 12:47:07 +02:00
Ondřej Surý
4bf253ffe1 Properly handle ISC_R_SHUTTINGDOWN in resquery_response()
When resquery_response() was called with ISC_R_SHUTTINDOWN, the region
argument would be NULL, but rctx_respinit() would try to pass
region->base and region->len to the isc_buffer_init() leading to
a NULL pointer dereference.  Properly handle non-ISC_R_SUCCESS by
ignoring the provided region.

(cherry picked from commit 93259812dd)
2023-03-23 12:26:09 +01:00
Michał Kępień
8b4dcc27ef Merge tag 'v9_18_11' into v9_18
BIND 9.18.11
2023-01-25 21:26:22 +01:00
Ondřej Surý
e26aa4cbb1 Don't use reference counting in isc_timer unit
The reference counting and isc_timer_attach()/isc_timer_detach()
semantic are actually misleading because it cannot be used under normal
conditions.  The usual conditions under which is timer used uses the
object where timer is used as argument to the "timer" itself.  This
means that when the caller is using `isc_timer_detach()` it needs the
timer to stop and the isc_timer_detach() does that only if this would be
the last reference.  Unfortunately, this also means that if the timer is
attached elsewhere and the timer is fired it will most likely be
use-after-free, because the object used in the timer no longer exists.

Remove the reference counting from the isc_timer unit, remove
isc_timer_attach() function and rename isc_timer_detach() to
isc_timer_destroy() to better reflect how the API needs to be used.

The only caveat is that the already executed event must be destroyed
before the isc_timer_destroy() is called because the timer is no longet
attached to .ev_destroy_arg.

(cherry picked from commit ae01ec2823)
2023-01-18 22:39:26 +01:00
Aram Sargsyan
a4fc5e5158 Cancel all fetch events in dns_resolver_cancelfetch()
Although 'dns_fetch_t' fetch can have two associated events, one for
each of 'DNS_EVENT_FETCHDONE' and 'DNS_EVENT_TRYSTALE' types, the
dns_resolver_cancelfetch() function is designed in a way that it
expects only one existing event, which it must cancel, and when it
happens so that 'stale-answer-client-timeout' is enabled and there
are two events, only one of them is canceled, and it results in an
assertion in dns_resolver_destroyfetch(), when it finds a dangling
event.

Change the logic of dns_resolver_cancelfetch() function so that it
cancels both the events (if they exist), and in the right order.

(cherry picked from commit ec2098ca35)
2023-01-12 12:54:02 +01:00
Evan Hunt
5fd93c66aa remove nonfunctional DSCP implementation
DSCP has not been fully working since the network manager was
introduced in 9.16, and has been completely broken since 9.18.
This seems to have caused very few difficulties for anyone,
so we have now marked it as obsolete and removed the
implementation.

To ensure that old config files don't fail, the code to parse
dscp key-value pairs is still present, but a warning is logged
that the feature is obsolete and should not be used. Nothing is
done with configured values, and there is no longer any
range checking.

(cherry picked from commit 916ea26ead)
2023-01-09 14:23:26 -08:00
Ondřej Surý
d48f5e253f Don't cleanup uninitialized dns_resolver buckets
If the isc_task_create_bound() fails in the middle of buckets
initialization - the most common case would be shutdown initialized
during reload, not all tasks would be initialized, but the cleanup
code would try to cleanup all buckets.

Make sure that we cleanup only the initialized buckets by setting
ntasks to the number of already initialized tasks on the error path.
2023-01-03 10:33:23 +01:00
Aram Sargsyan
926f0323b6 Fix an ADB quota management error in the resolver
Normally, when a 'resquery_t' object is created in fctx_query(),
we call dns_adb_beginudpfetch() (which increases the ADB quota)
only if it's a UDP query. Then, in fctx_cancelquery(), we call
dns_adb_endudpfetch() to decreases back the ADB quota, again only
if it's a UDP query.

The problem is that a UDP query can become a TCP query, preventing
the quota from adjusting back in fctx_cancelquery() later.

Call dns_adb_beginudpfetch() also when switching the query type
from UDP to TCP.

(cherry picked from commit 53afe1f978)
2022-12-23 10:08:00 +00:00
Ondřej Surý
5cc12ab92c Fix the thread safety in the dns_dispatch unit
The dispatches are not thread-bound, and used freely between various
threads (see the dns_resolver and dns_request units for details).

This refactoring make sure that all non-const dns_dispatch_t and
dns_dispentry_t members are accessed under a lock, and both object now
track their internal state (NONE, CONNECTING, CONNECTED, CANCELED)
instead of guessing the state from the state of various struct members.

During the refactoring, the artificial limit DNS_DISPATCH_SOCKSQUOTA on
UDP sockets per dispatch was removed as the limiting needs to happen and
happens on in dns_resolver and limiting the number of UDP sockets
artificially in dispatch could lead to unpredictable behaviour in case
one dispatch has the limit exhausted by others are idle.

The TCP artificial limit of DNS_DISPATCH_MAXREQUESTS makes even less
sense as the TCP connections are only reused in the dns_request API
that's not a heavy user of the outgoing connections.

As a side note, the fact that UDP and TCP dispatch pretends to be same
thing, but in fact the connected UDP is handled from dns_dispentry_t and
dns_dispatch_t acts as a broker, but connected TCP is handled from
dns_dispatch_t and dns_dispatchmgr_t acts as a broker doesn't really
help the clarity of this unit.

This refactoring kept to API almost same - only dns_dispatch_cancel()
and dns_dispatch_done() were merged into dns_dispatch_done() as we need
to cancel active netmgr handles in any case to not leave dangling
connections around.  The functions handling UDP and TCP have been mostly
split to their matching counterparts and the dns_dispatch_<function>
functions are now thing wrappers that call <udp|tcp>_dispatch_<function>
based on the socket type.

More debugging-level logging was added to the unit to accomodate for
this fact.

(cherry picked from commit 6f317f27ea)
2022-12-21 12:41:15 +00:00
Ondřej Surý
095f634f48 Try next server on resolver timeout
Instead of resending to the same server on the (dispatch) timeout in the
resolver, try the next server.

(cherry picked from commit 5466a48fc9)
2022-12-16 18:37:22 +01:00
Michal Nowak
1d7d504338 Update sources to Clang 15 formatting 2022-11-29 09:14:07 +01:00
Tony Finch
303cdf8e27 Deduplicate time unit conversion factors
The various factors like NS_PER_MS are now defined in a single place
and the names are no longer inconsistent. I chose the _PER_SEC names
rather than _PER_S because it is slightly more clear in isolation;
but the smaller units are always NS, US, and MS.

(cherry picked from commit 00307fe318)
2022-11-25 14:16:09 +00:00
Mark Andrews
43e714336a Select the appropriate namespace when using a dual stack server
When using dual-stack-servers the covering namespace to check whether
answers are in scope or not should be fctx->domain.  To do this we need
to be able to distingish forwarding due to forwarders clauses and
dual-stack-servers.  A new flag FCTX_ADDRINFO_DUALSTACK has been added
to signal this.

(cherry picked from commit dfbffd77f9)
2022-11-17 13:05:12 +11:00
Aram Sargsyan
bb9cc81dd4 Match prefetch eligibility behavior with ARM
ARM states that the "eligibility" TTL is the smallest original TTL
value that is accepted for a record to be eligible for prefetching,
but the code, which implements the condition doesn't behave in that
manner for the edge case when the TTL is equal to the configured
eligibility value.

Fix the code to check that the TTL is greater than, or equal to the
configured eligibility value, instead of just greater than it.

(cherry picked from commit 863f51466e)
2022-10-21 10:22:29 +00:00
Aram Sargsyan
64feeba60f Call dns_adb_endudpfetch() on error path, if required
For UDP queries, after calling dns_adb_beginudpfetch() in fctx_query(),
make sure that dns_adb_endudpfetch() is also called on error path, in
order to adjust the quota back.

(cherry picked from commit 5da79e2be0)
2022-10-21 08:36:34 +00:00
Aram Sargsyan
a83a58467d Always call dns_adb_endudpfetch() in fctx_cancelquery() for UDP queries
It is currently possible that dns_adb_endudpfetch() is not
called in fctx_cancelquery() for a UDP query, which results
in quotas not being adjusted back.

Always call dns_adb_endudpfetch() for UDP queries.

(cherry picked from commit e4569373ca)
2022-10-21 08:36:34 +00:00
Aram Sargsyan
4a311b9bb4 Unlink the query under cleanup_query
In the cleanup code of fctx_query() function there is a code path
where 'query' is linked to 'fctx' and it is being destroyed.

Make sure that 'query' is unlinked before destroying it.

(cherry picked from commit ac889684c7)
2022-10-21 08:36:34 +00:00
Tony Finch
f273fdfc12 De-duplicate __FILE__, __LINE__
Mostly generated automatically with the following semantic patch,
except where coccinelle was confused by #ifdef in lib/isc/net.c

@@ expression list args; @@
- UNEXPECTED_ERROR(__FILE__, __LINE__, args)
+ UNEXPECTED_ERROR(args)
@@ expression list args; @@
- FATAL_ERROR(__FILE__, __LINE__, args)
+ FATAL_ERROR(args)

(cherry picked from commit ec50c58f52)
2022-10-17 16:00:26 +01:00
Michał Kępień
e2014ba9e3 Bound the amount of work performed for delegations
Limit the amount of database lookups that can be triggered in
fctx_getaddresses() (i.e. when determining the name server addresses to
query next) by setting a hard limit on the number of NS RRs processed
for any delegation encountered.  Without any limit in place, named can
be forced to perform large amounts of database lookups per each query
received, which severely impacts resolver performance.

The limit used (20) is an arbitrary value that is considered to be big
enough for any sane DNS delegation.

(cherry picked from commit 3a44097fd6)
2022-09-08 11:11:30 +02:00
Aram Sargsyan
47e4ef0696 Improve fetch limit logging
When initially hitting the `fetches-per-zone` value, a log message
is being generated for the event of dropping the first fetch, then
any further log events occur only when another fetch is being dropped
and 60 seconds have been passed since the last logged message.

That logic isn't ideal because when the counter of the outstanding
fetches reaches zero, the structure holding the counters' values will
get deleted, and the information about the dropped fetches accumulated
during the last minute will not be logged.

Improve the fcount_logspill() function to makie sure that the final
values are getting logged before the counter object gets destroyed.

(cherry picked from commit 039871ceb7)
2022-08-01 13:54:46 +00:00
Michał Kępień
b855c6b6c9 Stop resolving invalid names in resume_dslookup()
Commit 7b2ea97e46 introduced a logic bug
in resume_dslookup(): that function now only conditionally checks
whether DS chasing can still make progress.  Specifically, that check is
only performed when the previous resume_dslookup() call invokes
dns_resolver_createfetch() with the 'nameservers' argument set to
something else than NULL, which may not always be the case.  Failing to
perform that check may trigger assertion failures as a result of
dns_resolver_createfetch() attempting to resolve an invalid name.

Example scenario that leads to such outcome:

 1. A validating resolver is configured to forward all queries to
    another resolver.  The latter returns broken DS responses that
    trigger DS chasing.

 2. rctx_chaseds() calls dns_resolver_createfetch() with the
    'nameservers' argument set to NULL.

 3. The fetch fails, so resume_dslookup() is called.  Due to
    fevent->result being set to e.g. DNS_R_SERVFAIL, the default branch
    is taken in the switch statement.

 4. Since 'nameservers' was set to NULL for the fetch which caused the
    resume_dslookup() callback to be invoked
    (fctx->nsfetch->private->nameservers), resume_dslookup() chops off
    one label off fctx->nsname and calls dns_resolver_createfetch()
    again, for a name containing one label less than before.

 5. Steps 3-4 are repeated (i.e. all attempts to find the name servers
    authoritative for the DS RRset being chased fail) until fctx->nsname
    becomes stripped down the the root name.

 6. Since resume_dslookup() does not check whether DS chasing can still
    make progress, it strips off a label off the root name and continues
    its attempts at finding the name servers authoritative for the DS
    RRset being chased, passing an invalid name to
    dns_resolver_createfetch().

Fix by ensuring resume_dslookup() always checks whether DS chasing can
still make progress when a name server fetch fails.  Update code
comments to ensure the purpose of the relevant dns_name_equal() check is
clear.

(cherry picked from commit 1a79aeab44)
2022-07-13 11:00:32 +02:00