Commit Graph

14211 Commits

Author SHA1 Message Date
Mark Andrews
900efd613f Clear OpenSSL errors on SHA failures
(cherry picked from commit 247422c69f)
2023-09-01 13:45:34 +10:00
Mark Andrews
34a0bb146c Clear OpenSSL errors on engine errors
(cherry picked from commit 2ba62aebce)
2023-09-01 13:43:20 +10:00
Mark Andrews
aca6f3e82d Clear OpenSSL errors on EVP failures
(cherry picked from commit 4ea926934a)
2023-09-01 13:40:32 +10:00
Mark Andrews
b5b13771f2 Clear OpenSSL errors on EVP_PKEY_new failures
(cherry picked from commit 6df53cdb87)
2023-09-01 13:37:02 +10:00
Mark Andrews
4d37996b1a Clear OpenSSL errors on EC_KEY_get0_private_key failures
(cherry picked from commit 86b04368b0)
2023-09-01 13:34:14 +10:00
Mark Andrews
386e88d0e4 Clear OpenSSL errors on EVP_PKEY_get0_EC_KEY failures
(cherry picked from commit abd8c03592)
2023-09-01 13:30:30 +10:00
Mark Andrews
ababcd28c1 Clear OpenSSL errors on EVP_PKEY_get_bn_param failures
(cherry picked from commit d8a9adc821)
2023-09-01 13:24:02 +10:00
Mark Andrews
fb503aa275 Clear OpenSSL errors on EVP_MD_CTX_create failures
(cherry picked from commit 8529be30bb)
2023-09-01 13:13:59 +10:00
Mark Andrews
290896921d Clear OpenSSL errors on ECDSA_SIG_new failures
(cherry picked from commit eafcd41120)
2023-09-01 13:13:06 +10:00
Matthijs Mekking
6e078a79d5 After cache flush, restore serve-stale settings
When flushing the cache, we create a new cache database. The serve-stale
settings need to be restored after doing this. We already did this
for max-stale-ttl, but forgot to do this for stale-refresh-time.

(cherry picked from commit 3ae721db6c)
2023-08-31 11:13:08 +02:00
Mark Andrews
58be5d8ed0 rr_exists should not error if the name does not exist
rr_exists errored if the name did not exist in the zone.  This was
not an issue prior to the addition of krb5-subdomain-self-rhs and
ms-subdomain-self-rhs as the only name used was the zone name which
always existed.

(cherry picked from commit b76a15977a)
2023-08-30 10:05:09 +10:00
Mark Andrews
2282d5325a Only declare 'ex' if we will use it
Fixes
>>>     CID 464851:  Possible Control flow issues  (DEADCODE)
>>>     Execution cannot reach this statement: "BN_free(ex);".

Makes conditionals between declaring and use constistent. BN_free is
not needed as BIGNUM's returned by RSA_get0_key are not to be freed.
2023-08-29 22:05:27 +00:00
Tony Finch
525afc666a Parse statschannel Content-Length: more carefully
A negative or excessively large Content-Length could cause a crash
by making `INSIST(httpd->consume != 0)` fail.

(cherry picked from commit 26e10e8fb5)
2023-08-23 15:44:11 +02:00
Ondřej Surý
701eb26f97 Workaround faulty stdatomic.h header detection on Oracle Linux 7
Oracle Linux 7 sets __STDC_VERSION__ to 201112L, but doesn't define
__STDC_NO_ATOMICS__, so we try to include <stdatomic.h> without the
header present in the system.  Since we are already detecting the header
in the autoconf, use the HAVE_STDATOMIC_H for more reliable detecting
whether <stdatomic.h> header is present.
2023-08-22 14:23:05 +02:00
Evan Hunt
07f6c63a80 prevent query_coveringnsec() from running twice
when synthesizing a new CNAME, we now check whether the target
matches the query already being processed. if so, we do not
restart the query; this prevents a waste of resources.

(cherry picked from commit 0ae8b2e056)
2023-08-21 14:31:10 -07:00
Tony Finch
57069556eb Fix a stack buffer overflow in the statistics channel
A long timestamp in an If-Modified-Since header could overflow a
fixed-size buffer.

(cherry picked from commit b22c87ca61)
2023-08-14 13:07:47 +02:00
Matthijs Mekking
dab43f84dd Change default TTLsig to one week
Commit dc6dafdad1 allows larger TTL values
in zones that go insecure, and ignores the maximum zone TTL.

This means that if you use TTL values larger than 1 day in your zone,
your zone runs the risk of going bogus before it moves safely to
insecure.

Most resolvers by default cap the maximum TTL that they cache RRsets,
at one day (Unbound, Knot, PowerDNS) so that is fine. However, BIND 9's
default is one week.

Change the default TTLsig to one week, so that also for BIND 9
resolvers in the default cases responses for zones that are going
insecure will not be evaluated as bogus.

This change does mean that when unsigning your zone, it will take six
days longer to safely go insecure, regardless of what TTL values you
use in the zone.

(cherry picked from commit 32686beabc)
2023-08-02 12:19:25 +02:00
Evan Hunt
3cc1e5e12a deprecate "dialup" and "heartbeat-interval"
these options concentrate zone maintenance actions into
bursts for the benefit of servers with intermittent connections.
that's no longer something we really need to optimize.

(cherry picked from commit eeeccec67c)
2023-08-01 18:41:49 -07:00
Matthijs Mekking
a21407d062 Ignore max-zone-ttl on dnssec-policy insecure
Allow larger TTL values in zones that go insecure. This is necessary
because otherwise the zone will not be loaded due to the max-zone-ttl
of P1D that is part of the current insecure policy.

In the keymgr.c code, default back to P1D if the max-zone-ttl is set
to zero.

(cherry picked from commit dc6dafdad1)
2023-08-01 09:53:03 +02:00
Mark Andrews
b64aa2d7a2 Return REFUSED if GSSAPI is not configured
Return REFUSED if neither a keytab nor a gssapi credential is
configured to GSSAPI/TKEY requests.

(cherry picked from commit b5076014b9)
2023-07-29 05:46:32 +10:00
Mark Andrews
c36d41d39c Mark a primary as unreachable on timed out in xfin
When a primary server is not responding, mark it as temporarialy
unreachable.  This will prevent too many zones queuing up on a
unreachable server and allow the refresh process to move onto
the next primary sooner once it has been so marked.

(cherry picked from commit 621c117101)
2023-07-22 09:00:08 +10:00
Ondřej Surý
c2c2ec0c96 Don't process detach and close as priority netmgr events
The detach (and possibly close) netmgr events can cause additional
callbacks to be called when under exclusive mode.  The detach can
trigger next queued TCP query to be processed and close will call
configured close callback.

Move the detach and close netmgr events from the priority queue to the
normal queue as the detaching and closing the sockets can wait for the
exclusive mode to be over.
2023-07-20 18:37:48 +02:00
Aram Sargsyan
4fdb57a1f3 Add shutdown checks in dns_catz_dbupdate_callback()
When a zone database update callback is called, the 'catzs' object,
extracted from the callback argument, might be already shutting down,
in which case the 'catzs->zones' can be NULL and cause an assertion
failure when calling isc_ht_find().

Add an early return from the callback if 'catzs->shuttingdown' is true.

Also check the validity of 'catzs->zones' after locking 'catzs' in
case there is a race with dns_catz_shutdown_catzs() running in another
thread.

(cherry picked from commit 28bb419edc)
2023-07-06 11:27:45 +00:00
Ondřej Surý
26bb402b44 Run RPZ and catalog zones tasks in exclusive mode
All the heavy RPZ and CATZ work is already running with offloaded
threads, and running the remaining small functions in exclusive mode
offers more synchronization guaranties.

Move the update notify registration code from the offloaded
dns__catz_update_cb() function into dns__catz_done_cb().

After this change, it should be safe to remove the lock/unlock code
from the dns_catz_dbupdate_register() and dns_catz_dbupdate_unregister()
functions, as they were causing a benign TSAN lock-order-inversion
report.
2023-07-06 10:44:03 +00:00
Aram Sargsyan
c67ce97045 Fix a data race between the dns_zone and dns_catz modules
The dns_zone_catz_enable_db() and dns_zone_catz_disable_db()
functions can race with similar operations in the catz module
because there is no synchronization between the threads.

Add catz functions which use the view's catalog zones' lock
when registering/unregistering the database update notify callback,
and use those functions in the dns_zone module, instead of doing it
directly.

(cherry picked from commit 6f1f5fc307)
2023-07-06 10:44:03 +00:00
Evan Hunt
995b78ea4e clean up numbering of FETCHOPT and ADDRINFO flags
in the past there was overlap between the fields used
as resolver fetch options and ADB addrinfo flags. this has
mostly been eliminated; now we can clean up the rest of
it and remove some confusing comments.

(cherry picked from commit 0955cf1af5)
2023-07-04 11:58:09 -07:00
Tony Finch
1ddf2b87f5 Improve statschannel HTTP Connection: header protocol conformance
In HTTP/1.0 and HTTP/1.1, RFC 9112 section 9.6 says the last response
in a connection should include a `Connection: close` header, but the
statschannel server omitted it.

In an HTTP/1.0 response, the statschannel server can sometimes send a
`Connection: keep-alive` header when it is about to close the
connection. There are two ways:

If the first request on a connection is keep-alive and the second
request is not, then _both_ responses have `Connection: keep-alive`
but the connection is (correctly) closed after the second response.

If a single request contains

	Connection: close
	Connection: keep-alive

then RFC 9112 section 9.3 says the keep-alive header is ignored, but
the statschannel sends a spurious keep-alive in its response, though
it correctly closes the connection.

To fix these bugs, make it more clear that the `httpd->flags` are part
of the per-request-response state. The Connection: flags are now
described in terms of the effect they have instead of what causes them
to be set.

(manually picked from commit e18ca83a3b)
2023-07-04 14:53:08 +02:00
Mark Andrews
c73876fa90 Emit deprecated warning for K* file pairs
We try reading the same file using different methods so only
emit a warning if we successfully read the file.

(cherry picked from commit e3e20ed76e)
2023-06-29 10:52:48 +10:00
Mark Andrews
2376abc18e Restore the ability to read legacy K*+157+* files
The ability to read legacy HMAC-MD5 K* keyfile pairs using algorithm
number 157 was accidentally lost when the algorithm numbers were
consolidated into a single block, in commit
09f7e0607a.

The assumption was that these algorithm numbers were only known
internally, but they were also used in key files. But since HMAC-MD5
got renumbered from 157 to 160, legacy HMAC-MD5 key files no longer
work.

Move HMAC-MD5 back to 157 and GSSAPI back to 160.  Add exception for
GSSAPI to list_hmac_algorithms.

(cherry picked from commit 3f93d3f757)
2023-06-29 10:32:10 +10:00
Mark Andrews
5739b4817a In rctx_answer return DNS_R_DELEGATION on NOFOLLOW
When DNS_FETCHOPT_NOFOLLOW is set DNS_R_DELEGATION needs to be
returned to restart the resolution process rather than converting
it to ISC_R_SUCCESS.

(cherry picked from commit ea11650376)
2023-06-28 12:32:26 +02:00
Mark Andrews
7f2eeb60ee Skip some QNAME mininisation queries if possible
If we know that the NS RRset for an intermediate label doesn't exist
on cache contents don't query using that name when looking for a
referral.

(cherry picked from commit 80bc0ee075)
2023-06-28 12:32:23 +02:00
Mark Andrews
b3a97da7a7 Use NS rather than A records for qname-minimization relaxed
Remove all references to DNS_FETCHOPT_QMIN_USE_A and adjust
the expected tests results in the qmin system test.

(cherry picked from commit dd00b3c50b)
2023-06-28 12:31:49 +02:00
Mark Andrews
0d3693f08f Remove unnecessary REQUIRE in dns_resolver_attach
There is no harm in aquiring an additional reference to the resolver
after it has started shutting down.  All the REQUIRE was doing was
introducing a point of failure when shutting down the server.
2023-06-27 05:19:56 +00:00
Mark Andrews
e7e29278a8 Handle FORMERR on unknown EDNS option that are echoed
If the resolver received a FORMERR response to a request with
an DNS COOKIE option present that echoes the option back, resend
the request without an DNS COOKIE option present.

(cherry picked from commit f3b24ba789)
2023-06-26 16:36:11 +02:00
Michal Nowak
d527fca768 Merge tag 'v9.18.16' into bind-9.18
BIND 9.18.16
2023-06-21 19:51:22 +02:00
Aram Sargsyan
0c751ce72e Update the event loop's time after executing a task
Tasks can block for a long time, especially when used by tools in
interactive mode. Update the event loop's time to avoid unexpected
errors when processing later events during the same callback.
For example, newly started timers can fire too early, because the
current time was stale. See the note about uv_update_time() in the
https://docs.libuv.org/en/v1.x/timer.html#c.uv_timer_start page.
2023-06-20 10:21:54 +00:00
Ondřej Surý
be0f38553e Make isc_result tables smaller
The isc_result_t enum was to sparse when each library code would skip to
next << 16 as a base.  Remove the huge holes in the isc_result_t enum to
make the isc_result tables more compact.

This change required a rewrite how we map dns_rcode_t to isc_result_t
and back, so we don't ever return neither isc_result_t value nor
dns_rcode_t out of defined range.

(cherry picked from commit a8e6c3b8f7)
2023-06-15 16:27:17 +02:00
Ondřej Surý
a29de517fa Refactor how we map isc_result_t <-> dns_rcode_t
The mapping functions between isc_result_t and dns_rcode_t could return
both isc_result_t values not defined in the header and dns_rcode_t
values not defined in the header because it blindly maps anything
withing full 12-bits defined for RCODEs to isc_result_t and back.

Refactor the dns_result_{from,to}rcode() functions to always return
valid isc_result_t and dns_rcode_t values by explicitly mapping the
values to each other and returning DNS_R_SERVFAIL (dns_rcode_servfail)
when encountering value out of the defined range.

(cherry picked from commit b53d1d7069)
2023-06-15 16:27:17 +02:00
Midnight Veil
5172f4c32a Translate POSIX errorcode EROFS to ISC_R_NOPERM
Report "permission denied" instead of "unexpected error"
when trying to update a zone file on a read-only file system.

(cherry picked from commit dd6acc1cac)
2023-06-14 13:48:25 +01:00
Aram Sargsyan
6154eab679 Fix catz db update callback registration logic error
When a catalog zone is updated using AXFR, the zone database is changed,
so it is required to unregister the update notification callback from
the old database, and register it for the new one.

Currently, here is the order of the steps happening in such scenario:

1. The zone.c:zone_startload() function registers the notify callback
   on the new database using dns_zone_catz_enable_db()
2. The callback, when called, notices that the new 'db' is different
   than 'catz->db', and unregisters the old callback for 'catz->db',
   marks that it's unregistered by setting 'catz->db_registered' to
   false, then it schedules an update if it isn't already scheduled.
3. The offloaded update process, after completing its job, notices that
   'catz->db_registered' is false, and (re)registers the update callback
   for the current database it is working on. There is no harm here even
   if it was registered also on step 1, and we can't skip it, because
   this function can also be called "artificially" during a
   reconfiguration, and in that case the registration step is required
   here.

A problem arises when before step 1 an update process was already
in a running state, operating on the old database, and finishing its
work only after step 2. As described in step 3, dns__catz_update_cb()
notices that 'catz->db_registered' is false and registers the callback
on the current database it is working on, which, at that state, is
already obsolete and unused by the zone. When it detaches the database,
the function which is responsible for its cleanup (e.g. free_rbtdb())
asserts because there is a registered update notify callback there.

To fix the problem, instead of delaying the (re)registration to step 3,
make sure that the new callback is registered and 'catz->db_registered'
is accordingly marked on step 2.

(cherry picked from commit 998765fea5)
2023-06-14 09:24:41 +00:00
Matthijs Mekking
ff5bacf17c Fix serve-stale hang at shutdown
The 'refresh_rrset' variable is used to determine if we can detach from
the client. This can cause a hang on shutdown. To fix this, move setting
of the 'nodetach' variable up to where 'refresh_rrset' is set (in
query_lookup(), and thus not in ns_query_done()), and set it to false
when actually refreshing the RRset, so that when this lookup is
completed, the client will be detached.
2023-06-09 14:54:48 +02:00
Evan Hunt
240caa32b9 Stale answer lookups could loop when over recursion quota
When a query was aborted because of the recursion quota being exceeded,
but triggered a stale answer response and a stale data refresh query,
it could cause named to loop back where we are iterating and following
a delegation. Having no good answer in cache, we would fall back to
using serve-stale again, use the stale data, try to refresh the RRset,
and loop back again, without ever terminating until crashing due to
stack overflow.

This happens because in the functions 'query_notfound()' and
'query_delegation_recurse()', we check whether we can fall back to
serving stale data. We shouldn't do so if we are already refreshing
an RRset due to having prioritized stale data in cache.

In other words, we need to add an extra check to 'query_usestale()' to
disallow serving stale data if we are currently refreshing a stale
RRset.

As an additional mitigation to prevent looping, we now use the result
code ISC_R_ALREADYRUNNING rather than ISC_R_FAILURE when a recursion
loop is encountered, and we check for that condition in
'query_usestale()' as well.
2023-06-09 14:54:48 +02:00
Ondřej Surý
e9d5219fca Improve RBT overmem cache cleaning
When cache memory usage is over the configured cache size (overmem) and
we are cleaning unused entries, it might not be enough to clean just two
entries if the entries to be expired are smaller than the newly added
rdata.  This could be abused by an attacker to cause a remote Denial of
Service by possibly running out of the operating system memory.

Currently, the addrdataset() tries to do a single TTL-based cleaning
considering the serve-stale TTL and then optionally moves to overmem
cleaning if we are in that condition.  Then the overmem_purge() tries to
do another single TTL based cleaning from the TTL heap and then continue
with LRU-based cleaning up to 2 entries cleaned.

Squash the TTL-cleaning mechanism into single call from addrdataset(),
but ignore the serve-stale TTL if we are currently overmem.

Then instead of having a fixed number of entries to clean, pass the size
of newly added rdatasetheader to the overmem_purge() function and
cleanup at least the size of the newly added data.  This prevents the
cache going over the configured memory limit (`max-cache-size`).

Additionally, refactor the overmem_purge() function to reduce for-loop
nesting for readability.
2023-06-08 11:43:18 +02:00
Aram Sargsyan
d91edda639 Fix a clients-per-query miscalculation bug
The number of clients per query is calculated using the pending
fetch responses in the list. The dns_resolver_createfetch() function
includes every item in the list when deciding whether the limit is
reached (i.e. fctx->spilled is true). Then, when the limit is reached,
there is another calculation in fctx_sendevents(), when deciding
whether it is needed to increase the limit, but this time the TRYSTALE
responses are not included in the calculation (because of early break
from the loop), and because of that the limit is never increased.

A single client can have more than one associated response/event in the
list (currently max. two), and calculating them as separate "clients"
is unexpected. E.g. if 'stale-answer-enable' is enabled and
'stale-answer-client-timeout' is enabled and is larger than 0, then
each client will have two events, which will effectively halve the
clients-per-query limit.

Fix the dns_resolver_createfetch() function to calculate only the
regular FETCHDONE responses/events.

Change the fctx_sendevents() function to also calculate only FETCHDONE
responses/events. Currently, this second change doesn't have any impact,
because the TRYSTALE events were already skipped, but having the same
condition in both places will help prevent similar bugs in the future
if a new type of response/event is ever added.

(cherry picked from commit 2ae5c4a674)
2023-06-06 12:45:00 +00:00
Artem Boldariev
285e75b3b0 Use appropriately sized send buffers for DNS messages over TCP
This commit changes send buffers allocation strategy for stream based
transports. Before that change we would allocate a dynamic buffers
sized at 64Kb even when we do not need that much. That could lead to
high memory usage on server. Now we resize the send buffer to match
the size of the actual data, freeing the memory at the end of the
buffer for being reused later.

(cherry picked from commit d8a5feb556)
2023-06-06 14:04:01 +02:00
Aram Sargsyan
db45cab546 Fix a lock-order-inversion bug in resolver.c
There is a lock-order-inversion (potential deadlock) in resolver.c,
because in dns_resolver_shutdown() a resolver bucket lock is locked
while the resolver lock itself is already locked, while in
fctx_sendevents() the resolver lock is locked while a bucket lock
is locked before calling that function in fctx__done_detach().

The resolver lock/unlock in dns_resolver_shutdown() was added back in
the 317e36d47e commit to make sure that
the function is finished before the resolver object is destroyed.

Since res->exiting is atomic, it should be possible to remove the
resolver locking in dns_resolver_shutdown() and add it to the
send_shutdown_events() function which requires it.

Also, since 'res->exiting' is now set while unlocked, the 'INSIST'
in spillattimer_countdown() is wrong, and is removed.
2023-06-06 11:02:24 +00:00
Aram Sargsyan
cd47429365 Add ClientQuota statistics channel counter
This counter indicates the number of the resolver's spilled
queries due to reaching the clients per query quota.

(cherry picked from commit 04648d7c2f)
2023-05-31 11:07:08 +00:00
Matthijs Mekking
b90bad93cb Fix serve-stale bug when cache has no data
We recently fixed a bug where in some cases (when following an
expired CNAME for example), named could return SERVFAIL if the target
record is still valid (see isc-projects/bind9#3678, and
isc-projects/bind9!7096). We fixed this by considering non-stale
RRsets as well during the stale lookup.

However, this triggered a new bug because despite the answer from
cache not being stale, the lookup may be triggered by serve-stale.
If the answer from database is not stale, the fix in
isc-projects/bind9!7096 erroneously skips the serve-stale logic.

Add 'answer_found' checks to the serve-stale logic to fix this issue.

(cherry picked from commit bbd163acf6)
2023-05-30 13:46:00 +02:00
Mark Andrews
27eb8ed20f Move isc_mem_put to after node is checked for equality
isc_mem_put NULL's the pointer to the memory being freed.  The
equality test 'parent->r == node' was accidentally being turned
into a test against NULL.

(cherry picked from commit ac2e0bc3ff)
2023-05-29 13:27:51 +10:00
Evan Hunt
88383aa158 mark 'tkey-dhkey' as deprecated
Diffie-Hellman TKEY mode has been removed for 9.20.
2023-05-28 00:55:34 -07:00