Commit Graph

657 Commits

Author SHA1 Message Date
Evan Hunt
bfbc6a6c84 make "max_restarts" a configurable value
MAX_RESTARTS is no longer hard-coded; ns_server_setmaxrestarts()
and dns_client_setmaxrestarts() can now be used to modify the
max-restarts value at runtime. in both cases, the default is 11.

(cherry picked from commit c5588babaf)
2024-08-07 15:36:15 -07:00
Evan Hunt
dd88a4cdfc reduce MAX_RESTARTS to 11
the number of steps that can be followed in a CNAME chain
before terminating the lookup has been reduced from 16 to 11.
(this is a hard-coded value, but will be made configurable later.)

(cherry picked from commit 05d78671bb)
2024-08-07 15:36:14 -07:00
Aram Sargsyan
946931ccb7 Return SERVFAIL for a too long CNAME chain
Due to the maximum query restart limitation a long CNAME chain
it is cut after 16 queries but named still returns NOERROR.

Return SERVFAIL instead and the partial answer.

(cherry picked from commit b621f1d88e)
2024-07-31 15:14:43 +00:00
Ondřej Surý
e31190e704 Reset the TCP connection on a failed send
When sending fails, the ns__client_request() would not reset the
connection and continue as nothing is happening.  This comes from the
model that we don't care about failed UDP sends because datagrams are
unreliable anyway, but it greatly affects TCP connections with
keep-alive.

The worst case scenario is as follows:

1. the 3-way TCP handshake gets completed
2. the libuv calls the "uv_connection_cb" callback
3. the TCP connection gets queue because of the tcp-clients quota
4. the TCP client sends as many DNS messages as the buffers allow
5. the TCP connection gets dropped by the client due to the timeout
6. the TCP connection gets accepted by the server
7. the data already sent by the client gets read
8. all sending fails immediately because the TCP connection is dead
9. we consume all the data in the buffer in a very tight loop

As it doesn't make sense to trying to process more data on the TCP
connection when the sending is failing, drop the connection immediately
on the first sending error.

(cherry picked from commit bf9fd2a6ff)
2024-07-03 09:10:30 +02:00
Mark Andrews
9cfd20cd90 Clear qctx->zversion
Clear qctx->zversion when clearing qctx->zrdataset et al in
lib/ns/query.c:qctx_freedata.  The uncleared pointer could lead to
an assertion failure if zone data needed to be re-saved which could
happen with stale data support enabled.

(cherry picked from commit 179fb3532ab8d4898ab070b2db54c0ce872ef709)
2024-06-10 19:20:06 +02:00
Petr Špaček
bef3d2cca3 Remove support for SIG(0) message verification 2024-06-10 19:02:49 +02:00
Matthijs Mekking
7bb36ae56e Log error when update fails
The new "too many records" error can make an update fail without the
error being logged. This commit fixes that.

(cherry picked from commit 558923e5405894cf976d102f0d246a28bdbb400c)
2024-06-10 18:51:28 +02:00
Ondřej Surý
0b1d70ed2a Remove the extra memory context with own arena for sending
(cherry picked from commit 8d4cc41c291f8a77a723ae8e62533538b3632d50)
2024-06-10 18:43:46 +02:00
Ondřej Surý
3f6b7f57a6 Replace the tcp_buffers memory pool with static per-loop buffer
As a single thread can process only one TCP send at the time, we don't
really need a memory pool for the TCP buffers, but it's enough to have
a single per-loop (client manager) static buffer that's being used to
assemble the DNS message and then it gets copied into own sending
buffer.

In the future, this should get optimized by exposing the uv_try API
from the network manager, and first try to send the message directly
and allocate the sending buffer only if we need to send the data
asynchronously.

(cherry picked from commit 297cc840fbaf34b9dfa1d02d88a023cd5bf5dc4a)
2024-06-10 18:43:46 +02:00
Aram Sargsyan
4e70342142 ns_client: reuse TCP send buffers
Constantly allocating, reallocating and deallocating 64K TCP send
buffers by 'ns_client' instances takes too much CPU time.

There is an existing mechanism to reuse the ns_clent_t structure
associated with the handle using 'isc_nmhandle_getdata/_setdata'
(see ns_client_request()), but it doesn't work with TCP, because
every time ns_client_request() is called it gets a new handle even
for the same TCP connection, see the comments in
streamdns_on_complete_dnsmessage().

To solve the problem, we introduce an array of available (unused)
TCP buffers stored in ns_clientmgr_t structure so that a 'client'
working via TCP can have a chance to reuse one (if there is one)
instead of allocating a new one every time.
2024-06-10 18:43:45 +02:00
Matthijs Mekking
4ef23ad0ff RPZ response's SOA record is incorrectly set to 1
An RPZ response's SOA record TTL is set to 1 instead of the SOA TTL,
a boolean value is passed on to query_addsoa, which is supposed to be
a TTL value. I don't see what value is appropriate to be used for
overriding, so we will pass UINT32_MAX.

(cherry picked from commit 5d7e613e81)
2024-05-06 12:18:08 +02:00
Mark Andrews
7498db6366 Don't use static stub when returning best NS
If we find a static stub zone in query_addbestns look for a parent
zone which isn't a static stub.

(cherry picked from commit 40816e4e35)
2024-03-14 15:33:25 +11:00
Michał Kępień
4ad3c694f1 Merge tag 'v9.18.24' into bind-9.18
BIND 9.18.24
2024-02-14 13:35:19 +01:00
Aram Sargsyan
cbc0357881 Improve the definition of the DNS_GETDB_* flags
Use the (1 << N) form for defining the flags, in order to avoid
errors like the one fixed in the previous commit.

Also convert the definitions to an enum, as done in some of our
recent refactoring work.

(cherry picked from commit 0d7c7777da)
2024-02-02 15:06:48 +00:00
Aram Sargsyan
2bcd6c2fd3 Fix the DNS_GETDB_STALEFIRST flag
The DNS_GETDB_STALEFIRST flag is defined as 0x0C, which is the
combination of the DNS_GETDB_PARTIAL (0x04) and the
DNS_GETDB_IGNOREACL (0x08) flags (0x04 | 0x08 == 0x0C) , which is
an obvious error.

All the flags should be power of two, so they don't interfere with
each other. Fix the DNS_GETDB_STALEFIRST flag by setting it to 0x10.

(cherry picked from commit be7d8fafe2)
2024-02-02 15:06:43 +00:00
Artem Boldariev
cff69c65b5 Fix flawed logic when detecting same listener type
The older version of the code was reporting that listeners are going
to be of the same type after reconfiguration when switching from DoT
to HTTPS listener, making BIND abort its executions.

That was happening due to the flaw in logic due to which the code
could consider a current listener and a configuration for the new one
to be of the same type (DoT) even when the new listener entry is
explicitly marked as HTTP.

The checks for PROXY in between the configuration were masking that
behaviour, but when porting it to 9.18 (when there is no PROXY
support), the behaviour was exposed.

Now the code mirrors the logic in 'interface_setup()' closely (as it
was meant to).

(cherry picked from commit 8ae661048d)
2024-01-15 14:31:06 +02:00
Artem Boldariev
2be0acf3f3 Recreate listeners on DNS transport change
This commit ensures that listeners are recreated on reconfiguration in
the case when their type changes (or when PROXY protocol type changes,
too).

Previously, if a "listen-on" statement was modified to represent a
different transport, BIND would not pick-up the change on
reconfiguration if listener type changes (e.g. DoH -> DoT) for a given
interface address and port combination. This commit fixes that by
recreating the listener.

Initially, that worked for most of the new transports as we would
recreate listeners on each reconfiguration for DoH and DoT. But at
some point we changed that in such a way that listeners were not
recreated to avoid rebinding a port as on some platforms only root can
do that for port numbers <1000, making some ports binding possible
only on start-up. We chose to asynchronously update listener socket
settings (like TLS contexts, HTTP settings) instead.

Now, we both avoid recreating the sockets if unnecessary and recreate
listeners when listener type changes.

(cherry picked from commit d59cf5e0ce)
2024-01-15 14:31:06 +02:00
Mark Andrews
9999eebbf7 Report the type being filtered from an UPDATE
When processing UPDATE request DNSKEY, CDNSKEY and CDS record that
are managed by named are filtered out.  The log message has been
updated to report the actual type rather that just DNSKEY.

(cherry picked from commit 2cf6cf967d)
2024-01-13 01:58:57 +11:00
Mark Andrews
4efcfa8f1c Apply filters to CDS and CDNSKEY records 2024-01-12 19:56:54 +11:00
Matthijs Mekking
0d36d98791 Add new dns_rdatatype_iskeymaterial() function
The following code block repeats quite often:

    if (rdata.type == dns_rdatatype_dnskey ||
        rdata.type == dns_rdatatype_cdnskey ||
        rdata.type == dns_rdatatype_cds)

Introduce a new function to reduce the repetition.

(cherry picked from commit ef58f2444f)
2024-01-12 19:56:54 +11:00
Mark Andrews
f7e137f321 Restore dns64 state during serve-stale processing
If we are in the process of looking for the A records as part of
dns64 processing and the server-stale timeout triggers, redo the
dns64 changes that had been made to the orignal qctx.

(cherry picked from commit 1fcc483df1)
2024-01-05 12:20:25 +01:00
Mark Andrews
b42b1fe051 Save the correct result value to resume with nxdomain-redirect
The wrong result value was being saved for resumption with
nxdomain-redirect when performing the fetch.  This lead to an assert
when checking that RFC 1918 reverse queries where not leaking to
the global internet.

(cherry picked from commit 9d0fa07c5e)
2024-01-05 12:03:59 +01:00
Mark Andrews
617f73426d Adjust comment to have correct message limit value
(cherry picked from commit 560c245971)
2023-11-16 12:22:08 +11:00
Ondřej Surý
6a85e79c0b Reformat sources with up-to-date clang-format-17 2023-11-13 17:13:07 +01:00
Matthijs Mekking
76c9019403 Don't ignore auth zones when in serve-stale mode
When serve-stale is enabled and recursive resolution fails, the fallback
to lookup stale data always happens in the cache database. Any
authoritative data is ignored, and only information learned through
recursive resolution is examined.

If there is data in the cache that could lead to an answer, and this can
be just the root delegation, the resolver will iterate further, getting
closer to the answer that can be found by recursing down the root, and
eventually puts the final response in the cache.

Change the fallback to serve-stale to use 'query_getdb()', that finds
out the best matching database for the given query.

(cherry picked from commit 2322425016)
2023-10-31 13:52:08 +01:00
Michal Nowak
7c6632e174 Update the source code formatting using clang-format-17 2023-10-18 09:02:57 +02:00
Aram Sargsyan
c061b90cc6 Remove unnecessary NULL-checks in ns__client_setup()
All these pointers are guaranteed to be non-NULL.

Additionally, update a comment to remove obviously outdated
information about the function's requirements.

(cherry picked from commit b970556f21)
2023-10-02 10:04:56 +00:00
Aram Sargsyan
2a57c12922 Remove an unnecessary NULL-check
In the ns__client_put_cb() callback function the 'client->manager'
pointer is guaranteed to be non-NULL, because in ns__client_request(),
before setting up the callback, the ns__client_setup() function is
called for the 'client', which makes sure that 'client->manager' is set.

Removing the NULL-check resolves the following static analyzer warning:

    /lib/ns/client.c: 1675 in ns__client_put_cb()
    1669     		dns_message_puttemprdataset(client->message, &client->opt);
    1670     	}
    1671     	client_extendederror_reset(client);
    1672
    1673     	dns_message_detach(&client->message);
    1674
    >>>     CID 465168:  Null pointer dereferences  (REVERSE_INULL)
    >>>     Null-checking "client->manager" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
    1675     	if (client->manager != NULL) {
    1676     		ns_clientmgr_detach(&client->manager);
    1677     	}
    1678
    1679     	/*
    1680     	 * Detaching the task must be done after unlinking from
2023-09-14 10:39:24 +00:00
Artem Boldariev
1cc17f797e Allocate DNS send buffers using dedicated per-worker memory arenas
This commit ensures that memory allocations related to DNS send
buffers are routed through dedicated per-worker memory arenas in order
to decrease memory usage on high load caused by TCP-based DNS
transports.

We do that by following jemalloc developers suggestions:

https://github.com/jemalloc/jemalloc/issues/2483#issuecomment-1639019699
https://github.com/jemalloc/jemalloc/issues/2483#issuecomment-1698173849
(cherry picked from commit 01cc7edcca)
2023-09-05 15:02:30 +02:00
Mark Andrews
58be5d8ed0 rr_exists should not error if the name does not exist
rr_exists errored if the name did not exist in the zone.  This was
not an issue prior to the addition of krb5-subdomain-self-rhs and
ms-subdomain-self-rhs as the only name used was the zone name which
always existed.

(cherry picked from commit b76a15977a)
2023-08-30 10:05:09 +10:00
Evan Hunt
07f6c63a80 prevent query_coveringnsec() from running twice
when synthesizing a new CNAME, we now check whether the target
matches the query already being processed. if so, we do not
restart the query; this prevents a waste of resources.

(cherry picked from commit 0ae8b2e056)
2023-08-21 14:31:10 -07:00
Mark Andrews
b3a97da7a7 Use NS rather than A records for qname-minimization relaxed
Remove all references to DNS_FETCHOPT_QMIN_USE_A and adjust
the expected tests results in the qmin system test.

(cherry picked from commit dd00b3c50b)
2023-06-28 12:31:49 +02:00
Matthijs Mekking
ff5bacf17c Fix serve-stale hang at shutdown
The 'refresh_rrset' variable is used to determine if we can detach from
the client. This can cause a hang on shutdown. To fix this, move setting
of the 'nodetach' variable up to where 'refresh_rrset' is set (in
query_lookup(), and thus not in ns_query_done()), and set it to false
when actually refreshing the RRset, so that when this lookup is
completed, the client will be detached.
2023-06-09 14:54:48 +02:00
Evan Hunt
240caa32b9 Stale answer lookups could loop when over recursion quota
When a query was aborted because of the recursion quota being exceeded,
but triggered a stale answer response and a stale data refresh query,
it could cause named to loop back where we are iterating and following
a delegation. Having no good answer in cache, we would fall back to
using serve-stale again, use the stale data, try to refresh the RRset,
and loop back again, without ever terminating until crashing due to
stack overflow.

This happens because in the functions 'query_notfound()' and
'query_delegation_recurse()', we check whether we can fall back to
serving stale data. We shouldn't do so if we are already refreshing
an RRset due to having prioritized stale data in cache.

In other words, we need to add an extra check to 'query_usestale()' to
disallow serving stale data if we are currently refreshing a stale
RRset.

As an additional mitigation to prevent looping, we now use the result
code ISC_R_ALREADYRUNNING rather than ISC_R_FAILURE when a recursion
loop is encountered, and we check for that condition in
'query_usestale()' as well.
2023-06-09 14:54:48 +02:00
Artem Boldariev
285e75b3b0 Use appropriately sized send buffers for DNS messages over TCP
This commit changes send buffers allocation strategy for stream based
transports. Before that change we would allocate a dynamic buffers
sized at 64Kb even when we do not need that much. That could lead to
high memory usage on server. Now we resize the send buffer to match
the size of the actual data, freeing the memory at the end of the
buffer for being reused later.

(cherry picked from commit d8a5feb556)
2023-06-06 14:04:01 +02:00
Matthijs Mekking
b90bad93cb Fix serve-stale bug when cache has no data
We recently fixed a bug where in some cases (when following an
expired CNAME for example), named could return SERVFAIL if the target
record is still valid (see isc-projects/bind9#3678, and
isc-projects/bind9!7096). We fixed this by considering non-stale
RRsets as well during the stale lookup.

However, this triggered a new bug because despite the answer from
cache not being stale, the lookup may be triggered by serve-stale.
If the answer from database is not stale, the fix in
isc-projects/bind9!7096 erroneously skips the serve-stale logic.

Add 'answer_found' checks to the serve-stale logic to fix this issue.

(cherry picked from commit bbd163acf6)
2023-05-30 13:46:00 +02:00
Aram Sargsyan
305bf677ab Implement new -T options for xfer system tests
'-T transferinsecs' makes named interpret the max-transfer-time-out,
max-transfer-idle-out, max-transfer-time-in and max-transfer-idle-in
configuration options as seconds instead of minutes.

'-T transferslowly' makes named to sleep for one second for every
xfrout message.

'-T transferstuck' makes named to sleep for one minute for every
xfrout message.

(cherry picked from commit dfaecfd752)
2023-04-21 17:21:32 +02:00
Evan Hunt
61692942b8 remove named_os_gethostname()
this function was just a front-end for gethostname(). it was
needed when we supported windows, which has a different function
for looking up the hostname; it's not needed any longer.

(cherry picked from commit 197334464e)
2023-02-18 12:27:19 -08:00
Evan Hunt
9f1c6d9744 refactor dns_clientinfo_init(); use separate function to set ECS
Instead of using an extra rarely-used paramater to dns_clientinfo_init()
to set ECS information for a client, this commit adds a function
dns_clientinfo_setecs() which can be called only when ECS is needed.

(cherry picked from commit ff3fdaa424)
2023-02-08 00:13:12 -08:00
Michał Kępień
8b4dcc27ef Merge tag 'v9_18_11' into v9_18
BIND 9.18.11
2023-01-25 21:26:22 +01:00
Aram Sargsyan
8f209c7dcf Refactor isc_nm_xfr_allowed()
Return 'isc_result_t' type value instead of 'bool' to indicate
the actual failure. Rename the function to something not suggesting
a boolean type result. Make changes in the places where the API
function is being used to check for the result code instead of
a boolean value.

(cherry picked from commit 41dc48bfd7)
2023-01-19 12:20:10 +00:00
Aram Sargsyan
a4fc5e5158 Cancel all fetch events in dns_resolver_cancelfetch()
Although 'dns_fetch_t' fetch can have two associated events, one for
each of 'DNS_EVENT_FETCHDONE' and 'DNS_EVENT_TRYSTALE' types, the
dns_resolver_cancelfetch() function is designed in a way that it
expects only one existing event, which it must cancel, and when it
happens so that 'stale-answer-client-timeout' is enabled and there
are two events, only one of them is canceled, and it results in an
assertion in dns_resolver_destroyfetch(), when it finds a dangling
event.

Change the logic of dns_resolver_cancelfetch() function so that it
cancels both the events (if they exist), and in the right order.

(cherry picked from commit ec2098ca35)
2023-01-12 12:54:02 +01:00
Mark Andrews
38323f3b9f Move the mapping of SIG and RRSIG to ANY
dns_db_findext() asserts if RRSIG is passed to it and
query_lookup_stale() failed to map RRSIG to ANY to prevent this.  To
avoid cases like this in the future, move the mapping of SIG and RRSIG
to ANY for qctx->type to qctx_init().

(cherry picked from commit 56eae06418)
2023-01-12 12:27:28 +01:00
Evan Hunt
65d70ebd20 move update ACL and update-policy checks before quota
check allow-update, update-policy, and allow-update-forwarding before
consuming quota slots, so that unauthorized clients can't fill the
quota.

(this moves the access check before the prerequisite check, which
violates the precise wording of RFC 2136. however, RFC co-author Paul
Vixie has stated that the RFC is mistaken on this point; it should have
said that access checking must happen *no later than* the completion of
prerequisite checks, not that it must happen exactly then.)

(cherry picked from commit 964f559edb)
2023-01-12 12:02:35 +01:00
Evan Hunt
9f1ebd25f6 add an update quota
limit the number of simultaneous DNS UPDATE events that can be
processed by adding a quota for update and update forwarding.
this quota currently, arbitrarily, defaults to 100.

also add a statistics counter to record when the update quota
has been exceeded.

(cherry picked from commit 7c47254a14)
2023-01-12 12:02:35 +01:00
Matthijs Mekking
f481073110 Don't set EDE in ns_client_aclchecksilent
The ns_client_aclchecksilent is used to check multiple ACLs before
the decision is made that a query is denied. It is also used to
determine if recursion is available. In those cases we should not
set the extended DNS error "Prohibited".

(cherry picked from commit 798c8f57d4)
2023-01-10 10:02:14 +00:00
Evan Hunt
5fd93c66aa remove nonfunctional DSCP implementation
DSCP has not been fully working since the network manager was
introduced in 9.16, and has been completely broken since 9.18.
This seems to have caused very few difficulties for anyone,
so we have now marked it as obsolete and removed the
implementation.

To ensure that old config files don't fail, the code to parse
dscp key-value pairs is still present, but a warning is logged
that the feature is obsolete and should not be used. Nothing is
done with configured values, and there is no longer any
range checking.

(cherry picked from commit 916ea26ead)
2023-01-09 14:23:26 -08:00
Michał Kępień
90408617d7 Check for NULL before dereferencing qctx->rpz_st
Commit 9ffb4a7ba1 causes Clang Static
Analyzer to flag a potential NULL dereference in query_nxdomain():

    query.c:9394:26: warning: Dereference of null pointer [core.NullDereference]
            if (!qctx->nxrewrite || qctx->rpz_st->m.rpz->addsoa) {
                                    ^~~~~~~~~~~~~~~~~~~
    1 warning generated.

The warning above is for qctx->rpz_st potentially being a NULL pointer
when query_nxdomain() is called from query_resume().  This is a false
positive because none of the database lookup result codes currently
causing query_nxdomain() to be called (DNS_R_EMPTYWILD, DNS_R_NXDOMAIN)
can be returned by a database lookup following a recursive resolution
attempt.  Add a NULL check nevertheless in order to future-proof the
code and silence Clang Static Analyzer.

(cherry picked from commit 07592d1315)
(cherry picked from commit a4547a1093)
2023-01-09 14:26:02 +01:00
Matthijs Mekking
271bc20b1c Consider non-stale data when in serve-stale mode
With 'stale-answer-enable yes;' and 'stale-answer-client-timeout off;',
consider the following situation:

A CNAME record and its target record are in the cache, then the CNAME
record expires, but the target record is still valid.

When a new query for the CNAME record arrives, and the query fails,
the stale record is used, and then the query "restarts" to follow
the CNAME target. The problem is that the query's multiple stale
options (like DNS_DBFIND_STALEOK) are not reset, so 'query_lookup()'
treats the restarted query as a lookup following a failed lookup,
and returns a SERVFAIL answer when there is no stale data found in the
cache, even if there is valid non-stale data there available.

With this change, query_lookup() now considers non-stale data in the
cache in the first place, and returns it if it is available.

(cherry picked from commit 91a1a8efc5)
2023-01-09 14:26:02 +01:00
Artem Boldariev
5de938c6cf Fix TLS session resumption via IDs when Mutual TLS is used
This commit fixes TLS session resumption via session IDs when
client certificates are used. To do so it makes sure that session ID
contexts are set within server TLS contexts. See OpenSSL documentation
for 'SSL_CTX_set_session_id_context()', the "Warnings" section.

(cherry picked from commit 837fef78b1)
2022-12-14 18:32:26 +02:00