4973 Commits

Author SHA1 Message Date
Mark Andrews
cbf416a284 Call isc__iterated_hash_initialize
The iterated hash implementation needs to be initialised
on the worker thread.  Also clean it up after we are done.

(cherry picked from commit 988dc57c8c)
2025-03-04 13:49:38 +00:00
Artem Boldariev
9977c7e5fa DoH: Bump the active streams processing limit
This commit bumps the total number of active streams (= the opened
streams for which a request is received, but response is not ready) to
60% of the total streams limit.

The previous limit turned out to be too tight as revealed by
longer (≥1h) runs of "stress:long:rpz:doh+udp:linux:*" tests.

(cherry picked from commit eaad0aefe6)
2025-03-03 10:12:27 +00:00
Artem Boldariev
b1ca1b3abc DoH: remove obsolete INSIST() check
The check, while not active by default, is not valid since the commit
8b8f4d500d.

See 'if (total == 0) { ...' below branch to understand why.

(cherry picked from commit 217a1ebd79)
2025-03-03 10:12:27 +00:00
Artem Boldariev
0bc12d0deb DoH: Flush HTTP write buffer on an outgoing DNS message
Previously, the code would try to avoid sending any data regardless of
what it is unless:

a) The flush limit is reached;
b) There are no sends in flight.

This strategy is used to avoid too numerous send requests with little
amount of data. However, it has been proven to be too aggressive and,
in fact, harms performance in some cases (e.g., on longer (≥1h) runs
of "stress:long:rpz:doh+udp:linux:*").

Now, additionally to the listed cases, we also:

c) Flush the buffer and perform a send operation when there is an
outgoing DNS message passed to the code (which is indicated by the
presence of a send callback).

That helps improve performance for "stress:long:rpz:doh+udp:linux:*"
tests.

(cherry picked from commit c5f7968856)
2025-03-03 10:12:27 +00:00
Artem Boldariev
30226c749f DoH: Limit the number of delayed IO processing requests
Previously, a function for continuing IO processing on the next UV
tick was introduced (http_do_bio_async()). The intention behind this
function was to ensure that http_do_bio() is eventually called at
least once in the future. However, the current implementation allows
queueing multiple such delayed requests needlessly. There is currently
no need for these excessive requests as http_do_bio() can requeue them
if needed. At the same time, each such request can lead to a memory
allocation, particularly in BIND 9.18.

This commit ensures that the number of enqueued delayed IO processing
requests never exceeds one in order to avoid potentially bombarding IO
threads with the delayed requests needlessly.

(cherry picked from commit 0e1b02868a)
2025-03-03 10:12:27 +00:00
Artem Boldariev
515d84e1f6 DoH: Simplify http_do_bio()
This commit significantly simplifies the code flow in the
http_do_bio() function, which is responsible for processing incoming
and outgoing HTTP/2 data. It seems that the way it was structured
before was indirectly caused by the presence of the missing callback
calls bug, fixed in 8b8f4d500d.

The change introduced by this commit is known to remove a bottleneck
and allows reproducible and measurable performance improvement for
long runs (>= 1h) of "stress:long:rpz:doh+udp:linux:*" tests.

Additionally, it fixes a similar issue with potentially missing send
callback calls processing and hardens the code against use-after-free
errors related to the session object (they can potentially occur).

(cherry picked from commit 0956fb9b9e)
2025-03-03 10:12:27 +00:00
Ondřej Surý
ace7c879a8 Add isc_timer_running() function to check status of timer
In the next commit, we need to know whether the timer has been started
or stopped.  Add isc_timer_running() function that returns true if the
timer has been started.

(cherry picked from commit b9e3cd5d2a)
2025-02-21 22:27:25 +01:00
Aram Sargsyan
18fbc3f735 Fix isc_quota bug
Running jobs which were entered into the isc_quota queue is the
responsibility of the isc_quota_release() function, which, when
releasing a previously acquired quota, checks whether the queue
is empty, and if it's not, it runs a job from the queue without touching
the 'quota->used' counter. This mechanism is susceptible to a possible
hangup of a newly queued job in case when between the time a decision
has been made to queue it (because used >= max) and the time it was
actually queued, the last quota was released. Since there is no more
quotas to be released (unless arriving in the future), the newly
entered job will be stuck in the queue.

Fix the wrong memory ordering for 'quota->used', as the relaxed
ordering doesn't ensure that data modifications made by one thread
are visible in other threads.

Add checks in both isc_quota_release() and isc_quota_acquire_cb()
to make sure that the described hangup does not happen. Also see
code comments.

(cherry picked from commit c6529891bb)
2025-02-20 12:20:25 +00:00
Artem Boldariev
788e925261 DoH: http_send_outgoing() return value is not used
The value returned by http_send_outgoing() is not used anywhere, so we
make it not return anything (void). Probably it is an omission from
older times.

(cherry picked from commit 2adabe835a)
2025-02-19 20:34:29 +02:00
Artem Boldariev
47e9b47742 DoH: Fix missing send callback calls
When handling outgoing data, there were a couple of rarely executed
code paths that would not take into account that the callback MUST be
called.

It could lead to potential memory leaks and consequent shutdown hangs.

(cherry picked from commit 8b8f4d500d)
2025-02-19 20:34:29 +02:00
Artem Boldariev
6b9387e2ee DoH: change how the active streams number is calculated
This commit changes the way how the number of active HTTP streams is
calculated and allows it to scale with the values of the maximum
amount of streams per connection, instead of effectively capping at
STREAM_CLIENTS_PER_CONN.

The original limit, which is intended to define the pipelining limit
for TCP/DoT. However, it appeared to be too restrictive for DoH, as it
works quite differently and implements pipelining at protocol level by
the means of multiplexing multiple streams. That renders each stream
to be effectively a separate connection from the point of view of the
rest of the codebase.

(cherry picked from commit a22bc2d7d4)
2025-02-19 20:34:29 +02:00
Artem Boldariev
96e8ea1245 DoH: Track the amount of in flight outgoing data
Previously we would limit the amount of incoming data to process based
solely on the presence of not completed send requests. That worked,
however, it was found to severely degrade performance in certain
cases, as was revealed during extended testing.

Now we switch to keeping track of how much data is in flight (or ready
to be in flight) and limit the amount of processed incoming data when
the amount of in flight data surpasses the given threshold, similarly
to like we do in other transports.

(cherry picked from commit 05e8a50818)
2025-02-19 20:34:29 +02:00
Ondřej Surý
a9f4e3369a Reduce false sharing in dns_qpcache
Instead of having many node_lock_count * sizeof(<member>) arrays, pack
all the members into a qpcache_bucket_t struct that is cacheline aligned
and have a single array of those.

Additionaly, make both the head and the tail of isc_queue_t padded, not
just the head, to prevent false sharing of the lock-free structure with
the lock that follows it.

(cherry picked from commit c602d76c1f)
2025-02-04 23:27:28 +01:00
Artem Boldariev
50a062e5ce DoH: reduce excessive bad request logging
We started using isc_nm_bad_request() more actively throughout
codebase. In the case of HTTP/2 it can lead to a large count of
useless "Bad Request" messages in the BIND log, as often we attempt to
send such request over effectively finished HTTP/2 sessions.

This commit fixes that.

(cherry picked from commit 937b5f8349)
2025-01-15 16:07:13 +01:00
Artem Boldariev
c53541bfc5 Do not stop timer in isc_nm_read_stop() in manual timer mode
A call to isc_nm_read_stop() would always stop reading timer even in
manual timer control mode which was added with StreamDNS in mind. That
looks like an omission that happened due to how timers are controlled
in StreamDNS where we always stop the timer before pausing reading
anyway (see streamdns_on_complete_dnsmessage()). That would not work
well for HTTP, though, where we might want pause reading without
stopping the timer in the case we want to split incoming data into
multiple chunks to be processed independently.

I suppose that it happened due to NM refactoring in the middle of
StreamDNS development (at the time isc_nm_cancelread() and
isc_nm_pauseread() were removed), as the StreamDNS code seems to be
written as if timers are not stoping during a call to
isc_nm_read_stop().

(cherry picked from commit 4ae4e255cf)
2025-01-15 16:05:56 +01:00
Artem Boldariev
36e9720d24 DoH: introduce manual read timer control
This commit introduces manual read timer control as used by StreamDNS
and its underlying transports. Before that, DoH code would rely on the
timer control provided by TCP, which would reset the timer any time
some data arrived. Now, the timer is restarted only when a full DNS
message is processed in line with other DNS transports.

That change is required because we should not stop the timer when
reading from the network is paused due to throttling. We need a way to
drop timed-out clients, particularly those who refuse to read the data
we send.

(cherry picked from commit 609a41517b)
2025-01-15 16:05:47 +01:00
Artem Boldariev
4907248d14 DoH: floodding clients detection
This commit adds logic to make code better protected against clients
that send valid HTTP/2 data that is useless from a DNS server
perspective.

Firstly, it adds logic that protects against clients who send too
little useful (=DNS) data. We achieve that by adding a check that
eventually detects such clients with a nonfavorable useful to
processed data ratio after the initial grace period. The grace period
is limited to processing 128 KiB of data, which should be enough for
sending the largest possible DNS message in a GET request and then
some. This is the main safety belt that would detect even flooding
clients that initially behave well in order to fool the checks server.

Secondly, in addition to the above, we introduce additional checks to
detect outright misbehaving clients earlier:

The code will treat clients that open too many streams (50) without
sending any data for processing as flooding ones; The clients that
managed to send 1.5 KiB of data without opening a single stream or
submitting at least some DNS data will be treated as flooding ones.
Of course, the behaviour described above is nothing else but
heuristical checks, so they can never be perfect. At the same time,
they should be reasonable enough not to drop any valid clients,
realatively easy to implement, and have negligible computational
overhead.

(cherry picked from commit 3425e4b1d0)
2025-01-15 16:05:33 +01:00
Artem Boldariev
5eec1f5368 DoH: process data chunk by chunk instead of all at once
Initially, our DNS-over-HTTP(S) implementation would try to process as
much incoming data from the network as possible. However, that might
be undesirable as we might create too many streams (each effectively
backed by a ns_client_t object). That is too forgiving as it might
overwhelm the server and trash its memory allocator, causing high CPU
and memory usage.

Instead of doing that, we resort to processing incoming data using a
chunk-by-chunk processing strategy. That is, we split data into small
chunks (currently 256 bytes) and process each of them
asynchronously. However, we can process more than one chunk at
once (up to 4 currently), given that the number of HTTP/2 streams has
not increased while processing a chunk.

That alone is not enough, though. In addition to the above, we should
limit the number of active streams: these streams for which we have
received a request and started processing it (the ones for which a
read callback was called), as it is perfectly fine to have more opened
streams than active ones. In the case we have reached or surpassed the
limit of active streams, we stop reading AND processing the data from
the remote peer. The number of active streams is effectively decreased
only when responses associated with the active streams are sent to the
remote peer.

Overall, this strategy is very similar to the one used for other
stream-based DNS transports like TCP and TLS.

(cherry picked from commit 9846f395ad)
2025-01-15 16:05:13 +01:00
Artem Boldariev
4f8ade0e1e TLS SNI - add low level support for SNI to the networking code
This commit adds support for setting SNI hostnames in outgoing
connections over TLS.

Most of the changes are related to either adapting the code to accept
and extra argument in *connect() functions and a couple of changes to
the TLS Stream to actually make use of the new SNI hostname
information.

(cherry picked from commit 6691a1530d)
2024-12-26 18:31:03 +02:00
Pavel Březina
93bef0ea28 mark loop as shuttingdown earlier in shutdown_cb
`shutdown_trigger_close_cb` is not called in the main loop since
queued events in the `loop->async_trigger`, including loop teardown
(shutdown_server) are processed first, before the `uv_close` callback
is executed..

In order to pass the information to the queued events, it is necessary
to set the flag earlier in the process and not wait for the `uv_close`
callback to trigger.

(cherry picked from commit 67e21d94d4)
2024-12-10 19:52:13 +00:00
Ondřej Surý
476757770b Update picohttpparser.{c,h} with upstream repository
Upstream code doesn't do regular releases, so we need to regularly
sync the code from the upstream repository.  This is synchronization up
to the commit f8d0513 from Jan 29, 2024.

(cherry picked from commit d14a76e115)
2024-12-08 12:30:07 +00:00
Matthijs Mekking
a7b291adc7 Fix nsupdate hang when processing a large update
The root cause is the fix for CVE-2024-0760 (part 3), which resets
the TCP connection on a failed send. Specifically commit
4b7c61381f stops reading on the socket
because the TCP connection is throttling.

When the tcpdns_send_cb callback thinks about restarting reading
on the socket, this fails because the socket is a client socket.
And nsupdate is a client and is using the same netmgr code.

This commit removes the requirement that the socket must be a server
socket, allowing reading on the socket again after being throttled.

(cherry picked from commit aa24b77d8b)
2024-12-06 08:31:19 +00:00
Matthijs Mekking
492f79560d Implement global limit for outgoing queries
This global limit is not reset on query restarts and is a hard limit
for any client request.

(cherry picked from commit 16b3bd1cc7)
2024-12-06 06:20:33 +00:00
Matthijs Mekking
511c86facb Implement getter function for counter limit
(cherry picked from commit ca7d487357)
2024-12-06 06:20:33 +00:00
Ondřej Surý
624ea6c57e Move contributed DLZ modules into a separate repository
The DLZ modules are poorly maintained as we only ensure they can still
be compiled, the DLZ interface is blocking, so anything that blocks the
query to the database blocks the whole server and they should not be
used except in testing.  The DLZ interface itself should be scheduled
for removal.

(cherry picked from commit a6cce753e2)
2024-11-26 16:24:17 +01:00
Alessio Podda
0472494417 Incrementally apply AXFR transfer
Reintroduce logic to apply diffs when the number of pending tuples is
above 128. The previous strategy of accumulating all the tuples and
pushing them at the end leads to excessive memory consumption during
transfer.

This effectively reverts half of e3892805d6

(cherry picked from commit 99b4f01b33)
2024-11-26 07:17:06 +00:00
Mark Andrews
983d8a6821 Provide more visibility into configuration errors
by logging SSL_CTX_use_certificate_chain_file and
SSL_CTX_use_PrivateKey_file errors

(cherry picked from commit 9006839ed7)
2024-11-26 12:25:01 +11:00
Ondřej Surý
c22176c0f9 Remove redundant semicolons after the closing braces of functions
(cherry picked from commit 1a19ce39db)
2024-11-19 14:26:56 +01:00
Ondřej Surý
58a15d38c2 Remove redundant parentheses from the return statement
(cherry picked from commit 0258850f20)
2024-11-19 14:26:52 +01:00
Evan Hunt
b5475c9cda corrected code style errors
- add missing brackets around one-line statements
- add paretheses around return values
2024-10-18 19:31:56 +00:00
Mark Andrews
887e874e93 Fix recursive-clients 0
Setting recursive-clients 0 triggered an assertion in isc_quota_soft.
This has now been fixed.

(cherry picked from commit 840eaa628d)
2024-10-17 22:05:22 +00:00
Petr Menšík
75a50925f7 Remove unused <openssl/{hmac,engine}.h> headers from OpenSSL shims
The <openssl/{hmac,engine}.h> headers were unused and including the
<openssl/engine.h> header might cause build failure when OpenSSL
doesn't have Engines support enabled.

See https://fedoraproject.org/wiki/Changes/OpensslDeprecateEngine
2024-10-16 04:39:43 +00:00
Ondřej Surý
4b4c550cd8 Don't enable SO_REUSEADDR on outgoing UDP sockets
Currently, the outgoing UDP sockets have enabled
SO_REUSEADDR (SO_REUSEPORT on BSDs) which allows multiple UDP sockets to
bind to the same address+port.  There's one caveat though - only a
single (the last one) socket is going to receive all the incoming
traffic.  This in turn could lead to incoming DNS message matching to
invalid dns_dispatch and getting dropped.

Disable setting the SO_REUSEADDR on the outgoing UDP sockets.  This
needs to be done explicitly because `uv_udp_open()` silently enables the
option on the socket.

(cherry picked from commit eec30c33c2)
2024-10-02 12:16:58 +00:00
Ondřej Surý
5701bf9dab Use release memory ordering when incrementing reference counter
As the relaxed memory ordering doesn't ensure any memory
synchronization, it is possible that the increment will succeed even
in the case when it should not - there is a race between
atomic_fetch_sub(..., acq_rel) and atomic_fetch_add(..., relaxed).
Only the result is consistent, but the previous value for both calls
could be same when both calls are executed at the same time.

(cherry picked from commit 88227ea665)
2024-10-02 09:09:35 +02:00
Nicki Křížek
f2fa1b7d63 Update code formatting
clang 19 was updated in the base image.

(cherry picked from commit ebb5bd9c0f)
2024-09-21 12:45:27 +02:00
Nicki Křížek
38fb8bed49 Revert "Double the number of threadpool threads"
This reverts commit 6857df20a4.

(cherry picked from commit 842abe9fbf)
2024-09-20 14:51:33 +00:00
Nicki Křížek
379d7faeac Merge tag 'v9.20.2' into bind-9.20 2024-09-18 18:06:27 +02:00
Ondřej Surý
6bff6df272 Limit the outgoing UDP send queue size
If the operating system UDP queue gets full and the outgoing UDP sending
starts to be delayed, BIND 9 could exhibit memory spikes as it tries to
enqueue all the outgoing UDP messages.  As those are not going to be
delivered anyway (as we argued when we stopped enlarging the operating
system send and receive buffers), try to send the UDP messages directly
using `uv_udp_try_send()` and if that fails, drop the outgoing UDP
message.

(cherry picked from commit b576c4c977)
2024-09-17 16:31:25 +02:00
alessio
6e42d96cf1 Do not set SO_INCOMING_CPU
We currently set SO_INCOMING_CPU incorrectly, and testing by Ondrej
shows that fixing the issue and setting affinities is worse than letting
the kernel schedule threads without constraints. So we should not set
SO_INCOMING_CPU anymore.

(cherry picked from commit 8b8149cdd2)
2024-09-16 12:57:08 +00:00
Ondřej Surý
17f23224d1 Add isc_helper API that adds 1:1 thread for each loop
Add an extra thread that can be used to offload operations that would
affect latency, but are not long-running tasks; those are handled by
isc_work API.

Each isc_loop now has matching isc_helper thread that also built on top
of uv_loop.  In fact, it matches most of the isc_loop functionality, but
only the `isc_helper_run()` asynchronous call is exposed.

(cherry picked from commit 6370e9b311)
2024-09-12 14:39:07 +00:00
Michal Nowak
0aeefb9741 Update code formatting
clang 19 was updated in the base image.

(cherry picked from commit ff69d07fed)
2024-09-11 09:33:13 +00:00
Nicki Křížek
4d8491396d Double the number of threadpool threads
Introduce this temporary workaround to reduce the impact of long-running
tasks in offload threads which can block the resolution of queries.

(cherry picked from commit 6857df20a4)
2024-09-06 14:55:38 +02:00
Ondřej Surý
5255843f9b Follow the number of CPU set by taskset/cpuset
Administrators may wish to constrain the set of cores that BIND 9 runs
on via the 'taskset', 'cpuset' or 'numactl' programs (or equivalent on
other O/S), for example to achieve higher (or more stable) performance
by more closely associating threads with individual NIC rx queues. If
the admin has used taskset, it follows that BIND ought to
automatically use the given number of CPUs rather than the system wide
count.

Co-Authored-By: Ray Bellis <ray@isc.org>
(cherry picked from commit 5a2df8caf5)
2024-09-03 13:52:10 +00:00
Ondřej Surý
619d21b57c Stop using malloc_usable_size and malloc_size
Although the nanual page of malloc_usable_size says:

    Although the excess bytes can be over‐written by the application
    without ill effects, this is not good programming practice: the
    number of excess bytes in an allocation depends on the underlying
    implementation.

it looks like the premise is broken with _FORTIFY_SOURCE=3 on newer
systems and it might return a value that causes program to stop with
"buffer overflow" detected from the _FORTIFY_SOURCE.  As we do have own
implementation that tracks the allocation size that we can use to track
the allocation size, we can stop relying on this introspection function.

Also the newer manual page for malloc_usable_size changed the NOTES to:

    The value returned by malloc_usable_size() may be greater than the
    requested size of the allocation because of various internal
    implementation details, none of which the programmer should rely on.
    This function is intended to only be used for diagnostics and
    statistics; writing to the excess memory without first calling
    realloc(3) to resize the allocation is not supported.  The returned
    value is only valid at the time of the call.

Remove usage of both malloc_usable_size() and malloc_size() to be on the
safe size and only use the internal size tracking mechanism when
jemalloc is not available.

(cherry picked from commit d61712d14e)
2024-08-26 18:27:01 +00:00
Matthijs Mekking
6f6d000103 Apply SKR bundle on rekey
When a zone has a skr structure, lookup the currently active bundle
that contains the right key and signature material.

(cherry picked from commit 63e058c29e)
2024-08-22 10:17:08 +00:00
Ondřej Surý
46069fe5c7 Use clang-format-19 to update formatting
This is purely result of running:

    git-clang-format-19 --binary clang-format-19 origin/main

(cherry picked from commit 7b756350f5)
2024-08-22 08:16:03 +00:00
Ondřej Surý
97a9e4711c Remove code to read and parse /proc/net/if_inet6 on Linux
The getifaddr() works fine for years, so we don't have to
keep the callback to parse /proc/net/if_inet6 anymore.

(cherry picked from commit 2fbf9757b8)
2024-08-19 11:49:56 +00:00
Ondřej Surý
2a0454f881 Ignore errno returned from rewind() in the interface iterator
The clang-scan 19 has reported that we are ignoring errno after the call
to rewind().  As we don't really care about the result, just silence the
error, the whole code will be removed in the development version anyway
as it is not needed.

(cherry picked from commit dda5ba53df)
2024-08-19 11:49:56 +00:00
Ondřej Surý
530f1dd913 Check the result of dirfd() before calling unlinkat()
Instead of directly using the result of dirfd() in the unlinkat() call,
check whether the returned file descriptor is actually valid.  That
doesn't really change the logic as the unlinkat() would fail with
invalid descriptor anyway, but this is cleaner and will report the right
error returned directly by dirfd() instead of EBADF from unlinkat().

(cherry picked from commit 59f4fdebc0)
2024-08-19 10:03:08 +00:00
Ondřej Surý
dc4c0397eb Use constexpr for NS_PER_SEC and friends constants
The contexpr introduced in C23 standard makes perfect sense to be used
instead of preprocessor macros - the symbols are kept, etc.  Define
ISC_CONSTEXPR to be `constexpr` for C23 and `static const` for the older
C standards.  Use the newly introduced macro for the NS_PER_SEC and
friends time constants.

(cherry picked from commit 122a142241)
2024-08-19 09:10:04 +00:00