Instead of directly using the result of dirfd() in the unlinkat() call,
check whether the returned file descriptor is actually valid. This
doesn't really change the logic, as unlinkat() would fail with an
invalid descriptor anyway, but it is cleaner and reports the right
error returned directly by dirfd() instead of EBADF from unlinkat().
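In outline (errno_to_result() is a stand-in for the project's errno
conversion helper, not the actual name):

    int fd = dirfd(dir);
    if (fd == -1) {
            /* report the error from dirfd() directly */
            return (errno_to_result(errno));
    }
    if (unlinkat(fd, filename, 0) == -1) {
            return (errno_to_result(errno));
    }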
(cherry picked from commit 59f4fdebc0)
getifaddrs() has worked fine for years, so we no longer need to keep
the callback that parses /proc/net/if_inet6.
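For reference, a minimal getifaddrs() sketch covering the IPv6
enumeration the callback used to do (error handling trimmed):

    #include <ifaddrs.h>
    #include <sys/socket.h>

    struct ifaddrs *ifaddrs = NULL, *ifa;
    if (getifaddrs(&ifaddrs) == -1) {
            return;
    }
    for (ifa = ifaddrs; ifa != NULL; ifa = ifa->ifa_next) {
            if (ifa->ifa_addr != NULL &&
                ifa->ifa_addr->sa_family == AF_INET6) {
                    /* process the interface's IPv6 address */
            }
    }
    freeifaddrs(ifaddrs);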
(cherry picked from commit 2fbf9757b8)
clang-scan 19 reported that we are ignoring errno after the call to
rewind(). As we don't really care about the result, just silence the
error; the whole block will be removed in the development version
anyway, as it is not needed.
(cherry picked from commit dda5ba53df)
When the SSL object was destroyed, it would invalidate all SSL_SESSION
objects including the cached, but not yet used, TLS session objects.
Properly disassociate the SSL object from the SSL_SESSION before we
store it in the TLS session cache, so we can later destroy it without
invalidating the cached TLS sessions.
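The underlying OpenSSL mechanism is reference counting; a minimal
sketch, with cache_store() as a hypothetical helper:

    /* SSL_get1_session() increments the SSL_SESSION reference
     * count, so the session no longer lives and dies with the
     * SSL object it came from. */
    SSL_SESSION *sess = SSL_get1_session(ssl);
    if (sess != NULL) {
            cache_store(tls_session_cache, sess);
    }
    SSL_free(ssl); /* the cached session stays valid */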
Co-authored-by: Ondřej Surý <ondrej@isc.org>
Co-authored-by: Artem Boldariev <artem@isc.org>
Co-authored-by: Aram Sargsyan <aram@isc.org>
(cherry picked from commit c11b736e44)
When a TLS (TLSstream) connection was accepted, the child listening
socket was not attached to sock->server and thus it could have been
freed before all the accepted connections were actually closed. In
turn, this would cause us to call isc_tls_free() too soon, causing
cascading errors in pending SSL_read_ex() calls in the accepted
connections. Properly attach and detach the child listening socket
when accepting and closing the server connections.
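In outline, with hypothetical attach/detach helpers standing in for
the netmgr's refcounting routines:

    static void
    accept_cb(sock_t *csock, sock_t *listener) {
            /* keep the listening socket alive while the accepted
             * connection exists */
            sock_attach(listener, &csock->server);
    }

    static void
    close_cb(sock_t *csock) {
            if (csock->server != NULL) {
                    /* may drop the last reference to the listener */
                    sock_detach(&csock->server);
            }
    }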
(cherry picked from commit 684f3eb8e6)
Don't run more events than are already scheduled. If the quantum is
set to a high value, task_run() would execute the already scheduled
events plus all new events that result from running
event->ev_action(). Setting the quantum to the number of scheduled
events postpones any events scheduled after we enter the loop here to
the next task_run() invocation.
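A sketch of the idea (the helper names are illustrative, not the
actual taskmgr API):

    /* Snapshot the number of queued events: anything that
     * event->ev_action() schedules during this loop is deferred
     * to the next task_run() invocation. */
    unsigned int quantum = number_of_scheduled_events(task);
    isc_event_t *event;

    while (quantum-- > 0 && (event = dequeue_event(task)) != NULL) {
            event->ev_action(task, event);
    }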
Instead of randomly using -1 or 1 as a failure status, properly
utilize the EXIT_FAILURE macro, whose value is platform-specific (as
it should be).
(cherry picked from commit 76997983fde02d9c32aa23bda30b65f1ebd4178c)
On a 32-bit machine, casting to size_t can still lead to an overflow.
Cast to uint64_t instead. Also detect all possible negative values for
pages and pagesize to silence the warning about possibly negative
values.
    39  #if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)

  1. tainted_data_return: Called function sysconf(_SC_PHYS_PAGES), and
     a possible return value may be less than zero.
  2. assign: Assigning: pages = sysconf(_SC_PHYS_PAGES).

    40          long pages = sysconf(_SC_PHYS_PAGES);
    41          long pagesize = sysconf(_SC_PAGESIZE);
    42

  3. Condition pages == -1, taking false branch.
  4. Condition pagesize == -1, taking false branch.

    43          if (pages == -1 || pagesize == -1) {
    44                  return (0);
    45          }
    46

  5. overflow: The expression (size_t)pages * pagesize might be
     negative, but is used in a context that treats it as unsigned.

  CID 498034: (#1 of 1): Overflowed return value (INTEGER_OVERFLOW)

  6. return_overflow: (size_t)pages * pagesize, which might have
     underflowed, is returned from the function.

    47          return ((size_t)pages * pagesize);
    48  #endif /* if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE) */
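The fixed code, in outline (assuming the function's return value is
wide enough to hold the 64-bit product):

    long pages = sysconf(_SC_PHYS_PAGES);
    long pagesize = sysconf(_SC_PAGESIZE);

    if (pages <= 0 || pagesize <= 0) {
            /* covers both errors (-1) and nonsensical values */
            return (0);
    }

    /* both operands are positive here, and the multiplication is
     * done in 64 bits, so it cannot be treated as negative */
    return ((uint64_t)pages * (uint64_t)pagesize);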
(cherry picked from commit e8dbc5db92)
This commit ensures that we restart reading only when all DNS data in
the input buffer has been processed, so that we will not get into a
situation where the buffer is overrun.
Be more aggressive when throttling the reading - when we can't send
the outgoing TCP data synchronously with uv_try_write(), we start
throttling the reading immediately instead of waiting for the send
buffers to fill up. This should not affect well-behaved clients that
read the data from the TCP connection on the other end.
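In outline (illustrative names around the real libuv calls;
uv_try_write() returns the number of bytes written or a negative
error code):

    int r = uv_try_write(stream, &buf, 1);
    if (r < 0 || (size_t)r < buf.len) {
            /* We could not send everything synchronously: queue the
             * remainder with uv_write() and throttle reading right
             * away instead of waiting for send buffers to fill up. */
            uv_read_stop(stream);
            /* ... uv_write() the rest; resume reading from the
             * write callback once the data has been flushed ... */
    }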
(cherry picked from commit bc3e713317)
This commit ensures that socket objects use smaller sizes for their
internal request and handle pools. That prevents the memory allocator
from thrashing.
When a peer is not reading the data we are sending, it was possible
for the TLS DNS code to end up in a situation where it would
indefinitely reschedule send requests, effectively turning the
'uv_loop' into a busy loop that would consume CPU cycles in endless
efforts to send outgoing data.
The main reason for that was that only one send buffer was dedicated
to sends: the code would re-queue sends until it was empty - which
would never happen when the remote side is not reading data.
That seems like an omission from the older days of the Network
Manager, as it is quite simple to make the code use multiple buffers
for sends. That ultimately breaks the cycle of futile send request
rescheduling.
As a side effect, this commit also gets rid of one memory copy on a
hot path.
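A sketch of the multiple-buffer idea (the count and helper are
illustrative, not the actual implementation):

    #define NSENDBUFS 4

    static int
    get_send_buffer(sock_t *sock) {
            for (int i = 0; i < NSENDBUFS; i++) {
                    if (!sock->sendbuf_inuse[i]) {
                            sock->sendbuf_inuse[i] = true;
                            return (i);
                    }
            }
            /* All buffers are in flight: throttle instead of
             * re-queueing against a single shared buffer forever. */
            return (-1);
    }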
Due to an omission, it was possible to un-throttle a TCP connection
previously throttled because the peer was not reading back the data we
were sending.
In particular, this affected the DoH code, but it could also affect
other transports (current or future ones) that pause and resume
reading according to their internal state.
(cherry picked from commit d228aa8bbb944fbd04baf22d151fde5c33561e26)
A single TCP read can produce as many DNS messages as 64k divided by
the minimum size of a DNS message. This can clog the processing thread
and thrash the memory allocator, because we need to do as many as ~20k
allocations in a single UV loop tick.
Limit the number of DNS messages processed in a single UV loop tick to
just a single DNS message, and limit the number of outstanding DNS
messages back to 23. This effectively limits the number of pipelined
DNS messages to that number (the limit we already had before).
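In outline (names are illustrative):

    #define MAX_DNS_MSGS_PER_TICK   1
    #define MAX_PIPELINED_DNS_MSGS  23

    size_t processed = 0;
    while (processed < MAX_DNS_MSGS_PER_TICK &&
           sock->outstanding < MAX_PIPELINED_DNS_MSGS &&
           have_complete_dns_message(sock)) {
            process_one_dns_message(sock);
            processed++;
    }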
This reverts commit 780a89012d.
When a TCP client would not read the DNS messages sent to it, the TCP
sends inside named would accumulate and cause degradation of the
service. Throttle the reading from the TCP socket when we have
accumulated enough DNS data to be sent. Currently, this is limited so
that a single largest-possible DNS message can still fit into the
buffer.
(cherry picked from commit 26006f7b44474819fac2a76dc6cd6f69f0d76828)
This commit ensures that an HTTP endpoints set reference is stored in
a socket object associated with an HTTP/2 stream instead of
referencing the global set stored inside a listener.
This helps to prevent an issue like the following:
1. BIND is configured to serve DoH clients;
2. A client connects and one or more HTTP/2 streams are created.
Internal pointers now point to the data in the associated HTTP
endpoints set;
3. BIND is reconfigured - the new endpoints set object is created and
promoted to all listeners;
4. The old pointers to the HTTP endpoints set data are now invalid.
Instead of referencing a global object that is updated on
reconfiguration, we now store a local reference, which prevents the
endpoints set objects from going out of scope prematurely.
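In outline, with hypothetical reference-counting helpers:

    /* On stream creation: take our own reference instead of
     * borrowing the listener's pointer. */
    endpoints_attach(listener->endpoints, &stream_sock->endpoints);

    /* A reconfiguration swaps the listener's set; our reference
     * keeps the old set alive until this stream is done with it. */

    /* On stream teardown: */
    endpoints_detach(&stream_sock->endpoints);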
(cherry picked from commit b9b5d0c01a3a546c4a6a8b3bff8ae9dd31fee224)
It was reported that an HTTP/2 session might get closed or even
deleted before all asynchronous processing has been completed.
This commit addresses that: we now avoid using the object when we do
not need it, specifically check that the pointers used are not 'NULL',
and ensure that there is at least one reference to the session object
while we are processing incoming data.
This commit makes the code more resilient to such issues in the
future.
(cherry picked from commit 0cca550dff403c6100b7c0da8f252e7967765ba7)
If uv_tcp_close_reset() returns an error code, this means the
reset_shutdown callback has not been issued, so do it now.
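In outline (the handle layout and callback name are illustrative):

    int r = uv_tcp_close_reset(&sock->uv_handle.tcp,
                               reset_shutdown_cb);
    if (r != 0) {
            /* libuv will not run the close callback on failure,
             * so issue it ourselves */
            reset_shutdown_cb((uv_handle_t *)&sock->uv_handle.tcp);
    }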
(cherry picked from commit c40e5c8653)
Failing to accept TCP/TLS connections in 9.18 detaches the quota in
isc__nm_failed_accept_cb, causing the TCP4Clients and TCP6Clients
statistics to not decrease during cleanup.
Fix this by increasing the counter after the point where the accept
can no longer fail, but before the point where the client's socket,
through which the statistics are handled, is no longer valid.
When isc_task_purgeevent() is called for an 'event', the event could,
in the meantime, in theory get processed, unlinked, and freed. When
the function then operates on the 'event', it causes a segmentation
fault.
The only place where isc_task_purgeevent() is called is from
timer_purge().
In order to resolve the data race, call isc_task_purgeevent() inside
the 'timer->lock' locked block, so that timerevent_destroy() won't
be able to destroy the event if it was processed in the meantime,
before isc_task_purgeevent() had a chance to purge it.
In order to be able to do that, move the responsibility of calling
isc_event_free() (upon a successful purge) out from the
isc_task_purgeevent() function to its caller instead, so that it can
be called outside of the timer->lock locked block.
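The resulting call pattern, in outline:

    LOCK(&timer->lock);
    purged = isc_task_purgeevent(timer->task, event);
    UNLOCK(&timer->lock);
    if (purged) {
            /* isc_task_purgeevent() no longer frees the event;
             * free it here, outside the locked block */
            isc_event_free(&event);
    }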
Let basic_tick() of 'task1' and 'basic_quick' of 'task4' run in
different threads, and insert an artificial delay in timer_purge()
to cause an existing race condition to appear.
The statistics channel does not expose the current number of connected
TCP clients, only the highwater mark. Therefore, users did not have an
easy means to collect statistics about TCP clients served over time.
This information could only be obtained through a separate mechanism,
via rndc, by looking at how much of the TCP quota was filled.
In order to expose the exact current count of connected TCP clients
(tracked by the "tcp-clients" quota) as a statistics counter, an
extra, dedicated Network Manager callback would need to be
implemented for that purpose (a counterpart of ns__client_tcpconn()
that would be run when a TCP connection is torn down), which is
inefficient. Instead, track the number of currently-connected TCP
clients separately for IPv4 and IPv6, as Network Manager statistics.
(cherry picked from commit 2690dc48d3)
The case-insensitive matching in isc_ht was basically completely
broken, as only the hash value computation was case-insensitive; the
key comparison was always case-sensitive.
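The fix, in outline: the key comparison must honor the same
case-insensitivity setting as the hash computation (field names are
illustrative):

    bool match;
    if (ht->case_sensitive) {
            match = (memcmp(node->key, key, keysize) == 0);
    } else {
            match = (strncasecmp((const char *)node->key,
                                 (const char *)key, keysize) == 0);
    }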
(cherry picked from commit ec11aa2836)
The case-insensitive matching in isc_ht was basically completely
broken, as only the hash value computation was case-insensitive; the
key comparison was always case-sensitive.
(cherry picked from commit 34ae6916f115fc291865857509433f95c2bc0871)
Change the taskmgr (and thus netmgr) in a way that supports fast and
slow task queues. The fast queue is used for incoming DNS traffic, and
it passes processing to the slow queue for sending outgoing DNS
messages and processing resolver messages.
In the future, more tasks might get moved to the slow queues, so that
cached and authoritative DNS traffic can be handled without being
slowed down by operations that take a longer time to process.
cmocka.h and jemalloc.h/malloc_np.h have conflicting macro
definitions. While fixing them with push_macro for just malloc is done
below, we only need the non-standard mallocx interface, which is easy
to declare ourselves.
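A sketch of the idea: declare the single non-standard entry point we
need instead of including the jemalloc header next to cmocka.h:

    #include <stddef.h>

    /* prototype matching jemalloc's non-standard allocation API */
    void *mallocx(size_t size, int flags);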
(cherry picked from commit 197de93bdc)
Because we don't use the jemalloc functions directly, but only via the
libisc library, the dynamic linker might pull in the jemalloc library
too late, when memory has already been allocated via the standard libc
allocator.
Add a workaround around isc_mem_create() that makes the dynamic linker
pull in jemalloc earlier than libc.
(cherry picked from commit 41a0ee1071)
When connecting to a remote party, the TLS DNS code could process more
than one message at a time, despite the expectation that we should
stop after every DNS message.
Every DNS message is handled and consumed from the input buffer by
isc__nm_process_sock_buffer(). However, as opposed to the TCP DNS
code, it can be called more than once when processing incoming data
from a server (see tls_cycle_input()). That, in turn, means that we
can process more than one message at a time. Some higher-level code
might not expect that, as it breaks the contract.
In particular, in the original report this happened during an
isc__nm_async_tlsdnsshutdown() call: when shutting down, multiple
calls to tls_cycle() are possible (each possibly leading to an
isc__nm_process_sock_buffer() call). If there are any unprocessed
messages left, the read callback will be called for each of them even
when it is not expected, as there was no preceding isc_nm_read()
call.
To keep the TCP DNS and TLS DNS code in sync, we make a similar change
to the TCP DNS code as well, although it should not matter there.
When jemalloc is linked into BIND 9 binaries (rather than preloaded or
used as the system allocator), depending on the decisions made by the
linker, the malloc() symbol may be resolved to a non-jemalloc
implementation at runtime. Such a scenario foils the workaround added
in commit 2da371d005 as it relies on the
jemalloc implementation of malloc() to be executed.
Handle the above scenario properly by calling mallocx() explicitly
instead of relying on the runtime resolution of the malloc() symbol.
Use trivial wrapper functions to avoid the need to copy multiple #ifdef
lines from lib/isc/mem.c to lib/isc/trampoline.c. Using a simpler
alternative, e.g. calling isc_mem_create() & isc_mem_destroy(), was
already considered before and rejected, as described in the log message
for commit 2da371d005.
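A sketch of such a trivial wrapper (the name and the HAVE_JEMALLOC
guard are illustrative):

    void *
    mem_trampoline_malloc(size_t size) {
    #if defined(HAVE_JEMALLOC)
            /* resolved against jemalloc at link time, bypassing the
             * runtime resolution of the malloc() symbol */
            return (mallocx(size, 0));
    #else
            return (malloc(size));
    #endif
    }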
ADJUST_ZERO_ALLOCATION_SIZE() is only used in isc__mem_free_noctx() to
concisely avoid compilation warnings about its 'size' parameter not
being used when building against jemalloc < 4.0.0 (as sdallocx() is then
redefined to dallocx(), which has a different signature).
Apply the semantic patch to catch all the places where we pass 'char' to
the <ctype.h> family of functions (isalpha() and friends, toupper(),
tolower()).
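The pattern being enforced: the <ctype.h> functions take an int that
must be representable as an unsigned char (or be EOF), so a plain,
possibly signed, 'char' must be cast first:

    /* isalpha(*s) is undefined behavior if *s is negative */
    if (isalpha((unsigned char)*s)) {
            *s = (char)tolower((unsigned char)*s);
    }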
(cherry picked from commit 29caa6d1f0)
In TLS DNS, it was possible for an 'isc__nm_uvreq_t' object to be used
twice when that was not intended, leading to use-after-free errors,
which in turn led to abort() calls and/or crashes.
In particular, in the older version of the code, the following was
happening:
* 'tlsdns_send_direct()' is eventually called from
'isc__nm_async_tlsdnssend()' when sending data;
* 'tlsdns_send_direct()' could fail. If it fails early, close to the
beginning of the function, then the 'isc__nm_uvreq_t' object is passed
to 'isc__nm_failed_send_cb()' and eventually gets destroyed normally,
without any negative consequences;
* However, if 'tlsdns_send_direct()' fails due to the call to
'tls_cycle()', then the send request object is re-queued to be
processed asynchronously later by a call to 'tlsdns_send_enqueue()'.
* The error processing code in 'tlsdns_send_direct()' was written
without accounting for the possibility that the send request object
had already been re-queued, and thus the object was still passed to
'isc__nm_failed_send_cb()' for asynchronous error processing.
* As a result of the asynchronous processing initiated by
'tlsdns_send_enqueue()', the 'isc__nm_uvreq_t' send request object
eventually gets properly destroyed.
* Then, as a result of the asynchronous error processing initiated by
the 'isc__nm_failed_send_cb()' call, the send request object is used a
second time. However, it was already destroyed in the previous step.
So we have a (relatively) obvious logical mistake here. Why
'tls_cycle()' was failing is of no particular importance, as I can see
many ways for it to fail on both the TLS and networking levels. In
particular, during testing with 'dnsperf', I have encountered the
following two cases:
1. 'result == ISC_R_TLSERROR' - the remote side was wrapping up,
causing errors on the TLS level (e.g. the handshake had not been
completed yet);
2. 'result == ISC_R_CONNECTIONRESET' - the remote side closed the
connection after we had initiated the asynchronous send request
operation and by the time its processing started.
One could ask: why haven't we encountered the error before? That is
because 'send()' is, in general, a faster operation than 'read()', so
we were lucky enough not to encounter this; it took dnsperf to make
our code fail in the particular way that triggers the problematic
code path. It requires closing the connection shortly after we have
made an asynchronous 'send()' attempt and passed the data for
processing to the libuv code (or, to be precise, down our technology
stack).