The getifaddr() works fine for years, so we don't have to
keep the callback to parse /proc/net/if_inet6 anymore.
(cherry picked from commit 2fbf9757b8)
The clang-scan 19 has reported that we are ignoring errno after the call
to rewind(). As we don't really care about the result, just silence the
error, the whole code will be removed in the development version anyway
as it is not needed.
(cherry picked from commit dda5ba53df)
Instead of directly using the result of dirfd() in the unlinkat() call,
check whether the returned file descriptor is actually valid. That
doesn't really change the logic as the unlinkat() would fail with
invalid descriptor anyway, but this is cleaner and will report the right
error returned directly by dirfd() instead of EBADF from unlinkat().
(cherry picked from commit 59f4fdebc0)
The contexpr introduced in C23 standard makes perfect sense to be used
instead of preprocessor macros - the symbols are kept, etc. Define
ISC_CONSTEXPR to be `constexpr` for C23 and `static const` for the older
C standards. Use the newly introduced macro for the NS_PER_SEC and
friends time constants.
(cherry picked from commit 122a142241)
New version of clang (19) has introduced a stricter checks when mixing
integer (and float types) with enums. In this case, we used enum {}
as C17 doesn't have constexpr yet. Change the time conversion constants
to be static const unsigned int instead of enum values.
(cherry picked from commit b03e90e0d4)
Check if 'lctx->logconfig' is NULL before using it in isc_log_doit(),
because it's possible that isc_log_destroy() was already called, e.g.
when a 'call_rcu' function wants to log a message during shutdown.
(cherry picked from commit 656e04f48a)
When iterating through the old internal hashmap table, skip all the
nodes that have been already migrated to the new table. We know that
all positions with index less than .hiter are NULL.
(cherry picked from commit 3e4d153453)
When the round robin hashing reorders the map entries on deletion, we
were adjusting the iterator table size only when the reordering was
happening at the internal table boundary. The iterator table size had
to be reduced by one to prevent seeing the entry that resized on
position [0] twice because it migrated to [iter->size - 1] position.
However, the same thing could happen when the same entry migrates a
second time from [iter->size - 1] to [iter->size - 2] position (and so
on) because the check that we are manipulating the entry just in the [0]
position was insufficient. Instead of checking the position [pos == 0],
we now check that the [pos % iter->size == 0], thus ignoring all the
entries that might have moved back to the end of the internal table.
(cherry picked from commit acdc57259f)
When the SSL object was destroyed, it would invalidate all SSL_SESSION
objects including the cached, but not yet used, TLS session objects.
Properly disassociate the SSL object from the SSL_SESSION before we
store it in the TLS session cache, so we can later destroy it without
invalidating the cached TLS sessions.
Co-authored-by: Ondřej Surý <ondrej@isc.org>
Co-authored-by: Artem Boldariev <artem@isc.org>
Co-authored-by: Aram Sargsyan <aram@isc.org>
(cherry picked from commit c11b736e44)
When TLS connection (TLSstream) connection was accepted, the children
listening socket was not attached to sock->server and thus it could have
been freed before all the accepted connections were actually closed.
In turn, this would cause us to call isc_tls_free() too soon - causing
cascade errors in pending SSL_read_ex() in the accepted connections.
Properly attach and detach the children listening socket when accepting
and closing the server connections.
(cherry picked from commit 684f3eb8e6)
The previous work in this area was led by the belief that we might be
calling call_rcu() from within call_rcu() callbacks. After carefully
checking all the current callback, it became evident that this is not
the case and the problem isn't enough rcu_barrier() calls, but something
entirely else.
Call the rcu_barrier() just once as that's enough and the multiple
rcu_barrier() calls will not hide the real problem anymore, so we can
find it.
(cherry picked from commit 13941c8ca7)
When putting the 48-bit number into a fixed-size buffer that's exactly 6
bytes, the assertion failure would occur as the 48-bit number is
internally represented as 64-bit number and the code was checking if
there is enough space for `sizeof(val)`. This causes assertion failure
when otherwise valid TSIG signature has a bad timing information.
Specify the size of the argument explicitly, so the 48-bit number
doesn't require 8-byte long buffer.
(cherry picked from commit 37dbd57c16)
The PTHREAD_MUTEX_ADAPTIVE_NP and PTHREAD_MUTEX_ERRORCHECK_NP are
usually not defines, but enum values, so simple preprocessor check
doesn't work.
Check for PTHREAD_MUTEX_ADAPTIVE_NP from the autoconf AS_COMPILE_IFELSE
block and define HAVE_PTHREAD_MUTEX_ADAPTIVE_NP. This should enable
adaptive mutex on Linux and FreeBSD.
As PTHREAD_MUTEX_ERRORCHECK actually comes from POSIX and Linux glibc
does define it when compatibility macros are being set, we can just use
PTHREAD_MUTEX_ERRORCHECK instead of PTHREAD_MUTEX_ERRORCHECK_NP.
(cherry picked from commit cc4f99bc6d)
On a 32 bit machine casting to size_t can still lead to an overflow.
Cast to uint64_t. Also detect all possible negative values for
pages and pagesize to silence warning about possible negative value.
39#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
1. tainted_data_return: Called function sysconf(_SC_PHYS_PAGES),
and a possible return value may be less than zero.
2. assign: Assigning: pages = sysconf(_SC_PHYS_PAGES).
40 long pages = sysconf(_SC_PHYS_PAGES);
41 long pagesize = sysconf(_SC_PAGESIZE);
42
3. Condition pages == -1, taking false branch.
4. Condition pagesize == -1, taking false branch.
43 if (pages == -1 || pagesize == -1) {
44 return (0);
45 }
46
5. overflow: The expression (size_t)pages * pagesize might be negative,
but is used in a context that treats it as unsigned.
CID 498034: (#1 of 1): Overflowed return value (INTEGER_OVERFLOW)
6. return_overflow: (size_t)pages * pagesize, which might have underflowed,
is returned from the function.
47 return ((size_t)pages * pagesize);
48#endif /* if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE) */
(cherry picked from commit e8dbc5db92)
This commit ensures that we are not attempting to accept an expired
TCP connection as we are not interested in any data that could have
been accumulated in its internal buffers. Now we just drop them for
good.
Be more aggressive when throttling the reading - when we can't send the
outgoing TCP synchronously with uv_try_write(), we start throttling the
reading immediately instead of waiting for the send buffers to fill up.
This should not affect behaved clients that read the data from the TCP
on the other end.
Due to omission it was possible to un-throttle a TCP connection
previously throttled due to the peer not reading back data we are
sending.
In particular, that affected DoH code, but it could also affect other
transports (the current or future ones) that pause/resume reading
according to its internal state.
The single TCP read can create as much as 64k divided by the minimum
size of the DNS message. This can clog the processing thread and trash
the memory allocator because we need to do as much as ~20k allocations in
a single UV loop tick.
Limit the number of the DNS messages processed in a single UV loop tick
to just single DNS message and limit the number of the outstanding DNS
messages back to 23. This effectively limits the number of pipelined
DNS messages to that number (this is the limit we already had before).
When TCP client would not read the DNS message sent to them, the TCP
sends inside named would accumulate and cause degradation of the
service. Throttle the reading from the TCP socket when we accumulate
enough DNS data to be sent. Currently this is limited in a way that a
single largest possible DNS message can fit into the buffer.
This commit ensures that an HTTP endpoints set reference is stored in
a socket object associated with an HTTP/2 stream instead of
referencing the global set stored inside a listener.
This helps to prevent an issue like follows:
1. BIND is configured to serve DoH clients;
2. A client is connected and one or more HTTP/2 stream is
created. Internal pointers are now pointing to the data on the
associated HTTP endpoints set;
3. BIND is reconfigured - the new endpoints set object is created and
promoted to all listeners;
4. The old pointers to the HTTP endpoints set data are now invalid.
Instead referencing a global object that is updated on
re-configurations we now store a local reference which prevents the
endpoints set objects to go out of scope prematurely.
It was reported that HTTP/2 session might get closed or even deleted
before all async. processing has been completed.
This commit addresses that: now we are avoiding using the object when
we do not need it or specifically check if the pointers used are not
'NULL' and by ensuring that there is at least one reference to the
session object while we are doing incoming data processing.
This commit makes the code more resilient to such issues in the
future.
Add an isc_queue implementation that hides the gory details of cds_wfcq
into more neat API. The same caveats as with cds_wfcq.
TODO: Add documentation to the API.
The detach function declaration in `ISC__REFCOUNT_TRACE_DECL` had an
returned an accidental implicit int. While not allowed since C99, it
became an error by default in GCC 14.
`ISC_REFCOUNT_TRACE_IMPL` and `ISC_REFCOUNT_STATIC_TRACE_IMPL` expanded
into the wrong macros, trying to declare it again with the wrong number
of parameters.
An assertion failure would be triggered when sending the TCP data ends
after the TCP reading gets closed. Implement proper reference counting
for the isc_httpd object.
Returning the value allows for better high-water tracking without
running into edge cases like the following:
0. The counter is at value X
1. Increment the value (X+1)
2. The value is decreased multiple times in another threads (X+1-Y)
3. Get the value (X+1-Y)
4. Update-if-greater misses the X+1 value which should have been the
high-water
Prepare the statistics channel data in the offloaded worker thread, so
the networking thread is not blocked by the process gathering data from
various data structures. Only the netmgr send is then run on the
networkin thread when all the data is already there.
isc_loop() can now take its place.
This also requires changes to the test harness - instead of running the
setup and teardown outside of th main loop, we now schedule the setup
and teardown to run on the loop (via isc_loop_setup() and
isc_loop_teardown()) - this is needed because the new the isc_loop()
call has to be run on the active event loop, but previously the
isc_loop_current() (and the variants like isc_loop_main()) would work
even outside of the loop because it needed just isc_tid() to work, but
not the full loop (which was mainly true for the main thread).
if we had a method to get the running loop, similar to how
isc_tid() gets the current thread ID, we can simplify loop
and loopmgr initialization.
remove most uses of isc_loop_current() in favor of isc_loop().
in some places where that was the only reason to pass loopmgr,
remove loopmgr from the function parameters.
The case insensitive matching in isc_ht was basically completely broken
as only the hashvalue computation was case insensitive, but the key
comparison was always case sensitive.
The isc_log_t contains a isc_logconfig_t that is swapped, dereferenced
or accessed its fields through a mutex. Instead of protecting it with a
rwlock, use RCU.
To reduce memory pressure, we can add light per-loop (netmgr worker)
memory pools for isc_nmsocket_t structures. This will help in
situations where there's a lot of churn creating and destroying the
nmsockets.
Embedding isc_nmsocket_h2_t directly inside isc_nmsocket_t had increased
the size of isc_nmsocket_t to 1840 bytes. Making the isc_nmsocket_h2_t
to be a pointer to the structure and allocated on demand allows us to
reduce the size to 1208 bytes. While there are still some possible
reductions in the isc_nmsocket_t (embedded tlsstream, streamdns
structures), this was the far biggest drop in the memory usage.
The uv_req union member of struct isc__nm_uvreq contained libuv request
types that we don't use. Turns out that uv_getnameinfo_t is 1000 bytes
big and unnecessarily enlarged the whole structure. Remove all the
unused members from the uv_req union.
After removing sockaddr_unix from isc_sockaddr, we can also remove
sockaddr_storage and reduce the isc_sockaddr size from 152 bytes to just
48 bytes needed to hold IPv6 addresses.
As it was pointed out, the alignas() can't be used on objects larger
than `max_align_t` otherwise the compiler might miscompile the code to
use auto-vectorization on unaligned memory.
As we were only using alignas() as a way to prevent false memory
sharing, we can use manual padding in the affected structures.
With _exit() instead of exit() in place, we don't need
isc__tls_setfatalmode() mechanism as the atexit() calls will not be
executed including OpenSSL atexit hooks.
Instead of crude 5x rcu_barrier() call in the isc__mem_destroy(), change
the mechanism to call rcu_barrier() until the memory use and references
stops decreasing. This should deal with any number of nested call_rcu()
levels.
Additionally, don't destroy the contextslock if the list of the contexts
isn't empty. Destroying the lock could make the late threads crash.