Commit Graph

4433 Commits

Ondřej Surý
933162ae14 Lock the trampoline when attaching
When attaching to the trampoline, the isc__trampoline_max was accessed
unlocked.  This would not manifest under normal circumstances because we
initialize 65 trampolines by default and that's enough for most
commodity hardware, but there are ARM machines with 128+ cores where
this would be reported by ThreadSanitizer.

Add locking around the code in isc__trampoline_attach().  This also
requires the lock to leak on exit (along with the memory that we already
leak), because a new thread might still be attaching to the trampoline
while we are running the library destructor.
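
A minimal sketch of the idea, assuming a plain pthread mutex and
hypothetical trampoline_attach()/trampoline_teardown() wrappers (the
actual isc__trampoline code differs in detail):

    #include <pthread.h>
    #include <stddef.h>

    static pthread_mutex_t trampoline_lock = PTHREAD_MUTEX_INITIALIZER;
    static size_t trampoline_max = 65; /* stand-in for isc__trampoline_max */

    void
    trampoline_attach(void) {
            pthread_mutex_lock(&trampoline_lock);
            /* Reads and updates of trampoline_max (and the backing array)
             * are now serialized by the lock. */
            trampoline_max += 1;
            pthread_mutex_unlock(&trampoline_lock);
    }

    void
    trampoline_teardown(void) {
            /*
             * Deliberately no pthread_mutex_destroy(): the lock (like the
             * memory that is already leaked) must outlive the library
             * destructor, because another thread may still be inside
             * trampoline_attach().
             */
    }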
2022-05-13 10:07:20 +02:00
Ondřej Surý
0582478c96 Remove isc_task_destroy() and isc_task_shutdown()
After removing the isc_task_onshutdown(), the isc_task_shutdown() and
isc_task_destroy() became obsolete.

Remove calls to isc_task_shutdown() and replace the calls to
isc_task_destroy() with isc_task_detach().

Simplify the internal logic to destroy the task when the last reference
is removed.
2022-05-12 14:55:49 +02:00
Ondřej Surý
2235edabcf Remove isc_task_onshutdown()
The isc_task_onshutdown() was used to post events that should be run when
the task is being shut down.  This could happen explicitly in the
isc_task_shutdown() call or implicitly when we detach the last reference
to the task and there are no more events posted on the task.

This whole task onshutdown mechanism just makes things more complicated:
it is easier to post the "shutdown" events explicitly when we are
shutting down, and the existing code always knows when to shut down the
task that is used to execute the onshutdown events.

Replace the isc_task_onshutdown() calls with explicit calls to execute
the shutdown tasks.
2022-05-12 13:45:34 +02:00
Ondřej Surý
a0a102cc50 Restore the implementation of uv_os_getenv() shim
In the move from netmgr/uv-compat.h to uv.c, the uv_os_getenv()
implementation was lost.  Restore the implementation, so we can support
Debian stretch for a couple more months.
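
Roughly, such a shim can be implemented on top of getenv() for libuv
versions that predate uv_os_getenv() (added in libuv 1.12.0); this is an
illustrative sketch, not the verbatim restored code:

    #include <stdlib.h>
    #include <string.h>
    #include <uv.h>

    #if UV_VERSION_HEX < 0x010c00 /* libuv < 1.12.0 */
    int
    uv_os_getenv(const char *name, char *buffer, size_t *size) {
            const char *value = getenv(name);
            size_t len;

            if (value == NULL) {
                    return (UV_ENOENT);
            }
            len = strlen(value);
            if (len >= *size) {
                    *size = len + 1; /* report the required buffer size */
                    return (UV_ENOBUFS);
            }
            memcpy(buffer, value, len + 1);
            *size = len;
            return (0);
    }
    #endif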
2022-05-04 12:31:46 +02:00
Ondřej Surý
b43812692d Move netmgr/uv-compat.h to <isc/uv.h>
As we are going to use libuv outside of the netmgr, we need the shims to
be readily available for the rest of the codebase.

Move the "netmgr/uv-compat.h" to <isc/uv.h> and netmgr/uv-compat.c to
uv.c, and as a rule of thumb, the users of libuv should include
<isc/uv.h> instead of <uv.h> directly.

Additionally, merge netmgr/uverr2result.c into uv.c and rename the
single function from isc__nm_uverr2result() to isc_uverr2result().
2022-05-03 10:02:19 +02:00
Ondřej Surý
24c3879675 Move socket related functions to netmgr/socket.c
Move the netmgr socket related functions from netmgr/netmgr.c and
netmgr/uv-compat.c to netmgr/socket.c, so they are all present in the
same place.  Adjust the names of a couple of internal functions
accordingly.
2022-05-03 09:52:49 +02:00
Tony Finch
66b3cb9732 Remove several superfluous newlines in log messages 2022-05-02 23:49:38 +01:00
Artem Boldariev
978f97dcdd TLSDNS: call send callbacks only after the data was sent
This commit ensures that write callbacks are getting called only after
the data has been sent via the network.

Without this fix, a situation could arise where a write callback was
called before the actual encrypted data had been sent to the network;
instead, it was called right after the data had been passed to OpenSSL
(i.e. encrypted).

Most likely, the issue does not reveal itself often because the
callback call was asynchronous, so in most cases it was called after
the data had been sent, but that was not guaranteed by the code logic.

Also, this commit removes one memory allocation (netievent) from a hot
path, as there is no need to call this callback asynchronously
anymore.
2022-04-27 17:44:23 +03:00
Ondřej Surý
407b37c3f2 Set IP(V6)_RECVERR on connect UDP sockets (via libuv)
The connect()ed UDP socket provides feedback on a variety of ICMP
errors (e.g. port unreachable) which BIND can then use to decide what to
do with errors (report them to the client, try again with a different
nameserver, etc.).  However, Linux's implementation does not report what
it considers "transient" conditions, which are defined as Destination
Host Unreachable, Destination Network Unreachable, Source Route Failed
and Message Too Big.

Explicitly enable IP_RECVERR / IPV6_RECVERR (via a libuv uv_udp_bind()
flag) to learn about ICMP destination network/host unreachable errors.
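
Illustrative call only (UV_UDP_LINUX_RECVERR requires libuv >= 1.42.0 at
compile time; the real netmgr call site is more involved):

    #include <uv.h>

    static int
    bind_connected_udp(uv_udp_t *handle, const struct sockaddr *local) {
            /* UV_UDP_LINUX_RECVERR makes libuv set IP_RECVERR/IPV6_RECVERR,
             * so ICMP errors are reported back on the connected socket. */
            return (uv_udp_bind(handle, local, UV_UDP_LINUX_RECVERR));
    }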
2022-04-26 12:22:18 +02:00
Ondřej Surý
eb8f2974b1 Abort when libuv at runtime mismatches libuv at compile time
When we compile with a libuv version that exposes some capabilities via
flags passed to e.g. uv_udp_listen() or uv_udp_bind(), a call with such
flags would fail with invalid arguments when an older libuv version that
does not understand the flag is linked at runtime.

Enforce a minimal libuv version when flags were available at compile
time but are not available at runtime.  This check is less strict than
requiring the runtime libuv version to be the same as or higher than the
compile-time libuv version.
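
The check has roughly this shape; the minimum version below is a
hypothetical example, not the value used in the actual code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <uv.h>

    /* Hypothetical: oldest runtime libuv that understands every flag we
     * compiled against; 1.42.0 is 0x012a00 in UV_VERSION_HEX encoding. */
    #define MIN_RUNTIME_UV_VERSION 0x012a00

    static void
    check_libuv_version(void) {
            if (uv_version() < MIN_RUNTIME_UV_VERSION) {
                    fprintf(stderr,
                            "libuv %s at runtime is older than the features "
                            "enabled at compile time\n",
                            uv_version_string());
                    abort();
            }
    }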
2022-04-26 11:40:40 +02:00
Tony Finch
b2950c96de Revert "Move random number re-seeding out of the hot path"
This reverts commit b1bb41603e.
2022-04-25 15:18:58 +01:00
Tony Finch
b1bb41603e Move random number re-seeding out of the hot path
Instead of checking if we need to re-seed for every isc_random call,
seed the random number generator in the libisc global initializer
and the per-thread initializer.
2022-04-22 16:40:37 +01:00
Tony Finch
254d2abafb Clean up isc_random
Remove redundant comments and avoid implicit casts.
2022-04-22 16:40:37 +01:00
Tony Finch
d20ea4a703 Make isc_random_uniform() nearly divisionless
It used to require two 32-bit integer divisions to get a random number
less than some limit. Now we use Daniel Lemire's "nearly-divisionless"
algorithm for unbiased bounded random numbers, which requires one
64-bit integer multiply in the usual case, and one 32-bit integer
division in rare slow cases. Even the slow cases are faster than
before; there are also fewer branches.

I think this algorithm is exceptionally beautiful. It also has more
clever tricks than lines of code, so I have done my best to explain
how it works.
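
For reference, the algorithm has this shape (a sketch; random32() is a
hypothetical stand-in for the underlying 32-bit generator, not the
actual BIND 9 function name):

    #include <stdint.h>

    extern uint32_t random32(void); /* hypothetical 32-bit PRNG */

    static uint32_t
    random_uniform(uint32_t limit) { /* limit must be nonzero */
            uint32_t x = random32();
            uint64_t m = (uint64_t)x * (uint64_t)limit; /* 64-bit multiply */
            uint32_t l = (uint32_t)m;                   /* low 32 bits */

            if (l < limit) { /* rare: result could be in the biased zone */
                    /* (2^32 - limit) % limit: the only division, and only
                     * on this slow path. */
                    uint32_t t = -limit % limit;
                    while (l < t) { /* reject and retry */
                            x = random32();
                            m = (uint64_t)x * (uint64_t)limit;
                            l = (uint32_t)m;
                    }
            }
            return ((uint32_t)(m >> 32)); /* uniform in [0, limit) */
    }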
2022-04-22 16:40:37 +01:00
Michał Kępień
7aa7b6474b Prevent memory bloat caused by a jemalloc quirk
Since version 5.0.0, decay-based purging is the only available dirty
page cleanup mechanism in jemalloc.  It relies on so-called tickers,
which are simple data structures used for ensuring that certain actions
are taken "once every N times".  Ticker data (state) is stored in a
thread-specific data structure called tsd in jemalloc parlance.  Ticks
are triggered when extents are allocated and deallocated.  Once every
1000 ticks, jemalloc attempts to release some of the dirty pages hanging
around (if any).  This allows memory use to be kept in check over time.

This dirty page cleanup mechanism has a quirk.  If the first
allocator-related action for a given thread is a free(), a
minimally-initialized tsd is set up which does not include ticker data.
When that thread subsequently calls *alloc(), the tsd transitions to its
nominal state, but due to a certain flag being set during minimal tsd
initialization, ticker data remains unallocated.  This prevents
decay-based dirty page purging from working, effectively enabling memory
exhaustion over time. [1]

The quirk described above has been addressed (by moving ticker state to
a different structure) in jemalloc's development branch [2], but not in
any numbered jemalloc version released to date (the latest one being
5.2.1 as of this writing).

Work around the problem by ensuring that every thread spawned by
isc_thread_create() starts with a malloc() call.  Avoid immediately
calling free() for the dummy allocation to prevent an optimizing
compiler from stripping away the malloc() + free() pair altogether.
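
A simplified sketch of that workaround (the field and function names
here are illustrative; the real change lives in the isc__trampoline
code):

    #include <stdlib.h>

    struct trampoline {
            void *(*start)(void *);
            void *arg;
            void *jemalloc_enforce_init; /* keeps the dummy allocation live */
    };

    static void *
    trampoline_run(void *wrap) {
            struct trampoline *t = wrap;
            void *ret;

            /*
             * Call malloc() before anything else so that jemalloc sets up a
             * fully initialized per-thread tsd (including ticker state).
             * The pointer is stored and freed only at the end, so an
             * optimizing compiler cannot elide the malloc() + free() pair.
             */
            t->jemalloc_enforce_init = malloc(16);

            ret = t->start(t->arg);

            free(t->jemalloc_enforce_init);
            return (ret);
    }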

An alternative implementation of this workaround was considered that
used a pair of isc_mem_create() + isc_mem_destroy() calls instead of
malloc() + free(), enabling the change to be fully contained within
isc__trampoline_run() (i.e. to not touch struct isc__trampoline), as the
compiler is not allowed to strip away arbitrary function calls.
However, that solution was eventually dismissed as it triggered
ThreadSanitizer reports when tools like dig, nsupdate, or rndc exited
abruptly without waiting for all worker threads to finish their work.

[1] https://github.com/jemalloc/jemalloc/issues/2251
[2] c259323ab3
2022-04-21 14:19:39 +02:00
Ondřej Surý
d1d88a2895 Add detailed tracing when TASKMGR_TRACE is defined
When TASKMGR_TRACE=1 is defined, the task and event objects carry
detailed tracing information about the function, file, line, and
backtrace (to the extent tracked by gcc) of the place where they were
created.

At exit, when there are unfinished tasks, they will be printed along
with the detailed information.
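
One plausible way such tracing is wired up (macro and field names are
illustrative, not the exact BIND 9 definitions):

    #ifdef TASKMGR_TRACE
    #define EVENT_TRACE(ev)                        \
            do {                                   \
                    (ev)->func = __func__;         \
                    (ev)->file = __FILE__;         \
                    (ev)->line = __LINE__;         \
            } while (0)
    #else
    #define EVENT_TRACE(ev)
    #endif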
2022-04-19 14:25:23 +02:00
Ondřej Surý
f0feaa3305 Remove isc_task_sendto(anddetach) functions
The only place where isc_task_sendto() was used was in dns_resolver
unit, where the "sendto" part was actually a no-op, because dns_resolver
uses bound tasks.  Remove the isc_task_sendto() and
isc_task_sendtoanddetach() functions in favor of using bound tasks
created with isc_task_create_bound().

Additionally, cache the number of running netmgr threads (nworkers)
locally to reduce the number of function calls.
2022-04-19 14:24:36 +02:00
Ondřej Surý
1eeb4c1121 Remove isc_event_constallocate()
The isc_event_constallocate() function was not used anywhere, thus
remove the isc_event_constallocate() macro, declaration and definition.
2022-04-19 13:46:26 +02:00
Ondřej Surý
f55a4d3e55 Allow listening on fewer than nworkers threads
For some applications, it's useful not to listen on a full battery of
threads.  Add a workers argument to all isc_nm_listen*() functions and
the convenience ISC_NM_LISTEN_ONE and ISC_NM_LISTEN_ALL macros.
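
The convenience macros plausibly reduce to something like the following
(the actual values in the header may differ):

    #define ISC_NM_LISTEN_ALL 0 /* listen on all netmgr worker threads */
    #define ISC_NM_LISTEN_ONE 1 /* listen on a single worker thread */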
2022-04-19 11:08:13 +02:00
Artem Boldariev
df317184eb Add isc_nmsocket_set_tlsctx()
This commit adds isc_nmsocket_set_tlsctx() - an asynchronous function
that replaces the TLS context within a given TLS-enabled listener
socket object. It is based on the newly added reference counting
functionality.

The intention is to make it possible to replace a TLS context without
recreating the whole socket object, including the underlying TCP
listener socket, as a BIND process might not have enough permissions to
re-create it fully on reconfiguration.
2022-04-06 18:45:57 +03:00
Artem Boldariev
25609156a5 Maintain a per-thread TLS ctx reference in TLS stream code
This commit changes the generic TLS stream code to maintain a
per-worker thread TLS context reference.
2022-04-06 18:45:57 +03:00
Artem Boldariev
9256026d18 Use isc_tlsctx_attach() in TLS DNS code
This commit adds proper reference counting for TLS contexts into
generic TLS DNS (DoT) code.
2022-04-06 18:45:57 +03:00
Artem Boldariev
b52d46612f Use isc_tlsctx_attach() in TLS stream code
This commit adds proper reference counting for TLS contexts into
generic TLS stream code.
2022-04-06 18:45:57 +03:00
Artem Boldariev
a7a482c1b1 Add isc_tlsctx_attach()
The implementation is done on top of the reference counting
functionality found in OpenSSL/LibreSSL, which avoids having to wrap
the object.

Adding this function allows using reference counting for TLS contexts
in BIND 9's codebase.
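
A minimal sketch of how the attach can sit on top of OpenSSL's own
reference counting, assuming isc_tlsctx_t is simply the SSL_CTX (hence
no wrapper object is needed):

    #include <stdlib.h>
    #include <openssl/ssl.h>

    typedef SSL_CTX isc_tlsctx_t; /* assumption for this sketch */

    void
    isc_tlsctx_attach(isc_tlsctx_t *src, isc_tlsctx_t **targetp) {
            /* SSL_CTX_up_ref() bumps the context's internal refcount
             * (OpenSSL >= 1.1.0, or LibreSSL with the equivalent call). */
            if (SSL_CTX_up_ref(src) != 1) {
                    abort();
            }
            *targetp = src;
    }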
2022-04-06 18:45:57 +03:00
Mark Andrews
98718b3b4b Unlink the timer event before trying to purge it
As far as I can determine, the order of operations is not important.

    *** CID 351372:  Concurrent data access violations  (ATOMICITY)
    /lib/isc/timer.c: 227 in timer_purge()
    221     		LOCK(&timer->lock);
    222     		if (!purged) {
    223     			/*
    224     			 * The event has already been executed, but not
    225     			 * yet destroyed.
    226     			 */
    >>>     CID 351372:  Concurrent data access violations  (ATOMICITY)
    >>>     Using an unreliable value of "event" inside the second locked section. If the data that "event" depends on was changed by another thread, this use might be incorrect.
    227     			timerevent_unlink(timer, event);
    228     		}
    229     	}
    230     }
    231
    232     void
2022-04-06 07:33:41 +00:00
Artem Boldariev
f0ac4c47b0 Change X509_STORE_up_ref() shim return value
X509_STORE_up_ref() must return 1 on success, while the previous
implementation would return the reference count.  This commit fixes
that.
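
Sketched against pre-1.1.0 OpenSSL (where the shim is needed), the fix
amounts to returning a fixed success value; the HAVE_* guard here is
hypothetical:

    #include <openssl/crypto.h>
    #include <openssl/x509_vfy.h>

    #ifndef HAVE_X509_STORE_UP_REF /* hypothetical feature-test macro */
    int
    X509_STORE_up_ref(X509_STORE *store) {
            CRYPTO_add(&store->references, 1, CRYPTO_LOCK_X509_STORE);
            return (1); /* success indicator, not the new reference count */
    }
    #endif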
2022-04-05 15:03:27 +03:00
Ondřej Surý
7868d8145b Rename shutdown() to test_shutdown() in timer_test.c
The shutdown() function is part of the standard library (POSIX.1);
don't use that name in timer_test.c, but rather rename it to
test_shutdown().
2022-04-05 01:49:04 +02:00
Ondřej Surý
142c63dda8 Enable the load-balance-sockets configuration
Previously, HAVE_SO_REUSEPORT_LB had been defined only in the private
netmgr-int.h header file, making the configuration of load-balanced
sockets inoperable.

Move the missing HAVE_SO_REUSEPORT_LB define to <isc/netmgr.h> and add
the missing isc_nm_getloadbalancesockets() implementation.
2022-04-05 01:30:58 +02:00
Ondřej Surý
85c6e797aa Add option to configure load balance sockets
Previously, the option to enable kernel load balancing of the sockets
was always enabled when supported by the operating system (SO_REUSEPORT
on Linux and SO_REUSEPORT_LB on FreeBSD).

It was reported that in scenarios where the networking threads are also
responsible for processing long-running tasks (like RPZ processing, CATZ
processing or large zone transfers), this could lead to intermittent
brownouts for some clients, because the thread assigned by the operating
system might be busy.  In such scenarios, the overall performance would
be better served by threads competing over the sockets, because the idle
threads can pick up the incoming traffic.

Add new configuration option (`load-balance-sockets`) to allow enabling
or disabling the load balancing of the sockets.
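
The underlying mechanism is roughly the following (illustrative only;
the real code goes through the netmgr socket layer):

    #include <sys/socket.h>

    static int
    set_load_balancing(int fd, int enabled) {
            int on = enabled ? 1 : 0;
    #if defined(SO_REUSEPORT_LB) /* FreeBSD */
            return (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT_LB, &on,
                               sizeof(on)));
    #elif defined(SO_REUSEPORT) /* Linux */
            return (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on,
                               sizeof(on)));
    #else
            (void)fd;
            (void)on;
            return (0);
    #endif
    }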
2022-04-04 23:10:04 +02:00
Ondřej Surý
f106d0ed2b Run the RPZ update as offloaded work
Previously, the RPZ updates ran quantized on the main nm_worker loops.
As the quantum was set to 1024, this might lead to service
interruptions when a large RPZ update was processed.

Change the RPZ update process to run as offloaded work.  The update
and cleanup loops were refactored to hold the maintenance lock as
little as possible and for the shortest periods of time, and the db
iterator is paused for every iteration, so we don't hold the rbtdb
tree lock for prolonged periods of time.
2022-04-04 21:20:05 +02:00
Ondřej Surý
ae01ec2823 Don't use reference counting in isc_timer unit
The reference counting and the isc_timer_attach()/isc_timer_detach()
semantics are actually misleading because they cannot be used under
normal conditions.  Usually, the object that owns the timer is also
passed as the argument to the timer itself.  This means that when the
caller uses `isc_timer_detach()` it needs the timer to stop, but
isc_timer_detach() stops it only if this is the last reference.
Unfortunately, this also means that if the timer is attached elsewhere
and then fires, the result will most likely be a use-after-free, because
the object used in the timer no longer exists.

Remove the reference counting from the isc_timer unit, remove
isc_timer_attach() function and rename isc_timer_detach() to
isc_timer_destroy() to better reflect how the API needs to be used.

The only caveat is that the already executed event must be destroyed
before the isc_timer_destroy() is called because the timer is no longer
attached to .ev_destroy_arg.
2022-04-02 01:23:15 +02:00
Ondřej Surý
30e0fd942b Remove task privileged mode
Previously, the task privileged mode was used only when named was
starting up and loading the zones from disk as the "first" thing to do.
The privileged task was set up with quantum == 2, which made the
taskmgr/netmgr spin around the privileged queue, processing two events
at a time.

The same effect can be achieved by setting the quantum to UINT_MAX (i.e.
practically unlimited) for the loadzone task, hence the privileged task
mode was removed in favor of just processing all the events on the
loadzone task in a single task_run().
2022-04-01 23:55:26 +02:00
Ondřej Surý
62a72211aa Remove isc_pool API
Since the last user of the isc_pool API is gone, remove the whole
isc_pool API.
2022-04-01 23:50:34 +02:00
Ondřej Surý
2bc7303af2 Use isc_nm_getnworkers to manage zone resources
Instead of passing the number of workers to the dns_zonemgr manually,
get the number of nm threads using the new isc_nm_getnworkers() call.

Additionally, remove the isc_pool API and manage the arrays of memory
contexts, zonetasks and loadtasks directly in the zonemgr.
2022-04-01 23:50:34 +02:00
Ondřej Surý
2707d0eeb7 Set hard thread affinity for each zone
After switching to per-thread resources in the zonemgr, performance
decreased because the memory context, zonetask and loadtask were picked
from the pool at random.

Pin the zone to a single threadid (.tid) and align the memory context,
zonetask and loadtask to be the same; this sets the hard affinity of the
zone to the netmgr thread.
2022-04-01 23:50:34 +02:00
Ondřej Surý
a94678ff77 Create per-thread task and memory context for zonemgr
Previously, the zonemgr created 1 task per 100 zones and 1 memory
context per 1000 zones (with a minimum of 10 tasks and 2 memory
contexts) to reduce the contention between threads.

Instead of reducing the contention by having many resources, create a
per-nm_thread memory context, loadtask and zonetask and spread the zones
across just the per-thread resources.

Note: this commit alone decreases zone loading performance by a couple
of seconds (in the case of 1M zones), and thus there's more work in this
whole MR fixing the performance.
2022-04-01 23:50:34 +02:00
Ondřej Surý
15ea6f002f Add isc_task_setquantum() and use it for post-init zone loading
Add an isc_task_setquantum() function that modifies the quantum for
future isc_task_run() invocations.

NOTE: The current isc_task_run() caches the task->quantum into a local
variable and therefore the current event loop is not affected by any
quantum change.
2022-04-01 23:45:23 +02:00
Ondřej Surý
c17eee034b Remove isc_task_purge() and isc_task_purgerange()
The isc_task_purge() and isc_task_purgerange() functions are now
unused, so sweep them out of the task.c file.  Additionally, remove the
unused ISC_EVENTATTR_NOPURGE event attribute.
2022-04-01 23:45:23 +02:00
Ondřej Surý
48b2a5df97 Keep the list of scheduled events on the timer
Instead of searching for the events to purge, keep the list of scheduled
events on the timer list and purge the events that we have scheduled.
2022-04-01 23:45:23 +02:00
Ondřej Surý
17aed2f895 Repair isc_task_purgeevent(), clean isc_task_unsend{,range}()
The isc_task_purgerange() was walking through all events on the task to
find a matching event.  Instead, use ISC_LINK_LINKED() to determine
whether the event is active.

Clean up the related isc_task_unsend() and isc_task_unsendrange()
functions that were not used anywhere.
2022-04-01 23:45:23 +02:00
Ondřej Surý
b84c9b2608 Turn isc_hash_bits32() into a static inline function
Adding an extra val & 0xffff in the isc_hash_bits32() macro in the hot
path significantly reduced performance.  Turn the macro into a static
inline function matching the previous hash_32() function used to compute
a hashval matching the hashtable->bits.
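
The Fibonacci-hashing reducer has roughly this shape as a static inline
function (constant per Knuth's multiplicative hashing; the exact BIND 9
definition may differ):

    #include <stdint.h>

    #define GOLDEN_RATIO_32 2654435769U /* ~2^32 / phi */

    static inline uint32_t
    hash_bits32(uint32_t val, unsigned int bits) {
            /* Keep the top 'bits' bits of the product; bits must be 1..32. */
            return ((val * GOLDEN_RATIO_32) >> (32 - bits));
    }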
2022-04-01 23:04:24 +02:00
Artem Boldariev
3edf7a9fe7 Implement shim for SSL_CTX_set1_cert_store() (affects Debian 9)
This commit implements a shim for SSL_CTX_set1_cert_store() for
OpenSSL/LibreSSL versions where it is not available.
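
The shim plausibly boils down to taking a reference and handing the
store over (a sketch; the HAVE_* guard is hypothetical, and
X509_STORE_up_ref() may itself be the shim from the earlier entry
above):

    #include <openssl/ssl.h>
    #include <openssl/x509_vfy.h>

    #ifndef HAVE_SSL_CTX_SET1_CERT_STORE /* hypothetical feature-test macro */
    void
    SSL_CTX_set1_cert_store(SSL_CTX *ctx, X509_STORE *store) {
            /* Take our own reference before handing the store to the ctx. */
            X509_STORE_up_ref(store);
            SSL_CTX_set_cert_store(ctx, store);
    }
    #endif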
2022-04-01 16:33:43 +03:00
Ondřej Surý
b05a991ad0 Make isc_ht optionally case insensitive
Previously, the isc_ht API would always take the key as a literal input
to the hashing function.  Change the isc_ht_init() function to take an
'options' argument, in which ISC_HT_CASE_SENSITIVE or _INSENSITIVE can
be specified, to determine whether to use case-sensitive hashing in
isc_hash32() when hashing the key.
2022-03-28 15:02:18 -07:00
Evan Hunt
e9ef3defa4 consolidate fibonacci hashing in one place
Fibonacci hashing was implemented in four separate places (rbt.c,
rbtdb.c, resolver.c, zone.c). This commit combines them into a single
implementation. The hash_32() function is now replaced with
isc_hash_bits32().
2022-03-28 14:44:21 -07:00
Ondřej Surý
4dceab142d Consistently use UNREACHABLE() instead of ISC_UNREACHABLE()
In a couple of places, we missed replacing INSIST(0) or
ISC_UNREACHABLE() on some branches with UNREACHABLE().  Replace all
ISC_UNREACHABLE() and INSIST(0) calls with UNREACHABLE().
2022-03-28 23:26:08 +02:00
Artem Boldariev
783663db80 Add ISC_R_TLSBADPEERCERT error code to the TLS related code
This commit adds support for the ISC_R_TLSBADPEERCERT error code, which
is supposed to be used to signal TLS peer certificate verification
failures in dig and other code.

The support for this error code is added to our TLS and TLS DNS
implementations.

This commit also adds the isc_nm_verify_tls_peer_result_string()
function, which is supposed to be used to get a textual description of
the reason for getting an ISC_R_TLSBADPEERCERT error.
2022-03-28 15:32:30 +03:00
Artem Boldariev
71cf8fa5ac Extend TLS context cache with CA certificates store
This commit adds support for keeping CA certificate stores associated
with TLS contexts.  The intention is to keep one reusable store per set
of related TLS contexts.
2022-03-28 15:31:22 +03:00
Artem Boldariev
c49a81e27d Add foundational functions to implement Strict/Mutual TLS
This commit adds a set of functions that can be used to implement
Strict and Mutual TLS:

* isc_tlsctx_load_client_ca_names();
* isc_tlsctx_load_certificate();
* isc_tls_verify_peer_result_string();
* isc_tlsctx_enable_peer_verification().
2022-03-28 15:31:22 +03:00
Artem Boldariev
32783d36c2 Add utility functions to manipulate X509 certificate stores
This commit adds a set of high-level utility functions to manipulate
the certificate stores. The stores are needed to implement TLS
certificate verification efficiently.
2022-03-28 15:31:22 +03:00
Ondřej Surý
9de10cd153 Remove extrahandle size from netmgr
Previously, it was possible to assign a bit of memory space in the
nmhandle to store the client data.  This was complicated and prevented
further refactoring of isc_nmhandle_t caching (future work).

Instead of caching the data in the nmhandle, allocate the hot-path
ns_client_t objects from a per-thread clientmgr memory context and just
assign them to the isc_nmhandle_t via isc_nmhandle_set().
2022-03-25 10:38:35 +01:00