This is a part of the works that intends to make the netmgr stable,
testable, maintainable and tested. It contains a numerous changes to
the netmgr code and unfortunately, it was not possible to split this
into smaller chunks as the work here needs to be committed as a complete
works.
NOTE: There's a quite a lot of duplicated code between udp.c, tcp.c and
tcpdns.c and it should be a subject to refactoring in the future.
The changes that are included in this commit are listed here
(extensively, but not exclusively):
* The netmgr_test unit test was split into individual tests (udp_test,
tcp_test, tcpdns_test and newly added tcp_quota_test)
* The udp_test and tcp_test has been extended to allow programatic
failures from the libuv API. Unfortunately, we can't use cmocka
mock() and will_return(), so we emulate the behaviour with #define and
including the netmgr/{udp,tcp}.c source file directly.
* The netievents that we put on the nm queue have variable number of
members, out of these the isc_nmsocket_t and isc_nmhandle_t always
needs to be attached before enqueueing the netievent_<foo> and
detached after we have called the isc_nm_async_<foo> to ensure that
the socket (handle) doesn't disappear between scheduling the event and
actually executing the event.
* Cancelling the in-flight TCP connection using libuv requires to call
uv_close() on the original uv_tcp_t handle which just breaks too many
assumptions we have in the netmgr code. Instead of using uv_timer for
TCP connection timeouts, we use platform specific socket option.
* Fix the synchronization between {nm,async}_{listentcp,tcpconnect}
When isc_nm_listentcp() or isc_nm_tcpconnect() is called it was
waiting for socket to either end up with error (that path was fine) or
to be listening or connected using condition variable and mutex.
Several things could happen:
0. everything is ok
1. the waiting thread would miss the SIGNAL() - because the enqueued
event would be processed faster than we could start WAIT()ing.
In case the operation would end up with error, it would be ok, as
the error variable would be unchanged.
2. the waiting thread miss the sock->{connected,listening} = `true`
would be set to `false` in the tcp_{listen,connect}close_cb() as
the connection would be so short lived that the socket would be
closed before we could even start WAIT()ing
* The tcpdns has been converted to using libuv directly. Previously,
the tcpdns protocol used tcp protocol from netmgr, this proved to be
very complicated to understand, fix and make changes to. The new
tcpdns protocol is modeled in a similar way how tcp netmgr protocol.
Closes: #2194, #2283, #2318, #2266, #2034, #1920
* The tcp and tcpdns is now not using isc_uv_import/isc_uv_export to
pass accepted TCP sockets between netthreads, but instead (similar to
UDP) uses per netthread uv_loop listener. This greatly reduces the
complexity as the socket is always run in the associated nm and uv
loops, and we are also not touching the libuv internals.
There's an unfortunate side effect though, the new code requires
support for load-balanced sockets from the operating system for both
UDP and TCP (see #2137). If the operating system doesn't support the
load balanced sockets (either SO_REUSEPORT on Linux or SO_REUSEPORT_LB
on FreeBSD 12+), the number of netthreads is limited to 1.
* The netmgr has now two debugging #ifdefs:
1. Already existing NETMGR_TRACE prints any dangling nmsockets and
nmhandles before triggering assertion failure. This options would
reduce performance when enabled, but in theory, it could be enabled
on low-performance systems.
2. New NETMGR_TRACE_VERBOSE option has been added that enables
extensive netmgr logging that allows the software engineer to
precisely track any attach/detach operations on the nmsockets and
nmhandles. This is not suitable for any kind of production
machine, only for debugging.
* The tlsdns netmgr protocol has been split from the tcpdns and it still
uses the old method of stacking the netmgr boxes on top of each other.
We will have to refactor the tlsdns netmgr protocol to use the same
approach - build the stack using only libuv and openssl.
* Limit but not assert the tcp buffer size in tcp_alloc_cb
Closes: #2061
(cherry picked from commit 634bdfb16d)
When calling the high level netmgr functions, the callback would be
sometimes called synchronously if we catch the failure directly, or
asynchronously if it happens later. The synchronous call to the
callback could create deadlocks as the caller would not expect the
failed callback to be executed directly.
(cherry picked from commit a49d88568f)
This commit adds couple of additional safeguards against running
sends/reads on inactive sockets. The changes was modeled after the
changes we made to netmgr/tcpdns.c
(cherry picked from commit fa424225af)
Parse the configuration of tls objects into SSL_CTX* objects. Listen on
DoT if 'tls' option is setup in listen-on directive. Use DoT/DoH ports
for DoT/DoH.
(cherry picked from commit 38b78f59a0)
- peer address was not being reported correctly by "dig +tls"
- the protocol used is now reported in the dig output: UDP, TCP, or TLS.
(cherry picked from commit 8886569e9d)
Add server-side TLS support to netmgr - that includes moving some of the
isc_nm_ functions from tcp.c to a wrapper in netmgr.c calling a proper
tcp or tls function, and a new isc_nm_listentls() function.
Add DoT support to tcpdns - isc_nm_listentlsdns().
(cherry picked from commit b2ee0e9dc3)
there were two failures during observed in testing, both occurring
when 'rndc halt' was run rather than 'rndc stop' - the latter dumps
zone contents to disk and presumably introduced enough delay to
prevent the races:
- a failure when the zone was shut down and called dns_xfrin_detach()
before the xfrin had finished connecting; the connect timeout
terminated without detaching its handle
- a failure when the tcpdns socket timer fired after the outerhandle
had already been cleared.
this commit incidentally addresses a failure observed in mutexatomic
due to a variable having been initialized incorrectly.
socket() call can return an error - e.g. EMFILE, so we need to handle
this nicely and not crash.
Additionally wrap the socket() call inside a platform independent helper
function as the Socket data type on Windows is unsigned integer:
> This means, for example, that checking for errors when the socket and
> accept functions return should not be done by comparing the return
> value with –1, or seeing if the value is negative (both common and
> legal approaches in UNIX). Instead, an application should use the
> manifest constant INVALID_SOCKET as defined in the Winsock2.h header
> file.
(cherry picked from commit 8af7f81d6c)
Because we use result earlier for setting the loadbalancing on the
socket, we could be left with a ISC_R_NOTIMPLEMENTED value stored in the
variable and when the UDP connection would succeed, we would
errorneously return this value instead of ISC_R_SUCCESS.
(cherry picked from commit 050258bda4)
use isc_nmhandle_settimeout() to set read/recv timeouts, and get rid
of connect_timeout() and related functions in dighost.c.
(cherry picked from commit ea2b04c361)
this function sets the read timeout for the socket associated
with a netmgr handle and, if the timer is running, resets it.
for TCPDNS sockets it also sets the read timeout and resets the
timer on the outer TCP socket.
(cherry picked from commit 4be63c5b00)
When we are operating on the tcpdns socket, we need to double check
whether the socket or its outerhandle or its listener or its mgr is
still active and when not, bail out early.
(cherry picked from commit c14c1fdd2c)
If dnslisten_readcb gets a read callback it needs to verify that the
outer socket wasn't closed in the meantime, and issue a CANCELED callback
if it was.
(cherry picked from commit 3ab3d90de0)
When binding a TCP socket, if bind() fails with EADDRINUSE,
try again with REUSEPORT/REUSEADDR (or the equivalent options).
(cherry picked from commit 26a3a22895)
There were more races that could happen while connecting to a
socket while closing or shutting down the same socket. This
commit introduces a .closing flag to guard the socket from
being closed twice.
(cherry picked from commit ed3ab63f74)
There was a data race where a new event could be scheduled after
isc__nm_async_shutdown() had cleaned up all the dangling UDP/TCP
sockets from the loop.
(cherry picked from commit 6cfadf9db0)
- more logical code flow.
- propagate errors back to the caller.
- add a 'reading' flag and call the callback from failed_read_cb()
only when it the socket was actively reading.
(cherry picked from commit 5fcd52209a)
- don't bother closing sockets that are already closing.
- UDP read timeout timer was not stopped after reading.
- improve handling of TCP connection failures.
(cherry picked from commit cdccac4993)
- isc_nm_tcpdnsconnect() sets up up an outgoing TCP DNS connection.
- isc_nm_tcpconnect(), _udpconnect() and _tcpdnsconnect() now take a
timeout argument to ensure connections time out and are correctly
cleaned up on failure.
- isc_nm_read() now supports UDP; it reads a single datagram and then
stops until the next time it's called.
- isc_nm_cancelread() now runs asynchronously to prevent assertion
failure if reading is interrupted by a non-network thread (e.g.
a timeout).
- isc_nm_cancelread() can now apply to UDP sockets.
- added shim code to support UDP connection in versions of libuv
prior to 1.27, when uv_udp_connect() was added
all these functions will be used to support outgoing queries in dig,
xfrin, dispatch, etc.
(cherry picked from commit 5dcdc00b93)
The release notes doesn't have to have copyright header, it doesn't add
any value there as the release notes are useless outside the project.
(cherry picked from commit cb30d9892d)
The DNS Flag Day 2020 reduced all the EDNS buffer sizes to 1232. In
this commit, we revert the default value for nocookie-udp-size back to
4096 because the option is too obscure and most people don't realize
that they also need to change this configuration option in addition to
max-udp-size.
(cherry picked from commit 79c196fc77)
Since the queries sent towards root and TLD servers are now included in
the count (as a result of the fix for CVE-2020-8616),
"max-recursion-queries" has a higher chance of being exceeded by
non-attack queries. Increase its default value from 75 to 100.
(cherry picked from commit ab0bf49203)