Commit Graph

32509 Commits

Author SHA1 Message Date
Ondřej Surý
7b9c8b9781 Refactor netmgr and add more unit tests
This is part of the work that intends to make the netmgr stable,
testable, maintainable and tested.  It contains numerous changes to
the netmgr code and unfortunately, it was not possible to split this
into smaller chunks as the work here needs to be committed as a
whole.

NOTE: There's quite a lot of duplicated code between udp.c, tcp.c and
tcpdns.c and it should be subject to refactoring in the future.

The changes that are included in this commit are listed here
(extensively, but not exclusively):

* The netmgr_test unit test was split into individual tests (udp_test,
  tcp_test, tcpdns_test and newly added tcp_quota_test)

* The udp_test and tcp_test have been extended to allow programmatic
  failures from the libuv API.  Unfortunately, we can't use cmocka
  mock() and will_return(), so we emulate the behaviour with #define and
  by including the netmgr/{udp,tcp}.c source file directly.

* The netievents that we put on the nm queue have a variable number of
  members; of these, the isc_nmsocket_t and isc_nmhandle_t always
  need to be attached before enqueueing the netievent_<foo> and
  detached after we have called isc_nm_async_<foo>, to ensure that
  the socket (handle) doesn't disappear between scheduling the event and
  actually executing the event.

* Cancelling an in-flight TCP connection using libuv requires calling
  uv_close() on the original uv_tcp_t handle, which just breaks too many
  assumptions we have in the netmgr code.  Instead of using uv_timer for
  TCP connection timeouts, we use a platform-specific socket option.

* Fix the synchronization between {nm,async}_{listentcp,tcpconnect}

  When isc_nm_listentcp() or isc_nm_tcpconnect() was called, it waited,
  using a condition variable and mutex, for the socket to either end up
  with an error (that path was fine) or to be listening or connected.

  Several things could happen:

    0. everything is ok

    1. the waiting thread would miss the SIGNAL() - because the enqueued
       event would be processed faster than we could start WAIT()ing.
       In case the operation would end up with error, it would be ok, as
       the error variable would be unchanged.

    2. the waiting thread would miss sock->{connected,listening} being
       `true`, because it would already be set back to `false` in
       tcp_{listen,connect}close_cb(): the connection would be so
       short-lived that the socket would be closed before we could even
       start WAIT()ing

* The tcpdns has been converted to using libuv directly.  Previously,
  the tcpdns protocol used the tcp protocol from netmgr; this proved to
  be very complicated to understand, fix and make changes to.  The new
  tcpdns protocol is modeled in a similar way to the tcp netmgr protocol.
  Closes: #2194, #2283, #2318, #2266, #2034, #1920

* The tcp and tcpdns protocols no longer use isc_uv_import/isc_uv_export
  to pass accepted TCP sockets between netthreads, but instead (similar
  to UDP) use a per-netthread uv_loop listener.  This greatly reduces the
  complexity as the socket is always run in the associated nm and uv
  loops, and we are also not touching the libuv internals.

  There's an unfortunate side effect though: the new code requires
  support for load-balanced sockets from the operating system for both
  UDP and TCP (see #2137).  If the operating system doesn't support the
  load balanced sockets (either SO_REUSEPORT on Linux or SO_REUSEPORT_LB
  on FreeBSD 12+), the number of netthreads is limited to 1.

* The netmgr now has two debugging #ifdefs:

  1. The already existing NETMGR_TRACE prints any dangling nmsockets and
     nmhandles before triggering an assertion failure.  This option would
     reduce performance when enabled, but in theory, it could be enabled
     on low-performance systems.

  2. A new NETMGR_TRACE_VERBOSE option has been added that enables
     extensive netmgr logging that allows the software engineer to
     precisely track any attach/detach operations on the nmsockets and
     nmhandles.  This is not suitable for any kind of production
     machine, only for debugging.

* The tlsdns netmgr protocol has been split from the tcpdns and it still
  uses the old method of stacking the netmgr boxes on top of each other.
  We will have to refactor the tlsdns netmgr protocol to use the same
  approach - build the stack using only libuv and openssl.

* Limit but not assert the tcp buffer size in tcp_alloc_cb
  Closes: #2061
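
The missed SIGNAL() case in the synchronization item above is the classic lost-wakeup problem. The following is a minimal sketch of the usual remedy, assuming plain POSIX threads rather than the netmgr's own wrappers: the outcome is latched in a flag under the mutex, and the waiter re-checks that flag in a loop, so a signal sent before the wait starts is not lost and a later close cannot change what the waiter sees. The names demo_sync_t, demo_sync_signal() and demo_sync_wait() are hypothetical and are not taken from the commit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not the netmgr API. */
typedef struct {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;  /* latched once the async operation has finished */
	int result; /* outcome reported by the async operation */
} demo_sync_t;

static void
demo_sync_init(demo_sync_t *s) {
	assert(s != NULL);
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->cond, NULL);
	s->done = false;
	s->result = -1;
}

/* Called by the worker thread when listen/connect has completed. */
static void
demo_sync_signal(demo_sync_t *s, int result) {
	assert(s != NULL);
	pthread_mutex_lock(&s->lock);
	s->result = result;
	s->done = true;
	pthread_cond_signal(&s->cond);
	pthread_mutex_unlock(&s->lock);
}

/* Called by the requesting thread; safe even if the signal came first. */
static int
demo_sync_wait(demo_sync_t *s) {
	int result;

	assert(s != NULL);
	pthread_mutex_lock(&s->lock);
	while (!s->done) {
		/* The loop also absorbs spurious wakeups. */
		pthread_cond_wait(&s->cond, &s->lock);
	}
	result = s->result;
	pthread_mutex_unlock(&s->lock);
	return (result);
}
```

Calling demo_sync_signal() before demo_sync_wait() returns immediately with the stored result, which is the ordering that tripped up the original code.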

(cherry picked from commit 634bdfb16d)
2020-12-09 10:46:16 +01:00
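
The "#define plus direct source include" trick mentioned for udp_test and tcp_test in the commit above can be illustrated with a small sketch. Everything here is an assumption made for illustration: lib_bind() stands in for a libuv call, demo_listen() stands in for the code that the real tests pull in by including netmgr/{udp,tcp}.c, and mock_bind_result is a hypothetical test knob.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a library call such as a libuv bind function. */
static int
lib_bind(int fd) {
	(void)fd;
	return (0);
}

/* Test-controlled knob: non-zero forces the wrapped call to fail. */
static int mock_bind_result = 0;

static int
mock_lib_bind(int fd) {
	if (mock_bind_result != 0) {
		return (mock_bind_result);
	}
	return (lib_bind(fd)); /* still the real function at this point */
}

/* From here on, every use of lib_bind() is redirected to the mock. */
#define lib_bind(fd) mock_lib_bind(fd)

/*
 * In a real test, the source file under test would be included here so
 * that its calls are compiled against the macro above.  This function
 * stands in for that included code.
 */
static int
demo_listen(int fd) {
	assert(fd >= 0);
	int r = lib_bind(fd);
	return ((r == 0) ? 0 : -1);
}
```

A test can then set mock_bind_result to a negative errno value such as -EADDRINUSE and check that demo_listen() reports the failure, without needing the underlying library to fail for real.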
Ondřej Surý
fa9ca83862 Turn all the callback to be always asynchronous
When calling the high-level netmgr functions, the callback would
sometimes be called synchronously if we catch the failure directly, or
asynchronously if it happens later.  The synchronous call to the
callback could create deadlocks as the caller would not expect the
failed callback to be executed directly.

(cherry picked from commit a49d88568f)
2020-12-09 10:46:16 +01:00
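
A minimal sketch of the "always asynchronous" idea from the commit above, assuming a simple in-process queue instead of the netmgr event loop: even when the failure is detected immediately, the callback is only enqueued, and it runs later when the queue is drained. All names (demo_connect(), demo_run_pending(), and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_cb_t)(int result, void *arg);

typedef struct {
	demo_cb_t cb;
	int result;
	void *arg;
} demo_event_t;

#define DEMO_QUEUE_MAX 8
static demo_event_t demo_queue[DEMO_QUEUE_MAX];
static size_t demo_queue_len = 0;

/* Queue a callback for later execution; returns -1 if the queue is full. */
static int
demo_enqueue(demo_cb_t cb, int result, void *arg) {
	assert(cb != NULL);
	if (demo_queue_len == DEMO_QUEUE_MAX) {
		return (-1);
	}
	demo_queue[demo_queue_len++] = (demo_event_t){ cb, result, arg };
	return (0);
}

/*
 * Start a "connection".  A negative fd is an immediate failure, but the
 * callback is never invoked from inside this function.
 */
static void
demo_connect(int fd, demo_cb_t cb, void *arg) {
	(void)demo_enqueue(cb, (fd < 0) ? -1 : 0, arg);
}

/* Drain the queue; this is where callbacks actually run. */
static void
demo_run_pending(void) {
	for (size_t i = 0; i < demo_queue_len; i++) {
		demo_queue[i].cb(demo_queue[i].result, demo_queue[i].arg);
	}
	demo_queue_len = 0;
}

/* Example callback: counts how many times it has been invoked. */
static void
demo_count_cb(int result, void *arg) {
	(void)result;
	(*(int *)arg)++;
}
```

Because the caller never re-enters its own callback from within demo_connect(), it can safely hold a lock across the call, which is the deadlock scenario the commit message describes.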
Ondřej Surý
bcc9ad98ea netmgr: Add additional safeguards to netmgr/tls.c
This commit adds a couple of additional safeguards against running
sends/reads on inactive sockets.  The changes were modeled after the
changes we made to netmgr/tcpdns.c

(cherry picked from commit fa424225af)
2020-12-09 10:46:16 +01:00
Witold Kręcicki
b83dff0585 isc_nm_tls_create_server_ctx can create ephemeral certs
In-memory ephemeral certs creation for easy DoT/DoH deployment.

(cherry picked from commit 3c00fb71db)
2020-12-09 10:46:16 +01:00
Witold Kręcicki
d7fa046a69 Add DoT support to bind
Parse the configuration of tls objects into SSL_CTX* objects.  Listen on
DoT if the 'tls' option is set up in the listen-on directive.  Use
DoT/DoH ports for DoT/DoH.

(cherry picked from commit 38b78f59a0)
2020-12-09 10:46:16 +01:00
Evan Hunt
0f5fff5c1e report peer address in TLS mode, and specify protocol
- peer address was not being reported correctly by "dig +tls"
- the protocol used is now reported in the dig output: UDP, TCP, or TLS.

(cherry picked from commit 8886569e9d)
2020-12-09 10:46:16 +01:00
Witold Kręcicki
4a854da141 netmgr: server-side TLS support
Add server-side TLS support to netmgr - that includes moving some of the
isc_nm_ functions from tcp.c to a wrapper in netmgr.c calling a proper
tcp or tls function, and a new isc_nm_listentls() function.

Add DoT support to tcpdns - isc_nm_listentlsdns().

(cherry picked from commit b2ee0e9dc3)
2020-12-09 10:46:16 +01:00
Evan Hunt
6f6f0e26ab address some possible shutdown races in xfrin
there were two failures observed during testing, both occurring
when 'rndc halt' was run rather than 'rndc stop' - the latter dumps
zone contents to disk and presumably introduced enough delay to
prevent the races:

- a failure when the zone was shut down and called dns_xfrin_detach()
  before the xfrin had finished connecting; the connect timeout
  terminated without detaching its handle
- a failure when the tcpdns socket timer fired after the outerhandle
  had already been cleared.

this commit incidentally addresses a failure observed in mutexatomic
due to a variable having been initialized incorrectly.
2020-12-09 10:46:16 +01:00
Ondřej Surý
c4dcedd2dc netmgr: Don't crash if socket() returns an error in udpconnect
The socket() call can return an error - e.g. EMFILE, so we need to handle
this nicely and not crash.

Additionally wrap the socket() call inside a platform-independent helper
function as the SOCKET data type on Windows is an unsigned integer:

> This means, for example, that checking for errors when the socket and
> accept functions return should not be done by comparing the return
> value with –1, or seeing if the value is negative (both common and
> legal approaches in UNIX). Instead, an application should use the
> manifest constant INVALID_SOCKET as defined in the Winsock2.h header
> file.

(cherry picked from commit 8af7f81d6c)
2020-12-09 10:46:16 +01:00
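
A platform-independent wrapper of the kind described in the commit above might look roughly like the sketch below. It is an illustration only: demo_socket_t, DEMO_SOCKET_INVALID and demo_socket_open() are hypothetical names, not the helper added by the commit. The point is that the error check compares against the platform's own "invalid" value instead of testing for a negative number.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET demo_socket_t; /* unsigned on Windows */
#define DEMO_SOCKET_INVALID INVALID_SOCKET
#else
#include <sys/types.h>
#include <sys/socket.h>
typedef int demo_socket_t;
#define DEMO_SOCKET_INVALID (-1)
#endif

/*
 * Open a socket and report failure through the return value
 * (0 on success, -1 on error) instead of crashing.
 */
static int
demo_socket_open(int domain, int type, int protocol, demo_socket_t *fdp) {
	demo_socket_t fd;

	assert(fdp != NULL);
	fd = socket(domain, type, protocol);
	if (fd == DEMO_SOCKET_INVALID) {
		/* e.g. EMFILE when the process is out of descriptors */
		return (-1);
	}
	*fdp = fd;
	return (0);
}
```

Callers then branch on the 0/-1 return value and never inspect the raw descriptor for negativity, so the same code path works with both signed and unsigned socket types.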
Ondřej Surý
21daa258a2 netmgr: Always load the result from async socket
Because we use result earlier for setting the load balancing on the
socket, we could be left with an ISC_R_NOTIMPLEMENTED value stored in the
variable and when the UDP connection would succeed, we would
erroneously return this value instead of ISC_R_SUCCESS.

(cherry picked from commit 050258bda4)
2020-12-09 10:46:16 +01:00
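
The stale-variable bug described in the commit above can be sketched as follows. This is a simplified model with hypothetical names (demo_result_t, demo_sock_t, demo_udpconnect()); it only shows the shape of the fix, which is to overwrite the local result with the value stored by the asynchronous operation before returning it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
	DEMO_R_SUCCESS = 0,
	DEMO_R_NOTIMPLEMENTED,
	DEMO_R_FAILURE
} demo_result_t;

typedef struct {
	demo_result_t result; /* stored by the asynchronous connect */
} demo_sock_t;

/* Optional step: unsupported platforms report NOTIMPLEMENTED. */
static demo_result_t
demo_set_loadbalancing(bool supported) {
	return (supported ? DEMO_R_SUCCESS : DEMO_R_NOTIMPLEMENTED);
}

static demo_result_t
demo_udpconnect(demo_sock_t *sock, bool lb_supported) {
	demo_result_t result;

	assert(sock != NULL);

	/* A missing load-balancing option is tolerated, not fatal. */
	result = demo_set_loadbalancing(lb_supported);
	if (result != DEMO_R_SUCCESS && result != DEMO_R_NOTIMPLEMENTED) {
		return (result);
	}

	/*
	 * Always load the outcome from the socket; without this line a
	 * leftover NOTIMPLEMENTED would be returned for a good connection.
	 */
	result = sock->result;
	return (result);
}
```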
Evan Hunt
70e08cab6b dig: use new netmgr timeout mechanism
use isc_nmhandle_settimeout() to set read/recv timeouts, and get rid
of connect_timeout() and related functions in dighost.c.

(cherry picked from commit ea2b04c361)
2020-12-09 10:46:16 +01:00
Evan Hunt
4598d7b30d add isc_nmhandle_settimeout() function
this function sets the read timeout for the socket associated
with a netmgr handle and, if the timer is running, resets it.
for TCPDNS sockets it also sets the read timeout and resets the
timer on the outer TCP socket.

(cherry picked from commit 4be63c5b00)
2020-12-09 10:46:16 +01:00
Ondřej Surý
5877befb51 fix nmhandle attach/detach errors in tcpdnsconnect_cb()
we need to attach to the statichandle when connecting TCPDNS sockets,
same as with UDP.

(cherry picked from commit 2191d2bf44)
2020-12-09 10:46:16 +01:00
Mark Andrews
574e0d9f6e Incorrect result code passed to failed_connect_cb
*** CID 312970:  Incorrect expression  (COPY_PASTE_ERROR) /lib/isc/netmgr/tcp.c: 282 in tcp_connect_cb()
    276     	}
    277
    278     	isc__nm_incstats(sock->mgr, sock->statsindex[STATID_CONNECT]);
    279     	r = uv_tcp_getpeername(&sock->uv_handle.tcp, (struct sockaddr *)&ss,
    280     			       &(int){ sizeof(ss) });
    281     	if (r != 0) {
    >>>     CID 312970:  Incorrect expression  (COPY_PASTE_ERROR)
    >>>     "status" in "isc___nm_uverr2result(status, true, "netmgr/tcp.c", 282U)" looks like a copy-paste error.
    282     		failed_connect_cb(sock, req, isc__nm_uverr2result(status));
    283     		return;
    284     	}
    285
    286     	atomic_store(&sock->connecting, false);
    287

(cherry picked from commit 0073cb7356)
2020-12-09 10:46:16 +01:00
Ondřej Surý
268e111546 Put up additional safe guards to not use inactive/closed tcpdns socket
When we are operating on the tcpdns socket, we need to double check
whether the socket or its outerhandle or its listener or its mgr is
still active and when not, bail out early.

(cherry picked from commit c14c1fdd2c)
2020-12-09 10:46:16 +01:00
Witold Kręcicki
fb19091a32 Fix improper closed connection handling in tcpdns.
If dnslisten_readcb gets a read callback it needs to verify that the
outer socket wasn't closed in the meantime, and issue a CANCELED callback
if it was.

(cherry picked from commit 3ab3d90de0)
2020-12-09 10:46:16 +01:00
Evan Hunt
80de62645c check return value from uv_tcp_getpeername() when connecting
if we can't determine the peer, the connect should fail.

(cherry picked from commit 8fcad58ea6)
2020-12-09 10:46:16 +01:00
Evan Hunt
12b1ae64ff set REUSEPORT and REUSEADDR on TCP sockets if needed
When binding a TCP socket, if bind() fails with EADDRINUSE,
try again with REUSEPORT/REUSEADDR (or the equivalent options).

(cherry picked from commit 26a3a22895)
2020-12-09 10:46:16 +01:00
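
The retry described in the commit above can be sketched with plain POSIX calls. This is an assumption-laden illustration, not the code from the commit: demo_bind_retry() is a hypothetical name, and whether the second bind() succeeds depends on the platform and on the options set on the socket that already holds the address.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Try a plain bind() first; only if it fails with EADDRINUSE, enable
 * the reuse options and try once more.  Returns 0 on success, -1 on error.
 */
static int
demo_bind_retry(int fd, const struct sockaddr *sa, socklen_t len) {
	int on = 1;

	assert(sa != NULL);

	if (bind(fd, sa, len) == 0) {
		return (0);
	}
	if (errno != EADDRINUSE) {
		return (-1);
	}

	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) != 0) {
		return (-1);
	}
#ifdef SO_REUSEPORT
	/* Not every platform defines SO_REUSEPORT. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)) != 0) {
		return (-1);
	}
#endif
	return ((bind(fd, sa, len) == 0) ? 0 : -1);
}
```

Errors other than EADDRINUSE (for example an invalid descriptor) are returned immediately without touching the socket options.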
Ondřej Surý
e35b8db249 Fix more races between connect and shutdown
There were more races that could happen while connecting to a
socket while closing or shutting down the same socket.  This
commit introduces a .closing flag to guard the socket from
being closed twice.

(cherry picked from commit ed3ab63f74)
2020-12-09 10:46:16 +01:00
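
A flag that guards against closing twice, as introduced by the commit above, is commonly implemented with an atomic compare-and-exchange so that exactly one caller wins. The sketch below uses C11 atomics and hypothetical names (demo_closable_t, demo_close_once()); the actual field layout and close path in the netmgr are not shown in this log.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
	atomic_bool closing; /* set by the first caller that starts closing */
	int close_calls;     /* how many times the close path really ran */
} demo_closable_t;

static void
demo_closable_init(demo_closable_t *s) {
	assert(s != NULL);
	atomic_init(&s->closing, false);
	s->close_calls = 0;
}

/*
 * Returns true for the one caller that flips closing from false to
 * true; every later (or concurrent losing) caller gets false and must
 * not touch the socket's close path.
 */
static bool
demo_close_once(demo_closable_t *s) {
	bool expected = false;

	assert(s != NULL);
	if (!atomic_compare_exchange_strong(&s->closing, &expected, true)) {
		return (false);
	}
	s->close_calls++; /* stands in for the real close work */
	return (true);
}
```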
Ondřej Surý
d8c3e48970 Fix a race between isc__nm_async_shutdown() and new sends/reads
There was a data race where a new event could be scheduled after
isc__nm_async_shutdown() had cleaned up all the dangling UDP/TCP
sockets from the loop.

(cherry picked from commit 6cfadf9db0)
2020-12-09 10:46:16 +01:00
Ondřej Surý
c4816ce34f Refactor udp_recv_cb()
- more logical code flow.
- propagate errors back to the caller.
- add a 'reading' flag and call the callback from failed_read_cb()
  only when the socket was actively reading.

(cherry picked from commit 5fcd52209a)
2020-12-09 10:46:16 +01:00
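
The 'reading' flag from the commit above can be illustrated with a small sketch: the failure path invokes the receive callback only if a read was actually in progress, and clears the flag so the callback cannot fire twice. The structure and function names (demo_udp_t, demo_failed_read(), demo_fail_counter()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*demo_recv_cb_t)(int result, void *arg);

typedef struct {
	bool reading;           /* true while a read is outstanding */
	demo_recv_cb_t recv_cb; /* user callback for read results */
	void *recv_arg;
} demo_udp_t;

/* Report a read failure, but only to a caller that is waiting for one. */
static void
demo_failed_read(demo_udp_t *s, int result) {
	assert(s != NULL);
	if (!s->reading) {
		return; /* nobody is reading; nothing to report */
	}
	s->reading = false;
	if (s->recv_cb != NULL) {
		s->recv_cb(result, s->recv_arg);
	}
}

/* Example callback: counts reported failures. */
static void
demo_fail_counter(int result, void *arg) {
	(void)result;
	(*(int *)arg)++;
}
```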
Ondřej Surý
7945fb0c90 Fix netmgr read/connect timeout issues
- don't bother closing sockets that are already closing.
- UDP read timeout timer was not stopped after reading.
- improve handling of TCP connection failures.

(cherry picked from commit cdccac4993)
2020-12-09 10:46:16 +01:00
Ondřej Surý
e9354e7bfe Add isc__nm_udp_shutdown() function
This function will be called during isc_nm_closedown() to ensure
that all UDP sockets are closed and detached.

(cherry picked from commit 7a6056bc8f)
2020-12-09 10:46:16 +01:00
Evan Hunt
c919a3338f add netmgr functions to support outgoing DNS queries
- isc_nm_tcpdnsconnect() sets up an outgoing TCP DNS connection.
- isc_nm_tcpconnect(), _udpconnect() and _tcpdnsconnect() now take a
  timeout argument to ensure connections time out and are correctly
  cleaned up on failure.
- isc_nm_read() now supports UDP; it reads a single datagram and then
  stops until the next time it's called.
- isc_nm_cancelread() now runs asynchronously to prevent assertion
  failure if reading is interrupted by a non-network thread (e.g.
  a timeout).
- isc_nm_cancelread() can now apply to UDP sockets.
- added shim code to support UDP connection in versions of libuv
  prior to 1.27, when uv_udp_connect() was added

all these functions will be used to support outgoing queries in dig,
xfrin, dispatch, etc.

(cherry picked from commit 5dcdc00b93)
2020-12-09 10:46:16 +01:00
Tinderbox User
7406ea925a prep 9.16.10 2020-12-09 10:46:16 +01:00
Michał Kępień
a01961260d Prepare release notes for BIND 9.16.10 2020-12-09 10:46:16 +01:00
Michał Kępień
2ef1784b85 Reorder release notes 2020-12-09 10:45:49 +01:00
Michał Kępień
3f6f0b9f66 Tweak and reword release notes 2020-12-09 10:45:49 +01:00
Michał Kępień
9f270783ac Tweak and reword recent CHANGES entries 2020-12-09 10:45:49 +01:00
Michał Kępień
d902dc611f Fix formatting of "dnssec-policy" documentation 2020-12-09 10:45:49 +01:00
Michal Nowak
19197034fb Miscellaneous minor documentation updates 2020-12-09 10:45:49 +01:00
Ondřej Surý
645701afb0 Merge branch 'ondrej/release-notes-doesnt-need-copyright-v9_16' into 'v9_16'
Remove the requirement for the release notes to have copyright

See merge request isc-projects/bind9!4484
2020-12-09 09:51:20 +00:00
Ondřej Surý
fcfb3e77bb Remove the requirement for the release notes to have copyright
The release notes don't have to have a copyright header; it doesn't add
any value there as the release notes are useless outside the project.

(cherry picked from commit cb30d9892d)
2020-12-09 10:50:15 +01:00
Ondřej Surý
2e879edb8c Merge branch 'ondrej/clang-format-11-v9_16' into 'v9_16'
Bump the clang version to 11 (v9.16)

See merge request isc-projects/bind9!4480
2020-12-08 19:14:15 +00:00
Ondřej Surý
908f167a5d Bump the clang version to 11 (stable)
(cherry picked from commit c1eb385fdf)
2020-12-08 19:34:12 +01:00
Ondřej Surý
a35a666a7c Reformat sources using clang-format-11
(cherry picked from commit 7ba18870dc)
2020-12-08 19:34:05 +01:00
Ondřej Surý
504969cb63 Explicitly configure new clang-format-11 options
(cherry picked from commit 6c28834354)
2020-12-08 19:30:55 +01:00
Ondřej Surý
dc548b2e83 Merge branch '2250-dns-flag-day-2020-revert-nocookie-udp-size-v9_16' into 'v9_16'
Resolve "DNS Flag Day 2020 - EDNS buffer size configuring does not work anymore"

See merge request isc-projects/bind9!4456
2020-12-02 15:33:06 +00:00
Ondřej Surý
9d35c9b96d Add CHANGES and release note for GL #2250
(cherry picked from commit c7d81f12f8)
2020-12-02 12:02:10 +01:00
Ondřej Surý
5d34daaf78 Change the default value for nocookie-udp-size back to 4096
The DNS Flag Day 2020 reduced all the EDNS buffer sizes to 1232.  In
this commit, we revert the default value for nocookie-udp-size back to
4096 because the option is too obscure and most people don't realize
that they also need to change this configuration option in addition to
max-udp-size.

(cherry picked from commit 79c196fc77)
2020-12-02 12:01:50 +01:00
Mark Andrews
0d3ba105bb Merge branch '2305-adjust-recursion-limits-v9_16' into 'v9_16'
Adjust default value of "max-recursion-queries"

See merge request isc-projects/bind9!4447
2020-12-01 14:13:40 +00:00
Mark Andrews
5c10b5a4e8 Adjust default value of "max-recursion-queries"
Since the queries sent towards root and TLD servers are now included in
the count (as a result of the fix for CVE-2020-8616),
"max-recursion-queries" has a higher chance of being exceeded by
non-attack queries.  Increase its default value from 75 to 100.

(cherry picked from commit ab0bf49203)
2020-12-02 00:53:49 +11:00
Mark Andrews
c4178b7d8d Merge branch '2315-bind-9-11-22-9-11-25-fails-to-build-for-aep-hsm-native-pkcs11-v9_16' into 'v9_16'
Resolve "BIND 9.11.22 - 9.11.25 fails to build for AEP HSM native pkcs11"

See merge request isc-projects/bind9!4445
2020-12-01 13:50:49 +00:00
Mark Andrews
45719ff249 Add release note for [GL #2315]
(cherry picked from commit 356243aaec)
2020-12-01 23:29:43 +11:00
Mark Andrews
a07754cf69 Add CHANGES
(cherry picked from commit 11a3545e32)
2020-12-01 23:19:20 +11:00
Mark Andrews
4926888306 Fix misplaced declaration
(cherry picked from commit 49b9219bb3)
2020-12-01 23:19:20 +11:00
Michal Nowak
d765d024ea Merge branch '2274-drop-centos-6-support-after-november-30-2020' into 'v9_16'
Remove CentOS 6 from GitLab CI

See merge request isc-projects/bind9!4392
2020-11-30 13:29:02 +00:00
Michal Nowak
b908cc9c79 Remove CentOS 6 from GitLab CI
CentOS 6 reaches EOL on November 30, 2020 and will not be officially
supported by the CentOS project.
2020-11-27 13:29:38 +01:00
Mark Andrews
fe07671c9b Merge branch '2275-tighten-dns-cookie-response-handling-v9_16' into 'v9_16'
Resolve "Tighten DNS COOKIE response handling"

See merge request isc-projects/bind9!4438
2020-11-26 22:25:42 +00:00
Mark Andrews
e98edb871d Add release note for [GL #2275]
(cherry picked from commit d0dd71380b)
2020-11-27 08:44:00 +11:00