Commit Graph

3960 Commits

Author SHA1 Message Date
Evan Hunt
ea2b04c361 dig: use new netmgr timeout mechanism
use isc_nmhandle_settimeout() to set read/recv timeouts, and get rid
of connect_timeout() and related functions in dighost.c.
2020-11-07 20:49:53 +01:00
Evan Hunt
4be63c5b00 add isc_nmhandle_settimeout() function
this function sets the read timeout for the socket associated
with a netmgr handle and, if the timer is running, resets it.
for TCPDNS sockets it also sets the read timeout and resets the
timer on the outer TCP socket.
2020-11-07 20:49:53 +01:00
Ondřej Surý
2191d2bf44 fix nmhandle attach/detach errors in tcpdnsconnect_cb()
we need to attach to the statichandle when connecting TCPDNS sockets,
same as with UDP.
2020-11-07 20:49:53 +01:00
Mark Andrews
0073cb7356 Incorrect result code passed to failed_connect_cb
*** CID 312970:  Incorrect expression  (COPY_PASTE_ERROR) /lib/isc/netmgr/tcp.c: 282 in tcp_connect_cb()
    276     	}
    277
    278     	isc__nm_incstats(sock->mgr, sock->statsindex[STATID_CONNECT]);
    279     	r = uv_tcp_getpeername(&sock->uv_handle.tcp, (struct sockaddr *)&ss,
    280     			       &(int){ sizeof(ss) });
    281     	if (r != 0) {
    >>>     CID 312970:  Incorrect expression  (COPY_PASTE_ERROR)
    >>>     "status" in "isc___nm_uverr2result(status, true, "netmgr/tcp.c", 282U)" looks like a copy-paste error.
    282     		failed_connect_cb(sock, req, isc__nm_uverr2result(status));
    283     		return;
    284     	}
    285
    286     	atomic_store(&sock->connecting, false);
    287
2020-11-04 21:58:05 +00:00
Ondřej Surý
c14c1fdd2c Put up additional safe guards to not use inactive/closed tcpdns socket
When we are operating on the tcpdns socket, we need to double check
whether the socket or its outerhandle or its listener or its mgr is
still active and when not, bail out early.
2020-11-02 20:58:00 +01:00
Witold Kręcicki
3ab3d90de0 Fix improper closed connection handling in tcpdns.
If dnslisten_readcb gets a read callback it needs to verify that the
outer socket wasn't closed in the meantime, and issue a CANCELED callback
if it was.
2020-11-02 15:10:28 +01:00
Evan Hunt
8fcad58ea6 check return value from uv_tcp_getpeername() when connecting
if we can't determine the peer, the connect should fail.
2020-10-30 11:11:54 +01:00
Ondřej Surý
14f54d13dc add a netmgr unit test
tests of UDP and TCP cases including:
- sending and receiving
- closure sockets without reading or sending
- closure of sockets at various points while sending and receiving
- since the teste is multithreaded, cmocka now aborts tests on the
  first failure, so that failures in subthreads are caught and
  reported correctly.
2020-10-30 11:11:54 +01:00
Evan Hunt
26a3a22895 set REUSEPORT and REUSEADDR on TCP sockets if needed
When binding a TCP socket, if bind() fails with EADDRINUSE,
try again with REUSEPORT/REUSEADDR (or the equivalent options).
2020-10-30 11:11:54 +01:00
Ondřej Surý
ed3ab63f74 Fix more races between connect and shutdown
There were more races that could happen while connecting to a
socket while closing or shutting down the same socket.  This
commit introduces a .closing flag to guard the socket from
being closed twice.
2020-10-30 11:11:54 +01:00
Ondřej Surý
6cfadf9db0 Fix a race between isc__nm_async_shutdown() and new sends/reads
There was a data race where a new event could be scheduled after
isc__nm_async_shutdown() had cleaned up all the dangling UDP/TCP
sockets from the loop.
2020-10-30 11:11:54 +01:00
Ondřej Surý
5fcd52209a Refactor udp_recv_cb()
- more logical code flow.
- propagate errors back to the caller.
- add a 'reading' flag and call the callback from failed_read_cb()
  only when it the socket was actively reading.
2020-10-30 11:11:54 +01:00
Ondřej Surý
cdccac4993 Fix netmgr read/connect timeout issues
- don't bother closing sockets that are already closing.
- UDP read timeout timer was not stopped after reading.
- improve handling of TCP connection failures.
2020-10-30 11:11:54 +01:00
Ondřej Surý
7a6056bc8f Add isc__nm_udp_shutdown() function
This function will be called during isc_nm_closedown() to ensure
that all UDP sockets are closed and detached.
2020-10-30 11:11:54 +01:00
Evan Hunt
5dcdc00b93 add netmgr functions to support outgoing DNS queries
- isc_nm_tcpdnsconnect() sets up up an outgoing TCP DNS connection.
- isc_nm_tcpconnect(), _udpconnect() and _tcpdnsconnect() now take a
  timeout argument to ensure connections time out and are correctly
  cleaned up on failure.
- isc_nm_read() now supports UDP; it reads a single datagram and then
  stops until the next time it's called.
- isc_nm_cancelread() now runs asynchronously to prevent assertion
  failure if reading is interrupted by a non-network thread (e.g.
  a timeout).
- isc_nm_cancelread() can now apply to UDP sockets.
- added shim code to support UDP connection in versions of libuv
  prior to 1.27, when uv_udp_connect() was added

all these functions will be used to support outgoing queries in dig,
xfrin, dispatch, etc.
2020-10-30 11:11:54 +01:00
Witold Kręcicki
c41ce8e0c9 Properly handle outer TCP connection closed in TCPDNS.
If the connection is closed while we're processing the request
we might access TCPDNS outerhandle which is already reset. Check
for this condition and call the callback with ISC_R_CANCELED result.
2020-10-29 12:32:25 +01:00
Ondřej Surý
37b9511ce1 Use libuv's shared library handling capabilities
While libltdl is a feature-rich library, BIND 9 code only uses its basic
capabilities, which are also provided by libuv and which BIND 9 already
uses for other purposes.  As libuv's cross-platform shared library
handling interface is modeled after the POSIX dlopen() interface,
converting code using the latter to the former is simple.  Replace
libltdl function calls with their libuv counterparts, refactoring the
code as necessary.  Remove all use of libltdl from the BIND 9 source
tree.
2020-10-28 15:48:58 +01:00
Ondřej Surý
8797e5efd5 Fix the data race when read-writing sock->active by using cmpxchg 2020-10-22 11:46:58 -07:00
Ondřej Surý
5ef71c420f Ignore and don't log ISC_R_NOTCONNECTED from uv_accept()
When client disconnects before the connection can be accepted, the named
would log a spurious log message:

    error: Accepting TCP connection failed: socket is not connected

We now ignore the ISC_R_NOTCONNECTED result code and log only other
errors
2020-10-22 11:37:16 -07:00
Ondřej Surý
f7c82e406e Fix the isc_nm_closedown() to actually close the pending connections
1. The isc__nm_tcp_send() and isc__nm_tcp_read() was not checking
   whether the socket was still alive and scheduling reads/sends on
   closed socket.

2. The isc_nm_read(), isc_nm_send() and isc_nm_resumeread() have been
   changed to always return the error conditions via the callbacks, so
   they always succeed.  This applies to all protocols (UDP, TCP and
   TCPDNS).
2020-10-22 11:37:16 -07:00
Ondřej Surý
6af08d1ca6 Fix the way tcp_send_direct() is used
There were two problems how tcp_send_direct() was used:

1. The tcp_send_direct() can return ISC_R_CANCELED (or translated error
   from uv_tcp_send()), but the isc__nm_async_tcpsend() wasn't checking
   the error code and not releasing the uvreq in case of an error.

2. In isc__nm_tcp_send(), when the TCP send is already in the right
   netthread, it uses tcp_send_direct() to send the TCP packet right
   away.  When that happened the uvreq was not freed, and the error code
   was returned to the caller.  We need to return ISC_R_SUCCESS and
   rather use the callback to report an error in such case.
2020-10-22 11:37:16 -07:00
Ondřej Surý
d72bc3eb52 Detach the sock->server in uv_close() callback, not before 2020-10-22 11:37:16 -07:00
Ondřej Surý
97b33e5bde Explicitly stop reading before closing the nmtcpsocket
When closing the socket that is actively reading from the stream, the
read_cb() could be called between uv_close() and close callback when the
server socket has been already detached hence using sock->statichandle
after it has been already freed.
2020-10-22 11:37:16 -07:00
Witold Kręcicki
ff0a336d52 Proper handling of socket references in case of TCP conn failure. 2020-10-22 11:37:16 -07:00
Witold Kręcicki
ae9a6befa8 Don't crash if isc_uv_export returns an error in accept_connection.
isc_uv_export can return an error - e.g. EMFILE (from dup), handle this
nicely.
2020-10-22 11:37:16 -07:00
Ondřej Surý
afca2e3b21 Fix the way udp_send_direct() is used
There were two problems how udp_send_direct() was used:

1. The udp_send_direct() can return ISC_R_CANCELED (or translated error
   from uv_udp_send()), but the isc__nm_async_udpsend() wasn't checking
   the error code and not releasing the uvreq in case of an error.

2. In isc__nm_udp_send(), when the UDP send is already in the right
   netthread, it uses udp_send_direct() to send the UDP packet right
   away.  When that happened the uvreq was not freed, and the error code
   was returned to the caller.  We need to return ISC_R_SUCCESS and
   rather use the callback to report an error in such case.
2020-10-22 11:37:16 -07:00
Michal Nowak
7ef268bb4b Drop unused bufferlist code 2020-10-22 13:11:16 +02:00
Michał Kępień
9014ff0cc6 Update library API versions 2020-10-22 08:54:32 +02:00
Matthijs Mekking
6c5ff94218 Don't increment network error stats on UV_EOF
When networking statistics was added to the netmgr (in commit
5234a8e00a), two lines were added that
increment the 'STATID_RECVFAIL' statistic: One if 'uv_read_start'
fails and one at the end of the 'read_cb'.  The latter happens
if 'nread < 0'.

According to the libuv documentation, I/O read callbacks (such as for
files and sockets) are passed a parameter 'nread'. If 'nread' is less
than 0, there was an error and 'UV_EOF' is the end of file error, which
you may want to handle differently.

In other words, we should not treat EOF as a RECVFAIL error.
2020-10-20 10:57:16 +02:00
Mark Andrews
f95ba8aa20 Complete the isc_nmhandle_detach() in the worker thread.
isc_nmhandle_detach() needs to complete in the same thread
as shutdown_walk_cb() to avoid a race.  Clear the caller's
pointer then pass control to the worker if necessary.

    WARNING: ThreadSanitizer: data race
    Write of size 8 at 0x000000000001 by thread T1:
    #0 isc_nmhandle_detach lib/isc/netmgr/netmgr.c:1258:15
    #1 control_command bin/named/controlconf.c:388:3
    #2 dispatch lib/isc/task.c:1152:7
    #3 run lib/isc/task.c:1344:2

    Previous read of size 8 at 0x000000000001 by thread T2:
    #0 isc_nm_pauseread lib/isc/netmgr/netmgr.c:1449:33
    #1 recv_data lib/isccc/ccmsg.c:109:2
    #2 isc__nm_tcp_shutdown lib/isc/netmgr/tcp.c:1157:4
    #3 shutdown_walk_cb lib/isc/netmgr/netmgr.c:1515:3
    #4 uv_walk <null>
    #5 process_queue lib/isc/netmgr/netmgr.c:659:4
    #6 process_normal_queue lib/isc/netmgr/netmgr.c:582:10
    #7 process_queues lib/isc/netmgr/netmgr.c:590:8
    #8 async_cb lib/isc/netmgr/netmgr.c:548:2
    #9 <null> <null>
2020-10-12 17:51:09 +11:00
Ondřej Surý
96ac91a18a Clean the last remnant of ISC_PLATFORM_HAVEIPV6 macro
In set_sndbuf() we were using ISC_PLATFORM_HAVEIPV6 macro that doesn't
exist anymore, because we assume that IPv6 support is always available.
2020-10-08 08:28:33 +02:00
Ondřej Surý
1672e851c8 Always set the DF flag (not only when CMSG is available)
By mistake, we were setting the DF flag only when CMSG was available for
said platform.
2020-10-08 08:28:14 +02:00
Ondřej Surý
e8b56acb49 Clone the csock in accept_connection(), not in callback
If we clone the csock (children socket) in TCP accept_connection()
instead of passing the ssock (server socket) to the call back and
cloning it there we unbreak the assumption that every socket is handled
inside it's own worker thread and therefore we can get rid of (at least)
callback locking.
2020-10-08 07:24:31 +02:00
Ondřej Surý
d86a74d8a4 Change the isc__nm_tcpdns_stoplistening() to be asynchronous event
The isc__nm_tcpdns_stoplistening() would call isc__nmsocket_clearcb()
that would clear the .accept_cb from non-netmgr thread.  Change the
tcpdns_stoplistening to enqueue ievent that would get processed in the
right netmgr thread to avoid locking.
2020-10-08 07:24:31 +02:00
Ondřej Surý
b9a42446e8 Enable DF (don't fragment) flag on listening UDP sockets
This commits uses the isc__nm_socket_dontfrag() helper function to
enable setting DF bit on the outgoing UDP packets.
2020-10-05 16:21:21 +02:00
Ondřej Surý
bb990030d3 Simplify the EDNS buffer size logic for DNS Flag Day 2020
The DNS Flag Day 2020 aims to remove the IP fragmentation problem from
the UDP DNS communication.  In this commit, we implement the required
changes and simplify the logic for picking the EDNS Buffer Size.

1. The defaults for `edns-udp-size`, `max-udp-size` and
   `nocookie-udp-size` have been changed to `1232` (the value picked by
   DNS Flag Day 2020).

2. The probing heuristics that would try 512->4096->1432->1232 buffer
   sizes has been removed and the resolver will always use just the
   `edns-udp-size` value.

3. Instead of just disabling the PMTUD mechanism on the UDP sockets, we
   now set IP_DONTFRAG (IPV6_DONTFRAG) flag.  That means that the UDP
   packets won't get ever fragmented.  If the ICMP packets are lost the
   UDP will just timeout and eventually be retried over TCP.
2020-10-05 16:21:21 +02:00
Ondřej Surý
fd975a551d Split reusing the addr/port and load-balancing socket options
The SO_REUSEADDR, SO_REUSEPORT and SO_REUSEPORT_LB has different meaning
on different platform. In this commit, we split the function to set the
reuse of address/port and setting the load-balancing into separate
functions.

The libuv library already have multiplatform support for setting
SO_REUSEADDR and SO_REUSEPORT that allows binding to the same address
and port, but unfortunately, when used after the load-balancing socket
options have been already set, it overrides the previous setting, so we
need our own helper function to enable the SO_REUSEADDR/SO_REUSEPORT
first and then enable the load-balancing socket option.
2020-10-05 15:18:28 +02:00
Ondřej Surý
acb6ad9e3c Use uv_os_sock_t instead of uv_os_fd_t for sockets
On POSIX based systems both uv_os_sock_t and uv_os_fd_t are both typedef
to int.  That's not true on Windows, where uv_os_sock_t is SOCKET and
uv_os_fd_t is HANDLE and they differ in level of indirection.
2020-10-05 15:18:28 +02:00
Ondřej Surý
9dc01a636b Refactor isc__nm_socket_freebind() to take fd and sa_family as args
The isc__nm_socket_freebind() has been refactored to match other
isc__nm_socket_...() helper functions and take uv_os_fd_t and
sa_family_t as function arguments.
2020-10-05 15:18:24 +02:00
Ondřej Surý
d685bbc822 Add helper function to enable DF (don't fragment) flag on UDP sockets
This commits add isc__nm_socket_dontfrag() helper functions.
2020-10-05 14:55:20 +02:00
Ondřej Surý
5daaca7146 Add SO_REUSEPORT and SO_INCOMING_CPU helper functions
The setting of SO_REUSE**** and SO_INCOMING_CPU have been moved into a
separate helper functions.
2020-10-05 14:54:24 +02:00
Mark Andrews
a9c3374717 Add the ability to print out the list of test names (-l) 2020-10-01 08:21:42 +00:00
Mark Andrews
76837484e7 Add the ability to select tests to run
task_test [-t <test_name>]
2020-10-01 08:21:42 +00:00
Mark Andrews
96febe6b38 Alphabetise tests 2020-10-01 08:21:42 +00:00
Mark Andrews
519b070618 Add ISO time stamps to the microsecond 2020-09-30 23:56:18 +10:00
Ondřej Surý
e5ab137ba3 Refactor the pausing/unpausing and finishing the nm_thread
The isc_nm_pause(), isc_nm_resume() and finishing the nm_thread() from
nm_destroy() has been refactored, so all use the netievents instead of
directly touching the worker structure members.  This allows us to
remove most of the locking as the .paused and .finished members are
always accessed from the matching nm_thread.

When shutting down the nm_thread(), instead of issuing uv_stop(), we
just shutdown the .async handler, so all uv_loop_t events are properly
finished first and uv_run() ends gracefully with no outstanding active
handles in the loop.
2020-09-28 11:17:11 +02:00
Michał Kępień
b60d7345ed Fix function overrides in unit tests on macOS
Since Mac OS X 10.1, Mach-O object files are by default built with a
so-called two-level namespace which prevents symbol lookups in BIND unit
tests that attempt to override the implementations of certain library
functions from working as intended.  This feature can be disabled by
passing the "-flat_namespace" flag to the linker.  Fix unit tests
affected by this issue on macOS by adding "-flat_namespace" to LDFLAGS
used for building all object files on that operating system (it is not
enough to only set that flag for the unit test executables).
2020-09-28 09:09:21 +02:00
Mark Andrews
c37b251eb9 It appears that you can't change what you are polling for while connecting.
WARNING: ThreadSanitizer: data race
    Read of size 8 at 0x000000000001 by thread T1 (mutexes: write M1):
    #0 epoll_ctl <null>
    #1 watch_fd lib/isc/unix/socket.c:704:8
    #2 wakeup_socket lib/isc/unix/socket.c:897:11
    #3 process_ctlfd lib/isc/unix/socket.c:3362:3
    #4 process_fds lib/isc/unix/socket.c:3275:10
    #5 netthread lib/isc/unix/socket.c:3516:10

    Previous write of size 8 at 0x000000000001 by thread T2 (mutexes: write M2):
    #0 connect <null>
    #1 isc_socket_connect lib/isc/unix/socket.c:4737:7
    #2 resquery_send lib/dns/resolver.c:2892:13
    #3 fctx_query lib/dns/resolver.c:2202:12
    #4 fctx_try lib/dns/resolver.c:4300:11
    #5 resquery_connected lib/dns/resolver.c:3130:4
    #6 dispatch lib/isc/task.c:1152:7
    #7 run lib/isc/task.c:1344:2

    Location is file descriptor 513 created by thread T2 at:
    #0 connect <null>
    #1 isc_socket_connect lib/isc/unix/socket.c:4737:7
    #2 resquery_send lib/dns/resolver.c:2892:13
    #3 fctx_query lib/dns/resolver.c:2202:12
    #4 fctx_try lib/dns/resolver.c:4300:11
    #5 resquery_connected lib/dns/resolver.c:3130:4
    #6 dispatch lib/isc/task.c:1152:7
    #7 run lib/isc/task.c:1344:2

    Mutex M1 (0x000000000016) created at:
    #0 pthread_mutex_init <null>
    #1 isc__mutex_init lib/isc/pthreads/mutex.c:288:8
    #2 setup_thread lib/isc/unix/socket.c:3584:3
    #3 isc_socketmgr_create2 lib/isc/unix/socket.c:3825:3
    #4 create_managers bin/named/main.c:932:11
    #5 setup bin/named/main.c:1223:11
    #6 main bin/named/main.c:1523:2

    Mutex M2 is already destroyed.

    Thread T1 'isc-socket-1' (running) created by main thread at:
    #0 pthread_create <null>
    #1 isc_thread_create lib/isc/pthreads/thread.c:73:8
    #2 isc_socketmgr_create2 lib/isc/unix/socket.c:3826:3
    #3 create_managers bin/named/main.c:932:11
    #4 setup bin/named/main.c:1223:11
    #5 main bin/named/main.c:1523:2

    Thread T2 (running) created by main thread at:
    #0 pthread_create <null>
    #1 isc_thread_create lib/isc/pthreads/thread.c:73:8
    #2 isc_taskmgr_create lib/isc/task.c:1434:3
    #3 create_managers bin/named/main.c:915:11
    #4 setup bin/named/main.c:1223:11
    #5 main bin/named/main.c:1523:2

    SUMMARY: ThreadSanitizer: data race in epoll_ctl
2020-09-23 13:54:06 +10:00
Ondřej Surý
79ca724d46 Handle the errors from sysconf() call in isc_meminfo_totalphys()
isc_meminfo_totalphys() would return invalid memory size when sysconf()
call would fail, because ((size_t)-1 * -1) is very large number.
2020-09-21 10:55:00 +02:00
Ondřej Surý
0110d1ab17 Exclude isc_mem_isovermem from ThreadSanitizer
The .is_overmem member of isc_mem_t structure is intentionally accessed
unlocked as 100% accuracy isn't necessary here.

Without the attribute, following TSAN warning would show up:

    WARNING: ThreadSanitizer: data race
      Write of size 1 at 0x000000000001 by thread T1 (mutexes: write M1, write M2):
	#0 isc___mem_put lib/isc/mem.c:1119:19
	#1 isc__mem_put lib/isc/mem.c:2439:2
	#2 dns_rdataslab_fromrdataset lib/dns/rdataslab.c:327:2
	#3 addrdataset lib/dns/rbtdb.c:6761:11
	#4 dns_db_addrdataset lib/dns/db.c:719:10
	#5 cache_name lib/dns/resolver.c:6538:13
	#6 cache_message lib/dns/resolver.c:6628:14
	#7 resquery_response lib/dns/resolver.c:7883:13
	#8 dispatch lib/isc/task.c:1152:7
	#9 run lib/isc/task.c:1344:2

      Previous read of size 1 at 0x000000000001 by thread T2 (mutexes: write M3):
	#0 isc_mem_isovermem lib/isc/mem.c:1553:15
	#1 addrdataset lib/dns/rbtdb.c:6866:25
	#2 dns_db_addrdataset lib/dns/db.c:719:10
	#3 addoptout lib/dns/ncache.c:281:10
	#4 dns_ncache_add lib/dns/ncache.c:101:10
	#5 ncache_adderesult lib/dns/resolver.c:6668:12
	#6 ncache_message lib/dns/resolver.c:6845:11
	#7 rctx_ncache lib/dns/resolver.c:9174:11
	#8 resquery_response lib/dns/resolver.c:7894:2
	#9 dispatch lib/isc/task.c:1152:7
	#10 run lib/isc/task.c:1344:2

      Location is heap block of size 328 at 0x000000000020 allocated by thread T3:
	#0 malloc <null>
	#1 default_memalloc lib/isc/mem.c:713:8
	#2 mem_create lib/isc/mem.c:763:8
	#3 isc_mem_create lib/isc/mem.c:2425:2
	#4 configure_view bin/named/server.c:4494:4
	#5 load_configuration bin/named/server.c:9062:3
	#6 run_server bin/named/server.c:9771:2
	#7 dispatch lib/isc/task.c:1152:7
	#8 run lib/isc/task.c:1344:2

    [...]

    SUMMARY: ThreadSanitizer: data race lib/isc/mem.c:1119:19 in isc___mem_put
2020-09-17 13:51:50 +00:00