Commit Graph

12882 Commits

Author SHA1 Message Date
Diego Fronza
8324c3ddfe Allow stale data to be used before name resolution
This commit allows stale RRset to be used (if available) for responding
a query, before an attempt to refresh an expired, or otherwise resolve
an unavailable RRset in cache is made.

For that to work, a value of zero must be specified for
stale-answer-client-timeout statement.

To better understand the logic implemented, there are three flags being
used during database lookup and other parts of code that must be
understood:

. DNS_DBFIND_STALEOK: This flag is set when BIND fails to refresh a
  RRset due to timeout (resolver-query-timeout), its intent is to
  try to look for stale data in cache as a fallback, but only if
  stale answers are enabled in configuration.

  This flag is also used to activate stale-refresh-time window, since it
  is the only way the database knows that a resolution has failed.

. DNS_DBFIND_STALEENABLED: This flag is used as a hint to the database
  that it may use stale data. It is always set during query lookup if
  stale answers are enabled, but only effectively used during
  stale-refresh-time window. Also during this window, the resolver will
  not try to resolve the query, in other words no attempt to refresh the
  data in cache is made when the stale-refresh-time window is active.

. DNS_DBFIND_STALEONLY: This new introduced flag is used when we want
  stale data from the database, but not due to a failure in resolution,
  it also doesn't require stale-refresh-time window timer to be active.
  As long as there is a stale RRset available, it should be returned.
  It is mainly used in two situations:

    1. When stale-answer-client-timeout timer is triggered: in that case
       we want to know if there is stale data available to answer the
       client.
    2. When stale-answer-client-timeout value is set to zero: in that
       case, we also want to know if there is some stale RRset available
       to promptly answer the client.

We must also discern between three situations that may happen when
resolving a query after the addition of stale-answer-client-timeout
statement, and how to handle them:

	1. Are we running query_lookup() due to stale-answer-client-timeout
       timer being triggered?

       In this case, we look for stale data, making use of
       DNS_DBFIND_STALEONLY flag. If a stale RRset is available then
       respond the client with the data found, mark this query as
       answered (query attribute NS_QUERYATTR_ANSWERED), so when the
       fetch completes the client won't be answered twice.

       We must also take care of not detaching from the client, as a
       fetch will still be running in background, this is handled by the
       following snippet:

       if (!QUERY_STALEONLY(&client->query)) {
           isc_nmhandle_detach(&client->reqhandle);
       }

       Which basically tests if DNS_DBFIND_STALEONLY flag is set, which
       means we are here due to a stale-answer-client-timeout timer
       expiration.

    2. Are we running query_lookup() due to resolver-query-timeout being
       triggered?

       In this case, DNS_DBFIND_STALEOK flag will be set and an attempt
       to look for stale data will be made.
       As already explained, this flag is algo used to activate
       stale-refresh-time window, as it means that we failed to refresh
       a RRset due to timeout.
       It is ok in this situation to detach from the client, as the
       fetch is already completed.

    3. Are we running query_lookup() during the first time, looking for
       a RRset in cache and stale-answer-client-timeout value is set to
       zero?

       In this case, if stale answers are enabled (probably), we must do
       an initial database lookup with DNS_DBFIND_STALEONLY flag set, to
       indicate to the database that we want stale data.

       If we find an active RRset, proceed as normal, answer the client
       and the query is done.

       If we find a stale RRset we respond to the client and mark the
       query as answered, but don't detach from the client yet as an
       attempt in refreshing the RRset will still be made by means of
       the new introduced function 'query_resolve'.

       If no active or stale RRset is available, begin resolution as
       usual.

(cherry picked from commit e219422575)
2021-01-29 10:39:09 +01:00
Diego Fronza
0aebad96b5 Added option for disabling stale-answer-client-timeout
This commit allows to specify "disabled" or "off" in
stale-answer-client-timeout statement. The logic to support this
behavior will be added in the subsequent commits.

This commit also ensures an upper bound to stale-answer-client-timeout
which equals to one second less than 'resolver-query-timeout'.

(cherry picked from commit 0ad6f594f6)
2021-01-29 10:38:58 +01:00
Diego Fronza
3478794a5d Add stale-answer-client-timeout option
The general logic behind the addition of this new feature works as
folows:

When a client query arrives, the basic path (query.c / ns_query_recurse)
was to create a fetch, waiting for completion in fetch_callback.

With the introduction of stale-answer-client-timeout, a new event of
type DNS_EVENT_TRYSTALE may invoke fetch_callback, whenever stale
answers are enabled and the fetch took longer than
stale-answer-client-timeout to complete.

When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we
must ensure that the folowing happens:

1. Setup a new query context with the sole purpose of looking up for
   stale RRset only data, for that matters a new flag was added
   'DNS_DBFIND_STALEONLY' used in database lookups.

    . If a stale RRset is found, mark the original client query as
      answered (with a new query attribute named NS_QUERYATTR_ANSWERED),
      so when the fetch completion event is received later, we avoid
      answering the client twice.

    . If a stale RRset is not found, cleanup and wait for the normal
      fetch completion event.

2. In ns_query_done, we must change this part:
	/*
	 * If we're recursing then just return; the query will
	 * resume when recursion ends.
	 */
	if (RECURSING(qctx->client)) {
		return (qctx->result);
	}

   To this:

	if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) {
		return (qctx->result);
	}

   Otherwise we would not proceed to answer the client if it happened
   that a stale answer was found when looking up for stale only data.

When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we
proceed as before, resuming query, updating stats, etc, but a few
exceptions had to be added, most important of which are two:

1. Before answering the client (ns_client_send), check if the query
   wasn't already answered before.

2. Before detaching a client, e.g.
   isc_nmhandle_detach(&client->reqhandle), ensure that this is the
   fetch completion event, and not the one triggered due to
   stale-answer-client-timeout, so a correct call would be:
   if (!QUERY_STALEONLY(client)) {
        isc_nmhandle_detach(&client->reqhandle);
   }

Other than these notes, comments were added in code in attempt to make
these updates easier to follow.

(cherry picked from commit 171a5b7542)
2021-01-29 10:38:32 +01:00
Diego Fronza
7bf8950a0a Added dns_view_staleanswerenabled() function
Since it takes a couple lines of code to check whether stale answers
are enabled for a given view, code was extracted out to a proper
function.

(cherry picked from commit 74840ec50b)
2021-01-29 10:35:26 +01:00
Diego Fronza
f3bd27373d Avoid iterating name twice when constructing fctx->info
This is a minor performance improvement, we store the result of the
first call to strlcat to use as an offset in the next call when
constructing fctx->info string.

(cherry picked from commit 49c40827f6)
2021-01-29 10:35:17 +01:00
Mark Andrews
6a0b751555 Require 'ctx' to be non-NULL in cfg_acl_fromconfig{,2}
(cherry picked from commit a8b55992a8)
2021-01-28 13:43:47 +11:00
Mark Andrews
afc75de0cc Optimise dnssec-verify
dns_dnssec_keyfromrdata() only needs to be called once per DNSKEY
rather than once per verification attempt.

(cherry picked from commit c75b325832)
2021-01-28 12:18:31 +11:00
Mark Andrews
b416d8fcdf Improve the diagnostic 'rndc retransfer' error message
(cherry picked from commit dd3520ae41)
2021-01-28 09:44:26 +11:00
Matthijs Mekking
4a36b6d918 Make opensslecdsa_parse use fromlabel
When 'opensslecdsa_parse()' encounters a label tag in the private key
file, load the private key with 'opensslecdsa_fromlabel()'. Otherwise
load it from the private structure.

This was attempted before with 'load_privkey()' and 'uses_engine()',
but had the same flaw as 'opensslecdsa_fromlabel()' had previously,
that is getting the private and public key separately, juggling with
pointers between EC_KEY and EVP_PKEY, did not create a valid
cryptographic key that could be used for signing.

(cherry picked from commit 57ac70ad46)
2021-01-26 15:04:59 +01:00
Matthijs Mekking
97185ecac2 Simplify opensslecdsa_fromlabel
The 'opensslecdsa_fromlabel()' function does not need to get the
OpenSSL engine twice to load the private and public key. Also no need
to call 'dst_key_to_eckey()' as the EC_KEY can be derived from the
loaded EVP_PKEY's.

Add some extra checks to ensure the key has the same base id and curve
(group nid) as the dst key.

Since we already have the EVP_PKEY, no need to call 'finalize_eckey()',
instead just set the right values in the key structure.

(cherry picked from commit 393052d6ff)
2021-01-26 15:04:51 +01:00
Matthijs Mekking
f555cec0af Replace EVP_DigestFinal with EVP_DigestFinal_ex
The openssl docs claim that EVP_DigestFinal() is obsolete and that
one should use EVP_DigestFinal_ex() instead.

(cherry picked from commit 1fcd0ef8bd)
2021-01-26 15:04:38 +01:00
Matthijs Mekking
9e2ea5efb1 Don't set pubkey if eckey already has public key
The 'ecdsa_check()' function tries to correctly set the public key
on the eckey, but this should be skipped if the public key is
retrieved via the private key.

(cherry picked from commit 06b9724152)
2021-01-26 15:04:21 +01:00
Matthijs Mekking
e3acfb44d5 ECDSA code should not use RSA label
The 'opensslecdsa_tofile()' function tags the label as an RSA label,
that is a copy paste error and should be of course an ECDSA label.

(cherry picked from commit 46afeca8bf)
2021-01-26 15:04:11 +01:00
Matthijs Mekking
8b25d3ab57 Correctly update pointers to pubkey and privkey
The functions 'load_pubkey_from_engine()' and
'load_privkey_from_engine()' did not correctly store the pointers.

Update both functions to add 'EC_KEY_set_public_key()' and
'EC_KEY_set_private_key()' respectively, so that the pointers to
the public and private keys survive the "load from engine" functions.

(cherry picked from commit 01239691a1)
2021-01-26 15:04:03 +01:00
Matthijs Mekking
f66df9f1b7 load_pubkey_from_engine() should load public key
The 'function load_pubkey_from_engine()' made a call to the libssl
function 'ENGINE_load_private_key'.  This is a copy paste error and
should be 'ENGINE_load_public_key'.

(cherry picked from commit 370285a62d)
2021-01-26 15:03:43 +01:00
Evan Hunt
077e2c2a74 add serial number to "transfer ended" log messages 2021-01-26 12:38:32 +01:00
Evan Hunt
2df6ffc051 check size ratio when responding to IXFR requests 2021-01-26 12:38:32 +01:00
Evan Hunt
9950247c78 improve calculation of database transfer size
- change name of 'bytes' to 'xfrsize' in dns_db_getsize() parameter list
  and related variables; this is a more accurate representation of what
  the function is doing
- change the size calculations in dns_db_getsize() to more accurately
  represent the space needed for a *XFR message or journal file to contain
  the data in the database. previously we returned the sizes of all
  rdataslabs, including header overhead and offset tables, which
  resulted in the database size being reported as much larger than the
  equivalent *XFR or journal.
- map files caused a particular problem here: the fullname can't be
  determined from the node while a file is being deserialized, because
  the uppernode pointers aren't set yet. so we store "full name length"
  in the dns_rbtnode structure while serializing, and clear it after
  deserialization is complete.
2021-01-26 12:38:32 +01:00
Evan Hunt
70df95e9f5 dns_journal_iter_init() can now return the size of the delta
the call initailizing a journal iterator can now optionally return
to the caller the size in bytes of an IXFR message (not including
DNS header overhead, signatures etc) containing the differences from
the beginning to the ending serial number.

this is calculated by scanning the journal transaction headers to
calculate the transfer size. since journal file records contain a length
field that is not included in IXFR messages, we subtract out the length
of those fields from the overall transaction length.

this necessitated adding an "RR count" field to the journal transaction
header, so we know how many length fields to subract. NOTE: this will
make existing journal files stop working!
2021-01-26 12:38:32 +01:00
Evan Hunt
57aadd6cea add syntax and setter/getter functions to configure max-ixfr-ratio 2021-01-26 12:38:32 +01:00
Ondřej Surý
0e25af628c Use -release instead of -version-info for internal library SONAMEs
The BIND 9 libraries are considered to be internal only and hence the
API and ABI changes a lot.  Keeping track of the API/ABI changes takes
time and it's a complicated matter as the safest way to make everything
stable would be to bump any library in the dependency chain as in theory
if libns links with libdns, and a binary links with both, and we bump
the libdns SOVERSION, but not the libns SOVERSION, the old libns might
be loaded by binary pulling old libdns together with new libdns loaded
by the binary.  The situation gets even more complicated with loading
the plugins that have been compiled with few versions old BIND 9
libraries and then dynamically loaded into the named.

We are picking the safest option possible and usable for internal
libraries - instead of using -version-info that has only a weak link to
BIND 9 version number, we are using -release libtool option that will
embed the corresponding BIND 9 version number into the library name.

That means that instead of libisc.so.1608 (as an example) the library
will now be named libisc-9.16.10.so.

(cherry picked from commit c605d75ea5)
2021-01-25 15:28:09 +01:00
Tinderbox User
536bc1163a prep 9.16.11 2021-01-21 09:11:54 +01:00
Evan Hunt
1a32a4d001 prevent "primaries" lists from having duplicate names
it is now an error to have two primaries lists with the same
name. this is true regardless of whether the "primaries" or
"masters" keywords were used to define them.

(cherry picked from commit f619708bbf)
2021-01-12 15:21:14 +01:00
Evan Hunt
746aa2581c add "primary-only" as a synonym for "master-only"
update the "notify" option to use RFC 8499 terminology as well.

(cherry picked from commit 424a3cf3cc)
2021-01-12 15:21:14 +01:00
Evan Hunt
04b9cdb53c add "primaries" as a synonym for "masters" in named.conf
as "type primary" is preferred over "type master" now, it makes
sense to make "primaries" available as a synonym too.

added a correctness check to ensure "primaries" and "masters"
cannot both be used in the same zone.

(cherry picked from commit 16e14353b1)
2021-01-12 15:21:14 +01:00
Matthijs Mekking
c4520620dc Fix signatures-validity config option
KASP was using 'signatures-validity-dnskey' instead of
'signatures-validity'.

(cherry picked from commit ad63e9e4f8)
2021-01-12 13:13:05 +01:00
Mark Andrews
07e899f616 Inactive incorrectly incremented
It is possible to have two threads destroying an rbtdb at the same
time when detachnode() executes and removes the last reference to
a node between exiting being set to true for the node and testing
if the references are zero in maybe_free_rbtdb().  Move NODE_UNLOCK()
to after checking if references is zero to prevent detachnode()
changing the reference count too early.

(cherry picked from commit 859d2fdad6)
2021-01-06 16:33:32 +11:00
Matthijs Mekking
63e58f09a5 Fix dnssec-signzone and -verify logging (again)
While fixing #2359, 'report()' was changed so that it would print the
newline.

Newlines were missing from the output of 'dnssec-signzone'
and 'dnssec-verify' because change
664b8f04f5 moved the printing from
newlines to the library.

This had to be reverted because this also would print redundant
newlines in logfiles.

While doing the revert, some newlines in 'lib/dns/zoneverify.c'
were left in place, now making 'dnssec-signzone' and 'dnssec-verify'
print too many newlines.

This commit removes those newlines, so that the output looks nice
again.

(cherry picked from commit 18c62a077e)
2021-01-05 13:41:49 +01:00
Matthijs Mekking
d564ad5f52 Update keymgr to allow transition to insecure mode
The keymgr prevented zones from going to insecure mode. If we
have a policy with an empty key list this is a signal that the zone
wants to go back to insecure mode. In this case allow one extra state
transition to be valid when checking for DNSSEC safety.

(cherry picked from commit 9134100069)
2020-12-23 11:56:54 +01:00
Matthijs Mekking
6da379d844 Publish CDS/CDNSKEY Delete Records
Check if zone is transitioning from secure to insecure. If so,
delete the CDS/CDNSKEY records, otherwise make sure they are not
part of the RRset.

(cherry picked from commit 68d715a229)
2020-12-23 11:56:44 +01:00
Matthijs Mekking
cf0439cd5f Treat dnssec-policy "none" as a builtin zone
Configure "none" as a builtin policy. Change the 'cfg_kasp_fromconfig'
api so that the 'name' will determine what policy needs to be
configured.

When transitioning a zone from secure to insecure, there will be
cases when a zone with no DNSSEC policy (dnssec-policy none) should
be using KASP. When there are key state files available, this is an
indication that the zone once was DNSSEC signed but is reconfigured
to become insecure.

If we would not run the keymgr, named would abruptly remove the
DNSSEC records from the zone, making the zone bogus. Therefore,
change the code such that a zone will use kasp if there is a valid
dnssec-policy configured, or if there are state files available.

(cherry picked from commit cf420b2af0)
2020-12-23 11:56:33 +01:00
Matthijs Mekking
6ff69ee8ba Add function to see if dst key uses kasp
For purposes of zones transitioning back to insecure mode, it is
practical to see if related keys have a state file associated.

(cherry picked from commit 8f2c5e45da)
2020-12-23 11:56:25 +01:00
Mark Andrews
4d003dd0f8 Only pick CPUs that are part of the existing CPU affinity set when
assigning a thread to a CPU.

(cherry picked from commit 698d9285d4)
2020-12-23 09:21:29 +11:00
Ondřej Surý
04f9f45c54 Print warning when falling back to increment soa serial method
When using the `unixtime` or `date` method to update the SOA serial,
`named` and `dnssec-signzone` would silently fallback to `increment`
method to prevent the new serial number to be smaller than the old
serial number (using the serial number arithmetics).  Add a warning
message when such fallback happens.

(cherry picked from commit ef685bab5c)
2020-12-12 07:55:29 +01:00
Ondřej Surý
2c04299eb1 Fix HAVE_SO_REUSEPORT_LB macro name definition
A typo in macro definition caused the load-balanced sockets to be
disabled even on platforms with existing support for load-balanced
sockets.

(cherry picked from commit 5caf33feda)
2020-12-09 10:46:16 +01:00
Ondřej Surý
90979a79e2 Sync the func() -> func(void) in netmgr 2020-12-09 10:46:16 +01:00
Ondřej Surý
bb9b55dfba Use sock->nchildren instead of mgr->nworkers when initializing NM
On Windows, we were limiting the number of listening children to just 1,
but we were then iterating on mgr->nworkers.  That lead to scheduling
more async_*listen() than actually allocated and out-of-bound read-write
operation on the heap.

(cherry picked from commit 87c5867202)
2020-12-09 10:46:16 +01:00
Ondřej Surý
857704b879 Explicitly link the netmgr tests with -luv 2020-12-09 10:46:16 +01:00
Ondřej Surý
7ec4ec3a81 Fix datarace when UDP/TCP connect fails and we are in nmthread
When we were in nmthread, the isc__nm_async_<proto>connect() function
executes in the same thread as the isc__nm_<proto>connect() and on a
failure, it would block indefinitely because the failure branch was
setting sock->active to false before the condition around the wait had a
chance to skip the WAIT().

This also fixes the zero system test being stuck on FreeBSD 11, so we
re-enable the test in the commit.
2020-12-09 10:46:16 +01:00
Ondřej Surý
90a9b0611a Add FreeBSD connection timeout socket option
On FreeBSD, the option to configure connection timeout is called
TCP_KEEPINIT, use it to configure the connection timeout there.

This also fixes the dangling socket problems in the unit test, so
re-enable them.
2020-12-09 10:46:16 +01:00
Ondřej Surý
0ee8672692 Distribute queries among threads even on platforms without lb sockets
On platforms without load-balancing socket all the queries would be
handle by a single thread.  Currently, the support for load-balanced
sockets is present in Linux with SO_REUSEPORT and FreeBSD 12 with
SO_REUSEPORT_LB.

This commit adds workaround for such platforms that:

1. setups single shared listening socket for all listening nmthreads for
   UDP, TCP and TCPDNS netmgr transports

2. Calls uv_udp_bind/uv_tcp_bind on the underlying socket just once and
   for rest of the nmthreads only copy the internal libuv flags (should
   be just UV_HANDLE_BOUND and optionally UV_HANDLE_IPV6).

3. start reading on UDP socket or listening on TCP socket

The load distribution among the nmthreads is uneven, but it's still
better than utilizing just one thread for processing all the incoming
queries
2020-12-09 10:46:16 +01:00
Ondřej Surý
4c70100ce0 Don't use stack allocated buffer for uv_write()
On FreeBSD, the stack is destroyed more aggressively than on Linux and
that revealed a bug where we were allocating the 16-bit len for the
TCPDNS message on the stack and the buffer got garbled before the
uv_write() sendback was executed.  Now, the len is part of the uvreq, so
we can safely pass it to the uv_write() as the req gets destroyed after
the sendcb is executed.

(cherry picked from commit 94afea9325)
2020-12-09 10:46:16 +01:00
Michał Kępień
12fa8a7aed Make netmgr initialize and cleanup Winsock itself
On Windows, WSAStartup() needs to be called to initialize Winsock before
any sockets are created or else socket() calls will return error code
10093 (WSANOTINITIALISED).  Since BIND's Network Manager is intended to
work as a reusable networking library, it should take care of calling
WSAStartup() - and its cleanup counterpart, WSACleanup() - itself rather
than relying on external code to do it.  Add the necessary WSAStartup()
and WSACleanup() calls to isc_nm_start() and isc_nm_destroy(),
respectively.

(cherry picked from commit 88f96faba8)
2020-12-09 10:46:16 +01:00
Michał Kępień
216fc34490 Extend log message for unexpected socket() errors
Make sure the error code is included in the message logged for
unexpected socket creation errors in order to facilitate troubleshooting
on Windows.

(cherry picked from commit dc2e1dea86)
2020-12-09 10:46:16 +01:00
Ondřej Surý
e8e8ed7fb9 Adjust the nstests for isc_nmhandle_{attach,detach} name change
Due to the added attach/detach tracing in the netmgr-v2 code, the
libns tests needs to be adjusted as the real function names have
changed from isc_nmhandle_* to isc__nmhandle_*.
2020-12-09 10:46:16 +01:00
Ondřej Surý
9b2184893d The cmocka.h header MUST be included before isc/util.h gets included
The isc/util.h header redefine the DbC checks (REQUIRE, INSIST, ...)  to
be cmocka "fake" assertions.  However that means that cmocka.h needs to
be included after UNIT_TESTING is defined but before isc/util.h is
included.  Because isc/util.h is included in most of the project headers
this means that the sequence MUST be:

    #define UNIT_TESTING
    #include <cmocka.h>

    #include <isc/_anything_.h>

See !2204 for other header requirements for including cmocka.h.

(cherry picked from commit 0ba697fe8c)
2020-12-09 10:46:16 +01:00
Ondřej Surý
7fc62f829d Add libssl libraries to Windows build
This commit extends the perl Configure script to also check for libssl
in addition to libcrypto and change the vcxproj source files to link
with both libcrypto and libssl.
2020-12-09 10:46:16 +01:00
Ondřej Surý
48759bd047 Fix the data race in accessing the isc_nm_t timers
The following TSAN report about accessing the mgr timers (mgr->init,
mgr->idle, mgr->keepalive and mgr->advertised) has been fixed in this
commit:

    ==================
    WARNING: ThreadSanitizer: data race (pid=2746)
    Read of size 4 at 0x7b440008a948 by thread T18:
    #0 isc__nm_tcpdns_read /home/ondrej/Projects/bind9/lib/isc/netmgr/tcpdns.c:849:25 (libisc.so.1706+0x2ba0f)
    #1 isc_nm_read /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:1679:3 (libisc.so.1706+0x22258)
    #2 tcpdns_connect_connect_cb /home/ondrej/Projects/bind9/lib/isc/tests/tcpdns_test.c:363:2 (tcpdns_test+0x4bc5fb)
    #3 isc__nm_async_connectcb /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:1816:2 (libisc.so.1706+0x228c9)
    #4 isc__nm_connectcb /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:1791:3 (libisc.so.1706+0x22713)
    #5 tcpdns_connect_cb /home/ondrej/Projects/bind9/lib/isc/netmgr/tcpdns.c:343:2 (libisc.so.1706+0x2d89d)
    #6 uv__stream_connect /home/ondrej/Projects/tsan/libuv/src/unix/stream.c:1381:5 (libuv.so.1+0x27c18)
    #7 uv__stream_io /home/ondrej/Projects/tsan/libuv/src/unix/stream.c:1298:5 (libuv.so.1+0x25977)
    #8 uv__io_poll /home/ondrej/Projects/tsan/libuv/src/unix/linux-core.c:462:11 (libuv.so.1+0x2e795)
    #9 uv_run /home/ondrej/Projects/tsan/libuv/src/unix/core.c:385:5 (libuv.so.1+0x158ec)
    #10 nm_thread /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:530:11 (libisc.so.1706+0x1c94a)

    Previous write of size 4 at 0x7b440008a948 by main thread:
    #0 isc_nm_settimeouts /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:490:12 (libisc.so.1706+0x1dda5)
    #1 tcpdns_recv_two /home/ondrej/Projects/bind9/lib/isc/tests/tcpdns_test.c:601:2 (tcpdns_test+0x4bad0e)
    #2 cmocka_run_one_test_or_fixture <null> (libcmocka.so.0+0x70be)
    #3 __libc_start_main /build/glibc-vjB4T1/glibc-2.28/csu/../csu/libc-start.c:308:16 (libc.so.6+0x2409a)

    Location is heap block of size 281 at 0x7b440008a840 allocated by main thread:
    #0 malloc <null> (tcpdns_test+0x42864b)
    #1 default_memalloc /home/ondrej/Projects/bind9/lib/isc/mem.c:713:8 (libisc.so.1706+0x6d261)
    #2 mem_get /home/ondrej/Projects/bind9/lib/isc/mem.c:622:8 (libisc.so.1706+0x69b9c)
    #3 isc___mem_get /home/ondrej/Projects/bind9/lib/isc/mem.c:1044:9 (libisc.so.1706+0x6d379)
    #4 isc__mem_get /home/ondrej/Projects/bind9/lib/isc/mem.c:2432:10 (libisc.so.1706+0x6889e)
    #5 isc_nm_start /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:203:8 (libisc.so.1706+0x1c219)
    #6 nm_setup /home/ondrej/Projects/bind9/lib/isc/tests/tcpdns_test.c:244:11 (tcpdns_test+0x4baaa4)
    #7 cmocka_run_one_test_or_fixture <null> (libcmocka.so.0+0x70fd)
    #8 __libc_start_main /build/glibc-vjB4T1/glibc-2.28/csu/../csu/libc-start.c:308:16 (libc.so.6+0x2409a)

    Thread T18 'isc-net-0000' (tid=3513, running) created by main thread at:
    #0 pthread_create <null> (tcpdns_test+0x429e7b)
    #1 isc_thread_create /home/ondrej/Projects/bind9/lib/isc/pthreads/thread.c:73:8 (libisc.so.1706+0x8476a)
    #2 isc_nm_start /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:271:3 (libisc.so.1706+0x1c66a)
    #3 nm_setup /home/ondrej/Projects/bind9/lib/isc/tests/tcpdns_test.c:244:11 (tcpdns_test+0x4baaa4)
    #4 cmocka_run_one_test_or_fixture <null> (libcmocka.so.0+0x70fd)
    #5 __libc_start_main /build/glibc-vjB4T1/glibc-2.28/csu/../csu/libc-start.c:308:16 (libc.so.6+0x2409a)

    SUMMARY: ThreadSanitizer: data race /home/ondrej/Projects/bind9/lib/isc/netmgr/tcpdns.c:849:25 in isc__nm_tcpdns_read
    ==================
    ThreadSanitizer: reported 1 warnings

(cherry picked from commit 2e1dd56d0b)
2020-12-09 10:46:16 +01:00
Ondřej Surý
a61b7294c2 Avoid netievent allocations when the callbacks can be called directly
After turning the users callbacks to be asynchronous, there was a
visible performance drop.  This commit prevents the unnecessary
allocations while keeping the code paths same for both asynchronous and
synchronous calls.

The same change was done to the isc__nm_udp_{read,send} as those two
functions are in the hot path.

(cherry picked from commit d6d2fbe0e9)
2020-12-09 10:46:16 +01:00
Ondřej Surý
7b9c8b9781 Refactor netmgr and add more unit tests
This is a part of the works that intends to make the netmgr stable,
testable, maintainable and tested.  It contains a numerous changes to
the netmgr code and unfortunately, it was not possible to split this
into smaller chunks as the work here needs to be committed as a complete
works.

NOTE: There's a quite a lot of duplicated code between udp.c, tcp.c and
tcpdns.c and it should be a subject to refactoring in the future.

The changes that are included in this commit are listed here
(extensively, but not exclusively):

* The netmgr_test unit test was split into individual tests (udp_test,
  tcp_test, tcpdns_test and newly added tcp_quota_test)

* The udp_test and tcp_test has been extended to allow programatic
  failures from the libuv API.  Unfortunately, we can't use cmocka
  mock() and will_return(), so we emulate the behaviour with #define and
  including the netmgr/{udp,tcp}.c source file directly.

* The netievents that we put on the nm queue have variable number of
  members, out of these the isc_nmsocket_t and isc_nmhandle_t always
  needs to be attached before enqueueing the netievent_<foo> and
  detached after we have called the isc_nm_async_<foo> to ensure that
  the socket (handle) doesn't disappear between scheduling the event and
  actually executing the event.

* Cancelling the in-flight TCP connection using libuv requires to call
  uv_close() on the original uv_tcp_t handle which just breaks too many
  assumptions we have in the netmgr code.  Instead of using uv_timer for
  TCP connection timeouts, we use platform specific socket option.

* Fix the synchronization between {nm,async}_{listentcp,tcpconnect}

  When isc_nm_listentcp() or isc_nm_tcpconnect() is called it was
  waiting for socket to either end up with error (that path was fine) or
  to be listening or connected using condition variable and mutex.

  Several things could happen:

    0. everything is ok

    1. the waiting thread would miss the SIGNAL() - because the enqueued
       event would be processed faster than we could start WAIT()ing.
       In case the operation would end up with error, it would be ok, as
       the error variable would be unchanged.

    2. the waiting thread miss the sock->{connected,listening} = `true`
       would be set to `false` in the tcp_{listen,connect}close_cb() as
       the connection would be so short lived that the socket would be
       closed before we could even start WAIT()ing

* The tcpdns has been converted to using libuv directly.  Previously,
  the tcpdns protocol used tcp protocol from netmgr, this proved to be
  very complicated to understand, fix and make changes to.  The new
  tcpdns protocol is modeled in a similar way how tcp netmgr protocol.
  Closes: #2194, #2283, #2318, #2266, #2034, #1920

* The tcp and tcpdns is now not using isc_uv_import/isc_uv_export to
  pass accepted TCP sockets between netthreads, but instead (similar to
  UDP) uses per netthread uv_loop listener.  This greatly reduces the
  complexity as the socket is always run in the associated nm and uv
  loops, and we are also not touching the libuv internals.

  There's an unfortunate side effect though, the new code requires
  support for load-balanced sockets from the operating system for both
  UDP and TCP (see #2137).  If the operating system doesn't support the
  load balanced sockets (either SO_REUSEPORT on Linux or SO_REUSEPORT_LB
  on FreeBSD 12+), the number of netthreads is limited to 1.

* The netmgr has now two debugging #ifdefs:

  1. Already existing NETMGR_TRACE prints any dangling nmsockets and
     nmhandles before triggering assertion failure.  This options would
     reduce performance when enabled, but in theory, it could be enabled
     on low-performance systems.

  2. New NETMGR_TRACE_VERBOSE option has been added that enables
     extensive netmgr logging that allows the software engineer to
     precisely track any attach/detach operations on the nmsockets and
     nmhandles.  This is not suitable for any kind of production
     machine, only for debugging.

* The tlsdns netmgr protocol has been split from the tcpdns and it still
  uses the old method of stacking the netmgr boxes on top of each other.
  We will have to refactor the tlsdns netmgr protocol to use the same
  approach - build the stack using only libuv and openssl.

* Limit but not assert the tcp buffer size in tcp_alloc_cb
  Closes: #2061

(cherry picked from commit 634bdfb16d)
2020-12-09 10:46:16 +01:00