
414 Commits

Author SHA1 Message Date
Ondřej Surý
f92b77ff0d Change the isc_thread_self() return type to uintptr_t
The value returned by pthread_self(), thrd_current() or
GetCurrentThreadId() could actually be a pointer, so we should convert
it to uintptr_t rather than unsigned long.
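A minimal C sketch of the reasoning, assuming a pointer-valued thread handle (as pthread_t is on some platforms). `thread_id_from_handle()` is a hypothetical name, not the actual isc_thread_self() implementation. The point is that uintptr_t can hold a converted pointer and round-trip it, while unsigned long is only 32 bits wide on 64-bit Windows (LLP64) and could truncate it.

```c
#include <assert.h>
#include <stdint.h>

/* uintptr_t must be wide enough to hold any object pointer. */
static_assert(sizeof(uintptr_t) >= sizeof(void *),
	      "uintptr_t must be able to hold a pointer");

/*
 * Hypothetical helper: turn an opaque, possibly pointer-valued thread
 * handle into an integer ID without losing bits.
 */
uintptr_t
thread_id_from_handle(void *handle) {
	return ((uintptr_t)handle);
}
```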

(cherry picked from commit a0181056a8)
2021-02-26 21:14:17 +01:00
Ondřej Surý
effe3ee595 Refactor TLSDNS module to work with libuv/ssl directly
* Following the example set in 634bdfb16d, the tlsdns netmgr
  module now uses libuv and SSL primitives directly, rather than
  opening a TLS socket which opens a TCP socket, as the previous
  model was difficult to debug.  Closes #2335.

* Remove the netmgr tls layer (we will have to re-add it for DoH)

* Add the isc_tls API to wrap the OpenSSL SSL_CTX object in the libisc
  library; move the OpenSSL initialization/deinitialization needed for
  OpenSSL 1.0.x from dstapi to isc_tls_{initialize,destroy}()

* Add a couple of new shims needed for OpenSSL 1.0.x

* When LibreSSL is used, require at least version 2.7.0, which
  has the best OpenSSL 1.1.x compatibility and automatic init/deinit

* Enforce OpenSSL 1.1.x usage on Windows

(cherry picked from commit e493e04c0f)
2021-02-26 16:14:50 +01:00
Matthijs Mekking
acc95d4e1d Don't servfail on staleonly lookups
When a staleonly lookup doesn't find a satisfying answer, it should
not try to respond to the client.

This does not apply when the initial lookup is staleonly (that is,
when 'stale-answer-client-timeout' is set to 0), because no resolver
fetch has been created at this point. In this case, continue with the
lookup normally.

(cherry picked from commit f8b7b597e9)
2021-02-25 12:07:34 +01:00
Matthijs Mekking
84deb57bc3 Don't allow recursion on staleonly lookups
Fix a crash that can happen in the following scenario:

A client request is received. There is no data for it in the cache
(not even stale data). A resolver fetch is created as part of
recursion.

Some time later, the fetch still hasn't completed, and
stale-answer-client-timeout is triggered. A staleonly lookup is
started. It will also find no data in the cache.

'query_lookup()' will then call 'query_gotanswer()' with
ISC_R_NOTFOUND, which calls 'query_notfound()', which starts recursion.

We will eventually end up in 'ns_query_recurse()' and that requires
the client query fetch to be NULL:

    REQUIRE(client->query.fetch == NULL);

If the previously started fetch is still running this assertion
fails.

The crash is easily prevented by not allowing recursion for
staleonly lookups.
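The failure mode and the fix can be sketched roughly as follows. The structure and function names are hypothetical, simplified stand-ins for the libns query code; the assert() plays the role of the REQUIRE() in ns_query_recurse().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified query state. */
struct query_sketch {
	bool staleonly; /* lookup triggered by stale-answer-client-timeout */
	void *fetch;    /* non-NULL while a resolver fetch is running */
};

enum notfound_action { NOTFOUND_DONE, NOTFOUND_RECURSE };

/* Stand-in for ns_query_recurse(): no fetch may be in progress. */
static void
recurse_sketch(struct query_sketch *q) {
	assert(q->fetch == NULL);
	q->fetch = q; /* pretend a new fetch was created */
}

/*
 * Stand-in for query_notfound(): a staleonly lookup that finds nothing
 * must not recurse, because the original fetch may still be running.
 */
enum notfound_action
notfound_sketch(struct query_sketch *q) {
	if (q->staleonly) {
		return (NOTFOUND_DONE);
	}
	recurse_sketch(q);
	return (NOTFOUND_RECURSE);
}
```

Without the staleonly check, the first call in the usage below would trip the assertion, which mirrors the crash described above.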

Also remove a redundant setting of the staleonly flag at the end of
'query_lookup_staleonly()' before destroying the query context.

Add a system test to catch this case.

(cherry picked from commit 9e061faaae)
2021-02-25 12:07:27 +01:00
Michal Nowak
04aff208fb Use BIND 9.17 preprocessor macro to skip unit test
In fa505bfb0e, BIND 9.17 changed the exit code of skipped tests to
meet Automake expectations. BIND 9.16 was not converted to Automake,
but for consistency the same SKIPPED_TEST_EXIT_CODE preprocessor macro
is used (though the actual exit code differs from the one in BIND 9.17).

(cherry picked from commit fa505bfb0e)
2021-02-17 12:09:25 +01:00
Michal Nowak
001413ed50 Drop AddressSanitizer constraint from libns unit tests
The AddressSanitizer constraint in some libns unit tests no longer
seems to be necessary; these tests run fine under AddressSanitizer.

(cherry picked from commit 613be8706e)
2021-02-10 11:03:27 +01:00
Matthijs Mekking
2afaff75ed Use stale on error also when unable to recurse
The 'query_usestale()' function was only called from
'query_gotanswer()' when an unexpected error occurred. This may have
been "quota reached", so in some cases we were returning stale data
on fetch-limits (if serve-stale was enabled, of course).

But we can also hit fetch-limits when recursing because we are
following a referral (in 'query_notfound()' and
'query_delegation_recurse()'). Here we should also check for using
stale data in case an error occurred.

Specifically, don't check for using stale data when refetching a
zero TTL RRset from cache.

Move the setting of DNS_DBFIND_STALESTART into the 'query_usestale()'
function to avoid code duplication.

(cherry picked from commit 8bcd7fe69e)
2021-02-08 16:10:03 +01:00
Matthijs Mekking
dbf5428629 Only start stale refresh window when resuming
If we did not attempt a fetch due to fetch-limits, we should not start
the stale-refresh-time window.

Introduce a new flag, DNS_DBFIND_STALESTART, to differentiate between
a resolver failure and an unexpected error. If we are resuming, this
indicates a resolver failure, so start the stale-refresh-time window;
otherwise, don't start the stale-refresh-time window, but still fall
back to using stale data.
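A sketch of how the two flags might interact, under the assumption that the stale fallback flag is set on both paths (so stale data is still used) and STALESTART is added only when resuming. The SKETCH_* bit values and both functions are hypothetical; the real DNS_DBFIND_* constants are defined in libdns.

```c
#include <stdbool.h>

/* Hypothetical bit values standing in for the DNS_DBFIND_* flags. */
#define SKETCH_STALEOK	  (1u << 0)
#define SKETCH_STALESTART (1u << 1)

/* Database options for the stale fallback lookup after an error. */
unsigned int
stale_fallback_options(bool resuming) {
	unsigned int opts = SKETCH_STALEOK; /* always fall back to stale */

	if (resuming) {
		/* A fetch actually ran and failed: resolver failure. */
		opts |= SKETCH_STALESTART;
	}
	return (opts);
}

/* Database side: only a resolver failure opens the refresh window. */
bool
starts_refresh_window(unsigned int opts) {
	return ((opts & SKETCH_STALESTART) != 0);
}
```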

(This commit also wraps some docstrings to 80 characters width)

(cherry picked from commit aabdedeae3)
2021-02-08 16:07:43 +01:00
Matthijs Mekking
809ec0a224 Use stale data also if we are not resuming
Before this change, BIND would only fall back to using stale data if
there was an actual attempt to resolve the query. Then, on a timeout,
the stale data from cache would become eligible.

This commit changes this so that on any unexpected error, stale data
becomes eligible (you would still have to have 'stale-answer-enable'
enabled, of course).

If there is no stale data, this may result in an error again, so don't
loop on stale data lookup attempts. If the DNS_DBFIND_STALEOK flag is
set, this means we already tried to look up stale data, so if that is
the case, don't use stale again.
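The loop guard described above amounts to a check like the following. This is a simplified, hypothetical sketch; SKETCH_STALEOK stands in for DNS_DBFIND_STALEOK and its value is made up.

```c
#include <stdbool.h>

#define SKETCH_STALEOK (1u << 0) /* hypothetical value */

/*
 * After an unexpected error, should the lookup be retried looking for
 * stale data? If STALEOK is already set, this lookup was itself the
 * stale-data attempt, so retrying would loop forever.
 */
bool
should_retry_with_stale(unsigned int dboptions, bool stale_answer_enable) {
	if (!stale_answer_enable) {
		return (false);
	}
	return ((dboptions & SKETCH_STALEOK) == 0);
}
```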

(cherry picked from commit c6fd02aed5)
2021-02-08 16:07:43 +01:00
Matthijs Mekking
99c72bf5da Update code flow in query.c wrt stale data
First of all, there was a flaw in the code related to the
'stale-refresh-time' option. If stale answers are enabled, and we
returned stale data, then it was assumed that it was because we were
in the 'stale-refresh-time' window. But now we could also have returned
stale data because of a 'stale-answer-client-timeout'. To fix this,
introduce an rdataset attribute, DNS_RDATASETATTR_STALE_WINDOW, to
indicate whether the stale cache entry was returned because the
'stale-refresh-time' window is active.

Second, remove the special case handling when the result is
DNS_R_NCACHENXRRSET. This can be done more generically in the code
block that deals with stale data.

Putting all stale case handling in the code block that deals with
stale data makes the code easier to follow.

Update documentation to be more verbose and to match the new code
flow.

(cherry picked from commit fa0c9280d2)
2021-01-29 10:43:41 +01:00
Diego Fronza
0e62c53c5b Extracted common function from query_lookup and query_refresh_rrset
Both functions employed the same code lines to allocate query context
buffers, which are used to store query results, so this shared portion
of code was extracted out to a new function, qctx_prepare_buffers.

Also, this commit uses qctx_init to initialize the query context within
the query_refresh_rrset function.

(cherry picked from commit 966060c03b)
2021-01-29 10:43:27 +01:00
Diego Fronza
5cbb28a40e Small optimization in query_usestale
This commit makes the code in query_usestale easier to follow; it also
doesn't attach/detach to the database if stale answers are not enabled.

(cherry picked from commit f89ac07b28)
2021-01-29 10:41:39 +01:00
Diego Fronza
8324c3ddfe Allow stale data to be used before name resolution
This commit allows a stale RRset to be used (if available) to respond
to a query before an attempt is made to refresh an expired RRset, or
otherwise to resolve an RRset that is unavailable in cache.

For that to work, a value of zero must be specified for the
stale-answer-client-timeout statement.

To better understand the logic implemented, there are three flags being
used during database lookup and other parts of code that must be
understood:

. DNS_DBFIND_STALEOK: This flag is set when BIND fails to refresh an
  RRset due to a timeout (resolver-query-timeout); its intent is to
  try to look for stale data in cache as a fallback, but only if
  stale answers are enabled in the configuration.

  This flag is also used to activate the stale-refresh-time window, since it
  is the only way the database knows that a resolution has failed.

. DNS_DBFIND_STALEENABLED: This flag is used as a hint to the database
  that it may use stale data. It is always set during query lookup if
  stale answers are enabled, but only effectively used during
  stale-refresh-time window. Also, during this window the resolver will
  not try to resolve the query; in other words, no attempt to refresh the
  data in cache is made while the stale-refresh-time window is active.

. DNS_DBFIND_STALEONLY: This newly introduced flag is used when we want
  stale data from the database, but not due to a failure in resolution;
  it also doesn't require the stale-refresh-time window timer to be active.
  As long as there is a stale RRset available, it should be returned.
  It is mainly used in two situations:

    1. When stale-answer-client-timeout timer is triggered: in that case
       we want to know if there is stale data available to answer the
       client.
    2. When stale-answer-client-timeout value is set to zero: in that
       case, we also want to know if there is some stale RRset available
       to promptly answer the client.
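One possible reading of how the database could combine these three flags is sketched below. It is an illustrative model under stated assumptions (hypothetical bit values, and a boolean for whether the stale-refresh-time window is active), not the actual rbtdb.c logic.

```c
#include <stdbool.h>

/* Hypothetical bit values standing in for the DNS_DBFIND_* flags. */
#define SK_STALEOK	(1u << 0)
#define SK_STALEENABLED (1u << 1)
#define SK_STALEONLY	(1u << 2)

/*
 * May the cache hand back a stale (expired) RRset for this lookup?
 * window_active: we are inside a stale-refresh-time window opened by
 * an earlier resolution failure.
 */
bool
may_return_stale(unsigned int opts, bool window_active) {
	if ((opts & SK_STALEONLY) != 0) {
		/* Any available stale RRset will do; no window needed. */
		return (true);
	}
	if ((opts & SK_STALEOK) != 0) {
		/* Fallback after a failed refresh (e.g. a timeout). */
		return (true);
	}
	/* STALEENABLED is only a hint, honored inside the window. */
	return ((opts & SK_STALEENABLED) != 0 && window_active);
}
```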

We must also discern between three situations that may happen when
resolving a query after the addition of stale-answer-client-timeout
statement, and how to handle them:

    1. Are we running query_lookup() due to stale-answer-client-timeout
       timer being triggered?

       In this case, we look for stale data, making use of the
       DNS_DBFIND_STALEONLY flag. If a stale RRset is available, then
       respond to the client with the data found and mark this query as
       answered (query attribute NS_QUERYATTR_ANSWERED), so when the
       fetch completes the client won't be answered twice.

       We must also take care not to detach from the client, as a
       fetch will still be running in the background; this is handled by
       the following snippet:

       if (!QUERY_STALEONLY(&client->query)) {
           isc_nmhandle_detach(&client->reqhandle);
       }

       This tests whether the DNS_DBFIND_STALEONLY flag is set, which
       means we are here due to a stale-answer-client-timeout timer
       expiration.

    2. Are we running query_lookup() due to resolver-query-timeout being
       triggered?

       In this case, DNS_DBFIND_STALEOK flag will be set and an attempt
       to look for stale data will be made.
       As already explained, this flag is also used to activate the
       stale-refresh-time window, as it means that we failed to refresh
       an RRset due to a timeout.
       It is OK in this situation to detach from the client, as the
       fetch has already completed.

    3. Are we running query_lookup() during the first time, looking for
       a RRset in cache and stale-answer-client-timeout value is set to
       zero?

       In this case, if stale answers are enabled (probably), we must do
       an initial database lookup with DNS_DBFIND_STALEONLY flag set, to
       indicate to the database that we want stale data.

       If we find an active RRset, proceed as normal, answer the client
       and the query is done.

       If we find a stale RRset, we respond to the client and mark the
       query as answered, but don't detach from the client yet, as an
       attempt to refresh the RRset will still be made by means of
       the newly introduced function 'query_resolve'.

       If no active or stale RRset is available, begin resolution as
       usual.

(cherry picked from commit e219422575)
2021-01-29 10:39:09 +01:00
Diego Fronza
3478794a5d Add stale-answer-client-timeout option
The general logic behind the addition of this new feature works as
follows:

When a client query arrives, the basic path (query.c / ns_query_recurse)
was to create a fetch, waiting for completion in fetch_callback.

With the introduction of stale-answer-client-timeout, a new event of
type DNS_EVENT_TRYSTALE may invoke fetch_callback, whenever stale
answers are enabled and the fetch took longer than
stale-answer-client-timeout to complete.

When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we
must ensure that the following happens:

1. Set up a new query context with the sole purpose of looking up
   stale RRset data only; for that purpose, a new flag,
   'DNS_DBFIND_STALEONLY', was added for use in database lookups.

    . If a stale RRset is found, mark the original client query as
      answered (with a new query attribute named NS_QUERYATTR_ANSWERED),
      so when the fetch completion event is received later, we avoid
      answering the client twice.

    . If a stale RRset is not found, clean up and wait for the normal
      fetch completion event.

2. In ns_query_done, we must change this part:
	/*
	 * If we're recursing then just return; the query will
	 * resume when recursion ends.
	 */
	if (RECURSING(qctx->client)) {
		return (qctx->result);
	}

   To this:

	if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) {
		return (qctx->result);
	}

   Otherwise we would not proceed to answer the client if a stale
   answer happened to be found when looking up stale-only data.

When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we
proceed as before, resuming the query, updating stats, etc., but a few
exceptions had to be added, the two most important of which are:

1. Before answering the client (ns_client_send), check that the query
   hasn't already been answered.

2. Before detaching a client, e.g.
   isc_nmhandle_detach(&client->reqhandle), ensure that this is the
   fetch completion event, and not the one triggered due to
   stale-answer-client-timeout, so a correct call would be:
   if (!QUERY_STALEONLY(client)) {
        isc_nmhandle_detach(&client->reqhandle);
   }
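The interplay between the two events can be modelled with a small state machine. Everything here is hypothetical and simplified: `answered` stands in for NS_QUERYATTR_ANSWERED, `responses_sent` counts calls to ns_client_send(), and `reqhandle_attached` tracks client->reqhandle.

```c
#include <stdbool.h>

/* Hypothetical, simplified client state. */
struct client_sketch {
	bool answered;		 /* NS_QUERYATTR_ANSWERED */
	bool reqhandle_attached; /* client->reqhandle still held */
	int responses_sent;	 /* number of ns_client_send() calls */
};

enum event_sketch { EV_TRYSTALE, EV_FETCHDONE };

/*
 * found_stale is only meaningful for EV_TRYSTALE: whether the
 * staleonly lookup found a stale RRset.
 */
void
fetch_callback_sketch(struct client_sketch *c, enum event_sketch ev,
		      bool found_stale) {
	bool staleonly = (ev == EV_TRYSTALE);

	if (staleonly && !found_stale) {
		return; /* wait for the normal fetch completion event */
	}
	if (!c->answered) {
		c->responses_sent++;
		c->answered = true;
	}
	if (!staleonly) {
		/* Only fetch completion may release the request handle. */
		c->reqhandle_attached = false;
	}
}
```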

Other than these notes, comments were added in code in an attempt to make
these updates easier to follow.

(cherry picked from commit 171a5b7542)
2021-01-29 10:38:32 +01:00
Diego Fronza
7bf8950a0a Added dns_view_staleanswerenabled() function
Since it takes a couple lines of code to check whether stale answers
are enabled for a given view, code was extracted out to a proper
function.

(cherry picked from commit 74840ec50b)
2021-01-29 10:35:26 +01:00
Evan Hunt
077e2c2a74 add serial number to "transfer ended" log messages 2021-01-26 12:38:32 +01:00
Evan Hunt
2df6ffc051 check size ratio when responding to IXFR requests 2021-01-26 12:38:32 +01:00
Evan Hunt
70df95e9f5 dns_journal_iter_init() can now return the size of the delta
the call initializing a journal iterator can now optionally return
to the caller the size in bytes of an IXFR message (not including
DNS header overhead, signatures etc) containing the differences from
the beginning to the ending serial number.

this is calculated by scanning the journal transaction headers to
calculate the transfer size. since journal file records contain a length
field that is not included in IXFR messages, we subtract out the length
of those fields from the overall transaction length.

this necessitated adding an "RR count" field to the journal transaction
header, so we know how many length fields to subtract. NOTE: this will
make existing journal files stop working!
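A sketch of the size calculation, assuming each journal RR record is preceded by a 4-byte length field and that each transaction header carries the new RR count. The struct, the constant and the function are hypothetical, not the dns_journal API.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: size of the per-RR length field in the journal file. */
#define RR_LEN_FIELD_SIZE 4u

/* Hypothetical stand-in for a journal transaction header. */
struct xhdr_sketch {
	uint32_t size;	/* bytes of RR data, including length fields */
	uint32_t count; /* number of RRs (the new "RR count" field) */
};

/* Sum the transaction sizes, minus the length fields IXFR omits. */
uint64_t
ixfr_delta_size(const struct xhdr_sketch *xhdrs, size_t n) {
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		total += xhdrs[i].size;
		total -= (uint64_t)xhdrs[i].count * RR_LEN_FIELD_SIZE;
	}
	return (total);
}
```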
2021-01-26 12:38:32 +01:00
Ondřej Surý
0e25af628c Use -release instead of -version-info for internal library SONAMEs
The BIND 9 libraries are considered to be internal only and hence the
API and ABI change a lot.  Keeping track of the API/ABI changes takes
time and it's a complicated matter, as the safest way to make everything
stable would be to bump every library in the dependency chain.  In
theory, if libns links with libdns, a binary links with both, and we
bump the libdns SOVERSION but not the libns SOVERSION, the binary might
load the old libns, which pulls in the old libdns alongside the new
libdns loaded by the binary.  The situation gets even more complicated
with plugins that have been compiled against BIND 9 libraries a few
versions old and then dynamically loaded into named.

We are picking the safest option possible and usable for internal
libraries - instead of using -version-info that has only a weak link to
BIND 9 version number, we are using -release libtool option that will
embed the corresponding BIND 9 version number into the library name.

That means that instead of libisc.so.1608 (as an example) the library
will now be named libisc-9.16.10.so.

(cherry picked from commit c605d75ea5)
2021-01-25 15:28:09 +01:00
Tinderbox User
536bc1163a prep 9.16.11 2021-01-21 09:11:54 +01:00
Ondřej Surý
04f9f45c54 Print warning when falling back to increment soa serial method
When using the `unixtime` or `date` method to update the SOA serial,
`named` and `dnssec-signzone` would silently fall back to the `increment`
method to prevent the new serial number from being smaller than the old
serial number (using serial number arithmetic).  Add a warning
message when such a fallback happens.
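A simplified sketch of the fallback, using RFC 1982 serial number arithmetic for the comparison. `serial_gt()` and `new_unixtime_serial()` are hypothetical helpers, and the real code may handle further edge cases; the `fell_back` flag marks the point where the new warning would be logged.

```c
#include <stdbool.h>
#include <stdint.h>

/* RFC 1982: a is greater than b if it is ahead by less than 2^31. */
bool
serial_gt(uint32_t a, uint32_t b) {
	return (a != b && (uint32_t)(a - b) < UINT32_C(0x80000000));
}

/*
 * Pick the new SOA serial for the `unixtime` method. If the candidate
 * is not greater than the old serial, fall back to `increment`
 * (old + 1, wrapping modulo 2^32) and tell the caller to warn.
 */
uint32_t
new_unixtime_serial(uint32_t old, uint32_t now, bool *fell_back) {
	if (serial_gt(now, old)) {
		*fell_back = false;
		return (now);
	}
	*fell_back = true;
	return (old + 1);
}
```

Note that the comparison handles wraparound: an old serial of 0xFFFFFFF0 is considered smaller than a new value of 5.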

(cherry picked from commit ef685bab5c)
2020-12-12 07:55:29 +01:00
Ondřej Surý
e8e8ed7fb9 Adjust the nstests for isc_nmhandle_{attach,detach} name change
Due to the added attach/detach tracing in the netmgr-v2 code, the
libns tests need to be adjusted, as the real function names have
changed from isc_nmhandle_* to isc__nmhandle_*.
2020-12-09 10:46:16 +01:00
Ondřej Surý
7fc62f829d Add libssl libraries to Windows build
This commit extends the perl Configure script to also check for libssl
in addition to libcrypto and changes the vcxproj source files to link
with both libcrypto and libssl.
2020-12-09 10:46:16 +01:00
Ondřej Surý
7b9c8b9781 Refactor netmgr and add more unit tests
This is a part of the work that intends to make the netmgr stable,
testable, maintainable and tested.  It contains numerous changes to
the netmgr code and, unfortunately, it was not possible to split this
into smaller chunks, as the work here needs to be committed as a whole.

NOTE: There's quite a lot of duplicated code between udp.c, tcp.c and
tcpdns.c and it should be subject to refactoring in the future.

The changes that are included in this commit are listed here
(extensively, but not exclusively):

* The netmgr_test unit test was split into individual tests (udp_test,
  tcp_test, tcpdns_test and newly added tcp_quota_test)

* The udp_test and tcp_test have been extended to allow programmatic
  failures from the libuv API.  Unfortunately, we can't use cmocka
  mock() and will_return(), so we emulate the behaviour with #define and
  by including the netmgr/{udp,tcp}.c source file directly.

* The netievents that we put on the nm queue have a variable number of
  members; of these, the isc_nmsocket_t and isc_nmhandle_t always
  need to be attached before enqueueing the netievent_<foo> and
  detached after we have called the isc_nm_async_<foo> to ensure that
  the socket (handle) doesn't disappear between scheduling the event and
  actually executing the event.
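A minimal model of the attach-before-enqueue rule, with a plain integer reference count standing in for the real isc_refcount machinery. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference-counted socket. */
struct sock_sketch {
	int refs;
	bool freed; /* stands in for actually freeing the object */
};

static void
sock_attach(struct sock_sketch *s) {
	s->refs++;
}

static void
sock_detach(struct sock_sketch *s) {
	assert(s->refs > 0);
	if (--s->refs == 0) {
		s->freed = true;
	}
}

/* Hypothetical netievent carrying a socket reference. */
struct ievent_sketch {
	struct sock_sketch *sock;
};

void
enqueue_sketch(struct ievent_sketch *ev, struct sock_sketch *s) {
	sock_attach(s); /* keep the socket alive while the event is queued */
	ev->sock = s;
}

void
process_sketch(struct ievent_sketch *ev) {
	assert(!ev->sock->freed); /* guaranteed by the attach above */
	/* ... the isc_nm_async_<foo>() work would happen here ... */
	sock_detach(ev->sock);
	ev->sock = NULL;
}
```

Even if the owner drops its own reference while the event is still queued, the socket stays alive until the event has run.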

* Cancelling an in-flight TCP connection using libuv requires calling
  uv_close() on the original uv_tcp_t handle, which just breaks too many
  assumptions we have in the netmgr code.  Instead of using uv_timer for
  TCP connection timeouts, we use a platform-specific socket option.

* Fix the synchronization between {nm,async}_{listentcp,tcpconnect}

  When isc_nm_listentcp() or isc_nm_tcpconnect() was called, it was
  waiting for the socket to either end up with an error (that path was
  fine) or to be listening or connected, using a condition variable and
  mutex.

  Several things could happen:

    0. everything is ok

    1. the waiting thread would miss the SIGNAL() - because the enqueued
       event would be processed faster than we could start WAIT()ing.
       If the operation ended up with an error, it would be OK, as
       the error variable would be unchanged.

    2. the waiting thread would miss sock->{connected,listening} = `true`,
       because it would already be set to `false` in
       tcp_{listen,connect}close_cb(), as the connection would be so
       short-lived that the socket would be closed before we could even
       start WAIT()ing
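Both races disappear if the outcome is recorded under the mutex and the waiter checks it in a loop before sleeping: a signal sent before WAIT() is then never lost, and a later state change cannot mask the result. The sketch below shows that standard pattern with POSIX threads; the `completion` type is hypothetical and this is a generic pattern, not necessarily how netmgr fixes the race.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-shot completion object. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
	int result; /* outcome captured at completion time */
};

void
completion_init(struct completion *c) {
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
	c->result = 0;
}

/* Called from the event loop when listen/connect finishes. */
void
completion_signal(struct completion *c, int result) {
	pthread_mutex_lock(&c->lock);
	c->result = result;
	c->done = true; /* survives even if nobody is waiting yet */
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Waiter: test the predicate under the lock before sleeping. */
int
completion_wait(struct completion *c) {
	pthread_mutex_lock(&c->lock);
	while (!c->done) {
		pthread_cond_wait(&c->cond, &c->lock);
	}
	int result = c->result;
	pthread_mutex_unlock(&c->lock);
	return (result);
}
```

In the usage below, the signal arrives before the wait begins (case 1 above), and the waiter still returns immediately with the recorded result.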

* The tcpdns has been converted to using libuv directly.  Previously,
  the tcpdns protocol used the tcp protocol from netmgr; this proved to
  be very complicated to understand, fix and make changes to.  The new
  tcpdns protocol is modeled similarly to the tcp netmgr protocol.
  Closes: #2194, #2283, #2318, #2266, #2034, #1920

* The tcp and tcpdns protocols no longer use isc_uv_import/isc_uv_export
  to pass accepted TCP sockets between netthreads, but instead (similar
  to UDP) use a per-netthread uv_loop listener.  This greatly reduces the
  complexity as the socket is always run in the associated nm and uv
  loops, and we are also not touching the libuv internals.

  There's an unfortunate side effect, though: the new code requires
  support for load-balanced sockets from the operating system for both
  UDP and TCP (see #2137).  If the operating system doesn't support
  load-balanced sockets (either SO_REUSEPORT on Linux or SO_REUSEPORT_LB
  on FreeBSD 12+), the number of netthreads is limited to 1.
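A rough sketch of that fallback, assuming a simple probe: try to set the load-balancing socket option on a throwaway UDP socket and use a single netthread if that fails. `effective_netthreads()` is hypothetical, and the probe is optimistic on systems where SO_REUSEPORT exists without load balancing (such as macOS, or FreeBSD releases older than 12).

```c
#include <sys/socket.h>
#include <unistd.h>

/* Return the requested thread count, or 1 without load balancing. */
int
effective_netthreads(int requested) {
#if defined(SO_REUSEPORT_LB) || defined(SO_REUSEPORT)
#if defined(SO_REUSEPORT_LB)
	const int lbopt = SO_REUSEPORT_LB; /* FreeBSD 12+ */
#else
	const int lbopt = SO_REUSEPORT; /* Linux */
#endif
	int on = 1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0) {
		return (1);
	}
	int ok = (setsockopt(fd, SOL_SOCKET, lbopt, &on, sizeof(on)) == 0);
	close(fd);
	return (ok ? requested : 1);
#else
	/* No load-balanced socket option known at compile time. */
	(void)requested;
	return (1);
#endif
}
```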

* The netmgr now has two debugging #ifdefs:

  1. The already existing NETMGR_TRACE prints any dangling nmsockets and
     nmhandles before triggering an assertion failure.  This option would
     reduce performance when enabled, but in theory, it could be enabled
     on low-performance systems.

  2. New NETMGR_TRACE_VERBOSE option has been added that enables
     extensive netmgr logging that allows the software engineer to
     precisely track any attach/detach operations on the nmsockets and
     nmhandles.  This is not suitable for any kind of production
     machine, only for debugging.

* The tlsdns netmgr protocol has been split from the tcpdns and it still
  uses the old method of stacking the netmgr boxes on top of each other.
  We will have to refactor the tlsdns netmgr protocol to use the same
  approach - build the stack using only libuv and openssl.

* Limit, but do not assert on, the tcp buffer size in tcp_alloc_cb
  Closes: #2061

(cherry picked from commit 634bdfb16d)
2020-12-09 10:46:16 +01:00
Tinderbox User
7406ea925a prep 9.16.10 2020-12-09 10:46:16 +01:00
Ondřej Surý
a35a666a7c Reformat sources using clang-format-11
(cherry picked from commit 7ba18870dc)
2020-12-08 19:34:05 +01:00
Diego Fronza
5c28451949 Silence coverity warnings in query.c
The return values of the dns_db_getservestalerefresh() and
dns_db_getservestalettl() functions were previously unhandled.

This commit purposefully ignores those return values, since there is
no side effect if those results are != ISC_R_SUCCESS; it also suppresses
Coverity warnings.
2020-11-26 14:56:22 +00:00
Tinderbox User
14620951cc prep 9.16.9 2020-11-26 12:25:53 +01:00
Mark Andrews
b3d259107f Fix DNAME when QTYPE is CNAME or ANY
The synthesised CNAME is not supposed to be followed when the
QTYPE is CNAME or ANY as the lookup is satisfied by the CNAME
record.

(cherry picked from commit e980affba0)
2020-11-19 10:52:29 +11:00
Diego Fronza
8cc5abff23 Add stale-refresh-time option
Before this update, BIND would attempt to do a full recursive resolution
process for each query received if the requested rrset had its ttl
expired. If the resolution failed for any reason, only then would BIND
check for a stale rrset in cache (if 'stale-cache-enable' and
'stale-answer-enable' are on).

The problem with this approach is that if an authoritative server is
unreachable or is failing to respond, it is very unlikely that the
problem will be fixed within the next few seconds.

A better approach to improve performance in those cases is to mark the
moment in which a resolution failed, and if new queries arrive for that
same rrset, try to respond directly from the stale cache, and do that
for a window of time configured via 'stale-refresh-time'.

Only when this interval expires do we try to do a normal refresh of
the rrset.

The logic behind this commit is as follows:

- In query.c / query_gotanswer(), if the test of 'result' variable falls
  to the default case, an error is assumed to have happened, and a call
  to 'query_usestale()' is made to check if serving of stale rrset is
  enabled in configuration.

- If serving of stale answers is enabled, a flag will be turned on in
  the query context to look for stale records:
  query.c:6839
  qctx->client->query.dboptions |= DNS_DBFIND_STALEOK;

- A call to query_lookup() will be made again; inside it, a call to
  'dns_db_findext()' is made, which in turn will invoke rbtdb.c /
  cache_find().

- In rbtdb.c / cache_find(), the important part of this change is the
  call to 'check_stale_header()', which is a function that yields true
  if we should skip the stale entry, or false if we should consider it.

- In check_stale_header(), we now check whether the DNS_DBFIND_STALEOK
  option is set; if that is the case, we know that this new search for
  stale records was made due to a failure in a normal resolution, so we
  keep track of the time at which the failure occurred in rbtdb.c:4559:
  header->last_refresh_fail_ts = search->now;

- In check_stale_header(), if DNS_DBFIND_STALEOK is not set, then we
  know this is a normal lookup; if the record is stale and the query
  time is between the last failure time and the last failure time plus
  the stale-refresh-time window, then we return false so cache_find()
  knows it can consider this stale rrset entry to return as a response.
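The check_stale_header() behaviour described above can be sketched as follows. The struct, the SK_STALEOK value and the use of 0 to mean "no failure recorded" are assumptions for illustration; the real function also takes other serve-stale settings into account that are not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_STALEOK (1u << 0) /* hypothetical value */

/* Hypothetical, trimmed-down cache header. */
struct header_sketch {
	uint32_t expire;	       /* when the TTL ran out */
	uint32_t last_refresh_fail_ts; /* 0 = no failure recorded */
};

/*
 * Returns true if the entry should be skipped, false if it may be
 * used. Only the stale-refresh-time logic is modelled.
 */
bool
check_stale_header_sketch(struct header_sketch *h, unsigned int opts,
			  uint32_t now, uint32_t stale_refresh_time) {
	if (now < h->expire) {
		return (false); /* not stale at all */
	}
	if ((opts & SK_STALEOK) != 0) {
		/* Lookup after a failed resolution: record the failure. */
		h->last_refresh_fail_ts = now;
		return (false);
	}
	/* Normal lookup: use stale data only inside the refresh window. */
	return (!(h->last_refresh_fail_ts != 0 &&
		  now < h->last_refresh_fail_ts + stale_refresh_time));
}
```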

The last additions are two new methods to the database interface:
- setservestale_refresh
- getservestale_refresh

Those were added so rbtdb can be aware of the value set in the
configuration option, since at that level we have no access to the view
object.
2020-11-11 15:59:56 -03:00
Matthijs Mekking
a6755ce7f8 Cleanup duplicate definitions in query.h
(cherry picked from commit 31692744cc47eef7ad6b41aeb53f5566ca6e7efe)
2020-11-10 15:50:20 +01:00
Mark Andrews
14fe29b76d Implement DNSTAP support in ns_client_sendraw()
ns_client_sendraw() is currently only used to relay UPDATE
responses back to the client.  dns_dt_send() is called with
this assumption.

(cherry picked from commit b09727a765)
2020-11-10 17:59:04 +11:00
Ondřej Surý
301e4145de Fix the isc_nm_closedown() to actually close the pending connections
1. The isc__nm_tcp_send() and isc__nm_tcp_read() were not checking
   whether the socket was still alive, and were scheduling reads/sends
   on a closed socket.

2. The isc_nm_read(), isc_nm_send() and isc_nm_resumeread() have been
   changed to always return the error conditions via the callbacks, so
   they always succeed.  This applies to all protocols (UDP, TCP and
   TCPDNS).

(cherry picked from commit f7c82e406e)
2020-10-22 15:00:00 -07:00
Tinderbox User
44e91206a4 prep 9.16.8 2020-10-22 09:09:07 +02:00
Diego Fronza
d5355b8105 Always return address records in additional section for NS queries 2020-10-21 12:12:22 -03:00
Ondřej Surý
58a518adca Change the default EDNS buffer size to 1232 for DNS Flag Day 2020
The DNS Flag Day 2020 aims to remove the IP fragmentation problem from
the UDP DNS communication.  In this commit, we implement the minimal
required changes by changing the defaults for `edns-udp-size`,
`max-udp-size` and `nocookie-udp-size` to `1232` (the value picked by
DNS Flag Day 2020).
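For reference, 1232 is commonly explained as the IPv6 minimum MTU (1280 bytes) minus the 40-byte IPv6 header and the 8-byte UDP header; this derivation is background knowledge, not something the commit itself states. The enum names below are hypothetical, and the arithmetic is checked at compile time:

```c
#include <assert.h>

enum {
	IPV6_MIN_MTU = 1280, /* minimum link MTU required by IPv6 */
	IPV6_HEADER = 40,    /* fixed IPv6 header, no extension headers */
	UDP_HEADER = 8,
	FLAGDAY_EDNS_UDP_SIZE = IPV6_MIN_MTU - IPV6_HEADER - UDP_HEADER
};

static_assert(FLAGDAY_EDNS_UDP_SIZE == 1232, "1280 - 40 - 8 == 1232");
```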

(cherry picked from commit bb990030d3)
2020-10-06 09:35:20 +02:00
Ondřej Surý
7a90ad1fe2 Add separate prefetch nmhandle to ns_client_t
As the query_prefetch() or query_rpzfetch() could be called during
"regular" fetch, we need to introduce separate storage for attaching
the nmhandle during prefetching the records.  The query_prefetch()
and query_rpzfetch() are guarded for re-entrance by .query.prefetch
member of ns_client_t, so we can reuse the same .prefetchhandle for
both.

(cherry picked from commit d4976e0ebe)
2020-10-01 18:09:35 +02:00
Evan Hunt
ba2e9dfb99 change from isc_nmhandle_ref/unref to isc_nmhandle attach/detach
Attaching and detaching handle pointers will make it easier to
determine where and why reference counting errors have occurred.

A handle needs to be referenced more than once when multiple
asynchronous operations are in flight, so callers must now maintain
multiple handle pointers for each pending operation. For example,
ns_client objects now contain:

        - reqhandle:    held while waiting for a request callback (query,
                        notify, update)
        - sendhandle:   held while waiting for a send callback
        - fetchhandle:  held while waiting for a recursive fetch to
                        complete
        - updatehandle: held while waiting for an update-forwarding
                        task to complete

(cherry picked from commit 57b4dde974)
2020-10-01 18:09:35 +02:00
Evan Hunt
b14cb9e2f1 restore "blackhole" functionality
the blackhole ACL was accidentally disabled with respect to client
queries during the netmgr conversion.

in order to make this work for TCP, it was necessary to add a return
code to the accept callback functions passed to isc_nm_listentcp() and
isc_nm_listentcpdns().

(cherry picked from commit 23c7373d68)
2020-10-01 16:44:43 +02:00
Evan Hunt
f64a881a30 change the signature of recv callbacks to include a result code
this will allow recv event handlers to distinguish between cases
in which the region is NULL because of error, shutdown, or cancelation.

(cherry picked from commit 75c985c07f)
2020-10-01 16:44:43 +02:00
Evan Hunt
573bcdf932 make isc_nmsocket_{attach,detach}{} functions private
there is no need for a caller to reference-count socket objects.
they need to be able to close listener sockets (i.e., those
returned by isc_nm_listen{udp,tcp,tcpdns}), and an isc_nmsocket_close()
function has been added for that. other sockets are only accessed via
handles.

(cherry picked from commit 9e740cad21)
2020-10-01 16:44:43 +02:00
Ondřej Surý
826ddb246e Revert the tree to allow cherry-picking netmgr changes from main
The following reverted changes will be picked again as part of the
netmgr sync with main branch.

Revert "Merge branch '1996-confidential-issue-v9_16' into 'security-v9_16'"

This reverts commit e160b1509f, reversing
changes made to c01e643715.

Revert "Merge branch '2038-use-freebind-when-bind-fails-v9_16' into 'v9_16'"

This reverts commit 5f8ecfb918, reversing
changes made to 23021385d5.

Revert "Merge branch '1936-blackhole-fix-v9_16' into 'v9_16'"

This reverts commit f20bc90a72, reversing
changes made to 490016ebf1.

Revert "Merge branch '1938-fix-udp-race' into 'v9_16'"

This reverts commit 0a6c7ab2a9, reversing
changes made to 4ea84740e6.

Revert "Merge branch '1947-fix-tcpdns-race' into 'v9_16'"

This reverts commit 4ea84740e6, reversing
changes made to d761cd576b.
2020-10-01 16:44:43 +02:00
Ondřej Surý
f0989bdf03 The dns_message_create() cannot fail, change the return to void
The dns_message_create() function cannot soft fail (as all memory
allocations either succeed or cause abort), so we change the function to
return void and cleanup the calls.

(cherry picked from commit 33eefe9f85)
2020-09-30 14:26:26 +02:00
Diego Fronza
da84f8d1fd Refactored dns_message_t for using attach/detach semantics
This commit will be used as a base for the next code updates in
order to have a better control of dns_message_t objects' lifetime.

(cherry picked from commit 12d6d13100)
2020-09-30 11:34:42 +10:00
Michał Kępień
e05e5d7c12 Clean up use of function wrapping
Currently, building BIND using "--without-dlopen" universally breaks
building unit tests which employ the --wrap linker option (because the
replacement functions are put in a shared library and building shared
objects requires "--with-dlopen").  Fix by moving the overridden symbol,
isc_nmhandle_unref(), to lib/ns/tests/nstest.c and dropping
lib/ns/tests/wrap.c altogether.  This makes lib/ns/tests/Makefile.in
simpler and prevents --without-dlopen from messing with the process of
building unit tests.

Remove parts of configure.ac which are made redundant by the above
changes.

Put the replacement definition of isc_nmhandle_unref() inside an #ifdef
block, so that the build does not break for non-libtool builds (see
below).

These changes allow the broadest possible set of build variants to work
while also simplifying the build process:

  - for libtool builds, overriding isc_nmhandle_unref() is done by
    placing that symbol directly in lib/ns/tests/nstest.c and relying on
    the dynamic linker to perform symbol resolution in the expected way
    when the test binary is run,

  - for non-libtool builds, overriding isc_nmhandle_unref() is done
    using the --wrap linker option (the libtool approach cannot be used
    in this case as multiple strong symbols with the same name cannot
    coexist in the same binary),

  - the "--without-dlopen" option no longer affects building unit tests.
2020-09-28 09:16:48 +02:00
Evan Hunt
50cc4d6a3e Purge memory pool upon plugin destruction
The typical sequence of events for AAAA queries which trigger recursion
for an A RRset at the same name is as follows:

 1. Original query context is created.
 2. An AAAA RRset is found in cache.
 3. Client-specific data is allocated from the filter-aaaa memory pool.
 4. Recursion is triggered for an A RRset.
 5. Original query context is torn down.

 6. Recursion for an A RRset completes.
 7. A second query context is created.
 8. Client-specific data is retrieved from the filter-aaaa memory pool.
 9. The response to be sent is processed according to configuration.
10. The response is sent.
11. Client-specific data is returned to the filter-aaaa memory pool.
12. The second query context is torn down.

However, steps 6-12 are not executed if recursion for an A RRset is
canceled.  Thus, if named is in the process of recursing for A RRsets
when a shutdown is requested, the filter-aaaa memory pool will have
outstanding allocations which will never get released.  This in turn
leads to a crash, since no memory pool may have any outstanding
allocations by the time isc_mempool_destroy() is called.

Fix by creating a stub query context whenever fetch_callback() is called,
including cancellation events. When the qctx is destroyed, it will ensure
the client is detached and the plugin memory is freed.

(cherry picked from commit 86eddebc83)
2020-09-25 14:04:54 -07:00
Mark Andrews
9e79a7d7ce Clone the saved / query message buffers
The message buffer passed to ns__client_request is only valid for
the life of the ns__client_request call.  Save a copy of it
when we recurse or process an update, as ns__client_request will
return before those operations complete.
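A minimal sketch of the idea: copy the caller's buffer into storage owned by the client before the callback returns. `struct saved_msg` and both helpers are hypothetical, and BIND code would normally allocate from an isc_mem_t context rather than calling malloc() directly.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical owned copy of a request message. */
struct saved_msg {
	unsigned char *data;
	size_t len;
};

/*
 * The caller's buffer is only valid during the request callback, so
 * anything that outlives it (recursion, UPDATE processing) needs its
 * own copy. Returns 0 on success, -1 on allocation failure.
 */
int
save_message(struct saved_msg *dst, const unsigned char *buf, size_t len) {
	dst->data = malloc(len > 0 ? len : 1);
	if (dst->data == NULL) {
		return (-1);
	}
	memcpy(dst->data, buf, len);
	dst->len = len;
	return (0);
}

void
free_saved_message(struct saved_msg *m) {
	free(m->data);
	m->data = NULL;
	m->len = 0;
}
```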

(cherry picked from commit f0d9bf7c30)
2020-09-23 11:17:23 +10:00
Evan Hunt
df698d73f4 update all copyright headers to eliminate the typo 2020-09-14 16:50:58 -07:00
Tinderbox User
a195123ad0 prep 9.16.6 2020-08-06 08:14:40 +00:00
Mark Andrews
2dc26ebdb6 Map DNS_R_BADTSIG to FORMERR
Now that the log message has been printed set the result code to
DNS_R_FORMERR.  We don't do this via dns_result_torcode() as we
don't want upstream errors to produce FORMERR if that processing
ends with DNS_R_BADTSIG.

(cherry picked from commit 20488d6ad3)
2020-08-04 23:04:34 +10:00