Commit Graph

13135 Commits

Author SHA1 Message Date
Ondřej Surý
6cf6de55bc Prevent the double xfrin_fail() call
When the transfer was shut down while we were reading from the xfrin
socket, the shutdown function would call `xfrin_fail()`, which in turn
calls `xfrin_cancelio()`.  That causes the read callback to be invoked
with the `ISC_R_CANCELED` status code, which resulted in yet another
`xfrin_fail()` call.

The fix here is to ensure that `xfrin_fail()` runs only once, by using
better synchronization on the `xfr->shuttingdown` flag.
2021-04-20 14:12:26 +02:00
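A minimal sketch of a "run the failure path only once" guard, assuming the flag is a C11 `atomic_bool` and using a compare-and-swap so that only the first caller proceeds. The type `xfr_sketch_t` and the functions `xfr_fail_sketch()` and `read_cb_canceled()` are hypothetical stand-ins, not BIND's code; the synchronous re-entry stands in for the `ISC_R_CANCELED` read callback described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the transfer context (not BIND's struct). */
typedef struct xfr_sketch {
	atomic_bool shuttingdown; /* flipped false -> true exactly once */
	int fail_runs;            /* how many times the failure path ran */
} xfr_sketch_t;

static void xfr_fail_sketch(xfr_sketch_t *xfr);

/* Hypothetical read callback: cancelling I/O delivers ISC_R_CANCELED,
 * and the callback reports a failure again. */
static void
read_cb_canceled(xfr_sketch_t *xfr) {
	xfr_fail_sketch(xfr);
}

static void
xfr_fail_sketch(xfr_sketch_t *xfr) {
	bool expected = false;

	/* Only the caller that wins the false -> true swap continues. */
	if (!atomic_compare_exchange_strong(&xfr->shuttingdown, &expected,
					    true)) {
		return;
	}
	xfr->fail_runs++;

	/* Stands in for xfrin_cancelio(): re-enters the failure path,
	 * which now returns early. */
	read_cb_canceled(xfr);
}
```

The compare-and-swap makes the "check the flag, then set it" step a single atomic operation, so two threads cannot both observe `false`.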
Ondřej Surý
25d27851d8 Fix lock-order-inversion (potential deadlock) in dns_resolver_createfetch
There's a lock-order-inversion when running `zone_maintenance()` from
the timer while the server is being shut down in `shutdown_server()`.
This only happens when the taskmgr scheduling is more relaxed and
parallelized, but the issue is real nevertheless.

The associated ThreadSanitizer warning:

    WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock)
      Cycle in lock order graph: M1 (0x000000000001) => M2 (0x000000000000) => M1

      Mutex M2 acquired here while holding mutex M1 in thread T1:
	#0 pthread_mutex_lock <null>
	#1 dns_view_findzonecut lib/dns/view.c:1326:2
	#2 fctx_create lib/dns/resolver.c:5144:13
	#3 dns_resolver_createfetch lib/dns/resolver.c:10977:12
	#4 zone_refreshkeys lib/dns/zone.c:10830:13
	#5 zone_maintenance lib/dns/zone.c:11065:5
	#6 zone_timer lib/dns/zone.c:14652:2
	#7 task_run lib/isc/task.c:857:5
	#8 isc_task_run lib/isc/task.c:944:10
	#9 isc__nm_async_task lib/isc/netmgr/netmgr.c:730:24
	#10 process_netievent lib/isc/netmgr/netmgr.c
	#11 process_queue lib/isc/netmgr/netmgr.c:885:8
	#12 process_tasks_queue lib/isc/netmgr/netmgr.c:756:10
	#13 process_queues lib/isc/netmgr/netmgr.c:772:7
	#14 async_cb lib/isc/netmgr/netmgr.c:671:2
	#15 uv__async_io /home/ondrej/Projects/tsan/libuv/src/unix/async.c:163:5
	#16 uv__io_poll /home/ondrej/Projects/tsan/libuv/src/unix/linux-core.c:462:11
	#17 uv_run /home/ondrej/Projects/tsan/libuv/src/unix/core.c:392:5
	#18 nm_thread lib/isc/netmgr/netmgr.c:597:11
	#19 isc__trampoline_run lib/isc/trampoline.c:184:11

      Mutex M1 previously acquired by the same thread here:
	#0 pthread_mutex_lock <null>
	#1 zone_refreshkeys lib/dns/zone.c:10717:2
	#2 zone_maintenance lib/dns/zone.c:11065:5
	#3 zone_timer lib/dns/zone.c:14652:2
	#4 task_run lib/isc/task.c:857:5
	#5 isc_task_run lib/isc/task.c:944:10
	#6 isc__nm_async_task lib/isc/netmgr/netmgr.c:730:24
	#7 process_netievent lib/isc/netmgr/netmgr.c
	#8 process_queue lib/isc/netmgr/netmgr.c:885:8
	#9 process_tasks_queue lib/isc/netmgr/netmgr.c:756:10
	#10 process_queues lib/isc/netmgr/netmgr.c:772:7
	#11 async_cb lib/isc/netmgr/netmgr.c:671:2
	#12 uv__async_io /home/ondrej/Projects/tsan/libuv/src/unix/async.c:163:5
	#13 uv__io_poll /home/ondrej/Projects/tsan/libuv/src/unix/linux-core.c:462:11
	#14 uv_run /home/ondrej/Projects/tsan/libuv/src/unix/core.c:392:5
	#15 nm_thread lib/isc/netmgr/netmgr.c:597:11
	#16 isc__trampoline_run lib/isc/trampoline.c:184:11

      Mutex M1 acquired here while holding mutex M2 in thread T2:
	#0 pthread_mutex_lock <null>
	#1 dns_zone_flush lib/dns/zone.c:11443:2
	#2 view_flushanddetach lib/dns/view.c:657:5
	#3 dns_view_flushanddetach lib/dns/view.c:690:2
	#4 shutdown_server bin/named/server.c:10056:4
	#5 task_run lib/isc/task.c:857:5
	#6 isc_task_run lib/isc/task.c:944:10
	#7 isc__nm_async_task lib/isc/netmgr/netmgr.c:730:24
	#8 process_netievent lib/isc/netmgr/netmgr.c
	#9 process_queue lib/isc/netmgr/netmgr.c:885:8
	#10 process_tasks_queue lib/isc/netmgr/netmgr.c:756:10
	#11 process_queues lib/isc/netmgr/netmgr.c:772:7
	#12 async_cb lib/isc/netmgr/netmgr.c:671:2
	#13 uv__async_io /home/ondrej/Projects/tsan/libuv/src/unix/async.c:163:5
	#14 uv__io_poll /home/ondrej/Projects/tsan/libuv/src/unix/linux-core.c:462:11
	#15 uv_run /home/ondrej/Projects/tsan/libuv/src/unix/core.c:392:5
	#16 nm_thread lib/isc/netmgr/netmgr.c:597:11
	#17 isc__trampoline_run lib/isc/trampoline.c:184:11

      Mutex M2 previously acquired by the same thread here:
	#0 pthread_mutex_lock <null>
	#1 view_flushanddetach lib/dns/view.c:645:3
	#2 dns_view_flushanddetach lib/dns/view.c:690:2
	#3 shutdown_server bin/named/server.c:10056:4
	#4 task_run lib/isc/task.c:857:5
	#5 isc_task_run lib/isc/task.c:944:10
	#6 isc__nm_async_task lib/isc/netmgr/netmgr.c:730:24
	#7 process_netievent lib/isc/netmgr/netmgr.c
	#8 process_queue lib/isc/netmgr/netmgr.c:885:8
	#9 process_tasks_queue lib/isc/netmgr/netmgr.c:756:10
	#10 process_queues lib/isc/netmgr/netmgr.c:772:7
	#11 async_cb lib/isc/netmgr/netmgr.c:671:2
	#12 uv__async_io /home/ondrej/Projects/tsan/libuv/src/unix/async.c:163:5
	#13 uv__io_poll /home/ondrej/Projects/tsan/libuv/src/unix/linux-core.c:462:11
	#14 uv_run /home/ondrej/Projects/tsan/libuv/src/unix/core.c:392:5
	#15 nm_thread lib/isc/netmgr/netmgr.c:597:11
	#16 isc__trampoline_run lib/isc/trampoline.c:184:11

      Thread T2 (running) created by main thread at:
	#0 pthread_create <null>
	#1 isc_thread_create lib/isc/pthreads/thread.c:79:8
	#2 isc_nm_start lib/isc/netmgr/netmgr.c:303:3
	#3 create_managers bin/named/main.c:957:15
	#4 setup bin/named/main.c:1267:11
	#5 main bin/named/main.c:1558:2

      Thread T2 (running) created by main thread at:
	#0 pthread_create <null>
	#1 isc_thread_create lib/isc/pthreads/thread.c:79:8
	#2 isc_nm_start lib/isc/netmgr/netmgr.c:303:3
	#3 create_managers bin/named/main.c:957:15
	#4 setup bin/named/main.c:1267:11
	#5 main bin/named/main.c:1558:2

    SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) in __interceptor_pthread_mutex_lock
2021-04-19 22:29:14 +02:00
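In the report above, T1 appears to take M1 (in `zone_refreshkeys()`) and then M2 (in `dns_view_findzonecut()`), while T2 takes M2 (in `view_flushanddetach()`) and then M1 (in `dns_zone_flush()`). The general cure for this kind of cycle is a single global lock order. The sketch below illustrates that technique by ordering two pthread mutexes by address. It is not necessarily the change made in this commit, and `lock_first()`, `lock_pair()` and `unlock_pair()` are hypothetical helpers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Pick one global order for any pair of mutexes (here: by address). */
static pthread_mutex_t *
lock_first(pthread_mutex_t *a, pthread_mutex_t *b) {
	return ((uintptr_t)a < (uintptr_t)b) ? a : b;
}

/* Lock both mutexes in the same order regardless of argument order, so
 * no thread can hold M1 while waiting for M2 while another thread holds
 * M2 while waiting for M1. */
static void
lock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
	pthread_mutex_t *first = lock_first(a, b);
	pthread_mutex_t *second = (first == a) ? b : a;

	pthread_mutex_lock(first);
	pthread_mutex_lock(second);
}

static void
unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
	/* Unlock order does not matter for deadlock freedom. */
	pthread_mutex_unlock(a);
	pthread_mutex_unlock(b);
}
```

Another common fix is to release the first lock before calling into code that takes the second one.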
Ondřej Surý
16fe0d1f41 Cleanup the public vs private ISCAPI remnants
Since all the libraries are internal now, just clean up the ISCAPI
remnants in the isc_socket, isc_task and isc_timer APIs.  This means
there's one less layer, as the following changes have been made:

 * struct isc_socket and struct isc_socketmgr have been removed
 * struct isc__socket and struct isc__socketmgr have been renamed
   to struct isc_socket and struct isc_socketmgr
 * struct isc_task and struct isc_taskmgr have been removed
 * struct isc__task and struct isc__taskmgr have been renamed
   to struct isc_task and struct isc_taskmgr
 * struct isc_timer and struct isc_timermgr have been removed
 * struct isc__timer and struct isc__timermgr have been renamed
   to struct isc_timer and struct isc_timermgr
 * All the associated code that dealt with typing isc_<foo>
   to isc__<foo> and back has been removed.
2021-04-19 13:18:24 +02:00
Ondřej Surý
0127ba6472 Fix task timing race in setnsec3param()
When setnsec3param() is scheduled from zone_postload(), there's no
guarantee that `zone->db` has already been set (is non-`NULL`).  Thus,
when setnsec3param() is called, we need to check whether `zone->db`
exists and reschedule the task if it does not, because calling
`rss_post()` on a zone with an empty `.db` ends up as a no-op (the
function just returns).
2021-04-19 11:16:51 +02:00
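A minimal sketch of the "check, and reschedule if not ready" pattern, assuming a simulated task loop in which the database appears after a few ticks. `nsec3_zone_sketch_t`, `setnsec3param_sketch()` and `run_until_posted()` are hypothetical; the real code reschedules a task event rather than spinning in a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed zone: db stays NULL until loading finishes. */
typedef struct {
	void *db;
	int posted; /* how many times the NSEC3PARAM change was posted */
} nsec3_zone_sketch_t;

/* Returns true if the work was done, false if it must be rescheduled. */
static bool
setnsec3param_sketch(nsec3_zone_sketch_t *zone) {
	if (zone->db == NULL) {
		/* Posting now would silently do nothing; try again later. */
		return false;
	}
	zone->posted++;
	return true;
}

/* Simulated task loop: the database becomes available once 'load_after'
 * ticks have passed; the event keeps rescheduling itself until then. */
static int
run_until_posted(nsec3_zone_sketch_t *zone, int load_after, void *db) {
	int tick = 0;

	while (!setnsec3param_sketch(zone)) {
		tick++;
		if (tick >= load_after) {
			zone->db = db;
		}
	}
	return tick;
}
```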
Ondřej Surý
3388ef36b3 Cleanup the isc_<*>mgr_createinc() constructors
Previously, the taskmgr, timermgr and socketmgr had a constructor
variant that would create the mgr on top of an existing appctx.  This
was no longer true, and the variant was just calling
isc_<*>mgr_create() directly without any extra code.

This commit just cleans up the extra function.
2021-04-19 10:22:56 +02:00
Mark Andrews
eadb829dac properly initialise resarg->lock 2021-04-19 14:32:40 +10:00
Evan Hunt
d0ec7d1f33 move samples/resolve.c to bin/tests/system
"resolve" is used by the resolver system tests, and I'm not
certain whether delv exercises the same code, so rather than
remove it, I moved it to bin/tests/system.
2021-04-16 14:29:43 +02:00
Evan Hunt
056afe7bdc remove sample-async
sample code for export libraries is no longer needed and
this code is not used for any internal tests. also, sample-gai.c
had already been removed but there were some dangling references.
2021-04-16 14:29:43 +02:00
Evan Hunt
568d455c99 rename dns_client_createx() to dns_client_create()
there's no longer a need to use an alternate name.
2021-04-16 14:29:43 +02:00
Evan Hunt
1beb05f3e2 remove dns_client_request() and related code
continues the cleanup of dns_client started in the previous commit.
2021-04-16 14:29:43 +02:00
Evan Hunt
fb2a352e7c remove dns_client_update() and related code
the libdns client API is no longer being maintained for
external use, so we can remove the code that isn't being used
internally, as well as the related tests.
2021-04-16 14:29:43 +02:00
Ondřej Surý
55b942b4a0 Refactor dns_journal_rollforward() to work over opened journal
Too much logic was crammed inside dns_journal_rollforward(), which
made it harder to follow.  dns_journal_rollforward() has been
refactored to work over an already opened journal, and some of the
previous logic was moved to a new static zone_journal_rollforward()
that separates the journal "rollforward" logic from the "zone" logic.
2021-04-16 12:04:06 +02:00
Mark Andrews
ec7a9af381 Fixing a recoverable journal should not result in the zone being written
When dns_journal_rollforward() returned ISC_R_RECOVERABLE, the
distinction between 'up to date' and 'success' was lost; as a
consequence, zone_needdump() was called, writing out the zone file
when it shouldn't have been.  This change restores that distinction.
Adjust the system test to reflect the visible changes.
2021-04-16 11:15:46 +02:00
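A minimal sketch of keeping "what happened" separate from "was the journal recoverable", so that a recoverable journal no longer masks the difference between 'up to date' and 'success'. All names here (`rf_result_t`, `rf_outcome_t`, `zone_needs_dump()`, `journal_needs_rewrite()`) are hypothetical and only illustrate the distinction the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of rolling a journal forward. */
typedef enum {
	RF_UPTODATE, /* nothing applied: the zone was already current */
	RF_SUCCESS,  /* transactions were applied to the zone */
} rf_result_t;

typedef struct {
	rf_result_t result;
	bool recoverable; /* journal had fixable problems */
} rf_outcome_t;

/* Only applied changes make the zone file out of date. */
static bool
zone_needs_dump(rf_outcome_t o) {
	return o.result == RF_SUCCESS;
}

/* A recoverable journal only requires the journal itself to be
 * rewritten, not the zone file. */
static bool
journal_needs_rewrite(rf_outcome_t o) {
	return o.recoverable;
}
```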
Artem Boldariev
66432dcd65 Handle a situation when SSL shutdown messages were sent and received
It fixes a corner case which was causing dig to print annoying
messages like:

14-Apr-2021 18:48:37.099 SSL error in BIO: 1 TLS error (errno:
0). Arguments: received_data: (nil), send_data: (nil), finish: false

even when all the data was properly processed.
2021-04-15 15:49:36 +03:00
Artem Boldariev
513cdb52ec TLS: try to close TCP socket descriptor earlier when possible
Before this fix, underlying TCP sockets could remain open for longer
than actually required, causing unit tests to fail with lots of
ISC_R_TOOMANYOPENFILES errors.

The change also enables graceful SSL shutdown (before, it would
happen only when isc_nm_cancelread() was called).
2021-04-15 15:49:36 +03:00
Ondřej Surý
202b1d372d Merge the tls_test.c into netmgr_test.c and extend the tests suite
This commit merges TLS tests into the common Network Manager unit
tests suite and extends the unit test framework to include support for
additional "ping-pong" style tests where all data could be sent via
lesser number of connections (the behaviour of the old test
suite). The tests for TCP and TLS were extended to make use of the new
mode, as this mode better translates to how the code is used in DoH.

Both TLS and TCP tests now share most of the unit tests' code, as they
are expected to function similarly from a users's perspective anyway.

Additionally to the above, the TLS test suite was extended to include
TLS tests using the connections quota facility.
2021-04-15 15:49:36 +03:00
Matthijs Mekking
8fcbef2423 Small refactor lib/dns/zone.c
Introduce some macros that can be reused in 'zone_load_soa_rr()' and
'zone_get_from_db()' to make those functions more readable.
2021-04-13 11:26:26 +02:00
Matthijs Mekking
032110bd2e Use designated initializer in dns_zone_create
Shorten the code and make it less prone to initialisation errors
(it is still easy to forget adding an initializer, but it now defaults
to 0).
2021-04-13 11:26:26 +02:00
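A minimal illustration of why a designated initializer makes a new field default to 0: every member that is not named is zero-initialized. `zone_sketch_t` and its field values are hypothetical and heavily trimmed; they do not reflect BIND's actual zone structure or defaults.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily trimmed zone-like structure. */
typedef struct {
	unsigned int magic;
	uint32_t refresh;
	uint32_t retry;
	uint32_t soattl; /* a newly added field nobody wrote an initializer for */
	const char *name;
} zone_sketch_t;

static zone_sketch_t
zone_sketch_create(const char *name) {
	/* Members not listed here (soattl) are implicitly set to 0. */
	zone_sketch_t zone = {
		.magic = 0x5a4f4e45, /* 'ZONE' */
		.refresh = 3600,     /* illustrative values only */
		.retry = 900,
		.name = name,
	};
	return zone;
}
```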
Matthijs Mekking
9af8caa733 Implement draft-vandijk-dnsop-nsec-ttl
The draft says that the NSEC(3) TTL must be equal to the minimum of
the SOA MINIMUM field and the SOA TTL. This was always the intended
behaviour.

Update the zone structure to also track the SOA TTL. Whenever we
use the MINIMUM value to determine the NSEC(3) TTL, use the minimum
of MINIMUM and SOA TTL instead.

There is no specific test for this; however, two tests needed
adjusting because they otherwise failed: they were testing for NSEC3
records including the TTL. Update these checks to use 600 (the SOA TTL),
rather than 3600 (the SOA MINIMUM).
2021-04-13 11:26:26 +02:00
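The TTL rule from the draft reduces to a simple minimum. `nsec_ttl()` is a hypothetical helper used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* NSEC/NSEC3 TTL: the lesser of the SOA MINIMUM field and the SOA
 * record's own TTL. */
static uint32_t
nsec_ttl(uint32_t soa_minimum, uint32_t soa_ttl) {
	return (soa_minimum < soa_ttl) ? soa_minimum : soa_ttl;
}
```

With the values from the adjusted tests, a SOA MINIMUM of 3600 and a SOA TTL of 600 give an NSEC3 TTL of 600.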
Matthijs Mekking
a83c8cb0af Use stale TTL as RRset TTL in dumpdb
It is more intuitive to have the countdown 'max-stale-ttl' as the
RRset TTL, instead of 0 TTL. This information was already available
in a comment "; stale (will be retained for x more seconds)", but
Support suggested putting it in the TTL field instead.
2021-04-13 09:48:20 +02:00
Matthijs Mekking
debee6157b Check staleness in bind_rdataset
Before binding an RRset, check the time and see if this record is
stale (or perhaps even ancient). Marking a header stale or ancient
happens only when looking up an RRset in cache, but binding an RRset
can also happen on other occasions (for example when dumping the
database).

Check the time and compare it to the header. If according to the
time the entry is stale, but not ancient, set the STALE attribute.
If according to the time it is ancient, set the ANCIENT attribute.

We could mark the header stale or ancient here, but that requires
locking, so that's why we only compare the current time against
the rdh_ttl.

Adjust the test to check the dump-db before querying for data. In the
dumped file the entry should be marked as stale, even though no cache
lookup has happened since the initial query.
2021-04-13 09:48:20 +02:00
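A minimal sketch of classifying a header purely by comparing the current time, without modifying the header (and therefore without locking). It assumes that `rdh_ttl` holds the absolute expiry time in seconds and that the stale window is `max_stale_ttl` seconds long; the exact boundary comparisons (`<` versus `<=`) are assumptions. `hdr_state_t` and `classify_header()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { HDR_ACTIVE, HDR_STALE, HDR_ANCIENT } hdr_state_t;

/* Read-only check: assumes rdh_ttl is the absolute expiry time and
 * max_stale_ttl is how long expired data is retained as stale. */
static hdr_state_t
classify_header(uint32_t now, uint32_t rdh_ttl, uint32_t max_stale_ttl) {
	/* 64-bit sum so the window end cannot wrap around. */
	uint64_t stale_end = (uint64_t)rdh_ttl + max_stale_ttl;

	if (now < rdh_ttl) {
		return HDR_ACTIVE;
	}
	if (now < stale_end) {
		return HDR_STALE;
	}
	return HDR_ANCIENT;
}
```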
Matthijs Mekking
2a5e0232ed Fix nonsensical stale TTL values in cache dump
When introducing change 5149, "rndc dumpdb" started to print a line
above a stale RRset, indicating how long the data will be retained.

At that time, I thought it should also be possible to load
a cache from file. But if a TTL has a value of 0 (because it is stale),
stale entries wouldn't be loaded from file. So, I added the
'max-stale-ttl' to TTL values, and adjusted the $DATE accordingly.

Since we actually don't have a "load cache from file" feature, this
is premature and is causing confusion among operators. This commit
changes the 'max-stale-ttl' adjustments.

A check in the serve-stale system test is added for a non-stale
RRset (longttl.example) to make sure the TTL in cache is sensible.

Also, the comment above stale RRsets could have nonsensical
values. A possible reason why this may happen is when the RRset was
marked as stale but the 'max-stale-ttl' has passed (and is actually an
RRset awaiting cleanup). This would lead to the "will be retained"
value being negative, but since it is stored in a uint32_t, you would
get a nonsensical value (e.g. 4294362497).

To mitigate against this, we now also check if the header is not
ancient. In addition we check if the stale_ttl would be negative, and
if so we set it to 0. Most likely this will not happen because the
header would already have been marked ancient, but there is a possible
race condition where the 'rdh_ttl + serve_stale_ttl' has passed,
but the header has not been checked for staleness.
2021-04-13 09:48:20 +02:00
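The wraparound described above can be reproduced with plain `uint32_t` arithmetic. The value 4294362497 is 2^32 - 604799, which is what an unsigned subtraction yields for a window that ended 604799 seconds ago. The sketch contrasts a naive computation with a clamped one. Both helpers are hypothetical, and `rdh_ttl` is assumed to be an absolute expiry time.

```c
#include <assert.h>
#include <stdint.h>

/* Naive: wraps around once the retention window has already passed. */
static uint32_t
retained_naive(uint32_t now, uint32_t rdh_ttl, uint32_t max_stale_ttl) {
	return rdh_ttl + max_stale_ttl - now;
}

/* Clamped: reports 0 instead of a wrapped "negative" value. */
static uint32_t
retained_clamped(uint32_t now, uint32_t rdh_ttl, uint32_t max_stale_ttl) {
	uint64_t stale_end = (uint64_t)rdh_ttl + max_stale_ttl;

	return (now >= stale_end) ? 0 : (uint32_t)(stale_end - now);
}
```

For example, with `rdh_ttl = 1000` and `max_stale_ttl = 86400`, the window ends at 87400; at `now = 692199` the naive version returns 4294362497 and the clamped one returns 0.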
Michał Kępień
d954e152d9 Free resources when gss_accept_sec_context() fails
Even if a call to gss_accept_sec_context() fails, it might still cause a
GSS-API response token to be allocated and left for the caller to
release.  Make sure the token is released before an early return from
dst_gssapi_acceptctx().
2021-04-08 10:33:44 +02:00
Mark Andrews
0fbdf189c7 Rewrite managed-key journal immediately
Both managed keys and regular zone journals need to be updated
immediately when a recoverable error is discovered.
2021-04-07 20:23:46 +02:00
Mark Andrews
83310ffd92 Update dns_journal_compact() to handle bad transaction headers
Previously, dns_journal_begin_transaction() could reserve the wrong
amount of space.  We now check that the transaction is internally
consistent when upgrading / downgrading a journal and we also handle the
bad transaction headers.
2021-04-07 20:23:46 +02:00
Mark Andrews
520509ac7e Compute transaction size based on journal/transaction type
previously the code assumed that it was a new transaction.
2021-04-07 20:20:57 +02:00
Mark Andrews
5a6112ec8f Use journal_write_xhdr() to write the dummy transaction header
Instead of journal_write(), use the correct formatting call,
journal_write_xhdr(), to write the dummy transaction header.  It looks
at j->header_ver1 to determine which transaction header to write,
instead of always writing a zero-filled journal_rawxhdr_t header.
2021-04-07 20:18:44 +02:00
Artem Boldariev
8da12738f1 Use T_CONNECT timeout constant for TCP tests (instead of 1 ms)
The netmgr_test would be failing on heavily loaded systems because the
connection timeout was set to 1 ms.  Use the global constant instead.
2021-04-07 15:37:10 +02:00
Ondřej Surý
72ef5f465d Refactor async callbacks and fix the double tlsdnsconnect callback
The isc_nm_tlsdnsconnect() call could end up with two connect callbacks
called when the timeout fired and the TCP connection was aborted,
but the TLS handshake was not complete yet.  isc__nm_connecttimeout_cb()
forgot to clean up sock->tls.pending_req when the connect callback was
called with ISC_R_TIMEDOUT, leading to a second callback running later.

A new argument has been added to the isc__nm_*_failed_connect_cb and
isc__nm_*_failed_read_cb functions, to indicate whether the callback
needs to run asynchronously or not.
2021-04-07 15:36:59 +02:00
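A minimal sketch of a failure callback helper that takes an "async" flag: it either invokes the callback immediately or defers it until the caller's stack (and any locks held there) has unwound. The fixed-size `queue_t`, `failed_connect_cb_sketch()` and `queue_drain()` are hypothetical stand-ins for posting an event to the socket's event loop; they are not the netmgr API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QMAX 8 /* hypothetical queue capacity */

typedef void (*cb_t)(void *arg);

/* Hypothetical deferred-call queue standing in for the event loop. */
typedef struct {
	cb_t cb[QMAX];
	void *arg[QMAX];
	size_t n;
} queue_t;

/* Run the callback now, or defer it when async is requested. */
static void
failed_connect_cb_sketch(queue_t *q, cb_t cb, void *arg, bool async) {
	if (!async) {
		cb(arg);
		return;
	}
	assert(q->n < QMAX);
	q->cb[q->n] = cb;
	q->arg[q->n] = arg;
	q->n++;
}

/* Later, from the event loop: run everything that was deferred. */
static void
queue_drain(queue_t *q) {
	for (size_t i = 0; i < q->n; i++) {
		q->cb[i](q->arg[i]);
	}
	q->n = 0;
}

/* Example callback: counts invocations. */
static void
count_cb(void *arg) {
	(*(int *)arg)++;
}
```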
Ondřej Surý
58e75e3ce5 Skip long tls_tests in the CI
We already skip most of the recv_send tests in CI because they are
too timing-sensitive to be run in an overloaded environment.  This commit
adds a similar change to tls_test before we merge tls_test into
netmgr_test.
2021-04-07 15:36:59 +02:00
Artem Boldariev
340235c855 Prevent short TLS tests from hanging in case of errors
The tests in tls_test.c could hang in the event of a connect
error.  This commit allows the tests to bail out when such an
error occurs.
2021-04-07 15:36:59 +02:00
Evan Hunt
426c40c96d rearrange nm_teardown() to check correctness after shutting down
if a test failed at the beginning of nm_teardown(), the function
would abort before isc_nm_destroy() or isc_tlsctx_free() were reached;
we would then abort when nm_setup() was run for the next test case.
rearranging the teardown function prevents this problem.
2021-04-07 15:36:59 +02:00
Ondřej Surý
86f4872dd6 isc_nm_*connect() always return via callback
The isc_nm_*connect() functions were refactored to always return the
connection status via the connect callback instead of sometimes returning
the hard failure directly (for example, when the socket could not be
created, or when the network manager was shutting down).

This commit changes the connect functions in all the network manager
modules, and also makes the necessary refactoring changes in places
where the connect functions are called.
2021-04-07 15:36:59 +02:00
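A minimal sketch of a connect API that returns nothing and reports every outcome, including early failures such as a manager that is shutting down or a socket that cannot be created, through the callback. Callers then need only one code path. The result codes, `mgr_sketch_t`, `connect_sketch()` and `record_cb()` are hypothetical. For brevity the callback is invoked synchronously here, whereas the real code may defer it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes and callback type for this sketch only. */
typedef enum { R_SUCCESS, R_SHUTTINGDOWN, R_NOSOCKET } result_t;
typedef void (*connect_cb_t)(result_t result, void *cbarg);

typedef struct {
	bool closing;      /* network manager is shutting down */
	bool socket_fails; /* simulate socket creation failure */
} mgr_sketch_t;

/* No return value: every outcome reaches cb exactly once. */
static void
connect_sketch(mgr_sketch_t *mgr, connect_cb_t cb, void *cbarg) {
	if (mgr->closing) {
		cb(R_SHUTTINGDOWN, cbarg); /* bail out before connecting */
		return;
	}
	if (mgr->socket_fails) {
		cb(R_NOSOCKET, cbarg);
		return;
	}
	cb(R_SUCCESS, cbarg);
}

/* Example callback: records the last result it saw. */
static void
record_cb(result_t result, void *cbarg) {
	*(result_t *)cbarg = result;
}
```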
Evan Hunt
a70cd026df move UDP connect retries from dig into isc_nm_udpconnect()
dig previously ran isc_nm_udpconnect() three times before giving
up, to work around a FreeBSD bug that caused connect() to return
a spurious transient EADDRINUSE. this commit moves the retry code
into the network manager itself, so that isc_nm_udpconnect() no
longer needs to return a result code.
2021-04-07 15:36:59 +02:00
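A minimal sketch of a bounded retry that repeats only on the transient `EADDRINUSE` and returns any other result immediately. The three-attempt limit mirrors what dig previously did. `connect_fn`, `connect_with_retries()` and the `flaky_connect()` test double are hypothetical; inside the network manager the final outcome would then be delivered through the connect callback.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical connect attempt: returns 0 or an errno value. */
typedef int (*connect_fn)(void *arg);

#define CONNECT_TRIES 3 /* dig previously tried three times */

/* Retry only on the spurious EADDRINUSE; stop on success or any other
 * error, and give up after CONNECT_TRIES attempts. */
static int
connect_with_retries(connect_fn attempt, void *arg) {
	int err = 0;

	for (int i = 0; i < CONNECT_TRIES; i++) {
		err = attempt(arg);
		if (err != EADDRINUSE) {
			break;
		}
	}
	return err;
}

/* Test double: fails with EADDRINUSE a configurable number of times. */
typedef struct {
	int failures_left;
	int calls;
} flaky_t;

static int
flaky_connect(void *arg) {
	flaky_t *f = arg;

	f->calls++;
	if (f->failures_left > 0) {
		f->failures_left--;
		return EADDRINUSE;
	}
	return 0;
}
```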
Ondřej Surý
ca12e25bb0 Use generic functions for reading and timers in TCP
The TCP module has been updated to use the generic functions from
netmgr.c instead of its own local copies.  This brings the module
mostly up to par with the TCPDNS and TLSDNS modules.
2021-04-07 15:36:59 +02:00
Ondřej Surý
7df8c7061c Fix and clean up handling of connect callbacks
Several problems were discovered and fixed after the change in
the connection timeout in the previous commits:

  * In TLSDNS, the connection callback was not called at all under some
    circumstances when the TCP connection had been established, but the
    TLS handshake hadn't been completed yet.  Additional checks have
    been put in place so that tls_cycle() will end early when the
    nmsocket is invalidated by the isc__nm_tlsdns_shutdown() call.

  * In TCP, TCPDNS and TLSDNS, new connections would be established
    even when the network manager was shutting down.  The new
    call isc__nm_closing() has been added and is used to bail out
    early even before uv_tcp_connect() is attempted.
2021-04-07 15:36:59 +02:00
Ondřej Surý
5a87c7372c Make it possible to recover from connect timeouts
Similarly to the read timeout, it's now possible to recover from
ISC_R_TIMEDOUT event by restarting the timer from the connect callback.

The change here also fixes platforms that lack the socket options
to set the TCP connection timeout, by moving the timeout code into user
space.  On platforms that support setting the connect timeout via a
socket option, the timeout has been hardcoded to 2 minutes (the maximum
value of tcp-initial-timeout).
2021-04-07 15:36:58 +02:00
Ondřej Surý
33c00c281f Make it possible to recover from read timeouts
Previously, when the client timed out on read, the client socket would
be automatically closed and destroyed when the nmhandle was detached.
This commit changes the logic so that it's possible for the callback to
recover from the ISC_R_TIMEDOUT event by restarting the timer. This is
done by calling isc_nmhandle_settimeout(), which prevents the timeout
handling code from destroying the socket; instead, it continues to wait
for data.

One specific use case for multiple timeouts is serve-stale: the client
socket could be created with a shorter timeout (as specified with
stale-answer-client-timeout), so we can serve the requestor a stale
answer, but keep the original query running for a longer time.
2021-04-07 15:36:58 +02:00
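A minimal sketch of recovering from a read timeout by re-arming the timer from the callback, modelled on the serve-stale use case: the first (short) timeout does its early work and keeps waiting, and a later timeout gives up. `handle_sketch_t`, `handle_settimeout()` (a stand-in for `isc_nmhandle_settimeout()`, whose real signature differs) and `on_read_timeout()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical handle state for this sketch. */
typedef struct {
	bool timer_restarted;
	bool stale_served;
	bool closed;
} handle_sketch_t;

/* Stand-in for isc_nmhandle_settimeout(): re-arming the timer tells the
 * timeout handling code not to destroy the socket. */
static void
handle_settimeout(handle_sketch_t *h) {
	h->timer_restarted = true;
}

/* Read callback invoked with ISC_R_TIMEDOUT. */
static void
on_read_timeout(handle_sketch_t *h) {
	h->timer_restarted = false;
	if (!h->stale_served) {
		h->stale_served = true; /* e.g. answer the client from stale data */
		handle_settimeout(h);   /* ...but keep waiting for the real answer */
	}
	if (!h->timer_restarted) {
		h->closed = true; /* no recovery: the socket is torn down */
	}
}
```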
Ondřej Surý
0aad979175 Disable netmgr tests only when running under CI
The full netmgr test suite is unstable when run in CI due to various
timing issues.  Previously, we enabled the full test suite only when
the CI_ENABLE_ALL_TESTS environment variable was set, but that went
against the original intent of running the full suite when an
individual developer runs it locally.

This change disables the full test suite only when running in the CI
and CI_ENABLE_ALL_TESTS is not set.
2021-04-07 15:36:58 +02:00
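The gating rule reduces to a small boolean expression. The sketch assumes that the CI environment exports a `CI` variable (GitLab CI does) and that any set value of `CI_ENABLE_ALL_TESTS` counts as enabled. Both `run_full_suite()` and `run_full_suite_from_env()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Run everything unless we are in CI and nobody asked for the full
 * suite there. */
static bool
run_full_suite(bool in_ci, bool enable_all_in_ci) {
	return !in_ci || enable_all_in_ci;
}

/* Environment wrapper: assumes "CI" is set by the CI system and treats
 * any value of CI_ENABLE_ALL_TESTS as "enabled". */
static bool
run_full_suite_from_env(void) {
	return run_full_suite(getenv("CI") != NULL,
			      getenv("CI_ENABLE_ALL_TESTS") != NULL);
}
```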
Diego Fronza
6e08307bc8 Resolve TSAN data race in zone_maintenance
Fix a race between the zone_maintenance and dns_zone_notifyreceive
functions: zone_maintenance was attempting to read a zone flag by
calling DNS_ZONE_FLAG(zone, flag) while dns_zone_notifyreceive was
updating a flag in the same zone by calling DNS_ZONE_SETFLAG(zone, ...).

The code reading the flag in zone_maintenance was not protected by the
zone's lock.  To avoid the race, the zone's lock is now acquired
before an attempt to read the zone flag is made.
2021-04-07 12:04:01 +00:00
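A minimal sketch of the fix's principle: the reader takes the same lock the writer holds, so reading a flag word cannot race with setting a bit in it. `flag_zone_t`, `ZF_NEEDNOTIFY`, `zone_setflag()` and `zone_testflag()` are hypothetical stand-ins, not BIND's `DNS_ZONE_FLAG()`/`DNS_ZONE_SETFLAG()` macros.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define ZF_NEEDNOTIFY 0x1U /* hypothetical flag bit */

typedef struct {
	pthread_mutex_t lock;
	unsigned int flags;
} flag_zone_t;

/* Writer: sets a flag bit while holding the zone lock. */
static void
zone_setflag(flag_zone_t *zone, unsigned int flag) {
	pthread_mutex_lock(&zone->lock);
	zone->flags |= flag;
	pthread_mutex_unlock(&zone->lock);
}

/* Reader: takes the same lock, so it never races with the writer. */
static bool
zone_testflag(flag_zone_t *zone, unsigned int flag) {
	bool set;

	pthread_mutex_lock(&zone->lock);
	set = (zone->flags & flag) != 0;
	pthread_mutex_unlock(&zone->lock);
	return set;
}
```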
Michał Kępień
6bdd55a9b3 Enforce a run time limit on unit test binaries
When a unit test binary hangs, the GitLab CI job in which it is run is
stuck until its run time limit is exceeded.  Furthermore, it is not
trivial to determine which test(s) hung in a given GitLab CI job based
on its log.  To prevent these issues, enforce a run time limit on every
binary executed by the lib/unit-test-driver.sh script.  Use a timeout of
5 minutes for consistency with older BIND 9 branches, which employed
Kyua for running unit tests.  Report an exit code of 124 when the run
time limit is exceeded for a unit test binary, for consistency with the
"timeout" tool included in GNU coreutils.
2021-04-07 11:41:45 +02:00
Artem Boldariev
ee10948e2d Remove dead code which was supposed to handle TLS shutdowns nicely
Fixes Coverity issue CID 330954 (See #2612).
2021-04-07 11:21:08 +03:00
Artem Boldariev
e6062210c7 Handle buggy situations with SSL_ERROR_SYSCALL
See "BUGS" section at:

https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html

It is mentioned there that when TLS status equals SSL_ERROR_SYSCALL
AND errno == 0 it means that underlying transport layer returned EOF
prematurely.  However, we are managing the transport ourselves, so we
should just resume reading from the TCP socket.

It seems that this case has been handled properly on modern versions
of OpenSSL. That being said, the situation is in line with the
manual: it is briefly mentioned there that SSL_ERROR_SYSCALL might be
returned not only in the case of low-level errors (like system call
failures).
2021-04-07 11:21:08 +03:00
Mark Andrews
9c28df2204 remove lib/dns/gen when running 'make clean' 2021-04-07 08:06:49 +10:00
Matthijs Mekking
3d3a6415f7 If RPZ config'd, bail stale-answer-client-timeout
When we are recursing, RPZ processing is not allowed. But when we are
performing a lookup due to "stale-answer-client-timeout", we are still
recursing. This effectively means that RPZ processing is disabled on
such a lookup.

In this case, bail the "stale-answer-client-timeout" lookup and wait
for recursion to complete, as we can't perform the RPZ rewrite
rules reliably.
2021-04-02 10:02:40 +02:00
Matthijs Mekking
839df94190 Rename "staleonly"
The dboption DNS_DBFIND_STALEONLY caused confusion because it implies
we are looking for stale data **only** and ignore any active RRsets in
the cache. Rename it to DNS_DBFIND_STALETIMEOUT, as that makes it
clearer that the option is related to a lookup due to
"stale-answer-client-timeout".

Rename other usages of "staleonly", instead use "lookup due to...".
Also rename related function and variable names.
2021-04-02 10:02:40 +02:00
Matthijs Mekking
3f81d79ffb Restore the RECURSIONOK attribute after staleonly
When doing a staleonly lookup we don't want to fall back to recursion.
After all, there are obviously problems with recursion, otherwise we
wouldn't do a staleonly lookup.

When resuming from recursion however, we should restore the
RECURSIONOK flag, allowing future required lookups for this client
to recurse.
2021-04-02 10:02:40 +02:00
Matthijs Mekking
aaed7f9d8c Remove result exception on staleonly lookup
When implementing "stale-answer-client-timeout", we decided that
we should only return positive answers prematurely to clients. A
negative response is not useful, and in that case it is better to
wait for the recursion to complete.

To do so, we check the result and if it is not ISC_R_SUCCESS, we
decide that it is not good enough. However, there are more return
codes that could lead to a positive answer (e.g. CNAME chains).

This commit removes the exception and now uses the same logic that
other stale lookups use to determine if we found a useful stale
answer (stale_found == true).

This means we can simplify two test cases in the serve-stale system
test: nodata.example is no longer treated differently than data.example.
2021-04-02 10:02:40 +02:00
Matthijs Mekking
3d5429f61f Remove INSIST on NS_QUERYATTR_ANSWERED
The NS_QUERYATTR_ANSWERED attribute is to prevent sending a response
twice. Without the attribute, this may happen if a staleonly lookup
found a useful answer and sends a response to the client, and later
recursion ends and also tries to send a response.

The attribute was also used to mask adding a duplicate RRset. This is
considered harmful. When we created a response to the client with a
stale only lookup (regardless of whether we actually sent the response),
we should clear the rdatasets that were added during that lookup.

Mark such rdatasets with a new attribute,
DNS_RDATASETATTR_STALE_ADDED. Set a query attribute
NS_QUERYATTR_STALEOK if we may have added rdatasets during a stale
only lookup. Before creating a response on a normal lookup, check if
we can expect rdatasets to have been added during a staleonly lookup.
If so, clear the rdatasets from the message with the attribute
DNS_RDATASETATTR_STALE_ADDED set.
2021-04-02 09:15:07 +02:00
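A minimal sketch of the clearing step: when the query may have added rdatasets during a stale-timeout lookup (the STALEOK condition), drop every rdataset carrying the "stale added" bit before building the normal response. The bit value `RDS_ATTR_STALE_ADDED`, the flat array `rds_sketch_t` and `clear_stale_added()` are hypothetical simplifications of BIND's message sections and attributes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RDS_ATTR_STALE_ADDED 0x1U /* hypothetical bit, not BIND's value */

typedef struct {
	const char *name;
	unsigned int attributes;
} rds_sketch_t;

/* Compacts the section in place, dropping rdatasets that were added
 * during the stale-timeout lookup; returns the new count. */
static size_t
clear_stale_added(rds_sketch_t *rds, size_t count, bool staleok) {
	size_t kept = 0;

	if (!staleok) {
		return count; /* no stale lookup can have added anything */
	}
	for (size_t i = 0; i < count; i++) {
		if ((rds[i].attributes & RDS_ATTR_STALE_ADDED) == 0) {
			rds[kept++] = rds[i];
		}
	}
	return kept;
}
```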
Matthijs Mekking
48b0dc159b Simplify when to detach the client
With stale-answer-client-timeout, we may send a response to the client,
but we may want to hold on to the network manager handle, because
recursion is going on in the background, or we need to refresh a
stale RRset.

Simplify the setting of 'nodetach':
* During a staleonly lookup we should not detach the nmhandle, so just
  set it prior to 'query_lookup()'.
* During a staleonly "stalefirst" lookup set the 'nodetach' to true
  if we are going to refresh the RRset.

Now there is no longer the need to clear the 'nodetach' if we go
through the "dbfind_stale", "stale_refresh_window", or "stale_only"
paths.
2021-04-02 09:14:09 +02:00