bind9

Author	SHA1	Message	Date
Ondřej Surý	6cf6de55bc	Prevent the double xfrin_fail() call When we are reading from the xfrin socket and the transfer is shut down, the shutdown function calls `xfrin_fail()`, which in turn calls `xfrin_cancelio()`. That causes the read callback to be invoked with the `ISC_R_CANCELED` status code, which caused yet another `xfrin_fail()` call. The fix here is to ensure that `xfrin_fail()` runs only once by properly synchronizing on the xfr->shuttingdown flag.	2021-04-20 14:12:26 +02:00
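The "run only once" idea in the commit above can be sketched with a C11 atomic flag. This is a minimal illustration, not the actual xfrin.c code: the `xfr_sketch_t` type and both function names are hypothetical, and the commit message does not say which primitive BIND uses to synchronize on `xfr->shuttingdown`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the transfer context (not dns_xfrin_ctx_t). */
typedef struct xfr_sketch {
	atomic_bool shuttingdown; /* set once, by whoever fails first */
	int fail_count;		  /* how often the failure path really ran */
} xfr_sketch_t;

static void
xfr_sketch_init(xfr_sketch_t *xfr) {
	atomic_init(&xfr->shuttingdown, false);
	xfr->fail_count = 0;
}

/*
 * Returns true only for the first caller. A second call, for example
 * from a read callback that was woken with a "canceled" status by the
 * first call, sees the flag already set and returns without running
 * the failure path again.
 */
static bool
xfr_sketch_fail(xfr_sketch_t *xfr) {
	bool expected = false;

	if (!atomic_compare_exchange_strong(&xfr->shuttingdown, &expected,
					    true)) {
		return (false); /* somebody else is already shutting down */
	}

	xfr->fail_count++; /* cancel I/O, report the error, etc. */
	return (true);
}
```

Calling `xfr_sketch_fail()` twice on the same object runs the failure path once; the second call is a no-op.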
Ondřej Surý	25d27851d8	Fix lock-order-inversion (potential deadlock) in dns_resolver_createfetch There's a lock-order-inversion when running `zone_maintenance()` from the timer while shutting down the server via `shutdown_server()`. This only happens when the taskmgr scheduling is more relaxed and parallelized, but the issue is real nevertheless. The associated ThreadSanitizer warning:
    WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock)
      Cycle in lock order graph: M1 (0x000000000001) => M2 (0x000000000000) => M1
      Mutex M2 acquired here while holding mutex M1 in thread T1:
        #0 pthread_mutex_lock <null>
        #1 dns_view_findzonecut lib/dns/view.c:1326:2
        #2 fctx_create lib/dns/resolver.c:5144:13
        #3 dns_resolver_createfetch lib/dns/resolver.c:10977:12
        #4 zone_refreshkeys lib/dns/zone.c:10830:13
        #5 zone_maintenance lib/dns/zone.c:11065:5
        #6 zone_timer lib/dns/zone.c:14652:2
        #7 task_run lib/isc/task.c:857:5
        #8 isc_task_run lib/isc/task.c:944:10
        #9 isc__nm_async_task lib/isc/netmgr/netmgr.c:730:24
        #10 process_netievent lib/isc/netmgr/netmgr.c
        #11 process_queue lib/isc/netmgr/netmgr.c:885:8
        #12 process_tasks_queue lib/isc/netmgr/netmgr.c:756:10
        #13 process_queues lib/isc/netmgr/netmgr.c:772:7
        #14 async_cb lib/isc/netmgr/netmgr.c:671:2
        #15 uv__async_io /home/ondrej/Projects/tsan/libuv/src/unix/async.c:163:5
        #16 uv__io_poll /home/ondrej/Projects/tsan/libuv/src/unix/linux-core.c:462:11
        #17 uv_run /home/ondrej/Projects/tsan/libuv/src/unix/core.c:392:5
        #18 nm_thread lib/isc/netmgr/netmgr.c:597:11
        #19 isc__trampoline_run lib/isc/trampoline.c:184:11
      Mutex M1 previously acquired by the same thread here:
        #0 pthread_mutex_lock <null>
        #1 zone_refreshkeys lib/dns/zone.c:10717:2
        #2 zone_maintenance lib/dns/zone.c:11065:5
        #3 zone_timer lib/dns/zone.c:14652:2
        #4 task_run lib/isc/task.c:857:5
        #5 isc_task_run lib/isc/task.c:944:10
        #6 isc__nm_async_task lib/isc/netmgr/netmgr.c:730:24
        #7 process_netievent lib/isc/netmgr/netmgr.c
        #8 process_queue lib/isc/netmgr/netmgr.c:885:8
        #9 process_tasks_queue lib/isc/netmgr/netmgr.c:756:10
        #10 process_queues lib/isc/netmgr/netmgr.c:772:7
        #11 async_cb lib/isc/netmgr/netmgr.c:671:2
        #12 uv__async_io /home/ondrej/Projects/tsan/libuv/src/unix/async.c:163:5
        #13 uv__io_poll /home/ondrej/Projects/tsan/libuv/src/unix/linux-core.c:462:11
        #14 uv_run /home/ondrej/Projects/tsan/libuv/src/unix/core.c:392:5
        #15 nm_thread lib/isc/netmgr/netmgr.c:597:11
        #16 isc__trampoline_run lib/isc/trampoline.c:184:11
      Mutex M1 acquired here while holding mutex M2 in thread T2:
        #0 pthread_mutex_lock <null>
        #1 dns_zone_flush lib/dns/zone.c:11443:2
        #2 view_flushanddetach lib/dns/view.c:657:5
        #3 dns_view_flushanddetach lib/dns/view.c:690:2
        #4 shutdown_server bin/named/server.c:10056:4
        #5 task_run lib/isc/task.c:857:5
        #6 isc_task_run lib/isc/task.c:944:10
        #7 isc__nm_async_task lib/isc/netmgr/netmgr.c:730:24
        #8 process_netievent lib/isc/netmgr/netmgr.c
        #9 process_queue lib/isc/netmgr/netmgr.c:885:8
        #10 process_tasks_queue lib/isc/netmgr/netmgr.c:756:10
        #11 process_queues lib/isc/netmgr/netmgr.c:772:7
        #12 async_cb lib/isc/netmgr/netmgr.c:671:2
        #13 uv__async_io /home/ondrej/Projects/tsan/libuv/src/unix/async.c:163:5
        #14 uv__io_poll /home/ondrej/Projects/tsan/libuv/src/unix/linux-core.c:462:11
        #15 uv_run /home/ondrej/Projects/tsan/libuv/src/unix/core.c:392:5
        #16 nm_thread lib/isc/netmgr/netmgr.c:597:11
        #17 isc__trampoline_run lib/isc/trampoline.c:184:11
      Mutex M2 previously acquired by the same thread here:
        #0 pthread_mutex_lock <null>
        #1 view_flushanddetach lib/dns/view.c:645:3
        #2 dns_view_flushanddetach lib/dns/view.c:690:2
        #3 shutdown_server bin/named/server.c:10056:4
        #4 task_run lib/isc/task.c:857:5
        #5 isc_task_run lib/isc/task.c:944:10
        #6 isc__nm_async_task lib/isc/netmgr/netmgr.c:730:24
        #7 process_netievent lib/isc/netmgr/netmgr.c
        #8 process_queue lib/isc/netmgr/netmgr.c:885:8
        #9 process_tasks_queue lib/isc/netmgr/netmgr.c:756:10
        #10 process_queues lib/isc/netmgr/netmgr.c:772:7
        #11 async_cb lib/isc/netmgr/netmgr.c:671:2
        #12 uv__async_io /home/ondrej/Projects/tsan/libuv/src/unix/async.c:163:5
        #13 uv__io_poll /home/ondrej/Projects/tsan/libuv/src/unix/linux-core.c:462:11
        #14 uv_run /home/ondrej/Projects/tsan/libuv/src/unix/core.c:392:5
        #15 nm_thread lib/isc/netmgr/netmgr.c:597:11
        #16 isc__trampoline_run lib/isc/trampoline.c:184:11
      Thread T2 (running) created by main thread at:
        #0 pthread_create <null>
        #1 isc_thread_create lib/isc/pthreads/thread.c:79:8
        #2 isc_nm_start lib/isc/netmgr/netmgr.c:303:3
        #3 create_managers bin/named/main.c:957:15
        #4 setup bin/named/main.c:1267:11
        #5 main bin/named/main.c:1558:2
      Thread T2 (running) created by main thread at:
        #0 pthread_create <null>
        #1 isc_thread_create lib/isc/pthreads/thread.c:79:8
        #2 isc_nm_start lib/isc/netmgr/netmgr.c:303:3
        #3 create_managers bin/named/main.c:957:15
        #4 setup bin/named/main.c:1267:11
        #5 main bin/named/main.c:1558:2
    SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) in __interceptor_pthread_mutex_lock	2021-04-19 22:29:14 +02:00
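The cycle in the report (one thread takes M1 then M2, another takes M2 then M1) is the classic lock-order inversion. One general way to rule it out is to always acquire a given pair of mutexes in a single global order. The sketch below illustrates that technique only; the commit message does not describe how the fix was made, and `lock_pair()` and `unlock_pair()` are hypothetical helpers, not BIND functions.

```c
#include <pthread.h>
#include <stdint.h>

/*
 * Lock two distinct mutexes in a fixed global order (lowest address
 * first), regardless of the order in which the caller names them.
 * Returns the mutex that was locked first. The caller must pass two
 * different mutexes.
 */
static pthread_mutex_t *
lock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
	if ((uintptr_t)a > (uintptr_t)b) {
		pthread_mutex_t *tmp = a;
		a = b;
		b = tmp;
	}
	pthread_mutex_lock(a);
	pthread_mutex_lock(b);
	return (a);
}

/* Unlock order does not matter for deadlock avoidance. */
static void
unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
	pthread_mutex_unlock(a);
	pthread_mutex_unlock(b);
}
```

With such a helper, a thread that names the zone lock first and a thread that names the view lock first still end up acquiring the two mutexes in the same order, so the M1 => M2 => M1 cycle cannot form.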
Ondřej Surý	16fe0d1f41	Cleanup the public vs private ISCAPI remnants Since all the libraries are internal now, just cleanup the ISCAPI remnants in isc_socket, isc_task and isc_timer APIs. This means, there's one less layer as following changes have been done: * struct isc_socket and struct isc_socketmgr have been removed * struct isc__socket and struct isc__socketmgr have been renamed to struct isc_socket and struct isc_socketmgr * struct isc_task and struct isc_taskmgr have been removed * struct isc__task and struct isc__taskmgr have been renamed to struct isc_task and struct isc_taskmgr * struct isc_timer and struct isc_timermgr have been removed * struct isc__timer and struct isc__timermgr have been renamed to struct isc_timer and struct isc_timermgr * All the associated code that dealt with typing isc_<foo> to isc__<foo> and back has been removed.	2021-04-19 13:18:24 +02:00
Ondřej Surý	0127ba6472	Fix task timing race in setnsec3param() When setnsec3param() is scheduled from zone_postload(), there's no guarantee that `zone->db` is not `NULL` yet. Thus when setnsec3param() is called, we need to check that `zone->db` exists and reschedule the task if it does not, because calling `rss_post()` on a zone with an empty `.db` ends up as a no-op (the function just returns).	2021-04-19 11:16:51 +02:00
Ondřej Surý	3388ef36b3	Cleanup the isc_<*>mgr_createinc() constructors Previously, the taskmgr, timermgr and socketmgr had a constructor variant that would create the mgr on top of an existing appctx. This was no longer true and isc_<*>mgr was just calling isc_<*>mgr_create() directly without any extra code. This commit just cleans up the extra function.	2021-04-19 10:22:56 +02:00
Mark Andrews	eadb829dac	properly initialise resarg->lock	2021-04-19 14:32:40 +10:00
Evan Hunt	d0ec7d1f33	move samples/resolve.c to bin/tests/system "resolve" is used by the resolver system tests, and I'm not certain whether delv exercises the same code, so rather than remove it, I moved it to bin/tests/system.	2021-04-16 14:29:43 +02:00
Evan Hunt	056afe7bdc	remove sample-async sample code for export libraries is no longer needed and this code is not used for any internal tests. also, sample-gai.c had already been removed but there were some dangling references.	2021-04-16 14:29:43 +02:00
Evan Hunt	568d455c99	rename dns_client_createx() to dns_client_create() there's no longer a need to use an alternate name.	2021-04-16 14:29:43 +02:00
Evan Hunt	1beb05f3e2	remove dns_client_request() and related code continues the cleanup of dns_client started in the previous commit.	2021-04-16 14:29:43 +02:00
Evan Hunt	fb2a352e7c	remove dns_client_update() and related code the libdns client API is no longer being maintained for external use, we can remove the code that isn't being used internally, as well as the related tests.	2021-04-16 14:29:43 +02:00
Ondřej Surý	55b942b4a0	Refactor dns_journal_rollforward() to work over opened journal Too much logic was crammed into dns_journal_rollforward(), which made it harder to follow. dns_journal_rollforward() was refactored to work over an already opened journal, and some of the previous logic was moved to a new static zone_journal_rollforward() that separates the journal "rollforward" logic from the "zone" logic.	2021-04-16 12:04:06 +02:00
Mark Andrews	ec7a9af381	Fixing a recoverable journal should not result in the zone being written when dns_journal_rollforward returned ISC_R_RECOVERABLE the distinction between 'up to date' and 'success' was lost; as a consequence zone_needdump() was called, writing out the zone file when it shouldn't have been. This change restores that distinction. Adjust system test to reflect visible changes.	2021-04-16 11:15:46 +02:00
Artem Boldariev	66432dcd65	Handle a situation when SSL shutdown messages were sent and received It fixes a corner case which was causing dig to print annoying messages like: 14-Apr-2021 18:48:37.099 SSL error in BIO: 1 TLS error (errno: 0). Arguments: received_data: (nil), send_data: (nil), finish: false even when all the data was properly processed.	2021-04-15 15:49:36 +03:00
Artem Boldariev	513cdb52ec	TLS: try to close TCP socket descriptor earlier when possible Before this fix, underlying TCP sockets could remain open for longer than actually required, causing unit tests to fail with lots of ISC_R_TOOMANYOPENFILES errors. The change also enables graceful SSL shutdown (before that it would happen only when isc_nm_cancelread() was called).	2021-04-15 15:49:36 +03:00
Ondřej Surý	202b1d372d	Merge the tls_test.c into netmgr_test.c and extend the tests suite This commit merges TLS tests into the common Network Manager unit tests suite and extends the unit test framework to include support for additional "ping-pong" style tests where all data could be sent via fewer connections (the behaviour of the old test suite). The tests for TCP and TLS were extended to make use of the new mode, as this mode better translates to how the code is used in DoH. Both TLS and TCP tests now share most of the unit tests' code, as they are expected to function similarly from a user's perspective anyway. In addition to the above, the TLS test suite was extended to include TLS tests using the connections quota facility.	2021-04-15 15:49:36 +03:00
Matthijs Mekking	8fcbef2423	Small refactor lib/dns/zone.c Introduce some macros that can be reused in 'zone_load_soa_rr()' and 'zone_get_from_db()' to make those functions more readable.	2021-04-13 11:26:26 +02:00
Matthijs Mekking	032110bd2e	Use designated initializer in dns_zone_create Shorten the code and make it less prone to initialisation errors (it is still easy to forget adding an initializer, but it now defaults to 0).	2021-04-13 11:26:26 +02:00
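To make the point about defaults concrete: with a C99 designated initializer, every member that is not named is zero-initialized. The sketch below uses a made-up `zone_sketch_t` with hypothetical fields and values; it is not the real `dns_zone_t` layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much smaller stand-in for a zone structure. */
typedef struct zone_sketch {
	uint32_t magic;
	uint32_t refresh;
	uint32_t retry;
	bool loaded;
	void *db;
} zone_sketch_t;

static zone_sketch_t
zone_sketch_create(void) {
	/*
	 * Only the non-zero members are spelled out. 'retry', 'loaded'
	 * and 'db' are not mentioned, so they start as 0, false and
	 * NULL instead of holding indeterminate values.
	 */
	zone_sketch_t zone = {
		.magic = 0x5a4f4e45, /* arbitrary illustrative value */
		.refresh = 3600,
	};

	return (zone);
}
```

Forgetting a member in this style yields a zeroed field rather than uninitialized memory, which is the safety net the commit message refers to.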
Matthijs Mekking	9af8caa733	Implement draft-vandijk-dnsop-nsec-ttl The draft says that the NSEC(3) TTL must have the same TTL value as the minimum of the SOA MINIMUM field and the SOA TTL. This was always the intended behaviour. Update the zone structure to also track the SOA TTL. Whenever we use the MINIMUM value to determine the NSEC(3) TTL, use the minimum of MINIMUM and SOA TTL instead. There is no specific test for this, however two tests need adjusting because otherwise they failed: They were testing for NSEC3 records including the TTL. Update these checks to use 600 (the SOA TTL), rather than 3600 (the SOA MINIMUM).	2021-04-13 11:26:26 +02:00
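The rule stated in the commit above reduces to taking the smaller of two values. A minimal sketch, with a hypothetical function name (the commit does not name the helper used in zone.c):

```c
#include <stdint.h>

/*
 * TTL for NSEC/NSEC3 records: the minimum of the SOA MINIMUM field
 * and the TTL of the SOA record itself.
 */
static uint32_t
nsec_ttl_sketch(uint32_t soa_minimum, uint32_t soa_ttl) {
	return (soa_minimum < soa_ttl ? soa_minimum : soa_ttl);
}
```

With the values from the adjusted tests (SOA MINIMUM 3600, SOA TTL 600), `nsec_ttl_sketch(3600, 600)` yields 600.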
Matthijs Mekking	a83c8cb0af	Use stale TTL as RRset TTL in dumpdb It is more intuitive to have the countdown 'max-stale-ttl' as the RRset TTL, instead of 0 TTL. This information was already available in a comment "; stale (will be retained for x more seconds", but Support suggested to put it in the TTL field instead.	2021-04-13 09:48:20 +02:00
Matthijs Mekking	debee6157b	Check staleness in bind_rdataset Before binding an RRset, check the time and see if this record is stale (or perhaps even ancient). Marking a header stale or ancient happens only when looking up an RRset in cache, but binding an RRset can also happen on other occasions (for example when dumping the database). Check the time and compare it to the header. If according to the time the entry is stale, but not ancient, set the STALE attribute. If according to the time it is ancient, set the ANCIENT attribute. We could mark the header stale or ancient here, but that requires locking, so that's why we only compare the current time against the rdh_ttl. Adjust the test to check the dump-db before querying for data. In the dumped file the entry should be marked as stale, even though no cache lookup has happened since the initial query.	2021-04-13 09:48:20 +02:00
Matthijs Mekking	2a5e0232ed	Fix nonsensical stale TTL values in cache dump When introducing change 5149, "rndc dumpdb" started to print a line above a stale RRset, indicating how long the data will be retained. At that time, I thought it should also be possible to load a cache from file. But if a TTL has a value of 0 (because it is stale), stale entries wouldn't be loaded from file. So, I added the 'max-stale-ttl' to TTL values, and adjusted the $DATE accordingly. Since we actually don't have a "load cache from file" feature, this is premature and is causing confusion among operators. This commit changes the 'max-stale-ttl' adjustments. A check in the serve-stale system test is added for a non-stale RRset (longttl.example) to make sure the TTL in cache is sensible. Also, the comment above stale RRsets could have nonsensical values. A possible reason why this may happen is when the RRset was marked as stale but the 'max-stale-ttl' has passed (and it is actually an RRset awaiting cleanup). This would lead to the "will be retained" value being negative, but since it is stored in a uint32_t, you would get a nonsensical value (e.g. 4294362497). To mitigate this, we now also check that the header is not ancient. In addition, we check whether the stale_ttl would be negative, and if so we set it to 0. Most likely this will not happen because the header would already have been marked ancient, but there is a possible race condition where the 'rdh_ttl + serve_stale_ttl' has passed, but the header has not been checked for staleness.	2021-04-13 09:48:20 +02:00
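The time comparison described above can be sketched as a pure function that needs no locking because it only reads its arguments. Everything here is illustrative: the enum, the function name, and the exact boundary conditions (`<` versus `<=`) are assumptions, and `expire` merely plays the role the commit assigns to `rdh_ttl`.

```c
#include <stdint.h>

typedef enum {
	RRSET_ACTIVE,  /* TTL has not run out yet */
	RRSET_STALE,   /* expired, still inside the stale window */
	RRSET_ANCIENT  /* expired and past the stale window */
} rrset_age_t;

/*
 * 'expire' is the absolute time at which the RRset's TTL runs out,
 * 'serve_stale_ttl' is how long it may be kept after that, and 'now'
 * is the current time, all in seconds.
 */
static rrset_age_t
classify_rrset_sketch(uint32_t expire, uint32_t serve_stale_ttl,
		      uint32_t now) {
	if (now < expire) {
		return (RRSET_ACTIVE);
	}
	/* 64-bit sum so that expire + serve_stale_ttl cannot wrap. */
	if ((uint64_t)now < (uint64_t)expire + serve_stale_ttl) {
		return (RRSET_STALE);
	}
	return (RRSET_ANCIENT);
}
```

A caller that binds an RRset while dumping the database could use such a check to decide which attribute to set without touching the shared header.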
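The nonsensical values come from unsigned wrap-around: subtracting a larger value from a smaller one in `uint32_t` arithmetic yields a number close to 2^32. A minimal sketch of the clamp, assuming `expire` stands for `rdh_ttl` as an absolute time in seconds; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Seconds for which a stale RRset will still be retained.
 * Returns 0 once 'expire + serve_stale_ttl' has passed, instead of
 * letting the unsigned subtraction wrap around.
 */
static uint32_t
stale_ttl_remaining_sketch(uint32_t expire, uint32_t serve_stale_ttl,
			   uint32_t now) {
	/* 64-bit sum so that the addition itself cannot wrap. */
	uint64_t retain_until = (uint64_t)expire + serve_stale_ttl;
	uint64_t remaining;

	if (retain_until <= now) {
		return (0); /* would have been "negative" */
	}

	remaining = retain_until - now;
	return (remaining > UINT32_MAX ? UINT32_MAX : (uint32_t)remaining);
}
```

For example, with `expire = 1000` and `serve_stale_ttl = 3600`, the function returns 2600 at `now = 2000` and 0 at `now = 5000`, where naive `uint32_t` arithmetic would wrap to a value in the billions.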
Michał Kępień	d954e152d9	Free resources when gss_accept_sec_context() fails Even if a call to gss_accept_sec_context() fails, it might still cause a GSS-API response token to be allocated and left for the caller to release. Make sure the token is released before an early return from dst_gssapi_acceptctx().	2021-04-08 10:33:44 +02:00
Mark Andrews	0fbdf189c7	Rewrite managed-key journal immediately Both managed keys and regular zone journals need to be updated immediately when a recoverable error is discovered.	2021-04-07 20:23:46 +02:00
Mark Andrews	83310ffd92	Update dns_journal_compact() to handle bad transaction headers Previously, dns_journal_begin_transaction() could reserve the wrong amount of space. We now check that the transaction is internally consistent when upgrading / downgrading a journal and we also handle the bad transaction headers.	2021-04-07 20:23:46 +02:00
Mark Andrews	520509ac7e	Compute transaction size based on journal/transaction type previously the code assumed that it was a new transaction.	2021-04-07 20:20:57 +02:00
Mark Andrews	5a6112ec8f	Use journal_write_xhdr() to write the dummy transaction header Instead of journal_write(), use correct format call journal_write_xhdr() to write the dummy transaction header which looks at j->header_ver1 to determine which transaction header to write instead of always writing a zero filled journal_rawxhdr_t header.	2021-04-07 20:18:44 +02:00
Artem Boldariev	8da12738f1	Use T_CONNECT timeout constant for TCP tests (instead of 1 ms) The netmgr_test would be failing on heavily loaded systems because the connection timeout was set to 1 ms. Use the global constant instead.	2021-04-07 15:37:10 +02:00
Ondřej Surý	72ef5f465d	Refactor async callbacks and fix the double tlsdnsconnect callback The isc_nm_tlsdnsconnect() call could end up with two connect callbacks called when the timeout fired and the TCP connection was aborted, but the TLS handshake was not complete yet. isc__nm_connecttimeout_cb() forgot to clean up sock->tls.pending_req when the connect callback was called with ISC_R_TIMEDOUT, leading to a second callback running later. A new argument has been added to the isc__nm__failed_connect_cb and isc__nm__failed_read_cb functions, to indicate whether the callback needs to run asynchronously or not.	2021-04-07 15:36:59 +02:00
Ondřej Surý	58e75e3ce5	Skip long tls_tests in the CI We already skip most of the recv_send tests in CI because they are too timing-related to be run in overloaded environment. This commit adds a similar change to tls_test before we merge tls_test into netmgr_test.	2021-04-07 15:36:59 +02:00
Artem Boldariev	340235c855	Prevent short TLS tests from hanging in case of errors The tests in tls_test.c could hang in the event of a connect error. This commit allows the tests to bail out when such an error occurs.	2021-04-07 15:36:59 +02:00
Evan Hunt	426c40c96d	rearrange nm_teardown() to check correctness after shutting down if a test failed at the beginning of nm_teardown(), the function would abort before isc_nm_destroy() or isc_tlsctx_free() were reached; we would then abort when nm_setup() was run for the next test case. rearranging the teardown function prevents this problem.	2021-04-07 15:36:59 +02:00
Ondřej Surý	86f4872dd6	isc_nm_connect() always return via callback The isc_nm_connect() functions were refactored to always return the connection status via the connect callback instead of sometimes returning the hard failure directly (for example, when the socket could not be created, or when the network manager was shutting down). This commit changes the connect functions in all the network manager modules, and also makes the necessary refactoring changes in places where the connect functions are called.	2021-04-07 15:36:59 +02:00
Evan Hunt	a70cd026df	move UDP connect retries from dig into isc_nm_udpconnect() dig previously ran isc_nm_udpconnect() three times before giving up, to work around a freebsd bug that caused connect() to return a spurious transient EADDRINUSE. this commit moves the retry code into the network manager itself, so that isc_nm_udpconnect() no longer needs to return a result code.	2021-04-07 15:36:59 +02:00
Ondřej Surý	ca12e25bb0	Use generic functions for reading and timers in TCP The TCP module has been updated to use the generic functions from netmgr.c instead of its own local copies. This brings the module mostly up to par with the TCPDNS and TLSDNS modules.	2021-04-07 15:36:59 +02:00
Ondřej Surý	7df8c7061c	Fix and clean up handling of connect callbacks Several problems were discovered and fixed after the change in the connection timeout in the previous commits: * In TLSDNS, the connection callback was not called at all under some circumstances when the TCP connection had been established, but the TLS handshake hadn't been completed yet. Additional checks have been put in place so that tls_cycle() will end early when the nmsocket is invalidated by the isc__nm_tlsdns_shutdown() call. * In TCP, TCPDNS and TLSDNS, new connections would be established even when the network manager was shutting down. The new call isc__nm_closing() has been added and is used to bail out early even before uv_tcp_connect() is attempted.	2021-04-07 15:36:59 +02:00
Ondřej Surý	5a87c7372c	Make it possible to recover from connect timeouts Similarly to the read timeout, it's now possible to recover from an ISC_R_TIMEDOUT event by restarting the timer from the connect callback. The change here also fixes platforms that are missing the socket() options to set the TCP connection timeout, by moving the timeout code into user space. On platforms that support setting the connect timeout via a socket option, the timeout has been hardcoded to 2 minutes (the maximum value of tcp-initial-timeout).	2021-04-07 15:36:58 +02:00
Ondřej Surý	33c00c281f	Make it possible to recover from read timeouts Previously, when the client timed out on read, the client socket would be automatically closed and destroyed when the nmhandle was detached. This commit changes the logic so that it's possible for the callback to recover from the ISC_R_TIMEDOUT event by restarting the timer. This is done by calling isc_nmhandle_settimeout(), which prevents the timeout handling code from destroying the socket; instead, it continues to wait for data. One specific use case for multiple timeouts is serve-stale - the client socket could be created with shorter timeout (as specified with stale-answer-client-timeout), so we can serve the requestor with stale answer, but keep the original query running for a longer time.	2021-04-07 15:36:58 +02:00
Ondřej Surý	0aad979175	Disable netmgr tests only when running under CI The full netmgr test suite is unstable when run in CI due to various timing issues. Previously, we enabled the full test suite only when CI_ENABLE_ALL_TESTS environment variable was set, but that went against original intent of running the full suite when an individual developer would run it locally. This change disables the full test suite only when running in the CI and the CI_ENABLE_ALL_TESTS is not set.	2021-04-07 15:36:58 +02:00
Diego Fronza	6e08307bc8	Resolve TSAN data race in zone_maintenance Fix a race between the zone_maintenance and dns_zone_notifyreceive functions: zone_maintenance was attempting to read a zone flag by calling DNS_ZONE_FLAG(zone, flag) while dns_zone_notifyreceive was updating a flag in the same zone by calling DNS_ZONE_SETFLAG(zone, ...). The code reading the flag in zone_maintenance was not protected by the zone's lock; to avoid the race, the zone's lock is now acquired before the zone flag is read.	2021-04-07 12:04:01 +00:00
Michał Kępień	6bdd55a9b3	Enforce a run time limit on unit test binaries When a unit test binary hangs, the GitLab CI job in which it is run is stuck until its run time limit is exceeded. Furthermore, it is not trivial to determine which test(s) hung in a given GitLab CI job based on its log. To prevent these issues, enforce a run time limit on every binary executed by the lib/unit-test-driver.sh script. Use a timeout of 5 minutes for consistency with older BIND 9 branches, which employed Kyua for running unit tests. Report an exit code of 124 when the run time limit is exceeded for a unit test binary, for consistency with the "timeout" tool included in GNU coreutils.	2021-04-07 11:41:45 +02:00
Artem Boldariev	ee10948e2d	Remove dead code which was supposed to handle TLS shutdowns nicely Fixes Coverity issue CID 330954 (See #2612).	2021-04-07 11:21:08 +03:00
Artem Boldariev	e6062210c7	Handle buggy situations with SSL_ERROR_SYSCALL See "BUGS" section at: https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html It is mentioned there that when TLS status equals SSL_ERROR_SYSCALL AND errno == 0 it means that underlying transport layer returned EOF prematurely. However, we are managing the transport ourselves, so we should just resume reading from the TCP socket. It seems that this case has been handled properly on modern versions of OpenSSL. That being said, the situation goes in line with the manual: it is briefly mentioned there that SSL_ERROR_SYSCALL might be returned not only in a case of low-level errors (like system call failures).	2021-04-07 11:21:08 +03:00
Mark Andrews	9c28df2204	remove lib/dns/gen when running 'make clean'	2021-04-07 08:06:49 +10:00
Matthijs Mekking	3d3a6415f7	If RPZ config'd, bail stale-answer-client-timeout When we are recursing, RPZ processing is not allowed. But when we are performing a lookup due to "stale-answer-client-timeout", we are still recursing. This effectively means that RPZ processing is disabled on such a lookup. In this case, bail the "stale-answer-client-timeout" lookup and wait for recursion to complete, as we can't perform the RPZ rewrite rules reliably.	2021-04-02 10:02:40 +02:00
Matthijs Mekking	839df94190	Rename "staleonly" The dboption DNS_DBFIND_STALEONLY caused confusion because it implies we are looking for stale data only and ignore any active RRsets in the cache. Rename it to DNS_DBFIND_STALETIMEOUT as it is more clear the option is related to a lookup due to "stale-answer-client-timeout". Rename other usages of "staleonly", instead use "lookup due to...". Also rename related function and variable names.	2021-04-02 10:02:40 +02:00
Matthijs Mekking	3f81d79ffb	Restore the RECURSIONOK attribute after staleonly When doing a staleonly lookup we don't want to fallback to recursion. After all, there are obviously problems with recursion, otherwise we wouldn't do a staleonly lookup. When resuming from recursion however, we should restore the RECURSIONOK flag, allowing future required lookups for this client to recurse.	2021-04-02 10:02:40 +02:00
Matthijs Mekking	aaed7f9d8c	Remove result exception on staleonly lookup When implementing "stale-answer-client-timeout", we decided that we should only return positive answers prematurely to clients. A negative response is not useful, and in that case it is better to wait for the recursion to complete. To do so, we check the result and if it is not ISC_R_SUCCESS, we decide that it is not good enough. However, there are more return codes that could lead to a positive answer (e.g. CNAME chains). This commit removes the exception and now uses the same logic that other stale lookups use to determine if we found a useful stale answer (stale_found == true). This means we can simplify two test cases in the serve-stale system test: nodata.example is no longer treated differently than data.example.	2021-04-02 10:02:40 +02:00
Matthijs Mekking	3d5429f61f	Remove INSIST on NS_QUERYATTR_ANSWERED The NS_QUERYATTR_ANSWERED attribute is to prevent sending a response twice. Without the attribute, this may happen if a staleonly lookup found a useful answer and sends a response to the client, and later recursion ends and also tries to send a response. The attribute was also used to mask adding a duplicate RRset. This is considered harmful. When we created a response to the client with a stale only lookup (regardless of whether we have actually sent the response), we should clear the rdatasets that were added during that lookup. Mark such rdatasets with a new attribute, DNS_RDATASETATTR_STALE_ADDED. Set a query attribute NS_QUERYATTR_STALEOK if we may have added rdatasets during a stale only lookup. Before creating a response on a normal lookup, check if we can expect rdatasets to have been added during a staleonly lookup. If so, clear the rdatasets from the message with the attribute DNS_RDATASETATTR_STALE_ADDED set.	2021-04-02 09:15:07 +02:00
Matthijs Mekking	48b0dc159b	Simplify when to detach the client With stale-answer-client-timeout, we may send a response to the client, but we may want to hold on to the network manager handle, because recursion is going on in the background, or we need to refresh a stale RRset. Simplify the setting of 'nodetach': * During a staleonly lookup we should not detach the nmhandle, so just set it prior to 'query_lookup()'. * During a staleonly "stalefirst" lookup set the 'nodetach' to true if we are going to refresh the RRset. Now there is no longer the need to clear the 'nodetach' if we go through the "dbfind_stale", "stale_refresh_window", or "stale_only" paths.	2021-04-02 09:14:09 +02:00


13135 Commits