The client connection timeout was set to just one second, which might
not be enough on busy systems (and the CI machines are oh-boy-busy).
Bump the server timeouts to 10 seconds and client timeouts to 5 seconds,
this will make the unit test run a little bit longer, but it should be
more reliable.
Previously, the isccc_ccmsg_invalidate() was called from conn_free() and
this could lead to netmgr calling control_recvmessage() after we
detached the reading controlconnection_t reference, but it wouldn't be
the last reference because controlconnection_t is also attached/detached
when sending response or running command asynchronously.
Instead, move the isccc_ccmsg_invalidate() call to control_recvmessage()
error handling path to make sure that control_recvmessage() won't be
ever called again from the netmgr.
The controlconf channel runs single-threaded on the main thread.
Replace the listener->connections locking with check that we are still
running on the thread with TID 0.
The lock-file configuration (both from configuration file and -X
argument to named) has better alternatives nowadays. Modern process
supervisor should be used to ensure that a single named process is
running on a given configuration.
Alternatively, it's possible to wrap the named with flock(1).
This is first step in removing the lock-file configuration option, it
marks both the `lock-file` configuration directive and -X option to
named as deprecated.
When -X is used the 'lock-file' option change detection condition
is invalid, because it compares the 'lock-file' option's value to
the '-X' argument's value instead of the older 'lock-file' option
value (which was ignored because of '-X').
Don't warn about changing 'lock-file' option if '-X' is used.
It is obvious that the '!cfg_obj_asstring(obj)' check should be
'cfg_obj_asstring(obj)' instead, because it is an AND logic chain
which further uses 'obj' as a string.
Fix the error.
When 'lock-file <lockfile>' is used in configuration at the same time
as using '-X none' in 'named' invocation, there is an invalid
logic that would lead to a isc_mem_strdup() call on a NULL value.
Also, contradicting to ARM, 'lock-file none' is overriding the '-X'
argument.
Fix the overall logic, and make sure that the '-X' takes precedence to
'lock-file'.
When 'lock-file <lockfile1>' was used in configuration at the same time
as using `-X <lockfile2>` in `named` invocation, there was an invalid
logic that would lead to a double isc_mem_strdup() call on the
<lockfile2> value.
Skip the second allocation if `lock-file` is being used in
configuration, so the <lockfile2> is used only single time.
All changes in this commit were automated using the command:
shfmt -w -i 2 -ci -bn . $(find . -name "*.sh.in")
By default, only *.sh and files without extension are checked, so
*.sh.in files have to be added additionally. (See mvdan/sh#944)
The Userspace-RCU variants other than membarrier is untested and at
least in QSBR case it's broken. Allow changing the Userspace-RCU
variant only in the developer's mode.
The `.reader` member of dns_qpmulti_t was accessed without RCU
protection; reader_open() calls rcu_dereference() on it, and this
call needs to be inside an RCU critical section.
A similar problem was identified in the dns_qpmulti_snapshot() - the
RCU critical section was completely missing.
These are relicts of the isc_qsbr - in the QSBR mode the rcu_read_lock()
and rcu_read_unlock() are no-ops and whole event loop is a critical section.
When named fails to starts due to not being able to obtain
a lock on the lock file that lock file should remain. Check
that the lock file exists before and after the attempt to
start a second instance of named.
The lock file was being removed when we hadn't successfully locked
it which defeated the purpose of the lockfile. Adjust cleanup_lockfile
such that it only unlinks the lockfile if we have successfully locked
the lockfile and it is still active (lockfile != NULL).
Do a light refactoring and cleanups that replaces common list walking
patterns with ISC_LIST_FOREACH macros and split some nested loops into
separate static functions to reduce the nesting depth.
Add complementary macros to ISC_LIST_FOREACH(_SAFE) that walk the lists
in reverse.
* ISC_LIST_FOREACH_REV(list, elt, link) - walk the static list from
tail to head
* ISC_LIST_FOREACH_REV_SAFE(list, elt, link, next) - walk the list
from tail to head in a manner that's safe against list member
deletions
There was a lot of internal code looking like this:
INSIST(dns_rdataset_isassociated(rdataset));
dns_rdataset_disassociated(rdataset)
isc_mempool_put(msg->rdspool, rdataset);
Deduplicate the code into local dns__message_puttemprdataset() routine,
and drop the INSIST() which is checked in dns_rdataset_disassociate().
Refactor the dispatch unit test to use more local variables (previously
dispatchmgr, dispatch and dispentry were all global), and add two new
tests:
* dispatch_getcp - test whether the TCP connection will get reused
* dispatch_newtcp - test that the TCP connection will not get reused
when DNS_DISPATCHOPT_UNSHARED is in effect
The current dispatch code could reuse the TCP connection when
dns_dispatch_gettcp() would be used first. This is problematic as the
dns_resolver doesn't use TCP connection sharing, but dns_request could
get the TCP stream that was created outside of the dns_request.
Add new DNS_DISPATCHOPT_UNSHARED option to dns_dispatch_createtcp() that
would prevent the TCP stream to be reused. Use that option in the
dns_resolver call to dns_dispatch_createtcp() to prevent dns_request
from reusing the TCP connections created by dns_resolver.
Additionally, the dns_xfrin unit added TCP connection sharing for
incoming transfers. While interleaving *xfr streams on a TCP connection
should work this should be a deliberate change and be property of the
server that can be controlled. Additionally some level of parallel TCP
streams is desirable. Revert to the old behaviour by removing the
dns_dispatch_gettcp() calls from dns_xfrin and use the new option to
prevent from sharing the transfer streams with dns_request.
The xfrin_recv_done() was accessing xfr->result where we stored the
result of the offloaded work from a thread that could receive data while
processing the transfer on the offloaded thread.
Completely remove the offloaded result from the dns_xfrin_t structure
and keep it local for *xfr_apply() and *xfr_apply_done() as the failure
is already recorded in .shutdown_result and we now that the processing
has failed because .shuttingdown has been already set.
When using sd_notify(3) to send a message to the service manager
about named being reloaded, systemd also requires the MONOTONIC_USEC
field to be set to the current monotonic time in microseconds,
otherwise the 'systemctl reload' command fails.
Add the MONOTONIC_USEC field to the message.
See 'man 5 systemd.service' for more information.
The dns__catz_update_cb() does not expect that 'catzs->zones'
can become NULL during shutdown.
Add similar checks in the dns__catz_update_cb() and dns_catz_zone_get()
functions to protect from such a case. Also add an INSIST in the
dns_catz_zone_add() function to explicitly state that such a case
is not expected there, because that function is called only during a
reconfiguration.