The basic scenario for the problem was that, while resolving a query,
any rrset eligible for prefetching would trigger a call to
query_prefetch(), which would run in parallel with the normal query
processing.
The problem arises because both query_prefetch() and, in the original
thread, a call to ns_query_recurse() try to attach to the
recursionquota, but the recursing client stats counter is only
incremented if ns_query_recurse() attaches to it first.
Conversely, if fetch_callback() is called before prefetch_done(), it
not only detaches from the recursionquota but also decrements the
stats counter. If query_prefetch() attached to the quota first, that
results in a decrement that is not matched by the expected increment.
To solve this issue, an atomic bool was added. It is set once in
ns_query_recurse(), allowing fetch_callback() to check for it
and decrement stats accordingly.
For a more comprehensive explanation, see the thread comment below:
https://gitlab.isc.org/isc-projects/bind9/-/issues/1719#note_145857
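The atomic bool approach described above can be sketched roughly as
follows. This is a minimal illustration, not the actual BIND code: the
client_t type, the recursing_counted flag, the recursing_clients counter
and the three function names are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-client state. */
typedef struct {
	atomic_bool recursing_counted; /* true once the counter was bumped */
} client_t;

/* Hypothetical stand-in for the "recursing clients" stats counter. */
static atomic_int recursing_clients = 0;

void client_init(client_t *client) {
	atomic_init(&client->recursing_counted, false);
}

/* Recursion path: increment the counter once and remember that we did. */
void recurse_start(client_t *client) {
	bool expected = false;
	if (atomic_compare_exchange_strong(&client->recursing_counted,
					   &expected, true)) {
		atomic_fetch_add(&recursing_clients, 1);
	}
}

/* Prefetch path: may attach to the quota, never touches the counter. */
void prefetch_start(client_t *client) {
	(void)client;
}

/* Completion path: decrement only if a matching increment happened. */
void fetch_done(client_t *client) {
	if (atomic_exchange(&client->recursing_counted, false)) {
		atomic_fetch_sub(&recursing_clients, 1);
	}
}
```

With this shape, a completion that follows only a prefetch leaves the
counter untouched, while a completion that follows a recursion
decrements it exactly once.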
If too many versions of log / dnstap files were requested to be saved,
the memory after to_keep could be overwritten. Force the number of
versions to be saved to a safe level. Additionally, the memmove length
was incorrect.
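As an illustration of the two fixes (clamping the count and computing
the memmove length in bytes from the number of elements that actually
move), here is a minimal sketch. MAX_KEEP, clamp_versions() and
insert_version() are hypothetical names, and the array is assumed to
hold positive version numbers sorted newest first with unused slots set
to 0; the real code may be organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEEP 8 /* hypothetical capacity of the to_keep array */

/* Clamp a requested number of versions to what to_keep can hold. */
size_t clamp_versions(long requested) {
	if (requested < 0) {
		return 0;
	}
	if ((size_t)requested > MAX_KEEP) {
		return MAX_KEEP;
	}
	return (size_t)requested;
}

/*
 * Insert 'version' into to_keep[0..versions-1], which is kept sorted
 * with the largest (newest) value first. When the array is full, the
 * smallest value falls off the end.
 */
void insert_version(int64_t *to_keep, size_t versions, int64_t version) {
	size_t i = 0;

	while (i < versions && version < to_keep[i]) {
		i++;
	}
	if (i == versions) {
		return; /* older than everything we keep */
	}
	if (i < versions - 1) {
		/* Length is in bytes and covers only the slots that shift. */
		memmove(&to_keep[i + 1], &to_keep[i],
			sizeof(to_keep[0]) * (versions - i - 1));
	}
	to_keep[i] = version;
}
```

The element count passed to memmove is (versions - i - 1), so the last
write lands on to_keep[versions - 1] and never past the end of the
array.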
When "rndc reconfig" is run, named first configures a fresh set of views
and then tears down the old views. Consider what happens for a single
view with LMDB enabled; "envA" is the pointer to the LMDB environment
used by the original/old version of the view, "envB" is the pointer to
the same LMDB environment used by the new version of that view:
1. mdb_env_open(envA) is called when the view is first created.
2. "rndc reconfig" is called.
3. mdb_env_open(envB) is called for the new instance of the view.
4. mdb_env_close(envA) is called for the old instance of the view.
This seems to have worked so far. However, an upstream change [1] in
LMDB which will be part of its 0.9.26 release prevents the above
sequence of calls from working as intended because the locktable mutexes
will now get destroyed by the mdb_env_close() call in step 4 above,
causing any subsequent mdb_txn_begin() calls to fail (because all of the
above steps are happening within a single named process).
Preventing the above scenario from happening would require either
redesigning the way we use LMDB in BIND, which is not something we can
easily backport, or redesigning the way BIND carries out its
reconfiguration process, which would be an even more severe change.
To work around the problem, set MDB_NOLOCK when calling mdb_env_open()
to stop LMDB from controlling concurrent access to the database and do
the necessary locking in named instead. Reuse the view->new_zone_lock
mutex for this purpose to prevent the need for modifying struct dns_view
(which would necessitate library API version bumps). Drop use of
MDB_NOTLS as it is made redundant by MDB_NOLOCK: MDB_NOTLS only affects
where LMDB reader locktable slots are stored while MDB_NOLOCK prevents
the reader locktable from being used altogether.
[1] 2fd44e3251
ThreadSanitizer found a data race when updating the stale header.
Instead of trying to acquire the write lock and failing occasionally,
which would skew the statistics, the dns_rdatasetheader_t.attributes
field has been promoted to use stdatomics. mark_header_ancient() and
mark_header_stale() now update the attributes with cmpxchg, which
removes the need to hold the write lock on the tree. Please note that
mark_header_ancient() still needs to hold the lock because .dirty is
updated at the same time.
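The cmpxchg-based update can be sketched with C11 atomics as below. This
is only an illustration of the technique: header_t, the ATTR_* bit
values and header_set_attr() are hypothetical and do not reflect the
real dns_rdatasetheader_t layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical attribute bits. */
#define ATTR_STALE   0x0001u
#define ATTR_ANCIENT 0x0002u

/* Hypothetical stand-in for a header with atomic attributes. */
typedef struct {
	atomic_uint_fast16_t attributes;
} header_t;

/*
 * Atomically OR 'bits' into the attributes without taking a lock.
 * Returns the value the attributes had before the update.
 */
uint_fast16_t header_set_attr(header_t *header, uint_fast16_t bits) {
	uint_fast16_t old = atomic_load_explicit(&header->attributes,
						 memory_order_acquire);

	while (!atomic_compare_exchange_weak_explicit(
		&header->attributes, &old, (uint_fast16_t)(old | bits),
		memory_order_acq_rel, memory_order_acquire))
	{
		/* 'old' now holds the current value; retry with it. */
	}
	return old;
}
```

Because a failed exchange reloads the current value into 'old', bits
set concurrently by another thread are preserved rather than
overwritten.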
The stdatomic shims for non-C11 compilers (Windows, old gcc, ...) and
mutexatomic implemented only a minimal subset of the atomic types.
This commit adds 16-bit operations for Windows and all atomic types as
defined in the standard.
it is now an error to have two primaries lists with the same
name. this is true regardless of whether the "primaries" or
"masters" keywords were used to define them.
as "type primary" is preferred over "type master" now, it makes
sense to make "primaries" available as a synonym too.
added a correctness check to ensure "primaries" and "masters"
cannot both be used in the same zone.
We erroneously tried to destroy a socket after issuing
isc__nm_tcp{,dns}_close. Under some (race) circumstances
nm_socket_cleanup could be called twice for the same socket, causing
an access to freed memory.
There's a possible race in isc__nm_tcpconnect: if the asynchronous
connect operation finishes, callbacks included, before we exit
isc__nm_tcpconnect itself, we might access already freed memory.
Fix it by creating an additional reference to the socket, which is
released at the end of isc__nm_tcpconnect.
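The idea of holding a temporary reference for the duration of the
connect call can be sketched as follows. Everything here is a
simplified, hypothetical model: sock_t, sock_attach(), sock_detach(),
connect_done() and tcp_connect() are illustrative names, the callback
is simulated inline, and a 'destroyed' flag stands in for freeing the
socket.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted socket. */
typedef struct {
	atomic_int references;
	bool connecting;
	bool destroyed; /* stands in for the memory being freed */
} sock_t;

void sock_attach(sock_t *sock) {
	atomic_fetch_add(&sock->references, 1);
}

void sock_detach(sock_t *sock) {
	if (atomic_fetch_sub(&sock->references, 1) == 1) {
		sock->destroyed = true;
	}
}

/* Completion callback: drops the reference the caller passed in. */
void connect_done(sock_t *sock) {
	sock_detach(sock);
}

/*
 * Connect while holding a temporary reference, so the socket stays
 * valid even if the callback has already run by the time we get back
 * to touching it. Returns whether the socket was still alive then.
 */
bool tcp_connect(sock_t *sock) {
	bool alive;

	sock_attach(sock);        /* temporary reference */
	sock->connecting = true;
	connect_done(sock);       /* may complete before we return */
	alive = !sock->destroyed; /* safe: we still hold a reference */
	sock->connecting = false;
	sock_detach(sock);        /* release the temporary reference */
	return alive;
}
```

Without the temporary reference, the detach inside the callback would
drop the count to zero and the later accesses in tcp_connect() would
touch a destroyed socket.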
When we're coming back from recursion, fetch_callback does not accept
DNS_R_NXDOMAIN as an rcode: query_gotanswer calls query_nxdomain, in
which an assertion fails on qctx->is_zone. Yet, under some
circumstances, qname minimization will return a DNS_R_NXDOMAIN, for
example when the root zone mirror is not yet loaded. The fix changes
the DNS_R_NXDOMAIN answer to DNS_R_SERVFAIL.
the blackhole ACL was accidentally disabled with respect to client
queries during the netmgr conversion.
in order to make this work for TCP, it was necessary to add a return
code to the accept callback functions passed to isc_nm_listentcp() and
isc_nm_listentcpdns().
Implement the 'rndc dnssec -status' command that will output
some information about the key states, such as which policy is
used for the zone, what keys are in use, and when rollover is
scheduled.
Add loose testing in the kasp system test, the actual times are
already tested via key file inspection.
I'd like to use the same functionality (pretty print the datetime
of keytime metadata) in the 'rndc dnssec -status' command. So it is
better that this logic is done in a separate function.
Since the stdtime.c code has different files for unix and win32,
I think the "#ifdef WIN32" define can be dropped.
The wait until zones are signed after rndc reconfig is broken
because the zones are already signed before the reconfig. Fix
by having a different way to ensure the signing of the zone is
complete. This does require a call to the "wait_for_done_signing"
function after each "check_keys" call after the ns6 reconfig.
The "wait_for_done_signing" looks for a (newly added) debug log
message that named will output if it is done signing with a certain
key.
isc__nm_tcpdns_send() was not asynchronous and accessed socket
internal fields in an unsafe manner, which could lead to a race
condition and subsequent crash. Fix it by moving tcpdns processing
to a proper netmgr thread.
We need to mark the socket as inactive early (and synchronously)
in the stoplistening process; otherwise we might destroy the
callback argument before we actually stop listening, and call
the callback on bad memory.
Assign and then check node for NULL to address another thread
changing radix->head in the meantime.
Move 'node != NULL' check into while loop test to silence cppcheck
false positive.
Fix pointer != NULL style.
The `rndc` argument was always overridden by the static configuration,
because the logic for handling the number of dnstap files to retain
was both backwards and a bit redundant.
these keywords were added to the parser as synonyms for "master"
and "slave" but were never hooked in to the configuration of named,
so they were ignored. this has been fixed and the option is now
checked for correctness.
The isc_nm_cancelread() function cancels reading on a connected
socket and calls its read callback function with a 'result'
parameter of ISC_R_CANCELED.
when isc_nm_destroy() is called, there's a loop that waits for
other references to be detached, pausing and unpausing the netmgr
to ensure that all the workers' events are run, followed by a
1-second sleep. this caused a delay on shutdown which will be
noticeable when netmgr is used in tools other than named itself,
so the delay has now been reduced to a hundredth of a second.
the isc_nm_tcpconnect() function establishes a client connection via
TCP. once the connection is established, a callback function will be
called with a newly created network manager handle.
A TCPDNS socket creates a handle for each complete DNS message.
Previously, when all the handles were disconnected, the socket
would be closed, but the wrapped TCP socket might still have
more to read.
Now, when a connection is established, the TCPDNS socket creates
a reference to itself by attaching itself to sock->self. This
reference isn't cleared until the connection is closed via
EOF, timeout, or server shutdown. This allows the socket to remain
open even when there are no active handles for it.
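The self-reference pattern described above can be sketched like this. It
is a single-threaded, simplified model with hypothetical names
(dnssock_t and the dnssock_* functions); the real socket uses atomic
reference counting and a more involved teardown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical TCPDNS-like socket. */
typedef struct dnssock {
	int references;       /* handles plus the self-reference */
	struct dnssock *self; /* non-NULL while the connection is open */
	bool closed;          /* stands in for the real teardown */
} dnssock_t;

void dnssock_attach(dnssock_t *sock) {
	sock->references++;
}

void dnssock_detach(dnssock_t *sock) {
	assert(sock->references > 0);
	if (--sock->references == 0) {
		sock->closed = true;
	}
}

/* On connection: the socket attaches to itself via sock->self. */
void dnssock_connected(dnssock_t *sock) {
	dnssock_attach(sock);
	sock->self = sock;
}

/* On EOF, timeout, or shutdown: clear the self-reference. */
void dnssock_connection_ended(dnssock_t *sock) {
	if (sock->self != NULL) {
		sock->self = NULL;
		dnssock_detach(sock);
	}
}
```

A handle that attaches and later detaches leaves the count at one
because of the self-reference, so the socket is only torn down once the
connection itself has ended.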
- isc__nmhandle_get() now attaches to the sock in the nmhandle object.
the caller is responsible for dereferencing the original socket
pointer when necessary.
- tcpdns listener sockets attach sock->outer to the outer tcp listener
socket. tcpdns connected sockets attach sock->outerhandle to the handle
for the tcp connected socket.
- only listener sockets need to be attached/detached directly. connected
sockets should only be accessed and reference-counted via their
associated handles.
there is no need for a caller to reference-count socket objects.
they need to be able to close listener sockets (i.e., those
returned by isc_nm_listen{udp,tcp,tcpdns}), and an isc_nmsocket_close()
function has been added for that. other sockets are only accessed via
handles.
An RBTDB node can now appear on the deadnodes lists following the
changes to decrement_reference in 176b23b6cd, which defer checking of
node->down when the tree write lock is not held. The node should be
unlinked instead.
NS_CLIENT_TCP_BUFFER_SIZE was 2 bytes too large following the
move to netmgr and associated changes to lib/ns/client.c, and
as a result an INSIST could be triggered if the DNS message being
constructed had a checkpoint stage that fell in those two extra
bytes. Adjusted NS_CLIENT_TCP_BUFFER_SIZE and cleaned up
client_allocsendbuf now that the previously reserved 2 bytes
are no longer used.
ThreadSanitizer uses system synchronization primitives to check for
data races. The netmgr handle->references was missing an acquire memory
barrier before resetting and reusing the memory occupied by isc_nmhandle_t.
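A minimal sketch of the ordering requirement, using C11 atomics: the
final decrement uses acquire-release ordering so that all earlier
accesses by other threads are visible before the memory is reset and
reused. handle_t, handle_unref() and handle_reset() are hypothetical
names, not the netmgr API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reusable handle. */
typedef struct {
	atomic_int references;
	int payload;
} handle_t;

/*
 * Drop one reference. Returns true when this was the last one.
 * The acq_rel ordering publishes our own writes (release) and, for
 * the thread that sees the count reach zero, makes every other
 * thread's prior writes visible (acquire) before the handle is reused.
 */
bool handle_unref(handle_t *handle) {
	return atomic_fetch_sub_explicit(&handle->references, 1,
					 memory_order_acq_rel) == 1;
}

/* Reset the handle for reuse; only valid after the last unref. */
void handle_reset(handle_t *handle) {
	handle->payload = 0;
	atomic_store_explicit(&handle->references, 1,
			      memory_order_release);
}
```

A relaxed or release-only decrement would leave the reset unordered
with respect to other threads' last accesses, which is the kind of
race ThreadSanitizer reports.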
- clone keynode->dsset rather than return a pointer so that each
thread's use is independent of the others.
- hold a reference to the dsset (keynode) so it can't be deleted
while in use.
- create a new keynode when removing DS records so that dangling
pointers to the deleted records will not occur.
- use a rwlock when accessing the rdatalist to prevent instabilities
when DS records are added.
There's a possible race in the TCP accepting code:
T1 accepts a connection C1
T2 accepts a connection C2
T1 tries to accept a connection C3, but we hit a quota,
isc_quota_cb_init() sets quota_accept_cb for the socket,
we return from accept_connection
T2 drops C2, but we race in quota_release with accepting C3 so
we don't see quota->waiting is > 0, we don't launch the callback
T1 accepts a connection C4, we are able to get the quota, so we clear
the quota_accept_cb from sock->quotacb
T1 drops C1, tries to call the callback, which is zeroed: sigsegv.