Similar to notify, add code to send and keep track of checkds requests.
On every zone_rekey event, we will check the DS at parental agents
(but we will only actually query parental agents if theree is a DS
scheduled to be published/withdrawn).
On a zone_rekey event, we will first clear the ongoing checkds requests.
Reset the counter, to avoid continuing KSK rollover premature.
This has the risk that if zone_rekey events happen too soon after each
other, there are redundant DS queries to the parental agents. But
if TTLs and the configured durations in the dnssec-policy are sane (as
in not ridiculous short) the chance of this happening is low.
This code gathers DNSSEC keys from key files and from the DNSKEY RRset.
It is used for the 'rndc dnssec -status' command, but will also be
needed for "checkds". Turn it into a function.
Change the static function 'get_ksk_zsk' to a library function that
can be used to determine the role of a dst_key. Add checks if the
boolean parameters to store the role are not NULL. Rename to
'dst_key_role'.
Add checks for "parental-agents" configuration, checking for the option
being at wrong type of zone (only allowed for primaries and
secondaries), duplicate definitions, duplicate references, and
undefined parental clauses (the name referenced in the zone clause
does not have a matching "parental-agent" clause).
When performing the 'setnsec3param' task, zones that are not loaded will have
their task rescheduled. We should do this only if the zone load is still
pending, this prevents zones that failed to load get stuck in a busy wait and
causing a hang on shutdown.
The Makefile.tests was modifying global AM_CFLAGS and LDADD and could
accidentally pull /usr/include to be listed before the internal
libraries, which is known to cause problems if the headers from the
previous version of BIND 9 has been installed on the build machine.
In DNS Flag Day 2020, we started setting the DF (Don't Fragment socket
option on the UDP sockets. It turned out, that this code was incomplete
leading to dropping the outgoing UDP packets.
This has been now remedied, so it is possible to disable the
fragmentation on the UDP sockets again as the sending error is now
handled by sending back an empty response with TC (truncated) bit set.
This reverts commit 66eefac78c.
When the fragmentation is disabled on UDP sockets, the uv_udp_send()
call can fail with UV_EMSGSIZE for messages larger than path MTU.
Previously, this error would end with just discarding the response. In
this commit, a proper handling of such case is added and on such error,
a new DNS response with truncated bit set is generated and sent to the
client.
This change allows us to disable the fragmentation on the UDP
sockets again.
This commit adds a unittest that tests private rdataset_getownercase()
and rdataset_setownercase() methods from rbtdb.c. The test setups
minimal mock dns_rbtdb_t and dns_rbtdbnode_t data structures.
As the rbtdb methods are generally hidden behind layers and layers, we
include the "rbtdb.c" directly from rbtdb_test.c, and thus we can use
the private methods and data structures directly. This also opens up
opportunity to add more unittest for the rbtdb private functions without
going through all the layers.
In the code that rdataset_setownercase() and rdataset_getownercase() we
now use tolower()/toupper()/isupper() functions appropriately instead of
rolling our own code.
Previously, each protocol (TCPDNS, TLSDNS) has specified own function to
disable pipelining on the connection. An oversight would lead to
assertion failure when opcode is not query over non-TCPDNS protocol
because the isc_nm_tcpdns_sequential() function would be called over
non-TCPDNS socket. This commit removes the per-protocol functions and
refactors the code to have and use common isc_nm_sequential() function
that would either disable the pipelining on the socket or would handle
the request in per specific manner. Currently it ignores the call for
HTTP sockets and causes assertion failure for protocols where it doesn't
make sense to call the function at all.
When locking key files for a zone, we iterate over all the views and
lock a mutex inside the zone structure. However, if we envounter an
in-view zone, we will try to lock the key files twice, one time for
the home view and one time for the in-view view. This will lead to
a deadlock because one thread is trying to get the same lock twice.
When "max-cache-size" is changed to "unlimited" (or "0") for a running
named instance (using "rndc reconfig"), the hash table size limit for
each affected cache DB is not reset to the maximum possible value,
preventing those hash tables from being allowed to grow as a result of
new nodes being added.
Extend dns_rbt_adjusthashsize() to interpret "size" set to 0 as a signal
to remove any previously imposed limits on the hash table size. Adjust
API documentation for dns_db_adjusthashsize() accordingly. Move the
call to dns_db_adjusthashsize() from dns_cache_setcachesize() so that it
also happens when "size" is set to 0.
Upon creation, each dns_rbt_t structure has its "maxhashbits" field
initialized to the value of the RBT_HASH_MAX_BITS preprocessor macro,
i.e. 32. When the dns_rbt_adjusthashsize() function is called for the
first time for a given RBT (for cache RBTs, this happens when they are
first created, i.e. upon named startup), it lowers the value of the
"maxhashbits" field to the number of bits required to index the
requested number of hash table slots. When a larger hash table size is
subsequently requested, the value of the "maxhashbits" field should be
increased accordingly, up to RBT_HASH_MAX_BITS. However, the loop in
the rehash_bits() function currently ensures that the number of bits
necessary to index the resized hash table will not be larger than
rbt->maxhashbits instead of RBT_HASH_MAX_BITS, preventing the hash table
from being grown once the "maxhashbits" field of a given dns_rbt_t
structure is set to any value lower than RBT_HASH_MAX_BITS.
Fix by tweaking the loop guard condition in the rehash_bits() function
so that it compares the new number of bits used for indexing the hash
table against RBT_HASH_MAX_BITS rather than rbt->maxhashbits.
The requirements for BIND 9.17+ now requires C11 support from the
compiler, so we can safely drop most of the stdatomic.h shims from
lib/isc/unix/include/stdatomic.h.
This commit removes support for clang atomic builtins (clang >= 3.6.0
includes stdatomic.h header) and for Gcc __sync builtins.
The only compatibility shim that remains is support for __atomic
builtins for Gcc >= 4.7.0 since CentOS 7 still includes only Gcc 4.8.1
and the proper stdatomic.h header was only introduced in Gcc >= 4.9.
The warning was produced by an ASAN build:
runtime error: null pointer passed as argument 2, which is declared to
never be null
This commit fixes it by checking if nghttp2_session_mem_send() has
actually returned anything.
This change sets the mentioned fields properly and gets rid of klusges
added in the times when we were keeping pointers to isc_sockaddr_t
instead of copies. Among other things it helps to avoid a situation
when garbage instead of an address appears in dig output.
We cannot use DoH for zone transfers. According to RFC8484 a DoH
request contains exactly one DNS message (see Section 6: Definition of
the "application/dns-message" Media Type,
https://datatracker.ietf.org/doc/html/rfc8484#section-6). This makes
DoH unsuitable for zone transfers as often (and usually!) these need
more than one DNS message, especially for larger zones.
As zone transfers over DoH are not (yet) standardised, nor discussed
in RFC8484, the best thing we can do is to return "not implemented."
Technically DoH can be used to transfer small zones which fit in one
message, but that is not enough for the generic case.
Also, this commit makes the server-side DoH code ensure that no
multiple responses could be attempted to be sent over one HTTP/2
stream. In HTTP/2 one stream is mapped to one request/response
transaction. Now the write callback will be called with failure error
code in such a case.
Support a situation in header processing callback when client side
code could receive a belated response or part of it. That could
happen when the HTTP/2 session was already closed, but there were some
response data from server in flight. Other client-side nghttp2
callbacks code already handled this case.
The bug became apparent after HTTP/2 write buffering was supported,
leading to rare unit test failures.
This commit ensures that sock->h2.connect.cstream gets nullified when
the object in question is deleted. This fixes a nasty crash in dig
exposed when receiving large responses leading to double free()ing.
Also, it refactors how the client-side code keeps track of client
streams (hopefully) preventing from similar errors appearing in the
future.
This commit makes NM code to report HTTP as a stream protocol. This
makes it possible to handle large responses properly. Like:
dig +https @127.0.0.1 A cmts1-dhcp.longlines.com
When answering a query requires wildcard expansion, the AUTHORITY
section of the response needs to include NSEC(3) record(s) proving that
the QNAME does not exist.
When a response to a query is an insecure delegation, the AUTHORITY
section needs to include an NSEC(3) proof that no DS record exists at
the parent side of the zone cut.
These two conditions combined trip up the NSEC part of the logic
contained in query_addds(), which expects the NS RRset to be owned by
the first name found in the AUTHORITY section of a delegation response.
This may not always be true, for example if wildcard expansion causes an
NSEC record proving QNAME nonexistence to be added to the AUTHORITY
section before the delegation is added to the response. In such a case,
named incorrectly omits the NSEC record proving nonexistence of QNAME
from the AUTHORITY section.
The same block of code is affected by another flaw: if the same NSEC
record proves nonexistence of both the QNAME and the DS record at the
parent side of the zone cut, this NSEC record will be added to the
AUTHORITY section twice.
Fix by looking for the NS RRset in the entire AUTHORITY section and
adding the NSEC record to the delegation using query_addrrset() (which
handles duplicate RRset detection).
Instead of checking the value of the variable modified two lines earlier
(the number of SOA records present at the apex of the old version of the
zone), one of the RUNTIME_CHECK() assertions in zone_postload() checks
the number of SOA records present at the apex of the new version of the
zone, which is already checked before. Fix the assertion by making it
check the correct variable.
The Windows support has been completely removed from the source tree
and BIND 9 now no longer supports native compilation on Windows.
We might consider reviewing mingw-w64 port if contributed by external
party, but no development efforts will be put into making BIND 9 compile
and run on Windows again.
When named restarts, it will examine signed zones and checks if the
current denial of existence strategy matches the dnssec-policy. If not,
it will schedule to create a new NSEC(3) chain.
However, on startup the zone database may not be read yet, fooling
BIND that the denial of existence chain needs to be created. This
results in a replacement of the previous NSEC(3) chain.
Change the code such that if the NSEC3PARAM lookup failed (the result
did not return in ISC_R_SUCCESS or ISC_R_NOTFOUND), we will try
again later. The nsec3param structure has additional variables to
signal if the lookup is postponed. We also need to save the signal
if an explicit resalt was requested.
In addition to the two added boolean variables, we add a variable to
store the NSEC3PARAM rdata. This may have a yet to be determined salt
value. We can't create the private data yet because there may be a
mismatch in salt length and the NULL salt value.
When we rewrote the zone dumping to use the separate threadpool, the
dumping would acquire the read lock for the whole time the zone dumping
process is dumping the zone.
When combined with incoming IXFR that tries to acquire the write lock on
the same rwlock, we would end up blocking all the other readers.
In this commit, we pause the dbiterator every time we get next record
and before start dumping it to the disk.
While cleaning up the usage of HAVE_UV_<func> macros, we forgot to
cleanup the HAVE_UV_UDP_CONNECT in the actual code and
HAVE_UV_TRANSLATE_SYS_ERROR and this was causing Windows build to fail
on uv_udp_send() because the socket was already connected and we were
falsely assuming that it was not.
The platforms with autoconf support were not affected, because we were
still checking for the functions from the configure.
This commit adds the ability to consolidate HTTP/2 write requests if
there is already one in flight. If it is the case, the code will
consolidate multiple subsequent write request into a larger one
allowing to utilise the network in a more efficient way by creating
larger TCP packets as well as by reducing TLS records overhead (by
creating large TLS records instead of multiple small ones).
This optimisation is especially efficient for clients, creating many
concurrent HTTP/2 streams over a transport connection at once. This
way, the code might create a small amount of multi-kilobyte requests
instead of many 50-120 byte ones.
In fact, it turned out to work so well that I had to add a work-around
to the code to ensure compatibility with the flamethrower, which, at
the time of writing, does not support TLS records larger than two
kilobytes. Now the code tries to flush the write buffer after 1.5
kilobyte, which is still pretty adequate for our use case.
Essentially, this commit implements a recommendation given by nghttp2
library:
https://nghttp2.org/documentation/nghttp2_session_mem_send.html
Add a call to posix_fadvise() to indicate to the kernel, that `named`
won't be needing the dumped zone files any time soon with:
* POSIX_FADV_DONTNEED - The specified data will not be accessed in the
near future.
Notes:
POSIX_FADV_DONTNEED attempts to free cached pages associated with the
specified region. This is useful, for example, while streaming large
files. A program may periodically request the kernel to free cached
data that has already been used, so that more useful cached pages are
not discarded instead.
Previously, dumping the zones to the files were quantized, so it doesn't
slow down network IO processing. With the introduction of network
manager asynchronous threadpools, we can move the IO intensive work to
use that API and we don't have to quantize the work anymore as it the
file IO won't block anything except other zone dumping processes.
The libuv has a support for running long running tasks in the dedicated
threadpools, so it doesn't affect networking IO.
This commit adds isc_nm_work_enqueue() wrapper that would wraps around
the libuv API and runs it on top of associated worker loop.
The only limitation is that the function must be called from inside
network manager thread, so the call to the function should be wrapped
inside a (bound) task.