When the buffer is allocated via isc_buffer_allocate() and the size is
smaller or equal ISC_BUFFER_STATIC_SIZE (currently 512 bytes), the
buffer will be allocated as a flexible array member in the buffer
structure itself instead of allocating it on the heap. This should help
when the buffer is used on the hot-path with small allocations.
When isc_buffer_t buffer is created with isc_buffer_allocate() assume
that we want it to always auto-reallocate instead of having an extra
call to enable auto-reallocation.
The Stream DNS implementation needs a peek methods that read the value
from the buffer, but it doesn't advance the current position. Add
isc_buffer_peekuintX methods, refactor the isc_buffer_{get,put}uintN
methods to modern integer types, and move the isc_buffer_getuintN to the
header as static inline functions.
The isc_buffer_reserve() would be passed a reference to the buffer
pointer, which was unnecessary as the pointer would never be changed
in the current implementation. Remove the extra dereference.
In case, we are trying to hash the empty key into the hashmap, the key
is going to have zero length. This might happen in the unit test.
Allow this and add a unit test to ensure the empty zero-length key
doesn't hash to slot 0 as SipHash 2-4 (our hash function of choice) has
no problem with zero-length inputs.
The only function left in the isc_resource API was setting the file
limit. Replace the whole unit with a simple getrlimit to check the
maximum value of RLIMIT_NOFILE and set the maximum back to rlimit_cur.
This is more compatible than trying to set RLIMIT_UNLIMITED on the
RLIMIT_NOFILE as it doesn't work on Linux (see man 5 proc on
/proc/sys/fs/nr_open), neither it does on Darwin kernel (see man 2
getrlimit).
The only place where the maximum value could be raised under privileged
user would be BSDs, but the `named_os_adjustnofile()` were not called
there before. We would apply the increased limits only on Linux and Sun
platforms.
The mechanism for associating a worker task to a database now
uses loops rather than tasks.
For this reason, the parameters to dns_cache_create() have been
updated to take a loop manager rather than a task manager.
The dns_cache API contained a cache cleaning mechanism that would be
disabled for 'rbt' based cache. As named doesn't have any other cache
implementations, remove the cache cleaning mechanism from dns_cache API.
The various factors like NS_PER_MS are now defined in a single place
and the names are no longer inconsistent. I chose the _PER_SEC names
rather than _PER_S because it is slightly more clear in isolation;
but the smaller units are always NS, US, and MS.
Since this is very sensitive code which has often had security
problems in many DNS implementations, it needs a decent amount of
validation. This fuzzer ensures that the new code has the same output
as the old code, and that it doesn't take longer than a second.
The benchmark uses the fuzzer's copy of the old dns_name_fromwire()
code to compare a number of scenarios: many compression pointers, many
labels, long labels, random data, with/without downcasing.
C does not make any guarantees about the value of padding in a
structure, so bytewise comparison of two semantically equal structures
with padding can be spuriously non-equal due to non-equal padding
bytes.
Compare each member of name.attributes individually to avoid this
problem.
There were a number of places where the zone table should have been
locked, but wasn't, when dns_zt_apply was called.
Added a isc_rwlocktype_t type parameter to dns_zt_apply and adjusted
all calls to using it. Removed locks in callers.
Add new isc_hashmap API that differs from the current isc_ht API in
several aspects:
1. It implements Robin Hood Hashing which is open-addressing hash table
algorithm (e.g. no linked-lists)
2. No memory allocations - the array to store the nodes is made of
isc_hashmap_node_t structures instead of just pointers, so there's
only allocation on resize.
3. The key is not copied into the hashmap node and must be also stored
externally, either as part of the stored value or in any other
location that's valid as long the value is stored in the hashmap.
This makes the isc_hashmap_t a little less universal because of the key
storage requirements, but the inserts and deletes are faster because
they don't require memory allocation on isc_hashmap_add() and memory
deallocation on isc_hashmap_delete().
TLS DNS unit tests were sharing the port with TCP DNS tests by
mistake. That could have caused conflicts between the two, when
running the unit tests in parallel. This commit fixes that.
After the loop manager refactoring TCP DNS and TLS DNS unit tests
ended up broken.
The problem is that in these unit tests the code is written in such a
way that for establishing a new connection tcpdns_connect() and
tlsdns_connect() functions are used. However, in these tests as a
connection callback function connect_connect_cb() is used. The
function logic is responsible for determining the function for
establishing subsequent connection.
To do so, it called get_stream_connect_function() ... which can return
only tcp_connect() or tls_connect(), not tcpdns_connect() or
tlsdns_connect(). That is definitely *not* what was implied.
All this time the unit tests were testing something, but now what was
intended.
This commit fixes the problem by passing the tcpdns_connect() and
tlsdns_connect() function pointers to connect_connect_cb().
Because dns_resolver_createfetch() locks the view, it was necessary
to unlock the zone in zone_refreshkeys() before calling it in order
to maintain the lock order, and relock afterward. this permitted a race
with dns_zone_synckeyzone().
This commit moves the call to dns_resolver_createfetch() into a separate
function which is called asynchronously after the zone has been
unlocked.
The keyfetch object now attaches to the zone to ensure that
it won't be shut down before the asynchronous call completes.
This necessitated refactoring dns_zone_detach() so it always runs
unlocked. For managed zones it now schedules zone_shutdown() to
run asynchronously, and for unmanaged zones, it requires the last
dns_zone_detach() to be run without loopmgr running.
when more than one event was scheduled in the isc_aysnc queue,
they were executed in reverse order. we need to pull events
off the back of queue instead the front, so that uv_loop will
run them in the right order.
note that isc_job_run() has the same behavior, because it calls
uv_idle_start() directly. in that case we just document it so
it'll be less surprising in the future.
Instead of fixing a Coverity complaint (and other style nits),
delete it because it needs input data that can't be generated
with the tools that ship with BIND.
Currently the 'duration_test' unit test checks only the
cfg_obj_asduration() function.
Extend the test so it checks also the reverse operation using the
cfg_print_duration() function, which is used in named-checkconf.
The `render` benchmark loads some binary DNS message dumps and
repeatedly passes them to `dns_message_render`.
The `compress` benchmark loads a list of domain names and packs them
into 4KiB chunks using `dns_name_towire`.
Check that names are correctly added and deleted in the compression
context. Use many names with differing numerical prefixes to make it
relatively easy to identify and debug problems.
All we need for compression is a very small hash set of compression
offsets, because most of the information we need (the previously added
names) can be found in the message using the compression offsets.
This change combines dns_compress_find() and dns_compress_add() into
one function dns_compress_name() that both finds any existing suffix,
and adds any new prefix to the table. The old split led to performance
problems caused by duplicate names in the compression context.
Compression contexts are now either small or large, which the caller
chooses depending on the expected size of the message. There is no
dynamic resizing.
There is a behaviour change: compression now acts on all the labels in
each name, instead of just the last few.
A small benchmark suggests this is about 2x faster.
sizeof(dns_name_t) did not change but the boolean attributes are now
separated as one-bit structure members. This allows debuggers to
pretty-print dns_name_t attributes without any special hacks, plus we
got rid of manual bit manipulation code.
In ac4cc8443d, the ISC_R_CONNREFUSED was
removed in connect_read_cb, but it can actually happen in the udp_test:
[ RUN ] udp_recv_send
connect_read_cb(0x7f2c2801a270, connection refused, (nil))
The ISC_R_SHUTTINGDOWN should be handled the same as ISC_R_CANCELED in
the udp__send_cb(), as we might be sending the data while the
loopmgr/netmgr shutdown has been initiated.
In rare circumstances, the UDP port for the listening socket and the UDP
port for the connecting socket might be the same. Because we use the
"reuse" port socket option, this isn't caught when binding the socket,
and thus the connected client socket could send a datagram to itself,
completely bypassing the server. This doesn't happen under normal
operation mode because `named` is listening on a privileged port (53),
and even if not, it doesn't usually talk to itself as the tests do.
Pick an arbitrary port for listening (9153-9156) that is outside the
ephemeral port range for the network manager related unit tests (except
the `doh_test).
Since we are testing UDP on the localhost and the same interface, the
UDP datagrams can't get lost. Change the connect read callback, so it
starts reading again on the timeout instead of just getting stuck, and
fail when any other result codes than ISC_R_SUCCESS and ISC_R_TIMEDOUT
are received because we don't expect them to happen in these simple
tests.
This commit removes broken remnants of unit test slowdown logic, which
caused unit test hangs on platforms susceptible to "too many open
files" error, notably OpenBSD.
There was inconsistency in which error codes would get accepted and
ignored in the network manager unit test callbacks. Add following
results, so we just detach the handle instead of causing assertion
failure:
* ISC_R_SHUTTINGDOWN - when the network manager is shutting down
* ISC_R_CANCELED - the socket has been shut down
* ISC_R_EOF - the (TCP) communication has ended on the other side
* ISC_R_CONNECTIONRESET - the TCP connection was reset
This should fix some of the spurious unit test failures.
If sending took too long the isc_nm_read() could timeout twice, leading
to extra 'cread' counter in the udp_cancel_read test. Increase the
cread counter only on ISC_R_EOF (canceled read) and deal with the
multiple ISC_R_TIMEOUTS gracefully.
The dns_view implements weak and strong reference counting. When strong
reference counting reaches zero, the adb, ntatable and resolver objects
are shut down and detached.
In dns_zone and dns_nta the dns_view was weakly attached, but the
view->resolver reference was accessed directly leading to dereferencing
the NULL pointer.
Add dns_view_getresolver() method which attaches to view->resolver
object under the lock (if it still exists) ensuring the dns_resolver
will be kept referenced until not needed.
Previously, the isc_mem_get_aligned() and friends took alignment size as
one of the arguments. Replace the specific function with more generic
extended variant that now accepts ISC_MEM_ALIGN(alignment) for aligned
allocations and ISC_MEM_ZERO for allocations that zeroes
the (re-)allocated memory before returning the pointer to the caller.
Coverity is optimistic that we might do thousands of hashes in less
than a microsecond.
/tests/bench/siphash.c: 54 in main()
48 count++;
49 }
50
51 isc_time_now_hires(&finish);
52
53 us = isc_time_microdiff(&finish, &start);
>>> CID 358309: Integer handling issues (DIVIDE_BY_ZERO)
>>> In expression "count * 1000UL / us", division by expression "us" which may be zero has undefined behavior.
54 printf("%f us wide-lower len %3zu, %7llu kh/s (%llx)\n",
55 (double)us / 1000000.0, len,
56 (unsigned long long)(count * 1000 / us),
57 (unsigned long long)sum);
58 }
59
Formerly, the isc_hash32() would have to change the key in a local copy
to make it case insensitive. Change the isc_siphash24() and
isc_halfsiphash24() functions to lowercase the input directly when
reading it from the memory and converting the uint8_t * array to
64-bit (respectively 32-bit numbers).
dohpath is specfied in draft-ietf-add-svcb-dns and has a value
of 7. It must be a relative path (start with a /), be encoded
as UTF8 and contain the variable dns ({?dns}).
Previously, the isc_mem_debugging would be single global variable that
would affect the behavior of the memory context whenever it would be
changed which could be after some allocation were already done.
Change the memory debugging options to be local to the memory context
and immutable, so all allocations within the same memory context are
treated the same.
By bumping the minimum libuv version to 1.34.0, it allows us to remove
all libuv shims we ever had and makes the code much cleaner. The
up-to-date libuv is available in all distributions supported by BIND
9.19+ either natively or as a backport.
After the loopmgr work has been merged, we can now cleanup the TCP and
TLS protocols a little bit, because there are stronger guarantees that
the sockets will be kept on the respective loops/threads. We only need
asynchronous call for listening sockets (start, stop) and reading from
the TCP (because the isc_nm_read() might be called from read callback
again.
This commit does the following changes (they are intertwined together):
1. Cleanup most of the asynchronous events in the TCP code, and add
comments for the events that needs to be kept asynchronous.
2. Remove isc_nm_resumeread() from the netmgr API, and replace
isc_nm_resumeread() calls with existing isc_nm_read() calls.
3. Remove isc_nm_pauseread() from the netmgr API, and replace
isc_nm_pauseread() calls with a new isc_nm_read_stop() call.
4. Disable the isc_nm_cancelread() for the streaming protocols, only the
datagram-like protocols can use isc_nm_cancelread().
5. Add isc_nmhandle_close() that can be used to shutdown the socket
earlier than after the last detach. Formerly, the socket would be
closed only after all reading and sending would be finished and the
last reference would be detached. The new isc_nmhandle_close() can
be used to close the underlying socket earlier, so all the other
asynchronous calls would call their respective callbacks immediately.
Co-authored-by: Ondřej Surý <ondrej@isc.org>
Co-authored-by: Artem Boldariev <artem@isc.org>
This change prepares ground for sending DNS requests using DoT,
which, in particular, will be used for forwarding dynamic updates
to TLS-enabled primaries.
Instead of checking if we need to re-seed for every isc_random call,
seed the random number generator in the libisc global initializer
and the per-thread initializer.
In the udp_shutdown_read unit test, delay the isc_loopmgr_shutdown() to
the send callback, and in the udp_cancel_read test wait for a single
timed out test, then read again, send an UDP packet and cancel the read
from the send callback.