Commit Graph

10682 Commits

Author SHA1 Message Date
Petr Špaček
d26d4f289f Generate JUnit reports for unit & system tests
This allows Gitlab to show nice summary for individual tests/test
directories and to expose the results in Gitlab API for consumption
elsewhere.

A catch: As of Gitlab 14.7.7, the detailed results are stored
only in artifacts and thus expire. All consumers (including API) need
to be "fast enough" to get the data before they disappear.
This also forces us to always store the artifacts intead of storing them
only on failure.
2022-04-06 21:14:38 +02:00
Tony Finch
71ce8b0a51 Ensure that dns_request_createvia() has a retry limit
There are a couple of problems with dns_request_createvia(): a UDP
retry count of zero means unlimited retries (it should mean no
retries), and the overall request timeout is not enforced. The
combination of these bugs means that requests can be retried forever.

This change alters calls to dns_request_createvia() to avoid the
infinite retry bug by providing an explicit retry count. Previously,
the calls specified infinite retries and relied on the limit implied
by the overall request timeout and the UDP timeout (which did not work
because the overall timeout is not enforced). The `udpretries`
argument is also changed to be the number of retries; previously, zero
was interpreted as infinity because of an underflow to UINT_MAX, which
appeared to be a mistake. And `mdig` is updated to match the change in
retry accounting.

The bug could be triggered by zone maintenance queries, including
NOTIFY messages, DS parental checks, refresh SOA queries and stub zone
nameserver lookups. It could also occur with `nsupdate -r 0`.
(But `mdig` had its own code to avoid the bug.)
2022-04-06 17:12:48 +01:00
Tony Finch
5867c1b727 Make notify test shellcheck clean
Use POSIX shell syntax, and use functions to reduce repetition.
2022-04-06 17:12:08 +01:00
Artem Boldariev
8bec4a6bf6 Extend the doth system test
This commit adds simple checks that the TLS contexts in question are
indeed being updated on DoT and DoH listeners.
2022-04-06 18:45:57 +03:00
Ondřej Surý
7e71c4d0cc Rename the configuration option to load balance sockets to reuseport
After some back and forth, it was decidede to match the configuration
option with unbound ("so-reuseport"), PowerDNS ("reuseport") and/or
nginx ("reuseport").
2022-04-06 17:03:57 +02:00
Aram Sargsyan
5b2b3e589c Fix using unset pointer when printing a debug message in dighost.c
The used `query->handle` is always `NULL` at this point.

Change the code to use `handle` instead.
2022-04-05 11:20:42 +00:00
Aram Sargsyan
2771a5b64d Add a missing clear_current_lookup() call in recv_done()
The error code path handling the `ISC_R_CANCELED` code lacks a
`clear_current_lookup()` call, without which dig hangs indefinitely
when handling the error.

Add the missing call to account for all references of the lookup so
it can be destroyed.
2022-04-05 11:20:42 +00:00
Aram Sargsyan
f831e758d1 When using +qr in dig print the data of the current query
In `send_udp()` and `launch_next_query()` functions, when calling
`dighost_printmessage()` to print detailed information about the
sent query, dig always prints the data of the first query in the
lookup's queries list.

The first query in the list can be already finished, having its handles
freed, and accessing this information results in assertion failure.

Print the current query's information instead.
2022-04-05 11:20:41 +00:00
Mark Andrews
56fbed2f0f Add regression test for CVE-2022-0635 2022-04-05 09:54:45 +02:00
Mark Andrews
9ef4d2b583 Use multiple fixed expressions for portable grep usage
Additionally add "network unreachable" as an expected error message.
2022-04-05 03:55:13 +00:00
Ondřej Surý
142c63dda8 Enable the load-balance-sockets configuration
Previously, HAVE_SO_REUSEPORT_LB has been defined only in the private
netmgr-int.h header file, making the configuration of load balanced
sockets inoperable.

Move the missing HAVE_SO_REUSEPORT_LB define the isc/netmgr.h and add
missing isc_nm_getloadbalancesockets() implementation.
2022-04-05 01:30:58 +02:00
Ondřej Surý
85c6e797aa Add option to configure load balance sockets
Previously, the option to enable kernel load balancing of the sockets
was always enabled when supported by the operating system (SO_REUSEPORT
on Linux and SO_REUSEPORT_LB on FreeBSD).

It was reported that in scenarios where the networking threads are also
responsible for processing long-running tasks (like RPZ processing, CATZ
processing or large zone transfers), this could lead to intermitten
brownouts for some clients, because the thread assigned by the operating
system might be busy.  In such scenarious, the overall performance would
be better served by threads competing over the sockets because the idle
threads can pick up the incoming traffic.

Add new configuration option (`load-balance-sockets`) to allow enabling
or disabling the load balancing of the sockets.
2022-04-04 23:10:04 +02:00
Aram Sargsyan
7e2f50c369 Fix dig hanging issue in cases when the lookup's next query can't start
In recv_done(), when dig decides to start the lookup's next query in
the line using `start_udp()` or `start_tcp()`, and for some reason,
no queries get started, dig doesn't cancel the lookup.

This can occur, for example, when there are two queries in the lookup,
one with a regular IP address, and another with a IPv4 mapped IPv6
address. When the regular IP address fails to serve the query, its
`recv_done()` callback starts the next query in the line (in this
case the one with a mapped IP address), but because `dig` doesn't
connect to such IP addresses, and there are no other queries in the
list, no new queries are being started, and the lookup keeps hanging.

After calling `start_udp()` or `start_tcp()` in `recv_done()`, check
if there are no pending/working queries then cancel the lookup instead
of only detaching from the current query.
2022-04-04 09:15:56 +00:00
Ondřej Surý
ae01ec2823 Don't use reference counting in isc_timer unit
The reference counting and isc_timer_attach()/isc_timer_detach()
semantic are actually misleading because it cannot be used under normal
conditions.  The usual conditions under which is timer used uses the
object where timer is used as argument to the "timer" itself.  This
means that when the caller is using `isc_timer_detach()` it needs the
timer to stop and the isc_timer_detach() does that only if this would be
the last reference.  Unfortunately, this also means that if the timer is
attached elsewhere and the timer is fired it will most likely be
use-after-free, because the object used in the timer no longer exists.

Remove the reference counting from the isc_timer unit, remove
isc_timer_attach() function and rename isc_timer_detach() to
isc_timer_destroy() to better reflect how the API needs to be used.

The only caveat is that the already executed event must be destroyed
before the isc_timer_destroy() is called because the timer is no longet
attached to .ev_destroy_arg.
2022-04-02 01:23:15 +02:00
Ondřej Surý
30e0fd942b Remove task privileged mode
Previously, the task privileged mode has been used only when the named
was starting up and loading the zones from the disk as the "first" thing
to do.  The privileged task was setup with quantum == 2, which made the
taskmgr/netmgr spin around the privileged queue processing two events at
the time.

The same effect can be achieved by setting the quantum to UINT_MAX (e.g.
practically unlimited) for the loadzone task, hence the privileged task
mode was removed in favor of just processing all the events on the
loadzone task in a single task_run().
2022-04-01 23:55:26 +02:00
Ondřej Surý
2bc7303af2 Use isc_nm_getnworkers to manage zone resources
Instead of passing the number of worker to the dns_zonemgr manually,
get the number of nm threads using the new isc_nm_getnworkers() call.

Additionally, remove the isc_pool API and manage the array of memory
context, zonetasks and loadtasks directly in the zonemgr.
2022-04-01 23:50:34 +02:00
Ondřej Surý
2707d0eeb7 Set hard thread affinity for each zone
After switching to per-thread resources in the zonemgr, the performance
was decreased because the memory context, zonetask and loadtask was
picked from the pool at random.

Pin the zone to single threadid (.tid) and align the memory context,
zonetask and loadtask to be the same, this sets the hard affinity of the
zone to the netmgr thread.
2022-04-01 23:50:34 +02:00
Ondřej Surý
8b4ba366dd Remove the zone counting in the named
The zone counting in the named was used to properly size the zonemgr
resources (memory contexts, zonetasks and loadtasks).  Since this is no
longer the case, remove the whole zone counting from named.
2022-04-01 23:50:34 +02:00
Ondřej Surý
a94678ff77 Create per-thread task and memory context for zonemgr
Previously, the zonemgr created 1 task per 100 zones and 1 memory
context per 1000 zones (with minimum 10 tasks and 2 memory contexts) to
reduce the contention between threads.

Instead of reducing the contention by having many resources, create a
per-nm_thread memory context, loadtask and zonetask and spread the zones
between just per-thread resources.

Note: this commit alone does decrease performance when loading the zone
by couple seconds (in case of 1M zone) and thus there's more work in
this whole MR fixing the performance.
2022-04-01 23:50:34 +02:00
Evan Hunt
5319d8adea fix resolver test when built without --enable-querytrace
a test case in the 'resolver' system test was reliant on
logged output that would only be present when query tracing
was enabled, as in developer builds. that test case is now
disabled when query tracing is not available. Thanks to
Anton Castelli.
2022-04-01 09:54:44 -07:00
Aram Sargsyan
4477f71868 Synchronze udp_ready() and tcp_connected() functions entry behavior
The `udp_ready()` and `tcp_connected()` functions in dighost.c are
used for similar purposes for UDP and TCP respectively.

Synchronize the `udp_ready()` function entry code to behave like
`tcp_connected()` by adding input validation, debug messages and
early exit code when `cancel_now` is `true`.
2022-04-01 10:56:27 +00:00
Aram Sargsyan
7d360bd05e Fix "dig +nssearch" indefinitely hanging issue
When finishing the NSSEARCH task and there is no more followup
lookups to start, dig does not destroy the last lookup, which
causes it to hang indefinitely.

Rename the unused `first_pass` member of `dig_query_t` to `started`
and make it `true` in the first callback after `start_udp()` or
`start_tcp()` of the query to indicate that the query has been
started.

Create a new `check_if_queries_done()` function to check whether
all of the queries inside a lookup have been started and finished,
or canceled.

Use the mentioned function in the TRACE code block in `recv_done()`
to check whether the current query is the last one in the lookup and
cancel the lookup in that case to free the resources.
2022-04-01 10:56:27 +00:00
Evan Hunt
bd814b79d4 add a system test for $GENERATE with an integer overflow
the line "$GENERATE 19-28/2147483645 $ CNAME x" should generate
a single CNAME with the owner "19.example.com", but prior to the
overflow bug it generated several CNAMEs, half of them with large
negative values.

we now test for the bugfix by using "named-checkzone -D" and
grepping for a single CNAME in the output.
2022-04-01 07:56:52 +00:00
Evan Hunt
2261c853b5 update shell syntax
clean up the shell syntax in the checkzone test prior to adding
a new test.
2022-04-01 07:56:52 +00:00
Tony Finch
84c4eb02e7 Log "not authoritative for update zone" more clearly
Ensure the update zone name is mentioned in the NOTAUTH error message
in the server log, so that it is easier to track down problematic
update clients. There are two cases: either the update zone is
unrelated to any of the server's zones (previously no zone was
mentioned); or the update zone is a subdomain of one or more of the
server's zones (previously the name of the irrelevant parent zone was
misleadingly logged).

Closes #3209
2022-03-30 12:50:30 +01:00
Ondřej Surý
3a650d973f Remove isc_appctx_t use in dns_client
The use of isc_appctx_t in dns_client was used to wait for
dns_client_startresolve() to finish the processing (the resolve_done()
task callback).

This has been replaced with standard bool+cond+lock combination removing
the need of isc_appctx_t altogether.
2022-03-29 14:14:49 -07:00
Tony Finch
29a3e77425 MacOS needs more IP addresses to run the system tests
The launchd script only counted up to 8 whereas ifconfig.sh went all
the way up to 10, and even a bit further than that.
2022-03-29 16:59:19 +01:00
Ondřej Surý
b05a991ad0 Make isc_ht optionally case insensitive
Previously, the isc_ht API would always take the key as a literal input
to the hashing function.  Change the isc_ht_init() function to take an
'options' argument, in which ISC_HT_CASE_SENSITIVE or _INSENSITIVE can
be specified, to determine whether to use case-sensitive hashing in
isc_hash32() when hashing the key.
2022-03-28 15:02:18 -07:00
Ondřej Surý
4dceab142d Consistenly use UNREACHABLE() instead of ISC_UNREACHABLE()
In couple places, we have missed INSIST(0) or ISC_UNREACHABLE()
replacement on some branches with UNREACHABLE().  Replace all
ISC_UNREACHABLE() or INSIST(0) calls with UNREACHABLE().
2022-03-28 23:26:08 +02:00
Artem Boldariev
cfea9a3aec Extend the 'doth' system test with Strict/Mutual TLS checks
This commit extends the 'doth' system test with a set of Strict/Mutual
TLS related checks.

This commit also makes each doth NS instance use its own TLS
certificate that includes FQDN, IPv4, and IPv6 addresses, issued using
a common Certificate Authority, instead of ad-hoc certs.

Extend servers initialisation timeout to 60 seconds to improve the
tests stability in the CI as certain configurations could fail to
initialise on time under load.
2022-03-28 16:22:53 +03:00
Artem Boldariev
7b9318bf72 Add missing plain HTTP options to dig's help output
A couple of dig options were missing in the help output, while been
properly documented and supported. This commit fixes this overlook.
2022-03-28 16:22:53 +03:00
Artem Boldariev
57f0251713 Add support for Strict/Mutual TLS into BIND
This commit adds support for Strict/Mutual TLS into BIND. It does so
by implementing the backing code for 'hostname' and 'ca-file' options
of the 'tls' statement. The commit also updates the documentation
accordingly.
2022-03-28 16:22:53 +03:00
Artem Boldariev
89d7059103 Restore disabled unused 'tls' options: 'ca-file' and 'hostname'
This commit restores the 'tls' options disabled in
78b73d0865.
2022-03-28 16:22:53 +03:00
Artem Boldariev
fd38a4e1bf Add support for Strict/Mutual TLS to dig
This commit adds support for Strict/Mutual TLS to dig.

The new command-line options and their behaviour are modelled after
kdig (+tls-ca, +tls-hostname, +tls-certfile, +tls-keyfile) for
compatibility reasons. That is, using +tls-* is sufficient to enable
DoT in dig, implying +tls-ca

If there is no other DNS transport specified via command-line,
specifying any of +tls-* options makes dig use DoT. In this case, its
behaviour is the same as if +tls-ca is specified: that is, the remote
peer's certificate is verified using the platform-specific
intermediate CA certificates store. This behaviour is introduced for
compatibility with kdig.
2022-03-28 16:22:53 +03:00
Aram Sargsyan
9b84bfb5f4 Cleanup the code to remove unnecessary indentation
Because of the "goto" in the "if" body the "else" part is unnecessary
and adds another level of indentation.

Cleanup the code to not have the "else" part.
2022-03-28 10:17:56 +00:00
Aram Sargsyan
d29e5f197b Log a warning when catz is told to modify a zone not added by catz
Catz logs a warning message when it is told to modify a zone which was
not added by the current catalog zone.

When logging a warning, distinguish the two cases when the zone
was not added by a catalog zone at all, and when the zone was
added by a different catalog zone.
2022-03-28 10:17:56 +00:00
Aram Sargsyan
e861224cf4 Fix invalid function name in the error log
The current function's name in one of the error logs in
catz_addmodzone_taskaction() function is invalid.

Fix the name.
2022-03-28 10:17:56 +00:00
Tony Finch
c38a323082 Teach dnssec-settime to read times that it writes
The dnssec-settime -p and -up options print times in asctime() and
UNIX time_t formats, respectively. The asctime() format can also be
found inside K*.key public key files. Key files also contain times in
the YYYYMMDDHHMMSS format that can be used in timing parameter
options.

The dnssec-settime -p and -up time formats are now acceptable in
timing parameter options to dnssec-settime and dnssec-keygen, so it is
no longer necessary to parse key files to retrieve times that are
acceptable in timing parameter options.
2022-03-25 16:05:43 +01:00
Ondřej Surý
9de10cd153 Remove extrahandle size from netmgr
Previously, it was possible to assign a bit of memory space in the
nmhandle to store the client data.  This was complicated and prevents
further refactoring of isc_nmhandle_t caching (future work).

Instead of caching the data in the nmhandle, allocate the hot-path
ns_client_t objects from per-thread clientmgr memory context and just
assign it to the isc_nmhandle_t via isc_nmhandle_set().
2022-03-25 10:38:35 +01:00
Ondřej Surý
81fdc4a822 Remove UNREACHABLE() statements after exit()
Couple of UNREACHABLE() statements following exit() were found and
removed.
2022-03-25 09:25:11 +01:00
Ondřej Surý
20f0936cf2 Remove use of the inline keyword used as suggestion to compiler
Historically, the inline keyword was a strong suggestion to the compiler
that it should inline the function marked inline.  As compilers became
better at optimising, this functionality has receded, and using inline
as a suggestion to inline a function is obsolete.  The compiler will
happily ignore it and inline something else entirely if it finds that's
a better optimisation.

Therefore, remove all the occurences of the inline keyword with static
functions inside single compilation unit and leave the decision whether
to inline a function or not entirely on the compiler

NOTE: We keep the usage the inline keyword when the purpose is to change
the linkage behaviour.
2022-03-25 08:33:43 +01:00
Ondřej Surý
04d0b70ba2 Replace ISC_NORETURN with C11's noreturn
C11 has builtin support for _Noreturn function specifier with
convenience noreturn macro defined in <stdnoreturn.h> header.

Replace ISC_NORETURN macro by C11 noreturn with fallback to
__attribute__((noreturn)) if the C11 support is not complete.
2022-03-25 08:33:43 +01:00
Ondřej Surý
584f0d7a7e Simplify way we tag unreachable code with only ISC_UNREACHABLE()
Previously, the unreachable code paths would have to be tagged with:

    INSIST(0);
    ISC_UNREACHABLE();

There was also older parts of the code that used comment annotation:

    /* NOTREACHED */

Unify the handling of unreachable code paths to just use:

    UNREACHABLE();

The UNREACHABLE() macro now asserts when reached and also uses
__builtin_unreachable(); when such builtin is available in the compiler.
2022-03-25 08:33:43 +01:00
Ondřej Surý
fe7ce629f4 Add FALLTHROUGH macro for __attribute__((fallthrough))
Gcc 7+ and Clang 10+ have implemented __attribute__((fallthrough)) which
is explicit version of the /* FALLTHROUGH */ comment we are currently
using.

Add and apply FALLTHROUGH macro that uses the attribute if available,
but does nothing on older compilers.

In one case (lib/dns/zone.c), using the macro revealed that we were
using the /* FALLTHROUGH */ comment in wrong place, remove that comment.
2022-03-25 08:33:43 +01:00
Matthijs Mekking
d6d107d804 Save keyfromlabel error output
Save the error output from pkcs11-tool and dnssec-keyfromlabel in the
engine_pkcs11 system test.
2022-03-21 10:11:02 +01:00
Ondřej Surý
d70daa29f7 Make netmgr the authority on number of threads running
Instead of passing the "workers" variable back and forth along with
passing the single isc_nm_t instance, add isc_nm_getnworkers() function
that returns the number of netmgr threads are running.

Change the ns_interfacemgr and ns_taskmgr to utilize the newly acquired
knowledge.
2022-03-18 21:53:28 +01:00
Tony Finch
599c1d2a6b Avoid using C99 variable length arrays
From an attacker's point of view, a VLA declaration is essentially a
primitive for performing arbitrary arithmetic on the stack pointer. If
the attacker can control the size of a VLA they have a very powerful
tool for causing memory corruption.

To mitigate this kind of attack, and the more general class of stack
clash vulnerabilities, C compilers insert extra code when allocating a
VLA to probe the growing stack one page at a time. If these probes hit
the stack guard page, the program will crash.

From the point of view of a C programmer, there are a few things to
consider about VLAs:

  * If it is important to handle allocation failures in a controlled
    manner, don't use VLAs. You can use VLAs if it is OK for
    unreasonable inputs to cause an uncontrolled crash.

  * If the VLA is known to be smaller than some known fixed size,
    use a fixed size array and a run-time check to ensure it is large
    enough. This will be more efficient than the compiler's stack
    probes that need to cope with arbitrary-size VLAs.

  * If the VLA might be large, allocate it on the heap. The heap
    allocator can allocate multiple pages in one shot, whereas the
    stack clash probes work one page at a time.

Most of the existing uses of VLAs in BIND are in test code where they
are benign, but there was one instance in `named`, in the GSS-TSIG
verification code, which has now been removed.

This commit adjusts the style guide and the C compiler flags to allow
VLAs in test code but not elsewhere.
2022-03-18 15:11:48 +00:00
Aram Sargsyan
03697f1bcc Add various dig/host tests for TCP/UDP socket error handling cases
Rework the "ans8" server in the "digdelv" system test to support various
modes of operations using a control channel.

The supported modes are:

1. `silent` (do not respond)
2. `close` (UDP: same as `silent`; TCP: also close the connection)
3. `servfail` (always respond with `SERVFAIL`)
4. `unstable` (constantly switch between `silent` and `servfail`)

Add multiple tests to check the handling of both TCP and UDP socket
error scenarios in dig/host.
2022-03-18 10:28:19 +00:00
Aram Sargsyan
0fb4fc1897 Fix dig error when trying the next server after a TCP connection failure
When encountering a TCP connection error while trying to initiate a
connection to a server, dig erroneously cancels the lookup even when
there are other server(s) to try, which results in an assertion failure.

Cancel the lookup only when there are no more queries left in the
lookup's queries list (i.e. `next` is NULL).
2022-03-18 10:28:19 +00:00
Aram Sargsyan
e8a64d0cbe Add digdelv system test to check that dig tries other servers on error
Add a test to check whether dig tries the next query/server after
a connection error.

Add a test to check whether dig tries the next query/server after
a one or more (default is 3) connection/request timeouts.
2022-03-18 09:12:23 +00:00