In the NSSEARCH followup lookup, when one of the queries fails to be
set up (UDP) or connected (TCP), DiG doesn't start the next query.
This is a mistake, because in NSSEARCH mode the queries are independent
and DiG shouldn't stop the lookup process just because setting up (or
connecting to) one of the name servers returns an error code in the
`udp_ready()` or `tcp_connected()` callbacks.
Write a new `nssearch_next()` function which takes care of starting the
next query in NSSEARCH mode, so it can be used in several places without
code repetition.
Make sure that the `udp_ready()` and `tcp_connected()` functions call
`nssearch_next()` in case they won't be calling `send_udp()` and
`send_tcp()` respectively, because in that case the `send_done()`
callback, which usually does the job, won't be called.
Refactor `send_done()` to use the newly written `nssearch_next()`
function.
In the NSSEARCH followup lookup, when one of the queries fails to be
sent, DiG doesn't start the next query. This is a mistake, because in
NSSEARCH mode the queries are independent and DiG shouldn't stop the
lookup process just because sending a query to one of the name servers
returns an error code.
Restructure the `send_done()` function to unconditionally send the next
query in NSSEARCH mode, if it exists.
DiG implements different logic in the `recv_done()` callback function
when processing a failure:
1. For a timed-out query it applies the "retries" logic first, then,
when it fails, fail-overs to the next server.
2. For an EOF (end-of-file, or unexpected disconnect) error it tries to
make a single retry attempt (even if the user has requested more
retries), then, when it fails, fail-overs to the next server.
3. For other types of failures, DiG does not apply the "retries" logic,
and tries to fail-over to the next servers (again, even if the user
has requested to make retries).
Simplify the logic and apply the same logic (1) of first retries, and
then fail-over, for different types of failures in `recv_done()`.
When the `send_done()` callback function gets called with a failure
result code, DiG erroneously cancels the lookup.
Stop canceling the lookup and give DiG a chance to retry the failed
query, or fail-over to another server, using the logic implemented in
the `recv_done()` callback function.
When the `udp_ready()` callback function gets called with a failure
result code, DiG erroneously cancels the lookup.
Copy the logic behind `tcp_connected()` callback function into
`udp_ready()` so that DiG will now retry the failed query (if retries
are enabled) and then, if it fails again, it will fail-over to the next
server in the list, which synchronizes the behavior between TCP and UDP
modes.
Also, `udp_ready()` was calling `lookup_detach()` without calling
`lookup_attach()` first, but the issue was masked behind the fact
that `clear_current_lookup()` wasn't being called when needed, and
`lookup_detach()` was compensating for that. This also has been fixed.
Instead of returning error values from isc_rwlock_*(), isc_mutex_*(),
and isc_condition_*() macros/functions and subsequently carrying out
runtime assertion checks on the return values in the calling code,
trigger assertion failures directly in those macros/functions whenever
any pthread function returns an error, as there is no point in
continuing execution in such a case anyway.
In special NS search mode, after the initial lookup, dig starts the
followup lookup with discovered NS servers in the queries list. If one
of those queries then fail, dig, as usual, tries to start the next query
in the list, which results in a crash, because the NS search mode is
special in a way that the queries are running in parallel, so the next
query is usually already started.
Apply some special logic in `recv_done()` function to deal with the
described situation when handling the query result for the NS search
mode. Particularly, print a warning message for the failed query,
and do not try to start the next query in the list. Also, set a non-zero
exit code if all the queries in the followup lookup fail.
Fix an issue reported by Coverity by removing the unneded check.
*** CID 352554: Null pointer dereferences (REVERSE_INULL)
/bin/dig/dighost.c: 3056 in start_tcp()
3050
3051 if (ISC_LINK_LINKED(query, link)) {
3052 next = ISC_LIST_NEXT(query, link);
3053 } else {
3054 next = NULL;
3055 }
>>> CID 352554: Null pointer dereferences (REVERSE_INULL)
>>> Null-checking "connectquery" suggests that it may be null, but it
has already been dereferenced on all paths leading to the check.
3056 if (connectquery != NULL) {
3057 query_detach(&connectquery);
3058 }
3059 query_detach(&query);
3060 if (next == NULL) {
3061 clear_current_lookup();
There was a proposal in the late 1990s that it might, but it turned
out to be unworkable. See RFC 6891, Extension Mechanisms for
DNS (EDNS(0)), section 5, Extended Label Types.
The remnants of the code that supported this in BIND are redundant.
Previously, tasks could be created either unbound or bound to a specific
thread (worker loop). The unbound tasks would be assigned to a random
thread every time isc_task_send() was called. Because there's no logic
that would assign the task to the least busy worker, this just creates
unpredictability. Instead of random assignment, bind all the previously
unbound tasks to worker 0, which is guaranteed to exist.
This commit removes dead code from cleanup handling part of the
get_create_tls_context().
In particular, currently:
* there is no way 'found_ctx' might equal 'ctx';
* there is no way 'session_cache' might equal a non-NULL value while
cleaning up after a TLS initialisation error.
This commit ensures that isc_nm_cancelread() is not called from within
dig code for HTTP sockets, as these lack its implementation.
It does not have much sense to have it due to transactional nature of
HTTP.
Every HTTP request-response pair is represented by a virtual socket,
where read callback is called only when full DNS message is received
or when an error code is being passed there. That is, there is nothing
to cancel at the time of the call.
This commit extends TLS context cache with TLS client session cache so
that an associated session cache can be stored alongside the TLS
context within the context cache.
The dns_message_gettempname(), dns_message_gettemprdata(),
dns_message_gettemprdataset(), and dns_message_gettemprdatalist() always
succeeds because the memory allocation cannot fail now. Change the API
to return void and cleanup all the use of aforementioned functions.
3034 next = ISC_LIST_NEXT(query, link);
3035 } else {
3036 next = NULL;
3037 }
CID 352554 (#1 of 1): Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking connectquery suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
3038 if (connectquery != NULL) {
3039 query_detach(&connectquery);
3040 }
- var_decl: Declaring variable "tbuf" without initializer
- assign: Assigning: "target.base" = "tbuf", which points to
uninitialized data
- assign: Assigning: "r.base" = "target.base", which points to
uninitialized data
I expect it would correctly initialize length always. Add simple
initialization to silent coverity.
There was a query_detach() call missing in dig, which could lead to
dig hanging on TLS context creation errors. This commit fixes.
The error was introduced because the Strict TLS implementation was
initially made over an older version of the code, where this extra
query_detach() call was not needed.
Man pages for dig/mdig/delv used `.. option:: +[no]bla` to describe two
options at once, and very old Sphinx does not support that [] in option
names.
Solution is to split negative and positive options into `+bla, +nobla`
form. In the end it improves readability because it transforms hard to
read strings with double brackets from
`+[no]subnet=addr[/prefix-length]` to
`+subnet=addr[/prefix-length], +nosubnet`.
As a side-effect it also allows easier linking to dig/mdig/delv options
using their name directly instead of always overriding the link target
to `+[no]bla` form.
Transformation was done using regex:
s/:: +\[no\]\(.*\)/:: +\1, +no\1
... and manual review around occurences matching regex
+no.*=
Fixes: #3301
dig previously set an exit code of 9 when a TCP connection failed
or when a UDP connection timed out, but when the server address is
localhost it's possible for a UDP query to fail with ISC_R_CONNREFUSED.
that code path didn't update the exit code, causing dig to exit with
status 0. we now set the exit code to 9 in this failure case.
In `+nssearch` mode `dig` starts the next query of the followup lookup
using `start_udp()` or `start_tcp()` calls without waiting for the
previous query to complete.
In UDP mode that happens in the `send_done()` callback of the previous
query, but in TCP mode that happens in the `start_tcp()` call of the
previous query (recursion) which doesn't work because `start_tcp()`
attaches the `lookup->current_query` to the query it is starting, so a
recursive call will result in an assertion failure.
Make the TCP mode to start the next query in `send_done()`, just like in
the UDP mode. During that time the `lookup->current_query` is already
detached by the `tcp_connected()`/`udp_ready()` callbacks.
The error code path handling the `ISC_R_CANCELED` code lacks a
`clear_current_lookup()` call, without which dig hangs indefinitely
when handling the error.
Add the missing call to account for all references of the lookup so
it can be destroyed.
In `send_udp()` and `launch_next_query()` functions, when calling
`dighost_printmessage()` to print detailed information about the
sent query, dig always prints the data of the first query in the
lookup's queries list.
The first query in the list can be already finished, having its handles
freed, and accessing this information results in assertion failure.
Print the current query's information instead.
In recv_done(), when dig decides to start the lookup's next query in
the line using `start_udp()` or `start_tcp()`, and for some reason,
no queries get started, dig doesn't cancel the lookup.
This can occur, for example, when there are two queries in the lookup,
one with a regular IP address, and another with a IPv4 mapped IPv6
address. When the regular IP address fails to serve the query, its
`recv_done()` callback starts the next query in the line (in this
case the one with a mapped IP address), but because `dig` doesn't
connect to such IP addresses, and there are no other queries in the
list, no new queries are being started, and the lookup keeps hanging.
After calling `start_udp()` or `start_tcp()` in `recv_done()`, check
if there are no pending/working queries then cancel the lookup instead
of only detaching from the current query.
The `udp_ready()` and `tcp_connected()` functions in dighost.c are
used for similar purposes for UDP and TCP respectively.
Synchronize the `udp_ready()` function entry code to behave like
`tcp_connected()` by adding input validation, debug messages and
early exit code when `cancel_now` is `true`.
When finishing the NSSEARCH task and there is no more followup
lookups to start, dig does not destroy the last lookup, which
causes it to hang indefinitely.
Rename the unused `first_pass` member of `dig_query_t` to `started`
and make it `true` in the first callback after `start_udp()` or
`start_tcp()` of the query to indicate that the query has been
started.
Create a new `check_if_queries_done()` function to check whether
all of the queries inside a lookup have been started and finished,
or canceled.
Use the mentioned function in the TRACE code block in `recv_done()`
to check whether the current query is the last one in the lookup and
cancel the lookup in that case to free the resources.
This commit adds support for Strict/Mutual TLS to dig.
The new command-line options and their behaviour are modelled after
kdig (+tls-ca, +tls-hostname, +tls-certfile, +tls-keyfile) for
compatibility reasons. That is, using +tls-* is sufficient to enable
DoT in dig, implying +tls-ca
If there is no other DNS transport specified via command-line,
specifying any of +tls-* options makes dig use DoT. In this case, its
behaviour is the same as if +tls-ca is specified: that is, the remote
peer's certificate is verified using the platform-specific
intermediate CA certificates store. This behaviour is introduced for
compatibility with kdig.
Previously, it was possible to assign a bit of memory space in the
nmhandle to store the client data. This was complicated and prevents
further refactoring of isc_nmhandle_t caching (future work).
Instead of caching the data in the nmhandle, allocate the hot-path
ns_client_t objects from per-thread clientmgr memory context and just
assign it to the isc_nmhandle_t via isc_nmhandle_set().
C11 has builtin support for _Noreturn function specifier with
convenience noreturn macro defined in <stdnoreturn.h> header.
Replace ISC_NORETURN macro by C11 noreturn with fallback to
__attribute__((noreturn)) if the C11 support is not complete.
Previously, the unreachable code paths would have to be tagged with:
INSIST(0);
ISC_UNREACHABLE();
There was also older parts of the code that used comment annotation:
/* NOTREACHED */
Unify the handling of unreachable code paths to just use:
UNREACHABLE();
The UNREACHABLE() macro now asserts when reached and also uses
__builtin_unreachable(); when such builtin is available in the compiler.
Gcc 7+ and Clang 10+ have implemented __attribute__((fallthrough)) which
is explicit version of the /* FALLTHROUGH */ comment we are currently
using.
Add and apply FALLTHROUGH macro that uses the attribute if available,
but does nothing on older compilers.
In one case (lib/dns/zone.c), using the macro revealed that we were
using the /* FALLTHROUGH */ comment in wrong place, remove that comment.
When encountering a TCP connection error while trying to initiate a
connection to a server, dig erroneously cancels the lookup even when
there are other server(s) to try, which results in an assertion failure.
Cancel the lookup only when there are no more queries left in the
lookup's queries list (i.e. `next` is NULL).
When timing-out or having other types of socket errors during a query,
dig isn't trying to perform the lookup using other servers which exist
in the lookup's queries list.
After configured amount of timeout retries, or after a socket error,
check if there are other queries/servers in the lookup's queries list,
and start the next one if it exists, instead of unconditionally failing.
When a query times out, and `dig` (or `host`) creates a new query
to resend the request, it is being prepended to the lookup's queries
list, which can cause a confusion later, making `dig` (or `host`)
believe that there is another new query in the list, but that is
actually the old one, which was timed out. That mistake will result
in an assertion failure.
That can happen, in particular, when after a timed out request,
the retried request returns a SERVFAIL result, and the recursion
is enabled, and `+nofail` option was used with `dig` (that is the
default behavior in `host`, unless the `-s` option is provided).
Fix the problem by inserting the query just after the current,
timed-out query, instead of prepending to the list.
Before calling start_udp() detach `l->current_query`, like it is
done in another place in the function.
Slightly update a couple of debug messages to make them more
consistent.
After a query results in a SERVFAIL result, and there is another
registered query in the lookup's queries list, `dig` starts the next
query to try another server, but for some reason, reports about that
also when the current query is in the head of the list, even if there
is no other query in the list to try.
Use the same condition for both decisions, and after starting the next
query, jump to the "detach_query" label instead of "next_lookup",
because there is no need to start the next lookup after we just started
a query in the current lookup.
The clang-format-15 has new option InsertBraces that could add missing
branches around single line statements. Use that to our advantage
without switching to not-yet-released LLVM version to add missing braces
in couple of places.
Replace :manpage: with :iscman: to generate internal hyperlinks. That
way reader can use links even when offline, and jumps to man pages
for the same version.
Formerly HTML version of man pages did not have links in See Also
section because :manpage: role in Sphinx can generate only external
hyperlinks - and we do not have that enabled.
Enabling the Sphinx :manpage: linking could reliably create hyperlinks
only to external URLs, but that would take users to another version
of docs.
Generated by:
find bin -name '*.rst' | xargs sed -i -e 's/:manpage:`\([^(]\+\)(\([0-9]\))`/:iscman:`\1(\2) <\1>`/g'
+ hand-edit to revert change for mmencode reference which is
not provided in our source tree.
Use the new role :iscman: to replace all occurences or ``binary``
with :iscman:`binary`, creating a hyperlink to the manual page.
Generated using:
find bin -name *.rst | xargs fgrep --files-with-matches '.. iscman' | xargs -I{} -n1 basename {} .rst > /tmp/progs
for PROG in $(cat /tmp/progs); do find -name '*.rst' | xargs sed -i -e "s/\`\`$PROG\`\`/:iscman:\`$PROG\`/g"; done
Additional hand-edits were done mainly around filter-aaaa and
filter-a which are program names and and option names at the
same time. Couple more edits was neede to fix .rst syntax broken by
automatic replacement.
Sphinx has it's own :program: syntax for refering to program names.
Use it for self-references in manual pages. These self-references are
not clickable and not as eye-cathing as links, which is a good thing.
There is no point in attracting attention to ``dig`` several times on a
single page dedicated to dig itself.
Substituted automatically using:
find bin -name *.rst | xargs fgrep --files-with-matches '.. program' | xargs -n1 bash /tmp/repl.sh
With /tmp/repl.sh being:
BASE=$(basename "$1" .rst)
sed -i -e "s/\`\`$BASE\`\`/:program:\`$BASE\`/g" "$1"
The new directive and role "iscman" allow to tag & reference man pages in
our source tree. Essentially it is just namespacing for ISC man pages,
but it comes with couple benefits.
Differences from .. _man_program label we formerly used:
- Does not expand :ref:`man_program` into full text of the page header.
- Generates index entry with category "manual page".
- Rendering style is closer to ubiquitous to the one produced
by ``named`` syntax.
Differences from Sphinx built-in :manpage: role:
- Supports all builders with support for cross-references.
- Generates internal links (unlike :manpage: which generates external
URLs).
- Checks that target exists withing our source tree.
Side-effect of hyperlinking is that typos in program and option names
are now detected by Sphinx.
Candidate -options were detected using:
find -name *.rst | xargs grep '``-[^`]'
and then modified from ``-o`` to :option:`-o` using regex
s/``\(-[^`]\+\)``/:option:`\1`/
+ manual modifications where necessary.
Non-hyphenated options were detected by looking at context around
program names:
find bin -name *.rst | xargs -I{} -n1 basename {} .rst | sort -u
and grepping for program name with trailing whitespace.
Stand-alone program names like ``named`` are not hyperlinked in this
commit.
The markup allows referencing individual options, and also makes them
more legible (no more thin red text on gray background).
Most of the work was done using regexes:
s/^``-\(.*\)``$/.. option:: -\1\r/
s/^``+\(.*\)``$/.. option:: +\1\r/
on bin/**/*.rst files along with visual inspection and hand-edits,
mostly for positional arguments.
Regex for rndc.rst:
s/^``\(.*\)``/.. option:: \1\r/
+ hand edits to remove extra asterisk and whitespace here and there.