Commit 29fd756408 replaced "only" with "rules" in .gitlab-ci.yml
but forgot to drop the keyword removal from this script, hence the
script was broken.
(cherry picked from commit 6e2272d769)
Backport of MR !10185
Merge branch 'mnowak/do-not-delete-only-keyword-in-generate-tsan-stress-jobs' into 'bind-9.18'
See merge request isc-projects/bind9!10188
Execute DNS Shotgun performance tests on regular MRs and compare the changes they introduce against the MR diff base. The results are evaluated automatically: the shotgun jobs will fail if the thresholds for CPU/memory/latency differences are exceeded.
Backport of MR !10127
Merge branch 'backport-nicki/ci-shotgun-eval-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10184
The keyword rules allows more flexible and complex conditions when
deciding whether to create a job, and also makes it possible to tweak
variables or job properties depending on arbitrary rules. Since it's
not possible to combine only/except and rules together, replace all
uses of only/except to avoid any potential future issues.
(cherry picked from commit 29fd756408)
If the shotgun tests are executed for MRs, compare the results against
the MR's base rather than the previous release. Only fail the job if
performance drops (pass on performance improvements).
Note that the start_in optimization was removed, since it isn't
properly supported with rules as of February 2025
(https://gitlab.com/gitlab-org/gitlab/-/issues/424203). Without this
optimization, container test images are likely to be rebuilt
unnecessarily when testing different protocols. A workaround in
.gitlab-ci.yml exists, but the extra complexity doesn't seem justified.
The container image builds might change or be optimized in the future,
so let's just go with the build duplication for now.
(cherry picked from commit 4214c1e8a7)
The `NS_QUERY_DONE_BEGIN` and `NS_QUERY_DONE_SEND` plugin hooks could cause a reference leak if they returned `NS_HOOK_RETURN` without cleaning up the query context properly.
Closes #2094
Backport of MR !9971
Merge branch 'backport-2094-plugin-reference-leak-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10171
If the NS_QUERY_DONE_BEGIN or NS_QUERY_DONE_SEND hook is
used in a plugin and returns NS_HOOK_RETURN, some of the
cleanup in ns_query_done() can be skipped over, leading
to reference leaks that can cause named to hang on
shutdown.
This has been addressed by adding more housekeeping
code after the cleanup: tag in ns_query_done().
(cherry picked from commit c2e4358267)
A change in 6aba56ae8 (checking whether a rejected RRset was identical
to the data it would have replaced, so that we could still cache a
signature) inadvertently introduced cases where processing of a
response would continue when previously it would have been skipped.
Closes #5197
Backport of MR !10157
Merge branch 'backport-5197-cache_name-logic-error-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10159
(cherry picked from commit d0fd9cbe3b)
With RPZ in use, `named` could terminate unexpectedly because of a race condition when a reconfiguration command was received using `rndc`. This has been fixed.
Closes #5146
Backport of MR !10079
Merge branch 'backport-5146-rpz-reconfig-bug-fix-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10145
After a reconfiguration the old view can be left without a valid
'rpzs' member: when the RPZ is not changed during the named
reconfiguration, 'rpzs' "migrates" from the old view into the new
view, so when a query resumes it can find that 'qctx->view->rpzs'
is NULL, which query_resume() currently doesn't expect to happen
while it's recursing and 'qctx->rpz_st' is not NULL.
Fix the issue by adding a NULL check. In order not to split the log
message into two different log messages depending on whether
'qctx->view->rpzs' is NULL, change the message to not log
the RPZ policy's "version", which is just a runtime counter and is
most likely not very useful to users.
(cherry picked from commit 3ea2fbc238)
Previously, when parsing responses, named incorrectly rejected responses without matching RRSIG records for NSEC/DS/NSEC3 records in the authority section. Whether such a rejection is appropriate should have been left for the validator to determine; this has been fixed.
Closes #5185
Backport of MR !10125
Merge branch 'backport-5185-remove-rrsig-check-from-dns_message_parse-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10143
Checking whether the authority section is properly signed should
be left to the validator. Checking in getsection (dns_message_parse)
was way too early and resulted in resolution failures of lookups
that should have otherwise succeeded.
(cherry picked from commit 83159d0a54)
The cache has been updated so that if new data is rejected - for example, because there was already existing data at a higher trust level - then its covering RRSIG will also be rejected.
Closes #5132
Backport of MR !9999
Merge branch 'backport-5132-improve-cd-behavior-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10135
Add a zone with different NS RRsets in the parent and child,
and test resolver and forwarder behavior with and without +CD.
(cherry picked from commit e4652a0444)
Add a new dns_rdataset_equals() function to check whether two
rdatasets are equal in DNSSEC terms.
When an rdataset being cached is rejected because its trust
level is lower than the existing rdataset, we now check to see
whether the rejected data was identical to the existing data.
This allows us to cache a potentially useful RRSIG when handling
CD=1 queries, while still rejecting RRSIGs that would definitely
have resulted in a validation failure.
(cherry picked from commit 6aba56ae89)
The value returned by http_send_outgoing() is not used anywhere, so
make it return nothing (void). This is probably a leftover from
earlier versions of the code.
(cherry picked from commit 2adabe835a)
When handling outgoing data, there were a couple of rarely executed
code paths that did not take into account that the callback MUST be
called.
This could lead to memory leaks and consequent shutdown hangs.
(cherry picked from commit 8b8f4d500d)
This commit changes the way the number of active HTTP streams is
calculated and allows it to scale with the maximum number of
streams per connection, instead of effectively capping at
STREAM_CLIENTS_PER_CONN.
The original limit was intended to define the pipelining limit
for TCP/DoT. However, it proved too restrictive for DoH, which
works quite differently and implements pipelining at the protocol
level by multiplexing multiple streams. That makes each stream
effectively a separate connection from the point of view of the
rest of the codebase.
(cherry picked from commit a22bc2d7d4)
Previously we would limit the amount of incoming data to process based
solely on the presence of uncompleted send requests. That worked, but
it was found to severely degrade performance in certain cases, as
revealed during extended testing.
Now we keep track of how much data is in flight (or ready to be in
flight) and limit the amount of incoming data processed when the
amount of in-flight data surpasses a given threshold, similarly to
what we do in other transports.
(cherry picked from commit 05e8a50818)
When processing a query with the "checking disabled" bit set (CD=1), `named` stores the unvalidated result in the cache, marked "pending". When the same query is sent with CD=0, the cached data is validated, and either accepted as an answer, or ejected from the cache as invalid. This deferred validation was not attempted for DS and DNSKEY records if they had no cached signatures, causing spurious validation failures. We now complete the deferred validation in this scenario.
Also, if deferred validation fails, we now re-query the data to find out whether the zone has been corrected since the invalid data was cached.
Closes #5066
Backport of MR !10104
Merge branch 'backport-5066-fix-strip-dnssec-rrsigs-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10106
If a deferred validation on data that was originally queried with
CD=1 fails, we now repeat the query, since the zone data may have
changed in the meantime.
(cherry picked from commit 04b1484ed8)
When a query is made with CD=1, we store the result in the
cache marked pending so that it can be validated later, at
which time it will either be accepted as an answer or removed
from the cache as invalid. Deferred validation was not
attempted when there were no cached RRSIGs for DNSKEY and
DS. We now complete the deferred validation in this scenario.
(cherry picked from commit 8b900d1808)
The ARM text stating that setting stale-cache-enable to no has no
effect should refer to max-stale-ttl, not max-cache-ttl.
Closes #5181
Backport of MR !10108
Merge branch 'backport-5181-max-stale-ttl-typo-arm-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10116
Views use two types of reference counting - regular and weak, and
when there are no more regular references, the `view_flushanddetach()`
function destroys or detaches some parts of the view, including
`view->zonetable`, while other parts are freed by `destroy()` when
the last weak reference is detached. Since catalog zones use weak
references to attach a view, it's currently possible that during
shutdown catalog zone processing will try to add a new zone into
an otherwise unused view (because it's shutting down) which doesn't
have an attached zonetable any more. This could cause an assertion
failure. Fix this issue by modifying the `dns_view_addzone()` function
to expect that `view->zonetable` can be `NULL`, and in that case just
return `ISC_R_SHUTTINGDOWN`.
Closes #5138
Merge branch '5138-fix-dns_view_addzone-race-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10086
An incorrect optimization caused "CNAME and other data" errors not to be detected if certain types were at the same node as a CNAME. This has been fixed.
Closes #5150
Backport of MR !10033
Merge branch 'backport-5150-cname-and-other-data-check-not-applied-to-all-types-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10101
prio_type was being used in the wrong place to optimize cname_and_other.
We have to first exclude any accepted types, and we also have to
determine that the record exists, before we can check whether we are
at a point where a later CNAME cannot appear.
(cherry picked from commit 5e49a9e4ae)
This is a complement to the already existing "stress" system test.
Backport of MR !9474
Merge branch 'backport-mnowak/generate-tsan-unit-stress-tests-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10095