This commit ensures that an HTTP endpoints set reference is stored in
a socket object associated with an HTTP/2 stream instead of
referencing the global set stored inside a listener.
This helps to prevent an issue like follows:
1. BIND is configured to serve DoH clients;
2. A client is connected and one or more HTTP/2 stream is
created. Internal pointers are now pointing to the data on the
associated HTTP endpoints set;
3. BIND is reconfigured - the new endpoints set object is created and
promoted to all listeners;
4. The old pointers to the HTTP endpoints set data are now invalid.
Instead referencing a global object that is updated on
re-configurations we now store a local reference which prevents the
endpoints set objects to go out of scope prematurely.
(cherry picked from commit b9b5d0c01a3a546c4a6a8b3bff8ae9dd31fee224)
It was reported that HTTP/2 session might get closed or even deleted
before all async. processing has been completed.
This commit addresses that: now we are avoiding using the object when
we do not need it or specifically check if the pointers used are not
'NULL' and by ensuring that there is at least one reference to the
session object while we are doing incoming data processing.
This commit makes the code more resilient to such issues in the
future.
(cherry picked from commit 0cca550dff403c6100b7c0da8f252e7967765ba7)
When calling dns_resolver_createfetch in resolver.c with a callback
of resume_dslookup, clear DNS_FETCHOPT_TRYSTALE_ONTIMEOUT from
options as DNS_EVENT_TRYSTALE is not an expected event type and
triggers a REQUIRE.
When the cache's memory context was in over memory state when the
cache was flushed it resulted in LRU cleaning removing newly entered
data in the new cache straight away until the old cache had been
destroyed enough to take it out of over memory state. When flushing
the cache create a new memory context for the new db to prevent this.
(cherry picked from commit 5e77edd074)
When named is started with -4 or -6 and the primaries for a zone
do not have an IPv4 or IPv6 address respectively issue a log message.
(cherry picked from commit 2cd4303249)
If uv_tcp_close_reset() returns an error code, this means the
reset_shutdown callback has not been issued, so do it now.
(cherry picked from commit c40e5c8653)
Failing to accept TCP/TLS connections in 9.18 detaches the quota in
isc__nm_failed_accept_cb, causing TCP4Clients and TCP6Clients statistics
to not decrease inside cleanup.
Fix by increasing the counter after the point of no failure but before
handling statistics through the client's socket is no longer valid.
The `axfr_makedb()` didn't set the loop on the newly created database,
effectively killing delayed cleaning on such database. Move the
database creation into dns_zone API that knows all the gory details of
creating new database suitable for the zone.
(cherry picked from commit 3310cac2b0)
When isc_task_purgeevent() is called for and 'event', the event, in
the meanwhile, could in theory get processed, unlinked, and freed.
So when the function then operates on the 'event', it causes a
segmentation fault.
The only place where isc_task_purgeevent() is called is from
timer_purge().
In order to resolve the data race, call isc_task_purgeevent() inside
the 'timer->lock' locked block, so that timerevent_destroy() won't
be able to destroy the event if it was processed in the meanwhile,
before isc_task_purgeevent() had a chance to purge it.
In order to be able to do that, move the responsibility of calling
isc_event_free() (upon a successful purge) out from the
isc_task_purgeevent() function to its caller instead, so that it can
be called outside of the timer->lock locked block.
Let basic_tick() of 'task1' and 'basic_quick' of 'task4' run in
different threads, and insert an artificial delay in timer_purge()
to cause an existing race condition to appear.
- duplicated question
- duplicated answer
- qtype as an answer
- two question types
- question names
- nsec3 bad owner name
- short record
- short question
- mismatching question class
- bad record owner name
- mismatched class in record
- mismatched KEY class
- OPT wrong owner name
- invalid RRSIG "covers" type
- UPDATE malformed delete type
- TSIG wrong class
- TSIG not the last record
(cherry picked from commit 6e9ed4983e)
DNSKEY was incorrectly being added to the NESC/NSEC3 type bit map
when it was obscured by the delegation. This lead to zone verification
failures.
(cherry picked from commit ec3c624814)
If a query sent using the dns_request API times out when the view it was
associated with gets torn down, the dns_dispatch_resume() call in
req_response() may be issued with the 'resp' argument set to NULL,
triggering an assertion failure. Consider the following scenario ([A]
and [B] are thread identifiers):
1. [A] Read timeout for a dispatch query fires.
2. [A] udp_recv() is called. It locks the dispatch, determines it
timed out, prepares for calling the higher-level callback with
ISC_R_TIMEDOUT, and unlocks the dispatch (lib/dns/dispatch.c:633).
3. [B] The last reference to a view is released.
dns_requestmgr_shutdown() is called, canceling all in-flight
requests for that view. (Note that udp_recv() in thread [A] already
unlocked the dispatch, so its state can be modified.) As a part of
this process, request_cancel() calls dns_dispatch_done() on
request->dispentry, setting it to NULL.
4. [A] udp_recv() calls the higher-level callback (req_response()) with
ISC_R_TIMEDOUT.
5. [A] Since the request timed out, req_response() retries sending it.
In the process, it calls dns_dispatch_resume(), passing
request->dispentry as the 'resp' argument.
6. [A] Since 'resp' is NULL, the REQUIRE(VALID_RESPONSE(resp));
assertion in dns_dispatch_resume() fails.
Fix by checking whether the request has been canceled before calling
dns_dispatch_resume(), similarly to how it is done in req_connected()
and req_senddone().
With weak zone attachments being used for catzs, catzs->view->zonetable
may be NULL so we need to account for this which dns_view_findzone
does. This is already done in main.
Use dns_view_weakattach and dns_view_weakdetach to maintain a
reference to the view referenced through catzs->view.
(cherry picked from commit 307e3ed9a6)
Allow SVBC (HTTPS) alias form with parameters to be accepted from
the wire and when transfered. This is for possible future extensions.
(cherry picked from commit 799046929c)
When parsing a zonefile named-checkzone (and others) could loop
infinitely if a directory was $INCLUDED. Record the error and treat
as EOF when looking for multiple errors.
This was found by Eric Sesterhenn from X41.
(cherry picked from commit efd27bb82d)
In nibble mode if the value to be converted was negative the parser
would loop forever. Process the value as an unsigned int instead
of as an int to prevent sign extension when shifting.
This was found by Eric Sesterhenn from X41.
(cherry picked from commit 371824f078)
An RPZ response's SOA record TTL is set to 1 instead of the SOA TTL,
a boolean value is passed on to query_addsoa, which is supposed to be
a TTL value. I don't see what value is appropriate to be used for
overriding, so we will pass UINT32_MAX.
(cherry picked from commit 5d7e613e81)
If the zone is signed with a different way than 'dnssec-policy', use
the legacy way of jittering signatures, that is calculate jitter by
taking the two values of 'sig-validity-interval' and subtracting the
second value from the first value.
Having a value higher than signatures-validity does not make sense
and should be treated as a configuration error.
(cherry picked from commit c3d8932f79)
When calculating the RRSIG validity, jitter is now derived from the
config option rather than from the refresh value.
(cherry picked from commit 67f403a423)
Be stricter in durations that are accepted. Basically we accept ISO 8601
formats, but fail to detect garbage after the integers in such strings.
For example, 'P7.5D' will be treated as 7 days. Pass 'endptr' to
'strtoll' and check if the endptr is at the correct suffix.
(cherry picked from commit e39de45adc)
dns_db_addrdataset() enforces a requirement that version can only be
NULL for a cache database. code that checks for zone semantics and
version == NULL can never be reached.
(cherry picked from commit b3c8b5cfb2)
The Depends relation refers to types of rollovers in which a certain
record type is going to be swapped. Specifically, the Depends relation
says there should be no dependency on the predecessor key (the set
Dep(x, T) must be empty).
But if the key is phased out (all its states are in HIDDEN), there is
no longer a dependency. Since the relationship is still maintained
(Predecessor and Successor metadata), the keymgr_dep function still
returned true. In other words, the set Dep(x, T) is not considered
empty.
This slows down key rollovers, only retiring keys when the successor
key has been fully propagated.
(cherry picked from commit 0aac81cf80)
The dns_cache_flush() drops the old database and creates a new one, but
it forgets to create the task(s) that runs the node pruning and cleaning
the rbtdb when flushing it next time. This causes the cleaning to skip
cleaning the parent nodes (with .down == NULL) leading to increased
memory usage over time until the database is unable to keep up and just
stays overmem all the time.
(cherry picked from commit 79040a669c)
Previously, rbtdb->task had quantum of 1 because it was originally used
just for freeing RBTDB contents, which can happen on a "best effort"
basis (does not need to be prioritized). However, when tree pruning was
implemented, it also started sending events to that task, enabling the
latter to become clogged up with a significant event backlog because it
only pruned a single RBTDB node per event.
To prioritize tree pruning (as it is necessary for enforcing the
configured memory use limit for the cache memory context), create a
second task with a virtually unlimited quantum (UINT_MAX) and send the
tree-pruning events to this new task, to ensure that all nodes scheduled
for pruning will be processed before further nodes are queued in a
similar fashion.
This change enables dropping the prunenodes list and restoring the
originally-used logic that allocates and sends a separate event for each
node to prune.
(cherry picked from commit 231b2375e5)
Reconstruct the variant of the prune_tree() parent cleaning to consider
all elibible parents in a single loop as we were doing before all the
changes that led to this commit.
Update code comments so that they more precisely describe what the
relevant bits of code actually do.
(cherry picked from commit 3a01c749f9)
The dns_cache_flush() drops the old database and creates a new one, but
it forgets to create the task(s) that runs the node pruning and cleaning
the rbtdb when flushing it next time. This causes the cleaning to skip
cleaning the parent nodes (with .down == NULL) leading to increased
memory usage over time until the database is unable to keep up and just
stays overmem all the time.
Previously, rbtdb->task had quantum of 1 because it was originally used
just for freeing RBTDB contents, which can happen on a "best effort"
basis (does not need to be prioritized). However, when tree pruning was
implemented, it also started sending events to that task, enabling the
latter to become clogged up with a significant event backlog because it
only pruned a single RBTDB node per event.
To prioritize tree pruning (as it is necessary for enforcing the
configured memory use limit for the cache memory context), create a
second task with a virtually unlimited quantum (UINT_MAX) and send the
tree-pruning events to this new task, to ensure that all nodes scheduled
for pruning will be processed before further nodes are queued in a
similar fashion.
This change enables dropping the prunenodes list and restoring the
originally-used logic that allocates and sends a separate event for each
node to prune.