14274 Commits

Author SHA1 Message Date
Tony Finch
a8f1d0c19c Compress zone transfers properly
After change 5995, zone transfers were using a small
compression context that only had space for the first
few dozen names in each message. They now use a large
compression context with enough space for every name.
2022-11-30 12:16:09 +00:00
Ondřej Surý
1816244725 Don't log "final reference detached" on INFO level
The "final reference detached" message was meant to be DEBUG(1), but was
instead kept at INFO level.  Move it to the DEBUG(1) logging level, so
it's not printed under normal operations.
2022-11-30 11:04:45 +01:00
Ondřej Surý
35d8d72dd8 Keep the unlinked ADB entries until expiration
Currently, the ADB uses TTL of 0 for ADB names that the server is
authoritative for and TTL of 10 seconds for HINT and GLUE ADB names.

This requires the unlinked ADB entries to be kept around, because they
would disappear too quickly.  This especially affects the root zone as
the trust level is "ultimate" for the root zone nameservers.

This commit restores the ability to keep the unlinked ADB entries in the
database for later reuse, restores printing the unlinked entries and
adds some extra cleaning of the unlinked ADB entries on the tail of the
LRU list (similar to what we are doing for the ADB names).
2022-11-30 10:03:24 +01:00
Ondřej Surý
50f357cb36 Refactor the dns_adb unit
The dns_adb unit has been refactored to be much simpler.  The following
changes have been made:

1. Simplify the ADB to always allow GLUE and hints

   There were only two places where dns_adb_createfind() was used - in
   the dns_resolver unit where hints and GLUE addresses were ok, and in
   the dns_zone where dns_adb_createfind() would be called without
   DNS_ADBFIND_HINTOK and DNS_ADBFIND_GLUEOK set.

   Simplify the logic by allowing hint and GLUE addresses when looking
   up the nameserver addresses to notify.  The difference is negligible
   and would cause a difference in the notified addresses only when
   there's mismatch between the parent and child addresses and we
   haven't cached the child addresses yet.

2. Drop the namebuckets and entrybuckets

   Formerly, the namebuckets and entrybuckets were used to reduce the
   lock contention when accessing the double-linked lists stored in each
   bucket.  In the previous refactoring, the custom hashtable for the
   buckets has been replaced with isc_ht/isc_hashmap, so only a single
   item (mostly, see below) would end up in each bucket.

   Removing the entrybuckets has been straightforward: the only matching
   was done on the isc_sockaddr_t member of the dns_adbentry.

   Removing the zonebuckets required GLUEOK and HINTOK bits to be
   removed because the find could match entries with-or-without the bits
   set, and creating a custom key that stores the
   DNS_ADBFIND_STARTATZONE in the first byte of the key, so we can do a
   straightforward lookup into the hashtable without traversing a list
   that contains items with different flags.

3. Remove unassociated entries from ADB database

   Previously, the adbentries could live in the ADB database even after
   unlinking them from dns_adbnames.  Such entries would show up as
   "Unassociated entries" in the ADB dump.  The benefit of keeping such
   entries is small - the chance that we link such an entry to an adbname
   is low, and it's simpler to evict unlinked entries from the ADB
   cache (and the hashtable) than to create a second LRU cleaning mechanism.

   Unlinked ADB entries are now directly deleted from the hash
   table (hashmap) upon destruction.

4. Cleanup expired entries from the hash table

   When buckets were still in place, the code would keep the buckets
   always allocated and never shrink the hash table (hashmap).  With
   proper reference counting in place, we can delete the adbnames from
   the hash table and the LRU list.

5. Stop purging the names early when we hit the time limit

   Because the LRU list is now time ordered, we can stop purging the
   names when we find the first entry that doesn't fulfill our time-based
   eviction criteria, because no further entry on the LRU list will meet
   the criteria.

Future work:

1. Lock contention

   In this commit, the focus was on correctness of the data structure,
   but in the future, the lock contention in the ADB database needs to
   be addressed.  Currently, we use a simple mutex to lock the hash
   tables, because we almost always need to use a write lock for
   properly purging the hashtables.  The ADB database needs to be
   sharded (similar to the effect that buckets had in the past).  Each
   shard would contain its own hashmap and its own LRU list.

2. Time-based purging

   The ADB names and entries stay intact when there are no lookups.
   When we add separate shards, a timer needs to be added for time-based
   cleaning in case there's no traffic hashing to the inactive shard.

3. Revisit the 30 minutes limit

   The ADB cache is capped at 30 minutes.  This needs to be revisited,
   and at least the limit should be configurable (in both directions).
2022-11-30 10:03:24 +01:00
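The early-stop purge described in item 5 of 50f357cb36 can be illustrated with a minimal sketch. This is not the dns_adb code: the `lru_entry_t` and `lru_list_t` types and the `lru_append()` and `lru_purge()` helpers are hypothetical names, and the sketch assumes entries are always appended with a non-decreasing timestamp, so the list stays ordered from oldest (head) to newest (tail).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical LRU entry; only the fields needed for purging. */
typedef struct lru_entry {
	struct lru_entry *prev, *next;
	time_t last_used;
} lru_entry_t;

/* The list is kept time ordered: head is the oldest entry. */
typedef struct lru_list {
	lru_entry_t *head;
	lru_entry_t *tail;
} lru_list_t;

/* Append an entry as the most recently used one. */
static void
lru_append(lru_list_t *lru, lru_entry_t *e, time_t now) {
	e->last_used = now;
	e->next = NULL;
	e->prev = lru->tail;
	if (lru->tail != NULL) {
		lru->tail->next = e;
	} else {
		lru->head = e;
	}
	lru->tail = e;
}

/*
 * Unlink every entry older than max_age, starting at the oldest end.
 * Returns the number of entries unlinked.
 */
static size_t
lru_purge(lru_list_t *lru, time_t now, time_t max_age) {
	size_t purged = 0;

	while (lru->head != NULL) {
		lru_entry_t *e = lru->head;
		if (now - e->last_used <= max_age) {
			/* Time ordered: every later entry is newer. */
			break;
		}
		lru->head = e->next;
		if (lru->head != NULL) {
			lru->head->prev = NULL;
		} else {
			lru->tail = NULL;
		}
		e->prev = e->next = NULL;
		purged++;
	}
	return purged;
}
```

The loop touches only the expired entries plus one live entry, so the cost of a purge is proportional to the amount of work actually done rather than to the size of the cache.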
Ondřej Surý
66d8bb03cb Create per-thread task for dns_adb resolver fetches
The dns_adb would serialize all fetches on a single task.  Create a
per-thread task, so the fetches will stay local to the thread that
initiated the fetch.
2022-11-30 10:03:24 +01:00
Ondřej Surý
0d4ef6fcd7 Expire namehooks when purging stale ADB names
Instead of trying to expire entries from adbentrybuckets, expire the
namehooks while purging the stale ADB names.
2022-11-30 10:03:23 +01:00
Ondřej Surý
557a71a6f9 Purge stale ADB names globally, not per bucket
Before the refactoring, there were only a few buckets with many names in
them, so cleaning up stale ADB names per-bucket made sense.  After the
refactoring, each bucket directly maps to an ADB name, so purging has been
effectively disabled.

Create a global LRU list for ADB names (and ADB entries) and purge the
stale ADB names globally.
2022-11-30 10:03:23 +01:00
Ondřej Surý
327768e280 dns_adb: Remove deadnames and deadentries
Previously, the name and entry buckets were much larger, so the dead
names and entries were moved to a secondary list to be cleaned
later (e.g. after the already running fetch has been canceled).  After
the last refactoring, the bucket now contains only the name (entry)
itself and thus the extra list is of little use.  Remove the .deadnames
and .deadentries from dns_adbnamebucket_t and dns_adbentrybucket_t
structures.
2022-11-30 10:03:23 +01:00
Ondřej Surý
77659e7392 Refactor dns_rpz unit to use single reference counting
The dns_rpz_zones structure was using .refs and .irefs for strong and
weak reference counting.  Rewrite the unit to use just a single
reference counting + shutdown sequence (dns_rpz_destroy_rpzs) that must
be called by the creator of the dns_rpz_zones_t object.  Remove the
reference counting from the dns_rpz_zone structure as it is not needed
because the zone objects are fully embedded into the dns_rpz_zones
structure and dns_rpz_zones_t object must never be destroyed before all
dns_rpz_zone_t objects.

The dns_rpz_zones_t reference counting uses the new ISC_REFCOUNT_TRACE
capability - enable by defining DNS_RPZ_TRACE in the dns/rpz.h header.

Additionally, add magic numbers to the dns_rpz_zone and dns_rpz_zones
structures.
2022-11-30 09:59:35 +01:00
Ondřej Surý
118ae66976 Add extra set of ISC_REFCOUNT_TRACE_{IMPL,DECL} macros
The new ISC_REFCOUNT_TRACE_{IMPL,DECL} macros can be used to add a
reference tracing capability to any unit using the reference counting.
It requires a little bit of extra work in each header as you can't have
a define from inside a define (see rpz.h), but it's fairly easy to add
tracing to any struct using reference counting with these macros.
2022-11-29 23:57:40 -08:00
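The remark in 118ae66976 about not being able to "have a define from inside a define" can be shown with a small sketch. Everything here is hypothetical and is not the ISC_REFCOUNT_TRACE_{IMPL,DECL} implementation: `REFCOUNT_TRACE_IMPL`, `widget_t`, and the `widget_ref()`/`widget_unref()` wrappers are invented names. The generator macro can emit the traced functions, but the call-site wrappers that capture `__func__`, `__FILE__`, and `__LINE__` have to be written by hand for each type.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/*
 * Hypothetical generator macro: emits traced ref/unref functions for a
 * struct 'name_t' that has an atomic 'references' member.
 */
#define REFCOUNT_TRACE_IMPL(name) \
	static name##_t *name##__ref(name##_t *ptr, const char *func, \
				     const char *file, int line) { \
		unsigned int refs = atomic_fetch_add(&ptr->references, 1) + 1; \
		fprintf(stderr, "%s:%s:%d: %s ref %p -> %u\n", func, file, \
			line, #name, (void *)ptr, refs); \
		return ptr; \
	} \
	static unsigned int name##__unref(name##_t *ptr, const char *func, \
					  const char *file, int line) { \
		unsigned int refs = atomic_fetch_sub(&ptr->references, 1) - 1; \
		fprintf(stderr, "%s:%s:%d: %s unref %p -> %u\n", func, file, \
			line, #name, (void *)ptr, refs); \
		return refs; \
	}

/* Example reference-counted type (an assumption for illustration). */
typedef struct widget {
	atomic_uint references;
} widget_t;

REFCOUNT_TRACE_IMPL(widget)

/*
 * A macro expansion cannot contain #define directives, so these
 * wrappers must be spelled out manually next to each traced type.
 */
#define widget_ref(ptr)   widget__ref(ptr, __func__, __FILE__, __LINE__)
#define widget_unref(ptr) widget__unref(ptr, __func__, __FILE__, __LINE__)
```

In this sketch `widget_unref()` returns the remaining count, so the caller can decide when to destroy the object; a real unit would more likely call its destroy function from inside the generated code.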
Ondřej Surý
fa275a59da Remove the unused cache cleaning mechanism from dns_cache API
The dns_cache API contained a cache cleaning mechanism that would be
disabled for 'rbt' based cache.  As named doesn't have any other cache
implementations, remove the cache cleaning mechanism from dns_cache API.
2022-11-29 13:48:33 -08:00
Ondřej Surý
5e4a26856c Remove the dead external cache cleaning mechanism from RBTDB
The RBTDB has its own cache cleaning mechanism and therefore the iterator
.cleaning member would never be set to true.  Remove the code that
checks for iterator->cleaning from the RBTDB.
2022-11-29 13:48:33 -08:00
Artem Boldariev
9b1c8c03fd TCP: use uv_try_write() to optimise sends
This commit makes the TCP code use uv_try_write() on a best-effort basis,
just like the TCP DNS and TLS DNS code does.

This optimisation was added in
'caa5b6548a11da6ca772d6f7e10db3a164a18f8d', but a similar change was
mistakenly omitted for the generic TCP code. This commit fixes that.
2022-11-29 13:41:10 +02:00
Michal Nowak
afdb41a5aa Update sources to Clang 15 formatting
2022-11-29 08:54:34 +01:00
Tony Finch
96b6d78f75 Speed up lib/dns/gen.c
The `gen` program was causing a lengthy single-threaded pause in
the BIND build. When generating RDATATYPE_FROMTEXT_SW(), `gen` hit
the inner loop of `find_typename()` over 1.2 billion times. This
change avoids long deeply-nested loops, so `gen` now runs in less
than 10ms, about 300x faster.

No changes to the output.
2022-11-28 09:44:26 +00:00
Ondřej Surý
d8df29e37d Be more resilient when destroying the httpd requests
Don't restart reading in the send callback after the httpdmgr has been
shut down, and call httpd_request(..., ISC_R_SHUTDOWN, ...) when
shutting down the httpdmgr to reduce code duplication.
2022-11-25 16:20:34 +01:00
Ondřej Surý
f3004da3a5 Make the netmgr send callback to be asynchronous only when needed
Previously, the send callback would be synchronous only on success.  Add
an option (similar to what other callbacks have) to decide whether we
need the asynchronous send callback on a higher level.

On a general level, we need the asynchronous callbacks to happen only
when we are invoking the callback from the public API.  If the path to
the callback went through the libuv callback or netmgr callback, we are
already on an asynchronous path, and there's no need to make the call to
the callback asynchronous again.

For the send callback, this means we need the asynchronous path for
failure paths inside the isc_nm_send() (which calls isc__nm_udp_send(),
isc__nm_tcp_send(), etc...) - all other invocations of the send callback
could be synchronous, because those are called from the respective libuv
send callbacks.
2022-11-25 15:46:25 +01:00
Ondřej Surý
5ca49942a3 Make the netmgr read callback to be asynchronous only when needed
Previously, the read callback would be synchronous only on success or
timeout.  Add an option (similar to what other callbacks have) to decide
whether we need the asynchronous read callback on a higher level.

On a general level, we need the asynchronous callbacks to happen only
when we are invoking the callback from the public API.  If the path to
the callback went through the libuv callback or netmgr callback, we are
already on an asynchronous path, and there's no need to make the call to
the callback asynchronous again.

For the read callback, this means we need the asynchronous path for
failure paths inside the isc_nm_read() (which calls isc__nm_udp_read(),
isc__nm_tcp_read(), etc...) - all other invocations of the read callback
could be synchronous, because those are called from the respective libuv
or netmgr read callbacks.
2022-11-25 15:46:15 +01:00
Tony Finch
00307fe318 Deduplicate time unit conversion factors
The various factors like NS_PER_MS are now defined in a single place
and the names are no longer inconsistent. I chose the _PER_SEC names
rather than _PER_S because it is slightly more clear in isolation;
but the smaller units are always NS, US, and MS.
2022-11-25 13:23:36 +00:00
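As an illustration of the naming scheme described in 00307fe318, the factors could be collected in a single table like the one below. Only `NS_PER_MS` and the `_PER_SEC` convention are taken from the commit message; the remaining constant names and the two helper functions are assumptions made for this sketch, not the actual header contents.

```c
#include <assert.h>
#include <stdint.h>

/*
 * One place for all conversion factors.  Seconds are spelled SEC;
 * the smaller units are always NS, US and MS.
 */
enum {
	MS_PER_SEC = 1000,
	US_PER_MS = 1000,
	NS_PER_US = 1000,
	US_PER_SEC = US_PER_MS * MS_PER_SEC,
	NS_PER_MS = NS_PER_US * US_PER_MS,
	NS_PER_SEC = NS_PER_MS * MS_PER_SEC,
};

/* Hypothetical helper: convert milliseconds to nanoseconds. */
static uint64_t
ms_to_ns(uint64_t ms) {
	return ms * NS_PER_MS;
}

/* Hypothetical helper: whole seconds contained in a nanosecond count. */
static uint64_t
ns_to_whole_sec(uint64_t ns) {
	return ns / NS_PER_SEC;
}
```

Deriving the larger factors from the smaller ones keeps the table consistent by construction, which is the point of defining them in one place.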
Mark Andrews
b95d089751 Fix log messages incorrectly logged at error
The log message "got TLS configuration for zone transfer" is not
an error, setting to info.
2022-11-25 08:50:36 +11:00
Mark Andrews
65f2512315 TLS settings of primaries with catalog zones were being ignored
Extract the tlss values if present from the ipkeylist entry and add
the resulting tls setting to the constructed configuration for the
primary.

When comparing catalog zone entries for reuse also check the
masters.tlss values for equality.
2022-11-25 08:50:36 +11:00
Evan Hunt
18606f5276 remove unused 'nupdates' field from client
the 'nupdates' field was originally used to track whether a client
was ready to shut down, along with other similar counters nreads,
nrecvs, naccepts and nsends. this is now tracked differently, but
nupdates was overlooked when the other counters were removed.
2022-11-23 23:44:10 +00:00
Matthijs Mekking
f9845dd128 Deprecate auto-dnssec
Deprecate auto-dnssec, add specific log warning to migrate to
dnssec-policy.
2022-11-23 09:46:16 +01:00
Matthijs Mekking
f71a6692db Obsolete dnssec-secure-to-insecure option
Now that the feature for key management operations using dynamic updates
has been removed, the 'dnssec-secure-to-insecure' option has become
obsolete.
2022-11-18 11:04:17 +01:00
Matthijs Mekking
b6c2776df5 Remove dynamic update key management code
Remove code that triggers key and denial of existence management
operations. Dynamic update should no longer be used to do DNSSEC
maintenance (other than that of course signatures need to be
created for the new zone contents).
2022-11-18 11:04:17 +01:00
Tony Finch
1c0f607811 Simplify and speed up DNS name decompression
The aim is to do less work per byte:

  * Check the bounds for each label, instead of checking the
    bounds for each character.

  * Instead of copying one character at a time from the wire to
    the name, copy entire runs of sequential labels using memmove()
    to make the most of its fast loop.

  * To remember where the name ends, we only need to set the end
    marker when we see a compression pointer or when we reach the
    root label. There is no need to check if we jumped back and
    conditionally update the counter for every character.

  * To parse a compression pointer, we no longer take a diversion
    around the outer loop in between reading the upper byte of the
    pointer and the lower byte.

  * The parser state machine is now implicit in the instruction
    pointer, instead of being an explicit variable. Similarly,
    when we reach the root label we break directly out of the loop
    instead of setting a second state machine variable.

  * DNS_NAME_DOWNCASE is never used with dns_name_fromwire() so
    that option is no longer supported.

I have removed this comment which dated from January 1999 when
dns_name_fromwire() was first introduced:

   /*
    * Note:  The following code is not optimized for speed, but
    * rather for correctness.  Speed will be addressed in the future.
    */

No functional change, apart from removing support for the unused
DNS_NAME_DOWNCASE option. The new code is about 2x faster than the
old code: best case 11x faster, worst case 1.4x faster.
2022-11-17 08:45:15 +00:00
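The approach listed in 1c0f607811 (one bounds check per label, copying whole runs of labels with memmove(), and setting the end marker only at a compression pointer or the root label) can be sketched roughly as follows. This is a simplified stand-in, not dns_name_fromwire(): the function name, signature, and the `NAME_MAX_WIRE` constant are assumptions, and loops are prevented here by requiring each pointer to target an offset before the start of the current run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAME_MAX_WIRE 255 /* longest uncompressed name on the wire */

/*
 * Decompress the name at msg[start] into out[] (NAME_MAX_WIRE bytes).
 * Returns the uncompressed length or -1 if the input is malformed.
 * *consumed is the space the name takes at its original position.
 */
static int
name_fromwire(const uint8_t *msg, size_t msglen, size_t start,
	      uint8_t *out, size_t *consumed) {
	size_t pos = start;   /* cursor in the message */
	size_t run = start;   /* start of the current run of labels */
	size_t limit = start; /* pointers must target below this */
	size_t used = 0;      /* bytes already copied to out[] */
	size_t end = 0;       /* end marker; 0 means not yet set */

	for (;;) {
		if (pos >= msglen) {
			return -1;
		}
		uint8_t len = msg[pos];
		if (len >= 0xC0) {
			/* Compression pointer: read both bytes at once. */
			if (pos + 1 >= msglen) {
				return -1;
			}
			size_t target = ((size_t)(len & 0x3F) << 8) | msg[pos + 1];
			if (target >= limit) {
				return -1; /* must point backwards */
			}
			/* Copy the run of labels collected so far. */
			memmove(out + used, msg + run, pos - run);
			used += pos - run;
			if (end == 0) {
				end = pos + 2;
			}
			pos = run = limit = target;
			continue;
		}
		if (len > 63) {
			return -1; /* reserved label types */
		}
		/* One bounds check per label, not per character. */
		if (pos + 1 + len > msglen ||
		    used + (pos - run) + 1 + len > NAME_MAX_WIRE) {
			return -1;
		}
		pos += 1 + (size_t)len;
		if (len == 0) {
			break; /* root label: leave the loop directly */
		}
	}
	memmove(out + used, msg + run, pos - run);
	used += pos - run;
	if (end == 0) {
		end = pos;
	}
	*consumed = end - start;
	return (int)used;
}
```

For a message containing `www.example.com` at offset 0 followed by `mail` plus a pointer to offset 4, decoding from the second name yields `mail.example.com` in wire form with two memmove() calls, one per run.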
Tony Finch
e0c9692341 Clean up remnants of label types
There were a few comments referring obliquely to different kinds of
labels, which became obsolete a long time ago.
2022-11-17 08:44:27 +00:00
Mark Andrews
dfbffd77f9 Select the appropriate namespace when using a dual stack server
When using dual-stack-servers the covering namespace to check whether
answers are in scope or not should be fctx->domain.  To do this we need
to be able to distinguish forwarding due to forwarders clauses and
dual-stack-servers.  A new flag FCTX_ADDRINFO_DUALSTACK has been added
to signal this.
2022-11-17 12:23:45 +11:00
Ondřej Surý
379929e052 Deprecate setting operating system limits from named.conf
It was possible to set operating system limits (RLIMIT_DATA,
RLIMIT_STACK, RLIMIT_CORE and RLIMIT_NOFILE) from named.conf.  It's
better to leave these untouched as setting these is responsibility of
the operating system and/or supervisor.

Deprecate the configuration options and remove them in a future BIND 9
release.
2022-11-14 16:48:52 +01:00
Ondřej Surý
0bf7014f85 Remove the last remnants of --with-tuning=large
The small/large tuning has been completely removed from the code, with
the last remnant of the dead code in ns_interfacemgr.  Remove the dead code
and the configure option.
2022-11-14 10:01:20 +01:00
Mark Andrews
f053d5b414 Have dns_zt_apply lock the zone table
There were a number of places where the zone table should have been
locked, but wasn't, when dns_zt_apply was called.

Added an isc_rwlocktype_t type parameter to dns_zt_apply and adjusted
all calls to use it.  Removed locks in callers.
2022-11-11 15:26:11 +00:00
Matthijs Mekking
53eab06083 Change default TTL of NSEC3PARAM to SOA MINIMUM
Although the RFC says that the NSEC3PARAM is not intended to be cached
by resolvers, and thus a TTL of 0 is the most logical choice, a
zero-TTL RRset can be abused by bad actors.

Change the default to SOA MINIMUM.
2022-11-11 12:06:33 +01:00
Ondřej Surý
417097450a Check view->adb in dns_view_flushcache()
The call to dns_view_flushcache() is done under exclusive mode, but we
still need to check if view->adb is still attached before calling
dns_adb_flush() because the shutdown might have already been
initiated.  This is most likely only a theoretical problem on shutdown
because there's either no way to initiate a cache flush when shutting
down, or only a very slim window in which `rndc flush` would have to
arrive during named shutdown.
2022-11-11 11:47:44 +01:00
Ondřej Surý
a8ba240325 Don't use view->resolver directly when priming in dns_view_find()
When starting priming from dns_view_find(), the dns_view shutdown could
be initiated by a different thread, detaching from the resolver.  Use
dns_view_getresolver() to attach to the resolver under view->lock, so we
don't try to call dns_resolver_prime() with NULL pointer.

There are more accesses to view->resolver (and also view->adb and
view->requestmgr that suffer from the same problem) in the dns_view
module, but they are all done in exclusive mode or under a view->lock.
2022-11-11 11:47:44 +01:00
Ondřej Surý
e4654d1a6a Bump the allowed HTTP headers in statschannel to 100
Firefox 90+ apparently sends more than 10 headers, so we need to bump
the limit to a higher number.  Bump it to 100 just to be on the safe
side; this is for internal use only anyway.
2022-11-10 16:34:26 +01:00
Ondřej Surý
b7eabb6394 Use isc_hashmap instead of isc_ht in the dns_resolver API
Replace the use of isc_ht API with isc_hashmap API in the dns_resolver
implementation.  This requires extending the fctxbucket_t structure to
include keysize and copy of the key because the isc_hashmap API needs
the raw key in case of resizing the hashmap table.
2022-11-10 15:07:19 +01:00
Ondřej Surý
e1220a2d4f Use isc_hashmap instead of isc_ht in the dns_adb API
Replace the use of isc_ht API with isc_hashmap API in the dns_adb
database implementation.  This requires extending the
dns_adbnamebucket_t and dns_adbentrybucket_t structures to include
keysize and copy of the key because the isc_hashmap API needs the raw
key in case of resizing the hashmap table.
2022-11-10 15:07:19 +01:00
Ondřej Surý
f46ce447a6 Add isc_hashmap API that implements Robin Hood hashing
Add new isc_hashmap API that differs from the current isc_ht API in
several aspects:

1. It implements Robin Hood hashing, which is an open-addressing hash
   table algorithm (i.e. no linked lists)

2. No memory allocations - the array to store the nodes is made of
   isc_hashmap_node_t structures instead of just pointers, so there's
   only allocation on resize.

3. The key is not copied into the hashmap node and must be also stored
   externally, either as part of the stored value or in any other
   location that's valid as long as the value is stored in the hashmap.

This makes the isc_hashmap_t a little less universal because of the key
storage requirements, but the inserts and deletes are faster because
they don't require memory allocation on isc_hashmap_add() and memory
deallocation on isc_hashmap_delete().
2022-11-10 15:07:19 +01:00
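The three properties listed in f46ce447a6 (open addressing with Robin Hood probing, nodes stored inline in the array, keys referenced rather than copied) can be sketched in a few dozen lines. This is not the isc_hashmap API: the `map_t` and `node_t` types, `map_add()`, `map_find()`, the fixed `MAP_SIZE`, and the FNV-1a hash are all assumptions chosen to keep the example short, and resizing and deletion are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAP_SIZE 16 /* fixed capacity; a real table would resize */

typedef struct node {
	const char *key; /* not copied: the caller keeps it alive */
	void *value;     /* NULL marks an empty slot */
	uint32_t hash;
	uint32_t psl;    /* probe sequence length (distance from home) */
} node_t;

typedef struct map {
	node_t slots[MAP_SIZE]; /* nodes live inline in the array */
	size_t count;
} map_t;

/* FNV-1a, used only as a simple stand-in hash function. */
static uint32_t
hash_key(const char *key) {
	uint32_t h = 2166136261u;
	for (; *key != '\0'; key++) {
		h = (h ^ (uint8_t)*key) * 16777619u;
	}
	return h;
}

static void *
map_find(const map_t *map, const char *key) {
	uint32_t hash = hash_key(key);
	uint32_t psl = 0;

	for (size_t i = hash % MAP_SIZE;; i = (i + 1) % MAP_SIZE, psl++) {
		const node_t *n = &map->slots[i];
		/* Empty slot or a resident closer to home: key is absent. */
		if (n->value == NULL || n->psl < psl) {
			return NULL;
		}
		if (n->hash == hash && strcmp(n->key, key) == 0) {
			return n->value;
		}
	}
}

/* Returns 0 on success, -1 if the key exists or the table is full. */
static int
map_add(map_t *map, const char *key, void *value) {
	/* Keep one slot empty so that every probe sequence terminates. */
	if (value == NULL || map->count >= MAP_SIZE - 1 ||
	    map_find(map, key) != NULL) {
		return -1;
	}
	node_t cur = { key, value, hash_key(key), 0 };
	for (size_t i = cur.hash % MAP_SIZE;; i = (i + 1) % MAP_SIZE, cur.psl++) {
		node_t *n = &map->slots[i];
		if (n->value == NULL) {
			*n = cur; /* no allocation: the node is stored in place */
			map->count++;
			return 0;
		}
		/* Robin Hood: displace a resident that is closer to home. */
		if (n->psl < cur.psl) {
			node_t tmp = *n;
			*n = cur;
			cur = tmp;
		}
	}
}
```

Because the key pointer is stored as-is, it must stay valid for as long as the value is in the table, which is the same constraint the commit message describes.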
Ondřej Surý
9d2f22e666 Properly name the loop->mctx
The per-loop memory contexts were unnamed; properly name them as
'loop<tid>'.
2022-11-08 13:32:13 +01:00
Mark Andrews
044c3b2bb8 Add missing closing ')' to update-policy documentation
The opening '(' before local was not being matched by a closing
')' after the closing '};'.
2022-11-04 10:37:47 +00:00
Ondřej Surý
96e7bf76e7 Don't release the tree read lock in dereference_iter_node()
Previously, the tree read lock could be upgraded to a write lock in
decrement_reference() and then downgraded back to read lock in
dereference_iter_node().  When the use of isc_rwlock_downgrade() was
removed, the downgrade was changed to a simple unlock+lock. This allows
some delete operations to sneak in and delete nodes that the iterator
expects to be in place.

Expand decrement_reference() so the caller can indicate whether the
tree read lock should be upgraded, and disallow the upgrade when
calling from dereference_iter_node(), so there will be no need to
release the lock afterward.
2022-11-03 14:07:44 +00:00
Ondřej Surý
80e66fbd2d Don't use dns_zone_attach() in zone_refreshkeys()
The zone_refreshkeys() could run before the zone_shutdown(), but after
the last .erefs has been "detached" causing assertion failure when doing
dns_zone_attach().  Remove the use of .erefs (dns_zone_attach/detach)
and replace it with using the .irefs and additional checks whether the
zone is exiting in the callbacks.
2022-11-03 14:29:32 +01:00
Matthijs Mekking
332b98ae49 Don't allow DNSSEC records in the raw zone
There was an exception for dnssec-policy that allowed DNSSEC in the
unsigned version of the zone. This however causes a crash if the
zone switches from dynamic to inline-signing in the case of NSEC3,
because we are now trying to add an NSEC3 record to a non-NSEC3 node.
This is because BIND expects none of the records in the unsigned
version of the zone to be NSEC3.

Remove the exception for dnssec-policy when copying non DNSSEC
records, but do allow for DNSKEY as this may be a published DNSKEY
from a different provider.
2022-11-03 10:20:05 +01:00
Ondřej Surý
c429b52533 Don't cleanup the dead nodes when pruning the tree
The dead nodes might get reactivated while the db iterator walks the
version of the tree, so we can't clean up the dead nodes while the db
version is open.  Restore the previous behaviour that cleaned up the
dead nodes when we are closing the version.
2022-11-03 09:06:08 +01:00
Ondřej Surý
be204bf4c7 Cleanup the dead nodes when pruning the tree
While sending the node to prune_tree(), we can also cleanup dead nodes
because we already hold the tree and node bucket write locks.
2022-11-02 13:06:52 +01:00
Ondřej Surý
0492bbf590 Make the pthread_rwlock implementation header-only macros [2/2]
While using mutrace, the pthread_rwlock-based isc_rwlock implementation
would be all tracked in the rwlock.c unit losing all useful information
as all rwlocks would be traced in a single place.  Rewrite the
pthread_rwlock based implementation to be header-only macros, so we can
use mutrace to properly track the rwlock contention without heavily
patching mutrace to understand the libisc synchronization primitives.
2022-11-02 10:34:10 +01:00
Ondřej Surý
6bd201ccec Remove one level of indirection from isc_rwlock [1/2]
Instead of checking the PTHREAD_RUNTIME_CHECK from the header, move it
to the pthread_rwlock implementation functions.  The internal isc_rwlock
actually cannot fail, so the checks in the header were useless anyway.
2022-11-02 10:27:09 +01:00
Ondřej Surý
98b7a93772 Remove isc_rwlock_downgrade() from isc_rwlock
The isc_rwlock_downgrade() is not used anywhere, so we can remove it and
make the pthread_rwlock implementation simpler.
2022-11-02 09:05:37 +01:00
Ondřej Surý
e5f7fe1f65 Add strong rwlock consistency checks to dns_rbtdb
The dns_rbtdb unit already tracks the state of the node and tree rwlocks
during the top level function and passes the states of the locks to the
called functions.

Add the tree locking family of macros modeled after node locking macros,
and expand both to track the state of the lock in an external variable.
Additionally, in developer mode, add precondition to the macros, so the
lock is in required state - this should cause an assertion failure on
double locking instead of the thread getting stuck.
2022-11-02 08:45:48 +01:00
Ondřej Surý
006a7f0cb6 Remove isc_rwlock_downgrade usage in rbtdb.c
The only place where isc_rwlock_downgrade was being used was the
decrement_reference() where the code first relocks the node rwlock for
writing and then tries to upgrade the tree lock.  When returning
from the function it tries to restore the locks into a previous state
which is nice, but kind of moot, because at every use of
decrement_reference() the node lock is immediately or almost
immediately unlocked, and the same holds for the tree lock.

Instead of trying to restore the node and tree lock into the initial
state, the decrement_reference now returns the state of the locks, so
the caller can then use the right unlock operation (read or write).
Only when the tree lock was originally unlocked does decrement_reference()
unlock the tree lock before returning to the caller.
2022-11-02 08:45:48 +01:00