It should be floor(DNS_NAME_MAXWIRE / 2) + 1 == 128
The mistake was introduced in c6bf51492d because:
* I was refactoring an existing `DNS_MAX_LABELS` defined as 127
* There was a longstanding bug in `dns_name_isvalid()` which
checked the number of labels against 127U instead of 128
* I mistakenly thought `dns_name_isvalid()` was correct and
`dns_name_countlabels()` was incorrect, but the reverse was true.
After this commit, occurrances of `DNS_NAME_MAXLABELS` with value
128 are consistent with the use of 127 or 128 before commit
c6bf51492d except for the mistake in `dns_name_isvalid()`.
This commit adds a test case that checks the MAXLABELS case
in `dns_name_fromtext()` and `dns_name_isvalid()`.
This change makes the zone table lock-free for reads. Previously, the
zone table used a red-black tree, which is not thread safe, so the hot
read path acquired both the per-view mutex and the per-zonetable
rwlock. (The double locking was to fix to cleanup races on shutdown.)
One visible difference is that zones are not necessarily shut down
promptly: it depends on when the qp-trie garbage collector cleans up
the zone table. The `catz` system test checks several times that zones
have been deleted; the test now checks for zones to be removed from
the server configuration, instead of being fully shut down. The catz
test does not churn through enough zones to trigger a gc, so the zones
are not fully detached until the server exits.
After this change, it is still possible to improve the way we handle
changes to the zone table, for instance, batching changes, or better
compaction heuristics.
Revert refcount debug tracing (commit a8b29f0365), there are better
ways to do it.
Use the dns_qpmethods_t typedef where appropriate.
Some stylistic improvements.
It is sometimes necessary to access a qp-trie outside an isc_loop,
such as in tests or an isc_work callback. The best option was to use
a `dns_qpmulti_write()` transaction, but that has overheads that are
not necessary for read-only access, such as committing a new version
of the trie even when nothing changed.
So this commit adds a `dns_qpmulti_read()` transaction, which is
nearly as lightweight as a query transaction, but it takes the mutex
like a write transaction.
This is the first of the "fancy" searches that know how the DNS
namespace maps on to the structure of a qp-trie. For example, it will
find the closest enclosing zone in the zone tree.
The dns_adbentry_overquota() was violating the layers accessing the
adbentry struct members directly. Change it to dns_adb_overquota() to
match the dns_adb API.
use the ISC_REFCOUNT implementation for dns_zone_attach() and
_detach(). (this applies only to external zone references, not
to dns_zone_iattach() and dns_zone_idetach().)
use dns_zone_ref() where previously a dummy zone object had been
used to increment the reference count.
My original idea had been that the core qp-trie code would be mostly
independent of the storage for keys, so I did not make it check at run
time that key lengths are sensible. However, the qp-trie search
routines need to get keys out of leaf objects, for which they provide
storage on the stack, which is particularly dangerous for unchecked
buffer overflows. So this change checks that key lengths are in bounds
at the API boundary between the qp-trie code and the rest of BIND, and
there is no more pretence that keys might be longer.
Add a new configuration option to set how the checkds method should
work. Acceptable values are 'yes', 'no', and 'explicit'.
When set to 'yes', the checkds method is to lookup the parental agents
by querying the NS records of the parent zone.
When set to 'no', no checkds method is enabled. Users should run
the 'rndc checkds' command to signal that DS records are published and
withdrawn.
When set to 'explicit', the parental agents are explicitly configured
with the 'parental-agents' configuration option.
This should have no functional effects.
The message size stats are specified by RSSAC002 so it's best not
to mess around with how they appear in the statschannel. But it's
worth changing the implementation to use general-purpose histograms,
to reduce code size and benefit from sharded counters.
Cleanup the remnants of MS Compiler bits from <isc/refcount.h>, printing
the information in named/main.c, and cleanup some comments about Windows
that no longer apply.
The bits in picohttpparser.{h,c} were left out, because it's not our
code.
The isc_fsaccess API was created to hide the implementation details
between POSIX and Windows APIs. As we are not supporting the Windows
APIs anymore, it's better to drop this API used in the DST part.
Moreover, the isc_fsaccess was setting the permissions in an insecure
manner - it operated on the filename, and not on the file descriptor
which can lead to all kind of attacks if unpriviledged user has read (or
even worse write) access to key directory.
Replace the code that operates on the private keys with code that uses
mkstemp(), fchmod() and atomic rename() at the end, so at no time the
private key files have insecure permissions.
The only place where dns_name_hash() was being used is the old hash
table in the dns_badcache unit. Squash the dns_name_fullhash() and
dns_name_hash() into single dns_name_hash() function that's always
case-insensitive as it doesn't make to do case-sensitive hashing of the
domain names and we were not using this anywhere.
Instead of marking the unused entities with UNUSED(x) macro in the
function body, use a `ISC_ATTR_UNUSED` attribute macro that expans to
C23 [[maybe_unused]] or __attribute__((__unused__)) as fallback.
Change the isc_job_run() to not-make any allocations. The caller must
make sure that it allocates isc_job_t - usually as part of the argument
passed to the callback.
For simple jobs, using isc_async_run() is advised as it allocates its
own separate isc_job_t.
for testing purposes, we need to be able to specify a library path from
which to load the dnsrps implementation. this can now be done with the
"dnsrps-library" option.
DNSRPS can now be enabled in configure regardless of whether librpz.so
is currently installed on the system.
the new dns_view_addtrustedkey() function allows a view's trust
anchors to be updated directly. this code was formerly in
dns_client_addtrustedkey(), which is now a wrapper around
dns_view_addtrustedkey().
stop and restart the server in the 'tsiggss' test, in order
to confirm that GSS negotiated TSIG keys are saved and restored
when named loads.
added logging to dns_tsigkey_createfromkey() to indicate whether
a key has been statically configured, generated via GSS negotiation,
or restored from a file.
REQUIRE that rdata->type is dns_rdatatype_svcb to detect when
dns_rdata_checksvcb is called with the wrong rdata type. There are
no code paths that currently pass the wrong rdata to dns_rdata_checksvcb.
This was found by GCC 12 static analysis.
without diffie-hellman TKEY negotiation, some other code is
now effectively dead or unnecessary, and can be cleaned up:
- the rndc tsig-list and tsig-delete commands.
- a nonoperational command-line option to dnssec-keygen that
was documented as being specific to DH.
- the section of the ARM that discussed TKEY/DH.
- the functions dns_tkey_builddeletequery(), processdeleteresponse(),
and tkey_processgssresponse(), which are unused.
Completely remove the TKEY Mode 2 (Diffie-Hellman Exchanged Keying) from
BIND 9 (from named, named.conf and all the tools). The TKEY usage is
fringe at best and in all known cases, GSSAPI is being used as it should.
The draft-eastlake-dnsop-rfc2930bis-tkey specifies that:
4.2 Diffie-Hellman Exchanged Keying (Deprecated)
The use of this mode (#2) is NOT RECOMMENDED for the following two
reasons but the specification is still included in Appendix A in case
an implementation is needed for compatibility with old TKEY
implementations. See Section 4.6 on ECDH Exchanged Keying.
The mixing function used does not meet current cryptographic
standards because it uses MD5 [RFC6151].
RSA keys must be excessively long to achieve levels of security
required by current standards.
We might optionally implement Elliptic Curve Diffie-Hellman (ECDH) key
exchange mode 6 if the draft ever reaches the RFC status. Meanwhile the
insecure DH mode needs to be removed.
There are leftovers from the previous refactoring effort, which left
some function declarations and comments in the header file unchanged.
Finish the renaming.
This implements node reference tracing that passes all the internal
layers from dns_db API (and friends) to increment_reference() and
decrement_reference().
It can be enabled by #defining DNS_DB_NODETRACE in <dns/trace.h> header.
The output then looks like this:
incr:node:check_address_records:rootns.c:409:0x7f67f5a55a40->references = 1
decr:node:check_address_records:rootns.c:449:0x7f67f5a55a40->references = 0
incr:nodelock:check_address_records:rootns.c:409:0x7f67f5a55a40:0x7f68304d7040->references = 1
decr:nodelock:check_address_records:rootns.c:449:0x7f67f5a55a40:0x7f68304d7040->references = 0
There's associated python script to find the missing detach located at:
https://gitlab.isc.org/isc-projects/bind9/-/snippets/1038
Now that we can configure a different digest type, update the code
to honor the configuration. Update 'dns_dnssec_syncupdate' so that
the correct CDS record is published, and also when deleting CDS records,
ensure that all possible CDS records are removed from the zone.
In general, it's better to do one thorough compaction when a batch of
work is complete, which is the way that `update` transactions work.
Conversely, `write` transactions are designed so that lots of little
transactions are not too inefficient, but they need explicit
compaction. This changes `dns_qp_compact()` so that it is easier to
compact any time that makes sense, if there isn't a better way to
schedule compaction. And `dns_qpmulti_commit()` only recycles garbage
when there is enough to make it worthwhile.
Add some qp-trie tracing macros which can be enabled by a
developer. These print a message when a leaf is attached or
detached, indicating which part of the qp-trie implementation
did so. The refcount methods must now return the refcount value
so it can be printed by the trace macros.
The first working multi-threaded qp-trie was stuck with an unpleasant
trade-off:
* Use `isc_rwlock`, which has acceptable write performance, but
terrible read scalability because the qp-trie made all accesses
through a single lock.
* Use `liburcu`, which has great read scalability, but terrible
write performance, because I was relying on `rcu_synchronize()`
which is rather slow. And `liburcu` is LGPL.
To get the best of both worlds, we need our own scalable read side,
which we now have with `isc_qsbr`. And we need to modify the write
side so that it is not blocked by readers.
Better write performance requires an async cleanup function like
`call_rcu()`, instead of the blocking `rcu_synchronize()`. (There
is no blocking cleanup in `isc_qsbr`, because I have concluded
that it would be an attractive nuisance.)
Until now, all my multithreading qp-trie designs have been based
around two versions, read-only and mutable. This is too few to
work with asynchronous cleanup. The bare minimum (as in epoch
based reclamation) is three, but it makes more sense to support an
arbitrary number. Doing multi-version support "properly" makes
fewer assumptions about how safe memory reclamation works, and it
makes snapshots and rollbacks simpler.
To avoid making the memory management even more complicated, I
have introduced a new kind of "packed reader node" to anchor the
root of a version of the trie. This is simpler because it re-uses
the existing chunk lifetime logic - see the discussion under
"packed reader nodes" in `qp_p.h`.
I have also made the chunk lifetime logic simpler. The idea of a
"generation" is gone; instead, chunks are either mutable or
immutable. And the QSBR phase number is used to indicate when a
chunk can be reclaimed.
Instead of the `shared_base` flag (which was basically a one-bit
reference count, with a two version limit) the base array now has a
refcount, which replaces the confusing ad-hoc lifetime logic with
something more familiar and systematic.
Adjust the dns_qp_memusage() and dns_qp_compact() functions
to be more informative and flexible about handling fragmentation.
Avoid wasting space in runt chunks.
Switch from twigs_mutable() to cells_immutable() because that is the
sense we usually want.
Drop the redundant evacuate() function and rename evacuate_twigs() to
evacuate(). Move some chunk test functions closer to their point of
use.
Clarify compact_recursive(). Some small cleanups to comments.
Use isc_time_monotonic() for qp-trie timing stats.
Use #define constants to control debug logging.
Set up DNS name label offsets in dns_qpkey_fromname() so it is easier
to use in cases where the name is not fully hydrated.
A qp-trie is a kind of radix tree that is particularly well-suited to
DNS servers. I invented the qp-trie in 2015, based on Dan Bernstein's
crit-bit trees and Phil Bagwell's HAMT. https://dotat.at/prog/qp/
This code incorporates some new ideas that I prototyped using
NLnet Labs NSD in 2020 (optimizations for DNS names as keys)
and 2021 (custom allocator and garbage collector).
https://dotat.at/cgi/git/nsd.git
The BIND version of my qp-trie code has a number of improvements
compared to the prototype developed for NSD.
* The main omission in the prototype was the very sketchy outline of
how locking might work. Now the locking has been implemented,
using a reader/writer lock and a mutex. However, it is designed to
benefit from liburcu if that is available.
* The prototype was designed for two-version concurrency, one
version for readers and one for the writer. The new code supports
multiversion concurrency, to provide a basis for BIND's dbversion
machinery, so that updates are not blocked by long-running zone
transfers.
* There are now two kinds of transaction that modify the trie: an
`update` aims to support many very small zones without wasting
memory; a `write` avoids unnecessary allocation to help the
performance of many small changes to the cache.
* There is also a single-threaded interface for situations where
concurrent access is not necessary.
* The API makes better use of types to make it more clear which
operations are permitted when.
* The lookup table used to convert a DNS name to a qp-trie key is
now initialized by a run-time constructor instead of a programmer
using copy-and-paste. Key conversion is more flexible, so the
qp-trie can be used with keys other than DNS names.
* There has been much refactoring and re-arranging things to improve
the terminology and order of presentation in the code, and the
internal documentation has been moved from a comment into a file
of its own.
Some of the required functionality has been stripped out, to be
brought back later after the basics are known to work.
* Garbage collector performance statistics are missing.
* Fancy searches are missing, such as longest match and
nearest match.
* Iteration is missing.
* Search for update is missing, for cases where the caller needs to
know if the value object is mutable or not.
Some qp-trie operations will need to know the maximum number of labels
in a name, so I wanted a standard macro definition with the right
value.
Replace DNS_MAX_LABELS from <dns/resolver.h with DNS_NAME_MAXLABELS in
<dns/name.h>, and add its counterpart DNS_NAME_LABELLEN.
Use these macros in `name.c` and `resolver.c`.
Fix an off-by-one error in an assertion in `dns_name_countlabels()`.