View matching on an incoming query checks the query's signature,
which can be a CPU-heavy task for a SIG(0)-signed message. Implement
an asynchronous mode of the view matching function which uses the
offloaded signature checking facilities, and use it for the incoming
queries.
This is a tiny helper function which is used only once and can be
replaced with two function calls instead. Removing this makes
supporting asynchronous signature checking less complicated.
In order to protect from a malicious DNS client that sends many
queries with a SIG(0)-signed message, add a quota of simultaneously
running SIG(0) checks.
This protection can only help when named is using more than one worker
threads. For example, if named is running with the '-n 4' option, and
'sig0checks-quota 2;' is used, then named will make sure to not use
more than 2 workers for the SIG(0) signature checks in parallel, thus
leaving the other workers to serve the remaining clients which do not
use SIG(0)-signed messages.
That limitation is going to change when SIG(0) signature checks are
offloaded to "slow" threads in a future commit.
The 'sig0checks-quota-exempt' ACL option can be used to exempt certain
clients from the quota requirements using their IP or network addresses.
The 'sig0checks-quota-maxwait-ms' option is used to define a maximum
amount of time for named to wait for a quota to appear. If during that
time no new quota becomes available, named will answer to the client
with DNS_R_REFUSED.
Previously, the number of RR types for a single owner name was limited
only by the maximum number of the types (64k). As the data structure
that holds the RR types for the database node is just a linked list, and
there are places where we just walk through the whole list (again and
again), adding a large number of RR types for a single owner named with
would slow down processing of such name (database node).
Add a configurable limit to cap the number of the RR types for a single
owner. This is enforced at the database (rbtdb, qpzone, qpcache) level
and configured with new max-types-per-name configuration option that
can be configured globally, per-view and per-zone.
Previously, the number of RRs in the RRSets were internally unlimited.
As the data structure that holds the RRs is just a linked list, and
there are places where we just walk through all of the RRs, adding an
RRSet with huge number of RRs inside would slow down processing of said
RRSets.
Add a configurable limit to cap the number of the RRs in a single RRSet.
This is enforced at the database (rbtdb, qpzone, qpcache) level and
configured with new max-records-per-type configuration option that can
be configured globally, per-view and per-zone.
Changed the default value for 'allow-transfer' to 'none'; zone
transfers now require explicit authorization.
Updated all system tests to specify an allow-transfer ACL when needed.
Revised the ARM to specify that the default is 'none'.
The list isn't exactly maintained but it helped with some BIND history
tracking and is basically harmless so it might be worth holding onto it.
I have adapted the name to ASCII so IDN support won't be necessary.
The high-water allows administrators to better tune the recursive
clients limit without having to to poll the statistics channel in high
rates to get this number.
the previous commit introduced a possible race in getsigningtime()
where the rdataset header could change between being found on the
heap and being bound.
getsigningtime() now looks at the first element of the heap, gathers the
locknum, locks the respective lock, and retrieves the header from the
heap again. If the locknum has changed, it will rinse and repeat.
Theoretically, this could spin forever, but practically, it almost never
will as the heap changes on the zone are very rare.
we simplify matters further by changing the dns_db_getsigningtime()
API call. instead of passing back a bound rdataset, we pass back the
information the caller actually needed: the resigning time, owner name
and type of the rdataset that was first on the heap.
if we had a method to get the running loop, similar to how
isc_tid() gets the current thread ID, we can simplify loop
and loopmgr initialization.
remove most uses of isc_loop_current() in favor of isc_loop().
in some places where that was the only reason to pass loopmgr,
remove loopmgr from the function parameters.
by default, QPDB is the database used by named and all tools and
unit tests. the old default of RBTDB can now be restored by using
"configure --with-zonedb=rbt --with-cachedb=rbt".
some tests have been fixed so they will work correctly with either
database.
CHANGES and release notes have been updated to reflect this change.
replace the string "rbt" throughout BIND with "qp" so that
qpdb databases will be used by default instead of rbtdb.
rbtdb databases can still be used by specifying "database rbt;"
in a zone statement.
With _exit() instead of exit() in place, we don't need
isc__tls_setfatalmode() mechanism as the atexit() calls will not be
executed including OpenSSL atexit hooks.
Since the fatal() isn't a correct but rather abrupt termination of the
program, we want to skip the various atexit() calls because not all
memory might be freed during fatal() call, etc. Using _exit() instead
of exit() has this effect - the program will end, but no destructors or
atexit routines will be called.
Expose the newly added 'first refresh' flag in the information
provided by the 'rndc staus' command, by showing the number of
zones, which are not yet fully ready, and their first refresh
is pending or is in-progress.
Add a new zone flag to indicate that a secondary type zone is
not yet fully ready, and a first time refresh is pending or is
in progress.
Expose this new flag in the statistics channel's "Incoming Zone
Transfers" section.
Instead of running all the cryptographic validation in a tight loop,
spread it out into multiple event loop "ticks", but moving every single
validation into own isc_async_run() asynchronous event. Move the
cryptographic operations - both verification and DNSKEY selection - to
the offloaded threads (isc_work_enqueue), this further limits the time
we spend doing expensive operations on the event loops that should be
fast.
Limit the impact of invalid or malicious RRSets that contain crafted
records causing the dns_validator to do many validations per single
fetch by adding a cap on the maximum number of validations and maximum
number of validation failures that can happen before the resolving
fails.
This is now the default way to implement attaching to/detaching from
a pointer.
Also update cfg_keystore_fromconfig() to allow NULL value for the
keystore pointer. In most cases we detach it immediately after the
function call.
Move dns_dnssec_findzonekeys from the dnssec.{c,h} source code to
zone.{c,h} (the header file already commented that this should be done
inside dns_zone_t).
Alter the function in such a way, that keys are searched for in the
key stores if a 'dnssec-policy' (kasp) is attached to the zone,
otherwise keep using the zone's key-directory.
When writing key files to disk, use the internally stored directory.
Add an access function 'dst_key_directory()'.
Most calls to keymgr functions no longer need to provide the
key-directory value. Only 'dns_keymgr_run' still needs access to
the zone's key-directory in case the key-store is set to the built-in
key-directory.
Refactor dns_dnssec_findmatchingkeys and dns_dnssec_keylistfromrdataset
to take into account the key store directories in case the zone is using
dnssec-policy (kasp). Add 'kasp' and 'keystores' parameters.
This requires the keystorelist to be stored inside the zone structure.
The calls to these functions in the DNSSEC tools can use NULL as the
kasp value, as dnssec-signzone does not (yet) support dnssec-policy,
and key collision is checked inside the directory where it is created.
When creating the kasp structure, instead of storing the name of the
key store on keys, store a reference to the key store object instead.
This requires to build the keystore list prior to creating the kasp
structures, in the dnssec tools, the check code and the server code.
We will create a builtin keystore called "key-directory" which means
use the zone's key-directory as the key store.
The check code changes, because now the keystore is looked up before
creating the kasp structure (and if the keystore is not found, this
is an error). Instead of looking up the keystore after all
'dnssec-policy' clauses have been read.
The statistics channel does not expose the current number of TCP clients
connected, only the highwater. Therefore, users did not have an easy
means to collect statistics about TCP clients served over time. This
information could only be measured as a seperate mechanism via rndc by
looking at the TCP quota filled.
In order to expose the exact current count of connected TCP clients
(tracked by the "tcp-clients" quota) as a statistics counter, an
extra, dedicated Network Manager callback would need to be
implemented for that purpose (a counterpart of ns__client_tcpconn()
that would be run when a TCP connection is torn down), which is
inefficient. Instead, track the number of currently-connected TCP
clients separately for IPv4 and IPv6, as Network Manager statistics.
The conn_shutdown() function is called whenever a control channel
connection is supposed to be closed, e.g. after a response to the client
is sent or when named is being shut down. That function calls
isccc_ccmsg_invalidate(), which resets the magic number in the structure
holding the messages exchanged over a given control channel connection
(isccc_ccmsg_t). The expectation here is that all operations related to
the given control channel connection will have been completed by the
time the connection needs to be shut down.
However, if named shutdown is initiated while a control channel message
is still in flight, some netmgr callbacks might still be pending when
conn_shutdown() is called and isccc_ccmsg_t invalidated. This causes
the REQUIRE assertion checking the magic number in ccmsg_senddone() to
fail when the latter function is eventually called, resulting in a
crash.
Fix by splitting up isccc_ccmsg_invalidate() into two separate
functions:
- isccc_ccmsg_disconnect(), which initiates TCP connection shutdown,
- isccc_ccmsg_invalidate(), which cleans up magic number and buffer,
and then:
- replacing all existing uses of isccc_ccmsg_invalidate() with calls
to isccc_ccmsg_disconnect(),
- only calling isccc_ccmsg_invalidate() when all netmgr callbacks are
guaranteed to have been run.
Adjust function comments accordingly.
Multiple zones should be able to read the same key and signing policy
at the same time. Since writing the kasp lock only happens during
reconfiguration, and the complete kasp list is being replaced, there
is actually no need for a lock. Reference counting ensures that a kasp
structure is not destroyed when still being attached to one or more
zones.
This significantly improves the load configuration time.
The loop in shutdown_listener() assumes that the reference count for
every controlconnection_t object on the listener->connections linked
list will drop down to zero after the conn_shutdown() call in the loop's
body. However, when the timing is just right, some netmgr callbacks for
a given control connection may still be awaiting processing by the same
event loop that executes shutdown_listener() when the latter is run.
Since these netmgr callbacks must be run in order for the reference
count for the relevant controlconnection_t objects to drop to zero, when
the scenario described above happens, shutdown_listener() runs into an
infinite loop due to one of the controlconnection_t objects on the
listener->connections linked list never going away from the head of that
list.
Fix by safely iterating through the listener->connections list and
initiating shutdown for all controlconnection_t objects found. This
allows any pending netmgr callbacks to be run by the same event loop in
due course, i.e. after shutdown_listener() returns.
The main intention of PROXY protocol is to pass endpoints information
to a back-end server (in our case - BIND). That means that it is a
valid way to spoof endpoints information, as the addresses and ports
extracted from PROXYv2 headers, from the point of view of BIND, are
used instead of the real connection addresses.
Of course, an ability to easily spoof endpoints information can be
considered a security issue when used uncontrollably. To resolve that,
we introduce 'allow-proxy' and 'allow-proxy-on' ACL options. These are
the only ACL options in BIND that work with real PROXY connections
addresses, allowing a DNS server operator to specify from what clients
and on which interfaces he or she is willing to accept PROXY
headers. By default, for security reasons we do not allow to accept
them.
This commit extends "listen-on" statement with "proxy" options that
allows one to enable PROXYv2 support on a dedicated listener. It can
have the following values:
- "plain" to send PROXYv2 headers without encryption, even in the case
of encrypted transports.
- "encrypted" to send PROXYv2 headers encrypted right after the TLS
handshake.
It was possible that controlconf connections could be shutdown twice
when shutting down the server, because they would receive the
signal (ISC_R_SHUTTINGDOWN result) from netmgr and then the shutdown
procedure would be called second time via controls_shutdown().
Split the shutdown procedure from control_recvmessage(), so we can call
it independently from netmgr callbacks and make sure it will be called
only once. Do the similar thing for the listeners.
The AES algorithm for DNS cookies was being kept for legacy reasons, and
it can be safely removed in the next major release. Remove both the AES
usage for DNS cookies and the AES implementation itself.