- isc_mempool_get() can no longer fail; when there are no more objects
in the pool, more are always allocated. checking for NULL return is
no longer necessary.
- the isc_mempool_setmaxalloc() and isc_mempool_getmaxalloc() functions
are no longer used and have been removed.
Current mempools are kind of hybrid structures - they serve two
purposes:
1. mempool with a lock is basically static sized allocator with
pre-allocated free items
2. mempool without a lock is a doubly-linked list of preallocated items
The first kind of usage could be easily replaced with jemalloc small
sized arena objects and thread-local caches.
The second usage not-so-much and we need to keep this (in
libdns:message.c) for performance reasons.
Previously, we only had capability to trace the mempool gets and puts,
but for debugging, it's sometimes also important to keep track how many
and where do the memory pools get created and destroyed. This commit
adds such tracking capability.
The isc_mem_allocate() comes with additional cost because of the memory
tracking. In this commit, we replace the usage with isc_mem_get()
because we track the allocated sizes anyway, so it's possible to also
replace isc_mem_free() with isc_mem_put().
The jemalloc non-standard API fits nicely with our memory contexts, so
just rewrite the memory context internals to use the non-public API.
There's just one caveat - since we no longer track the size of the
allocation for isc_mem_allocate/isc_mem_free combination, we need to use
sallocx() to get real allocation size in both allocator and deallocator
because otherwise the sizes would not match.
The ISC_MEM_DEBUGSIZE and ISC_MEM_DEBUGCTX did sanity checks on matching
size and memory context on the memory returned to the allocator. Those
will no longer needed when most of the allocator will be replaced with
jemalloc.
There's global variable called `malloc_conf` that can be used to
configure jemalloc behaviour at the program startup. We use following
configuration:
* xmalloc:true - abort-on-out-of-memory enabled.
* background_thread:true - Enable internal background worker threads
to handle purging asynchronously.
* metadata_thp:auto - allow jemalloc to use transparent huge page
(THP) for internal metadata initially, but may begin to do so when
metadata usage reaches certain level.
* dirty_decay_ms:30000 - Approximate time in milliseconds from the
creation of a set of unused dirty pages until an equivalent set of
unused dirty pages is purged and/or reused.
* muzzy_decay_ms:30000 - Approximate time in milliseconds from the
creation of a set of unused muzzy pages until an equivalent set of
unused muzzy pages is purged and/or reused.
More information about the specific meaning can be found in the jemalloc
manpage or online at http://jemalloc.net/jemalloc.3.html
The jemalloc allocator is scalable high performance allocator, this is
the first in the series of commits that will add jemalloc as a memory
allocator for BIND 9.
This commit adds configure.ac check and Makefile modifications to use
jemalloc as BIND 9 allocator.
Previously, we only had capability to trace the memory gets and puts,
but for debugging, it's sometimes also important to keep track how many
and where do the memory contexts get created and destroyed. This commit
adds such tracking capability.
This commit makes BIND return HTTP status codes for malformed or too
small requests.
DNS request processing code would ignore such requests. Such an
approach works well for other DNS transport but does not make much
sense for HTTP, not allowing it to complete the request/response
sequence.
Suppose execution has reached the point where DNS message handling
code has been called. In that case, it means that the HTTP request has
been successfully processed, and, thus, we are expected to respond to
it either with a message containing some DNS payload or at least to
return an error status code. This commit ensures that BIND behaves
this way.
This error code fits better than the more generic "Internal Server
Error" (500) which implies that the problem is on the server.
Also, do not end the whole HTTP/2 session on a bad request.
We were too strict regarding the value and presence of "Accept" HTTP
header, slightly breaking compatibility with the specification.
According to RFC8484 client SHOULD add "Accept" header to the requests
but MUST be able to handle "application/dns-message" media type
regardless of the value of the header. That basically suggests we
ignore its value.
Besides, verifying the value of the "Accept" header is a bit tricky
because it could contain multiple media types, thus requiring proper
parsing. That is doable but does not provide us with any benefits.
Among other things, not verifying the value also fixes compatibility
with clients, which could advertise multiple media types as supported,
which we should accept. For example, it is possible for a perfectly
valid request to contain "application/dns-message", "application/*",
and "*/*" in the "Accept" header value. Still, we would treat such a
request as invalid.
The commit fixes BIND hanging when browsers end HTTP/2 streams
prematurely (for example, by sending RST_STREAM). It ensures that
isc__nmsocket_prep_destroy() will be called for an HTTP/2 stream,
allowing it to be properly disposed.
The problem was impossible to reproduce using dig or DoH benchmarking
software (e.g. flamethrower) because these do not tend to end HTTP/2
streams prematurely.
This commit adds two new autoconf options `--enable-doh` (enabled by
default) and `--with-libnghttp2` (mandatory when DoH is enabled).
When DoH support is disabled the library is not linked-in and support
for http(s) protocol is disabled in the netmgr, named and dig.
if a control channel listener was configured with more than one
key algorithm, message verification would be attempted with each
algorithm in turn. if the first key failed due to the wrong
signature length, the entire verification process was aborted,
rather than continuing on to try with another key.
The isc/platform.h header was left empty which things either already
moved to config.h or to appropriate headers. This is just the final
cleanup commit.
The last remaining defines needed for platforms without NAME_MAX and
PATH_MAX (I'm looking at you, GNU Hurd) were moved to isc/dir.h where
it's prevalently used.
The ISC_STRERRORSIZE was defined in isc/platform.h header as the
value was different between Windows and POSIX platforms. Now that
Windows is gone, move the define to where it belongs.
The function 'private_type_record()' is now used in multiple system
setup scripts and should be moved to the common configuration script
conf.sh.common.
The old approach where each zone structure has its own mutex that
a thread needs to obtain multiple locks to do safe keyfile I/O
operations lead to a race condition ending in a possible deadlock.
Consider a zone in two views. Each such zone is stored in a separate
zone structure. A thread that needs to read or write the key files for
this zone needs to obtain both mutexes in seperate structures. If
another thread is working on the same zone in a different view, they
race to get the locks. It would be possible that thread1 grabs the
lock of the zone in view1, while thread2 wins the race for the lock
of the zone in view2. Now both threads try to get the other lock, both
of them are already locked.
Ideally, when a thread wants to do key file operations, it only needs
to lock a single mutex. This commit introduces a key management hash
table, stored in the zonemgr structure. Each time a zone is being
managed, an object is added to the hash table (and removed when the
zone is being released). This object is identified by the zone name
and contains a mutex that needs to be locked prior to reading or
writing key files.
(cherry-picked from commit ef4619366d49efd46f9fae5f75c4a67c246ba2e6)
Similar to notify, add code to send and keep track of checkds requests.
On every zone_rekey event, we will check the DS at parental agents
(but we will only actually query parental agents if theree is a DS
scheduled to be published/withdrawn).
On a zone_rekey event, we will first clear the ongoing checkds requests.
Reset the counter, to avoid continuing KSK rollover premature.
This has the risk that if zone_rekey events happen too soon after each
other, there are redundant DS queries to the parental agents. But
if TTLs and the configured durations in the dnssec-policy are sane (as
in not ridiculous short) the chance of this happening is low.