Commit Graph

4220 Commits

Author SHA1 Message Date
Matthijs Mekking
9acce8a82a Add a function isc_stats_resize
Add a new function to resize the number of counters in a statistics
counter structure. This will be needed when we keep track of DNSSEC
sign statistics and new keys are introduced due to a rollover.
2021-08-24 09:07:15 +02:00
Matthijs Mekking
0bac9c7c5c Add stats unit test
Add a simple stats unit test that tests the existing library functions
isc_stats_ncounters, isc_stats_increment, isc_stats_decrement,
isc_stats_set, and isc_stats_update_if_greater.
2021-08-24 09:07:15 +02:00
Michal Nowak
d3d32683c0 Fix typos in lib/isc/trampoline_p.h 2021-08-19 07:12:33 +02:00
Ondřej Surý
87d5c8ab7c Disable the Path MTU Discover on UDP Sockets
Instead of disabling the fragmentation on the UDP sockets, we now
disable the Path MTU Discovery by setting IP(V6)_MTU_DISCOVER socket
option to IP_PMTUDISC_OMIT on Linux and disabling IP(V6)_DONTFRAG socket
option on FreeBSD.  This option sets DF=0 in the IP header and also
ignores the Path MTU Discovery.

As additional mitigation on Linux, we recommend setting
net.ipv4.ip_no_pmtu_disc to Mode 3:

    Mode 3 is a hardend pmtu discover mode. The kernel will only accept
    fragmentation-needed errors if the underlying protocol can verify
    them besides a plain socket lookup. Current protocols for which pmtu
    events will be honored are TCP, SCTP and DCCP as they verify
    e.g. the sequence number or the association. This mode should not be
    enabled globally but is only intended to secure e.g. name servers in
    namespaces where TCP path mtu must still work but path MTU
    information of other protocols should be discarded. If enabled
    globally this mode could break other protocols.
2021-08-19 07:12:33 +02:00
Mark Andrews
89fe8e920c Use %d for enum values 2021-08-19 10:19:32 +10:00
Mark Andrews
26b22a1445 add tests for string and qstring 2021-08-18 13:49:48 +10:00
Mark Andrews
a6357d8b5c Add unit test for keypair 2021-08-18 13:49:48 +10:00
Mark Andrews
42c22670b3 Add support for parsing <tag>[=<value>]
where <value> may be a quoted string.  Previously quoted string
only supported opening quotes at the start of the string.
2021-08-18 13:49:48 +10:00
Artem Boldariev
d72b1fa5cd Fix the doh_recv_send() logic in the doh_test
The commit fixes the doh_recv_send() because occasionally it would
fail because it did not wait for all responses to be sent, making the
check for ssends value to nit pass.
2021-08-12 14:28:17 +03:00
Artem Boldariev
e639957b58 Optimise TLS stream for small write size (>= 512 bytes)
This commit changes TLS stream behaviour in such a way, that it is now
optimised for small writes. In the case there is a need to write less
or equal to 512 bytes, we could avoid calling the memory allocator at
the expense of possibly slight increase in memory usage. In case of
larger writes, the behviour remains unchanged.
2021-08-12 14:28:17 +03:00
Artem Boldariev
e301e1e3b8 Avoid memory copying during send in TLS stream
At least at this point doing memory copying is not required. Probably
it was a workaround for some problem in the earlier days of DoH, at
this point it appears to be a waste of CPU cycles.
2021-08-12 14:28:17 +03:00
Artem Boldariev
bd69c7c57c Simplify buffering code logic in http_send_outgoing()
This commit significantly simplifies the code in http_send_outgoing()
as it was unnecessary complicated, because it was dealing with
multiple statically and dynamically allocated buffers, making it
extremely hard to follow, as well as making it to do unnecessary
memory copying in some situations. This commit fixes these issues,
while retaining the high level buffering logic.
2021-08-12 14:28:17 +03:00
Artem Boldariev
a32faa20b4 DoH: replace a custom buffer code for POST data with isc_buffer_t
This commit replaces the custom buffer code in client-side DoH code
intended to keep track of POST data, with isc_buffer_t.
2021-08-12 14:28:17 +03:00
Artem Boldariev
5b52a7e37e When terminating a client session, mark it as closing
When an HTTP/2 client terminates a session it means that it is about
to close the underlying connection. However, we were not doing that.
As a result, with the latest changes to the test suite, which made it
to limit amount of requests per a transport connection, the tests
using quota would hang for quite a while. This commit fixes that.
2021-08-12 14:28:17 +03:00
Artem Boldariev
dbca22877a Limit the number of requests sent per connection in DoH tests
This commit ensures that only a limited number of requests is going to
be sent over a single HTTP/2 connection. Before that change was
introduced, it was possible to complete all of the planned sends via
only one transport connection, which undermines the purpose of the
tests using the quota facility.
2021-08-12 14:28:16 +03:00
Artem Boldariev
a05728beb0 Do not call http_do_bio() in isc__nm_http_request()
The function should not be called here because it is, in general,
supposed to be called at the end of the transport level callbacks to
perform I/O, and thus, calling it here is clearly a mistake because it
breaks other code expectations. As a result of the call to
http_do_bio() from within isc__nm_http_request() the unit tests were
running slower than expected in some situations.

In this particular situation http_do_bio() is going to be called at
the end of the transport_connect_cb() (initially), or http_readcb(),
sending all of the scheduled requests at once.

This change affects only the test suite because it is the only place
in the codebase where isc__nm_http_request() is used in order to
ensure that the server is able to handle multiple HTTP/2 streams at
once.
2021-08-12 14:28:16 +03:00
Artem Boldariev
849d38b57b Fix a crash by attach to the transport socket as early as possible
This commit fixes a crash in DoH caused by transport handle to be
detached too early when sending outgoing data.

We need to attach to the session->handle earlier because as an
indirect result of the nghttp2_session_mem_send() the session might
get closed and the handle detached. However, there is still might be
some outgoing data to handle. Besides, even when the underlying socket
was closed via the handle, we still should try to attempt to send
outgoing data via isc_nm_send() to let it call write callback, passed
to the http_send_outgoing().
2021-08-12 14:28:16 +03:00
Artem Boldariev
e0704f2e5d Use isc_buffer_t to keep track of outgoing response
This commit gets rid of custom code taking care of response buffering
by replacing the custom code with isc_buffer_t. Also, it gets rid of
an unnecessary memory copying when sending a response.
2021-08-12 14:28:16 +03:00
Artem Boldariev
6fe4ab39b9 Use isc_buffer_t to keep track of incoming POST data
This commit replaces the ad-hoc 64K buffer for incoming POST data with
isc_buffer_t backed by dynamically allocated buffer sized accordingly
to the value in the "Content-Length" header.
2021-08-12 14:28:16 +03:00
Artem Boldariev
0ca790d9bf DoH: isc__buffer_usedregion->isc_buffer_usedregion in client_send()
This commit replaces wrong usage of  isc__buffer_usedregion() instead
of implied  isc_buffer_usedregion().
2021-08-12 14:28:16 +03:00
Artem Boldariev
2733cca3ac Replace ad-hoc DNS message buffer in client code with isc_buffer_t
The commit replaces an ad-hoc incoming DNS-message buffer in the
client-side DoH code with isc_buffer_t.

The commit also fixes a timing issue in the unit tests revealed by the
change.
2021-08-12 14:28:16 +03:00
Artem Boldariev
c819caa3a1 Replace the HTTP/2 session's ad-hoc buffer with isc_buffer_t
This commit replaces a static ad-hoc HTTP/2 session's temporary buffer
with a realloc-able isc_buffer_t object, which is being allocated on
as needed basis, lowering the memory consumption somewhat. The buffer
is needed in very rare cases, so allocating it prematurely is not
wise.

Also, it fixes a bug in http_readcb() where the ad-hoc buffer appeared
to be improperly used, leading to a situation when the processed data
from the receiving regions can be processed twice, while unprocessed
data will never be processed.
2021-08-12 14:28:16 +03:00
Artem Boldariev
170cc41d5c Get rid of some HTTP/2 related types when NGHTTP2 is not available
This commit removes definitions of some DoH-related types when
libnghttp2 is not available.
2021-08-04 10:32:27 +03:00
Artem Boldariev
f388b71378 Get rid of RW locks in the DoH code
This commit gets rid of RW locks in a hot path of the DoH code. In the
original design, it was implied that we add new endpoints after the
HTTP listener was created. Such a design implies some locking. We do
not need such flexibility, though. Instead, we could build a set of
endpoints before the HTTP listener gets created. Such a design does
not need RW locks at all.
2021-08-04 10:32:25 +03:00
Ondřej Surý
22db2705cd Use static storage for isc_mem water_t
On the isc_mem water change the old water_t structure could be used
after free.  Instead of introducing reference counting on the hot-path
we are going to introduce additional constraints on the
isc_mem_setwater.  Once it's set for the first time, the additional
calls have to be made with the same water and water_arg arguments.
2021-07-22 11:51:46 +02:00
Artem Boldariev
590e8e0b86 Make max number of HTTP/2 streams configurable
This commit makes number of concurrent HTTP/2 streams per connection
configurable as a mean to fight DDoS attacks. As soon as the limit is
reached, BIND terminates the whole session.

The commit adds a global configuration
option (http-streams-per-connection) which can be overridden in an
http <name> {...} statement like follows:

http local-http-server {
    ...
    streams-per-connection 100;
    ...
};

For now the default value is 100, which should be enough (e.g. NGINX
uses 128, but it is a full-featured WEB-server). When using lower
numbers (e.g. ~70), it is possible to hit the limit with
e.g. flamethrower.
2021-07-16 11:50:22 +03:00
Artem Boldariev
03a557a9bb Add (http-)listener-clients option (DoH quota mechanism)
This commit adds support for http-listener-clients global options as
well as ability to override the default in an HTTP server description,
like:

http local-http-server {
    ...
    listener-clients 100;
    ...
};

This way we have ability to specify per-listener active connections
quota globally and then override it when required. This is exactly
what AT&T requested us: they wanted a functionality to specify quota
globally and then override it for specific IPs. This change
functionality makes such a configuration possible.

It makes sense: for example, one could have different quotas for
internal and external clients. Or, for example, one could use BIND's
internal ability to serve encrypted DoH with some sane quota value for
internal clients, while having un-encrypted DoH listener without quota
to put BIND behind a load balancer doing TLS offloading for external
clients.

Moreover, the code no more shares the quota with TCP, which makes
little sense anyway (see tcp-clients option), because of the nature of
interaction of DoH clients: they tend to keep idle opened connections
for longer periods of time, preventing the TCP and TLS client from
being served. Thus, the need to have a separate, generally larger,
quota for them.

Also, the change makes any option within "http <name> { ... };"
statement optional, making it easier to override only required default
options.

By default, the DoH connections are limited to 300 per listener. I
hope that it is a good initial guesstimate.
2021-07-16 11:50:20 +03:00
Artem Boldariev
954240467d Verify HTTP paths both in incoming requests and in config file
This commit adds the code (and some tests) which allows verifying
validity of HTTP paths both in incoming HTTP requests and in BIND's
configuration file.
2021-07-16 10:28:08 +03:00
Evan Hunt
4f6e2317e9 document isc__trampoline
Added some header file documentation to the isc__trampoline
implementation in trampoline_p.h.
2021-07-14 10:55:12 -07:00
Artem Boldariev
64cd7e8a7f Fix crash in DoH on empty query string in GET requests
An unhandled code path left GET query string data uninitialised (equal
to NULL) and led to a crash during the requests' base64 data
decoding. This commit fixes that.
2021-07-13 16:53:51 +03:00
Ondřej Surý
a9e6a7ae57 Disable setting the thread affinity
It was discovered that setting the thread affinity on both the netmgr
and netthread threads lead to inconsistent recursive performance because
sometimes the netmgr and netthread threads would compete over single
resource and sometimes not.

Removing setting the affinity causes a slight dip in the authoritative
performance around 5% (the measured range was from 3.8% to 7.8%), but
the recursive performance is now consistently good.
2021-07-13 14:48:29 +02:00
Ondrej Sury
6eca4b402e Use max_align_t for memory sizeinfo alignment on OpenBSD
On OpenBSD and more generally on platforms without either jemalloc or
malloc_(usable_)size, we need to increase the alignment for the memory
to sizeof(max_align_t) as with plain sizeof(void *), the compiled code
would be crashing when accessing the returned memory.
2021-07-13 13:48:33 +02:00
Ondrej Sury
23751fe252 Cache the isc_os_ncpu() result
It was discovered that on some platforms (f.e. Alpine Linux with MUSL)
the result of isc_os_ncpus() call differ when called before and after we
drop privileges.  This commit changes the isc_os_ncpus() call to cache
the result from the first call and thus always return the same value
during the runtime of the named.  The first call to isc_os_ncpus() is
made as soon as possible on the library initalization.
2021-07-13 09:12:04 +02:00
Ondřej Surý
ce03015d48 Remove nonnull attribute from isc_mem_{get,allocate,reallocate}
The isc_mem_get(), isc_mem_allocate() and isc_mem_reallocate() can
return NULL ptr in case where the allocation size is NULL.  Remove the
nonnull attribute from the functions' declarations.

This stems from the following definition in the C11 standard:

> If the size of the space requested is zero, the behavior is
> implementation-defined: either a null pointer is returned, or the
> behavior is as if the size were some nonzero value, except that the
> returned pointer shall not be used to access an object.

In this case, we return NULL as it's easier to detect errors when
accessing pointer from zero-sized allocation which should obviously
never happen.
2021-07-12 10:02:18 +02:00
Ondřej Surý
d1a9e549b1 Fix the real allocation size in OpenBSD rallocx shim
In the rallocx() shim for OpenBSD (that's the only platform that doesn't
have malloc_size() or malloc_usable_size() equivalent), the newly
allocated size was missing the extra size_t member for storing the
allocation size leading to size_t sized overflow at the end of the
reallocated memory chunk.
2021-07-12 08:43:14 +02:00
Mark Andrews
3945c289bb Reset errcnt at the start of each subtest 2021-07-12 03:47:11 +00:00
Mark Andrews
ce5207699d Fix unchecked return of isc_rwlock_lock and isc_rwlock_unlock
(cherry picked from commit bcaf23dd27)
2021-07-12 13:26:29 +10:00
Ondřej Surý
29a285a67d Revert the allocate/free -> get/put change from jemalloc change
In the jemalloc merge request, we missed the fact that ah_frees and ah_handles
are reallocated which is not compatible with using isc_mem_get() for allocation
and isc_mem_put() for deallocation.  This commit reverts that part and restores
use of isc_mem_allocate() and isc_mem_free().
2021-07-09 18:19:57 +02:00
Artem Boldariev
3673abc53c Use restrict and const in isc_mempool_t
This commit makes add restrict and const modifiers to some variables
to aid compiler to do its optimizations.
2021-07-09 15:58:02 +02:00
Artem Boldariev
c11a401add Do not use atomic variables in isc_mempool_t
As now mempool objects intended to be used in a thread-local manner,
there is no point in using atomic here.
2021-07-09 15:58:02 +02:00
Ondřej Surý
63b06571b9 Use isc_mem_get() and isc_mem_put() in isc_mem_total test
Previously, the isc_mem_allocate() and isc_mem_free() would be used for
isc_mem_total test, but since we now use the real allocation
size (sallocx, malloc_size, malloc_usable_size) to track the allocation
size, it's impossible to get the test value right.  Changing the test to
use isc_mem_get() and isc_mem_put() will use the exact size provided, so
the test would work again on all the platforms even when jemalloc is not
being used.
2021-07-09 15:58:02 +02:00
Ondřej Surý
6f162e8aa4 Rewrite isc_mem water to use single atomic exchange operation
This commit refactors the water mechanism in the isc_mem API to use
single pointer to a water_t structure that can be swapped with
atomic_exchange operation instead of having four different
values (water, water_arg, hi_water, lo_water) in the flat namespace.

This reduces the need for locking and prevents a race when water and
water_arg could be desynchronized.
2021-07-09 15:58:02 +02:00
Ondřej Surý
798333d456 Allow size == 0 in isc_mem_{get,allocate,reallocate}
Calls to jemalloc extended API with size == 0 ends up in undefined
behaviour.  This commit makes the isc_mem_get() and friends calls
more POSIX aligned:

  If size is 0, either a null pointer or a unique pointer that can be
  successfully passed to free() shall be returned.

We picked the easier route (which have been already supported in the old
code) and return NULL on calls to the API where size == 0.
2021-07-09 15:58:02 +02:00
Ondřej Surý
e20cc41e56 Use system allocator when jemalloc is unavailable
This commit adds support for systems where the jemalloc library is not
available as a package, here's the quick summary:

  * On Linux - the jemalloc is usually available as a package, if
    configured --without-jemalloc, the shim would be used around
    malloc(), free(), realloc() and malloc_usable_size()

  * On macOS - the jemalloc is available from homebrew or macports, if
    configured --without-jemalloc, the shim would be used around
    malloc(), free(), realloc() and malloc_size()

  * On FreeBSD - the jemalloc is *the* system allocator, we just need
    to check for <malloc_np.h> header to get access to non-standard API

  * On NetBSD - the jemalloc is *the* system allocator, we just need to
    check for <jemalloc/jemalloc.h> header to get access to non-standard
    API

  * On a system hostile to users and developers (read OpenBSD) - the
    jemalloc API is emulated by using ((size_t *)ptr)[-1] field to hold
    the size information.  The OpenBSD developers care only for
    themselves, so why should we care about speed on OpenBSD?
2021-07-09 15:58:02 +02:00
Ondřej Surý
e754360170 Remove atomic thread synchronization from the memory hot-path
This commit refactors the hi/lo-water related code to remove contention
on the hot path in the memory allocator.
2021-07-09 15:58:02 +02:00
Ondřej Surý
efb385ecdc Clean up isc_mempool API
- isc_mempool_get() can no longer fail; when there are no more objects
  in the pool, more are always allocated. checking for NULL return is
  no longer necessary.
- the isc_mempool_setmaxalloc() and isc_mempool_getmaxalloc() functions
  are no longer used and have been removed.
2021-07-09 15:58:02 +02:00
Ondřej Surý
f487c6948b Replace locked mempools with memory contexts
Current mempools are kind of hybrid structures - they serve two
purposes:

 1. mempool with a lock is basically static sized allocator with
    pre-allocated free items

 2. mempool without a lock is a doubly-linked list of preallocated items

The first kind of usage could be easily replaced with jemalloc small
sized arena objects and thread-local caches.

The second usage not-so-much and we need to keep this (in
libdns:message.c) for performance reasons.
2021-07-09 15:58:02 +02:00
Ondřej Surý
fd3ceec475 Add debug tracing capability to isc_mempool_create/destroy
Previously, we only had capability to trace the mempool gets and puts,
but for debugging, it's sometimes also important to keep track how many
and where do the memory pools get created and destroyed.  This commit
adds such tracking capability.
2021-07-09 15:58:02 +02:00
Ondřej Surý
5ab05d1696 Replace isc_mem_allocate() usage with isc_mem_get() in netmgr.c
The isc_mem_allocate() comes with additional cost because of the memory
tracking.  In this commit, we replace the usage with isc_mem_get()
because we track the allocated sizes anyway, so it's possible to also
replace isc_mem_free() with isc_mem_put().
2021-07-09 15:58:02 +02:00
Ondřej Surý
fcc6814776 Replace internal memory calls with non-standard jemalloc API
The jemalloc non-standard API fits nicely with our memory contexts, so
just rewrite the memory context internals to use the non-public API.

There's just one caveat - since we no longer track the size of the
allocation for isc_mem_allocate/isc_mem_free combination, we need to use
sallocx() to get real allocation size in both allocator and deallocator
because otherwise the sizes would not match.
2021-07-09 15:58:02 +02:00