Commit Graph

31898 Commits

Author SHA1 Message Date
Michał Kępień
3ab00a11eb Remove arm64 jobs from GitLab CI
The only arm64 runner we have at our disposal is suffering from
intermittent connectivity issues which make it unusable for extended
periods of time.  Remove arm64 jobs from GitLab CI until we manage to
set up an arm64 runner with more reliable connectivity.

(cherry picked from commit 49f245f7c0)
2020-08-05 12:07:37 +02:00
Michał Kępień
2798a4443d Merge branch '2065-set-max-cache-size-in-the-geoip2-system-test-v9_16' into 'v9_16'
[v9_16] Set "max-cache-size" in the "geoip2" system test

See merge request isc-projects/bind9!3921
2020-08-05 09:43:19 +00:00
Michał Kępień
fec4664ab0 Set "max-cache-size" in the "geoip2" system test
The named configuration files used in the "geoip2" system test cause a
rather large number of views (6-8) to be set up in each tested named
instance.  Each view has its own cache.

Commit aa72c31422 caused the RBT hash
table to be pre-allocated to a size derived from "max-cache-size", so
that it never needs to be rehashed.  The size of that hash table is not
expected to be significant enough to cause memory use issues in typical
conditions even for large "max-cache-size" settings.

However, these two factors combined can cause memory exhaustion issues
in GitLab CI, where we run multiple "instances" of the test suite in
parallel on the same runner, each test suite executes multiple system
tests concurrently, and each system test may potentially start multiple
named instances at the same time.  In practice, this problem currently
only seems to be affecting the "geoip2" system test, which is failing
intermittently due to named instances used by that test getting killed
by oom-killer.

Prevent the "geoip2" system test from failing intermittently by setting
"max-cache-size" in named configuration files used in that test to a low
value in order to keep memory usage at bay even with a large number of
views configured.

(cherry picked from commit 4292d5bdfe)
2020-08-05 11:08:24 +02:00
Matthijs Mekking
71cd1a1d5c Merge branch 'ondrej-serve-stale-improvements-v9_16' into 'v9_16'
Serve stale improvements (9.16)

See merge request isc-projects/bind9!3913
2020-08-05 08:16:10 +00:00
Matthijs Mekking
f3103660d0 keyword 'primaries' is unknown in 9.16
In 9.17 we introduced 'primaries' as a synonym for 'masters' in the
configuration file. This synonym has not been backported so change
the serve-stale test to make use of the 'masters' keyword.
2020-08-05 09:09:16 +02:00
Matthijs Mekking
c92de6cb44 stale-cache-enable is enabled by default
Because this is a backport, the option should default to keep the
serve-stale caching enabled.
2020-08-05 09:09:16 +02:00
Ondřej Surý
f3a7ee87ef Add CHANGES and release notes for GL #1712 and GL #1829
(cherry picked from commit dd62275152)
2020-08-05 09:09:16 +02:00
Ondřej Surý
c4e6ade0e5 Add tests with stale-cache-disabled into serve-stale system test
Add a fifth named (ns5) that runs with `stale-cache-enable no;` and
check that there are no stale records in the cache.

(cherry picked from commit abc2ab9223)
2020-08-05 09:09:16 +02:00
Ondřej Surý
f9711481ad Expire the 0 TTL RRSet quickly rather using them for serve-stale
When a received RRSet has TTL 0, they would be preserved for
serve-stale (default `max-stale-cache` is 12 hours) rather than expiring
them quickly from the cache database.

This commit makes sure the RRSet didn't have TTL 0 before marking the
entry in the database as "stale".

(cherry picked from commit 6ffa2ddae0)
2020-08-05 09:09:16 +02:00
Ondřej Surý
b48e9ab201 Add stale-cache-enable option and disable serve-stable by default
The current serve-stale implementation in BIND 9 stores all received
records in the cache for a max-stale-ttl interval (default 12 hours).

This allows DNS operators to turn the serve-stale answers in an event of
large authoritative DNS outage.  The caching of the stale answers needs
to be enabled before the outage happens or the feature would be
otherwise useless.

The negative consequence of the default setting is the inevitable
cache-bloat that happens for every and each DNS operator running named.

In this MR, a new configuration option `stale-cache-enable` is
introduced that allows the operators to selectively enable or disable
the serve-stale feature of BIND 9 based on their decision.

The newly introduced option has been disabled by default,
e.g. serve-stale is disabled in the default configuration and has to be
enabled if required.

(cherry picked from commit ce53db34d6)
2020-08-05 09:09:16 +02:00
Michał Kępień
b1792f2ec5 Merge branch '2030-bind-arm-incorrectly-documents-the-processing-of-forwarders-still-has-the-pre-9-3-0-explanation-v9_16' into 'v9_16'
[v9_16] Update description of forwarding behavior in ARM

See merge request isc-projects/bind9!3917
2020-08-04 19:50:56 +00:00
Suzanne Goldlust
2d530d259a Update description of forwarding behavior in ARM
(cherry picked from commit 30e126ad02)
2020-08-04 21:42:32 +02:00
Mark Andrews
702c840e93 Merge branch 'marka-DNS_R_BADTSIG-map-to-FORMERR-v9_16' into 'v9_16'
Marka dns r badtsig map to formerr v9 16

See merge request isc-projects/bind9!3914
2020-08-04 13:30:51 +00:00
Mark Andrews
20bc6aefff Check rcode is FORMERR
(cherry picked from commit 88ff6b846c)
2020-08-04 23:04:34 +10:00
Mark Andrews
2dc26ebdb6 Map DNS_R_BADTSIG to FORMERR
Now that the log message has been printed set the result code to
DNS_R_FORMERR.  We don't do this via dns_result_torcode() as we
don't want upstream errors to produce FORMERR if that processing
end with DNS_R_BADTSIG.

(cherry picked from commit 20488d6ad3)
2020-08-04 23:04:34 +10:00
Diego dos Santos Fronza
535d27ffc5 Merge branch '1719-observed-stats-underflow-in-multiple-stats-v9_16' into 'v9_16'
Resolve "Observed stats underflow in multiple stats"

See merge request isc-projects/bind9!3866
2020-08-03 23:20:43 +00:00
Diego Fronza
e1561f0eb2 Add CHANGES and release note for #1719 2020-08-03 19:18:04 -03:00
Diego Fronza
fca1000ee9 Fix ns_statscounter_recursclients underflow
The basic scenario for the problem was that in the process of
resolving a query, if any rrset was eligible for prefetching, then it
would trigger a call to query_prefetch(), this call would run in
parallel to the normal query processing.

The problem arises due to the fact that both query_prefetch(), and,
in the original thread, a call to ns_query_recurse(), try to attach
to the recursionquota, but recursing client stats counter is only
incremented if ns_query_recurse() attachs to it first.

Conversely, if fetch_callback() is called before prefetch_done(),
it would not only detach from recursionquota, but also decrement
the stats counter, if query_prefetch() attached to te quota first
that would result in a decrement not matched by an increment, as
expected.

To solve this issue an atomic bool was added, it is set once in
ns_query_recurse(), allowing fetch_callback() to check for it
and decrement stats accordingly.

For a more compreensive explanation check the thread comment below:
https://gitlab.isc.org/isc-projects/bind9/-/issues/1719#note_145857
2020-08-03 19:18:04 -03:00
Michał Kępień
7519821f16 Merge branch 'michal/restore-placeholder-entry-at-sequence-number-5481-v9_16' into 'v9_16'
[v9_16] Restore placeholder entry at sequence number 5481

See merge request isc-projects/bind9!3911
2020-08-03 20:15:19 +00:00
Michał Kępień
b917e9eb20 Restore placeholder entry at sequence number 5481
(cherry picked from commit 029e32c01a)
2020-08-03 22:14:11 +02:00
Ondřej Surý
5f8ecfb918 Merge branch '2038-use-freebind-when-bind-fails-v9_16' into 'v9_16'
Resolve "Bind not handling interfaces changes correctly when listen-on-v6  any  specified"

See merge request isc-projects/bind9!3907
2020-07-31 15:55:53 +00:00
Witold Kręcicki
95fb38619b Add CHANGES and release note for GL #2038
(cherry picked from commit 94eda43ab2)
2020-07-31 13:33:24 +02:00
Witold Kręcicki
a12076cc52 netmgr: retry binding with IP_FREEBIND when EADDRNOTAVAIL is returned.
When a new IPv6 interface/address appears it's first in a tentative
state - in which we cannot bind to it, yet it's already being reported
by the route socket. Because of that BIND9 is unable to listen on any
newly detected IPv6 addresses. Fix it by setting IP_FREEBIND option (or
equivalent option on other OSes) and then retrying bind() call.

(cherry picked from commit a0f7d28967)
2020-07-31 13:33:06 +02:00
Michał Kępień
23021385d5 Merge branch 'michal/only-run-system-tests-as-root-in-developer-mode-v9_16' into 'v9_16'
[v9_16] Only run system tests as root in developer mode

See merge request isc-projects/bind9!3897
2020-07-31 05:47:20 +00:00
Michał Kępień
e734651fbd Only run system tests as root in developer mode
Running system tests with root privileges is potentially dangerous.
Only allow it when explicitly requested (by building with
--enable-developer).

(cherry picked from commit 3ef106f69d)
2020-07-31 07:46:27 +02:00
Mark Andrews
3d606d902f Merge branch '1456-always-check-return-from-isc_refcount_decrement-v9_16' into 'v9_16'
Always check the return from isc_refcount_decrement.

See merge request isc-projects/bind9!3901
2020-07-31 03:32:51 +00:00
Mark Andrews
14fe6e77a7 Always check the return from isc_refcount_decrement.
Created isc_refcount_decrement_expect macro to test conditionally
the return value to ensure it is in expected range.  Converted
unchecked isc_refcount_decrement to use isc_refcount_decrement_expect.
Converted INSIST(isc_refcount_decrement()...) to isc_refcount_decrement_expect.

(cherry picked from commit bde5c7632a)
2020-07-31 12:54:47 +10:00
Mark Andrews
8454b9dbb8 Merge branch '2033-rndc-dnstap-roll-fix-was-incomplete-v9_16' into 'v9_16'
Refactor the code that counts the last log version to keep

See merge request isc-projects/bind9!3900
2020-07-31 00:25:59 +00:00
Mark Andrews
1981fb1327 Refactor the code that counts the last log version to keep
When silencing the Coverity warning in remove_old_tsversions(), the code
was refactored to reduce the indentation levels and break down the long
code into individual functions.  This improve fix for [GL #1989].

(cherry picked from commit aca18b8b5b)
2020-07-31 10:01:36 +10:00
Michal Nowak
2fa1c95357 Merge branch 'mnowak/various-system-test-fixes-v9_16' into 'v9_16'
[v9_16] Various system test fixes

See merge request isc-projects/bind9!3898
2020-07-30 14:57:48 +00:00
Michal Nowak
0f319908f0 Remove cross-test dependency on ckdnsrps.sh 2020-07-30 16:25:23 +02:00
Michal Nowak
72a6b0dc6f Fix name of the test directory of stop.pl in masterformat test 2020-07-30 16:24:18 +02:00
Michal Nowak
24f5f68d7a Ensure test fails if packet.pl does not work as expected 2020-07-30 16:20:46 +02:00
Ondřej Surý
d8f7f0e747 Merge branch '1775-resizing-growing-of-cache-hash-tables-causes-delays-in-processing-of-client-queries-v9_16' into 'v9_16'
Resolve "Resizing (growing) of cache hash tables causes delays in processing of client queries"

See merge request isc-projects/bind9!3871
2020-07-30 11:47:32 +00:00
Ondřej Surý
343330413a Add CHANGES and release note for #1775
(cherry picked from commit 2b4f0f03f5)
2020-07-30 11:57:41 +02:00
Ondřej Surý
0fff3008ac Change the dns_name hashing to use 32-bit values
Change the dns_hash_name() and dns_hash_fullname() functions to use
isc_hash32() as the maximum hashtable size in rbt is 0..UINT32_MAX
large.

(cherry picked from commit a9182c89a6)
2020-07-30 11:57:24 +02:00
Ondřej Surý
ebb2b055cc Add isc_hash32() and rename isc_hash_function() to isc_hash64()
As the names suggest the original isc_hash64 function returns 64-bit
long hash values and the isc_hash32() returns 32-bit values.

(cherry picked from commit f59fd49fd8)
2020-07-30 11:57:24 +02:00
Ondřej Surý
1e5df7f3bf Add HalfSipHash 2-4 reference implementation
The HalfSipHash implementation has 32-bit keys and returns 32-bit
value.

(cherry picked from commit 344d66aaff)
2020-07-30 11:57:24 +02:00
Ondřej Surý
d89eb403f3 Remove OpenSSL based SipHash 2-4 implementation
Creation of EVP_MD_CTX and EVP_PKEY is quite expensive, so until we fix the code
to reuse the OpenSSL contexts and keys we'll use our own implementation of
siphash instead of trying to integrate with OpenSSL.

(cherry picked from commit 21d751dfc7)
2020-07-30 11:57:24 +02:00
Ondřej Surý
aa72c31422 Fix the rbt hashtable and grow it when setting max-cache-size
There were several problems with rbt hashtable implementation:

1. Our internal hashing function returns uint64_t value, but it was
   silently truncated to unsigned int in dns_name_hash() and
   dns_name_fullhash() functions.  As the SipHash 2-4 higher bits are
   more random, we need to use the upper half of the return value.

2. The hashtable implementation in rbt.c was using modulo to pick the
   slot number for the hash table.  This has several problems because
   modulo is: a) slow, b) oblivious to patterns in the input data.  This
   could lead to very uneven distribution of the hashed data in the
   hashtable.  Combined with the single-linked lists we use, it could
   really hog-down the lookup and removal of the nodes from the rbt
   tree[a].  The Fibonacci Hashing is much better fit for the hashtable
   function here.  For longer description, read "Fibonacci Hashing: The
   Optimization that the World Forgot"[b] or just look at the Linux
   kernel.  Also this will make Diego very happy :).

3. The hashtable would rehash every time the number of nodes in the rbt
   tree would exceed 3 * (hashtable size).  The overcommit will make the
   uneven distribution in the hashtable even worse, but the main problem
   lies in the rehashing - every time the database grows beyond the
   limit, each subsequent rehashing will be much slower.  The mitigation
   here is letting the rbt know how big the cache can grown and
   pre-allocate the hashtable to be big enough to actually never need to
   rehash.  This will consume more memory at the start, but since the
   size of the hashtable is capped to `1 << 32` (e.g. 4 mio entries), it
   will only consume maximum of 32GB of memory for hashtable in the
   worst case (and max-cache-size would need to be set to more than
   4TB).  Calling the dns_db_adjusthashsize() will also cap the maximum
   size of the hashtable to the pre-computed number of bits, so it won't
   try to consume more gigabytes of memory than available for the
   database.

   FIXME: What is the average size of the rbt node that gets hashed?  I
   chose the pagesize (4k) as initial value to precompute the size of
   the hashtable, but the value is based on feeling and not any real
   data.

For future work, there are more places where we use result of the hash
value modulo some small number and that would benefit from Fibonacci
Hashing to get better distribution.

Notes:
a. A doubly linked list should be used here to speedup the removal of
   the entries from the hashtable.
b. https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/

(cherry picked from commit e24bc324b4)
2020-07-30 11:57:24 +02:00
Michał Kępień
57b29d8967 Merge branch '2024-fix-idle-timeout-for-connected-tcp-sockets-v9_16' into 'v9_16'
[v9_16] Fix idle timeout for connected TCP sockets

See merge request isc-projects/bind9!3896
2020-07-30 09:49:50 +00:00
Michał Kępień
8b3014507a Add CHANGES for GL #2024
(cherry picked from commit 18efb2456f)
2020-07-30 11:16:18 +02:00
Michał Kępień
b6c33087b0 Fix idle timeout for connected TCP sockets
When named acting as a resolver connects to an authoritative server over
TCP, it sets the idle timeout for that connection to 20 seconds.  This
fixed timeout was picked back when the default processing timeout for
each client query was hardcoded to 30 seconds.  Commit
000a8970f8 made this processing timeout
configurable through "resolver-query-timeout" and decreased its default
value to 10 seconds, but the idle TCP timeout was not adjusted to
reflect that change.  As a result, with the current defaults in effect,
a single hung TCP connection will consistently cause the resolution
process for a given query to time out.

Set the idle timeout for connected TCP sockets to half of the client
query processing timeout configured for a resolver.  This allows named
to handle hung TCP connections more robustly and prevents the timeout
mismatch issue from resurfacing in the future if the default is ever
changed again.

(cherry picked from commit 953d704bd2)
2020-07-30 11:16:09 +02:00
Evan Hunt
65e0da3ad8 Merge branch '2050-libuv-version-v9_16' into 'v9_16'
report libuv version string in `named -V`

See merge request isc-projects/bind9!3890
2020-07-28 03:01:58 +00:00
Evan Hunt
bbc739b09b report libuv version string in named -V
(cherry picked from commit 1036338a10)
2020-07-27 19:55:22 -07:00
Evan Hunt
8247aeb8e9 Merge branch '1619-rpz-wildcard-passthru-ignored-v9_16' into 'v9_16'
Resolve "RPZ wildcard passthru ignored"

See merge request isc-projects/bind9!3889
2020-07-28 02:50:38 +00:00
Diego Fronza
31af3af57c Add CHANGES entry 2020-07-27 17:18:11 -03:00
Diego Fronza
1a101f223c Add test for RPZ wildcard passthru ignored fix 2020-07-27 17:17:02 -03:00
Diego Fronza
a8ce7b461c Fix rpz wildcard name matching
Whenever an exact match is found by dns_rbt_findnode(),
the highest level node in the chain will not be put into
chain->levels[] array, but instead the chain->end
pointer will be adjusted to point to that node.

Suppose we have the following entries in a rpz zone:
example.com     CNAME rpz-passthru.
*.example.com   CNAME rpz-passthru.

A query for www.example.com would result in the
following chain object returned by dns_rbt_findnode():

chain->level_count = 2
chain->level_matches = 2
chain->levels[0] = .
chain->levels[1] = example.com
chain->levels[2] = NULL
chain->end = www

Since exact matches only care for testing rpz set bits,
we need to test for rpz wild bits through iterating the nodechain, and
that includes testing the rpz wild bits in the highest level node found.

In the case of an exact match, chain->levels[chain->level_matches]
will be NULL, to address that we must use chain->end as the start point,
then iterate over the remaining levels in the chain.
2020-07-27 17:02:16 -03:00
Mark Andrews
6720ba8335 Merge branch '2043-dns_rdata_hip_next-fails-to-return-isc_r_nomore-at-the-right-time-v9_16' into 'v9_16'
Resolve "dns_rdata_hip_next() fails to return ISC_R_NOMORE at the right time."

See merge request isc-projects/bind9!3885
2020-07-24 05:47:50 +00:00