Commit Graph

12470 Commits

Author SHA1 Message Date
Matthijs Mekking
facd99fd9c Group the keyid with the counters
Rather than group key ids together, group key id with its
corresponding counters. This should make growing / shrinking easier
than having keyids then counters.

(cherry picked from commit eb6a8b47d7)
2020-04-03 10:03:49 +02:00
Matthijs Mekking
f59f446122 Redesign dnssec sign statistics
The first attempt to add DNSSEC sign statistics was naive: for each
zone we allocated 64K counters, twice.  In reality each zone has at
most four keys, so the new approach only has room for four keys per
zone. If after a rollover more keys have signed the zone, existing
keys are rotated out.

The DNSSEC sign statistics has three counters per key, so twelve
counters per zone. First counter is actually a key id, so it is
clear what key contributed to the metrics.  The second counter
tracks the number of generated signatures, and the third tracks
how many of those are refreshes.

This means that in the zone structure we no longer need two separate
references to DNSSEC sign metrics: both the resign and refresh stats
are kept in a single dns_stats structure.

Incrementing dnssecsignstats:

Whenever a dnssecsignstat is incremented, we look up the key id
to see if we already are counting metrics for this key.  If so,
we update the corresponding operation counter (resign or
refresh).

If the key is new, store the value in a new counter and increment
corresponding counter.

If all slots are full, we rotate the keys and overwrite the last
slot with the new key.

Dumping dnssecsignstats:

Dumping dnssecsignstats is no longer a simple wrapper around
isc_stats_dump, but uses the same principle.  The difference is that
rather than dumping the index (key tag) and counter, we have to look
up the corresponding counter.

(cherry picked from commit 705810d577)
2020-04-03 10:03:30 +02:00
Ondřej Surý
aec1578620 Reduce rwlock contention in isc_log_wouldlog()
The rwlock introduced to protect the .logconfig member of isc_log_t
structure caused a significant performance drop because of the rwlock
contention.  It was also found, that the debug_level member of said
structure was not protected from concurrent read/writes.

The .dynamic and .highest_level members of isc_logconfig_t structure
were actually just cached values pulled from the assigned channels.

We introduced an even higher cache level for .dynamic and .highest_level
members directly into the isc_log_t structure, so we don't have to
access the .logconfig member in the isc_log_wouldlog() function.

(cherry picked from commit 3a24eacbb6)
2020-04-03 07:59:34 +00:00
Matthijs Mekking
9387729711 Only initialize goal on active keys
If we initialize goals on all keys, superfluous keys that match
the policy all desire to be active.  For example, there are six
keys available for a policy that needs just two, we only want to
set the goal state to OMNIPRESENT on two keys, not six.

(cherry picked from commit 2389fcb4dc)
2020-04-03 09:16:51 +02:00
Matthijs Mekking
4741f2d07e Test migration to dnssec-policy, retire old keys
Migrating from 'auto-dnssec maintain;' to dnssec-policy did not
work properly, mainly because the legacy keys were initialized
badly.  Earlier commit deals with migration where existing keys
match the policy.  This commit deals with migration where existing
keys do not match the policy.  In that case, named must not
immediately delete the existing keys, but gracefully roll to the
dnssec-policy.

However, named did remove the existing keys immediately.  This is
because the legacy key states were initialized badly.  Because
those keys had their states initialized to HIDDEN or RUMOURED, the
keymgr decides that they can be removed (because only when the key
has its states in OMNIPRESENT it can be used safely).

The original thought to initialize key states to HIDDEN (and
RUMOURED to deal with existing keys) was to ensure that those keys
will go through the required propagation time before the keymgr
decides they can be used safely.  However, those keys are already
in the zone for a long time and making the key states represent
otherwise is dangerous: keys may be pulled out of the zone while
in fact they are required to establish the chain of trust.

Fix initializing key states for existing keys by looking more closely
at the time metadata.  Add TTL and propagation delays to the time
metadata and see if the DNSSEC records have been propagated.
Initialize the state to OMNIPRESENT if so, otherwise initialize to
RUMOURED.  If the time metadata is in the future, or does not exist,
keep initializing the state to HIDDEN.

The added test makes sure that new keys matching the policy are
introduced, but existing keys are kept in the zone until the new
keys have been propagated.

(cherry picked from commit 7f43520893)
2020-04-03 09:16:11 +02:00
Matthijs Mekking
7aa5a11bdd Fix and test migration to dnssec-policy
Migrating from 'auto-dnssec maintain;' to dnssec-policy did not
work properly, mainly because the legacy keys were initialized
badly. Several adjustments in the keymgr are required to get it right:

- Set published time on keys when we calculate prepublication time.
  This is not strictly necessary, but it is weird to have an active
  key without the published time set.

- Initalize key states also before matching keys. Determine the
  target state by looking at existing time metadata: If the time
  data is set and is in the past, it is a hint that the key and
  its corresponding records have been published in the zone already,
  and the state is initialized to RUMOURED. Otherwise, initialize it
  as HIDDEN. This fixes migration to dnssec-policy from existing
  keys.

- Initialize key goal on keys that match key policy to OMNIPRESENT.
  These may be existing legacy keys that are being migrated.

- A key that has its goal to OMNIPRESENT *or* an active key can
  match a kasp key.  The code was changed with CHANGE 5354 that
  was a bugfix to prevent creating new KSK keys for zones in the
  initial stage of signing.  However, this caused problems for
  restarts when rollovers are in progress, because an outroducing
  key can still be an active key.

The test for this introduces a new KEY property 'legacy'.  This is
used to skip tests related to .state files.

(cherry picked from commit 6801899134)
2020-04-03 09:15:39 +02:00
Evan Hunt
a288dee81e incrementally clean up old RPZ records during updates
After an RPZ zone is updated via zone transfer, the RPZ summary
database is updated, inserting the newly added names in the policy
zone and deleting the newly removed ones. The first part of this
was quantized so it would not run too long and starve other tasks
during large updates, but the second part was not quantized, so
that an update in which a large number of records were deleted
could cause named to become briefly unresponsive.

(cherry picked from commit 32da119ed8)
2020-04-01 01:32:55 -07:00
Witold Kręcicki
3274650123 Deactivate the handle before sending the async close callback.
We could have a race between handle closing and processing async
callback. Deactivate the handle before issuing the callback - we
have the socket referenced anyway so it's not a problem.
2020-03-30 10:54:12 +00:00
Witold Kręcicki
7ab77d009d Add a quota attach function with a callback, some code cleanups.
We introduce a isc_quota_attach_cb function - if ISC_R_QUOTA is returned
at the time the function is called, then a callback will be called when
there's quota available (with quota already attached). The callbacks are
organized as a LIFO queue in the quota structure.
It's needed for TCP client quota -  with old networking code we had one
single place where tcp clients quota was processed so we could resume
accepting when the we had spare slots, but it's gone with netmgr - now
we need to notify the listener/accepter that there's quota available so
that it can resume accepting.

Remove unused isc_quota_force() function.

The isc_quote_reserve and isc_quota_release were used only internally
from the quota.c and the tests.  We should not expose API we are not
using.

(cherry picked from commit d151a10f30)
2020-03-30 10:29:33 +02:00
Ondřej Surý
e017574b74 Correct the typecast of .tv_sec in isc_stdtime_get() 2020-03-25 22:10:10 +01:00
Ondřej Surý
2bb2a10ba4 Fix the tv_nsec check in isc_stdtime_get()
(cherry picked from commit 0d06a62dd1)
2020-03-25 21:19:55 +01:00
Ondřej Surý
230d250b3d Fix 'Dead nested assignment's from scan-build-10
The 3 warnings reported are:

os.c:872:7: warning: Although the value stored to 'ptr' is used in the enclosing expression, the value is never actually read from 'ptr'
        if ((ptr = strtok_r(command, " \t", &last)) == NULL) {
             ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.

--

rpz.c:1117:10: warning: Although the value stored to 'zbits' is used in the enclosing expression, the value is never actually read from 'zbits'
        return (zbits &= x);
                ^        ~
1 warning generated.

--

openssleddsa_link.c:532:10: warning: Although the value stored to 'err' is used in the enclosing expression, the value is never actually read from 'err'
        while ((err = ERR_get_error()) != 0) {
                ^     ~~~~~~~~~~~~~~~
1 warning generated.

(cherry picked from commit 262f087bcf)
2020-03-25 18:06:29 +01:00
Mark Andrews
0b13677f7f Used to the correct unlock type (read)
(cherry picked from commit b7dbfd14d8)
2020-03-24 15:44:06 +11:00
Tinderbox User
aed7d77c97 prep 9.16.1
Updated version and CHANGES files with new release number.

Check the API files:
- lib/bind9/api:
  Source code changes, but no interface changes: increment
  LIBREVISION.
- lib/dns/api:
  Function dns_acl_match changed, struct dns_badcache changed,
  function dns_badcache_add changed, function dns_clent_startupdate
  changed, struct dns_compress changed, struct dns_resolver changed,
  rwlock size changed. This means a LIBINTERFACE increment.
- lib/irs/api:
  Source code changes, but no interface changes: increment
  LIBREVISION.
- lib/isc/api:
  The structs isc__networker and isc_nmsocket changed. This means
  increment LIBINTERFACE.  The functions isc_uv_export and
  isc_uv_import are removed, so LIBAGE must beq zero.
- lib/isccc/api:
  Source code changes, but no interface changes: increment
  LIBREVISION.
- lib/isccfg/api:
  Source code changes, but no interface changes: increment
  LIBREVISION.
- lib/ns/api:
  Function ns_clientmgr_create, ns_interfacemgr_create, and
  structs ns_clientmgr, ns_interface, ns_interfacemgr changed:
  increment LIBINTERFACE.

No need to update README or release notes.

Updated CHANGES: Add GitLab MR reference to entry 5357. Remove
merge conflict gone wrong ("max-ixfr-ratio" is not in 9.16).

Add /util/check-make-install.in to .gitattributes.
2020-03-20 11:47:01 +01:00
Ondřej Surý
0345dac44c Use clock_gettime() instead of gettimeofday() for isc_stdtime function
This also removes Solaris 2.8 broken gettimeofday() workaround

(cherry picked from commit e691b89a9a)
2020-03-19 10:17:26 +01:00
Ondřej Surý
11a6ac594a Use isc_rwlock to lock .logconfig member of isc_log_t
In isc_log_woudlog() the .logconfig member of isc_log_t structure was
accessed unlocked on the merit that there could be just a race when
.logconfig would be NULL, so the message would not be logged.  This
turned not to be true, as there's also data race deeper.  The accessed
isc_logconfig_t object could be in the middle of destruction, so the
pointer would be still non-NULL, but the structure members could point
to a chunk of memory no longer belonging to the object.  Since we are
only accessing integer types (the log level), this would never lead to
a crash, it leads to memory access to memory area no longer belonging to
the object and this a) wrong, b) raises a red flag in thread-safety tools.

(cherry picked from commit 4d58856ff7)
2020-03-18 13:25:28 +01:00
Mark Andrews
af14091f65 Refactor the isc_log API so it cannot fail on memory failures
The isc_mem API now crashes on memory allocation failure, and this is
the next commit in series to cleanup the code that could fail before,
but cannot fail now, e.g. isc_result_t return type has been changed to
void for the isc_log API functions that could only return ISC_R_SUCCESS.

(cherry picked from commit 0b793166d0)
2020-03-18 11:44:18 +01:00
Ondřej Surý
bfe832aea7 Add C11 localtime_r and gmtime_r shims for Windows
On Windows, C11 localtime_r() and gmtime_r() functions are not
available.  While localtime() and gmtime() functions are already thread
safe because they use Thread Local Storage, it's quite ugly to #ifdef
around every localtime_r() and gmtime_r() usage to make the usage also
thread-safe on POSIX platforms.

The commit adds wrappers around Windows localtime_s() and gmtime_s()
functions.

NOTE: The implementation of localtime_s and gmtime_s in Microsoft CRT
are incompatible with the C standard since it has reversed parameter
order and errno_t return type.

(cherry picked from commit 08f4c7d6c0)
2020-03-17 15:33:24 -07:00
Evan Hunt
82edb5a54a silence a warning about unsafe snprintf() call
(cherry picked from commit ec95b84e8d)
2020-03-17 15:33:24 -07:00
Evan Hunt
1ac200626b clean up dead code
removed an if statement that always evaluated to false

(cherry picked from commit fc5ae3192b)
2020-03-17 15:33:24 -07:00
Evan Hunt
3d9a46bcb8 replace unsafe ctime() and gmtime() function calls
This silences LGTM warnings that these functions are not thread-safe.

(cherry picked from commit 5703f70427)
2020-03-17 15:33:24 -07:00
Evan Hunt
a8184b35cd remove or comment empty conditional branches
some empty conditional branches which contained a semicolon were
"fixed" by clang-format to contain nothing. add comments to prevent this.

(cherry picked from commit 735be3b816)
2020-03-17 15:33:23 -07:00
Evan Hunt
64ce02b5f8 fix a pointer-to-int cast error
(cherry picked from commit 6b76646037)
2020-03-17 13:10:42 -07:00
Mark Andrews
86a30a691b Add MAXMINDDB_CFLAGS to CINCLUDES
(cherry picked from commit 81a80274bd)
2020-03-16 18:51:52 +11:00
Mark Andrews
cf6b2a6c18 Silence missing unlock from Coverity.
Save 'i' to 'locknum' and use that rather than using
'header->node->locknum' when performing the deferred
unlock as 'header->node->locknum' can theoretically be
different to 'i'.

(cherry picked from commit 8dd8d48c9f)
2020-03-13 13:17:46 +11:00
Evan Hunt
c5405c2700 improve calculation of database size
"max-journal-size" is set by default to twice the size of the zone
database. however, the calculation of zone database size was flawed.

- change the size calculations in dns_db_getsize() to more accurately
  represent the space needed for a journal file or *XFR message to
  contain the data in the database. previously we returned the sizes
  of all rdataslabs, including header overhead and offset tables,
  which resulted in the database size being reported as much larger
  than the equivalent journal transactions would have been.
- map files caused a particular problem here: the full name can't be
  determined from the node while a file is being deserialized, because
  the uppernode pointers aren't set yet. so we store "full name length"
  in the dns_rbtnode structure while serializing, and clear it after
  deserialization is complete.
2020-03-12 00:38:37 -07:00
Ondřej Surý
67464af0bb Fixup the headers formatting 2020-03-11 10:23:35 +01:00
Ondřej Surý
60c6ff4ece Fix the deeper symlinks to .clang-format.headers 2020-03-11 10:21:54 +01:00
Ondřej Surý
f3c2274479 Use the new sorting rules to regroup #include headers 2020-03-11 08:55:12 +00:00
Evan Hunt
2db2a22f28 remove redundant ZONEDB_UNLOCK
(cherry picked from commit b54454b7c6)
2020-03-09 16:47:44 -07:00
Matthijs Mekking
29cde9e990 Fix race condition dnssec-policy with views
When configuring the same dnssec-policy for two zones with the same
name but in different views, there is a race condition for who will
run the keymgr first. If running sequential only one set of keys will
be created, if running parallel two set of keys will be created.

Lock the kasp when running looking for keys and running the key
manager. This way, for the same zone in different views only one
keyset will be created.

The dnssec-policy does not implement sharing keys between different
zones.

(cherry picked from commit e0bdff7ecd)
2020-03-09 16:25:35 +01:00
Matthijs Mekking
01098fb81e Make clang-format happy
(cherry picked from commit 53bd81ad19)
2020-03-09 14:42:53 +01:00
Matthijs Mekking
c20ac664dd [#1624] dnssec-policy change retire unwanted keys
When changing a dnssec-policy, existing keys with properties that no
longer match were not being retired.

(cherry picked from commit 3905a03205)
2020-03-09 14:42:53 +01:00
Matthijs Mekking
4bbefa8514 [#1625] Algorithm rollover waited too long
Algorithm rollover waited too long before introducing zone
signatures.  It waited to make sure all signatures were resigned,
but when introducing a new algorithm, all signatures are resigned
immediately.  Only add the sign delay if there is a predecessor key.

(cherry picked from commit 28506159f0)
2020-03-09 14:42:53 +01:00
Matthijs Mekking
150464e719 [#1626] Fix stuck algorithm rollover
Algorithm rollover was stuck on submitting DS because keymgr thought
it would move to an invalid state.  It did not match the current
key because it checked it against the current key in the next state.
Fixed by when checking the current key, check it against the desired
state, not the existing state.

(cherry picked from commit a8542b8cab)
2020-03-09 14:42:53 +01:00
Matthijs Mekking
f8b555a3a2 Add algorithm rollover test case
Add a test case for algorithm rollover.  This is triggered by
changing the dnssec-policy.  A new nameserver ns6 is introduced
for tests related to dnssec-policy changes.

This requires a slight change in check_next_key_event to only
check the last occurrence.  Also, change the debug log message in
lib/dns/zone.c to deal with checks when no next scheduled key event
exists (and default to loadkeys interval 3600).

(cherry picked from commit 88ebe9581b)
2020-03-09 14:42:53 +01:00
Diego Fronza
277581c5a1 Fixed disposing of resolver->references in destroy() function 2020-03-06 13:37:07 -03:00
Diego Fronza
341b69aa7e Fixed potential-lock-inversion
This commit simplifies a bit the lock management within dns_resolver_prime()
and prime_done() functions by means of turning resolver's attribute
"priming" into an atomic_bool and by creating only one dependent object on the
lock "primelock", namely the "primefetch" attribute.

By having the attribute "priming" as an atomic type, it save us from having to
use a lock just to test if priming is on or off for the given resolver context
object, within "dns_resolver_prime" function.

The "primelock" lock is still necessary, since dns_resolver_prime() function
internally calls dns_resolver_createfetch(), and whenever this function
succeeds it registers an event in the task manager which could be called by
another thread, namely the "prime_done" function, and this function is
responsible for disposing the "primefetch" attribute in the resolver object,
also for resetting "priming" attribute to false.

It is important that the invariant "priming == false AND primefetch == NULL"
remains constant, so that any thread calling "dns_resolver_prime" knows for sure
that if the "priming" attribute is false, "primefetch" attribute should also be
NULL, so a new fetch context could be created to fulfill this purpose, and
assigned to "primefetch" attribute under the lock protection.

To honor the explanation above, dns_resolver_prime is implemented as follow:
	1. Atomically checks the attribute "priming" for the given resolver context.
	2. If "priming" is false, assumes that "primefetch" is NULL (this is
           ensured by the "prime_done" implementation), acquire "primelock"
	   lock and create a new fetch context, update "primefetch" pointer to
	   point to the newly allocated fetch context.
	3. If "priming" is true, assumes that the job is already in progress,
	   no locks are acquired, nothing else to do.

To keep the previous invariant consistent, "prime_done" is implemented as follow:
	1. Acquire "primefetch" lock.
	2. Keep a reference to the current "primefetch" object;
	3. Reset "primefetch" attribute to NULL.
	4. Release "primefetch" lock.
	5. Atomically update "priming" attribute to false.
	6. Destroy the "primefetch" object by using the temporary reference.

This ensures that if "priming" is false, "primefetch" was already reset to NULL.

It doesn't make any difference in having the "priming" attribute not protected
by a lock, since the visible state of this variable would depend on the calling
order of the functions "dns_resolver_prime" and "prime_done".

As an example, suppose that instead of using an atomic for the "priming" attribute
we employed a lock to protect it.
Now suppose that "prime_done" function is called by Thread A, it is then preempted
before acquiring the lock, thus not reseting "priming" to false.
In parallel to that suppose that a Thread B is scheduled and that it calls
"dns_resolver_prime()", it then acquires the lock and check that "priming" is true,
thus it will consider that this resolver object is already priming and it won't do
any more job.
Conversely if the lock order was acquired in the other direction, Thread B would check
that "priming" is false (since prime_done acquired the lock first and set "priming" to false)
and it would initiate a priming fetch for this resolver.

An atomic variable wouldn't change this behavior, since it would behave exactly the
same, depending on the function call order, with the exception that it would avoid
having to use a lock.

There should be no side effects resulting from this change, since the previous
implementation employed use of the more general resolver's "lock" mutex, which
is used in far more contexts, but in the specifics of the "dns_resolver_prime"
and "prime_done" it was only used to protect "primefetch" and "priming" attributes,
which are not used in any of the other critical sections protected by the same lock,
thus having zero dependency on those variables.
2020-03-06 13:37:07 -03:00
Diego Fronza
84d6896661 Added atomic_compare_exchange_strong_acq_rel macro
It is much better to read than:
atomic_compare_exchange_strong_explicit() with 5 arguments.
2020-03-06 13:37:07 -03:00
Mark Andrews
5d64049301 Fix lists of installed header files 2020-03-06 13:00:04 +11:00
Witold Kręcicki
aec3dd28d6 Destroy query in killoldestquery under a lock.
Fixes a race between ns_client_killoldestquery and ns_client_endrequest -
killoldestquery takes a client from `recursing` list while endrequest
destroys client object, then killoldestquery works on a destroyed client
object. Prevent it by holding reclist lock while cancelling query.

(cherry picked from commit df3dbdff81)
2020-03-05 23:55:42 +00:00
Witold Kręcicki
5b22e3689d Only use tcpdns timer if it's initialized.
(cherry picked from commit 4b9962d4a3)
2020-03-05 23:27:56 +00:00
Witold Kręcicki
b32b01d403 Fix TCPDNS socket closing issues
(cherry picked from commit ae1499ca19)
2020-03-05 23:27:56 +00:00
Witold Kręcicki
11b80da9ff Limit TCP connection quota logging to 1/s
(cherry picked from commit fc9792eae8)
2020-03-05 23:27:56 +00:00
Witold Kręcicki
b85de76816 Proper accounting of active TCP connections
(cherry picked from commit fc9e2276ca)
2020-03-05 23:27:56 +00:00
Tony Finch
48860528d0 Fix dns_client_addtrustedkey(dns_rdatatype_dnskey)
Use a buffer that is big enough for DNSKEY records as well as DS
records.

(cherry picked from commit 689ef89b67)
2020-03-04 15:42:12 -08:00
Witold Kręcicki
3212ab0052 Workaround for clang static analyzer bug.
(cherry picked from commit 6c8f309745)
2020-03-04 10:48:59 +01:00
Witold Kręcicki
1296e48e11 Badcache with multiple locks.
Previously badcache used one single mutex for everything, which
was causing performance issues. Use one global rwlock for the whole
hashtable and per-bucket mutexes.

(cherry picked from commit 47e5f5564c)
2020-03-04 10:48:59 +01:00
Mark Andrews
3e08796e85 Restart zone maintenance if it had been stalled.
(cherry picked from commit f171347b5f)
2020-03-04 09:11:46 +11:00
Witold Kręcicki
e4d39f57ff Fix a race in isc_socket destruction.
There was a very slim chance of a race between isc_socket_detach and
process_fd: isc_socket_detach decrements references to 0, and before it
calls destroy gets preempted. Second thread calls process_fd, increments
socket references temporarily to 1, and then gets preempted, first thread
then hits assertion in destroy() as the reference counter is now 1 and
not 0.

(cherry picked from commit 81ba0fe0e6)
2020-03-03 09:26:54 +01:00