Author SHA1 Message Date
Michał Kępień
bdd9773c7f Test serve-stale behavior in a shared cache setup
Add a test that intends to trigger a very specific order of events that
led to a crash before commit bbd163acf6:

 1. Two views, view A and view B, are attached to a shared cache.
    Serve-stale is enabled; "stale-answer-client-timeout" is set to a
    positive integer.

 2. The following DNS response chain is cached:

        cname.selective.    5     IN    CNAME    a.selective.
        a.selective.        10    IN    A        10.53.0.2

 3. cname.selective/CNAME expires from cache. a.selective/A remains
    active.

 4. Both view A and view B are queried for cname.selective/A.

 5. The resolvers for both views start recursion due to the
    cname.selective/CNAME record being expired.

 6. The resolver for view A manages to successfully resolve the query.
    Due to packet loss, the resolver for view B fails to resolve the
    query and continues querying the authoritative servers.

 7. "stale-answer-client-timeout" fires for cname.selective/A in view B.
    Since the resolver for view A managed to resolve
    cname.selective/CNAME in the meantime and the a.selective/A record
    has not expired from cache yet, the final answer for the client
    query received by view B is readily available and is therefore sent
    back to the client.

 8. The a.selective/A record expires from cache.

 9. The resolver for view B manages to resolve cname.selective/CNAME and
    resumes resolution for its target name, i.e. a.selective/A.  Since
    the latter expired from cache (in step 8), recursive resolution is
    started for a.selective/A.

10. Due to packet loss, the a.selective/A queries that the resolver for
    view B sends to authoritative servers remain unanswered.

11. "stale-answer-client-timeout" fires for a.selective/A in view B.
    The resolver for view B finds a stale a.selective/A record and
    attempts to send it back to the client.  However, the a.selective/A
    record was already added to the response (and sent back to the
    client) in step 7.  named crashes due to an assertion failure.

With the right timing, the new test causes affected named versions to
crash with the following assertion failure:

    query.c:8250: INSIST(qctx->rdataset == ((void *)0) || qctx->qtype == ((dns_rdatatype_t)dns_rdatatype_dname))
2023-10-13 14:04:22 +02:00
6 changed files with 198 additions and 3 deletions
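
The sequence of events in the commit message depends on the two TTLs (5 seconds for cname.selective/CNAME, 10 seconds for a.selective/A) lining up with the "sleep 7" and "sleep 3" steps of the new test. The sketch below only illustrates that arithmetic. The `record_state` helper is hypothetical and not part of the commit, and it ignores the time taken by the dig invocations themselves, which is one reason the real test is prone to races.

```shell
#!/bin/sh
# Illustrative only: TTLs are taken from the commit message, the elapsed
# times from the "sleep 7" and "sleep 3" steps of the new test.
cname_ttl=5     # cname.selective/CNAME TTL, in seconds
a_ttl=10        # a.selective/A TTL, in seconds

# Hypothetical helper: print "active" or "expired" for a record with the
# given TTL ($1), checked $2 seconds after it was cached.
record_state() {
    ttl=$1
    elapsed=$2
    if [ "$elapsed" -lt "$ttl" ]; then
        echo "active"
    else
        echo "expired"
    fi
}

# After "sleep 7": the CNAME should be expired, the A record still active.
echo "t=7:  CNAME $(record_state "$cname_ttl" 7), A $(record_state "$a_ttl" 7)"
# After the additional "sleep 3": both records should be expired.
echo "t=10: CNAME $(record_state "$cname_ttl" 10), A $(record_state "$a_ttl" 10)"
```

The 2-second gap between the CNAME expiring (t=5) and the first sleep ending (t=7) is slack; the window that matters is between t=7 and t=10, when the CNAME is expired but its target is not.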


@@ -35,6 +35,10 @@ my $send_response = 1;
# delayed. Other lookups will not be delayed.
my $slow_response = 0;
# Current filtering setting for the "selective." domain. See the
# comments in reply_handler_selective() for more information.
my $selective_filtering = "block-queries-for-a";
my $localaddr = "10.53.0.2";
my $localport = int($ENV{'PORT'});
@@ -71,13 +75,76 @@ my $TARGET = "target.example 9 IN A $localaddr";
my $SHORTCNAME = "shortttl.cname.example 1 IN CNAME longttl.target.example";
my $LONGTARGET = "longttl.target.example 600 IN A $localaddr";
# This subroutine handles all requests for the "selective." domain.
sub reply_handler_selective {
    my ($qname, $qclass, $qtype, $nsid_requested) = @_;
    my ($rcode, @ans, @auth, @add);
    if ($qname =~ /\.CONTROL\.selective$/) {
        # These special QNAMEs control selective filtering behavior.
        if ($qname eq "block-queries-for-a.CONTROL.selective") {
            $selective_filtering = "block-queries-for-a";
            print " Blocking only queries for a.selective. that include an NSID request\n";
        } elsif ($qname eq "block-queries-for-cname-and-a.CONTROL.selective") {
            $selective_filtering = "block-queries-for-cname-and-a";
            print " Blocking queries for cname.selective. and a.selective. that include an NSID request\n";
        }
        $rcode = "NOERROR";
        push @ans, new Net::DNS::RR("$qname 300 IN $qtype $localaddr");
    } elsif ($qname eq "ns.selective") {
        # Handling this QNAME makes ADB happy.
        $rcode = "NOERROR";
        if ($qtype eq "A") {
            push @ans, new Net::DNS::RR("$qname 300 IN A $localaddr");
        } else {
            push @auth, new Net::DNS::RR("selective 300 IN SOA . . 0 0 0 0 300");
        }
    } elsif ($qname eq "cname.selective") {
        if ($nsid_requested && $selective_filtering eq "block-queries-for-cname-and-a") {
            # This answer may or may not be returned to the resolver of
            # the filtered view (which requests NSID), depending on the
            # current selective filtering setting.
            return;
        } else {
            # Delay the response by a little bit. This increases the
            # odds of triggering the desired order of events.
            select(undef, undef, undef, 0.1);
            $rcode = "NOERROR";
            push @ans, new Net::DNS::RR("$qname 5 IN CNAME a.selective");
        }
    } elsif ($qname eq "a.selective") {
        if ($nsid_requested) {
            # This answer is never returned to the resolver of the
            # filtered view (which requests NSID), irrespective of the
            # current selective filtering setting.
            return;
        } else {
            $rcode = "NOERROR";
            if ($qtype eq "A") {
                push @ans, new Net::DNS::RR("$qname 10 IN A $localaddr");
            } else {
                push @auth, new Net::DNS::RR("selective 300 IN SOA . . 0 0 0 0 300");
            }
        }
    } else {
        $rcode = "NXDOMAIN";
        push @auth, new Net::DNS::RR("selective 300 IN SOA . . 0 0 0 0 300");
    }
    return ($rcode, \@ans, \@auth, \@add, { aa => 1 });
}
sub reply_handler {
-my ($qname, $qclass, $qtype) = @_;
+my ($qname, $qclass, $qtype, $nsid_requested) = @_;
my ($rcode, @ans, @auth, @add);
print ("request: $qname/$qtype\n");
STDOUT->flush();
if ($qname =~ /\.selective$/) {
return (reply_handler_selective(@_));
}
# Control whether we send a response or not.
# We always respond to control commands.
if ($qname eq "enable" ) {
@@ -307,8 +374,9 @@ for (;;) {
my $qclass = $questions[0]->qclass;
my $qtype = $questions[0]->qtype;
my $id = $request->header->id;
my $nsid = $request->edns->option("NSID");
-my ($rcode, $ans, $auth, $add, $headermask) = reply_handler($qname, $qclass, $qtype);
+my ($rcode, $ans, $auth, $add, $headermask) = reply_handler($qname, $qclass, $qtype, defined($nsid));
if (!defined($rcode)) {
print " Silently ignoring query\n";
@@ -325,6 +393,14 @@ for (;;) {
$reply->push("authority", @$auth) if $auth;
$reply->push("additional", @$add) if $add;
# If NSID was requested, ensure that the response
# contains an EDNS record, otherwise named will disable
# EDNS for this server and the NSID-based answer
# filtering trick will be foiled.
if (defined($nsid)) {
$reply->edns->option("NSID" => {"OPTION-DATA" => "ans2"});
}
my $num_chars = $udpsock->send($reply->data);
print " Sent $num_chars bytes via UDP\n";
}
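
The filtering rules that reply_handler_selective() applies to the filtered view can be summarized as a small decision table. The sketch below restates only the drop-or-answer decision in shell, the language of the test script. The `selective_action` function and its "yes"/"no" NSID argument are hypothetical names introduced for illustration; the response delay, RCODEs, and record contents are not modelled.

```shell
#!/bin/sh
# Hypothetical restatement of the drop/answer decision made by
# reply_handler_selective(); not part of the commit.
#   $1: query name
#   $2: "yes" if the query carried an NSID request, anything else otherwise
#   $3: current filtering mode ("block-queries-for-a" or
#       "block-queries-for-cname-and-a")
selective_action() {
    qname=$1
    nsid=$2
    mode=$3
    case "$qname" in
        cname.selective)
            # Dropped only for NSID-requesting resolvers, and only in the
            # stricter filtering mode.
            if [ "$nsid" = "yes" ] && [ "$mode" = "block-queries-for-cname-and-a" ]; then
                echo "drop"
            else
                echo "answer"
            fi
            ;;
        a.selective)
            # Always dropped for NSID-requesting resolvers.
            if [ "$nsid" = "yes" ]; then
                echo "drop"
            else
                echo "answer"
            fi
            ;;
        *)
            # Control names, ns.selective, and unknown names always get a
            # response (possibly NXDOMAIN).
            echo "answer"
            ;;
    esac
}

# Example: the filtered view (which requests NSID) asking for a.selective.
selective_action a.selective yes block-queries-for-a
```

Because only the filtered view is configured with "request-nsid yes", keying the decision on the presence of NSID lets a single authoritative server treat the two views differently without looking at source addresses.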


@@ -16,7 +16,7 @@ rm -f ns*/named.conf
rm -f ns*/root.bk
rm -f rndc.out.test*
rm -f */named.run */named.memstats
-rm -f ns*/managed-keys.bind*
+rm -f ns*/managed-keys.bind* ns*/*.mkeys.jnl
rm -f ns*/named_dump*
rm -f ns*/named.stats*
rm -f ns*/named.run.prev


@@ -16,3 +16,5 @@ example. 300 NS ns.example.
ns.example. 300 A 10.53.0.2
slow. 300 NS ns.slow.
ns.slow. 300 A 10.53.0.2
selective. 300 NS ns.selective.
ns.selective. 300 A 10.53.0.2


@@ -0,0 +1,53 @@
/*
* Copyright (C) Internet Systems Consortium, Inc. ("ISC")
*
* SPDX-License-Identifier: MPL-2.0
*
* This Source Code Form is subject to the terms of the Mozilla Public
* License, v. 2.0. If a copy of the MPL was not distributed with this
* file, you can obtain one at https://mozilla.org/MPL/2.0/.
*
* See the COPYRIGHT file distributed with this work for additional
* information regarding copyright ownership.
*/
options {
    query-source address 10.53.0.6;
    notify-source 10.53.0.6;
    transfer-source 10.53.0.6;
    port @PORT@;
    pid-file "named.pid";
    listen-on { 10.53.0.6; };
    listen-on-v6 { none; };
    prefetch 0;
    recursion yes;
    dnssec-validation no;
    stale-cache-enable yes;
};
view "filtered" {
    attach-cache "global";
    match-clients { 10.53.0.10; };
    zone "." {
        type hint;
        file "../../_common/root.hint";
    };
    server 10.53.0.2 {
        request-nsid yes;
    };
    stale-answer-enable yes;
    stale-answer-client-timeout 1000;
};
view "unfiltered" {
    attach-cache "global";
    match-clients { 10.53.0.11; };
    zone "." {
        type hint;
        file "../../_common/root.hint";
    };
};


@@ -19,3 +19,4 @@ copy_setports ns1/named1.conf.in ns1/named.conf
copy_setports ns3/named1.conf.in ns3/named.conf
copy_setports ns4/named.conf.in ns4/named.conf
copy_setports ns5/named.conf.in ns5/named.conf
copy_setports ns6/named.conf.in ns6/named.conf


@@ -2629,6 +2629,69 @@ grep "2001:aaaa" dig.out.2.test$n > /dev/null || ret=1
if [ $ret != 0 ]; then echo_i "failed"; fi
status=$((status+ret))
########################################################################
# The following test attempts to trigger a very specific sequence of
# events that may lead to a crash and is described in GL #4287. Due to
# its reliance on precise timing, the test is prone to races and
# therefore should not be expected to be 100% reliable.
########################################################################
n=$((n+1))
echo_i "check serve-stale behavior in a shared cache setup ($n)"
ret=0
# ns6 has two views configured; clients are matched to them by their IP
# address. Both views are attached to a shared cache. ans2, to which
# the "selective." domain is delegated, answers all queries from the
# unfiltered view, but only some queries from the filtered view. See
# ns6/named.conf.in and ans2/ans.pl for how this is achieved.
filtered_view="-b 10.53.0.10"
unfiltered_view="-b 10.53.0.11"
# Prime the cache using the unfiltered view.
$DIG -p ${PORT} ${unfiltered_view} @10.53.0.6 cname.selective. A > dig.out.test$n.1 || ret=1
# Sanity check: ensure the filtered view can access data from the shared
# cache (it would not be able to resolve the following query itself).
$DIG -p ${PORT} ${filtered_view} @10.53.0.6 cname.selective. A > dig.out.test$n.2 || ret=1
grep -q "^a\.selective.*10\.53\.0\.2$" dig.out.test$n.2 || ret=1
# Ensure the filtered view will not be able to get any answers for a
# while.
$DIG -p ${PORT} @10.53.0.2 block-queries-for-cname-and-a.CONTROL.selective. A +time=5 +tries=1 > dig.out.test$n.3 || ret=1
# Wait until cname.selective/CNAME expires from the shared cache.
sleep 7
# Issue two simultaneous queries, one per view. The desired
# outcome here is that the unfiltered view will resolve the query in
# less than "stale-answer-client-timeout" configured for the filtered
# view, but not before the filtered view sends its own recursive query.
# To increase the odds of that happening, the filtered view is queried
# first and the responses for cname.selective/A queries are sent with a
# delay of 100 ms by ans2, but remember that THIS STEP IS PRONE TO RACES
# and yet it is critical for triggering the desired sequence of events!
# (The filtered view is expected to use the still-active a.selective/A
# record from the cache for answering this query.)
nextpart ns6/named.run > /dev/null
$DIG -p ${PORT} ${filtered_view} @10.53.0.6 cname.selective. A > dig.out.test$n.4 &
DIG_PID1=$!
$DIG -p ${PORT} ${unfiltered_view} @10.53.0.6 cname.selective. A > dig.out.test$n.5 &
DIG_PID2=$!
# Wait at least three more seconds, so that a.selective/A expires from
# the shared cache.
sleep 3
# Ensure the filtered view will be able to resolve cname.selective/A
# again, but not a.selective/A.
$DIG -p ${PORT} @10.53.0.2 block-queries-for-a.CONTROL.selective. A +time=5 +tries=1 > dig.out.test$n.6 || ret=1
# Wait until the filtered view resolves cname.selective/A and
# "stale-answer-client-timeout" for a.selective/A fires. Watch for a
# line logged in the non-crashing case to minimize test delay; whether
# ns6 crashed or not will be checked directly in the next step.
wait_for_log 7 "(cname.selective): view filtered: request failed: operation canceled" ns6/named.run || ret=1
# The queries above attempt to trigger a sequence of events that will
# crash the server. Query the unfiltered view to ensure the server is
# still alive.
$DIG -p ${PORT} ${unfiltered_view} @10.53.0.6 cname.selective. A +time=5 +tries=1 > dig.out.test$n.7 || ret=1
grep -q "^a\.selective.*10\.53\.0\.2$" dig.out.test$n.7 || ret=1
# Clean up background processes.
wait ${DIG_PID1} ${DIG_PID2}
if [ $ret != 0 ]; then echo_i "failed"; fi
status=$((status+ret))
###########################################################
# Test serve-stale's interaction with prefetch processing #
###########################################################
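
The shared-cache test above relies on wait_for_log to block until a specific line shows up in ns6/named.run, with a 7-second limit. As a rough illustration of that polling pattern, the sketch below defines a hypothetical `poll_for_line` function. It is not the test framework's wait_for_log, and it searches the whole file rather than only the part written since the last nextpart call.

```shell
#!/bin/sh
# Hypothetical stand-in for a log-polling helper; NOT the actual
# wait_for_log used by the system tests.
#   $1: timeout in seconds
#   $2: fixed string to look for
#   $3: log file to search
poll_for_line() {
    timeout=$1
    pattern=$2
    file=$3
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # -F treats the pattern as a fixed string, so characters such as
        # "(" and "." in the log message are matched literally.
        if grep -qF -- "$pattern" "$file" 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # One final check at the deadline; its exit status is the result.
    grep -qF -- "$pattern" "$file" 2>/dev/null
}
```

A call such as `poll_for_line 7 "request failed: operation canceled" ns6/named.run || ret=1` would then return as soon as the line is logged, which keeps the delay short in the non-crashing case while still bounding the wait when the server has crashed.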