update
@@ -1,16 +1,35 @@
Databases

BIND 9 DNS database allows named rdatasets to be stored and retrieved.
DNS databases are used to store two different categories of data:
authoritative zone data and non-authoritative cache data.Unlike
authoritative zone data and non-authoritative cache data. Unlike
previous versions of BIND which used a monolithic database, BIND 9 has
one database per zone or cache. Certain database operations, for
example updates, have differing requirements and actions depending
upon whether the database contains zone data or cache data.


Database Semantics

A database instance either has zone semantics or cache semantics. The
semantics are chosen when the database is created and cannot be
changed. The differences between zone databases and cache databases
will be discussed further below.


Reference Safety

It is a general principle of the BIND 9 project, and of the database
API, that all references returned to the caller remain valid until the
caller discards the reference.

The database interface also mandates that the rdata in a retrieved
rdataset shall remain unaltered while any reference to the rdataset is
held. Some other properties of the rdataset, e.g. its DNSSEC
validation status, may change.
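
The reference-safety rule can be illustrated with a minimal sketch.
It is not BIND 9 code: the Rdataset and Node classes, their fields,
and the "validation" values are hypothetical names chosen for this
example. The rdata is held in an immutable tuple, and the node
installs a new object on every change instead of editing the old one,
so a reference taken earlier keeps seeing the rdata it was given.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Rdataset:
    """Hypothetical stand-in for a retrieved rdataset (not the BIND 9 type)."""
    rdtype: str
    rdata: Tuple[str, ...]       # a tuple, so the rdata cannot be altered in place
    validation: str = "pending"  # a mutable property, e.g. DNSSEC status


class Node:
    """Hypothetical database node that replaces rdatasets, never edits them."""

    def __init__(self) -> None:
        self._current: Dict[str, Rdataset] = {}

    def store(self, rdtype: str, records: Iterable[str]) -> None:
        # Install a new object; holders of the old one are unaffected.
        self._current[rdtype] = Rdataset(rdtype, tuple(records))

    def find(self, rdtype: str) -> Rdataset:
        return self._current[rdtype]


node = Node()
node.store("A", ["192.0.2.1"])
held = node.find("A")            # the caller takes a reference
node.store("A", ["192.0.2.2"])   # the database content changes afterwards
held.validation = "secure"       # allowed: this property may change
```

In this sketch the held reference stays usable for as long as the
caller keeps it, which mirrors the rule that references remain valid
until the caller discards them.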


Database Updates

A master zone is updated by a Dynamic Update message. A slave zone is
@@ -21,22 +40,27 @@ same basic database requirements. They are differential update
protocols, e.g. "add this record to the records at name 'foo'". The
updates are also atomic, i.e. they must either succeed or fail.
Changes must not become visible to clients until the update has
committed. In short, zone updates are transactional.
committed. In short, zone updates are transactional. This
transaction occurs at a database level; the entire database goes from
one version to another.

Cache updates are done by the server in the ordinary course of
handling client requests. Unlike zone updates, cache updates do not
refer to the current contents of the cache, so concurrent writing to
the cache is possible. The main requirement is that concurrent update
attempts to the same node and rdataset type must appear to have been
executed in some order. In order to make DB versioning simpler, the DB
interface actually imposes a more restrictive set of requirements, namely
that access to a node is serialized and that database changes will become
visible in version order (more on this below).
handling client requests. Unlike zone databases, there's no need (and
indeed, no ability) to ensure that data in the cache is consistent.
For example, the cache may hold rdatasets from different versions of a
given zone. A typical cache update involves looking at the existing
cache contents for the given name and type (if any), deciding if the
proposed replacement is better, and if so, doing the replacement.
Concurrent update attempts to the same node and rdataset type must
appear to have been executed in some order; there must be no merging
of data from multiple updates. Caches are not globally versioned like
zones are. There is no need to group changes to multiple rdatasets
into a cache transaction.
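
The cache update rule described above (look at the existing entry,
decide whether the proposal is better, replace as a whole, never
merge) might be sketched as follows. This is an illustration under
stated assumptions, not the BIND 9 implementation: the Cache and
CacheEntry classes are hypothetical, "better" is assumed to mean an
equal or higher numeric trust value, and a single lock is used for
brevity where a real implementation would likely lock per node.

```python
import threading
from typing import Dict, Iterable, Optional, Tuple

Key = Tuple[str, str]  # (owner name, rdataset type)


class CacheEntry:
    def __init__(self, rdata: Tuple[str, ...], trust: int) -> None:
        self.rdata = rdata
        self.trust = trust  # assumption: a larger value means "better" data


class Cache:
    """Hypothetical cache: whole-rdataset replacement, no transactions."""

    def __init__(self) -> None:
        self._entries: Dict[Key, CacheEntry] = {}
        self._lock = threading.Lock()  # one lock for brevity, not per node

    def update(self, name: str, rdtype: str,
               rdata: Iterable[str], trust: int) -> bool:
        """Replace the rdataset if the proposal is better; never merge."""
        proposed = CacheEntry(tuple(rdata), trust)
        with self._lock:
            existing = self._entries.get((name, rdtype))
            if existing is not None and existing.trust > proposed.trust:
                return False  # keep the existing, better data
            self._entries[(name, rdtype)] = proposed
            return True

    def lookup(self, name: str, rdtype: str) -> Optional[CacheEntry]:
        with self._lock:
            return self._entries.get((name, rdtype))


cache = Cache()
first = cache.update("foo.example.", "A", ["192.0.2.1"], trust=2)
worse = cache.update("foo.example.", "A", ["192.0.2.7"], trust=1)
better = cache.update("foo.example.", "A", ["192.0.2.2", "192.0.2.3"], trust=3)
```

Because the comparison and the replacement happen under the same
lock, two concurrent updates to the same name and type behave as if
they ran one after the other, and the stored rdataset always comes
from exactly one of them.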


Database Concurrency and Locking

A principle goal of the BIND 9 project is multiprocessor scalabilty.
A principal goal of the BIND 9 project is multiprocessor scalability.
The amount of concurrency in database accesses is an important factor
in achieving scalability. Consider a heavily used database, e.g. the
cache database serving some mail hubs, or ".com". If access to these
@@ -47,13 +71,13 @@ database lookup.
Support for multiple concurrent readers certainly helps both cache
databases and zone databases. Zones are typically read much more than
they are written, though less so than in prior years because dynamic
DNS support is now widely available. Caches are frequently written as
well as read; a non-scientific survey of caching statistics on a few
busy caching nameservers showed the ratio of cache hits to misses was
about 2 to 1.
DNS support is now widely available. Caches are frequently read and
frequently written; a non-scientific survey of caching statistics on a
few busy caching nameservers showed the ratio of cache hits to misses
was about 2 to 1.

As mentioned above, zone updates must be serialized, but cache updates
often provide good opportunities for concurrency.
can often go in parallel.

A simple approach to these concurrency goals would be to have a single
read-write lock on the database. This would allow for multiple
@@ -61,23 +85,47 @@ concurrent readers, and would provide the serialization of updates
that zone updates require. This approach also has significant
limitations. Readers cannot run while an update is running. For a
short-lived transaction like a Dynamic Update, this may be acceptable,
but an IXFR can take a very long time (even hours) to complete.
Preventing read access for such a long time is unacceptable. Another
problem is that it forces updates to be serialized, even for cache
databases. There are problems on the reader side of the lock too. If
the entire database is protected by one lock, then any data retrieved
from the database must either be used while the lock is held, or it
must be copied, because the data in the database can change when the
lock isn't held. Copying is expensive, and the server would like to
be able to hold a reference to database data for a long time. The
most significant long-running reader problem is outbound AXFR, which
could potentially block updates for a very long time (hours).
but an IXFR can take a long time (even hours) to complete. Preventing
read access for such a long time is unacceptable. Another problem is
that it forces updates to be serialized, even for cache databases.
There are problems on the reader side of the lock too. If the entire
database is protected by one lock, then any data retrieved from the
database must either be used while the lock is held, or it must be
copied, because the data in the database can change when the lock
isn't held. Copying is expensive, and the server would like to be
able to hold a reference to database data for a long time. The most
significant long-running reader problem is outbound AXFR, which could
potentially block updates for a long time (hours).

A finer-grained locking scheme, e.g. one lock per node, helps
parallelize cache updates, but doesn't help with the long-lived reader
or long-lived writer problems.
or long-lived writer problems. These problems are solved by zone
database versioning, described below.

The BIND 9 Database interface does not mandate any particular locking
scheme. Database implementations are strongly encouraged to provide
as much concurrency as possible without violating the database
interface's rules.


Database Versioning

XXX TBS XXX
Versioning is not available in cache databases.

A zone database has a "current version" which is the version most recently
committed. A database has a set of versions open for reading (the
"open versions"). This set is always non-empty, since the current
version is always open. The openversion method opens a read-only
handle to the current version. All retrievals using the handle will
see the database as it was at the time the version was opened,
regardless of subsequent changes to the database. It is not possible
to open a specific version; only the current version may be opened.
This helps limit the number of prior versions which must be kept in
the database.

Each zone update transaction is assigned a new version. Only one such
"future version" may be open at any time. It is the caller's
responsibility to serialize and handle the blocking and awakening of
multiple update requests. The future version may be committed or
rolled back by the caller. If the future version commits, its version
becomes the current version of the database.
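
The versioning rules above can be sketched in a few lines. This is a
simplified illustration, not the BIND 9 database: the ZoneDB class and
its method names (open_version, new_version, commit, rollback) are
hypothetical, and the whole zone is copied for each future version
purely to keep the example short. It shows a read-only handle to the
current version, a single future version, and commit or rollback by
the caller.

```python
from types import MappingProxyType
from typing import Dict, Mapping, Optional, Tuple

Key = Tuple[str, str]        # (owner name, rdataset type)
Records = Tuple[str, ...]


class ZoneDB:
    """Hypothetical zone database with whole-database versions."""

    def __init__(self) -> None:
        self._serial = 0
        self._current: Dict[Key, Records] = {}
        self._future: Optional[Dict[Key, Records]] = None

    def open_version(self) -> Tuple[int, Mapping[Key, Records]]:
        """Open a read-only handle; only the current version can be opened."""
        return self._serial, MappingProxyType(self._current)

    def new_version(self) -> Dict[Key, Records]:
        """Open the single writable future version."""
        if self._future is not None:
            raise RuntimeError("a future version is already open")
        self._future = dict(self._current)  # private working copy
        return self._future

    def commit(self) -> None:
        if self._future is None:
            raise RuntimeError("no future version is open")
        # Rebind rather than mutate, so open handles keep their snapshot.
        self._current = dict(self._future)
        self._future = None
        self._serial += 1

    def rollback(self) -> None:
        self._future = None  # discard the uncommitted changes


KEY = ("foo.example.", "A")
db = ZoneDB()
writer = db.new_version()
writer[KEY] = ("192.0.2.1",)
db.commit()

serial, snapshot = db.open_version()  # reader opens the current version
writer = db.new_version()
writer[KEY] = ("192.0.2.9",)
_, during = db.open_version()         # opened before the commit
db.commit()
new_serial, latest = db.open_version()
```

The handle opened before the second commit keeps returning the old
record, while a handle opened afterwards sees the new one. Callers
that want a second concurrent update have to queue it themselves,
since the sketch simply refuses to open a second future version.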