INTERNET-DRAFT Eric A. Hall, Editor
Document: draft-hall-dm-idns-00.txt Consultant
Expires: May 2002 November 2001


The Internationalized Domain Name System

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

1. Abstract

The principal intention of this specification is to facilitate the
deployment of a completely internationalized domain name syntax
and service which new protocols, applications and host systems can
use, but without disrupting the existing infrastructure. Towards
that end, this document describes a series of elective
encapsulation services and protocol extensions which cumulatively
allow internationalized domain names to be stored and transmitted
in the existing DNS message and within application data streams,
according to the compliance level of the participating systems.

INTERNET-DRAFT draft-hall-dm-idns-00.txt November 2001

Table of Contents

1. Abstract
2. Definitions and Terminology
3. Introduction
   3.1. Background
   3.2. Objectives
   3.3. Common Usage Scenarios
   3.4. User Audiences
   3.5. Service Overview
   3.6. Process Example
4. The Internationalized Namespace
   4.1. Internationalized Domain Names and Labels
   4.2. Internationalized Host Identifiers
   4.3. STD13 Domain Names
   4.4. STD13 Host Identifiers
5. Transfer Encodings and Label Types
   5.1. The EDNS/UTF-8 Label Type
   5.2. The STD13 Legacy Label Type
6. Application Guidelines
   6.1. Input and Output Charsets
   6.2. Protocol and Application Data
   6.3. DNS Lookups and Resolver Calls
7. Resolver Guidelines
   7.1. Resolver APIs
   7.2. Query Processing Services
   7.3. The Hosts Database
8. Server Guidelines
   8.1. Internationalized Zones
   8.2. Namespace Visibility Restrictions
   8.3. The Master File Format
9. Caching Guidelines
10. Security Considerations
11. IANA Considerations
12. References
13. Acknowledgements
14. Editor's Address

Hall I-D Expires: May 2002 [page 2]

2. Definitions and Terminology

This document unites, enhances and clarifies several pre-existing
technologies. Readers are expected to be familiar with the
following specifications:

[AMC-ACE-Z] <draft-ietf-idn-amc-ace-z>, "AMC-ACE-Z version
    0.3.1"

[NAMEPREP] <draft-ietf-idn-nameprep>, "Preparation of
    Internationalized Host Names"

[STD13] (RFC 1034) "Domain names - concepts and facilities",
    (RFC 1035) "Domain names - implementation and
    specification"

[STD3] (RFC 1122) "Requirements for Internet Hosts --
    Communication Layers", (RFC 1123) "Requirements for Internet
    Hosts -- Application and Support"

[BCP18] (RFC 2277) "IETF Policy on Character Sets and
    Languages"

[RFC2279] "UTF-8, a transformation format of ISO 10646"

[RFC2671] "Extension Mechanisms for DNS (EDNS0)"

The following abbreviations are used throughout this document:

UCS (Universal Character Set): The ISO/IEC 10646 character
    set repertoire, as represented by the Unicode 3.1
    specification.

ACE (ASCII-Compatible Encoding): A transfer encoding which
    encodes UCS character codes into a seven-bit codespace
    which is compatible with US-ASCII.

UTF-8 (UCS Transformation Format, Eight-Bit): A transfer
    encoding which encodes UCS characters into an eight-bit
    codespace which is compatible with DNS message formats.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
in this document are to be interpreted as described in RFC 2119.

3. Introduction

The domain name system (DNS) [STD13] currently defines a message,
namespace and protocol. Although the DNS message is capable of
transferring eight-bit character codes as protocol data,
applications are currently limited to a subset of US-ASCII when
they interact with the DNS namespace, and this restricted syntax
is enforced by almost every TCP/IP application and protocol which
utilizes domain names as embedded data (including, surprisingly,
the DNS protocol).

In order to allow for the use of a larger range of characters in
the namespace, this document extends and clarifies a variety of
Internet specifications so that characters from the Universal
Character Set (UCS) [ISO10646] may be used in domain names. This
document also extends the DNS message structure so that UTF-8
[RFC2279] encoded characters can be used to transfer these domain
names. It also provides an ASCII-compatible encoding (ACE)
[AMC-ACE-Z] of these character codes, which existing protocols and
applications can use to access internationalized domain names,
along with identification mechanisms which allow the end-point
systems to negotiate down to that encoding when needed. Finally,
this document defines behavior for DNS systems which implement
this architecture, including the end-point applications which
generate and store DNS domain names, and the resolvers, caches and
servers which process them.

The mechanisms presented here are elective. Developers, zone
administrators and network operators who wish to make use of
internationalized domain names may do so according to their own
schedule. Those developers, administrators and operators who
cannot or prefer not to implement the specified extensions can
continue to use their legacy systems, and will still be able to
access resources from the internationalized domain name system
through the ACE equivalent domain names.

3.1. Background

From one perspective, DNS is already an "eight-bit clean" system,
in that the structured DNS message is capable of storing and
transmitting eight-bit data without any additional effort.
However, this perspective only considers one particular facet of
the domain name system, and ignores the more critical aspect of
the DNS namespace, which has rules that are entirely different
from those which govern the message format.

The DNS namespace (or more appropriately, the view of the
namespace which applications use and enforce) is governed by rules
set forth in RFC952 [RFC952], STD3 [STD3], and STD13, which
collectively define the characters that are eligible for use with
host names. These rules are meant to provide a common template
which may be applied to either the DNS namespace or a local hosts
database, such that a query for "host.example.com" can be
processed through either system. The range of valid characters
currently defined is the letters, digits and hyphen from US-ASCII
[ASCII] (additional rules also govern the valid order and length
of a host name). Character code values outside of this range are
valid in DNS messages, but are undefined when used in the
namespace, and are subject to interpretation by the applications
which generate them.

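The legacy rule described above can be approximated in a few lines. The following Python sketch checks a name against the letters-digits-hyphen syntax as it is commonly read from RFC952 and STD3: each label is 1 to 63 characters and begins and ends with a letter or digit. The function name `is_legacy_host_name` is illustrative only, and the sketch ignores any further restrictions those documents place on particular labels.

```python
import re

# Legacy host-name label: US-ASCII letters, digits and hyphen only,
# 1-63 characters, first and last characters a letter or digit
# (RFC 952 as relaxed by RFC 1123 to allow a leading digit).
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_legacy_host_name(name: str) -> bool:
    """Return True if every label of `name` obeys the legacy LDH rule.

    Illustrative sketch only; not a complete host-name validator.
    """
    if name.endswith("."):
        name = name[:-1]  # accept one trailing root dot ("host.example.com.")
    # 253 characters is the longest textual name that fits the 255-octet
    # wire-format limit of STD13.
    if not name or len(name) > 253:
        return False
    return all(_LDH_LABEL.fullmatch(label) for label in name.split("."))


if __name__ == "__main__":
    for candidate in ("host.example.com", "-host.example.com", "b\u00fccher.example"):
        print(candidate, is_legacy_host_name(candidate))
```

Under this check "host.example.com" is accepted, while a label containing a character such as "ü" is rejected. That rejected case is the gap the extended namespace is intended to close.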
The host name rules are enforced by almost every application and
protocol which uses DNS to identify a host or system. This
includes network utilities such as ping and traceroute which
simply identify systems by name, and complex protocols such as
SMTP which use domain names to determine message-routing paths.
Portions of the DNS protocol itself are also affected by these
restrictions, such as the domain names which may be used in NS
resource records for sub-domain delegations (since the named
servers are connection targets, their names are also required to
comply with the host name rules).

Because these domain names are so pervasive throughout the
Internet (and even within proprietary applications that run on
private networks), it is not possible to declare a "flag day" at
which eight-bit domain names will be considered valid encodings of
a particular character set. Instead, an extended namespace with a
larger set of charset rules must be defined, an extended DNS
protocol capable of supporting these domain names must be
deployed, and a transitional mechanism which allows the old and
new systems to interact must be established. This document
attempts to meet these objectives.

3.2. Objectives

In broad terms, this document has one overall goal, which is to
facilitate the creation and use of an internationalized domain
name system built around a UCS namespace, a collection of UTF-8 and
legacy-compatible encodings which are suitable for transferring
internationalized domain names within DNS and the affected
application data streams, and a negotiation mechanism which allows
end-point systems to identify the encoding that they will use for
a particular operation.

Part of this goal is to internationalize the existing DNS
namespace by allowing UCS characters to be used in host names and
sub-domain delegations in old and new zones equally. As such, this
document does not define a new namespace, but instead defines
mechanisms by which leaf-nodes and sub-domains may be created
within the existing hierarchy.

UTF-8 was chosen as the primary transfer encoding of these domain
names for several reasons. For one, there is a wide availability
of tools and expertise surrounding UTF-8, and it is already widely
deployed within development environments, operating systems and
applications. Furthermore, BCP18 [BCP18] requires that new
application protocols be able to use UTF-8 as application data,
and for many applications, this specifically means domain names
which are passed as data. All signs indicate that UTF-8 is
currently, and will continue to be, the preferred eight-bit
encoding on the Internet, and this specification embraces this
position in its design.

However, most of the network services currently in use are bound
by the legacy host naming restrictions, and those applications and
protocols will also need to be able to interact with resources
from the internationalized namespace, even though they will not be
compliant with the UTF-8 encoding mechanisms defined in this
document. In order to allow these systems to participate, this
specification also embraces the use of ACE as a seven-bit,
backwards-compatible encoding for legacy systems to use.

Note that even though a single encoding could have been specified
by this document, past and present requirements would not have
been satisfied by a single choice. For example, supporting UTF-8
alone would mean isolating legacy systems from resources in the
UCS namespace, while supporting ACE alone would not have provided
a truly internationalized namespace (ACE encoded domain names
would still appear in user data quite frequently). By allowing the
UTF-8 and ACE encodings to coexist, the existing and emerging
communities can both be served.

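The difference between the two transfer encodings becomes clear when a single label is encoded both ways. The Python sketch below approximates this under several assumptions. Unicode NFKC normalization plus `str.casefold()` stands in for the full [NAMEPREP] profile. Python's built-in `punycode` codec (RFC 3492 Punycode, which grew out of the AMC-ACE-Z proposal) stands in for AMC-ACE-Z 0.3.1 and is not claimed to be identical to it. The `xn--` prefix that marks ACE labels is borrowed from later IDNA work; it is not taken from this section.

```python
import unicodedata

# Assumption: prefix borrowed from later IDNA work (RFC 3490); this
# section of the draft does not define an ACE prefix.
HYPOTHETICAL_ACE_PREFIX = "xn--"


def prepare_label(label: str) -> str:
    """Rough stand-in for NAMEPREP: NFKC normalization plus case folding."""
    return unicodedata.normalize("NFKC", label).casefold()


def label_forms(label: str) -> dict:
    """Return the prepared label, its UTF-8 octets and its ASCII-compatible form."""
    prepared = prepare_label(label)
    utf8_octets = prepared.encode("utf-8")  # eight-bit transfer form
    if prepared.isascii():
        ace_label = prepared  # plain US-ASCII labels need no ACE conversion
    else:
        # Python's "punycode" codec implements RFC 3492, used here only as
        # a stand-in for AMC-ACE-Z.
        ace_label = HYPOTHETICAL_ACE_PREFIX + prepared.encode("punycode").decode("ascii")
    return {"prepared": prepared, "utf8": utf8_octets, "ace": ace_label}


if __name__ == "__main__":
    # "B" + "u" + COMBINING DIAERESIS + "cher": decomposed, mixed-case input
    print(label_forms("Bu\u0308cher"))
```

For the decomposed input "Bu\u0308cher", the sketch produces the following three forms:

* the prepared label "bücher"
* the eight-bit UTF-8 octets b'b\xc3\xbccher'
* the seven-bit label "xn--bcher-kva"

The ACE label also satisfies the legacy letters-digits-hyphen rule.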
Because both encodings will be active during the same time period,
this document also defines DNS protocol extensions which allow the
end-point systems to detect the encoding that is in use for a
particular query/response pair. Note that these negotiation
mechanisms not only allow new and legacy systems to interoperate,
but they also provide a transition service for developers, zone
administrators and end-users, in that ACE encoded domain names can
be initially deployed within existing applications and DNS
systems, while individual elements of the infrastructure can be
upgraded without disturbing other components.

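The fall-back behavior described above can be outlined in a hypothetical Python sketch. It is not the negotiation procedure this document defines in later sections. The two query callables and the `UTF8LabelsRejected` exception are invented for illustration:

* `query_utf8` stands for "send a query using the EDNS/UTF-8 label type".
* `query_legacy` stands for "send a traditional STD13 query".
* `UTF8LabelsRejected` stands for any refusal of UTF-8 labels along the path.

As before, ACE conversion borrows Python's RFC 3492 `punycode` codec and the later `xn--` prefix.

```python
from typing import Callable, Tuple


class UTF8LabelsRejected(Exception):
    """Hypothetical signal that some system on the path refused UTF-8 labels."""


def to_ace(name: str) -> str:
    """Convert each non-ASCII label to ACE (RFC 3492 Punycode, assumed "xn--" prefix)."""
    return ".".join(
        label if label.isascii() else "xn--" + label.encode("punycode").decode("ascii")
        for label in name.split(".")
    )


def lookup(
    name: str,
    query_utf8: Callable[[str], str],
    query_legacy: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the UTF-8 form first and fall back to the ACE form if it is refused.

    Returns (encoding actually used, answer returned by the query callable).
    """
    try:
        return "utf-8", query_utf8(name)
    except UTF8LabelsRejected:
        # Some component cannot handle UTF-8 labels: retry with the
        # legacy-compatible ACE equivalent in a traditional query.
        return "ace", query_legacy(to_ace(name))
```

For example, if the UTF-8 query for "bücher.example" is refused, the sketch issues a legacy query for "xn--bcher-kva.example" instead. The caller is told which encoding was actually used.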
3.3. Common Usage Scenarios

Discussion of the mechanisms provided by this document depends upon
the usage context of the domain names themselves. Domain names are
extremely pervasive, and are used by almost every TCP/IP protocol
and application in one form or another. However, most usages fall
under one or more of the following scenarios:

   * Connection identifiers. Domain names are most commonly
     used as host-specific identifiers for outbound connection
     requests, whether this be for a command-line application
     such as ping, or as a host name which is stored in an
     application's configuration file. Another common usage
     scenario for connection identifiers is with reverse
     lookups, where a server is logging incoming connections by
     the corresponding domain name, or where a program such as
     netstat is displaying all of the application sessions which
     are currently active on a host. In both of these cases,
     domain names are passed through applications to a resolver,
     resulting in DNS queries and responses which eventually
     provide the requested DNS data.

     A related use (but one which does not generate DNS
     messages) is determining the host name of the local system.
     This is commonly found with applications and protocols that
     need to display the domain name of the local system as part
     of a protocol operation (such as an SMTP greeting banner)
     or as application data.

     Connection identifiers (and lookups in general) are
     probably the largest single use of domain names today, and
     this is likely to be the case with internationalized domain
     names as well. This document fully supports the use of
     internationalized domain names for lookup operations, as
     long as the calling application, the stub resolver, the
     local caching servers, and the authoritative servers for
     the specified domain name are compliant with this
     specification. If any of these components is not capable
     of supporting internationalized domain names in this
     manner, the ACE equivalent domain name will be negotiated
     for the operation at hand.

   * Protocol data. Some application protocols exchange domain
     names as protocol data, with those domain names either
     determining or altering a service-specific operation.
     Examples of this usage include SMTP envelopes ("RCPT
     TO:<user@domain.dom>"), where the domain name is used to
     determine whether or not a particular email message should
     be accepted for delivery, the HTTP Host header field, which
     identifies a specific document tree on a shared server,
     BOOTP/DHCP options, WHOIS input, and more.

     Because these protocols treat domain names as protocol
     data, most of them also have specific formatting
     requirements which must be addressed before UTF-8 domain
     names can be used by these protocols directly. This
     document is intended to facilitate the use of UTF-8 encoded
     domain names in this manner, although it is expected that
     most of the protocol development groups will need to
     develop negotiation mechanisms before these protocols can
     use internationalized domain names directly. Until such
     work is completed, ACE equivalent domain names can be used
     to provide these protocols with access to the
     internationalized namespace.

   * Structured application data. Structured application data
     is similar to protocol data in that it can trigger or
     affect some protocol action, although this will not always
     occur. For example, a web browser can automatically process
     an IMG link which is embedded in a web page, while a user
     can manually follow an email link which is embedded in the
     same web page; even though both usage models share the same
     structured data format (URLs), they are processed
     differently by the application. Similarly, email messages
     typically contain multiple domain names as structured data
     in the message headers, and some of these domain names will
     directly affect subsequent protocol operations, while
     others will not.

     Because of this ambiguity, this document defines no
     specific treatment for structured application data. In some
     cases, no additional mechanisms will be required, while
     other scenarios will require negotiation mechanisms before
     an internationalized domain name can be used in the
     structured data (with ACE being required as the interim
     format). Each protocol development group is encouraged to
     analyze each usage independently, to classify the usage as
     a connection identifier, protocol data, or unstructured
     application data, and to determine the appropriate course
     of action for each usage accordingly.

   * Unstructured application data. Many application protocols
     provide free-text data which can contain domain names, but
     with those domain names existing as unstructured data. For
     example, an email message which is provided as a text/plain
     MIME body part may contain a domain name which identifies a
     system or service in the context of a specific application,
     but in an unstructured form ("your files were moved from
     server1 to server2"). Similarly, an email address may be
     provided in WHOIS output, but as unstructured data which
     does not affect the protocol.

     Given the application-specific nature of this data, it
     cannot be managed by any global protocol or process. Where
     a protocol has rules or restrictions on the data itself,
     those rules still apply, but some formatting rules may
     need to be extended before internationalized domain names
     (or their equivalents) can be encoded in the application
     data. For example, internationalized domain names in email
     messages may need to be converted to a preferred display
     charset, while ACE equivalents may be necessary for
     protocols which only support US-ASCII.

Each of the above scenarios represents a distinct handling case
where internationalized domain names may or may not be used
directly. In some cases, the internationalized domain names may be
used as soon as the applications and resolvers are configured to
use them, while in other cases, measured and cautious deployment
is required in order to prevent undue breakage. In the latter
cases, however, the backwards-compatible ACE encoding is available
so that the internationalized domain names can still be used.

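For protocol data, the practical effect is that a sender which has not negotiated anything beyond the legacy syntax substitutes the ACE equivalent before the domain name reaches the wire. The Python sketch below builds an SMTP `RCPT TO` command this way. The `peer_accepts_utf8` flag is a hypothetical stand-in for whatever negotiation mechanism a protocol development group might define. The ACE conversion again assumes the RFC 3492 `punycode` codec and the later `xn--` prefix. The local part is left untouched, since only domain names are in scope here.

```python
def domain_to_ace(domain: str) -> str:
    """Replace each non-ASCII label with an ACE label (assumed "xn--" prefix)."""
    return ".".join(
        label if label.isascii() else "xn--" + label.encode("punycode").decode("ascii")
        for label in domain.split(".")
    )


def rcpt_to(mailbox: str, peer_accepts_utf8: bool) -> str:
    """Build an SMTP RCPT TO command, downgrading the domain when required.

    `peer_accepts_utf8` is a hypothetical negotiation result; no such SMTP
    extension is defined by this document.
    """
    local_part, _, domain = mailbox.rpartition("@")
    if not peer_accepts_utf8:
        domain = domain_to_ace(domain)  # legacy peer: send the ACE equivalent
    return f"RCPT TO:<{local_part}@{domain}>"
```

For a peer that has not negotiated UTF-8, "user@bücher.example" goes out as "RCPT TO:<user@xn--bcher-kva.example>". Purely ASCII domains pass through unchanged.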
3.4. User Audiences

Another perspective on the changes which will result from
deploying the mechanisms described in this document can be seen by
analyzing how any such changes will affect the different
"audiences" who work with domain names, and who have their own
unique context-specific usage requirements and objectives. The
three main audiences discussed in this document are:

   * Developers. Protocol and application developers need to be
     able to incorporate internationalized domain names into
     their systems as easily as possible, although there are
     many factors which will affect such usage, including the
     input and output charsets and encodings which are available
     to the applications and protocols. Where feasible, this
     specification allows developers to choose any charset or
     encoding which may be required and suitable for use,
     although in most cases, a recommendation is also made for
     the use of UTF-8 in particular.

     Developers may adopt internationalized domain names for
     connection identifiers and lookup operations fairly
     quickly, such that users can use those systems as soon as
     they have compliant software (and a target domain name to
     communicate with). Implementing support for
     internationalized domain names in protocols and application
     data will require additional effort by the affected
     development groups.

     Support for ACE will be harder to implement, since it is a
     relatively new and untested encoding syntax, with no
     existing developer tools. This will likely be the largest
     hurdle to overcome when developing applications for use
     with this service.

   * Zone administrators. Organizations that wish to deploy
     internationalized domain names should be able to do so
     easily, at a reasonable cost, and without suffering
     excessive pre-conditions. Towards this objective, the
     mechanisms described by this document allow organizations
     to deploy and use internationalized domain names within any
     zone immediately, without requiring any other zone to have
     been updated beforehand (although there are specific and
     strong suggestions for upgrading the Internet's high-load
     servers as soon as possible).

     If an organization wishes to publish internationalized
     domain names for users to access and utilize, the
     authoritative servers for the affected zone must be
     compliant with the naming rules and message formats
     described by this document, which will almost certainly
     require the administrators of that zone to upgrade their
     servers. However, organizations may also choose to only
     deploy ACE encoded domain names if an immediate migration
     is not feasible, with the caveat that internationalized
     domain names in their native form will not be available
     from those zones.

   * Network operators. The systems and human users which
     generate DNS lookups are another area of concern, as these
     protocols, programs and users will expect these lookups to
     succeed, and will also expect that the visible namespace
     will be compatible with the capabilities of the requesting
     system with a minimum of investment. This is a broad range
     of requirements.

     At a minimum, applications must be capable of generating
     and accepting internationalized domain names if they are
     to use those domain names (see the "Developers" discussion
     above for the application requirements). Similarly, the
     local resolvers, caches and forwarders on the user's
     network must also support the message formats if they are
     to relay internationalized domain names between their
     local applications and the remote zones being queried. If
     the applications, resolvers and caches do not support
     these requirements, intermediary systems will perform the
     down-level negotiation automatically on their behalf, so
     that no additional effort is required on the user's part.

In summary, developers, zone administrators and end-users can
immediately participate in the internationalized namespace at no
additional expense if they are content with using ACE encoded
domain names, and can use internationalized domain names in their
native form if they are willing to make the necessary investments.
Furthermore, since the native and backwards-compatible encodings
are not mutually exclusive, implementers of this specification
have the option of adopting ACE for immediate use and then
transitioning to internationalized domain names on a per-system,
per-zone, or per-application basis, according to their schedule.

3.5. Service Overview

This document specifies a variety of extensions to several
different protocols and services in order to facilitate the use of
internationalized domain names anywhere this support exists or can
be implemented, and to provide a legacy-compatible domain name in
all other situations.

More specifically, this document defines or clarifies behavior for
the following elements:

   * Host name character restrictions. Legacy protocols and
     applications are currently restricted to the legacy host
     naming rules, which only allow for a subset of US-ASCII
     characters (letters, digits and the hyphen character). This
     document redefines the characters which are valid within a
     host name so that system identifiers, domain name parts of
     host names, and new network services can use most of the
     characters from the UCS.

   * DNS message format. This document defines an extended label
     format based on the extended label services provided by
     RFC2671 (Extension Mechanisms for DNS - EDNS0) [RFC2671],
     with this label format being used to encapsulate UTF-8
     encoded internationalized domain names in DNS messages. Any
     DNS message which carries UTF-8 encoded domain names is
     required to use the EDNS/UTF-8 label type defined in this
     document. Any DNS message which carries legacy domain names
     (including the ACE encoded equivalent domain names) is
     required to use the traditional message format.

   * Application handling rules. Applications can use
     internationalized domain names immediately for lookup
     operations that do not directly affect external services or
     protocols, can use ACE encoding sequences to specify
     internationalized domain names in legacy protocol
     operations, and can use both forms at the same time.

   * Stub resolvers. Stub resolvers will most likely need to
     provide a series of internationalized APIs in order to
     fully support applications that generate internationalized
     domain name lookups. For example, these APIs will almost
     certainly be required in order for the resolver to
     determine that the calling application is compliant with
     the host name requirements defined by this document, and
     that the domain names should be encoded in the proper label
     format. Although this specification does not dictate these
     APIs, it encourages their use, and provides some guidance
     on the issues surrounding their use.

   * Forwarders, resolving servers and caches. The user-side
     servers which process internationalized domain names have
     several protocol-specific requirements, including the
     negotiated fall-back service when UTF-8 queries fail.

   * Authoritative servers. A key part of this specification is
     the simultaneous support for internationalized and
     legacy-compatible domain names in the UCS namespace, thereby
     allowing a domain name to be entered into an authoritative
     zone database once, and for the appropriate response to be
     generated by a server according to the label encoding from
     the associated query. In order for this to work, this
     specification requires authoritative servers which serve
     internationalized domain names to comply with specific
     conditions. This specification also allows existing servers
     to serve ACE equivalent domain names when the authoritative
     servers cannot be upgraded, although this typically results
     in lower levels of functionality.

The elements listed above collectively define a completely
internationalized domain name system, which is capable of
servicing internationalized domain names in all compliant systems,
and which is also capable of providing ACE encoded equivalent
domain names when any component from the internationalized service
is not available.

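The message-format distinction can be made concrete with a short Python sketch showing two things:

* `encode_std13_name` encodes a name in the traditional STD13 form. Each label is a length octet of at most 63 followed by that many ASCII octets, and the zero-length root label terminates the name.
* `extended_label_octet` builds the first octet of an RFC2671 extended label. Its top two bits are 01 and its low six bits carry the extended label type.

This section does not give the EDNS/UTF-8 label type value or the rest of its layout. The type value `0x02` below is therefore an arbitrary placeholder, and both function names are illustrative only.

```python
# RFC 1035 / RFC 2671: the top two bits of a label's first octet give its
# kind; 00 is an ordinary label (length 0-63), 01 an extended label type.
EXTENDED_LABEL_BITS = 0b01 << 6

# Placeholder only: the EDNS/UTF-8 label type value is not given in this
# section, so 0x02 is an arbitrary illustrative value.
HYPOTHETICAL_UTF8_LABEL_TYPE = 0x02


def encode_std13_name(name: str) -> bytes:
    """Encode an ASCII (legacy or ACE) domain name in traditional wire format."""
    wire = bytearray()
    for label in name.rstrip(".").split("."):
        octets = label.encode("ascii")  # non-ASCII labels raise UnicodeEncodeError
        if not 1 <= len(octets) <= 63:
            raise ValueError(f"label length out of range: {label!r}")
        wire.append(len(octets))  # top two bits stay 00 because length <= 63
        wire += octets
    wire.append(0)  # the zero-length root label terminates the name
    return bytes(wire)


def extended_label_octet(label_type: int) -> int:
    """Return the first octet of an RFC 2671 extended label of the given type."""
    if not 0 <= label_type < 0b111111:  # 0b111111 is reserved by RFC 2671
        raise ValueError(f"invalid extended label type: {label_type}")
    return EXTENDED_LABEL_BITS | label_type
```

The ACE form is plain US-ASCII, so it fits the traditional length-prefixed layout unchanged. A UTF-8 label instead carries the extended-label prefix. This split mirrors the rule that UTF-8 encoded names use the EDNS/UTF-8 label type while legacy and ACE names use the traditional format.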


3.6. Process Example

   This section illustrates a series of query/response transactions
   under which the processes and protocols defined in this document
   function. This example uses a reverse lookup for the PTR resource
   record associated with the "14.2.0.192.in-addr.arpa." domain name
   (forward lookups work similarly, but the issues are more fully
   demonstrated by PTR lookups). Each of the technologies shown below
   is described in a later section of this document. The sole purpose
   of this example is to illustrate these mechanisms in order to
   facilitate better discussion.

   Note that this illustration represents a worst-case scenario
   (thereby exercising most of the functionality provided by this
   specification), and does not represent a typical scenario.

   a. First, a PTR resource record for 14.2.0.192.in-addr.arpa.
      is added to the internationalized zone database on the
      replication master server for the 2.0.192.in-addr.arpa.
      zone, with the resource record data value of
      "host.<idn>.example.com." (where <idn> is an
      internationalized domain name compliant with the host
      naming rules provided in this document). Both of these
      domain names have a primary representation consisting of
      UCS characters in some local encoding, but are also
      available as UTF-8 and ACE encoded data so they can be
      encapsulated within DNS queries and responses.

      Once the zone is reloaded and is replicated by the other
      authoritative servers for that zone, the domain names can
      be processed.

   b. An application on a remote system generates a DNS lookup
      for the PTR resource record associated with the
      14.2.0.192.in-addr.arpa. domain name.

      If this is a legacy application, it issues the lookup using
      the only method it knows, which is to pass the domain name
      to the legacy resolver API. This results in the resolver
      issuing a legacy DNS query for the PTR resource record
      associated with the specified domain name.

      If this application is compliant with this specification,
      it performs the following steps:

         1. Verify that the resolver is capable of processing
            queries for UTF-8 domain names by probing for an
            internationalized API. If this step fails, the domain
            name is converted to the legacy STD13 octet encoding
            in step 3.6.b.3 and passed to the resolver's legacy
            API.

         2. Convert the domain name from its generated encoding to
            the canonical UCS characters, and then normalize and
            case-convert the UCS characters.

         3. Convert the normalized and lowercased UCS characters
            to the charset or encoding used by the resolver's
            internationalized API.

         4. Issue a lookup for the PTR resource record associated
            with the internationalized domain name, via the
            resolver's internationalized API.

            Note that even though the domain name is compatible
            with the legacy host name rules, the domain name is
            passed through the internationalized API so that
            servers can tell whether or not the original
            application is UTF-8 compliant, and can determine the
            format of any internationalized domain names which are
            to be returned in the response messages. This is
            required in case the queried resource record includes
            internationalized domain names as resource record data
            (as would be the case with PTR resource records), and
            is also required for the proper handling of any SOA or
            NS resource records which may be returned as
            additional data in the response.

      For the purpose of this example, we will assume that each
      of these steps was performed successfully.

   c. The client's stub resolver generates the query, with the
      Question Section of the query containing the UTF-8 encoded
      domain name encapsulated in an EDNS/UTF-8 extended label.

   d. The stub resolver sends the query to one of its configured
      resolving servers.

   e. The resolving server will either answer the query from its
      cache or forward the query to a name server which is
      authoritative for the namespace hierarchy, as per the
      normal query-resolution procedure. For the purpose of this
      example, we will assume that the server has no information
      about the specified domain name, so it forwards the query
      to one of the root zone's authoritative servers in order to
      begin the iterative resolution process.

   f. The queried server responds with a referral, providing
      delegation data for a zone in the path to the queried
      domain name. For the purposes of this example, we will use
      192.in-addr.arpa. as the delegation domain specified in the
      referral message.

      The specific format of the referral will depend on whether
      or not the queried server understands the EDNS/UTF-8 label
      encoding. If the server is compliant with this
      specification (which it is, or else it would not have
      answered with a referral), then the referral will also
      provide EDNS/UTF-8 encoded domain names in the Authority
      and Additional-Data Sections of the referral. If the server
      were not compliant with this specification, it would return
      an error upon seeing the extended label type, which would
      cause the resolving server to restart the query using the
      legacy label type.

   g. The resolving server decodes the UTF-8 encoded domain names
      to their UCS character representation, caches the resource
      records in their UCS form, and sends the query to one of
      the authoritative servers for the referral zone. Note that
      the cache does not normalize or case-convert the UCS
      characters; only the end-systems perform this work.

   h. In this case, the queried server does not understand the
      EDNS/UTF-8 label format, and returns a FORMERR response
      code.

   i. When this error is encountered, the current resolver
      (whether this is the client's stub resolver or a caching
      server in the query path) must convert the query domain
      name from its current form to a legacy-compatible encoding
      (either ACE or STD13 octet sequences, depending on the UCS
      characters which have been encoded), and then reissue the
      query in that format.

      In this case, the domain name only contains printable
      characters from US-ASCII, so the STD13 octet encoding is
      used for the fall-back query. Because the UCS domain name
      was normalized and lowercased before it was passed to the
      client's stub resolver, the legacy domain name will also be
      in this format (although it will be compared in a case-
      neutral form by the recipient server).

      Note that once this conversion takes place, the legacy
      label format is used for the remainder of the current query
      chain (this prevents excessive delays from multiple fall-
      back operations, which could result in timeouts at the
      original resolver or application).

   j. The queried server returns a delegation referral for the
      2.0.192.in-addr.arpa. zone. Since the query arrived in the
      STD13 octet encoding, the server has no indicator of the
      client's capabilities, so the referral NS resource records
      will also be returned in legacy compatible form (either as
      STD13 octet sequences or as ACE encoded data, depending on
      the character codes provided in each label from each of the
      associated domain names).

      Note that even though these NS resource records will be
      restricted to legacy-compatible host names and label types,
      they may contain and reference ACE domain names. In this
      regard, a legacy server in the delegation path does not
      prevent internationalized domain names from being delegated
      or resolved, but only prevents them from being processed as
      EDNS/UTF-8 extended labels.

      Also note that once the authoritative servers for a zone
      have been discovered and cached, any subsequent UTF-8
      queries which are generated for the resources in that zone
      will be sent directly to one of those servers, bypassing
      the delegation hierarchy. As such, subsequent queries which
      are provided in EDNS/UTF-8 labels can be processed directly
      by the zone's authoritative servers, without the delegation
      servers disrupting the process.

   k. The resolving server decodes the STD13 octet sequences and
      ACE encoded domain names to their UCS character
      representations, caches the resource records, and resends
      the query to one of the authoritative servers for the
      referral zone.

   l. The queried server processes the request. Since this query
      arrived as an STD13 octet sequence, the server must compare
      the seven-bit characters from the domain name (which is all
      of them, in this example) in a case-neutral form. Note that
      if the query had arrived as ACE or UTF-8 encoded domain
      names, the server would have decoded the specified domain
      name to its canonical UCS characters and performed a case-
      exact match against the resulting characters.

   m. The queried server responds with the requested data. Note
      that the query was submitted in the legacy label form due
      to the fall-back processing which occurred in step 3.6.i,
      so the server will only respond to this query with STD13
      octet sequences or ACE encoded domain names, using the
      STD13 legacy label.

   n. The resolving server decodes the STD13 octet sequences and
      ACE encoded domain names to their UCS character
      representations, and caches the resource records. Since the
      query was originally received as an internationalized
      domain name (as indicated by the EDNS/UTF-8 extended label
      from the original query), the resolving server has to
      encode the answer data as UTF-8 before passing it back to
      the client's stub resolver. However, since the input was
      not provided in an encoded UCS form, the server has to
      normalize and case-convert the STD13 octet sequence in
      order to provide a valid internationalized domain name.

   o. The stub resolver decodes the UTF-8 encoded domain names
      which have been provided in the response message to their
      UCS character representation, and passes the data to the
      original calling application using the charset or encoding
      favored by the resolver.

   p. The application validates the received domain name by
      decoding the internationalized domain name to its canonical
      UCS characters, normalizing and lowercasing the resulting
      domain name, and comparing the results with the answer data
      which was provided by the resolver.
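   The fall-back behaviour in steps h and i can be sketched as a small
   state machine. This is an illustration under stated assumptions,
   not a resolver implementation. The transport callable, the
   "edns-utf8" and "legacy" tags, and the QueryChain class are all
   hypothetical, and neither ACE nor the EDNS/UTF-8 wire format is
   actually produced.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

FORMERR = 1  # RCODE 1, "Format error", as defined in STD13

# Hypothetical transport: (label type, [(label, encoding), ...]) -> RCODE
Transport = Callable[[str, List[Tuple[str, str]]], int]


def legacy_encoding(label: str) -> str:
    """Step i: seven-bit labels use STD13 octets, all others use ACE."""
    return "std13" if all(ord(ch) < 0x80 for ch in label) else "ace"


@dataclass
class QueryChain:
    send: Transport
    legacy: bool = False  # set after the first fall-back in this chain

    def query(self, labels: List[str]) -> Tuple[str, int]:
        if not self.legacy:
            rcode = self.send("edns-utf8",
                              [(label, "utf-8") for label in labels])
            if rcode != FORMERR:
                return "edns-utf8", rcode
            # Steps h and i: the server rejected the extended label, so
            # fall back once and stay on the legacy label type for the
            # rest of this chain to avoid repeated fall-back delays.
            self.legacy = True
        encoded = [(label, legacy_encoding(label)) for label in labels]
        return "legacy", self.send("legacy", encoded)
```

   The legacy flag is kept on the chain rather than on each query.
   This mirrors the note in step i that the legacy label format is
   used for the remainder of the current query chain.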

   As can be seen, the UTF-8 name resolution process is essentially
   identical to the current resolution process, with the addition of
   a single fall-back query in step 3.6.i which resulted in one extra
   query/response pair (roughly equivalent to adding one extra
   delegation referral into the query path), and with several
   different encoding conversions, as required by the participating
   systems and services. This example also illustrates the
   requirements which are placed on developers, zone administrators,
   and network operators in order for typical connection identifier
   services to function with UTF-8 domain names.

   However, if each system and service had used UTF-8 for encoding
   purposes (including everything between the stub resolver's APIs
   and the authoritative servers for the target zone), then no
   additional queries or conversions would have been required (other
   than the direct UCS conversions required for validation and
   caching, the latter of which can be performed separately without
   affecting the processing path). In this regard, the example above
   illustrates how this system can function even when only a portion
   of the participating systems utilize UTF-8, and also illustrates
   how efficient the entire operation would be if all of the
   recommendations and requirements provided in this specification
   were adopted.

   It is also important to reiterate here that any costs associated
   with this compliance are entirely elective for the affected
   parties. If they want to streamline the process, the option is
   available to them, although the system also works when very few
   optimizations are implemented.


4. The Internationalized Namespace

   In simple terms, this specification defines an internationalized
   namespace which consists of domain names and labels that contain
   UCS character codes, and also specifies a series of encoding
   formats which may be used whenever the UCS values need to be
   encapsulated for transmission within DNS messages or application
   data streams.

   In this regard, the internationalized namespace is the UCS
   representation of the domain names and labels as they are used for
   comparison operations once a domain name arrives for processing,
   while the transfer encodings ensure that a domain name arrives at
   the destination system intact, so that it may be processed in its
   canonical form.

   There are four conceptual elements to this model:

   * Character codes. Labels from internationalized domain names
     have a single logical canonical representation as sequences
     of UCS code point values. The UCS characters are used when
     a particular label from a domain name is created by an
     application or stored in a zone, hosts or cache database,
     and whenever two domain names or labels need to be compared.
     However, different kinds of domain names have different
     rules which govern the character codes that may be used.

   * Storage encodings. Whenever a domain name is created or
     copied from the network, it must be stored in a format that
     is reversible to the canonical UCS character representation
     of that domain name. This specification does not mandate
     any particular storage encoding, and allows this decision
     to be made on a per-implementation basis, as long as the
     storage encoding supports character codes which can be
     converted to UCS equivalent values for comparison purposes.
     However, the use of UTF-8 for this purpose is encouraged,
     since it is the most common.

   * Transfer encodings. Whenever a domain name needs to be sent
     over the network, it must be packaged in a form which is
     compliant with the capabilities of the transfer protocol in
     use. This document specifies three transfer encodings which
     may be used to encode canonical UCS character codes in DNS
     messages or application streams: the octet encoding from
     STD13, the ACE encoding from <ACE-Z>, and the UTF-8 encoding
     from RFC2279. Each encoding has different costs and benefits
     in different usage scenarios.

   * Comparison operations. When two domain names need to be
     compared, the comparison follows rules which are appropriate
     to the type of domain name being provided, and to the
     transfer encoding which may have been used to provide the
     domain name to the system.
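   The split between storage encodings and canonical characters can be
   shown with a short sketch. The function name is hypothetical, and
   Python's "utf-32-be" codec stands in for a UCS-4 style store. The
   point is only that two different storage choices decode to the same
   code point sequence, which is what comparisons operate on.

```python
from typing import Tuple


def canonical_code_points(stored: bytes, storage_encoding: str) -> Tuple[int, ...]:
    """Recover the canonical UCS code points from a stored label."""
    return tuple(ord(ch) for ch in stored.decode(storage_encoding))


label = "b\u00fccher"
utf8_copy = label.encode("utf-8")      # the storage encoding the text encourages
ucs4_copy = label.encode("utf-32-be")  # a fixed-width, UCS-4 style alternative

# Different storage choices, same canonical characters for comparison.
assert canonical_code_points(utf8_copy, "utf-8") == \
    canonical_code_points(ucs4_copy, "utf-32-be")
```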

   This document defines four distinct types of internationalized
   domain names which may exist in the internationalized namespace,
   and also describes how each of the above considerations affects
   those domain names and their labels. These domain name types are
   described throughout the remainder of this section.


4.1. Internationalized Domain Names and Labels

   This section describes the master template rules for all domain
   names and labels which may be used in the internationalized
   namespace, although subordinate rules and restrictions are also
   applied as secondary filters, depending on the intended usage of
   the domain name.

   For example, domain names and labels which are to be used as
   internationalized host identifiers (either as host names, or as
   domain names which are used to specify a host) are restricted to a
   specific subset of UCS characters. Meanwhile, domain names and
   labels which are compliant with STD13's global rules are
   restricted to eight-bit code values, while the domain names and
   labels which are used as STD13 host identifiers are restricted to
   a specific subset of US-ASCII.

   The following diagram illustrates how the subordinate rules are
   applied and interpreted against the master restrictions:

                  +-----------------------+
                  | Internationalized DNs |
                  +-----------------------+
                   any UCS character codes
                      /       |
                    /         |
                  /           |
                /             |
      +-----------+     +-----------+     +------------+
      | Int. Host |     | STD13 DNs +-----+ STD13 Host |
      +-----------+     +-----------+     +------------+
       normalized        character         ASCII letters,
       subset of         codes 0x00        numbers, and
       UCS chars         through 0xFF      hyphen char

   As can be seen, the rules for internationalized domain names and
   labels allow any UCS character code to be stored, although each
   particular usage of the domain names and labels will have its own
   secondary rules and restrictions.

   In order to allow future documents to define additional rules as
   required for their usage, this document defines very few global
   rules on the core internationalized domain names and labels.
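   The diagram can be read as a set of nested filters. The sketch below
   reports which boxes a single label falls into. The category names
   are hypothetical, and the internationalized host test is only a
   stand-in for <nameprep>: it accepts a label when case folding plus
   NFKC normalization leaves it unchanged. It does not apply
   nameprep's prohibited-character tables.

```python
import string
import unicodedata
from typing import Set

# STD13 host identifiers: ASCII letters, digits and the hyphen.
STD13_HOST_CHARS = set(string.ascii_letters + string.digits + "-")


def label_categories(label: str) -> Set[str]:
    """Return the diagram boxes that a single label falls into."""
    categories = {"idn"}  # master rule: any UCS character codes
    if all(ord(ch) <= 0xFF for ch in label):
        categories.add("std13-dn")  # STD13 global rule: codes 0x00-0xFF
        if all(ch in STD13_HOST_CHARS for ch in label):
            categories.add("std13-host")
    # Rough stand-in for <nameprep> (assumption, see lead-in): the label
    # must already be case-folded and NFKC-normalized.
    if label and unicodedata.normalize("NFKC", label.casefold()) == label:
        categories.add("int-host")
    return categories
```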


4.1.1. IDN syntax and structure

   In this specification, an internationalized domain name consists
   of a variable number of labels, each of which contains a variable
   number of UCS character codes, not all of which will have defined
   UCS character interpretations.

   Furthermore, the encoding system which is used to store and
   interpret those values on a system is not relevant to this
   specification, and is therefore not defined. The characters in a
   label can be stored in memory or on disk as UTF-8, UCS-4, ACE, or
   any other storage encoding which is desired by the operators and
   implementers of the affected system, as long as that encoding
   system is reversible to the canonical UCS character code values,
   and is able to represent the necessary range of UCS characters
   (the "necessary range" varies by operation).

   The only universal restrictions which apply to internationalized
   domain names and labels are those which govern length. This
   specification requires that labels from internationalized domain
   names MUST be restricted to a minimum length of one character and
   a maximum length of 63 characters, inclusive. The exception to
   this rule is the root domain, which is always represented by a
   zero-length label. Note that this rule specifically refers to the
   canonical UCS characters, rather than any encoded form (encoding
   will often reduce the number of characters which fit within a
   label or domain name, due to overhead from the encoding algorithm).

   A fully-qualified internationalized domain name is formed by
   joining a series of labels together, with the most contextually
   specific label in the left-most position of the label sequence,
   and with the root domain occupying the right-most position. The
   sum of the lengths of all labels in an internationalized domain
   name MUST NOT exceed 255 characters, inclusive. Any number of
   labels MAY be stored in the domain name, but the sum of their
   lengths MUST NOT exceed this limit.

   However, labels which contain UCS character codes greater than
   U+007F will result in multi-byte UTF-8 and ACE encodings, so the
   maximum length of a label or an internationalized domain name is
   governed by their UTF-8 and ACE encoded lengths. Both encodings
   MUST result in an encoded length of 63 octets or less in order to
   be usable, with a maximum cumulative length of 255 octets.
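   A minimal sketch of the maximum-length checks, assuming a name is
   given as a list of non-root labels. The function and constant names
   are hypothetical. Only the canonical-character and UTF-8 limits are
   checked, because the <ACE-Z> algorithm is not reproduced here.
   Apart from rejecting empty labels, no minimum label length is
   enforced.

```python
from typing import List

MAX_LABEL = 63   # characters, and octets once encoded, per label
MAX_NAME = 255   # characters, and octets once encoded, per name


def check_idn_lengths(labels: List[str]) -> None:
    """Raise ValueError if a name exceeds the maximum lengths.

    `labels` lists the non-root labels; the zero-length root is implied.
    """
    total_chars = total_octets = 0
    for label in labels:
        if not label:
            raise ValueError("only the root label may be zero-length")
        chars = len(label)                   # canonical UCS characters
        octets = len(label.encode("utf-8"))  # UTF-8 transfer length
        # UTF-8 never uses fewer octets than code points, so in practice
        # the octet limit is the one that is reached first.
        if chars > MAX_LABEL or octets > MAX_LABEL:
            raise ValueError(f"label too long: {chars} chars, {octets} octets")
        total_chars += chars
        total_octets += octets
    if total_chars > MAX_NAME or total_octets > MAX_NAME:
        raise ValueError("domain name too long")
```

   For example, a label of 40 copies of U+00FC is only 40 characters
   long but needs 80 UTF-8 octets, so it is rejected.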


4.1.2. IDN transfer encodings

   The UCS currently occupies a 21-bit range of character code
   values, containing tens of thousands of assigned characters, and
   hundreds of thousands of unassigned characters. Due to the multi-
   byte nature of the code point values, UCS characters cannot be
   passed as protocol or application data in most of the existing
   Internet protocols (including DNS messages), at least not without
   the help of some kind of encoding scheme. At the very least, the
   UCS character values have to be encoded as eight-bit sequences if
   they are to fit within existing eight-bit data structures, and
   have to be encoded as a subset of US-ASCII characters if they are
   to be usable with legacy protocols and applications which only use
   STD13's host identifier rules for their structured domain name
   data types.

   With this objective in mind, this document defines three different
   transfer encoding systems which can be used to convert
   internationalized domain names and labels into a form which is
   suitable for transfer in different data streams. These are the
   legacy STD13 octet encoding, ACE, and UTF-8. Each of these
   encoding schemes provides different benefits and capabilities to
   the internationalized DNS effort.

   * STD13 octets. The STD13 octet encoding scheme provides a
     direct one-to-one mapping between eight-bit characters and
     their eight-bit values, but it is only capable of storing
     character codes in the range of U+0000 through U+00FF,
     which severely restricts its usefulness.

   * ACE. The ACE encoding scheme is capable of storing UCS
     character code values as seven-bit sequences in STD13 legacy
     labels. While this makes it practically compatible with the
     legacy host identifier rules, the resulting data imposes
     additional labor on the Internet community, and the reuse
     of the legacy label also results in certain amounts of
     ambiguity with some DNS domain names and labels.

   * UTF-8. The UTF-8 encoding scheme is capable of encoding all
     UCS character code values as sequences of eight-bit data
     which are compatible with legacy DNS message restrictions,
     but the encoded output requires explicit support from
     internationalized applications and protocols. UTF-8 output
     uses a new label type in order to prevent additional
     ambiguity problems from arising.

   The table below illustrates the UCS character code sequences which
   are supported by each of the different encoding schemes.

                      STD13
                     Octets    ACE    UTF-8
                    +-------+-------+--------
                    |       |       |
      US-ASCII      |   Y   |       |   Y
                    |       |       |
      Eight-Bit     |   Y   |   Y   |   Y
                    |       |       |
      Any UCS Chars |       |   Y   |   Y
                    |       |       |

   More specifically, the character code sequence ranges and their
   valid encodings are:

   * US-ASCII. If a label only contains character codes from the
     range of U+0000 through U+007F, then it MAY be encoded as a
     legacy STD13 octet sequence or UTF-8, but MUST NOT be
     encoded as ACE.

     Note that this specification explicitly prohibits seven-bit
     labels from being encoded as ACE data, since such an action
     would be redundant, would result in greater processing
     overhead for those labels, and would introduce multiple
     representations which cause problems with caches on legacy
     systems. Furthermore, certain security risks would be
     introduced if this were allowed. For example, a malicious user
     could register or purposefully create an ACE encoded
     representation of the "example.com" label sequence such that
     users mistakenly sent sensitive data to malicious systems.

     In order to prevent these problems from occurring, this
     specification requires that any ACE-encoded label which
     decodes to a sequence consisting entirely of seven-bit
     characters MUST be immediately discarded with extreme
     prejudice. This rule applies to every implementation of this
     specification, including any applications, resolvers, caches
     or servers which process labels.

   * Eight-bit codes. If a label contains character codes from
     the eight-bit range of U+0000 through U+00FF, then it MAY
     be encoded as STD13 octet sequences, ACE, or UTF-8. This
     rule specifically requires that the label MUST contain at
     least one character from the range of U+0080 through
     U+00FF, MAY contain any number of characters from the
     seven-bit range, but MUST NOT contain characters with code
     values which are greater than U+00FF.

     Since the STD13 octet encoding and ACE both use the legacy
     STD13 label type, this specification relies on the input
     encoding of a domain name in order to determine the output
     encoding. In some cases, however, the input encoding will
     not be clear, or will not be specified, and this can result
     in some ambiguity with label sequences from this range.

     For example, if the domain name provided in a query
     consists of seven-bit labels, then the STD13 octet sequence
     is the only valid encoding for the legacy STD13 label,
     meaning that ACE could not have been used in the query. If
     the specified domain name exists as a CNAME resource record
     which refers to a domain name that contains eight-bit
     character codes, then the proper output encoding for that
     domain name will not be clearly discernible. Moreover, the
     STD13 and ACE encodings will generate different results,
     since the STD13 octet sequence will only contain a single
     octet for the eight-bit character, while the ACE encoding
     will contain multiple octets of encoded data.

     When this situation arises, systems MUST give preference to
     the ACE encoding, on the assumption that the referenced
     character is more likely to represent a UCS character than
     an eight-bit code value (the UCS characters in this range
     are Latin-1, which are the most common characters after the
     legacy US-ASCII set). Furthermore, the ACE encoded
     representation of these characters allows for a broader
     range of subsequent operations (since it complies with the
     legacy host naming restrictions, it can be used with CNAME
     resource records that refer to hosts), while the STD13
     octet encoded representation does not.

     It is possible to avoid this scenario on authoritative zone
     servers (and thus the affected caches) by allowing the
     operator to specify whether the input is Latin-1 UCS
     character data or binary data, with the server generating
     the proper output accordingly. Also note that the default
     encoding specified by this document is UTF-8, which does
     not suffer from the ambiguity problems described above.

   * Any UCS character codes. If a label contains any character
     codes greater than U+00FF, then it MAY be encoded as ACE or
     UTF-8, but MUST NOT be encoded as STD13 octet sequences.
     STD13 is not capable of representing character codes
     greater than U+00FF, so it cannot be used with any UCS
     characters beyond the eight-bit range.
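   The three ranges above, together with the rule for discarding
   seven-bit ACE labels, reduce to a small decision function. This is
   a sketch with hypothetical names and string tags. Listing "ace"
   first for eight-bit labels only records the preference that applies
   when the input encoding is ambiguous. It is not a general ranking.

```python
from typing import List


def permitted_encodings(label: str) -> List[str]:
    """Transfer encodings allowed for one non-empty, non-root label."""
    highest = max(ord(ch) for ch in label)
    if highest <= 0x7F:
        # Seven-bit label: STD13 octets or UTF-8, never ACE.
        return ["std13", "utf-8"]
    if highest <= 0xFF:
        # Eight-bit label: all three are allowed; ACE is preferred
        # when the input encoding is ambiguous.
        return ["ace", "std13", "utf-8"]
    # Beyond U+00FF, STD13 octets cannot represent the label.
    return ["ace", "utf-8"]


def keep_decoded_ace_label(decoded: str) -> bool:
    """An ACE label that decodes to pure seven-bit text MUST be discarded."""
    return any(ord(ch) > 0x7F for ch in decoded)
```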

   Encodings are performed on a per-label basis. Each label MUST NOT
   be encoded more than once. Also note that recursively encoded
   labels will cause applications to discard the domain name.

   When the STD13 octet encoding is used to encode labels for
   transmission, the labels are encoded according to the rules
   specified in STD13, and are encapsulated in STD13 legacy labels.

   When ACE is used to encode labels for transmission, the labels are
   encoded according to the rules specified in <ACE-Z>, and are
   encapsulated in STD13 legacy labels (this process is described in
   section 5.2).

   When UTF-8 is used to encode labels for transmission, the labels
   are encoded according to the rules specified in RFC2279, and are
   encapsulated in EDNS/UTF-8 extended labels (the format of this
   label is described in section 5.1).

   Note that a domain name MAY contain any combination of STD13 octet
   encoded labels and ACE encoded labels. However, if a domain name
   contains any UTF-8 encoded labels, then ALL of the labels from
   that domain name MUST be encoded as UTF-8 data. This rule
   primarily exists so that DNS compression services can be
   maintained consistently, but it also prevents mixed referrals
   which can trigger unnecessary fall-back processing, and also
   provides a single encoding representation to internationalized
   systems, which benefits efficiency.

   The root domain (as specified by the zero-length label at the
   right edge of the domain name) MUST NOT be encoded with ACE. More
   specifically, zero-length labels MUST NOT contain any character
   data of any kind, and since ACE labels have prefix strings, they
   are explicitly forbidden from being used for the root domain.
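   The per-name rules can be applied after each label has been given
   an encoding. The sketch below assumes a hypothetical representation
   of a name as (label text, encoding) pairs, with the root written as
   ("", "") in the last position. It enforces the rule that one UTF-8
   label forces every label into UTF-8, and the rule that the root
   label carries no data and no encoding.

```python
from typing import List, Tuple

Label = Tuple[str, str]  # (label text, transfer encoding); hypothetical shape


def finalize_name(labels: List[Label]) -> List[Label]:
    """Apply the per-name encoding rules; the last entry is the root."""
    *body, root = labels
    if root != ("", ""):
        # The zero-length root label carries no data, so no ACE prefix.
        raise ValueError("root must be the zero-length label with no encoding")
    if any(enc == "utf-8" for _, enc in body):
        # One UTF-8 label forces every label of the name into UTF-8.
        body = [(text, "utf-8") for text, _ in body]
    # Otherwise any mix of "std13" and "ace" labels is acceptable as-is.
    return body + [root]
```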


4.1.3. IDN comparison operations

   When an internationalized domain name label is received from the
   network as ACE or UTF-8 encoded data, the label MUST be decoded to
   its canonical UCS character representation, and the resulting UCS
   characters MUST be compared as case-exact sequences to their
   stored equivalents. Except where specifically required in this
   specification (e.g., validity tests which are performed by
   applications), normalization and case-conversion MUST NOT be
   performed against the resulting UCS character codes prior to any
   comparison operations being performed.

   However, internationalized domain name labels which are received
   as STD13 octet sequences MUST be given special treatment, as these
   domain names could have originated from legacy systems operating
   under STD13's rules. In this case, the seven-bit US-ASCII
   alphabetic characters (U+0041 through U+005A, and U+0061 through
   U+007A) from those labels MUST be compared in a case-neutral form.
   All other code values MUST be compared as case-exact code values
   (this particularly includes eight-bit characters, whose case
   equivalence was not defined by STD13).
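   The two comparison regimes can be sketched as follows, assuming the
   received label has already been decoded to UCS characters and the
   transfer encoding it arrived in is known. The function names and
   encoding tags are hypothetical. Python's str.lower() is
   deliberately avoided, because it would also fold eight-bit letters
   such as U+00C9.

```python
def fold_ascii_letter(ch: str) -> str:
    """Lowercase U+0041..U+005A only; every other code point is unchanged."""
    return chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch


def labels_match(received: str, stored: str, transfer_encoding: str) -> bool:
    """Compare a decoded label with its stored UCS form."""
    if transfer_encoding in ("ace", "utf-8"):
        # Decoded ACE/UTF-8 labels: case-exact, no normalization.
        return received == stored
    if transfer_encoding == "std13":
        # Legacy octets: ASCII letters case-neutral, all else case-exact.
        return ("".join(map(fold_ascii_letter, received))
                == "".join(map(fold_ascii_letter, stored)))
    raise ValueError(f"unknown transfer encoding: {transfer_encoding}")
```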
|
||
|
||
|
||
4.2. Internationalized Host Identifiers
|
||
|
||
Internationalized host identifiers are a subset of the
|
||
internationalized domain names described in section 4.1, which
|
||
only use a subset of the allowable UCS characters, but which reuse
|
||
the global transfer encodings and comparison routines.
|
||
|
||
Most of the displayable characters from the UCS can be used in
host identifiers, and there are no additional rules governing the
ordering or length of their labels. However, the characters which
are used in internationalized host identifiers MUST be normalized
and case-converted before they are encoded for storage or
transfer. This requires more effort on the part of applications
and servers when the internationalized domain names are initially
created, but results in less ambiguity and lower processing
requirements for servers, caches and resolvers during subsequent
comparison operations.

The restrictions which govern the creation of internationalized
host identifiers are as follows:

a. Labels MUST be restricted to the subset of characters which
   are permitted by <nameprep> [nameprep]. Characters which are
   prohibited by <nameprep> MUST NOT appear in any label of any
   internationalized host identifier.

b. Labels MUST be normalized through <nameprep> before they are
   stored or encoded for transfer. Internationalized host
   identifiers will not be normalized as part of any comparison
   operation, so systems MUST normalize the labels before they are
   stored or transmitted.

c. Labels MUST be converted to lowercase according to the
   case-mapping rules specified in <nameprep> before they are
   stored or encoded for transfer. Internationalized host
   identifiers will not be converted to lowercase as part of any
   comparison operation, so systems MUST lowercase the labels
   before they are stored or transmitted.

According to the rules above, a label from an internationalized
host identifier which was originally created with the UCS
character sequence of <LATIN CAPITAL LETTER A><COMBINING ACUTE
ACCENT><LATIN CAPITAL LETTER B> (U+0041 U+0301 U+0042) would be
normalized and lowercased to <LATIN SMALL LETTER A WITH
ACUTE><LATIN SMALL LETTER B> (U+00E1 U+0062). The normalized,
lowercase form would be used as the canonical UCS character
representation of that label when it was encoded for storage and
transmission purposes, and would be the form which was used for
comparison operations on any resolvers, caches and servers.

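Rules a through c, and the worked example above, can be illustrated with the `nameprep()` function from CPython's `encodings.idna` module. Note that this function implements RFC 3491, the published successor of the <nameprep> draft cited here, so the two may differ in detail. It case-maps each character, applies NFKC normalization, and raises `UnicodeError` for prohibited characters. The wrapper name `prepare_host_label` is hypothetical, and this is a sketch rather than a conforming implementation of this draft.

```python
from encodings.idna import nameprep  # CPython's RFC 3491 implementation


def prepare_host_label(label: str) -> str:
    """Apply rules a-c of section 4.2 to one label before storage or transfer.

    nameprep() lowercases via its case-mapping table, normalizes to NFKC,
    and raises UnicodeError if a prohibited character is present (rule a).
    """
    return nameprep(label)


# The worked example: <A><COMBINING ACUTE ACCENT><B>
canonical = prepare_host_label("\u0041\u0301\u0042")
print([f"U+{ord(ch):04X}" for ch in canonical])  # ['U+00E1', 'U+0062']
```

Because the prepared form is what gets stored and transmitted, later comparisons in resolvers and caches reduce to plain case-exact equality.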
Internationalized host identifiers which are received from the
network can contain labels which have been encoded as STD13 octet
sequences, ACE or UTF-8. In all of these cases, the comparison
rules defined in section 4.1.3 MUST be applied.


4.3. STD13 Domain Names

STD13 allows any eight-bit code values to be used in domain name
labels. However, STD13 host identifiers (as described in section
4.4 of this specification) are the most common form of STD13
domain names, and have much tighter restrictions.

There are common uses of STD13 domain names which do not comply
with the STD13 host identifier subset, however. One common example
is SRV identifiers, which use an underscore character (U+005F) as
part of their label syntax. Another common example is found when
email addresses are provided in SOA and RP resource records, where
the left-hand side of the email address is stored as an STD13
domain name label which does not represent a host identifier.
Furthermore, email addresses often contain characters which are
not legal in STD13 host identifiers, such as a full-stop character
(U+002E). For example, "joe.admin" could be stored as a single
STD13 domain name label in the fully-qualified domain name of
"joe\.admin.example.com." (master-file notation, where the escaped
full stop belongs to the first label), which would represent the
email address of "joe.admin@example.com" when that domain name was
extracted from the SOA or RP resource record and processed.

Implementations of this specification MUST allow STD13 domain
names to be created and stored, using the following rules:

a. Labels MUST be restricted to the code values of U+0000 through
   U+00FF. Restrictions on character content MUST NOT be applied
   (note that if this domain name will be used as part of an STD13
   host identifier, the rules specified in section 4.4 MUST be used
   instead).

b. Labels MUST NOT be normalized or lowercased before they are
   stored or encoded for transfer.

c. Systems MUST allow STD13 domain names to be specified as exact
   sequences of eight-bit octet values, and MUST NOT treat these
   sequences as canonical UCS characters which are to be normalized
   or lowercased. STD13 defines an escaping mechanism whereby the
   decimal value of the octet is prefaced with a reverse-solidus
   (such as "\193"), and this mechanism is suggested for this
   usage.

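The escaping mechanism mentioned in rule c can be sketched as a small parser that turns one master-file label into the exact octets it denotes. This sketch assumes the RFC 1035 section 5.1 conventions: "\DDD" gives an octet by its decimal value, and "\X" quotes a non-digit character X. The function name is hypothetical. The result is deliberately left un-normalized and un-lowercased, as rule c requires.

```python
def parse_escaped_label(text: str) -> bytes:
    """Convert one master-file label with backslash escapes into raw octets.

    A backslash plus three decimal digits yields that octet value; a
    backslash plus any non-digit character yields that character itself,
    e.g. a backslash-escaped full stop inside a label.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\":
            digits = text[i + 1:i + 4]
            if len(digits) == 3 and all(d in "0123456789" for d in digits):
                value = int(digits)
                if value > 255:
                    raise ValueError(f"escape \\{digits} is larger than 255")
                out.append(value)
                i += 4
                continue
            if i + 1 == len(text):
                raise ValueError("label ends with a dangling backslash")
            if text[i + 1] in "0123456789":
                raise ValueError("a decimal escape needs exactly three digits")
            i += 1  # quoted non-digit: take the next character literally
        code = ord(text[i])
        if code > 0xFF:
            raise ValueError(f"{text[i]!r} is not a single octet")
        out.append(code)
        i += 1
    return bytes(out)
```

For example, the master-file label `\193` becomes the single octet 0xC1. That octet is then stored and compared as-is.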
STD13 domain names which are received from the network can contain
labels which have been encoded as STD13 octet sequences, ACE or
UTF-8. In all of these cases, the comparison rules defined in
section 4.1.3 MUST be applied. Note that some of these sequences
can contain octet code values which have not been normalized or
lowercased by the originating system, since these values can be
used to specify binary domain names.


4.4. STD13 Host Identifiers

This document does not deprecate, replace or modify the host name
rules defined by RFC952, STD3 or STD13 as they apply to legacy
host identifiers. However, there are several issues which affect
the usage of these domain names and their labels in this system.

The characters which are currently defined as valid in STD13 host
identifiers are the uppercase and lowercase letters, the digits and
the hyphen character from US-ASCII. No other characters are
allowed. Furthermore, the current rules also prohibit the use of
the hyphen character in the first or last character position of a
host identifier label.

Implementations of this specification MUST allow STD13 host
identifiers to be created and stored, using the following rules:

a. Labels MUST be restricted to the code values of U+002D,
   U+0030 through U+0039, U+0041 through U+005A, and U+0061
   through U+007A.

b. Labels MUST NOT contain the code value of U+002D in either the
   first or last character position of the label.

c. The alphabetic characters MUST be converted to lowercase before
   they are stored or transmitted. STD13 host identifiers are
   always compared in a case-neutral form.

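Rules a through c amount to a simple check-and-lowercase step. The following Python sketch shows one way to express it. The function name is hypothetical. Rejecting an empty label is an added assumption that the rules above do not state, and the 63-octet STD13 label length limit is not checked here.

```python
import string

# Rule a: hyphen (U+002D), digits (U+0030-U+0039) and letters
# (U+0041-U+005A, U+0061-U+007A).
_LDH = frozenset(string.ascii_letters + string.digits + "-")


def prepare_std13_host_label(label: str) -> str:
    """Validate an STD13 host identifier label and return it lowercased."""
    if not label:
        raise ValueError("empty label")  # assumption, not part of rules a-c
    invalid = sorted(set(label) - _LDH)
    if invalid:
        raise ValueError(f"characters not allowed: {invalid}")
    if label[0] == "-" or label[-1] == "-":  # rule b
        raise ValueError("label must not begin or end with a hyphen")
    return label.lower()  # rule c; the label is ASCII-only at this point
```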
STD13 host identifiers which are received from the network can
contain labels which have been encoded as STD13 octet sequences or
UTF-8. In both cases, the comparison rules defined in section
4.1.3 MUST be applied.


5. Transfer Encodings and Label Types

As was discussed in section 4.1.2, internationalized domain names
and labels are required to be encoded as either eight-bit or
seven-bit data whenever they are transmitted as protocol or
application data.

The particular output encoding format which will be used for any
given label will be primarily determined by the capabilities of
the participating end-point systems. If the application or
protocol which is relaying the domain name labels supports
internationalized domain names directly, then UTF-8 encoded labels
can be used, but if the protocol or application is only capable of
supporting STD13 host identifiers as domain name data, then STD13
octet and/or ACE encoded labels will have to be used.

With DNS messages in particular, the "data type" is the label
encapsulation in use. Although STD13 legacy labels allow for the
use of eight-bit codes, multiple encodings for the same basic
character data result in interpretation problems without some form
of ancillary tagging service. For this reason, each encoding is
represented differently by this specification. When an STD13
legacy label contains STD13 octet sequences, no tagging is
provided, but if the STD13 legacy label contains ACE encoded data,
then the encoded sequence is tagged with an ACE identifier (a
character prefix which does not normally appear in labels). When
UTF-8 domain names are provided, an EDNS/UTF-8 extended label is
used to encapsulate the internationalized domain name.

Furthermore, the encoding which is used for any label in the
message will also determine the label type which is used to
encapsulate and transfer the entire domain name. If any label of
a domain name is encoded as UTF-8, then all of the labels from
that domain name are required to be encapsulated for transfer in
EDNS/UTF-8 extended labels. Conversely, if a domain name contains
only ACE or STD13 octet encoded labels, then all of the labels from
that domain name are required to be encapsulated for transfer
using the STD13 legacy label format.

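The all-or-nothing rule above can be expressed as a small decision function. In this sketch, the caller tags each label with the encoding it needs: "utf-8", "ace" or "octets". The tag strings, return values and function name are illustrative assumptions, not identifiers defined by this specification.

```python
from typing import Iterable

_KNOWN_TAGS = {"utf-8", "ace", "octets"}


def select_encapsulation(label_encodings: Iterable[str]) -> str:
    """Choose one label type for every label of a domain name.

    Any UTF-8 label forces the whole name into EDNS/UTF-8 extended
    labels; otherwise the whole name uses STD13 legacy labels.
    """
    tags = list(label_encodings)
    unknown = set(tags) - _KNOWN_TAGS
    if unknown:
        raise ValueError(f"unknown encodings: {sorted(unknown)}")
    return "edns-utf8" if "utf-8" in tags else "std13-legacy"
```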
Note that other legacy applications and protocols will most likely
be required to provide extended encodings or negotiation features
before they can exchange internationalized domain names directly.
However, new applications and protocols which are subsequently
written to comply with BCP18 and this specification should not
require any such effort, as they should be capable of transferring
UTF-8 domain names from the beginning.


5.1. The EDNS/UTF-8 Label Type

Any internationalized domain name label which has been encoded as
UTF-8 for transmission in a DNS message MUST be encapsulated as an
EDNS/UTF-8 label.

The EDNS/UTF-8 extended label is an instance of the EDNS extended
label types (as defined by RFC2671). Extended labels are indicated
by the leading bit pattern of 0b01 in the label type field (the
first two bits from the "label length" octet of the STD13 legacy
label type), with the remaining six bits of this octet indicating
the extended label type in use. The EDNS/UTF-8 label type uses the
binary value of 0b000011 for this indication (note that IANA may
change this assignment).

EDNS/UTF-8 labels contain two subordinate units of data, which
follow the label type octet. The first is a length octet which
works exactly the same as the length octet used by STD13 legacy
labels: if the first two bits of this octet are 0b00, then the
rest of that octet provides the length of the label data field,
but if the first two bits of this octet are 0b11, then the label
is a pointer to some other label, and the remainder of the length
octet provides an offset which points to the length octet of the
referenced label, as per the rules provided in section 4.1.4 of
RFC 1035 (STD13, part 2). The second unit is the label data itself.

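A reader for this layout might look like the following Python sketch. It assumes the provisional type octet 0x43 (0b01 followed by 0b000011). It handles only uncompressed labels and refuses compression pointers rather than guess at their exact encoding. The function names are hypothetical.

```python
EDNS_UTF8_TYPE = 0b01_000011  # 0x43; provisional, IANA may reassign


def read_edns_utf8_label(msg: bytes, pos: int) -> tuple[str, int]:
    """Decode one EDNS/UTF-8 label at msg[pos]; return (label, next_pos)."""
    if pos + 2 > len(msg):
        raise ValueError("truncated label header")
    if msg[pos] != EDNS_UTF8_TYPE:
        raise ValueError(f"type octet 0x{msg[pos]:02X} is not EDNS/UTF-8")
    length_octet = msg[pos + 1]
    flag_bits = length_octet >> 6
    if flag_bits == 0b11:
        raise NotImplementedError("compression pointers are not handled here")
    if flag_bits != 0b00:
        raise ValueError("reserved bit pattern in length octet")
    start = pos + 2
    end = start + length_octet  # at most 63 octets of label data
    if end > len(msg):
        raise ValueError("label data runs past the end of the message")
    return msg[start:end].decode("utf-8"), end


def read_edns_utf8_name(msg: bytes, pos: int) -> tuple[list[str], int]:
    """Read labels until the zero-length root label is reached."""
    labels = []
    while True:
        label, pos = read_edns_utf8_label(msg, pos)
        if label == "":
            return labels, pos
        labels.append(label)
```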
The structure of the EDNS/UTF-8 extended label is illustrated by
the following figure.

                        1 1 1 1 1 1 1 1 1 1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1|0 0 0 0 1 1|    length     |  label data ///   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

0b01       - The extended label identifier.

0b000011   - The EDNS/UTF-8 extended label type identifier.

Length     - The number of octets in the label data, or the
             offset to the length octet of another EDNS/UTF-8
             label.

Label data - The label data, encoded as UTF-8 octets.

The following example shows the domain name of "me.com", where the
"e" in "me" is the UCS character <LATIN SMALL LETTER E WITH ACUTE>
(U+00E9), which has the UTF-8 encoded octet sequence of 0xC3A9.

      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   20 | 0  1  0  0  0  0  1  1|         0x03          |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   22 |       0x6D (m)        |       0xC3 (e')       |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   24 |       0xA9 (e')       | 0  1  0  0  0  0  1  1|
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   26 |         0x03          |       0x63 (c)        |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   28 |       0x6F (o)        |       0x6D (m)        |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   30 | 0  1  0  0  0  0  1  1|         0x00          |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Octet 20 identifies the EDNS/UTF-8 extended label type, while
octet 21 indicates that the label is three octets long. Octet 22
contains the UTF-8 value for lowercase "m", while octets 23 and 24
contain the UTF-8 value for the UCS character <LATIN SMALL LETTER
E WITH ACUTE> (encoded as 0xC3A9).

Similarly, octet 25 identifies another EDNS/UTF-8 extended label
type, octet 26 indicates that the label is three octets long, and
octets 27 through 29 contain the UTF-8 values for the lowercase
alphabetic sequence of "com".

Finally, octet 30 identifies another EDNS/UTF-8 extended label
type, while octet 31 indicates that the label is zero octets in
length, thereby signifying the root zone (the end of the queried
domain name).

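The octet layout in this example can be reproduced by a short encoder. This Python sketch is illustrative only. It assumes the provisional type octet 0x43 and emits uncompressed labels followed by a zero-length EDNS/UTF-8 root label, as in the example. It also enforces the 63-octet maximum implied by a six-bit length field. The function name is hypothetical.

```python
EDNS_UTF8_TYPE = 0b01_000011  # 0x43; provisional, IANA may reassign
MAX_LABEL_OCTETS = 0b111111   # six length bits after the 0b00 prefix


def encode_edns_utf8_name(labels: list[str]) -> bytes:
    """Encode non-root labels as EDNS/UTF-8 labels plus a root label."""
    out = bytearray()
    for label in labels:
        data = label.encode("utf-8")
        if not data:
            raise ValueError("empty label before the root label")
        if len(data) > MAX_LABEL_OCTETS:
            raise ValueError(f"label is {len(data)} octets, limit is 63")
        out += bytes([EDNS_UTF8_TYPE, len(data)]) + data
    out += bytes([EDNS_UTF8_TYPE, 0])  # zero-length root label
    return bytes(out)


print(encode_edns_utf8_name(["m\u00e9", "com"]).hex(" "))
# 43 03 6d c3 a9 43 03 63 6f 6d 43 00  (octets 20-31 of the example)
```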
Note that the use of the EDNS/UTF-8 extended label type serves
multiple purposes. On the one hand, it provides a method of
signaling the resolver's capabilities to the server, so that the
server can determine which format it needs to use when returning
answers, referrals or errors. On the other hand, using an
encapsulation format which is not backwards compatible prevents
certain ambiguity problems which can result from overloading the
STD13 legacy label with multiple encodings. These problems are
seen in certain situations with STD13 octet encoding and ACE,
where a server cannot adequately determine which encoding a
resolver desires. By using a separate extended label type for
UTF-8, these kinds of ambiguities are avoided.

There are additional benefits which come from using EDNS extended
label types, which are best expressed as "future possibilities".
Once the EDNS extended label mechanisms are widely deployed, it
becomes feasible to specify additional encoding mechanisms as soon
as the Internet community deems it desirable. In this regard,
defining alternative encodings is much easier the second time.


5.2. The STD13 Legacy Label Type

Any internationalized domain name label which has been encoded as
ACE or STD13 octet sequences for transmission in a DNS message
MUST be encapsulated within an STD13 legacy label.

This document does not deprecate, replace or extend the STD13
octet encoding or label encapsulation rules defined by STD13.
However, this document does provide some guidance on the creation
and interpretation of ACE encoded labels when they are stored in
legacy labels, which is necessary in order for recipient systems
to properly detect and decode the label contents.

Note that STD13 octet sequences and ACE data MAY both be provided
within the same domain name. As such, each STD13 legacy label from
a DNS message must be examined and processed independently.


5.2.1. ACE encoded labels

ACE encoded labels always begin with the character sequence of
<TBD> (this document uses "zz--" as a placeholder sequence until a
formal assignment is made). Any label which contains ACE encoded
data MUST begin with this character sequence prefix. Similarly,
any label which begins with this character sequence MUST be
recognized and processed as an ACE encoded label, according to the
rules defined in this specification.

Encoding and encapsulating a label as ACE data is a three-part
process, as follows:

a. Encode the canonical UCS character data from the
   internationalized domain name label into ACE using the
   procedure defined in <ACE-Z>.

b. Preface the encoded output with the "zz--" prefix sequence,
   thereby indicating that this label contains ACE encoded UCS
   character data.

c. Determine the length of the encoded data (including the
   prefix) and store this value in the STD13 legacy label's length
   octet.

Decoding an ACE label is the opposite of that process.

Note that whenever the ACE algorithm encounters a seven-bit
character code in the input, it is passed through unmodified to
the encoded output. If a label only contains seven-bit character
codes, the label MUST NOT be encoded as ACE, and MUST be encoded
as either STD13 octet sequences or UTF-8. Forcing a seven-bit
label to be encoded as ACE provides no benefit, incurs additional
processing on the end-point systems, and can also expose certain
security risks. Any system which is capable of generating and
deciphering ACE encoded labels is required to treat such sequences
as hostile, and MUST dispose of them immediately without any
further processing; systems MUST NOT even return these labels in
DNS error messages.

Similarly, ACE MUST NOT be used to encode any zero-length labels
(including but not specifically limited to the root domain), since
the presence of prefix characters in these labels can invalidate
their protocol-specific interpretations.

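Steps a through c, together with the two restrictions above, can be sketched as follows. The ACE algorithm itself is still a placeholder (<ACE-Z>), so this Python sketch takes the encoder as a parameter. Python's built-in "punycode" codec is used in `stand_in_ace` purely as a stand-in; it is not the procedure this draft refers to. The function names are hypothetical.

```python
from typing import Callable

ACE_PREFIX = "zz--"     # placeholder prefix used by this draft
MAX_LABEL_OCTETS = 63   # STD13 label length limit


def encapsulate_ace(label: str, ace_encode: Callable[[str], str]) -> bytes:
    """Return an STD13 legacy label (length octet + data) holding ACE data."""
    if label == "":
        raise ValueError("zero-length labels MUST NOT be ACE encoded")
    if all(ord(ch) < 0x80 for ch in label):
        raise ValueError("seven-bit labels MUST NOT be ACE encoded")
    data = (ACE_PREFIX + ace_encode(label)).encode("ascii")  # steps a and b
    if len(data) > MAX_LABEL_OCTETS:
        raise ValueError(f"encoded label is {len(data)} octets")
    return bytes([len(data)]) + data  # step c: length includes the prefix


def stand_in_ace(label: str) -> str:
    # Stand-in only: Punycode, not the <ACE-Z> procedure.
    return label.encode("punycode").decode("ascii")
```

With the stand-in, `encapsulate_ace("m\u00e9", stand_in_ace)` yields a label whose first octet counts the "zz--" prefix plus the encoded characters. A seven-bit label such as "abc" is rejected before any encoding takes place.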
When an STD13 legacy label is received which has "zz--" in the
first four character positions, the label MUST be treated as an
ACE-encoded internationalized domain name, and MUST be decoded to
its canonical UCS character values for further processing.

Note that STD13 legacy labels MUST be verified before the ACE
encoded data is extracted (as per the rules defined in STD13 which
govern the STD13 legacy label type), but systems which are
compliant with this specification MUST perform all subsequent
comparison, caching, or storage operations against the canonical
UCS characters, and MUST NOT use the ACE encoded label sequence
for any of these operations.

Note that legacy systems which are not compliant with this
specification will treat ACE encoded labels like any other STD13
legacy label.


5.2.2. STD13 octet encoded labels

Any STD13 legacy labels which do not begin with the ACE prefix
MUST be treated as STD13 octet encoding sequences. The rules for
this process are defined by STD13's default label encapsulation
services, although this document also provides some clarifications
on the use of this encoding with internationalized domain names
and labels.

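Every legacy label is examined independently, so a receiver's first step can be a per-label classification like the following Python sketch. It assumes the "zz--" placeholder prefix. It also matches that prefix case-neutrally, because STD13 treats seven-bit letters that way; this is an interpretation that this draft does not state. The function name is hypothetical.

```python
ACE_PREFIX = b"zz--"  # placeholder prefix used by this draft


def classify_legacy_labels(labels: list[bytes]) -> list[str]:
    """Tag each STD13 legacy label as "ace" or "octets", one at a time."""
    tags = []
    for label in labels:
        # bytes.lower() changes only A-Z, so eight-bit octets are untouched.
        if label[:4].lower() == ACE_PREFIX:
            tags.append("ace")
        else:
            tags.append("octets")
    return tags
```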
Whenever the STD13 octet sequence is used to encode the labels
from an internationalized domain name, the octet values of the
canonical UCS characters are stored directly in the label. Because
the DNS message is limited to octets, the range of UCS character
codes which are eligible for use with STD13 octet sequences is
limited to U+0000 through U+00FF. If any UCS character codes
outside this range need to be transferred, the internationalized
domain name label will have to be encoded as ACE or UTF-8.

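The range restriction above can be checked directly when deciding whether a label can travel as STD13 octets. In this Python sketch, code points U+0000 through U+00FF are written out as the identical octet values via the Latin-1 codec, which maps them one-to-one. A result of None signals that ACE or UTF-8 is needed instead. The 63-octet check reflects the general STD13 label length limit, and the function name is hypothetical.

```python
from typing import Optional

MAX_LABEL_OCTETS = 63  # STD13 label length limit


def encode_std13_octets(label: str) -> Optional[bytes]:
    """Return the label as raw octets, or None if it needs ACE or UTF-8."""
    if any(ord(ch) > 0xFF for ch in label):
        return None
    data = label.encode("latin-1")  # U+0000..U+00FF -> octets 0x00..0xFF
    if len(data) > MAX_LABEL_OCTETS:
        raise ValueError(f"label is {len(data)} octets, limit is 63")
    return data
```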
Note that comparison operations for the seven-bit range of
alphabetic character values MUST be performed in a case-neutral
form, although eight-bit code values MUST NOT be normalized or
case-converted as part of a comparison operation. These rules are
required in order to ensure backwards compatibility with the STD13
compliant systems which may be generating these labels as parts of
an STD13 domain name while also supporting the normalization and
case-conversion which may have been applied to the UCS characters
in the storage or transfer encoding systems.


6. Application Guidelines

As was discussed in section 3.3, there are multiple scenarios in
which an application can make use of internationalized domain
names, ranging from simple lookups of connection identifiers to
abstract encapsulations of unstructured application data. This is
an extremely broad range of uses, which is complicated by the
extreme pervasiveness of applications and protocols that use
domain names for one or more of these purposes.

Furthermore, network applications face a complex array of input
and output operations which will cumulatively affect the ability
of that application to make use of the internationalized domain
name system for various services and functions. These issues are
illustrated by the figure below:

          [IDNs]                   [IDNs]
            |                        ^
            |                        |
     +------V------+          +------+------+
     |    input    |          |    output   |
     |   charset   |          |   charset   |
     +-----------+-+          +-+-----------+
                   \          /
                 +---+-----+---+
                 | Application |
                 +---+-----+---+
                   /          \
     +-----------+-+          +-+-----------+
     |   lookups   |          |  app data   <---> [IDNs]
     +------+------+          +-------------+
            |
     +------+------+
     |  resolver   <---> [IDNs]
     +-------------+

As can be seen, the ability of an application to completely adopt
internationalized domain names will be determined by many factors,
any one of which could prevent the application from fully
incorporating the restrictions and recommendations prescribed by
this specification.

In order to allow for a flexible adoption schedule, this
specification defines very few mandates that applications must
adopt, but instead focuses on recommendations which applications
should comply with whenever they need to use internationalized
domain names, and also provides recommendations for situations
where the preferred behavior is not feasible. Applications which
are compliant with all of the recommendations provided in this
specification will be able to generate, store, transfer and
resolve internationalized domain names throughout all of their
operations, using UTF-8 as a common encoding for all of these
operations. Meanwhile, applications which are not in complete
compliance with this specification will still be able to make use
of internationalized domain names in these operations, although
such access may be limited to using backwards-compatible encodings
which require greater amounts of effort to implement and which
provide fewer benefits.


6.1. Input and Output Charsets

If an application is unable to accept, process, store or display
characters from the complete UCS repertoire, that application's
support for internationalized domain names will be somewhat
limited, by definition.

Although this document does not mandate any particular charset or
encoding which all applications must use for all operations,
applications SHOULD use coded character sets or encodings which
can handle characters from a reasonable number of scripts.

In particular, the following areas have specific requirements:

* Input charsets and encodings. Since UTF-8 is used as the
  default encoding for internationalized domain names
  throughout this specification (and others, such as BCP18),
  UTF-8 is also RECOMMENDED for use with input encodings of
  internationalized domain names in particular, although this
  is not required. Many platforms and development environments
  support UTF-8 as a local encoding of the UCS and it can be
  reasonably used with many types of input (such as
  configuration files), although many systems will require a
  specific encoding (such as UCS-2, or ISO/IEC 8859-1) in
  situations which require memory access or keyboard input.

  Regardless of the input encodings used, implementations MUST
  map domain names and labels to their canonical UCS characters
  for any normalization and case-conversion work which is
  subsequently required by any DNS lookups (see section 6.3).

* Output charsets and encodings. Output choices will likely be
  limited to a system-preferred charset or encoding. In general,
  it is RECOMMENDED that output systems choose an output charset
  or encoding which reflects the data being provided. However,
  applications MUST NOT display unknown characters with generic
  replacement characters (such as boxes or circles) if it is
  known that the original characters are not available for
  display with the specified charset, as such characters will
  almost certainly trigger failure conditions in subsequent
  protocol operations.

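The input rule above means that the same name typed in different local encodings should end up as the same canonical UCS characters before any normalization or case-conversion is done. The Python sketch below assumes the name is an internationalized host identifier (section 4.2). It again uses CPython's RFC 3491 `nameprep()` as a stand-in for <nameprep>. The function name is hypothetical.

```python
from encodings.idna import nameprep  # CPython's RFC 3491 implementation


def canonical_host_name(raw: bytes, input_charset: str) -> str:
    """Decode local input to UCS first, then normalize and lowercase."""
    text = raw.decode(input_charset)  # strict: undecodable input raises
    return ".".join(nameprep(label) for label in text.split("."))


latin1 = canonical_host_name(b"Caf\xe9.Example", "iso-8859-1")
utf8 = canonical_host_name("CAFE\u0301.example".encode("utf-8"), "utf-8")
print(latin1 == utf8)  # True: both are "caf\u00e9.example"
```

The design choice is that normalization and case-mapping operate on decoded `str` values, never on the raw input bytes, which is what makes the two differently encoded inputs converge.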
In those situations where adequate input or output charsets or
encodings are unavailable, applications MAY use ACE to encode
internationalized domain names for the purpose of ensuring that
the data is provided intact. Since ACE is capable of representing
UCS characters as sequences of seven-bit characters, it is
functionally usable as a last line of defense in almost any
environment, with the caveat that ACE encoding sequences are
extremely cryptic and will likely result in lower levels of
usability and functionality.


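One way to honor both the replacement-character prohibition and the ACE fallback is to test, before display, whether the output charset can represent every character of a label. This Python sketch uses a strict encode, so nothing is silently substituted. A None result is the caller's cue to show an ACE form instead. The sketch does not produce that ACE form, because the ACE algorithm is still a placeholder. The function name is hypothetical.

```python
from typing import Optional


def display_form(label: str, output_charset: str) -> Optional[str]:
    """Return the label if output_charset can show it unchanged, else None."""
    try:
        label.encode(output_charset)  # strict: raises instead of substituting
    except UnicodeEncodeError:
        return None  # caller should fall back to an ACE rendering
    return label
```

For example, `display_form("caf\u00e9", "iso-8859-1")` returns the label unchanged. `display_form("\u4e2d\u6587", "iso-8859-1")` returns None.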
6.2. Protocol and Application Data

There are several interrelated issues which will determine an
application's ability to provide or accept internationalized
domain names as protocol or application data, although the
principal determining factor for any such usage will generally be
the capabilities of the underlying protocol itself.

If a protocol allows negotiation or tagging services in order to
distinguish between different encodings, that protocol can likely
be extended to support the use of UTF-8 as protocol or application
data through command/response negotiation options or through
data-type tags. Older protocols which do not provide any
negotiation services or which mandate the use of US-ASCII in all
data will likely require the use of ACE encoded domain names as a
short-term measure until the protocol is made compliant with
BCP18.

* Protocol data. If the protocol supports UTF-8 encoded
  internationalized domain names in commands or responses,
  then that encoding SHOULD be used wherever it is allowed.
  If UTF-8 is not supported by the protocol, STD13 octet
  sequences and/or ACE encoded equivalents of the
  internationalized domain name MUST be used.

  In some cases, this negotiation can be performed on a
  per-session basis; in other cases it will need to be performed
  for each transaction within the session; and in still other
  cases the internationalized domain names will have to be
  tagged whenever they are provided as protocol or application
  data.

  The DNS protocol is itself an example of a protocol which
  requires tagging in order for internationalized domain names
  to be exchanged within the existing DNS message (with these
  indicators taking the form of ACE encoding prefixes and
  EDNS/UTF-8 extended label type codes). Meanwhile, a protocol
  such as WHOIS can theoretically support a session-wide
  negotiation option that allowed the use of internationalized
  domain names as protocol and application data for the
  duration of that session. Conversely, a protocol such as SMTP
  will likely require the use of session-specific identifiers
  for some operations, while other operations may be able to use
  label tags (similar to the existing support for domain
  literals, which are identified by a pair of surrounding square
  brackets).

  Regardless of the encodings which are used, implementations
  MUST map domain names and labels to their canonical UCS
  characters for any normalization and case-conversion work
  which is subsequently required as part of a DNS lookup (see
  section 6.3).

* Structured application data. Structured application data
  such as URLs and email addresses MUST be processed according
  to the rules which govern those data formats. Applications
  MUST NOT perform any conversion or transliteration which is
  not explicitly prescribed by the governing documents, since
  non-standard usages are likely to result in misinterpreted
  data.

* Unstructured application data. Domain names which appear as
  unstructured data in application content are beyond the
  control of this specification, and are generally subject to
  the encoding and formatting desires of the end-users who
  created the data. Generally speaking, it is RECOMMENDED that
  applications allow users to enter or view documents in
  whatever format they prefer, but that any conversion between
  multiple source and destination charsets and encodings use
  UCS as the translation intermediary, such that
  internationalized domain names are properly converted along
  with the rest of the application data.

In some cases, the application will need to probe the resolver
before it can use internationalized domain names as data. For
example, a participating system may need to determine the
internationalized domain name of the local system so that it can
provide this data in a protocol-specific banner message, and in
these cases, the application will have to communicate with the
resolver before this data can be provided.

Due to the usage-specific nature of internationalized domain names
within protocol and application data streams, each development
group will have to analyze the restrictions and capabilities which
affect their specific services independently.


6.3. DNS Lookups and Resolver Calls

One of the most frequent uses for domain names is for lookup
operations, such as locating the IP addresses associated with a
specified domain name, determining the domain name associated with
a specified IP address, or performing a protocol-specific lookup
operation for a specific resource record (such as the MX or SOA
resource records associated with a specific domain).

Since these lookup operations do not directly affect external
protocols or data, internationalized domain names can be used for
lookup operations at the application's discretion. For example,
applications such as ping and netstat only use domain names for
display purposes, and can therefore make immediate use of
internationalized domain names within their protocol operations.
Likewise, a protocol may be limited to STD13 host identifiers in
its protocol data, which will require the application to provide
internationalized domain names as ACE encoded sequences in that
data, but any lookup operations which are necessary for those
internationalized domain names can still be performed in their
native form. In these cases, the protocol operations and lookup
operations are separate tasks with separate rules.

Similarly, applications are not required to use internationalized
|
||
domain names and internationalized resolver APIs for every lookup.
|
||
In some cases, it may be more efficient for an application to only
|
||
use internationalized domain names for lookup operations against
|
||
connection identifiers, and to use STD13 octet sequences or ACE
|
||
encoded legacy lookups for domain names which were obtained as
|
||
|
||
Hall I-D Expires: May 2002 [page 40]
|
||
INTERNET-DRAFT draft-hall-dm-idns-00.txt November 2001
|
||
|
||
|
||
protocol or application data (this will be especially true in
|
||
those cases where the protocol does not yet provide an
|
||
internationalized domain name data-type). In those cases where an
|
||
application prefers to use the legacy resolution path, the
|
||
application MUST use the resolver's legacy APIs. For lookups
|
||
against internationalized domain names, the application MUST use
|
||
the resolver's internationalized APIs.
|
||
|
||
Note that this specification does not define a mandatory encoding
which must be used between the applications and the local
resolver. However, resolvers MUST provide at least one encoding
which is capable of supporting the entire UCS repertoire of
character codes, including character codes which are currently
unassigned. Since UTF-8 is the default encoding which is used
throughout this specification, it is also RECOMMENDED for use with
resolver APIs, although this is not required. Resolvers MAY
dictate a local encoding, with the only requirement being support
for the entire range of UCS character codes.

Regardless of the data being provided or the charset or encoding
which is used to provide that data, applications MUST normalize
and case-convert any internationalized host identifiers which they
generate or receive from a lookup operation. This process MUST
use the canonical UCS characters of the domain name according to
the rules specified in <nameprep> for every host identifier which
is sent to or received from a resolver.

If the application knows that the requested data specifically
refers to a host identifier, then the domain name data which is
returned by the resolver MUST be normalized and case-converted,
and the resulting domain name MUST be compared to the original
domain name which was received prior to the normalization and
case-conversion steps. If the processed domain name does not match
the domain name which was received, the domain name MUST be
discarded as malformed.

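The host identifier check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the <nameprep> profile is not reproduced in this document, so Python's str.casefold() followed by Unicode NFKC normalization (unicodedata.normalize) is used only as a stand-in for its case-mapping and normalization steps, and the function names are hypothetical.

```python
import unicodedata
from typing import Optional


def prepare_host_identifier(name: str) -> str:
    # ASSUMPTION: casefold() followed by NFKC approximates the <nameprep>
    # mapping and normalization steps; it is not the nameprep profile itself.
    return unicodedata.normalize("NFKC", name.casefold())


def validate_host_identifier(name: str) -> Optional[str]:
    """Return the resolver-supplied name if it is already in prepared form;
    return None to signal that it must be discarded as malformed."""
    if prepare_host_identifier(name) != name:
        return None
    return name
```

For example, validate_host_identifier("B\u00fccher.example") returns None because the name was not case-converted, and the decomposed form "bu\u0308cher.example" is rejected because it was not normalized.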
This step is necessary in order to ensure the integrity and
veracity of internationalized domain names which are processed by
applications, since there are multiple opportunities for errors to
be introduced (such as mistyped entries in the resolver's hosts
database, or malicious data which has been purposefully provided
in a zone), and these errors can result in sensitive data being
directed to the wrong network. Note that the above rule
specifically applies to host identifiers and not to all
internationalized domain names as a whole; applications MUST NOT
arbitrarily normalize and case-convert every domain name, but MUST
apply these steps to every domain name which is known to be used
as a host identifier.

As part of the processing rules for DNS lookups, it is expected
that an application can exchange internationalized domain names
with the resolver using a charset or encoding which is capable of
representing the entire UCS character code range. Towards this
objective, applications SHOULD test the capabilities of the
resolver prior to transferring internationalized domain names. In
those situations where the resolver is unable to support this
usage, the application MUST encode the internationalized domain
name as STD13 octet sequences or ACE, and pass the resulting STD13
host identifier to the resolver.


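A minimal Python sketch of this decision follows. It assumes the outcome of the capability test is already known as a boolean (how that test is performed is resolver-specific), and it uses Python's built-in "idna" codec, which implements the Punycode-based ACE later standardized in RFC 3490, purely as a stand-in for the ACE form referenced by this document. The function name is hypothetical.

```python
from typing import Union


def name_for_resolver(name: str, resolver_supports_ucs: bool) -> Union[str, bytes]:
    """Choose the form in which a domain name is handed to the resolver.

    resolver_supports_ucs is the (hypothetical) result of the capability test.
    """
    if resolver_supports_ucs:
        # Pass the UCS name through the internationalized API unchanged.
        return name
    # Otherwise hand over an ACE-encoded STD13 host identifier.
    # ASSUMPTION: Python's "idna" codec stands in for this document's ACE.
    return name.encode("idna")
```

With a resolver that fails the test, name_for_resolver("b\u00fccher.example", False) yields b"xn--bcher-kva.example", while a plain US-ASCII name such as "www.example" passes through as the same octets.
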
7. Resolver Guidelines

Resolvers play a crucial role in the use of internationalized
domain names, in that they provide the internationalized namespace
which applications work with. As part of this service, resolvers
provide encapsulation services for the internationalized domain
names which are exchanged with the applications, resolve queries
in the internationalized namespace on behalf of the applications,
and provide lookup matching for entries which are stored in a
local hosts database. Note that resolvers which cache answer data
for subsequent operations are also governed by the caching
restrictions provided in section 9.


7.1. Resolver APIs

Stub resolvers which communicate directly with applications that
are compliant with this specification are strongly encouraged to
provide a separate set of APIs for those applications to use
whenever internationalized domain names need to be provided in
queries or response messages.

The use of an internationalized API will generally facilitate
smoother operations for the applications, in that it will allow
the application to determine the capabilities of the resolver, to
obtain the internationalized domain name of the local system, and
to process queries for internationalized domain names as special
data types.

Furthermore, the use of internationalized versus legacy APIs
provides a way for resolvers to separate internationalized and
legacy application query paths, such that the legacy APIs only
result in STD13 legacy labels, while the internationalized APIs
generate and trigger EDNS/UTF-8 extended labels. The output
formatting of DNS messages is subject to tight restrictions, and
the use of separate APIs will likely result in simpler resolver
implementations.

In particular, applications are encouraged to use the
internationalized APIs for all of the DNS lookups they generate,
even if the domain name only contains seven-bit characters. This
is necessary in case the queried domain name only has a CNAME or
PTR resource record which references an internationalized domain
name, and the server has to know which encoding to use in its
response. If the client had not used the internationalized API
for the original lookup of the domain name, the resolver might
have chosen the wrong label type, and the response data would then
only be returned as ACE encoded data.

Conversely, malformed eight-bit queries which older applications
generate through the legacy APIs will be properly rejected by the
DNS servers, which prevents these applications from causing undue
problems. For example, an older application may process an
internationalized domain name through the system-default charset
or encoding (such as MacRoman), which would result in the domain
name being malformed when the application tried to do something
important with that domain name (such as send an email message
over SMTP). The use of separate APIs causes such applications to
fail early, and the invalid domain names are kept out of the
application protocol space.

Internationalized APIs are optional to the extent that an
application MAY use an embedded resolver which is known to be
capable of generating and processing internationalized domain
names through the existing function calls. However, the use of
separate APIs for internationalized domain names is encouraged.

Although this document does not mandate any specific APIs, the
following functions SHOULD be provided for in some form:

* Test Wide. Applications MUST be able to test the resolver
for compliance with this specification. In those cases
where this function is performed by some other function
(such as one of the following), the capabilities of the
resolver MUST be detectable even if the requested operation
fails. For example, if an application issues a call for the
internationalized domain name of the local system, the
capability of the resolver to handle internationalized
domain names MUST still be indicated unambiguously, even if
the local host name cannot be determined.

* Get Wide X-By-Y. Applications SHOULD be able to specify any
resource record associated with any internationalized
domain name as part of a lookup operation. Whether this
service is provided as a series of lookup-specific APIs or
as a general purpose API is up to the resolver.

* Get Wide Local Name. Applications which utilize
internationalized domain names as data will need to be able
to determine the internationalized form of their local
system name for some operations (such as a protocol-specific
welcome banner). When this function is called, the
resulting data MUST be provided as the canonical UCS
character code values, or their equivalent as represented
by a locally mandated charset or encoding.

Note that an ACE equivalent of the system name SHOULD be
returned when the relevant legacy API is queried. In those
cases where the legacy and internationalized domain names
both contain seven-bit character codes (possibly because
the host name is only available in US-ASCII, or because the
host name was assigned as ACE by an external configuration
service), the internationalized host name MUST still be
accessible through the internationalized function.

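The three functions listed above might be exposed to Python applications roughly as follows. This is only a sketch: the method names (test_wide, get_wide, get_wide_local_name), the use of Python str values for UCS names, and the table-backed TableResolver used to exercise the interface are all hypothetical and are not defined by this document.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class InternationalizedResolverAPI(ABC):
    """Hypothetical shape of the functions listed above."""

    @abstractmethod
    def test_wide(self) -> bool:
        """Report whether the resolver complies with this specification."""

    @abstractmethod
    def get_wide(self, name: str, rrtype: str) -> List[str]:
        """Look up any resource record type for a UCS domain name."""

    @abstractmethod
    def get_wide_local_name(self) -> Optional[str]:
        """Return the canonical UCS local host name, or None if unknown.
        test_wide() must still answer correctly when this returns None."""


class TableResolver(InternationalizedResolverAPI):
    """Toy implementation backed by an in-memory table (illustration only)."""

    def __init__(self, records: Dict[Tuple[str, str], List[str]],
                 local_name: Optional[str] = None) -> None:
        self._records = records
        self._local_name = local_name

    def test_wide(self) -> bool:
        return True

    def get_wide(self, name: str, rrtype: str) -> List[str]:
        return list(self._records.get((name, rrtype), []))

    def get_wide_local_name(self) -> Optional[str]:
        return self._local_name
```

Keeping test_wide() separate from get_wide_local_name() is one way to meet the Test Wide requirement that the resolver's capabilities remain detectable even when the local host name cannot be determined.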
Note that this specification does not mandate a charset or
encoding for the resolver APIs. However, wherever an
internationalized API is presented, the resolver MUST utilize a
charset or encoding which supports the entire UCS repertoire of
character codes, including character codes which are currently
unassigned. Since UTF-8 is the default charset for most of the
operations specified in this document, it is also RECOMMENDED for
this service, but is not required.


7.2. Query Processing Services

Resolvers which are compliant with the recommendations provided in
this specification will provide two query paths, one of which
supports STD13 domain names and another which supports
internationalized domain names. Technically, there is no
requirement for two processing paths, although these paths will
likely exist as conceptual paths even if they are not represented
or implemented uniquely in all resolvers.

The legacy processing path is defined by STD13. This document does
not update, modify or extend the rules that resolvers operate
under when an STD13 compliant domain name is received from a
legacy application through any legacy APIs which may exist.
However, when an internationalized domain name is received from an
internationalized application through any internationalized APIs,
the processing rules defined in this section MUST be followed.
Note that these rules apply to all resolvers, whether they are
stub resolvers, forwarders or caching servers.

Generally speaking, the internationalized domain name resolution
process has two major components: processing internationalized
domain names as queries, and performing fall-back processing if an
EDNS/UTF-8 query is rejected by an authoritative server.


7.2.1. Internationalized queries

Queries for internationalized domain names which are received
through internationalized APIs can be expected to have originated
at an application which is capable of accepting and processing
internationalized domain names in the response messages.

Resolvers MUST encode the labels from the queried domain name as
UTF-8 and encapsulate the resulting encoded labels into EDNS/UTF-8
extended labels for transfer within DNS messages, per the
instructions provided in section 5.1.

Any and all responses to these queries will also be encoded as
UTF-8 and encapsulated in EDNS/UTF-8 extended labels. Resolvers
MUST decode the provided response data, convert the labels to
their canonical UCS character codes, and return the requested data
to the calling application.

The resolver MUST NOT normalize or case-convert internationalized
domain names which may be received in queries or response
messages. Since the queries have originated from applications
which have indicated that they are compliant with this
specification (via the API) while the responses will have
originated from caches or servers which indicate that they are
also compliant (via the EDNS/UTF-8 extended labels), those systems
are assumed to have normalized and case-converted the domain names
before they were generated or stored. Also note that applications
will validate the host identifiers that they receive in response
messages, so an additional check is expected to be performed on
the answer data by those systems.


7.2.2. Fall-back processing

If a queried server is unable to process EDNS/UTF-8 extended
labels, then it is required by STD13 to generate an error
signifying the problem. Resolvers MUST interpret these errors,
decode the UTF-8 queried domain name, re-encode it as STD13 octets
and/or ACE per the instructions provided in section 5.2, and then
reissue the query as an STD13 legacy label sequence.

The legacy DNS error responses which will trigger this series of
events are FORMERR and NOTIMP. Any other errors indicate that the
EDNS/UTF-8 extended label was successfully processed but that the
query was not matched, and those errors MUST be returned to the
application. If the fall-back processing results in any error
responses whatsoever, then the resolver MUST return those errors
to the calling application.

Any servers which subsequently receive the fall-back queries and
which are compliant with this specification will process the
queries as internationalized domain names, and will return the
answer data as STD13 octet sequences or ACE encoded data, using
the STD13 legacy label.

Generally speaking, fall-back processing serves two purposes:

* Answering the initial query. If a UTF-8 domain name cannot
be resolved because a server in the delegation path does
not understand the EDNS/UTF-8 label type, the resolver can
reissue the query as an ACE encoded legacy label type so
that the query proceeds past the problematic server.

* Seeding the resolver's cache. As a result of the above, the
resolver will learn about the authoritative name servers
for the target zone, and this information can be used for
any subsequent queries for domain names within the
specified zone (for as long as the data remains cached).
As such, any subsequent EDNS/UTF-8 queries which are issued
for the portion of the namespace served by that zone will
be sent directly to one of those authoritative servers
where they can be answered. In this regard, subsequent
lookups do not require fall-back processing if they are
received during the cache window.

Regardless of whether or not fall-back processing has been
performed, if the calling application issued the original query as
an internationalized domain name, then the resolver MUST respond
to the query in that form as well. This means that the resolver
MUST convert any STD13 octet sequences or ACE encoded labels into
their canonical UCS characters, convert the answer data into the
resolver's native charset or encoding, and return the data to the
calling process. The resolver MUST NOT perform any normalization
or case-conversion during this process, as such an action can
corrupt domain names which are not used for host identifiers.

If the original query was received through the resolver's legacy
APIs, then the query MUST be generated and returned in the legacy
format, and MUST NOT be converted to an internationalized domain
name prior to the query or response being passed through.

Once fall-back processing occurs, the process MUST NOT be repeated
for any additional queries in the current lookup operation. Any
remaining queries in the current lookup operation MUST NOT be sent
as EDNS/UTF-8 extended labels, since multiple fall-back operations
can result in time-outs on the client systems.

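The fall-back rules in this section can be illustrated with the following Python sketch. It is not a resolver implementation: send_query is a hypothetical transport callback (it takes the query name as octets plus a flag selecting EDNS/UTF-8 extended labels, and returns an RCODE with the answer names as octets), the extended label encoding of section 5.1 is not shown, and Python's "idna" codec stands in for the ACE encoding of section 5.2 (note that this codec also applies nameprep when encoding, which section 5.2 may not require). The RCODE values 1 (FORMERR) and 4 (NOTIMP) are those assigned by STD13.

```python
from typing import Callable, List, Tuple

FORMERR = 1  # STD13 RCODE 1: format error
NOTIMP = 4   # STD13 RCODE 4: not implemented

# Hypothetical transport: (qname octets, use EDNS/UTF-8 label?) -> (rcode, answer names)
SendQuery = Callable[[bytes, bool], Tuple[int, List[bytes]]]


def resolve_with_fallback(name: str, send_query: SendQuery) -> Tuple[int, List[str], bool]:
    """Return (rcode, answer names as UCS strings, whether fall-back was used)."""
    rcode, answers = send_query(name.encode("utf-8"), True)
    if rcode not in (FORMERR, NOTIMP):
        # The extended label was understood: success or any other error
        # goes straight back to the application.
        return rcode, [a.decode("utf-8") for a in answers], False

    # Fall back exactly once: reissue the query as ACE-encoded legacy labels.
    # Nothing after this point is sent as an EDNS/UTF-8 extended label.
    rcode, answers = send_query(name.encode("idna"), False)
    # Convert ACE answers back to UCS before returning them to the caller;
    # Python's idna decoder leaves plain US-ASCII labels as they are.
    return rcode, [a.decode("idna") for a in answers], True
```

With a server that answers the extended query with FORMERR and the legacy query with b"mail.xn--bcher-kva.example", the function returns (0, ["mail.b\u00fccher.example"], True). An NXDOMAIN (RCODE 3) from a server which understood the extended label is returned unchanged, without fall-back.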
Because the fall-back process results in two lookups being issued
against the rejecting zone, eliminating the fall-back processing
as soon as possible will be an operational requirement for many
organizations. Any caches or forwarders which are used by stub
resolvers within an end-user network are practically required to
be able to process the EDNS/UTF-8 queries, since those servers
will receive every query which is issued by the stub resolvers.
While this isn't a technical requirement (fall-back processing
will get around the problematic servers), it will likely prove to
be a consideration for network operators looking to support
internationalized domain names on their local networks.

This document also strongly encourages the root and TLD servers to
be upgraded as soon as possible (even if they do not intend to
directly provide UTF-8 domain name delegations), in order to allow
those servers to read and process the EDNS/UTF-8 extended labels,
thereby reducing the number of fall-back queries which are sent to
those servers.


7.3. The Hosts Database

Generally speaking, there are two areas of consideration for stub
resolvers that provide local hosts databases for name resolution
services. These are the input requirements for internationalized
domain names which will be added to the hosts database, and the
requirements which govern how queries will be compared to the
entries in the hosts database.

Note that resolvers are not required to implement a hosts database
or local lookup services (STD3 says "a host MAY also implement a
host name translation mechanism that searches a local Internet
host table"). However, wherever a hosts database is provided with
an internationalized resolver, compliance with the rules specified
in this section is required.

If a stub resolver offers the capability to compare
internationalized domain names against a local hosts database,
that database MUST be compatible with the internationalized domain
name rules specified in section 4 of this document.

In particular, the resolver SHOULD allow internationalized domain
names with any code values to be stored, even if the canonical UCS
characters for those values are undefined or are illegal for use
with internationalized host identifiers (this is required to
support domain names which are not host identifiers). In those
cases where an internationalized domain name specifies an exact
sequence of octets for binary comparison, the hosts database MUST
provide a mechanism for tagging the eight-bit characters so that
they are not interpreted, processed or compared as the canonical
UCS character equivalents of those codes.

However, entries which explicitly provide host identifiers MUST be
normalized and case-converted prior to being stored. In order to
satisfy both of these requirements, it is RECOMMENDED that hosts
databases store internationalized host identifiers as untagged
data, but that they also provide some sort of tagging service for
character code values which are to be returned as-is. STD13
defines an escaping mechanism whereby the decimal value of the
octet is prefaced with a reverse-solidus (such as "\193"), which
is suggested for this usage.

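As an illustration of the tagging approach, the following Python sketch splits a hosts database label into untagged UCS characters and tagged octets written with the STD13 "\DDD" escape. It is a sketch under stated assumptions: the function name and the list-of-str-or-int representation are hypothetical, and only the decimal "\DDD" form is handled (STD13's "\X" form for quoting a single non-digit character is not).

```python
import re
from typing import List, Union

# STD13 escape for a raw octet: a reverse-solidus followed by exactly three
# decimal digits, e.g. "\193".
_DDD = re.compile(r"\\([0-9]{3})")

Item = Union[str, int]  # str = untagged UCS character, int = tagged octet


def parse_label(text: str) -> List[Item]:
    """Split a hosts-database label into UCS characters and tagged octets."""
    items: List[Item] = []
    pos = 0
    for match in _DDD.finditer(text):
        items.extend(text[pos:match.start()])  # untagged characters, one each
        value = int(match.group(1))
        if value > 255:
            raise ValueError("escape out of octet range: " + match.group(0))
        items.append(value)  # tagged: compared only as this exact octet
        pos = match.end()
    items.extend(text[pos:])
    return items
```

Because the tagged octet 252 and the UCS character U+00FC have the same numeric value but different types, parse_label("b\\252cher") and parse_label("b\u00fccher") never compare equal, which is the separation this section asks for.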
The storage format of the hosts database MAY use any charset or
encoding the resolver deems most suitable for that platform, as
long as the rules and restrictions provided above are followed.
Since UTF-8 is used as the default encoding throughout this
specification, it is RECOMMENDED as the default encoding for hosts
databases as well, although this is not required.

Not all of the applications which use a resolver are likely to be
compliant with this specification, so resolvers MUST ensure that
they are able to interpret and process any queries from the legacy
APIs which provide the ACE equivalent of an internationalized
domain name that is stored in the hosts database. When such a
query arrives, the domain name MUST be converted to the canonical
UCS character codes represented by the ACE encoded sequence and
compared to entries in the hosts database in that form (tagged
octets excluded). Any internationalized domain names which are
required to be returned through the legacy APIs MUST be converted
to STD13 octet sequences and/or ACE before they are returned.


8. Server Guidelines

When a zone administrator desires to provide internationalized
domain names in a zone, they are presented with two options: they
can add the STD13 octets or ACE encoded internationalized domain
names to an existing zone, or they can use internationalized zone
databases directly. Both of these usage scenarios have their own
benefits and restrictions.

Using STD13 octet sequences and ACE with legacy servers allows for
the immediate deployment of internationalized domain names on
existing servers, and within hierarchies which include
internationalized domain names. However, queries for these names
which originate at applications that are compliant with this
specification will initially fail, guaranteeing that fall-back
processing will always occur for those zones.

Conversely, using internationalized zones directly allows servers
to process legacy, ACE and EDNS/UTF-8 queries equally, thereby
providing greater value to the applications and resolvers which
have been made compliant with this specification. However,
internationalized zones have additional requirements (most
notably, all of the servers which replicate such a zone must be
upgraded together), and these will prove burdensome to some zone
operators.

This specification focuses on the processing requirements for
internationalized zones which support the use of internationalized
domain names as explicit data, and which also support the
necessary subordinate mechanisms such as EDNS/UTF-8 queries. When
STD13 octet sequences or ACE encoded domain names are used with
legacy servers, the rules defined in STD13 for those servers MUST
be used.

Note that each zone SHOULD be configurable independently. If a
server hosts multiple zones, each of those zones SHOULD be
operable as an independent entity, with any of them using ACE or
internationalized domain names as necessary. This rule is
necessary since each zone is likely to have different replication
partners and configuration rules which will require different
migration strategies.


8.1. Internationalized Zones

All domain names which are published by an internationalized zone
MUST be compatible with the restrictions specified in section 4 of
this document. In particular, the zone database MUST allow binary
domain names to contain any octet value, but MUST also comply
with the normalization and case-mapping rules when a domain name
represents a host identifier. These restrictions MUST be applied
as part of the process in which the domain name is being added to
the zone database. In those cases where an internationalized
domain name specifies an exact sequence of octets for binary
comparison, the zone database MUST provide a mechanism for
tagging the eight-bit characters so that they are not interpreted,
processed or compared as the canonical UCS character equivalents
of those codes. STD13 defines an escaping mechanism whereby the
decimal value of the octet is prefaced with a reverse-solidus
(such as "\193"), which is suggested for this usage.

Servers which are compliant with this specification MUST be
capable of providing UTF-8 and ACE encoded representations of the
UCS domain names which are stored in the zone. Servers MUST also
restrict output to only one label type for any protocol operation:
queries containing STD13 legacy labels MUST be answered with STD13
octet sequences and/or ACE encoded domain names, while EDNS/UTF-8
queries MUST only be answered with UTF-8 encoded domain names.
This applies not only to basic operations such as simple queries,
but also to advanced operations such as zone transfers (see
section 8.2). Similarly, external operations such as exporting the
contents of the zone to a master file (as discussed in section
8.3) MUST result in a single encoding form being used for that
specific operation.

Note that the underlying zone database technology which may be
employed by any particular server is beyond the scope of this
document. Servers MAY use any database technology, charset or
encoding deemed appropriate for the local environment, although
the contents of the zone MUST be mapped to the canonical UCS
character codes for all comparison operations (octet values
excluded). Since UTF-8 is used as the default encoding throughout
this specification, it is RECOMMENDED for use as the default
encoding with zone databases as well, but is not required.

Servers MUST NOT normalize or case-map any UCS characters which
are decoded from UTF-8 or ACE encoded labels, and MUST restrict
comparison operations of these labels to precise matches of the
UCS domain names which are stored in the zone database. However,
the seven-bit character codes from any labels which are received
as STD13 octet sequences MUST be compared in a case-neutral form,
and MUST NOT be normalized as part of the comparison operation.

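The two comparison modes described above can be sketched in Python as follows. The sketch assumes, hypothetically, that both the stored label and the received label have already been reduced to sequences of integer code values (UCS code points for the stored label; UCS code points or raw octet values for the received label). How section 4 relates eight-bit octets to UCS code points is not reproduced here, so eight-bit values are simply compared numerically; the function names are illustrative only.

```python
from typing import Sequence


def _fold_ascii(code: int) -> int:
    """Map US-ASCII A-Z onto a-z; leave every other code value unchanged."""
    return code + 0x20 if 0x41 <= code <= 0x5A else code


def labels_match(stored: Sequence[int], received: Sequence[int],
                 received_as_octets: bool) -> bool:
    """Compare one label from the zone with one label from a query."""
    if len(stored) != len(received):
        return False
    if not received_as_octets:
        # Decoded from UTF-8 or ACE: precise match, no normalization or case-mapping.
        return list(stored) == list(received)
    # STD13 octet sequences: seven-bit codes compare case-neutrally, nothing is
    # normalized, and eight-bit values must match exactly.
    return all(_fold_ascii(s) == _fold_ascii(r) for s, r in zip(stored, received))
```

For example, a legacy query for "MAIL" matches a stored "mail", while the same string decoded from an EDNS/UTF-8 label does not; eight-bit octets such as 0xFC and 0xDC are never treated as case variants of each other.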
When a zone is converted to support internationalized domain
names, all of the servers which replicate that zone MUST be
upgraded. This is required due to ambiguities that can occur with
labels which may be encoded as either STD13 octet sequences or ACE
data, and where the label only uses character codes from the
eight-bit range of character codes (this problem is described in
detail in section 4.1.2). In order to ensure that all of the
servers for a zone respond to one of those queries correctly, all
of the servers which replicate the zone MUST fully support this
document and its requirements.


8.2. Namespace Visibility Restrictions

In all cases, the encoding format of the domain names which are
returned in response to a query MUST be the same as the encoding
format which was used by the query. If the query was provided as a
sequence of legacy labels, then all of the domain names which are
provided in the response message MUST be provided as legacy labels
(containing either ACE or STD13 octet encoded values).

Similarly, if a query is provided as EDNS/UTF-8 encoded data, all
domain names which are provided in the response message MUST be
provided as UTF-8 encoded data in EDNS/UTF-8 extended labels. In
some situations, this process may require the server to perform an
extra conversion.

For example, assume that the <idn>.example.com. domain name has
two associated MX resource records, one of which points to the UCS
domain name of mail.<idn>.example.com, while the other points to
the ACE encoded domain name of mail.<ace>.example.net. (where the
"<ace>" label is the ACE equivalent of an internationalized
subdomain in the example.net. zone). If a UTF-8 query arrives for
the MX resource records associated with the <idn>.example.com.
domain name, both resource records MUST be returned as EDNS/UTF-8
data. In order for this requirement to be satisfied, the server
will have to decode the <ace> label to its canonical UCS form for
zone storage purposes, and encode the domain name as UTF-8 for
transmission whenever an EDNS/UTF-8 answer set is required.

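The conversion in this example can be sketched in Python. The sketch assumes the zone keeps every name in canonical UCS form (Python str), uses the hypothetical concrete label "b\u00fccher" in place of <idn> and <ace>, and again uses Python's "idna" codec only as a stand-in for the ACE encoding. How the resulting names are packed into legacy or EDNS/UTF-8 extended labels is not shown.

```python
from typing import List

# Zone data as loaded: the ACE target is decoded to UCS once, at load time.
# ASSUMPTION: "b\u00fccher" stands in for the <idn>/<ace> labels of the example.
MX_TARGETS: List[str] = [
    "mail.b\u00fccher.example.com",
    b"mail.xn--bcher-kva.example.net".decode("idna"),
]


def render_names(names: List[str], query_used_extended_labels: bool) -> List[bytes]:
    """Encode every name in an answer using the label type of the query."""
    if query_used_extended_labels:
        return [n.encode("utf-8") for n in names]  # EDNS/UTF-8 answer set
    return [n.encode("idna") for n in names]       # legacy (ACE) answer set
```

Each call produces a single-encoding answer set: both MX targets come back as UTF-8 for an EDNS/UTF-8 query, and both come back as ACE (for example b"mail.xn--bcher-kva.example.com") for a legacy query.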
The visibility rules specified in this section are mandatory for
every domain name which is provided in any message. If a system
requests a zone transfer and uses the EDNS/UTF-8 extended label
type in the request, all of the domain names in all of the
messages which are sent as part of the zone transfer MUST be
provided in their UTF-8 encoded form. Similarly, if a zone
transfer is requested and uses the legacy label type, then all of
the domain names from all of the messages which are sent as part
of the zone transfer MUST be provided as either STD13 octet
sequences or ACE encoded data, using the legacy label type.


8.3. The Master File Format

STD13 specifies a "master file" format which is used as a
platform-neutral storage and transfer format for importing and
exporting the contents of a particular zone. Note that the master
file is not the same as the operating database for a zone; the
master file format is used (or is useful) for copying a zone to
another server, storing a copy of the zone database off-line,
emailing a copy of the zone to another user or system, and
performing other off-line actions against the database contents.
Once a zone is loaded on a server, however, any database
technology can be used for managing the zones and generating
response messages.

In order to facilitate the continued use of master files, any zone
|
||
which is compliant with this specification MUST support the use of
|
||
UTF-8 as an import and export encoding format for the master file
|
||
associated with that zone.
|
||
|
||
Furthermore, compliant versions of a master file are required to
|
||
have the "$UTF-8" control literal at the beginning of the first
|
||
line of text in the master file if it contains UTF-8 encoded data.
|
||
Master files from zones which do not contain UTF-8 encoded domain
|
||
|
||
Hall I-D Expires: May 2002 [page 52]
|
||
INTERNET-DRAFT draft-hall-dm-idns-00.txt November 2001
|
||
|
||
|
||
names MUST NOT contain the "$UTF-8" control literal in the first
|
||
print position of any line.
|
||
|
||
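On the export side, the "$UTF-8" rule might look like the following Python sketch. Two assumptions are made here that this section does not state: "contains UTF-8 encoded data" is read as "contains any non-ASCII character", and the control literal is written on a line of its own. The function name is hypothetical.

```python
from typing import List

UTF8_LITERAL = "$UTF-8"


def export_master_file(lines: List[str]) -> bytes:
    """Serialize zone text, prepending the $UTF-8 literal only when required."""
    # Zone text itself must never start a line with the control literal.
    if any(line.startswith(UTF8_LITERAL) for line in lines):
        raise ValueError("zone text must not start a line with $UTF-8")
    body = "\n".join(lines) + "\n"
    # Assumption: "contains UTF-8 encoded data" means "contains non-ASCII
    # text"; pure-ASCII master files stay legacy and carry no literal.
    if not body.isascii():
        body = UTF8_LITERAL + "\n" + body
    return body.encode("utf-8")
```

A pure-ASCII zone is exported unchanged, while a zone containing a label such as "bücher" is exported with "$UTF-8" as its first line.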
If the master file contains the "$UTF-8" control literal, all of
the data within the master file MUST be encoded in UTF-8 as
specified by RFC2279, and SHOULD be managed with UTF-8 compliant
tools (such as UTF-8 text editors, mailers that support UTF-8 MIME
encodings, and so forth).
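A loader could enforce both master file rules before parsing, as in this hypothetical Python sketch. Note that Python's UTF-8 codec follows RFC 3629, which is stricter than RFC2279 (no 5- or 6-octet sequences and nothing above U+10FFFF), so this check may reject some octet sequences that RFC2279 would have allowed.

```python
UTF8_LITERAL = b"$UTF-8"


def master_file_is_utf8(data: bytes) -> bool:
    """Classify a master file and enforce the $UTF-8 rules before loading.

    Returns True for a UTF-8 master file and False for a legacy one. Raises
    ValueError (UnicodeDecodeError is a subclass) if a rule is violated.
    """
    lines = data.split(b"\n")
    if lines[0].startswith(UTF8_LITERAL):
        # All data must be valid UTF-8. Python's codec follows RFC 3629,
        # which is stricter than RFC2279.
        data.decode("utf-8")
        return True
    # Legacy file: no other line may start with the control literal.
    if any(line.startswith(UTF8_LITERAL) for line in lines[1:]):
        raise ValueError("$UTF-8 literal found after the first line")
    return False
```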
9. Caching Guidelines

Whenever an internationalized domain name is stored in a cache, it
MUST be stored in its canonical UCS character code form,
regardless of whether the domain name was received as STD13 octet
sequences, UTF-8, or ACE data. Caches MUST NOT normalize or
case-convert any domain names that they store, as such a process
could invalidate domain names that are not used as host
identifiers.
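The storage rule can be sketched in Python as a conversion from each received label form to the UCS string kept in the cache. Assumptions: the "xn--" ACE prefix is hypothetical, RFC 3492 Punycode stands in for AMC-ACE-Z, and legacy STD13 labels that are not ACE are treated as ASCII. The sketch deliberately applies no Unicode normalization and no case conversion.

```python
ACE_PREFIX = b"xn--"  # hypothetical ACE prefix, for illustration only


def to_ucs_label(raw: bytes, utf8_label_type: bool) -> str:
    """Convert one received label to the canonical UCS form stored in a cache.

    No Unicode normalization and no case conversion is applied.
    """
    if utf8_label_type:
        return raw.decode("utf-8")  # label arrived as EDNS/UTF-8
    if raw[:len(ACE_PREFIX)].lower() == ACE_PREFIX:
        # ACE label; RFC 3492 Punycode stands in for AMC-ACE-Z.
        return raw[len(ACE_PREFIX):].decode("punycode")
    # Assumption: other legacy STD13 labels are treated as ASCII here.
    return raw.decode("ascii")
```

The ACE label b"xn--bcher-kva" and the UTF-8 label for "bücher" produce the same cached string, while the decomposed sequence "bu\u0308cher" stays distinct, and a legacy label such as b"WWW" keeps its original case.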
Any subsequent queries which are processed through the cache MUST
be compared against the stored UCS characters. Internationalized
domain name labels which are decoded from UTF-8 or ACE labels MUST
NOT be normalized or case-converted as part of the comparison
operation, although labels which are provided as STD13 octet
sequences MUST be compared as case-neutral octet values.
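The two comparison rules can be illustrated with a short Python sketch. The query-form tags ("std13", "utf8", "ace") are hypothetical names for the form in which the query label arrived, and the query label is assumed to have already been converted to UCS. Only the ASCII letters are folded for STD13 labels, because str.lower() would also fold non-ASCII letters.

```python
# Hypothetical tags for the form in which a query label arrived.
QUERY_FORMS = {"std13", "utf8", "ace"}


def ascii_fold(text: str) -> str:
    """Fold only A-Z to a-z; str.lower() would also fold non-ASCII letters."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def label_matches(stored: str, query: str, query_form: str) -> bool:
    """Compare a query label (already converted to UCS) with a cached label."""
    if query_form not in QUERY_FORMS:
        raise ValueError("unknown query form: " + query_form)
    if query_form == "std13":
        # STD13 octet sequences are compared case-neutrally (ASCII only).
        return ascii_fold(stored) == ascii_fold(query)
    # Labels decoded from UTF-8 or ACE are compared code point for code
    # point, with no normalization and no case conversion.
    return stored == query
```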
Caches MUST be capable of providing UTF-8 and ACE encoded
representations of the UCS domain names which are stored in the
cache, with the appropriate format determined by the format used
in the corresponding query. However, answer data MUST be
restricted to only one encoding form for any protocol operation,
meaning that queries containing legacy labels MUST only be
answered with STD13 octet sequences and/or ACE encoded labels,
while UTF-8 queries MUST only be answered with UTF-8 encoded
domain names.


10. Security Considerations

This document defines an extension to the domain name system, and
as such, it inherits the weaknesses which already exist in DNS.
Where possible, this specification strengthens DNS with multiple
checks. For example, this specification requires that domain names
be validated three times before they are used by applications:
once when they are first specified, once on entry into the
authoritative zone or hosts database, and once again when the
answer data is received by the requesting application. Despite
these checks, the underlying weaknesses inherent in DNS are still
present.

This document uses multiple encoding algorithms (UTF-8 and ACE),
although the boundary conditions of the existing DNS, such as the
STD13 label and domain name length limits, are preserved for both
the source and encoded representations.
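One reading of the boundary-condition requirement is that a name must fit the STD13 limits (63 octets per label, 255 octets per name in wire format) in every representation it may be sent in. The Python sketch below checks both the UTF-8 and the ACE forms under that reading; the "xn--" ACE prefix is hypothetical and RFC 3492 Punycode stands in for AMC-ACE-Z.

```python
ACE_PREFIX = b"xn--"   # hypothetical ACE prefix
MAX_LABEL_OCTETS = 63  # STD13 limit per label
MAX_NAME_OCTETS = 255  # STD13 limit per name, in wire format


def wire_length(labels: list) -> int:
    # One length octet per label, the label octets, and the final root octet.
    return sum(1 + len(label) for label in labels) + 1


def ace_label(label: str) -> bytes:
    if label.isascii():
        return label.encode("ascii")
    return ACE_PREFIX + label.encode("punycode")


def within_std13_bounds(name: str) -> bool:
    """Check both the UTF-8 and the ACE representation against STD13 limits."""
    labels = name.rstrip(".").split(".")
    utf8_form = [label.encode("utf-8") for label in labels]
    ace_form = [ace_label(label) for label in labels]
    for form in (utf8_form, ace_form):
        if any(len(label) > MAX_LABEL_OCTETS for label in form):
            return False
        if wire_length(form) > MAX_NAME_OCTETS:
            return False
    return True
```

A label of 32 "ü" characters is rejected because its UTF-8 form needs 64 octets, even though the same label has only 32 characters.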
11. IANA Considerations

This document requires the use of an EDNS extended label type
(ELT) identification code, as defined by RFC2671. This document
uses the ELT code b000011.
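Under RFC2671, the two high-order bits of a label's first octet select the label type, and the value 0b01 marks an extended label whose low six bits carry the ELT code. The Python sketch below shows how the ELT code b000011 maps onto that first octet. Only the first octet is modeled; the octets that follow it are not represented here, and the function names are hypothetical.

```python
# RFC 2671: the two high-order bits of a label's first octet select the
# label type; 0b01 marks an extended label whose low six bits are the ELT.
LABEL_KINDS = {0b00: "normal", 0b01: "extended", 0b10: "reserved",
               0b11: "compression pointer"}
ELT_UTF8 = 0b000011  # ELT code used by this document


def extended_label_first_octet(elt: int) -> int:
    """Build the first octet of an extended label carrying the given ELT."""
    if not 0 <= elt <= 0b111111:
        raise ValueError("an ELT code must fit in six bits")
    return (0b01 << 6) | elt


def classify_first_octet(octet: int) -> str:
    """Report the label type encoded in a label's first octet."""
    return LABEL_KINDS[(octet >> 6) & 0b11]
```

With ELT code b000011, the first octet of an EDNS/UTF-8 label is 0x43.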
12. References

[AMC-ACE-Z] <draft-ietf-idn-amc-ace-z>, "AMC-ACE-Z version
0.3.1"

[NAMEPREP] <draft-ietf-idn-nameprep>, "Preparation of
Internationalized Host Names"

[RFC2119] "Key words for use in RFCs to Indicate Requirement
Levels"

[RFC952] "DoD Internet host table specification"

[STD13] (RFC 1034) "Domain names - concepts and facilities",
(RFC 1035) "Domain names - implementation and specification"

[STD3] (RFC 1122) "Requirements for Internet Hosts --
Communication Layers", (RFC 1123) "Requirements for Internet
Hosts -- Application and Support"

[BCP18] (RFC 2277) "IETF Policy on Character Sets and
Languages"

[RFC2279] "UTF-8, a transformation format of ISO 10646"

[RFC2671] "Extension Mechanisms for DNS (EDNS0)"

[ASCII] "ANSI X3.4-1968. USA Standard Code for Information
Interchange"

[ISO10646] "ISO/IEC 10646-1:2000. International Standard --
Information technology -- Universal Multiple-Octet Coded
Character Set (UCS) -- Part 1: Architecture and Basic
Multilingual Plane"


13. Acknowledgements

This document is an assembly of multiple ideas and proposals which
have been made on the IDN working group mailing list. Many of the
ideas presented here have been proposed by multiple parties in one
form or another, although Dan Oscarsson is credited with proposing
a dual-mode operation which is capable of simultaneously
supporting UTF-8 and legacy mode encodings. Other contributors to
key elements of this specification (some of them unknowingly or
unwillingly) include (alphabetically) Marc Blanchet, Adam
Costello, Mark Davis, Martin Duerst, Patrik Faltstrom, Paul
Hoffman, David Hopwood, and many others.


14. Editor's Address

Eric A. Hall
ehall@ehsco.com