Internet Draft                                              Paul Hoffman
draft-ietf-idn-compare-01.txt                                 IMC & VPNC
July 11, 2000
Expires in six months

         Comparison of Internationalized Domain Name Proposals

Status of this memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.


Abstract

The IDN Working Group is working on proposals for internationalized
domain names that might become a standard in the IETF. Before a single
full proposal can be made, competing proposals must be compared on a
wide range of requirements and desired features. This document compares
the many parts of a comprehensive protocol that have been proposed. It
is the companion document to "Requirements of Internationalized Domain
Names" [IDN-REQ], which lays out the requirements for the
internationalized domain name protocol.


1. Introduction

As the IDN Working Group has discussed the requirements for IDN,
suggestions have been made for various candidate protocols that might
meet the requirements. These proposals have been somewhat helpful in
bringing up real-world needs for the requirements.

It became clear that no single proposal had wide agreement from the
working group. In fact, the authors of various proposals found
themselves taking some features from other proposals as they revised
their drafts. At the same time, working group participants were making
suggestions for incremental changes that might affect more than one
proposal.

Because of this mixing and matching, it was decided that this IDN
comparisons document should compare features that might end up in the
final protocol, not full protocol suggestions themselves. The features
that had been discussed in the working group were divided by function,
and appear in this document in separate sections. For each function,
there are multiple suggestions for protocol elements that might meet the
requirements that are described in [IDN-REQ].

This document is being discussed on the "idn" mailing list. To join the
list, send a message to <majordomo@ops.ietf.org> with the words
"subscribe idn" in the body of the message. Archives of the mailing list
can also be found at ftp://ops.ietf.org/pub/lists/idn*.

1.1 Format of this document

Each section covers one feature that has been discussed as being part of
the final IDN solution. Within each section, alternate proposals are
listed with the major perceived pros and cons of the proposal. Also,
each proposal is given a label to make discussion of this document (and
of the proposals themselves) easier.

References to the numbered requirements in [IDN-REQ] are from version
-02 of that document. These numbers are expected to change as the
requirements document evolves. In this draft, the requirements are shown
as "[#n-02]", where "n" is the requirement number from draft -02 of
[IDN-REQ]. This document only lists where particular proposals don't
meet particular requirements from [IDN-REQ], not the ones that they
fulfill.

Note that this document is supposed to reflect the discussion of all
proposed alternatives, not just the ones that fully match the
requirements in [IDN-REQ]. It will serve as a summary of the discussion
in the IDN WG for readers in the future who may want to know why certain
alternatives were not chosen for the eventual protocol.

The proposal drafts covered in this document are:

[DUERST] Character Normalization in IETF Protocols,
draft-duerst-i18n-norm-03

[IDNE] Internationalized domain names using EDNS (IDNE),
draft-ietf-idn-idne-01

[KWAN] Using the UTF-8 Character Set in the Domain Name System,
draft-skwan-utf8-dns-03

[RACE] RACE: Row-based ASCII Compatible Encoding for IDN,
draft-ietf-idn-race-00

[SENG] UTF-5, a transformation format of Unicode and ISO 10646,
draft-jseng-utf5-01

[UDNS] Using the Universal Character Set in the Domain Name System
(UDNS), draft-ietf-idn-udns-00


2. Architecture

One of the biggest questions raised early in the IDN discussion was what
the format of internationalized name parts would be on the wire, that
is, between the user's computer and the DNS resolvers. It was agreed
that the DNS protocols certainly allow non-ASCII octets in domain name
parts and resource records, but there was also acknowledgement that many
protocols that rely on the DNS could not handle non-ASCII names due to
the design of the protocol. Section 3.1 of this document describes the
proposed encodings for the non-ASCII name parts.

Because of requirement [#2-02], there were proposals for
ASCII-compatible encodings (ACEs) of non-ASCII characters. Different
ACEs were proposed (and are discussed in Section 4 of this document),
but they all have the same goal: to allow non-ASCII characters to be
represented in host names that conform to RFC 1034 [RFC1034].

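The goal shared by the ACE proposals can be illustrated with a small
check of whether a name part fits the preferred host name syntax of
RFC 1034: letters, digits, and hyphens only, beginning with a letter,
ending with a letter or digit, and at most 63 octets long. The Python
sketch below is illustrative only. The function name is_rfc1034_label
is an assumption of this example and does not come from any of the
proposals.

```python
import re

# Preferred name syntax from RFC 1034: a label starts with a letter,
# ends with a letter or digit, and has only letters, digits, and
# hyphens in between. Labels are limited to 63 octets.
_LABEL_RE = re.compile(r"[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?")

def is_rfc1034_label(label: str) -> bool:
    """Return True if label fits the RFC 1034 preferred syntax."""
    return len(label) <= 63 and _LABEL_RE.fullmatch(label) is not None
```

Under this check, "example" passes, while a name part containing
U+00FC (u with diaeresis) does not. An ACE has to turn the second kind
of name part into one that passes.
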
2.1 arch-1: Just send binary

[KWAN] proposes beginning to send characters outside the range allowed
in RFC 1034.

Pro: Easiest to describe. Only changes host name syntax, not any of the
related DNS protocols.

Con: Doesn't work with many existing protocols that rely on DNS.
Violates requirement [#9-02].

2.2 arch-2: Send binary or ACE

[UDNS] (and, later, [IDNE]) proposes using both binary and ACE formats
on the wire.

Pro: Allows protocols that can handle binary name parts to use them
directly, while allowing protocols that cannot use binary name parts to
also handle names without conversion. Allows domain names in free text
to be displayed in binary even in systems that require ACE-formatted
names on the wire.

Con: Requires all software that uses domain names to handle both
formats. Requires processing time for conversion of ACE formats into the
format most likely used internally by the software.

2.3 arch-3: Just send ACE

[RACE] and [SENG] propose that host naming rules remain the same and
that all internationalized domain names be sent in ACE format.

Pro: No changes at all to current DNS protocols.

Con: Requires all software to recognize ACE domain names and convert
them to human-readable form for display. This is true not only for
domain names used on the wire but also for domain names used in free
text.


3. Names in binary

Both arch-1 and arch-2 include domain name parts that are represented on
the wire in a binary format. This section describes some of the features
of such names.

3.1 bin-1: Format

There are many different charsets and encodings for the scripts of the
world. The WG has discussed which binary encoding should be used on the
wire.

3.1.1 bin-1.1: UTF-8

The IETF policy on character sets [RFC2277] states that UTF-8 [RFC2279]
is the preferred charset for IETF protocols. UTF-8 encodes all
characters in the ISO 10646 repertoire.

Pro: Well-supported in other IETF protocols. Compact for most scripts.
Wide implementation in programming languages. US-ASCII characters have
the same encoding in UTF-8 as they do in US-ASCII. Because it is based
on ISO 10646, expansion of the repertoire comes from respected
international standards bodies.

Con: Asian scripts require three octets per character.

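The size trade-off of UTF-8 can be seen by counting the octets it uses
for single characters from different scripts. The following Python
sketch uses the language's built-in UTF-8 encoder. The sample
characters and the helper name utf8_octets are choices made for this
illustration only.

```python
def utf8_octets(ch: str) -> int:
    """Number of octets used to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))

samples = {
    "a": utf8_octets("a"),            # Basic Latin: 1 octet
    "\u00e9": utf8_octets("\u00e9"),  # Latin-1 Supplement: 2 octets
    "\u0434": utf8_octets("\u0434"),  # Cyrillic: 2 octets
    "\u4e2d": utf8_octets("\u4e2d"),  # Han: 3 octets
    "\ud55c": utf8_octets("\ud55c"),  # Hangul syllable: 3 octets
}

# US-ASCII text is unchanged by UTF-8 encoding.
ascii_is_unchanged = "example".encode("utf-8") == "example".encode("ascii")
```
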
3.1.2 bin-1.2: Labelled charsets

Mailing list discussion mentioned using multiple charsets for the binary
representation. Each name part would be labelled with the charset used.

Pro: Allows users to specify names in the charsets they are most
familiar with.

Con: All resolvers would have to know all charsets. Thus, the number of
charsets would probably have to be limited and never expand. Mapping of
characters between charsets would have to be exact and not change over
time.

3.2 bin-2: Distinguishing binary from current format

Software built for current domain names might give unexpected results
when dealing with non-ASCII characters in domain names. For example, it
was reported on the mailing list that some software crashes when a
non-ASCII domain name is returned for in-addr.arpa requests. Thus, there
may be a need for IDN to prevent software that is not binary-aware from
receiving domain names with binary parts. This would only apply to an
IDN that used arch-2, not arch-1.

3.2.1 bin-2.1: Don't mark binary

[KWAN] does not specify any way of changing requests to prevent binary
name parts from being transmitted.

Pro: No changes to current DNS requests and responses.

Con: Likely to cause disruption in software that is not binary-aware.
Likely to cause systems to misread names and possibly (and incorrectly)
convert them to ASCII names by stripping off the high bit in octets;
this in turn would lead to security problems due to mistaken identities.
Returning binary host names to DNS queries is known to break some
current software.

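The high-bit problem mentioned above can be made concrete. The Python
sketch below assumes the binary name part is encoded in UTF-8 and
models software that clears the top bit of every octet. The sample
name part and the function name strip_high_bit are chosen only for
this illustration.

```python
def strip_high_bit(octets: bytes) -> bytes:
    """Clear the top bit of every octet, as 7-bit-only software might."""
    return bytes(b & 0x7F for b in octets)

# "ex" followed by U+00F0; in UTF-8 the octets are 65 78 C3 B0 (hex).
binary_label = "ex\u00f0".encode("utf-8")

# After stripping, the octets are 65 78 43 30 (hex), which is "exC0".
mangled = strip_high_bit(binary_label)
```

The mangled result is a syntactically valid ASCII name part that
differs from the name the user asked for, which is the kind of
mistaken identity the Con above warns about.
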
3.2.2 bin-2.2: Mark binary with IN bit

[UDNS] describes using a bit from the header of DNS queries to mark the
query as possibly containing a binary name part and indicating that the
response to the query can contain binary name parts.

Pro: This bit is currently unused and must be set to zero, so current
software won't use it accidentally. No changes to any other part of the
query or RRs.

Con: It's the last unused bit in the header and DNS folks have indicated
that they are very hesitant to give it up.

3.2.3 bin-2.3: Mark binary with new QTYPEs

[UDNS] describes using new QTYPEs to mark the query as possibly
containing a binary name part and indicating that the response to the
query can contain binary name parts. QTYPEs are two octets long, and no
QTYPEs to date use more than the lower eight bits, so one of the bits
from the upper octet could be used to indicate binary names.

Pro: These bits are currently unused and must be set to zero, so current
software won't use them accidentally. No changes to any other part of
the query or RRs. Uses a bit that isn't as prized as the IN bit.

Con: Software must pay more attention to the QTYPEs than it might have
previously.

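The QTYPE approach amounts to reserving one bit of the upper octet as a
flag. The Python sketch below shows the bit manipulation only. The
text above does not say which bit would be used, so the value of
BINARY_NAME_FLAG is purely hypothetical, as are the function names.

```python
import struct

QTYPE_A = 1                  # existing QTYPE for a host address
BINARY_NAME_FLAG = 0x0100    # hypothetical: lowest bit of the upper octet

def mark_binary(qtype: int) -> int:
    """Return the QTYPE with the (hypothetical) binary-name bit set."""
    return qtype | BINARY_NAME_FLAG

def is_binary(qtype: int) -> bool:
    """True if the (hypothetical) binary-name bit is set."""
    return bool(qtype & BINARY_NAME_FLAG)

def base_qtype(qtype: int) -> int:
    """Recover the original QTYPE from the lower octet."""
    return qtype & 0x00FF

# A QTYPE is carried as two octets in network byte order.
wire = struct.pack("!H", mark_binary(QTYPE_A))
```

Software that compares the full two-octet value against known QTYPEs
would see the marked value as an unknown type, which is the extra
attention the Con above refers to.
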
3.2.4 bin-2.4: Mark binary with EDNS

[IDNE] uses EDNS [RFC2671] to mark the query and response as containing
a binary name part.

Pro: There is little use of EDNS at this point, so it is very unlikely
to have bad interactions with old software. EDNS allows longer name
parts, and allows additional information (such as IDN version number)
in each name part.

Con: There is little use of EDNS and this might make implementation
harder.


4. Names in ASCII-compatible encoding (ACE)

Both arch-2 and arch-3 include domain name parts that are represented on
the wire in an ASCII-compatible encoding (ACE). This section describes
some of the features of such names.

4.1 ace-1: Format

A variety of formats for ACE have been proposed. Each proposal has
different features, such as how many characters can be encoded within
the 63 octet limit for each name part. The length descriptions in this
section assume that there is no distinguishing of ACE from current
names; this is not a likely outcome of the WG work.

The descriptions of lengths are based on script block names from
[BLOCK-NAMES].

4.1.1 ace-1.1: UTF-5

[SENG] describes UTF-5, which is a fairly direct encoding of ISO 10646
characters using a system similar to UTF-8. Characters from Basic Latin
and Latin-1 Supplement take 2 octets; Latin Extended-A through Tibetan
take 3 octets; Myanmar through the end of BMP take 4 octets; non-BMP
characters take 5 octets. This means that names consisting entirely of
characters from Myanmar through the end of BMP are limited to 15
characters.

Pro: Extremely simple.

Con: Poor compression, particularly for Asian scripts.

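The octet counts given for UTF-5 are consistent with an encoding that
splits each code point into 4-bit groups and writes one ASCII character
per group. The Python sketch below assumes exactly that scheme: the
first group of each character is written as one of "G" through "V"
(the group value plus 16) and each later group as an uppercase
hexadecimal digit. This is an assumption made for illustration; [SENG]
is the authority on the actual format, and the name utf5_encode is not
taken from it.

```python
FIRST = "GHIJKLMNOPQRSTUV"   # assumed alphabet for the leading 4-bit group

def utf5_encode(text: str) -> str:
    """Encode text one code point at a time (assumed UTF-5-like scheme)."""
    out = []
    for ch in text:
        digits = format(ord(ch), "X")          # hex digits, no leading zeros
        out.append(FIRST[int(digits[0], 16)])  # marks the start of a character
        out.append(digits[1:])                 # remaining digits unchanged
    return "".join(out)
```

Under this assumption a Han character such as U+4E2D becomes the four
characters "KE2D", so fifteen such characters use 60 of the 63
available octets and a sixteenth does not fit.
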
4.1.2 ace-1.2: RACE

[RACE] describes RACE, which is a two-step algorithm that first
compresses the name part, then converts the compressed string into an
ACE. Name parts in all scripts other than Han, Yi, Hangul syllables,
Ethiopic, and non-BMP take up ceil(1.6*(n+1)) octets; name parts in
those scripts and any name that mixes characters from different rows in
ISO 10646 take up ceil(3.2*(n+1)) octets. This means that names using
Han, Yi, Hangul syllables, or Ethiopic are limited to 18 characters.
(Note: the [RACE] document used to be called CIDNUC.)

Pro: Best compression for most scripts, and similar compression for the
scripts where it is not the best.

Con: More complicated than UTF-5. Not well optimized for names that have
mixed scripts, such as non-Latin names that use hyphen or ASCII digits.

4.1.3 ace-1.3: Hex of UTF-8

An early draft described "hex of UTF-8", which is a straightforward
hexadecimal encoding of UTF-8. Characters in Basic Latin (other than
non-US-ASCII and hyphen) take 3 octets; Latin Extended-A through Tibetan
take 5 octets; Myanmar through end of BMP take 7 octets; non-BMP
characters take 9 octets. This means that names consisting entirely of
characters from Myanmar through the end of BMP are limited to 9
characters.

Pro: Very simple to describe.

Con: Very poor compression for all scripts.

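The per-name-part limits quoted for UTF-5, RACE, and hex of UTF-8 all
follow from the 63-octet limit and the per-character costs given in the
text. The Python sketch below reproduces that arithmetic. The function
names are chosen for this example, and the RACE formula is written in
tenths (16 for 1.6, 32 for 3.2) so that the ceiling can be computed
with integers only.

```python
LABEL_LIMIT = 63  # maximum octets in one name part

def max_chars_fixed(octets_per_char: int) -> int:
    """Longest name part when every character costs the same."""
    return LABEL_LIMIT // octets_per_char

def race_octets(n: int, tenths_per_char: int) -> int:
    """ceil((tenths_per_char / 10) * (n + 1)) using integer arithmetic."""
    return -(-tenths_per_char * (n + 1) // 10)

def max_chars_race(tenths_per_char: int) -> int:
    """Largest n whose RACE form still fits in one name part."""
    n = 0
    while race_octets(n + 1, tenths_per_char) <= LABEL_LIMIT:
        n += 1
    return n
```

With these definitions, max_chars_fixed(4) gives the 15-character
UTF-5 limit, max_chars_fixed(7) gives the 9-character hex of UTF-8
limit, and max_chars_race(32) gives the 18-character RACE limit. The
same formula with 1.6 octets per character gives 38 characters, a
figure derived here rather than stated in the text.
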
4.1.4 ace-1.5: SACE

A message on the mailing list pointed to code for SACE, an ASCII
encoding that purports to compact to about the same size as UTF-8.

Pro: Similar compression to UTF-8.

Con: No description of how the algorithm works.

4.2 ace-2: Distinguishing ACE from current names

Software that finds ACE name parts in free text probably should
display the name part using the actual characters, not the ACE
equivalent. Thus, software must be able to identify which ASCII name
parts are ACE and which are non-ACE ASCII parts (such as current names).
This would only apply to an IDN proposal that used arch-2, not arch-3.

4.2.1 ace-2.1: Currently legal names

Name parts that are currently legal in RFC 1034 can be tagged to
indicate the part is encoded with ACE.

4.2.1.1 ace-2.1.1: Add hopefully-unique legal tag

[RACE] proposes adding a hopefully-unique legal tag to the beginning
of the name. The proposal would also work with such a tag at the end of
the name part, but it is easier for most people to recognize at the
beginning of name parts.

Pro: Easy for software (and humans) to recognize.

Con: There is no way to prevent people from beginning non-ACE names
with the tag. Unless the tag is very unlikely to appear in any name in
any human language, non-ACE names that begin with the tag will display
oddly or be rejected by some systems.

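Both the Pro and the Con of a legal tag show up in a few lines of code.
The Python sketch below uses a made-up tag, "zz--", chosen only for
this example; it is not the tag proposed in [RACE], and the function
name looks_like_ace is likewise an assumption.

```python
ACE_TAG = "zz--"  # hypothetical tag, chosen only for this example

def looks_like_ace(label: str) -> bool:
    """Treat any name part that begins with the tag as ACE."""
    return label.lower().startswith(ACE_TAG)
```

Recognition is a single prefix comparison, which is what makes the tag
easy for software. The same comparison also accepts an ordinary name
part such as "zz--top", which is legal today and was never encoded
with an ACE, so a decoder would either display it oddly or reject it.
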
4.2.1.2 ace-2.1.2: Add a checksum

Off-list discussion has mentioned the possibility of creating a checksum
mechanism where the checksum would be added to the beginning (or end) of
ACE name parts.

4.2.2 ace-2.2: Currently illegal names

Instead of creating names that are currently legal, another proposal is
to create names that use the current ASCII characters but are illegal.

4.2.2.1 ace-2.2.1: Add trailing hyphen

An earlier draft described using a trailing hyphen as a signifier of an
ACE name.

Pro: It is surmised that most current software does not reject names
that are illegal in this fashion. Thus, there would be little disruption
to current systems. This mechanism takes up fewer characters than any
proposed in ace-2.1.

Con: Some current software will probably break with this mechanism.
It goes against some current protocols that match the rules in RFC 1034.


5. Prohibited characters

There was a short but active discussion on the mailing list about which
characters from the ISO 10646 character set should never appear in host
names. To date, there are no Internet Drafts on the subject. This
section summarizes some of the suggestions.

5.1 prohib-1: Identical and near-identical characters

Some characters are visually identical or incredibly similar to other
characters, thus making it impossible to accurately enter host names
that are seen in print.

5.2 prohib-2: Separators

Horizontal and vertical spacing characters would make it unclear where a
host name begins and ends. Allowing periods and period-like characters
as characters within a name part would cause similar confusion.

5.3 prohib-3: Non-displaying and non-spacing characters

The ISO 10646 character set contains many characters that cannot be
seen. These include control characters, non-breaking spaces, formatting
characters, and tagging characters. These characters would certainly
cause confusion if allowed in host names.

5.4 prohib-4: Private use characters

Private use characters from ISO 10646 inherently have no specified
visual form (and in fact can be used for non-displaying characters).
Thus, there could be no visual interoperability for characters in the
private use areas.

5.5 prohib-5: Punctuation

Some punctuation characters are disallowed in URLs because they are used
in URL syntax.

5.6 prohib-6: Symbols

Some mailing list discussion stated that characters that do not normally
appear in human or company names should not be allowed in host names.
This includes symbols and non-name punctuation.

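Several of the suggestions in this section correspond roughly to
Unicode general categories, so a first approximation of a checker can
be written with Python's unicodedata module. The mapping below from
categories to prohib labels is hypothetical and was made up for this
sketch; punctuation is left out because the hyphen, which host names
need, is itself punctuation. Note also that unicodedata reflects the
Unicode version shipped with the Python interpreter, not the ISO 10646
repertoire as it stood when this document was written.

```python
import unicodedata

# Hypothetical mapping from general-category prefixes to the labels
# used in this section; a real protocol would need exact lists.
CATEGORY_REASONS = {
    "Z": "prohib-2 (separator)",
    "Cc": "prohib-3 (control)",
    "Cf": "prohib-3 (formatting)",
    "Co": "prohib-4 (private use)",
    "S": "prohib-6 (symbol)",
}

def prohibited_reason(ch: str):
    """Return a label if ch falls in a flagged category, else None."""
    category = unicodedata.category(ch)
    for prefix, reason in CATEGORY_REASONS.items():
        if category.startswith(prefix):
            return reason
    return None
```

Identical and near-identical characters (prohib-1) cannot be found
this way, because visual similarity is not a character category.
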
6. Canonicalization

The working group has had a spirited discussion on the need for
canonicalization. [IDN-REQ] describes many requirements for when and
what type of canonicalization might be performed.

6.1 canon-1: Type of canonicalization

The Unicode Consortium's recommendations and definitions of
canonicalization [UTR-15] describe many forms of canonicalization that
can be performed on character strings. [DUERST] covers much of the same
ground but makes more focused requirements for canonicalization on the
Internet.

6.1.1 canon-1.1: Normalization Form C

[DUERST] recommends Normalization Form C, as described in [UTR-15], for
use on the Internet. This form is a canonical decomposition, followed by
canonical composition.

6.1.2 canon-1.2: Normalization Form KC

Discussion on the mailing list recommended Normalization Form KC. This
form is a compatibility decomposition, followed by canonical
composition. Compatibility decomposition makes characters that have
compatibility equivalence the same after decomposing.

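The practical difference between the two forms can be seen with
Python's unicodedata.normalize function, which implements the forms
defined in [UTR-15] for the Unicode version shipped with the
interpreter. The two sample strings are choices made for this
illustration.

```python
import unicodedata

decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
ligature = "\ufb01"      # LATIN SMALL LIGATURE FI

# Form C composes the base letter and the accent into U+00E9.
nfc_accent = unicodedata.normalize("NFC", decomposed)

# Form C leaves the ligature alone; Form KC replaces it with "fi".
nfc_ligature = unicodedata.normalize("NFC", ligature)
nfkc_ligature = unicodedata.normalize("NFKC", ligature)
```

Form KC therefore treats more strings as the same name than Form C
does, at the cost of discarding the distinction between a character
and its compatibility equivalent.
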
6.2 canon-2: Other canonicalization

Host names may have special canonicalization needs that can be added to
those given in canon-1.

6.2.1 canon-2.1: Case folding in ASCII

RFC 1034 specifies that there is no difference between host names that
have the same letters but the letters have different case. Thus, the
name part "example" is considered the same as "Example" and "EXamPLe".
Neither uppercase nor lowercase is specified as being canonical.

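A comparison that follows this rule folds only the 26 ASCII letters and
leaves every other character untouched. The Python sketch below is one
way to write it. The function names are assumptions of this example,
and lowercase is used only as an internal comparison key, not as a
canonical form.

```python
def ascii_fold(label: str) -> str:
    """Lowercase only A-Z, leaving every other character untouched."""
    return "".join(
        chr(ord(c) + 32) if "A" <= c <= "Z" else c
        for c in label
    )

def same_label(a: str, b: str) -> bool:
    """Compare two name parts without regard to ASCII letter case."""
    return ascii_fold(a) == ascii_fold(b)
```
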
6.2.2 canon-2.2: Case folding in non-ASCII

Discussion on the mailing list has raised the issue of whether or not
non-ASCII Latin characters should have the same case-folding rules as
ASCII. Such rules would match the expectations of native speakers of
some languages, but would go counter to the expectations of native
speakers of other languages.

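Python's built-in, locale-independent case operations give a feel for
why a single folding rule cannot satisfy every language. The German
and Turkish examples below were picked for this illustration and were
not part of the mailing list discussion summarized above.

```python
# Full case folding maps U+00DF (sharp s) to "ss", so these two
# spellings compare equal after folding.
german_equal = "STRASSE".casefold() == "stra\u00dfe".casefold()

# The default lowercase of "I" is "i", but Turkish orthography pairs
# "I" with the dotless U+0131 instead.
default_lower_i = "I".lower()

# Both "i" and U+0131 uppercase to "I", so a fold through uppercase
# would treat the two as the same letter.
dotted_upper = "i".upper()
dotless_upper = "\u0131".upper()
```
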
6.2.3 canon-2.3: Han folding

Discussion on the mailing list has raised the issue of equivalences in
some languages' use of Han characters. For example, in Chinese, there
are many traditional characters that have equivalent simplified
characters. Similarly, there are some Han ideographs for which there are
multiple representations in ISO 10646. There are no well-established
rules for such folding, and some of the proposed folding would be
locale-specific.

6.3 canon-3: Location of canonicalization

Canonicalization can be performed in any system in the DNS. Because it
is not a trivial operation and can require large tables, the location
where canonicalization is performed is important.

6.3.1 canon-3.1: Canonicalize only in the application

Early canonicalization is a cleaner architecture design. Spending the
cycles on the end systems puts less burden on resolvers or servers in
the DNS service. When IDN is first adopted, the applications need to be
updated anyway to handle the new format for the names. It is easier for
people to upgrade their applications than their resolvers if they need a
new IDN feature.

6.3.2 canon-3.2: Canonicalize only in the resolver

Updating a single resolver provides new service to a large number of
applications and (possibly) users. It is easier to find canonicalization
bugs in resolvers than in applications because the resolver has
predictable programmatic interfaces. IDN will probably be revised often
as new characters are added to ISO 10646, so updating a smaller number
of resolvers is better than revising more applications. When an end user
has a problem with resolving an IDN name, it is much easier to test if
the problem is in the resolver than in the user's application.

6.3.3 canon-3.3: Canonicalize in the DNS service

Canonicalization should happen as late as possible so that changes in
the canonicalization algorithm don't orphan all applications and
resolvers. Some canonicalization discards information and so should be
delayed as long as possible. Canonicalization is practically free,
computationally (although it involves some large tables). Because adding
IDN to the DNS will happen over time, canonicalizing at the server will
minimize the number of things that need to be changed, and simplify and
centralize the process of change.


7. Transitions

Early in the working group discussion, there was active debate about how
the transition from the current host name rules to IDN would be handled.
Given requirement [#1-02], this transition is quite important to
deciding which proposals might be feasible.

7.1 trans-1: Always do current plus new architecture

In this proposal, IDN will be used at the same time as the current DNS
forever. That is, IDN will be in addition to the current DNS.

7.2 trans-2: Transition period

In this proposal, IDN will be used at the same time as the current DNS
for a specified period of time, after which only IDN will exist. That
is, IDN will replace the current DNS.


8. Root server considerations

DNS root servers receive all requests for top-level domains that are not
in the local DNS cache. They are critical to the Internet. Care must be
taken to ensure that root servers will not be affected by new mechanisms
introduced.

Any IDN proposal that includes a binary encoding will have an impact on
the root servers. The binary requests will affect the root servers
because the current root server software is designed to handle current
host names. Further, the root zone files which contain ccTLDs and gTLDs
would have to support binary domain names and possibly binary host names
for NS records. Because all the root servers are equivalent, they would
have to be synchronized to support the binary domain names at the same
time.

Proposals that only use ACE and use tagging with currently-legal names
would, by definition, not affect the root servers.


9. Security considerations

All security considerations listed in [IDN-REQ] apply to this document.
Further, all security considerations listed in each of the IDN proposals
must be considered when comparing the proposals.

Some proposals described in this document may create new security
considerations. However, these considerations will have to be addressed
in the eventual protocol document. All the proposals described here are
still incomplete and security considerations may be added to them as
they are revised. All the proposals listed in this document use the ISO
10646 character set, so the proposals inherit any security
characteristics of that character set.

Many protocols and applications rely on domain names to identify the
parties involved in a network transaction. For example, a user who
connects to a web site by entering or selecting a URL expects that their
software will select the web site named in the URL. The uniqueness of
domain names is crucial to ensuring identification of Internet entities.

To make round-trip translation between local charsets and ISO 10646
possible, the ISO 10646 specification has assigned multiple code points
to individual glyphs. Moreover, some glyphs might look similar to some
users, but look clearly different to other users. This means that it
would be simple for an attacker to mimic a domain name by using
similar-looking but different glyphs and guessing that some users will
not see the difference in their user interface.

Some IDN protocols may be subject to denial of service attacks, such as
through the use of non-identified characters, exception characters, or
under-specified behavior in using some special characters.

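The look-alike attack described in this section can be demonstrated
with two name parts that are rendered identically in many fonts. The
Python sketch below is illustrative only. The sample strings and the
helper name describe are assumptions of this example.

```python
import unicodedata

latin = "example"
mixed = "ex\u0430mple"   # third letter is CYRILLIC SMALL LETTER A

def describe(name: str):
    """List the character names that make up a name part."""
    return [unicodedata.name(c) for c in name]

# The two strings differ only in their third character.
differing = [
    (a, b) for a, b in zip(describe(latin), describe(mixed)) if a != b
]

# Normalization does not help: Form KC leaves the mixed string as is.
still_mixed = unicodedata.normalize("NFKC", mixed) == mixed
```

Because normalization alone does not merge the two strings, defending
against this attack would need additional rules, such as the
prohibitions discussed in prohib-1.
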
10. IANA considerations

This document does not create any new IANA registries. However, it is
possible that a character property registry may need to be set up when
the IDN protocol is created in order to list prohibited characters
(section 5) and canonicalization mappings (section 6).


11. Acknowledgements

James Seng and Marc Blanchet gave many helpful suggestions on the
pre-release versions of this document.


12. References

[BLOCK-NAMES] Unicode Consortium,
<ftp://ftp.unicode.org/Public/UNIDATA/Blocks.txt>.

[DUERST] Character Normalization in IETF Protocols,
draft-duerst-i18n-norm-03

[IDN-REQ] Requirements of Internationalized Domain Names,
draft-ietf-idn-requirements-02

[IDNE] Internationalized domain names using EDNS (IDNE),
draft-ietf-idn-idne-01

[KWAN] Using the UTF-8 Character Set in the Domain Name System,
draft-skwan-utf8-dns-03

[RACE] RACE: Row-based ASCII Compatible Encoding for IDN,
draft-ietf-idn-race-00

[RFC1034] Domain Names - Concepts and Facilities, RFC 1034

[RFC2277] IETF Policy on Character Sets and Languages, RFC 2277

[RFC2279] UTF-8, a transformation format of ISO 10646, RFC 2279

[RFC2671] Extension Mechanisms for DNS (EDNS0), RFC 2671

[SENG] UTF-5, a transformation format of Unicode and ISO 10646,
draft-jseng-utf5-01

[UDNS] Using the Universal Character Set in the Domain Name System
(UDNS), draft-ietf-idn-udns-00

[UTR-15] Unicode Normalization Forms, Unicode Technical Report #15


A. Differences Between -00 and -01 Drafts

Throughout: Changed references from [HOFFMAN] to [RACE].

Throughout: Changed references from [OSCARSSON] to [UDNS].

Throughout: Added [IDNE].

Removed section 1.2.

3.2.3: Updated to mention [UDNS].

3.2.4: Updated with [IDNE], changed "EDNS0" to "EDNS", and reworded.

4.1.2: Added Ethiopic to the list of scripts that require two octets per
character.

4.1.3: Removed reference to [OSCARSSON] because that is no longer in the
[UDNS] draft.

4.2.2.1: Removed reference to [OSCARSSON] because that is no longer in
the [UDNS] draft.

6.1.1: Reworded first sentence.

6.3: Added entire section and subsections.

8: Fixed typo in first sentence.


B. Author Contact

Paul Hoffman
IMC & VPNC
127 Segre Place
Santa Cruz, CA 95060
phoffman@imc.org or paul.hoffman@vpnc.org