From 843e4dfd2c6a183e8be20d634bfdd15dad822b02 Mon Sep 17 00:00:00 2001 From: Andreas Gustafsson Date: Thu, 14 Jun 2001 16:44:41 +0000 Subject: [PATCH] updated drafts --- ...etf-dnsext-parent-stores-zone-keys-01.txt} | 283 ++-- doc/draft/draft-ietf-idn-dude-01.txt | 898 ---------- doc/draft/draft-ietf-idn-dude-02.txt | 864 ++++++++++ ...iptr-01.txt => draft-ietf-idn-iptr-02.txt} | 210 +-- ...ft-ietf-ipngwg-default-addr-select-04.txt} | 384 ++--- ...e-00.txt => draft-klensin-dns-role-01.txt} | 1438 ++++++++++------- doc/draft/draft-skwan-utf8-dns-05.txt | 228 --- doc/draft/draft-skwan-utf8-dns-06.txt | 421 +++++ 8 files changed, 2578 insertions(+), 2148 deletions(-) rename doc/draft/{draft-ietf-dnsext-parent-stores-zone-keys-00.txt => draft-ietf-dnsext-parent-stores-zone-keys-01.txt} (76%) delete mode 100644 doc/draft/draft-ietf-idn-dude-01.txt create mode 100644 doc/draft/draft-ietf-idn-dude-02.txt rename doc/draft/{draft-ietf-idn-iptr-01.txt => draft-ietf-idn-iptr-02.txt} (72%) rename doc/draft/{draft-ietf-ipngwg-default-addr-select-03.txt => draft-ietf-ipngwg-default-addr-select-04.txt} (84%) rename doc/draft/{draft-klensin-dns-role-00.txt => draft-klensin-dns-role-01.txt} (51%) delete mode 100644 doc/draft/draft-skwan-utf8-dns-05.txt create mode 100644 doc/draft/draft-skwan-utf8-dns-06.txt diff --git a/doc/draft/draft-ietf-dnsext-parent-stores-zone-keys-00.txt b/doc/draft/draft-ietf-dnsext-parent-stores-zone-keys-01.txt similarity index 76% rename from doc/draft/draft-ietf-dnsext-parent-stores-zone-keys-00.txt rename to doc/draft/draft-ietf-dnsext-parent-stores-zone-keys-01.txt index a5653fa28f..7900d8ab07 100644 --- a/doc/draft/draft-ietf-dnsext-parent-stores-zone-keys-00.txt +++ b/doc/draft/draft-ietf-dnsext-parent-stores-zone-keys-01.txt @@ -5,7 +5,7 @@ Expires September 2001 T. 
Lindgreen Parent stores the child's zone KEYs - draft-ietf-dnsext-parent-stores-zone-keys-00.txt + draft-ietf-dnsext-parent-stores-zone-keys-01.txt Status of This Document @@ -28,7 +28,7 @@ Status of This Document Comments should be sent to the authors or the DNSEXT WG mailing list namedroppers@ops.ietf.org. - This document updates RFC 2535 [2]. + This document updates RFC 2535. Copyright Notice @@ -51,9 +51,9 @@ Abstract - Gieben Expires September 2001 [Page 2] + Gieben & Lindgreen Expires November 2001 [Page 2] -Internet Draft Parent Stores Zone KEYS March 2001 +Internet Draft Parent Stores Zone KEYS May 2001 simple key rollover and resigning mechanism. For large TLDs this is extremely important. @@ -69,29 +69,29 @@ Internet Draft Parent Stores Zone KEYS March 2001 Table of Contents - Status of This Document....................................2 - Abstract...................................................2 + Status of This Document.................................... + Abstract................................................... - Table of Contents..........................................3 - 1 Introduction.............................................3 - 2 Proposal.................................................4 - 2.1. TTL of the KEY and SIG at the parent..................4 - 2.2. No NULL KEY...........................................5 - 3 Impact on a secure aware resolver/forwarder..............5 - 3.1 Impact of key rollovers on resolver/forwarder..........5 - 4 Scheduled key rollover...................................6 - 5 Unscheduled key rollover.................................6 - 6 Zone resigning...........................................7 - 7. Consequences for KEY and NXT records....................7 - 7.1. KEY bit in NXT records................................7 - 7.2. Authority of KEY records..............................8 - 7.3. Selecting KEY sets....................................8 - 8. The zone-KEY and local KEY records......................8 - 9. 
Security Considerations.................................8 + Table of Contents.......................................... + 1 Introduction............................................. + 2 Proposal................................................. + 2.1. TTL of the KEY and SIG at the parent.................. + 2.2. No NULL KEY........................................... + 3 Impact on a secure aware resolver/forwarder.............. + 3.1 Impact of key rollovers on resolver/forwarder.......... + 4 Scheduled key rollover................................... + 5 Unscheduled key rollover................................. + 6 Zone resigning........................................... + 7. Consequences for KEY and NXT records.................... + 7.1. KEY bit in NXT records................................ + 7.2. Authority of KEY records.............................. + 7.3. Selecting KEY sets.................................... + 8. The zone-KEY and local KEY records...................... + 9. Security Considerations................................. - Authors' Addresses.........................................9 - References.................................................9 - Full Copyright Statement...................................9 + Authors' Addresses......................................... + References................................................. + Full Copyright Statement................................... 1. Introduction @@ -99,8 +99,8 @@ Table of Contents DNSSEC on the ccTLDs and gTLDs. In this document we are considering a secure zone, somewhere under a - secure entry point and on-tree [1] validation between the secure - entry point and the zone in question. The resolver we are + secure entry point and on-tree [RFC 3090] validation between the + secure entry point and the zone in question. The resolver we are considering is security aware and is preconfigured with the KEY of the secure entry point. 
We also make a distinction between a scheduled and an unscheduled key rollover. A scheduled rollover is @@ -109,12 +109,12 @@ Table of Contents - Gieben Expires September 2001 [Page 3] + Gieben & Lindgreen Expires November 2001 [Page 3] -Internet Draft Parent Stores Zone KEYS March 2001 +Internet Draft Parent Stores Zone KEYS May 2001 - RFC 2535 [2] states that a zone KEY must be present in the apex of a + RFC 2535 states that a zone KEY must be present in the apex of a zone. This can be at the delegation point in the parent's zonefile, or in the child's zonefile, or in both. This key is only valid if it is signed by the parent, so there is also the question @@ -122,8 +122,8 @@ Internet Draft Parent Stores Zone KEYS March 2001 The original idea was to have the zone KEY RR and the parent's SIG to reside in the child's zone and perhaps also in the parent's zone. - There is a draft proposal [3], that describes how a keyrollover can - be handled. + There is a draft proposal (draft-ietf-dnsop-rollover) that describes + how a key rollover can be handled. At NLnet Labs we found that storing the parent's signature over the child's zone KEY in the child's zone: @@ -138,14 +138,16 @@ Internet Draft Parent Stores Zone KEYS March 2001 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this - document are to be interpreted as described in RFC 2119 [2]. + document are to be interpreted as described in RFC 2119. 2. Proposal The core of the new proposal is that the parent zone stores the parent's signature over the child's zone KEY and also the child's - zone KEY itself. The child zone may also contain its zone KEY, in - which case is must be selfsigned. + zone KEY itself, and is authoritative for both KEY and SIG. The + child zone may also contain its zone KEY, in which case it must be + selfsigned. The child zone must not hold the parent's SIG, and must + also not set the AA-bit in responses to queries for its zone KEY.
The main advantage of this proposal is that all signatures signed by a key are in the same zone file as the producing key. This allows for @@ -162,15 +164,15 @@ Internet Draft Parent Stores Zone KEYS March 2001 2.1. TTL of the KEY and SIG at the parent Each zone in DNS expresses in its SOA record the maximum and minimum + + + + Gieben & Lindgreen Expires November 2001 [Page 4] + +Internet Draft Parent Stores Zone KEYS May 2001 + TTL values that they allow in the zone. Thus it is possible that the parent will sign with a value that is unacceptable to the child. The - - - - Gieben Expires September 2001 [Page 4] - -Internet Draft Parent Stores Zone KEYS March 2001 - parent MUST follow the TTL request of the child as long as that is within the allowed range for the parent. @@ -192,19 +194,40 @@ Internet Draft Parent Stores Zone KEYS March 2001 Section 3.4 "Determination of Zone Secure/Unsecured Status": " A zone KEY RR with the "no-key" type field value (both key type - flag bits 0 and 1 on) indicates that the zone named is unsecured - while a zone KEY RR with a key present indicates that the zone named - is secure. The secured versus unsecured status of a zone may vary - with different cryptographic algorithms. Even for the same - algorithm, conflicting zone KEY RRs may be present. " + flag bits 0 and 1 on) indicates that the zone named is unsecured + while a zone KEY RR with a key present indicates that the zone named + is secure. The secured versus unsecured status of a zone may vary + with different cryptographic algorithms. Even for the same + algorithm, conflicting zone KEY RRs may be present. " This is rewritten as: - " A zone is considered secured by on-tree validation [1] when the - there is a zone KEY from that zone present at its parent. If there - is no zone KEY present, and the resolver is also unaware of - alternative algorithms used and/or possible off-tree validation, the - zone is considered unsecured. 
" + " A zone is considered secured by on-tree validation [RFC 3090] when + there is a zone KEY from that zone present at its parent. If + there is no zone KEY present, and the resolver is also unaware of + alternative algorithms used and/or possible off-tree validation, the + zone is considered unsecured. " + + To clarify this further: a zone is secure when the resolver expects + it to be. There are two possibilities: + 1. When its parent is secure and holds a signed KEY for this child. + 2. When the zone is a secure entry point, i.e. the resolver is + preconfigured with the KEY of this zone. + + RFC 3090 calls this globally secured. + + When a zone contains SIGs and a selfsigned KEY and this KEY is + preconfigured in the resolvers of interest, the zone can be + considered locally secured (the RFC 3090 definition). + + If a zone is neither globally nor locally secured, it must be + considered unsecured. + + + + + Gieben & Lindgreen Expires November 2001 [Page 5] + +Internet Draft Parent Stores Zone KEYS May 2001 3. Impact on a secure aware resolver/forwarder @@ -222,13 +245,6 @@ Internet Draft Parent Stores Zone KEYS March 2001 3.1. Impact of key rollovers on resolver/forwarder When a zone is in the process of a key rollover, there could be a - - - - Gieben Expires September 2001 [Page 5] - -Internet Draft Parent Stores Zone KEYS March 2001 - discrepancy between the KEY and the SIG in the apex of the zone and the KEY and SIG that are stored in the cache of a resolver. @@ -257,13 +273,20 @@ Internet Draft Parent Stores Zone KEYS March 2001 4. Scheduled key rollover When the signatures, produced by the key to be rolled over, are all in one zone file, there are two parties involved. Let us look at an - example where a TLD rolls over its zone KEY. The new key needs to be - signed with the root's key before it can be used to sign the TLD zone - and the zone KEYs of the TLD's children.
The steps that need to be - taken by TLD and root are: + possible example where a TLD rolls over its zone KEY. The new key + needs to be signed with the root's key before it can be used to sign + the TLD zone and the zone KEYs of the TLD's children. The steps that + need to be taken by TLD and root are: - the TLD adds the new key to its KEY set in its zonefile. This zone and KEY set are signed with the old zone KEY - then the TLD signals the parent + + + + Gieben & Lindgreen Expires November 2001 [Page 6] + +Internet Draft Parent Stores Zone KEYS May 2001 + - the root copies the new KEY set, consisting of both the new and the old key, in its zonefile, resigns it and signals the TLD - the TLD removes the old key from its KEY set, resigns its zone @@ -280,25 +303,18 @@ Internet Draft Parent Stores Zone KEYS March 2001 5. Unscheduled key rollover - - - - Gieben Expires September 2001 [Page 6] - -Internet Draft Parent Stores Zone KEYS March 2001 - Although nobody hopes that this will ever happen, we must be able to cope with possible key compromises. When such an event occurs, an immediate key rollover is needed and must be completed in the shortest possible time. With two parties involved, it will still be awkward, but not impossible, to update two zonefiles overnight. "Out-of-band" communication between the two parties will be necessary, since the - compromised old key can not be trusted. We think that between two - parties this is doable, but this complicated procedure [5] is beyond - the scope of this document. + compromised old key can not be trusted. We think that between two + parties this is doable, but this complicated procedure is beyond the + scope of this document. An alternative to an emergency key-rollover is becoming unsecured as - an emercengy measure. This has already been mentioned above in + an emergency measure. This has already been mentioned above in section 3.1.
This only involves an emergency change in the parent's zonefile (deleting the child's zone KEY), and allows the child and its underlying zones time to clean up before becoming secured again, @@ -322,6 +338,13 @@ Internet Draft Parent Stores Zone KEYS March 2001 To cope with 1, secure aware resolvers MUST be aware that during a key-rollover there may be a conflict, and that in that case the + + + + Gieben & Lindgreen Expires November 2001 [Page 7] + +Internet Draft Parent Stores Zone KEYS May 2001 + parent always holds the active KEY set. To cope with 2, the local resolver/caching forwarder should be preconfigured with the zone-KEY and thus looks at its own zone as if it were a secure entry-point. For @@ -329,24 +352,17 @@ Internet Draft Parent Stores Zone KEYS March 2001 zonefile. 7.1. KEY bit in NXT records - RFC 2535 [3], section 5.2 states: + RFC 2535, section 5.2 states: - " The NXT RR type bit map format currently defined is one bit per - RR type present for the owner name. A one bit indicates that at - least one RR of that type is present for the owner name. A zero - indicates that no such RR is present. [....] " + " The NXT RR type bit map format currently defined is one bit per RR + type present for the owner name. A one bit indicates that at least + one RR of that type is present for the owner name. A zero indicates + that no such RR is present. [....] " - As the zone KEY is present in a child zone, and signed by the - zone KEY (thus selfsigned), the definition of NXT RR type bit states - - - - Gieben Expires September 2001 [Page 7] - -Internet Draft Parent Stores Zone KEYS March 2001 - - in RFC 2535 [3], section 5.2 that the KEY bit must be set. We do not - see a compelling reason to change this default behavior. + As the zone KEY is present in a child zone and signed by the zone + KEY (thus selfsigned), the definition of the NXT RR type bit map in + RFC 2535, section 5.2 states that the KEY bit must be set.
We do not see a + compelling reason to change this default behavior. 7.2. Authority of KEY records The parent of a zone generates the signature for the key belonging to @@ -371,15 +387,22 @@ Internet Draft Parent Stores Zone KEYS March 2001 mechanism, like publishing it in a newspaper. 7.3. Selecting KEY sets - As the zone KEY set is present in two places, there may be a - possibility to find conflicting KEY sets, and this will at least - really happen during a key-rollover. + As the zone KEY set is present in two places, there is a possibility + of two conflicting KEY sets; this will happen during a key-rollover + and may happen at other times. With one exception, a resolver MUST always select the KEY set from the parent in case of a conflict, as this is the active KEY set. For this reason, the parent sets the AA-bit in responses, while the child does not. + + + + Gieben & Lindgreen Expires November 2001 [Page 8] + +Internet Draft Parent Stores Zone KEYS May 2001 + The one exception is when a resolver regards the child's zone as a secure-entry point, in which case it has the zone KEY preconfigured. In other words: a preconfigured KEY has even more authority than what @@ -389,28 +412,22 @@ Internet Draft Parent Stores Zone KEYS March 2001 8. The zone KEY and local KEY records. It must be recognized that the zone KEY RR, which is signed by a - non-local organisation, is something special. The external signature + non-local organization, is something special. The external signature over the public part of the key provides the local zone-administrator with the authority to use the corresponding private part to sign everything local, and thus to make his/her own zone secure. Please also note that the external signer, and NOT the local zone, is authoritative for the zone KEY RRset.
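The KEY-set selection rule of Section 7.3 can be summarized in a few lines of code. The following is a minimal Python sketch, not part of this draft: the names select_zone_keyset and Selection are hypothetical, KEY sets are modelled as frozensets of opaque key identifiers, and off-tree validation and alternative algorithms are ignored, so a zone without a zone KEY at its parent is simply treated as unsecured.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical model: a zone KEY set is a frozenset of opaque key identifiers.
KeySet = FrozenSet[str]


@dataclass(frozen=True)
class Selection:
    keyset: Optional[KeySet]  # None: no usable zone KEY, treat zone as unsecured
    source: str               # "preconfigured", "parent" or "none"
    conflict: bool            # child's self-signed set differs from the chosen set


def select_zone_keyset(parent: Optional[KeySet],
                       child: Optional[KeySet],
                       preconfigured: Optional[KeySet]) -> Selection:
    """Pick the zone KEY set a resolver should trust (sketch of Section 7.3)."""
    if preconfigured is not None:
        # The one exception: a secure entry point's preconfigured KEY wins.
        chosen, source = preconfigured, "preconfigured"
    elif parent is not None:
        # Otherwise the parent's authoritative copy is the active KEY set.
        chosen, source = parent, "parent"
    else:
        # No zone KEY at the parent: unsecured (off-tree validation ignored).
        chosen, source = None, "none"
    conflict = chosen is not None and child is not None and child != chosen
    return Selection(chosen, source, conflict)


# Scheduled rollover: the TLD has added its new key, the root has not yet
# copied the new KEY set, so the two copies disagree.
rollover = select_zone_keyset(parent=frozenset({"old"}),
                              child=frozenset({"old", "new"}),
                              preconfigured=None)
print(rollover.source, rollover.conflict)  # parent True
```

Reporting the conflict separately, instead of failing, mirrors the draft's point that a mismatch during a key rollover is expected and is resolved in favour of the parent.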
- - - - Gieben Expires September 2001 [Page 8] - -Internet Draft Parent Stores Zone KEYS March 2001 - Part of the RRs that the zone-administrator may wish to sign are KEY RRs for local use, for instance for IPSEC. To make sure that the local zone is authoritative for its own local KEY RRs, and that they do not get exported and signed externally, these local KEY records SHOULD NOT be part of the zone KEY RRset. - Therefore, they SHOULD be placed under a label in the zonefile, f.i. - keys.child.parent. + Therefore, they could be placed under a label in the zonefile, f.i. + keys.child.parent, or for this kind of key a new RR type could be + defined (e.g. PUBKEY). Besides being kept clear of local KEY records, the zone KEY RRset SHOULD also be kept clear of any other obsolete or otherwise not @@ -423,15 +440,38 @@ Internet Draft Parent Stores Zone KEYS March 2001 progress. During a keyrollover a new KEY RR must be added to this RRset. Once the new KEY becomes the active zone KEY, the old KEY becomes obsolete and SHOULD be removed as soon as practically - possible. + possible. Information stored in caches SHOULD NOT be a factor in + deciding when to remove the old zone KEY. 9. Security Considerations - This document addresses the operational difficulties that arise if - DNSSEC is deployed as it stands now, with the child's zone KEY not - stored at the parent. By putting that key in the parent's zone the - communication between the two is kept to a minimum thus reducing the - risk of errors. All security considerations from RFC 2535 apply. + This document addresses the operational difficulties that arise when + DNSSEC is deployed. By putting the child's zone KEY at the parent we + solve a lot of problems by minimizing the amount of communication + between the two. There is one security issue: the parent must + never create a valid parental SIG over a KEY RR whose private part + is (also) known to someone other than the legitimate + administrator of the child zone.
This can happen in two ways: + 1. The private KEY at the child has been compromised. + 2. The parent has been fooled because it did not sufficiently check + + + + Gieben & Lindgreen Expires November 2001 [Page 9] + +Internet Draft Parent Stores Zone KEYS May 2001 + + whether the KEY RR really came from the child. + + For security it does not matter whether the SIG and the KEY are located + at the child or at the parent, but if they are located at the parent + it is much easier to replace the SIG. By keeping the parental SIG + lifetime short, the parent helps to protect the child against + possible key compromises. The selfsigned zone KEY stored in the + child's zone can have a long SIG expiration lifetime; this has no + impact on the child's security. + + All security considerations from RFC 2535 apply. Authors' Addresses @@ -445,26 +485,14 @@ Authors' Addresses References - [1] Lewis, E. "DNS Security Extension Clarification on Zone + [RFC 3090] Lewis, E. "DNS Security Extension Clarification on Zone Status", RFC 3090 www.ietf.org/rfc/rfc3090.txt - [2] Bradner, S. "Key words for use in RFCs to Indicate Requirement + [RFC 2119] Bradner, S. "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119 www.ietf.org/rfc/rfc2119.txt - [3] Eastlake, D. "DNS Security Extensions", RFC 2535 + [RFC 2535] Eastlake, D. "DNS Security Extensions", RFC 2535 www.ietf.org/rfc/rfc2535.txt - [4] Andrews, M., Eastlake, D. "Domain Name System (DNS) Security - - - - Gieben Expires September 2001 [Page 9] - -Internet Draft Parent Stores Zone KEYS March 2001 - - Key Rollover" - www.ietf.org/internet-drafts/draft-ietf-dnsop-rollover-01.txt - [5] Gieben, R. "Chain of trust" - secnl.nlnetlabs.nl/thesis/thesis.html Full Copyright Statement @@ -485,6 +513,13 @@ Full Copyright Statement followed, or as required to translate it into languages other than English.
+ + + Gieben & Lindgreen Expires November 2001 [Page 10] + +Internet Draft Parent Stores Zone KEYS May 2001 + + The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. diff --git a/doc/draft/draft-ietf-idn-dude-01.txt b/doc/draft/draft-ietf-idn-dude-01.txt deleted file mode 100644 index d1b83c223f..0000000000 --- a/doc/draft/draft-ietf-idn-dude-01.txt +++ /dev/null @@ -1,898 +0,0 @@ -Internet Engineering Task Force (IETF) Mark Welter -INTERNET-DRAFT Brian W. Spolarich -draft-ietf-idn-dude-01.txt WALID, Inc. -March 02, 2001 Expires September 02, 2001 - - - DUDE: Differential Unicode Domain Encoding - - -Status of this memo - -This document is an Internet-Draft and is in full conformance with all -provisions of Section 10 of RFC2026. - -Internet-Drafts are working documents of the Internet Engineering Task -Force (IETF), its areas, and its working groups. Note that other -groups may also distribute working documents as Internet-Drafts. - -Internet-Drafts are draft documents valid for a maximum of six months -and may be updated, replaced, or obsoleted by other documents at any -time. It is inappropriate to use Internet-Drafts as reference -material or to cite them other than as "work in progress." - - The list of current Internet-Drafts can be accessed at - http://www.ietf.org/ietf/1id-abstracts.txt - - The list of Internet-Draft Shadow Directories can be accessed at - http://www.ietf.org/shadow.html. - -The distribution of this document is unlimited. - -Copyright (c) The Internet Society (2000). All Rights Reserved. - -Abstract - -This document describes a transformation method for representing -Unicode character codepoints in host name parts in a fashion that is -completely compatible with the current Domain Name System. It provides -for very efficient representation of typical Unicode sequences as -host name parts, while preserving simplicity.
It is proposed as a -potential candidate for an ASCII-Compatible Encoding (ACE) for supporting -the deployment of an internationalized Domain Name System. - - -Table of Contents - -1. Introduction -1.1 Terminology -2. Hostname Part Transformation -2.1 Post-Converted Name Prefix -2.2 Radix Selection -2.3 Hostname Preparation -2.4 Definitions -2.5 DUDE Encoding -2.5.1 Extended Variable Length Hex Encoding -2.5.2 DUDE Compression Algorithm -2.5.3 Forward Transformation Algorithm -2.6 DUDE Decoding -2.6.1 Extended Variable Length Hex Decoding -2.6.2 DUDE Decompression Algorithm -2.6.3 Reverse Transformation Algorithm -3. Examples -4. Optional Case Preservation -5. Security Considerations -6. References - - -1. Introduction - -DUDE describes an encoding scheme of the ISO/IEC 10646 [ISO10646] -character set (whose character code assignments are synchronized -with Unicode [UNICODE3]), and the procedures for using this scheme -to transform host name parts containing Unicode character sequences -into sequences that are compatible with the current DNS protocol -[STD13]. As such, it satisfies the definition of a 'charset' as -defined in [IDNREQ]. - -1.1 Terminology - -The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and -"MAY" in this document are to be interpreted as described in RFC 2119 -[RFC2119]. - -Hexadecimal values are shown preceded with an "0x". For example, -"0xa1b5" indicates two octets, 0xa1 followed by 0xb5. Binary values are -shown preceded with an "0b". For example, a nine-bit value might be -shown as "0b101101111". - -Examples in this document use the notation from the Unicode Standard -[UNICODE3] as well as the ISO 10646 names. For example, the letter "a" -may be represented as either "U+0061" or "LATIN SMALL LETTER A". - -DUDE converts strings with internationalized characters into -strings of US-ASCII that are acceptable as host name parts in current -DNS host naming usage.
The former are called "pre-converted" and the -latter are called "post-converted". This specification defines both -a forward and a reverse transformation algorithm. - - -2. Hostname Part Transformation - -According to [STD13], hostname parts must start and end with a letter -or digit, and contain only letters, digits, and the hyphen character -("-"). This, of course, excludes most characters used by non-English -speakers, as well as many other characters in the ASCII -character repertoire. Further, domain name parts must be 63 octets or -shorter in length. - -2.1 Post-Converted Name Prefix - -This document defines the string 'dq--' as a prefix to identify -DUDE-encoded sequences. For the purposes of comparison in the IDN -Working Group activities, the 'dq--' prefix should be used solely to -identify DUDE sequences. However, should this document proceed beyond -draft status the prefix should be changed to whatever prefix, if any, -is the final consensus of the IDN working group. - -Note that the prepending of a fixed identifier sequence is only one -mechanism for differentiating ASCII character encoded international -domain names from 'ordinary' domain names. One method, as proposed in -[IDNRACE], is to include a character prefix or suffix that does not -appear in any name in any zone file. A second method is to insert a -domain component which pushes off any international names one or more -levels deeper into the DNS hierarchy. There are trade-offs between -these two methods which are independent of the Unicode to ASCII -transcoding method finally chosen. We do not address the international -vs. 'ordinary' name differentiation issue in this document. - -2.2 Radix Selection - -There are many proposed methods for representing Unicode characters -within the allowed target character set, which can be split into groups -on the basis of the underlying radix.
We have chosen a method with -radix 16 because both UTF-32 and ASCII are represented by even multiples -of four bits. This allows a Unicode character to be encoded as a -whole number of ASCII characters, and permits easier manipulation of -the resulting encoded data by humans. - -2.3 Hostname Preparation - -The hostname part is assumed to have at least one character disallowed -by [STD13], and to have been processed for logically equivalent -character mapping, filtering of disallowed characters (if any), and -compatibility composition/decomposition before presentation to the DUDE -conversion algorithm. - -While it is possible to invent a transcoding mechanism that relies -on certain Unicode characters being deemed illegal within domain names -and hence available to the transcoding mechanism for improving encoding -efficiency, we feel that such a proposal would complicate matters -excessively. - -2.4 Definitions - -For clarity: - - 'integer' is an unsigned binary quantity; - 'byte' is an 8-bit integer quantity; - 'nibble' is a 4-bit integer quantity. - -2.5 DUDE Encoding - -The idea behind this scheme is to provide compression by encoding the -contiguous least significant nibbles of a character that differ from the -preceding character. By using a variant of the variable length hex encoding -described in [IDNDUERST] and elsewhere, and by encoding leading zero nibbles, -this technique allows recovery of the differential length. The encoding -is, with some practice, easy to perform manually. - -2.5.1 Extended Variable Length Hex Encoding - -The variable length hex encoding algorithm was introduced by Duerst in -[IDNDUERST]. It encodes an integer value in a slight modification of -traditional hexadecimal notation, the difference being that the most -significant digit is represented with an alternate set of "digits" -- -- 'g' through 'v' are used to represent 0 through 15. The result is a -variable length encoding which can efficiently represent integers of -arbitrary length.
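The plain (non-extended) encoding can be sketched in a few lines of Python. This is a minimal illustration, not code from [IDNDUERST]; the names vlh_encode and vlh_split are hypothetical. Because the lead characters 'g' to 'v' never overlap the hex digits, numbers can be concatenated without separators and split again at each lead character.

```python
from typing import List

LEAD = "ghijklmnopqrstuv"  # most significant nibble: 'g'..'v' stand for 0..15
HEX = "0123456789abcdef"   # every following nibble: ordinary hex digits


def vlh_encode(value: int) -> str:
    """Encode a non-negative integer with the minimal number of nibbles."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = format(value, "x")  # minimal lower-case hex; 0 becomes "0"
    return LEAD[int(digits[0], 16)] + digits[1:]


def vlh_split(text: str) -> List[int]:
    """Decode a run of concatenated VLH numbers; each starts at a LEAD char."""
    values: List[int] = []
    for ch in text.lower():
        if ch in LEAD:
            values.append(LEAD.index(ch))                 # start a new number
        elif ch in HEX and values:
            values[-1] = values[-1] * 16 + HEX.index(ch)  # append one nibble
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return values


encoded = "".join(vlh_encode(v) for v in (0x645, 0, 0x4E2D))
print(encoded)  # m45gke2d
```

Note that this plain form always drops leading zero nibbles, which is exactly what the extension below relaxes.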
- -This specification extends the variable length hex encoding algorithm -to support the compression scheme defined below by potentially not -suppressing leading zero nibbles. - -The extended variable length nibble encoding of an integer, C, -to length N, is defined as follows: - - 1. Start with I, the Nth least significant nibble from the least - significant nibble of C; - - 2. Emit the Ith character of the sequence [ghijklmnopqrstuv]; - - 3. Continue from the most to least significant, encoding each - remaining nibble J by emitting the Jth character of the - sequence [0123456789abcdef]. - -2.5.2 DUDE Compression Algorithm - - 1. Let PREV = 0; - - 2. If there are no more characters in the input, terminate successfully; - - 3. Let C be the next character in the input; - - 4. If C != '-', then go to step 6; - - 5. Consume the input character, emit '-', and go to step 2; - - 6. Let D be the result of PREV exclusive ORed with C; - - 7. Find the least positive value N such that - D bitwise ANDed with M is zero - where M = the bitwise complement of (16**N) - 1; - - 8. Let V be C ANDed with the bitwise complement of M; - - 9. Extended variable length hex encode V to length N and emit the result; - - 10. Consume the input character, let PREV = C, and go to step 2. - - -2.5.3 Forward Transformation Algorithm - -The DUDE transformation algorithm accepts a string in UTF-32 -[UNICODE3] format as input. It is assumed that prior nameprep -processing has disallowed the private use code points in -0X100000 through 0X10FFFF, so that we are left with the task of -encoding 20 bit integers. The encoding algorithm is as follows: - - 1. Break the hostname string into dot-separated hostname parts. - For each hostname part which contains one or more characters - disallowed by [STD13], perform steps 2 and 3 below; - - 2. Compress the hostname part using the method described in section - 2.5.2 above, and encode using the encoding described in section - 2.5.1; - - 3.
Prepend the post-converted name prefix 'dq--' (see section 2.1 - above) to the resulting string. - - -2.6 DUDE Decoding - -2.6.1 Extended Variable Length Hex Decoding - - Decoding extended variable length hex encoded strings is identical -to standard variable length hex decoding, and is defined as -follows: - - 1. Let CL be the lower case of the first input character, - - If CL is not in set [ghijklmnopqrstuv], - return error, - else - consume the input character; - - 2. Let R = CL - 'g', - Let N = 1; - - 3. If no more input characters exist, go to step 9. - - 4. Let CL be the lower case of the next input character; - - 5. If CL is not in the set [0123456789abcdef], go to Step 9; - - 6. Consume the next input character, - Let N = N + 1; - Let R = R * 16; - - 7. If CL is in set [0123456789], - then let R = R + (CL - '0') - else let R = R + (CL - 'a') + 10; - - 8. Go to step 3; - - 9. Let MASK be the bitwise complement of (16**N) - 1; - - 10. Return decoded result R as well as MASK. - -2.6.2 DUDE Decompression Algorithm - - 1. Let PREV = 0; - - 2. If there are no more input characters then terminate successfully; - - 3. Let C be the next input character; - - 4. If C == '-', append '-' to the result string, consume the character, - and go to step 2; - - 5. Let VPART, MASK be the next extended variable length hex decoded - value and mask; - - 6. If VPART > 0xFFFFF then return error status; - - 7. Let CU = (PREV bitwise-AND MASK) + VPART, - Let PREV = CU; - - 8. Append the UTF-32 character CU to the result string; - - 9. Go to step 2. - - -2.6.3 Reverse Transformation Algorithm - - 1. Break the string into dot-separated components and apply Steps - 2 and 3 to each component; - - 2. Remove the post-converted name prefix 'dq--' (see Section 2.1); - - 3. Decompress the component using the decompression algorithm - described above (which in turn invokes the decoding algorithm - also described above); - - 4.
Concatenate the decoded segments with dot separators and return. - -3. Examples - -The examples below illustrate the encoding algorithm. Allowed RFC1035 -characters, including period [U+002E] and dash [U+002D] are shown as -literals in the UTF-16 version of the example. DUDE is compared to -LACE as proposed in [IDNLACE]. A comprehensive comparison of ACE -proposals is outside of the scope of this document. However we believe -that DUDE shows a good balance between efficiency (resulting in shorter -ACE sequences for typical names) and complexity. - - -3.1 'www.walid.com' [Arabic]: - - UTF-16: U+0645 U+0648 U+0642 U+0639 . U+0648 U+0644 U+064A U+062F . - U+0634 U+0631 U+0643 U+0629 - - DUDE: dq--m45oij9.dq--m48kqif.dq--m34hk3i9 - - LACE: bq--aqdekscche.bq--aqdeqrckf5.bq--aqddimkdfe - -3.2 'Abugazalah-Intellectual-Property.com' [Arabic]: - - UTF-16: U+0623 U+0628 U+0648 U+063A U+0632 U+0627 U+0644 U+0629 - - U+0644 U+0644 U+0645 U+0644 U+0643 U+064A U+0629 - U+0627 - U+0644 U+0641 U+0643 U+0631 U+064A U+0629 . U+0634 U+0631 - U+0643 U+0629 - - DUDE: dq--m23ok8jaii7k4i9-m44klkjqi9-m27k4hjj1kai9.dq--m34hk3i9 - - LACE: bq--badcgkcihizcorbjaeac2bygircekrcdjiuqcabna4dcorcbimyuuki. - bq--aqddimkdfe - -3.3 'King-Hussain.person.jr' [Arabic] - - UTF-16: U+0627 U+0644 U+0645 U+0644 U+0643 - U+062D U+0633 U+064A - U+0646 . U+0634 U+062E U+0635 . U+0627 U+0644 U+0623 U+0631 - U+062F U+0646 - - DUDE: dq--m27k4lkj-m2dj3kam.dq--m34iej5.dq--m27k4i3j1ifk6 - - LACE: bq--audcorcfirbqcabnaudegljtjjda.bq--amddilrv. - bq--aydcorbdgexum - -3.4 'Jordanian-Dental-Center.com.jr' [Arabic] - - UTF-16: U+0645 U+0631 U+0643 U+0632 - U+0627 U+0644 U+0623 U+0631 U+062F - U+0646 - U+0644 U+0644 U+0623 U+0633 U+0646 U+0627 U+0646 . - U+0634 U+0631 U+0643 U+0629 . U+0627 U+0644 U+0623 U+0631 U+062F - U+0646 - - DUDE: dq--m45j1k3j2-m27k4i3j1ifk6-m44ki3j3k6i7k6.dq--m34hk3i9. - dq--m27k4i3j1ifk6 - - LACE: bq--aqdekmkdgiaqaligaytuiizrf5dacabna4deirbdgndcorq. 
- bq--aqddimkdfe.bq--aydcorbdgexum - -3.5 'Mahindra.com' [Hindi]: - - UTF-16: U+092E U+0939 U+093F U+0928 U+094D U+0926 U+094D U+0930 - U+093E . U+0935 U+094D U+092F U+093E U+092A U+093E U+0930 - - DUDE: dq--p2ej9vi8kdi6kdj0u.dq--p35kdifjeiajeg - - LACE: bq--bees4oj7fbgsmtjqhy.bq--a4etktjphyvd4ma - -3.6 'Webdunia.com' [Hindi]: - - UTF-16: U+0935 U+0947 U+092C U+0926 U+0941 U+0928 U+093F U+092F - U+093E . U+0935 U+094D U+092F U+093E U+092A U+093E U+0930 - - DUDE: dq--p35k7icmk1i8jfifje.dq--p35kdifjeiajeg - - LACE: bq--beetkrzmezasqpzphy.bq--a4etktjphyvd4ma - -3.7 'Chinese Finance.com' [Traditional Chinese] - - UTF-16: U+4E2D U+83EF U+8CA1 U+7D93 . c o m - - DUDE: dq--ke2do3efsa1nd93.com - - LACE: bq--75hc3a7prsqx3ey.com - -3.8 'Chinese Readers.net' [Chinese] - - UTF-16: U+842C U+7DAD U+8B80 U+8005 . U+7DB2 U+7D61 - - DUDE: dq--o42cndadob80g05.dq--ndb2m1 - - LACE: bq--76ccy7nnroaiabi.bq--aj63eyi - -3.9 'Russian-Standard.com.ru' [Russian] - - UTF-16: U+0440 U+0443 U+0441 U+0441 U+043A U+0438 U+0439 - - U+0441 U+0442 U+0430 U+043D U+0434 U+0430 U+0440 U+0442 . - U+043A U+043E U+043C . U+0440 U+0444 - - DUDE: dq--k40jhhjaop-k3ausk1ij0tkgk0i.dq--k3aus.dq--k40k - - LACE: bq--a4ceaq2bie5dqoibaawqqbcbiiyd2nbqibba.bq--amcdupr4. - bq--aiceara - -3.10 'Vladimir-Putin.person.ru' [Russian] - - UTF-16: U+0432 U+043B U+0430 U+0434 U+0438 U+043C U+0438 U+0440 - - U+043F U+0443 U+0442 U+0438 U+043D . U+043B U+0438 U+0447 - U+043D U+043E U+0441 U+0442 U+044C . U+0440 U+0444 U+0020 - - DUDE: dq--k32rgkosok0-k3fk3ij8t.dq--k3bok7jduk1is.dq--k40k - - LACE: bq--bacdeozqgq4dyocaaeac2bieh5bueob5. - bq--bacdwochhu7ecqsm.bq--aiceara - - -4. Optional Case Preservation - -An extension to the DUDE concept recognizes that the first -character emitted by the variable length hex encoding algorithm is -always alphabetic. We encode the case (if any) of the original Unicode -character in the case of the initial "hex" character. 
Because the DNS -performs case-insensitive comparisons, mixed case international domain -names behave in exactly the same way as traditional domain names. -In particular, this enables reverse lookups to return names in the -preferred case. - -In contrast to other proposals as of this writing, such a case preserving -version of DUDE will interoperate with the non case preserving version. - -Despite the foregoing, we feel that the additional complexity of tracking -character case through the nameprep processing is not warranted by the -marginal utility of the result. - -5. Security Considerations - -Much of the security of the Internet relies on the DNS and any -change to the characteristics of the DNS may change the security of -much of the Internet. Therefore DUDE makes no changes to the DNS itself. - -DUDE is designed so that distinct Unicode sequences map to distinct -domain name sequences (modulo the Unicode and DNS equivalence rules). -Therefore use of DUDE with DNS will not negatively affect security below -the application level. - -If an application has security reliance on the Unicode string S, produced -by an inverse ACE transformation of a name T, the application must verify -that the nameprepped and ACE encoded result of S is DNS-equivalent to T. - -6. Change History - -The statement that we intended to submit a Nameprep draft was removed in -light of the changes made between the frist and second nameprep drafts. - -The details of DUDE extensions for case preservation etc. have been -removed. Basic DUDE was changed to operate over the relevant 20 bit -UTF32 code points. - -Examples have been extended. - -ACE security issues were clarified. - -7. 
References - -[IDNCOMP] Paul Hoffman, "Comparison of Internationalized Domain Name -Proposals", draft-ietf-idn-compare; - -[IDNrACE] Paul Hoffman, "RACE: Row-Based ASCII Compatible Encoding for -IDN", draft-ietf-idn-race; - -[IDNLACE] Mark Davis, "LACE: Length-Based ASCII Compatible Encoding for -IDN", draft-ietf-idn-lace; - -[IDNREQ] James Seng, "Requirements of Internationalized Domain Names", -draft-ietf-idn-requirement; - -[IDNNAMEPREP] Paul Hoffman and Marc Blanchet, "Preparation of -Internationalized Host Names", draft-ietf-idn-nameprep; - -[IDNDUERST] M. Duerst, "Internationalization of Domain Names", -draft-duerst-dns-i18n; - -[ISO10646] ISO/IEC 10646-1:1993. International Standard -- Information -technology -- Universal Multiple-Octet Coded Character Set (UCS) -- -Part 1: Architecture and Basic Multilingual Plane. Five amendments and -a technical corrigendum have been published up to now. UTF-16 is -described in Annex Q, published as Amendment 1. 17 other amendments are -currently at various stages of standardization; - -[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate -Requirement Levels", March 1997, RFC 2119; - -[STD13] Paul Mockapetris, "Domain names - implementation and -specification", November 1987, STD 13 (RFC 1035); - -[UNICODE3] The Unicode Consortium, "The Unicode Standard -- Version -3.0", ISBN 0-201-61633-5. Described at -. - - -A. Acknowledgements - -The structure (and some of the structural text) of this document is -intentionally borrowed from the LACE IDN draft (draft-ietf-idn-lace-00) -by Mark Davis and Paul Hoffman. - -B. IANA Considerations - -There are no IANA considerations in this document. - - -C. Author Contact Information - -Mark Welter -Brian W. Spolarich -WALID, Inc. -State Technology Park -2245 S. State St. -Ann Arbor, MI 48104 -+1-734-822-2020 - -mwelter@walid.com -briansp@walid.com - -D. 
DUDE C++ Implementation - -#include -#include -#include -#include - -#define IDN_ERROR INT_MIN - -#define DUDETAG "dq--" - -typedef unsigned int uchar_t; - -bool idn_isRFC1035(const uchar_t * in, int len) -{ - const uchar_t * end = in + len; - - while (in < end) - { - if ((*in > 127) || - !strchr("abcdefghijklmnopqrstuvwxyz0123456789-.", tolower(*in))) - return false; - in++; - } - return true; -} - -static const char *hexchar = "0123456789abcdef"; -static const char *leadchar = "ghijklmnopqrstuv"; - -/* - dudehex -- convert an integer, v, into n DUDE hex characters. - The result is placed in ostr. The buffer ends at the byte before - eop, and false is returned to indicate insufficient buffer space. -*/ -static bool dudehex(char * & ostr, const char * eop, - unsigned int v, int n) -{ - if ((ostr + n) >= eop) - return false; - - n--; // convert to zero origin - - *ostr++ = leadchar[(v >> (n << 2)) & 0x0F]; - - while (n > 0) - { - n--; - *ostr++ = hexchar[(v >> (n << 2)) & 0x0F]; - } - return true; -} - -/* - idn_dudeseg converts istr, a utf-32 domain name segment into DUDE. - eip points at the character after the input segment. - ostr points at an output buffer which ends just before eop. - If there is insufficient buffer space, the function return is false. - Invalid surrogate sequences will also cause a return of false. -*/ -static bool idn_dudeseg(const uchar_t * istr, const uchar_t * eip, - char * & ostr, char * eop) -{ - const uchar_t * ip = istr; - unsigned p = 0; - - while (ip < eip) - { - if (*ip == '-') - *ostr++ = *ip; - else // if (validnc(*ip)) - { - unsigned int c = *ip; - - unsigned d = p ^ c; // d now has the difference (xor) - // between the current and previous char - - int n = 1; // Count the number of significant nibbles - while (d >>= 4) - n++; - - dudehex(ostr, eop, c, n); - p = c; - } - ip++; - } - *ostr = 0; - return true; -} - - -/* - idn_UTF32toDUDE converts a UTF-32 domain name into DUDE. 
- in, a UTF-32 vector of length inlen is the input domain name. - outstr is a char output buffer of length outmax. - On success, the number of output characters is returned. - On failure, a negative number is returned. - - It is assumed that the input has been nameprepped. - - If this routine is used in a registration context, segment and - overall length restrictions must be checked by the user. -*/ - -int idn_UTF32toDUDE(const uchar_t * in, int inlen, char *outstr, int outmax) -{ - const uchar_t *ip = in; - const uchar_t *eip = in + inlen; - const uchar_t *ep = ip; - char *op = outstr; - char *eop = outstr + outmax - 1; - - while (ip < eip) - { - ep = ip; - while ((ep < eip) && (*ep != '.')) - ep++; - - const char * tagp = DUDETAG; // prefix the segment - while (*tagp) // with the tag (dq--) - { - if (op >= eop) - { - *outstr = '\0'; - return IDN_ERROR; - } - *op++ = *tagp++; - } - - if (idn_isRFC1035(ip, ep - ip)) - { - if ((ep - ip) >= (eop - op)) - { - *outstr = '\0'; - return IDN_ERROR; - } - while (ip < ep) - *op++ = *ip++; - } - else - { - if (!idn_dudeseg(ip, ep, op, eop)) - { - *outstr = '\0'; - return IDN_ERROR; - } - } - - if (op >= eop) // check for output buffer overflow - { - *outstr = '\0'; - return IDN_ERROR; - } - if (ep < eip) - *op++ = *ep; // copy '.' - - ip = ep + 1; - } - - *op = '\0'; - - return (op - outstr) - 1; -} - -/* - idn_DUDEsegtoUTF32 converts instr, DUDE encoded domain name segment - into UTF32. - eip points at the character after the input segment. - ostr points at an output buffer which ends just before eop. - If there is insufficient buffer space, the function return is false. 
-*/ -static int idn_DUDEsegtoUTF32(const char * instr, int inlen, - uchar_t * outstr, int maxlen) -{ - const char * ip = instr; - const char * eip = instr + inlen; - uchar_t * op = outstr; - uchar_t * eop = op + maxlen - 1; - - unsigned prev = 0; - - while (ip < eip) - { - if (*ip == '-') - *op++ = '-'; - else - { - char c0 = tolower(*ip); - if ((c0 < 'g') || (c0 > 'v')) - return false; - - ip++; - - unsigned r = c0 - 'g'; - int n = 1; - while (ip < eip) - { - char cl = tolower(*ip); - if ((cl >= '0') && (cl <= '9')) - { - r <<= 4; - r += cl - '0'; - } - else if ((cl >= 'a') && (cl <= 'f')) - { - r <<= 4; - r += (cl - 'a') + 10; - } - else - break; - - ip++; - n++; - } - - if (r >= 0x0fffff) - { - return false; - } - unsigned mask = -1 << (n << 2); - - unsigned cu = (prev & mask) + r; - prev = cu; - - if (op >= eop) - return IDN_ERROR; - *op++ = cu; - } - } - *op = '\0'; - return (op - outstr); -} - -int idn_DUDEtoUTF32(const char * in, int inlen, uchar_t * outstr, int outmax) -{ - const char *ip = in; - const char *eip = in + inlen; - const char *ep = ip; - uchar_t *op = outstr; - uchar_t *eop = outstr + outmax - 1; - - while (ip < eip) - { - ep = ip; - while ((ep < eip) && (*ep != L'.')) - ep++; - - const char * tip = ip; - const char * tagp = DUDETAG; - while (*tagp && (tip < ep) && (tolower(*tagp) == tolower(*tip))) - { - tip++; - tagp++; - } - - if (*tagp) - { // tag doesn't match, copy segment verbatim - while (ip < ep) - { - if (op >= eop) - return IDN_ERROR; - *op++ = *ip++; - } - } - else - { - ip = tip; - int rv = idn_DUDEsegtoUTF32(ip, ep - ip, op, eop - op); - - if (rv < 0) - return IDN_ERROR; - - op += rv; - } - - *op++ = *ep; - - if (!*ep) - break; - - ip = ep + 1; - } - - if (op >= eop) - return IDN_ERROR; - - *op = '\0'; - - return (op - outstr) - 1; -} - -/* - DUDE test driver -*/ - -void printres(char *title, int rv, char *buff); -void printres(char *title, int rv, uchar_t *buff); - -int main(int argc, char *argv[]) -{ - char inbuff[512]; - - 
while (fgets(inbuff, sizeof(inbuff), stdin)) - { - char cbuff[128]; - uchar_t wbuff[128]; - uchar_t iwbuff[128]; - uchar_t *wsp = wbuff; - uchar_t wc; - int in; - int nr; - - char * inp = inbuff; - wsp = wbuff; - while (sscanf(inp, "%x%n", &in, &nr) > 0) - { - inp += nr; - *wsp++ = in; - } - fprintf(stdout, "\n"); - - int rv; - rv = idn_UTF32toDUDE(wbuff, wsp - wbuff, cbuff, sizeof(cbuff)); - printres("toDUDE", rv, cbuff); - - if (rv >= 0) - { - rv = idn_DUDEtoUTF32(cbuff, rv, iwbuff, sizeof(iwbuff)); - printres("toUTF32", rv, iwbuff); - } - - } - return 0; -} - -void printres(char *title, int rv, char *buff) -{ - fprintf(stdout, "%s (%d) : ", title, rv); - if (rv >= 0) - { - unsigned char *dp = (unsigned char *) buff; - while (*dp) - { - fprintf(stdout, "%c", *dp++); - } - } - fprintf(stdout, "\n"); -} - -void printres(char *title, int rv, uchar_t *buff) -{ - fprintf(stdout, "%s (%d) : ", title, rv); - if (rv >= 0) - { - uchar_t *dp = buff; - while (*dp) - { - fprintf(stdout, " %05x", *dp++); - } - } - fprintf(stdout, "\n"); -} diff --git a/doc/draft/draft-ietf-idn-dude-02.txt b/doc/draft/draft-ietf-idn-dude-02.txt new file mode 100644 index 0000000000..3af28936c4 --- /dev/null +++ b/doc/draft/draft-ietf-idn-dude-02.txt @@ -0,0 +1,864 @@ +INTERNET-DRAFT Mark Welter +draft-ietf-idn-dude-02.txt Brian W. Spolarich +Expires 2001-Dec-07 Adam M. Costello + 2001-Jun-07 + + Differential Unicode Domain Encoding (DUDE) + +Status of this Memo + + This document is an Internet-Draft and is in full conformance with + all provisions of Section 10 of RFC2026. + + Internet-Drafts are working documents of the Internet Engineering + Task Force (IETF), its areas, and its working groups. Note + that other groups may also distribute working documents as + Internet-Drafts. + + Internet-Drafts are draft documents valid for a maximum of six + months and may be updated, replaced, or obsoleted by other documents + at any time. 
It is inappropriate to use Internet-Drafts as + reference material or to cite them other than as "work in progress." + + The list of current Internet-Drafts can be accessed at + http://www.ietf.org/ietf/1id-abstracts.txt + + The list of Internet-Draft Shadow Directories can be accessed at + http://www.ietf.org/shadow.html + + Distribution of this document is unlimited. Please send comments to + the authors or to the idn working group at idn@ops.ietf.org. + +Abstract + + DUDE is a reversible transformation from a sequence of nonnegative + integer values to a sequence of letters, digits, and hyphens (LDH + characters). DUDE provides a simple and efficient ASCII-Compatible + Encoding (ACE) of Unicode strings [UNICODE] for use with + Internationalized Domain Names [IDN] [IDNA]. + +Contents + + 1. Introduction + 2. Terminology + 3. Overview + 4. Base-32 characters + 5. Encoding procedure + 6. Decoding procedure + 7. Example strings + 8. Security considerations + 9. References + A. Acknowledgements + B. Author contact information + C. Mixed-case annotation + D. Differences from draft-ietf-idn-dude-01 + E. Example implementation + +1. Introduction + + The IDNA draft [IDNA] describes an architecture for supporting + internationalized domain names. Each label of a domain name may + begin with a special prefix, in which case the remainder of the + label is an ASCII-Compatible Encoding (ACE) of a Unicode string + satisfying certain constraints. For the details of the constraints, + see [IDNA] and [NAMEPREP]. The prefix has not yet been specified, + but see http://www.i-d-n.net/ for prefixes to be used for testing + and experimentation. + + DUDE is intended to be used as an ACE within IDNA, and has been + designed to have the following features: + + * Completeness: Every sequence of nonnegative integers maps to an + LDH string. Restrictions on which integers are allowed, and on + sequence length, may be imposed by higher layers. 
+ + * Uniqueness: Every sequence of nonnegative integers maps to at + most one LDH string. + + * Reversibility: Any Unicode string mapped to an LDH string can + be recovered from that LDH string. + + * Efficient encoding: The ratio of encoded size to original size + is small. This is important in the context of domain names + because [RFC1034] restricts the length of a domain label to 63 + characters. + + * Simplicity: The encoding and decoding algorithms are reasonably + simple to implement. The goals of efficiency and simplicity are + at odds; DUDE places greater emphasis on simplicity. + + An optional feature is described in appendix C "Mixed-case + annotation". + +2. Terminology + + The key words "must", "shall", "required", "should", "recommended", + and "may" in this document are to be interpreted as described in + RFC 2119 [RFC2119]. + + LDH characters are the letters A-Z and a-z, the digits 0-9, and + hyphen-minus. + + A quartet is a sequence of four bits (also known as a nibble or + nybble). + + A quintet is a sequence of five bits. + + Hexadecimal values are shown preceded by "0x". For example, 0x60 + is decimal 96. + + As in the Unicode Standard [UNICODE], Unicode code points are + denoted by "U+" followed by four to six hexadecimal digits, while a + range of code points is denoted by two hexadecimal numbers separated + by "..", with no prefixes. + + XOR means bitwise exclusive or. Given two nonnegative integer + values A and B, A XOR B is the nonnegative integer value whose + binary representation is 1 in whichever places the binary + representations of A and B disagree, and 0 wherever they agree. + For the purpose of applying this rule, recall that an integer's + representation begins with an infinite number of unwritten zeros. + In some programming languages, care may need to be taken that A and + B are stored in variables of the same type and size. + +3.
Overview + + DUDE encodes a sequence of nonnegative integral values as a sequence + of LDH characters, although implementations will of course need to + represent the output characters somehow, typically as ASCII octets. + When DUDE is used to encode Unicode characters, the input values are + Unicode code points (integral values in the range 0..10FFFF, but not + D800..DFFF, which are reserved for use by UTF-16). + + Each value in the input sequence is represented by one or more LDH + characters in the encoded string. The value 0x2D is represented + by hyphen-minus (U+002D). Each non-hyphen-minus character in + the encoded string represents a quintet. A sequence of quintets + represents the bitwise XOR between each non-0x2D integer and the + previous one. + +4. Base-32 characters + + "a" = 0 = 0x00 = 00000 "s" = 16 = 0x10 = 10000 + "b" = 1 = 0x01 = 00001 "t" = 17 = 0x11 = 10001 + "c" = 2 = 0x02 = 00010 "u" = 18 = 0x12 = 10010 + "d" = 3 = 0x03 = 00011 "v" = 19 = 0x13 = 10011 + "e" = 4 = 0x04 = 00100 "w" = 20 = 0x14 = 10100 + "f" = 5 = 0x05 = 00101 "x" = 21 = 0x15 = 10101 + "g" = 6 = 0x06 = 00110 "y" = 22 = 0x16 = 10110 + "h" = 7 = 0x07 = 00111 "z" = 23 = 0x17 = 10111 + "i" = 8 = 0x08 = 01000 "2" = 24 = 0x18 = 11000 + "j" = 9 = 0x09 = 01001 "3" = 25 = 0x19 = 11001 + "k" = 10 = 0x0A = 01010 "4" = 26 = 0x1A = 11010 + "m" = 11 = 0x0B = 01011 "5" = 27 = 0x1B = 11011 + "n" = 12 = 0x0C = 01100 "6" = 28 = 0x1C = 11100 + "p" = 13 = 0x0D = 01101 "7" = 29 = 0x1D = 11101 + "q" = 14 = 0x0E = 01110 "8" = 30 = 0x1E = 11110 + "r" = 15 = 0x0F = 01111 "9" = 31 = 0x1F = 11111 + + The digits "0" and "1" and the letters "o" and "l" are not used, to + avoid transcription errors. + + A decoder must accept both the uppercase and lowercase forms of + the base-32 characters (including mixtures of both forms). An + encoder should output only lowercase forms or only uppercase forms + (unless it uses the feature described in the appendix C "Mixed-case + annotation"). + +5. 
Encoding procedure + + All ordering of bits, quartets, and quintets is big-endian (most + significant first). + + let prev = 0x60 + for each input integer n (in order) do begin + if n == 0x2D then output hyphen-minus + else begin + let diff = prev XOR n + represent diff in base 16 as a sequence of quartets, + as few as are sufficient (but at least one) + prepend 0 to the last quartet and 1 to each of the others + output a base-32 character corresponding to each quintet + let prev = n + end + end + + If an encoder encounters an input value larger than expected (for + example, the largest Unicode code point is U+10FFFF, and nameprep + [NAMEPREP03] can never output a code point larger than U+EFFFD), + the encoder may either encode the value correctly, or may fail, but + it must not produce incorrect output. The encoder must fail if it + encounters a negative input value. + +6. Decoding procedure + + let prev = 0x60 + while the input string is not exhausted do begin + if the next character is hyphen-minus + then consume it and output 0x2D + else begin + consume characters and convert them to quintets until + encountering a quintet whose first bit is 0 + fail upon encountering a non-base-32 character or end-of-input + strip the first bit of each quintet + concatenate the resulting quartets to form diff + let prev = prev XOR diff + output prev + end + end + encode the output sequence and compare it to the input string + fail if they do not match (case-insensitively) + + The comparison at the end is necessary to guarantee the uniqueness + property (there cannot be two distinct encoded strings representing + the same sequence of integers). This check also frees the decoder + from having to check for overflow while decoding the base-32 + characters. (If the decoder is one step of a larger decoding + process, it may be possible to defer the re-encoding and comparison + to the end of that larger decoding process.) + +7. 
Example strings + + The first several examples are nonsense strings of mostly unassigned + code points intended to exercise the corner cases of the algorithm. + + (A) u+0061 + DUDE: b + + (B) u+2C7EF u+2C7EF + DUDE: u6z2ra + + (C) u+1752B u+1752A + DUDE: tzxwmb + + (D) u+63AB1 u+63ABA + DUDE: yv47bm + + (E) u+261AF u+261BF + DUDE: uyt6rta + + (F) u+C3A31 u+C3A8C + DUDE: 6v4xb5p + + (G) u+09F44 u+0954C + DUDE: 39ue4si + + (H) u+8D1A3 u+8C8A3 + DUDE: 27t6dt3sa + + (I) u+6C2B6 u+CC266 + DUDE: y6u7g4ss7a + + (J) u+002D u+002D u+002D u+E848F + DUDE: ---82w8r + + (K) u+BD08E u+002D u+002D u+002D + DUDE: 57s8q--- + + (L) u+A9A24 u+002D u+002D u+002D u+C05B7 + DUDE: 434we---y393d + + (M) u+7FFFFFFF + DUDE: z999993r or explicit failure + + The next several examples are realistic Unicode strings that could + be used in domain names. They exhibit single-row text, two-row + text, ideographic text, and mixtures thereof. These examples are + names of Japanese television programs, music artists, and songs, + merely because one of the authors happened to have them handy. + + (N) 3b (Latin, kanji) + u+0033 u+5E74 u+0062 u+7D44 u+91D1 u+516B u+5148 u+751F + DUDE: xdx8whx8tgz7ug863f6s5kuduwxh + + (O) -with-super-monkeys (Latin, kanji, hyphens) + u+5B89 u+5BA4 u+5948 u+7F8E u+6075 u+002D u+0077 u+0069 u+0074 + u+0068 u+002D u+0073 u+0075 u+0070 u+0065 u+0072 u+002D u+006D + u+006F u+006E u+006B u+0065 u+0079 u+0073 + DUDE: x58jupu8nuy6gt99m-yssctqtptn-tmgftfth-trcbfqtnk + + (P) majikoi5 (Latin, hiragana, kanji) + u+006D u+0061 u+006A u+0069 u+3067 u+006B u+006F u+0069 u+3059 + u+308B u+0035 u+79D2 u+524D + DUDE: pnmdvssqvssnegvsva7cvs5qz38hu53r + + (Q) de (Latin, katakana) + u+30D1 u+30D5 u+30A3 u+30FC u+0064 u+0065 u+30EB u+30F3 u+30D0 + DUDE: vs5bezgxrvs3ibvs2qtiud + + (R) (hiragana, katakana) + u+305D u+306E u+30B9 u+30D4 u+30FC u+30C9 u+3067 + DUDE: vsvpvd7hypuivf4q + +8. Security considerations + + Users expect each domain name in DNS to be controlled by a single + authority. 
If a Unicode string intended for use as a domain label + could map to multiple ACE labels, then an internationalized domain + name could map to multiple ACE domain names, each controlled by + a different authority, some of which could be spoofs that hijack + service requests intended for another. Therefore DUDE is designed + so that each Unicode string has a unique encoding. + + However, there can still be multiple Unicode representations of the + "same" text, for various definitions of "same". This problem is + addressed to some extent by the Unicode standard under the topic of + canonicalization, and this work is leveraged for domain names by + "nameprep" [NAMEPREP03]. + +9. References + + [IDN] Internationalized Domain Names (IETF working group), + http://www.i-d-n.net/, idn@ops.ietf.org. + + [IDNA] Patrik Faltstrom, Paul Hoffman, "Internationalizing Host + Names In Applications (IDNA)", draft-ietf-idn-idna-01. + + [NAMEPREP03] Paul Hoffman, Marc Blanchet, "Preparation + of Internationalized Host Names", 2001-Feb-24, + draft-ietf-idn-nameprep-03. + + [RFC952] K. Harrenstien, M. Stahl, E. Feinler, "DOD Internet Host + Table Specification", 1985-Oct, RFC 952. + + [RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities", + 1987-Nov, RFC 1034. + + [RFC1123] Internet Engineering Task Force, R. Braden (editor), + "Requirements for Internet Hosts -- Application and Support", + 1989-Oct, RFC 1123. + + [RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate + Requirement Levels", 1997-Mar, RFC 2119. + + [SFS] David Mazieres et al, "Self-certifying File System", + http://www.fs.net/. + + [UNICODE] The Unicode Consortium, "The Unicode Standard", + http://www.unicode.org/unicode/standard/standard.html. + +A. Acknowledgements + + The basic encoding of integers to quartets to quintets to base-32 + comes from earlier IETF work by Martin Duerst. DUDE uses a slight + variation on the idea. + + Paul Hoffman provided helpful comments on this document. 
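
The quartet-to-quintet scheme credited above is what sections 5 and 6 of draft-ietf-idn-dude-02 specify. As a rough, non-authoritative illustration, the C sketch below follows those two procedures under stated assumptions: an ASCII execution character set, code points held in unsigned long, and no support for the mixed-case annotation of appendix C. The names dude_encode_sketch, dude_decode_sketch, dude_b32_value and dude_lower are hypothetical; they are not the dude_encode()/dude_decode() interface of the draft's dude.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Base-32 alphabet of section 4 ("0", "1", "o" and "l" are omitted).
   Assumes an ASCII execution character set (dude.c in appendix E
   deliberately avoids this assumption). */
static const char dude_b32[] = "abcdefghijkmnpqrstuvwxyz23456789";

/* Hypothetical helper: DUDE-encodes n code points (section 5) into out as
   a NUL-terminated lowercase string. Returns the length, or -1 if out_size
   is too small. */
static int dude_encode_sketch(const unsigned long *in, size_t n,
                              char *out, size_t out_size)
{
    unsigned long prev = 0x60, diff, tmp;
    size_t i, o = 0;
    int k, j;

    assert(out != NULL && out_size > 0);
    for (i = 0; i < n; ++i) {
        if (in[i] == 0x2D) {              /* hyphen-minus stands for itself */
            if (o + 1 >= out_size) return -1;
            out[o++] = '-';
            continue;                     /* prev is left unchanged */
        }
        diff = prev ^ in[i];
        for (k = 1, tmp = diff >> 4; tmp != 0; ++k, tmp >>= 4)
            ;                             /* k = quartets needed, at least 1 */
        if (o + (size_t)k >= out_size) return -1;
        for (j = k - 1; j >= 0; --j) {    /* most significant quartet first */
            unsigned q = (unsigned)((diff >> (4 * j)) & 0xF);
            /* leading 1 bit on all but the last quintet, 0 on the last */
            out[o++] = dude_b32[j > 0 ? (q | 0x10) : q];
        }
        prev = in[i];
    }
    out[o] = '\0';
    return (int)o;
}

/* Hypothetical helper: lowercases ASCII A-Z only. */
static char dude_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Hypothetical helper: value 0..31 of a base-32 character of either
   case, or -1 if c is not a base-32 character. */
static int dude_b32_value(char c)
{
    const char *p;
    if (c == '\0') return -1;             /* strchr would match the NUL */
    p = strchr(dude_b32, dude_lower(c));
    return p ? (int)(p - dude_b32) : -1;
}

/* Hypothetical helper: decodes s (section 6) into at most max code points.
   Returns the count, or -1 if s is malformed or does not fit. */
static long dude_decode_sketch(const char *s, unsigned long *out, size_t max)
{
    unsigned long prev = 0x60, diff;
    char check[256];
    size_t i = 0, n = 0;
    int q;

    if (strlen(s) >= sizeof check) return -1;
    while (s[i] != '\0') {
        if (n >= max) return -1;
        if (s[i] == '-') {
            out[n++] = 0x2D;
            ++i;
            continue;
        }
        diff = 0;
        do {                              /* read quintets until first bit is 0 */
            q = dude_b32_value(s[i]);
            if (q < 0) return -1;         /* bad character or end of input */
            ++i;
            diff = (diff << 4) | (unsigned long)(q & 0xF);
        } while (q & 0x10);
        prev ^= diff;
        out[n++] = prev;
    }
    /* Re-encode and compare case-insensitively: this enforces uniqueness
       and catches silent overflow of diff, as section 6 requires. */
    if (dude_encode_sketch(out, n, check, sizeof check) < 0) return -1;
    for (i = 0; check[i] != '\0' || s[i] != '\0'; ++i)
        if (dude_lower(s[i]) != check[i]) return -1;
    return (long)n;
}
```

With example (B) of section 7, encoding U+2C7EF U+2C7EF should give "u6z2ra", and decoding "434WE---y393d" should recover example (L) despite the mixed case. Decoding "sb" fails: it decodes to U+0061, whose canonical encoding is "b", so the final comparison rejects the redundant leading zero quartet.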
+ + The idea of avoiding 0, 1, o, and l in base-32 strings was taken + from SFS [SFS]. + +B. Author contact information + + Mark Welter + Brian W. Spolarich + WALID, Inc. + State Technology Park + 2245 S. State St. + Ann Arbor, MI 48104 + +1 734 822 2020 + + Adam M. Costello + University of California, Berkeley + http://www.cs.berkeley.edu/~amc/ + +C. Mixed-case annotation + + In order to use DUDE to represent case-insensitive Unicode strings, + higher layers need to case-fold the Unicode strings prior to DUDE + encoding. The encoded string can, however, use mixed-case base-32 + (rather than all-lowercase or all-uppercase as recommended in + section 4 "Base-32 characters") as an annotation telling how to + convert the folded Unicode string into a mixed-case Unicode string + for display purposes. + + Each Unicode code point (unless it is U+002D hyphen-minus) is + represented by a sequence of base-32 characters, the last of which + is always a letter (as opposed to a digit). If that letter is + uppercase, it is a suggestion that the Unicode character be mapped + to uppercase (if possible); if the letter is lowercase, it is a + suggestion that the Unicode character be mapped to lowercase (if + possible). + + DUDE encoders and decoders are not required to support these + annotations, and higher layers need not use them. + + Example: In order to suggest that example O in section 7 "Example + strings" be displayed as: + + -with-SUPER-MONKEYS + + one could capitalize the DUDE encoding as: + + x58jupu8nuy6gt99m-yssctqtptn-tMGFtFtH-tRCBFQtNK + +D. Differences from draft-ietf-idn-dude-01 + + Four changes have been made since draft-ietf-idn-dude-01 (DUDE-01): + + 1) DUDE-01 computed the XOR of each integer with the previous one + in order to decide how many bits of each integer to encode, but + now the XOR itself is encoded, so there is no need for a mask. 
+ + 2) DUDE-01 made the first quintet of each sequence different from + the rest, while now it is the last quintet that differs, so it's + easier for the decoder to detect the end of the sequence. + + 3) The base-32 map has changed to avoid 0, 1, o, and l, to help + humans avoid transcription errors. + + 4) The initial value of the previous code point has changed from 0 + to 0x60, making the encodings of a few domain names shorter and + none longer. + + +E. Example implementation + + + +/******************************************/ +/* dude.c 0.2.3 (2001-May-31-Thu) */ +/* Adam M. Costello */ +/******************************************/ + +/* This is ANSI C code (C89) implementing */ +/* DUDE (draft-ietf-idn-dude-02). */ + + +/************************************************************/ +/* Public interface (would normally go in its own .h file): */ + +#include + +enum dude_status { + dude_success, + dude_bad_input, + dude_big_output /* Output would exceed the space provided. */ +}; + +enum case_sensitivity { case_sensitive, case_insensitive }; + +#if UINT_MAX >= 0x1FFFFF +typedef unsigned int u_code_point; +#else +typedef unsigned long u_code_point; +#endif + +enum dude_status dude_encode( + unsigned int input_length, + const u_code_point input[], + const unsigned char uppercase_flags[], + unsigned int *output_size, + char output[] ); + + /* dude_encode() converts Unicode to DUDE (without any */ + /* signature). The input must be represented as an array */ + /* of Unicode code points (not code units; surrogate pairs */ + /* are not allowed), and the output will be represented as */ + /* null-terminated ASCII. The input_length is the number of code */ + /* points in the input. 
The output_size is an in/out argument: */ + /* the caller must pass in the maximum number of characters */ + /* that may be output (including the terminating null), and on */ + /* successful return it will contain the number of characters */ + /* actually output (including the terminating null, so it will be */ + /* one more than strlen() would return, which is why it is called */ + /* output_size rather than output_length). The uppercase_flags */ + /* array must hold input_length boolean values, where nonzero */ + /* means the corresponding Unicode character should be forced */ + /* to uppercase after being decoded, and zero means it is */ + /* caseless or should be forced to lowercase. Alternatively, */ + /* uppercase_flags may be a null pointer, which is equivalent */ + /* to all zeros. The encoder always outputs lowercase base-32 */ + /* characters except when nonzero values of uppercase_flags */ + /* require otherwise. The return value may be any of the */ + /* dude_status values defined above; if not dude_success, then */ + /* output_size and output may contain garbage. On success, the */ + /* encoder will never need to write an output_size greater than */ + /* input_length*k+1 if all the input code points are less than 1 */ + /* << (4*k), because of how the encoding is defined. */ + +enum dude_status dude_decode( + enum case_sensitivity case_sensitivity, + char scratch_space[], + const char input[], + unsigned int *output_length, + u_code_point output[], + unsigned char uppercase_flags[] ); + + /* dude_decode() converts DUDE (without any signature) to */ + /* Unicode. The input must be represented as null-terminated */ + /* ASCII, and the output will be represented as an array of */ + /* Unicode code points. The case_sensitivity argument influences */ + /* the check on the well-formedness of the input string; it */ + /* must be case_sensitive if case-sensitive comparisons are */ + /* allowed on encoded strings, case_insensitive otherwise. 
*/
+  /* The scratch_space must point to space at least as large */
+  /* as the input, which will get overwritten (this allows the */
+  /* decoder to avoid calling malloc()). The output_length is */
+  /* an in/out argument: the caller must pass in the maximum */
+  /* number of code points that may be output, and on successful */
+  /* return it will contain the actual number of code points */
+  /* output. The uppercase_flags array must have room for at */
+  /* least output_length values, or it may be a null pointer if */
+  /* the case information is not needed. A nonzero flag indicates */
+  /* that the corresponding Unicode character should be forced to */
+  /* uppercase by the caller, while zero means it is caseless or */
+  /* should be forced to lowercase. The return value may be any */
+  /* of the dude_status values defined above; if not dude_success, */
+  /* then output_length, output, and uppercase_flags may contain */
+  /* garbage. On success, the decoder will never need to write */
+  /* an output_length greater than the length of the input (not */
+  /* counting the null terminator), because of how the encoding is */
+  /* defined. */
+
+
+/**********************************************************/
+/* Implementation (would normally go in its own .c file): */
+
+#include <string.h>
+
+/* Character utilities: */
+
+/* base32[q] is the lowercase base-32 character representing */
+/* the number q from the range 0 to 31. Note that we cannot */
+/* use string literals for ASCII characters because an ANSI C */
+/* compiler does not necessarily use ASCII. */
+
+static const char base32[] = {
+  97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, /* a-k */
+  109, 110, /* m-n */
+  112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, /* p-z */
+  50, 51, 52, 53, 54, 55, 56, 57 /* 2-9 */
+};
+
+/* base32_decode(c) returns the value of a base-32 character, in the */
+/* range 0 to 31, or the constant base32_invalid if c is not a valid */
+/* base-32 character.
*/
+
+enum { base32_invalid = 32 };
+
+static unsigned int base32_decode(char c)
+{
+  if (c < 50) return base32_invalid;
+  if (c <= 57) return c - 26;
+  if (c < 97) c += 32;
+  if (c < 97 || c == 108 || c == 111 || c > 122) return base32_invalid;
+  return c - 97 - (c > 108) - (c > 111);
+}
+
+/* unequal(case_sensitivity,s1,s2) returns 0 if the strings s1 and s2 */
+/* are equal, 1 otherwise. If case_sensitivity is case_insensitive, */
+/* then ASCII A-Z are considered equal to a-z respectively. */
+
+static int unequal( enum case_sensitivity case_sensitivity,
+                    const char s1[], const char s2[] )
+{
+  char c1, c2;
+
+  if (case_sensitivity != case_insensitive) return strcmp(s1,s2) != 0;
+
+  for (;;) {
+    c1 = *s1;
+    c2 = *s2;
+    if (c1 >= 65 && c1 <= 90) c1 += 32;
+    if (c2 >= 65 && c2 <= 90) c2 += 32;
+    if (c1 != c2) return 1;
+    if (c1 == 0) return 0;
+    ++s1, ++s2;
+  }
+}
+
+
+/* Encoder: */
+
+enum dude_status dude_encode(
+  unsigned int input_length,
+  const u_code_point input[],
+  const unsigned char uppercase_flags[],
+  unsigned int *output_size,
+  char output[] )
+{
+  unsigned int max_out, in, out, k, j;
+  u_code_point prev, codept, diff, tmp;
+  char shift;
+
+  prev = 0x60;
+  max_out = *output_size;
+
+  for (in = out = 0; in < input_length; ++in) {
+
+    /* At the start of each iteration, in and out are the number of */
+    /* items already input/output, or equivalently, the indices of */
+    /* the next items to be input/output. */
+
+    codept = input[in];
+
+    if (codept == 0x2D) {
+      /* Hyphen-minus stands for itself. */
+      if (max_out - out < 1) return dude_big_output;
+      output[out++] = 0x2D;
+      continue;
+    }
+
+    diff = prev ^ codept;
+
+    /* Compute the number of base-32 characters (k): */
+    for (tmp = diff >> 4, k = 1; tmp != 0; ++k, tmp >>= 4);
+
+    if (max_out - out < k) return dude_big_output;
+    shift = uppercase_flags && uppercase_flags[in] ? 32 : 0;
+    /* shift controls the case of the last base-32 digit.
*/
+
+    /* Each quintet has the form 1xxxx except the last is 0xxxx. */
+    /* Computing the base-32 digits in reverse order is easiest. */
+
+    out += k;
+    output[out - 1] = base32[diff & 0xF] - shift;
+
+    for (j = 2; j <= k; ++j) {
+      diff >>= 4;
+      output[out - j] = base32[0x10 | (diff & 0xF)];
+    }
+
+    prev = codept;
+  }
+
+  /* Append the null terminator: */
+  if (max_out - out < 1) return dude_big_output;
+  output[out++] = 0;
+
+  *output_size = out;
+  return dude_success;
+}
+
+
+/* Decoder: */
+
+enum dude_status dude_decode(
+  enum case_sensitivity case_sensitivity,
+  char scratch_space[],
+  const char input[],
+  unsigned int *output_length,
+  u_code_point output[],
+  unsigned char uppercase_flags[] )
+{
+  u_code_point prev, q, diff;
+  char c;
+  unsigned int max_out, in, out, scratch_size;
+  enum dude_status status;
+
+  prev = 0x60;
+  max_out = *output_length;
+
+  for (c = input[in = 0], out = 0; c != 0; c = input[++in], ++out) {
+
+    /* At the start of each iteration, in and out are the number of */
+    /* items already input/output, or equivalently, the indices of */
+    /* the next items to be input/output. */
+
+    if (max_out - out < 1) return dude_big_output;
+
+    if (c == 0x2D) output[out] = c; /* hyphen-minus is literal */
+    else {
+      /* Base-32 sequence.
Decode quintets until 0xxxx is found: */
+
+      for (diff = 0; ; c = input[++in]) {
+        q = base32_decode(c);
+        if (q == base32_invalid) return dude_bad_input;
+        diff = (diff << 4) | (q & 0xF);
+        if (q >> 4 == 0) break;
+      }
+
+      prev = output[out] = prev ^ diff;
+    }
+
+    /* Case of last character determines uppercase flag: */
+    if (uppercase_flags) uppercase_flags[out] = c >= 65 && c <= 90;
+  }
+
+  /* Enforce the uniqueness of the encoding by re-encoding */
+  /* the output and comparing the result to the input: */
+
+  scratch_size = ++in;
+  status = dude_encode(out, output, uppercase_flags,
+                       &scratch_size, scratch_space);
+  if (status != dude_success || scratch_size != in ||
+      unequal(case_sensitivity, scratch_space, input)
+     ) return dude_bad_input;
+
+  *output_length = out;
+  return dude_success;
+}
+
+
+/******************************************************************/
+/* Wrapper for testing (would normally go in a separate .c file): */
+
+#include <assert.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* For testing, we'll just set some compile-time limits rather than */
+/* use malloc(), and set a compile-time option rather than using a */
+/* command-line option.
*/
+
+enum {
+  unicode_max_length = 256,
+  ace_max_size = 256,
+  test_case_sensitivity = case_insensitive
+                          /* suitable for host names */
+};
+
+
+static void usage(char **argv)
+{
+  fprintf(stderr,
+    "%s -e reads code points and writes a DUDE string.\n"
+    "%s -d reads a DUDE string and writes code points.\n"
+    "Input and output are plain text in the native character set.\n"
+    "Code points are in the form u+hex separated by whitespace.\n"
+    "A DUDE string is a newline-terminated sequence of LDH characters\n"
+    "(without any signature).\n"
+    "The case of the u in u+hex is the force-to-uppercase flag.\n"
+    , argv[0], argv[0]);
+  exit(EXIT_FAILURE);
+}
+
+
+static void fail(const char *msg)
+{
+  fputs(msg,stderr);
+  exit(EXIT_FAILURE);
+}
+
+static const char too_big[] =
+  "input or output is too large, recompile with larger limits\n";
+static const char invalid_input[] = "invalid input\n";
+static const char io_error[] = "I/O error\n";
+
+
+/* The following string is used to convert LDH */
+/* characters between ASCII and the native charset: */
+
+static const char ldh_ascii[] =
+  "................"
+  "................"
+  ".............-.."
+  "0123456789......"
+  ".ABCDEFGHIJKLMNO"
+  "PQRSTUVWXYZ....."
+  ".abcdefghijklmno"
+  "pqrstuvwxyz";
+
+
+int main(int argc, char **argv)
+{
+  enum dude_status status;
+  int r;
+  char *p;
+
+  if (argc != 2) usage(argv);
+  if (argv[1][0] != '-') usage(argv);
+  if (argv[1][1] == 0 || argv[1][2] != 0) usage(argv);
+
+  if (argv[1][1] == 'e') {
+    u_code_point input[unicode_max_length];
+    unsigned long codept;
+    unsigned char uppercase_flags[unicode_max_length];
+    char output[ace_max_size], uplus[3];
+    unsigned int input_length, output_size, i;
+
+    /* Read the input code points: */
+
+    input_length = 0;
+
+    for (;;) {
+      r = scanf("%2s%lx", uplus, &codept);
+      if (ferror(stdin)) fail(io_error);
+      if (r == EOF || r == 0) break;
+
+      if (r != 2 || uplus[1] != '+' || codept > (u_code_point)-1) {
+        fail(invalid_input);
+      }
+
+      if (input_length == unicode_max_length) fail(too_big);
+
+      if (uplus[0] == 'u') uppercase_flags[input_length] = 0;
+      else if (uplus[0] == 'U') uppercase_flags[input_length] = 1;
+      else fail(invalid_input);
+
+      input[input_length++] = codept;
+    }
+
+    /* Encode: */
+
+    output_size = ace_max_size;
+    status = dude_encode(input_length, input, uppercase_flags,
+                         &output_size, output);
+    if (status == dude_bad_input) fail(invalid_input);
+    if (status == dude_big_output) fail(too_big);
+    assert(status == dude_success);
+
+    /* Convert to native charset and output: */
+
+    for (p = output; *p != 0; ++p) {
+      i = *p;
+      assert(i <= 122 && ldh_ascii[i] != '.');
+      *p = ldh_ascii[i];
+    }
+
+    r = puts(output);
+    if (r == EOF) fail(io_error);
+    return EXIT_SUCCESS;
+  }
+
+  if (argv[1][1] == 'd') {
+    char input[ace_max_size], scratch[ace_max_size], *pp;
+    u_code_point output[unicode_max_length];
+    unsigned char uppercase_flags[unicode_max_length];
+    unsigned int input_length, output_length, i;
+
+    /* Read the DUDE input string and convert to ASCII: */
+
+    fgets(input, ace_max_size, stdin);
+    if (ferror(stdin)) fail(io_error);
+    if (feof(stdin)) fail(invalid_input);
+    input_length = strlen(input);
+    if (input[input_length - 1] != '\n') fail(too_big);
+    input[--input_length] = 0;
+
+    for (p = input; *p != 0; ++p) {
+      pp = strchr(ldh_ascii, *p);
+      if (pp == 0 || *p == '.') fail(invalid_input); /* '.' is only filler in ldh_ascii */
+      *p = pp - ldh_ascii;
+    }
+
+    /* Decode: */
+
+    output_length = unicode_max_length;
+    status = dude_decode(test_case_sensitivity, scratch, input,
+                         &output_length, output, uppercase_flags);
+    if (status == dude_bad_input) fail(invalid_input);
+    if (status == dude_big_output) fail(too_big);
+    assert(status == dude_success);
+
+    /* Output the result: */
+
+    for (i = 0; i < output_length; ++i) {
+      r = printf("%s+%04lX\n",
+                 uppercase_flags[i] ? "U" : "u",
+                 (unsigned long) output[i] );
+      if (r < 0) fail(io_error);
+    }
+
+    return EXIT_SUCCESS;
+  }
+
+  usage(argv);
+  return EXIT_SUCCESS; /* not reached, but quiets compiler warning */
+}
+
+
+
+                   INTERNET-DRAFT expires 2001-Dec-07
diff --git a/doc/draft/draft-ietf-idn-iptr-01.txt b/doc/draft/draft-ietf-idn-iptr-02.txt
similarity index 72%
rename from doc/draft/draft-ietf-idn-iptr-01.txt
rename to doc/draft/draft-ietf-idn-iptr-02.txt
index 9e9f3ec76d..b96f37cedb 100644
--- a/doc/draft/draft-ietf-idn-iptr-01.txt
+++ b/doc/draft/draft-ietf-idn-iptr-02.txt
@@ -5,9 +5,9 @@
 INTERNET-DRAFT Hongbo Shi
-draft-ietf-idn-iptr-01.txt Waseda University
-17 November 2000 Jiang Ming Liang
-Expires: 17 May 2001 i-DNS.net
+draft-ietf-idn-iptr-02.txt Waseda University
+17 May 2001 Jiang Ming Liang
+Expires: 17 November 2001 i-DNS.net
 
           Internationalized PTR Resource Record (IPTR)
@@ -61,7 +61,7 @@ Shi, Jiang [Page 1]
-INTERNET-DRAFT Internationalized PTR Resource Record 14 Nov. 2000
+INTERNET-DRAFT Internationalized PTR Resource Record 17 May 2001
 
    mapping architecture. This document describes a new RR TYPE named IPTR
@@ -121,7 +121,7 @@ Shi, Jiang [Page 2]
-INTERNET-DRAFT Internationalized PTR Resource Record 14 Nov. 2000
+INTERNET-DRAFT Internationalized PTR Resource Record 17 May 2001
 
    properties:
@@ -181,7 +181,7 @@ Shi, Jiang [Page 3]
-INTERNET-DRAFT Internationalized PTR Resource Record 14 Nov.
2000
+INTERNET-DRAFT Internationalized PTR Resource Record 17 May 2001
 
    Mapping IPv6 to IDNs can be similarly supported. This document recom-
@@ -241,7 +241,7 @@ Shi, Jiang [Page 4]
-INTERNET-DRAFT Internationalized PTR Resource Record 14 Nov. 2000
+INTERNET-DRAFT Internationalized PTR Resource Record 17 May 2001
 
    not find the corresponding LANGUAGE IDN finally, then the correspond-
@@ -301,7 +301,7 @@ Shi, Jiang [Page 5]
-INTERNET-DRAFT Internationalized PTR Resource Record 14 Nov. 2000
+INTERNET-DRAFT Internationalized PTR Resource Record 17 May 2001
@@ -361,7 +361,7 @@ Shi, Jiang [Page 6]
-INTERNET-DRAFT Internationalized PTR Resource Record 14 Nov. 2000
+INTERNET-DRAFT Internationalized PTR Resource Record 17 May 2001
 
    Thus,
@@ -381,123 +381,27 @@ INTERNET-DRAFT Internationalized PTR Resource Record 14 Nov. 2000
    is allowed.
 
-8. Open Issues
+8. Changes
 
-   1. Is it necessary to let a IDN aware server to send back all of
-      the corresponding IDNs to a resolver? Meanings,
+   Through the discussion at the IETF49 meeting in San Diego, we
+   deleted the chapter "Open Issues" of our previous draft (version
+   01).
+   And,
 
-              +------------------------------------------------------+
-   Header     | OPCODE=SQUERY, RESPONSE, AA                          |
-              +------------------------------------------------------+
-   Question   | QNAME=4.3.2.1.IN-ADDR.ARPA.,QCLASS=IN,QTYPE=IPTR     |
-              +------------------------------------------------------+
-   Answer     | 4.3.2.1.IN-ADDR.ARPA. IPTR "zh-CN" "name1-in-utf8"   |
-              | 4.3.2.1.IN-ADDR.ARPA. IPTR "zh-CN" "name2-in-utf8"   |
-              | 4.3.2.1.IN-ADDR.ARPA. IPTR "zh-CN" "name3-in-utf8"   |
-              | 4.3.2.1.IN-ADDR.ARPA. IPTR "zh-TW" "name4-in-utf8"   |
-              | 4.3.2.1.IN-ADDR.ARPA. IPTR "ko-KR" "name5-in-utf8"   |
-              | 4.3.2.1.IN-ADDR.ARPA. IPTR "ko-KR" "name6-in-utf8"   |
-              +------------------------------------------------------+
-   Authority  | ...                                                  |
-              +------------------------------------------------------+
-   Additional | ...
|
-              +------------------------------------------------------+
+      4.3.2.1.IN-ADDR.ARPA IPTR "zh-TW" "[samefoo.sample] in utf8"
+                           IPTR "zh-TW" "[difffoo.sample] in utf8"
+                           IPTR "zh-CN" "[samefoo.sample] in utf8"
+                           IPTR "ja-JP" "[samefoo.sample] in utf8"
+                           IPTR "ko-KR" "[samefoo.sample] in utf8"
+   is allowed.
-      Or, just using current fixed/cyclic/random options to return
-      one of the corresponding IDNs per LANGUAGE? In short, "one IP
-      one IDN per LANGUAGE". Such like
+8. Changes
-
-
-
-
-
-
-Shi, Jiang [Page 7]
-
-
-
-
-
-INTERNET-DRAFT Internationalized PTR Resource Record 14 Nov. 2000
-
-
-
-              +------------------------------------------------------+
-   Header     | OPCODE=SQUERY, RESPONSE, AA                          |
-              +------------------------------------------------------+
-   Question   | QNAME=4.3.2.1.IN-ADDR.ARPA.,QCLASS=IN,QTYPE=IPTR     |
-              +------------------------------------------------------+
-   Answer     | 4.3.2.1.IN-ADDR.ARPA. IPTR "zh-CN" "name1-in-utf8"   |
-              | 4.3.2.1.IN-ADDR.ARPA. IPTR "zh-TW" "name4-in-utf8"   |
-              | 4.3.2.1.IN-ADDR.ARPA. IPTR "ko-KR" "name5-in-utf8"   |
-              | 4.3.2.1.IN-ADDR.ARPA. IPTR "ko-KR" "name6-in-utf8"   |
-              +------------------------------------------------------+
-   Authority  | ...                                                  |
-              +------------------------------------------------------+
-   Additional | ...                                                  |
-              +------------------------------------------------------+
-
-
-
-
-   2. If QTYPE is IPTR, should an IDN aware server send all of the
-      corresponding IDNs to the resolver? Is this kind of behavior
-      friendly to implent the resolver? How about letting a server
-      just feedback the corresponding PTR record, if a server
-      doesn't find the corresponding LANGUAGE IDN that a client
-      requires.
-
-      In the following case, it is wasteful to return all the
-      corresponding IDNs to the clients.
-
-      4.3.2.1.IN-ADDR.ARPA IPTR "zh-TW" "[foo1.example] in utf8"
-                           IPTR "zh-TW" "[foo2.example] in utf8"
-                           ...
-                           IPTR "zh-CN" "[foo1.example] in utf8"
-                           IPTR "zh-CN" "[foo2.example] in utf8"
-                           ...
-                           IPTR "ja-JP" "[foo1.example] in utf8"
-                           IPTR "ja-JP" "[foo2.example] in utf8"
-                           ...
-                           IPTR "ko-KR" "[foo1.example] in utf8"
-                           IPTR "ko-KR" "[foo2.example] in utf8"
-                           ...
-
-      The benefit of the IPTR is introducing LANGUAGE. It SHOULD be
-      used in query from clients, then servers can select minimum
-      size of corresponding IDNs. For working this effectively, you
-      should introduce default LANGUAGE if no corresponding LANGUAGE
-      exists. The default MUST be ASCII. So that default IPTR can be
-      natural extension of PTR. I.E.
-
-
-
-Shi, Jiang [Page 8]
-
-
-
-
-
-INTERNET-DRAFT Internationalized PTR Resource Record 14 Nov. 2000
-
-
-         4.3.2.1.in-addr.arpa. IN PTR ASCII-domain-name
-
-      is equivalent to
-
-         4.3.2.1.in-addr.arpa. IN IPTR "default" ASCII-domain-name
-
-      Of course, ASCII includes ACE.
-
-
-   3. According to the consideration above, how about the following
-      thinking? That means a response MAY include not only a
-      corresponding IDN in a specific LANGUAGE but also the LANGUAGE
-      tags of the corresponding IDNs. And the client will load these
-      LANGUAGE tags in the DNS cache for the next IPTR query.
+   Through the discussion at the IETF49 meeting in San Diego, we
+   deleted the chapter "Open Issues" of our previous draft (version
+   01).
 
 References
@@ -507,8 +411,20 @@ References
    [IDNE] Marc Blanchet & Paul Hoffman, "Internationalized domain names
    using EDNS", draft-ietf-idn-idne.
 
-   [NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of Interna-
-   tionalized Host Names", draft-ietf-idn-nameprep.
+   [NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of
+
+
+
+Shi, Jiang [Page 7]
+
+
+
+
+
+INTERNET-DRAFT Internationalized PTR Resource Record 17 May 2001
+
+
+   Internationalized Host Names", draft-ietf-idn-nameprep.
 
    [RFC1034] P. Mockapetris, "DOMAIN NAMES - CONCEPTS AND FACILITIES",
    November 1987, RFC1034
@@ -532,18 +448,6 @@ References
    August 1999, RFC 2671.
 
    [ISO 639] ISO 639:1988 (E/F) - Code for the representation of names
-
-
-
-Shi, Jiang [Page 9]
-
-
-
-
-
-INTERNET-DRAFT Internationalized PTR Resource Record 14 Nov. 2000
-
-
    of languages - The International Organization for Standardization,
    1st edition, 1988 17 pages Prepared by ISO/TC 37 - Terminology
    (principles and coordination).
@@ -568,6 +472,18 @@ Authors' Information
    Tokyo, 169-8555 Japan
    shi@goto.info.waseda.ac.jp
 
+
+
+
+Shi, Jiang [Page 8]
+
+
+
+
+
+INTERNET-DRAFT Internationalized PTR Resource Record 17 May 2001
+
+
    Jiang Ming Liang
    i-DNS.net
    8 Temasek Boulevard
@@ -595,6 +511,30 @@ Authors' Information
-Shi, Jiang [Page 10]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Shi, Jiang [Page 9]
diff --git a/doc/draft/draft-ietf-ipngwg-default-addr-select-03.txt b/doc/draft/draft-ietf-ipngwg-default-addr-select-04.txt
similarity index 84%
rename from doc/draft/draft-ietf-ipngwg-default-addr-select-03.txt
rename to doc/draft/draft-ietf-ipngwg-default-addr-select-04.txt
index 33115b5517..6bc5adb182 100644
--- a/doc/draft/draft-ietf-ipngwg-default-addr-select-03.txt
+++ b/doc/draft/draft-ietf-ipngwg-default-addr-select-04.txt
@@ -2,7 +2,7 @@
 IPng Working Group Richard Draves
 Internet Draft Microsoft Research
-Document: draft-ietf-ipngwg-default-addr-select-03.txt March 3, 2001
+Document: draft-ietf-ipngwg-default-addr-select-04.txt May 14, 2001
 Category: Standards Track
 
           Default Address Selection for IPv6
@@ -54,8 +54,8 @@ Abstract
    These addresses may also be "preferred" or "deprecated" [3].
    Privacy considerations have introduced the concepts of "public addresses"
-Draves Standards Track - Expires September 2001 1
-draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
+Draves Standards Track - Expires December 2001 1
+draft-ietf-ipngwg-default-addr-select-04 May 14, 2001
 
    and "temporary addresses" [4].
The mobility architecture introduces
@@ -106,14 +106,14 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
    transition scenarios, but they are certainly not a panacea.
 
    The selection rules specified in this document MUST NOT be construed
-   to override an application or upper-layer's explicit choice of
-   destination or source address.
+   to override an application or upper-layer's explicit choice of a
+   legal destination or source address.
 
-Draves Standards Track - Expires September 2001 2
-draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
+Draves Standards Track - Expires December 2001 2
+draft-ietf-ipngwg-default-addr-select-04 May 14, 2001
 
 1.1. Conventions used in this document
@@ -132,27 +132,30 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
    mechanism for administrative policy override.
 
    In this implementation architecture, applications use APIs [8] like
-   getaddrinfo() and getipnodebyname() that return a list of addresses
-   to the application. This list might contain both IPv6 and IPv4
-   addresses (sometimes represented as IPv4-mapped addresses). The
-   application then passes a destination address to the network stack
-   with connect() or sendto(). The application might use only the first
-   address in the list, or it might loop over the list of addresses to
-   find a working address. In any case, the network layer is never in a
-   situation where it needs to choose a destination address from
-   several alternatives. The application might also specify a source
-   address with bind(), but often the source address is left
-   unspecified. Therefore the network layer does often choose a source
-   address from several alternatives.
+   getaddrinfo() that return a list of addresses to the application.
+   This list might contain both IPv6 and IPv4 addresses (sometimes
+   represented as IPv4-mapped addresses). The application then passes a
+   destination address to the network stack with connect() or sendto().
+   The application might use only the first address in the list, or it
+   might loop over the list of addresses to find a working address. In
+   any case, the network layer is never in a situation where it needs
+   to choose a destination address from several alternatives. The
+   application might also specify a source address with bind(), but
+   often the source address is left unspecified. Therefore the network
+   layer does often choose a source address from several alternatives.
 
    As a consequence, we intend that implementations of getaddrinfo()
-   and getipnodebyname() will use the destination address selection
-   algorithm specified here to sort the list of IPv6 and IPv4 addresses
-   that they return. Separately, the IPv6 network layer will use the
-   source address selection algorithm when an application or upper-
-   layer has not specified a source address. Application of this
-   framework to source address selection in an IPv4 network layer may
-   be possible but this is not explored further here.
+   will use the destination address selection algorithm specified here
+   to sort the list of IPv6 and IPv4 addresses that they return.
+   Separately, the IPv6 network layer will use the source address
+   selection algorithm when an application or upper-layer has not
+   specified a source address. Application of this framework to source
+   address selection in an IPv4 network layer may be possible but this
+   is not explored further here.
+
+   Well-behaved applications should iterate through the list of
+   addresses returned from getaddrinfo() until they find a working
+   address.
 
    The algorithms use several criteria in making their decisions. The
    combined effect is to prefer destination/source address pairs for
@@ -161,19 +164,19 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
    deprecated source addresses, avoid the use of transitional addresses
    when native addresses are available, and all else being equal prefer
    address pairs having the longest possible common prefix.
For source
-   address selection, temporary addresses [4] are preferred over public
+   address selection, public addresses [4] are preferred over temporary
    addresses. In mobile situations [5], home addresses are preferred over
    care-of addresses. If an address is simultaneously a home address and
    a care-of address (indicating the mobile node is "at home" for that
    address), then the home/care-of address is preferred
+
+Draves Standards Track - Expires December 2001 3
+draft-ietf-ipngwg-default-addr-select-04 May 14, 2001
+
+
    over addresses that are solely a home address or solely a care-of
    address.
-
-Draves Standards Track - Expires September 2001 3
-draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
-
-
    The framework optionally allows for the possibility of administrative
    configuration of policy that can override the default behavior of the
    algorithms. The policy override takes the form of a
@@ -220,18 +223,18 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
    assigned link-local scope. IPv4 private addresses [10], which have the
    prefixes 10/8, 172.16/12, and 192.168/16, are assigned site-
    local scope. IPv4 loopback addresses [11, section 4.2.2.11], which
-   have the prefix 127/8, are assigned link-local scope. Other IPv4
-   addresses are assigned global scope.
+   have the prefix 127/8, are assigned link-local scope (analogously to
+   the treatment of the IPv6 loopback address [9, section 4]). Other
+   IPv4 addresses are assigned global scope.
+
+Draves Standards Track - Expires December 2001 4
+draft-ietf-ipngwg-default-addr-select-04 May 14, 2001
+
+
    IPv4 addresses should be treated as having "preferred" configuration
    status.
-
-
-Draves Standards Track - Expires September 2001 4
-draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
-
-
 2.3. IPv6 Addresses with Embedded IPv4 Addresses
 
    IPv4-compatible addresses [2] and 6to4 addresses [12] contain an
@@ -244,7 +247,7 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
 2.4.
Loopback Address and Other Format Prefixes
 
    The loopback address should be treated as having link-local
-   scope [9] and "preferred" configuration status.
+   scope [9, section 4] and "preferred" configuration status.
 
    NSAP addresses and other addresses with as-yet-undefined format
    prefixes should be treated as having global scope and "preferred"
@@ -281,15 +284,15 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
       2002::/16          30     2
       ::/96              20     3
       ::ffff:0:0/96      10     4
+
+
+Draves Standards Track - Expires December 2001 5
+draft-ietf-ipngwg-default-addr-select-04 May 14, 2001
+
+
 
    One effect of the default policy table is to prefer using native
    source addresses with native destination addresses, 6to4 [12] source
-
-
-Draves Standards Track - Expires September 2001 5
-draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
-
-
    addresses with 6to4 destination addresses, and v4-compatible [2]
    source addresses with v4-compatible destination addresses. Another
    effect of the default policy table is to prefer communication using
@@ -340,14 +343,13 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
    not in the candidate set for the destination, then the network layer
    MUST treat this is an error. If the application or upper-layer
    specifies a source address that is in the candidate set for the
+
+Draves Standards Track - Expires December 2001 6
+draft-ietf-ipngwg-default-addr-select-04 May 14, 2001
+
+
    destination, then the network layer MUST respect that choice. If the
    application or upper-layer does not specify a source address, then
-
-
-Draves Standards Track - Expires September 2001 6
-draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
-
-
    the network layer uses the source address selection algorithm
    specified in the next section.
@@ -399,11 +401,9 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
    Similarly, if SB is assigned to the interface that will be used to
    send to D and SA is assigned to a different interface, then prefer SB.
-
-
-Draves Standards Track - Expires September 2001 7
-draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
+Draves Standards Track - Expires December 2001 7
+draft-ietf-ipngwg-default-addr-select-04 May 14, 2001
 
    Rule 6: Prefer matching label.
@@ -411,15 +411,23 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
    Similarly, if Label(SB) = Label(D) and Label(SA) <> Label(D), then
    choose SB.
 
-   Rule 7: Prefer temporary addresses.
-   If SA is a temporary address and SB is a public address, then prefer
-   SA. Similarly, if SB is a temporary address and SA is a public
+   Rule 7: Prefer public addresses.
+   If SA is a public address and SB is a temporary address, then prefer
+   SA. Similarly, if SB is a public address and SA is a temporary
    address, then prefer SB.
 
    An implementation may support a per-connection configuration
    mechanism (for example, a socket option) to reverse the sense of
-   this preference and prefer public addresses over temporary
+   this preference and prefer temporary addresses over public
    addresses.
 
+   This rule avoids applications potentially failing due to the
+   relatively short lifetime of temporary addresses or due to the
+   possibility of the reverse lookup of a temporary address either
+   failing or returning a randomized name. Implementations for which
+   privacy considerations outweigh these application compatibility
+   concerns MAY reverse the sense of this rule and by default prefer
+   temporary addresses over public addresses.
+
    Rule 8: Use longest matching prefix.
    If CommonPrefixLen(SA, D) > CommonPrefixLen(SB, D), then choose SA.
    Similarly, if CommonPrefixLen(SB, D) > CommonPrefixLen(SA, D), then
@@ -450,6 +458,12 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
    the source address selection algorithm. Source address selection
    for IPv4 addresses is not specified in this document.
+
+
+Draves Standards Track - Expires December 2001 8
+draft-ietf-ipngwg-default-addr-select-04 May 14, 2001
+
+
    We say that Source(D) is undefined if there is no source address
    available for destination D. For IPv6 addresses, this is only the
    case if CandidateSource(D) is the empty set.
@@ -459,11 +473,6 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
    result, then the remaining rules are not relevant and should be
    ignored. Subsequent rules act as tie-breakers for earlier rules.
-
-Draves Standards Track - Expires September 2001 8
-draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
-
-
    Rule 1: Avoid unusable destinations.
    If there is no route to DB or if Source(DB) is undefined, then sort
    DA before DB. Similarly, if there is no route to DA or if Source(DA)
@@ -508,7 +517,11 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
    Source(DB)), then sort DA before DB. Similarly, if
    CommonPrefixLen(DA, Source(DA)) < CommonPrefixLen(DB, Source(DB)),
    then sort DB before DA.
-
+
+Draves Standards Track - Expires December 2001 9
+draft-ietf-ipngwg-default-addr-select-04 May 14, 2001
+
+
    Rule 9: Otherwise, leave the order unchanged.
    Sort DA before DB.
@@ -517,11 +530,6 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
    implementation somehow knows which destination addresses will result
    in the "best" communications performance.
-
-Draves Standards Track - Expires September 2001 9
-draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
-
-
 6. Interactions with Routing
 
    This specification of source address selection assumes that routing
@@ -550,36 +558,35 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001
    The destination address selection algorithm needs information about
    potential source addresses.
One possible implementation strategy is
-   for getipnodebyname() and getaddrinfo() to call down to the IPv6
-   network layer with a list of destination addresses, sort the list in
-   the network layer with full current knowledge of available source
-   addresses, and return the sorted list to getipnodebyname() or
-   getaddrinfo(). This is simple and gives the best results but it
-   introduces the overhead of another system call. One way to reduce
-   this overhead is to cache the sorted address list in the resolver,
-   so that subsequent calls for the same name do not need to resort the
-   list.
+   for getaddrinfo() to call down to the IPv6 network layer with a list
+   of destination addresses, sort the list in the network layer with
+   full current knowledge of available source addresses, and return the
+   sorted list to getaddrinfo(). This is simple and gives the best
+   results but it introduces the overhead of another system call. One
+   way to reduce this overhead is to cache the sorted address list in
+   the resolver, so that subsequent calls for the same name do not need
+   to re-sort the list.
 
    Another implementation strategy is to call down to the network layer
    to retrieve source address information and then sort the list of
-   addresses directly in the context of getipnodebyname() or
-   getaddrinfo(). To reduce overhead in this approach, the source
-   address information can be cached, amortizing the overhead of
-   retrieving it across multiple calls to getipnodebyname() and
-   getaddrinfo(). In this approach, the implementation may not have
-   knowledge of the outgoing interface for each destination, so it MAY
-   use a looser definition of the candidate set during destination
+   addresses directly in the context of getaddrinfo(). To reduce
+   overhead in this approach, the source address information can be
+   cached, amortizing the overhead of retrieving it across multiple
+   calls to getaddrinfo().
In this approach, the implementation may not + have knowledge of the outgoing interface for each destination, so it + + +Draves Standards Track - Expires December 2001 10 +draft-ietf-ipngwg-default-addr-select-04 May 14, 2001 + + + MAY use a looser definition of the candidate set during destination address ordering. In any case, if the implementation uses cached and possibly stale information in its implementation of destination address selection, or if the ordering of a cached list of destination addresses is possibly stale, then it should ensure that the destination address - -Draves Standards Track - Expires September 2001 10 -draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 - - ordering returned to the application is no more than one second out of date. For example, an implementation might make a system call to check if any routing table entries or source address assignments @@ -588,7 +595,7 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 underlying state is changed. By caching the current invalidation counter value with derived state and then later comparing against the current value, the implementation can detect if the derived - state is stale. + state is potentially stale. 8. Security Considerations @@ -605,16 +612,14 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 the attack does not specify a particular source address for its reply packets.) By using different addresses for itself, the unfriendly node can cause the target node to expose the target's own - addresses. For example, the unfriendly node might correlate the - target's current IPv6 temporary address with its IPv4 address by - sending requests with a global source address and an IPv4-compatible - source address. + addresses. 9. Examples This section contains a number of examples, first of default behavior and then demonstrating the utility of policy table - configuration. + configuration. 
These examples are provided for illustrative + purposes; they should not be construed as normative. 9.1. Default Source Address Selection @@ -628,16 +633,15 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 Destination: 2001::1 Sources: fe80::1 vs fec0::1 Result: fec0::1 (prefer appropriate scope) - + +Draves Standards Track - Expires December 2001 11 +draft-ietf-ipngwg-default-addr-select-04 May 14, 2001 + + Destination: fec0::1 Sources: fe80::1 vs 2001::1 Result: 2001::1 (prefer appropriate scope) - -Draves Standards Track - Expires September 2001 11 -draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 - - Destination: ff05::1 Sources: fe80::1 vs fec0::1 vs 2001::1 Result: fec0::1 (prefer appropriate scope) @@ -659,12 +663,12 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 Result: 3ffe::2 (prefer home address) Destination: 2002:836b:2179::1 - Sources: 2002:836b:2179::2 vs 2001::d5e3:7953:13eb:22e8 (temporary) - Result: 2002:836b:2179::2 (prefer matching label) + Sources: 2002:836b:2179::d5e3:7953:13eb:22e8 (temporary) vs 2001::2 + Result: 2002:836b:2179::d5e3:7953:13eb:22e8 (prefer matching label) - Destination: 2001::1 + Destination: 2001::d5e3:0:0:1 Sources: 2001::2 vs 2001::d5e3:7953:13eb:22e8 (temporary) - Result: 2001::d5e3:7953:13eb:22e8 (prefer temporary address) + Result: 2001::2 (prefer public address) 9.2. 
Default Destination Address Selection @@ -687,15 +691,16 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 Result: 2001::1 (src 2001::2) then 10.1.2.3 (src 10.1.2.4) (prefer higher precedence) + +Draves Standards Track - Expires December 2001 12 +draft-ietf-ipngwg-default-addr-select-04 May 14, 2001 + + Sources: 2001::2 or fec0::2 or fe80::2 Destinations: 2001::1 vs fec0::1 vs fe80::1 Result: fe80::1 (src fe80::2) then fec0::1 (src fec0::2) then 2001::1 (src 2001::2) (prefer smaller scope) - -Draves Standards Track - Expires September 2001 12 -draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 - - + Sources: 2001::2 (care-of address) or 3ffe::1 (home address) or fec0::2 (care-of address) or fe80::2 (care-of address) Destinations: 2001::1 vs fec0::1 @@ -742,18 +747,18 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 Sources: 2001::2 or fe80::1 or 169.254.13.78 Destinations: 2001::1 vs 131.107.65.121 + + + +Draves Standards Track - Expires December 2001 13 +draft-ietf-ipngwg-default-addr-select-04 May 14, 2001 + + Unchanged Result: 2001::1 (src 2001::2) then 131.107.65.121 (src 169.254.13.78) (prefer matching scope) Sources: fe80::1 or 131.107.65.117 Destinations: 2001::1 vs 131.107.65.121 - - - -Draves Standards Track - Expires September 2001 13 -draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 - - Unchanged Result: 131.107.65.121 (src 131.107.65.117) then 2001::1 (src fe80::1) (prefer matching scope) @@ -801,17 +806,17 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 contracted for service with a special high-performance ISP. This is in addition to the normal Internet connection that both sites have with different ISPs. The high-performance ISP is expensive and the + + +Draves Standards Track - Expires December 2001 14 +draft-ietf-ipngwg-default-addr-select-04 May 14, 2001 + + two sites wish to use it only for their business-critical traffic with each other. 
Each site has two global prefixes, one from the high-performance ISP and one from their normal ISP. Site A has prefix 2001:aaaa:aaaa::/48 - - -Draves Standards Track - Expires September 2001 14 -draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 - - from the high-performance ISP and prefix 2007:0:aaaa::/48 from its normal ISP. Site B has prefix 2001:bbbb:bbbb::/48 from the high- performance ISP and prefix 2007:0:bbbb::/48 from its normal ISP. All @@ -853,6 +858,18 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 desired behavior via policy table configuration. For example, they can use the following policy table: + + + + + + + + +Draves Standards Track - Expires December 2001 15 +draft-ietf-ipngwg-default-addr-select-04 May 14, 2001 + + Prefix Precedence Label ::1 50 0 2001:aaaa:aaaa::/48 45 5 @@ -864,12 +881,6 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 This policy table produces the following behavior: - - -Draves Standards Track - Expires September 2001 15 -draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 - - Sources: 2001:aaaa:aaaa::a or 2007:0:aaaa::a or fe80::a Destinations: 2001:bbbb:bbbb::b vs 2007:0:bbbb::b New Result: 2001:bbbb:bbbb::b (src 2001:aaaa:aaaa::a) then @@ -900,48 +911,47 @@ References uration", RFC 2462 , December 1998. 4 T. Narten, R. Draves, "Privacy Extensions for Stateless Address - Autoconfiguration in IPv6", draft-ietf-ipngwg-addrconf-privacy- - 01.txt, July 2000. + Autoconfiguration in IPv6", RFC 3041, January 2001. 5 D. Johnson, C. Perkins, "Mobility Support in IPv6", draft-ietf- - mobileip-ipv6-12.txt, April 2000. + mobileip-ipv6-13.txt, November 2000. - 6 S. Cheshire. "Dynamic Configuration of IPv4 Link-local - Addresses", draft-ietf-zeroconf-ipv4-linklocal-01.txt, November - 2000. + 6 S. Cheshire, B. Aboba, "Dynamic Configuration of IPv4 Link-local + Addresses", draft-ietf-zeroconf-ipv4-linklocal-02.txt, March + 2001. 7 S. 
Bradner, "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. - + + +Draves Standards Track - Expires December 2001 16 +draft-ietf-ipngwg-default-addr-select-04 May 14, 2001 + + + 8 R. Gilligan, S. Thomson, J. Bound, W. Stevens, "Basic Socket Interface Extensions for IPv6", RFC 2553, March 1999. - 9 S. Deering, B. Haberman, B. Zill. "IP Version 6 Scoped Address - Architecture", draft-ietf-ipngwg-scoping-arch-01.txt, March 2000. + 9 S. Deering et. al, "IP Version 6 Scoped Address Architecture", + draft-ietf-ipngwg-scoping-arch-02.txt, March 2001. 10 Y. Rekhter et. al, "Address Allocation for Private Internets", RFC 1918, February 1996. - - -Draves Standards Track - Expires September 2001 16 -draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 - - - - 11 F. Baker, Editor. "Requirements for IP Version 4 Routers", RFC + 11 F. Baker, Editor, "Requirements for IP Version 4 Routers", RFC 1812, June 1995. - 12 B. Carpenter, K. Moore. "Connection of IPv6 Domains via IPv4 - Clouds", draft-ietf-ngtrans-6to4-07.txt, September 2000. + 12 B. Carpenter, K. Moore, "Connection of IPv6 Domains via IPv4 + Clouds", RFC 3056, February 2001. Acknowledgments The author would like to acknowledge the contributions of the IPng - Working Group, particularly Steve Deering, Jun-ichiro itojun Hagino, - M.T. Hollinger, Ken Powell, Markku Savela, Dave Thaler, and Mauro - Tortonesi. Please let the author know if you contributed to the - development of this draft and are not mentioned here. + Working Group, particularly Marc Blanchet, Brian Carpenter, Matt + Crawford, Steve Deering, Jun-ichiro itojun Hagino, Tony Hain, M.T. + Hollinger, Erik Nordmark, Ken Powell, Markku Savela, Dave Thaler, + and Mauro Tortonesi. Please let the author know if you contributed + to the development of this draft and are not mentioned here. 
Author's Address @@ -954,12 +964,28 @@ Author's Address Revision History +Changes from draft-ietf-ipngwg-default-addr-select-03 + + Reversed the treatment of temporary addresses, so that unless an + application specifies otherwise public addresses are preferred over + temporary addresses. + + Added text clarifying our expectation that applications should + iterate through the list of possible destination addresses until + finding a working address. + + Removed references to getipnodebyname(). + Changes from draft-ietf-ipngwg-default-addr-select-02 Changed scope treatment of IPv4-compatible and 6to4 addresses, so they are always considered to be global. Removed mention of IPX addresses. - + +Draves Standards Track - Expires December 2001 17 +draft-ietf-ipngwg-default-addr-select-04 May 14, 2001 + + Changed home address rules to favor addresses that are simultaneously home and care-of addresses, over addresses that are just home addresses or just care-of addresses. @@ -979,13 +1005,6 @@ Changes from draft-ietf-ipngwg-default-addr-select-01 of source addresses and the source address selection rule that prefers source addresses of appropriate scope. - - - -Draves Standards Track - Expires September 2001 17 -draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 - - Simplified the default policy table. Reordered the source address selection rules to reduce the influence of policy labels. Added more destination address selection rules. @@ -1020,7 +1039,11 @@ Changes from draft-ietf-ipngwg-default-addr-select-00 Added a rule to source address selection to handle anonymous/public addresses. - + +Draves Standards Track - Expires December 2001 18 +draft-ietf-ipngwg-default-addr-select-04 May 14, 2001 + + Added a rule to source address selection to handle home/care-of addresses. @@ -1039,11 +1062,7 @@ Changes from draft-draves-ipngwg-simple-srcaddr-01 Added mechanism to allow the specification of administrative policy that can override the default behavior. 
- -Draves Standards Track - Expires September 2001 18 -draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 - - + Added section on routing interactions and TBD section on mobility interactions. @@ -1077,29 +1096,10 @@ Changes from draft-draves-ipngwg-simple-srcaddr-00 - - - - - - - - - - - - - - - - - - - -Draves Standards Track - Expires September 2001 19 -draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 +Draves Standards Track - Expires December 2001 19 +draft-ietf-ipngwg-default-addr-select-04 May 14, 2001 Full Copyright Statement @@ -1156,4 +1156,4 @@ draft-ietf-ipngwg-default-addr-select-03 March 3, 2001 -Draves Standards Track - Expires September 2001 20 \ No newline at end of file +Draves Standards Track - Expires December 2001 20 \ No newline at end of file diff --git a/doc/draft/draft-klensin-dns-role-00.txt b/doc/draft/draft-klensin-dns-role-01.txt similarity index 51% rename from doc/draft/draft-klensin-dns-role-00.txt rename to doc/draft/draft-klensin-dns-role-01.txt index d708ec5fc0..e0b59a75e4 100644 --- a/doc/draft/draft-klensin-dns-role-00.txt +++ b/doc/draft/draft-klensin-dns-role-01.txt @@ -1,571 +1,867 @@ -INTERNET-DRAFT John C. Klensin -November 10, 2000 -Expires May 2001 - - - Role of the Domain Name System - draft-klensin-dns-role-00.txt - -Status of this Memo - -This document is an Internet-Draft and is in full conformance with -all provisions of Section 10 of RFC2026. - -Internet-Drafts are working documents of the Internet Engineering -Task Force (IETF), its areas, and its working groups. Note that -other groups may also distribute working documents as Internet-Drafts. - -Internet-Drafts are draft documents valid for a maximum of six months -and may be updated, replaced, or obsoleted by other documents at any -time. It is inappropriate to use Internet-Drafts as reference -material or to cite them other than as "work in progress." 
- -The list of current Internet-Drafts can be accessed at -http://www.ietf.org/ietf/1id-abstracts.txt - -The list of Internet-Draft Shadow Directories can be accessed at -http://www.ietf.org/shadow.html. - -This document represents a summary of the personal opinions of the -author on the subject covered and is not intended to evolve into a -standard of any kind. - -Copyright Notice - -Copyright (C) The Internet Society (2000). All Rights Reserved. - - - -0. Abstract - -The original function and purpose of the DNS is reviewed, and -contrasted with some of the functions into which it is being forced -today and some of the newer demands being placed upon it or suggested -for it. A framework for an alternative to placing these additional -stresses on the DNS is then outlined. This document and that -framework are not a proposed solution, only a strong suggestion that -the time has come to begin thinking more broadly about the problems -we are encountering and possible approaches to solving them. - - -1. History - -Several of the comments that follow are somewhat revisionist. Good -design and engineering often requires a level of intuition by the -designers about things that will be necessary in the future; the -reasons for some of these design decisions are not made explicit at -the time because no one is able to articulate them. The discussion -below reconstructs some of the decisions about the Internet's primary -namespace in the light of subsequent development and experience. In -addition, the historical reasons for particular decisions about the -Internet were often severely underdocumented contemporaneously and, -not surprisingly, different participants have different recollections -about what happened and what was considered important. Consequently, -the quasi-historical story below is just one story. 
There may be -(indeed, almost certainly are) other stories about how we got to -where we are today, but they probably don't, of themselves, -invalidate the inferences and conclusions. - -1.1 Context for DNS development - -During the entire life of the ARPANET and nearly the first decade or -so of operation of the Internet, the list of host names and their -mapping to and from addresses was maintained in a frequently-updated -"host table" [RFC625, 811, 952]. This table was just a list in an -agreed-upon format; sites were expected to frequently obtain copies -of, and install, new versions. The host tables themselves were -introduced to - - * Eliminate the requirement for people to remember host numbers - (addresses). Despite apparent experience to the contrary in the - conventional telephone system, numeric numbering systems, including - the numeric host number strategy, did not (and do not) work well for - more than a (large) handful of hosts. - - * Provide stability when addresses changed. Since addresses --to - some degree in the ARPANET and more importantly in the contemporary - Internet-- are a function of network topology and routing, they - often had to be changed when connectivity or topology changed. The - names could be kept stable even as addresses changed. - - * Some hosts (so-called "multihomed" ones) needed multiple - addresses to reflect different types of connectivity and topology. - Again, the names were very useful for avoiding the requirement that - would otherwise exist for users and other hosts to track these - multiple host numbers and addresses. -Toward the end of that long (in network time) period, the community -concluded that the host table model did not scale adequately and that -it would not adequately support new service variations. A working -group was created, and the DNS was the result of that effort. 
The -role of the DNS was to preserve the capabilities of the host table -arrangements (especially unique, unambiguous, host names), provide -for addition of additional services (e.g., the special record types -for electronic mail routing which rather quickly followed -introduction of the DNS), and to do so on the base of a robust, -hierarchical, distributed, name lookup system. That system also -permitted distribution of name administration, rather than requiring -that each host be entered into a single, central, table by a central -administration. - -1.2 Review of the DNS - -The DNS was designed primarily to identify network resources. -Although there was speculation about including, e.g., personal names -and email addresses, it was not designed primarily to identify -people, brands, etc. At the same time, the system was designed with -the flexibility to accomodate new data types and structures through -the addition of new record types to the initial "INternet" class. -Since the appropriate identifiers and content of those future -extensions could not be anticipated, the design provided that these -fields could contain any (binary) information, not just the -restricted text forms of the host table. - -However, the DNS as-used is intimately tied to the applications and -application protocols that utilize it, often at a fairly low level. - -In particular, despite the ability of the protocols and data -structures themselves to accomodate any binary representation, DNS -names as used are historically not [even] ASCII, but a very -restricted subset of it, a subset that derives primarily from the -original host table naming rules. Selection of that subset was -driven in part by human factors considerations, including a desire to -eliminate possible ambiguities in an international context. 
Hence -character codes that had international variations in interpretation -were excluded, the underscore character and case distinctions were -eliminated as being confusing (in the underscore's case, with the -hyphen character) when written or read by people, and so on. These -considerations appear to be very similar to those that resulted in -similarly restricted character sets being used as protocol elements -in many ITU and ISO protocols (cf. X.9, X.29). - -Another assumption was that there would be a high ratio of physical -hosts to second level domains and, more generally, that the system -would be deeply hierarchical, with most systems (and names) at the -third level or below and a large ratio of names representing physical -hosts to total names. There are domains that follow this model: many -university and corporate domains use fairly deep hierarchies, as do a -few country code TLDs (".US" is an excellent example). However, the -RIPE hostcount list is now showing a count of SOA records that is -approaching (and may have passed) the number of distinct hosts. -While recent experience has shown that the DNS is robust enough ---given contemporary machines as servers and current bandwidth -norms-- to be able to continue to operate reasonably well when those -historical assumptions are not met (e.g., with a huge, flat, -structure under ".COM"), it is still useful to remember that the -system could have been designed to work optimally with a flat -structure (and very large zones) rather than a deeply hierarchical -one, and was not. - -Similarly, despite some early speculation about entering people's -names and email addresses into the DNS directly, with the sole -exception (at least in the "IN" class) of one field of the SOA -record, electronic mail addresses in the Internet have preserved the -original, pre-DNS, "user at location" conceptual format rather than a -flatter one. Location, in that instance, is a reference to a host. 
- -Both the DNS architecture itself and the two-level provisions for -email and similar functions (e.g., see the finger protocol), also -anticipated a relatively high ratio of users to actual hosts. It was -never clear that the DNS was intended to, or could, scale to the -order of magnitude of number of users (or, more recently, products or -document objects), rather than that of physical hosts. - -Like the host table before it, the DNS has provided criticial -uniqueness for names and universal accessibility to them as part of -overall "single internet" and "end to end" models (cf [RFC2826]). -However, there are many signs that, as new uses evolve and original -assmumptions are abused, the system is being stretched to, or beyond, -its practical limits. - -1.3 The web and user-visible domain names - ->From the standpoint of the integrity of the domain name system --and -scaling of the Internet, including optimal accessibility to content-- -the design decision to use "A record" domain names, rather than some -system of indirection, has proven to be a serious mistake in several -respects. Convenience of typing, and the desire to make domain names -out of easily-remembered product names, has led to a flattening of -the DNS, with many people now perceiving that second-level names -under COM (or in some countries, second- or third-level names under -the relevant ccTLD) are all that is meaningful (this perception has -been reinforced by some domain name registrars who have been anxious -to "sell" additional names). And, of course, the perception that one -needs a top-level domain per product, rather than a (usually -organizational) collection of network resources has led to a rapid -acceleration in the number of names being registered, a phenonenum -that has clearly benefited registrars charging on a per-name basis, -"cybersquatters", and others in the business of "selling" names, but -has not obviously benefitted the Internet as a whole. 
- -The emphasis on second-level domain names has also created a problem -for the trademark community. Since the Internet is international, -and names are being populated in a flat and unqualified space, -similarly-named entities are in conflict even if there would -ordinarily be no chance of confusing them in the marketplace. The -problem appears to be unsolvable except by a choice between draconian -measures --possibly including significant changes to the underlying -legislation and conventions-- and a situation in which the "rights" -to a name are typically not settled using the subtle and traditional -product (or industry) type and geopolitical scope rules of the -trademark system but by depending largely on main force, e.g., the -organization with the greatest resources to invest in defending (or -attacking) names will ultimately win out. The latter raises not only -important issues of equity, but the risk of backlash as the numerous -small players are forced to relinquish names they find attractive and -to adopt less-desirable naming conventions. - -Independent of these sociopolitical problems, content distribution -issues have made it clear that it should be possible for an -organization to have copies of data it wishes to make available -distributed around the network, with a user who asks for the -information by name getting the topologically-closest copy. This is -not possible with simple, as-designed, use of the DNS: DNS names -identify target resources or, in the case of email "MX" records, a -preferentially-ordered list of resources "closest" to a target (not -the source/user). 
Several technologies (and, in some cases, -corresponding business models) have arisen to work around these -problems, including intercepting and altering DNS requests so as to -point to other locations, -While additional implications are still being discovered and -seriously evaluated, it appears, not surprisingly, that rewriting DNS -names in the middle of the network, or trying to give them different -values or interpretations depending on the topological location of -the user trying to resolve the name interferes with end-to-end -applications in the general case. These problems occur even if the -rewriting machinery is accompanied by additional workarounds for -particular applications: security associations and applications that -need to identify "the same host" as the applications for which these -tools have been designed often run into one problem or another. - - -1.4 A pessimistic history of the evolution of Internet applications -protocols. - -At the applications level, few of the protocols in active, widespread -use on the Internet reflect the either contemporary knowledge in -computer science or human factors or experience accumulated through -deployment and use. Instead, protocols tend to be deployed at a -just-past-prototype level, typically including the types of expedient -compromises typical with prototypes. If they prove useful, the -nature of the network permit very rapid dissemination (i.e., they -fill a vacuum, even if one that no one previously knew existed). -But, once the vacuum is filled, the installed base provides its own -inertia: unless the design is so seriously faulty as to prevent -effective use (or there is a widely-perceived sense of impending -disaster unless the protocol is replaced), future developments must -maintain backward compatibility and workarounds for problematic -characteristics rather than benefiting from redesign in the light of -experience. 
Applications that are "almost good enough" prevent -development and deployment of high-quality replacements. - -2. Signs of DNS overloading - -Parts of the historical discussion above identify areas in which it -is becoming clear that the DNS is becoming overloaded (semantically -if not in the mechanical ability to resolve names). While we seem to -still be well within the "just about good enough" range -- current -mechanisms and proposals to deal with these problems are all focused -on patching or working around limitations within the DNS rather than -dramatic rethinking -- the number of these issues that are arising -at the same time may argue for rethinging mechanisms and -relationships, not just more patches and kludges. For example: - -o While technical approaches such as larger and higher-powered servers -and more bandwidth, and legal/political mechanisms such as dispute -resolution policies have arguably kept the problems from becoming -critical, the DNS has not proven adequately responsive to business -and individual needs to describe or identify things (such as product -names and names of individuals) other than strict network resources. - -o While stacks have been modified to better handle multiple addresses -on a physical interface and some protocols have been extended to -include DNS names for determining context, the DNS doesn't deal -especially well with high-multiple names per host (needed for web -hosting facilities with multiple domains on a server). - -o Efforts to add names deriving from languages or character sets -based on other than simple ASCII and English-like names (see below), -or even to utilize complex company or product names without the use -of hierarchy have created apparent requirements names (labels) that -are over 63 octets long. This requirement will undoubtedly increase -over time; while there are workarounds to accomodate longer names, -they impose their own restrictions and cause their own problems. 
- -o Increasing commercialization of the Internet, and visibility of -domain names that are assumed to match names of companies or -products, has turned the DNS and DNS names into a trademark -battleground. The traditional trademark system in (at least) most -countries makes careful distinctions about fields of applicability. -When the space is flattened, without differentiators by either -geography or industry sector, not only are there likely conflicts -between "Joe's Pizza" (of Boston) and "Joe's Pizza" (of San Francisco) -but between both and "Joe's Auto Repair" (of Los Angeles): all three -would like to control "Joes.com" and may claim trademark rights to do -so, even though conflict or confusion would not occcur with -traditional trademarks. - -o Many organizations wish to have different web sites under the same -URL and domain name. Sometimes this is to create local variations ---the Widget Company might want to present different material to a UK -user relative to a US one-- and sometimes it is to provide higher -performance by supplying information from the server topologically -closest to the user. Arguably, the name resolution mechanism should -provide information about multiple sites that can provide information -associated with the same name and sufficient attributes associated -with each of those sites to permit applications to make sensible -choices, or should accept client-site attributes and utilize them in -the search process. -o Many existing and proposed systems for "finding things on the -Internet" require a true search capability in which near matches can -be reported to the user and queries may be slightly ambiguous or -fuzzy. The DNS can accomodate only one set of (quite rigid) matching -rules. 
Current proposals to permit different rules in different -localities help to identify the problem, but, if applied directly to -the DNS either don't provide the level of flexibility that would be -desirable or tend to isolate different parts of the Internet from -each other (or both). Fuzzy or ambiguous searches are desirable for -(at least) resolution of business names that might have spelling -variations and for names that can be resolved into different sets of -glyphs depending on context. This goes beyond "mere" -canonicalization differences (different ways of representing the same -character) and into such relationships as the use of different -alphabets for the same language, Kanji-Hiragana relationships, etc. - -o The historical DNS and applications that make assumptions about how -it works impose significant risk (or forces technical kludges and -consequent odd restrictions), when one considers adding mechanisms -for use with various multi-character-set and multilingual -"internationalization" systems. Cf RFC 2825. - -o In order to provide proper functionality to the Internet, the DNS -must have a single unique root (see RFC 2826 for a discussion of this -issue). There are many desires for local treatment of names or -character sets that cannot be accomodated without either multiple -roots (e.g., a separate root for multilingual names) or mechanisms -that would have similar effects in terms of Internet fragmentation -and isolation. - -In each of these cases, it is, or might be, possible to devise ways -to trick the DNS system into supporting mechanisms that were not -designed into it. Several ingenious solutions have been proposed in -many of these areas already, and some have been successfully deployed -into the marketplace. - -Several of the above problems are addressed well by a good directory -system (supported by the LDAP protocol or otherwise) or searching -environment (such as common web search engines) although not by the -DNS. 
Given the difficulty of deploying new applications discussed -above, an important question is whether the kludges are bad enough, -or will scale up to bad enough, that new solutions are needed and can -be deployed. - - -3. The directory story. - -3.1 Overview -The constraints of the DNS argue for introducing an intermediate -protocol mechanism, referred to here as a "directory layer". -Directory layer proposals would use a two-stage lookup, not unlike -several of the IDN proposals, but would do the first lookup in a -directory system, rather than in the DNS itself. This would permit -us to relax several constraints and produce a more comprehensive -system. - -Ultimately, many of the issues with domain names arise as the result -of people attempting to use the DNS as a directory. While there -hasn't been enough pressure/demand to justify a change to date, it -has already been quite clear that, as a directory system, the DNS is -a good deal less than ideal. This document suggests that there -actually is a requirement for a directory system, and that the right -solution to a directory requirement is a directory, not a series of -DNS patches, kludges, or workarounds. - -In particular... - -* A directory system would not require imposition of particular -length limits on names. - -* A directory system could permit explicit association of attributes -of, e.g., language and country, with a name, without having to -utilize trick encodings to incorporate that information in DNS labels -(or creating artificial hierarchy for doing so). - -* There is considerable experience in doing fuzzy and "sonex" -(similar-sounding) matching in directory systems. Moreover, it is -plausible to think about different matching rules for different areas -and sets of names so that these can be adapted to local cultural -requirements. 
Specifically, it might be possible to have a single -form of a name in a directory, but to have great flexibility about -what queries matched that name (and even have different variations in -different areas). Of course, the more flexibility one provides, the -greater the possibility of real or imagined trademark conflicts, but -we would have the opportunity to design a directory structure that -dealt with those issues in an intelligent way, while DNS constraints -arguably make a general and equitable DNS-only solution impossible. - -* If a directory system is used to translate to DNS names, and then -DNS names are looked up in the normal fashion, it may be possible to -relax several of the constraints that have been traditional (and -perhaps necessary) with the DNS. For example, reverse-mapping of -addresses to directory names may not be a requirement, since the DNS -name(s) would (continue to) uniquely identify the host. - -* Solutions to multilingual transcription problems that are common in -"normal life" (e.g., two-sided business cards to be sure that a -recipient trying to contact a person can access romanized spellings -and numbers when the original language may not be comprehensible to -that recipient) can be easily handled in a directory system by -inserting both sets of entries. - -* One can easily imagine a directory system that would return, not a -single name, but a set of names paired with network-locational -information or other context-establishing attributes. This type of -information might be of considerable use in resolving the "nearest -(or best) server for a particular named resource" problems that are a -significant concern for organizations hosting web and other sites -that are accessed from a wide range of locations and subnets. - -* Names bound to countries and languages might help to manage -trademark realities, while use of the DNS in trademark-significant -areas tends to require worldwide "flattening" of the trademark -system. 
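The bullets above describe "sonex" (similar-sounding) matching and area-specific matching rules applied to a single stored form of a name. The Python sketch below illustrates both ideas under stated assumptions: it uses classic American Soundex only as one well-known similar-sounding scheme (the text does not name an algorithm), and the area labels and folding rules in the hypothetical MATCH_RULES table are invented for illustration. Soundex considers only Latin letters, which itself shows why a single global matching rule is hard to choose.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks (e.g. 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Soundex digit classes; letters not listed (vowels, h, w, y) get no digit.
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}

def soundex(name: str) -> str:
    """Classic American Soundex; only ASCII letters are considered."""
    letters = [ch for ch in strip_diacritics(name).lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = digit
    return (code + "000")[:4]

# Hypothetical per-area rules: how a query is compared with the stored form.
MATCH_RULES = {
    # area-a: diacritics are not significant
    "area-a": lambda s: strip_diacritics(s).casefold(),
    # area-b: diacritics are significant; only canonical form and case folded
    "area-b": lambda s: unicodedata.normalize("NFC", s).casefold(),
}

def matches(stored: str, query: str, area: str) -> bool:
    """Exact match after applying the querying area's folding rule."""
    fold = MATCH_RULES[area]
    return fold(stored) == fold(query)

def sounds_like(stored: str, query: str) -> bool:
    """Fuzzy match: same Soundex code."""
    return soundex(stored) == soundex(query)
```

With one stored entry "Café Müller", the query "Cafe Muller" matches under area-a but not under area-b, and sounds_like("Robert", "Rupert") is true because both encode to R163. The stored form never changes; only the rule applied to the query varies by area.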
-3.2 Some details and comments. - -As several proposals have noted, almost any i18n proposal for names -that are in, or map into, the DNS will require changing DNS resolver -API calls ("gethostbyname" or equivalent or adding some -pre-resolution preparation mechanism) in almost all Internet -applications -- whether to cause the API to take a different -character set, to accept or return more arguments with qualifying or -identifying information, or otherwise. Once applications must be -opened to make such changes, it is a relatively small matter to -switch from calling into the DNS to calling a directory service and -then the DNS (in many situations, both actions could be accomplished -in a single API call). - -A directory approach can be consistent both with "flat" stories and -multi-attribute ones. The DNS requires strict hierarchies, limiting -its ability to handle differentiation among names by their -properties. By contrast, modern directories can utilize -independently-searched attributes and other structured schema to -provide flexibilities not present in a strictly hierarchical system. - -There is a strong argument for a single directory structure (implying -a need for mechanisms for registration, delegation, etc.). But it is -not a strict requirement, especially if in-depth case analysis and -design work leads to the conclusion that reverse-mapping to directory -names is not a requirement (see section 4). - -While the discussion above includes very general comments about -attributes, it appears that only a very small number of attributes -would be needed. The list would almost certainly include country and -language for IDN purposes and might require "charset" if we cannot -agree on a character set and encoding. Trademark issues might -motivate "commercial" and "non-commercial" (or other) attributes if -they would be helpful in bypassing trademark problems. And applications to -resource location might argue for a few other attributes (as outlined -above). 
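Section 3.2 argues that once applications must change their resolver calls anyway, the change can route a lookup through a directory first and then through the DNS, possibly behind a single API call. The Python sketch below shows that two-stage shape under explicit assumptions. Directory is a hypothetical in-memory stand-in for a real directory service; it is not LDAP or any existing API. The only attributes modeled are the country and language the text mentions. The final stage uses the standard socket.getaddrinfo call, and the DNS function is passed in as a parameter so it can be replaced, for example in tests.

```python
import socket
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass(frozen=True)
class DirectoryEntry:
    name: str       # user-visible name: any script, no 63-octet label limit
    language: str   # attribute, e.g. "en"
    country: str    # attribute, e.g. "US"
    dns_name: str   # conventional DNS name used for the final stage

class Directory:
    """Hypothetical in-memory directory layer (illustration only)."""

    def __init__(self, entries: List[DirectoryEntry]) -> None:
        self._entries = list(entries)

    def search(self, name: str, language: Optional[str] = None,
               country: Optional[str] = None) -> List[str]:
        """Stage 1: match the name plus any attributes the caller supplied."""
        key = name.casefold()
        return [e.dns_name for e in self._entries
                if e.name.casefold() == key
                and (language is None or e.language == language)
                and (country is None or e.country == country)]

def dns_addresses(host: str) -> List[str]:
    """Stage 2: ordinary DNS resolution through the system resolver."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

def resolve(directory: Directory, name: str, *,
            language: Optional[str] = None, country: Optional[str] = None,
            dns: Callable[[str], List[str]] = dns_addresses) -> List[str]:
    """One API call wrapping both stages: directory search, then DNS."""
    addresses: List[str] = []
    for dns_name in directory.search(name, language, country):
        addresses.extend(dns(dns_name))
    return addresses

if __name__ == "__main__":
    # Illustrative data only; documentation names and addresses are used.
    directory = Directory([
        DirectoryEntry("Joe's Pizza", "en", "US", "joes-pizza.example.com"),
        DirectoryEntry("Joe's Pizza", "en", "GB", "joes-pizza.example.net"),
    ])
    fake_dns = {"joes-pizza.example.com": ["192.0.2.10"],
                "joes-pizza.example.net": ["198.51.100.7"]}
    print(resolve(directory, "joe's pizza", country="GB",
                  dns=fake_dns.__getitem__))  # ['198.51.100.7']
```

Two entries can share the user-visible name "Joe's Pizza" because the attributes, not the DNS name, disambiguate them. The DNS names returned by the directory still uniquely identify the hosts, which is why the text suggests that reverse-mapping to directory names may not be required.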
- -4. The Controversies - -4.1. One directory or many - -As suggested in some of the text above, it is an open question as to -whether the needs of the community would be best served by a single -directory with universal applicability, a single directory but -locally-tailored search (and, most important, matching) functions, or -multiple, locally-determined, directories. Each has its attractions. -Any but the first would essentially prevent reverse-mapping -(determination of the user-visible name of the host or resource from -target information such as an address or DNS name). But reverse -mapping has become less useful over the years --at least to users-- -as we have assigned more and more names per host address. -Locally-tailored search and mappings would permit national variations -on interpretation of which strings matched which other ones, an -arrangement that is especially important when different localities -apply different rules to, e.g., matching of characters with and -without diacriticals. But, of course, this implies that a URL may -evaluate properly or not depending on either settings on a client -machine or the network connectivity of the user, which is not, in -general, a desirable situation. - -And, of course, completely separate directories would permit -translation and transliteration functions to be embedded in the -directory, given much of the Internet a different appearance -depending on which directory was chosen. The attractions of this are -obvious, but, unless things were very carefully designed to preserve -uniqueness and precise identities at the right points (which may or -may not be possible), such a system would have many of the -difficulties associated with multiple roots. - -4.2 Why not a proposal? - -As this document has gone through various preliminary drafts and -reviews, the question has been raised as to whether it should contain -a specific proposal: a specific directory mechanism, schema, and so -on. 
It deliberately does not take that step. It has been difficult -to get directory systems deployed in significant ways in the Internet -infrastructure, partially because we have a surplus of options. -There are also some approaches that could be used to implement the -general concepts described here, such as the Common Name Resolution -Protocol [RFC2972], which some would not consider directory protocols -at all. Consequently, it appeared better to present the general -concepts and arguments here and leave the specifics to other sources, -documents, and proposals. - -5. Security Considerations - -The set of proposals implied by this document suggests an interesting -set of security issues (i.e., nothing important is ever easy). A -directory system used for this purpose would presumably need to be as -carefully protected against unauthorized changes as the DNS itself. -There also might be new opportunities for problems in the two-layer -arrangement; but those problems are not more severe than a two-stage -lookup in the DNS. - - -6. References - -RFC 625 On-line hostnames service. M.D. Kudlick, E.J. Feinler. -Mar-07-1974. - -RFC 811 Hostnames Server. K. Harrenstien, V. White, E.J. Feinler. -Mar-01-1982. - -RFC 952 DoD Internet host table specification. K. Harrenstien, M.K. -Stahl, E.J. Feinler. Oct-01-1985. - -RFC 882 Domain names: Concepts and facilities. P.V. Mockapetris. -Nov-01-1983. - -RFC 883 Domain names: Implementation specification. P.V. Mockapetris. -Nov-01-1983. - -RFC 1035 Domain names - implementation and specification. P.V. -Mockapetris. Nov-01-1987. - -RFC 1591 Domain Name System Structure and Delegation. J. Postel. -March 1994. - -RFC 2825 A Tangled Web: Issues of I18N, Domain Names, and the Other -Internet protocols. IAB, L. Daigle, ed.. May 2000. - -RFC 2826 IAB Technical Comment on the Unique DNS Root. IAB. May 2000. - -RFC 2972 Context and Goals for Common Name Resolution. N. Popp, M. -Mealling, L. Masinter, K. Sollins. October 2000. 
- -ITU Recommendation X.9 - -ITU Recommendation X.25 - - -7. Culprit address - -John Klensin -AT&T Labs -99 Bedford Street -Boston, MA 02111 -klensin@research.att.com - -Expires May 2001 +INTERNET-DRAFT John C. Klensin +May 28, 2001 +Expires November 2001 + + + Role of the Domain Name System + draft-klensin-dns-role-01.txt + +Status of this Memo + +This document is an Internet-Draft and is in full conformance with +all provisions of Section 10 of RFC2026. + +Internet-Drafts are working documents of the Internet Engineering +Task Force (IETF), its areas, and its working groups. Note that +other groups may also distribute working documents as Internet-Drafts. + +Internet-Drafts are draft documents valid for a maximum of six months +and may be updated, replaced, or obsoleted by other documents at any +time. It is inappropriate to use Internet-Drafts as reference +material or to cite them other than as "work in progress." + +The list of current Internet-Drafts can be accessed at +http://www.ietf.org/ietf/1id-abstracts.txt + +The list of Internet-Draft Shadow Directories can be accessed at +http://www.ietf.org/shadow.html. + +This document represents a summary of the personal opinions of the +author on the subject covered and is not intended to evolve into a +standard of any kind. + +Copyright Notice + +Copyright (C) The Internet Society (2000). All Rights Reserved. + + + +0. Abstract + +The original function and purpose of the DNS is reviewed, and +contrasted with some of the functions into which it is being forced +today and some of the newer demands being placed upon it or suggested +for it. A framework for an alternative to placing these additional +stresses on the DNS is then outlined. This document and that +framework are not a proposed solution, only a strong suggestion that +the time has come to begin thinking more broadly about the problems +we are encountering and possible approaches to solving them. 
+ +A mailing list has been initiated for discussion of this draft, +its successors, and closely-related issues at +ietf-i18n-dns-directory@imc.org. See +http://www.imc.org/ietf-i18n-dns-directory/ for subscription +and archival information. + + +1. History + +Several of the comments that follow are somewhat revisionist. Good +design and engineering often requires a level of intuition by the +designers about things that will be necessary in the future; the +reasons for some of these design decisions are not made explicit at +the time because no one is able to articulate them. The discussion +below reconstructs some of the decisions about the Internet's primary +namespace (the "Class=IN" DNS) in the light of subsequent development +and experience. In addition, the historical reasons for particular +decisions about the Internet were often severely underdocumented +contemporaneously and, not surprisingly, different participants have +different recollections about what happened and what was considered +important. Consequently, the quasi-historical story below is just +one story. There may be (indeed, almost certainly are) other stories +about how we got to where we are today, but they probably don't, of +themselves, invalidate the inferences and conclusions. + +1.1 Context for DNS development + +During the entire life of the ARPANET and nearly the first decade or +so of operation of the Internet, the list of host names and their +mapping to and from addresses was maintained in a frequently-updated +"host table" [RFC625, 811, 952]. This table was just a list in an +agreed-upon format; sites were expected to frequently obtain copies +of, and install, new versions. The host tables themselves were +introduced to + + * Eliminate the requirement for people to remember host numbers + (addresses). 
Despite apparent experience to the contrary in the + conventional telephone system, numeric numbering systems, including + the numeric host number strategy, did not (and do not) work well for + more than a (large) handful of hosts. + + * Provide stability when addresses changed. Since addresses --to + some degree in the ARPANET and more importantly in the contemporary + Internet-- are a function of network topology and routing, they + often had to be changed when connectivity or topology changed. The + names could be kept stable even as addresses changed. + + * Some hosts (so-called "multihomed" ones) needed multiple + addresses to reflect different types of connectivity and topology. + Again, the names were very useful for avoiding the requirement that + would otherwise exist for users and other hosts to track these + multiple host numbers and addresses. + +Toward the end of that long (in network time) period, the community +concluded that the host table model did not scale adequately and that +it would not adequately support new service variations. A working +group was created, and the DNS was the result of that effort. The +role of the DNS was to preserve the capabilities of the host table +arrangements (especially unique, unambiguous, host names), provide +for additional services (e.g., the special record types for +electronic mail routing, which rather quickly followed the +introduction of the DNS), and to do so on the basis of a robust, +hierarchical, distributed, name lookup system. That system also +permitted distribution of name administration, rather than requiring +that each host be entered into a single, central, table by a central +administration. + +1.2 Review of the DNS + +The DNS was designed primarily to identify network resources. +Although there was speculation about including, e.g., personal names +and email addresses, it was not designed primarily to identify +people, brands, etc.
At the same time, the system was designed with +the flexibility to accommodate new data types and structures through +the addition of new record types to the initial "INternet" class. +Since the appropriate identifiers and content of those future +extensions could not be anticipated, the design provided that these +fields could contain any (binary) information, not just the +restricted text forms of the host table. + +However, the DNS as-used is intimately tied to the applications and +application protocols that utilize it, often at a fairly low level. + +In particular, despite the ability of the protocols and data +structures themselves to accommodate any binary representation, DNS +names as used are historically not [even] ASCII, but a very +restricted subset of it, a subset that derives primarily from the +original host table naming rules. Selection of that subset was +driven in part by human factors considerations, including a desire to +eliminate possible ambiguities in an international context. Hence +character codes that had international variations in interpretation +were excluded, the underscore character and case distinctions were +eliminated as being confusing (in the underscore's case, with the +hyphen character) when written or read by people, and so on. These +considerations appear to be very similar to those that resulted in +similarly restricted character sets being used as protocol elements +in many ITU and ISO protocols (cf. X.9, X.29). + +Another assumption was that there would be a high ratio of physical +hosts to second-level domains and, more generally, that the system +would be deeply hierarchical, with most systems (and names) at the +third level or below and a large ratio of names representing physical +hosts to total names. There are domains that follow this model: many +university and corporate domains use fairly deep hierarchies, as do a +few country code TLDs (".US" is an excellent example).
However, the +RIPE hostcount list is now showing a count of SOA records that is +approaching (and may have passed) the number of distinct hosts. +While recent experience has shown that the DNS is robust enough +--given contemporary machines as servers and current bandwidth +norms-- to be able to continue to operate reasonably well when those +historical assumptions are not met (e.g., with a huge, flat, +structure under ".COM"), it is still useful to remember that the +system could have been designed to work optimally with a flat +structure (and very large zones) rather than a deeply hierarchical +one, and was not. + +Similarly, despite some early speculation about entering people's +names and email addresses into the DNS directly, with the sole +exception (at least in the "IN" class) of one field of the SOA +record, electronic mail addresses in the Internet have preserved the +original, pre-DNS, "user at location" conceptual format rather than a +flatter or strictly faceted one. Location, in that instance, is a +reference to a host. + +Both the DNS architecture itself and the two-level provisions for +email and similar functions (e.g., see the finger protocol) also +anticipated a relatively high ratio of users to actual hosts. It was +never clear that the DNS was intended to, or could, scale to the +order of magnitude of number of users (or, more recently, products or +document objects), rather than that of physical hosts. + +Like the host table before it, the DNS has provided critical +uniqueness for names and universal accessibility to them as part of +overall "single internet" and "end to end" models (cf [RFC2826]). +However, there are many signs that, as new uses evolve and original +assumptions are abused, the system is being stretched to, or beyond, +its practical limits. + +The original design effort that led to the DNS included examination +of the directory technologies available at the time.
The working +group concluded that the DNS design, with its simplifying assumptions +and restricted capabilities, would be feasible to deploy and make +adequately robust, which the more comprehensive directory approaches +were not. At the same time, some of the participants feared that the +limitations might cause future problems; this document essentially +takes the position that they were probably correct. On the other +hand, directory technology and implementations have evolved +significantly in the ensuing years: it may be time to revisit the +assumptions, either in the context of the two- (or more) level +mechanism contemplated by the rest of this document or, even more +radically, as a path toward a DNS replacement. + + +1.3 The web and user-visible domain names + +From the standpoint of the integrity of the domain name system --and +scaling of the Internet, including optimal accessibility to content-- +the web design decision to use "A record" domain names, rather than +some system of indirection, has proven to be a serious mistake in +several respects. Convenience of typing, and the desire to make +domain names out of easily-remembered product names, has led to a +flattening of the DNS, with many people now perceiving that +second-level names under COM (or in some countries, second- or +third-level names under the relevant ccTLD) are all that is +meaningful (this perception has been reinforced by some domain name +registrars who have been anxious to "sell" additional names). And, +of course, the perception that one needs a top-level domain per +product, rather than a (usually organizational) collection of network +resources, has led to a rapid acceleration in the number of names +being registered, a phenomenon that has clearly benefited registrars +charging on a per-name basis, "cybersquatters", and others in the +business of "selling" names, but has not obviously benefited the +Internet as a whole.
+ +The emphasis on second-level domain names has also created a problem +for the trademark community. Since the Internet is international, +and names are being populated in a flat and unqualified space, +similarly-named entities are in conflict even if there would +ordinarily be no chance of confusing them in the marketplace. The +problem appears to be unsolvable except by a choice between draconian +measures --possibly including significant changes to the underlying +legislation and conventions-- and a situation in which the "rights" +to a name are typically not settled using the subtle and traditional +product (or industry) type and geopolitical scope rules of the +trademark system but by depending largely on main force, e.g., the +organization with the greatest resources to invest in defending (or +attacking) names will ultimately win out. The latter raises not only +important issues of equity, but the risk of backlash as the numerous +small players are forced to relinquish names they find attractive and +to adopt less-desirable naming conventions. + +Independent of these sociopolitical problems, content distribution +issues have made it clear that it should be possible for an +organization to have copies of data it wishes to make available +distributed around the network, with a user who asks for the +information by name getting the topologically-closest copy. This is +not possible with simple, as-designed, use of the DNS: DNS names +identify target resources or, in the case of email "MX" records, a +preferentially-ordered list of resources "closest" to a target (not +to the source/user). 
Several technologies (and, in some cases, +corresponding business models) have arisen to work around these +problems, including intercepting and altering DNS requests so as to +point to other locations. + +While additional implications are still being discovered and +seriously evaluated, it appears, not surprisingly, that rewriting DNS +names in the middle of the network, or trying to give them different +values or interpretations depending on the topological location of +the user trying to resolve the name, interferes, in the general case, +with end-to-end applications. These problems occur even if the +rewriting machinery is accompanied by additional workarounds for +particular applications: security associations and applications that +need to identify "the same host" as the applications for which these +tools have been designed often run into one problem or another. + + +1.4 A pessimistic history of the evolution of Internet applications +protocols. + +At the applications level, few of the protocols in active, widespread +use on the Internet reflect either contemporary knowledge in +computer science or human factors, or experience accumulated through +deployment and use. Instead, protocols tend to be deployed at a +just-past-prototype level, typically including the types of expedient +compromises typical of prototypes. If they prove useful, the +nature of the network permits very rapid dissemination (i.e., they +fill a vacuum, even if a vacuum that no one previously knew existed). +But, once the vacuum is filled, the installed base provides its own +inertia: unless the design is so seriously faulty as to prevent +effective use (or there is a widely-perceived sense of impending +disaster unless the protocol is replaced), future developments must +maintain backward compatibility and workarounds for problematic +characteristics rather than benefiting from redesign in the light of +experience.
Applications that are "almost good enough" prevent +development and deployment of high-quality replacements. + + +2. Signs of DNS overloading + +Parts of the historical discussion above identify areas in which it +is becoming clear that the DNS is being overloaded (semantically +if not in the mechanical ability to resolve names). While we seem to +still be well within the "just about good enough" range -- current +mechanisms and proposals to deal with these problems are all focused +on patching or working around limitations within the DNS rather than +dramatic rethinking -- the number of these issues that are arising +at the same time may argue for rethinking mechanisms and +relationships, not just more patches and kludges. For example: + +o While technical approaches such as larger and higher-powered +servers and more bandwidth, and legal/political mechanisms such as +dispute resolution policies, have arguably kept the problems from +becoming critical, the DNS has not proven adequately responsive to +business and individual needs to describe or identify things (such as +product names and names of individuals) other than strict network +resources. + +o While stacks have been modified to better handle multiple addresses +on a physical interface and some protocols have been extended to +include DNS names for determining context, the DNS doesn't deal +especially well with large numbers of names per host (needed for web +hosting facilities with multiple domains on a server). + +o Efforts to add names deriving from languages or character sets +based on other than simple ASCII and English-like names (see below), +or even to utilize complex company or product names without the use +of hierarchy, have created apparent requirements for names (labels) +that are over 63 octets long. This requirement will undoubtedly +increase over time; while there are workarounds to accommodate longer +names, they impose their own restrictions and cause their own +problems.
+ +o Increasing commercialization of the Internet, and visibility of +domain names that are assumed to match names of companies or +products, has turned the DNS and DNS names into a trademark +battleground. The traditional trademark system in (at least) most +countries makes careful distinctions about fields of applicability. +When the space is flattened, without differentiators by either +geography or industry sector, not only are there likely conflicts +between "Joe's Pizza" (of Boston) and "Joe's Pizza" (of San +Francisco) but between both and "Joe's Auto Repair" (of Los Angeles): +all three would like to control "Joes.com" and may claim trademark +rights to do so, even though conflict or confusion would not occur +with traditional trademark principles. + +o Many organizations wish to have different web sites under the same +URL and domain name. Sometimes this is to create local variations +--the Widget Company might want to present different material to a UK +user relative to a US one-- and sometimes it is to provide higher +performance by supplying information from the server topologically +closest to the user. Arguably, the name resolution mechanism should +provide information about multiple sites that can provide information +associated with the same name and sufficient attributes associated +with each of those sites to permit applications to make sensible +choices, or should accept client-side attributes and utilize them in +the search process. + +o Many existing and proposed systems for "finding things on the +Internet" require a true search capability in which near matches can +be reported to the user and queries may be slightly ambiguous or +fuzzy. The DNS can accommodate only one set of (quite rigid) matching +rules.
Current proposals to permit different rules in different +localities help to identify the problem, but, if applied directly to +the DNS, either don't provide the level of flexibility that would be +desirable or tend to isolate different parts of the Internet from +each other (or both). Fuzzy or ambiguous searches are desirable for +(at least) resolution of business names that might have spelling +variations and for names that can be resolved into different sets of +glyphs depending on context. This goes beyond "mere" +canonicalization differences (different ways of representing the same +character or ordering the same string) and into such relationships as +the use of different alphabets for the same language, Kanji-Hiragana +relationships, Simplified and Traditional Chinese, etc. + +o The historical DNS and applications that make assumptions about how +it works impose significant risk (or force technical kludges and +consequent odd restrictions) when one considers adding mechanisms +for use with various multi-character-set and multilingual +"internationalization" systems. Cf RFC 2825. + +o In order to provide proper functionality to the Internet, the DNS +must have a single unique root (see RFC 2826 for a discussion of this +issue). There are many desires for local treatment of names or +character sets that cannot be accommodated without either multiple +roots (e.g., a separate root for multilingual names) or mechanisms +that would have similar effects in terms of Internet fragmentation +and isolation. + +o For some purposes, it is desirable to be able to search targets +(i.e., by value, not just by name (label)). One might, for example, +want to locate all of the host (and virtual host) names which cause +mail to be directed to a given server via MX records.
The DNS does +not support this capability; it can be simulated only by +extracting all of the relevant records (perhaps by zone transfer if +the source doesn't prohibit that through access lists) and then +searching a file built from those records. + +o Finally, as additional types of personal or identifying information +are added to the DNS, issues arise concerning protection of that +information and making different information available based on the +credentials and authorization of the source of the inquiry. As with site locational +and proximity information (as discussed above), the DNS protocols +make the mechanisms needed to do this quite difficult if not +impossible. + +In each of these cases, it is, or might be, possible to devise ways +to trick the DNS system into supporting mechanisms that were not +designed into it. Several ingenious solutions have been proposed in +many of these areas already, and some have been deployed into the +marketplace with some success. + +Several of the above problems are addressed well by a good directory +system (supported by the LDAP protocol or otherwise) or searching +environment (such as common web search engines) although not by the +DNS. Given the difficulty of deploying new applications discussed +above, an important question is whether the kludges are bad enough, +or will scale up to bad enough, that new solutions are needed and can +be deployed. + + + +3. The directory story. + +3.1 Overview + +The constraints of the DNS argue for introducing an intermediate +protocol mechanism, referred to here as a "directory layer". The +terms "directory" and "directory system" are used interchangeably with +"searchable system" in this document, although the latter is far more +precise.
Directory layer proposals would use a two- (or more-) stage +lookup, not unlike several of the proposals for internationalized +names in the DNS (see section 4), but all operations but the final +one would involve searching other systems, rather than looking up +identifiers in the DNS itself. This would permit us to relax several +constraints and produce a more comprehensive system. + +Ultimately, many of the issues with domain names arise as the result +of people attempting to use the DNS as a directory. While there has +not been enough pressure/demand to justify a change to date, it has +already been quite clear that, as a directory system, the DNS is a +good deal less than ideal. This document suggests that there +actually is a requirement for a directory system, and that the right +solution to a searchable system requirement is a searchable system, +not a series of DNS patches, kludges, or workarounds. + +In particular... + +o A directory system would not require imposition of particular +length limits on names. + +o A directory system could permit explicit association of attributes +of, e.g., language and country, with a name, without having to +utilize trick encodings to incorporate that information in DNS labels +(or creating artificial hierarchy for doing so). + +o There is considerable experience (albeit not much of it very +successful) in doing fuzzy and "sonex" (similar-sounding) matching in +directory systems. Moreover, it is plausible to think about +different matching rules for different areas and sets of names so +that these can be adapted to local cultural requirements. +Specifically, it might be possible to have a single form of a name in +a directory, but to have great flexibility about what queries matched +that name (and even have different variations in different areas).
+Of course, the more flexibility one provides, the greater the +possibility of real or imagined trademark conflicts, but we would +have the opportunity to design a directory structure that dealt with +those issues in an intelligent way, while DNS constraints arguably +make a general and equitable DNS-only solution impossible. + +o If a directory system is used to translate to DNS names, and then +DNS names are looked up in the normal fashion, it may be possible to +relax several of the constraints that have been traditional (and +perhaps necessary) with the DNS. For example, reverse-mapping of +addresses to directory names may not be a requirement, since the DNS +name(s) would (continue to) uniquely identify the host. + +o Solutions to multilingual transcription problems that are common in +"normal life" (e.g., two-sided business cards to be sure that a +recipient trying to contact a person can access romanized spellings +and numbers when the original language may not be comprehensible to +that recipient) can be easily handled in a directory system by +inserting both sets of entries. + +o One can easily imagine a directory system that would return, not a +single name, but a set of names paired with network-locational +information or other context-establishing attributes. This type of +information might be of considerable use in resolving the "nearest +(or best) server for a particular named resource" problems that are a +significant concern for organizations hosting web and other sites +that are accessed from a wide range of locations and subnets. + +o Names bound to countries and languages might help to manage +trademark realities, while use of the DNS in trademark-significant +areas tends to require worldwide "flattening" of the trademark +system. + +3.2 Some details and comments. 
+ +As several proposals have noted, almost any internationalization +(i18n) proposal for names that are in, or map into, the DNS will +require changing DNS resolver API calls ("gethostbyname" or +equivalent or adding some pre-resolution preparation mechanism) in +almost all Internet applications -- whether to cause the API to take +a different character set, to accept or return more arguments with +qualifying or identifying information, or otherwise. Once +applications must be opened to make such changes, it is a relatively +small matter to switch from calling into the DNS to calling a +directory service and then the DNS (in many situations, both actions +could be accomplished in a single API call). + +A directory approach can be consistent both with "flat" stories and +multi-attribute ones. The DNS requires strict hierarchies, limiting +its ability to handle differentiation among names by their +properties. By contrast, modern directories can utilize +independently-searched attributes and other structured schema to +provide flexibilities not present in a strictly hierarchical system. + +There is a strong argument for a single directory structure (implying +a need for mechanisms for registration, delegation, etc.). But it is +not a strict requirement, especially if in-depth case analysis and +design work leads to the conclusion that reverse-mapping to directory +names is not a requirement (see section 4). + +While the discussion above includes very general comments about +attributes, it appears that only a very small number of attributes +would be needed. The list would almost certainly include country and +language for IDN purposes and might require "charset" if we cannot +agree on a character set and encoding. Trademark issues might +motivate "commercial" and "non-commercial" (or other) attributes if +they would be helpful in bypassing trademark problems. And +applications to resource location might argue for a few other +attributes (as outlined above). + + +4. 
Examining internationalization + +Much of the thinking underlying this document has been driven by +considerations of internationalizing the DNS or, more specifically, +providing access to the functions of the DNS from languages and +naming systems that cannot be accurately expressed in ASCII (or in +the traditional DNS subset of ASCII). Much of this work has been +done in the "IETF Internationalized Access to Domain Names" (IDN) +Working Group. This section contains an evaluation of what that +group has learned and how that learning might reasonably impact +IETF's next steps. It assumes familiarity with the work and +terminology of that working group. + +When the IDN effort started, several of us made the observation that +the first important task for the WG was an undocumented one: to +increase the understanding of the complexities of the problem +sufficiently that naive solutions could be rejected and people could +go to work on the harder problems. That has clearly been +accomplished. With the exception of some continuing background +noise, the simplistic stuff, with promises of one-year deployment, +has just disappeared and almost no one thinks this is simple any more. + +But some of the lessons learned are quite painful and should give us +pause, both generally and in the context of the remarks above: + +4.1. ASCII isn't just because of English + +The hostname rules chosen in the mid-70s weren't just "ASCII +because English uses ASCII", although that was a starting +point. We have discovered that almost every other script +(and, I think, even ASCII if we let the rest of the ISO 646 +non-BV characters in) is more complex than hostname- +restricted-ASCII. In some cases, case mapping works from one +case to the other, but is not reversible. In others, there +are conventions about alternate ways to represent characters +(in the language, not [just] in character coding) that work +most of the time, but not always. 
And there are issues in +coding, with Unicode/10646 providing different ways to +represent the same character (I am using that word, rather +than "glyph", deliberately here). And, in others, there are +questions as to whether two glyphs "match", which may be a +distance-function question, not one with a binary answer. We +have tried to solve this set of problems with "nameprep" (see +below). + +4.2. "Nameprep" and its complexities + +The model for getting around the various problems described above and +elsewhere has evolved into a notion that all strings are to be placed +into the DNS only after being passed through a string preparation +function that eliminates or rejects spurious character codes, maps +some characters onto others, performs some sequence canonicalization, +and generally creates forms that can be accurately compared. The +impact of this process on host-table-subset ASCII is trivial and +essentially adds only overhead. For other scripts, the impact is, of +necessity, quite significant. + +Defining that process was quite complex. Although the general notion +was simple, the devil is often in the details, and there are many +details. A design team worked on it for months, with considerable +effort placed into clarifying and fine-tuning the protocol. Despite +general agreement that the IETF would avoid getting into the business +of defining character sets, character codings, and the associated +conventions, the group has several times taken excursions into +special treatment of code positions to more nearly match the +distinctions of Unicode with user-perceptions about similarities and +differences between characters. The IETF-specific code position work +has been removed from the protocol draft, but the fact that the +temptation has been strong may indicate problems we haven't solved to +everyone's satisfaction.
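The general shape of such a string preparation function can be sketched in Python. This is a minimal illustration under stated assumptions, not nameprep itself. It substitutes the standard library's Unicode NFKC normalization and case folding for nameprep's own mapping tables. The PROHIBITED set and the functions prepare() and matches() are hypothetical names introduced only for this example. The sketch shows three steps: mapping, canonicalization, and rejection of prohibited characters. It also shows the binary comparison that those steps feed.

```python
import unicodedata

# Hypothetical prohibited code points, for illustration only; real
# nameprep defines its own (much larger) prohibition tables.
PROHIBITED = {" ", "/", "\u0000"}


def prepare(label: str) -> str:
    """Illustrative stand-in for a nameprep-style preparation function."""
    # Canonicalize: compose equivalent code point sequences, e.g.
    # 'e' + combining acute accent becomes precomposed 'e-acute'.
    s = unicodedata.normalize("NFKC", label)
    # Map: fold case so that later comparisons are case-insensitive.
    s = s.casefold()
    # Case folding can occasionally yield unnormalized text; renormalize.
    s = unicodedata.normalize("NFKC", s)
    # Reject: refuse strings that contain prohibited characters.
    if any(ch in PROHIBITED for ch in s):
        raise ValueError(f"prohibited character in {label!r}")
    return s


def matches(a: str, b: str) -> bool:
    # The DNS demands a binary answer: prepared forms are equal or not.
    return prepare(a) == prepare(b)


composed = "Montr\u00e9al"       # precomposed e-acute
decomposed = "Montre\u0301al"    # 'e' followed by a combining acute accent
print(matches(composed, decomposed))       # True: two codings, one name
print(matches("MONTR\u00c9AL", composed))  # True: case is folded away
print(matches("Montreal", composed))       # False: no "close enough"
```

Note that matches() can only return True or False. "Montreal" and "Montréal" compare as unequal, and this model has no way to express that they are close.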
+ +At the same time, the nameprep work has been extremely useful, both +in identifying many of the problem code points and issues and +providing a reasonable set of rules. The problem is arguably not +with nameprep, but with the DNS-imposed requirement that nameprep, as +with all other parts of the matching and comparison process, yield a +binary "match or no match" answer, rather than, e.g., a value on a +similarity scale that can be evaluated by the user or by user-driven +heuristic functions. + + +4.3 The UCS Stability Problem + +ISO 10646 basically defines only code points, and not rules for using +or comparing the characters. This is a long-standing issue with +standards coming out of ISO/IEC JTC1/SC2; internationalization +issues, as contrasted with character-listing and code point +assignment issues, are just not dealt with effectively in that group. +The Unicode Technical Committee has defined some rules for +canonicalization and comparison, many of which have been factored +into the "nameprep" work, but we are still working on figuring +out how to make or define those rules in a sufficiently precise and +permanent fashion that the DNS can depend on them. Perhaps more +important, our nameprep efforts have identified several areas in +which the UTC rules do not adequately define things to make matching +precise and unambiguous. That raises two issues: whether trying to +do precise matching at the character set level is actually possible +(addressed below) and whether driving toward more precision could +create issues that cause instability in the implementation and +resolution models. + +In addition, JTC1 has recently assigned some (most?) of these issues +to JTC1/SC22/WG20 (the Internationalization WG within the +subcommittee that deals with programming languages, systems, and +environments). WG20 is historically strong and deals with +internationalization issues thoughtfully and in depth.
Whether or +not they get it right, assignment of these matters to WG20 +significantly increases the risk of an eventual ISO standard that +specifies different behavior from the UTC specification. + +4.4 Audiences, end users, and the UI problem + +Part of what has "caused" the DNS i18n problem, as well as the DNS +trademark problem and several others, is that we have stopped thinking +about "identifiers for objects", which normal people are not expected +to see, and started thinking about "names" -- strings that are +expected not only to be readable, but to have culturally-dependent +meaning to non-specialist users. + +The WG, and others, have attempted to avoid addressing the +implications of that transition by taking "someone else's problem" +approaches or by suggesting that we can adopt conventions and people +will just get used to them. I suggest that neither will work: + + * If we want to make it a problem in a different part of the + UI structure, we need to figure out where it goes in order + to have proof of concept of our solution. Unlike those + whose sole [business] model is the selling or registering of + names, any solution IETF produces actually needs to work, in + an applications context, for the end user. + + * The "they will get used to our conventions and adapt" + principle is fine if we are writing rules for programming + languages or an API. But the conventions we are talking + about aren't part of a semi-mathematical system; they are + deeply ingrained in culture. No matter how often we tell an + English-speaking American that the Internet requires that the + correct spelling of "colour" be used, he or she isn't going to be + convinced. Getting a French-speaker in Lyon to use exactly + the same lexical conventions as a French-speaker in Quebec + in order to accommodate the decisions of the IETF or of a + registrar or registry is just not likely. "Montreal" is + either a misspelling or an anglicization (anglicisation?)
of + Montréal (with an acute accent mark over the "e"), but we + are as unlikely to get global agreement on a rule that will + determine whether the two forms should match --and that + won't astonish end users and speakers of one language or the + other-- as we are to get agreement on whether "misspelling" + or "anglicization" is the greater travesty. + +More generally, it is not clear that the outcome of any conceivable +nameprep-like process is going to be good enough. In the use of +human languages by humans, we have many cases in which things that do +not match are nonetheless interpreted as matching. The +Norwegian/Danish glyph "ø" (lower case 'o' with forward slash) and +the German glyph "ö" (lower case 'o' with umlaut) are clearly +different and no matching program should yield an "equal" comparison. +But they are more similar than either of them is to, e.g., "e", and +humans are able to mentally make the correction in context and can be +surprised if computers can't do so. + +This text uses examples in Roman scripts because it is being written +in English and those examples are relatively easy to render. But one +of the important lessons of the IDN discussions of the last year or +so is that problems like this exist in almost every language and +script. Each one has its idiosyncrasies, and each set of +idiosyncrasies is tied to common usage and cultural issues that are +deeply embedded. As long as a schoolchild in the US can get a bad +grade on a spelling test for using a perfectly valid British +spelling, or one in France or Germany can get a poor grade for +leaving off a diacritical mark, or one in Egypt or Israel learns that +it is acceptable to write a word with or without vowels or stress marks +but that, if they are included, they must be the correct ones, there +are issues with the relevant language.
We are dealing with culture, +not identifier symbol-strings for geeks or computers, and the efforts +of the last year have made it ever more clear that, if we ignore that +distinction, we are solving an insufficient problem. + + +4.5 Business cards and other natural uses of natural languages + +We have some established local conventions in the world for dealing +with multilingual situations. Looking at them may be helpful. If +one visits a country where the language is different from one's own, +business cards are often printed on two sides, one side in each +language. This is usually a high-tolerance situation: exact +translations are often not possible, and people typically smile at +errors, appreciate the effort, and move on. The DNS situation +differs from this in at least two ways: since we need a global +solution, the business card would need a number of sides +approximating the number of languages in the world, which is probably +impossible without violating the laws of physics. And the opportunities +for tolerance don't exist: the DNS requires an exact match or the +lookup fails. + +4.6 ASCII encodings and the Roman keyboard assumption + +Part of the argument for ACE-based solutions is that they provide an +escape for multilingual environments when applications have not been +upgraded. When an older application encounters an ACE-based name, +the assumption is that the (admittedly ugly) ASCII string will be +displayed and can be typed in. This argument is reasonable from the +standpoint of mixtures of Latin-based alphabets, but may not be +relevant if user-level systems and devices are involved that do not +support the entry of Roman-based characters or that cannot +conveniently render such characters. + +4.7 A pessimistic summary of IDN WG directions + +It appears, from the cases above and others, that none of the +intra-DNS-based solutions for "multilingual names" are workable.
+They just rest on too many assumptions that do not appear to be +feasible -- that people will adapt deeply-entrenched language habits +to conventions laid down to make the lives of computers easy; that we +can make "freeze it now, no need for changes in these areas" +decisions about Unicode and nameprep; that ACE will smooth over +applications problems, even in environments without the ability to +key or render roman-based glyphs (or where user experience is such +that they cannot easily be told apart); that we can either deploy +EDNS or that long names aren't really important; that the Chinese +Government (and others) will either give up their IS 2022-based +solutions (for which UTC adding large fractions of a million new code +points is almost certainly a necessary, but probably not sufficient +condition) or build leakproof boundary conversion mechanisms; that +out of band or contextual information will always be sufficient for +the "map glyph onto script" problem; and so on. In each case, we can +get about 80% or 90%, but it is not clear that is going to be good +enough. For example, suppose someone can spell her name 90% +correctly: is that likely to be considered adequate? + + +6. The Key Controversies + +6.1. One directory or many + +As suggested in some of the text above, it is an open question as to +whether the needs of the community would be best served by a single +directory with universal applicability, a single directory but +locally-tailored search (and, most important, matching) functions, or +multiple, locally-determined, directories. Each has its attractions. +Any but the first would essentially prevent reverse-mapping +(determination of the user-visible name of the host or resource from +target information such as an address or DNS name). But reverse +mapping has become less useful over the years --at least to users-- +as we have assigned more and more names per host address. 
+ +Locally-tailored search and mappings would permit national variations +on interpretation of which strings matched which other ones, an +arrangement that is especially important when different localities +apply different rules to, e.g., matching of characters with and +without diacriticals. But, of course, this implies that a URL may +evaluate properly or not depending on either settings on a client +machine or the network connectivity of the user, which is not, in +general, a desirable situation. + +And, of course, completely separate directories would permit +translation and transliteration functions to be embedded in the +directory, giving much of the Internet a different appearance +depending on which directory was chosen. The attractions of this are +obvious, but, unless things were very carefully designed to preserve +uniqueness and precise identities at the right points (which may or +may not be possible), such a system would have many of the +difficulties associated with multiple roots. + +6.2 Why not a proposal? + +As this document has gone through various preliminary drafts and +reviews, the question has been raised as to whether it should contain +a specific proposal: a specific directory mechanism, schema, and so +on. It deliberately does not take that step. It has been difficult +to get directory systems deployed in significant ways in the Internet +infrastructure, partly because we have a surplus of options. +There are also some approaches that could be used to implement the +general concepts described here, such as the Common Name Resolution +Protocol [RFC2972], which some would not consider a directory protocol +at all. Consequently, it appeared better to present the general +concepts and arguments here and leave the specifics to other sources, +documents, and proposals. + + +7. Security Considerations + +The set of proposals implied by this document suggests an interesting +set of security issues (i.e., nothing important is ever easy).
A +directory system used for this purpose would presumably need to be as +carefully protected against unauthorized changes as the DNS itself. +There also might be new opportunities for problems in the two-layer +arrangement, but those problems are no more severe than those of a two-stage +lookup in the DNS. + + +8. References + +RFC 625 On-line hostnames service. M.D. Kudlick, E.J. Feinler. +March 1974. + +RFC 811 Hostnames Server. K. Harrenstien, V. White, E.J. Feinler. +March 1982. + +RFC 952 DoD Internet host table specification. K. Harrenstien, M.K. +Stahl, E.J. Feinler. October 1985. + +RFC 882 Domain names: Concepts and facilities. P.V. Mockapetris. +November 1983. + +RFC 883 Domain names: Implementation specification. P.V. Mockapetris. +November 1983. + +RFC 1035 Domain names - implementation and specification. P.V. +Mockapetris. November 1987. + +RFC 1591 Domain Name System Structure and Delegation. J. Postel. +March 1994. + +RFC 2825 A Tangled Web: Issues of I18N, Domain Names, and the Other +Internet protocols. IAB, L. Daigle, ed. May 2000. + +RFC 2826 IAB Technical Comment on the Unique DNS Root. IAB. May 2000. + +RFC 2972 Context and Goals for Common Name Resolution. N. Popp, M. +Mealling, L. Masinter, K. Sollins. October 2000. + +ITU Recommendation X.9 + +ITU Recommendation X.25 + +9. Acknowledgements + +Many people have contributed to versions of this document or the +thinking that went into it. The author would particularly like to +thank Harald Alvestrand, Leslie Daigle, Patrik Faltstrom, Eric A. +Hall, and Paul Hoffman for challenging the assumptions of earlier +versions and suggesting ways to improve them. + + +10.
Culprit address + +John Klensin +AT&T Labs +99 Bedford Street +Boston, MA 02111 +klensin@att.com + +Expires November 2001 diff --git a/doc/draft/draft-skwan-utf8-dns-05.txt b/doc/draft/draft-skwan-utf8-dns-05.txt deleted file mode 100644 index a9e1dbbe96..0000000000 --- a/doc/draft/draft-skwan-utf8-dns-05.txt +++ /dev/null @@ -1,228 +0,0 @@ -INTERNET-DRAFT Stuart Kwan - James Gilroy - Levon Esibov - Microsoft Corp. - March 2001 - Expires September 2001 - - - Using the UTF-8 Character Set in the Domain Name System - - -Status of this Memo - -This document is an Internet-Draft and is in full conformance -with all provisions of Section 10 of RFC2026. - -Internet-Drafts are working documents of the Internet Engineering -Task Force (IETF), its areas, and its working groups. Note that -other groups may also distribute working documents as -Internet-Drafts. - -Internet-Drafts are draft documents valid for a maximum of six -months and may be updated, replaced, or obsoleted by other -documents at any time. It is inappropriate to use Internet- -Drafts as reference material or to cite them other than as -"work in progress." - -The list of current Internet-Drafts can be accessed at -http://www.ietf.org/ietf/1id-abstracts.txt - -The list of Internet-Draft Shadow Directories can be accessed at -http://www.ietf.org/shadow.html. - - -Abstract - -The Domain Name System standard specifies that names are represented -using the ASCII character encoding. This document expands that -specification to allow the use of the UTF-8 character encoding, a -superset of ASCII and a translation of the UCS-2 character encoding. - - - - - - - - - - - -Expires September 2001 [Page 1] - - -INTERNET-DRAFT UTF-8 DNS March 2001 - -1. Introduction - -The Domain Name System standard [RFC1035] specifies that names are -represented using the ASCII character encoding. 
This document expands -that specification to allow the use of the UTF-8 character encoding -[RFC2044], a superset of ASCII and a translation of the UCS-2 -character encoding. - -Interpreting names as ASCII-only limits the utility of DNS in an -international setting. The UTF-8 character set includes characters -from most of the world's written languages, allowing a far greater -range of possible names and allowing names to use characters that are -relevant to a particular locality. UTF-8 is the recommended character -set for protocols that are evolving beyond ASCII [RFC2130]. - -This document defines the technology for a richer character set in -DNS. This document specifically does not define policy for the -characters allowed in a name when used in a particular application. -For example, some protocols place restrictions on the characters -allowed in a name. In addition, names that are intended to be -globally visible [RFC1958] should contain ASCII-only characters -per [RFC1123]. - - -2. Protocol Description - -A UTF-8-aware DNS server is a DNS server that can load and store DNS -names that contain UTF-8 characters. Names are encoded in logical -order as opposed to visual order (see [UNICODE 2.0]). - -Uniform downcasing permits UTF-8-aware DNS implementations to -interoperate with non-UTF-8-aware DNS implementations. Any binary -string can be used in a DNS name [RFC2181], but names must be -compared with case-insensitivity [RFC1035]. A non-UTF-8-aware DNS -implementation is unable to perform a case-insensitive comparison -on a name containing UTF-8 characters. However, if UTF-8 names are -downcased before transmission, then binary comparisons will provide -the desired result on non-UTF-8-aware servers without violating the -case-insensitivity requirement. - -The DNS protocol standard states that original case should be -preserved when possible as data is entered into the system. 
This -requirement is modified as follows: a UTF-8-aware DNS server must -downcase all names containing UTF-8 characters in both record names -and record data before transmitting those names in any message. -A UTF-8-aware DNS client/resolver must downcase all names containing -UTF-8 characters before transmitting those names in any message. - - - - -Expires September 2001 [Page 2] - - -INTERNET-DRAFT UTF-8 DNS March 2001 - - -For consistency, UTF-8-aware DNS servers must compare names that -contain UTF-8 characters byte-for-byte, as opposed to using Unicode -equivalency rules. - -Applications should take care when allowing uppercase UTF-8 characters -to be passed to the resolver, and DNS servers should take care when -allowing uppercase UTF-8 characters to be entered in zone data. -Downcasing in UTF-8 is locale-sensitive and the result may vary -according to the locale of the code execution. The desired result will -always be obtained if the application and server only accept lowercase -characters. - -Names encoded in UTF-8 must not exceed the size limits clarified in -[RFC2181]. Character count is insufficient to determine size, since -some UTF-8 characters exceed one octet in length. - - -3. Interoperability Considerations - -The UTF-8 character encoding is ideal for use with existing protocol -implementations that expect US-ASCII characters. The representation -of a US-ASCII characters in UTF-8 is byte for byte identical to the -US-ASCII representation. Non-UTF-8-aware DNS clients always encode -names in ASCII format and those names will always be correctly -interpreted by a UTF-8-aware DNS server. - -DNS server authors may wish to provide a configuration switch on the -DNS server to allow/disallow the use of UTF-8 characters on a -per-server or per-zone basis. - -A non-UTF-8-aware DNS server may accept a zone transfer of a zone -containing UTF-8 names, but it may not be able to write back those -names to a zone file or reload those names from a zone file. 
-Administrators should exercise caution when transferring a zone -containing UTF-8 names to a non-UTF-8-aware DNS server. - - -4. Security Considerations - -The choice of character encoding for names does not impact the -security of the DNS protocol. - - -5. Acknowledgements - -The authors of this document would like to thank the following people -for their contribution to this specification: John McConnell, -Cliff Van Dyke and Bjorn Rettig. - - - -Expires September 2001 [Page 3] - - -INTERNET-DRAFT UTF-8 DNS March 2001 - - -6. References - -[RFC1035] P.V. Mockapetris, "Domain Names - Implementation and - Specification," RFC 1035, ISI, Nov 1987. - -[RFC2044] F. Yergeau, "UTF-8, a transformation format of Unicode - and ISO 10646," RFC 2044, Alis Technologies, Oct 1996. - -[RFC1958] B. Carpenter, "Architectural Principles of the - Internet," RFC 1958, IAB, June 1996. - -[RFC1123] R. Braden, "Requirements for Internet Hosts - - Application and Support," STD 3, RFC 1123, January 1989. - -[RFC2130] C. Weider et. al., "The Report of the IAB Character - Set Workshop held 29 July - 1 March 1996", - RFC 2130, Apr 1997. - -[RFC2181] R. Elz and R. Bush, "Clarifications to the DNS - Specification," RFC 2181, University of Melbourne and - RGnet Inc, July 1997. - -[UNICODE 2.0] The Unicode Consortium, "The Unicode Standard, Version - 2.0," Addison-Wesley, 1996. ISBN 0-201-48345-9. - - -7. 
Author's Addresses - -Stuart Kwan James Gilroy -Microsoft Corporation Microsoft Corporation -One Microsoft Way One Microsoft Way -Redmond, WA 98052 Redmond, WA 98052 -USA USA - - -Levon Esibov -Microsoft Corporation -One Microsoft Way -Redmond, WA 98052 -USA - - - - - - - - - - - -Expires September 2001 [Page 4] - - - - - - diff --git a/doc/draft/draft-skwan-utf8-dns-06.txt b/doc/draft/draft-skwan-utf8-dns-06.txt new file mode 100644 index 0000000000..de92b417b3 --- /dev/null +++ b/doc/draft/draft-skwan-utf8-dns-06.txt @@ -0,0 +1,421 @@ +INTERNET-DRAFT Stuart Kwan + James Gilroy + Levon Esibov + Microsoft Corp. + May 2001 + Expires November 2001 + + + Using the UTF-8 Character Set in the Domain Name System + +Status of this Memo + +This document is an Internet-Draft and is in full conformance +with all provisions of Section 10 of RFC2026. + +Internet-Drafts are working documents of the Internet Engineering +Task Force (IETF), its areas, and its working groups. Note that +other groups may also distribute working documents as +Internet-Drafts. + +Internet-Drafts are draft documents valid for a maximum of six +months and may be updated, replaced, or obsoleted by other +documents at any time. It is inappropriate to use Internet- +Drafts as reference material or to cite them other than as +"work in progress." + +The list of current Internet-Drafts can be accessed at +http://www.ietf.org/ietf/1id-abstracts.txt + +The list of Internet-Draft Shadow Directories can be accessed at +http://www.ietf.org/shadow.html. + + +Abstract + +The Domain Names standard specifies that hostnames are represented +using the ASCII character encoding. This document expands that +specification to allow the use of the UTF-8 character encoding, a +superset of ASCII and a translation of the UCS-2 character encoding. + + +1. Introduction + +The Domain Names standard [RFC1123] specifies that hostnames are +represented using the ASCII character encoding. 
This document expands +that specification to allow the use of the UTF-8 character encoding +[RFC2044], a superset of ASCII and a translation of the UCS-2 +character encoding. + +Interpreting names as ASCII-only limits the utility of DNS in an +international setting. The UTF-8 character set includes characters +from most of the world's written languages, allowing a far greater +range of possible names and allowing names to use characters that are +relevant to a particular locality. UTF-8 is the recommended character +set for protocols that are evolving beyond ASCII [RFC2130]. + +Expires November 2001 [Page 1] + +INTERNET-DRAFT UTF-8 DNS May 2001 + + +This document defines the technology for a richer character set in +DNS. This document specifically does not define policy for the +characters allowed in a name when used in a particular application. +For example, some protocols place restrictions on the characters +allowed in a name. + + +2. Protocol Description + +2.1 Components and roles + +Before describing the protocol itself, the authors clarify which +components are involved in processing hostnames and describe how +these components use the hostnames. The following +list contains such information. + +User. +A user could be a human or an application. Its role is to specify (also +known as "write") and retrieve (also known as "read") the hostname to +and from an application. Examples of such operations include +typing the hostname, writing it on a touch-sensitive screen, reading +the name from the monitor, listening to a voicemail, etc. + +Application. +The application's role is to +- process the hostname specified by the user or by another local or remote + application. +- return to the user (for example, display on a monitor screen) the + hostname returned by the DNS resolver. +- call DNS name resolution APIs to request the resolver to perform + name resolution. + +Resolver.
+The resolver's role is to +- process the name resolution requests from an application and submit + the appropriate DNS queries to the DNS servers +- process the response from a DNS server and pass the response to the + application. + +DNS server. +The role of the DNS server is to store and maintain the DNS data, +process updates to its database, update replica copies of the +databases, and perform DNS name resolution by responding to +DNS queries. + + +2.2 Protocol details + +This section describes the modifications (if any) to each of these +components and to the interfaces between the communicating components. + + + +Expires November 2001 [Page 2] + +INTERNET-DRAFT UTF-8 DNS May 2001 + + +2.2.1 Users + +No modifications to the users are proposed in this document. However, +support for this protocol by the other components specified later +in this section may enable users to start using, in hostnames, +characters from a wider set than the one specified in [RFC1123]. + + +2.2.2 Interface between users and applications + +Users may use any character set or multiple character sets supported by +the particular application. Specification of the allowed character +sets supported by an application is outside the scope of this +document. The decision on which character sets can be used to allow +users to input and retrieve hostnames is left to the implementers +of the particular applications unless the protocol underlying a specific +application specifies the supported character set. Thus, this protocol +does not affect the interface between users and applications. + + +2.2.3 Applications + +The format in which applications store hostnames is outside the +scope of this protocol. + + +2.2.4 Interface between applications and resolvers + +This protocol does not specify the APIs that applications should use +to request the resolver to perform the DNS name resolution of +internationalized hostnames.
Instead it only specifies the format of +the hostnames specified in the input and output of such APIs. + +The applications supporting non-ASCII characters in hostnames MUST +pass to the resolvers a hostname in ISO/IEC 10646 encoding. If the +response returned by the resolver to the application contains the +hostname, then the application should expect the hostname to be +encoded using ISO/IEC 10646. + + +2.2.5 Resolvers + +Before sending the hostname in the query packet, the resolver MUST +prepare each name part as specified in [NAMEPREP]. After the name +preparation the resolver MUST convert the hostname to be encoded using +UTF-8 as specified in [RFC2044]. +Names encoded in UTF-8 must not exceed the size limits clarified in +[RFC2181]. Character count is insufficient to determine size, since +some UTF-8 characters exceed one octet in length. + + + + +Expires November 2001 [Page 3] + +INTERNET-DRAFT UTF-8 DNS May 2001 + + +When resolver receives a response to the query from a DNS server, it +MUST convert all of the hostnames from UTF-8 encoded format to the +ISO/IEC 10646 encoding before passing these hostnames back to the +application. + + +2.2.6 DNS servers + +DNS servers authoritative for the records containing the hostnames +containing the characters not allowed by [RFC1123] MUST allow use of +the namepreped UTF-8 format to store and transmit those parts of the +hostnames. + +According to existing standards, any binary string can be used in a +DNS name [RFC2181], but names must be compared with case-insensitivity +[RFC1035]. At the same time DNS protocol standard states that original +case SHOULD be preserved when possible as data is entered into the DNS +database. 
This requirement is modified as follows: a DNS server +authoritative for the internationalized hostnames MUST nameprep and +perform UTF-8 conversion on all names containing internationalized +characters in both record names and record data before storing these +hostnames and transmitting those names in any message. This new +requirement guarantees case-insensitive comparison of the +internationalized hostnames even by those DNS servers that do not +support this protocol. + +DNS servers must compare names that contain UTF-8 characters +byte-for-byte, as opposed to using Unicode equivalency rules. + + +3. Interoperability Considerations + +If user continues using ASCII-only characters in the hostnames, then +there is no need to upgrade any applications and/or resolvers. + +As pointed in the previous section, there is no need to upgrade DNS +servers, except possibly those that are authoritative for the zones +containing internationalized hostnames. + +The following interoperability issues should be taken into account + +- A legacy application may not be able to process the hostnames +containing non-ASCII characters returned by DNS resolvers. Effect of +failure to process a name containing 7-bit needs to be separately +investigated. +- If other protocols decide to use the nameprep-UTF-8-encoding to +represent internationalized hostnames in their wire packets, then a +legacy application supporting such protocol that receives UTF-8 +encoded hostname from another application (for example, such as mail +server or client) may fail to process such hostname. Effect of failure +to process a name containing 7-bit needs to be separately investigate. + + +Expires November 2001 [Page 4] + +INTERNET-DRAFT UTF-8 DNS May 2001 + + +Thus hostnames that are intended to be globally usable [RFC1958] on +legacy applications should still contain ASCII-only characters per +[RFC1123]. 
+
+- If an updated application runs on a legacy resolver that rejects
+  resolution of names containing any character not allowed by
+  [RFC1123], then such a resolver will require an upgrade to enable
+  resolution of internationalized hostnames.
+
+- As specified above, DNS servers authoritative for DNS records
+  containing internationalized hostnames must be able to save and
+  load hostnames containing namepreped, UTF-8-converted characters.
+  A DNS server that does not satisfy this requirement but needs to
+  host such resource records must be upgraded.
+
+- Any DNS server involved in the resolution of DNS records containing
+  an internationalized hostname must not reject name resolution only
+  because the hostname contains characters not allowed by [RFC1123].
+  This requirement does not mean that every DNS server in the name
+  resolution path between the client and the authoritative server
+  must be able to store and load DNS records containing
+  internationalized hostnames. It only means that the DNS server
+  performing recursive resolution needs to be able to query for and
+  cache such records, and that the DNS servers authoritative for
+  names higher in the DNS hierarchy than the internationalized name
+  in the query need to be able to respond to such queries. The
+  overwhelming majority of DNS servers currently deployed on the
+  Internet already satisfy this requirement; the authors are not
+  aware of any DNS server implementation widely deployed on the
+  Internet that does not satisfy it.
+
+Although most DNS servers may be capable of accepting a zone
+transfer of a zone containing UTF-8-encoded hostnames, some of them
+may not be able to store those names in a zone file or load those
+names from a zone file. Administrators should exercise caution when
+transferring a zone containing UTF-8-encoded hostnames to such DNS
+servers.
+
+
+
+4. Security Considerations
+
+Support for internationalized hostnames introduces the possibility of
+a new type of spoofing attack, based on an attacker's knowledge of
+misbehaving applications or resolvers that modify the
+internationalized hostname to be resolved. For example, if an
+application modifies every character with the high-order bit (bit 7)
+set in some predictable manner (for example, by simply stripping that
+bit),
+
+
+
+
+
+Expires November 2001 [Page 5]
+
+INTERNET-DRAFT UTF-8 DNS May 2001
+
+
+then an attacker may register a DNS record that maps the derivative
+name (i.e., the name as modified by the misbehaving application or
+resolver) to data of the attacker's choosing. In this scenario, any
+user of such a misbehaving application may receive, as the result of
+name resolution, data specified by the attacker (for example, an IP
+address in an A resource record) without noticing the attack. This
+holds even if DNSSEC is used to verify the authenticity of the
+response, since the response is a genuine answer for the modified
+name.
+
+Because this protocol depends on the procedures described in
+[NAMEPREP] and [RFC2044], the security issues identified in these
+documents are also applicable to this protocol.
+
+
+5. Acknowledgements
+
+The authors of this document would like to thank the following people
+for their contributions to this specification: John McConnell,
+Cliff Van Dyke and Bjorn Rettig.
+
+
+6. References
+
+[RFC1035] P. Mockapetris, "Domain Names - Implementation and
+          Specification," RFC 1035, ISI, November 1987.
+
+[RFC2044] F. Yergeau, "UTF-8, a transformation format of Unicode
+          and ISO 10646," RFC 2044, Alis Technologies, October 1996.
+
+[RFC1958] B. Carpenter, "Architectural Principles of the
+          Internet," RFC 1958, IAB, June 1996.
+
+[RFC1123] R. Braden, "Requirements for Internet Hosts -
+          Application and Support," STD 3, RFC 1123, October 1989.
+
+[RFC2130] C. Weider et al., "The Report of the IAB Character
+          Set Workshop held 29 February - 1 March, 1996,"
+          RFC 2130, April 1997.
+
+[RFC2181] R. Elz and R. Bush, "Clarifications to the DNS
+          Specification," RFC 2181, University of Melbourne and
+          RGnet Inc, July 1997.
+
+[UNICODE 2.0] The Unicode Consortium, "The Unicode Standard, Version
+          2.0," Addison-Wesley, 1996. ISBN 0-201-48345-9.
+
+[NAMEPREP] P. Hoffman and M. Blanchet, "Preparation of
+          Internationalized Host Names,"
+          draft-ietf-idn-nameprep-*.txt.
+
+
+
+
+
+Expires November 2001 [Page 6]
+
+INTERNET-DRAFT UTF-8 DNS May 2001
+
+
+7. Authors' Addresses
+
+Stuart Kwan                           James Gilroy
+Microsoft Corporation                 Microsoft Corporation
+One Microsoft Way                     One Microsoft Way
+Redmond, WA 98052                     Redmond, WA 98052
+USA                                   USA
+skwan@microsoft.com                   jamesg@microsoft.com
+
+Levon Esibov
+Microsoft Corporation
+One Microsoft Way
+Redmond, WA 98052
+USA
+levone@microsoft.com
+
+
+8. Intellectual Property Statement
+
+The IETF takes no position regarding the validity or scope of any
+intellectual property or other rights that might be claimed to pertain
+to the implementation or use of the technology described in this
+document or the extent to which any license under such rights might or
+might not be available; neither does it represent that it has made any
+effort to identify any such rights. Information on the IETF's
+procedures with respect to rights in standards-track and standards-
+related documentation can be found in BCP-11. Copies of claims of
+rights made available for publication and any assurances of licenses to
+be made available, or the result of an attempt made to obtain a general
+license or permission for the use of such proprietary rights by
+implementors or users of this specification can be obtained from the
+IETF Secretariat.
+
+The IETF invites any interested party to bring to its attention any
+copyrights, patents or patent applications, or other proprietary rights
+which may cover technology that may be required to practice this
+standard. Please address the information to the IETF Executive
+Director.
+
+
+9. Full Copyright Statement
+
+Copyright (C) The Internet Society (2001). All Rights Reserved.
+This document and translations of it may be copied and furnished to
+others, and derivative works that comment on or otherwise explain it or
+assist in its implementation may be prepared, copied, published and
+distributed, in whole or in part, without restriction of any kind,
+provided that the above copyright notice and this paragraph are included
+on all such copies and derivative works. However, this document itself
+may not be modified in any way, such as by removing the copyright notice
+or references to the Internet Society or other Internet organizations,
+except as needed for the purpose of developing Internet standards in
+
+Expires November 2001 [Page 7]
+
+INTERNET-DRAFT UTF-8 DNS May 2001
+
+
+which case the procedures for copyrights defined in the Internet
+Standards process must be followed, or as required to translate it into
+languages other than English. The limited permissions granted above are
+perpetual and will not be revoked by the Internet Society or its
+successors or assigns. This document and the information contained
+herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE
+INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Expires November 2001 [Page 8]
\ No newline at end of file