Characters and character sets for various languages

Thu Jun 17 12:29:46 MET DST 1993

Harald Tveit Alvestrand
SINTEF DELAB
Harald.Alvestrand@delab.sintef.no

Abstract

There is a need for a source of information about the characters that are used in various languages. No such information is currently readily available on the net. This document attempts to fill that void.

Status of this Memo

This draft document is being circulated for comment. It does not yet cover anything but Latin-based scripts; volunteers to collect material for other scripts are sought.

Please send comments to the author, or to the RARE WG-CHAR list.

The following text is required by the Internet-draft rules:

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other Internet Draft.

Alvestrand Expires Dec 17 93 [Page 1]

draft Languages and character sets Mar 93

1. Introduction

There are a lot of languages in the world. Estimates vary between 500 and 6000, and the eternal conflicts about the difference between a language and a dialect guarantee that any list claiming to be authoritative will be the source of endless debate.

Many of these languages have a writing system. Some have several. These are also likely to have changed over time, with the meaning of character symbols changing, the shape of the characters changing, completely new characters being added, or old ones being removed from the set.
This means that even within a single language, a list of characters is likely to be controversial.

These problems have made several experts in the field of languages and characters refuse to even consider the idea of working out such a list.

Nevertheless, it is clear that an easily available source of this kind of information is needed, in order to:

(1) Identify the problems encountered when trying to use equipment with limited character support for a language

(2) Identify what support for additional characters will be "enough" for that language

(3) Identify which internationally standardized character sets are able to fulfill the requirements for that language

The tables given below are an attempt at providing such an identification.

The rest of the document is in 3 parts: The language tables a

2. Introduction to language tables

2.1. Table structure

Each language is listed in 5 parts:

(1) The language name with its ISO 639 code if applicable

(2) The characters required for that language. For brevity, the characters of ASCII (A-Z) are not listed. Note that some languages do NOT require all the ASCII characters.

(3) Characters that are in normal use, but have replacements that mostly do not change the meaning of the word in context. These may be called "optional" characters. This should _not_ be taken as liberty to remove those characters from the language, but as a reminder that if it is great trouble to use the charsets that cover the complete language, a smaller character set may be used without causing grievous harm to the expressive power of the writer.

(4) Internationally registered character sets that cover the required and/or optional characters for that language.

(5) Comments

The division between "required" and "optional" characters is likely to produce much discussion.
As a rough guide, I have taken the registered ISO 646 variants of a number of countries, and classified as "optional" all characters that did _not_ appear in that ISO 646 variant. As a result, an ISO 646 variant should appear under the "required characters only" heading for all languages that have an ISO 646 variant.

Note that for brevity, only the lower case version of each character is listed. If no note is made, one should assume that the upper case version is equally required. Note, however, that a lot of languages permit the dropping of accents on upper case characters where it would be considered improper to drop them on lower case characters.

2.2. Sources utilized

The table of Latin-script languages is based on work by Johan van Wingen. The others are best guesses by the author.

The tables of character sets prepared by Keld Jorn Simonsen (RFC-KELD) were invaluable in matching the data on languages to the data on character sets.

The language codes (for those languages that have codes) come from ISO 639.

NOTE: ISO 639 is a very incomplete list of the world's languages (perhaps 10 or 20 % according to some experts), and is undergoing revision. The only reason for using it is that it is the only ISO-standardized shorthand notation for languages available at the moment.

Languages for which no such exact information is known are listed at the end of the tables.

2.3. What accents mean

For those who feel unfamiliar with the names of accents:

Grave slants upwards to the left, like the Unix "backtick".
Acute slants upwards to the right.
Circumflex looks like a little pointed hat.
Tilde looks like a wavy line.
Macron looks like a bar placed on top of the character.
Breve looks like the lower quarter of a circle, placed on top of the character.
Dot above should be self-explanatory.
Diaeresis looks like 2 dots above the character. Ring above should be self-explanatory. Cedilla looks like a little squiggle on the bottom of the letter, down and then left. Ogonek looks like a squiggle too, but goes down and to the right. Caron looks like a little "v" on top of the character. 3. Language tables This language has no known character set 3.1. lt Lithuanian Required characters a; 0105 LATIN SMALL LETTER A WITH OGONEK e; 0119 LATIN SMALL LETTER E WITH OGONEK i; 012f LATIN SMALL LETTER I WITH OGONEK Alvestrand Expires Dec 17 93 [Page 5] draft Languages and character sets Mar 93 u; 0173 LATIN SMALL LETTER U WITH OGONEK e. 0117 LATIN SMALL LETTER E WITH DOT ABOVE u- 016b LATIN SMALL LETTER U WITH MACRON c< 010d LATIN SMALL LETTER C WITH CARON s< 0161 LATIN SMALL LETTER S WITH CARON z< 017e LATIN SMALL LETTER Z WITH CARON Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) T.61-8bit (iso 103) ISO_8859-4:1988 (iso 110) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) latin6 (iso 157) JIS_X0212-1990 (iso 159) 3.2. lv Latvian Required characters a- 0101 LATIN SMALL LETTER A WITH MACRON e- 0113 LATIN SMALL LETTER E WITH MACRON i- 012b LATIN SMALL LETTER I WITH MACRON o- 014d LATIN SMALL LETTER O WITH MACRON u- 016b LATIN SMALL LETTER U WITH MACRON g, 0123 LATIN SMALL LETTER G WITH CEDILLA k, 0137 LATIN SMALL LETTER K WITH CEDILLA l, 013c LATIN SMALL LETTER L WITH CEDILLA n, 0146 LATIN SMALL LETTER N WITH CEDILLA r, 0157 LATIN SMALL LETTER R WITH CEDILLA c< 010d LATIN SMALL LETTER C WITH CARON s< 0161 LATIN SMALL LETTER S WITH CARON z< 017e LATIN SMALL LETTER Z WITH CARON Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) T.61-8bit (iso 103) Alvestrand Expires Dec 17 93 [Page 6] draft Languages and character sets Mar 93 ISO_8859-4:1988 (iso 110) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) latin6 (iso 157) 3.3. et Estonian Required characters o? 
00f5 LATIN SMALL LETTER O WITH TILDE a: 00e4 LATIN SMALL LETTER A WITH DIAERESIS o: 00f6 LATIN SMALL LETTER O WITH DIAERESIS u: 00fc LATIN SMALL LETTER U WITH DIAERESIS s< 0161 LATIN SMALL LETTER S WITH CARON z< 017e LATIN SMALL LETTER Z WITH CARON Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) T.61-8bit (iso 103) ISO_8859-4:1988 (iso 110) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) latin6 (iso 157) JIS_X0212-1990 (iso 159) 3.4. fi Finnish Required characters a: 00e4 LATIN SMALL LETTER A WITH DIAERESIS o: 00f6 LATIN SMALL LETTER O WITH DIAERESIS Character sets covering the whole NATS-SEFI (iso 8) NATS-DANO-ADD (iso 9) SEN_850200_B (iso 10) SEN_850200_C (iso 11) DIN_66003 (iso 21) Alvestrand Expires Dec 17 93 [Page 7] draft Languages and character sets Mar 93 videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) ISO_8859-2:1987 (iso 101) T.61-8bit (iso 103) ISO_8859-3:1988 (iso 109) ISO_8859-4:1988 (iso 110) T.101-G2 (iso 128) CSN_369103 (iso 139) ISO_6937-2-add (iso 142) ISO_8859-9:1989 (iso 148) latin6 (iso 157) JIS_X0212-1990 (iso 159) 3.5. ?? 
Sami Required characters a' 00e1 LATIN SMALL LETTER A WITH ACUTE e' 00e9 LATIN SMALL LETTER E WITH ACUTE a> 00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX a: 00e4 LATIN SMALL LETTER A WITH DIAERESIS e: 00eb LATIN SMALL LETTER E WITH DIAERESIS i: 00ef LATIN SMALL LETTER I WITH DIAERESIS o: 00f6 LATIN SMALL LETTER O WITH DIAERESIS u: 00fc LATIN SMALL LETTER U WITH DIAERESIS ae 00e6 LATIN SMALL LETTER AE aa 00e5 LATIN SMALL LETTER A WITH RING ABOVE o/ 00f8 LATIN SMALL LETTER O WITH STROKE d/ 0111 LATIN SMALL LETTER D WITH STROKE n' 0144 LATIN SMALL LETTER N WITH ACUTE ng 014b LATIN SMALL LETTER ENG t/ 0167 LATIN SMALL LETTER T WITH STROKE c< 010d LATIN SMALL LETTER C WITH CARON s< 0161 LATIN SMALL LETTER S WITH CARON z< 017e LATIN SMALL LETTER Z WITH CARON Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) Alvestrand Expires Dec 17 93 [Page 8] draft Languages and character sets Mar 93 T.61-8bit (iso 103) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) latin6 (iso 157) JIS_X0212-1990 (iso 159) 3.6. sv Swedish Required characters a: 00e4 LATIN SMALL LETTER A WITH DIAERESIS o: 00f6 LATIN SMALL LETTER O WITH DIAERESIS aa 00e5 LATIN SMALL LETTER A WITH RING ABOVE Optional characters a' 00e1 LATIN SMALL LETTER A WITH ACUTE e' 00e9 LATIN SMALL LETTER E WITH ACUTE e: 00eb LATIN SMALL LETTER E WITH DIAERESIS u: 00fc LATIN SMALL LETTER U WITH DIAERESIS Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) T.61-8bit (iso 103) ISO_8859-4:1988 (iso 110) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) ISO_8859-9:1989 (iso 148) latin6 (iso 157) JIS_X0212-1990 (iso 159) Character sets covering the required characters only NATS-SEFI (iso 8) SEN_850200_B (iso 10) SEN_850200_C (iso 11) Alvestrand Expires Dec 17 93 [Page 9] draft Languages and character sets Mar 93 3.7. 
no Norwegian Required characters ae 00e6 LATIN SMALL LETTER AE aa 00e5 LATIN SMALL LETTER A WITH RING ABOVE o/ 00f8 LATIN SMALL LETTER O WITH STROKE Optional characters e' 00e9 LATIN SMALL LETTER E WITH ACUTE o' 00f3 LATIN SMALL LETTER O WITH ACUTE o> 00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) T.61-8bit (iso 103) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) ISO_8859-9:1989 (iso 148) latin6 (iso 157) JIS_X0212-1990 (iso 159) Character sets covering the required characters only NATS-DANO (iso 9) NS_4551-1 (iso 60) NS_4551-2 (iso 61) ISO_8859-4:1988 (iso 110) 3.8. da Danish Required characters ae 00e6 LATIN SMALL LETTER AE aa 00e5 LATIN SMALL LETTER A WITH RING ABOVE o/ 00f8 LATIN SMALL LETTER O WITH STROKE Optional characters Alvestrand Expires Dec 17 93 [Page 10] draft Languages and character sets Mar 93 a' 00e1 LATIN SMALL LETTER A WITH ACUTE e' 00e9 LATIN SMALL LETTER E WITH ACUTE i' 00ed LATIN SMALL LETTER I WITH ACUTE o' 00f3 LATIN SMALL LETTER O WITH ACUTE u' 00fa LATIN SMALL LETTER U WITH ACUTE y' 00fd LATIN SMALL LETTER Y WITH ACUTE Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) T.61-8bit (iso 103) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) latin6 (iso 157) JIS_X0212-1990 (iso 159) Character sets covering the required characters only NATS-DANO (iso 9) NS_4551-1 (iso 60) NS_4551-2 (iso 61) ISO_8859-4:1988 (iso 110) ISO_8859-9:1989 (iso 148) 3.9. 
fo Faeroese Required characters a' 00e1 LATIN SMALL LETTER A WITH ACUTE i' 00ed LATIN SMALL LETTER I WITH ACUTE o' 00f3 LATIN SMALL LETTER O WITH ACUTE u' 00fa LATIN SMALL LETTER U WITH ACUTE y' 00fd LATIN SMALL LETTER Y WITH ACUTE ae 00e6 LATIN SMALL LETTER AE o/ 00f8 LATIN SMALL LETTER O WITH STROKE d- 00f0 LATIN SMALL LETTER ETH (Icelandic) Character sets covering the whole videotex-suppl (iso 70) Alvestrand Expires Dec 17 93 [Page 11] draft Languages and character sets Mar 93 iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) T.61-8bit (iso 103) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) latin6 (iso 157) JIS_X0212-1990 (iso 159) 3.10. is Icelandic Required characters a' 00e1 LATIN SMALL LETTER A WITH ACUTE e' 00e9 LATIN SMALL LETTER E WITH ACUTE i' 00ed LATIN SMALL LETTER I WITH ACUTE o' 00f3 LATIN SMALL LETTER O WITH ACUTE u' 00fa LATIN SMALL LETTER U WITH ACUTE y' 00fd LATIN SMALL LETTER Y WITH ACUTE o: 00f6 LATIN SMALL LETTER O WITH DIAERESIS ae 00e6 LATIN SMALL LETTER AE d- 00f0 LATIN SMALL LETTER ETH (Icelandic) th 00fe LATIN SMALL LETTER THORN (Icelandic) Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) T.61-8bit (iso 103) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) latin6 (iso 157) JIS_X0212-1990 (iso 159) 3.11. kl Greenlandic Required characters a' 00e1 LATIN SMALL LETTER A WITH ACUTE e' 00e9 LATIN SMALL LETTER E WITH ACUTE Alvestrand Expires Dec 17 93 [Page 12] draft Languages and character sets Mar 93 i' 00ed LATIN SMALL LETTER I WITH ACUTE u' 00fa LATIN SMALL LETTER U WITH ACUTE a> 00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX e> 00ea LATIN SMALL LETTER E WITH CIRCUMFLEX i> 00ee LATIN SMALL LETTER I WITH CIRCUMFLEX o> 00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX u> 00fb LATIN SMALL LETTER U WITH CIRCUMFLEX ae 00e6 LATIN SMALL LETTER AE aa 00e5 LATIN SMALL LETTER A WITH RING ABOVE o/ 00f8 LATIN SMALL LETTER O WITH STROKE a? 
00e3 LATIN SMALL LETTER A WITH TILDE i? 0129 LATIN SMALL LETTER I WITH TILDE u? 0169 LATIN SMALL LETTER U WITH TILDE kk 0138 LATIN SMALL LETTER KRA (Greenlandic) Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) T.61-8bit (iso 103) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) 3.12. ?? Gaelic Required characters a' 00e1 LATIN SMALL LETTER A WITH ACUTE e' 00e9 LATIN SMALL LETTER E WITH ACUTE o' 00f3 LATIN SMALL LETTER O WITH ACUTE a! 00e0 LATIN SMALL LETTER A WITH GRAVE e! 00e8 LATIN SMALL LETTER E WITH GRAVE i! 00ec LATIN SMALL LETTER I WITH GRAVE o! 00f2 LATIN SMALL LETTER O WITH GRAVE u! 00f9 LATIN SMALL LETTER U WITH GRAVE Character sets covering the whole GB_2312-80 (iso 58) videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) Alvestrand Expires Dec 17 93 [Page 13] draft Languages and character sets Mar 93 ISO_8859-1:1987 (iso 100) T.61-8bit (iso 103) ISO_8859-3:1988 (iso 109) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) JIS_X0212-1990 (iso 159) 3.13. ga Irish Required characters a' 00e1 LATIN SMALL LETTER A WITH ACUTE e' 00e9 LATIN SMALL LETTER E WITH ACUTE i' 00ed LATIN SMALL LETTER I WITH ACUTE o' 00f3 LATIN SMALL LETTER O WITH ACUTE u' 00fa LATIN SMALL LETTER U WITH ACUTE Character sets covering the whole GB_2312-80 (iso 58) videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) ISO_8859-2:1987 (iso 101) T.61-8bit (iso 103) ISO_8859-3:1988 (iso 109) CSA_Z243.4-1985-gr (iso 123) T.101-G2 (iso 128) CSN_369103 (iso 139) ISO_6937-2-add (iso 142) ISO_8859-9:1989 (iso 148) latin6 (iso 157) JIS_X0212-1990 (iso 159) 3.14. 
cy Welsh Required characters w' 1e83 LATIN SMALL LETTER W WITH ACUTE y' 00fd LATIN SMALL LETTER Y WITH ACUTE a' 00e1 LATIN SMALL LETTER A WITH ACUTE Alvestrand Expires Dec 17 93 [Page 14] draft Languages and character sets Mar 93 e' 00e9 LATIN SMALL LETTER E WITH ACUTE i' 00ed LATIN SMALL LETTER I WITH ACUTE o' 00f3 LATIN SMALL LETTER O WITH ACUTE u' 00fa LATIN SMALL LETTER U WITH ACUTE a! 00e0 LATIN SMALL LETTER A WITH GRAVE e! 00e8 LATIN SMALL LETTER E WITH GRAVE i! 00ec LATIN SMALL LETTER I WITH GRAVE o! 00f2 LATIN SMALL LETTER O WITH GRAVE u! 00f9 LATIN SMALL LETTER U WITH GRAVE w! 1e81 LATIN SMALL LETTER W WITH GRAVE y! 1ef3 LATIN SMALL LETTER Y WITH GRAVE a> 00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX e> 00ea LATIN SMALL LETTER E WITH CIRCUMFLEX i> 00ee LATIN SMALL LETTER I WITH CIRCUMFLEX o> 00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX u> 00fb LATIN SMALL LETTER U WITH CIRCUMFLEX w> 0175 LATIN SMALL LETTER W WITH CIRCUMFLEX y> 0177 LATIN SMALL LETTER Y WITH CIRCUMFLEX a: 00e4 LATIN SMALL LETTER A WITH DIAERESIS e: 00eb LATIN SMALL LETTER E WITH DIAERESIS i: 00ef LATIN SMALL LETTER I WITH DIAERESIS o: 00f6 LATIN SMALL LETTER O WITH DIAERESIS u: 00fc LATIN SMALL LETTER U WITH DIAERESIS w: 1e85 LATIN SMALL LETTER W WITH DIAERESIS y: 00ff LATIN SMALL LETTER Y WITH DIAERESIS This language has no known character set 3.15. br Breton Required characters e> 00ea LATIN SMALL LETTER E WITH CIRCUMFLEX u! 00f9 LATIN SMALL LETTER U WITH GRAVE u: 00fc LATIN SMALL LETTER U WITH DIAERESIS n? 00f1 LATIN SMALL LETTER N WITH TILDE Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) T.61-8bit (iso 103) Alvestrand Expires Dec 17 93 [Page 15] draft Languages and character sets Mar 93 ISO_8859-3:1988 (iso 109) CSA_Z243.4-1985-gr (iso 123) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) JIS_X0212-1990 (iso 159) 3.16. 
fy Frisian Required characters e' 00e9 LATIN SMALL LETTER E WITH ACUTE u' 00fa LATIN SMALL LETTER U WITH ACUTE a> 00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX e> 00ea LATIN SMALL LETTER E WITH CIRCUMFLEX o> 00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX u> 00fb LATIN SMALL LETTER U WITH CIRCUMFLEX a: 00e4 LATIN SMALL LETTER A WITH DIAERESIS e: 00eb LATIN SMALL LETTER E WITH DIAERESIS i: 00ef LATIN SMALL LETTER I WITH DIAERESIS o: 00f6 LATIN SMALL LETTER O WITH DIAERESIS u: 00fc LATIN SMALL LETTER U WITH DIAERESIS Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) T.61-8bit (iso 103) ISO_8859-3:1988 (iso 109) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) 3.17. nl Dutch Required characters a' 00e1 LATIN SMALL LETTER A WITH ACUTE e' 00e9 LATIN SMALL LETTER E WITH ACUTE i' 00ed LATIN SMALL LETTER I WITH ACUTE o' 00f3 LATIN SMALL LETTER O WITH ACUTE u' 00fa LATIN SMALL LETTER U WITH ACUTE Alvestrand Expires Dec 17 93 [Page 16] draft Languages and character sets Mar 93 a: 00e4 LATIN SMALL LETTER A WITH DIAERESIS e: 00eb LATIN SMALL LETTER E WITH DIAERESIS i: 00ef LATIN SMALL LETTER I WITH DIAERESIS o: 00f6 LATIN SMALL LETTER O WITH DIAERESIS u: 00fc LATIN SMALL LETTER U WITH DIAERESIS ij 0133 LATIN SMALL LIGATURE IJ Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) T.61-8bit (iso 103) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) JIS_X0212-1990 (iso 159) 3.18. af Afrikaans Required characters a' 00e1 LATIN SMALL LETTER A WITH ACUTE e' 00e9 LATIN SMALL LETTER E WITH ACUTE e! 
00e8 LATIN SMALL LETTER E WITH GRAVE a> 00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX e> 00ea LATIN SMALL LETTER E WITH CIRCUMFLEX i> 00ee LATIN SMALL LETTER I WITH CIRCUMFLEX o> 00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX u> 00fb LATIN SMALL LETTER U WITH CIRCUMFLEX e: 00eb LATIN SMALL LETTER E WITH DIAERESIS i: 00ef LATIN SMALL LETTER I WITH DIAERESIS o: 00f6 LATIN SMALL LETTER O WITH DIAERESIS Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) T.61-8bit (iso 103) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) Alvestrand Expires Dec 17 93 [Page 17] draft Languages and character sets Mar 93 3.19. de German Required characters a: 00e4 LATIN SMALL LETTER A WITH DIAERESIS o: 00f6 LATIN SMALL LETTER O WITH DIAERESIS u: 00fc LATIN SMALL LETTER U WITH DIAERESIS ss 00df LATIN SMALL LETTER SHARP S (German) Optional characters e' 00e9 LATIN SMALL LETTER E WITH ACUTE a! 00e0 LATIN SMALL LETTER A WITH GRAVE Comments The "ss" character exists only in lower case; the upper case equivalent is "SS" (2 letters). Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) T.61-8bit (iso 103) ISO_8859-3:1988 (iso 109) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) ISO_8859-9:1989 (iso 148) JIS_X0212-1990 (iso 159) Character sets covering the required characters only DIN_66003 (iso 21) ISO_8859-2:1987 (iso 101) ISO_8859-4:1988 (iso 110) CSN_369103 (iso 139) latin6 (iso 157) Alvestrand Expires Dec 17 93 [Page 18] draft Languages and character sets Mar 93 3.20. fr French Required characters e' 00e9 LATIN SMALL LETTER E WITH ACUTE e! 00e8 LATIN SMALL LETTER E WITH GRAVE u! 00f9 LATIN SMALL LETTER U WITH GRAVE c, 00e7 LATIN SMALL LETTER C WITH CEDILLA a! 
00e0 LATIN SMALL LETTER A WITH GRAVE Optional characters a> 00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX e> 00ea LATIN SMALL LETTER E WITH CIRCUMFLEX i> 00ee LATIN SMALL LETTER I WITH CIRCUMFLEX o> 00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX u> 00fb LATIN SMALL LETTER U WITH CIRCUMFLEX ae 00e6 LATIN SMALL LETTER AE oe 0153 LATIN SMALL LIGATURE OE e: 00eb LATIN SMALL LETTER E WITH DIAERESIS i: 00ef LATIN SMALL LETTER I WITH DIAERESIS u: 00fc LATIN SMALL LETTER U WITH DIAERESIS y: 00ff LATIN SMALL LETTER Y WITH DIAERESIS Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) T.61-8bit (iso 103) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) Character sets covering the required characters only IT (iso 15) NF_Z_62-010_(1973) (iso 25) NF_Z_62-010 (iso 69) ISO_8859-1:1987 (iso 100) ISO_8859-3:1988 (iso 109) CSA_Z243.4-1985-1 (iso 121) CSA_Z243.4-1985-2 (iso 122) CSA_Z243.4-1985-gr (iso 123) ISO_8859-9:1989 (iso 148) Alvestrand Expires Dec 17 93 [Page 19] draft Languages and character sets Mar 93 JIS_X0212-1990 (iso 159) 3.21. ca Catalan Required characters e' 00e9 LATIN SMALL LETTER E WITH ACUTE i' 00ed LATIN SMALL LETTER I WITH ACUTE o' 00f3 LATIN SMALL LETTER O WITH ACUTE u' 00fa LATIN SMALL LETTER U WITH ACUTE a! 00e0 LATIN SMALL LETTER A WITH GRAVE e! 00e8 LATIN SMALL LETTER E WITH GRAVE o! 00f2 LATIN SMALL LETTER O WITH GRAVE i: 00ef LATIN SMALL LETTER I WITH DIAERESIS u: 00fc LATIN SMALL LETTER U WITH DIAERESIS l. 0140 LATIN SMALL LETTER L WITH MIDDLE DOT n? 00f1 LATIN SMALL LETTER N WITH TILDE Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) T.61-8bit (iso 103) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) JIS_X0212-1990 (iso 159) 3.22. es Spanish Required characters n? 
00f1 LATIN SMALL LETTER N WITH TILDE c, 00e7 LATIN SMALL LETTER C WITH CEDILLA !I 00a1 INVERTED EXCLAMATION MARK ?I 00bf INVERTED QUESTION MARK Optional characters a' 00e1 LATIN SMALL LETTER A WITH ACUTE e' 00e9 LATIN SMALL LETTER E WITH ACUTE i' 00ed LATIN SMALL LETTER I WITH ACUTE Alvestrand Expires Dec 17 93 [Page 20] draft Languages and character sets Mar 93 o' 00f3 LATIN SMALL LETTER O WITH ACUTE u' 00fa LATIN SMALL LETTER U WITH ACUTE u: 00fc LATIN SMALL LETTER U WITH DIAERESIS n? 00f1 LATIN SMALL LETTER N WITH TILDE Comments Note that this language also uses special punctuation marks. The c, appears in ISO 646-ES, but not in van Wingen's tables. Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) T.61-8bit (iso 103) CSA_Z243.4-1985-gr (iso 123) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) ISO_8859-9:1989 (iso 148) JIS_X0212-1990 (iso 159) Character sets covering the required characters only ES (iso 17) ES2 (iso 85) 3.23. gl Galician Required characters a' 00e1 LATIN SMALL LETTER A WITH ACUTE e' 00e9 LATIN SMALL LETTER E WITH ACUTE i' 00ed LATIN SMALL LETTER I WITH ACUTE o' 00f3 LATIN SMALL LETTER O WITH ACUTE u' 00fa LATIN SMALL LETTER U WITH ACUTE u: 00fc LATIN SMALL LETTER U WITH DIAERESIS n? 00f1 LATIN SMALL LETTER N WITH TILDE Character sets covering the whole videotex-suppl (iso 70) Alvestrand Expires Dec 17 93 [Page 21] draft Languages and character sets Mar 93 iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) T.61-8bit (iso 103) ISO_8859-3:1988 (iso 109) CSA_Z243.4-1985-gr (iso 123) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) ISO_8859-9:1989 (iso 148) JIS_X0212-1990 (iso 159) 3.24. pt Portuguese Required characters a? 00e3 LATIN SMALL LETTER A WITH TILDE o? 
00f5 LATIN SMALL LETTER O WITH TILDE c, 00e7 LATIN SMALL LETTER C WITH CEDILLA Optional characters a' 00e1 LATIN SMALL LETTER A WITH ACUTE e' 00e9 LATIN SMALL LETTER E WITH ACUTE i' 00ed LATIN SMALL LETTER I WITH ACUTE o' 00f3 LATIN SMALL LETTER O WITH ACUTE u' 00fa LATIN SMALL LETTER U WITH ACUTE a! 00e0 LATIN SMALL LETTER A WITH GRAVE a> 00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX e> 00ea LATIN SMALL LETTER E WITH CIRCUMFLEX o> 00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX u: 00fc LATIN SMALL LETTER U WITH DIAERESIS Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) T.61-8bit (iso 103) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) JIS_X0212-1990 (iso 159) Alvestrand Expires Dec 17 93 [Page 22] draft Languages and character sets Mar 93 Character sets covering the required characters only PT (iso 16) PT2 (iso 84) ISO_8859-9:1989 (iso 148) 3.25. eu Basque Required characters n? 00f1 LATIN SMALL LETTER N WITH TILDE c, 00e7 LATIN SMALL LETTER C WITH CEDILLA Character sets covering the whole ES (iso 17) videotex-suppl (iso 70) ES2 (iso 85) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) T.61-8bit (iso 103) ISO_8859-3:1988 (iso 109) CSA_Z243.4-1985-gr (iso 123) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) ISO_8859-9:1989 (iso 148) JIS_X0212-1990 (iso 159) 3.26. mt Maltese Required characters a! 00e0 LATIN SMALL LETTER A WITH GRAVE e! 00e8 LATIN SMALL LETTER E WITH GRAVE i! 00ec LATIN SMALL LETTER I WITH GRAVE o! 00f2 LATIN SMALL LETTER O WITH GRAVE u! 00f9 LATIN SMALL LETTER U WITH GRAVE i> 00ee LATIN SMALL LETTER I WITH CIRCUMFLEX c. 010b LATIN SMALL LETTER C WITH DOT ABOVE g. 0121 LATIN SMALL LETTER G WITH DOT ABOVE h/ 0127 LATIN SMALL LETTER H WITH STROKE Alvestrand Expires Dec 17 93 [Page 23] draft Languages and character sets Mar 93 z. 
017c LATIN SMALL LETTER Z WITH DOT ABOVE Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) T.61-8bit (iso 103) ISO_8859-3:1988 (iso 109) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) JIS_X0212-1990 (iso 159) 3.27. it Italian Required characters e' 00e9 LATIN SMALL LETTER E WITH ACUTE o' 00f3 LATIN SMALL LETTER O WITH ACUTE a! 00e0 LATIN SMALL LETTER A WITH GRAVE e! 00e8 LATIN SMALL LETTER E WITH GRAVE i! 00ec LATIN SMALL LETTER I WITH GRAVE o! 00f2 LATIN SMALL LETTER O WITH GRAVE Optional characters i' 00ed LATIN SMALL LETTER I WITH ACUTE u' 00fa LATIN SMALL LETTER U WITH ACUTE u! 00f9 LATIN SMALL LETTER U WITH GRAVE i: 00ef LATIN SMALL LETTER I WITH DIAERESIS Comments The accented characters appear only in the lower case variant in the Italian version of ISO 646 (ISO-IR-15). Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) T.61-8bit (iso 103) Alvestrand Expires Dec 17 93 [Page 24] draft Languages and character sets Mar 93 ISO_8859-3:1988 (iso 109) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) JIS_X0212-1990 (iso 159) Character sets covering the required characters only GB_2312-80 (iso 58) 3.28. ?? Rhaetian Required characters e' 00e9 LATIN SMALL LETTER E WITH ACUTE a! 00e0 LATIN SMALL LETTER A WITH GRAVE e! 00e8 LATIN SMALL LETTER E WITH GRAVE o! 00f2 LATIN SMALL LETTER O WITH GRAVE a> 00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX e> 00ea LATIN SMALL LETTER E WITH CIRCUMFLEX i> 00ee LATIN SMALL LETTER I WITH CIRCUMFLEX o> 00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX o: 00f6 LATIN SMALL LETTER O WITH DIAERESIS u: 00fc LATIN SMALL LETTER U WITH DIAERESIS Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) T.61-8bit (iso 103) ISO_8859-3:1988 (iso 109) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) JIS_X0212-1990 (iso 159) 3.29. 
ro Romanian Required characters a> 00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX i> 00ee LATIN SMALL LETTER I WITH CIRCUMFLEX Alvestrand Expires Dec 17 93 [Page 25] draft Languages and character sets Mar 93 a( 0103 LATIN SMALL LETTER A WITH BREVE s, 015f LATIN SMALL LETTER S WITH CEDILLA t, 0163 LATIN SMALL LETTER T WITH CEDILLA Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-2:1987 (iso 101) T.61-8bit (iso 103) T.101-G2 (iso 128) CSN_369103 (iso 139) ISO_6937-2-add (iso 142) JIS_X0212-1990 (iso 159) 3.30. hu Hungarian Required characters a' 00e1 LATIN SMALL LETTER A WITH ACUTE e' 00e9 LATIN SMALL LETTER E WITH ACUTE i' 00ed LATIN SMALL LETTER I WITH ACUTE o' 00f3 LATIN SMALL LETTER O WITH ACUTE u' 00fa LATIN SMALL LETTER U WITH ACUTE o: 00f6 LATIN SMALL LETTER O WITH DIAERESIS u: 00fc LATIN SMALL LETTER U WITH DIAERESIS o" 0151 LATIN SMALL LETTER O WITH DOUBLE ACUTE u" 0171 LATIN SMALL LETTER U WITH DOUBLE ACUTE Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-2:1987 (iso 101) T.61-8bit (iso 103) T.101-G2 (iso 128) CSN_369103 (iso 139) ISO_6937-2-add (iso 142) JIS_X0212-1990 (iso 159) Alvestrand Expires Dec 17 93 [Page 26] draft Languages and character sets Mar 93 3.31. sq Albanian Required characters e: 00eb LATIN SMALL LETTER E WITH DIAERESIS c, 00e7 LATIN SMALL LETTER C WITH CEDILLA Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-1:1987 (iso 100) ISO_8859-2:1987 (iso 101) T.61-8bit (iso 103) ISO_8859-3:1988 (iso 109) CSA_Z243.4-1985-gr (iso 123) T.101-G2 (iso 128) CSN_369103 (iso 139) ISO_6937-2-add (iso 142) ISO_8859-9:1989 (iso 148) JIS_X0212-1990 (iso 159) 3.32. 
tr Turkish Required characters a> 00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX i> 00ee LATIN SMALL LETTER I WITH CIRCUMFLEX u> 00fb LATIN SMALL LETTER U WITH CIRCUMFLEX o: 00f6 LATIN SMALL LETTER O WITH DIAERESIS u: 00fc LATIN SMALL LETTER U WITH DIAERESIS i. 0131 LATIN SMALL LETTER I WITH NO DOT c, 00e7 LATIN SMALL LETTER C WITH CEDILLA s, 015f LATIN SMALL LETTER S WITH CEDILLA g( 011f LATIN SMALL LETTER G WITH BREVE Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) T.61-8bit (iso 103) Alvestrand Expires Dec 17 93 [Page 27] draft Languages and character sets Mar 93 ISO_8859-3:1988 (iso 109) T.101-G2 (iso 128) ISO_6937-2-add (iso 142) ISO_8859-9:1989 (iso 148) 3.33. hr Croatian Required characters c' 0107 LATIN SMALL LETTER C WITH ACUTE d/ 0111 LATIN SMALL LETTER D WITH STROKE c< 010d LATIN SMALL LETTER C WITH CARON s< 0161 LATIN SMALL LETTER S WITH CARON z< 017e LATIN SMALL LETTER Z WITH CARON Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-2:1987 (iso 101) T.61-8bit (iso 103) T.101-G2 (iso 128) CSN_369103 (iso 139) JUS_I.B1.002 (iso 141) ISO_6937-2-add (iso 142) JIS_X0212-1990 (iso 159) 3.34. sl Slovenian Required characters c< 010d LATIN SMALL LETTER C WITH CARON s< 0161 LATIN SMALL LETTER S WITH CARON z< 017e LATIN SMALL LETTER Z WITH CARON Character sets covering the whole videotex-suppl (iso 70) iso-ir-90 (iso 90) ANSI_X3.110-1983 (iso 99) ISO_8859-2:1987 (iso 101) Alvestrand Expires Dec 17 93 [Page 28] draft Languages and character sets Mar 93 T.61-8bit (iso 103) ISO_8859-4:1988 (iso 110) T.101-G2 (iso 128) CSN_369103 (iso 139) JUS_I.B1.002 (iso 141) ISO_6937-2-add (iso 142) latin6 (iso 157) JIS_X0212-1990 (iso 159) 3.35. 
sk Slovak

Required characters

   y'  00fd  LATIN SMALL LETTER Y WITH ACUTE
   a'  00e1  LATIN SMALL LETTER A WITH ACUTE
   e'  00e9  LATIN SMALL LETTER E WITH ACUTE
   i'  00ed  LATIN SMALL LETTER I WITH ACUTE
   o'  00f3  LATIN SMALL LETTER O WITH ACUTE
   u'  00fa  LATIN SMALL LETTER U WITH ACUTE
   a:  00e4  LATIN SMALL LETTER A WITH DIAERESIS
   o>  00f4  LATIN SMALL LETTER O WITH CIRCUMFLEX
   l'  013a  LATIN SMALL LETTER L WITH ACUTE
   r'  0155  LATIN SMALL LETTER R WITH ACUTE
   c<  010d  LATIN SMALL LETTER C WITH CARON
   d<  010f  LATIN SMALL LETTER D WITH CARON
   l<  013e  LATIN SMALL LETTER L WITH CARON
   n<  0148  LATIN SMALL LETTER N WITH CARON
   s<  0161  LATIN SMALL LETTER S WITH CARON
   t<  0165  LATIN SMALL LETTER T WITH CARON
   z<  017e  LATIN SMALL LETTER Z WITH CARON

Character sets covering the whole

   videotex-suppl (iso 70)
   iso-ir-90 (iso 90)
   ANSI_X3.110-1983 (iso 99)
   ISO_8859-2:1987 (iso 101)
   T.61-8bit (iso 103)
   T.101-G2 (iso 128)
   CSN_369103 (iso 139)
   ISO_6937-2-add (iso 142)

3.36. cs Czech

Required characters

   y'  00fd  LATIN SMALL LETTER Y WITH ACUTE
   a'  00e1  LATIN SMALL LETTER A WITH ACUTE
   e'  00e9  LATIN SMALL LETTER E WITH ACUTE
   i'  00ed  LATIN SMALL LETTER I WITH ACUTE
   o'  00f3  LATIN SMALL LETTER O WITH ACUTE
   u'  00fa  LATIN SMALL LETTER U WITH ACUTE
   e<  011b  LATIN SMALL LETTER E WITH CARON
   u0  016f  LATIN SMALL LETTER U WITH RING ABOVE
   c<  010d  LATIN SMALL LETTER C WITH CARON
   d<  010f  LATIN SMALL LETTER D WITH CARON
   n<  0148  LATIN SMALL LETTER N WITH CARON
   r<  0159  LATIN SMALL LETTER R WITH CARON
   s<  0161  LATIN SMALL LETTER S WITH CARON
   t<  0165  LATIN SMALL LETTER T WITH CARON
   z<  017e  LATIN SMALL LETTER Z WITH CARON

Character sets covering the whole

   videotex-suppl (iso 70)
   iso-ir-90 (iso 90)
   ANSI_X3.110-1983 (iso 99)
   ISO_8859-2:1987 (iso 101)
   T.61-8bit (iso 103)
   T.101-G2 (iso 128)
   CSN_369103 (iso 139)
   ISO_6937-2-add (iso 142)

3.37.
pl Polish

Required characters

   o'  00f3  LATIN SMALL LETTER O WITH ACUTE
   a;  0105  LATIN SMALL LETTER A WITH OGONEK
   e;  0119  LATIN SMALL LETTER E WITH OGONEK
   c'  0107  LATIN SMALL LETTER C WITH ACUTE
   n'  0144  LATIN SMALL LETTER N WITH ACUTE
   s'  015b  LATIN SMALL LETTER S WITH ACUTE
   z'  017a  LATIN SMALL LETTER Z WITH ACUTE
   l/  0142  LATIN SMALL LETTER L WITH STROKE
   z.  017c  LATIN SMALL LETTER Z WITH DOT ABOVE

Character sets covering the whole

   videotex-suppl (iso 70)
   iso-ir-90 (iso 90)
   ANSI_X3.110-1983 (iso 99)
   ISO_8859-2:1987 (iso 101)
   T.61-8bit (iso 103)
   T.101-G2 (iso 128)
   CSN_369103 (iso 139)
   ISO_6937-2-add (iso 142)
   JIS_X0212-1990 (iso 159)

3.38. ?? Sorbian

Required characters

   o'  00f3  LATIN SMALL LETTER O WITH ACUTE
   e<  011b  LATIN SMALL LETTER E WITH CARON
   c'  0107  LATIN SMALL LETTER C WITH ACUTE
   n'  0144  LATIN SMALL LETTER N WITH ACUTE
   s'  015b  LATIN SMALL LETTER S WITH ACUTE
   z'  017a  LATIN SMALL LETTER Z WITH ACUTE
   l/  0142  LATIN SMALL LETTER L WITH STROKE
   c<  010d  LATIN SMALL LETTER C WITH CARON
   r<  0159  LATIN SMALL LETTER R WITH CARON
   s<  0161  LATIN SMALL LETTER S WITH CARON
   z<  017e  LATIN SMALL LETTER Z WITH CARON

Character sets covering the whole

   videotex-suppl (iso 70)
   iso-ir-90 (iso 90)
   ANSI_X3.110-1983 (iso 99)
   ISO_8859-2:1987 (iso 101)
   T.61-8bit (iso 103)
   T.101-G2 (iso 128)
   CSN_369103 (iso 139)
   ISO_6937-2-add (iso 142)

3.39.
eo Esperanto

Required characters

   u(  016d  LATIN SMALL LETTER U WITH BREVE
   c>  0109  LATIN SMALL LETTER C WITH CIRCUMFLEX
   g>  011d  LATIN SMALL LETTER G WITH CIRCUMFLEX
   h>  0125  LATIN SMALL LETTER H WITH CIRCUMFLEX
   j>  0135  LATIN SMALL LETTER J WITH CIRCUMFLEX
   s>  015d  LATIN SMALL LETTER S WITH CIRCUMFLEX

Character sets covering the whole

   videotex-suppl (iso 70)
   iso-ir-90 (iso 90)
   ANSI_X3.110-1983 (iso 99)
   T.61-8bit (iso 103)
   ISO_8859-3:1988 (iso 109)
   T.101-G2 (iso 128)
   ISO_6937-2-add (iso 142)
   ISO_8859-supp (iso 154)
   JIS_X0212-1990 (iso 159)

4. Other languages with appropriate character sets

Other languages for which appropriate character sets are known are
listed in the table below.

   Language          Character set

   ar Arabic         ISO-8859-6
   be Byelorussian   ISO-8859-5
   bg Bulgarian      ISO-8859-5
   el Greek          ISO-8859-7
   en English        USASCII
   fa Persian        ISO-8859-6
   iw Hebrew         ISO-8859-8
   ja Japanese       ISO-IR-87 (Japanese JIS C6226-1983)
   ko Korean         ISO-IR-149 (Korean KS C 5601-1989)
   la Latin          USASCII
   lo Laotian        ISO-IR-166
   ru Russian        ISO-8859-5
   sw Swahili        USASCII
   th Thai           ISO-IR-166
   uk Ukrainian      ISO-8859-5
   ur Urdu           ISO-8859-6
   vo Volapuk        ISO-8859-1
   zh Chinese        ISO-IR-58 (Chinese GB 2312-80)

Additional entries in this table are welcome!

4.1.
ISO 10646 only languages

The following languages can (to the author's limited knowledge) be
written with the current ISO 10646 standard, but with no other
registered character sets:

   Language             Country(ies)                     Script(s)

   aa Afar              Somalia, Ethiopia, Djibouti      Latin
   ab Abkhazian         Georgia                          Cyrillic
   am Amharic           Ethiopia                         Ethiopic
   as Assamese          India, Nepal                     Bengali
   ay Aymara            Bolivia, Peru, Chile             Latin
   az Azerbaijani       SNC, Iran, Iraq, Turkey          Cyrillic, Arabic
   ba Bashkir           SNC                              Cyrillic
   bh Bihari            India                            Gujarati (or Kaithi)
   bi Bislama           Vanuatu, New Caledonia           Latin
   bn Bengali           India                            Bengali
   co Corsican          France                           Latin
   fj Fiji              Fiji                             Latin
   gd Scots             UK                               Latin
   gn Guarani           Paraguay                         Latin
   gu Gujarati          India                            Gujarati
   ha Hausa             Nigeria, Niger, Chad, Sudan,...  Latin
   hi Hindi             India                            Devanagari
   hy Armenian          Armenia                          Armenian
   ia Interlingua       None (Artificial Language)       Latin
   ie Interlingue       None (Artificial Language)       Latin
   ik Inupiak           USA, Canada                      Latin, Cree
   in Indonesian        Indonesia                        Latin
   ji Yiddish           Germany, USA, SNC, Israel        Hebrew
   jw Javanese          Indonesia, Malaysia              Latin, Javanese
   ka Georgian          Georgia                          Georgian
   kk Kazakh            SNC, Afghanistan                 Cyrillic, Arabic
   km Cambodian         Cambodia                         Khmer
   kn Kannada           India                            Kannada
   ks Kashmiri          India, Pakistan                  Arabic
   ku Kurdish           SNC, Turkey, Iraq, Iran          Cyrillic, Arabic
   ky Kirghiz           SNC, China, Afghanistan          Cyrillic, Arabic
   ln Lingala           CAR, Congo, Zaire                Latin
   mg Malagasy          Madagascar, Comoro Islands       Latin, Arabic
   mi Maori             New Zealand                      Latin
   mk Macedonian        Greece, Yugoslavia               Greek, Cyrillic
   ml Malayalam         India                            Malayalam
   mn Mongolian         Mongolia                         Cyrillic, Mongolian
   mo Moldavian         Romania                          Latin
   mr Marathi           India                            Devanagari
   ms Malay             Malaysia, Thailand               Latin
   my Burmese           Myanmar                          Burmese
   na Nauru             Nauru                            Latin
   ne Nepali            Nepal                            Devanagari
   oc Occitan           France                           Latin
   or Oriya             India                            Oriya
   pa Punjabi           India                            Gurmukhi
   ps Pashto (Western)  Afghanistan, Iran                Arabic
   qu Quechua           Peru                             Latin
   rm Rhaeto            Switzerland                      Latin
   rn Kirundi           Burundi, Uganda                  Latin
   rw Kinyarwanda
                        Rwanda, Uganda, Zaire            Latin
   sa Sanskrit          India                            Devanagari
   sd Sindhi            Pakistan, India, Afghanistan     Arabic, Gurmukhi
   sg Sangho            Central African Republic         Latin
   si Singhalese        Sri Lanka                        Sinhalese
   sm Samoan            Samoa, USA, New Zealand          Latin
   sn Shona             Zimbabwe, Zambia, Mozambique     Latin
   so Somali            Somalia, Ethiopia, Djibouti      Latin
   sr Serbian           former Yugoslavia                Cyrillic
   ss Siswati           S. Africa, Swaziland             Latin
   st Sesotho           S. Africa, Lesotho               Latin
   su Sudanese          Sudan                            Latin
   ta Tamil             India, Malaysia                  Tamil
   te Telugu            India                            Telugu
   tg Tajik             Tajikistan                       Arabic
   ti Tigrinya          Ethiopia                         Latin, Ethiopic
   tk Turkmen           SNC, Iran, Afghanistan           Cyrillic, Arabic
   tl Tagalog           Philippines                      Latin
   tn Setswana          S. Africa, Botswana, Namibia     Latin
   to Tonga (3)         Mozambique                       Latin
   ts Tsonga            Mozambique, Swaziland            Latin
   tt Tatar             SNC                              Cyrillic
   tw Twi (Ewe)         Ghana                            Latin
   uz Uzbek (Southern)  Afghanistan, Turkey              Arabic
   vi Vietnamese        Vietnam, Cambodia, China         Latin
   wo Wolof             Senegal, Mauritania              Latin
   xh Xhosa             S. Africa                        Latin
   yo Yoruba            Nigeria, Togo, Benin             Latin
   zu Zulu              S. Africa, Lesotho, Malawi       Latin

The information about languages in ISO 10646 was kindly supplied by
Glenn Adams.

Languages for which the author does NOT know any proper character set
include:

   bo Tibetan
   dz Bhutani
   et Estonian
   lt Lithuanian
   lv Latvian, Lettish
   mt Maltese
   sh Serbo-Croatian

5. Encoded format of charset data

This section contains, in a very compact format, all the information
used to make the technical content of this RFC, apart from the
content of ISO 639 and RFC 1345. It would be helpful if new
information were also supplied in this format.

# A list of languages and their required/optional characters.
# Format:
# &language Name
# Required characters
# Important characters
# Comments
&language Lithuanian a; e; i; u; e.
u- c< s< z<
&language Latvian a- e- i- o- u- g, k, l, n, r, c< s< z<
&language Estonian o? a: o: u: s< z<
&language Finnish a: o:
&language Sami a' e' a> a: e: i: o: u: ae aa o/ d/ n' ng t/ c< s< z<
&language Swedish a: o: aa a' e' e: u:
&language Norwegian ae aa o/ e' o' o>
&language Danish ae aa o/ a' e' i' o' u' y'
&language Faeroese a' i' o' u' y' ae o/ d-
&language Icelandic a' e' i' o' u' y' o: ae d- th
&language Greenlandic a' e' i' u' a> e> i> o> u> ae aa o/ a? i? u? kk
&language Gaelic a' e' o' a! e! i! o! u!
&language Irish a' e' i' o' u'
&language Welsh w' y' a' e' i' o' u' a! e! i! o! u! w! y! a> e> i> o> u> w> y> a: e: i: o: u: w: y:
&language Breton e> u! u: n?
&language Frisian e' u' a> e> o> u> a: e: i: o: u:
&language Dutch a' e' i' o' u' a: e: i: o: u: ij
&language Afrikaans a' e' e! a> e> i> o> u> e: i: o: 'n
&language German a: o: u: ss e' a!
The "ss" character exists only in lower case; the upper case equivalent is "SS" (2 letters).
&language French e' e! u! c, a! a> e> i> o> u> ae oe e: i: u: y:
&language Catalan e' i' o' u' a! e! o! i: u: l. n?
&language Spanish n? c, !I ?I a' e' i' o' u' u: n?
Note that this language also uses special punctuation marks. The c, appears in ISO 646-ES, but not in van Wingen's tables.
&language Galician a' e' i' o' u' u: n?
&language Portuguese a? o? c, a' e' i' o' u' a! a> e> o> u:
&language Basque n? c,
&language Maltese a! e! i! o! u! i> c. g. h/ z.
&language Italian e' o' a! e! i! o! i' u' u! i:
The accented characters appear only in the lower case variant in the Italian version of ISO 646 (ISO-IR-15).
&language Rhaetian e' a! e! o! a> e> i> o> o: u:
&language Romanian a> i> a( s, t,
&language Hungarian a' e' i' o' u' o: u: o" u"
&language Albanian e: c,
&language Turkish a> i> u> o: u: i.
c, s, g(
&language Croatian c' d/ c< s< z<
&language Slovenian c< s< z<
&language Slovak y' a' e' i' o' u' a: o> l' r' c< d< l< n< s< t< z<
&language Czech y' a' e' i' o' u' e< u0 c< d< n< r< s< t< z<
&language Polish o' a; e; c' n' s' z' l/ z.
&language Sorbian o' e< c' n' s' z' l/ c< r< s< z<
&language Esperanto u( c> g> h> j> s>

6. REFERENCES

[ISO 8859]   Information technology - 8-bit single-byte coded
             graphic character sets

[ISO 6937]   Information processing - Coded graphic character set
             for text communication

[ISO 639]    Codes for identifying languages (1988 version)

[ISO 10646]  Information technology - Universal Multiple-Octet Coded
             Character Set

[RFC-KELD]   Keld Simonsen: Character Mnemonics & Character Sets,
             RFC 1345, June 1992

Table of Contents

Abstract
Status of this Memo
1 Introduction
2 Introduction to language tables
2.1 Table structure
2.2 Sources utilized
2.3 What accents mean
3 Language tables
3.1 lt Lithuanian
3.2 lv Latvian
3.3 et Estonian
3.4 fi Finnish
3.5 ?? Sami
3.6 sv Swedish
3.7 no Norwegian
3.8 da Danish
3.9 fo Faeroese
3.10 is Icelandic
3.11 kl Greenlandic
3.12 ?? Gaelic
3.13 ga Irish
3.14 cy Welsh
3.15 br Breton
3.16 fy Frisian
3.17 nl Dutch
3.18 af Afrikaans
3.19 de German
3.20 fr French
3.21 ca Catalan
3.22 es Spanish
3.23 gl Galician
3.24 pt Portuguese
3.25 eu Basque
3.26 mt Maltese
3.27 it Italian
3.28 ?? Rhaetian
3.29 ro Romanian
3.30 hu Hungarian
3.31 sq Albanian
3.32 tr Turkish
3.33 hr Croatian
3.34 sl Slovenian
3.35 sk Slovak
3.36 cs Czech
3.37 pl Polish
3.38 ??
Sorbian
3.39 eo Esperanto
4 Other languages with appropriate character sets
4.1 ISO 10646 only languages
5 Encoded format of charset data
6 REFERENCES
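
The record layout described in section 5 lends itself to mechanical
processing. What follows is a minimal sketch in Python, not part of
the original data, of how such records might be read and checked
against a character set. It rests on several assumptions: that each
"&language" header is followed by one line of required mnemonics,
then one line of important mnemonics (blank if there are none), then
free-form comment lines; that Python's "iso8859_2" codec is an
adequate stand-in for ISO_8859-2:1987; and that a small table of code
points copied from the Romanian entry in section 3.29 is enough for
the demonstration. The names LanguageEntry, parse_charset_data,
covered_by and CODEPOINTS are invented for illustration, as is the
"Examplish" record.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LanguageEntry:
    """One '&language' record (hypothetical container, not from the draft)."""
    name: str
    required: list[str] = field(default_factory=list)
    important: list[str] = field(default_factory=list)
    comments: list[str] = field(default_factory=list)


def parse_charset_data(text: str) -> list[LanguageEntry]:
    """Parse records, ASSUMING one line each for required and important."""
    entries: list[LanguageEntry] = []
    current: LanguageEntry | None = None
    body_line = 0  # body lines consumed in the current record
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # file-level comment such as the format description
        if line.startswith("&language"):
            current = LanguageEntry(name=line[len("&language"):].strip())
            entries.append(current)
            body_line = 0
            continue
        if current is None or (body_line == 0 and not line):
            continue  # nothing to attach to yet, or blank before the data
        body_line += 1
        if body_line == 1:
            current.required = line.split()
        elif body_line == 2:
            current.important = line.split()  # a blank line means "none"
        elif line:
            current.comments.append(line)
    return entries


# Code points copied from the Romanian table in section 3.29.
CODEPOINTS = {"a>": 0x00E2, "i>": 0x00EE, "a(": 0x0103,
              "s,": 0x015F, "t,": 0x0163}


def covered_by(entry: LanguageEntry, codec: str) -> bool:
    """True if every required character can be encoded with a Python codec.

    Raises KeyError for mnemonics missing from the small CODEPOINTS table.
    """
    text = "".join(chr(CODEPOINTS[m]) for m in entry.required)
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


SAMPLE = """\
# Format as described in section 5
&language Romanian
a> i> a( s, t,

&language Examplish
a> i>
s,
A made-up record that shows an important character and a comment.
"""

if __name__ == "__main__":
    for entry in parse_charset_data(SAMPLE):
        print(entry.name, entry.required, entry.important,
              covered_by(entry, "iso8859_2"))
```

The positional rule (first body line required, second important) is
the simplest reading of the format comment; data laid out differently
would need a different rule. A fuller mnemonic table, such as the one
in RFC 1345, would be needed to check languages other than the sample.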