Western Latin character sets (computing)

Several binary representations of character sets for common Western European languages are compared in this article. These encodings were designed to represent Italian, Spanish, Portuguese, French, German, Dutch, English, Danish, Swedish, Norwegian, and Icelandic, which use the Latin alphabet, a few additional letters and letters with precomposed diacritics, some punctuation, and various symbols (including some Greek letters). Although they are called "Western European", many of these languages are spoken all over the world. These character sets also happen to support many other languages, such as Malay, Swahili, and Classical Latin.

Summary

The ISO-8859 series of 8-bit character sets encodes all Latin character sets used in Europe, but the same code points have different meanings in different parts of the series, which caused some difficulty. The arrival of Unicode, with a unique code point for every character, resolved these issues.

ISO/IEC 8859-1 or Latin-1 is the most used and also defines the first 256 codes in Unicode.
ISO/IEC 8859-15 modifies ISO-8859-1 to fully support Estonian, Finnish and French and add the euro sign.
Windows-1252 is a superset of ISO-8859-1 that includes the characters from ISO-8859-15 and popular punctuation such as curved quotation marks. Web page tools for Windows commonly use Windows-1252 but label the page as ISO-8859-1; HTML5 addresses this by mandating that pages labeled as ISO-8859-1 be interpreted as Windows-1252.
IBM CP437, being intended for English only, has very little in the way of accented letters but has far more graphics characters than the others and also some Greek characters that are useful as technical symbols.
IBM CP850 has all the printable characters that ISO-8859-1 has (albeit arranged differently) and still manages to have enough graphics characters to build a usable text-mode user interface.
IBM CP858 differs from CP850 in only one character: the dotless i (ı), rarely used outside Turkey, was replaced by the euro currency sign (€).^[1]
IBM CP859 contains all the printable characters that ISO-8859-15 has, so unlike CP850 it supports the euro sign (€) and French.
IBM code pages 037, 500, and 1047 are EBCDIC encodings that include all of the ISO-8859-1 characters.
The Mac OS Roman character set (often referred to as MacRoman and known by the IANA as simply MACINTOSH) has most, but not all, of the same characters as ISO-8859-1 but in a very different arrangement; and it also adds many technical and mathematical characters (though it lacks the important ×) and more diacritics. Older Macintosh web browsers were known to munge the few characters that were in ISO-8859-1 but not their native Macintosh character set when editing text from Web sites. Conversely, in Web material prepared on an older Macintosh, many characters were displayed incorrectly when read by other operating systems.
The euro sign post-dates these (ISO-8859) specifications: conflicting ways to retrofit it led to significant difficulty until Unicode became more generally adopted.
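
As a rough illustration of the points above, the following Python sketch decodes the same bytes under several of these encodings. The codec names (`latin-1`, `iso8859-15`, `cp1252`, `cp850`) are the aliases used by Python's standard library rather than the IANA names, and the helper names `decode_all` and `can_encode` are purely illustrative.

```python
# Codec names are Python standard-library aliases (an assumption of this sketch).
CODECS = ["latin-1", "iso8859-15", "cp1252", "cp850"]


def decode_all(raw: bytes) -> dict:
    """Decode the same bytes under each codec; None marks an undefined byte."""
    result = {}
    for name in CODECS:
        try:
            result[name] = raw.decode(name)
        except UnicodeDecodeError:
            result[name] = None
    return result


def can_encode(char: str, codec: str) -> bool:
    """Report whether a single character exists in the given codec."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Byte A4 is the generic currency sign in ISO-8859-1 but the euro sign in ISO-8859-15.
currency = decode_all(b"\xa4")

# Byte 93 is a C1 control code in ISO-8859-1 but a curved quotation mark in Windows-1252.
quote = decode_all(b"\x93")

# ISO-8859-1 bytes 00..FF correspond one-to-one to U+0000..U+00FF.
latin1_is_unicode_prefix = (
    bytes(range(256)).decode("latin-1") == "".join(map(chr, range(256)))
)

# Every printable ISO-8859-1 character (A0..FF) should also be encodable in CP850.
cp850_covers_latin1 = all(can_encode(chr(cp), "cp850") for cp in range(0xA0, 0x100))

if __name__ == "__main__":
    print(currency)
    print(quote)
    print(latin1_is_unicode_prefix, cp850_covers_latin1)
```

A page labeled ISO-8859-1 but actually written in Windows-1252 shows the second case in practice: decoding byte 93 as `latin-1` yields an invisible control code, while `cp1252` yields the intended quotation mark.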

Notes

The mappings for the IBM code pages are taken from the tables that Microsoft supplied to the Unicode site. Refer to the Unicode Consortium's document on the differences between IBM's and Microsoft's mappings for these code pages.
The old PC code pages actually defined printable characters for the control code ranges. While these could not be used when printing text through DOS, as they would be trapped before reaching the screen, they could be used by applications that used screen memory directly.
Position F0 (hexadecimal) was used in the Macintosh character sets for the Apple logo. The Apple logo was not accepted into Unicode due to its trademarked nature, and so Apple mapped it to a code point (U+F8FF) in the private use area. Therefore, it may not display correctly in the table.
In Windows-1252, positions 81, 8D, 8F, 90, and 9D are unused according to the WINDOWS mapping tables on the Unicode site. However, the Windows API MultiByteToWideChar maps these to the corresponding C1 control codes; so does the "best fit" mapping which describes "windows code page behavior".
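
The unused positions can be checked with a short Python sketch. It assumes Python's standard `cp1252` codec, which follows the mapping table on the Unicode site and therefore rejects these bytes instead of mapping them to C1 control codes as the Windows API does. The function name `undefined_bytes` is hypothetical.

```python
def undefined_bytes(codec: str) -> list:
    """Return the byte values that the given codec refuses to decode."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode(codec)
        except UnicodeDecodeError:
            missing.append(value)
    return missing


# Positions left undefined by Python's cp1252 codec, as two-digit hex strings.
holes = undefined_bytes("cp1252")

if __name__ == "__main__":
    print([f"{b:02X}" for b in holes])
```

Software that needs the Windows API behaviour would have to map these five bytes to U+0081, U+008D, U+008F, U+0090, and U+009D itself.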

History

The earlier seven-bit U.S. ASCII encoding has characters sufficient to properly represent only English, Latin, and Swahili. It is missing some letters and letter-diacritic combinations used in other Latin-alphabet languages. However, since there was no other choice on most U.S.-supplied computer platforms, ASCII was unavoidable in most of the non-English-speaking world (seven-bit encoding was necessitated by the limitations of early computing networks). There was the ISO 646 group of encodings which replaced some of the symbols in ASCII with local characters, but space was very limited, and some of the symbols replaced were quite common in things like programming languages.

Although seven-bit communication was the norm, most computers internally used eight-bit bytes, and most of them assigned characters of some kind to the 128 upper byte values. In the early days most of these assignments were system specific, but gradually a few standards were settled on.

In recent years, as storage and memory costs have fallen, the problems caused by a given eight-bit code having multiple meanings (there are seven ISO-Latin code sets alone) have ceased to be justified. All major operating systems have moved to Unicode as their main internal representation. However, Windows does not support Unicode through its 8-bit character interfaces (for instance, it does not accept UTF-8 in standard interfaces such as fopen), so many applications continue to be restricted to these legacy character sets.

The euro sign

The introduction of the euro created significant pressure to support its sign (€), and most 8-bit character sets had to be adapted in some way.

Apple with MacRoman and Sun Microsystems with Solaris OS simply replaced the generic currency sign (¤) with the euro sign. This caused significant difficulty because organisations had found other uses for the currency sign position, such as the company logo.
ISO introduced a further variant of ISO 8859, ISO 8859-15, which replaced the generic currency sign with the euro sign as well as making some other replacements of symbols with letters with diacritics. ISO 8859-15 never received widespread adoption.
Windows-1252 placed the euro sign in a gap (position 80 hexadecimal) in the existing C1 control codes.

All of these issues have been resolved as operating systems have been upgraded to support Unicode as standard, which encodes the euro sign at U+20AC (decimal 8364).
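
The different retrofits can be compared with a small Python sketch. The codec names are Python standard-library aliases (an assumption of this example, not the IANA names), and `euro_bytes` is a hypothetical helper.

```python
EURO = "\u20ac"  # euro sign, U+20AC

# Python codec aliases (assumed here) for the encodings discussed above.
CODECS = ["latin-1", "iso8859-15", "cp1252", "cp850", "cp858", "mac_roman", "utf-8"]


def euro_bytes() -> dict:
    """Map each codec to the hex bytes of the euro sign, or None if absent."""
    out = {}
    for name in CODECS:
        try:
            out[name] = EURO.encode(name).hex().upper()
        except UnicodeEncodeError:
            out[name] = None
    return out


if __name__ == "__main__":
    print(ord(EURO))  # decimal value of the code point
    for codec, encoded in euro_bytes().items():
        print(f"{codec:12} {encoded}")
```

The same character ends up at a different byte value in each legacy set that has it at all, which is why text exchanged between these systems often showed the wrong symbol.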

Comparison table

Code points U+0000 to U+007F are not shown in this table, as they are mapped directly in all character sets listed here. The ASCII standard is the original specification for these first 128 characters (0 to 127).

The table is arranged by Unicode code point. Character sets are referred to here by their IANA names in upper case.

Character	Code point	ISO-8859-1	ISO-8859-15	WINDOWS-1252	IBM437	IBM850	MACINTOSH
NBSP	U+00A0	A0	A0	A0	FF	FF	CA
¡	U+00A1	A1	A1	A1	AD	AD	C1
¢	U+00A2	A2	A2	A2	9B	BD	A2
£	U+00A3	A3	A3	A3	9C	9C	A3
¤	U+00A4	A4		A4		CF
¥	U+00A5	A5	A5	A5	9D	BE	B4
¦	U+00A6	A6		A6		DD
§	U+00A7	A7	A7	A7		F5	A4
¨	U+00A8	A8		A8		F9	AC
©	U+00A9	A9	A9	A9		B8	A9
ª	U+00AA	AA	AA	AA	A6	A6	BB
«	U+00AB	AB	AB	AB	AE	AE	C7
¬	U+00AC	AC	AC	AC	AA	AA	C2
SHY	U+00AD	AD	AD	AD		F0
®	U+00AE	AE	AE	AE		A9	A8
¯	U+00AF	AF	AF	AF		EE	F8
°	U+00B0	B0	B0	B0	F8	F8	A1
±	U+00B1	B1	B1	B1	F1	F1	B1
²	U+00B2	B2	B2	B2	FD	FD
³	U+00B3	B3	B3	B3		FC
´	U+00B4	B4		B4		EF	AB
µ	U+00B5	B5	B5	B5	E6	E6	B5
¶	U+00B6	B6	B6	B6		F4	A6
·	U+00B7	B7	B7	B7	FA	FA	E1
¸	U+00B8	B8		B8		F7	FC
¹	U+00B9	B9	B9	B9		FB
º	U+00BA	BA	BA	BA	A7	A7	BC
»	U+00BB	BB	BB	BB	AF	AF	C8
¼	U+00BC	BC		BC	AC	AC
½	U+00BD	BD		BD	AB	AB
¾	U+00BE	BE		BE		F3
¿	U+00BF	BF	BF	BF	A8	A8	C0
À	U+00C0	C0	C0	C0		B7	CB
Á	U+00C1	C1	C1	C1		B5	E7
Â	U+00C2	C2	C2	C2		B6	E5
Ã	U+00C3	C3	C3	C3		C7	CC
Ä	U+00C4	C4	C4	C4	8E	8E	80
Å	U+00C5	C5	C5	C5	8F	8F	81
Æ	U+00C6	C6	C6	C6	92	92	AE
Ç	U+00C7	C7	C7	C7	80	80	82
È	U+00C8	C8	C8	C8		D4	E9
É	U+00C9	C9	C9	C9	90	90	83
Ê	U+00CA	CA	CA	CA		D2	E6
Ë	U+00CB	CB	CB	CB		D3	E8
Ì	U+00CC	CC	CC	CC		DE	ED
Í	U+00CD	CD	CD	CD		D6	EA
Î	U+00CE	CE	CE	CE		D7	EB
Ï	U+00CF	CF	CF	CF		D8	EC
Ð	U+00D0	D0	D0	D0		D1
Ñ	U+00D1	D1	D1	D1	A5	A5	84
Ò	U+00D2	D2	D2	D2		E3	F1
Ó	U+00D3	D3	D3	D3		E0	EE
Ô	U+00D4	D4	D4	D4		E2	EF
Õ	U+00D5	D5	D5	D5		E5	CD
Ö	U+00D6	D6	D6	D6	99	99	85
×	U+00D7	D7	D7	D7		9E
Ø	U+00D8	D8	D8	D8		9D	AF
Ù	U+00D9	D9	D9	D9		EB	F4
Ú	U+00DA	DA	DA	DA		E9	F2
Û	U+00DB	DB	DB	DB		EA	F3
Ü	U+00DC	DC	DC	DC	9A	9A	86
Ý	U+00DD	DD	DD	DD		ED
Þ	U+00DE	DE	DE	DE		E8
ß	U+00DF	DF	DF	DF	E1	E1	A7
à	U+00E0	E0	E0	E0	85	85	88
á	U+00E1	E1	E1	E1	A0	A0	87
â	U+00E2	E2	E2	E2	83	83	89
ã	U+00E3	E3	E3	E3		C6	8B
ä	U+00E4	E4	E4	E4	84	84	8A
å	U+00E5	E5	E5	E5	86	86	8C
æ	U+00E6	E6	E6	E6	91	91	BE
ç	U+00E7	E7	E7	E7	87	87	8D
è	U+00E8	E8	E8	E8	8A	8A	8F
é	U+00E9	E9	E9	E9	82	82	8E
ê	U+00EA	EA	EA	EA	88	88	90
ë	U+00EB	EB	EB	EB	89	89	91
ì	U+00EC	EC	EC	EC	8D	8D	93
í	U+00ED	ED	ED	ED	A1	A1	92
î	U+00EE	EE	EE	EE	8C	8C	94
ï	U+00EF	EF	EF	EF	8B	8B	95
ð	U+00F0	F0	F0	F0		D0
ñ	U+00F1	F1	F1	F1	A4	A4	96
ò	U+00F2	F2	F2	F2	95	95	98
ó	U+00F3	F3	F3	F3	A2	A2	97
ô	U+00F4	F4	F4	F4	93	93	99
õ	U+00F5	F5	F5	F5		E4	9B
ö	U+00F6	F6	F6	F6	94	94	9A
÷	U+00F7	F7	F7	F7	F6	F6	D6
ø	U+00F8	F8	F8	F8		9B	BF
ù	U+00F9	F9	F9	F9	97	97	9D
ú	U+00FA	FA	FA	FA	A3	A3	9C
û	U+00FB	FB	FB	FB	96	96	9E
ü	U+00FC	FC	FC	FC	81	81	9F
ý	U+00FD	FD	FD	FD		EC
þ	U+00FE	FE	FE	FE		E7
ÿ	U+00FF	FF	FF	FF	98	98	D8
ı	U+0131					D5	F5
Œ	U+0152		BC	8C			CE
œ	U+0153		BD	9C			CF
Š	U+0160		A6	8A
š	U+0161		A8	9A
Ÿ	U+0178		BE	9F			D9
Ž	U+017D		B4	8E
ž	U+017E		B8	9E
ƒ	U+0192			83	9F	9F	C4
ˆ	U+02C6			88			F6
ˇ	U+02C7						FF
˘	U+02D8						F9
˙	U+02D9						FA
˚	U+02DA						FB
˛	U+02DB						FE
˜	U+02DC			98			F7
˝	U+02DD						FD
Γ	U+0393				E2
Θ	U+0398				E9
Σ	U+03A3				E4
Φ	U+03A6				E8
Ω	U+03A9				EA		BD
α	U+03B1				E0
δ	U+03B4				EB
ε	U+03B5				EE
π	U+03C0				E3		B9
σ	U+03C3				E5
τ	U+03C4				E7
φ	U+03C6				ED
–	U+2013			96			D0
—	U+2014			97			D1
‗	U+2017					F2
‘	U+2018			91			D4
’	U+2019			92			D5
‚	U+201A			82			E2
“	U+201C			93			D2
”	U+201D			94			D3
„	U+201E			84			E3
†	U+2020			86			A0
‡	U+2021			87			E0
•	U+2022			95			A5
…	U+2026			85			C9
‰	U+2030			89			E4
‹	U+2039			8B			DC
›	U+203A			9B			DD
⁄	U+2044						DA
ⁿ	U+207F				FC
₧	U+20A7				9E
€	U+20AC		A4	80		(D5)^[nb 1]^[2]^[3]	DB
™	U+2122			99			AA
∂	U+2202						B6
∆	U+2206						C6
∏	U+220F						B8
∑	U+2211						B7
∙	U+2219				F9
√	U+221A				FB		C3
∞	U+221E				EC		B0
∩	U+2229				EF
∫	U+222B						BA
≈	U+2248				F7		C5
≠	U+2260						AD
≡	U+2261				F0
≤	U+2264				F3		B2
≥	U+2265				F2		B3
⌐	U+2310				A9
⌠	U+2320				F4
⌡	U+2321				F5
─	U+2500				C4	C4
│	U+2502				B3	B3
┌	U+250C				DA	DA
┐	U+2510				BF	BF
└	U+2514				C0	C0
┘	U+2518				D9	D9
├	U+251C				C3	C3
┤	U+2524				B4	B4
┬	U+252C				C2	C2
┴	U+2534				C1	C1
┼	U+253C				C5	C5
═	U+2550				CD	CD
║	U+2551				BA	BA
╒	U+2552				D5
╓	U+2553				D6
╔	U+2554				C9	C9
╕	U+2555				B8
╖	U+2556				B7
╗	U+2557				BB	BB
╘	U+2558				D4
╙	U+2559				D3
╚	U+255A				C8	C8
╛	U+255B				BE
╜	U+255C				BD
╝	U+255D				BC	BC
╞	U+255E				C6
╟	U+255F				C7
╠	U+2560				CC	CC
╡	U+2561				B5
╢	U+2562				B6
╣	U+2563				B9	B9
╤	U+2564				D1
╥	U+2565				D2
╦	U+2566				CB	CB
╧	U+2567				CF
╨	U+2568				D0
╩	U+2569				CA	CA
╪	U+256A				D8
╫	U+256B				D7
╬	U+256C				CE	CE
▀	U+2580				DF	DF
▄	U+2584				DC	DC
█	U+2588				DB	DB
▌	U+258C				DD
▐	U+2590				DE
░	U+2591				B0	B0
▒	U+2592				B1	B1
▓	U+2593				B2	B2
■	U+25A0				FE	FE
◊	U+25CA						D7
ﬁ	U+FB01						DE
ﬂ	U+FB02						DF

In addition, MACINTOSH assigns the Apple logo (Mac OS Roman position F0) to U+F8FF in the Private Use Area.
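
Individual rows of the comparison table can be regenerated with a sketch like the following. It assumes that Python's standard codecs (`latin-1`, `iso8859-15`, `cp1252`, `cp437`, `cp850`, `mac_roman`) correspond to the six columns, and `table_row` is a hypothetical helper. Unlike the table, it prints the character itself rather than a name such as NBSP or SHY.

```python
# Column name (IANA, as in the table) -> Python codec alias (assumed equivalent).
COLUMNS = {
    "ISO-8859-1": "latin-1",
    "ISO-8859-15": "iso8859-15",
    "WINDOWS-1252": "cp1252",
    "IBM437": "cp437",
    "IBM850": "cp850",
    "MACINTOSH": "mac_roman",
}


def table_row(char: str) -> str:
    """Build one tab-separated row: character, code point, then one hex cell per column."""
    cells = [char, f"U+{ord(char):04X}"]
    for codec in COLUMNS.values():
        try:
            cells.append(char.encode(codec).hex().upper())
        except UnicodeEncodeError:
            cells.append("")  # character is absent from this character set
    return "\t".join(cells)


if __name__ == "__main__":
    print("\t".join(["Character", "Code point", *COLUMNS]))
    for ch in "éß╬":
        print(table_row(ch))
```

Because the mapping tables behind these codecs may differ in detail from the ones used for this article, any disagreement with the table is worth checking against the published mappings rather than assuming either side is right.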

Notes

[nb 1] IBM's PC DOS 2000, released in 1998, changed its definition of code page 850 to what IBM called modified code page 850, which includes the euro sign at code point 213 (D5 hexadecimal), instead of adding support for the new code page 858. The reason might have been existing restrictions in the codepage switching logic of MS-DOS/PC DOS, which limited .CPI files to 64 KB in size, or about six codepages at most; this limitation was circumvented in some OEM versions of MS-DOS and in Windows NT, and does not exist in DR-DOS. Further, the parser in MS-DOS/PC DOS limits the number of country/codepage entries in COUNTRY.SYS files to a maximum of 146 or 438, a limitation that does not exist in DR-DOS. Adding support for codepage 858 might therefore have meant dropping another codepage (e.g. codepage 850) at the same time, which might not have been viable then, given that some applications were hard-wired to use codepage 850.

References

[1] "00858". Code pages by CPGID. IBM. Archived from the original on 2016-06-06. Retrieved 2016-06-06.
[2] Paul, Matthias (2001-08-15). "Changing codepages in FreeDOS" (Technical design specification based on fd-dev post). Archived from the original on 2016-06-06. Retrieved 2016-06-06. The new official ID for the Multilingual "codepage 850 with EURO SIGN" is 858, not 850. IBM will switch to use 858 instead of their 850 variant with future issues of their products. […] I can only guess why they didn't add 858 to their EGAx.CPI, COUNTRY.SYS, and KEYBOARD.SYS files in PC DOS 2000. Many third-party applications are designed to work with 850 and didn't know about 858 at the time PC DOS 2000 was released, so it's easier for everyone, but unfortunately it's not compatible. […] As explained above, COUNTRY.SYS and KEYBOARD.SYS contain only two codepage entries for a given country in Western issues of DOS. (In Arabic and Hebrew issues there can be up to 8 codepages for one country, in theory there is no limit below the range of allowed codepages 1..65534). […] The problem is that removing support for 850 might have caused compatibility problems with applications which are hard-wired to use 850. Adding 858 as a third choice to all the files would have increased the file and table sizes significantly. The COUNTRY.SYS file parser in MS-DOS/PC DOS IO.SYS/IBMBIO.COM sets aside a 6 Kb (for DOS 6) scratchpad to load all the info. This allows a maximum of 438 entries in a COUNTRY.SYS file to be accepted, otherwise you will get the message "COUNTRY.SYS too large.". The NLSFUNC parser does not have this limitation, and the file parsers in DR-DOS (kernel and NLSFUNC) also do not know of such a restriction. Older issues of MS-DOS/PC DOS even had a 2 Kb buffer for a maximum of 146 entries.
[3] Paul, Matthias (2001-08-27). "Changing codepages in FreeDOS (follow-up)". Retrieved 2013-05-08. […] one could also create custom .CPI files in the traditional FONT style without difficulties, but you could only store up to […] six codepages in such a file if it should be useable by MS-DOS/PC DOS (some OEM issues and NT can handle files larger than 64 Kb, but MS-DOS/PC DOS can not).

This article is issued from Wikipedia - version of the 11/29/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.