International Components for Unicode

Developer(s)	IBM and many other companies.
Initial release	1999

Stable release	58.1 / 21 October 2016
Written in	C/C++ and Java
Operating system	Cross-platform
Type	libraries for Unicode and internationalization
License	Unicode License
Website	www.icu-project.org

International Components for Unicode (ICU) is an open source project of mature C/C++ and Java libraries for Unicode support, software internationalization, and software globalization. ICU is widely portable to many operating systems and environments. It gives applications the same results on all platforms and between C, C++, and Java software. The ICU project is sponsored, supported, and used by IBM and many other companies.^[1]

ICU provides the following services: Unicode text handling, full character properties, and character set conversions; Unicode regular expressions; full Unicode sets; character, word, and line boundaries; language-sensitive collation and searching; normalization, uppercase and lowercase conversion, and script transliterations; comprehensive locale data and resource bundle architecture via the Common Locale Data Repository (CLDR); complex text layout for Arabic, Hebrew, Indic, and Thai; multi-calendar and time zones; and rule-based formatting and parsing of dates, times, numbers, currencies, and messages.
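As a rough illustration of three of the services listed above (word boundaries, language-sensitive collation, and normalization), the following sketch uses the JDK's java.text classes rather than ICU itself, so that it runs without extra dependencies. ICU4J offers similarly named classes in its com.ibm.icu.text package, but they are not called here. The class name TextServicesSketch and its helper methods are hypothetical names introduced only for this example.

```java
import java.text.BreakIterator;
import java.text.Collator;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class; uses only JDK classes, not ICU4J.
final class TextServicesSketch {

    // Word boundaries: split text into words with a locale-aware BreakIterator.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // Keep only segments containing a letter or digit (drops spaces and punctuation).
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(piece);
            }
        }
        return out;
    }

    // Collation: sort a copy of the input using the locale's collation rules
    // instead of raw code point order.
    static String[] sorted(Locale locale, String... items) {
        String[] copy = items.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    // Normalization: convert to Normalization Form C (canonical composition).
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world!", Locale.ENGLISH));
        // prints [Hello, world]
        System.out.println(Arrays.toString(sorted(Locale.ENGLISH, "cherry", "Banana", "apple")));
        // prints [apple, Banana, cherry]
        System.out.println(nfc("e\u0301").length());
        // prints 1 ("e" plus a combining acute accent composes to a single code point)
    }
}
```

Plain code point ordering would place "Banana" before "apple" because uppercase letters sort first; the collator orders the strings the way an English reader would expect.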

ICU provides more extensive internationalization facilities than the standard libraries for C and C++.

Origin and development

ICU is descended from C++ frameworks produced by Taligent in the mid-1990s. After Taligent became part of IBM in early 1996, Sun Microsystems decided that the new Java language "was missing international support. Taligent had great international technology, talented engineers, and a location about 100 meters from Sun's JavaSoft division in Cupertino, California. IBM arranged for Taligent's Text and International group to contribute international classes to Sun's Java Development Kit."^[2] Some of the code for ICU's text processing, date formatting, and other features was rewritten in Java and became the JDK 1.1 internationalization APIs. A large portion of this code still exists in the java.text and java.util packages. Further internationalization features were added with each later release of Java.
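The java.text classes mentioned above can be exercised directly. The following is a minimal sketch of locale-sensitive number and message formatting using the JDK alone; ICU4J's own formatters are not used, and the class name FormattingSketch and its methods are hypothetical names chosen for illustration.

```java
import java.text.MessageFormat;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical helper class; uses only the JDK's java.text and java.util packages.
final class FormattingSketch {

    // Format a number with the grouping and decimal separators of the given locale.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    // Parse a localized number string back into a double.
    static double parseNumber(String text, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number in locale " + locale + ": " + text, e);
        }
    }

    // Fill a message pattern; {1,number,integer} formats the count per the locale.
    static String fileMessage(String name, int count, Locale locale) {
        MessageFormat mf = new MessageFormat("{0} contains {1,number,integer} files.", locale);
        return mf.format(new Object[] {name, count});
    }

    public static void main(String[] args) {
        System.out.println(formatNumber(1234567.891, Locale.US));      // 1,234,567.891
        System.out.println(formatNumber(1234567.891, Locale.GERMANY)); // 1.234.567,891
        System.out.println(parseNumber("1.234.567,891", Locale.GERMANY));
        System.out.println(fileMessage("Archive", 1234, Locale.US));   // Archive contains 1,234 files.
    }
}
```

The same value is rendered with different separators for each locale, and parsing with the matching locale recovers the original number.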

IBM programmers rewrote the Java internationalization classes in C++ and ported some classes to C functions. The C++/C version of ICU is known as ICU4C. The ICU project also provides ICU4J ("ICU for Java"), which adds features not present in the standard Java libraries. ICU4C and ICU4J are very similar, though not identical; for example, ICU4C includes a Regular Expression API, while ICU4J does not. Both frameworks have been enhanced over time to support new facilities and new features of Unicode and the Common Locale Data Repository (CLDR).

ICU was released as an open source project in 1999 under the name IBM Classes for Unicode. It was later renamed International Components for Unicode.^[3] In May 2016, the ICU project joined the Unicode Consortium as technical committee ICU-TC, and the library sources are now distributed under the Unicode License.^[4]

References

[1] "What is ICU?". ICU homepage.
[2] Laura Werner (1999). "Getting Java ready for the world: A brief history of IBM and Sun's internationalization efforts".
[3] "ICU Project Management Committee".
[4] "ICU joins the Unicode Consortium". Unicode, Inc. 2016-05-16. Retrieved 2016-08-01.

External links

ICU website

This article is issued from Wikipedia - version of the 11/6/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.