ISO 639-3

ISO 639-3:2007, Codes for the representation of names of languages – Part 3: Alpha-3 code for comprehensive coverage of languages, is an international standard for language codes in the ISO 639 series. It defines three‐letter codes for identifying languages. The standard was published by ISO on 1 February 2007.^[1]

ISO 639-3 extends the ISO 639-2 alpha-3 codes with an aim to cover all known natural languages. The extended language coverage was based primarily on the language codes used in the Ethnologue (volumes 10-14) published by SIL International, which is now the registration authority for ISO 639-3.^[2] It provides an enumeration of languages as complete as possible, including living and extinct, ancient and constructed, major and minor, written and unwritten.^[1] However, it does not include reconstructed languages such as Proto-Indo-European.^[3]

ISO 639-3 codes are intended for use as metadata in a wide range of applications. They are widely used in computer and information systems, such as the Internet, in which many languages need to be supported. In archives and other information storage, they are used in cataloging systems to indicate what language a resource is in or about. The codes are also frequently used in the linguistic literature and elsewhere to compensate for the fact that language names may be obscure or ambiguous.

Because it provides comprehensive language coverage, giving equal opportunity for all languages, and because of its wide adoption in information technologies, ISO 639-3 provides an important technology component addressing the digital divide problem.

Language codes

Main article: List of ISO 639-3 codes

ISO 639-3 includes all languages in ISO 639-1 and all individual languages in ISO 639-2. ISO 639-1 and ISO 639-2 focused on major languages, most frequently represented in the total body of the world's literature. Since ISO 639-2 also includes language collections and Part 3 does not, ISO 639-3 is not a superset of ISO 639-2. Where B and T codes exist in ISO 639-2, ISO 639-3 uses the T-codes.

Examples:

language	639-1	639-2 (B/T)	639-3 type	639-3 code
English	en	eng	individual	eng
German	de	ger/deu	individual	deu
Arabic	ar	ara	macro	ara
Arabic	ar	ara	individual	arb + others
Chinese	zh	chi/zho^[4]^[5]	macro	zho
Mandarin			individual	cmn
Cantonese			individual	yue
Minnan			individual	nan

As of April 2012, the standard contains 7776 entries.^[6] The inventory of languages is based on a number of sources including: the individual languages contained in 639-2, modern languages from the Ethnologue, historic varieties, ancient languages and artificial languages from the Linguist List,^[7] as well as languages recommended within the annual public commenting period.

Machine-readable data files are provided by the registration authority.^[6] Mappings from ISO 639-1 or ISO 639-2 to ISO 639-3 can be done using these data files.
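
As a rough illustration of such a mapping, the following Python sketch builds a lookup from ISO 639-1 and ISO 639-2 codes to ISO 639-3 codes. The tab-separated excerpt and its column names (id, part2b, part2t, part1, type, name) are assumptions made for this example, populated from the table above; they are not claimed to match the exact layout of the registration authority's files.

```python
import csv
import io

# Hypothetical tab-separated excerpt. The column names are an assumption for
# illustration; the rows repeat values from the examples table in this article.
SAMPLE = (
    "id\tpart2b\tpart2t\tpart1\ttype\tname\n"
    "eng\teng\teng\ten\tindividual\tEnglish\n"
    "deu\tger\tdeu\tde\tindividual\tGerman\n"
    "ara\tara\tara\tar\tmacro\tArabic\n"
    "zho\tchi\tzho\tzh\tmacro\tChinese\n"
    "cmn\t\t\t\tindividual\tMandarin\n"
)


def build_lookup(table_text):
    """Map each ISO 639-1 and ISO 639-2 (B or T) code to its ISO 639-3 code."""
    lookup = {}
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        for column in ("part1", "part2b", "part2t"):
            code = row[column]
            if code:  # an empty cell means no code in that part of ISO 639
                lookup[code] = row["id"]
    return lookup


lookup = build_lookup(SAMPLE)
print(lookup["de"])   # deu
print(lookup["ger"])  # deu (the B code resolves to the T-based 639-3 code)
print(lookup["zh"])   # zho
```

Codes that exist only in ISO 639-3, such as cmn in this excerpt, never appear as keys because they have no shorter or older equivalent to map from.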

ISO 639-3 is intended to assume distinctions based on criteria that are not entirely subjective.^[8] It is not intended to document or provide identifiers for dialects or other sub-language variations.^[9] Nevertheless, judgments regarding distinctions between languages may be subjective, particularly in the case of oral language varieties without established literary traditions, usage in education or media, or other factors that contribute to language conventionalization.

Code space

Since the code is three-letter alphabetic, one upper bound for the number of languages that can be represented is 26 × 26 × 26 = 17576. Since ISO 639-2 defines special codes (4), a reserved range (520) and B-only codes (23), 547 codes cannot be used in part 3. Therefore, a stricter upper bound is 17576 − 547 = 17029.

The upper bound gets even stricter if one subtracts the language collections defined in 639-2 and the ones yet to be defined in ISO 639-5.
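
The arithmetic in this section can be checked mechanically. The Python sketch below enumerates the three-letter code space and derives the size of the reserved range qaa–qtz by counting; the numbers of special codes and B-only codes are simply taken from the text rather than derived.

```python
from itertools import product
from string import ascii_lowercase

# Every three-letter lowercase string: 26 * 26 * 26 of them.
all_codes = ["".join(letters) for letters in product(ascii_lowercase, repeat=3)]

# Reserved range: every code from qaa to qtz inclusive.
reserved = [code for code in all_codes if "qaa" <= code <= "qtz"]

SPECIAL_CODES = 4   # count taken from the text
B_ONLY_CODES = 23   # count taken from the text

unavailable = SPECIAL_CODES + len(reserved) + B_ONLY_CODES
upper_bound = len(all_codes) - unavailable

print(len(all_codes))  # 17576
print(len(reserved))   # 520
print(unavailable)     # 547
print(upper_bound)     # 17029
```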

Macrolanguages

Main article: ISO 639 macrolanguage

There are 56 languages in ISO 639-2 which are considered, for the purposes of the standard, to be "macrolanguages" in ISO 639-3.^[10]

Some of these macrolanguages had no individual language as defined by ISO 639-3 in the code set of ISO 639-2, e.g. 'ara' (Generic Arabic). Others like 'nor' (Norwegian) had their two individual parts ('nno' (Nynorsk), 'nob' (Bokmål)) already in ISO 639-2.

This means that some languages (e.g. 'arb', Standard Arabic) that ISO 639-2 treated as dialects of a single language ('ara') are, in certain contexts, considered individual languages in their own right in ISO 639-3.

This is an attempt to deal with varieties that may be linguistically distinct from each other, but are treated by their speakers as two forms of the same language, e.g. in cases of diglossia.

For example:

http://www-01.sil.org/iso639-3/documentation.asp?id=ara (Generic Arabic, 639-2)
http://www-01.sil.org/iso639-3/documentation.asp?id=arb (Standard Arabic, 639-3)

See the macrolanguage mappings^[11] for the complete list.
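
The relationship between a macrolanguage and its individual languages can be modelled as a simple one-to-many mapping. The Python sketch below is limited to codes mentioned in this article, so it is deliberately incomplete (Arabic, for example, has further members beyond arb); the names MACROLANGUAGES, MACRO_OF and same_macrolanguage are hypothetical and chosen for this example only.

```python
# Macrolanguage -> individual languages, restricted to codes named in this
# article. The published mapping table is larger than this excerpt.
MACROLANGUAGES = {
    "ara": ["arb"],
    "nor": ["nno", "nob"],
    "zho": ["cmn", "yue", "nan"],
}

# Reverse index: individual language -> its macrolanguage.
MACRO_OF = {
    member: macro
    for macro, members in MACROLANGUAGES.items()
    for member in members
}


def same_macrolanguage(code_a, code_b):
    """Return True if both codes denote, or belong to, the same macrolanguage.

    Codes without a known macrolanguage are compared as themselves.
    """
    macro_a = MACRO_OF.get(code_a, code_a)
    macro_b = MACRO_OF.get(code_b, code_b)
    return macro_a == macro_b


print(same_macrolanguage("cmn", "yue"))  # True
print(same_macrolanguage("nob", "nor"))  # True
print(same_macrolanguage("arb", "nob"))  # False
```

An application that treats Mandarin and Cantonese content as "Chinese" in one context and as distinct languages in another could use such a reverse index to switch between the two views.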

Collective languages

"A collective language code element is an identifier that represents a group of individual languages that are not deemed to be one language in any usage context."^[12] These codes do not precisely represent a particular language or macrolanguage.

While ISO 639-2 includes three-letter identifiers for collective languages, these codes are excluded from ISO 639-3. Hence ISO 639-3 is not a superset of ISO 639-2.

ISO 639-5 defines 3-letter collective codes for language families and groups, including the collective language codes from ISO 639-2.

Special codes

Four codes are set aside in ISO 639-2 and ISO 639-3 for cases where none of the specific codes are appropriate. These are intended primarily for applications like databases where an ISO code is required regardless of whether one exists.

mis	uncoded languages
mul	multiple languages
und	undetermined languages
zxx	no linguistic content / not applicable

mis (originally an abbreviation for 'miscellaneous') is intended for languages which have not (yet) been included in the ISO standard.
mul is intended for cases where the data includes more than one language, and (for example) the database requires a single ISO code.
und is intended for cases where the language in the data has not been identified, such as when it is mislabeled or has never been labeled. It is not intended for cases such as Trojan where an unattested language has been given a name.
zxx is intended for data which is not a language at all, such as animal calls.^[13]

In addition, 520 codes in the range qaa–qtz are 'reserved for local use'. For example, the Linguist List uses them for extinct languages. Linguist List has assigned one of them a generic value:

qnp	unnamed proto-language (Linguist List only)

This is used for proposed intermediate nodes in a family tree that have no name.
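
A database that requires a code in every record may want to tell these cases apart. The following Python sketch classifies a three-letter code as one of the four special codes, a local-use code in the range qaa–qtz, or anything else; the function name classify and the category labels are assumptions for this example. It checks only the form of the code and does not verify that a "regular" code is actually assigned in the standard.

```python
from string import ascii_lowercase

# The four special codes and their meanings, as listed in this section.
SPECIAL = {
    "mis": "uncoded languages",
    "mul": "multiple languages",
    "und": "undetermined languages",
    "zxx": "no linguistic content / not applicable",
}


def classify(code):
    """Classify a three-letter code as 'special', 'local use' or 'regular'."""
    code = code.lower()
    if len(code) != 3 or any(ch not in ascii_lowercase for ch in code):
        raise ValueError(f"not a three-letter alphabetic code: {code!r}")
    if code in SPECIAL:
        return "special"
    if "qaa" <= code <= "qtz":  # range reserved for local use
        return "local use"
    # Syntactically valid; whether it is assigned is not checked here.
    return "regular"


print(classify("zxx"))  # special
print(classify("qnp"))  # local use
print(classify("eng"))  # regular
```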

Maintenance processes

The code table for ISO 639-3 is open to changes. In order to protect stability of existing usage, the changes permitted are limited to:^[14]

modifications to the reference information for an entry (including names or categorizations for type and scope),
addition of new entries,
deprecation of entries that are duplicates or spurious,
merging one or more entries into another entry, and
splitting an existing language entry into multiple new language entries.

The code assigned to a language is not changed unless there is also a change in denotation.^[15]

Changes are made on an annual cycle. Every request is given a minimum period of three months for public review.

The ISO 639-3 Web site has pages that describe "scopes of denotation"^[16] (languoid types) and types of languages,^[17] which explain what concepts are in scope for encoding and certain criteria that need to be met. For example, constructed languages can be encoded, but only if they are designed for human communication and have a body of literature, preventing requests for idiosyncratic inventions.

The registration authority documents on its Web site instructions made in the text of the ISO 639-3 standard regarding how the code tables are to be maintained.^[18] It also documents the processes used for receiving and processing change requests.^[19]

A change request form is provided, and there is a second form for collecting information about proposed additions. Any party can submit change requests. When submitted, requests are initially reviewed by the registration authority for completeness.

When a fully documented request is received, it is added to a published Change Request Index. Also, announcements are sent to the general LINGUIST discussion list at Linguist List and other lists the registration authority may consider relevant, inviting public review and input on the requested change. Any list owner or individual is able to request notifications of change requests for particular regions or language families. Comments that are received are published for other parties to review. Based on consensus in comments received, a change request may be withdrawn or promoted to "candidate status".

Three months prior to the end of an annual review cycle (typically in September), an announcement is sent to the LINGUIST discussion list and other lists regarding Candidate Status Change Requests. All requests remain open for review and comment through the end of the annual review cycle.

Decisions are announced at the end of the annual review cycle (typically in January). At that time, requests may be adopted in whole or in part, amended and carried forward into the next review cycle, or rejected. Rejections often include suggestions on how to modify proposals for resubmission. A public archive of every change request is maintained along with the decisions taken and the rationale for the decisions.^[20]

Criticism

Linguists Morey, Post and Friedman raise various criticisms of ISO 639, and in particular ISO 639-3:^[15]

The three-letter codes themselves are problematic, because while officially arbitrary technical labels, they are often derived from mnemonic abbreviations for language names, some of which are pejorative. For example, Yemsa was assigned the code [jnj], from pejorative "Janejero". These codes may thus be considered offensive by native speakers, but codes in the standard, once assigned, cannot be changed.
The administration of the standard is problematic because SIL is a missionary organization with inadequate transparency and accountability. Decisions as to what deserves to be encoded as a language are made internally. While outside input may or may not be welcomed, the decisions themselves are opaque, and many linguists have given up trying to improve the standard.
Permanent identification of a language is incompatible with language change.
Languages and dialects often cannot be rigorously distinguished, and dialect continua may be subdivided in many ways, whereas the standard privileges one choice. Such distinctions are often based instead on social and political factors.
ISO 639-3 may be misunderstood and misused by authorities that make decisions about people's identity and language, abrogating the right of speakers to identify or identify with their speech variety. Though SIL is sensitive to such issues, this problem is inherent in the nature of an established standard, which may be used (or mis-used) in ways that ISO and SIL do not intend.

Martin Haspelmath agrees with four of these points, but not the point about language change.^[21] He disagrees because any account of a language requires identifying it, and we can easily identify different stages of a language. He suggests that linguists may prefer to use a codification that is made at the languoid level since “it rarely matters to linguists whether what they are talking about is a language, a dialect or a close-knit family of languages.” He also questions whether an ISO standard for language identification is appropriate since ISO is an industrial organization, while he views language documentation and nomenclature as a scientific endeavor. He cites the original need for standardized language identifiers as having been “the economic significance of translation and software localization,” for which purposes the ISO 639-1 and 639-2 standards were established. But he raises doubts about industry need for the comprehensive coverage provided by ISO 639-3, including as it does “little-known languages of small communities that are never or hardly used in writing and that are often in danger of extinction”.

Usage

Ethnologue
Linguist List
OLAC: the Open Language Archives Community^[22]
Microsoft Windows 8:^[23] Supports all codes in ISO 639-3 at the time of release.
Wikimedia Foundation: New language-based projects (e.g. Wikipedias in new languages) must have an identifier from ISO 639-1, -2, or -3.^[24]
Other standards that rely on ISO 639-3:
- Language tags as defined by the Internet Engineering Task Force (IETF), as documented in:
  - BCP 47: Best Current Practice 47,^[25] which includes RFC 5646
  - RFC 5646, which superseded RFC 4646, which superseded RFC 3066. (Therefore, all standards which depend on any of these 3 IETF standards now use ISO 639-3.)
- The ePub 3.0 standard for language metadata^[26] uses Dublin Core Metadata elements. These language metadata elements in ePubs must contain valid RFC 5646 codes for languages.^[26] RFC 5646 points to ISO 639-3 for languages without shorter IANA codes.
- Dublin Core Metadata Initiative: DCMI Metadata Term^[27] for language, via IETF's RFC 4646 (now superseded by RFC 5646).
- Internet Assigned Numbers Authority (IANA): The W3C's internationalization effort recommends the use of the IANA Language Subtag Registry for selecting codes for languages.^[28] The IANA Language Subtag Registry^[29] depends on ISO 639-3 codes for languages which did not previously have codes in other parts of the ISO 639 standard.
- HTML5:^[30] via IETF's BCP 47.
- MARC library codes.
- MODS library codes:^[31] Incorporates IETF's RFC 3066 (now superseded by RFC 5646).
- Text Encoding Initiative (TEI):^[32] via IETF's BCP 47.
- Lexical Markup Framework: ISO specification for representation of machine-readable dictionaries.
- Unicode's Common Locale Data Repository: Uses several hundred codes from ISO 639-3 not included in ISO 639-2.
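
The preference for shorter codes mentioned in the list above can be sketched as follows. This Python example picks a language subtag by using the two-letter code where one exists and falling back to the ISO 639-3 code otherwise. The table SHORTER_CODE is a hypothetical excerpt built from the examples earlier in this article, and the function name language_subtag is likewise an assumption; a real implementation would presumably consult the IANA Language Subtag Registry rather than a hand-written table.

```python
# Hypothetical excerpt: ISO 639-3 code -> two-letter code where one exists.
# Values are taken from the examples table in this article.
SHORTER_CODE = {
    "eng": "en",
    "deu": "de",
    "ara": "ar",
    "zho": "zh",
}


def language_subtag(iso639_3_code):
    """Return the shorter code if one is known, else the ISO 639-3 code."""
    code = iso639_3_code.lower()
    return SHORTER_CODE.get(code, code)


print(language_subtag("deu"))  # de
print(language_subtag("yue"))  # yue (no two-letter code in the excerpt)
```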

References

[1] "ISO 639-3 status and abstract". iso.org. 2010-07-20. Retrieved 2012-06-14.
[2] "Maintenance agencies and registration authorities". ISO.
[3] "Types of individual languages - Ancient languages". sil.org. Retrieved 2015-10-28.
[4] "Ethnologue report for ISO 639 code: zho". ethnologue.com.
[5] "ISO639-3". sil.org.
[6] "ISO 639-3 Code Set". sil.org. 2007-10-18. Retrieved 2012-06-14.
[7] "ISO 639-3". sil.org.
[8] "Scope of Denotation: Individual Languages". sil.org.
[9] "Scope of Denotation: Dialects". sil.org.
[10] "Scope of denotation: Macrolanguages". sil.org. Retrieved 2012-06-14.
[11] "Macrolanguage Mappings". sil.org. Retrieved 2012-06-14.
[12] "Scope of denotation: Collective languages". sil.org. Retrieved 2012-06-14.
[13] "Field Recordings of Vervet Monkey Calls". Entry in the catalog of the Linguistic Data Consortium. Retrieved 2012-09-04.
[14] "Submitting ISO 639-3 Change Requests: Types of Changes". sil.org.
[15] Morey, Stephen; Post, Mark W.; Friedman, Victor A. (2013). "The language codes of ISO 639: A premature, ultimately unobtainable, and possibly damaging standardization". PARADISEC RRR Conference.
[16] "Scope of Denotation for Language Identifiers". sil.org.
[17] "Types of Languages". sil.org.
[18] "ISO 639-3 Change Management". sil.org.
[19] "Submitting ISO 639-3 Change Requests". sil.org.
[20] "ISO 639-3 Change Request Index". sil.org.
[21] Haspelmath, Martin (2013-12-04). "Can language identity be standardized? On Morey et al.'s critique of ISO 639-3". Diversity Linguistics Comment.
[22] "OLAC Language Extension". language-archives.org. Retrieved 2015-08-03.
[23] "Over 7,000 languages, just 1 Windows". Microsoft. 2014-02-05.
[24] "Language proposal policy". wikimedia.org. Retrieved 2015-08-03.
[25] "BCP 47 - Tags for Identifying Languages". ietf.org. Retrieved 2015-08-03.
[26] "EPUB Publications 3.0". idpf.org. Retrieved 2015-08-03.
[27] "DCMI Metadata Terms". purl.org. Retrieved 2015-08-03.
[28] "Two-letter or three-letter ISO language codes". w3.org. Retrieved 2015-08-03.
[29] "Language Registry". iana.org. Retrieved 2015-08-12.
[30] "3 Semantics, structure, and APIs of HTML documents — HTML5". w3.org. Retrieved 2015-08-03.
[31] "Elements - MODS User Guidelines: Metadata Object Description Schema: MODS (Library of Congress)". loc.gov. Retrieved 2015-08-03.
[32] "TEI element language". tei-c.org. Retrieved 2015-08-03.

This article is issued from Wikipedia - version of the 8/17/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.