Localization in Asia Pacific

Regional and international organizations

Development of local language computing applications and content requires a sustained effort. Many regional and international organizations have been contributing to this development across Asia Pacific. These organizations are involved in: (a) standards development and (b) technology development. Moreover, there are many funding agencies in the region that are supporting local language computing development, notably the International Development Research Centre (IDRC) of Canada, Center of the International Cooperation for Computerization (CICC) of Japan, National Institute of Information and Communications Technology (NICT) of Japan, United Nations (through UNESCO and the UNDP-Asia Pacific Development Information Programme or APDIP) and Asia IT&C Grants by the European Union.

This section lists some of the major regional standards and technology development organizations supporting local language computing in Asia Pacific and explains the role they play in this context. National and regional initiatives need to develop liaisons with these organizations, for example by subscribing to the multiple online discussion forums that they maintain or by attending the regular meetings, conferences and special workshops organized by them. Where funds are required, the funding organizations listed provide such support.

Unicode Consortium

The Unicode Consortium develops the Unicode Standard, the standard character encoding for the multilingual Internet, whose character repertoire is kept synchronized with ISO 10646. The consortium aims to provide standard encodings for all characters and symbols used in the scripts of all languages of the world (Unicode 2006). In addition, it provides guidelines for collation, bidirectionality, reordering and line-breaking, which are fundamental to Unicode-based text processing for many Asian languages. Even though conventional national and proprietary encodings are still in use, most nations across Asia Pacific are now switching to Unicode. In addition to encoding, the Unicode Consortium has recently collected and now maintains locale data for all languages through the Common Locale Data Repository (CLDR) project.
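As a small illustration of the character properties that such guidelines rely on, the following Python sketch looks up a few entries of the Unicode Character Database with the standard `unicodedata` module. The helper name `describe` and the choice of sample characters are assumptions made for this example only.

```python
import unicodedata


def describe(char: str) -> dict:
    """Return a few Unicode Character Database properties for one character."""
    return {
        "code_point": f"U+{ord(char):04X}",
        "name": unicodedata.name(char),
        # Bidirectional class: "L" is left-to-right, "AL" is an Arabic letter
        # (right-to-left), as used by scripts such as the one for Urdu.
        "bidi_class": unicodedata.bidirectional(char),
        # Number of bytes the character occupies in UTF-8.
        "utf8_bytes": len(char.encode("utf-8")),
    }


# A Latin letter, an Arabic-script letter and a Tamil letter.
for ch in ("a", "\u0627", "\u0bb5"):
    print(describe(ch))
```

The bidirectional class is what a rendering engine consults when it mixes left-to-right and right-to-left text on one line.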

World Wide Web Consortium (W3C)

W3C develops guidelines, standards and software for publishing multilingual online content. Its Internationalization Working Group is tasked with keeping these specifications multilingual. W3C maintains the HTML standard, which is used for creating multilingual Web pages. In addition, it is developing the Speech Synthesis Markup Language (SSML) and VoiceXML standards, which are used for voice browsing, that is, accessing the Internet through speech. W3C is also developing multimodal content publishing standards for more effective Web accessibility, including access by people with disabilities.

Internet Corporation for Assigned Names and Numbers (ICANN)

Currently, Web access requires typing a Web address (also called a domain name or URL) in English. For populations who do not understand English, this is a significant hurdle in accessing online content. Web addresses, which are the key to entering the multilingual World Wide Web, should also be available in local languages. ICANN is responsible for the global coordination of Web addresses (note 16), and it recently introduced Internationalized Domain Names (IDNs) on the basis of RFC 3454, 3490, 3491 and 3492, collectively called the IDN Standards (ICANN 2006). IDNs allow Web addresses in local languages. However, because the domain name system is based on seven-bit ASCII, Unicode cannot be used directly, and multilingual IDNs are converted to ASCII Compatible Encoding (ACE) before the address is resolved. How to enable Top-Level Domains (TLDs) in local languages, and who will control them, is still being debated (Butt 2006; Huston 2006). Because of this continuing controversy, independent systems have also been developed, for example by the China Internet Network Information Center (CNNIC). ICANN and IDNs are bound to play a critical role in making the multilingual Internet accessible.
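The conversion of an internationalized name to ASCII Compatible Encoding can be illustrated with Python's built-in `idna` codec, which implements the RFC 3490 generation of the IDN Standards. This is only a sketch: the helper names `to_ace` and `from_ace` and the sample domains are assumptions for the example, and a registry would apply its own language and variant policies before any such conversion.

```python
def to_ace(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII Compatible Encoding."""
    # The "idna" codec applies the RFC 3454 preparation step and the
    # RFC 3492 (Punycode) encoding to each label, adding the "xn--" prefix.
    return domain.encode("idna").decode("ascii")


def from_ace(ace_domain: str) -> str:
    """Convert an ACE domain name back to Unicode for display."""
    return ace_domain.encode("ascii").decode("idna")


print(to_ace("münchen.example"))   # xn--mnchen-3ya.example

# A hypothetical Tamil-script label; only the non-ASCII label is rewritten.
tamil = "தமிழ்.example"
ace = to_ace(tamil)
print(ace)
print(from_ace(ace) == tamil)
```

The resolver only ever sees the ASCII form; the local-language form is restored by the application when the name is shown to the user.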

Development of Internationalized Domain Names (IDNs) for India's .IN domain

India's .IN domain first opened to the public in 1992. It was managed by the National Centre for Software Technology (NCST) until 2004, and then by the Centre for Development of Advanced Computing (C-DAC), both research and development institutions run by the Government of India. Until 2004, about 6,600 names existed in the .IN domain database. In late 2004, the Indian government liberalized policies surrounding the .IN domain. This included making available second level domains (example.in) on an unlimited basis, as well as third level domains <co.in>, <net.in> and <org.in> to all registrants. Furthermore, the .IN ccTLD registry separated Registry and Registrar (retail) functions, resulting in the creation of a domain name industry that had until then been dormant. The results of this opening and liberalization have been quite dramatic—100,000 registrations in the first 100 days and over 250,000 new registrations since 1 January 2005.

However, domain name registrations were in English (ASCII script) only, a significant limitation in a nation with 22 official languages, including 400 million speakers of Hindi, 200 million speakers of Bengali, 60 million speakers of Tamil and 70 million speakers of Telugu. This nation of more than a billion has schools that teach in 58 different languages, newspapers publishing in 87 languages, radio programmes broadcast in 71 languages and movies released in 15 languages. To support this diverse, multilingual population, the .IN registry embarked on a programme to internationalize the .IN domain and support the various scripts that are used to represent the 22 official languages in India.

The task of internationalizing the .IN domain is the most complex domain name internationalization project in the world. The 22 languages can be represented completely by just 11 scripts, which leads to significant overlaps and to visually confusable character sequences that are equally valid in multiple languages but may be represented on a computer by distinct encodings. Such visually confusable characters are called variants, and one of the most important tasks in localization is the creation of variant tables that prescribe which characters are visually confusable between different languages. In addition, some Indian languages require bidirectional text, multiple diacritic positioning, language-specific word breaking and non-empty spaces, none of which is normally supported in a standard, left-to-right ASCII-based Domain Name System (DNS).

The plan to internationalize .IN may be summarized as follows: build language tables; develop language policies; consider issues brought about by variants; ensure standards compliance and enhance dispute resolution policy to cover IDNs.

To introduce .IN in local languages such as Hindi and Tamil, language and variant tables must first be developed. Homographic variant issues must be determined, which will ensure that characters that look identical are marked clearly and that registration of one character in one script automatically reserves the similar-looking character in the other script(s). Linguistic experts are needed to ratify the choices of variant and language tables. Finally, steps need to be taken to ensure that the launch of Hindi and Tamil does not disadvantage the later launch of other languages that use similar characters. For example, the Tamil character வ (U+0BB5) is very similar to the Malayalam character ഖ (U+0D16).
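The reservation rule described above can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the `VARIANTS` table holds just the single Tamil/Malayalam pair mentioned in the text, and the `Registry` class is not the .IN registry's actual implementation.

```python
from itertools import product

# Hypothetical variant table: each character maps to the characters in
# other scripts that it can be visually confused with.
VARIANTS = {
    "\u0bb5": {"\u0d16"},  # TAMIL LETTER VA ~ MALAYALAM LETTER KHA
    "\u0d16": {"\u0bb5"},  # and the reverse direction
}


def variant_labels(label: str) -> set:
    """Return every label obtained by swapping characters for listed variants."""
    choices = [sorted({ch} | VARIANTS.get(ch, set())) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


class Registry:
    """Toy registry in which registering a label reserves all its variants."""

    def __init__(self):
        self.owner_of = {}  # label -> registrant, including reserved variants

    def register(self, label: str, registrant: str) -> bool:
        reserved = variant_labels(label)
        # Refuse the request if the label or any variant is already taken.
        if any(name in self.owner_of for name in reserved):
            return False
        for name in reserved:
            self.owner_of[name] = registrant
        return True


registry = Registry()
print(registry.register("\u0bb5", "first registrant"))   # True
print(registry.register("\u0d16", "second registrant"))  # False, reserved
```

Because the number of variant labels grows with every confusable character in a name, a production table would normally be paired with a policy on how many variants are actually activated rather than merely blocked.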

International technical standards exist for IDNs, and .IN has carefully planned to conform to these standards while simultaneously working with the standards community to extend them where they are deficient or insufficient. At a minimum, conformance to IETF RFCs 3490, 3491, 3492 and 3454 is required, as well as general conformance to the ICANN IDN Guidelines (ICANN 2006).

The launch of domain names in local languages requires the development of a robust dispute resolution policy that considers additions for IDNs and has the ability to handle disputes for domain names in either ASCII or the native language representation evenly and equally. Moreover, because variants of one name may conflict with other names, a clear policy has to be developed to resolve such conflicts in a manner that is consistent and conformant to local laws.

In December 2006, the Indian government, in partnership with .IN's technical partner Afilias, completed the first-ever launch of .IN in the Tamil language, implementing the Dravidian script that represents Tamil. Tamil, one of the world's classical languages, will be available for wide use. There are plans to soon thereafter introduce .IN in Malayalam (a related Dravidian script-based language). Language table development for the Devanāgarī script, which is the basis for many northern Indian languages including Hindi, is well underway, although this is a large-scale project whose end-date is yet to be determined.

A new development is the interest in the creation of IDN Top Level Domains (IDN TLDs), which allow the entire domain name to be represented in a local language character set. Technical tests are being conducted to assess their feasibility by answering the following practical questions: (a) Will they work everywhere? (b) Are they backwards compatible? (c) Do they avoid breaking application software? (d) Do they support languages appropriately? Certain principles apply to the roll-out of IDN TLDs, including:

  1. retaining the global uniqueness of the TLD system—that is, domain names should remain unique and unambiguous;
  2. maintaining the interoperability of the TLD system, that is, 'dot Hindi' (written in Devanāgarī script) needs to point applications and users to the same place regardless of whether they are accessing the domain from India, the UK or Greece;
  3. promoting 'future-proof' solutions that allow seamless introduction of new languages and character sets in the future;
  4. avoiding user confusion; and
  5. promoting multi-stakeholder involvement.

When implementing IDNs in Asia Pacific, with its large list of languages, character sets and scripts, and relative paucity of experts, important preliminary issues such as language table and variant table development often cannot get off the ground. Government involvement is critical in coordinating and bringing together the right set of individual experts in technology, language and policy to create a model for the implementation of IDNs. The development of IDNs will benefit Internet users who are not literate in English and whose computers do not use ASCII or English character sets by default, provide a good user experience on the Internet and create a multilingual Internet that can be used by all populations worldwide.

International Organization for Standardization (ISO)

ISO jointly develops the ISO 10646 or Unicode standard with the Unicode Consortium. The technical committee TC37 develops standards for 'Terminology and Other Language and Content Resources', including specifications for lexica, corpora and other language content. The language resource standards are still being discussed and finalized and they are not currently in wide use. Some other related standards include ISO 3166 for country codes and ISO 639 for language codes, which are used for locale definitions by Unicode within CLDR and by other organizations including W3C and ICANN. For example, ur_PK represents the Urdu language locale as used in Pakistan.
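A locale identifier such as ur_PK is simply an ISO 639 language code joined to an ISO 3166 country code. The following Python sketch splits such an identifier into its parts; the names `Locale` and `parse_locale` are assumptions for the example, and the sketch deliberately ignores script and variant subtags rather than implementing the full CLDR syntax.

```python
from typing import NamedTuple, Optional


class Locale(NamedTuple):
    language: str           # ISO 639 language code, for example "ur"
    region: Optional[str]   # ISO 3166 country code, for example "PK"


def parse_locale(identifier: str) -> Locale:
    """Split an identifier such as 'ur_PK' into language and region."""
    parts = identifier.replace("-", "_").split("_")
    language = parts[0].lower()
    # Take the first two-letter subtag after the language as the region;
    # longer subtags (for example a four-letter script code) are skipped.
    region = next(
        (p.upper() for p in parts[1:] if len(p) == 2 and p.isalpha()),
        None,
    )
    return Locale(language, region)


print(parse_locale("ur_PK"))   # Locale(language='ur', region='PK')
print(parse_locale("ur"))      # Locale(language='ur', region=None)
```

Keeping language and region separate lets software fall back from a regional locale to the plain language when no region-specific data exists.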

Free and Open Source Software (FOSS) initiatives

Notable among software development initiatives for multilingual computing is the FOSS community, which provides internationalized software applications that allow rapid localization under an open license (note 17). Most FOSS operating systems are based on Linux, are internationalized, and are being localized by different groups (for example, Debian, Red Hat and Ubuntu). Debian is currently being localized into more than 150 languages. OpenOffice.org, which provides a complete suite of document productivity software, is being localized into 70 languages. The Mozilla project distributes the Firefox Web browser and the Thunderbird email client. Many more FOSS initiatives are available online, including software for chatting, multimedia, Web development and databases.

Asian Federation of Natural Language Processing (AFNLP)

Academic research forums in linguistics and language processing have long existed in many countries in Asia. However, there have been limited regional discussions on Asian languages. The Association for Computational Linguistics (ACL), through its North American (NAACL) and European (EACL) chapters, has been providing a common platform for the Americas and Europe. A similar platform in Asia was created recently by bringing existing national organizations and conferences under a single regional umbrella called AFNLP. The federation is helping organize language computing research and development across Asia by providing a collaborative platform to share academic research and exchange innovative solutions for Asian languages. AFNLP holds a regular conference called the International Joint Conference on Natural Language Processing (IJCNLP). Two such conferences have been held so far.

Language resources and vendor initiatives

Many organizations collect and distribute language resources that are essential for linguistic and computational research and for developing local language computing. The Linguistic Data Consortium (LDC) at the University of Pennsylvania distributes text and speech corpora, lexica and additional data for many languages, including Chinese, Arabic, Japanese, Hindi, Vietnamese, Tamil and Korean. The European Language Resources Association (ELRA) distributes similar resources for many Asian languages. Similarly, the Global Wordnet Association is developing lexical-semantic resources for many languages, and the South Asian Language Resource Center (SALRC) at the University of Chicago is developing a repository of lexical resources for South Asian languages. No formal centre for the collection and distribution of the language resources of Asia Pacific has been established. However, discussions on establishing an Asian Language Resource Network, similar to LDC and ELRA, are underway. Another language resource organization is the Summer Institute of Linguistics (SIL), an organization of volunteers that has been documenting languages and populations for more than 50 years (see www.ethnologue.com).

The University of California at Berkeley has started the Script Encoding Initiative which is assisting individuals and groups to identify the missing characters, for example from lesser known languages, and helping them get these characters encoded in the Unicode standard.

Some corporations have also been involved in localization. IBM has developed a large repository of C++ and Java code which is called IBM International Components for Unicode (ICU). This library of code is available at http://icu.sourceforge.net/. Microsoft has restructured its localization policy and has started developing local language interfaces, called Language Interface Packs (LIP), which are currently available for seven Asian languages. These efforts will help develop basic localization at least in the languages that have official status in Asian countries or are otherwise commercially viable (for example, languages spoken by large populations).

There is growing interest in localizing the mobile platform, but the effort has mostly been taken up by the manufacturers themselves, for example, Nokia, Samsung, Sony and others. Text-based messaging is now increasingly becoming available through these systems for many Asian languages based on the Unicode standard. However, the localization is driven mostly by commercial interests focused on languages that promise revenues. It is not possible for independent developers to localize these platforms in other languages due to proprietary platforms and lack of open standards (note 18).



 
