Localization in Asia Pacific

Status of language technology

Many of the basic standards and applications have already been developed for most of the national languages in Asia Pacific, and many of these standards have been revised over time to align with international standards. However, language computing has matured to different levels across countries. This section summarizes the status of localization of national languages in Asia Pacific, grouping them into five levels of maturity. The levels are qualitative at best, because a quantitative assessment is difficult when each country faces its own socio-economic, political and linguistic challenges. The comparison is based on the amount of work done on the national language and on research and development capacity in script, speech and language processing. Table 1 provides a checklist of these applications for many national languages of the region. For more information, see Sonlertlamvanich (2002), Tsujii (2005) and Hussain et al. (2005).

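Table 1
Extent of localization for the national language of each listed country of Asia Pacific*

| Country     | Encoding | Collation | Keyboard | Fonts | Locale | Interface | Lexicon | Spell-checker | OCR | TTS | ASR | MT  |
|-------------|----------|-----------|----------|-------|--------|-----------|---------|---------------|-----|-----|-----|-----|
| Afghanistan | xxx      | x         | x        | xx    | x      | x         |         |               |     |     |     |     |
| Bangladesh  | xxx      | xx        | xx       | xxx   | x      | x         | xx      | x             | x   |     |     |     |
| Bhutan      | xxx      | xx        | xx       | xx    | xxx    | xxx       | x       | x             |     |     |     |     |
| Cambodia    | xxx      | xx        | xxx      | xx    | xx     | xx        | x       | x             |     |     |     |     |
| China       | xxx      | xxx       | xxx      | xxx   | xxx    | xxx       | xxx     | xxx           | xxx | xxx | xxx | xxx |
| India       | xxx      | xxx       | xxx      | xxx   | xxx    | xxx       | xxx     | xxx           | xx  | xx  | xx  | xx  |
| Indonesia   | xxx      | xxx       | xxx      | xxx   | xx     | x         | xx      | x             | xx  | x   |     | xx  |
| Japan       | xxx      | xxx       | xxx      | xxx   | xxx    | xxx       | xxx     | xxx           | xxx | xxx | xxx | xxx |
| Korea       | xxx      | xxx       | xxx      | xxx   | xxx    | xxx       | xxx     | xxx           | xxx | xxx | xxx | xxx |
| Laos        | xxx      | xx        | xx       | xx    | x      | xx        | x       | xx            | x   |     |     |     |
| Malaysia    | xxx      | xxx       | xxx      | xxx   | xx     | xxx       | xx      | xx            | xx  | xx  | x   | xx  |
| Maldives    | xxx      | x         | xx       | xx    | xx     |           |         |               |     |     |     |     |
| Mongolia    | xxx      | xxx       | xxx      | xxx   | x      | xx        |         |               |     | x   |     |     |
| Myanmar     | xxx      | xx        | xx       | xxx   | xxx    | x         |         | x             |     |     |     |     |
| Nepal       | xxx      | xxx       | xxx      | xxx   | xxx    | xxx       | xx      | xx            |     |     |     |     |
| Pakistan    | xxx      | xxx       | xx       | xxx   | x      | xxx       | xxx     | xxx           | xx  | xxx | x   | xx  |
| Philippines | xxx      | xxx       | xxx      | xxx   | xx     | x         |         | xx            | xx  |     |     |     |
| Sri Lanka   | xxx      | xx        | xxx      | xxx   | xx     | x         | xx      | xx            | xx  | xx  | x   | x   |
| Thailand    | xxx      | xxx       | xxx      | xxx   | xxx    | xxx       | xxx     | xxx           | xxx | xxx | xx  | xx  |
| Vietnam     | xxx      | xx        | xxx      | xxx   | xxx    | xxx       | x       | xx            | xx  | xx  | xx  | x   |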

Note: The table compares a selection of applications. The comparison is qualitative, not quantitative, and is based on the information available to the authors through the Internet and other sources (for example, Sonlertlamvanich 2002; Tsujii 2005; Hussain et al. 2005). The information has not been independently verified and therefore has some margin of error. The ratings reflect the situation in 2006. Key: blank = minimal work; x = initial work started; xx = some work completed; xxx = much work completed.
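The ratings in Table 1 can also be read programmatically. The following is a minimal sketch, not part of the original study: it assumes the four marks can be treated as ordinal scores from 0 to 3 (a mapping introduced here purely for illustration) and copies three rows from the table. The names `LEVEL`, `TABLE_1`, `coverage` and `untouched` are hypothetical. Because the underlying comparison is qualitative, the resulting percentages are only a rough reading aid.

```python
# Ordinal scores for the Table 1 marks. This numeric mapping is an
# assumption made for illustration; the source ratings are qualitative.
LEVEL = {"": 0, "x": 1, "xx": 2, "xxx": 3}

APPLICATIONS = [
    "Encoding", "Collation", "Keyboard", "Fonts", "Locale", "Interface",
    "Lexicon", "Spell-checker", "OCR", "TTS", "ASR", "MT",
]

# Three rows copied from Table 1; an empty string stands for a blank cell.
TABLE_1 = {
    "Afghanistan": ["xxx", "x", "x", "xx", "x", "x", "", "", "", "", "", ""],
    "Pakistan": ["xxx", "xxx", "xx", "xxx", "x", "xxx",
                 "xxx", "xxx", "xx", "xxx", "x", "xx"],
    "Japan": ["xxx"] * 12,
}


def coverage(marks):
    """Return the row's score as a share of the maximum possible (0.0 to 1.0)."""
    total = sum(LEVEL[mark] for mark in marks)
    return total / (LEVEL["xxx"] * len(marks))


def untouched(marks):
    """Return the applications whose cell is blank (minimal work)."""
    return [app for app, mark in zip(APPLICATIONS, marks) if mark == ""]


for country, marks in TABLE_1.items():
    print(f"{country}: {coverage(marks):.0%} of maximum; blank: {untouched(marks)}")
```

Under this assumed scoring, a row with every cell marked xxx reaches 100 per cent, while blank cells identify the applications on which work has barely begun.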

Highly localized languages

The more developed countries in the region, including China, Japan and Korea, lead the development and implementation of local language computing. These countries are very active in international standardization efforts and participate in relevant platforms and discussions. Most software is already localized in Mandarin Chinese, Japanese and Korean. Because basic localization and advanced applications, including TTS, ASR, OCR and MT, are already developed and available through the commercial sector, current research and development focuses on cutting-edge technology such as speech-to-speech translation. These countries have active academic bodies collaborating with the commercial sector, backed by governmental policy and support. Some of the organizations involved are Peking University, City University of Hong Kong, Academia Sinica in Taiwan, NICT and Advanced Telecommunications Research Institute International (ATR) in Japan, and Korea Advanced Institute of Science and Technology (KAIST) and Electronics and Telecommunications Research Institute (ETRI) in Korea. The commercial sector also performs significant research and development, with companies including Sony, NEC, IBM, Nokia, Microsoft, Hewlett-Packard and Systran.

Very localized languages

Thailand and India are also very active in local language computing. The National Electronics and Computer Technology Center (NECTEC) of the National Science and Technology Development Agency (NSTDA), along with Thai industry and academia, is leading the full localization of the Thai language. A Thai OCR, a text-to-speech system, and English-Thai MT are now available. The Thai Language Environment (TLE) project develops and maintains the Open Source Thai Linux distribution.

India also has a thriving language computing development sector. The Ministry of Science and Technology has created the Technology Development for Indian Languages (TDIL) department, which supports and coordinates active research on Hindi and many other constitutionally recognized languages through research centres at Indian universities and the Centre for Development of Advanced Computing (CDAC). In addition, the IndLinux group localizes Linux distributions in many languages (MIT 2006) and has released the Hindi version. However, commercial-grade applications for end-users are not fully developed and not in wide use, owing to the complexity and diversity of the country's languages (currently 22 official languages). Nevertheless, working models of TTS, MT, ASR and OCR are available for a few languages, including Hindi, Tamil and Marathi. Other language resources, including lexica and corpora, are also available. Government focus and a dynamic language policy are providing the right impetus, and India is seeing an emerging localization and language computing industry.

Moderately localized languages

Indonesia, Malaysia, Pakistan, Sri Lanka and Vietnam have fairly active academic research and development programmes and fairly mature standards and basic language applications, with reasonable work in advanced applications.

Research and development in Indonesia is being carried out by both the public and academic sectors. Basic resources and advanced applications are all being developed with advanced prototypes already released. Badan Pengkajian dan Penerapan Teknologi (BPPT) and the University of Indonesia are two organizations actively involved in this process. Most of the work is on Bahasa Indonesia.

Research in Malaysia started in 1987 through the KANTA project by CICC which developed an MT system for Japanese, Malay, Chinese, Thai and Bahasa Indonesia. Universities, including Universiti Teknologi Malaysia and Universiti Sains Malaysia, are actively involved in research and development.

Localization in Sri Lanka is being led by the University of Colombo School of Computing for Sinhala and Tamil, with support and guidance from the ICT Agency of Sri Lanka. The open source community is also reasonably active through Sri Lanka's Linux User Group (LkLUG), which has made some progress on the development of a Sinhala Linux distribution.

In Vietnam, localization is being led by the Ministry of IT and is also being carried out in some universities. VietKey, an open source office productivity package, is available in Vietnamese. Work is also underway on advanced applications such as ASR.

Pakistan has shown a promising focus on language computing (see the boxed case study below).

However, very limited development work is being carried out by the commercial sector in these countries, especially for advanced applications.

Language computing development in Pakistan

Pakistan is home to more than 160 million people who speak more than 60 languages. Urdu is the national language and the lingua franca. The official language is English, a legacy of the country's colonial past and a language understood by less than 10 per cent of the population. Punjabi, Seraiki, Sindhi, Pashto, Balochi and Kashmiri are the most spoken languages. Many of the other languages, with small populations, are found in northern Pakistan, where these linguistic communities live in valley 'islands' surrounded by tall Himalayan peaks. Pakistan is a country that has recently reawakened to the need for local language software and where all stakeholders are coming together in a synergized approach to language computing development. However, Pakistan is still struggling to balance policy, human resource and technology challenges and it is only starting to look at the social challenges and solutions for dissemination of this technology.

Pakistan experienced a boom in language computing in the early 1980s, when the indigenous software industry started developing word processors and fonts for Urdu. Multiple word processing products and fonts were made available. Although the Nastalique script used to write Urdu is very challenging to model, especially with the technology available in the 1980s, numerous solutions were developed, including Inpage, PagePro, Shahkar, Raakim, and the like. Unfortunately, by the late 1980s and early 1990s, most of this industry had vanished because copyright violations made such ventures totally unprofitable.

Language computing has re-emerged after a decade of stagnation, with the revived interest coming from academia and the public sector. The Center for Research in Urdu Language Processing (CRULP) at the National University of Computer and Emerging Sciences and smaller informal groups led by individual faculty members at various universities in the private and public sectors are at the forefront of this effort. Work has been ongoing in all aspects of localization technology, including MT, TTS, ASR, OCR and handwriting recognition. Universities are also offering specialized courses at master's and doctoral level in these areas, thereby developing the essential human resources for this work. Most of the current efforts are focused on technology. However, significant investment also needs to be made in developing specialized linguistics and computational linguistics programmes.

With the emergence of e-governance, the public sector has realized the need for local language computing and incorporated it in the IT policy for the first time in early 2000. Since then, the government has been contributing in multiple ways. The e-government initiatives taken up by federal and state governments now require local language interfaces for many of the software services being developed. The major initiative has been that of the National Database and Registration Authority (NADRA) which is now issuing National ID cards in Urdu to all Pakistanis. NADRA's national database is in Unicode. Other large initiatives, including work on land revenue records and software for recording proceedings of the National Assembly and Senate, all require Urdu components, making localization a viable commercial option again. There are also plans for telecentre projects which will have a significant local language component. The increased demand created by the public sector is now drawing the software industry to invest in local language computing.

However, the industry remains focused on basic localization and is still not developing advanced applications because of the significant financial investment they require. The Ministry of IT (MoIT) realizes the requirement for advanced applications and has been funding research and development in this area since early 2000. The first national encoding standard was approved by the President of Pakistan in 2002, through the efforts of a specialized committee (called Urdu and Regional Languages Software Development Forum, URLSDF) formed by MoIT in collaboration with the National Language Authority. This was soon followed by a proposal to update the Unicode standard for complete support of the Urdu language. Since then MoIT has funded a major development project to create Urdu lexical resources, Urdu TTS and English-to-Urdu MT at CRULP. The first phase of this three-year project was completed in 2007 and the content and software are to be released with open licensing to trigger further research and development in the academic and commercial sectors. The project has helped create the necessary linguistic resources, trained a team of more than 50 personnel in speech and language processing and is bound to have far-reaching effects on language computing in Pakistan. Smaller projects have also been funded by the PTCL R&D fund (now the National ICT R&D fund), including work on developing Web guidelines for local language content publishing, localization of open source software and developing other language processing applications.

Growing awareness in the government sector, along with significant funding for local language computing programmes and the requirement for local language computing in e-government projects, is creating excitement in the academic and commercial sectors. However, the work is currently limited to Urdu, the national language. It is hoped that other languages will receive the same attention. Public organizations, civil society and academia need to take a more proactive approach to localizing the languages of smaller populations.

Somewhat localized languages

The national languages of countries like Bangladesh, Myanmar and Nepal belong to this category. In these countries there is an emerging realization of the importance of local language computing and focused public policy is starting to develop, integrate and align existing private initiatives. However, there is only limited work on advanced language computing applications.

Countries like Afghanistan, Lao PDR, Cambodia, Mongolia and Bhutan are also starting to develop basic localization standards and applications in their national languages.

Non-localized languages

Of the approximately 3,500 languages spoken in Asia Pacific, only about 30–40 languages are being localized. Small and developing language communities are left out due to very limited capacity to perform indigenous localization and lack of commercial incentives. This problem is especially severe for countries with exceptionally high linguistic diversity, such as Papua New Guinea (820 languages) and Indonesia (737 languages). Localizing these languages will only be possible through long-term policy initiatives and collaborative effort between national, regional and international organizations.
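To put these figures in proportion, the short calculation below works out the share of the region's languages that are being localized. It is a sketch that uses only the approximate numbers quoted above; the constant and function names are hypothetical.

```python
# Approximate figures quoted in the text above.
TOTAL_LANGUAGES = 3500        # languages spoken in Asia Pacific
LOCALIZED_RANGE = (30, 40)    # languages currently being localized


def share(count, total=TOTAL_LANGUAGES):
    """Return count as a percentage of total."""
    return 100 * count / total


low, high = (share(n) for n in LOCALIZED_RANGE)
print(f"{low:.2f}% to {high:.2f}% of the region's languages are being localized")
```

On these numbers, roughly 1 per cent of the languages spoken in the region are being localized.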



 
