Policy considerations for localization in Asia Pacific
The goal of localization is to enable communities to share and exchange information through ICTs. Achieving this goal requires planning and executing a strategy that addresses the entire spectrum of associated issues. This section presents considerations and recommendations for national, regional and international organizations planning the development of language technology, especially in the Asia Pacific context.
Majority vs. minority languages
National localization planning must strike a balance between the requirements of majority and minority language communities. If policy prioritizes localization by speaking population alone, minority languages may not be addressed. Because these languages offer little incentive for commercial interests, more rigorous criteria that take additional demographic and social factors into account need to be developed so that minority languages are included in localization. Effective planning might even help preserve the linguistic diversity of the region and protect endangered languages.
Breadth vs. depth of localization
Because most Asia Pacific countries have many spoken languages, allocating resources is a tricky task. Should many languages receive basic localization, or should fewer languages receive more in-depth, advanced application development? If the focus remains only on basic localization because of the sheer number of languages, advanced applications might never be developed, even though they are needed to provide access to information for a large part of the region's population. On the other hand, if only advanced applications are pursued, only a limited number of languages may be localized, because advanced applications take much longer to develop.
Again, a complex socio-economic balance must be struck to determine the right formula for each national context.
Human resource training
In most Asia Pacific countries, there is very limited linguistic and technical capacity to develop standards, perform linguistic analysis and create language technology, so training and human resource planning are critical. Depending on the choice of applications and languages, expertise may be required in various branches of linguistics (phonetics, phonology, morphology, syntax, semantics and pragmatics), signal and speech processing, image processing, statistics, computational linguistics and advanced computing. Training for basic localization work could take about six months. Developing advanced applications requires experienced linguists and computational linguists, and dedicated training over many years. To address national needs and keep the training process sustainable, universities should develop diploma and degree programmes in speech, script and language processing through collaboration among their linguistics, computer science and engineering departments. Scholarships for study abroad in these areas can also help accelerate the process. Regional and international cooperation can play a significant role in these efforts.
The best way to build capacity is to involve the technical development staff in actual hands-on localization work. This can be achieved by national and regional organizations funding language computing projects (see the case study on the PAN Localization Project below). Governments can also build momentum for localization by raising awareness of local language computing and by generating market demand, for example by requiring public information to be localized through e-governance initiatives. Regional organizations can organize national and regional training courses and seminars. Two recent initiatives are the Summer School in Asian Language Processing, organized in 2006 by the PAN Localization project, and Asian Applied Natural Language Processing for Linguistics Diversity and Language Resource Development (ADD), organized by the Thai Computational Laboratory.
Partnerships and resource sharing
Localizing each language independently is redundant and usually expensive. A better model is to reuse the same basic technology across different languages, a principle on which most open source software works. Innovative mechanisms must be put in place to share content, training and other localization work. Regional and international organizations must play a significant role here by funding channels through which research, training, resources and best practices can be shared across nations. Many such initiatives are developing in the region, such as the AFNLP, International Open Source Network (IOSN), Asia Open Source Software (AOSS) and Asia Commons, which are nongovernmental organizations. Many other technology frameworks are also available or under development in universities and other organizations across the world.
As discussed, many different licensing regimes are possible for both the software and the content being produced. As far as possible, open licensing should be adopted to propagate work in local language computing. Open source licenses, such as the GPL, MIT and BSD licenses, allow software to be distributed for non-profit as well as commercial purposes (cf. Chen 2006). Content should likewise be released under open licenses (for example, Creative Commons) for convenient access. In addition, effective channels are needed to share content and training curricula, perhaps using models similar to the Wikipedia and SourceForge initiatives.
Because effective coordination cannot be achieved through virtual communities alone, face-to-face networking is also needed. Regional and international organizations dedicated to social development through ICTs need to play an active leadership role in this regard. For example, the Free and Open Source Software in Asia Pacific (FOSSAP) forum by IOSN has been discussing software licensing, and Asia Commons has started addressing content licensing.
A very important aspect of localization is the choice of computing platform. Both proprietary and open source platforms exist and are currently in use. For end-users in Asia Pacific, the prevalent platforms include Microsoft Windows, the Java Virtual Machine (JVM or Java) and varieties of Linux (for example, Red Hat and Debian). Windows is a proprietary platform that is not free of cost and raises some security concerns.19 Java is a virtual platform and must be installed on a host platform such as Microsoft Windows or Linux. Linux is open source and free of cost.20
However, the choice is not as straightforward as it may seem. Although Windows is proprietary, closed and vulnerable to security threats, it remains the most widely used platform, and its plug-and-play hardware installation makes it very convenient for end-users. Linux requires more expertise to use and is harder to manage and maintain, given the limited administrative and management capacity currently available. Deciding which platform to target for localization is therefore complex. For languages already supported by Microsoft products, Windows may be the more viable short-term solution, while Linux may become viable in the longer term, once more people have been trained to maintain Linux-based systems. For languages not currently supported by Windows, open source platforms may be the only solution, as Microsoft's localization plans may not align with national priorities.
With the growing need and demand for multilingual computing, standardization activity has increased. Because the tasks are urgent and numerous, participating organizations across the world meet very frequently and issue public requests for comments on the developing standards. However, lack of expertise and resources makes it difficult for many developing countries in Asia Pacific to participate in these discussions. Unfortunately, these standards organizations always treat lack of participation as tacit approval.
From an academic point of view, assuming approval in the absence of comment is not always the best strategy for developing standards, despite the operational ease of this process. When multilingual standards are finalized without indigenous feedback, problems are bound to arise (for example, as reported for the Khmer Unicode page), especially once computing in these languages catches up with the newer standards. Standardization must therefore be proactive from both ends. National bodies must try to participate actively in the process, and standards development organizations should have programmes to train participants from different countries and to seek their feedback proactively before finalizing multilingual standards. This requires significant financial investment that must be raised sustainably. For example, the Asian Forum for Standardization of Information Technology (AFSIT) and associated programmes by CICC have contributed significantly to multilingual computing and related standardization training. Such efforts must continue in the future.
Translation of policy into projects
National policy alone will not ensure the development of local language computing. The policy must be translated into action plans, which in turn must be carried out as projects with explicit funding allocations. The first step would be to form a national committee of experts to discuss and finalize basic standards. Once standards are in place, basic localization for a language is possible for as little as USD 200,000 within one to two years. Developing a complete set of advanced applications would require considerably more effort and time (about three to five years to develop functional models and about a decade for them to mature), even when using existing software toolkits.21 Building a complete suite of language technology for a single language could cost more than USD 5 million.22 Basic localization may be undertaken by the private sector. However, because there are few commercial incentives for advanced applications in developing countries, these are likely to be developed only with explicit support and funding from governments and other organizations.