Localization in Asia Pacific
Notes
1 The Summer Institute of Linguistics reports a total count of 6,912 languages at www.ethnologue.com.
2 These terms are normally abbreviated to their first and last letters, with the count of the intervening letters placed between them, giving I18n, L10n and G11n.
3 For example, the Mongolian government recently decided to adopt Cyrillic script for writing the Mongolian language, abolishing the traditional Mongolian script.
4 The debate is normally between at least two groups: those who would like the language to remain 'pure' and those who want to adapt the language to 'simplify' its use.
5 For example, 'A' is assigned a code of 65, 'B' a code of 66, and so on, in both ASCII and Unicode encoding.
6 Arbitrary assignment has been the traditional way of encoding languages across Asia. Multiple encodings exist for languages across Asia Pacific because each vendor has developed its own assignment (see Hussain et al. 2005).
7 The Unicode Standard is equivalent to the ISO/IEC 10646 standard and is co-managed by the Unicode Consortium and a dedicated Working Group of a Sub-Committee of the Joint Technical Committee of the International Organization for Standardization (ISO/IEC JTC1/SC2/WG2).
8 This process can take more than a year. It would normally take about six months for the relevant ISO and Unicode committees to evaluate a proposal and, if the characters are approved and included in the standard, another six months for vendors to support them in their technology.
9 Most operating systems allow users to define their own variation of the keyboard layout. 'Phonetic keyboard layouts', which place the character that sounds like [p] on the 'P' key of the regular QWERTY layout, and so on for other sounds, are also popular among regular computer users.
10 OTF is an open standard jointly developed by Adobe and Microsoft. There are also other formalisms, including Apple Advanced Typography (AAT), PostScript, etc. OTF is still an evolving standard, although it can now support the variety in Asian scripts fairly well.
11 POS indicates whether the word is a noun, verb, adjective, adverb, etc.
12 Advanced applications may require as many as 10 such tags for each word. The following illustrates a sample entry: 'Boy: Common_Noun, Singular, Masculine, Human, Animate.'
13 Machine Learning is a branch of Artificial Intelligence in which a large amount of data is used to automatically train models to predict certain properties of unseen/new data.
14 The first syllable is stressed if the word is a noun, and the second syllable is stressed if it is a verb.
15 The estimates vary depending on the source and target language pair, the expertise of available linguists and computational linguists, and the techniques used. This estimate assumes the availability of trained linguists and computational linguists. Systems may be developed within a shorter duration using statistical techniques.
16 Administered through the support of IANA and the Regional Internet Registries (RIRs), for example, APNIC for Asia Pacific.
17 Usage depends on the licensing schemes. Most software is available through a standard or limited version of the GNU General Public License (GPL).
18 Further discussion of mobile applications is available in the chapter on Mobile and Wireless Technologies in this volume.
19 It has been alleged that there is a backdoor to the security of Microsoft Windows through _NSAKey. Refer to http://en.wikipedia.org/wiki/NSAKEY, http://www.techweb.com/wire/story/TWB19990906S0003 and http://www.cnn.com/TECH/computing/9909/03/windows.nsa.02/ for related discussions.
20 Although the Linux platform is freely available, there is a cost for installation and maintenance.
21 Toolkits like Festival and MBROLA for TTS, HTK and Sphinx for ASR, and XLE for MT are being developed and made available by academic and other organizations across the world, especially for non-commercial use.
22 These estimates are based on the availability of human resources with a reasonable level of experience in localization work. If such a pool of human resources is not available, more time and/or funds may be required.
23 The table lists a comparison for some of the applications. The comparison is qualitative, not quantitative, and is based on the current information available to the authors through the Internet and other sources (e.g. Sonlertlamvanich 2002; Tsujii 2005; Hussain et al. 2005). The information has not been independently verified and therefore has some margin of error.
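
The abbreviation rule described in note 2 and the character codes mentioned in note 5 can be checked with a minimal Python sketch. The function name `numeronym` is an assumption introduced for this illustration and does not appear in the source; `ord` and `chr` are Python built-ins that return a character's Unicode code point and the character for a given code point, respectively.

```python
def numeronym(term: str) -> str:
    """Abbreviate a term as first letter + count of middle letters + last letter.

    Hypothetical helper for illustration only; terms shorter than three
    letters have no middle letters to count and are returned unchanged.
    """
    if len(term) < 3:
        return term
    return f"{term[0]}{len(term) - 2}{term[-1]}"


# Note 2: the three terms and their customary abbreviations.
for term in ("Internationalization", "Localization", "Globalization"):
    print(term, "->", numeronym(term))

# Note 5: 'A' and 'B' carry the same codes in ASCII and in Unicode.
print(ord("A"), ord("B"))
print(chr(65))
```

Because the first 128 Unicode code points coincide with ASCII, the same numeric codes apply in both encodings for these letters.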