Locales mini-HOWTO
Peeter Joot, peeter_joot@vnet.ibm.com
v1.5, 21 July 1997

This document describes how to set up your Linux machine to use
locales.

1.  Introduction

This is really a description of what I had to do to get localedef
installed, compile some locales, and try them out. I did this just for
fun, and thought that perhaps some people would be interested in trying
it out themselves. Once it is set up you should be able to use NLS
enabled applications with the locale of your choice. After a while,
locale support should be part of the standard distributions, and most
of this mini-HOWTO will be redundant.

2.  What is a "locale" anyhow?

Locales encapsulate some of the language/culture specific things that
you shouldn't hard code in your programs. If you have various locales
installed on your computer then you can select via the following list
of environment variables how a locale sensitive program will behave.
The default locale is the C, or POSIX locale which is hard coded in
libc.

   LANG
      This sets the locale, but can be overridden with any other
      LC_xxxx environment variables

   LC_COLLATE
      Sort order.

   LC_CTYPE
      Character definitions, uppercase, lowercase, ... These are used
      by the functions like toupper, tolower, islower, isdigit, ...

   LC_MONETARY
      Contains the information necessary to format money in the
      fashion expected. It has the definitions of things like the
      thousands separator, decimal separator, and what the monetary
      symbol is and how to position it.

   LC_NUMERIC
      Thousands, and decimal separators, and the numeric grouping
      expected.

   LC_TIME
      How to specify the time, and date. This has the things like the
      days of the week, and months of the year in abbreviated, and non
      abbreviated form.

   LC_MESSAGES
      Yes, and No expressions.

   LC_ALL
      This sets the locale, and overrides any other LC_xxxx
      environment variables.

Here are some other locales, and there are lots more.

   en_CA
      English Canadian.

   en_US
      US English.

   de_DE
      Germany's German.

   fr_FR
      France's French.

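The precedence among the environment variables listed above can be
sketched in C. This is only an illustration and not the libc
implementation: the helper name effective_locale_name is made up for
this example, and it assumes the usual POSIX rule that a variable set
to the empty string counts as unset.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libc): report which locale name the
 * environment selects for one category variable, e.g. "LC_TIME".
 * Precedence: LC_ALL, then the category's own variable, then LANG,
 * and finally the built-in "C" locale.
 * Assumption: an empty value is treated the same as an unset one.
 */
const char *effective_locale_name(const char *category_var)
{
    const char *candidates[3];
    size_t i;

    assert(category_var != NULL);

    candidates[0] = getenv("LC_ALL");      /* overrides everything  */
    candidates[1] = getenv(category_var);  /* e.g. LC_TIME          */
    candidates[2] = getenv("LANG");        /* general fallback      */

    for (i = 0; i < 3; i++) {
        if (candidates[i] != NULL && strlen(candidates[i]) > 0)
            return candidates[i];
    }
    return "C";                            /* hard coded default    */
}
```

For example, with LANG=en_CA and LC_TIME=fr_CA set and no other locale
variables in the environment, effective_locale_name("LC_TIME") gives
"fr_CA" while effective_locale_name("LC_COLLATE") gives "en_CA". This
only mirrors how the name is chosen; whether that locale can actually
be loaded still depends on it being installed.
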
If you are writing a program, and want it to be usable
internationally, you should use locales. The most glaring reason for
this is that not everybody is going to use the same character set/code
page as you. Make sure in your programs that you don't do things like:

  /* check for alphabetic characters */
  if ( (( c >= 'a') && ( c <= 'z' )) ||
       (( c >= 'A') && ( c <= 'Z' )) ) { ... }

If you write that type of code your program assumes that the
user/file/... is ASCII and nothing but ASCII, and it does not respect
the code page definitions of the user's locale. For example it
precludes characters such as a-umlaut which would be used in a German
environment. What you should do instead is use the locale sensitive
functions like isalpha(). If your program does explicitly require use
of only US-ASCII alphabetics, you can still use the isalpha() function,
but you must also either do setlocale(LC_CTYPE,"C") or set the LANG,
LC_CTYPE, or LC_ALL environment variables to "C".

Locales allow a large degree of flexibility and make certain
assumptions that a programmer may have made in ASCII based C programs
invalid. For instance, you cannot assume the code positions of
characters. There is nothing stopping you from creating a charmap file
that defines the code position of 'A' to be 0xC1 rather than 0x41.
0xC1 is in fact the code point for 'A' in IBM code page 37, used on
mainframes, while 0x41 is used for US-ASCII, iso8859-x, and others.

The basic idea is different people speak different languages, expect
different sorting orders, use different code pages, and live in
different countries. Locales and locale sensitive programs give one a
means to respect such things, and handle them accordingly. It is not
really much extra work to do so, it just requires a slightly different
frame of mind when writing programs.

3.  Notes.

  o  In order to set up locales on my machine I had to upgrade a few
     things.
     Apparently ftp.tu-clausthal.de:/pub/linux/SLT/nls contains an
     a.out version of locale and localedef (in the file
     nlsutils-0.5.tar.gz), so if you don't have an ELF system, or
     don't want one, you can use that instead. There is probably a
     copy of the nlsutils package some other place, but I have not
     looked for it. I hadn't known that there was a stand alone
     version of locale and localedef, and kind of figured that you
     would have to have the corresponding libc installed. Because of
     this a lot of this HOWTO is just a log of what I had to do to
     upgrade libc and family. If you do this as I have, you will need
     to be running an ELF system, or upgrade to one as you set up your
     locales.

  o  The sorts of system upgrades that I did are the same sort of
     upgrades that have to be done to upgrade from a.out to ELF. If
     you haven't done this, or if you have upgraded to ELF by
     reinstalling Linux, then you should get the recent ELF HOWTO from
     a sunsite mirror. This is an excellent guide, and gives
     additional guidance for installing libc, ld.so, and other ELF
     system upgrades.

  o  For anything that you install, read the appropriate release
     notes, or README type files. If you mess up your system by
     misinterpreting something that I say here, or ( hopefully not )
     by doing something that I say in here, please don't blame me.

  o  Mis-installing a new libc and ld.so could leave you with an
     unbootable system. You probably ought to have a boot disk handy,
     and make sure any critical, non-replaceable, data is backed up.

4.  What you need.

A few things need to be downloaded from various places. Everything
here except for the locale source files can be obtained from
sunsite.unc.edu, tsx-11.mit.edu, or, preferably, a local mirror of
these sites.

When I did this originally I used libc-5.2.18, which is now quite out
of date. As of now I have been told that the current libc is 5.4.17,
and this substitution has been made below.
However, libc 5.4.17 will likely be old before you can blink, so just
use the latest version when you do this.

You may want to consider using glibc (gnu libc) rather than Linux libc
5 for any internationalization work. As of now glibc 2.0.4 is
available, but no distributions have started using it as the standard
libc yet (at least for Intel based Linux distributions). As well as
being fully reentrant and having built in threading support, glibc is
fully internationalized and has excellent internationalization support
for programming. What internationalization has been done in libc 5 has
been mostly taken from glibc. The locales and charmaps for glibc are
bundled with the glibc locale add on.

If you opt for using glibc then you can ignore this mini-howto.
Including the locale add on in the glibc compilation and installation
is trivial, and is covered in the glibc installation documentation. Be
warned that a full upgrade is not a trivial job! I am hoping that
redhat (which I use) will have a glibc based release soon, as I am not
inclined to recompile my entire system.

  o  locale, and charmap sources --- These are what you compile using
     localedef.

  o  libc-5.4.17.bin.tar.gz --- the ELF shared libraries for the c and
     math libraries. Note that the precompiled program localedef for
     libc 5.4.17 is apparently corrupt and creates LC_CTYPE with an
     invalid magic number. This probably means that an older localedef
     got into the binary distribution.

  o  libc-5.4.17.tar.gz --- the source code for the ELF shared
     libraries. You may need this to compile localedef.

  o  make-3.74.tar.gz --- you may need to compile make to incorporate
     a patch to fix the dirent bug.

  o  release.libc-5.2.18 --- these release notes have the patch for
     make. It's been a while since this make bug happened, and it is
     likely that you don't have to worry about it.

  o  ld.so-1.7.12+ --- the dynamic linker.

  o  ELF gcc-2.7.2+ --- to compile things.

  o  an ELF kernel ( eg. 2.0.xx ) --- to compile things.

  o  binutils 2.6.0.2+ --- to compile things.

There are probably lots of places that you can get locale sources. I
have found public domain locale and charmap sources at
dkuug.dk:/i18n/WG15-collection/locales
<ftp://dkuug.dk/i18n/WG15-collection/locales> and
dkuug.dk:/i18n/WG15-collection/charmaps
<ftp://dkuug.dk/i18n/WG15-collection/charmaps> respectively.

5.  Installing everything.

This is what I did to install everything. I already had an ELF system
( compiler, kernel, ... ) installed before I did this.

  1. First I installed the binutils package.

       tar xzf binutils-2.6.0.2.bin.tar.gz -C /

  2. Next I installed the dynamic linker:

       tar zxf ld.so-1.7.12.tar.gz -C /usr/src
       cd /usr/src/ld.so-1.7.12
       sh instldso.sh

  3. Next I installed the libc binaries. See release.libc-5.4.17 for
     more instructions.

       rm -f /usr/lib/libc.so /usr/lib/libm.so
       rm -f /usr/include/iolibio.h /usr/include/iostdio.h
       rm -f /usr/include/ld_so_config.h /usr/include/localeinfo.h
       rm -rf /usr/include/netinet /usr/include/net /usr/include/pthread
       tar -xzf libc-5.4.17.bin.tar.gz -C /

  4. Now ldconfig must be run to locate the new shared libraries.

       ldconfig -v

  5. There is a bug that was fixed in libc that breaks make, and some
     other programs. Here is what I did in order to rebuild and
     install make.

       tar zxf make-3.74.tar.gz -C /usr/src
       cd /usr/src/make-3.74
       patch < /whereever_you_put_it/release.libc-5.4.17
       ./configure --prefix=/usr
       sh build.sh
       ./make install
       cd ..
       rm -rf make-3.74

  6. Now localedef can be compiled and installed.

       mkdir /usr/src/libc
       tar zxf libc-5.4.17.tar.gz -C /usr/src/libc
       cd /usr/src/libc
       cd include
       ln -s /usr/src/linux/include/asm .
       ln -s /usr/src/linux/include/linux .
       cd ../libc
       ./configure
       # I am not sure if these two makes are necessary, but just to
       # be safe :
       make clean ; make depend
       cd locale
       make programs
       mv localedef /usr/local/bin
       mv locale /usr/local/bin

  7. Put the charmaps where localedef will find them.
     This uses the charmaps and locale sources which I downloaded
     from the dkuug.dk ftp site as charmaps.tar, and locales.tar
     respectively.

     The older localedef (5.2.18) looked in /usr/share/nls/charmap for
     charmap sources, but now localedef looks in
     /usr/share/i18n/charmaps and /usr/share/i18n/locales by default
     for the charmap and locale sources:

       mkdir /usr/share/i18n
       mkdir /usr/share/i18n/charmaps
       mkdir /usr/share/i18n/locales
       tar xf charmaps.tar -C /usr/share/i18n/charmaps
       tar xf locales.tar -C /usr/share/i18n/locales

     The newer localedef (5.4.17) has been made smarter and will look
     for other locale source files when handling the `copy'
     statement, whereas the older localedef needed to have the locale
     objects already created in order to handle the copy statement.
     The list of commands below has the dependencies sorted out, so it
     can be used to generate all the locale objects regardless of
     which libc version is being used. With the newer localedef you
     can also create only the ones that you wish.

       localedef -ci en_DK -f ISO_8859-1:1987 en_DK
       localedef -ci sv_SE -f ISO_8859-1:1987 sv_SE
       localedef -ci fi_FI -f ISO_8859-1:1987 fi_FI
       localedef -ci sv_FI -f ISO_8859-1:1987 sv_FI
       localedef -ci ro_RO -f ISO_8859-1:1987 ro_RO
       localedef -ci pt_PT -f ISO_8859-1:1987 pt_PT
       localedef -ci no_NO -f ISO_8859-1:1987 no_NO
       localedef -ci nl_NL -f ISO_8859-1:1987 nl_NL
       localedef -ci fr_BE -f ISO_8859-1:1987 fr_BE
       localedef -ci nl_BE -f ISO_8859-1:1987 nl_BE
       localedef -ci da_DK -f ISO_8859-1:1987 da_DK
       localedef -ci kl_GL -f ISO_8859-1:1987 kl_GL
       localedef -ci it_IT -f ISO_8859-1:1987 it_IT
       localedef -ci is_IS -f ISO_8859-1:1987 is_IS
       localedef -ci fr_LU -f ISO_8859-1:1987 fr_LU
       localedef -ci fr_FR -f ISO_8859-1:1987 fr_FR
       localedef -ci de_DE -f ISO_8859-1:1987 de_DE
       localedef -ci de_CH -f ISO_8859-1:1987 de_CH
       localedef -ci fr_CH -f ISO_8859-1:1987 fr_CH
       localedef -ci en_CA -f ISO_8859-1:1987 en_CA
       localedef -ci fr_CA -f ISO_8859-1:1987 fr_CA
       localedef -ci fo_FO -f ISO_8859-1:1987 fo_FO
       localedef -ci et_EE -f ISO_8859-1:1987 et_EE
       localedef -ci es_ES -f ISO_8859-1:1987 es_ES
       localedef -ci en_US -f ISO_8859-1:1987 en_US
       localedef -ci en_GB -f ISO_8859-1:1987 en_GB
       localedef -ci en_IE -f ISO_8859-1:1987 en_IE
       localedef -ci de_LU -f ISO_8859-1:1987 de_LU
       localedef -ci de_BE -f ISO_8859-1:1987 de_BE
       localedef -ci de_AT -f ISO_8859-1:1987 de_AT
       localedef -ci sl_SI -f ISO_8859-2:1987 sl_SI
       localedef -ci ru_RU -f ISO_8859-5:1988 ru_RU
       localedef -ci pl_PL -f ISO_8859-2:1987 pl_PL
       localedef -ci lv_LV -f BALTIC lv_LV
       localedef -ci lt_LT -f BALTIC lt_LT
       localedef -ci iw_IL -f ISO_8859-8:1988 iw_IL
       localedef -ci hu_HU -f ISO_8859-2:1987 hu_HU
       localedef -ci hr_HR -f ISO_8859-4:1988 hr_HR
       localedef -ci gr_GR -f ISO_8859-7:1987 gr_GR

6.  Now what.

After doing all the stuff above you should now be able to use the
locales that have been created. Here is a simple example program.

  /* test.c : a simple test to see if the locales can be loaded,
   * and used */
  #include <locale.h>
  #include <stdio.h>
  #include <time.h>

  int main(void)
  {
      time_t t;
      struct tm *now;
      char buf[256];

      time(&t);
      now = gmtime(&t);

      /* take the LC_TIME setting from the environment */
      setlocale(LC_TIME, "");

      /* %c is the locale's preferred date and time format */
      strftime(buf, sizeof buf, "%c", now);
      printf("%s\n", buf);
      return 0;
  }

You can use the locale program to see what your current locale
environment variable settings are.

  $ # compile the simple test program above, and run it with
  $ # some different locale settings
  $ gcc -s -o Test test.c
  $ # see what the current locale is :
  $ locale
  LANG=POSIX
  LC_COLLATE="POSIX"
  LC_CTYPE="POSIX"
  LC_MONETARY="POSIX"
  LC_NUMERIC="POSIX"
  LC_TIME="POSIX"
  LC_MESSAGES="POSIX"
  LC_ALL=
  $ # Ho, hum... we're using the boring C locale
  $ # let's change to English Canadian:
  $ export LC_TIME=en_CA
  $ Test
  Sat 23 Mar 1996 07:51:49 PM
  $ # let's try French Canadian:
  $ export LC_TIME=fr_CA
  $ Test
  sam 23 mar 1996 19:55:27

7.  catopen bug fix.

Installing the locales fixes a bug (feature ?) that is in the catopen
function in Linux libc.
Say you create a program that uses message catalogs, and you create a
German catalog and put it in /home/peeter/catalogs/de_DE. Now upon
doing the following, without the de_DE locale installed :

  export LC_MESSAGES=de_DE
  export NLSPATH=/home/peeter/catalogs/%L/%N.cat:$NLSPATH

the German message catalog does not get opened, and the default
messages in the catgets calls are used.

This is because catopen does a setlocale call to get the right message
category, and the setlocale fails even though the environment variable
has been set. catopen then attempts to load the message catalog
substituting "C" for all the "%L"'s in the NLSPATH.

You can still use your message catalog without installing the locale,
but you would have to explicitly set the "%L" part of the NLSPATH like

  export NLSPATH=/home/peeter/catalogs/de_DE/%N.cat:$NLSPATH

, but this defeats the whole purpose of the locale category
environment variables.

8.  Questions and Answers.

This section could grow into a FAQ, but isn't really one yet.

8.1.  msgcat question

I am a user of LINUX, and have written the following test program:

  --------------------------------------------------------------------
  #include <stdio.h>
  #include <locale.h>
  #include <features.h>
  #include <nl_types.h>

  main(int argc, char ** argv)
  {
      nl_catd catd;

      setlocale(LC_MESSAGES, "");
      catd = catopen("msg", MCLoadBySet);
      fprintf(stderr,catgets(catd, 1, 1, "locale message fail\n"));
      catclose(catd);
  }
  --------------------------------------------------------------------
  $ msg.m
  $set 1
  1 locale message pass\n
  --------------------------------------------------------------------

If I use an absolute path in catopen, like
catopen("/etc/locale/msg.cat",MCLoadBySet); , I get the right result.
But if I use the above example, catopen returns -1 (failure).

8.2.  msgcat answer

This question is sort of answered in the previous section, but here is
some additional information.

There are a number of valid places where you can put your message
catalogs.
Even though you may not have NLSPATH explicitly defined in your
environment settings, it is defined in libc as follows (the output is
a single line, wrapped here for width) :

  $ strings /lib/libc.so.5.4.17 | grep locale | grep %L
  /etc/locale/%L/%N.cat:/usr/lib/locale/%L/%N.cat:/usr
  /lib/locale/%N/%L:/usr/share/locale/%L/%N.cat:/usr/
  local/share/locale/%L/%N.cat

so if you have done one of :

  $ export LC_MESSAGES=en_CA
  $ export LC_ALL=en_CA
  $ export LANG=en_CA

then, with the NLSPATH above and the specified environment, the
catopen("msg", MCLoadBySet); should work if your message catalog has
been copied to any one of :

  /etc/locale/en_CA/msg.cat
  /usr/lib/locale/en_CA/msg.cat
  /usr/lib/locale/msg/en_CA
  /usr/share/locale/en_CA/msg.cat
  /usr/local/share/locale/en_CA/msg.cat

This, however, will not work if you don't have the en_CA locale
installed because the setlocale will fail, and "C" will be substituted
for "%L" in the catopen routine ( rather than "en_CA" ).

9.  More information.

Well that's it. Hopefully this guide has been some help to you. There
are probably lots of places that you can look for additional
information on writing locale sensitive programs, and documents on
internationalization, and localization in general. I'll bet that if
you browse the web a bit you will be able to find a lot of info.

Ulrich Drepper, who implemented much of the gnu internationalization
code, has some information about internationalization and localization
on his home page <http://i44www.info.uni-karlsruhe.de/~drepper>, and
you can look there to start.

There is also some information in the info pages for libc, and of
course, there are always man pages.
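
As a postscript to the NLSPATH discussion in section 8.2, the "%L" and
"%N" substitution can be sketched in C. This is a rough illustration
only, not the code that catopen actually uses: the function name
expand_nlspath_entry is hypothetical, it handles a single
colon-separated entry, and it assumes only the %L and %N sequences
need expanding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a libc function): expand one NLSPATH entry
 * by replacing %L with the locale name and %N with the catalog name.
 * Assumption: any other character, including other % sequences, is
 * copied through unchanged.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int expand_nlspath_entry(const char *entry, const char *locale,
                         const char *name, char *out, size_t outsz)
{
    size_t used = 0;

    assert(entry != NULL && locale != NULL && name != NULL && out != NULL);

    while (*entry != '\0') {
        const char *piece = entry;   /* text to append this round   */
        size_t len = 1;              /* default: one plain character */

        if (entry[0] == '%' && entry[1] == 'L') {
            piece = locale;
            len = strlen(locale);
            entry += 2;
        } else if (entry[0] == '%' && entry[1] == 'N') {
            piece = name;
            len = strlen(name);
            entry += 2;
        } else {
            entry += 1;
        }

        if (used + len >= outsz)     /* keep room for the final '\0' */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }

    if (used >= outsz)               /* covers a zero-sized buffer   */
        return -1;
    out[used] = '\0';
    return 0;
}
```

Expanding "/etc/locale/%L/%N.cat" with locale "en_CA" and catalog name
"msg" gives /etc/locale/en_CA/msg.cat, the first path in the list
above. Passing "C" as the locale gives /etc/locale/C/msg.cat, which is
where the search ends up when the setlocale call fails.
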