107.3 Localisation and Internationalisation

Weight: 3

Goal: Configure the locale and time zone settings on a Linux system.


Term | Means
Internationalisation (i18n) | Designing software so it can support multiple languages, character sets, and regional formats.
Localisation (l10n) | The act of adapting a system to a specific language, country, and conventions.

In practice, the exam is about three things you actually configure on a Linux system:

  1. The time zone.
  2. The locale (language, character encoding, formatting rules).
  3. The character set (UTF-8 vs. older encodings).

2. Time Zone

The hardware clock vs. the system clock

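A Linux machine has two clocks. The hardware clock (also called the real-time clock, RTC) is the battery-backed clock on the motherboard and is read at boot. The system clock is maintained by the kernel while the system is running.

The hardware clock can be set to either UTC (recommended for Linux-only systems) or local time (common when dual-booting Windows). The kernel keeps the system clock in UTC, and programs convert it to local time using the time zone setting.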

Time zone files

Path | Purpose
/usr/share/zoneinfo/ | Binary database of all time zones, organized as Region/City (e.g. Europe/Paris, America/New_York, Asia/Tokyo, UTC).
/etc/localtime | The active time zone: a copy of, or a symlink to, a file under /usr/share/zoneinfo/. Programs read this to know the local time zone.
/etc/timezone | A plain text file containing the name of the time zone, e.g. Europe/Paris. Used on Debian-based systems; not present on all distributions.
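
To see which form /etc/localtime takes on a given machine, a small check along these lines can help. It is a minimal sketch: the function name localtime_kind is illustrative, and the optional path argument exists only so the function can be tried on other files.

```shell
# Report whether a localtime file is a symlink or a regular copy.
# Usage: localtime_kind [path]   (path defaults to /etc/localtime)
localtime_kind() {
    path="${1:-/etc/localtime}"
    if [ -L "$path" ]; then
        # A symlink: print the zone file it points to.
        printf 'symlink -> %s\n' "$(readlink "$path")"
    elif [ -f "$path" ]; then
        # A regular file: the zone data was copied into place.
        printf 'regular file (copy)\n'
    else
        printf 'missing\n'
    fi
}

localtime_kind
```

When the file is a symlink, the target path usually reveals the zone name directly (the part after /usr/share/zoneinfo/).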

Setting the time zone

Method 1 — modern (systemd):

timedatectl list-timezones                  # browse available zones
timedatectl set-timezone Europe/Paris       # set the system time zone
timedatectl                                 # show current settings

Method 2 — manual (works everywhere):

ln -sf /usr/share/zoneinfo/Europe/Paris /etc/localtime
echo "Europe/Paris" > /etc/timezone         # Debian/Ubuntu

Method 3 — interactive (Debian/Ubuntu):

dpkg-reconfigure tzdata

The TZ environment variable

Setting TZ overrides the system time zone for a single command or, once exported, for the current shell and the processes it starts:

$ date
Mon May 11 14:00:00 CEST 2026
$ TZ=Asia/Tokyo date
Mon May 11 21:00:00 JST 2026
$ export TZ=UTC               # affects this shell from now on

Useful for scripts that need to print times in a specific zone without changing the whole system.
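
As a sketch of that use, the loop below prints one fixed instant in several zones without touching the system setting. It assumes GNU date (for the -d @SECONDS syntax) and an installed time zone database; the helper name show_in_zone and the chosen instant (1778500800 seconds since the epoch, i.e. 2026-05-11 12:00:00 UTC) are illustrative.

```shell
# One fixed instant, expressed as seconds since the Unix epoch.
instant=1778500800

# Print the instant in the zone given as $1.
# The TZ assignment applies to this single date invocation only.
show_in_zone() {
    TZ="$1" date -d "@$instant" '+%Y-%m-%d %H:%M:%S %Z'
}

for zone in UTC Europe/Paris Asia/Tokyo; do
    printf '%-14s %s\n' "$zone" "$(show_in_zone "$zone")"
done
```

With the zone database present, the three lines show 12:00:00 UTC, 14:00:00 CEST, and 21:00:00 JST for the same instant.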

Command | Purpose
date | Show or set the system clock. date, date +%Y-%m-%d, date -u (UTC).
hwclock | Show or set the hardware clock. hwclock --show, hwclock --systohc (copy system clock to hardware clock), hwclock --hctosys (copy hardware clock to system clock).
timedatectl | The systemd tool to manage time, time zone, and NTP synchronization.

3. Locale: Language and Regional Formatting

A locale controls how programs format dates and numbers, how they sort strings, which language they use for messages, and which character encoding they assume.

Locale name format

language[_TERRITORY][.codeset][@modifier]

Examples:

Locale | Meaning
C | The default “raw” POSIX locale. English, ASCII, no formatting frills. Scripts that need predictable output should set LC_ALL=C.
POSIX | Synonym of C.
en_US.UTF-8 | American English, UTF-8 encoding.
en_GB.UTF-8 | British English, UTF-8 encoding.
fr_FR.UTF-8 | French (France), UTF-8.
de_DE.UTF-8 | German (Germany), UTF-8.
ja_JP.UTF-8 | Japanese, UTF-8.
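
The name format can be taken apart with plain shell parameter expansion. The following is a minimal sketch, not a validator: the function name parse_locale is illustrative, and it only splits the string on the @, . and _ separators from the format above.

```shell
# Split a locale name of the form language[_TERRITORY][.codeset][@modifier].
parse_locale() {
    name="$1"
    territory=""; codeset=""; modifier=""
    # Peel the optional parts off from right to left.
    case "$name" in *@*) modifier="${name#*@}";  name="${name%%@*}" ;; esac
    case "$name" in *.*) codeset="${name#*.}";   name="${name%%.*}" ;; esac
    case "$name" in *_*) territory="${name#*_}"; name="${name%%_*}" ;; esac
    printf 'language=%s territory=%s codeset=%s modifier=%s\n' \
        "$name" "$territory" "$codeset" "$modifier"
}

parse_locale de_DE.UTF-8@euro   # language=de territory=DE codeset=UTF-8 modifier=euro
parse_locale fr_FR.UTF-8        # language=fr territory=FR codeset=UTF-8 modifier=
```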

Locale environment variables

A locale isn’t one setting — it’s a set of variables, one per category. You can set them all at once with LANG, override one or more with LC_*, or override everything with LC_ALL.

Variable | Controls
LANG | Default value for all LC_* variables that aren’t set explicitly.
LC_CTYPE | Character classification and encoding.
LC_COLLATE | String sorting order.
LC_TIME | Date and time formatting.
LC_NUMERIC | Decimal separator and thousands grouping.
LC_MONETARY | Currency formatting.
LC_MESSAGES | Language of program messages.
LC_PAPER, LC_NAME, LC_ADDRESS, LC_TELEPHONE, LC_MEASUREMENT, LC_IDENTIFICATION | Paper size, name/address formats, units, etc.
LC_ALL | Overrides every LC_* and LANG. Use sparingly, only for forcing a locale in a script.

Precedence

LC_ALL    >    LC_*    >    LANG    >    "C"  (default if nothing set)

So if LC_ALL=C is set, LANG and every individual LC_* variable are ignored.
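
The lookup order can be written out as a small shell function. This is only an illustration of the precedence rule, not how the C library implements it; the name effective_locale is hypothetical, and an empty variable is treated the same as an unset one.

```shell
# Print the locale that applies to one category (e.g. LC_TIME),
# following LC_ALL > LC_<category> > LANG > "C".
effective_locale() {
    category="$1"
    # Read the variable whose name is stored in $category.
    eval "specific=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        echo "$LC_ALL"
    elif [ -n "$specific" ]; then
        echo "$specific"
    elif [ -n "${LANG:-}" ]; then
        echo "$LANG"
    else
        echo "C"
    fi
}

# Subshells keep these trial values away from the real environment.
( LC_ALL=;  LANG=fr_FR.UTF-8; LC_TIME=en_GB.UTF-8; effective_locale LC_TIME )   # en_GB.UTF-8
( LC_ALL=C; LANG=fr_FR.UTF-8; LC_TIME=en_GB.UTF-8; effective_locale LC_TIME )   # C
```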

The locale command

locale                  # show all current locale variables
locale -a               # list all locales installed on the system
locale -m               # list available character maps (UTF-8, ISO-8859-1, ...)
LANG=fr_FR.UTF-8 date   # run one command in a different locale (only if LC_ALL and LC_TIME are unset)
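
Before switching to a locale it is worth checking that it is actually installed. The helper below is a sketch built on locale -a; the name has_locale is illustrative. It compares case-insensitively and ignores hyphens because locale -a commonly prints the codeset as utf8 rather than UTF-8.

```shell
# Succeed if the locale named in $1 appears in the output of `locale -a`.
has_locale() {
    # Normalise both sides: drop hyphens and fold to lower case.
    wanted=$(printf '%s\n' "$1" | tr -d '-' | tr '[:upper:]' '[:lower:]')
    locale -a 2>/dev/null | tr -d '-' | tr '[:upper:]' '[:lower:]' | grep -qxF "$wanted"
}

if has_locale fr_FR.UTF-8; then
    echo "fr_FR.UTF-8 is available"
else
    echo "fr_FR.UTF-8 must be generated or installed first"
fi
```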

Configuration files

File | Purpose
/etc/locale.conf | System-wide default locale (modern systemd-based systems, e.g. RHEL, Fedora, Arch). Usually contains LANG=en_US.UTF-8.
/etc/default/locale | Same role on Debian/Ubuntu.
/etc/locale.gen | List of locales to be generated (Debian/Ubuntu). Lines starting with # are skipped.
~/.profile, ~/.bash_profile | A user can set LANG or LC_* here to override the system default for themselves.
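
Because the system-wide file differs between distribution families, a script that wants the configured default can try each candidate in turn. A minimal sketch, assuming the files contain simple LANG=value lines (optionally double-quoted); the function name system_lang is illustrative.

```shell
# Print the LANG value from the first readable file among the arguments.
# Returns 1 if none of the files can be read.
system_lang() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        # Take the first LANG= line and strip optional double quotes.
        sed -n 's/^LANG=//p' "$f" | tr -d '"' | head -n 1
        return 0
    done
    return 1
}

system_lang /etc/locale.conf /etc/default/locale || echo "no system locale file found"
```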

Generating and switching locales

# Debian / Ubuntu
sudo dpkg-reconfigure locales              # interactive menu
sudo locale-gen en_US.UTF-8 fr_FR.UTF-8    # generate the listed locales (Ubuntu)
sudo locale-gen                            # Debian: generate the locales uncommented in /etc/locale.gen
sudo update-locale LANG=fr_FR.UTF-8        # set system default

# Modern systemd
sudo localectl list-locales
sudo localectl set-locale LANG=fr_FR.UTF-8

# RHEL / Fedora — install the language pack first
sudo dnf install glibc-langpack-fr

4. Character Encoding

A character encoding maps characters to the byte sequences used to store them on disk or send them over a network, and back again when they are displayed.

Key encodings to know

Encoding | Notes
ASCII | The original 7-bit encoding. 128 characters (English letters, digits, basic punctuation). The universal baseline.
ISO-8859-1 (Latin-1) | 8-bit single-byte encoding for Western European languages. One byte = one character.
ISO-8859-15 | Like ISO-8859-1 but includes the euro sign (€).
Unicode | A standard that assigns a unique number (“code point”) to every character of every script in the world.
UTF-8 | The dominant Unicode encoding on Linux. Variable length: 1 byte for ASCII, up to 4 bytes for other characters. ASCII-compatible.
UTF-16, UTF-32 | Other Unicode encodings. Rare on Linux.

UTF-8 is the default encoding on modern Linux distributions, and new installations use locales ending in .UTF-8.
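
The variable-length property of UTF-8 is easy to observe by counting bytes. The sketch below assumes the script itself is saved as UTF-8; the helper name byte_len is illustrative.

```shell
# Print the number of bytes used to encode the string in $1.
byte_len() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

byte_len 'A'     # 1  (ASCII)
byte_len 'é'     # 2
byte_len '€'     # 3
byte_len '😀'    # 4
```

wc -c counts bytes, not characters, which is why each single character above reports a different length.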

Inspecting and converting files

file mydata.txt                   # guesses encoding
# mydata.txt: UTF-8 Unicode text

iconv -f ISO-8859-1 -t UTF-8 old.txt -o new.txt
# Convert a file from Latin-1 to UTF-8.

iconv -l                          # list all encodings iconv knows

iconv is the standard tool for converting between encodings.
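
To see what the conversion does at the byte level, the sketch below feeds a Latin-1 string through iconv and dumps the result in hex. The wrapper name latin1_to_utf8 is illustrative; the input is built with printf octal escapes so that it does not depend on the encoding of the script file.

```shell
# Convert ISO-8859-1 text on standard input to UTF-8 on standard output.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# "café" in Latin-1: the é is the single byte 0xE9 (octal 351).
printf 'caf\351' | od -An -tx1                    # 63 61 66 e9
printf 'caf\351' | latin1_to_utf8 | od -An -tx1   # 63 61 66 c3 a9
```

The ASCII letters are unchanged, while the single Latin-1 byte for é becomes the two-byte UTF-8 sequence c3 a9.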


5. Practical Scenarios

Force English output for predictable script parsing

Many sysadmin scripts begin with:

export LC_ALL=C

This makes date, grep, sort, etc. produce output in plain ASCII English, regardless of the user’s settings. Crucial when parsing command output.
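
A small illustration of the effect, assuming GNU date for the -d @SECONDS syntax: with LC_ALL=C exported, the weekday name is always the English one, so a script can compare against it safely.

```shell
export LC_ALL=C

# Second 0 of the Unix epoch (1970-01-01 UTC) fell on a Thursday.
weekday=$(date -u -d @0 +%A)
echo "$weekday"    # Thursday

if [ "$weekday" = "Thursday" ]; then
    echo "parsed as expected"
fi
```

Without the export, the same command run under a non-English locale would print the weekday in that language and the comparison would fail.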

Run one command in another language

$ LANG=de_DE.UTF-8 date
Mo 11 Mai 2026 14:00:00 CEST

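Sort in strict byte order

$ LC_COLLATE=C sort file.txt        # strict byte order

The collation rules of a language locale typically group accented characters with their base letter and interleave upper and lower case, which is sometimes desirable and sometimes not. LC_COLLATE=C compares raw byte values instead, so uppercase letters sort before lowercase ones and accented characters sort after all ASCII letters. LC_COLLATE has no effect if LC_ALL is set.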
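
The byte-order behaviour can be checked with a few sample words. This sketch uses LC_ALL=C rather than LC_COLLATE=C so that the result holds even when LC_ALL is already set in the environment; the helper name byte_sort is illustrative, and the accented word is written with octal escapes for the UTF-8 bytes of é.

```shell
# Sort standard input by raw byte value.
byte_sort() {
    LC_ALL=C sort
}

# Input order: zebra, éclair, eclair, Apple
printf 'zebra\n\303\251clair\neclair\nApple\n' | byte_sort
```

The output order is Apple, eclair, zebra, éclair: the uppercase A (0x41) comes first, and éclair comes last because its first byte (0xC3) is larger than any ASCII letter.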


6. Quick Reference for the Exam

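Files and directories: /usr/share/zoneinfo/, /etc/localtime, /etc/timezone, /etc/locale.conf, /etc/default/locale, /etc/locale.gen.

Environment variables: TZ, LANG, LC_ALL, LC_CTYPE, LC_COLLATE, LC_TIME, LC_NUMERIC, LC_MONETARY, LC_MESSAGES.

Commands: date, hwclock, timedatectl, locale, localectl, locale-gen, update-locale, dpkg-reconfigure, iconv, file.

Encodings to recognize: ASCII, ISO-8859-1, ISO-8859-15, Unicode, UTF-8, UTF-16, UTF-32.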


7. Likely Exam Questions (Self-Check)

  1. Where is the binary time zone database stored? /usr/share/zoneinfo/.

  2. What two things can /etc/localtime be? Either a symlink to, or a copy of, a file under /usr/share/zoneinfo/.

  3. How do you set the system time zone to Europe/Paris using systemd? timedatectl set-timezone Europe/Paris.

  4. How do you override the time zone for a single command, without changing the system setting? Use the TZ environment variable, e.g. TZ=Asia/Tokyo date.

  5. Which environment variable overrides all other locale settings? LC_ALL.

  6. What is the precedence order between LANG, LC_TIME, and LC_ALL? LC_ALL > LC_TIME > LANG.

  7. How do you list all locales installed on the system? locale -a.

  8. What does setting LC_ALL=C do in a script, and why is it useful? It forces a plain POSIX/ASCII English locale. Output of commands becomes predictable, which is essential when scripts parse it.

  9. Which Unicode encoding is the de facto standard on Linux? UTF-8.

  10. How do you convert a file from ISO-8859-1 to UTF-8? iconv -f ISO-8859-1 -t UTF-8 input.txt -o output.txt.

  11. On Debian, how do you enable and generate a new locale (e.g. fr_FR.UTF-8)? Uncomment it in /etc/locale.gen, then run locale-gen (or use dpkg-reconfigure locales).

  12. What is the difference between the hardware clock and the system clock? The hardware clock is the battery-backed clock on the motherboard, read at boot. The system clock is maintained by the kernel while the system is running.

  13. What command synchronizes the hardware clock from the system clock? hwclock --systohc.

  14. A user reports date output in French. They want English just for their shell. What do they do? Add export LANG=en_US.UTF-8 (or export LC_ALL=en_US.UTF-8) to their ~/.bashrc or ~/.profile.

  15. What is the practical difference between ASCII and UTF-8? ASCII has 128 single-byte characters. UTF-8 is a variable-length Unicode encoding that is fully backward-compatible with ASCII but can also represent every character in the Unicode standard.