Character encoding on remote connections – strange accents

When files are moved between different operating systems, or stored in a common file system such as AFS, you may sometimes find that characters such as ÅÄÖ are shown incorrectly.

A character encoding determines which binary sequence is used to represent each letter or other character. Many different ways to encode text have been used throughout the years. CSC's Unix systems have traditionally used “Latin-1” (ISO-8859-1), which contains the letters used in western European languages. Other operating systems have used other encodings, e.g. “Mac Roman” on Mac OS, “CP-1252” on MS Windows, or “CP-437” on MS DOS. All of these are extensions of ASCII (basically, American letters, digits and punctuation), which means that ASCII characters are displayed correctly whichever of them is used. Accented letters, however, are not encoded the same way in all of them. In particular, the Swedish letters ÅÄÖ may be displayed incorrectly when text written in one encoding is read as another.
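To make the difference concrete, the following Python sketch lists the bytes that several of the encodings mentioned above use for the same character. The codec names (`latin-1`, `cp1252`, `mac_roman`, `cp437`, `utf-8`) are those of Python's standard codecs, and the helper name `byte_table` is made up for this illustration.

```python
# Show how the same character is stored under different encodings.
ENCODINGS = ["latin-1", "cp1252", "mac_roman", "cp437", "utf-8"]


def byte_table(text):
    """Return {encoding name: space-separated hex bytes used for text}."""
    table = {}
    for enc in ENCODINGS:
        raw = text.encode(enc)
        table[enc] = " ".join(f"{b:02x}" for b in raw)
    return table


if __name__ == "__main__":
    for sample in ("A", "Å"):
        print(repr(sample))
        for enc, hexbytes in byte_table(sample).items():
            print(f"  {enc:10} {hexbytes}")
```

The ASCII letter A is stored as the same single byte in every case. Å is a single byte in the older encodings, with a value that differs between Latin-1, Mac Roman and CP-437, and two bytes in UTF-8.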

These days, most operating systems can use UTF-8, a Unicode encoding that can represent all of these characters, but you may need to configure your applications to use it. To do so you choose a locale, which defines many settings specific to a language and region, for example:

  • Number formatting (e.g. using “1 234,5” or “1,234.5”)
  • Date and time formatting
  • String collation (i.e. sort order, so that “ångström” is sorted under A in English but Å in Swedish)

The locale is written as «language»_«region».«encoding», e.g. “en_US.UTF-8” (American English, UTF-8) or “en_GB.ISO8859-1” (British English, Latin-1).
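As an illustration of this naming scheme, here is a minimal Python sketch that splits a locale name into its parts. The function name `parse_locale` is hypothetical, and the sketch deliberately ignores the optional `@modifier` suffix that some locale names carry.

```python
def parse_locale(name):
    """Split a locale name such as 'en_US.UTF-8' into its parts.

    Returns (language, region, encoding); parts that are absent are None.
    """
    # Everything after the first "." is the encoding.
    base, _, encoding = name.partition(".")
    # Before the dot, the language and the region are separated by "_".
    language, _, region = base.partition("_")
    return language, region or None, encoding or None


if __name__ == "__main__":
    for name in ("en_US.UTF-8", "en_GB.ISO8859-1", "C"):
        print(name, "->", parse_locale(name))
```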

Wikipedia's explanation of Latin-1

Wikipedia's explanation of locales

Converting a file

To convert the contents of a file, you can open it in a locale-aware editor and use “save as...” to store it in a different encoding, or use the iconv command-line tool:

iconv -f iso8859-1 -t utf-8 < original.txt > new.txt
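If iconv is not available, the same conversion can be approximated in a few lines of Python. This is only a sketch: the function name `convert_file` and its default arguments are assumptions chosen to mirror the command above, and it does not replicate iconv's options or error handling.

```python
def convert_file(src, dst, from_enc="iso-8859-1", to_enc="utf-8"):
    """Re-encode a text file, similar in spirit to `iconv -f ... -t ...`."""
    # newline="" keeps line endings exactly as they are in the source file.
    with open(src, "r", encoding=from_enc, newline="") as fin, \
            open(dst, "w", encoding=to_enc, newline="") as fout:
        for line in fin:
            fout.write(line)


# Example call, using the same file names as the iconv command:
# convert_file("original.txt", "new.txt")
```

As with iconv, the source encoding has to be known in advance. Any byte sequence is valid Latin-1, so reading a file with the wrong source encoding may silently produce garbled text instead of an error.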

When logging in remotely (with SSH), you can normally configure your SSH client to forward your local locale settings to the server. Unfortunately, not all SSH servers support this. Currently (as of November 2010), CSC's Solaris SSH server does not permit forwarding of environment variables, which is needed for this to work. The relevant locales (en_US.UTF-8, sv_SE.UTF-8) are available on Solaris, and you can set them manually, but they won't be used by default.

Problem: ÅÄÖ shown as ���

Your application uses latin1 characters, but your terminal (or editor) tries to display them as UTF-8. Configure your application to use UTF-8 (see below), or change your terminal settings to use ISO-8859-1.
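This situation can be reproduced in Python by encoding text as Latin-1 and decoding the resulting bytes as UTF-8. The function name `show_latin1_as_utf8` is made up for this sketch, and `errors="replace"` only approximates what a terminal does with bytes it cannot decode.

```python
def show_latin1_as_utf8(text):
    """Encode text as Latin-1, then decode the bytes as if they were UTF-8."""
    raw = text.encode("latin-1")
    # errors="replace" substitutes U+FFFD wherever the bytes are not valid UTF-8.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(show_latin1_as_utf8("ÅÄÖ"))
```

The Latin-1 bytes for ÅÄÖ do not form valid UTF-8, so each of them ends up as one replacement character. Plain ASCII text passes through unchanged.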

Problem: åäö shown as Ã¥Ã¤Ã¶

Your application outputs UTF-8, but the text is displayed as if it were Latin-1. Configure your application to use ISO-8859-1 (see below), or change your terminal settings to use UTF-8.
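The reverse mix-up can be sketched the same way: encode as UTF-8, then decode each byte as a Latin-1 character. The function names `show_utf8_as_latin1` and `repair` are hypothetical. The repair step only works if the garbled text has not been altered further.

```python
def show_utf8_as_latin1(text):
    """Encode text as UTF-8, then decode each byte as a Latin-1 character."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled):
    """Undo show_utf8_as_latin1 by reversing the two steps."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    garbled = show_utf8_as_latin1("åäö")
    print(garbled)
    print(repair(garbled))
```

Each accented letter occupies two bytes in UTF-8 and therefore turns into two Latin-1 characters, the first of which is Ã. For the capitals ÅÄÖ, the second byte falls in a range that Latin-1 reserves for control characters, so it is often not visible at all.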

Problem: ÅÄÖ shown as ï¿½ï¿½ï¿½

Your application is printing U+FFFD, the Unicode replacement character (�, usually displayed as a question mark on an inverted background). Its UTF-8 encoding, which is three bytes long, is then treated as if it were Latin-1 and converted to UTF-8 again, so each replacement character appears as three separate characters. Check the settings for all applications, including the terminal window, to ensure that they all agree on which encoding to use.
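The three-byte claim is easy to check in Python. The sketch below encodes U+FFFD as UTF-8 and then reads the bytes as Latin-1. The names `REPLACEMENT` and `double_encoded_replacement` are made up for this illustration.

```python
REPLACEMENT = "\ufffd"  # the Unicode replacement character


def double_encoded_replacement(count=1):
    """Return what `count` replacement characters look like when their
    UTF-8 bytes are misread as Latin-1."""
    raw = (REPLACEMENT * count).encode("utf-8")  # three bytes per character
    return raw.decode("latin-1")


if __name__ == "__main__":
    print(REPLACEMENT.encode("utf-8"))
    print(double_encoded_replacement(3))
```

Seeing this pattern usually means that two things have gone wrong in sequence: the original letters were already lost to replacement characters, and those were then misread once more.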

Select locale (application settings)

If your application is locale-aware (most are, though some legacy CSC applications are not), you can select the locale by setting the LC_ALL environment variable before you run the application. In bash:

export LC_ALL=en_US.UTF-8

In tcsh:

setenv LC_ALL en_US.UTF-8

To configure only the character encoding, set the LC_CTYPE environment variable instead.
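When several of these variables are set, it helps to know which one wins. The Python sketch below follows the POSIX precedence for the character-encoding category: LC_ALL overrides LC_CTYPE, which overrides LANG. The function name `effective_ctype` is hypothetical, and the sketch only inspects environment variables; it does not check whether the named locale is actually installed.

```python
import os


def effective_ctype(env):
    """Return the locale name that governs character encoding, or None.

    Precedence: LC_ALL, then LC_CTYPE, then LANG. Empty values count as unset.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return None


if __name__ == "__main__":
    print(effective_ctype(os.environ))
```

This is why LC_CTYPE only has an effect while LC_ALL is unset or empty.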

You can also select which locale to use when you log in locally, but this may cause trouble when you use a different operating system. We recommend that you use the default settings and re-configure the applications instead.

Configuring terminal encoding

Ubuntu

The encoding used by Gnome's terminal can be changed under Terminal and then Set Character Encoding, but unless you have previously done so, you first need to add the “Western (ISO-8859-1)” encoding.

Mac OS X

The default setting for Terminal.app is to use UTF-8. This can be changed by going to Terminal then Preferences… then Advanced.

The default for X11.app's xterm is to use latin1. You can change this by editing the startup sequence for X11, but it's easier to just use Terminal.app.

MS Windows

PuTTY's settings can be changed under Window then Translation in the configuration dialog.

CSC's Windows computers currently run SSH Secure Shell from Tectia (formerly SSH Communications Security Corp). It is not UTF-8 aware, and will default to using latin1 encoding.