How to change character encoding of a text file on Linux

Last updated on August 10, 2020 by Gabriel Cánepa

Question: I have an "iso-8859-1"-encoded subtitle file which shows broken characters on my Linux system, and I would like to change its text encoding to "utf-8" character set. In Linux, what is a good tool to convert character encoding in a text file?

As you already know, computers can only handle binary numbers at the lowest level - not characters. When a text file is saved, each character in that file is mapped to bits, and it is those "bits" that are actually stored on disk. When an application later opens that text file, each of those binary numbers are read and mapped back to the original characters that are understood by us human. This "save and open" process is best performed when all applications that need access to a text file "understand" its encoding, meaning the way binary numbers are mapped to characters, and thus can ensure a "round trip" of understandable data.

If different applications do not use the same encoding while dealing with a text file, non-readable characters will be shown wherever special characters are found in the original file. By special characters we mean those that are not part of the English alphabet, such as accented characters (e.g., ñ, á, ü).

The questions then become: 1) how can I know which character encoding a certain text file is using?, and 2) how can I convert it to some other encoding of my choosing?

Step One: Detect Character Encoding of a File

In order to find out the character encoding of a file, we will use a commad-line tool called file. Since the file command is a standard UNIX program, we can expect to find it in all modern Linux distros.

Run the following command:

$ file --mime-encoding filename

Step Two: Find Out Supported Text Encodings

The next step is to check what kinds of text encodings are supported on your Linux system. For this, we will use a tool called iconv with the -l flag (lowercase L), which will list all the currently supported encodings.

$ iconv -l

The iconv utility is part of the the GNU libc libraries, so it is available in all Linux distributions out-of-the-box.

Step Three: Convert Text Encoding

Once we have selected a target encoding among those supported on our Linux system, let's run the following command to perform the conversion:

$ iconv -f old_encoding -t new_encoding filename

For example, to convert iso-8859-1 to utf-8:

$ iconv -f iso-8859-1 -t utf-8 input.txt

Knowing how to use these tools together as we have demonstrated, you can for example fix a broken subtitle file:

Support Xmodulo

This website is made possible by minimal ads and your gracious donation via PayPal or credit card

Please note that this article is published by Xmodulo.com under a Creative Commons Attribution-ShareAlike 3.0 Unported License. If you would like to use the whole or any part of this article, you need to cite this web page at Xmodulo.com as the original source.

Xmodulo © 2021 ‒ AboutWrite for UsFeed ‒ Powered by DigitalOcean