Converting text file code pages

I've said "use Unicode" a lot, but sometimes a program doesn't do what you'd expect and writes its output in a different code page.  You might also run into a text file that was created using the system code page of a different machine.  (If someone emailed me a .txt file from a Russian computer, for example, I wouldn't necessarily be able to make sense of it at first.)
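To make the "Russian computer" case concrete, here is a minimal sketch of what goes wrong when bytes written in one code page are decoded with another. The class and method names (MojibakeDemo, Reinterpret) are made up for illustration, and the sketch assumes a modern .NET runtime (.NET Core 3.0 or later), where Windows code pages have to be registered before use; on the classic .NET Framework they are available without that step.

```csharp
using System;
using System.Text;

class MojibakeDemo
{
    static MojibakeDemo()
    {
        // Assumption: modern .NET. Windows code pages such as 1251 and 1252
        // are only available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encodes text with one code page, then decodes the same bytes with another.
    public static string Reinterpret(string text, int writtenAs, int readAs)
    {
        byte[] bytes = Encoding.GetEncoding(writtenAs).GetBytes(text);
        return Encoding.GetEncoding(readAs).GetString(bytes);
    }

    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // Written as Cyrillic (1251), read as Western European (1252).
        string garbled = Reinterpret("Привет", 1251, 1252);
        Console.WriteLine(garbled);   // Ïðèâåò

        // The bytes are intact, so decoding with the right code page recovers the text.
        Console.WriteLine(Reinterpret(garbled, 1252, 1251));   // Привет
    }
}
```

The bytes themselves are never damaged here; only the interpretation is wrong, which is why knowing the original code page is enough to recover the text.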

So, if you have a text file in an encoding that you can't read directly, you can write a little program to convert it.  Or, since you've found this blog post, you can just copy my little program:

    using System;
    using System.IO;
    using System.Text;

    class Convert
    {
        static void Main(string[] args)
        {
            if (args.Length != 3)
            {
                Console.WriteLine("Usage: convert.exe infile.txt outfile.txt incodepage");
                Console.WriteLine("       eg: convert data.1252.txt data.utf8.txt 1252");
                Console.WriteLine("       or: convert data.1252.txt data.utf8.txt windows-1252");
                Console.WriteLine("      (output is always UTF-8)");
                return;
            }

            // The input code page can be given as a number (1252) or a name (windows-1252).
            int codepage = 0;
            Encoding enc;
            if (int.TryParse(args[2], out codepage))
            {
                enc = Encoding.GetEncoding(codepage);
            }
            else
            {
                enc = Encoding.GetEncoding(args[2]);
            }

            // Read with the input encoding; overwrite the output file as UTF-8.
            StreamReader reader = new StreamReader(args[0], enc);
            StreamWriter writer = new StreamWriter(args[1], false, Encoding.UTF8);

            // Copy the text across one line at a time.
            String str;
            while ((str = reader.ReadLine()) != null)
            {
                writer.WriteLine(str);
            }

            writer.Close();
            reader.Close();
        }
    }

I've stuck the source and a compiled version in convert.zip.
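One side effect of copying line by line is that ReadLine drops each line terminator and WriteLine writes the platform's own, so line endings get normalized and the output always ends with a newline. If that matters, a variant can copy the decoded characters in chunks instead. The following is only a sketch of that idea, not the program in the zip: the names ConvertSketch and CopyAsUtf8 are hypothetical, and the provider registration assumes a modern .NET runtime (it isn't needed on the classic .NET Framework).

```csharp
using System;
using System.IO;
using System.Text;

class ConvertSketch
{
    // Decodes inPath with inEnc and rewrites it as UTF-8 at outPath,
    // copying characters in chunks so the original line endings are preserved.
    public static void CopyAsUtf8(string inPath, Encoding inEnc, string outPath)
    {
        using (StreamReader reader = new StreamReader(inPath, inEnc))
        using (StreamWriter writer = new StreamWriter(outPath, false, Encoding.UTF8))
        {
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                writer.Write(buffer, 0, read);
            }
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 3)
        {
            Console.WriteLine("Usage: convert.exe infile.txt outfile.txt incodepage");
            return;
        }

        // Assumption: modern .NET, where Windows code pages must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Accept the code page as a number or a name, as in the program above.
        int codepage;
        Encoding enc = int.TryParse(args[2], out codepage)
            ? Encoding.GetEncoding(codepage)
            : Encoding.GetEncoding(args[2]);

        CopyAsUtf8(args[0], enc, args[1]);
    }
}
```

The using statements close both files even if decoding throws partway through, which is the main practical difference from calling Close by hand.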

Comments

  • Anonymous
    January 24, 2013
    Why not wrap your reader and writer with using statements and remove the Close calls?

  • Anonymous
    January 24, 2013
    No reason, just because I didn't do it that way :)

  • Anonymous
    January 24, 2013
    Or you could use PowerShell: gc Inputfile.txt | Out-File Outputfile.txt utf8

  • Anonymous
    January 25, 2013
    You'd have to do a little more to use random code pages for PowerShell input.