UTF-16 to UTF-8

I recently downloaded a CSV report from a report-generating system and tried opening it in vi, but all I saw was garbage, as if it were a binary file. In Notepad it looked fine, like the plain text file I expected. I thought FTP had mangled the file in transit, so I tried different things, such as zipping the file before transferring it, but nothing worked. Opening it in Emacs didn’t help either. Then I tried a Notepad-like application on Linux, and the first thing it did was show an error saying that it didn’t recognize the encoding and ask me to pick one. I picked UTF-16, and the contents finally showed up correctly.
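
A quicker way to confirm a suspicion like this is to look at the first few bytes of the file for a byte order mark (BOM). The sketch below is a hypothetical helper, not something from the original troubleshooting, and it assumes the file actually starts with a BOM. UTF-16 files written without one will come back as "unknown", and UTF-32 is not handled at all.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (the name BomSniffer is made up for this example).
class BomSniffer {

  // Guess the encoding from the first n bytes of a file.
  // Returns "unknown" when no recognized BOM is present.
  static String detect(byte[] head, int n) {
    int b0 = n > 0 ? head[0] & 0xFF : -1;
    int b1 = n > 1 ? head[1] & 0xFF : -1;
    int b2 = n > 2 ? head[2] & 0xFF : -1;
    if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
    if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
    if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
    return "unknown";
  }

  public static void main(String[] args) throws IOException {
    if (args.length == 0) {
      System.err.println("usage: java BomSniffer <file>");
      return;
    }
    byte[] head = new byte[3];
    try (InputStream in = new FileInputStream(args[0])) {
      // read() returns -1 on an empty file, so clamp to 0
      int n = Math.max(in.read(head), 0);
      System.out.println(detect(head, n));
    }
  }
}
```

If this reports UTF-16LE or UTF-16BE, that would explain why editors expecting a single-byte encoding show what looks like binary data.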

Well, now that I knew what the problem was, how could I convert the file to UTF-8? I needed UTF-8 because the version of Perl I was using didn’t support other encodings (what I found on the web suggested that Perl has to be compiled with the perlio option or something like that, which wasn’t the case for me). So I used Java instead. It’s really very simple. Here is what it looks like.

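import java.io.*;
import java.nio.charset.StandardCharsets;

public class UTF16toUTF8 {
  public static void main(String[] args) throws IOException {
     // Decode the input as UTF-16; a leading BOM, if present, selects the byte order
     try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(args[0]), StandardCharsets.UTF_16));
          // Encode standard output explicitly as UTF-8 instead of the platform default
          PrintStream out = new PrintStream(System.out, true, "UTF-8")) {
        String line;
        while ((line = br.readLine()) != null)
           out.println(line);
     }
  }
}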

That’s it. Run it with the UTF-16 file as the argument and redirect standard output to a new file. This simple piece of code was a real time saver for me.
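
One thing to be aware of with a line-by-line approach is that readLine() drops the original line terminators and println() writes the platform's own. If that matters, a variant that copies characters in chunks and writes straight to a file might look like the sketch below. The class name, method name, and two-argument usage are assumptions made for illustration, not part of the original solution.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical variant (names are made up for this example).
class Utf16FileToUtf8 {

  // Copy src (decoded as UTF-16) to dst (encoded as UTF-8) in chunks of characters.
  static void convert(Path src, Path dst) throws IOException {
    try (Reader in = Files.newBufferedReader(src, StandardCharsets.UTF_16);
         Writer out = Files.newBufferedWriter(dst, StandardCharsets.UTF_8)) {
      char[] buf = new char[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        // Characters pass through unchanged, including any \r\n line endings
        out.write(buf, 0, n);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 2) {
      System.err.println("usage: java Utf16FileToUtf8 <utf16-input> <utf8-output>");
      return;
    }
    convert(Paths.get(args[0]), Paths.get(args[1]));
  }
}
```

Java's UTF-16 decoder consumes a leading byte order mark, so the output file starts directly with the data. Unlike an InputStreamReader, the reader returned by Files.newBufferedReader throws an exception on malformed input instead of silently substituting replacement characters.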
