UTF-16 to UTF-8

I recently downloaded a CSV report from a report generating system and tried opening it in vi, and all I saw was garbage, as if it were a binary file. Opened in Notepad, it looked fine, like any plain text file. I thought there was a problem with the FTP transfer, so I tried different things like zipping the file before transferring it. Nothing worked. Opening it with Emacs didn’t work either. Then I tried a Notepad-like application on Linux, and the first thing it did was show an error saying that it didn’t understand the encoding and asking me to pick one. I picked UTF-16, and the text showed up correctly.
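One quick way to confirm a suspicion like this is to look at the first few bytes of the file. The following is only a minimal sketch, assuming the file starts with a byte-order mark (BOM), which UTF-16 exports often but not always include. The class name BomSniffer and the method detectBom are made up for illustration, and readNBytes needs Java 9 or later.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BomSniffer {
    // Guess the encoding from a leading byte-order mark.
    // Returns "unknown" when no recognised BOM is present.
    static String detectBom(byte[] head, int length) {
        if (length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE"; // note: a UTF-32LE BOM starts with the same two bytes
        }
        if (length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java BomSniffer <file>");
            return;
        }
        byte[] head = new byte[3];
        try (InputStream in = new FileInputStream(args[0])) {
            int n = in.readNBytes(head, 0, head.length); // Java 9+
            System.out.println(detectBom(head, n));
        }
    }
}
```

A file without a BOM comes back as "unknown", so this check can confirm UTF-16 but cannot rule it out.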

Well, now that I knew what the problem was, how could I convert the file to UTF-8? I needed UTF-8 because the version of Perl I was using didn’t support other encodings (research on the web suggested that Perl had to be compiled with the perlio option or something like that, and mine wasn’t). So I used Java to do the conversion. It’s really very simple. Here is what it looks like.

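import java.io.*;
import java.nio.charset.StandardCharsets;

public class UTF16toUTF8 {
  public static void main(String[] args) throws IOException {
     // Decode the input as UTF-16; the decoder reads the byte-order mark to pick the byte order.
     BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(args[0]), StandardCharsets.UTF_16));
     // Encode the output as UTF-8 explicitly instead of relying on the platform default.
     PrintStream out = new PrintStream(System.out, true, "UTF-8");
     String line;
     while((line = br.readLine()) != null)
        out.println(line);
     br.close();
  }
}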

That’s it. This simple piece of code was a real time saver for me.
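The program above writes to standard output, so the result has to be redirected into a file. As a variation, here is a rough sketch that converts file to file and names both encodings explicitly. The class name Utf16FileToUtf8File and the method convert are invented for this example, and it assumes Java 7 or later for java.nio.file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf16FileToUtf8File {
    // Decode the source as UTF-16 and re-encode it as UTF-8.
    // Copying raw characters (not lines) keeps the original line endings.
    static void convert(Path source, Path target) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_16);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java Utf16FileToUtf8File <input> <output>");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Java's UTF-16 decoder uses the byte-order mark to choose the byte order and falls back to big-endian when there is none, so a little-endian file without a BOM would need StandardCharsets.UTF_16LE instead.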

2 responses to “UTF-16 to UTF-8”

  1. Eran

    Hi,

    I guess when you say “report from a report generating system” you mean a Google AdSense report 😉
    I have the same problem, but I don’t use Java.
    Do you have a VBS script that does the job?

    Thanks

  2. Mark

    Thanks. This was exactly what I needed. I was given the task of importing data generated by a system that, as I discovered, was exporting data in UTF-16 format. It looked right in a text editor but wouldn’t load, and a quick glance in a hex editor revealed the zero-valued bytes.

    Your tiny but useful utility saved the day.
