Monday, February 09, 2009

Unicode in Web Applications

Let's say you store a Unicode string in a database whose encoding is UTF-8. The string is 似水流年.

In your web application, you let users type 似水流年 into a form, and you read it from the HTTP request as a string.

When you compare the string obtained from the request with the string stored in the database, they will not match.
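A likely cause, assuming the browser submits the text as UTF-8 and the container decodes request parameters as ISO-8859-1, is that each UTF-8 byte ends up as its own character. The sketch below simulates that situation without a servlet container; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;

class RequestMismatchDemo {
    static final Charset UTF_8 = Charset.forName("UTF-8");
    static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");

    // Assumption: the browser sends UTF-8 bytes and the container
    // decodes them as ISO-8859-1. This method imitates that decoding.
    static String decodeAsLatin1(String typed) {
        byte[] wire = typed.getBytes(UTF_8);
        return new String(wire, ISO_8859_1);
    }

    public static void main(String[] args) {
        String stored = "\u4f3c\u6c34\u6d41\u5e74"; // 似水流年, as read from the database
        String received = decodeAsLatin1(stored);   // what the request would hand back

        System.out.println(stored.equals(received)); // false
        System.out.println(stored.length());         // 4 characters
        System.out.println(received.length());       // 12, one per UTF-8 byte
    }
}
```

Each of the four characters takes three bytes in UTF-8, so the wrongly decoded string has twelve characters and can never equal the stored one.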

What you need to do is re-decode the request string as UTF-8. (A Java String is always Unicode; the problem is that its bytes were decoded with the wrong charset.) Here is one way, simple and elegant.

str = new String(str.getBytes("ISO-8859-1"), "UTF-8");

You need to handle the UnsupportedEncodingException, but it should never actually be thrown: every implementation of the Java platform is required to support both ISO-8859-1 and UTF-8.
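If the checked exception is a nuisance, one option is to pass Charset objects instead of charset names, since those String overloads (available from Java 1.6) do not declare UnsupportedEncodingException. This is only a sketch; ParamFixer and reinterpretAsUtf8 are hypothetical names, not part of any library.

```java
import java.nio.charset.Charset;

class ParamFixer {
    // Looked up once; both charsets are guaranteed to exist on any JVM.
    static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");
    static final Charset UTF_8 = Charset.forName("UTF-8");

    // Hypothetical helper: recovers the original bytes from a string that
    // was decoded as ISO-8859-1, then decodes those bytes as UTF-8.
    static String reinterpretAsUtf8(String misdecoded) {
        if (misdecoded == null) {
            return null; // a missing request parameter stays missing
        }
        return new String(misdecoded.getBytes(ISO_8859_1), UTF_8);
    }

    public static void main(String[] args) {
        String expected = "\u4f3c\u6c34\u6d41\u5e74"; // 似水流年
        // Simulate the request value: UTF-8 bytes decoded as ISO-8859-1.
        String fromRequest = new String(expected.getBytes(UTF_8), ISO_8859_1);

        System.out.println(expected.equals(reinterpretAsUtf8(fromRequest))); // true
    }
}
```

Note that the helper should only be applied to strings that really were decoded as ISO-8859-1; running it on a correctly decoded string would corrupt it.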

I am not sure whether the ISO-8859-1 part needs to be different on different servers. I tried it on my development workstation (Windows XP + Eclipse 3.3 + Tomcat 6 + Java 1.6) and on my production server (Linux RH + Tomcat 6 + Java 1.6). Both work very well, and so far I am OK with it being hard-coded.
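One way to gain some confidence on a given machine is to check that ISO-8859-1 maps every byte value to a character and back without loss, which is what the conversion relies on. The small check below is a sketch with a made-up class name. It says nothing about how a particular container is configured to decode requests, only about the charset itself.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class Latin1RoundTrip {
    // Returns true if decoding all 256 byte values as ISO-8859-1 and
    // encoding them again yields exactly the same bytes.
    static boolean roundTripsAllBytes() {
        Charset latin1 = Charset.forName("ISO-8859-1");
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        byte[] back = new String(all, latin1).getBytes(latin1);
        return Arrays.equals(all, back);
    }

    public static void main(String[] args) {
        System.out.println(roundTripsAllBytes());
    }
}
```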
