Thursday, July 2, 2015

Encoding Issues Related With Spring MVC Web-Service Project Based On Tomcat

It won't be more agonised and perplexed when we facing with encoding problems in our project. Here's a brief summary on most of (if possible) scenarios encoding problems may occur. Of all the scenarios, I'll take UTF-8 as an instance.

Project Package Encoding

When building our project via maven,  we need to specify charset encoding in the following way:
<build>
  <plugins>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.0.2</version>
        <configuration>
            <source>1.7</source>
            <target>1.7</target>
            <encoding>UTF-8</encoding>
        </configuration>
    </plugin>
  </plugins>
</build>

Web Request Encoding

As for web request charset encoding, firstly we need to clarify that there are two kinds of request encoding, namely, 'URI encoding' and 'body encoding'. When we do GET operation upon an URL, all parameters in the URL will be applied 'URI encoding',whose default value is 'ISO-8859-1', whereas POST operation will go through 'body encoding'.

URI Encoding

If we don't intend to change the default charset encoding for URI encoding, we could retrieve the correct String parameter using the following code in SpringMVC controller.
new String(request.getParameter("cdc").getBytes("ISO-8859-1"), "utf-8");

Conversely, the way to change the default charset encoding is to edit '$TOMCAT_HOME/conf/server.xml' file, all <connector> tags in which need to be set `URIEncoding="UTF-8"` as an attribute.
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />
...
<Connector port="8009" protocol="AJP/1.3"
           redirectPort="8443"
           URIEncoding="UTF-8" />
...

Body Encoding

This is set by adding filter in web.xml of your project:
<filter>
    <filter-name>utf8-encoding</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>utf-8</param-value>
    </init-param>
    <init-param>
        <param-name>forceEncoding</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>utf8-encoding</filter-name>
    <servlet-name>your_servlet_name</servlet-name>
</filter-mapping>

HttpClient Request Encoding

When applying HttpClient for POST request, there are times that we need to instantiate a StringEntity object. Be sure that you always add encoding information explicitly.
StringEntity se = new StringEntity(xmlRequestData, CharEncoding.UTF_8);

Java String/Byte Conversion Encoding

Similarly to the above StringEntity scenarios in HttpClient Request Encoding, when it comes to talking about String and byte in Java, we always need to specify encoding voluntarily. String has the concept of charset encoding whereas byte does not.
byte[] bytes = "some_unicode_words".getBytes(CharEncoding.UTF_8);
new String(bytes, CharEncoding.UTF_8);

Tomcat catalog.out Encoding

Tomcat's inner logging charset encoding could be set in '$TOMCAT_HOME/bin/catalina.sh'. Find the location of 'JAVA_OPTS' keyword, then append the following setting:
JAVA_OPTS="$JAVA_OPTS -Dfile.encoding=utf-8"

log4j Log File Encoding

Edit 'log4j.properties' file, add the following property under the corresponding appender.
log4j.appender.appender_name.encoding = UTF-8

OS Encoding

Operation System's charset encoding could be checked by `locale` command on Linux. Append the following content to '/etc/profile' will set encoding to UTF-8.
# -- locale --
export LANG=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8
export LC_NUMERIC=en_US.UTF-8
export LC_TIME=en_US.UTF-8
export LC_COLLATE=en_US.UTF-8
export LC_MONETARY=en_US.UTF-8
export LC_MESSAGES=en_US.UTF-8
export LC_PAPER=en_US.UTF-8
export LC_NAME=en_US.UTF-8
export LC_ADDRESS=en_US.UTF-8
export LC_TELEPHONE=en_US.UTF-8
export LC_MEASUREMENT=en_US.UTF-8
export LC_IDENTIFICATION=en_US.UTF-8
export LC_ALL=en_US.UTF-8





REFERENCE:
1. Get Parameter Encoding
2. Spring MVC UTF-8 Encoding

No comments:

Post a Comment