Friday, July 10, 2015

Hive Throws Socket Timeout Exception: Read Timed Out

Hive on our gateway relies on a Thrift server (the Hive metastore service) to reach its metadata, which lives in a remote MySQL database. Yesterday, queries started failing with 'MetaException(message:Got exception: org.apache.thrift.transport.TTransportException java.net.SocketTimeoutException: Read timed out)', seemingly out of nowhere.

First, I checked which remote Thrift server the current Hive client connects to, using the netstat command:
# On terminal_window_1
$ nohup hive -e "use target_db;" &

# On another terminal_window_2
$ netstat -anp | grep 26232

Invoke the second command in terminal_window_2 right after executing the first one; here, 26232 is the PID of the hive process launched by the first command. The output shows all network connections that process has open:
Proto Recv-Q Send-Q Local Address               Foreign Address                State       PID/Program name
tcp        0      0 127.0.0.1:21786             192.168.7.11:9083              ESTABLISHED 26232/java

Evidently, the current Hive client is connecting to a Thrift server at 192.168.7.11:9083. Run `telnet 192.168.7.11 9083` to verify that the target IP and port are reachable. In our case, they are:
96:/root>telnet 192.168.7.11 9083
Trying 192.168.7.11...
Connected to undefine.inidc.com.cn (192.168.7.11).
Escape character is '^]'.

Then I switched to the Thrift server host, found the process via `ps aux | grep HiveMetaStore`, killed it, and started it again with `hive --service metastore`. This time it complained 'Access denied for user 'Diablo'@'d82.hide.cn''. Apparently, this is a MySQL privilege issue. Run the following statement in MySQL as user root:
GRANT ALL PRIVILEGES ON Diablo.* TO 'Diablo'@'d82.hide.cn' IDENTIFIED BY 'Diablo';

Restart the Thrift server once more, and everything is back to normal.
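
Since telnet only proves the port is open, a Thrift-level check can additionally confirm that the metastore service answers and can reach its MySQL backend. Below is a minimal sketch using Hive's metastore client API, assuming the Hive 0.13 libraries and their dependencies are on the classpath:
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

public class MetastorePing {
    public static void main(String[] args) throws Exception {
        HiveConf conf = new HiveConf();
        // Point the client at the Thrift server found via netstat above.
        conf.setVar(HiveConf.ConfVars.METASTOREURIS, "thrift://192.168.7.11:9083");
        HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
        // Listing databases succeeds only if the metastore can query MySQL,
        // so it exercises the exact path that threw the read timeout.
        System.out.println(client.getAllDatabases());
        client.close();
    }
}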







Wednesday, July 8, 2015

The Owner of Hive Dirs/Files Created on HDFS Is Not the User Executing the Hive Command (Hive User Impersonation)

When executing the hive command from one of our gateways as any user 'A', then performing operations that create files/dirs in the Hive warehouse on HDFS, the owner of the newly created files/dirs is always 'supertool', the user who started the hiveserver2 (metastore) process, regardless of who user 'A' is:
###-- hiveserver2(metastore) belongs to user 'supertool' --
K1201:~>ps aux | grep -v grep | grep metastore.HiveMetaStore --color
500      30320  0.0  0.5 1209800 263548 ?      Sl   Jan28  59:29 /usr/java/jdk1.7.0_11//bin/java -Xmx10000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/workspace/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/workspace/hadoop -Dhadoop.id.str=supertool -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/workspace/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /home/workspace/hive-0.13.0-bin/lib/hive-service-0.13.0.jar org.apache.hadoop.hive.metastore.HiveMetaStore
K1201:~>cat /etc/passwd | grep 500
supertool:x:500:500:supertool:/home/supertool:/bin/bash

###-- invoke hive command in user 'withdata' and create a database and table --
114:~>whoami
withdata
114:~>hive
hive> create database test_db;
OK
Time taken: 1.295 seconds
hive> use test_db;
OK
Time taken: 0.031 seconds
hive> create table test_tbl(id int);
OK
Time taken: 0.864 seconds

###-- the newly-created database and table belongs to user 'supertool' --
114:~>hadoop fs -ls /user/supertool/hive/warehouse | grep test_db
drwxrwxr-x   - supertool    supertool             0 2015-07-08 15:13 /user/supertool/hive/warehouse/test_db.db
114:~>hadoop fs -ls /user/supertool/hive/warehouse/test_db.db
Found 1 items
drwxrwxr-x   - supertool supertool          0 2015-07-08 15:13 /user/supertool/hive/warehouse/test_db.db/test_tbl

This can be explained by Hive user impersonation. By default, HiveServer2 performs query processing as the user who submitted the query; but if the related parameters below are set incorrectly, queries run as the user that the hiveserver2 process itself runs as. The correct configuration is as follows:
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
  <description>Set this property to enable impersonation in Hive Server 2</description>
</property>
<property>
  <name>hive.metastore.execute.setugi</name>
  <value>true</value>
  <description>Set this property to enable Hive Metastore service impersonation in unsecure mode. In unsecure mode, setting this property to true will cause the metastore to execute DFS operations using the client's reported user and group permissions. Note that this property must be set on both the client and server sides. If the client sets it to true and the server sets it to false, the client setting will be ignored.</description>
</property>

The above settings are well explained by their own descriptions. So we need to rectify our hive-site.xml and then restart the hiveserver2 (metastore) service.

At this point, a puzzling problem arose: no matter how I changed my HIVE_HOME/conf/hive-site.xml, the corresponding setting was not altered at runtime. Eventually, I found another hive-site.xml under the HADOOP_HOME/etc/hadoop directory. Consequently, it is advisable not to put any Hive-related configuration files under the HADOOP_HOME directory, to avoid confusion. The official order of precedence for loading configuration files can be found at REFERENCE_5.
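
Since Hive loads hive-site.xml as a classpath resource, duplicate copies can be spotted by enumerating every hive-site.xml visible to the classloader. A small hypothetical sketch (run with the same classpath the hive command uses):
import java.net.URL;
import java.util.Enumeration;

public class FindHiveSite {
    public static void main(String[] args) throws Exception {
        // More than one printed URL means two configs are competing,
        // which is exactly what happened above.
        Enumeration<URL> urls = FindHiveSite.class.getClassLoader()
                .getResources("hive-site.xml");
        while (urls.hasMoreElements()) {
            System.out.println(urls.nextElement());
        }
    }
}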

After revising HIVE_HOME/conf/hive-site.xml, the following commands confirm that the preceding problem has been addressed properly.
###-- check runtime hive parameters related with hive user impersonation --
k1227:/home/workspace/hive-0.13.0-bin>hive
hive> set system:user.name;
system:user.name=hadoop
hive> set hive.server2.enable.doAs;
hive.server2.enable.doAs=true
hive> set hive.metastore.execute.setugi;
hive.metastore.execute.setugi=true

###-- start hiveserver2(metastore) again --
k1227:/home/workspace/hive-0.13.0-bin>hive --service metastore
Starting Hive Metastore Server
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
15/07/08 14:28:59 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
15/07/08 14:28:59 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/workspace/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/workspace/hive-0.13.0-bin/lib/jud_test.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
^Z
[1]+  Stopped                 hive --service metastore
k1227:/home/workspace/hive-0.13.0-bin>bg 1
[1]+ hive --service metastore &
k1227:/home/workspace/hive-0.13.0-bin>ps aux | grep metastore
hadoop    6597 26.6  0.4 1161404 275564 pts/0  Sl   14:28   0:14 /usr/java/jdk1.7.0_11//bin/java -Xmx20000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/workspace/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/workspace/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/workspace/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /home/workspace/hive-0.13.0-bin/lib/hive-service-0.13.0.jar org.apache.hadoop.hive.metastore.HiveMetaStore
hadoop   11936  0.0  0.0 103248   868 pts/0    S+   14:29   0:00 grep metastore

Here, `set system:user.name` displays the user currently executing the hive command, and `set [parameter]` displays the given parameter's value at runtime. Alternatively, we can list all runtime parameters via the `set` command inside hive, or from the command line: `hive -e "set;" > hive_runtime_parameters.txt`.

A possible exception, 'TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083', may be raised when launching the metastore service. According to REFERENCE_6, this happens because another metastore (or some other service) already occupies port 9083, the default port for the Hive metastore. Kill it beforehand:
k1227:/home/workspace/hive-0.13.0-bin>lsof -i:9083
COMMAND  PID   USER   FD   TYPE     DEVICE SIZE/OFF NODE NAME
java    3499 hadoop  236u  IPv4 3913377019      0t0  TCP *:9083 (LISTEN)
k1227:/home/workspace/hive-0.13.0-bin>kill -9 3499

Now, when we create a database/table again, the owner of the corresponding HDFS files/dirs is the user invoking the hive command.
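
To verify the fix programmatically instead of eyeballing `hadoop fs -ls` output, the HDFS FileSystem API reports the owner directly. A minimal sketch, reusing the warehouse path from the session above and assuming it runs on a node with the Hadoop configuration on the classpath:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckOwner {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(
                new Path("/user/supertool/hive/warehouse/test_db.db"));
        // After the fix, this should print the user who invoked hive,
        // not the metastore owner 'supertool'.
        System.out.println(status.getOwner() + ":" + status.getGroup());
    }
}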



REFERENCE:
1. Setting Up HiveServer2 - Impersonation
2. hive-default.xml.template [hive.metastore.execute.setugi]
3. Hive User Impersonation - MapR
4. Configuring User Impersonation with Hive Authorization - Drill
5. AdminManual Configuration - Hive [order of precedence]
6. TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083 - Cloudera Community




Thursday, July 2, 2015

Encoding Issues In A Spring MVC Web-Service Project Based On Tomcat

Few things are more agonizing and perplexing than facing encoding problems in a project. Here's a brief summary of most of the scenarios in which encoding problems can occur. In all scenarios, I'll take UTF-8 as the example.

Project Package Encoding

When building our project via Maven, we need to specify the charset encoding in the following way:
<build>
  <plugins>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.0.2</version>
        <configuration>
            <source>1.7</source>
            <target>1.7</target>
            <encoding>UTF-8</encoding>
        </configuration>
    </plugin>
  </plugins>
</build>

Web Request Encoding

As for web request charset encoding, we first need to distinguish two kinds of request encoding: 'URI encoding' and 'body encoding'. When we perform a GET operation on a URL, all parameters in the URL undergo 'URI encoding', which defaults to 'ISO-8859-1' in Tomcat, whereas parameters in a POST body go through 'body encoding'.

URI Encoding

If we don't intend to change the default URI encoding, we can recover the correct String parameter using the following code in a Spring MVC controller:
new String(request.getParameter("cdc").getBytes("ISO-8859-1"), "utf-8");
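
In context, a handler method might look like the following sketch; the mapping path and the parameter name 'cdc' are illustrative only:
import java.io.IOException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
public class CdcController {
    @RequestMapping("/cdc")
    public void handle(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        // Tomcat decoded the query string as ISO-8859-1 by default; take
        // the raw bytes back and re-interpret them as UTF-8.
        String cdc = new String(request.getParameter("cdc").getBytes("ISO-8859-1"), "utf-8");
        response.setContentType("text/plain; charset=UTF-8");
        response.getWriter().write(cdc);
    }
}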

Conversely, the way to change the default URI encoding is to edit the '$TOMCAT_HOME/conf/server.xml' file, setting `URIEncoding="UTF-8"` as an attribute on every <Connector> tag:
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />
...
<Connector port="8009" protocol="AJP/1.3"
           redirectPort="8443"
           URIEncoding="UTF-8" />
...

Body Encoding

Body encoding is set by adding a filter to your project's web.xml:
<filter>
    <filter-name>utf8-encoding</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>utf-8</param-value>
    </init-param>
    <init-param>
        <param-name>forceEncoding</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>utf8-encoding</filter-name>
    <servlet-name>your_servlet_name</servlet-name>
</filter-mapping>
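
Mapping the filter by <servlet-name> is legal, though mapping by <url-pattern>/*</url-pattern> is the more common choice when every request should be covered. Under the hood, CharacterEncodingFilter essentially does what this hypothetical hand-rolled filter sketches (with forceEncoding=true it applies the encoding to the response as well):
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class Utf8EncodingFilter implements Filter {
    public void init(FilterConfig config) {}

    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException {
        // Force the request body to be parsed as UTF-8 before any
        // getParameter() call caches wrongly decoded values.
        request.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    public void destroy() {}
}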

HttpClient Request Encoding

When using HttpClient for POST requests, there are times when we need to instantiate a StringEntity object. Be sure to always specify the encoding explicitly:
StringEntity se = new StringEntity(xmlRequestData, CharEncoding.UTF_8);
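
For instance, a POST built this way carries an explicit charset in its Content-Type header, so the receiving side does not have to guess. A brief sketch, assuming HttpClient 4.x and an illustrative URL:
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.DefaultHttpClient;

public class PostXml {
    public static void main(String[] args) throws Exception {
        String xmlRequestData = "<request>some_unicode_words</request>";
        HttpPost post = new HttpPost("http://example.com/service");
        // ContentType.create appends 'charset=UTF-8' to the Content-Type header.
        post.setEntity(new StringEntity(xmlRequestData,
                ContentType.create("application/xml", "UTF-8")));
        HttpResponse response = new DefaultHttpClient().execute(post);
        System.out.println(response.getStatusLine());
    }
}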

Java String/Byte Conversion Encoding

Similar to the StringEntity scenario above, whenever Strings and bytes are converted in Java, we should always specify the encoding explicitly: a String has a notion of charset encoding, whereas raw bytes do not.
byte[] bytes = "some_unicode_words".getBytes(CharEncoding.UTF_8);
new String(bytes, CharEncoding.UTF_8);
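
To see why this matters: decoding bytes with the wrong charset silently corrupts the data. A quick round-trip demonstration, using the same commons-lang CharEncoding constants as above:
import org.apache.commons.lang.CharEncoding;

public class CharsetMismatch {
    public static void main(String[] args) throws Exception {
        String original = "\u4e2d\u6587"; // two Chinese characters
        byte[] bytes = original.getBytes(CharEncoding.UTF_8);
        // Correct round-trip: prints true.
        System.out.println(original.equals(new String(bytes, CharEncoding.UTF_8)));
        // Wrong charset: the UTF-8 bytes decode to mojibake, prints false.
        System.out.println(original.equals(new String(bytes, CharEncoding.ISO_8859_1)));
    }
}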

Tomcat catalina.out Encoding

Tomcat's internal logging charset encoding can be set in '$TOMCAT_HOME/bin/catalina.sh'. Find where 'JAVA_OPTS' is defined, then append the following setting:
JAVA_OPTS="$JAVA_OPTS -Dfile.encoding=utf-8"

log4j Log File Encoding

Edit the 'log4j.properties' file and add the following property for the corresponding appender:
log4j.appender.appender_name.encoding = UTF-8

OS Encoding

The operating system's charset encoding can be checked with the `locale` command on Linux. Appending the following content to '/etc/profile' will set the encoding to UTF-8:
# -- locale --
export LANG=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8
export LC_NUMERIC=en_US.UTF-8
export LC_TIME=en_US.UTF-8
export LC_COLLATE=en_US.UTF-8
export LC_MONETARY=en_US.UTF-8
export LC_MESSAGES=en_US.UTF-8
export LC_PAPER=en_US.UTF-8
export LC_NAME=en_US.UTF-8
export LC_ADDRESS=en_US.UTF-8
export LC_TELEPHONE=en_US.UTF-8
export LC_MEASUREMENT=en_US.UTF-8
export LC_IDENTIFICATION=en_US.UTF-8
export LC_ALL=en_US.UTF-8




