When executing the hive command from one of our gateways as an arbitrary user 'A', then performing operations that create files/dirs in the Hive warehouse on HDFS, the owner of the newly-created files/dirs is always 'supertool', the user that started the hiveserver2 (metastore) process, regardless of who user 'A' is:
###-- hiveserver2(metastore) belongs to user 'supertool' --
K1201:~>ps aux | grep -v grep | grep metastore.HiveMetaStore --color
500 30320 0.0 0.5 1209800 263548 ? Sl Jan28 59:29 /usr/java/jdk1.7.0_11//bin/java -Xmx10000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/workspace/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/workspace/hadoop -Dhadoop.id.str=supertool -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/workspace/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /home/workspace/hive-0.13.0-bin/lib/hive-service-0.13.0.jar org.apache.hadoop.hive.metastore.HiveMetaStore
K1201:~>cat /etc/passwd | grep 500
supertool:x:500:500:supertool:/home/supertool:/bin/bash
###-- invoke the hive command as user 'withdata' and create a database and table --
114:~>whoami
withdata
114:~>hive
hive> create database test_db;
OK
Time taken: 1.295 seconds
hive> use test_db;
OK
Time taken: 0.031 seconds
hive> create table test_tbl(id int);
OK
Time taken: 0.864 seconds
###-- the newly-created database and table belong to user 'supertool' --
114:~>hadoop fs -ls /user/supertool/hive/warehouse | grep test_db
drwxrwxr-x - supertool supertool 0 2015-07-08 15:13 /user/supertool/hive/warehouse/test_db.db
114:~>hadoop fs -ls /user/supertool/hive/warehouse/test_db.db
Found 1 items
drwxrwxr-x - supertool supertool 0 2015-07-08 15:13 /user/supertool/hive/warehouse/test_db.db/test_tbl
This can be explained by Hive user impersonation. By default, HiveServer2 performs query processing as the user who submitted the query; but if the related parameters below are set incorrectly, the query runs as the user that the hiveserver2 process runs as. The correct configuration is as follows:
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
  <description>Set this property to enable impersonation in Hive Server 2</description>
</property>
<property>
  <name>hive.metastore.execute.setugi</name>
  <value>true</value>
  <description>Set this property to enable Hive Metastore service impersonation in unsecure mode. In unsecure mode, setting this property to true will cause the metastore to execute DFS operations using the client's reported user and group permissions. Note that this property must be set on both the client and server sides. If the client sets it to true and the server sets it to false, the client setting will be ignored.</description>
</property>
The above settings are well explained by their own descriptions. Thus we need to rectify our hive-site.xml and then restart the hiveserver2 (metastore) service.
At this point I hit a puzzling problem: no matter how I changed HIVE_HOME/conf/hive-site.xml, the corresponding setting was not altered at runtime. Eventually I found that there was another hive-site.xml under the HADOOP_HOME/etc/hadoop directory. Consequently, it is advisable not to put any Hive-related configuration files under the HADOOP_HOME directory, to avoid confusion. The official order of precedence for loading configuration files can be found at REFERENCE_5.
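A quick way to spot a stray copy is to list every hive-site.xml in the usual client-side locations. This is only a sketch: the HIVE_HOME and HADOOP_HOME layout here matches this cluster and is an assumption for other installations.

```shell
# Sketch: locate all hive-site.xml copies that may shadow one another.
# HIVE_HOME and HADOOP_HOME are assumed to point at your installations.
for d in "$HIVE_HOME/conf" "$HADOOP_HOME/etc/hadoop"; do
  if [ -d "$d" ]; then
    find "$d" -maxdepth 1 -name 'hive-site.xml'
  fi
done
```

Any path printed outside HIVE_HOME/conf is a candidate to delete or merge.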
After revising HIVE_HOME/conf/hive-site.xml, the following commands confirm that the preceding problem is resolved.
###-- check runtime hive parameters related to hive user impersonation --
k1227:/home/workspace/hive-0.13.0-bin>hive
hive> set system:user.name;
system:user.name=hadoop
hive> set hive.server2.enable.doAs;
hive.server2.enable.doAs=true
hive> set hive.metastore.execute.setugi;
hive.metastore.execute.setugi=true
###-- start hiveserver2(metastore) again --
k1227:/home/workspace/hive-0.13.0-bin>hive --service metastore
Starting Hive Metastore Server
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
15/07/08 14:28:59 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
15/07/08 14:28:59 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
15/07/08 14:28:59 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/workspace/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/workspace/hive-0.13.0-bin/lib/jud_test.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
^Z
[1]+ Stopped hive --service metastore
k1227:/home/workspace/hive-0.13.0-bin>bg 1
[1]+ hive --service metastore &
k1227:/home/workspace/hive-0.13.0-bin>ps aux | grep metastore
hadoop 6597 26.6 0.4 1161404 275564 pts/0 Sl 14:28 0:14 /usr/java/jdk1.7.0_11//bin/java -Xmx20000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/workspace/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/workspace/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/workspace/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /home/workspace/hive-0.13.0-bin/lib/hive-service-0.13.0.jar org.apache.hadoop.hive.metastore.HiveMetaStore
hadoop 11936 0.0 0.0 103248 868 pts/0 S+ 14:29 0:00 grep metastore
Here, `set system:user.name` displays the current user executing the hive command, and `set [parameter]` displays a specific parameter's value at runtime. Alternatively, we can list all runtime parameters via the `set` command in hive, or from the command line: `hive -e "set;" > hive_runtime_parameters.txt`.
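Grepping the two impersonation settings out of such a dump is a handy one-step check. A minimal sketch follows; the fallback sample lines are assumptions so the filter can be tried without a running cluster (in practice the file comes from `hive -e "set;"`).

```shell
# Sketch: pull only the impersonation settings out of a full parameter dump.
# Fallback sample content (an assumption) lets this run without a cluster.
if [ ! -f hive_runtime_parameters.txt ]; then
  printf '%s\n' \
    'hive.server2.enable.doAs=true' \
    'hive.metastore.execute.setugi=true' \
    'mapreduce.job.reduces=-1' > hive_runtime_parameters.txt
fi
grep -E 'hive\.(server2\.enable\.doAs|metastore\.execute\.setugi)' hive_runtime_parameters.txt
```

Both lines should read `=true` once hive-site.xml is fixed on client and server.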
A possible exception, 'TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083', may be raised when launching the metastore service. According to REFERENCE_6, this occurs because another metastore (or some other service) already occupies port 9083, the default port for the Hive metastore. Kill it beforehand:
k1227:/home/workspace/hive-0.13.0-bin>lsof -i:9083
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 3499 hadoop 236u IPv4 3913377019 0t0 TCP *:9083 (LISTEN)
k1227:/home/workspace/hive-0.13.0-bin>kill -9 3499
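The lookup-then-kill step can also be scripted. A sketch, assuming `lsof` is available; it tries a plain TERM before resorting to `kill -9`, which gives the old process a chance to shut down cleanly:

```shell
# Sketch: free the metastore port (default 9083) before restarting.
# Assumes lsof is installed; prefers a graceful TERM over kill -9.
PORT=9083
PID=$(lsof -t -i :"$PORT" 2>/dev/null | head -n 1)
if [ -n "$PID" ]; then
  kill "$PID" 2>/dev/null || kill -9 "$PID"
fi
```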
After this, when we create a database/table again, the owner of the corresponding HDFS files/dirs is the user invoking the hive command.
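To double-check the result, the owner column of `hadoop fs -ls` output (field 3) can be extracted with a small helper. A sketch, run here against a canned listing line (an assumption mirroring the earlier transcript) so it also works offline:

```shell
# Sketch: extract the owner column from a `hadoop fs -ls` listing line.
# Listing format: permissions, replication, owner, group, size, date, time, path.
owner_of() {
  echo "$1" | awk '{print $3}'
}

# Canned example of what a fixed setup should show (owner = invoking user):
line='drwxrwxr-x   - withdata supertool          0 2015-07-08 16:02 /user/supertool/hive/warehouse/test_db.db'
owner_of "$line"   # prints: withdata
```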
REFERENCE:
1. Setting Up HiveServer2 - Impersonation
2. hive-default.xml.template [hive.metastore.execute.setugi]
3. Hive User Impersonation - mapr
4. Configuring User Impersonation with Hive Authorization - drill
5. AdminManual Configuration - hive [order of precedence]
6. TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083 - cloudera community