Phenomenon
Recently, NameNodes in our Hadoop cluster have been hanging frequently: HDFS commands get stuck, or a SocketTimeoutException is thrown. Checking 'hadoop-hadoop-namenode.log' provides no valuable information, but the log 'hadoop-hadoop-namenode.out' contains the following errors:

Exception in thread "Socket Reader #1 for port 8020" java.lang.OutOfMemoryError: Java heap space
Exception in thread "org.apache.hadoop.hdfs.server.namenode.FSNamesystem$NameNodeResourceMonitor@5c5df228" java.lang.OutOfMemoryError: Java heap space
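To spot the same errors quickly on other nodes, a simple grep over the .out files is enough. A minimal sketch, assuming the default log directory $HADOOP_HOME/logs (adjust the path if HADOOP_LOG_DIR points elsewhere):

# Search the NameNode .out files for heap-space errors (log path is an assumption)
grep -n "java.lang.OutOfMemoryError" $HADOOP_HOME/logs/hadoop-*-namenode*.out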
In the meantime, we can diagnose the NameNode process via the `jstat` command:
[hadoop@K1201 hadoop]$ jstat -gcutil 31686
  S0     S1     E      O      P     YGC    YGCT   FGC    FGCT     GCT
  0.00   0.00 100.00 100.00  98.79    20  27.508    48  335.378  362.886
As we can see, Eden (E) and the old generation (O) are both at 100%, and Full GC time (FGCT, 335.378 s) accounts for over 90% of the total GC time (GCT, 362.886 s), so a lack of heap memory for the NameNode is the culprit.
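To watch whether the situation keeps degrading, `jstat` can also sample the process periodically. A small sketch, where 31686 is the NameNode PID from the example above and the 5-second interval and sample count are arbitrary choices:

# Print GC utilization every 5000 ms, 20 samples in total
jstat -gcutil 31686 5000 20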
Solution
Apparently, the NameNode is complaining about an OOM error. The way to increase the heap space for the NameNode is to add the following lines to the configuration file '$HADOOP_HOME/etc/hadoop/hadoop-env.sh':

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=20000
export HADOOP_NAMENODE_INIT_HEAPSIZE="15000"
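For the new settings to take effect, the NameNode daemon has to be restarted. A minimal sketch using the standard daemon script shipped with Hadoop 2.x (the exact script and invocation may differ depending on your distribution and how the cluster is managed):

# Restart only the NameNode daemon on this host
$HADOOP_HOME/sbin/hadoop-daemon.sh stop namenode
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode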
After restarting the NameNode, we should check its current -Xmx (maximum heap size) attribute via:
jinfo <namenode_PID> | grep -i xmx --color
We should see that it has changed to the value we set previously.
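If grepping the full jinfo output is too noisy, the same information can be read directly from the JVM flag. A small sketch (MaxHeapSize is reported in bytes):

# Print only the maximum heap size flag of the NameNode JVM
jinfo -flag MaxHeapSize <namenode_PID>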
Alternatively, we can check the memory status via the NameNode's monitoring web page as well.
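For scripted checks, the heap figures shown on the web UI are also exposed by the NameNode's JMX servlet. A minimal sketch, assuming the default Hadoop 2.x HTTP port 50070 and a hypothetical host name:

# Query current and maximum heap usage from the NameNode JMX endpoint
curl -s 'http://namenode-host:50070/jmx?qry=java.lang:type=Memory'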