Thursday, April 9, 2015

NameNode Hangs After Startup

Phenomenon

Recently, the NameNodes in our Hadoop cluster have been hanging frequently: HDFS commands get stuck, or a SocketTimeoutException is thrown. Checking 'hadoop-hadoop-namenode.log' yields no useful information, but the log 'hadoop-hadoop-namenode.out' contains the following errors:
Exception in thread "Socket Reader #1 for port 8020" java.lang.OutOfMemoryError: Java heap space
Exception in thread "org.apache.hadoop.hdfs.server.namenode.FSNamesystem$NameNodeResourceMonitor@5c5df228" java.lang.OutOfMemoryError: Java heap space

In the meantime, we can diagnose the NameNode process with the `jstat` command:
[hadoop@K1201 hadoop]$ jstat -gcutil 31686
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT   
  0.00   0.00 100.00 100.00  98.79     20   27.508    48  335.378  362.886

As we can see, both eden (E) and the old generation (O) are at 100% utilization, and Full GC Time (FGCT) dominates the total GC time (GCT), so a lack of heap memory for the NameNode is the culprit.
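For anyone reproducing this diagnosis, here is a minimal sketch of how the GC activity can be sampled; looking the PID up with `jps` is an assumption about how the process is located (31686 above is simply the PID on our host):

# Find the NameNode PID (jps lists the JVMs owned by the current user);
# the awk filter excludes the SecondaryNameNode.
NN_PID=$(jps | awk '/NameNode/ && !/Secondary/ {print $1}')

# Sample GC utilization every 5 seconds, 10 times; watch whether
# O (old generation) stays pinned at 100% and FGC/FGCT keep climbing.
jstat -gcutil "$NN_PID" 5000 10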

Solution

Apparently, the NameNode is complaining about an OOM error. The way to increase its heap space is to edit the configuration file '$HADOOP_HOME/etc/hadoop/hadoop-env.sh' and add the following lines:
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=20000
export HADOOP_NAMENODE_INIT_HEAPSIZE="15000"
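Note that HADOOP_HEAPSIZE raises the heap for every daemon launched from this environment. If only the NameNode needs a larger heap, a hedged alternative is to append an -Xmx flag to its daemon-specific options (a sketch, assuming a Hadoop 2.x hadoop-env.sh where HADOOP_NAMENODE_OPTS is placed after the default heap flag on the command line, so its -Xmx takes effect):

# Raise only the NameNode's maximum heap; other daemons keep the default.
export HADOOP_NAMENODE_OPTS="-Xmx20g $HADOOP_NAMENODE_OPTS"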

Finally, we should restart the NameNode and check its current -Xmx (maximum heap size) setting via:
jinfo <namenode_PID> | grep -i xmx --color
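A minimal restart-and-verify sequence might look like the sketch below, assuming a Hadoop 2.x layout with the daemon scripts under $HADOOP_HOME/sbin:

# Restart the NameNode so the new heap settings take effect
$HADOOP_HOME/sbin/hadoop-daemon.sh stop namenode
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode

# Confirm the maximum heap of the freshly started process
NN_PID=$(jps | awk '/NameNode/ && !/Secondary/ {print $1}')
jinfo "$NN_PID" | grep -i xmx --color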

We should see that it has changed to the value we set above.

Alternatively, we can also check memory status on the HDFS monitoring web page (the NameNode web UI), whose overview page reports the heap in use versus the configured maximum.
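If the command line is more convenient, the same numbers are exposed through the NameNode's JMX servlet; a sketch, assuming the default HTTP port 50070 and no HTTP authentication (the host name is a placeholder):

# Query the JVM Memory MBean through the NameNode's /jmx servlet;
# HeapMemoryUsage reports "used", "committed", and "max" in bytes.
curl -s 'http://<namenode_host>:50070/jmx?qry=java.lang:type=Memory'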