JobHistory Server is a standalone module in Hadoop 2, started and stopped separately from start-all.sh and stop-all.sh. It serves as the job history logger, recording everything about a MapReduce job, from its birth to its death, in the configured filesystem.
JobHistory logs can be browsed on the JobHistory web UI, whose address is configured below (default port 19888).
Configuration & Command
Two properties control the startup address and the monitoring page of the JobHistory Server:
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>host:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>host:19888</value>
</property>
Another three properties control the storage paths of job history files:
| name | value | description |
| --- | --- | --- |
| yarn.app.mapreduce.am.staging-dir | /tmp/hadoop-yarn/staging | The staging dir used while submitting jobs. |
| mapreduce.jobhistory.intermediate-done-dir | ${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate | Where a job's history files land right after the job finishes. |
| mapreduce.jobhistory.done-dir | ${yarn.app.mapreduce.am.staging-dir}/history/done | Where the JobHistory Server permanently manages history files. |
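Hadoop's Configuration class expands `${other.property}` references in values. A minimal Python sketch of that substitution (paths only, no Hadoop dependency) shows the concrete directories the three properties above resolve to:

```python
import re

# The three properties exactly as listed in the table above.
props = {
    "yarn.app.mapreduce.am.staging-dir": "/tmp/hadoop-yarn/staging",
    "mapreduce.jobhistory.intermediate-done-dir":
        "${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate",
    "mapreduce.jobhistory.done-dir":
        "${yarn.app.mapreduce.am.staging-dir}/history/done",
}

def resolve(value, props):
    """Expand ${other.property} references, as Hadoop's Configuration does."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: resolve(props[m.group(1)], props),
                  value)

resolved = {k: resolve(v, props) for k, v in props.items()}
for name, path in resolved.items():
    print(name, "->", path)
```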
We'd better `mkdir` and `chmod` the above directories ourselves:
hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done_intermediate
hadoop fs -chmod -R 777 /tmp/hadoop-yarn/staging/history/done_intermediate
hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done
hadoop fs -chmod -R 777 /tmp/hadoop-yarn/staging/history/done
The command to start and stop JobHistory Server is quite easy:
${HADOOP_HOME}/sbin/mr-jobhistory-daemon.sh start historyserver
${HADOOP_HOME}/sbin/mr-jobhistory-daemon.sh stop historyserver
Procedure of Logging in History Server
When a MapReduce application starts, its history is written to ${yarn.app.mapreduce.am.staging-dir}/${current_user}/.staging/job_XXXXX_XXX, which contains three files: .jhist, .summary and .xml, representing the job history, job summary and configuration file, respectively.
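To make that layout concrete, here is a small Python sketch that builds the staging path and the three per-job files. The user name, job id and exact file names are made up for illustration; real ids come from YARN and the full file names carry extra components.

```python
# Hypothetical values for illustration only.
staging_dir = "/tmp/hadoop-yarn/staging"   # yarn.app.mapreduce.am.staging-dir
user = "alice"
job_id = "job_1405991023923_0001"

job_staging = f"{staging_dir}/{user}/.staging/{job_id}"

# The three kinds of files written while the job runs (names schematic).
files = [
    f"{job_staging}/{job_id}.jhist",      # job history events
    f"{job_staging}/{job_id}.summary",    # one-line job summary
    f"{job_staging}/{job_id}_conf.xml",   # job configuration
]
for f in files:
    print(f)
```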
When the application is finished, killed, or failed, the log info is copied to ${mapreduce.jobhistory.intermediate-done-dir}/${current_user}. This step is implemented in org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.
After being copied to ${mapreduce.jobhistory.intermediate-done-dir}/${current_user}, the job history file is eventually moved to ${mapreduce.jobhistory.done-dir} by org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.
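The three-stage life cycle above can be sketched as a simple sequence of locations in Python. The paths assume the default values from the table earlier, and the user/job id are illustrative; the real moves are performed on HDFS by JobHistoryEventHandler and HistoryFileManager.

```python
STAGING = "/tmp/hadoop-yarn/staging/{user}/.staging/{job}"
INTERMEDIATE = "/tmp/hadoop-yarn/staging/history/done_intermediate/{user}"
DONE = "/tmp/hadoop-yarn/staging/history/done"

def history_locations(user, job):
    """Directories a job's history file passes through, in order."""
    return [
        STAGING.format(user=user, job=job),   # while the job runs
        INTERMEDIATE.format(user=user),       # after finish / kill / fail
        DONE,                                 # after HistoryFileManager moves it
    ]

print(history_locations("alice", "job_1405991023923_0001"))
```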
All logs for this procedure are recorded under ${HADOOP_HOME}/logs/userlogs (configured by 'yarn.nodemanager.log-dirs') on each NodeManager node, provided yarn log aggregation is not enabled.
NullPointerException With History Server
We faced a problem where some of our MapReduce jobs, especially long-running ones, threw a NullPointerException when the job completed. The stack trace is as follows:
14/07/22 06:37:11 INFO mapreduce.Job:  map 100% reduce 98%
14/07/22 06:37:44 INFO mapreduce.Job:  map 100% reduce 99%
14/07/22 06:38:30 INFO mapreduce.Job:  map 100% reduce 100%
14/07/22 06:39:02 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
14/07/22 06:39:02 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
14/07/22 06:39:02 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
14/07/22 06:39:02 ERROR security.UserGroupInformation: PriviledgedActionException as: rohitsarewar (auth:SIMPLE) cause:java.io.IOException: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:269)
    at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173)
    at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
Exception in thread "main" java.io.IOException: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:269)
    at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173)
    at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:330)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:382)
    at org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:529)
    at org.apache.hadoop.mapreduce.Job$5.run(Job.java:668)
    at org.apache.hadoop.mapreduce.Job$5.run(Job.java:665)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:665)
    at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1349)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289)
    at com.bigdata.mapreduce.esc.escDriver.main(escDriver.java:23)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:269)
    at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173)
    at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
    at org.apache.hadoop.ipc.Client.call(Client.java:1347)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine...
    at com.sun.proxy.$Proxy12.getTaskAttemptCompletionEvents(Unknown Source)
    at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClie...
    at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:330)
    ... 16 more
It seems the remote history server object cannot be found when, after the MapReduce job is done, the client tries to invoke a method on it via IPC. We finally solved this by raising 'dfs.client.socket-timeout' for the JobHistory service to '3600000', i.e. 1 hour. Because of heavy load on our HDFS cluster, requests to HDFS can be delayed or hang, so we had to set this property separately for the JobHistory service.
Note that 'dfs.client.socket-timeout' in the hdfs-site.xml used by start/stop-dfs.sh should stay much lower than '3600000', say '60000' or '180000', because a map task that fails has to wait exactly that long before it retries.
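The unit of 'dfs.client.socket-timeout' is milliseconds, so the two settings differ by a factor of 20 to 60. A quick sanity check, assuming the values above:

```python
HISTORY_SERVER_TIMEOUT_MS = 3_600_000   # dfs.client.socket-timeout for the JobHistory service
DFS_TIMEOUT_MS = 180_000                # dfs.client.socket-timeout for the rest of the cluster

assert HISTORY_SERVER_TIMEOUT_MS == 60 * 60 * 1000   # exactly 1 hour
assert DFS_TIMEOUT_MS == 3 * 60 * 1000               # 3 minutes

# A failed map only retries after the full timeout elapses, so the
# cluster-wide value should stay small while the history server's stays large.
print(HISTORY_SERVER_TIMEOUT_MS // DFS_TIMEOUT_MS)
```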
Thus the procedure should be:
- Set 'dfs.client.socket-timeout' in hdfs-site.xml to 3600000.
- Start the JobHistory Server.
- Set 'dfs.client.socket-timeout' in hdfs-site.xml back to 180000.
- Run start-dfs.sh.
In our case, the JobHistory Server is started with a dedicated config directory that carries the larger timeout:
./mr-jobhistory-daemon.sh --config /home/supertool/hadoop-2.2.0/etc/job_history_server_config/ start historyserver
If reposting, please cite the origin: Jason4Zhu
I configured Hadoop 2.6 according to your instructions and started the history server, but the history web server did not show any job information. However, the job's history file was generated and stored in HDFS. What's the reason for this? Thanks.
If no job information at all is shown in the job history server, the above post, which describes a random loss of job information, is probably orthogonal to your scenario. If I were you, I would check the job history server's log first to make sure there are no error/exception complaints.
I am also facing the same issue. My job history server is not retrieving the jobs I executed earlier. If you have found the solution, please let me know.
Have you tried setting 'dfs.client.socket-timeout' for the job history server to 3600000?
As far as I can tell, your job history server works without malfunction. Chances are the culprit in your case is the mapred-site.xml configuration: either it fails to load, or it has been overridden by some hidden configuration file due to Hadoop's obscure configuration loading precedence.