jps
-- Lists all Java processes and related information, such as the fully qualified main class name and the arguments passed to the main method.

[hadoop@m15018 ~]$ jps -ml
30134 org.apache.hadoop.hdfs.server.datanode.DataNode
15141 org.apache.hadoop.mapred.YarnChild 192.168.7.27 28170 attempt_1417419796124_63329_m_000018_1 97
1820 org.apache.hadoop.mapred.YarnChild 192.168.7.14 36058 attempt_1417419796124_63299_m_000037_0 33
28191 org.apache.hadoop.mapred.YarnChild 192.168.7.21 55163 attempt_1417419796124_63322_m_000018_0 21
15275 org.apache.hadoop.mapreduce.v2.app.MRAppMaster
18325 org.apache.hadoop.mapred.YarnChild 192.168.7.55 6926 attempt_1417419796124_63255_m_000014_1 166
10693 org.apache.hadoop.mapred.YarnChild 192.168.7.85 20883 attempt_1417419796124_63282_m_000055_0 34
6001 org.apache.hadoop.mapred.YarnChild 192.168.7.17 9187 attempt_1417419796124_62366_m_000050_0 64
30311 org.apache.hadoop.yarn.server.nodemanager.NodeManager
30111 org.apache.hadoop.mapred.YarnChild 192.168.7.75 23820 attempt_1417419796124_63324_m_000047_0 29
28712 org.apache.hadoop.mapreduce.v2.app.MRAppMaster
4474 org.apache.hadoop.mapred.YarnChild 192.168.7.29 49996 attempt_1417419796124_62449_r_000019_0 101
6041 org.apache.hadoop.mapred.YarnChild 192.168.7.17 9187 attempt_1417419796124_62366_r_000008_0 57
6792 org.apache.hadoop.mapred.YarnChild 192.168.7.20 56878 attempt_1417419796124_63313_m_000016_0 18
25847 org.apache.hadoop.mapred.YarnChild 192.168.7.46 8277 attempt_1417419796124_63290_m_000005_0 7
6089 org.apache.hadoop.mapred.YarnChild 192.168.7.17 9187 attempt_1417419796124_62366_r_000005_0 50
20277 org.apache.hadoop.mapred.YarnChild 192.168.7.26 45093 attempt_1417419796124_63268_m_000014_0 20
5578 org.apache.hadoop.mapred.YarnChild 192.168.7.72 15929 attempt_1417419796124_63271_m_000011_0 14
26194 org.apache.hadoop.mapred.YarnChild 192.168.7.46 8277 attempt_1417419796124_63290_m_000011_0 13
18747 sun.tools.jps.Jps -ml
Where:
-m  Output the arguments passed to the main method. The output may be null for embedded JVMs.
-l  Output the full package name for the application's main class or the full path name to the application's JAR file.
-v  Output the arguments passed to the JVM.
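Because the output is one process per line, it combines well with ordinary text tools. As a minimal sketch (the class name is simply the DataNode entry from the listing above), the PID of a specific daemon can be extracted like this:

# Sketch: extract the PID of the DataNode daemon from the jps listing.
jps -ml | grep org.apache.hadoop.hdfs.server.datanode.DataNode | awk '{print $1}'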
jstack
-- Shows the current stack traces of a running Java process.

[hadoop@K1213 ~]$ jstack 19552
2014-12-31 10:21:05
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.6-b04 mixed mode):

"Attach Listener" daemon prio=10 tid=0x000000000a3ae000 nid=0x81a waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"ResponseProcessor for block BP-714842383-192.168.7.11-1393991369860:blk_1111587326_1099569551548" daemon prio=10 tid=0x00002aaab8293000 nid=0x7098 runnable [0x00000000417e5000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x000000077742ab40> (a sun.nio.ch.Util$2)
    - locked <0x000000077742ab50> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000077742aaf8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1490)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:116)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:721)

"DataStreamer for file /user/monitor/test_with_reduce_zhudi/_temporary/1/_temporary/attempt_1417419796124_63366_m_000033_0/0-m-00033 block BP-714842383-192.168.7.11-1393991369860:blk_1111587326_1099569551548" daemon prio=10 tid=0x00002aaab4cbb000 nid=0x4ca3 runnable [0x0000000040316000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x000000077742e4e8> (a sun.nio.ch.Util$2)
    - locked <0x000000077742e4f8> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000077742e4a0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    - locked <0x0000000747910230> (a java.io.BufferedOutputStream)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    - locked <0x0000000747910248> (a java.io.DataOutputStream)
    at org.apache.hadoop.hdfs.DFSOutputStream$Packet.writeTo(DFSOutputStream.java:278)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:568)
...
"main" prio=10 tid=0x0000000009fad000 nid=0x4c61 in Object.wait() [0x000000004089e000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:503) at org.apache.hadoop.hdfs.DFSOutputStream.waitAndQueueCurrentPacket(DFSOutputStream.java:1475) - locked <0x00000007770010a0> (a java.util.LinkedList) at org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:1543) - locked <0x00000007770abdc8> (a org.apache.hadoop.hdfs.DFSOutputStream) at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:175) at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:150) - locked <0x00000007770abdc8> (a org.apache.hadoop.hdfs.DFSOutputStream) at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:139) - eliminated <0x00000007770abdc8> (a org.apache.hadoop.hdfs.DFSOutputStream) at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:130) at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:104) - locked <0x00000007770abdc8> (a org.apache.hadoop.hdfs.DFSOutputStream) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:59) at java.io.DataOutputStream.write(DataOutputStream.java:107) - locked <0x0000000777294690> (a org.apache.hadoop.hdfs.client.HdfsDataOutputStream) at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:83) at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:104) - locked <0x000000077722f338> (a org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter) at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:433) at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:411) at AdMonitorDMReaderJob$Mapper.map(AdMonitorDMReaderJob.java:45) at AdMonitorDMReaderJob$Mapper.map(AdMonitorDMReaderJob.java:30) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:772) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157) "VM Thread" prio=10 tid=0x000000000a02c000 nid=0x4c62 runnable "VM Periodic Task Thread" prio=10 tid=0x00002aaab401a800 nid=0x4c69 waiting on condition JNI global references: 276
jinfo
-- Lists all configuration info of a running Java process, including Java system properties and VM flags.

[hadoop@K1213 ~]$ jinfo 19552
Attaching to process ID 19552, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 23.6-b04
Java System Properties:

java.runtime.name = Java(TM) SE Runtime Environment
java.vm.version = 23.6-b04
sun.boot.library.path = /usr/java/jdk1.7.0_11/jre/lib/amd64
hadoop.root.logger = INFO,CLA
java.vendor.url = http://java.oracle.com/
java.vm.vendor = Oracle Corporation
path.separator = :
file.encoding.pkg = sun.io
java.vm.name = Java HotSpot(TM) 64-Bit Server VM
sun.os.patch.level = unknown
sun.java.launcher = SUN_STANDARD
user.country = CN
user.dir = /home/data4/hdfsdir/nm-local-dir/usercache/monitor/appcache/application_1417419796124_63366/container_1417419796124_63366_01_000063
java.vm.specification.name = Java Virtual Machine Specification
java.runtime.version = 1.7.0_11-b21
java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
os.arch = amd64
java.endorsed.dirs = /usr/java/jdk1.7.0_11/jre/lib/endorsed
line.separator =
java.io.tmpdir = /home/data4/hdfsdir/nm-local-dir/usercache/monitor/appcache/application_1417419796124_63366/container_1417419796124_63366_01_000063/tmp
yarn.app.container.log.dir = /home/workspace/hadoop/logs/userlogs/application_1417419796124_63366/container_1417419796124_63366_01_000063
java.vm.specification.vendor = Oracle Corporation
os.name = Linux
log4j.configuration = container-log4j.properties
sun.jnu.encoding = UTF-8
java.library.path = /home/data4/hdfsdir/nm-local-dir/usercache/monitor/appcache/application_1417419796124_63366/container_1417419796124_63366_01_000063:/home/workspace/hadoop/lib/native:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
java.specification.name = Java Platform API Specification
java.class.version = 51.0
sun.management.compiler = HotSpot 64-Bit Tiered Compilers
os.version = 2.6.18-308.el5
yarn.app.container.log.filesize = 209715200
user.home = /home/hadoop
user.timezone = Asia/Shanghai
java.awt.printerjob = sun.print.PSPrinterJob
file.encoding = UTF-8
java.specification.version = 1.7
user.name = hadoop
java.class.path = ...
java.vm.specification.version = 1.7
sun.arch.data.model = 64
sun.java.command = org.apache.hadoop.mapred.YarnChild 192.168.7.86 14667 attempt_1417419796124_63366_m_000033_0 63
java.home = /usr/java/jdk1.7.0_11/jre
user.language = zh
java.specification.vendor = Oracle Corporation
awt.toolkit = sun.awt.X11.XToolkit
java.vm.info = mixed mode
java.version = 1.7.0_11
java.ext.dirs = /usr/java/jdk1.7.0_11/jre/lib/ext:/usr/java/packages/lib/ext
sun.boot.class.path = /usr/java/jdk1.7.0_11/jre/lib/resources.jar:/usr/java/jdk1.7.0_11/jre/lib/rt.jar:/usr/java/jdk1.7.0_11/jre/lib/sunrsasign.jar:/usr/java/jdk1.7.0_11/jre/lib/jsse.jar:/usr/java/jdk1.7.0_11/jre/lib/jce.jar:/usr/java/jdk1.7.0_11/jre/lib/charsets.jar:/usr/java/jdk1.7.0_11/jre/lib/jfr.jar:/usr/java/jdk1.7.0_11/jre/classes
java.vendor = Oracle Corporation
file.separator = /
java.vendor.url.bug = http://bugreport.sun.com/bugreport/
sun.io.unicode.encoding = UnicodeLittle
sun.cpu.endian = little
sun.cpu.isalist =

VM Flags:
-XX:+UseSerialGC -Xms1024M -Xmx3096m -XX:PermSize=64m -XX:MaxPermSize=128M -Djava.io.tmpdir=/home/data4/hdfsdir/nm-local-dir/usercache/monitor/appcache/application_1417419796124_63366/container_1417419796124_63366_01_000063/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/workspace/hadoop/logs/userlogs/application_1417419796124_63366/container_1417419796124_63366_01_000063 -Dyarn.app.container.log.filesize=209715200 -Dhadoop.root.logger=INFO,CLA
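When only a single flag is of interest, jinfo can query it directly instead of printing everything, and manageable flags can even be toggled on a live process. A sketch against the same PID:

# Query one VM flag instead of dumping the full configuration.
jinfo -flag MaxHeapSize 19552

# HeapDumpOnOutOfMemoryError is a manageable flag, so it can be enabled at runtime.
jinfo -flag +HeapDumpOnOutOfMemoryError 19552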
strace
-- Traces the system calls a process makes (Linux). Unlike the j* tools, it works for any Linux process, not just JVMs.

[hadoop@K1213 ~]$ strace -p 19552
Process 19552 attached - interrupt to quit
futex(0x408a09d0, FUTEX_WAIT, 19553, NULL <unfinished ...>
Process 19552 detached

[hadoop@K1213 ~]$ strace -p 19553
Process 19553 attached - interrupt to quit
futex(0x9fada54, FUTEX_WAIT_PRIVATE, 1747, NULL) = 0
futex(0x9fada28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x9fada54, FUTEX_WAIT_PRIVATE, 1749, NULL) = 0
futex(0x9fada28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x9fada54, FUTEX_WAIT_PRIVATE, 1751, NULL) = 0
futex(0x9fada28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x9fada54, FUTEX_WAIT_PRIVATE, 1753, NULL) = 0
futex(0x9fada28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x9fada54, FUTEX_WAIT_PRIVATE, 1755, NULL) = 0
futex(0x9fada28, FUTEX_WAKE_PRIVATE, 1) = 0
read(160, "\0\2\4\4\0\31", 6) = 6
read(160, "\t\0\0 \5\0\0\0\0\21\220\2\0\0\0\0\0\0\30\0%\0\0\2\0\243\255X\321\20\314g"..., 132121) = 68529
read(160, 0x2aaab4ca2457, 63592) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(175, EPOLL_CTL_ADD, 160, {EPOLLIN, {u32=160, u64=12336768231720091808}}) = 0
epoll_wait(175, {{EPOLLIN, {u32=160, u64=12336768231720091808}}}, 8192, 3600000) = 1
epoll_ctl(175, EPOLL_CTL_DEL, 160, {0, {u32=160, u64=719115788937592992}}) = 0
epoll_wait(175, {}, 8192, 0) = 0
read(160, "\332a\375\f\237\331\235\31YD2\304\5\362;#\232A\225\37?\203<w0%\371%S/\275\232"..., 63592) = 10136
read(160, 0x2aaab4ca4bef, 53456) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(175, EPOLL_CTL_ADD, 160, {EPOLLIN, {u32=160, u64=4371809895922532512}}) = 0
...
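The raw stream gets noisy quickly, so strace's filtering options are worth knowing. A sketch using standard strace flags, with the PID from the session above as a placeholder:

# -f follows all threads, -tt adds microsecond timestamps,
# -e trace=network keeps only network-related syscalls,
# and -o writes the trace to a file instead of the terminal.
strace -f -tt -e trace=network -o strace.out -p 19552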
jstat
-- Java Virtual Machine Statistics Monitoring Tool, used to monitor compilation, GC, and other runtime statistics. The way I use it most often is to get GC statistics from a running Java process:
[hadoop@K1213 ~]$ jstat -gc 11036
 S0C      S1C      S0U    S1U   EC        EU        OC        OU        PC       PU       YGC   YGCT    FGC   FGCT    GCT
34944.0  34944.0  763.4  0.0   279616.0  218484.1  699072.0  572871.8  65536.0  22317.0   12    0.423   0     0.000   0.423

[hadoop@K1213 ~]$ jstat -gcutil 11036
 S0     S1     E      O      P      YGC   YGCT    FGC   FGCT    GCT
 0.00   7.17   93.65  81.95  34.06  13    0.432   0     0.000   0.432
The description of all the columns is listed here, and more detailed usage of 'jstat' can be found on that webpage. For the commands above, I always pay the most attention to FGC (the number of full GCs). If FGC is a relatively large number, chances are that something is wrong in our code, such as a memory leak.
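jstat can also sample on its own schedule, which makes FGC growth easy to watch over time: append an interval in milliseconds and a sample count, as sketched below.

# Print -gcutil statistics every 1000 ms, 10 samples in total;
# a steadily climbing FGC column hints at memory pressure or a leak.
jstat -gcutil 11036 1000 10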
jmap
-- Retrieves current heap info, including loaded classes, object counts, occupied memory, etc. It is much like getting the information from 'jhat', but at runtime.

[hadoop@k1291 ~]$ jmap -histo 11711 | head -n 50

 num     #instances         #bytes  class name
----------------------------------------------
   1:         22834      144531152  [B
   2:        699375      123144384  [D
   3:       1153782       94102184  [Ljava.util.HashMap$Entry;
   4:       1827000       89984520  [C
   5:       1148737       64329272  java.util.HashMap
   6:        331311       58298464  [J
   7:       1423855       34172520  java.lang.String
   8:        331138       29140144  com.miaozhen.yo.tcpreporter.report.Counter
   9:        659720       21111040  java.util.HashMap$Entry
  10:        271052       13010496  java.util.StringTokenizer
  11:        151911       10937592  com.miaozhen.yo.tcpreporter.report.RFPanel
  12:        243543        9741720  java.util.TreeMap$Entry
  13:         47619        6636000  <constMethodKlass>
  14:         47619        6486248  <methodKlass>
  15:        236361        5672664  java.lang.Long
  16:        168142        5380544  com.miaozhen.yo.tcpreporter.history.HistoryCount
  17:          3844        4571976  <constantPoolKlass>
  18:        165569        3973656  com.miaozhen.yo.tcpreporter.report.RFCounter
  19:         92516        3700640  java.util.HashMap$EntryIterator
  20:        146659        3519816  java.lang.StringBuffer
  21:         75587        3023480  com.miaozhen.app.MzSequenceFile$SMeta
  22:          3844        2944464  <instanceKlassKlass>
  23:        105924        2542176  java.lang.StringBuilder
  24:          3185        2520192  <constantPoolCacheKlass>
  25:          3288        2124824  [I
  26:         42750        1710000  sun.misc.FloatingDecimal
  27:         17819        1298352  [Ljava.lang.Object;
  28:          9867        1105104  com.miaozhen.yo.tcpreporter.Purelog
  29:         41796        1003104  com.miaozhen.tools.MyDouble
  30:         51896         830336  java.util.HashMap$EntrySet
  31:         33984         815616  java.lang.Double
  32:         15043         633656  [Ljava.lang.String;
  33:          1291         593104  <methodDataKlass>
  34:         11493         551664  java.nio.HeapByteBuffer
  35:         11487         551376  java.nio.HeapCharBuffer
  36:         33945         543120  com.miaozhen.yo.tcpreporter.history.HistoryLevelOne
  37:          4158         503104  java.lang.Class
  38:         19564         469536  java.util.Date
  39:          9777         469296  java.util.TreeMap$AscendingSubMap
  40:         11487         459480  java.util.ArrayList$SubList
  41:         11487         459480  java.util.ArrayList$SubList$1
  42:          6415         407520  [S
  43:          6775         369960  [[I
  44:         11816         283584  java.util.ArrayList
  45:         15982         255712  com.miaozhen.yo.tcpreporter.history.HistoryLevelTwo
  46:         10619         254856  org.apache.hadoop.io.Text
  47:          1571         175952  org.apache.hadoop.hdfs.protocol.DatanodeInfo
[hadoop@k1291 ~]$ jmap -heap 11711
Attaching to process ID 11711, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 23.6-b04

using thread-local object allocation.
Mark Sweep Compact GC

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 3221225472 (3072.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)
   G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 316669952 (302.0MB)
   used     = 276925888 (264.09710693359375MB)
   free     = 39744064 (37.90289306640625MB)
   87.44937315681912% used
Eden Space:
   capacity = 281542656 (268.5MB)
   used     = 266371136 (254.03131103515625MB)
   free     = 15171520 (14.46868896484375MB)
   94.61128902612896% used
From Space:
   capacity = 35127296 (33.5MB)
   used     = 10554752 (10.0657958984375MB)
   free     = 24572544 (23.4342041015625MB)
   30.04715193563433% used
To Space:
   capacity = 35127296 (33.5MB)
   used     = 0 (0.0MB)
   free     = 35127296 (33.5MB)
   0.0% used
tenured generation:
   capacity = 703594496 (671.0MB)
   used     = 625110416 (596.1517486572266MB)
   free     = 78484080 (74.84825134277344MB)
   88.84526805621856% used
Perm Generation:
   capacity = 25690112 (24.5MB)
   used     = 25664256 (24.475341796875MB)
   free     = 25856 (0.024658203125MB)
   99.89935427295919% used

8870 interned Strings occupying 738840 bytes.
The following command dumps the heap to a file, which can then be analysed with 'jhat':
jmap -dump:live,format=b,file=/home/data8/heapdump.log [PID]
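Note that the live option makes jmap trigger a full GC first and count only reachable objects; comparing histograms with and without it is a quick way to gauge how much of the heap is collectable garbage. A sketch, reusing the PID from above:

# Histogram of all objects vs. only live (reachable) objects.
jmap -histo 11711 | head -n 10
jmap -histo:live 11711 | head -n 10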
HeapDumpOnOutOfMemoryError
When facing an OutOfMemoryError in heap space, we can simply add the following arguments to our Java program at startup. This way, a snapshot of the heap is dumped out when the OOM occurs, and the `jhat` command can then be applied to the HeapDump file for analysis.

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/data8/heapspaceerror.out
FYI: if we intend to specify the above arguments in a MapReduce program, we should do it in the following way:
hadoop jar mr.jar MainClass -Dmapreduce.map.java.opts="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/data8/heapspaceerror.out" argu1 argu2
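For a standalone (non-MapReduce) program, the same flags go directly on the java command line. A minimal sketch, where the small heap size and the main class are placeholders chosen only to make an OOM easy to reproduce:

# With a deliberately small heap, the OutOfMemoryError triggers quickly and the
# JVM writes the snapshot to the given path, ready for jhat.
java -Xmx64m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapspaceerror.out SomeMemoryHungryMain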
jhat
-- HAT, short for Heap Analysis Tool, is used to analyse HeapDump files.

$ jhat heapdump.out
Reading from log.log...
Dump file created Wed Mar 14 12:01:45 CST 2012
Snapshot read, resolving...
Resolving 6762 objects...
Chasing references, expect 1 dots.
Eliminating duplicate references.
Snapshot resolved.
Started HTTP server on port 7000
Server is ready.
When the server is ready, we can check the result in a browser at http://hostname:7000/. The page lists every class that exists in the heap, which by itself is of limited use. Luckily, some portals that provide more intuitive analytic results can be found at the end of the page.
We can step into "Show instance counts for all classes" to see the instance count of every class individually, sorted on the page.
In the current scenario, we can see there are too many LoggingEvent instances, which is what caused the OOM in heap space; this gives us a hint as to where the problem might be.
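One practical note: jhat loads the entire dump into its own heap, so a large dump may need a bigger JVM for jhat itself, and the HTTP port can be changed if 7000 is occupied. A sketch:

# Give jhat a 2 GB heap of its own and serve the report on port 7777.
jhat -J-Xmx2g -port 7777 heapdump.out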
If reposting, please credit the origin: Jason4Zhu