Drill down to Java thread consuming CPU on Solaris
| Topic ID: 2476 | |
| Created By: | 2007-JUL-03 17:49:02 [Vitaliy] |
| Updated By: | 2009-FEB-06 12:56:29 [Vitaliy] |
| Status: | Open |
| Severity: | Normal |
| Read Only: | No |
|
7980
2007-JUL-03 17:49:02
|
||||
|
The following is an example of tracing CPU utilization issues with JAVA all
the way to the thread number and it's running code. JAVA application in
question is deployed on Oracle Portal.
## Run prstat to find the PID that consumes the most CPU
oracle.portal-> prstat
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
15894 oracle 232M 137M cpu1 0 10 71:19:44 74% java/54
25502 oracle 155M 66M sleep 59 0 17:14:41 0.2% emagent/17
15892 oracle 449M 248M run 58 0 0:13:28 0.2% java/84
18154 oracle 4672K 4360K cpu0 59 0 0:00:00 0.1% prstat/1
15893 oracle 180M 83M run 59 0 0:03:09 0.0% java/45
15891 oracle 150M 46M run 59 0 0:01:22 0.0% java/22
22753 oracle 35M 12M sleep 59 0 0:00:21 0.0% httpd/3
22750 oracle 35M 15M sleep 59 0 0:00:07 0.0% httpd/3
22745 oracle 34M 2640K sleep 59 0 0:00:54 0.0% httpd/3
22754 oracle 35M 12M sleep 59 0 0:00:03 0.0% httpd/3
22741 oracle 1104K 920K sleep 59 0 0:00:00 0.0% sh/1
22756 oracle 35M 12M sleep 59 0 0:00:04 0.0% httpd/3
22775 oracle 35M 15M sleep 59 0 0:00:04 0.0% httpd/3
16951 oracle 1912K 1272K sleep 59 0 0:00:00 0.0% ksh/1
328 root 1824K 456K sleep 59 0 0:00:00 0.0% ttymon/1
316 root 4928K 1664K sleep 59 0 0:00:00 0.0% vold/2
331 root 1824K 608K sleep 59 0 0:00:00 0.0% ttymon/1
307 root 5344K 1144K sleep 59 0 0:00:00 0.0% dmispd/2
25501 oracle 6960K 4776K sleep 59 0 0:00:18 0.0% perl/1
309 root 5896K 1312K sleep 59 0 0:00:00 0.0% snmpXdmid/2
301 root 5176K 1088K sleep 59 0 0:00:00 0.0% dtlogin/1
296 root 4416K 784K sleep 59 0 0:00:03 0.0% snmpdx/1
300 root 5088K 1744K sleep 59 0 0:00:02 0.0% mibiisa/7
347 root 7576K 5576K sleep 59 0 0:00:32 0.0% svlistener/1
324 root 4736K 2744K sleep 59 0 0:06:20 0.0% svpmgr/1
247 root 1768K 264K sleep 59 0 0:00:00 0.0% smcboot/1
257 root 5488K 904K sleep 59 0 0:00:00 0.0% htt_server/2
246 root 1768K 264K sleep 59 0 0:00:00 0.0% smcboot/1
243 root 1768K 408K sleep 59 0 0:00:00 0.0% smcboot/1
221 root 1088K 560K sleep 59 0 0:00:01 0.0% utmpd/1
228 root 4536K 1336K sleep 59 0 0:00:00 0.0% sendmail/1
330 root 3200K 1040K sleep 59 0 0:00:08 0.0% svcmgr/1
232 root 3336K 1104K sleep 59 0 0:06:55 0.0% sshd/1
210 root 1488K 544K sleep 59 0 0:00:00 0.0% powerd/3
199 root 4544K 744K sleep 59 0 0:00:00 0.0% cron/1
182 root 5752K 1640K sleep 59 0 0:00:03 0.0% syslogd/13
226 smmsp 4512K 920K sleep 59 0 0:00:00 0.0% sendmail/1
160 daemon 2512K 880K sleep 59 0 0:00:00 0.0% statd/1
158 root 2224K 696K sleep 59 0 0:00:00 0.0% lockd/2
146 root 2504K 1432K sleep 59 0 0:00:08 0.0% inetd/1
200 root 3424K 2456K sleep 59 0 0:02:02 0.0% nscd/20
Total: 77 processes, 468 lwps, load averages: 1.67, 1.75, 1.77
In this case it's:
PID 15894
CPU 74%
PROGRAM/Threads java/54
You can find the name of the Portal Component using opmnctl command (in this
case it's OC4J_CustomApp)
oracle.portal-> opmnctl status
Processes in Instance: portal.oracle.portal
-------------------+--------------------+---------+---------
ias-component | process-type | pid | status
-------------------+--------------------+---------+---------
LogLoader | logloaderd | N/A | Down
dcm-daemon | dcm-daemon | 22734 | Alive
DSA | DSA | N/A | Down
OC4J | home | 15891 | Alive
OC4J | OC4J_Portal | 15892 | Alive
OC4J | OC4J_CustomApp | 15894 | Alive
HTTP_Server | HTTP_Server | 15887 | Alive
oracle.portal->
## Drill down to thread utilization for the PID in question
##
oracle.portal-> prstat -Lm -p 15894
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
15894 oracle 61 11 0.1 0.0 0.0 24 0.0 4.2 60K 1K .2M 356 java/42
15894 oracle 56 11 0.1 0.0 0.0 13 0.0 9.2 60K 843 .2M 283 java/28
15894 oracle 22 0.2 0.0 0.0 0.0 78 0.0 0.4 768 294 1K 1 java/2
15894 oracle 0.0 0.1 0.0 0.0 0.0 100 0.0 0.2 650 1 674 6 java/1
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.1 158 2 87 0 java/10
15894 oracle 0.0 0.0 0.0 0.0 0.0 3.3 96 0.2 16 1 7 0 java/17
15894 oracle 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 2 0 2 0 java/3
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/46
15894 oracle 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0 0 0 0 java/117
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/116
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/115
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/114
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/113
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/102
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/112
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/34
15894 oracle 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0 0 0 0 java/100
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/99
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/88
15894 oracle 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0 0 0 0 java/147
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/86
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/85
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/25
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/26
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/84
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/22
15894 oracle 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0 0 0 0 java/20
15894 oracle 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0 0 0 0 java/19
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/18
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/16
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/15
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/14
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/13
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 2 0 1 0 java/12
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/21
15894 oracle 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0 0 0 0 java/9
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/8
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/7
15894 oracle 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0 0 0 0 java/6
15894 oracle 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 java/5
15894 oracle 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0 0 0 0 java/4
Total: 1 processes, 54 lwps, load averages: 1.89, 1.87, 1.80
Top CPU hogs are:
java/28
java/42
## Run pstack on the main process ID (15894)
##
oracle.portal-> pstack 15894 > /tmp/pstack_15894.txt
oracle.portal->
vi /tmp/pstack_15894.txt (search for lwp# 28 and 42)
----------------- lwp# 28 / thread# 28 --------------------
ff32033c lwp_mutex_lock (35700)
fecca928 __1cFMutexbClock_without_safepoint_check6M_v_ (356d0, 1, ad8d08, f0cfee10, 0, 0) + 48
fedb49f4 __1cUSafepointSynchronizeFblock6FpnKJavaThread__v_ (ad8d08, 357d0, ad8d08, 35768, 4b80e0, 0) + 234
fed94320 __1cHMonitorEwait6Mil_i_ (0, 0, cb62c0, 4000, 4194, 0) + 5c0
fee2cdc8 __1cIVMThreadHexecute6FpnMVM_Operation__v_ (4518, ad8d08, f0cff388, ff1d4c00, ff182000, 0) + 26c
fee7bbe8 __1cODeoptimizationNuncommon_trap6FpnKJavaThread_i_pn0ALUnrollBlock__ (ad8d08, fffffffe, fffffffe, 2000, f66d14e0,
f9c35e58) + 8b0
f9c32e1c ???????? (fffffffe, f9fc9108, 0, 2000, fea62000, f9c35e58)
f9c35e58 ???????? (f345d130, dd, 74d070, cfc3bc, 0, 0)
f9ca5004 ???????? (f345d130, b6, 0, f9c152a0, 9a0813f, f0cff3b0)
f9c05804 ???????? (f3082548, b6, 1ad, f9c15e98, 8, f0cff438)
f9c05804 ???????? (f0cff5e4, b6, 0, f9c15e50, c, f0cff4d8)
f9c05804 ???????? (f0cff684, f5a278a8, 0, f9c15e50, c, f0cff580)
f9c05a8c ???????? (f0cff784, b7, 0, f9c163d0, c, f0cff610)
f9c05804 ???????? (f0cff808, f2fda930, 0, f9c160d0, c, f0cff718)
f9c46680 ???????? (f1bd80a0, f30830b0, f30831f8, f30831f8, c, f0cff7c0)
f9ed8808 ???????? (f6262b60, f3026ee8, f3030f10, f30831f8, f5e65798, f306c190)
f9eb709c ???????? (f3083278, b6, 0, f9c15e50, 8, f0cff848)
f9c056e4 ???????? (f0cffa1c, b6, 0, f9c15e98, 1c, f0cff8c8)
f9c05804 ???????? (f0cffaa4, f58212b8, 0, f9c16250, 8, f0cff9b8)
f9c05a8c ???????? (f0cffb24, f58212b8, 0, f9c163d0, 4, f0cffa40)
f9c05a8c ???????? (f0cffb9c, 0, 0, f9c163d0, 4, f0cffac8)
f9c0010c ???????? (f0cffc28, f0cffe90, a, f5821ca0, 4, f0cffb40)
fed5bd28 __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ (f0cffe88, f0cffcf0, f0
cffda8, ad8d08, ad8d08, f0cffd00) + 27c
fee4a5b4 __1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_ (ff18200
0, 7870b0, f0cffd9c, f0cffd98, f0cffda8, ad8d08) + 164
fee5d810 __1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_ (f0cffe88, f0cffe84,
f0cffe7c, f0cffe74, f0cffe6c, ad8d08) + 6c
fee6eb60 __1cMthread_entry6FpnKJavaThread_pnGThread__v_ (ad8d08, ad8d08, d79f38, 7870b0, 319e24, fee68140) + 128
fee68168 __1cKJavaThreadDrun6M_v_ (ad8d08, 1c, 40, 0, 40, 0) + 284
fee64648 _start (ad8d08, 0, 0, 0, 0, 0) + 134
ff395370 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 42 / thread# 42 --------------------
ff32033c lwp_mutex_lock (35700)
fecca928 __1cFMutexbClock_without_safepoint_check6M_v_ (356d0, 1, e47b40, ebbfee30, 0, 0) + 48
fedb4920 __1cUSafepointSynchronizeFblock6FpnKJavaThread__v_ (e47b40, 1, e47b40, 0, 0, 0) + 160
fee3d730 __1cFMutexbLwait_for_lock_blocking_implementation6MpnKJavaThread__v_ (ff1cc1d0, 5000, 5280, 5000, 5100, 0) + a0
fecd53d4 __1cFMutexElock6M_v_ (357a0, 8e983e33, 5, 359e8, 4b80e0, 1) + 144
fee2cd88 __1cIVMThreadHexecute6FpnMVM_Operation__v_ (4518, 3eec0, 0, 0, 1, 0) + 22c
fed2ac5c __1cbCTwoGenerationCollectorPolicyRmem_allocate_work6MIii_pnIHeapWord__ (3edd8, ff1d4a84, ff1d4a14, ff133fa6, 4800,
49b4) + 160
fed212d8 __1cNinstanceKlassRallocate_instance6MpnGThread__pnPinstanceOopDesc__ (f5844be0, e47b40, e47b40, 2000, fea62000, 0)
+ 180
fed33300 __1cLOptoRuntimeFnew_C6FpnMklassOopDesc_pnKJavaThread__v_ (f5844bd8, e47b40, e47b40, ff182000, 4, ebbff0f0) + 28
f9c35a58 ???????? (f5844bd8, ebbff1d8, 0, f9c152a0, 9a08144, ebbff3b0)
fa03344c ???????? (f2d65980, 1, f3464178, 1271, 1e6, f3468188)
f9f7cbcc ???????? (f3463f60, f68f0dd8, d0a, f68f11a4, 1000, f2c09c00)
f9f65970 ???????? (f3463f60, f2c09c00, f3463f30, ff182000, 4, 0)
f9d24358 ???????? (f1800010, f2c09c00, f3463f30, 0, 0, 0)
f9d80da4 ???????? (f3463f98, f3464178, 0, 2000, f66d14e0, 1046e70)
f9e1e8b8 ???????? (f5847248, f3464178, 0, 2000, fea62000, 1046e70)
f9fc9100 ???????? (f3464010, dc, d9a028, aa77ac, 0, 0)
f9ca5004 ???????? (f3464010, b6, 0, f9c152a0, 9a08144, ebbff3b0)
f9c05804 ???????? (f3082548, b6, 1ad, f9c15e98, 8, ebbff438)
f9c05804 ???????? (ebbff5e4, b6, 0, f9c15e50, c, ebbff4d8)
f9c05804 ???????? (ebbff684, f5a278a8, 0, f9c15e50, c, ebbff580)
f9c05a8c ???????? (ebbff784, b7, 0, f9c163d0, c, ebbff610)
f9c05804 ???????? (ebbff808, f2fda930, 0, f9c160d0, c, ebbff718)
f9c46680 ???????? (f20e46e0, f314a0b8, f314a1e8, f314a1e8, c, ebbff7c0)
f9ed8808 ???????? (f6262b60, f3050338, f30673f0, f314a1e8, f5e65798, f3125270)
f9eb709c ???????? (f314a280, b6, 0, f9c15e50, 8, ebbff848)
f9c056e4 ???????? (ebbffa1c, b6, 0, f9c15e98, 1c, ebbff8c8)
f9c05804 ???????? (ebbffaa4, f58212b8, 0, f9c16250, 8, ebbff9b8)
f9c05a8c ???????? (ebbffb24, f58212b8, 0, f9c163d0, 4, ebbffa40)
f9c05a8c ???????? (ebbffb9c, 0, 0, f9c163d0, 4, ebbffac8)
f9c0010c ???????? (ebbffc28, ebbffe90, a, f5821ca0, 4, ebbffb40)
fed5bd28 __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ (ebbffe88, ebbffcf0, eb
bffda8, e47b40, e47b40, ebbffd00) + 27c
fee4a5b4 __1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_ (ff18200
0, 8d8228, ebbffd9c, ebbffd98, ebbffda8, e47b40) + 164
fee5d810 __1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_ (ebbffe88, ebbffe84,
ebbffe7c, ebbffe74, ebbffe6c, e47b40) + 6c
fee6eb60 __1cMthread_entry6FpnKJavaThread_pnGThread__v_ (e47b40, e47b40, 65a9c0, 8d8228, 319e24, fee68140) + 128
fee68168 __1cKJavaThreadDrun6M_v_ (e47b40, 2a, 40, 0, 40, 0) + 284
fee64648 _start (e47b40, 0, 0, 0, 0, 0) + 134
ff395370 _lwp_start (0, 0, 0, 0, 0, 0)
## Look at the bottom of each stack for the "_start" string - the first number in ( ) is the tid:
##
lwp# 42 -> e47b40
lwp# 28 -> ad8d08
## Run a thread dump (kill -3 15894)
##
oracle.portal-> kill -3 15894
The thread dump output is in /u01/app/oracle/product/10gAS/opmn/logs/OC4J~OC4J_CustomApp~default_island~1
Here are the relevant thread dumps (search for TID's obtained from pstack --> e47b40 and ad8d08):
"AJPRequestHandler-ApplicationServerThread-23" prio=5 tid=0x00e47b40 nid=0x2a runnable [ebbff000..ebbffc28]
at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185)
- waiting to lock <0xf2d65980> (a java.nio.charset.CoderResult$2)
at java.nio.charset.CoderResult$Cache.access$200(CoderResult.java:178)
at java.nio.charset.CoderResult.unmappableForLength(CoderResult.java:233)
at sun.nio.cs.ext.DoubleByteEncoder.encodeArrayLoop(DoubleByteEncoder.java:108)
at sun.nio.cs.ext.DoubleByteEncoder.encodeLoop(DoubleByteEncoder.java:162)
at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:536)
at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(StreamEncoder.java:384)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:136)
- locked <0xf34640a8> (a java.io.OutputStreamWriter)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:191)
at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111)
- locked <0xf34640a8> (a java.io.OutputStreamWriter)
at java.io.BufferedWriter.write(BufferedWriter.java:125)
- locked <0xf34640a8> (a java.io.OutputStreamWriter)
at java.io.PrintWriter.write(PrintWriter.java:179)
- locked <0xf34640c0> (a java.io.BufferedWriter)
at com.customapp.sw.retrieveFile.doGet(retrieveFile.java:48)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:740)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
at com.evermind.server.http.ServletRequestDispatcher.invoke(ServletRequestDispatcher.java:824)
at com.evermind.server.http.ServletRequestDispatcher.forwardInternal(ServletRequestDispatcher.java:330)
at com.evermind.server.http.HttpRequestHandler.processRequest(HttpRequestHandler.java:830)
at com.evermind.server.http.AJPRequestHandler.run(AJPRequestHandler.java:224)
at com.evermind.server.http.AJPRequestHandler.run(AJPRequestHandler.java:133)
at com.evermind.util.ReleasableResourcePooledExecutor$MyWorker.run(ReleasableResourcePooledExecutor.java:192)
at java.lang.Thread.run(Thread.java:534)
"AJPRequestHandler-ApplicationServerThread-9" prio=5 tid=0x00ad8d08 nid=0x1c waiting for monitor entry [f0cff000..f0cffc28]
at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185)
- waiting to lock <0xf2d65980> (a java.nio.charset.CoderResult$2)
at java.nio.charset.CoderResult$Cache.access$200(CoderResult.java:178)
at java.nio.charset.CoderResult.unmappableForLength(CoderResult.java:233)
at sun.nio.cs.ext.DoubleByteEncoder.encodeArrayLoop(DoubleByteEncoder.java:108)
at sun.nio.cs.ext.DoubleByteEncoder.encodeLoop(DoubleByteEncoder.java:162)
at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:536)
at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(StreamEncoder.java:384)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:136)
- locked <0xf345d1c8> (a java.io.OutputStreamWriter)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:191)
at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111)
- locked <0xf345d1c8> (a java.io.OutputStreamWriter)
at java.io.BufferedWriter.write(BufferedWriter.java:125)
- locked <0xf345d1c8> (a java.io.OutputStreamWriter)
at java.io.PrintWriter.write(PrintWriter.java:179)
- locked <0xf345d1e0> (a java.io.BufferedWriter)
at com.customapp.sw.retrieveFile.doGet(retrieveFile.java:48)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:740)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
at com.evermind.server.http.ServletRequestDispatcher.invoke(ServletRequestDispatcher.java:824)
at com.evermind.server.http.ServletRequestDispatcher.forwardInternal(ServletRequestDispatcher.java:330)
at com.evermind.server.http.HttpRequestHandler.processRequest(HttpRequestHandler.java:830)
at com.evermind.server.http.AJPRequestHandler.run(AJPRequestHandler.java:224)
at com.evermind.server.http.AJPRequestHandler.run(AJPRequestHandler.java:133)
at com.evermind.util.ReleasableResourcePooledExecutor$MyWorker.run(ReleasableResourcePooledExecutor.java:192)
at java.lang.Thread.run(Thread.java:534)
##
## Conclusion:
##
## Both of these threads are CUSTOM progs that do FILE downloads:
##
## com.customapp.sw.retrieveFile.doGet
##
## they cause heavy locking and CPU spins as evident by TRUSS:
## ( note the number of calls to lwp_mutex_wakeup and lwp_mutex_lock )
oracle.portal-> truss -c -p 15894
^Csignals ------------
SIGPIPE 311
total: 311
syscall seconds calls errors
read .005 316
close .000 3
fcntl .000 3
lwp_unpark .001 311
poll .008 368 1
sigprocmask .008 622
setcontext .001 311
lwp_suspend .000 1
lwp_continue .000 1
lwp_mutex_wakeup 3.079 444240
lwp_mutex_lock 3.754 445927 3
lwp_cond_wait .016 1001 495
lwp_cond_signal .004 304
lwp_cond_broadcast .001 133
accept .000 3
send .005 325 311
setsockopt .000 6
-------- ------ ----
sys totals: 6.888 893875 810
usr time: 4.413
elapsed: 20.490
[edited by: Vitaliy at 17:55 (CST) on Jul. 03, 2007]
|
9772
2009-JAN-07 18:22:33
|
||||
|
Why do you do kill -3 will this not kill the app, ow to debug without killing the app.....
|
9774
2009-JAN-07 18:27:12
|
||||
|
kill -3 will just generate a debug file not kill the process ...
|
9775
2009-JAN-07 18:32:54
|
||||
|
> kill -3 will just generate a debug file not kill the process ...
From man signal.h (on Solaris 10) The signals currently defined by
<signal.h> are as follows:
Name Value Default Event
SIGHUP 1 Exit Hangup (see termio(7I))
SIGINT 2 Exit Interrupt (see termio(7I))
SIGQUIT 3 Core Quit (see termio(7I))
So the action is to generate a core.....
|
9776
2009-JAN-07 18:48:58
|
||||
|
I did this writeup on: # uname -a SunOS io 5.9 Generic_118558-39 sun4u sparc SUNW,Sun-Fire-V240 Here's more details about "kill -3" "method" An excerpt from: solaris_performanceandtools_ch3_processes.pdf In previous versions of the JVM, one option is to dump all the stacks on the running VM by sending a SIGQUIT (signal number 3) to the Java process with the kill command. This dumps the stacks for all VM threads to the standard error as shown below: # kill -3 <pid> Full thread dump Java HotSpot(TM) Client VM (1.4.1_06-b01 mixed mode): "Signal Dispatcher" daemon prio=10 tid=0xba6a8 nid=0x7 waiting on condition [0..0] "Finalizer" daemon prio=8 tid=0xb48b8 nid=0x4 in Object.wait() [f2b7f000..f2b7fc24] at java.lang.Object.wait(Native Method) - waiting on <f2c00490> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111) - locked <f2c00490> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) "Reference Handler" daemon prio=10 tid=0xb2f88 nid=0x3 in Object.wait() [facff000..facffc24] at java.lang.Object.wait(Native Method) - waiting on <f2c00380> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:426) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:113) - locked <f2c00380> (a java.lang.ref.Reference$Lock) "main" prio=5 tid=0x2c240 nid=0x1 runnable [ffbfe000..ffbfe5fc] at testMain.doit2(testMain.java:12) at testMain.main(testMain.java:64) "VM Thread" prio=5 tid=0xb1b30 nid=0x2 runnable "VM Periodic Task Thread" prio=10 tid=0xb9408 nid=0x5 runnable "Suspend Checker Thread" prio=10 tid=0xb9d58 nid=0x6 runnable If the top of the stack for a number of threads terminates in a monitor call, this is the place to drill down and determine what resource is being contended. Sometimes removing a lock that protects a hot structure can require many architectural changes that are not possible. The lock might even be in a third-party library over which you have no control. In such cases, multiple instances of the application are probably the best way to achieve scaling. PS: do you have a test box to try it out? That would clear it up :)
[edited by: Vitaliy at 18:49 (CST) on Jan. 07, 2009]
|
9777
2009-JAN-07 18:59:31
|
||||
|
> I did this writeup on: > > # uname -a > SunOS io 5.9 Generic_118558-39 sun4u sparc SUNW,Sun-Fire-V240 > > > > Here's more details about "kill -3" "method" > > An excerpt from: > > solaris_performanceandtools_ch3_processes.pdf > > > In previous versions of the JVM, one option is to dump all the stacks on > the running > VM by sending a SIGQUIT (signal number 3) to the Java process with the > kill command. This dumps the stacks for all VM threads to the standard > error as > shown below: > > > # kill -3 <pid> > Full thread dump Java HotSpot(TM) Client VM (1.4.1_06-b01 mixed > mode): > "Signal Dispatcher" daemon prio=10 tid=0xba6a8 nid=0x7 waiting on > condition > [0..0] > "Finalizer" daemon prio=8 tid=0xb48b8 nid=0x4 in Object.wait() > [f2b7f000..f2b7fc24] > at java.lang.Object.wait(Native Method) > - waiting on <f2c00490> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111) > - locked <f2c00490> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127) > at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) > "Reference Handler" daemon prio=10 tid=0xb2f88 nid=0x3 in > Object.wait() > [facff000..facffc24] > at java.lang.Object.wait(Native Method) > - waiting on <f2c00380> (a java.lang.ref.Reference$Lock) > at java.lang.Object.wait(Object.java:426) > at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:113) > - locked <f2c00380> (a java.lang.ref.Reference$Lock) > "main" prio=5 tid=0x2c240 nid=0x1 runnable [ffbfe000..ffbfe5fc] > at testMain.doit2(testMain.java:12) > at testMain.main(testMain.java:64) > "VM Thread" prio=5 tid=0xb1b30 nid=0x2 runnable > "VM Periodic Task Thread" prio=10 tid=0xb9408 nid=0x5 runnable > "Suspend Checker Thread" prio=10 tid=0xb9d58 nid=0x6 runnable > > If the top of the stack for a number of threads terminates in a monitor > call, this > is the place to drill down and determine what resource is being > contended. Sometimes > removing a lock that protects a hot structure can require many > architectural > changes that are not possible. The lock might even be in a third-party > library > over which you have no control. In such cases, multiple instances of the > application > are probably the best way to achieve scaling. > > PS: do you have a test box to try it out? That would clear it up :) A few moments ago, I did in fact try this. I loved the writeup and I admire your patience. I may have a test box, I chose the Xorg process... root 807 805 0 Jan 06 ? 0:50 /usr/X11/bin/Xorg :0 -depth 24 -nobanner -auth /var/dt/A:0-fHaOKb # kill -3 807 # ps -ef | grep 807 guess it was, killed... I then tried on man signal.h (pid 1662) root 1860 1662 0 18:13:43 pts/2 0:00 man signal.h # kill -3 1662 # ps -ef | grep 1662 root 1662 1658 0 12:43:54 pts/2 0:00 ksh -o vi root 1860 1662 0 18:13:43 pts/2 0:00 man signal.h I guess it's still there..... I however did not see the core, so, I used the gcore command... # gcore 1662 gcore: core.1662 dumped and it was in my pwd... -rw-r--r-- 1 root root 1927425 Jan 7 18:51 core.1662 # pstack core.1662 core 'core.1662' of 1662: ksh -o vi d0f2add5 waitid (7, 0, 8047b10, f) d0f1c666 waitpid (ffffffff, 8047c00, c) + 75 08063b8b job_wait (744) + 169 0806df87 sh_exec (808f028, 14) + 115b 080669a0 ???????? (8047e74, 8047de4, d0ffc818, 8047dd4, 8047dd4, d0fe0c5b) 080664b0 main (3, 8047e28, 8047e38) + 6cc 080570b6 ???????? (3, 8047edc, 8047ee0, 8047edc, 0, 8047ee6) # pflags core.1662 core 'core.1662' of 1662: ksh -o vi data model = _ILP32 flags = RLC|MSACCT|MSFORK /1: flags = STOPPED|ISTOP|ASLEEP waitid(0x7,0x0,0x8047b10,0xf) why = PR_REQUESTED So I guess kill -3 is for a java thing? I read that one has to kill sub processes, is this true, a sub process is a parent for something else though!
|
9778
2009-JAN-07 19:01:03
|
||||
|
> > I did this writeup on: > > > > # uname -a > > SunOS io 5.9 Generic_118558-39 sun4u sparc SUNW,Sun-Fire-V240 > > > > > > > > Here's more details about "kill -3" "method" > > > > An excerpt from: > > > > solaris_performanceandtools_ch3_processes.pdf > > > > > > In previous versions of the JVM, one option is to dump all the stacks > on > > the running > > VM by sending a SIGQUIT (signal number 3) to the Java process with the > > kill command. This dumps the stacks for all VM threads to the standard > > > error as > > shown below: > > > > > > # kill -3 <pid> > > Full thread dump Java HotSpot(TM) Client VM (1.4.1_06-b01 mixed > > mode): > > "Signal Dispatcher" daemon prio=10 tid=0xba6a8 nid=0x7 waiting on > > condition > > [0..0] > > "Finalizer" daemon prio=8 tid=0xb48b8 nid=0x4 in Object.wait() > > [f2b7f000..f2b7fc24] > > at java.lang.Object.wait(Native Method) > > - waiting on <f2c00490> (a java.lang.ref.ReferenceQueue$Lock) > > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111) > > - locked <f2c00490> (a java.lang.ref.ReferenceQueue$Lock) > > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127) > > at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) > > "Reference Handler" daemon prio=10 tid=0xb2f88 nid=0x3 in > > Object.wait() > > [facff000..facffc24] > > at java.lang.Object.wait(Native Method) > > - waiting on <f2c00380> (a java.lang.ref.Reference$Lock) > > at java.lang.Object.wait(Object.java:426) > > at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:113) > > - locked <f2c00380> (a java.lang.ref.Reference$Lock) > > "main" prio=5 tid=0x2c240 nid=0x1 runnable [ffbfe000..ffbfe5fc] > > at testMain.doit2(testMain.java:12) > > at testMain.main(testMain.java:64) > > "VM Thread" prio=5 tid=0xb1b30 nid=0x2 runnable > > "VM Periodic Task Thread" prio=10 tid=0xb9408 nid=0x5 runnable > > "Suspend Checker Thread" prio=10 tid=0xb9d58 nid=0x6 runnable > > > > If the top of the stack for a number of threads terminates in a > monitor > > call, this > > is the place to drill down and determine what resource is being > > contended. Sometimes > > removing a lock that protects a hot structure can require many > > architectural > > changes that are not possible. The lock might even be in a third-party > > > library > > over which you have no control. In such cases, multiple instances of > the > > application > > are probably the best way to achieve scaling. > > > > PS: do you have a test box to try it out? That would clear it up :) > > A few moments ago, I did in fact try this. > I loved the writeup and I admire your patience. I may have a test box, I > chose the Xorg process... > root 807 805 0 Jan 06 ? 0:50 /usr/X11/bin/Xorg :0 -depth > 24 -nobanner -auth /var/dt/A:0-fHaOKb > > # kill -3 807 > # ps -ef | grep 807 > guess it was, killed... > > I then tried on man signal.h (pid 1662) > root 1860 1662 0 18:13:43 pts/2 0:00 man signal.h > > # kill -3 1662 > # ps -ef | grep 1662 > root 1662 1658 0 12:43:54 pts/2 0:00 ksh -o vi > root 1860 1662 0 18:13:43 pts/2 0:00 man signal.h > > I guess it's still there..... > I however did not see the core, so, I used the gcore command... > > # gcore 1662 > gcore: core.1662 dumped > > and it was in my pwd... > -rw-r--r-- 1 root root 1927425 Jan 7 18:51 core.1662 > > > > # pstack core.1662 > core 'core.1662' of 1662: ksh -o vi > d0f2add5 waitid (7, 0, 8047b10, f) > d0f1c666 waitpid (ffffffff, 8047c00, c) + 75 > 08063b8b job_wait (744) + 169 > 0806df87 sh_exec (808f028, 14) + 115b > 080669a0 ???????? (8047e74, 8047de4, d0ffc818, 8047dd4, 8047dd4, d0fe0c5b) > 080664b0 main (3, 8047e28, 8047e38) + 6cc > 080570b6 ???????? (3, 8047edc, 8047ee0, 8047edc, 0, 8047ee6) > # pflags core.1662 > core 'core.1662' of 1662: ksh -o vi > data model = _ILP32 flags = RLC|MSACCT|MSFORK > /1: flags = STOPPED|ISTOP|ASLEEP waitid(0x7,0x0,0x8047b10,0xf) > why = PR_REQUESTED > > So I guess kill -3 is for a java thing? > I read that one has to kill sub processes, is this true, a sub process is a > parent for something else though! # uname -a SunOS xxx 5.10 Generic_137138-09 i86pc i386 i86pc # # cat /etc/release Solaris 10 10/08 s10x_u6wos_07b X86 Copyright 2008 Sun Microsystems, Inc. All Rights Reserved. Use is subject to license terms. Assembled 27 October 2008 # # prtconf System Configuration: Sun Microsystems i86pc Memory size: 1024 Megabytes
|
9779
2009-JAN-07 19:05:01
|
||||
|
Interesting ... I think you need to try it on a java process to be certain what "kill -3" now does on Solaris 10 ... Thanks for all the details!
|
9780
2009-JAN-07 19:16:50
|
||||
|
> Interesting ... I think you need to try it on a java process to be certain > what "kill -3" now does on Solaris 10 ... > > Thanks for all the details! I will, I'll do it on a tomcat instance....some questions from that document you mentioned..., it says: "If the top of the stack for a number of threads terminates in a monitor call, this is the place to drill down and determine what resource is being contended. Some- times removing a lock that protects a hot structure can require many architec- tural changes that are not possible. The lock might even be in a third-party library over which you have no control. In such cases, multiple instances of the applica- tion are probably the best way to achieve scaling." What is a "monitor call" By, "multiple instances of the applica- tion are probably the best way to achieve scaling." do they mean a failover scenario, in which if one fails, another is available...that would be high(er!) availibility...how is scaling relevant?
|
9782
2009-JAN-07 19:35:52
|
||||
|
> some questions from that document you mentioned..., it says:
>
> "If the top of the stack for a number of threads terminates in a monitor
> call, this is the place to drill down and determine what resource
> is being contended. "
> What is a "monitor call"
This is very similar to the case we had with our download procedure -- it
was taking too long and we had too many requests for it under the same JVM,
so there were number of locks and waits for it. Here's the pstack output:
----------------- lwp# 28 / thread# 28 --------------------
ff32033c lwp_mutex_lock (35700)
fecca928 __1cFMutexbClock_without_safepoint_check6M_v_ (356d0, 1, ad8d08, f0cfee10, 0, 0) + 48
fedb49f4 __1cUSafepointSynchronizeFblock6FpnKJavaThread__v_ (ad8d08, 357d0, ad8d08, 35768, 4b80e0, 0) + 234
fed94320 __1cHMonitorEwait6Mil_i_ (0, 0, cb62c0, 4000, 4194, 0) + 5c0
^^^^^^^^^^^^
And here's what it looks like when it's dumpped using "kill -3":
"AJPRequestHandler-ApplicationServerThread-23" prio=5 tid=0x00e47b40 nid=0x2a runnable [ebbff000..ebbffc28]
at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185)
- waiting to lock <0xf2d65980> (a java.nio.charset.CoderResult$2)
"AJPRequestHandler-ApplicationServerThread-9" prio=5 tid=0x00ad8d08 nid=0x1c waiting for monitor entry [f0cff000..f0cffc28]
at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185)
- waiting to lock <0xf2d65980> (a java.nio.charset.CoderResult$2)
> "The lock might even be in a third-party library over which you have no
> control. In such cases, multiple instances of the application are
> probably the best way to achieve scaling."
>
>
> By, "multiple instances of the application are probably the best way to achieve scaling."
> do they mean a failover scenario, in which if one fails, another is available...that would be
> high(er!) availibility...how is scaling relevant?
What they are saying is that if you cannot tune/reduce the lock
on the resource the only option you have is to create multiple JVMs and let
the APP server load-balance between them to reduce contention for the
"slow/lousy" resource :)
HTH!
|
9784
2009-JAN-07 20:02:00
|
||||
|
> > some questions from that document you mentioned..., it says: > > > > "If the top of the stack for a number of threads terminates in a > monitor > > call, this is the place to drill down and determine what resource > > is being contended. " > > What is a "monitor call" > > > This is very similar to the case we had with our download procedure -- it > was taking too long and we had too many requests for it under the same JVM, > > so there were number of locks and waits for it. Here's the pstack output: > > > ----------------- lwp# 28 / thread# 28 -------------------- > ff32033c lwp_mutex_lock (35700) > fecca928 __1cFMutexbClock_without_safepoint_check6M_v_ (356d0, 1, > ad8d08, f0cfee10, 0, 0) + 48 > fedb49f4 __1cUSafepointSynchronizeFblock6FpnKJavaThread__v_ (ad8d08, > 357d0, ad8d08, 35768, 4b80e0, 0) + 234 > fed94320 __1cHMonitorEwait6Mil_i_ (0, 0, cb62c0, 4000, 4194, 0) + 5c0 > ^^^^^^^^^^^^ > > And here's what it looks like when it's dumpped using "kill -3": > > "AJPRequestHandler-ApplicationServerThread-23" prio=5 tid=0x00e47b40 > nid=0x2a runnable [ebbff000..ebbffc28] > at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185) > - waiting to lock <0xf2d65980> (a > java.nio.charset.CoderResult$2) > > > "AJPRequestHandler-ApplicationServerThread-9" prio=5 tid=0x00ad8d08 > nid=0x1c waiting for monitor entry [f0cff000..f0cffc28] > at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185) > - waiting to lock <0xf2d65980> (a > java.nio.charset.CoderResult$2) > > > > "The lock might even be in a third-party library over which you have > no > > control. In such cases, multiple instances of the application are > > probably the best way to achieve scaling." > > > > > > By, "multiple instances of the application are probably the best way to > achieve scaling." > > do they mean a failover scenario, in which if one fails, another is > available...that would be > > high(er!) availibility...how is scaling relevant? > > What they are saying is that if you cannot tune/reduce the lock > on the resource the only option you have is to create multiple JVMs and let > > the APP server load-balance between them to reduce contention for the > "slow/lousy" resource :) > > HTH! thanks! So the app server load balances, is this true for all app servers? A document, titled, Sun Gathering Debug Data for Sun Java SystemWeb Server, June 2008, from:http://dlc.sun.com/pdf/820-4983/820-4983.pdf (mentions gcore) Get core files and the output of the following commands. If a process hangs, it is helpful to compare several core iles to review the state of the threads over time. Make a copy of the core ile to a new name, wait for approximately oneminute then rerun the following commands, so that the core files are not overwritten.Do this three times to obtain three core files. Solaris: cd server-root/bin/https/bin; gcore -o /tmp/web-core web-pid; pstack /tmp/web-core Some docs are listed on: they are probably for Sun's java app/web servers, good references.....
|
9785
2009-JAN-07 20:14:48
|
||||
|
> thanks! So the app server load balances, is this true for all app servers?
>
I've only worked with Oracle APP servers and the latest one does it very nicely
-- it can either balance across the local JVM or JVMs running on clustered APP
servers. The best approach is to use hybrid solution where you load balance in
two places:
+--> jvm1
+--> app1 |--> jvm2
| +--> jvmN
F5 load balancer --> |
| +--> jvm1
+--> app2 |--> jvm2
+--> jvmN
Here, we balance across two clustered APP server nodes using LBR and each node
does it's own local balancing using multiple (local) JVMs. You have to manage
session state in the database for this to work because LBR will span both APP
nodes. On the database you run Oracle RAC and everybody is happy :)
HTH.
[edited by: Vitaliy at 12:56 (CST) on Feb. 06, 2009]
|
9786
2009-JAN-07 20:09:09
|
||||
|
> > > some questions from that document you mentioned..., it says: > > > > > > "If the top of the stack for a number of threads terminates in a > > monitor > > > call, this is the place to drill down and determine what resource > > > is being contended. " > > > What is a "monitor call" > > > > > > This is very similar to the case we had with our download procedure -- it > > > was taking too long and we had too many requests for it under the same > JVM, > > > > so there were number of locks and waits for it. Here's the pstack > output: > > > > > > ----------------- lwp# 28 / thread# 28 -------------------- > > ff32033c lwp_mutex_lock (35700) > > fecca928 __1cFMutexbClock_without_safepoint_check6M_v_ (356d0, 1, > > ad8d08, f0cfee10, 0, 0) + 48 > > fedb49f4 __1cUSafepointSynchronizeFblock6FpnKJavaThread__v_ (ad8d08, > > 357d0, ad8d08, 35768, 4b80e0, 0) + 234 > > fed94320 __1cHMonitorEwait6Mil_i_ (0, 0, cb62c0, 4000, 4194, 0) + 5c0 > > ^^^^^^^^^^^^ > > > > And here's what it looks like when it's dumpped using "kill -3": > > > > "AJPRequestHandler-ApplicationServerThread-23" prio=5 tid=0x00e47b40 > > nid=0x2a runnable [ebbff000..ebbffc28] > > at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185) > > - waiting to lock <0xf2d65980> (a > > java.nio.charset.CoderResult$2) > > > > > > "AJPRequestHandler-ApplicationServerThread-9" prio=5 tid=0x00ad8d08 > > nid=0x1c waiting for monitor entry [f0cff000..f0cffc28] > > at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185) > > - waiting to lock <0xf2d65980> (a > > java.nio.charset.CoderResult$2) > > > > > > > "The lock might even be in a third-party library over which you have > > > no > > > control. In such cases, multiple instances of the application are > > > probably the best way to achieve scaling." > > > > > > > > > By, "multiple instances of the application are probably the best way to > > > achieve scaling." > > > do they mean a failover scenario, in which if one fails, another is > > available...that would be > > > high(er!) availibility...how is scaling relevant? > > > > What they are saying is that if you cannot tune/reduce the lock > > on the resource the only option you have is to create multiple JVMs and > let > > > > the APP server load-balance between them to reduce contention for the > > "slow/lousy" resource :) > > > > HTH! > > > thanks! So the app server load balances, is this true for all app servers? > > A document, titled, Sun Gathering Debug Data for Sun Java SystemWeb Server, > June 2008, from:http://dlc.sun.com/pdf/820-4983/820-4983.pdf > (mentions gcore) > > Get core files and the output of the following commands. > If a process hangs, it is helpful to compare several core iles to review > the state of the threads over time. Make a copy of the core ile to a new > name, wait for approximately oneminute then rerun the following commands, > so that the core files are not overwritten.Do this three times to obtain > three core files. > > Solaris: > cd server-root/bin/https/bin; > gcore -o /tmp/web-core web-pid; > pstack /tmp/web-core > > Some docs are listed on: they are probably for Sun's java app/web servers, > good references..... lots of docs here: http://docs.sun.com/...ocs/prod/websrv?l=en