Search: For:
Browsing Single Category
www.dbatoolz.com ORACLE DBA Forums Solutions › Topic Id: 2476 | Permalink

Drill down to Java thread consuming CPU on Solaris

Topic ID: 2476
Created By: 2007-JUL-03 17:49:02 [Vitaliy]
Updated By: 2009-FEB-06 12:56:29 [Vitaliy]
Status: Open
Severity: Normal
Read Only: No
7980
2007-JUL-03 17:49:02
Moderator
 
 
Registered On: Mar 2006
Total Posts: 267
The following is an example of tracing CPU utilization issues with JAVA all 
the way to the thread number and it's running code.  JAVA application in 
question is deployed on Oracle Portal.

## Run prstat to find the PID that consumes the most CPU

   oracle.portal-> prstat
   
      PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
    15894 oracle    232M  137M cpu1     0   10  71:19:44  74% java/54
    25502 oracle    155M   66M sleep   59    0  17:14:41 0.2% emagent/17
    15892 oracle    449M  248M run     58    0   0:13:28 0.2% java/84
    18154 oracle   4672K 4360K cpu0    59    0   0:00:00 0.1% prstat/1
    15893 oracle    180M   83M run     59    0   0:03:09 0.0% java/45
    15891 oracle    150M   46M run     59    0   0:01:22 0.0% java/22
    22753 oracle     35M   12M sleep   59    0   0:00:21 0.0% httpd/3
    22750 oracle     35M   15M sleep   59    0   0:00:07 0.0% httpd/3
    22745 oracle     34M 2640K sleep   59    0   0:00:54 0.0% httpd/3
    22754 oracle     35M   12M sleep   59    0   0:00:03 0.0% httpd/3
    22741 oracle   1104K  920K sleep   59    0   0:00:00 0.0% sh/1
    22756 oracle     35M   12M sleep   59    0   0:00:04 0.0% httpd/3
    22775 oracle     35M   15M sleep   59    0   0:00:04 0.0% httpd/3
    16951 oracle   1912K 1272K sleep   59    0   0:00:00 0.0% ksh/1
      328 root     1824K  456K sleep   59    0   0:00:00 0.0% ttymon/1
      316 root     4928K 1664K sleep   59    0   0:00:00 0.0% vold/2
      331 root     1824K  608K sleep   59    0   0:00:00 0.0% ttymon/1
      307 root     5344K 1144K sleep   59    0   0:00:00 0.0% dmispd/2
    25501 oracle   6960K 4776K sleep   59    0   0:00:18 0.0% perl/1
      309 root     5896K 1312K sleep   59    0   0:00:00 0.0% snmpXdmid/2
      301 root     5176K 1088K sleep   59    0   0:00:00 0.0% dtlogin/1
      296 root     4416K  784K sleep   59    0   0:00:03 0.0% snmpdx/1
      300 root     5088K 1744K sleep   59    0   0:00:02 0.0% mibiisa/7
      347 root     7576K 5576K sleep   59    0   0:00:32 0.0% svlistener/1
      324 root     4736K 2744K sleep   59    0   0:06:20 0.0% svpmgr/1
      247 root     1768K  264K sleep   59    0   0:00:00 0.0% smcboot/1
      257 root     5488K  904K sleep   59    0   0:00:00 0.0% htt_server/2
      246 root     1768K  264K sleep   59    0   0:00:00 0.0% smcboot/1
      243 root     1768K  408K sleep   59    0   0:00:00 0.0% smcboot/1
      221 root     1088K  560K sleep   59    0   0:00:01 0.0% utmpd/1
      228 root     4536K 1336K sleep   59    0   0:00:00 0.0% sendmail/1
      330 root     3200K 1040K sleep   59    0   0:00:08 0.0% svcmgr/1
      232 root     3336K 1104K sleep   59    0   0:06:55 0.0% sshd/1
      210 root     1488K  544K sleep   59    0   0:00:00 0.0% powerd/3
      199 root     4544K  744K sleep   59    0   0:00:00 0.0% cron/1
      182 root     5752K 1640K sleep   59    0   0:00:03 0.0% syslogd/13
      226 smmsp    4512K  920K sleep   59    0   0:00:00 0.0% sendmail/1
      160 daemon   2512K  880K sleep   59    0   0:00:00 0.0% statd/1
      158 root     2224K  696K sleep   59    0   0:00:00 0.0% lockd/2
      146 root     2504K 1432K sleep   59    0   0:00:08 0.0% inetd/1
      200 root     3424K 2456K sleep   59    0   0:02:02 0.0% nscd/20
   Total: 77 processes, 468 lwps, load averages: 1.67, 1.75, 1.77

In this case it's:
  PID			15894
  CPU			74%
  PROGRAM/Threads	java/54   

You can find the name of the Portal Component using opmnctl command (in this 
case it's OC4J_CustomApp)

   oracle.portal-> opmnctl status
   
   Processes in Instance: portal.oracle.portal
   -------------------+--------------------+---------+---------
   ias-component      | process-type       |     pid | status
   -------------------+--------------------+---------+---------
   LogLoader          | logloaderd         |     N/A | Down
   dcm-daemon         | dcm-daemon         |   22734 | Alive
   DSA                | DSA                |     N/A | Down
   OC4J               | home               |   15891 | Alive
   OC4J               | OC4J_Portal        |   15892 | Alive
   OC4J               | OC4J_CustomApp     |   15894 | Alive
   HTTP_Server        | HTTP_Server        |   15887 | Alive
   
   oracle.portal->


## Drill down to thread utilization for the PID in question
##

oracle.portal-> prstat -Lm -p 15894

      PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
    15894 oracle    61  11 0.1 0.0 0.0  24 0.0 4.2 60K  1K .2M 356 java/42
    15894 oracle    56  11 0.1 0.0 0.0  13 0.0 9.2 60K 843 .2M 283 java/28
    15894 oracle    22 0.2 0.0 0.0 0.0  78 0.0 0.4 768 294  1K   1 java/2
    15894 oracle   0.0 0.1 0.0 0.0 0.0 100 0.0 0.2 650   1 674   6 java/1
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.1 158   2  87   0 java/10
    15894 oracle   0.0 0.0 0.0 0.0 0.0 3.3  96 0.2  16   1   7   0 java/17
    15894 oracle   0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   2   0   2   0 java/3
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/46
    15894 oracle   0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/117
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/116
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/115
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/114
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/113
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/102
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/112
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/34
    15894 oracle   0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/100
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/99
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/88
    15894 oracle   0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/147
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/86
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/85
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/25
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/26
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/84
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/22
    15894 oracle   0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/20
    15894 oracle   0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/19
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/18
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/16
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/15
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/14
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/13
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0   1   0 java/12
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/21
    15894 oracle   0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/9
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/8
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/7
    15894 oracle   0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/6
    15894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/5
    15894 oracle   0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/4
   Total: 1 processes, 54 lwps, load averages: 1.89, 1.87, 1.80
   
Top CPU hogs are:

   java/28
   java/42


## Run pstack on the main process ID (15894)
##

oracle.portal-> pstack 15894 > /tmp/pstack_15894.txt
oracle.portal->

vi /tmp/pstack_15894.txt (search for lwp# 28 and 42)


   -----------------  lwp# 28 / thread# 28  --------------------
    ff32033c lwp_mutex_lock (35700)
    fecca928 __1cFMutexbClock_without_safepoint_check6M_v_ (356d0, 1, ad8d08, f0cfee10, 0, 0) + 48
    fedb49f4 __1cUSafepointSynchronizeFblock6FpnKJavaThread__v_ (ad8d08, 357d0, ad8d08, 35768, 4b80e0, 0) + 234
    fed94320 __1cHMonitorEwait6Mil_i_ (0, 0, cb62c0, 4000, 4194, 0) + 5c0
    fee2cdc8 __1cIVMThreadHexecute6FpnMVM_Operation__v_ (4518, ad8d08, f0cff388, ff1d4c00, ff182000, 0) + 26c
    fee7bbe8 __1cODeoptimizationNuncommon_trap6FpnKJavaThread_i_pn0ALUnrollBlock__ (ad8d08, fffffffe, fffffffe, 2000, f66d14e0,
   f9c35e58) + 8b0
    f9c32e1c ???????? (fffffffe, f9fc9108, 0, 2000, fea62000, f9c35e58)
    f9c35e58 ???????? (f345d130, dd, 74d070, cfc3bc, 0, 0)
    f9ca5004 ???????? (f345d130, b6, 0, f9c152a0, 9a0813f, f0cff3b0)
    f9c05804 ???????? (f3082548, b6, 1ad, f9c15e98, 8, f0cff438)
    f9c05804 ???????? (f0cff5e4, b6, 0, f9c15e50, c, f0cff4d8)
    f9c05804 ???????? (f0cff684, f5a278a8, 0, f9c15e50, c, f0cff580)
    f9c05a8c ???????? (f0cff784, b7, 0, f9c163d0, c, f0cff610)
    f9c05804 ???????? (f0cff808, f2fda930, 0, f9c160d0, c, f0cff718)
    f9c46680 ???????? (f1bd80a0, f30830b0, f30831f8, f30831f8, c, f0cff7c0)
    f9ed8808 ???????? (f6262b60, f3026ee8, f3030f10, f30831f8, f5e65798, f306c190)
    f9eb709c ???????? (f3083278, b6, 0, f9c15e50, 8, f0cff848)
    f9c056e4 ???????? (f0cffa1c, b6, 0, f9c15e98, 1c, f0cff8c8)
    f9c05804 ???????? (f0cffaa4, f58212b8, 0, f9c16250, 8, f0cff9b8)
    f9c05a8c ???????? (f0cffb24, f58212b8, 0, f9c163d0, 4, f0cffa40)
    f9c05a8c ???????? (f0cffb9c, 0, 0, f9c163d0, 4, f0cffac8)
    f9c0010c ???????? (f0cffc28, f0cffe90, a, f5821ca0, 4, f0cffb40)
    fed5bd28 __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ (f0cffe88, f0cffcf0, f0
   cffda8, ad8d08, ad8d08, f0cffd00) + 27c
    fee4a5b4 __1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_ (ff18200
   0, 7870b0, f0cffd9c, f0cffd98, f0cffda8, ad8d08) + 164
    fee5d810 __1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_ (f0cffe88, f0cffe84,
    f0cffe7c, f0cffe74, f0cffe6c, ad8d08) + 6c
    fee6eb60 __1cMthread_entry6FpnKJavaThread_pnGThread__v_ (ad8d08, ad8d08, d79f38, 7870b0, 319e24, fee68140) + 128
    fee68168 __1cKJavaThreadDrun6M_v_ (ad8d08, 1c, 40, 0, 40, 0) + 284
    fee64648 _start   (ad8d08, 0, 0, 0, 0, 0) + 134
    ff395370 _lwp_start (0, 0, 0, 0, 0, 0)
   
   -----------------  lwp# 42 / thread# 42  --------------------
    ff32033c lwp_mutex_lock (35700)
    fecca928 __1cFMutexbClock_without_safepoint_check6M_v_ (356d0, 1, e47b40, ebbfee30, 0, 0) + 48
    fedb4920 __1cUSafepointSynchronizeFblock6FpnKJavaThread__v_ (e47b40, 1, e47b40, 0, 0, 0) + 160
    fee3d730 __1cFMutexbLwait_for_lock_blocking_implementation6MpnKJavaThread__v_ (ff1cc1d0, 5000, 5280, 5000, 5100, 0) + a0
    fecd53d4 __1cFMutexElock6M_v_ (357a0, 8e983e33, 5, 359e8, 4b80e0, 1) + 144
    fee2cd88 __1cIVMThreadHexecute6FpnMVM_Operation__v_ (4518, 3eec0, 0, 0, 1, 0) + 22c
    fed2ac5c __1cbCTwoGenerationCollectorPolicyRmem_allocate_work6MIii_pnIHeapWord__ (3edd8, ff1d4a84, ff1d4a14, ff133fa6, 4800,
    49b4) + 160
    fed212d8 __1cNinstanceKlassRallocate_instance6MpnGThread__pnPinstanceOopDesc__ (f5844be0, e47b40, e47b40, 2000, fea62000, 0)
    + 180
    fed33300 __1cLOptoRuntimeFnew_C6FpnMklassOopDesc_pnKJavaThread__v_ (f5844bd8, e47b40, e47b40, ff182000, 4, ebbff0f0) + 28
    f9c35a58 ???????? (f5844bd8, ebbff1d8, 0, f9c152a0, 9a08144, ebbff3b0)
    fa03344c ???????? (f2d65980, 1, f3464178, 1271, 1e6, f3468188)
    f9f7cbcc ???????? (f3463f60, f68f0dd8, d0a, f68f11a4, 1000, f2c09c00)
    f9f65970 ???????? (f3463f60, f2c09c00, f3463f30, ff182000, 4, 0)
    f9d24358 ???????? (f1800010, f2c09c00, f3463f30, 0, 0, 0)
    f9d80da4 ???????? (f3463f98, f3464178, 0, 2000, f66d14e0, 1046e70)
    f9e1e8b8 ???????? (f5847248, f3464178, 0, 2000, fea62000, 1046e70)
    f9fc9100 ???????? (f3464010, dc, d9a028, aa77ac, 0, 0)
    f9ca5004 ???????? (f3464010, b6, 0, f9c152a0, 9a08144, ebbff3b0)
    f9c05804 ???????? (f3082548, b6, 1ad, f9c15e98, 8, ebbff438)
    f9c05804 ???????? (ebbff5e4, b6, 0, f9c15e50, c, ebbff4d8)
    f9c05804 ???????? (ebbff684, f5a278a8, 0, f9c15e50, c, ebbff580)
    f9c05a8c ???????? (ebbff784, b7, 0, f9c163d0, c, ebbff610)
    f9c05804 ???????? (ebbff808, f2fda930, 0, f9c160d0, c, ebbff718)
    f9c46680 ???????? (f20e46e0, f314a0b8, f314a1e8, f314a1e8, c, ebbff7c0)
    f9ed8808 ???????? (f6262b60, f3050338, f30673f0, f314a1e8, f5e65798, f3125270)
    f9eb709c ???????? (f314a280, b6, 0, f9c15e50, 8, ebbff848)
    f9c056e4 ???????? (ebbffa1c, b6, 0, f9c15e98, 1c, ebbff8c8)
    f9c05804 ???????? (ebbffaa4, f58212b8, 0, f9c16250, 8, ebbff9b8)
    f9c05a8c ???????? (ebbffb24, f58212b8, 0, f9c163d0, 4, ebbffa40)
    f9c05a8c ???????? (ebbffb9c, 0, 0, f9c163d0, 4, ebbffac8)
    f9c0010c ???????? (ebbffc28, ebbffe90, a, f5821ca0, 4, ebbffb40)
    fed5bd28 __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ (ebbffe88, ebbffcf0, eb
   bffda8, e47b40, e47b40, ebbffd00) + 27c
    fee4a5b4 __1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_ (ff18200
   0, 8d8228, ebbffd9c, ebbffd98, ebbffda8, e47b40) + 164
    fee5d810 __1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_ (ebbffe88, ebbffe84,
    ebbffe7c, ebbffe74, ebbffe6c, e47b40) + 6c
    fee6eb60 __1cMthread_entry6FpnKJavaThread_pnGThread__v_ (e47b40, e47b40, 65a9c0, 8d8228, 319e24, fee68140) + 128
    fee68168 __1cKJavaThreadDrun6M_v_ (e47b40, 2a, 40, 0, 40, 0) + 284
    fee64648 _start   (e47b40, 0, 0, 0, 0, 0) + 134
    ff395370 _lwp_start (0, 0, 0, 0, 0, 0)
   

## Look at the bottom of each stack for the "_start" string - the first number in ( ) is the tid:
##

   lwp# 42 -> e47b40
   lwp# 28 -> ad8d08

## Run a thread dump (kill -3 15894)
##

oracle.portal-> kill -3 15894

The thread dump output is in /u01/app/oracle/product/10gAS/opmn/logs/OC4J~OC4J_CustomApp~default_island~1

Here are the relevant thread dumps (search for TID's obtained from pstack --> e47b40 and ad8d08):

   "AJPRequestHandler-ApplicationServerThread-23" prio=5 tid=0x00e47b40 nid=0x2a runnable [ebbff000..ebbffc28]
           at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185)
           - waiting to lock <0xf2d65980> (a java.nio.charset.CoderResult$2)
           at java.nio.charset.CoderResult$Cache.access$200(CoderResult.java:178)
           at java.nio.charset.CoderResult.unmappableForLength(CoderResult.java:233)
           at sun.nio.cs.ext.DoubleByteEncoder.encodeArrayLoop(DoubleByteEncoder.java:108)
           at sun.nio.cs.ext.DoubleByteEncoder.encodeLoop(DoubleByteEncoder.java:162)
           at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:536)
           at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(StreamEncoder.java:384)
           at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:136)
           - locked <0xf34640a8> (a java.io.OutputStreamWriter)
           at java.io.OutputStreamWriter.write(OutputStreamWriter.java:191)
           at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111)
           - locked <0xf34640a8> (a java.io.OutputStreamWriter)
           at java.io.BufferedWriter.write(BufferedWriter.java:125)
           - locked <0xf34640a8> (a java.io.OutputStreamWriter)
           at java.io.PrintWriter.write(PrintWriter.java:179)
           - locked <0xf34640c0> (a java.io.BufferedWriter)
           at com.customapp.sw.retrieveFile.doGet(retrieveFile.java:48)
           at javax.servlet.http.HttpServlet.service(HttpServlet.java:740)
           at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
           at com.evermind.server.http.ServletRequestDispatcher.invoke(ServletRequestDispatcher.java:824)
           at com.evermind.server.http.ServletRequestDispatcher.forwardInternal(ServletRequestDispatcher.java:330)
           at com.evermind.server.http.HttpRequestHandler.processRequest(HttpRequestHandler.java:830)
           at com.evermind.server.http.AJPRequestHandler.run(AJPRequestHandler.java:224)
           at com.evermind.server.http.AJPRequestHandler.run(AJPRequestHandler.java:133)
           at com.evermind.util.ReleasableResourcePooledExecutor$MyWorker.run(ReleasableResourcePooledExecutor.java:192)
           at java.lang.Thread.run(Thread.java:534)
   
   
   "AJPRequestHandler-ApplicationServerThread-9" prio=5 tid=0x00ad8d08 nid=0x1c waiting for monitor entry [f0cff000..f0cffc28]
           at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185)
           - waiting to lock <0xf2d65980> (a java.nio.charset.CoderResult$2)
           at java.nio.charset.CoderResult$Cache.access$200(CoderResult.java:178)
           at java.nio.charset.CoderResult.unmappableForLength(CoderResult.java:233)
           at sun.nio.cs.ext.DoubleByteEncoder.encodeArrayLoop(DoubleByteEncoder.java:108)
           at sun.nio.cs.ext.DoubleByteEncoder.encodeLoop(DoubleByteEncoder.java:162)
           at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:536)
           at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(StreamEncoder.java:384)
           at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:136)
           - locked <0xf345d1c8> (a java.io.OutputStreamWriter)
           at java.io.OutputStreamWriter.write(OutputStreamWriter.java:191)
           at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111)
           - locked <0xf345d1c8> (a java.io.OutputStreamWriter)
           at java.io.BufferedWriter.write(BufferedWriter.java:125)
           - locked <0xf345d1c8> (a java.io.OutputStreamWriter)
           at java.io.PrintWriter.write(PrintWriter.java:179)
           - locked <0xf345d1e0> (a java.io.BufferedWriter)
           at com.customapp.sw.retrieveFile.doGet(retrieveFile.java:48)
           at javax.servlet.http.HttpServlet.service(HttpServlet.java:740)
           at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
           at com.evermind.server.http.ServletRequestDispatcher.invoke(ServletRequestDispatcher.java:824)
           at com.evermind.server.http.ServletRequestDispatcher.forwardInternal(ServletRequestDispatcher.java:330)
           at com.evermind.server.http.HttpRequestHandler.processRequest(HttpRequestHandler.java:830)
           at com.evermind.server.http.AJPRequestHandler.run(AJPRequestHandler.java:224)
           at com.evermind.server.http.AJPRequestHandler.run(AJPRequestHandler.java:133)
           at com.evermind.util.ReleasableResourcePooledExecutor$MyWorker.run(ReleasableResourcePooledExecutor.java:192)
           at java.lang.Thread.run(Thread.java:534)

##
## Conclusion:
##
## Both of these threads are CUSTOM progs that do FILE downloads:
##
##    com.customapp.sw.retrieveFile.doGet
##
## they cause heavy locking and CPU spins as evident by TRUSS:
##   ( note the number of calls to lwp_mutex_wakeup and lwp_mutex_lock )

oracle.portal-> truss -c -p 15894
^Csignals ------------
SIGPIPE          311
total:           311

syscall               seconds   calls  errors
read                     .005     316
close                    .000       3
fcntl                    .000       3
lwp_unpark               .001     311
poll                     .008     368      1
sigprocmask              .008     622
setcontext               .001     311
lwp_suspend              .000       1
lwp_continue             .000       1
lwp_mutex_wakeup        3.079  444240
lwp_mutex_lock          3.754  445927      3
lwp_cond_wait            .016    1001    495
lwp_cond_signal          .004     304
lwp_cond_broadcast       .001     133
accept                   .000       3
send                     .005     325    311
setsockopt               .000       6
                     --------  ------   ----
sys totals:             6.888  893875    810
usr time:               4.413
elapsed:               20.490
[edited by: Vitaliy at 17:55 (CST) on Jul. 03, 2007]
9772
2009-JAN-07 18:22:33
User
 
 
Registered On: Jan 2009
Total Posts: 7
Why do you do kill -3  will this not kill the app, ow to debug without killing 
the app.....
9774
2009-JAN-07 18:27:12
Moderator
 
 
Registered On: Mar 2006
Total Posts: 267
kill -3 will just generate a debug file not kill the process ...
9775
2009-JAN-07 18:32:54
User
 
 
Registered On: Jan 2009
Total Posts: 7
> kill -3 will just generate a debug file not kill the process ...

 From man signal.h (on Solaris 10)    The signals currently defined by 
<signal.h> are as follows:

     Name             Value   Default    Event
     SIGHUP           1       Exit       Hangup (see termio(7I))
     SIGINT           2       Exit       Interrupt (see termio(7I))
     SIGQUIT          3       Core       Quit (see termio(7I))

So the action is to generate a core.....
9776
2009-JAN-07 18:48:58
Moderator
 
 
Registered On: Mar 2006
Total Posts: 267
I did this writeup on:

# uname -a
SunOS io 5.9 Generic_118558-39 sun4u sparc SUNW,Sun-Fire-V240



Here's more details about "kill -3" "method"

An excerpt from:

   solaris_performanceandtools_ch3_processes.pdf


   In previous versions of the JVM, one option is to dump all the stacks on the running
   VM by sending a SIGQUIT (signal number 3) to the Java process with the
   kill command. This dumps the stacks for all VM threads to the standard error as
   shown below:
   
   
      # kill -3 <pid>
      Full thread dump Java HotSpot(TM) Client VM (1.4.1_06-b01 mixed mode):
      "Signal Dispatcher" daemon prio=10 tid=0xba6a8 nid=0x7 waiting on condition
      [0..0]
      "Finalizer" daemon prio=8 tid=0xb48b8 nid=0x4 in Object.wait()
      [f2b7f000..f2b7fc24]
      at java.lang.Object.wait(Native Method)
      - waiting on <f2c00490> (a java.lang.ref.ReferenceQueue$Lock)
      at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
      - locked <f2c00490> (a java.lang.ref.ReferenceQueue$Lock)
      at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)
      at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
      "Reference Handler" daemon prio=10 tid=0xb2f88 nid=0x3 in Object.wait()
      [facff000..facffc24]
      at java.lang.Object.wait(Native Method)
      - waiting on <f2c00380> (a java.lang.ref.Reference$Lock)
      at java.lang.Object.wait(Object.java:426)
      at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:113)
      - locked <f2c00380> (a java.lang.ref.Reference$Lock)
      "main" prio=5 tid=0x2c240 nid=0x1 runnable [ffbfe000..ffbfe5fc]
      at testMain.doit2(testMain.java:12)
      at testMain.main(testMain.java:64)
      "VM Thread" prio=5 tid=0xb1b30 nid=0x2 runnable
      "VM Periodic Task Thread" prio=10 tid=0xb9408 nid=0x5 runnable
      "Suspend Checker Thread" prio=10 tid=0xb9d58 nid=0x6 runnable
   
   If the top of the stack for a number of threads terminates in a monitor call, this
   is the place to drill down and determine what resource is being contended. Sometimes
   removing a lock that protects a hot structure can require many architectural
   changes that are not possible. The lock might even be in a third-party library
   over which you have no control. In such cases, multiple instances of the application
   are probably the best way to achieve scaling.

PS: do you have a test box to try it out?  That would clear it up :)
[edited by: Vitaliy at 18:49 (CST) on Jan. 07, 2009]
9777
2009-JAN-07 18:59:31
User
 
 
Registered On: Jan 2009
Total Posts: 7
> I did this writeup on:
> 
> # uname -a
> SunOS io 5.9 Generic_118558-39 sun4u sparc SUNW,Sun-Fire-V240
> 
> 
> 
> Here's more details about "kill -3" "method"
> 
> An excerpt from:
> 
>    solaris_performanceandtools_ch3_processes.pdf
> 
> 
>    In previous versions of the JVM, one option is to dump all the stacks on 
> the running
>    VM by sending a SIGQUIT (signal number 3) to the Java process with the
>    kill command. This dumps the stacks for all VM threads to the standard 
> error as
>    shown below:
>    
>    
>       # kill -3 <pid>
>       Full thread dump Java HotSpot(TM) Client VM (1.4.1_06-b01 mixed 
> mode):
>       "Signal Dispatcher" daemon prio=10 tid=0xba6a8 nid=0x7 waiting on 
> condition
>       [0..0]
>       "Finalizer" daemon prio=8 tid=0xb48b8 nid=0x4 in Object.wait()
>       [f2b7f000..f2b7fc24]
>       at java.lang.Object.wait(Native Method)
>       - waiting on <f2c00490> (a java.lang.ref.ReferenceQueue$Lock)
>       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
>       - locked <f2c00490> (a java.lang.ref.ReferenceQueue$Lock)
>       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)
>       at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>       "Reference Handler" daemon prio=10 tid=0xb2f88 nid=0x3 in 
> Object.wait()
>       [facff000..facffc24]
>       at java.lang.Object.wait(Native Method)
>       - waiting on <f2c00380> (a java.lang.ref.Reference$Lock)
>       at java.lang.Object.wait(Object.java:426)
>       at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:113)
>       - locked <f2c00380> (a java.lang.ref.Reference$Lock)
>       "main" prio=5 tid=0x2c240 nid=0x1 runnable [ffbfe000..ffbfe5fc]
>       at testMain.doit2(testMain.java:12)
>       at testMain.main(testMain.java:64)
>       "VM Thread" prio=5 tid=0xb1b30 nid=0x2 runnable
>       "VM Periodic Task Thread" prio=10 tid=0xb9408 nid=0x5 runnable
>       "Suspend Checker Thread" prio=10 tid=0xb9d58 nid=0x6 runnable
>    
>    If the top of the stack for a number of threads terminates in a monitor 
> call, this
>    is the place to drill down and determine what resource is being 
> contended. Sometimes
>    removing a lock that protects a hot structure can require many 
> architectural
>    changes that are not possible. The lock might even be in a third-party 
> library
>    over which you have no control. In such cases, multiple instances of the 
> application
>    are probably the best way to achieve scaling.
> 
> PS: do you have a test box to try it out?  That would clear it up :)

A few moments ago, I did in fact try this.
I loved the writeup and I admire your patience. I may have a test box, I chose 
the Xorg process...
 root   807   805   0   Jan 06 ?           0:50 /usr/X11/bin/Xorg :0 -depth 24 
-nobanner -auth /var/dt/A:0-fHaOKb

# kill -3 807
# ps -ef | grep 807
guess it was, killed...

I then tried on man signal.h (pid 1662)
root  1860  1662   0 18:13:43 pts/2       0:00 man signal.h

# kill -3 1662
# ps -ef | grep 1662
    root  1662  1658   0 12:43:54 pts/2       0:00 ksh -o vi
    root  1860  1662   0 18:13:43 pts/2       0:00 man signal.h

I guess it's still there.....
I however did not see the core, so, I used the gcore command...

# gcore 1662
gcore: core.1662 dumped

and it was in my pwd...
-rw-r--r--   1 root     root     1927425 Jan  7 18:51 core.1662



# pstack core.1662
core 'core.1662' of 1662:       ksh -o vi
 d0f2add5 waitid   (7, 0, 8047b10, f)
 d0f1c666 waitpid  (ffffffff, 8047c00, c) + 75
 08063b8b job_wait (744) + 169
 0806df87 sh_exec  (808f028, 14) + 115b
 080669a0 ???????? (8047e74, 8047de4, d0ffc818, 8047dd4, 8047dd4, d0fe0c5b)
 080664b0 main     (3, 8047e28, 8047e38) + 6cc
 080570b6 ???????? (3, 8047edc, 8047ee0, 8047edc, 0, 8047ee6)
# pflags core.1662
core 'core.1662' of 1662:       ksh -o vi
        data model = _ILP32  flags = RLC|MSACCT|MSFORK
 /1:    flags = STOPPED|ISTOP|ASLEEP  waitid(0x7,0x0,0x8047b10,0xf)
        why = PR_REQUESTED

So I guess kill -3 is for a java thing?
I read that one has to kill sub processes, is this true, a sub process is a 
parent for something else though!
9778
2009-JAN-07 19:01:03
User
 
 
Registered On: Jan 2009
Total Posts: 7
> > I did this writeup on:
> > 
> > # uname -a
> > SunOS io 5.9 Generic_118558-39 sun4u sparc SUNW,Sun-Fire-V240
> > 
> > 
> > 
> > Here's more details about "kill -3" "method"
> > 
> > An excerpt from:
> > 
> >    solaris_performanceandtools_ch3_processes.pdf
> > 
> > 
> >    In previous versions of the JVM, one option is to dump all the stacks 
> on 
> > the running
> >    VM by sending a SIGQUIT (signal number 3) to the Java process with the
> >    kill command. This dumps the stacks for all VM threads to the standard 
> 
> > error as
> >    shown below:
> >    
> >    
> >       # kill -3 <pid>
> >       Full thread dump Java HotSpot(TM) Client VM (1.4.1_06-b01 mixed 
> > mode):
> >       "Signal Dispatcher" daemon prio=10 tid=0xba6a8 nid=0x7 waiting on 
> > condition
> >       [0..0]
> >       "Finalizer" daemon prio=8 tid=0xb48b8 nid=0x4 in Object.wait()
> >       [f2b7f000..f2b7fc24]
> >       at java.lang.Object.wait(Native Method)
> >       - waiting on <f2c00490> (a java.lang.ref.ReferenceQueue$Lock)
> >       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
> >       - locked <f2c00490> (a java.lang.ref.ReferenceQueue$Lock)
> >       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)
> >       at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
> >       "Reference Handler" daemon prio=10 tid=0xb2f88 nid=0x3 in 
> > Object.wait()
> >       [facff000..facffc24]
> >       at java.lang.Object.wait(Native Method)
> >       - waiting on <f2c00380> (a java.lang.ref.Reference$Lock)
> >       at java.lang.Object.wait(Object.java:426)
> >       at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:113)
> >       - locked <f2c00380> (a java.lang.ref.Reference$Lock)
> >       "main" prio=5 tid=0x2c240 nid=0x1 runnable [ffbfe000..ffbfe5fc]
> >       at testMain.doit2(testMain.java:12)
> >       at testMain.main(testMain.java:64)
> >       "VM Thread" prio=5 tid=0xb1b30 nid=0x2 runnable
> >       "VM Periodic Task Thread" prio=10 tid=0xb9408 nid=0x5 runnable
> >       "Suspend Checker Thread" prio=10 tid=0xb9d58 nid=0x6 runnable
> >    
> >    If the top of the stack for a number of threads terminates in a 
> monitor 
> > call, this
> >    is the place to drill down and determine what resource is being 
> > contended. Sometimes
> >    removing a lock that protects a hot structure can require many 
> > architectural
> >    changes that are not possible. The lock might even be in a third-party 
> 
> > library
> >    over which you have no control. In such cases, multiple instances of 
> the 
> > application
> >    are probably the best way to achieve scaling.
> > 
> > PS: do you have a test box to try it out?  That would clear it up :)
> 
> A few moments ago, I did in fact try this.
> I loved the writeup and I admire your patience. I may have a test box, I 
> chose the Xorg process...
>  root   807   805   0   Jan 06 ?           0:50 /usr/X11/bin/Xorg :0 -depth 
> 24 -nobanner -auth /var/dt/A:0-fHaOKb
> 
> # kill -3 807
> # ps -ef | grep 807
> guess it was, killed...
> 
> I then tried on man signal.h (pid 1662)
> root  1860  1662   0 18:13:43 pts/2       0:00 man signal.h
> 
> # kill -3 1662
> # ps -ef | grep 1662
>     root  1662  1658   0 12:43:54 pts/2       0:00 ksh -o vi
>     root  1860  1662   0 18:13:43 pts/2       0:00 man signal.h
> 
> I guess it's still there.....
> I however did not see the core, so, I used the gcore command...
> 
> # gcore 1662
> gcore: core.1662 dumped
> 
> and it was in my pwd...
> -rw-r--r--   1 root     root     1927425 Jan  7 18:51 core.1662
> 
> 
> 
> # pstack core.1662
> core 'core.1662' of 1662:       ksh -o vi
>  d0f2add5 waitid   (7, 0, 8047b10, f)
>  d0f1c666 waitpid  (ffffffff, 8047c00, c) + 75
>  08063b8b job_wait (744) + 169
>  0806df87 sh_exec  (808f028, 14) + 115b
>  080669a0 ???????? (8047e74, 8047de4, d0ffc818, 8047dd4, 8047dd4, d0fe0c5b)
>  080664b0 main     (3, 8047e28, 8047e38) + 6cc
>  080570b6 ???????? (3, 8047edc, 8047ee0, 8047edc, 0, 8047ee6)
> # pflags core.1662
> core 'core.1662' of 1662:       ksh -o vi
>         data model = _ILP32  flags = RLC|MSACCT|MSFORK
>  /1:    flags = STOPPED|ISTOP|ASLEEP  waitid(0x7,0x0,0x8047b10,0xf)
>         why = PR_REQUESTED
> 
> So I guess kill -3 is for a java thing?
> I read that one has to kill sub processes, is this true, a sub process is a 
> parent for something else though!

# uname -a
SunOS xxx 5.10 Generic_137138-09 i86pc i386 i86pc
#
# cat /etc/release
                       Solaris 10 10/08 s10x_u6wos_07b X86
           Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 27 October 2008
#
# prtconf
System Configuration:  Sun Microsystems  i86pc
Memory size: 1024 Megabytes
9779
2009-JAN-07 19:05:01
Moderator
 
 
Registered On: Mar 2006
Total Posts: 267
Interesting ... I think you need to try it on a java process to be certain what 
"kill -3" now does on Solaris 10 ...

Thanks for all the details!
9780
2009-JAN-07 19:16:50
User
 
 
Registered On: Jan 2009
Total Posts: 7
> Interesting ... I think you need to try it on a java process to be certain 
> what "kill -3" now does on Solaris 10 ...
> 
> Thanks for all the details!


I will, I'll do it on a tomcat instance....some questions from that document 
you mentioned..., it says:
"If the top of the stack for a number of threads terminates in a monitor call, 
this
is the place to drill down and determine what resource is being contended. 
Some-
times removing a lock that protects a  hot structure can require many architec-
tural changes that are not possible. The lock might even be in a third-party 
library
over which you have no control. In such cases, multiple instances of the 
applica-
tion are probably the best way to achieve scaling."

What is a  "monitor call"
By, "multiple instances of the applica-
tion are probably the best way to achieve scaling." do they mean a failover 
scenario, in which if one fails, another is available...that would be high(er!) 
availibility...how is scaling relevant?
9782
2009-JAN-07 19:35:52
Moderator
 
 
Registered On: Mar 2006
Total Posts: 267
> some questions from that document you mentioned..., it says:
>
>    "If the top of the stack for a number of threads terminates in a monitor 
>     call, this is the place to drill down and determine what resource 
>     is being contended. "
> What is a  "monitor call"


This is very similar to the case we had with our download procedure -- it 
was taking too long and we had too many requests for it under the same JVM, 
so there were number of locks and waits for it.  Here's the pstack output:


   -----------------  lwp# 28 / thread# 28  --------------------
    ff32033c lwp_mutex_lock (35700)
    fecca928 __1cFMutexbClock_without_safepoint_check6M_v_ (356d0, 1, ad8d08, f0cfee10, 0, 0) + 48
    fedb49f4 __1cUSafepointSynchronizeFblock6FpnKJavaThread__v_ (ad8d08, 357d0, ad8d08, 35768, 4b80e0, 0) + 234
    fed94320 __1cHMonitorEwait6Mil_i_ (0, 0, cb62c0, 4000, 4194, 0) + 5c0
                  ^^^^^^^^^^^^

And here's what it looks like when it's dumpped using "kill -3":

   "AJPRequestHandler-ApplicationServerThread-23" prio=5 tid=0x00e47b40 nid=0x2a runnable [ebbff000..ebbffc28]
           at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185)
           - waiting to lock <0xf2d65980> (a java.nio.charset.CoderResult$2)


   "AJPRequestHandler-ApplicationServerThread-9" prio=5 tid=0x00ad8d08 nid=0x1c waiting for monitor entry [f0cff000..f0cffc28]
           at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185)
           - waiting to lock <0xf2d65980> (a java.nio.charset.CoderResult$2)


>    "The lock might even be in a third-party library over which you have no 
>     control. In such cases, multiple instances of the  application are 
>     probably the best way to achieve scaling."
>
> 
> By, "multiple instances of the application are probably the best way to achieve scaling." 
> do they mean a failover scenario, in which if one fails, another is available...that would be 
> high(er!) availibility...how is scaling relevant?

What they are saying is that if you cannot tune/reduce the lock 
on the resource the only option you have is to create multiple JVMs and let 
the APP server load-balance between them to reduce contention for the 
"slow/lousy" resource :)

HTH!
9784
2009-JAN-07 20:02:00
User
 
 
Registered On: Jan 2009
Total Posts: 7
> > some questions from that document you mentioned..., it says:
> >
> >    "If the top of the stack for a number of threads terminates in a 
> monitor 
> >     call, this is the place to drill down and determine what resource 
> >     is being contended. "
> > What is a  "monitor call"
> 
> 
> This is very similar to the case we had with our download procedure -- it 
> was taking too long and we had too many requests for it under the same JVM, 
> 
> so there were number of locks and waits for it.  Here's the pstack output:
> 
> 
>    -----------------  lwp# 28 / thread# 28  --------------------
>     ff32033c lwp_mutex_lock (35700)
>     fecca928 __1cFMutexbClock_without_safepoint_check6M_v_ (356d0, 1, 
> ad8d08, f0cfee10, 0, 0) + 48
>     fedb49f4 __1cUSafepointSynchronizeFblock6FpnKJavaThread__v_ (ad8d08, 
> 357d0, ad8d08, 35768, 4b80e0, 0) + 234
>     fed94320 __1cHMonitorEwait6Mil_i_ (0, 0, cb62c0, 4000, 4194, 0) + 5c0
>                   ^^^^^^^^^^^^
> 
> And here's what it looks like when it's dumpped using "kill -3":
> 
>    "AJPRequestHandler-ApplicationServerThread-23" prio=5 tid=0x00e47b40 
> nid=0x2a runnable [ebbff000..ebbffc28]
>            at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185)
>            - waiting to lock <0xf2d65980> (a 
> java.nio.charset.CoderResult$2)
> 
> 
>    "AJPRequestHandler-ApplicationServerThread-9" prio=5 tid=0x00ad8d08 
> nid=0x1c waiting for monitor entry [f0cff000..f0cffc28]
>            at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185)
>            - waiting to lock <0xf2d65980> (a 
> java.nio.charset.CoderResult$2)
> 
> 
> >    "The lock might even be in a third-party library over which you have 
> no 
> >     control. In such cases, multiple instances of the  application are 
> >     probably the best way to achieve scaling."
> >
> > 
> > By, "multiple instances of the application are probably the best way to 
> achieve scaling." 
> > do they mean a failover scenario, in which if one fails, another is 
> available...that would be 
> > high(er!) availibility...how is scaling relevant?
> 
> What they are saying is that if you cannot tune/reduce the lock 
> on the resource the only option you have is to create multiple JVMs and let 
> 
> the APP server load-balance between them to reduce contention for the 
> "slow/lousy" resource :)
> 
> HTH!


thanks! So the app server load balances, is this true for all app servers?

A document, titled, Sun Gathering Debug Data for Sun Java SystemWeb Server,  
June 2008, from:http://dlc.sun.com/pdf/820-4983/820-4983.pdf 
 (mentions gcore)

Get core files and the output of the following commands.
If a process hangs, it is helpful to compare several core iles to review the 
state of the threads over time. Make a copy of the core ile to a new name, wait 
for approximately oneminute then rerun the following commands, so that the core 
files are not overwritten.Do this three times to obtain three core files.

Solaris: 
cd server-root/bin/https/bin;
gcore -o /tmp/web-core web-pid;
pstack /tmp/web-core

Some docs are listed on:  they are probably for Sun's java app/web servers, 
good references.....
9785
2009-JAN-07 20:14:48
Moderator
 
 
Registered On: Mar 2006
Total Posts: 267
> thanks! So the app server load balances, is this true for all app servers?
> 

I've only worked with Oracle APP servers and the latest one does it very nicely 
-- it can either balance across the local JVM or JVMs running on clustered APP 
servers.  The best approach is to use hybrid solution where you load balance in 
two places:

                                 +--> jvm1
                       +--> app1 |--> jvm2
                       |         +--> jvmN
  F5 load balancer --> |
                       |         +--> jvm1
                       +--> app2 |--> jvm2
                                 +--> jvmN

Here, we balance across two clustered APP server nodes using LBR and each node 
does it's own local balancing using multiple (local) JVMs.  You have to manage 
session state in the database for this to work because LBR will span both APP 
nodes.  On the database you run Oracle RAC and everybody is happy :)

HTH.
[edited by: Vitaliy at 12:56 (CST) on Feb. 06, 2009]
9786
2009-JAN-07 20:09:09
User
 
 
Registered On: Jan 2009
Total Posts: 7
> > > some questions from that document you mentioned..., it says:
> > >
> > >    "If the top of the stack for a number of threads terminates in a 
> > monitor 
> > >     call, this is the place to drill down and determine what resource 
> > >     is being contended. "
> > > What is a  "monitor call"
> > 
> > 
> > This is very similar to the case we had with our download procedure -- it 
> 
> > was taking too long and we had too many requests for it under the same 
> JVM, 
> > 
> > so there were number of locks and waits for it.  Here's the pstack 
> output:
> > 
> > 
> >    -----------------  lwp# 28 / thread# 28  --------------------
> >     ff32033c lwp_mutex_lock (35700)
> >     fecca928 __1cFMutexbClock_without_safepoint_check6M_v_ (356d0, 1, 
> > ad8d08, f0cfee10, 0, 0) + 48
> >     fedb49f4 __1cUSafepointSynchronizeFblock6FpnKJavaThread__v_ (ad8d08, 
> > 357d0, ad8d08, 35768, 4b80e0, 0) + 234
> >     fed94320 __1cHMonitorEwait6Mil_i_ (0, 0, cb62c0, 4000, 4194, 0) + 5c0
> >                   ^^^^^^^^^^^^
> > 
> > And here's what it looks like when it's dumpped using "kill -3":
> > 
> >    "AJPRequestHandler-ApplicationServerThread-23" prio=5 tid=0x00e47b40 
> > nid=0x2a runnable [ebbff000..ebbffc28]
> >            at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185)
> >            - waiting to lock <0xf2d65980> (a 
> > java.nio.charset.CoderResult$2)
> > 
> > 
> >    "AJPRequestHandler-ApplicationServerThread-9" prio=5 tid=0x00ad8d08 
> > nid=0x1c waiting for monitor entry [f0cff000..f0cffc28]
> >            at java.nio.charset.CoderResult$Cache.get(CoderResult.java:185)
> >            - waiting to lock <0xf2d65980> (a 
> > java.nio.charset.CoderResult$2)
> > 
> > 
> > >    "The lock might even be in a third-party library over which you have 
> 
> > no 
> > >     control. In such cases, multiple instances of the  application are 
> > >     probably the best way to achieve scaling."
> > >
> > > 
> > > By, "multiple instances of the application are probably the best way to 
> 
> > achieve scaling." 
> > > do they mean a failover scenario, in which if one fails, another is 
> > available...that would be 
> > > high(er!) availibility...how is scaling relevant?
> > 
> > What they are saying is that if you cannot tune/reduce the lock 
> > on the resource the only option you have is to create multiple JVMs and 
> let 
> > 
> > the APP server load-balance between them to reduce contention for the 
> > "slow/lousy" resource :)
> > 
> > HTH!
> 
> 
> thanks! So the app server load balances, is this true for all app servers?
> 
> A document, titled, Sun Gathering Debug Data for Sun Java SystemWeb Server, 
>  June 2008, from:http://dlc.sun.com/pdf/820-4983/820-4983.pdf 
>  (mentions gcore)
> 
> Get core files and the output of the following commands.
> If a process hangs, it is helpful to compare several core iles to review 
> the state of the threads over time. Make a copy of the core ile to a new 
> name, wait for approximately oneminute then rerun the following commands, 
> so that the core files are not overwritten.Do this three times to obtain 
> three core files.
> 
> Solaris: 
> cd server-root/bin/https/bin;
> gcore -o /tmp/web-core web-pid;
> pstack /tmp/web-core
> 
> Some docs are listed on:  they are probably for Sun's java app/web servers, 
> good references.....

lots of docs here: http://docs.sun.com/...ocs/prod/websrv?l=en