High load average in test server


High load average in test server

satish
Dear Erman,

This is the status of our test instance.

[orauat@node1 u02]$ top
top - 09:25:24 up 832 days, 16:54,  3 users,  load average: 1556.44, 1555.65, 1554.67
Tasks: 8872 total,   1 running, 8870 sleeping,   0 stopped,   1 zombie
%Cpu(s):  5.3 us,  1.0 sy,  0.0 ni, 92.8 id,  0.9 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 13173105+total,  1558120 free, 29288668 used, 10088426+buff/cache
KiB Swap: 26843545+total, 26841614+free,    19308 used. 71956976 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3127 orauat    20   0  166920  10924   1500 R  42.3  0.0   0:00.12 top
20074 root      rt   0 1019840 125764  81692 S  23.1  0.1   1293:44 osysmond.bin
19968 griduat   -2   0 1380312  15716  13412 S   7.7  0.0 737:27.47 asm_vktm_+asm1
 7659 orauat    20   0 1390184  48756  24108 S   3.8  0.0 236:21.37 oraagent.bin
17832 root      20   0 1885056  91288  32044 S   3.8  0.1 518:39.10 ohasd.bin
18207 griduat   20   0 1254876  43608  22944 S   3.8  0.0 226:59.27 oraagent.bin
20172 root      20   0 1117504  35796  19148 S   3.8  0.0 477:11.80 orarootagent.bi
    1 root      20   0  196936   9816   3788 S   0.0  0.0 773:14.62 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   4:14.23 kthreadd


[orauat@node1 u02]$ sar -u 2 5
Linux 3.10.0-514.el7.x86_64 (node1.ttd.com)         Friday 01 October 2021  _x86_64_        (8 CPU)

09:25:47  IST     CPU     %user     %nice   %system   %iowait    %steal     %idle
09:25:49  IST     all      1.07      0.00      3.22      0.13      0.00     95.59
09:25:51  IST     all      5.17      0.00      9.14      0.06      0.00     85.62
09:25:53  IST     all      2.84      0.00      5.24      0.06      0.00     91.85
09:25:55  IST     all      1.45      0.00      2.14      0.13      0.00     96.28
09:25:57  IST     all      3.98      0.00      0.63      0.06      0.00     95.33
Average:        all      2.90      0.00      4.08      0.09      0.00     92.93
[orauat@node1 u02]$


[orauat@node1 u02]$ ps -eo ppid,pid,user,stat,pcpu,comm,wchan:32 | grep " D" |wc -l
1556
[orauat@node1 u02]$

There are many processes like the ones below:

32153 32158 root     D     0.0 lsof            rpc_wait_bit_killable
32173 32177 root     D     0.0 lsof            rpc_wait_bit_killable
32201 32205 root     D     0.0 lsof            rpc_wait_bit_killable
32198 32207 root     D     0.0 lsof            rpc_wait_bit_killable
32226 32230 root     D     0.0 lsof            rpc_wait_bit_killable
32231 32235 root     D     0.0 lsof            rpc_wait_bit_killable
32237 32241 root     D     0.0 lsof            rpc_wait_bit_killable
32253 32257 root     D     0.0 lsof            rpc_wait_bit_killable
32321 32325 root     D     0.0 lsof            rpc_wait_bit_killable
32345 32349 root     D     0.0 lsof            rpc_wait_bit_killable
32350 32354 root     D     0.0 lsof            rpc_wait_bit_killable
32366 32370 root     D     0.0 lsof            rpc_wait_bit_killable
32395 32399 root     D     0.0 lsof            rpc_wait_bit_killable
32401 32405 root     D     0.0 lsof            rpc_wait_bit_killable
32463 32467 root     D     0.0 lsof            rpc_wait_bit_killable
32473 32477 root     D     0.0 lsof            rpc_wait_bit_killable
32497 32501 root     D     0.0 lsof            rpc_wait_bit_killable
32533 32537 root     D     0.0 lsof            rpc_wait_bit_killable
32532 32541 root     D     0.0 lsof            rpc_wait_bit_killable
32572 32577 root     D     0.0 lsof            rpc_wait_bit_killable
32591 32595 root     D     0.0 lsof            rpc_wait_bit_killable
32624 32628 root     D     0.0 lsof            rpc_wait_bit_killable
32647 32651 root     D     0.0 lsof            rpc_wait_bit_killable
32663 32667 root     D     0.0 lsof            rpc_wait_bit_killable
32668 32672 root     D     0.0 lsof            rpc_wait_bit_killable
32681 32685 root     D     0.0 lsof            rpc_wait_bit_killable
32715 32719 root     D     0.0 lsof            rpc_wait_bit_killable
32725 32729 root     D     0.0 lsof            rpc_wait_bit_killable
32737 32741 root     D     0.0 lsof            rpc_wait_bit_killable
[orauat@node1 u02]$


Our concern is: do these processes in D state consume CPU?
And how do they impact the system?

Thank you

Re: High load average in test server

ErmanArslansOracleBlog
Administrator
D state is a special sleep mode. While in D state, the code cannot be interrupted.
When a process is in D state, it seems blocked from our perspective, but nothing is actually blocked inside the kernel.
For example, when a process issues an I/O operation, the kernel is triggered to run the relevant system call.
That code goes from filename to filesystem, from filesystem to block device and device driver, and then the device driver sends the command to the hardware to fetch a block from disk.
The process, on the other hand, is put into sleeping state (D). When the data is fetched, the process is put back into runnable state. After this point, the process will run (continue its work) when the scheduler allows it to.
Processes in D state cannot be killed with kill signals.
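
For instance, you can see which kernel code path such a process is sleeping in (a small sketch; <pid> is a placeholder for one of the stuck lsof PIDs):

cat /proc/<pid>/stack                     # kernel call stack the process is sleeping in (run as root)
ps -o pid,stat,wchan:32,comm -p <pid>     # state (D) and wait channel, e.g. rpc_wait_bit_killable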

See -> https://ermanarslan.blogspot.com/2013/08/linux-d-state-processes.html

So you probably have an I/O problem. Check the I/O subsystem (including NFS).
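
For example (a minimal sketch; it assumes the nfs-utils and sysstat packages are installed and that the stuck lsof processes are hanging on an NFS mount, which the rpc_wait_bit_killable wait channel suggests):

mount -t nfs,nfs4          # list the NFS mounts in use
nfsstat -c                 # NFS client RPC call and retransmission counters
nfsiostat 2 3              # per-mount NFS latency/throughput
iostat -x 2 3              # extended per-device I/O statistics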


Re: High load average in test server

satish
Do these processes in D state hold the CPU?

Re: High load average in test server

ErmanArslansOracleBlog
Administrator
Read -> https://ermanarslan.blogspot.com/2014/03/linux-d-state-processes-and-load-average.html

I wrote about that subject in detail in the above post.

Re: High load average in test server

satish
Excellent article.

Second, CPU load is different from CPU utilization. This explains why we see the CPU as idle even when the load average is high. (The article illustrates the difference with a picture.)

So, we can have free CPU cycles in the system but still have to wait for other things. This situation will cause our load average to increase while our CPU utilization does not change.
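
For example, a rough way to cross-check this on the node (a sketch; on Linux the load average counts runnable plus uninterruptible-sleep tasks):

ps -eo stat= | awk '$1 ~ /^[RD]/' | wc -l   # tasks that are runnable (R) or in uninterruptible sleep (D)
cat /proc/loadavg                           # compare with the 1/5/15-minute load averages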


What are those other things, and will they impact the performance of the application?

Re: High load average in test server

ErmanArslansOracleBlog
Administrator
Your last question is very generic. Basically, anything in the access path can impact the performance of an application.
The application is at the top layer, so anything below that layer may have an impact: apps node OS, network, database, database OS, storage network, storage controllers, filers, and so on.