high load average

satish
Dear Erman,

Today we faced a situation where both of our Linux servers (running R12.2 apps) had a high load average, around 60. We checked whether any apps process was consuming CPU, but none of them were. Idle CPU was 99% on both Linux servers.

When we tried to open the application, we got a gateway timeout error. We were not able to run commands on the server, so finally we rebooted. After that the load average was normal and the application worked as usual.

How can the load average be high when the CPU is idle? Is this normal? Can you please let us know what to check? We didn't find any application process consuming high CPU or memory.

Collected during the issue:


[applprod@node1 ~]$ sar -u 2 5
Linux 3.10.0-514.el7.x86_64 (erpprodapp01.ttd.com)      Wednesday 18 July 2018 _x86_64_ (8 CPU)

10:24:32  IST     CPU     %user     %nice   %system   %iowait    %steal     %idle
10:24:34  IST     all      0.19      0.00      0.13      0.00      0.00     99.69
10:24:36  IST     all      0.19      0.00      0.13      0.00      0.00     99.69
10:24:38  IST     all      0.06      0.00      0.19      0.00      0.00     99.75
10:24:40  IST     all      0.19      0.00      0.13      0.00      0.00     99.69
10:24:42  IST     all      0.19      0.00      0.13      0.00      0.00     99.69
Average:        all      0.16      0.00      0.14      0.00      0.00     99.70



[applprod@node1 ~]$ top
top - 10:16:48 up 85 days, 21:50,  4 users,  load average: 54.83, 54.09, 52.72
Tasks: 541 total,   1 running, 540 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.2 sy,  0.0 ni, 99.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 65673132 total,  2327608 free, 18032896 used, 45312628 buff/cache
KiB Swap: 26843545+total, 26832688+free,   108564 used. 44360980 avail Mem


Thanks for all the support here.

Re: high load average

ErmanArslansOracleBlog
Administrator
I think you should read more about Linux's load average.
A high CPU load is not required in order to have a high load average.
Network activity and blocked processes can also cause a high load average (without increasing CPU usage).

So the load average you see may be caused by some blocked processes (probably around 50 of them).
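A rough illustration of why about 50 blocked processes can show up as a load average near 54 on an idle CPU: Linux updates the load average roughly every 5 seconds as an exponentially damped moving average of the number of tasks that are runnable or in uninterruptible (D) sleep. CPU time consumed never enters the formula. This is a simplified floating-point model (the kernel itself uses fixed-point arithmetic), and the scenario of 54 tasks stuck for one hour is hypothetical, chosen only to resemble the top output in the first post.

```python
import math

# The kernel recomputes the load average about every 5 seconds.
SAMPLE_INTERVAL_S = 5.0

def decay_factor(window_s: float) -> float:
    """Weight kept from the previous average at each sample, for a window."""
    return math.exp(-SAMPLE_INTERVAL_S / window_s)

def simulate(active_counts, start=(0.0, 0.0, 0.0)):
    """Return (1-min, 5-min, 15-min) averages after feeding in a sequence of
    'active' task counts (runnable + uninterruptible), one per 5 s sample."""
    one, five, fifteen = start
    e1, e5, e15 = decay_factor(60), decay_factor(300), decay_factor(900)
    for n in active_counts:
        # Only the number of tasks matters here; CPU usage does not appear.
        one = one * e1 + n * (1 - e1)
        five = five * e5 + n * (1 - e5)
        fifteen = fifteen * e15 + n * (1 - e15)
    return one, five, fifteen

if __name__ == "__main__":
    # Hypothetical: 54 tasks stuck in D state for one hour (720 samples),
    # none of them consuming any CPU time.
    print("load average: %.2f, %.2f, %.2f" % simulate([54] * 720))
```

Feeding zeros afterwards (for example once the blocked tasks are cleared by a reboot) shows the 1-minute value dropping much faster than the 15-minute one.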

Check your NFS mount points (if there are any), check all the processes (for "D" and "Z" states), check your network activity, and you will find the reason.
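One way to see which processes are actually feeding the load average is to compare /proc/loadavg with a count of processes by state. On Linux the load average tracks tasks in R (runnable) and D (uninterruptible sleep), so the R + D total is the figure to compare. A minimal sketch, assuming a Linux host with a readable /proc; the helper names (state_from_stat, parse_loadavg, count_states) are hypothetical.

```python
import os

# States that Linux counts toward the load average.
LOAD_STATES = {"R", "D"}

def state_from_stat(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The comm field is wrapped in parentheses and may itself contain spaces
    or ')', so the state is taken two characters after the LAST ')'.
    """
    return stat_line[stat_line.rfind(")") + 2]

def parse_loadavg(text: str) -> tuple:
    """Return the 1-, 5- and 15-minute averages from /proc/loadavg text."""
    return tuple(float(x) for x in text.split()[:3])

def count_states(proc_root: str = "/proc") -> dict:
    """Tally every process under proc_root by its state letter."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                state = state_from_stat(f.read())
        except OSError:
            # The process exited between listdir() and open().
            continue
        counts[state] = counts.get(state, 0) + 1
    return counts

if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print("load average:", parse_loadavg(f.read()))
    counts = count_states()
    print("tasks by state:", counts)
    print("R + D:", sum(n for s, n in counts.items() if s in LOAD_STATES))
```

With roughly 50 tasks in D state, the R + D total should sit close to the 1-minute load average even while the CPU is 99% idle.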

Re: high load average

satish
Thanks for the update, Erman.

Processes running during the issue:
======================

[applprod@node1 ~]$ ps ucx
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
applprod   913  0.0  0.0 113120  1208 ?        Ss   09:15   0:00 sh
applprod   914  0.0  0.0 113120  1188 ?        S    09:15   0:00 sh
applprod   915  0.0  0.0 113120   564 ?        S    09:15   0:00 sh
applprod   916  0.0  0.0 151056  1840 ?        D    09:15   0:00 ps
applprod   917  0.0  0.0 112652   964 ?        S    09:15   0:00 grep
applprod   918  0.0  0.0 112648   964 ?        S    09:15   0:00 grep
applprod   919  0.0  0.0 112648   948 ?        S    09:15   0:00 grep
applprod   920  0.0  0.0 113484   968 ?        S    09:15   0:00 awk
applprod   921  0.0  0.0 113484   964 ?        S    09:15   0:00 awk
applprod   922  0.0  0.0 113484   964 ?        S    09:15   0:00 awk
applprod   923  0.0  0.0 113120  1192 ?        S    09:15   0:00 sh
applprod   924  0.0  0.0 107904   664 ?        S    09:15   0:00 wc
applprod  1189  0.0  0.0 113284  1612 ?        S    Jul16   0:00 sh
applprod  1194  0.0  0.0  22556 10080 ?        S    Jul16   0:27 FNDLIBR
applprod  1245  0.0  0.0  20040  7316 ?        Ss   Jul16   0:01 FNDSM
applprod  1643  0.0  0.0  20216  7132 ?        S    Jul16   0:01 FNDIMON
applprod  1644  0.0  0.0  22912  9484 ?        S    Jul16   0:01 RCVOLTM
applprod  1645  0.0  0.0  21720  7216 ?        S    Jul16   0:00 POXCON
applprod  1646  0.0  0.0  22492  7216 ?        S    Jul16   0:01 INCTM
applprod  1648  0.0  0.3 658288 235728 ?       Sl   Jul16   2:03 java
applprod  1667  0.0  0.3 648896 232088 ?       Sl   Jul16   1:31 java
applprod  1690  0.0  0.2 658440 159612 ?       Sl   Jul16   1:40 java
applprod  1701  0.0  0.0  20868  7892 ?        S    Jul16   0:15 FNDCRM
applprod  1707  0.0  0.0  21308  8176 ?        S    Jul16   0:02 FNDSCH
applprod  1716  0.0  0.0  21688  9304 ?        S    Jul16   0:00 FNDLIBR
applprod  1717  0.0  0.0  22904  8540 ?        S    Jul16   0:00 INVLIBR
applprod  1718  0.0  0.0  20376  7164 ?        S    Jul16   0:00 PALIBR
applprod  1719  0.0  0.0  20328  7572 ?        S    Jul16   0:00 FNDLIBR
applprod  1720  0.0  0.0  21656  9588 ?        S    Jul16   0:03 FNDLIBR
applprod  1722  0.0  0.0  21728  9508 ?        D    Jul16   0:04 FNDLIBR
applprod  1723  0.0  0.0  21840  9648 ?        S    Jul16   0:05 FNDLIBR
applprod  1724  0.0  0.0  21588  9452 ?        D    Jul16   0:03 FNDLIBR
applprod  1727  0.0  0.0  21596  9580 ?        S    Jul16   0:04 FNDLIBR
applprod  1729  0.0  0.0  21828  9632 ?        D    Jul16   0:05 FNDLIBR
applprod  1730  0.0  0.0  21612  9636 ?        S    Jul16   0:03 FNDLIBR
applprod  1731  0.0  0.0  21724  9592 ?        S    Jul16   0:04 FNDLIBR
applprod  1732  0.0  0.0  21760  9532 ?        D    Jul16   0:04 FNDLIBR
applprod  1737  0.0  0.0  21764  9616 ?        S    Jul16   0:04 FNDLIBR
applprod  1738  0.0  0.0  21808  9612 ?        D    Jul16   0:04 FNDLIBR
applprod  1739  0.0  0.0  21612  9440 ?        S    Jul16   0:04 FNDLIBR
applprod  1742  0.0  0.0  21704  9588 ?        S    Jul16   0:04 FNDLIBR
applprod  1743  0.0  0.0  21820  9700 ?        D    Jul16   0:05 FNDLIBR
applprod  1744  0.0  0.0  21376  9236 ?        S    Jul16   0:04 FNDLIBR
applprod  1745  0.0  0.0  21664  9488 ?        D    Jul16   0:04 FNDLIBR
applprod  1746  0.0  0.0  21912  9816 ?        D    Jul16   0:05 FNDLIBR
applprod  1752  0.0  0.0  21488  9312 ?        D    Jul16   0:03 FNDLIBR
applprod  1754  0.0  0.0  21780  9636 ?        S    Jul16   0:05 FNDLIBR
applprod  1756  0.0  0.0  21672  9496 ?        S    Jul16   0:04 FNDLIBR
applprod  1759  0.0  0.0  21792  9728 ?        S    Jul16   0:04 FNDLIBR
applprod  1760  0.0  0.0  21764  9716 ?        S    Jul16   0:04 FNDLIBR
applprod  1762  0.0  0.0  20332  7576 ?        S    Jul16   0:00 FNDLIBR
applprod  2625  0.0  0.0  21636  9520 ?        S    Jul17   0:04 FNDLIBR
applprod  2626  0.0  0.0  21860  9648 ?        S    Jul17   0:04 FNDLIBR
applprod  2628  0.0  0.0  21512  9392 ?        S    Jul17   0:03 FNDLIBR
applprod  2634  0.0  0.0  21680  9592 ?        D    Jul17   0:05 FNDLIBR
applprod  2636  0.0  0.0  21336  9348 ?        D    Jul17   0:04 FNDLIBR
applprod  2638  0.0  0.0  21896  9772 ?        S    Jul17   0:03 FNDLIBR
applprod  4117  0.0  0.0 113120  1208 ?        Ss   09:30   0:00 sh
applprod  4118  0.0  0.0 113120  1188 ?        S    09:30   0:00 sh
applprod  4120  0.0  0.0 113120   568 ?        S    09:30   0:00 sh
applprod  4121  0.0  0.0 151056  1840 ?        D    09:30   0:00 ps
applprod  4122  0.0  0.0 112652   960 ?        S    09:30   0:00 grep
applprod  4123  0.0  0.0 112648   964 ?        S    09:30   0:00 grep
applprod  4124  0.0  0.0 112648   948 ?        S    09:30   0:00 grep
applprod  4126  0.0  0.0 113484   968 ?        S    09:30   0:00 awk
applprod  4127  0.0  0.0 113484   968 ?        S    09:30   0:00 awk
applprod  4128  0.0  0.0 113484   964 ?        S    09:30   0:00 awk
applprod  4129  0.0  0.0 113120  1196 ?        S    09:30   0:00 sh
applprod  4130  0.0  0.0 107904   672 ?        S    09:30   0:00 wc
applprod  6114  0.0  0.0 151808  9392 ?        D    09:39   0:00 httpd.worker
applprod  7043  0.0  0.0 113120  1204 ?        Ss   09:45   0:00 sh
applprod  7044  0.0  0.0 113120  1188 ?        S    09:45   0:00 sh
applprod  7045  0.0  0.0 113120   564 ?        S    09:45   0:00 sh
applprod  7046  0.0  0.0 151056  1840 ?        D    09:45   0:00 ps
applprod  7047  0.0  0.0 112652   960 ?        S    09:45   0:00 grep
applprod  7048  0.0  0.0 112648   960 ?        S    09:45   0:00 grep
applprod  7049  0.0  0.0 112648   952 ?        S    09:45   0:00 grep
applprod  7050  0.0  0.0 113484   968 ?        S    09:45   0:00 awk
applprod  7051  0.0  0.0 113484   964 ?        S    09:45   0:00 awk
applprod  7052  0.0  0.0 113484   964 ?        S    09:45   0:00 awk
applprod  7053  0.0  0.0 113120  1192 ?        S    09:45   0:00 sh
applprod  7054  0.0  0.0 107904   664 ?        S    09:45   0:00 wc
applprod  9986  0.0  0.0 113120  1208 ?        Ss   10:00   0:00 sh
applprod  9989  0.0  0.0 113120  1184 ?        S    10:00   0:00 sh
applprod  9993  0.0  0.0 113120   564 ?        S    10:00   0:00 sh
applprod  9994  0.0  0.0 151056  1844 ?        D    10:00   0:00 ps
applprod  9995  0.0  0.0 112652   964 ?        S    10:00   0:00 grep
applprod  9996  0.0  0.0 112648   960 ?        S    10:00   0:00 grep
applprod  9997  0.0  0.0 112648   948 ?        S    10:00   0:00 grep
applprod  9999  0.0  0.0 113484   964 ?        S    10:00   0:00 awk
applprod 10000  0.0  0.0 113484   964 ?        S    10:00   0:00 awk
applprod 10001  0.0  0.0 113484   964 ?        S    10:00   0:00 awk
applprod 10002  0.0  0.0 113120  1192 ?        S    10:00   0:00 sh
applprod 10004  0.0  0.0 107904   664 ?        S    10:00   0:00 wc
applprod 12953  0.0  0.0 113120  1204 ?        Ss   10:15   0:00 sh
applprod 12954  0.0  0.0 113120  1192 ?        S    10:15   0:00 sh
applprod 12955  0.0  0.0 113120   568 ?        S    10:15   0:00 sh
applprod 12956  0.0  0.0 151056  1840 ?        D    10:15   0:00 ps
applprod 12957  0.0  0.0 112652   964 ?        S    10:15   0:00 grep
applprod 12958  0.0  0.0 112648   960 ?        S    10:15   0:00 grep
applprod 12959  0.0  0.0 112648   952 ?        S    10:15   0:00 grep
applprod 12960  0.0  0.0 113484   968 ?        S    10:15   0:00 awk
applprod 12961  0.0  0.0 113484   964 ?        S    10:15   0:00 awk
applprod 12962  0.0  0.0 113484   964 ?        S    10:15   0:00 awk
applprod 12963  0.0  0.0 113120  1192 ?        S    10:15   0:00 sh
applprod 12964  0.0  0.0 107904   672 ?        S    10:15   0:00 wc
applprod 13299  1.5  0.0 116808  3440 pts/2    S    10:16   0:00 bash
applprod 13447  0.0  0.0 151084  1864 pts/2    R+   10:16   0:00 ps
applprod 16162  0.1  0.4 19287600 313440 ?     Sl   Jul13  12:36 java
applprod 16205  0.2  1.6 6083392 1070484 ?     Ssl  Jul13  19:30 java
applprod 16530  0.0  0.0 140744 11672 ?        S    Jul13   0:00 perl
applprod 16646  0.0  0.1 471736 115820 ?       Sl   Jul13   5:41 java
applprod 16858  0.0  0.0  72976  2904 ?        Ss   Jul13   0:00 opmn
applprod 16859  0.0  0.0 1230164 12752 ?       Sl   Jul13   0:36 opmn
applprod 16913  0.0  0.0  21324  9044 ?        D    Jul17   0:01 FNDLIBR
applprod 17158  0.0  0.0 151808 19800 ?        S    Jul13   0:13 httpd.worker
applprod 17201  0.0  0.0  30532  1032 ?        S    Jul13   0:00 odl_rotatelogs
applprod 17202  0.0  0.0  30532   868 ?        S    Jul13   0:34 odl_rotatelogs
applprod 17203  0.0  0.0  30464   704 ?        S    Jul13   0:00 rotatelogs
applprod 17204  0.0  0.0  30464   708 ?        S    Jul13   0:00 rotatelogs
applprod 17285  0.0  0.0  17744  4564 ?        Ss   Jul13   0:03 tnslsnr
applprod 17317  0.0  0.0  30532   964 ?        S    Jul13   0:00 odl_rotatelogs
applprod 17318  0.0  0.0 349196  9560 ?        Sl   Jul13   0:04 httpd.worker
applprod 18069  0.0  0.0 113120  1208 ?        Ss   08:00   0:00 sh
applprod 18074  0.0  0.0 113120  1192 ?        S    08:00   0:00 sh
applprod 18078  0.0  0.0 113120   564 ?        S    08:00   0:00 sh
applprod 18079  0.0  0.0 151056  1836 ?        D    08:00   0:00 ps
applprod 18080  0.0  0.0 112648   964 ?        S    08:00   0:00 grep
applprod 18081  0.0  0.0 112648   964 ?        S    08:00   0:00 grep
applprod 18082  0.0  0.0 112648   948 ?        S    08:00   0:00 grep
applprod 18084  0.0  0.0 113484   964 ?        S    08:00   0:00 awk
applprod 18085  0.0  0.0 113484   968 ?        S    08:00   0:00 awk
applprod 18086  0.0  0.0 113484   964 ?        S    08:00   0:00 awk
applprod 18087  0.0  0.0 113120  1192 ?        S    08:00   0:00 sh
applprod 18088  0.0  0.0 107904   664 ?        S    08:00   0:00 wc
applprod 19362  0.5  2.6 6126368 1735928 ?     Ssl  Jul13  42:56 java
applprod 19367  0.1  2.1 6026724 1423124 ?     Ssl  Jul13  12:57 java
applprod 19379  0.5  2.5 6065972 1680916 ?     Ssl  Jul13  44:05 java
applprod 19461  0.3  1.2 1754080 816804 ?      Ssl  Jul13  25:48 java
applprod 19485  0.3  1.2 1759756 822428 ?      Ssl  Jul13  24:58 java
applprod 21339  0.0  0.0 113120  1200 ?        Ss   08:15   0:00 sh
applprod 21340  0.0  0.0 113120  1184 ?        S    08:15   0:00 sh
applprod 21341  0.0  0.0 113120   564 ?        S    08:15   0:00 sh
applprod 21342  0.0  0.0 151056  1836 ?        D    08:15   0:00 ps
applprod 21343  0.0  0.0 112648   964 ?        S    08:15   0:00 grep
applprod 21344  0.0  0.0 112648   960 ?        S    08:15   0:00 grep
applprod 21345  0.0  0.0 112648   948 ?        S    08:15   0:00 grep
applprod 21346  0.0  0.0 113484   968 ?        S    08:15   0:00 awk
applprod 21347  0.0  0.0 113484   968 ?        S    08:15   0:00 awk
applprod 21348  0.0  0.0 113484   968 ?        S    08:15   0:00 awk
applprod 21349  0.0  0.0 113120  1192 ?        S    08:15   0:00 sh
applprod 21350  0.0  0.0 107904   672 ?        S    08:15   0:00 wc
applprod 24274  0.0  0.0 113120  1204 ?        Ss   08:30   0:00 sh
applprod 24275  0.0  0.0 113120  1188 ?        S    08:30   0:00 sh
applprod 24277  0.0  0.0 113120   568 ?        S    08:30   0:00 sh
applprod 24279  0.0  0.0 151056  1836 ?        D    08:30   0:00 ps
applprod 24280  0.0  0.0 112648   964 ?        S    08:30   0:00 grep
applprod 24281  0.0  0.0 112648   960 ?        S    08:30   0:00 grep
applprod 24282  0.0  0.0 112648   948 ?        S    08:30   0:00 grep
applprod 24283  0.0  0.0 113484   968 ?        S    08:30   0:00 awk
applprod 24284  0.0  0.0 113484   968 ?        S    08:30   0:00 awk
applprod 24285  0.0  0.0 113484   964 ?        S    08:30   0:00 awk
applprod 24286  0.0  0.0 113120  1196 ?        S    08:30   0:00 sh
applprod 24287  0.0  0.0 107904   664 ?        S    08:30   0:00 wc
applprod 25864  0.0  0.1 2132332 98188 ?       Sl   Jul13   3:24 httpd.worker
applprod 27185  0.0  0.0 113120  1204 ?        Ss   08:45   0:00 sh
applprod 27186  0.0  0.0 113120  1188 ?        S    08:45   0:00 sh
applprod 27187  0.0  0.0 113120   564 ?        S    08:45   0:00 sh
applprod 27188  0.0  0.0 151056  1836 ?        D    08:45   0:00 ps
applprod 27189  0.0  0.0 112648   960 ?        S    08:45   0:00 grep
applprod 27190  0.0  0.0 112648   964 ?        S    08:45   0:00 grep
applprod 27191  0.0  0.0 112648   952 ?        S    08:45   0:00 grep
applprod 27192  0.0  0.0 113484   960 ?        S    08:45   0:00 awk
applprod 27193  0.0  0.0 113484   968 ?        S    08:45   0:00 awk
applprod 27194  0.0  0.0 113484   964 ?        S    08:45   0:00 awk
applprod 27195  0.0  0.0 113120  1192 ?        S    08:45   0:00 sh
applprod 27196  0.0  0.0 107904   668 ?        S    08:45   0:00 wc
applprod 27215  0.0  0.0  21544  9400 ?        S    Jul17   0:02 FNDLIBR
applprod 27833  0.6  4.7 3519672 3139292 ?     Sl   Jul17   9:20 java
applprod 27847  0.2  2.0 3524820 1362872 ?     Sl   Jul17   3:42 java
applprod 27865  0.7  4.8 3522324 3169820 ?     Sl   Jul17   9:46 java
applprod 30350  0.0  0.0 113120  1200 ?        Ss   09:00   0:00 sh
applprod 30351  0.0  0.0 113120  1192 ?        S    09:00   0:00 sh
applprod 30353  0.0  0.0 113120   568 ?        S    09:00   0:00 sh
applprod 30355  0.0  0.0 151056  1836 ?        D    09:00   0:00 ps
applprod 30356  0.0  0.0 112648   960 ?        S    09:00   0:00 grep
applprod 30357  0.0  0.0 112648   964 ?        S    09:00   0:00 grep
applprod 30358  0.0  0.0 112648   952 ?        S    09:00   0:00 grep
applprod 30359  0.0  0.0 113484   964 ?        S    09:00   0:00 awk
applprod 30360  0.0  0.0 113484   968 ?        S    09:00   0:00 awk
applprod 30361  0.0  0.0 113484   956 ?        S    09:00   0:00 awk
applprod 30362  0.0  0.0 113120  1192 ?        S    09:00   0:00 sh
applprod 30363  0.0  0.0 107904   664 ?        S    09:00   0:00 wc
applprod 30827  0.0  0.0 2132332 39032 ?       Sl   Jul16   1:34 httpd.worker
applprod 31573  0.0  0.1 2132332 122828 ?      Sl   Jul13   4:00 httpd.worker
[applprod@node1 ~]$

Looking at the awk and grep processes above, I thought this might be a cron job issue. We have scheduled 2 jobs that run every 15 minutes to check mount point space and alert us.
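In the listing above, each 15-minute cron run from 08:00 to 10:15 appears to have left a ps process in D state, with its sh, grep, awk and wc companions still waiting behind it, alongside several FNDLIBR processes in D state. A minimal sketch for tallying D-state processes per command from ps ucx output pasted as text; the function name d_state_by_command is hypothetical, and it assumes COMMAND is the last column and a single word (true for ps ucx, where the c flag shows only the executable name).

```python
from collections import Counter

def d_state_by_command(ps_output: str) -> Counter:
    """Count processes whose STAT starts with 'D', grouped by COMMAND.

    Lines before the header (such as the shell prompt) and lines with
    too few columns (such as a trailing prompt) are ignored.
    """
    counts = Counter()
    header = None
    stat_idx = None
    for line in ps_output.splitlines():
        fields = line.split()
        if header is None:
            # Wait until the column header line appears.
            if "STAT" in fields and "COMMAND" in fields:
                header = fields
                stat_idx = header.index("STAT")
            continue
        if len(fields) < len(header):
            continue
        if fields[stat_idx].startswith("D"):
            counts[fields[-1]] += 1
    return counts
```

Pasting the whole listing into a string and printing d_state_by_command(text).most_common() gives a quick per-command view of what is stuck, which is easier to compare between servers than scrolling the raw list.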

Thank you

Re: high load average

satish
Dear erman,

What could be the reasons for the processes (ps, FNDLIBR) to go into D state?

Thank you

Re: high load average

satish
I found a very good blog post on the internet and it was very useful. You covered almost everything, simply and concisely:

http://ermanarslan.blogspot.com/2014/03/linux-d-state-processes-and-load-average.html

Now I have to understand what might be the reasons for the processes (ps, FNDLIBR) to go into D state.

Thanks

Re: high load average

ErmanArslansOracleBlog
Administrator
Good :)

Re: high load average

satish
Dear Erman,

What could be the reasons for the processes (ps, FNDLIBR) to go into D state?

Thank you

Re: high load average

ErmanArslansOracleBlog
Administrator
Probably I/O. They are running heavy concurrent requests which do non-stop I/O (probably).
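Since D state means a task is waiting inside the kernel (usually on I/O), one way to narrow down the cause is to look at where each D-state task is sleeping. /proc/&lt;pid&gt;/wchan holds the name of the kernel function a sleeping task is waiting in, roughly what the WCHAN column of ps -eo pid,stat,wchan,comm shows. A minimal sketch, assuming Linux and a readable /proc; the function names are hypothetical, and wchan may read as 0 on kernels or configurations that hide it. As a heuristic only, wait channels mentioning nfs or rpc would point toward an NFS mount, which ties back to the NFS check suggested earlier in the thread.

```python
import os

def parse_stat(line: str):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm sits between the first '(' and the LAST ')', since the
    command name itself may contain parentheses or spaces.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 2]
    return pid, comm, state

def read_text(path: str) -> str:
    """Read a small /proc file, returning '' if it vanished or is unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""

def blocked_processes(proc_root: str = "/proc"):
    """List (pid, comm, wchan) for every task currently in D state."""
    result = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat:
            continue
        pid, comm, state = parse_stat(stat)
        if state == "D":
            wchan = read_text(os.path.join(proc_root, entry, "wchan")) or "?"
            result.append((pid, comm, wchan))
    return result

if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm, wchan in blocked_processes():
        print(pid, comm, wchan)
```

Running it a few times while the load is high and grouping the output by wait channel usually shows whether the FNDLIBR and ps processes are all stuck on the same kind of I/O.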