Dear Erman,
Today both of our Linux servers (running R12.2 apps) showed a high load average of around 60. We checked whether any apps process was consuming CPU, but none was; idle CPU was 99% on both servers. When we tried to open the application we got a gateway timeout error, and eventually we could not run commands on the servers, so we rebooted them. After the reboot the load average returned to normal and the application worked as usual.

How can the load average be high when the CPU is idle? Is this normal? Please let us know what to check, because we did not find any application process consuming high CPU or memory.

Collected during the issue:

[applprod@node1 ~]$ sar -u 2 5
Linux 3.10.0-514.el7.x86_64 (erpprodapp01.ttd.com)  Wednesday 18 July 2018  _x86_64_  (8 CPU)

10:24:32 IST  CPU  %user  %nice  %system  %iowait  %steal  %idle
10:24:34 IST  all   0.19   0.00     0.13     0.00    0.00  99.69
10:24:36 IST  all   0.19   0.00     0.13     0.00    0.00  99.69
10:24:38 IST  all   0.06   0.00     0.19     0.00    0.00  99.75
10:24:40 IST  all   0.19   0.00     0.13     0.00    0.00  99.69
10:24:42 IST  all   0.19   0.00     0.13     0.00    0.00  99.69
Average:      all   0.16   0.00     0.14     0.00    0.00  99.70

[applprod@node1 ~]$ top
top - 10:16:48 up 85 days, 21:50, 4 users, load average: 54.83, 54.09, 52.72
Tasks: 541 total, 1 running, 540 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.2 sy, 0.0 ni, 99.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 65673132 total, 2327608 free, 18032896 used, 45312628 buff/cache
KiB Swap: 26843545+total, 26832688+free, 108564 used. 44360980 avail Mem

Thanks for all the support here
Administrator
I think you should read more about the Linux load average.
A high load average does not require high CPU usage. On Linux, the load average counts processes that are running or waiting for a CPU as well as processes blocked in uninterruptible sleep ("D" state), so network activity and blocked processes can also raise it without increasing CPU usage. The load average you see may therefore be caused by blocked processes (probably around 50 of them). Check your NFS mount points (if there are any), check the states of all processes ("D" and "Z" states), and check your network activity, and you will find the reason.
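As a rough illustration of the point above, here is a minimal Python sketch (not part of the original reply). It counts the process states that feed the Linux load average, runnable ("R") and uninterruptible sleep ("D"), and divides a load figure by the CPU count. The function names and the sample state list are assumptions made for the example; only the 54.83 load and the 8 CPUs come from the post.

```python
def count_load_contributors(stat_values):
    """Count tasks whose state is R (runnable) or D (uninterruptible sleep).

    These are the states the Linux load average is built from;
    sleeping (S) and zombie (Z) tasks are not counted.
    """
    return sum(1 for stat in stat_values if stat[:1] in ("R", "D"))


def load_per_cpu(load_average, cpu_count):
    """Normalize a load average by the number of CPUs."""
    return load_average / cpu_count


# Illustrative STAT values in the style of `ps` output (sample data, not a real capture).
sample_stats = ["Ss", "S", "D", "D", "R+", "Sl", "D"]
print(count_load_contributors(sample_stats))  # 4

# Figures from the post: load average 54.83 on an 8-CPU server.
print(round(load_per_cpu(54.83, 8), 2))  # 6.85
```

A load of almost 7 per CPU while the CPUs are 99% idle suggests that most of the counted tasks are blocked in "D" state rather than actually running.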
Thanks for the update, Erman.
Processes running during the issue:
======================
[applprod@node1 ~]$ ps ucx
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
applprod 913 0.0 0.0 113120 1208 ? Ss 09:15 0:00 sh
applprod 914 0.0 0.0 113120 1188 ? S 09:15 0:00 sh
applprod 915 0.0 0.0 113120 564 ? S 09:15 0:00 sh
applprod 916 0.0 0.0 151056 1840 ? D 09:15 0:00 ps
applprod 917 0.0 0.0 112652 964 ? S 09:15 0:00 grep
applprod 918 0.0 0.0 112648 964 ? S 09:15 0:00 grep
applprod 919 0.0 0.0 112648 948 ? S 09:15 0:00 grep
applprod 920 0.0 0.0 113484 968 ? S 09:15 0:00 awk
applprod 921 0.0 0.0 113484 964 ? S 09:15 0:00 awk
applprod 922 0.0 0.0 113484 964 ? S 09:15 0:00 awk
applprod 923 0.0 0.0 113120 1192 ? S 09:15 0:00 sh
applprod 924 0.0 0.0 107904 664 ? S 09:15 0:00 wc
applprod 1189 0.0 0.0 113284 1612 ? S Jul16 0:00 sh
applprod 1194 0.0 0.0 22556 10080 ? S Jul16 0:27 FNDLIBR
applprod 1245 0.0 0.0 20040 7316 ? Ss Jul16 0:01 FNDSM
applprod 1643 0.0 0.0 20216 7132 ? S Jul16 0:01 FNDIMON
applprod 1644 0.0 0.0 22912 9484 ? S Jul16 0:01 RCVOLTM
applprod 1645 0.0 0.0 21720 7216 ? S Jul16 0:00 POXCON
applprod 1646 0.0 0.0 22492 7216 ? S Jul16 0:01 INCTM
applprod 1648 0.0 0.3 658288 235728 ? Sl Jul16 2:03 java
applprod 1667 0.0 0.3 648896 232088 ? Sl Jul16 1:31 java
applprod 1690 0.0 0.2 658440 159612 ? Sl Jul16 1:40 java
applprod 1701 0.0 0.0 20868 7892 ? S Jul16 0:15 FNDCRM
applprod 1707 0.0 0.0 21308 8176 ? S Jul16 0:02 FNDSCH
applprod 1716 0.0 0.0 21688 9304 ? S Jul16 0:00 FNDLIBR
applprod 1717 0.0 0.0 22904 8540 ? S Jul16 0:00 INVLIBR
applprod 1718 0.0 0.0 20376 7164 ? S Jul16 0:00 PALIBR
applprod 1719 0.0 0.0 20328 7572 ? S Jul16 0:00 FNDLIBR
applprod 1720 0.0 0.0 21656 9588 ? S Jul16 0:03 FNDLIBR
applprod 1722 0.0 0.0 21728 9508 ? D Jul16 0:04 FNDLIBR
applprod 1723 0.0 0.0 21840 9648 ? S Jul16 0:05 FNDLIBR
applprod 1724 0.0 0.0 21588 9452 ? D Jul16 0:03 FNDLIBR
applprod 1727 0.0 0.0 21596 9580 ? S Jul16 0:04 FNDLIBR
applprod 1729 0.0 0.0 21828 9632 ? D Jul16 0:05 FNDLIBR
applprod 1730 0.0 0.0 21612 9636 ? S Jul16 0:03 FNDLIBR
applprod 1731 0.0 0.0 21724 9592 ? S Jul16 0:04 FNDLIBR
applprod 1732 0.0 0.0 21760 9532 ? D Jul16 0:04 FNDLIBR
applprod 1737 0.0 0.0 21764 9616 ? S Jul16 0:04 FNDLIBR
applprod 1738 0.0 0.0 21808 9612 ? D Jul16 0:04 FNDLIBR
applprod 1739 0.0 0.0 21612 9440 ? S Jul16 0:04 FNDLIBR
applprod 1742 0.0 0.0 21704 9588 ? S Jul16 0:04 FNDLIBR
applprod 1743 0.0 0.0 21820 9700 ? D Jul16 0:05 FNDLIBR
applprod 1744 0.0 0.0 21376 9236 ? S Jul16 0:04 FNDLIBR
applprod 1745 0.0 0.0 21664 9488 ? D Jul16 0:04 FNDLIBR
applprod 1746 0.0 0.0 21912 9816 ? D Jul16 0:05 FNDLIBR
applprod 1752 0.0 0.0 21488 9312 ? D Jul16 0:03 FNDLIBR
applprod 1754 0.0 0.0 21780 9636 ? S Jul16 0:05 FNDLIBR
applprod 1756 0.0 0.0 21672 9496 ? S Jul16 0:04 FNDLIBR
applprod 1759 0.0 0.0 21792 9728 ? S Jul16 0:04 FNDLIBR
applprod 1760 0.0 0.0 21764 9716 ? S Jul16 0:04 FNDLIBR
applprod 1762 0.0 0.0 20332 7576 ? S Jul16 0:00 FNDLIBR
applprod 2625 0.0 0.0 21636 9520 ? S Jul17 0:04 FNDLIBR
applprod 2626 0.0 0.0 21860 9648 ? S Jul17 0:04 FNDLIBR
applprod 2628 0.0 0.0 21512 9392 ? S Jul17 0:03 FNDLIBR
applprod 2634 0.0 0.0 21680 9592 ? D Jul17 0:05 FNDLIBR
applprod 2636 0.0 0.0 21336 9348 ? D Jul17 0:04 FNDLIBR
applprod 2638 0.0 0.0 21896 9772 ? S Jul17 0:03 FNDLIBR
applprod 4117 0.0 0.0 113120 1208 ? Ss 09:30 0:00 sh
applprod 4118 0.0 0.0 113120 1188 ? S 09:30 0:00 sh
applprod 4120 0.0 0.0 113120 568 ? S 09:30 0:00 sh
applprod 4121 0.0 0.0 151056 1840 ? D 09:30 0:00 ps
applprod 4122 0.0 0.0 112652 960 ? S 09:30 0:00 grep
applprod 4123 0.0 0.0 112648 964 ? S 09:30 0:00 grep
applprod 4124 0.0 0.0 112648 948 ? S 09:30 0:00 grep
applprod 4126 0.0 0.0 113484 968 ? S 09:30 0:00 awk
applprod 4127 0.0 0.0 113484 968 ? S 09:30 0:00 awk
applprod 4128 0.0 0.0 113484 964 ? S 09:30 0:00 awk
applprod 4129 0.0 0.0 113120 1196 ? S 09:30 0:00 sh
applprod 4130 0.0 0.0 107904 672 ? S 09:30 0:00 wc
applprod 6114 0.0 0.0 151808 9392 ? D 09:39 0:00 httpd.worker
applprod 7043 0.0 0.0 113120 1204 ? Ss 09:45 0:00 sh
applprod 7044 0.0 0.0 113120 1188 ? S 09:45 0:00 sh
applprod 7045 0.0 0.0 113120 564 ? S 09:45 0:00 sh
applprod 7046 0.0 0.0 151056 1840 ? D 09:45 0:00 ps
applprod 7047 0.0 0.0 112652 960 ? S 09:45 0:00 grep
applprod 7048 0.0 0.0 112648 960 ? S 09:45 0:00 grep
applprod 7049 0.0 0.0 112648 952 ? S 09:45 0:00 grep
applprod 7050 0.0 0.0 113484 968 ? S 09:45 0:00 awk
applprod 7051 0.0 0.0 113484 964 ? S 09:45 0:00 awk
applprod 7052 0.0 0.0 113484 964 ? S 09:45 0:00 awk
applprod 7053 0.0 0.0 113120 1192 ? S 09:45 0:00 sh
applprod 7054 0.0 0.0 107904 664 ? S 09:45 0:00 wc
applprod 9986 0.0 0.0 113120 1208 ? Ss 10:00 0:00 sh
applprod 9989 0.0 0.0 113120 1184 ? S 10:00 0:00 sh
applprod 9993 0.0 0.0 113120 564 ? S 10:00 0:00 sh
applprod 9994 0.0 0.0 151056 1844 ? D 10:00 0:00 ps
applprod 9995 0.0 0.0 112652 964 ? S 10:00 0:00 grep
applprod 9996 0.0 0.0 112648 960 ? S 10:00 0:00 grep
applprod 9997 0.0 0.0 112648 948 ? S 10:00 0:00 grep
applprod 9999 0.0 0.0 113484 964 ? S 10:00 0:00 awk
applprod 10000 0.0 0.0 113484 964 ? S 10:00 0:00 awk
applprod 10001 0.0 0.0 113484 964 ? S 10:00 0:00 awk
applprod 10002 0.0 0.0 113120 1192 ? S 10:00 0:00 sh
applprod 10004 0.0 0.0 107904 664 ? S 10:00 0:00 wc
applprod 12953 0.0 0.0 113120 1204 ? Ss 10:15 0:00 sh
applprod 12954 0.0 0.0 113120 1192 ? S 10:15 0:00 sh
applprod 12955 0.0 0.0 113120 568 ? S 10:15 0:00 sh
applprod 12956 0.0 0.0 151056 1840 ? D 10:15 0:00 ps
applprod 12957 0.0 0.0 112652 964 ? S 10:15 0:00 grep
applprod 12958 0.0 0.0 112648 960 ? S 10:15 0:00 grep
applprod 12959 0.0 0.0 112648 952 ? S 10:15 0:00 grep
applprod 12960 0.0 0.0 113484 968 ? S 10:15 0:00 awk
applprod 12961 0.0 0.0 113484 964 ? S 10:15 0:00 awk
applprod 12962 0.0 0.0 113484 964 ? S 10:15 0:00 awk
applprod 12963 0.0 0.0 113120 1192 ? S 10:15 0:00 sh
applprod 12964 0.0 0.0 107904 672 ? S 10:15 0:00 wc
applprod 13299 1.5 0.0 116808 3440 pts/2 S 10:16 0:00 bash
applprod 13447 0.0 0.0 151084 1864 pts/2 R+ 10:16 0:00 ps
applprod 16162 0.1 0.4 19287600 313440 ? Sl Jul13 12:36 java
applprod 16205 0.2 1.6 6083392 1070484 ? Ssl Jul13 19:30 java
applprod 16530 0.0 0.0 140744 11672 ? S Jul13 0:00 perl
applprod 16646 0.0 0.1 471736 115820 ? Sl Jul13 5:41 java
applprod 16858 0.0 0.0 72976 2904 ? Ss Jul13 0:00 opmn
applprod 16859 0.0 0.0 1230164 12752 ? Sl Jul13 0:36 opmn
applprod 16913 0.0 0.0 21324 9044 ? D Jul17 0:01 FNDLIBR
applprod 17158 0.0 0.0 151808 19800 ? S Jul13 0:13 httpd.worker
applprod 17201 0.0 0.0 30532 1032 ? S Jul13 0:00 odl_rotatelogs
applprod 17202 0.0 0.0 30532 868 ? S Jul13 0:34 odl_rotatelogs
applprod 17203 0.0 0.0 30464 704 ? S Jul13 0:00 rotatelogs
applprod 17204 0.0 0.0 30464 708 ? S Jul13 0:00 rotatelogs
applprod 17285 0.0 0.0 17744 4564 ? Ss Jul13 0:03 tnslsnr
applprod 17317 0.0 0.0 30532 964 ? S Jul13 0:00 odl_rotatelogs
applprod 17318 0.0 0.0 349196 9560 ? Sl Jul13 0:04 httpd.worker
applprod 18069 0.0 0.0 113120 1208 ? Ss 08:00 0:00 sh
applprod 18074 0.0 0.0 113120 1192 ? S 08:00 0:00 sh
applprod 18078 0.0 0.0 113120 564 ? S 08:00 0:00 sh
applprod 18079 0.0 0.0 151056 1836 ? D 08:00 0:00 ps
applprod 18080 0.0 0.0 112648 964 ? S 08:00 0:00 grep
applprod 18081 0.0 0.0 112648 964 ? S 08:00 0:00 grep
applprod 18082 0.0 0.0 112648 948 ? S 08:00 0:00 grep
applprod 18084 0.0 0.0 113484 964 ? S 08:00 0:00 awk
applprod 18085 0.0 0.0 113484 968 ? S 08:00 0:00 awk
applprod 18086 0.0 0.0 113484 964 ? S 08:00 0:00 awk
applprod 18087 0.0 0.0 113120 1192 ? S 08:00 0:00 sh
applprod 18088 0.0 0.0 107904 664 ? S 08:00 0:00 wc
applprod 19362 0.5 2.6 6126368 1735928 ? Ssl Jul13 42:56 java
applprod 19367 0.1 2.1 6026724 1423124 ? Ssl Jul13 12:57 java
applprod 19379 0.5 2.5 6065972 1680916 ? Ssl Jul13 44:05 java
applprod 19461 0.3 1.2 1754080 816804 ? Ssl Jul13 25:48 java
applprod 19485 0.3 1.2 1759756 822428 ? Ssl Jul13 24:58 java
applprod 21339 0.0 0.0 113120 1200 ? Ss 08:15 0:00 sh
applprod 21340 0.0 0.0 113120 1184 ? S 08:15 0:00 sh
applprod 21341 0.0 0.0 113120 564 ? S 08:15 0:00 sh
applprod 21342 0.0 0.0 151056 1836 ? D 08:15 0:00 ps
applprod 21343 0.0 0.0 112648 964 ? S 08:15 0:00 grep
applprod 21344 0.0 0.0 112648 960 ? S 08:15 0:00 grep
applprod 21345 0.0 0.0 112648 948 ? S 08:15 0:00 grep
applprod 21346 0.0 0.0 113484 968 ? S 08:15 0:00 awk
applprod 21347 0.0 0.0 113484 968 ? S 08:15 0:00 awk
applprod 21348 0.0 0.0 113484 968 ? S 08:15 0:00 awk
applprod 21349 0.0 0.0 113120 1192 ? S 08:15 0:00 sh
applprod 21350 0.0 0.0 107904 672 ? S 08:15 0:00 wc
applprod 24274 0.0 0.0 113120 1204 ? Ss 08:30 0:00 sh
applprod 24275 0.0 0.0 113120 1188 ? S 08:30 0:00 sh
applprod 24277 0.0 0.0 113120 568 ? S 08:30 0:00 sh
applprod 24279 0.0 0.0 151056 1836 ? D 08:30 0:00 ps
applprod 24280 0.0 0.0 112648 964 ? S 08:30 0:00 grep
applprod 24281 0.0 0.0 112648 960 ? S 08:30 0:00 grep
applprod 24282 0.0 0.0 112648 948 ? S 08:30 0:00 grep
applprod 24283 0.0 0.0 113484 968 ? S 08:30 0:00 awk
applprod 24284 0.0 0.0 113484 968 ? S 08:30 0:00 awk
applprod 24285 0.0 0.0 113484 964 ? S 08:30 0:00 awk
applprod 24286 0.0 0.0 113120 1196 ? S 08:30 0:00 sh
applprod 24287 0.0 0.0 107904 664 ? S 08:30 0:00 wc
applprod 25864 0.0 0.1 2132332 98188 ? Sl Jul13 3:24 httpd.worker
applprod 27185 0.0 0.0 113120 1204 ? Ss 08:45 0:00 sh
applprod 27186 0.0 0.0 113120 1188 ? S 08:45 0:00 sh
applprod 27187 0.0 0.0 113120 564 ? S 08:45 0:00 sh
applprod 27188 0.0 0.0 151056 1836 ? D 08:45 0:00 ps
applprod 27189 0.0 0.0 112648 960 ? S 08:45 0:00 grep
applprod 27190 0.0 0.0 112648 964 ? S 08:45 0:00 grep
applprod 27191 0.0 0.0 112648 952 ? S 08:45 0:00 grep
applprod 27192 0.0 0.0 113484 960 ? S 08:45 0:00 awk
applprod 27193 0.0 0.0 113484 968 ? S 08:45 0:00 awk
applprod 27194 0.0 0.0 113484 964 ? S 08:45 0:00 awk
applprod 27195 0.0 0.0 113120 1192 ? S 08:45 0:00 sh
applprod 27196 0.0 0.0 107904 668 ? S 08:45 0:00 wc
applprod 27215 0.0 0.0 21544 9400 ? S Jul17 0:02 FNDLIBR
applprod 27833 0.6 4.7 3519672 3139292 ? Sl Jul17 9:20 java
applprod 27847 0.2 2.0 3524820 1362872 ? Sl Jul17 3:42 java
applprod 27865 0.7 4.8 3522324 3169820 ? Sl Jul17 9:46 java
applprod 30350 0.0 0.0 113120 1200 ? Ss 09:00 0:00 sh
applprod 30351 0.0 0.0 113120 1192 ? S 09:00 0:00 sh
applprod 30353 0.0 0.0 113120 568 ? S 09:00 0:00 sh
applprod 30355 0.0 0.0 151056 1836 ? D 09:00 0:00 ps
applprod 30356 0.0 0.0 112648 960 ? S 09:00 0:00 grep
applprod 30357 0.0 0.0 112648 964 ? S 09:00 0:00 grep
applprod 30358 0.0 0.0 112648 952 ? S 09:00 0:00 grep
applprod 30359 0.0 0.0 113484 964 ? S 09:00 0:00 awk
applprod 30360 0.0 0.0 113484 968 ? S 09:00 0:00 awk
applprod 30361 0.0 0.0 113484 956 ? S 09:00 0:00 awk
applprod 30362 0.0 0.0 113120 1192 ? S 09:00 0:00 sh
applprod 30363 0.0 0.0 107904 664 ? S 09:00 0:00 wc
applprod 30827 0.0 0.0 2132332 39032 ? Sl Jul16 1:34 httpd.worker
applprod 31573 0.0 0.1 2132332 122828 ? Sl Jul13 4:00 httpd.worker
[applprod@node1 ~]$

Looking at the awk and grep processes above, I thought this might be related to our cron jobs: we have scheduled 2 jobs that run every 15 minutes, check the mount point space, and alert us.

Thank you
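A long listing like the one above is easier to read once the "D" state rows are grouped by command. The following is a minimal Python sketch of that idea, assuming the 11-column `ps ucx` layout shown in the post; the function name `d_state_by_command` is made up for the example, and the sample rows are copied from the listing.

```python
from collections import Counter


def d_state_by_command(ps_text):
    """Count D-state processes per command in `ps ucx`-style output."""
    counts = Counter()
    for line in ps_text.splitlines():
        # Expect 11 columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip the header, the prompt, and blank lines
        stat, command = fields[7], fields[10].strip()
        if stat.startswith("D"):
            counts[command] += 1
    return counts


# A few rows copied from the listing in the post.
sample = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
applprod 916 0.0 0.0 151056 1840 ? D 09:15 0:00 ps
applprod 917 0.0 0.0 112652 964 ? S 09:15 0:00 grep
applprod 1722 0.0 0.0 21728 9508 ? D Jul16 0:04 FNDLIBR
applprod 1724 0.0 0.0 21588 9452 ? D Jul16 0:03 FNDLIBR
applprod 6114 0.0 0.0 151808 9392 ? D 09:39 0:00 httpd.worker
"""

for command, count in d_state_by_command(sample).most_common():
    print(command, count)
```

Feeding the full listing into the same function would show which commands account for the blocked processes, for example whether the `ps` started by each 15-minute cron run is among them.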
Dear Erman,
What could cause the processes (ps, FNDLIBR) to go into the D state?

Thank you
I found a very good blog post on the internet and it was very useful; you covered almost everything simply and concisely:
http://ermanarslan.blogspot.com/2014/03/linux-d-state-processes-and-load-average.html

Now I need to understand what might cause the processes (ps, FNDLIBR) to go into the D state.

Thanks
Administrator
Good :)
Dear Erman,
What could cause the processes (ps, FNDLIBR) to go into the D state?

Thank you
Administrator
Probably I/O. They are probably running heavy concurrent requests that do non-stop I/O.
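One way to narrow down what a "D" state process is waiting on is to look at its wait channel. The sketch below is a hedged example, not something from the thread: it reads `/proc/<pid>/stat` for the state and `/proc/<pid>/wchan` for the kernel function the task is sleeping in. The function names are assumptions, and the sample stat line is hypothetical, built from a PID and command name in the listing.

```python
import os


def parse_stat_line(line):
    """Extract (pid, comm, state) from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')'.
    """
    left, _, right = line.rpartition(")")
    pid_text, _, comm = left.partition("(")
    state = right.split()[0]
    return int(pid_text), comm, state


def blocked_tasks(proc_root="/proc"):
    """Yield (pid, comm, wchan) for every task currently in D state."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
            if state != "D":
                continue
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip()
        except (OSError, ValueError, IndexError):
            # The process exited, or we lack permission to read it.
            continue
        yield pid, comm, wchan


# Hypothetical stat line, built from a PID and command name in the listing.
print(parse_stat_line("1722 (FNDLIBR) D 1 1722 1722 0 -1"))
```

On a live Linux system, looping over `blocked_tasks()` and printing each tuple shows where the blocked processes are sleeping. Wait channel names that mention `nfs` or `rpc` would hint at a hung NFS mount rather than local disk I/O, but that is only a hint and needs to be confirmed against the mount points.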