Dear Sir,
In my Oracle Linux environment, there is a process that keeps respawning and takes 98% of the CPU. I have killed it many times, but it starts again automatically after each kill. Custom Oracle Forms and Reports are installed on this server, along with Oracle WebLogic and a 19c database. I have attached the top command output for your reference:
top - 11:24:54 up 200 days, 20:25, 2 users, load average: 6.73, 5.38, 5.29
Tasks: 454 total, 3 running, 338 sleeping, 0 stopped, 0 zombie
%Cpu(s): 99.5 us, 0.1 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.4 hi, 0.0 si, 0.0 st
KiB Mem : 65464312 total, 5090044 free, 9776780 used, 50597488 buff/cache
KiB Swap: 39063548 total, 39062000 free, 1548 used. 36064516 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19510 applmgr 20 0 2460800 2.3g 4 S 394.7 3.7 21:13.61 -bash
Thanks
Vikash |
Administrator
|
You need to provide more information.
That process is executing bash, but what is it doing? Check it through the /proc filesystem, and use ps with the relevant arguments to find the command that the process is executing. |
Dear Sir,
Please see below:
[root@dev proc]# ps -elf |grep 8879
1 S applmgr 8879 1 99 80 0 - 615175 ep_pol 14:21 ? 01:04:25 -bash
0 R root 18229 470 0 80 0 - 28574 - 14:38 pts/0 00:00:00 grep --color=auto 8879
[root@dev proc]#
Thanks
Vikash |
Administrator
|
That bash is sleeping (S state).
1) What is the output of the command "w"?
2) Execute the following commands and send the outputs:
cat /proc/8879/cmdline
cat /proc/8879/cwd
cd /proc/8879/fd; ls -al |
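The /proc checks requested above can be collected in one small helper. This is only a sketch: the function name proc_summary and the PROC_ROOT override are hypothetical and not part of any tool. Since cwd, exe and the fd entries are symbolic links, it reads them with readlink rather than cat. The name shown in cmdline is argv[0], which a program can set to anything, so the exe link is usually the more reliable indication of which binary is actually running.

```shell
# Print the identity of a process from its /proc entry.
# PROC_ROOT is a hypothetical override (default /proc) so the function can be
# pointed at a test directory instead of the live system.
proc_summary() {
    pid=$1
    root=${PROC_ROOT:-/proc}
    dir=$root/$pid
    [ -d "$dir" ] || { echo "no such process: $pid" >&2; return 1; }

    # cmdline is NUL-separated; turn the separators into spaces for display.
    printf 'cmdline: %s\n' "$(tr '\0' ' ' < "$dir/cmdline")"

    # cwd and exe are symbolic links, so read the link target.
    printf 'cwd:     %s\n' "$(readlink "$dir/cwd")"
    printf 'exe:     %s\n' "$(readlink "$dir/exe")"

    # Each open file descriptor is a symbolic link as well.
    for fd in "$dir"/fd/*; do
        [ -L "$fd" ] || continue
        printf 'fd %s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}
```

Run as root (or as the process owner), for example proc_summary 8879. An exe target that ends in "(deleted)" means the binary was removed from disk after it was started.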
Dear Sir,
1. [root@test 29289]# w
10:32:08 up 43 days, 21:57, 6 users, load average: 7.03, 7.06, 7.04
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
root pts/2 123.252.234.32 10:29 0.00s 0.05s 0.01s w
root pts/4 :4 04Apr24 23days 0.07s 30.26s /usr/libexec/gnome-terminal-server
root pts/1 :4 03Apr24 3days 0.11s 0.11s bash
root pts/6 :4 03Apr24 5days 0.31s 0.05s -bash
root pts/3 :1 03Apr24 32days 0.11s 1.28s /usr/libexec/gnome-terminal-server
root pts/0 :4 04Apr24 16days 0.48s 0.46s ssh 192.168.0.61
[root@test 29289]#
2. [root@test 29289]# cat cmdline
-bash[root@test 29289]#
cwd is a directory:
[root@test 29289]# cd cwd
[root@test cwd]# ls -ltr
total 56
drwxr-xr-x. 2 root root 6 Apr 11 2018 srv
drwxr-xr-x. 2 root root 6 Apr 11 2018 media
drwxr-xr-x. 2 root root 6 Sep 13 2023 TEST
lrwxrwxrwx. 1 root root 7 Sep 13 2023 bin -> usr/bin
lrwxrwxrwx. 1 root root 9 Sep 13 2023 lib64 -> usr/lib64
lrwxrwxrwx. 1 root root 7 Sep 13 2023 lib -> usr/lib
lrwxrwxrwx. 1 root root 8 Sep 13 2023 sbin -> usr/sbin
drwxr-xr-x. 7 root root 78 Sep 14 2023 mnt
drwxr-xr-x. 14 root root 4096 Jan 19 17:15 usr
dr-xr-xr-x. 13 root root 0 Mar 23 12:34 sys
drwxr-xr-x. 21 root root 3680 Mar 23 12:35 dev
dr-xr-xr-x. 668 root root 0 Mar 23 18:04 proc
dr-xr-xr-x. 5 root root 4096 Apr 18 15:42 boot
drwxrwxrwx. 6 1005 dba 102 Apr 24 05:56 opt
drwxr-xr-x. 187 root root 12288 Apr 24 14:52 etc
dr-xr-x---+ 24 root root 4096 Apr 24 18:11 root
drwxr-xr-x. 7 root root 78 Apr 25 11:51 home
drwxr-xr-x. 24 root root 4096 Apr 30 11:55 var
drwxr-xr-x. 11 root root 4096 May 1 13:06 u01
drwxr-xr-x. 7 root root 4096 May 1 14:40 data
drwxr-xr-x. 58 root root 1600 May 6 07:03 run
drwxrwxrwt. 21 root root 8192 May 6 10:32 tmp
[root@test cwd]#
[root@test 29289]# cd fd
[root@test fd]# ls -la
total 0
dr-x------. 2 appsdev dba 0 May 6 06:55 .
dr-xr-xr-x. 9 appsdev dba 0 May 6 06:55 ..
lr-x------. 1 appsdev dba 64 May 6 06:55 0 -> /dev/null
l-wx------. 1 appsdev dba 64 May 6 06:55 1 -> /dev/null
lrwx------. 1 appsdev dba 64 May 6 06:55 10 -> anon_inode:[eventfd]
lrwx------. 1 appsdev dba 64 May 6 06:55 11 -> anon_inode:[eventfd]
lr-x------. 1 appsdev dba 64 May 6 06:55 12 -> /dev/null
lrwx------. 1 appsdev dba 64 May 6 06:55 13 -> socket:[1406029245]
l-wx------. 1 appsdev dba 64 May 6 06:55 2 -> /dev/null
lrwx------. 1 appsdev dba 64 May 6 06:55 3 -> /tmp/.lock
lrwx------. 1 appsdev dba 64 May 6 06:55 4 -> anon_inode:[eventpoll]
lr-x------. 1 appsdev dba 64 May 6 06:55 5 -> pipe:[1227521575]
l-wx------. 1 appsdev dba 64 May 6 06:55 6 -> pipe:[1227521575]
lr-x------. 1 appsdev dba 64 May 6 06:55 7 -> pipe:[1227523640]
l-wx------. 1 appsdev dba 64 May 6 06:55 8 -> pipe:[1227523640]
lrwx------. 1 appsdev dba 64 May 6 06:55 9 -> anon_inode:[eventfd]
[root@test fd]#
################top#############################
top - 10:38:56 up 43 days, 22:04, 6 users, load average: 6.93, 7.21, 7.14
Tasks: 606 total, 1 running, 494 sleeping, 0 stopped, 0 zombie
%Cpu(s): 88.0 us, 11.5 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.5 hi, 0.0 si, 0.0 st
KiB Mem : 65464312 total, 1531280 free, 34196688 used, 29736344 buff/cache
KiB Swap: 29298684 total, 29297648 free, 1036 used. 21559320 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29289 appsdev 20 0 2476096 2.3g 4 S 348.8 3.7 29726:22 -bash
###############
Thanks
Vikash |
Administrator
|
It looks like its file descriptors point to files owned by appsdev. I guess this is your development application owner (the applmgr of DEV EBS), so I don't think this bash process of yours is important. It is just waiting there.
We see "ep_poll" in the WCHAN column. The process is waiting in the kernel inside epoll (like poll and ppoll, it waits for some event on a file descriptor). It looks like it is blocked, waiting for something I/O-related to happen. Send me the output of the following three commands:
ls -l /proc/29289/cwd
strace -p 29289
lsof -p 29289 |
[appsdev@test 20930]$ ls -l cwd
lrwxrwxrwx. 1 appsdev dba 0 May 6 12:53 cwd -> /
[appsdev@test 20930]$
[appsdev@test 20930]$ strace -p 20930
strace: Process 20930 attached
epoll_pwait(4, [], 1024, 182, NULL, 8) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802790, tv_nsec=652151298}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802790, tv_nsec=652213801}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802790, tv_nsec=652270236}) = 0
epoll_pwait(4, [], 1024, 220, NULL, 8) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802790, tv_nsec=873128134}) = 0
epoll_pwait(4, [], 1024, 279, NULL, 8) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802791, tv_nsec=156134222}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802791, tv_nsec=156161769}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802791, tv_nsec=156206343}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802791, tv_nsec=156243307}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802791, tv_nsec=156282947}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802791, tv_nsec=156317821}) = 0
epoll_pwait(4, [], 1024, 500, NULL, 8) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802791, tv_nsec=661157441}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802791, tv_nsec=661210880}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802791, tv_nsec=661232881}) = 0
epoll_pwait(4, [], 1024, 212, NULL, 8) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802791, tv_nsec=874128337}) = 0
epoll_pwait(4, [], 1024, 287, NULL, 8) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802792, tv_nsec=163136044}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802792, tv_nsec=163162663}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=3802792, tv_nsec=163209900}) = 0
epoll_pwait(4, ^Cstrace: Process 20930 detached
<detached ...>
[appsdev@test 20930]$
[appsdev@test 20930]$ lsof -p 20930
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
Output information may be incomplete.
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/0/gvfs
Output information may be incomplete.
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/0/gvfs
Output information may be incomplete.
lsof: WARNING: can't stat() fuse file system /run/user/0/doc
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
-bash 20930 appsdev cwd DIR 252,0 4096 128 /
-bash 20930 appsdev rtd DIR 252,0 4096 128 /
-bash 20930 appsdev txt REG 252,5 2365672 167 /tmp/-bash (deleted)
-bash 20930 appsdev 0r CHR 1,3 0t0 2059 /dev/null
-bash 20930 appsdev 1w CHR 1,3 0t0 2059 /dev/null
-bash 20930 appsdev 2w CHR 1,3 0t0 2059 /dev/null
-bash 20930 appsdev 3u REG 252,5 0 168 /tmp/.lock
-bash 20930 appsdev 4u a_inode 0,14 0 11741 [eventpoll]
-bash 20930 appsdev 5r FIFO 0,13 0t0 2070640197 pipe
-bash 20930 appsdev 6w FIFO 0,13 0t0 2070640197 pipe
-bash 20930 appsdev 7r FIFO 0,13 0t0 2070641666 pipe
-bash 20930 appsdev 8w FIFO 0,13 0t0 2070641666 pipe
-bash 20930 appsdev 9u a_inode 0,14 0 11741 [eventfd]
-bash 20930 appsdev 10u a_inode 0,14 0 11741 [eventfd]
-bash 20930 appsdev 11u a_inode 0,14 0 11741 [eventfd]
-bash 20930 appsdev 12r CHR 1,3 0t0 2059 /dev/null
-bash 20930 appsdev 13u IPv4 2070641667 0t0 TCP test.xyz.com:apani1->sable-lamp.aeza.network:http (ESTABLISHED)
[appsdev@test 20930]$ |
Administrator
|
This seems related to the dev environment.
That process seems to be blocked in a general I/O call. What is this?
test.xyz.com:apani1->sable-lamp.aeza.network:http
It seems the process is reaching a remote host over HTTP. Is that destination server okay? This might be the problem. By the way, I don't see anything critical there. All the file descriptors belong to appsdev, so the process seems killable, but the risk is yours. |
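When lsof is not available, the remote end of a socket such as fd 13 (socket:[...] in the fd listing) can be looked up by matching its inode against the process's net/tcp table. The sketch below assumes IPv4 and a little-endian host such as x86_64; the function names and the PROC_ROOT override are hypothetical. IPv6 sockets live in net/tcp6 and are not covered.

```shell
# Decode one "HEXIP:HEXPORT" field of /proc/<pid>/net/tcp into dotted form.
# Assumption: IPv4 address printed by a little-endian host (bytes reversed).
decode_addr() {
    hex_ip=${1%%:*}
    hex_port=${1##*:}
    printf '%d.%d.%d.%d:%d\n' \
        "0x$(printf '%s' "$hex_ip" | cut -c7-8)" \
        "0x$(printf '%s' "$hex_ip" | cut -c5-6)" \
        "0x$(printf '%s' "$hex_ip" | cut -c3-4)" \
        "0x$(printf '%s' "$hex_ip" | cut -c1-2)" \
        "0x$hex_port"
}

# Print "local -> remote" for every IPv4 TCP socket held by a process.
# PROC_ROOT is a hypothetical override (default /proc) used for testing.
pid_tcp_peers() {
    pid=$1
    root=${PROC_ROOT:-/proc}
    for fd in "$root/$pid"/fd/*; do
        target=$(readlink "$fd") || continue
        case $target in
            'socket:['*']') inode=${target#'socket:['}; inode=${inode%']'} ;;
            *) continue ;;
        esac
        # Column 10 of net/tcp is the socket inode; columns 2 and 3 are
        # the local and remote addresses.
        awk -v ino="$inode" 'NR > 1 && $10 == ino { print $2, $3 }' \
            "$root/$pid/net/tcp" |
        while read -r laddr raddr; do
            echo "$(decode_addr "$laddr") -> $(decode_addr "$raddr")"
        done
    done
}
```

For example, pid_tcp_peers 20930 run as root would print the numeric address behind the HTTP connection, which can then be checked against firewall or proxy logs.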
Dear Sir,
Yes, this is the dev environment. test.xyz.com is the hostname of this machine (xyz replaces the company name):
test.xyz.com:apani1->sable-lamp.aeza.network:http
I have killed the processes, but they are generated again and again automatically.
Thanks
Vikash |
Administrator
|
Okay. We don't have the required information yet.
Unknown process shown only as -bash: This process might be a child process spawned by the bash shell itself, or another system service running in the background.
4 -> anon_inode:[eventpoll]: This indicates the process is using an event poll mechanism to monitor events from various sources.
9 -> anon_inode:[eventfd]: This suggests the process might be using an eventfd for efficient inter-process communication or signaling.
It is probably an OS process; probably the OS or a daemon starts it. It may belong to a monitoring process such as systemd-monitor.
*Use ps aux or pstree to get a detailed listing of running processes. Look for processes with a parent process ID (PPID) matching the bash shell (bash). |
Here, I can see some extra processes. What are these?
[appsdev@test ~]$ ps -u appsdev
PID TTY TIME CMD
8049 pts/2 00:00:00 bash
8274 pts/2 00:00:00 ps
8275 pts/2 00:00:00 ps
10111 ? 00:16:52 klibsystem5.sys-----??
10139 ? 00:00:00 -python3-------------??
16225 ? 00:00:20 httpd
20930 ? 14:28:35 -bash-------------this is the -bash process we are talking about
23515 pts/6 00:00:00 bash
27547 ? 00:00:00 startWebLogic.s
27596 ? 00:14:01 java
27597 ? 01:52:27 java
27919 ? 00:00:00 sh
27920 ? 00:00:00 startNodeManage
27966 ? 01:45:26 java
28340 ? 00:00:00 startWebLogic.s
28360 ? 00:00:00 startWebLogic.s
28414 ? 02:02:03 java
28462 ? 01:54:29 java
29681 ? 00:00:06 httpd
29682 ? 00:00:05 httpd
29756 ? 00:00:00 httpd
29766 ? 00:00:10 httpd
[appsdev@test ~]$ |
Administrator
|
*Use ps aux or pstree to get a detailed listing of running processes. Look for processes with a parent process ID (PPID) matching the bash shell (bash).
|
Administrator
|
Okay. This may be related to gnome-terminal (─gnome-terminal-─┬─bash───su───bash).
GNOME Terminal is a terminal emulator for the GNOME desktop environment, so maybe a terminal is open in the GUI and that is why you get that bash. (This is just a guess; we don't have anything else to go on.) Can you close these GNOME terminals, then kill that bash, and see whether it starts again? |
Administrator
|
So you closed all the terminal applications that were running on the GNOME / Linux GUI, and the bash is still there, right?
Let's use pstree with the -p argument:
pstree -p
This will display the process IDs. Let's be sure exactly which entry in the pstree output corresponds to that problematic bash. Check the process ID of that bash using ps -ef and note it down. Then check with pstree -p and look at its parents. Then we will plan our next action. |
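The parent lookup described above can also be done straight from /proc by following the PPid field of each status file. A minimal sketch, with a hypothetical function name and a hypothetical PROC_ROOT override for testing:

```shell
# Print the chain of parents of a process, one "PID NAME" per line, starting
# at the process itself and ending at the top of the process tree.
# PROC_ROOT is a hypothetical override (default /proc) used for testing.
ancestry() {
    pid=$1
    root=${PROC_ROOT:-/proc}
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "$root/$pid/status" ]; do
        # Only the first word of the name is printed.
        name=$(awk '$1 == "Name:" { print $2 }' "$root/$pid/status")
        echo "$pid $name"
        pid=$(awk '$1 == "PPid:" { print $2 }' "$root/$pid/status")
    done
}
```

If the chain goes directly from the -bash process to PID 1, as the PPID column of the earlier ps -elf output suggests, the program that launched it has already exited, and the launcher has to be found another way.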
Yes....
[appsdev@test ~]$ pstree -p 26758
-bash(26758)─┬─{-bash}(26759)
             ├─{-bash}(26760)
             ├─{-bash}(26761)
             ├─{-bash}(26762)
             ├─{-bash}(26763)
             ├─{-bash}(26794)
             ├─{-bash}(26795)
             ├─{-bash}(26796)
             └─{-bash}(26797)
[appsdev@test ~]$ |
Administrator
|
What is your OS distribution and version? (For ex: Oracle Linux 7.5)
|
Administrator
|
Background information (for the strace output):
-----------------------------------
The strace output shows the bash process repeatedly waiting in the epoll_pwait system call, which means it is waiting for events from an epoll instance. Here is how to interpret the output and troubleshoot further:
Understanding epoll_pwait:
epoll_pwait: This system call waits for events on an epoll instance. It is a mechanism for efficient I/O waiting in applications. Its arguments specify the epoll instance, a buffer for the returned events, the maximum number of events to return, and a timeout in milliseconds.
Analysis of strace output:
The process repeatedly calls epoll_pwait with a timeout (values like 182, 220, etc.). Each call returns 0, meaning the timeout expired with no events. Between calls, it uses clock_gettime to get the current time. This suggests the process isn't receiving the events it expects and keeps waiting with timeouts. |
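One caveat on the trace: strace -p attaches only to the thread whose ID is given, while the pstree -p output earlier in the thread lists nine {-bash} threads. A thread sleeping in epoll_pwait is unlikely to account for several hundred percent CPU, so the load is probably in the other threads. The sketch below (hypothetical function name and PROC_ROOT override) prints the state and accumulated CPU ticks of every thread, read from /proc/<pid>/task/<tid>/stat.

```shell
# Print "TID STATE TICKS" for each thread of a process, where TICKS is the
# sum of user and system CPU time in clock ticks.
# PROC_ROOT is a hypothetical override (default /proc) used for testing.
thread_cpu_ticks() {
    pid=$1
    root=${PROC_ROOT:-/proc}
    for task in "$root/$pid"/task/*; do
        [ -r "$task/stat" ] || continue
        stat=$(cat "$task/stat") || continue
        # Drop "pid (comm) "; comm may itself contain spaces or parentheses,
        # so cut at the last ") ".
        rest=${stat##*') '}
        # After the cut: $1 = state, ${12} = utime, ${13} = stime.
        set -- $rest
        echo "${task##*/} $1 $(( ${12} + ${13} ))"
    done
}
```

Running thread_cpu_ticks 20930 twice a few seconds apart shows which thread IDs are accumulating ticks; strace -f -p 20930 then follows all threads rather than only the main one.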
Administrator
|
Did you check your cron jobs and systemd services? -- bash may be started by something from there.
crontab -l
For systemd services, use systemctl list-unit-files and systemctl status <service_name>. |
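Since lsof showed the running binary as /tmp/-bash, one way to narrow the cron and systemd check is to search the usual start-up locations for anything that mentions /tmp. The helper below is a hypothetical sketch; the name find_references and the list of locations in the usage note are assumptions to adapt to the server.

```shell
# List files under the given paths whose content mentions a fixed string.
# Paths that do not exist are skipped silently.
find_references() {
    pattern=$1
    shift
    for path in "$@"; do
        [ -e "$path" ] || continue
        grep -rlF -- "$pattern" "$path" 2>/dev/null
    done
    return 0
}
```

A possible invocation as root, assuming the standard Oracle Linux locations and a home directory of /home/appsdev: find_references '/tmp/' /etc/crontab /etc/cron.d /var/spool/cron /etc/systemd/system /home/appsdev/.bashrc /home/appsdev/.bash_profile. Any hit still needs to be read by hand, because legitimate jobs also use /tmp.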