Hi Erman,
We bounce our application server weekly. After the latest bounce, Apache suddenly failed to start on node 2. We use a two-node shared application file system (erpprodapp01 and erpprodapp02), and Apache will not start on the second node.

Error from opmn.out:
[2023-05-09T22:57:01+05:30] [opmn] [TRACE:32] [] [internal] Server shut down: status 4000
[2023-05-09T22:57:01+05:30] [opmn] [ERROR:1] [752] [pm-internal] Failed to open locale state file /u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OPMN/opmn/states/.locale (read: No such file or directory)
^C

Error from opmn.log:
[2023-05-09T22:58:35+05:30] [opmn] [TRACE:32] [] [internal] ORACLE_HOME: /u01/PRODAPPS/fs2/FMW_Home/webtier
[2023-05-09T22:58:35+05:30] [opmn] [TRACE:32] [] [internal] ORACLE_INSTANCE: /u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2
[2023-05-09T22:58:35+05:30] [opmn] [NOTIFICATION:1] [90] [ons-internal] ONS server initiated
[2023-05-09T22:58:35+05:30] [opmn] [TRACE:32] [] [internal] Host: erpprodapp02.ttd.com; Remote Port: 6210; Local Port: 6110; Pid: 8003; 11.1.1.9.0
[2023-05-09T22:58:35+05:30] [opmn] [NOTIFICATION:1] [520] [pm-internal] Create pm state directory: /u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OPMN/opmn/states
[2023-05-09T22:58:35+05:30] [opmn] [TRACE:1] [526] [pm-internal] PM state file does not exist: /u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OPMN/opmn/states/.opmndat
[2023-05-09T22:58:35+05:30] [opmn] [NOTIFICATION:1] [675] [pm-internal] OPMN server ready. Request handling enabled
[2023-05-09T22:58:39+05:30] [opmn] [NOTIFICATION:1] [667] [pm-requests] Request 4 Started. Command: /start?ias-component=EBS_web
[2023-05-09T22:58:39+05:30] [opmn] [NOTIFICATION:1] [662] [pm-process] Starting Process: EBS_web~OHS~OHS~1 (46341956:0)
[2023-05-09T23:00:40+05:30] [opmn] [TRACE:32] [] [pm-monitor] Proc 46341956:8297 Start->timeout->Stop (ready, busy, 0 [run:1683653440:1683653420])
[2023-05-09T23:00:40+05:30] [opmn] [NOTIFICATION:1] [668] [pm-requests] Request 4 Completed. Command: /start?ias-component=EBS_web
[2023-05-09T23:00:49+05:30] [opmn] [NOTIFICATION:1] [663] [pm-process] Stopping Process: EBS_web~OHS~OHS~1 (46341956:8297)
[2023-05-09T23:01:01+05:30] [opmn] [NOTIFICATION:1] [666] [pm-process] Process Stopped: EBS_web~OHS~OHS~1 (46341956:8297)

We stopped opmn on node 2 using adopmnctl.sh stopall, removed the states directory from /u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OPMN/opmn, and started opmn again, but it failed to start:

[applprod@erpprodapp02 opmn]$ adopmnctl.sh startall
You are running adopmnctl.sh version 120.0.12020000.2
Starting Apache...
EXIT CODE is 152. Please check the log file for more details.
adopmnctl.sh: exiting with status 152
adopmnctl.sh: check the logfile /u02/PRODINST/fs2/inst/apps/PRODDB_erpprodapp02/logs/appl/admin/log/adopmnctl.txt for more information ...

opmnctl startproc: starting opmn managed processes...
================================================================================
opmn id=erpprodapp02.ttd.com:6210
Response: 0 of 1 processes started.
ias-instance id=EBS_web_OHS2
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
--------------------------------------------------------------------------------
ias-component/process-type/process-set: EBS_web/OHS/OHS/
Error
--> Process (index=1,uid=46341957,pid=12627) time out while waiting for a managed process to start
Log: /u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/diagnostics/logs/OHS/EBS_web/console~OHS~1.log
05/09/23-23:12:14 :: adapcctl.sh: exiting with status 152
================================================================================

Please guide.
Thanks, Satish |
Administrator
|
Exit code 152. It seems to be related to the cached files that Apache uses, so you will still need to clean up some opmn-related files. ".opmndat" may be causing this, but we need to be sure.
What do you have in "/u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/diagnostics/logs/OHS/EBS_web/console~OHS~1.log", and in all the related Apache/OHS diagnostic files? |
Administrator
|
Also ensure that the applications OS user has read/write access to the directory "/u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OPMN/opmn/states" and to all the files located in it.
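A quick way to check this from the shell is sketched below. It is only a sketch: the function name check_rw is made up for illustration, and it should be run as the applications OS user (applprod in this thread) so that the result reflects that user's access.

```shell
#!/bin/sh
# check_rw (hypothetical helper): list every entry under a directory that the
# current user cannot both read and write. Prints nothing if all is accessible.
check_rw() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi
    # find prints the directory itself and everything below it
    find "$dir" | while IFS= read -r p; do
        if [ ! -r "$p" ] || [ ! -w "$p" ]; then
            echo "$p"
        fi
    done
}

# Example (path taken from this thread):
# check_rw /u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OPMN/opmn/states
```

Running the same check on node 1 and node 2 and comparing the output may help, since both nodes see the same shared file system but not necessarily with the same effective permissions.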
|
Hi Erman,
We verified the permissions and they are the same as on node 1. The logs are generic. We deleted the files under the states folder and started opmn, but no luck. We also raised an SR, with no progress so far. Are there any other checks we can do?

File: console~OHS~1.log
23/05/13 14:38:08 Start process
--------
/u01/PRODAPPS/fs2/FMW_Home/webtier/ohs/bin/apachectl startssl: execing httpd
ModSecurity: WARNING Using transformations in SecDefaultAction is deprecated (/u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OHS/EBS_web/security2.conf:75).
ModSecurity: WARNING Using transformations in SecDefaultAction is deprecated (/u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OHS/EBS_web/security2.conf:90).
[Sat May 13 14:38:08 2023] [warn] Errors will be logged into /u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/diagnostics/logs/OHS/EBS_web/EBS_web.log
ModSecurity: WARNING Using transformations in SecDefaultAction is deprecated (/u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OHS/EBS_web/security2.conf:75).
ModSecurity: WARNING Using transformations in SecDefaultAction is deprecated (/u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OHS/EBS_web/security2.conf:90).
[Sat May 13 14:38:09 2023] [warn] Errors will be logged into /u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/diagnostics/logs/OHS/EBS_web/EBS_web.log
Audit init

File: EBS_web.log
[2023-05-16T08:07:13.4282+05:30] [OHS] [WARNING:32] [OHS-9999] [core.c] [host_id: erpprodapp02.ttd.com] [host_addr: ***********] [pid: 10568] [tid: 140683997681536] [user: applprod] [VirtualHost: main] child process 10602 still did not exit, sending a SIGTERM
[2023-05-16T08:07:15.4296+05:30] [OHS] [ERROR:32] [OHS-9999] [core.c] [host_id: erpprodapp02.ttd.com] [host_addr: ***********] [pid: 10568] [tid: 140683997681536] [user: applprod] [VirtualHost: main] child process 10597 still did not exit, sending a SIGKILL
[2023-05-16T08:07:15.4297+05:30] [OHS] [ERROR:32] [OHS-9999] [core.c] [host_id: erpprodapp02.ttd.com] [host_addr: ***********] [pid: 10568] [tid: 140683997681536] [user: applprod] [VirtualHost: main] child process 10598 still did not exit, sending a SIGKILL
[2023-05-16T08:07:15.4298+05:30] [OHS] [ERROR:32] [OHS-9999] [core.c] [host_id: erpprodapp02.ttd.com] [host_addr: ***********] [pid: 10568] [tid: 140683997681536] [user: applprod] [VirtualHost: main] child process 10599 still did not exit, sending a SIGKILL
[2023-05-16T08:07:15.4300+05:30] [OHS] [ERROR:32] [OHS-9999] [core.c] [host_id: erpprodapp02.ttd.com] [host_addr: ***********] [pid: 10568] [tid: 140683997681536] [user: applprod] [VirtualHost: main] child process 10600 still did not exit, sending a SIGKILL
[2023-05-16T08:07:15.4300+05:30] [OHS] [ERROR:32] [OHS-9999] [core.c] [host_id: erpprodapp02.ttd.com] [host_addr: ***********] [pid: 10568] [tid: 140683997681536] [user: applprod] [VirtualHost: main] child process 10602 still did not exit, sending a SIGKILL

When we monitor from another window, we can see the opmn and httpd processes being started, but then they go down again.
Thanks, Satish |
Administrator
|
"VirtualHost: main] child process 10597 still did not exit, sending" -> these messages may be misleading. They were probably written during shutdown. If the child processes do not shut down, the parent httpd process sends a SIGKILL to terminate them. A child that has not responded to the initial SIGTERM is usually busy servicing requests or has hung. These messages can be ignored.
But you get status 152 during start. Start Apache manually under strace. Use something like this (you may need to adapt it to your environment):
strace -f -o /tmp/strace.out ./apachectl start
We need to see the last system calls before the failure. Also, ensure you don't have any active security mechanism (SELinux, firewall, etc.) that may prevent Apache from starting. Ensure all the necessary file permissions are in place. Ensure the OS limits (ulimit) are set for the user that starts Apache/OHS. Ensure there is no space shortage in the file systems. You get status 152; the following note is for status 150, but still check it -> adapcctl.sh: exiting with status 150 (Doc ID 1106795.1) |
Dear Erman,
Thanks for the update. We verified the permissions and they are the same as on node 1. We tried deleting .opmndat in the states folder, but no luck. On node 1 Apache is running; on node 2 Apache fails to start. On both nodes the firewall and SELinux are disabled. Please find the status below.

[applprod@erpprodapp02 ~]$ systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)
[applprod@erpprodapp02 ~]$ cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three two values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted

We monitored from another window: the processes get started, but after a few seconds they go down.

Status from Init to Down:
applprod 18979 0.0 0.0 113444 1664 ? S 08:09 0:00 adapcctl.sh
applprod 18994 0.0 0.0 126768 3588 ? S 08:09 0:00 opmnctl
applprod 19002 0.0 0.0 80388 14152 ? S 08:09 0:00 opmn
applprod 19004 0.8 0.0 154280 31044 ? S 08:09 0:00 httpd.worker
applprod 19017 0.0 0.0 30592 1456 ? S 08:10 0:00 odl_rotatelogs
applprod 19019 0.0 0.0 30592 1360 ? S 08:10 0:00 odl_rotatelogs
applprod 19020 0.0 0.0 30524 1088 ? S 08:10 0:00 rotatelogs
applprod 19021 0.0 0.0 30524 1084 ? S 08:10 0:00 rotatelogs
applprod 19027 0.0 0.0 30592 1352 ? S 08:10 0:00 odl_rotatelogs
applprod 19028 0.0 0.0 285872 19864 ? Sl 08:10 0:00 httpd.worker
applprod 19030 0.0 0.0 496404 20524 ? Sl 08:10 0:00 httpd.worker
applprod 19032 0.0 0.0 496404 20524 ? Sl 08:10 0:00 httpd.worker
applprod 19033 0.0 0.0 496404 20524 ? Sl 08:10 0:00 httpd.worker
applprod 19034 0.0 0.0 496404 20528 ? Sl 08:10 0:00 httpd.worker
applprod 19036 0.0 0.0 496404 20528 ? Sl 08:10 0:00 httpd.worker
applprod 20510 0.0 0.0 155480 1912 pts/5 R+ 08:10 0:00 ps
applprod 27990 0.0 0.0 154812 2332 ? S May16 0:00 sshd
applprod 27991 0.0 0.0 117096 3800 pts/2 Ss+ May16 0:00 bash
You have new mail in /var/spool/mail/applprod

[applprod@erpprodapp02 appl]$ adapcctl.sh status
You are running adapcctl.sh version 120.0.12020000.6
Checking status of OPMN managed Oracle HTTP Server (OHS) instance ...
Processes in Instance: EBS_web_OHS2
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status
---------------------------------+--------------------+---------+---------
EBS_web                          | OHS                |   19004 | Init
....
Processes in Instance: EBS_web_OHS2
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status
---------------------------------+--------------------+---------+---------
EBS_web                          | OHS                |   19004 | Down

I have uploaded the strace output. Can you please advise?
Thanks, Satish
strace_apache.zip |
Administrator
|
Okay. It seems the child thread(s) cannot be started. We need to see what they are doing. Possible causes include problems opening files that Apache needs (missing files, privileges), problems creating or updating log/pid files (privileges, a log file hitting the 2 GB size limit), and memory issues.

1) You need to run strace with the "-ff" option. With -ff, each child process is traced into a separate log file, with the <PID> appended to the file name. From the strace man page:
-ff
--follow-forks --output-separately
Combine the effects of --follow-forks and --output-separately options. This is incompatible with -c, since no per-process counts are kept. One might want to consider using strace-log-merge(1) to obtain a combined strace log view.
An example:
strace -o startapache.trc -ff -t $INST_TOP/ora/10.1.3/Apache/Apache/bin/apachectl startssl -f $INST_TOP/ora/10.1.3/Apache/Apache/conf/httpd.conf &
But you have to modify the command for your environment (which is EBS, where you use adapcctl.sh to start Apache). Don't specify httpd.conf or anything else; just use -ff and -o and give it a trace file name.

2) You said "We are unable to start apache in second node". So check whether anything is wrong with the second node's configuration, I mean in terms of shared file system I/O. Ensure you are aligned with the shared application tier file system configuration of EBS 12.2 (if non-shared, check the non-shared multi-node configuration of EBS 12.2). There may be contention between the nodes when accessing files required by Apache/OHS. But again, the child thread(s) are failing, and we can't see what their problem is unless you pass the -ff option to strace. |
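As a rough illustration of how the per-PID trace files could be reviewed afterwards, the sketch below counts the failed system calls in each file so that the noisiest child can be opened first. The function name failed_calls is hypothetical, and startapache.trc is just the example prefix used above. A high count is only a hint, since many ENOENT results (library search paths, for example) are harmless.

```shell
#!/bin/sh
# failed_calls (hypothetical helper): for every per-PID file written by
# "strace -ff -o <prefix>", print "<count> <file>", where <count> is the number
# of system calls that returned -1 with an errno. Highest count first.
failed_calls() {
    prefix="$1"
    for f in "$prefix".*; do
        [ -f "$f" ] || continue
        printf '%s %s\n' "$(grep -c ' = -1 E' "$f")" "$f"
    done | sort -rn
}

# Example:
# failed_calls startapache.trc
# The last lines of each file are also worth a look, since they show what the
# process was doing when it exited or was killed:
# tail -n 5 startapache.trc.*
```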
Thanks for the update.
As you suggested, we have verified the permissions, free space, log file sizes, and memory. Node 1 and node 2 use a shared APPL_TOP. I have uploaded the strace output for review. Please guide.
Thanks, Satish
straceff.zip |
Administrator
|
FINDINGS:
----------- 1) connect(5, {sa_family=AF_INET6, sin6_port=htons(6110), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 ECONNREFUSED (Connection refused) shutdown(5, SHUT_RDWR) = -1 ENOTCONN (Transport endpoint is not connected) close(5) = 0 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 5 fcntl(5, F_SETFD, FD_CLOEXEC) = 0 connect(5, {sa_family=AF_INET, sin_port=htons(6110), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ECONNREFUSED (Connection refused) shutdown(5, SHUT_RDWR) = -1 ENOTCONN (Transport endpoint is not connected) ... ...... exit_group(2) = ? +++ exited with 2 +++ 2) futex(0x7f55b79aaea4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7f55b79aae78, 14) = 1 futex(0x7f55b79aae78, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0xba5724, FUTEX_WAIT_PRIVATE, 1, NULL) = 0 futex(0xba56f8, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x92054c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {tv_sec=1684324311, tv_nsec=5000000}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out) 3) tgkill(28998, 29132, SIGHUP) = 0 tgkill(28998, 29132, SIG_0) = 0 select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=500000}) = 0 (Timeout) tgkill(28998, 29132, SIGHUP) = 0 tgkill(28998, 29132, SIG_0) = 0 select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=500000}) = 0 (Timeout) tgkill(28998, 29132, SIGHUP) = 0 tgkill(28998, 29132, SIG_0) = 0 select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=500000}) = 0 (Timeout) tgkill(28998, 29132, SIGHUP) = 0 .. .... --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28976, si_uid=54321} --- rt_sigreturn({mask=~[ILL TRAP ABRT BUS FPE KILL SEGV USR2 PIPE TERM STOP SYS RTMIN RT_1]}) = -1 EINTR (Interrupted system call) futex(0x7fce16c809d0, FUTEX_WAIT, 29011, NULL) = ? 
+++ killed by SIGKILL +++ 4) getrlimit(RLIMIT_NOFILE, {rlim_cur=4*1024, rlim_max=64*1024}) = 0 close(3) = -1 EBADF (Bad file descriptor) close(4) = -1 EBADF (Bad file descriptor) close(5) = -1 EBADF (Bad file descriptor) close(6) = -1 EBADF (Bad file descriptor) close(7) = -1 EBADF (Bad file descriptor) close(8) = -1 EBADF (Bad file descriptor) close(9) = -1 EBADF (Bad file descriptor) close(10) = -1 EBADF (Bad file descriptor) close(11) = -1 EBADF (Bad file descriptor) ... ............. 5) access("/u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OHS/EBS_web/proxy-wallet/ewallet.p12", F_OK) = -1 ENOENT (No such file or directory) access("/u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OHS/EBS_web/proxy-wallet/cwallet.sso", F_OK) = 0 open("/u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OHS/EBS_web/proxy-wallet/cwallet.sso", O_RDONLY) = 22 6) clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fce20820a50) = 28993 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fce20820a50) = 28994 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fce20820a50) = 28995 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fce20820a50) = 28996 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fce20820a50) = 28998 write(12, "[2023-05-17T17:21:56.2408+05:30]"..., 338) = 338 wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WNOHANG|WSTOPPED, NULL) = 28981 wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WNOHANG|WSTOPPED, NULL) = 28982 wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WNOHANG|WSTOPPED, NULL) = 28983 wait4(-1, 0x7ffdc1512d38, WNOHANG|WSTOPPED, NULL) = 0 select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout) write(13, "!", 1) = 1 wait4(-1, 0x7ffdc1512d38, WNOHANG|WSTOPPED, NULL) 
= 0 select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout) write(13, "!", 1) = 1 wait4(-1, 0x7ffdc1512d38, WNOHANG|WSTOPPED, NULL) = 0 select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout) .... ....... ................ wait4(28993, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0 wait4(28994, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0 wait4(28995, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0 wait4(28996, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0 wait4(28998, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0 .... ................... ............................ kill(28993, SIGTERM) = 0 wait4(28994, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0 write(12, "[2023-05-17T17:24:07.4981+05:30]"..., 264) = 264 kill(28994, SIGTERM) = 0 wait4(28995, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0 write(12, "[2023-05-17T17:24:07.4984+05:30]"..., 264) = 264 kill(28995, SIGTERM) = 0 wait4(28996, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0 write(12, "[2023-05-17T17:24:07.4986+05:30]"..., 264) = 264 kill(28996, SIGTERM) = 0 wait4(28998, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0 7) fstat(25, {st_mode=S_IFREG|0400, st_size=27, ...}) = 0 read(25, "tz6Nm33MxjSStI6k6pYDxt5dXdX", 27) = 27 close(25) = 0 write(24, "POST /connect HTTP/1.1\r\nVersion:"..., 163) = 163 read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 1 ([{fd=24, revents=POLLIN}]) read(24, "POST /status HTTP/1.1\r\nVersion: "..., 2048) = 207 write(24, "POST /subscribe HTTP/1.1\r\nConten"..., 106) = 106 read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 1 ([{fd=24, revents=POLLIN}]) read(24, "POST /status HTTP/1.1\r\nVersion: "..., 2048) = 102 futex(0x116d9bc, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x116d990, 2) = 1 read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 1 ([{fd=24, 
revents=POLLIN}]) read(24, "POST /event HTTP/1.1\r\norigin: 00"..., 2048) = 571 read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 1 ([{fd=24, revents=POLLIN}]) read(24, "SubscriberID: 1\r\n\r\n", 2048) = 19 futex(0x116daf4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x116daf0, FUTEX_OP_SET<<28|0<<12|FUTEX_OP_CMP_GT<<24|0x1) = 1 read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) 8) read(5, "# Generated by NetworkManager\nse"..., 4096) = 108 read(5, "", 4096) = 0 close(5) = 0 munmap(0x7f8880605000, 4096) = 0 open("/u01/PRODAPPS/fs2/FMW_Home/webtier/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) open("/u01/PRODAPPS/fs2/FMW_Home/webtier/opmn/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) open("/u01/PRODAPPS/fs2/EBSapps/10.1.2/jdk/jre/lib/i386/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) open("/u01/PRODAPPS/fs2/EBSapps/10.1.2/jdk/jre/lib/i386/server/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) open("/u01/PRODAPPS/fs2/EBSapps/appl/cz/12.0.0/bin/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 
open("/u01/PRODAPPS/fs2/EBSapps/10.1.2/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) open("/usr/X11R6/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) open("/u01/PRODAPPS/fs2/EBSapps/appl/sht/12.0.0/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 5 fstat(5, {st_mode=S_IFREG|0644, st_size=118394, ...}) = 0 mmap(NULL, 118394, PROT_READ, MAP_PRIVATE, 5, 0) = 0x7f88805e9000 close(5) = 0 open("/lib64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 5 read(5, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260!\0\0\0\0\0\0"..., 832) = 832 fstat(5, {st_mode=S_IFREG|0755, st_size=61560, ...}) = 0 mmap(NULL, 2173048, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 5, 0) = 0x7f887bdcf000 mprotect(0x7f887bddb000, 2093056, PROT_NONE) = 0 mmap(0x7f887bfda000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 5, 0xb000) = 0x7f887bfda000 mmap(0x7f887bfdc000, 22648, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f887bfdc000 close(5) = 0 access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory) mprotect(0x7f887bfda000, 4096, PROT_READ) = 0 munmap(0x7f88805e9000, 118394) = 0 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 5 fstat(5, {st_mode=S_IFREG|0644, st_size=603, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8880605000 read(5, "127.0.0.1 localhost localhost."..., 4096) = 603 read(5, "", 4096) = 0 close(5) = 0 munmap(0x7f8880605000, 4096) = 0 socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP) = 5 fcntl(5, F_SETFD, FD_CLOEXEC) = 0 connect(5, {sa_family=AF_INET6, sin6_port=htons(6110), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0 open("/u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OPMN/opmn/.formfactor", O_RDONLY) = 6 fstat(6, {st_mode=S_IFREG|0400, st_size=27, ...}) = 0 
read(6, "tz6Nm33MxjSStI6k6pYDxt5dXdX", 27) = 27 close(6) = 0 write(5, "POST /connect HTTP/1.1\r\nContent-"..., 238) = 238 read(5, "HTTP/1.1 408 Request Time-out\r\nC"..., 8192) = 116 read(5, "<?xml version='1.0' encoding='UT"..., 8192) = 731 write(2, "================================"..., 81) = 81 write(2, "opmn id=erpprodapp02.ttd.com:621"..., 34) = 34 write(2, "Response: 0 of 1 processes start"..., 36) = 36 write(2, "\nias-instance id=EBS_web_OHS2\n", 30) = 30 write(2, "++++++++++++++++++++++++++++++++"..., 81) = 81 write(2, "--------------------------------"..., 81) = 81 write(2, "ias-component/process-type/proce"..., 66) = 66 write(2, "--> Process (index=1,uid=1070952"..., 47) = 47 write(2, " time out while waiting for a m"..., 56) = 56 write(2, " Log:\n /u01/PRODAPPS/fs2/FMW_H"..., 115) = 115 read(5, "", 8192) = 0 shutdown(5, SHUT_RDWR) = 0 close(5) = 0 munmap(0x7f887c1d2000, 266240) = 0 munmap(0x7f8880576000, 303104) = 0 exit_group(408) = ? +++ exited with 152 +++ poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, 
events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 
(Timeout) read(24, 0x7fce10000b60, 2048) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000 <unfinished ...>) = ? +++ killed by SIGKILL +++ 8) fcntl(22, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) --- SIGHUP {si_signo=SIGHUP, si_code=SI_TKILL, si_pid=28993, si_uid=54321} - 9) access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory) mprotect(0x7f887bfda000, 4096, PROT_READ) = 0 munmap(0x7f88805e9000, 118394) = 0 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 5 fstat(5, {st_mode=S_IFREG|0644, st_size=603, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8880605000 read(5, "127.0.0.1 localhost localhost."..., 4096) = 603 read(5, "", 4096) = 0 close(5) = 0 munmap(0x7f8880605000, 4096) = 0 socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP) = 5 fcntl(5, F_SETFD, FD_CLOEXEC) = 0 connect(5, {sa_family=AF_INET6, sin6_port=htons(6110), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0 open("/u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OPMN/opmn/.formfactor", O_RDONLY) = 6 fstat(6, {st_mode=S_IFREG|0400, st_size=27, ...}) = 0 read(6, "tz6Nm33MxjSStI6k6pYDxt5dXdX", 27) = 27 close(6) = 0 write(5, "POST /connect HTTP/1.1\r\nContent-"..., 238) = 238 read(5, "HTTP/1.1 408 Request Time-out\r\nC"..., 8192) = 116 read(5, "<?xml version='1.0' encoding='UT"..., 8192) = 731 write(2, "================================"..., 81) = 81 write(2, "opmn id=erpprodapp02.ttd.com:621"..., 34) = 34 write(2, "Response: 0 of 1 processes start"..., 36) = 36 write(2, "\nias-instance id=EBS_web_OHS2\n", 30) = 30 write(2, "++++++++++++++++++++++++++++++++"..., 81) = 81 write(2, "--------------------------------"..., 81) = 81 write(2, "ias-component/process-type/proce"..., 66) = 66 write(2, "--> Process (index=1,uid=1070952"..., 47) = 47 write(2, " time out while 
waiting for a m"..., 56) = 56 write(2, " Log:\n /u01/PRODAPPS/fs2/FMW_H"..., 115) = 115 read(5, "", 8192) = 0 shutdown(5, SHUT_RDWR) = 0 close(5) = 0 munmap(0x7f887c1d2000, 266240) = 0 munmap(0x7f8880576000, 303104) = 0 exit_group(408) = ? +++ exited with 152 +++ |
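Among the calls listed above, fcntl(22, F_SETLKW, ...) is a blocking request for a file lock, whereas fcntl with F_SETFD only sets a descriptor flag. A minimal sketch for locating such lock requests across the per-PID trace files, and for finding which file a given descriptor was opened on, could look like this. The names lock_waits and fd_opens are hypothetical helpers, and the trace prefix is whatever was passed to strace -o.

```shell
#!/bin/sh
# lock_waits (hypothetical helper): show every blocking file-lock request
# (fcntl F_SETLKW) found in the per-PID trace files "<prefix>.<pid>".
lock_waits() {
    grep -H 'F_SETLKW' "$1".* 2>/dev/null
}

# fd_opens (hypothetical helper): in one trace file, show the open()/openat()
# calls that returned the given file descriptor number.
fd_opens() {
    grep -E "open(at)?\(.*\) = $2\$" "$1"
}

# Example:
# lock_waits startapache.trc
# fd_opens startapache.trc.28993 22
```

A descriptor number can be reused within a process, so the relevant open() is the last one before the fcntl call in the same file.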
Dear erman,
Thanks for your time. Is there any clue in the above trace about what went wrong? We have raised an SR, but there is no progress. We hope to find a solution here. Thanks, Satish |
Dear erman,
From your analysis, can we tell what might be causing the issue? Thanks, Satish |
Dear erman,
This fixed the issue. Is there any clue about this in the trace files? Can these kinds of issues be fixed using strace? We changed the parameters shown below.

[applprod@erpprod02 EBS_web]$ pwd
/u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OHS/EBS_web
[applprod@erpprod02 EBS_web]$ grep -i LockFile httpd.conf
# mounted filesystem then please read the LockFile documentation available
# at <URL:http://httpd.apache.org/docs-2.2/mod/mpm_common.html#lockfile>);
# and specify a LockFile on a local filesystem, you will save yourself a lot of trouble.
#LockFile "${ORACLE_INSTANCE}/diagnostics/logs/${COMPONENT_TYPE}/${COMPONENT_NAME}/accept.lock"
#LockFile "${ORACLE_INSTANCE}/diagnostics/logs/${COMPONENT_TYPE}/${COMPONENT_NAME}/http_lock"
#LockFile "${ORACLE_INSTANCE}/diagnostics/logs/${COMPONENT_TYPE}/${COMPONENT_NAME}/http_lock"
[applprod@erpprod02 EBS_web]$ grep -i AcceptMutex httpd.conf
AcceptMutex sysvsem
AcceptMutex sysvsem
[applprod@erpprod EBS_web]$

Thanks, Satish |
Administrator
|
Hi Satish,
I couldn't find time to analyze the strace output, but it seems you fixed the issue. Is it a cloud instance? I ask because this is already documented for OCI, and that directive should have been there already. But I think it should be applicable to on-premises environments as well. So, cool solution.

Under the hood, you made Apache use semaphores rather than lock files. Lock files are not required (according to the FMW 11.1.1.9 Admin Guide).

See -> Sharing the Application Tier File System in Oracle E-Business Suite Release 12.2 or 12.1.3 Using the Oracle Cloud Infrastructure File Storage Service (Doc ID 2794300.1). The reference in that MOS note comes from the FMW documentation itself (from Oracle HTTP Server 11.1.1.9 Fusion Middleware Administrator's Guide for Oracle HTTP Server):

Beginning with the primary application tier node, update the httpd.conf as follows:
1) Launch the Fusion Middleware Control. For example, use the following URL: http://<hostname.domain:admin_port>/em
2) Select and edit the httpd.conf file.
3) Update AcceptMutex fcntl to AcceptMutex sysvsem (found in two places in the httpd.conf file).
4) Comment out the LockFile directive (found in three places in the httpd.conf file).
5) Save the file and exit the Fusion Middleware Control.
6) Restart the HTTP server for the configuration changes to take effect.
7) Repeat steps 1 through 6 for all secondary nodes in the environment.

Note that we had fcntl in the strace, just before the timeout output is given:
fcntl(5, F_SETFD, FD_CLOEXEC) = 0
connect(5, {sa_family=AF_INET6, sin6_port=htons(6110), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
open("/u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OPMN/opmn/.formfactor", O_RDONLY) = 6
fstat(6, {st_mode=S_IFREG|0400, st_size=27, ...}) = 0

Some additional info: the AcceptMutex directive sets the method that Apache uses to serialize multiple children accepting requests on network sockets.
sysvsem: Uses SysV-style semaphores to implement the mutex.
fcntl: Uses the fcntl system call to lock the file defined by the LockFile directive.

Maybe (I didn't test it) the Linux semaphores will need to be cleaned up manually if the HTTP Server crashes abnormally. If such a crash happens, you can use ipcs -a to see them and ipcrm -s to remove them (but only maybe; I didn't test it, just saying). |
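To confirm the state of these directives on each node, something like the following could be used. It is a sketch only: active_lock_directives is a made-up helper name, and the httpd.conf path in the example is the one from this thread.

```shell
#!/bin/sh
# active_lock_directives (hypothetical helper): print the uncommented
# AcceptMutex and LockFile lines of an httpd.conf. After the change described
# above, only "AcceptMutex sysvsem" lines would be expected in the output.
active_lock_directives() {
    grep -Ei '^[[:space:]]*(AcceptMutex|LockFile)[[:space:]]' "$1"
}

# Example:
# active_lock_directives /u01/PRODAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OHS/EBS_web/httpd.conf
#
# If semaphores ever need to be inspected after an abnormal crash:
# ipcs -s            # list semaphore arrays and their owners
# ipcrm -s <semid>   # remove one array by id, only after confirming it is stale
```

Because commented lines are filtered out, the output can be compared directly between node 1 and node 2.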