RAC(ASM) 11g
Hi Erman,

I have started the cluster on a node. It completes with errors:

ORA11g_DB2>./crsctl start cluster
CRS-2672: Attempting to start 'ora.cssd' on 'rhis-cr-0613-04'
CRS-2672: Attempting to start 'ora.diskmon' on 'rhis-cr-0613-04'
CRS-2676: Start of 'ora.diskmon' on 'rhis-cr-0613-04' succeeded
CRS-2674: Start of 'ora.cssd' on 'rhis-cr-0613-04' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'rhis-cr-0613-04'
CRS-2681: Clean of 'ora.cssd' on 'rhis-cr-0613-04' succeeded
CRS-5804: Communication error with agent process
CRS-2672: Attempting to start 'ora.cssd' on 'rhis-cr-0613-04'
CRS-2672: Attempting to start 'ora.diskmon' on 'rhis-cr-0613-04'
CRS-2676: Start of 'ora.diskmon' on 'rhis-cr-0613-04' succeeded
CRS-2674: Start of 'ora.cssd' on 'rhis-cr-0613-04' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'rhis-cr-0613-04'
CRS-2681: Clean of 'ora.cssd' on 'rhis-cr-0613-04' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rhis-cr-0613-04'
CRS-2672: Attempting to start 'ora.diskmon' on 'rhis-cr-0613-04'
CRS-5804: Communication error with agent process
CRS-2676: Start of 'ora.diskmon' on 'rhis-cr-0613-04' succeeded
CRS-2674: Start of 'ora.cssd' on 'rhis-cr-0613-04' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'rhis-cr-0613-04'
CRS-2681: Clean of 'ora.cssd' on 'rhis-cr-0613-04' succeeded
CRS-5804: Communication error with agent process
CRS-4000: Command Start failed, or completed with errors.
ORA11g_DB2>id

I have checked the ocssd log file:

2017-06-27 11:51:13.112: [ CSSD][1100745024]clssnmvDHBValidateNCopy: node 1, rhis-cr-0613-03, has a disk HB, but no network HB, DHB has rcfg 266426551, wrtcnt, 126894370, LATS 1054544, lastSeqNo 126894369, uniqueness 1469772618, timestamp 1498549872/3050568238

ORA11g_DB2>./crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

Please advise,
Regards |
Administrator
|
Normally, CRS-4640 is reported when another start is attempted while ohasd.bin is already up.
In this case, however, you clearly have a communication problem. The CSSD log shows a network-related error (node 1 has a disk heartbeat but no network heartbeat), and the same problem is visible in the "crsctl start cluster" output, so it is best to solve that first. You are failing in almost the first stage of a Grid start: the failure occurs while communicating with cssdagent, the agent responsible for spawning CSSD, which means CSSD itself is not starting.
Do you have a problem with your private interconnect interfaces? Please check them: are they up, and are they up with the correct IP addresses?
Check this note as well: GI Fails to Start as no Private Network Interface is Available (Doc ID 1481176.1)
Send me the logs and the state of your private network interfaces for further diagnostics.
Note that your cssdagent may not be running, or may not be able to start at all. If that is the case, check the OHASD logs as well, since OHASD spawns cssdagent. |
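For anyone following along, the CSSD symptom mentioned here (a peer that has a disk heartbeat but no network heartbeat) can be pulled out of a log with a small filter. This is only a sketch: the function name nodes_missing_network_hb is made up for illustration, and the pattern is based solely on the message format of the ocssd log line quoted in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an Oracle tool): reads ocssd.log-style text on
# stdin and prints, once each, the peer nodes that CSSD reports as having
# a disk heartbeat but no network heartbeat.
nodes_missing_network_hb() {
    # Capture the node name that sits between "node <n>, " and
    # ", has a disk HB, but no network HB".
    sed -n 's/.*node [0-9][0-9]*, \([^,]*\), has a disk HB, but no network HB.*/\1/p' \
        | sort -u
}

# Example call (the log file name is an assumption):
#   nodes_missing_network_hb < ocssd.log
```

If the function prints a node name, the local CSSD can see that peer on the voting disk but not over the interconnect, which is why the private interfaces are the first thing to check.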
Hi,
please find attached some tests I did both on working and issue node privateIPtests.txt

From working node:
ping 192.168.124.61
PING 192.168.124.61 (192.168.124.61) 56(84) bytes of data.
64 bytes from 192.168.124.61: icmp_seq=1 ttl=64 time=0.168 ms
64 bytes from 192.168.124.61: icmp_seq=2 ttl=64 time=0.196 ms
--- 192.168.124.61 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.168/0.182/0.196/0.014 ms
[root@RHIS-CR-0613-03 ~]# traceroute 192.168.124.61
traceroute to 192.168.124.61 (192.168.124.61), 30 hops max, 40 byte packets
1 rac2-priv (192.168.124.61) 0.126 ms 0.098 ms 0.088 ms

From issue node:
# ping 192.168.124.60
PING 192.168.124.60 (192.168.124.60) 56(84) bytes of data.
64 bytes from 192.168.124.60: icmp_seq=1 ttl=64 time=0.095 ms
64 bytes from 192.168.124.60: icmp_seq=2 ttl=64 time=0.165 ms
traceroute 192.168.124.60
traceroute to 192.168.124.60 (192.168.124.60), 30 hops max, 40 byte packets
1 rac1-priv (192.168.124.60) 0.213 ms 0.173 ms 0.160 ms |
Extract from log file - OHASD
2017-06-27 13:57:03.768: [ CRSPE][1150110016] {0:48:4} CRS-2676: Start of 'ora.cssdmonitor' on 'rhis-cr-0613-04' succeeded
2017-06-27 13:57:03.768: [ CRSPE][1150110016] {0:48:4} PE Command [ Resource State Change (ora.cssdmonitor 1 1) : 0x19e54410 ] has completed
2017-06-27 13:57:03.769: [ AGFW][1139603776] {0:48:4} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:2540
2017-06-27 13:57:03.769: [ AGFW][1139603776] {0:48:4} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:2540
2017-06-27 13:57:03.769: [ AGFW][1139603776] {0:48:4} Agfw received reply from PE for resource state change for ora.cssdmonitor 1 1
2017-06-27 13:57:03.774: [ AGFW][1139603776] {0:52:2} Received the reply to the message: RESTYPE_ADD[ora.cssd.type] ID 8196:2533 from the agent /oracle/grid11g/grid_infra/11.2.0/bin/cssdagent_root
2017-06-27 13:57:03.776: [ AGFW][1139603776] {0:52:2} Received the reply to the message: RESOURCE_ADD[ora.cssd 1 1] ID 4356:2534 from the agent /oracle/grid11g/grid_infra/11.2.0/bin/cssdagent_root
2017-06-27 13:57:03.776: [ AGFW][1139603776] {0:43:4} Received the reply to the message: RESOURCE_CLEAN[ora.cssd 1 1] ID 4100:2535 from the agent /oracle/grid11g/grid_infra/11.2.0/bin/cssdagent_root
2017-06-27 13:57:03.776: [ AGFW][1139603776] {0:43:4} Agfw Proxy Server sending the reply to PE for message:RESOURCE_CLEAN[ora.cssd 1 1] ID 4100:2512
2017-06-27 13:57:03.777: [ CRSPE][1150110016] {0:43:4} Received reply to action [Clean] message ID: 2512
2017-06-27 13:57:03.777: [ AGFW][1139603776] {0:43:4} Received the reply to the message: RESOURCE_CLEAN[ora.cssd 1 1] ID 4100:2535 from the agent /oracle/grid11g/grid_infra/11.2.0/bin/cssdagent_root
2017-06-27 13:57:03.778: [ AGFW][1139603776] {0:43:4} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_CLEAN[ora.cssd 1 1] ID 4100:2512
2017-06-27 13:57:03.778: [ CRSPE][1150110016] {0:43:4} Received reply to action [Clean] message ID: 2512
2017-06-27 13:57:03.778: [ CRSPE][1150110016] {0:43:4} RI [ora.cssd 1 1] new internal state: [STABLE] old value: [CLEANING]
2017-06-27 13:57:03.778: [ CRSPE][1150110016] {0:43:4} CRS-2681: Clean of 'ora.cssd' on 'rhis-cr-0613-04' succeeded
2017-06-27 13:57:03.778: [ AGFW][1139603776] {0:43:4} Agfw Proxy Server received the message: AGENT_SHUTDOWN_REQUEST[Proxy] ID 20486:22
2017-06-27 13:57:03.779: [ AGFW][1139603776] {0:43:4} Shutdown request received from /oracle/grid11g/grid_infra/11.2.0/bin/cssdagent_root
2017-06-27 13:57:03.779: [ AGFW][1139603776] {0:43:4} Agfw Proxy Server replying to the message: AGENT_SHUTDOWN_REQUEST[Proxy] ID 20486:22
2017-06-27 13:57:03.779: [ CRSPE][1150110016] {0:43:4} Sequencer for [ora.cssd 1 1] has completed with error: CRS-5804: Communication error with agent process
2017-06-27 13:57:03.779: [ CRSPE][1150110016] {0:43:4} Starting resource state restoration for: START of [ora.cssd 1 1] on [rhis-cr-0613-04] : local=1, unplanned=00x1a0d31d0
2017-06-27 13:57:03.779: [ CRSPE][1150110016] {0:43:4} PE Command [ Resource State Change (ora.cssdmonitor 1 1) : 0x2aaab806abd0 ] has completed
2017-06-27 13:57:03.779: [ AGFW][1139603776] {0:43:4} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:2546
2017-06-27 13:57:03.779: [ AGFW][1139603776] {0:43:4} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:2546
2017-06-27 13:57:03.780: [ AGFW][1139603776] {0:43:4} Agfw received reply from PE for resource state change for ora.cssdmonitor 1 1
2017-06-27 13:57:14.087: [ CRSCOMM][1102092608][FFAIL] Ipc: Couldnt clscreceive message, no message: 11
2017-06-27 13:57:14.087: [ CRSCOMM][1102092608] Ipc: Client disconnected.
2017-06-27 13:57:14.087: [ CRSCOMM][1102092608][FFAIL] IpcL: Listener got clsc error 11 for memNum. 52
2017-06-27 13:57:14.087: [ CRSCOMM][1102092608] IpcL: connection to member 52 has been removed
2017-06-27 13:57:14.087: [CLSFRAME][1102092608] Removing IPC Member:{Relative|Node:0|Process:52|Type:3}
2017-06-27 13:57:14.087: [CLSFRAME][1102092608] Disconnected from AGENT process: {Relative|Node:0|Process:52|Type:3}
2017-06-27 13:57:14.087: [ CRSPE][1150110016] {0:0:712} Disconnected from server:
2017-06-27 13:57:14.087: [ AGFW][1139603776] {0:0:715} Agfw Proxy Server received process disconnected notification, count=1
2017-06-27 13:57:14.088: [ AGFW][1139603776] {0:0:715} /oracle/grid11g/grid_infra/11.2.0/bin/cssdagent_root disconnected.
2017-06-27 13:57:14.088: [ AGFW][1139603776] {0:0:715} Agent /oracle/grid11g/grid_infra/11.2.0/bin/cssdagent_root[10887] stopped!
2017-06-27 13:57:14.088: [ CRSCOMM][1139603776] {0:0:715} IpcL: removeConnection: Member 52 does not exist.
2017-06-28 09:20:14.829: [UiServer][1085446464] CS(0x2aaab8055db0)set Properties ( root,0x1a09f8c0)
2017-06-28 09:20:14.829: [UiServer][1085446464] SS(0x19f7ecc0)Accepted client connection: saddr =(ADDRESS=(PROTOCOL=ipc)(DEV=700)(KEY=OHASD_UI_SOCKET))daddr = (ADDRESS=(PROTOCOL=ipc)(KEY=OHASD_UI_SOCKET))
2017-06-28 09:20:14.841: [UiServer][1152211264] {0:0:719} processMessage called
2017-06-28 09:20:14.842: [UiServer][1152211264] {0:0:719} Sending message to PE. ctx= 0x19f671e0, Client PID: 18030
2017-06-28 09:20:14.842: [UiServer][1152211264] {0:0:719} Sending command to PE: 3
2017-06-28 09:20:14.842: [ CRSPE][1150110016] {0:0:719} Processing PE command id=131. Description: [Stat Resource : 0x19f4aec0]
2017-06-28 09:20:14.857: [UiServer][1152211264] {0:0:719} Done for ctx=0x19f671e0
2017-06-28 09:20:14.869: [UiServer][1085446464] Closed: remote end failed/disc.

Regards.
Roshan |
Administrator
|
Please send ocssd.log from both of the nodes.
I will check whether the CSSD is picking up the correct interface. |
Administrator
|
Also send me the following:
1) The output of the command "ping <private_ip_address>" from both nodes.
2) The output of the command "ethtool eth3" from both nodes. |
Administrator
|
Send me your OS vendor+version and database+GRID version as well.
|
Hi,
Please find attached. The node52 folder is the working node; the node53 folder is the issue node. crestelRAC.rar

./crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.3.0]
Red Hat Enterprise Linux Server release 5.8 (Tikanga)
Database version: 11.2.0.3.0 |
From issue node:
# ping 192.168.124.60
PING 192.168.124.60 (192.168.124.60) 56(84) bytes of data.
64 bytes from 192.168.124.60: icmp_seq=1 ttl=64 time=0.095 ms
64 bytes from 192.168.124.60: icmp_seq=2 ttl=64 time=0.165 ms
traceroute 192.168.124.60
traceroute to 192.168.124.60 (192.168.124.60), 30 hops max, 40 byte packets
1 rac1-priv (192.168.124.60) 0.213 ms 0.173 ms 0.160 ms

From working node:
ping 192.168.124.61
PING 192.168.124.61 (192.168.124.61) 56(84) bytes of data.
64 bytes from 192.168.124.61: icmp_seq=1 ttl=64 time=0.168 ms
64 bytes from 192.168.124.61: icmp_seq=2 ttl=64 time=0.196 ms
--- 192.168.124.61 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.168/0.182/0.196/0.014 ms
[root@RHIS-CR-0613-03 ~]# traceroute 192.168.124.61
traceroute to 192.168.124.61 (192.168.124.61), 30 hops max, 40 byte packets
1 rac2-priv (192.168.124.61) 0.126 ms 0.098 ms 0.088 ms |
Administrator
|
Well, the nodes can ping each other through the private interfaces.
Your firewall may still be running and blocking traffic on the private interfaces. Please check it and disable it on both nodes:
service iptables stop
service ip6tables stop
Then disable it permanently:
chkconfig iptables off
chkconfig ip6tables off
Update me with the outcome.
Reference: 11gR2 Grid: root.sh Fails to Start the Clusterware on the Second Node Due to Firewall on Private Network (Doc ID 981357.1) |
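A successful ICMP ping does not by itself show that the firewall lets other interconnect traffic through, so it is worth confirming that the chkconfig change really took effect. The sketch below assumes the usual RHEL 5 "chkconfig --list <service>" output (service name followed by fields such as 2:on or 3:off); the function name runlevels_still_on is invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the output of `chkconfig --list <service>`
# on stdin and prints the runlevels in which the service is still "on".
# An empty result means the service is off in every runlevel.
runlevels_still_on() {
    awk '{
        # Field 1 is the service name; the rest look like "3:on" or "3:off".
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^[0-6]:on$/) {
                split($i, parts, ":")
                printf "%s%s", sep, parts[1]
                sep = " "
            }
        }
    }
    END { print "" }'
}

# Example call on each node:
#   chkconfig --list iptables  | runlevels_still_on
#   chkconfig --list ip6tables | runlevels_still_on
```

If either call prints any runlevel numbers, the firewall would come back after a reboot on that node.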
It should normally work. The system admin has disabled cluster services on this node. I think this is why it is not starting.
|
Administrator
|
So, is the issue solved?
|
Administrator
|
What do you mean by "System admin disabled the cluster services"?
|
Administrator
|
Update please.
|
Hi,
I am trying to start the cluster on all nodes from node1. Below are the steps I am following:
crsctl enable crs
crsctl start has
crsctl start cluster -all
But the above works only on node1, not on the other nodes. Please help. |
Administrator
|
Why?
1) Are those services configured to run only on the first node?
2) Or are those services getting errors while starting on the second node? |
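One way to answer these two questions is to run "crsctl config crs" (autostart setting) and "crsctl check crs" (stack health) locally on each node. The sketch below filters the second command's output, assuming that healthy components are reported with the phrase "is online"; the function name components_not_online and the GRID_HOME variable are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `crsctl check crs` output on stdin and prints
# only the lines that do NOT report a component as online.
# No output means every reported component is online.
components_not_online() {
    # Drop "is online" lines and blank lines; "|| true" keeps the exit
    # status at 0 when nothing is left to print.
    grep -v -e 'is online' -e '^[[:space:]]*$' || true
}

# Example call, run on each node in turn (GRID_HOME is an assumed variable):
#   "$GRID_HOME"/bin/crsctl check crs | components_not_online
```

Any line that survives the filter on the second node points at the component that is failing to start there.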