node eviction

node eviction

Roshan
Oracle DB 12.1.0.2 RAC 2 nodes

Solaris 11.4

Hello Erman,

Kindly note that the server has been rebooted on two consecutive days as a result of node eviction.


The picture below has been taken from dmesg.
eviction.png
I have checked the ASM alert log but did not find any clue.

Kindly advise how to proceed in case of node eviction.
alert_+ASM1.rar

Thanks,

Roshan

Re: node eviction

ErmanArslansOracleBlog
Administrator
It seems like you have a connection issue (related to 3.1.1.12).
Also, the reason given there is 0x2 ("The instance eviction reason is 0x2"). That is a fairly generic reason code.

We need to check the alert log of the evicted instance. In this case, that is instance 2.

Also, what is that iSCSI for? (The interconnect, or something else?)

Re: node eviction

Roshan
Hi Erman,

I cannot tell whether it is used for the interconnect.

I got the issue around 00:30 am today.

Please find attached alert log
alert_dware12.rar

I can send you the AHF logfiles by email.

I have checked Doc ID 1367153.1, and the issue seems to be more related to 'Issue #2: The node rebooted because it was evicted due to missing network heartbeats.'

How can I proceed with the private node check? (ref 1546004.1)

Thanks,

Roshan

Re: node eviction

ErmanArslansOracleBlog
Administrator
You mean private interconnect check, right?

Did you see this document? -> Oracle Grid Infrastructure: How to Troubleshoot Missed Network Heartbeat Evictions (Doc ID 1534949.1)

But it seems we checked the wrong alert messages...
You have another eviction from 2019, and that is the one we actually took into consideration.

Your current eviction is caused by "abnormal instance termination"

Waiting for instances to leave: 2
2020-11-29T06:25:59.926754+04:00
Dumping diagnostic data in directory=[cdmp_20201129062559], requested by (instance=2, osid=12884907632 (LMS0)), summary=[abnormal instance termination].

Well, on node 2 you have lots of errors during that time (Nov 29, around 6 am):

ORA-04030: out of process memory when trying to allocate 1052696 bytes (klcliti:kghds,kllcqgf:kllsltba)
ORA-07445: exception encountered: core dump [lbivffs()+28]
ORA-04030: out of process memory when trying to allocate 33584 bytes (qmxtcxEvalXMLE,kghsseg:qmxtixBuildXQ)
ORA-48132: requested file lock is busy, [INCIDENT] [/u01/app/ora12c/diag/rdbms/dware1/dware12/lck/AM_1762783_4031814035.lck]
ORA-48170: unable to lock file - already in use
SVR4 Error: 11: Resource temporarily unavailable

These ORA-04030 and ORA-48170 errors already look OS-related.

It seems your actual problem is the ORA-04030, and the other errors are consequences of it (probably!).

This is IBM, right?

In the alert log of instance 2, it all starts with "ORA-04030: out of process memory when trying to allocate 311832 bytes (kxs-heap-w,control file i/o buffer)"

Check -> "How to resolve ORA-04030: out of process memory when trying to allocate 248 bytes (kxs-heap-w,ctx:keswxCurPrepare) Errors on IBM AIX Platforms (Doc ID 1934141.1)"

Note that this is a quick response. I hope it helps you and makes your problem disappear.
But also check the situation in full detail: work hard on your alert logs and on the errors you see at the time of the eviction. Check the OS logs as well. Your cause is there.
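
To make the "work on your alert logs" step a bit more concrete, here is a minimal Python sketch that counts ORA- codes logged within a time window around the eviction. It assumes the alert log uses standalone timestamp lines in the same ISO format as the one quoted above (2020-11-29T06:25:59.926754+04:00). The function name `ora_errors_near`, the 30-minute window, and the sample lines (including their timestamps) are made up for illustration; they are not taken from the attached logs.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# A timestamp line on its own, e.g. 2020-11-29T06:25:59.926754+04:00
TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}[+-]\d{2}:\d{2}$")
# Any ORA-nnnnn code inside a message line
ORA_RE = re.compile(r"\bORA-\d{5}\b")


def ora_errors_near(lines, center, minutes=30):
    """Count ORA- codes logged within +/- `minutes` of `center`.

    `center` must be a timezone-aware datetime, because the log
    timestamps carry a UTC offset.
    """
    window = timedelta(minutes=minutes)
    current = None  # timestamp of the entry we are currently inside
    counts = Counter()
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            current = datetime.fromisoformat(line)
            continue
        # Skip lines before the first timestamp or outside the window
        if current is None or abs(current - center) > window:
            continue
        counts.update(ORA_RE.findall(line))
    return counts


# Illustrative lines only: the error texts follow the ones quoted in this
# thread, but the timestamps in front of them are invented for the example.
SAMPLE = [
    "2020-11-29T06:20:01.000000+04:00",
    "ORA-04030: out of process memory when trying to allocate 33584 bytes",
    "2020-11-29T06:21:10.000000+04:00",
    "ORA-48132: requested file lock is busy",
    "ORA-48170: unable to lock file - already in use",
    "2020-11-29T09:00:00.000000+04:00",
    "ORA-04030: out of process memory when trying to allocate 248 bytes",
]

if __name__ == "__main__":
    eviction = datetime.fromisoformat("2020-11-29T06:25:59.926754+04:00")
    for code, n in ora_errors_near(SAMPLE, eviction).most_common():
        print(code, n)
```

For a real log you would pass the open file object instead of `SAMPLE`. If your alert log uses a different timestamp layout, `TS_RE` and the parsing call need to be adapted.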

So if it is IBM, then we may be hitting



Re: node eviction

Roshan
Hi,

thanks for the update.

It is SPARC Solaris on ZS7 storage.

I minimized the number of applications connecting to this instance to reduce resource consumption. Can you please advise how to set the ulimits to unlimited, as per Doc ID 1934141.1?

Thanks,

Roshan

Re: node eviction

ErmanArslansOracleBlog
Administrator
Check the following MOS note ->

How To Set The Limit For The Maximum Number Of Open Files Per Process In Solaris 10 And Solaris 11 (Doc ID 1408563.1)

It is for the maximum number of open files, but it will give you an idea of how to set ulimits.
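
Before changing anything, it can help to see which limits a process started from the Oracle owner's shell currently gets. Below is a minimal Python sketch using the standard `resource` module. The selection of limits and the helper names (`fmt`, `current_limits`) are my own choice for illustration. It only reports the limits of the Python process itself (inherited from the shell that launched it); it does not change anything, and database processes started by Grid Infrastructure may run with different limits. For making settings persistent on Solaris, follow the MOS note.

```python
import resource

# Limit names to report; any that this platform's `resource` module
# does not define are simply skipped.
LIMIT_NAMES = ["RLIMIT_DATA", "RLIMIT_STACK", "RLIMIT_NOFILE", "RLIMIT_AS"]


def fmt(value):
    """Render a limit value the way `ulimit` does for infinity."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {limit name: (soft, hard)} for the current process."""
    limits = {}
    for name in LIMIT_NAMES:
        res = getattr(resource, name, None)
        if res is None:
            continue  # not available on this platform
        soft, hard = resource.getrlimit(res)
        limits[name] = (fmt(soft), fmt(hard))
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

Run it as the Oracle software owner, from the same kind of session that starts the instance, so the numbers reflect what that environment would inherit.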