RAC performance issue

Roshan
Oracle RAC 11g
Oracle Database 11g
Red Hat Linux 6

Hi,

We were having some performance issues on a two-node RAC cluster.

We performed a healthcheck and found:

1. I/O wait was high on the node where the database was not running, while it was low on the node where the Oracle database was running.
IOWAIT.jpg

2. The tps on the first node (no database) was higher.
IOSTAT.png
IOSTATDBUP.png

3. SAN-related warnings were found on the first node.
SANLOG.png

I have some queries based on the above findings.

What does tps mean? Which metric from iostat should be monitored for performance?

Where can I find the log files for monitoring SAN performance on Linux?


Thanks,

Roshan

Re: RAC performance issue

ErmanArslansOracleBlog
Administrator
Interesting.

In iostat output, tps stands for transfers per second, i.e. the number of I/O requests issued to the device per second (effectively IOPS).
Use iostat -x for more detailed output.
For a quick overview, read this: https://dom.as/2009/03/11/iostat/
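
To make the tps figure more concrete, here is a minimal Python sketch of how a comparable number can be derived by sampling /proc/diskstats twice. It assumes the standard Linux field layout (reads completed in the 4th column, writes completed in the 8th); the function names are hypothetical, and it is only an illustration, not how iostat itself is implemented.

```python
def parse_diskstats(text):
    """Map device name -> completed reads + completed writes.

    Assumes the usual /proc/diskstats layout:
    major minor name reads_completed ... writes_completed ...
    """
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpected lines
        name = fields[2]
        totals[name] = int(fields[3]) + int(fields[7])
    return totals


def transfers_per_second(before, after, interval_s):
    """Completed I/O requests per second for each device seen in both samples."""
    first = parse_diskstats(before)
    second = parse_diskstats(after)
    return {
        dev: (second[dev] - first[dev]) / interval_s
        for dev in second
        if dev in first
    }


def read_diskstats(path="/proc/diskstats"):
    """Read one raw sample (Linux only)."""
    with open(path) as f:
        return f.read()
```

To use it, call read_diskstats(), wait a few seconds, call it again, and pass both samples and the elapsed time to transfers_per_second().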

Consider using iotop to find the cause (read this: http://bencane.com/2012/08/06/troubleshooting-high-io-wait-in-linux/).

Check the OS of that node and make sure there is no problematic disk device. (I mean lost I/Os that leave processes stuck in D state: http://ermanarslan.blogspot.com.tr/2013/08/linux-d-state-processes.html)
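
As a rough way to spot D state (uninterruptible sleep) processes, the sketch below scans /proc/<pid>/stat and reports entries whose state field is D. The function names are hypothetical, and it assumes a Linux /proc filesystem; ps or top show the same state information.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')' rather than on spaces.
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren].strip())
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for processes currently in uninterruptible sleep."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

A process that shows up in this list repeatedly over several samples is usually waiting on I/O that is not completing.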

Check the processes as a whole; there may be some backup programs running. Use top first to check for heavy processes.

Re: RAC performance issue

Roshan
We found the RAID controller was faulty. We replaced the RAID controller on one of the nodes.

Re: RAC performance issue

ErmanArslansOracleBlog
Administrator
Good. So it was an I/O-related problem, solved in the I/O path.
However, the per-second values in your iostat output show that you did have I/O.
I mean your problematic node had more I/O requests than the other one.
Anyway, those per-second values were not that high, so they should not have produced those %iowait values.
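
For reference, %iowait can be cross-checked against the raw kernel counters. The sketch below is a minimal illustration, assuming the standard Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names are hypothetical. It computes the share of CPU time spent in iowait between two samples of the aggregate cpu line.

```python
def cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent waiting on I/O between two samples."""
    first = cpu_times(before)
    second = cpu_times(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Sum only the first eight counters: guest time is already
    # included in user and nice, so adding it would double count.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter
```

Comparing this value with the per-device request rates over the same interval shows whether a high %iowait is matched by a correspondingly high I/O load.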