Thankfully this is only being seen in our test environment where we're trying to perform benchmarking of an X5-2 system. The behavior of the system is as follows:
- shared Repositories show ONLINE
- VMs on the system show UNKNOWN
- VMs are unreachable by ssh or console

The only 'solution' is to reboot both DOM0 and DOM1. Even restarting the oda_base leads to unexpected results...

In the 'hung' state, I see:

    [root@tpin-oda0 ~]# oakcli show repo
    NAME        TYPE     NODENUM   FREE SPACE   STATE    SIZE
    bigrepo     shared   0         2.12%        ONLINE   5120000.0M
    bigrepo     shared   1         2.12%        ONLINE   5120000.0M
    odarepo1    local    0         N/A          N/A      N/A
    odarepo2    local    1         N/A          N/A      N/A
    shared      shared   0         32.21%       ONLINE   204800.0M
    shared      shared   1         32.21%       ONLINE   204800.0M

    [root@tpin-oda0 ~]# oakcli show vm
    NAME             NODENUM   MEMORY   VCPU   STATE     REPOSITORY
    biserver         0         16384M   2      UNKNOWN   bigrepo
    sol10-113-test   0         2048M    2      OFFLINE   shared
    sol11-3-tcs      0         8192M    8      UNKNOWN   shared
    sol11-3-test     0         2048M    2      ONLINE    shared
    sol11-3-web      0         16000M   2      ONLINE    shared

    [root@tpin-oda0 ~]# oakcli show vm sol11-3-tcs
    The Resource is : sol11-3-tcs
    AutoStart     : restore
    CPUPriority   : 100
    Disks         : |file:/OVS/Repositories/shared/.ACF
                    S/snaps/sol11-3-tcs/VirtualMachines
                    /sol11-3-tcs/030967ab55794ba195605e
                    813f35a6c6.img,xvda,w|
    Domain        : XEN_PVM
    DriverDomain  : False
    ExpectedState : online
    FailOver      : false
    IsSharedRepo  : true
    Keyboard      : en-us
    MaxMemory     : 32768M
    MaxVcpu       : 16
    Memory        : 8192M
    Mouse         : OS_DEFAULT
    Name          : sol11-3-tcs
    Networks      :
    NodeNumStart  : 0
    OS            : OL_5
    PrefNodeNum   : 0
    PrivateIP     : None
    ProcessorCap  : 0
    RepoName      : shared
    State         : Unknown
    TemplateName  : otml_Solaris11-3
    VDisks        : |0|
    Vcpu          : 8
    cpupool       : default-unpinned-pool
    vncport       : 5904

    [root@tpin-oda1 ~]# oakcli show ismaster
    OAKD is in Master Mode

    [root@tpin-oda0 ~]# oakcli show ismaster
    OAKD is in Slave Mode

Ok..so now on DOM0/DOM1 perform oakcli restart oda_base:

    [root@dom0 ~]# oakcli restart oda_base
    INFO: Stopping ODA base domain...
    ERROR: Exception encountered while stopping oda_base [Errno 104] Connection reset by peer
    [root@dom0 ~]#

    [root@dom1 ~]# oakcli restart oda_base
    INFO: Stopping ODA base domain...
    INFO: Stopping all the shared repos
    INFO: Starting ODA base domain...
    INFO: Started ODA base domain
    [root@dom1 ~]#

And afterwards...

    [root@tpin-oda0 ~]# oakcli show ismaster
    OAKD is in Master Mode

    [root@tpin-oda0 ~]# oakcli show repo
    NAME        TYPE     NODENUM   FREE SPACE   STATE    SIZE
    odarepo1    local    0         N/A          N/A      N/A
    odarepo2    local    1         N/A          N/A      N/A

    [root@tpin-oda0 ~]# oakcli show vm
    NAME             NODENUM   MEMORY   VCPU   STATE     REPOSITORY
    [root@tpin-oda0 ~]#

    [root@tpin-oda0 ~]# oakcli show repo
    NAME        TYPE     NODENUM   FREE SPACE   STATE    SIZE
    odarepo1    local    0         N/A          N/A      N/A
    odarepo2    local    1         N/A          N/A      N/A

....and after 10 mins...it changes to:

    [root@tpin-oda0 ~]# oakcli show repo
    NAME        TYPE     NODENUM   FREE SPACE   STATE     SIZE
    bigrepo     shared   0         N/A          UNKNOWN   N/A
    bigrepo     shared   1         N/A          UNKNOWN   N/A
    odarepo1    local    0         N/A          N/A       N/A
    odarepo2    local    1         N/A          N/A       N/A
    shared      shared   0         N/A          UNKNOWN   N/A
    shared      shared   1         N/A          UNKNOWN   N/A

    [root@tpin-oda0 ~]# oakcli show vm
    NAME             NODENUM   MEMORY   VCPU   STATE     REPOSITORY

The only 'solution' that we have is to run a 'reboot -f' from Dom0/1 and wait 20 mins. Any ideas?
Administrator
To comment on this, the logs need to be reviewed.
Since, as you said, the "VMs are unreachable by ssh or console", a full investigation is needed. Start from the user domain (DB and application logs plus OS logs), then continue with DOM0 (Oracle VM Server related logs).
Hi,
I figured. No problem: next time I encounter the issue, I'll grab the logs and .trc files. But if I look at what has changed in the past day, there are hundreds of log files and dozens of traces. I'm sure 99% of this is just computer sewage... so can you please provide details on exactly which traces and logs I need to provide? The user domain is much worse for the number of logs created, but even the DOM0 is no treat... Can you confirm you just need the latest files in:
/var/log/xen/
/var/log/
??
As always, thanks so much.
Alex
Administrator
I already sent you the list of logs to collect.
-> Start from the user domain (DB and application logs plus OS logs): /var/log/messages, the alert log, dmesg and everything else.
-> Then continue with DOM0 (Oracle VM Server related logs). For the Oracle VM Server logs, read: https://docs.oracle.com/cd/E50245_01/E50251/html/vmadm-tshoot-server-logs.html
-> Also check your ODA_BASE logs: the OS logs of ODA_BASE plus the logs under /opt/oracle/oak/log
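Since the volume of files was a concern earlier in the thread, one way to narrow it is to bundle only the files that changed around the time of the hang. The following is a minimal sketch and not an Oracle-supplied tool: the function name `collect_recent_logs`, the output paths and the 60-minute window are assumptions made for illustration, and the directories in the usage comments are simply the ones named in this thread. It relies only on GNU `find` and `tar`.

```shell
#!/bin/bash
# Sketch (hypothetical helper, not part of oakcli or Oracle VM):
# bundle files modified within the last N minutes from the given
# directories into a single gzip-compressed tarball.

collect_recent_logs() {
    local out="$1" minutes="$2"
    shift 2
    local list dir rc
    list="$(mktemp)" || return 1
    for dir in "$@"; do
        # Skip directories that do not exist on this domain.
        [ -d "$dir" ] || continue
        # Regular files changed inside the window; NUL-separated so
        # that unusual file names are handled safely.
        find "$dir" -type f -mmin "-$minutes" -print0 >> "$list"
    done
    if [ ! -s "$list" ]; then
        echo "no files modified in the last $minutes minutes" >&2
        rm -f "$list"
        return 1
    fi
    # --null makes tar read the NUL-separated names from the list file.
    tar --null -T "$list" -czf "$out"
    rc=$?
    rm -f "$list"
    return "$rc"
}

# Example invocations (assumed paths and window; adjust as needed):
#   On DOM0 (/var/log already contains /var/log/xen):
#     collect_recent_logs /tmp/dom0-logs.tar.gz 60 /var/log
#   On ODA_BASE:
#     collect_recent_logs /tmp/odabase-logs.tar.gz 60 /var/log /opt/oracle/oak/log
```

Running it as soon as the VMs show UNKNOWN, and before any 'reboot -f', keeps the window small. Whether a 60-minute window captures the root cause is a guess; widen it if the hang started earlier.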