VMs in UNKNOWN state
Posted by alexb on Mar 15, 2018; 4:06pm
URL: http://erman-arslan-s-oracle-forum.124.s1.nabble.com/VMs-in-UNKNOWN-state-tp5396.html
Thankfully, we are only seeing this in our test environment, where we are benchmarking an X5-2 system. The system behaves as follows:
- Shared repositories show ONLINE
- VMs on the system show UNKNOWN
The VMs are unreachable over SSH and from the console. The only 'solution' is to reboot both DOM0 and DOM1. Even restarting oda_base leads to unexpected results.
In the 'hung' state, I see:
[root@tpin-oda0 ~]# oakcli show repo
NAME      TYPE    NODENUM  FREE SPACE  STATE    SIZE
bigrepo   shared  0        2.12%       ONLINE   5120000.0M
bigrepo   shared  1        2.12%       ONLINE   5120000.0M
odarepo1  local   0        N/A         N/A      N/A
odarepo2  local   1        N/A         N/A      N/A
shared    shared  0        32.21%      ONLINE   204800.0M
shared    shared  1        32.21%      ONLINE   204800.0M
[root@tpin-oda0 ~]# oakcli show vm
NAME            NODENUM  MEMORY  VCPU  STATE    REPOSITORY
biserver        0        16384M  2     UNKNOWN  bigrepo
sol10-113-test  0        2048M   2     OFFLINE  shared
sol11-3-tcs     0        8192M   8     UNKNOWN  shared
sol11-3-test    0        2048M   2     ONLINE   shared
sol11-3-web     0        16000M  2     ONLINE   shared
[root@tpin-oda0 ~]# oakcli show vm sol11-3-tcs
The Resource is : sol11-3-tcs
AutoStart : restore
CPUPriority : 100
Disks : |file:/OVS/Repositories/shared/.ACFS/snaps/sol11-3-tcs/VirtualMachines/sol11-3-tcs/030967ab55794ba195605e813f35a6c6.img,xvda,w|
Domain : XEN_PVM
DriverDomain : False
ExpectedState : online
FailOver : false
IsSharedRepo : true
Keyboard : en-us
MaxMemory : 32768M
MaxVcpu : 16
Memory : 8192M
Mouse : OS_DEFAULT
Name : sol11-3-tcs
Networks :
NodeNumStart : 0
OS : OL_5
PrefNodeNum : 0
PrivateIP : None
ProcessorCap : 0
RepoName : shared
State : Unknown
TemplateName : otml_Solaris11-3
VDisks : |0|
Vcpu : 8
cpupool : default-unpinned-pool
vncport : 5904
[root@tpin-oda1 ~]# oakcli show ismaster
OAKD is in Master Mode
[root@tpin-oda0 ~]# oakcli show ismaster
OAKD is in Slave Mode
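To pick out the affected VMs quickly while in the hung state, the `oakcli show vm` listing above can be filtered on its STATE column. This is a minimal sketch that assumes the whitespace-separated layout shown in the paste (NAME NODENUM MEMORY VCPU STATE REPOSITORY, so STATE is the 5th field). `vms_in_state` is a hypothetical helper name, not an oakcli feature.

```shell
# vms_in_state STATE  (hypothetical helper, not part of oakcli)
# Reads `oakcli show vm` output on stdin and prints NAME, NODENUM and
# REPOSITORY for every VM whose STATE column equals STATE.
# Assumed layout: NAME NODENUM MEMORY VCPU STATE REPOSITORY
vms_in_state() {
  # Skip the header row ($1 == "NAME") and any short/blank lines.
  awk -v want="$1" '$1 != "NAME" && NF >= 6 && $5 == want { print $1, $2, $6 }'
}

# Usage on an ODA_BASE node:
#   oakcli show vm | vms_in_state UNKNOWN
```

Against the listing above, this prints `biserver 0 bigrepo` and `sol11-3-tcs 0 shared`. It shows that both UNKNOWN VMs live on shared repositories even though those repositories report ONLINE.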
OK, so now I run 'oakcli restart oda_base' on DOM0 and DOM1:
[root@dom0 ~]# oakcli restart oda_base
INFO: Stopping ODA base domain...
ERROR: Exception encountered while stopping oda_base [Errno 104] Connection reset by peer
[root@dom0 ~]#
[root@dom1 ~]# oakcli restart oda_base
INFO: Stopping ODA base domain...
INFO: Stopping all the shared repos
INFO: Starting ODA base domain...
INFO: Started ODA base domain
[root@dom1 ~]#
Afterwards, tpin-oda0 has become the master, and neither the shared repos nor the VMs are listed:
[root@tpin-oda0 ~]# oakcli show ismaster
OAKD is in Master Mode
[root@tpin-oda0 ~]# oakcli show repo
NAME      TYPE    NODENUM  FREE SPACE  STATE    SIZE
odarepo1  local   0        N/A         N/A      N/A
odarepo2  local   1        N/A         N/A      N/A
[root@tpin-oda0 ~]# oakcli show vm
NAME            NODENUM  MEMORY  VCPU  STATE    REPOSITORY
[root@tpin-oda0 ~]#
[root@tpin-oda0 ~]# oakcli show repo
NAME      TYPE    NODENUM  FREE SPACE  STATE    SIZE
odarepo1  local   0        N/A         N/A      N/A
odarepo2  local   1        N/A         N/A      N/A
...and after 10 minutes, it changes to:
[root@tpin-oda0 ~]# oakcli show repo
NAME      TYPE    NODENUM  FREE SPACE  STATE    SIZE
bigrepo   shared  0        N/A         UNKNOWN  N/A
bigrepo   shared  1        N/A         UNKNOWN  N/A
odarepo1  local   0        N/A         N/A      N/A
odarepo2  local   1        N/A         N/A      N/A
shared    shared  0        N/A         UNKNOWN  N/A
shared    shared  1        N/A         UNKNOWN  N/A
[root@tpin-oda0 ~]# oakcli show vm
NAME            NODENUM  MEMORY  VCPU  STATE    REPOSITORY
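Right after the restart, the shared repos vanish from the listing entirely. They only reappear (as UNKNOWN) about 10 minutes later. A check that merely looks for "no shared repo in a bad state" would therefore wrongly pass during that window.

The following is a minimal sketch of a readiness check that passes only when at least one shared repo is listed and every shared row is ONLINE. It wraps that check in a polling loop. `shared_repos_ready`, `wait_for_shared_repos`, and the default 1800 s timeout and 30 s interval are hypothetical choices, not oakcli features. The column positions are assumed from the paste: TYPE is the 2nd field and STATE the 5th on data rows.

```shell
# Hypothetical helpers (bash, uses `local`). Assumed data-row layout of
# `oakcli show repo`: NAME TYPE NODENUM FREE STATE SIZE

# shared_repos_ready: read `oakcli show repo` output on stdin; exit 0 only
# if at least one shared repo is listed AND every shared row is ONLINE.
shared_repos_ready() {
  awk 'BEGIN { seen = 0; bad = 0 }
       $2 == "shared" { seen++; if ($5 != "ONLINE") bad++ }
       END { exit !(seen > 0 && bad == 0) }'
}

# wait_for_shared_repos [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Poll `oakcli show repo` until shared_repos_ready succeeds or time runs out.
wait_for_shared_repos() {
  local timeout=${1:-1800} interval=${2:-30} waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if oakcli show repo | shared_repos_ready; then
      echo "shared repos ONLINE after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "shared repos still not ONLINE after ${timeout}s" >&2
  return 1
}
```

For example, `wait_for_shared_repos 1800 30 && oakcli show vm` would only list the VMs once the shared repos have actually come back. The `seen > 0` condition is what keeps the empty post-restart listing from counting as success.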
The only 'solution' we have is to run 'reboot -f' on Dom0/Dom1 and wait 20 minutes.
Any ideas?