VMware

High availability and power issues

Scenario: 2 hosts running ESXi 6.5 in a cluster, HA enabled, 15 VMs split across both hosts.

Last week we had a power issue in the server room that caused an unmanaged power-off of both hosts. We were only able to power host #1 back on. All the VMs that had been running on that host came back online, but the VMs from the still powered-off host #2 were not restarted by HA on host #1.

We only saw those VMs move once host #2 was brought back online.

Question: why did HA not bring all the VMs up on host #1 when it was rebooted?


Posted on Reddit by DZAUKER

4 Comments

  1. I would investigate whether there was a VM startup issue or a capacity issue on host #1, i.e., not enough RAM, CPU, etc. Assuming your HA cluster is spec’d properly this seems less likely, but who knows.

    For example, manually start a VM that had been on host #2 and see if you get any errors.

    Logically, yes, HA should handle restarting the VMs on another host.

    The only other thing I can think of is that perhaps the VMs weren’t set to auto-start on cluster start. Since you had a cold start of the cluster after the power loss, that could explain why they didn’t magically start on whatever host was available.

  2. So you’re saying you lost both hosts? If you lost both hosts and they both came online around the same time, HA would have no idea what happened, since the master would have been down along with the slave.

    Even after one comes up and then the other, it wouldn’t just know anything. Also, where was vCenter hosted: on one of these machines or at another location altogether?

  3. Because host #1 may have fallen into “isolated mode”. Due to the improper shutdown it never saw host #2 fail; both failed together. So when it powers on, if the network is not up fast enough, it can get the impression that it was the one that failed or got disconnected and that it is isolated, so no action is taken other than keeping its own VMs running.

  4. HA isn’t really designed to handle whole-rack power issues. It’s more for the loss of *some* of the cluster. You should be mitigating these kinds of power issues with a generator, plus UPSs to cover the generator spin-up time.
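
The capacity check suggested in comment 1 can be sketched roughly as below. This is only back-of-the-envelope arithmetic, not how HA itself decides: real HA admission control works from reservations and the configured policy, which this ignores. The `Host` and `VM` classes, the `failover_shortfall` helper and all the numbers are hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    mem_mb: int   # configured memory (assumption: treated as fully used)
    cpu_mhz: int  # rough CPU demand


@dataclass
class Host:
    name: str
    mem_mb: int   # usable memory on the surviving host
    cpu_mhz: int  # usable CPU on the surviving host


def failover_shortfall(survivor, running, to_restart):
    """Return (mem_short_mb, cpu_short_mhz) for the surviving host.

    A zero in either position means the host has enough of that
    resource to run its own VMs plus the ones to be restarted.
    """
    total_mem = sum(vm.mem_mb for vm in running + to_restart)
    total_cpu = sum(vm.cpu_mhz for vm in running + to_restart)
    mem_short = max(0, total_mem - survivor.mem_mb)
    cpu_short = max(0, total_cpu - survivor.cpu_mhz)
    return mem_short, cpu_short


# Hypothetical figures: host #1 with 64 GB RAM and 20 GHz of CPU,
# 8 VMs already on it and 7 VMs that lived on host #2.
host1 = Host("host1", mem_mb=65536, cpu_mhz=20000)
own_vms = [VM(f"vm{i}", mem_mb=4096, cpu_mhz=1000) for i in range(8)]
orphaned = [VM(f"vm{i}", mem_mb=8192, cpu_mhz=1000) for i in range(8, 15)]

mem_short, cpu_short = failover_shortfall(host1, own_vms, orphaned)
print(f"memory shortfall: {mem_short} MB, CPU shortfall: {cpu_short} MHz")
```

If either number is non-zero with your real figures, host #1 alone could not have carried all 15 VMs, which would be worth ruling out before digging into HA logs.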
