
ESXi snapshot removal logs

I have completed most of the work to upgrade from ESXi 6.5 to 6.7U3. I’m in the last part: the rolling upgrade of the ESXi servers themselves. I’ve completed 4 of 10.

I’m starting to have issues with my Veeam backups. Snapshot removal is taking longer on several virtual machines. While the removal is taking place, the virtual machine becomes unresponsive. Once the removal completes, the virtual machine starts working properly again.

I don’t think it’s a storage issue. We have an all-SSD storage array that demonstrates low latency, and some of these snapshots are very small. It feels more like a locking issue to me. The problem started when I introduced 6.7, and while it has mostly occurred on virtual machines running on the 4 updated hosts, it has also happened on 6.5 hosts.

I would like to research the issue in depth. Can someone tell me where ESXi logs the snapshot removal process? I believe there is some info in the vmware.log file in the virtual machine directory itself, but what about hostd? vpxd? Does vCenter have any logs?


Posted on Reddit by cranky9


2 Comments

  1. Best guess is the vmware.log from the virtual machine. As virtual_nerd already said, vmkernel.log and hostd.log also show some info, but it’s basically the same as in the vmware.log file.

    Is the issue only happening with VMs on the 6.7U3 hosts, or also on the older 6.5 hosts?

    BTW: There is a patch on the VMware website for [vSphere ESXi 6.7U3b](https://docs.vmware.com/en/VMware-vSphere/6.7/rn/esxi670-201912001.html) that fixes the following issue:
    *Virtual machines with CBT enabled might report long waiting times during snapshot creation due to the 8 KB buffer used for CBT file copying. With this fix, the buffer size is increased to 1 MB to overcome multiple reads and writes of a large CBT file copy, and reduce waiting time.*

    This might be what you are facing; it looks very similar to your issue. The longer the VM stun time, the longer the VM is unresponsive. (A VM stun always happens on snapshot removal.)

  2. You are already aware of the vmware.log, so for the host:

    /var/log/vmkernel.log
    /var/log/hostd.log

    Those will be the main ones you want to review. You could also check esxtop while the snapshot is being consolidated to see what else may be going on.

    vCenter won’t have much on why it takes so long, as snapshots are host-based operations.
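
For anyone who wants to narrow those logs down to the interesting lines, here is a rough Python sketch that filters a log for snapshot-related entries. It assumes you are working on copies of vmware.log, vmkernel.log and hostd.log, and the keyword list (`snapshot`, `consolidat`, `stun`) is only a guess at useful search terms, not a documented log format. The names `KEYWORDS`, `filter_lines` and `scan_log` are made up for this example.

```python
import re
from pathlib import Path

# Assumed search terms: adjust to whatever your logs actually contain.
KEYWORDS = re.compile(r"snapshot|consolidat|stun", re.IGNORECASE)


def filter_lines(lines, pattern=KEYWORDS):
    """Return (line_number, text) pairs for every line matching the pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def scan_log(path, pattern=KEYWORDS):
    """Filter one log file; a missing file simply yields no matches."""
    log_file = Path(path)
    if not log_file.is_file():
        return []
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with log_file.open(errors="replace") as handle:
        return filter_lines(handle, pattern)
```

Calling `scan_log("/var/log/hostd.log")` (or pointing it at a copy of the VM’s vmware.log) returns the matching lines with their line numbers, which makes it easier to line up timestamps across the three logs for the window in which the VM was unresponsive.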
