VMware

ESXi host freezing advice please

So, I have a Dell R210 II running the Dell version of ESXi 6.7u3 with 3 VMs: a vSAN Witness, vSphere Appliance, and pfSense. I thought about posting this question on r/pfsense, but I guess I’ll start here and check if anyone has seen this before. I’m completely aware this version of ESXi isn’t supported on this machine. If you want to post that that’s my problem, fine, but it doesn’t really help. It’s been running fine for a while now.

Basically it is my virtualized router and vSAN utility box. It works great and I absolutely love it. But occasionally, and increasingly often, the box just freezes. So I’ll explain the circumstances.

It’s kind of hard to tell what is failing, but I believe it to be ESXi itself. I’d also appreciate any help in debugging this if I’m missing something obvious. When it happens, I notice because the internet is down. Since I can’t log into ESXi or iDRAC at that point, I plugged in a KVM and saw the greyed-out ESXi screen in its normal mode, not coredump mode. But it is completely unresponsive. Can’t F2, can’t F12, etc. A reboot does bring it all back up fine.

Logs show nothing, not in ESXi, pfSense, or the appliances. It’s like a blank slate: no error before the freeze, and then the normal boot-up logs. No CPU spike, and memory seems fine. All I can think is that it’s some kind of vmkernel problem.
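Since the logs go silent before the freeze, one way to pin down when the host actually stops responding is to poll it from a separate machine on the same LAN. The following is only a rough sketch and not something from the original post: the helper names `is_reachable`, `find_transitions` and `watch`, the use of TCP port 443, and the 30-second interval are all assumptions chosen for illustration.

```python
import socket
import time
from datetime import datetime


def is_reachable(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False


def find_transitions(samples):
    """Given (timestamp, reachable) samples in time order, return the
    samples at which reachability changed compared to the previous one."""
    transitions = []
    previous = None
    for stamp, reachable in samples:
        if previous is not None and reachable != previous:
            transitions.append((stamp, reachable))
        previous = reachable
    return transitions


def watch(host, interval=30.0):
    """Poll forever; print a line whenever the host goes down or comes back."""
    previous = None
    while True:
        reachable = is_reachable(host)
        if previous is not None and reachable != previous:
            state = "UP" if reachable else "DOWN"
            now = datetime.now().isoformat(timespec="seconds")
            print(f"{now} {host} {state}")
        previous = reachable
        time.sleep(interval)
```

Calling `watch("192.168.1.10")` (a placeholder address; substitute the host's management IP) from another box would give a timestamp for each freeze that can be lined up against the last entries in the ESXi and pfSense logs.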

I have watchdog running on pfSense to automatically restart failed services, such as my VPN, DHCP server, etc. One of the times this happened I got an email saying the DHCP server had crashed and was being restarted. But the other times there was no such email.

I have tried that pfSense “offload checksum” fix that people use for unsupported hardware, but that didn’t work. I guess I’m having trouble figuring out next steps.


Posted on Reddit by MrSavager

6 Comments

  1. Hard locks like that usually signal hardware problems.

    If you think the OS (ESXi, in this case) is causing the issue, you can try booting to another OS. You can find live Linux distros to boot to. You can also try running diagnostics to get a better idea of whether a specific piece of hardware is causing the problem.

  2. Did you patch either recently? As you state, it’s been running fine for a while now. This means something must have changed. If you can, see if you can get any data off the host, like CPU/Memory/Network stuff. Especially if you can get the deeper metrics exposed in esxtop. It’s possible that your host is hitting some bottleneck (maybe from another VM on the host) and it’s hanging because it’s attempting to respond to other requests.

    What is the time interval between hangs? Does it happen somewhat regularly? If you have other VMs on the host, have you tried shutting them down? Or reducing the resources available to them?

  3. My canned response here is that the R210 II doesn’t appear to be on the HCL for anything past 5.5. Now that doesn’t mean that is the cause, but keep that in mind: the HCL exists for a reason. With that being said, I have zero experience running pfSense as a VM. Good luck.
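The question in comment 2 about how regularly the hangs occur can be checked with a few lines of Python once the freeze times have been written down. This is a minimal sketch under stated assumptions: the function names `hang_intervals` and `looks_regular` are hypothetical, the timestamps are expected as ISO 8601 strings, and the 10% tolerance is an arbitrary threshold rather than any established rule.

```python
from datetime import datetime
from statistics import mean, pstdev


def hang_intervals(timestamps):
    """Return the hours between consecutive hangs, given ISO 8601 strings."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    return [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]


def looks_regular(intervals, tolerance=0.1):
    """True if the intervals vary by less than `tolerance` relative to
    their mean (coefficient of variation). Needs at least two intervals."""
    if len(intervals) < 2:
        return False
    avg = mean(intervals)
    return avg > 0 and pstdev(intervals) / avg < tolerance
```

If the intervals come out close to constant, that points toward something scheduled (a cron job, a backup, a lease renewal); widely scattered intervals fit better with the hardware explanation from comment 1.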
