Any reason VMware hosts go incommunicado at the same time every other morning following patches?

I have 10 Dell PowerEdge R630s running VMware ESXi (Dell Customized). We patched 9 of them with all stable patches 2 weeks ago. Since then, most of them (a different number on different days, always between 5 and 9) go incommunicado, and we receive alerts from our Zabbix monitoring system between 7:30 and 8:30 AM PST EVERY OTHER day. The hosts are NOT down or unreachable; I can get to them in vCenter, and the guests are up and reachable. This system has been in place for several years, and we never saw this before the latest round of patches. Any thoughts as to what could be happening?

One of our 10 hosts is unpatched due to a hardware dependency, and that one never shows the problem, so we do not believe there is an issue with our Zabbix system. We believe SNMP is going down at the same time every day. Additionally, we have confirmed ping and vSphere access during the “downtime,” so the network is not the issue.

  1. So which test is failing in Zabbix, and against which IP address on the hosts?
    Is it an SNMP test to the Dell iDRAC, or to the ESXi management interface?

    Is it a quick down/up alert, or does the “outage” last more than 5 to 10 minutes?

    Were NIC firmware updates also applied? Could you try rolling one back to see whether that stabilizes things?

  2. Somebody else recently had some oddball problems on here with strange network issues. I’m on mobile right now and can’t find it in the comments. I’ll try to dig it up…

    They were also using the Dell customized image, and their problems were fixed simply by changing the NIC drivers.

    They actually downgraded the driver version. But depending on what you are dealing with and which versions you are on, I’d try any different version, up or down.

  3. If the host is up and the monitoring tool says it’s down, look at the monitoring tool.

    I’ve never used Zabbix, but a quick Google search seems to suggest false positives are a pretty common problem.
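
On the question above of whether this is a quick down/up alert or a longer outage: given a time-ordered list of check results for one host, the failed runs can be grouped and their durations measured. This is a minimal sketch; the function names are hypothetical, and the 5-minute threshold is an assumption taken from the "5 to 10 minutes" figure in the thread. The `(timestamp, ok)` input format is also assumed, not something Zabbix exports in this shape.

```python
from datetime import timedelta


def outages(samples):
    """Group consecutive failed checks into (start, end) pairs.

    `samples` is a list of (datetime, bool) tuples sorted by time.
    `end` is the time of the first successful check after the failure,
    or None if the host was still failing at the last sample.
    """
    result = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                    # a failed run begins
        elif ok and start is not None:
            result.append((start, ts))    # the run ends on first success
            start = None
    if start is not None:
        result.append((start, None))
    return result


def label(start, end, threshold=timedelta(minutes=5)):
    # "flap" = recovered before the threshold, "sustained" = lasted longer.
    if end is None:
        return "ongoing"
    return "sustained" if end - start >= threshold else "flap"
```

If most of the runs come back as `flap`, a timeout or retry setting in the monitoring tool is worth checking first. If they come back as `sustained` and start at about the same minute on each affected day, something scheduled on the hosts is the more likely place to look.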

