Setup:
- 4-node cluster running 6.5
- Red Hat VM
- dvSwitch with 2 links connected to a Nexus core
- Load balancing set to "Route based on physical NIC load"
I have one VM that seems to lose network connectivity in a strange way. From inside the VM, if I ping a certain hostname, the name resolves but the ping to the resolved IP fails. If I ping that IP directly, it responds. I'm told this happens about every 6 months, and the last time was December 2019.
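If it happens again, one thing worth capturing from inside the VM is exactly which addresses the hostname resolves to, in case the name points somewhere other than the IP that answers direct pings (a stale DNS record or hosts-file entry, for example; that is only a guess, not something confirmed here). A minimal sketch, assuming Python 3 is available on the Red Hat VM; the script name, hostname and expected IP are placeholders:

```python
#!/usr/bin/env python3
"""Log every address the local resolver returns for a hostname.

Placeholders: pass the real hostname and the IP that answers direct pings.
"""
import socket
import sys


def resolved_addresses(hostname):
    """Return the sorted, de-duplicated addresses getaddrinfo() yields for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def main(argv):
    """Return 0 if expected_ip is among the resolved addresses, 1 if not, 2 on bad usage."""
    if len(argv) != 3:
        print(f"usage: {argv[0]} HOSTNAME EXPECTED_IP")
        return 2
    hostname, expected_ip = argv[1], argv[2]
    addrs = resolved_addresses(hostname)
    print(f"{hostname} -> {addrs}")
    if expected_ip not in addrs:
        print(f"WARNING: {expected_ip} is not among the resolved addresses")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Usage would look like `python3 check_resolve.py app01.example.com 10.0.0.25` (hypothetical name and address). On a glibc system, `getaddrinfo()` generally follows the same lookup order as `ping` (`/etc/nsswitch.conf`, `/etc/hosts`, then DNS), so a mismatch here would point at name resolution rather than the vSwitch.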
I'm told that in the past, when this occurred, they would migrate the VM to a different host in the cluster and it would just start working again.
I'm here now, and that isn't a real solution. But the system is working again because we did the host migration, so I can't reproduce the problem.
I'm at a loss. In the past, when I've seen these types of issues, they were related to the load balancing setting on the switch or port group, and they usually affected multiple VMs. This only occurs on one VM, and there is another identical VM for this app in the same cluster that never experiences it.
Can I get some ideas on where to troubleshoot next?
The only strange thing I have found is that the host the affected VM is on is showing the wrong CDP info: instead of the switch, it shows one of the other hosts in the cluster as its CDP neighbor. I've never seen this before, and I'm not sure if it's related. No other VMs on this host are having issues or have had them in the past.
I've compared the networking settings across all the hosts and they appear to be the same, but I'm going to go over them again. Any pointers would be great. This is a real head-scratcher for me, but maybe it's something people have run into before?
Posted on Reddit by InvisiblePinkUnic0rn