High Packet Loss When Using VPN in OPNsense Virtualized in vSphere 7.0

I am having a weird problem that I cannot figure out despite many hours of work. I hope someone here can offer some thoughts on why this is happening.

**Background**: I am running vSphere 7.0. I have a VM that is running OPNsense 20.1.9_1 (I have also tried OPNsense 20.7.2 and get the same result). To be very conservative, I assigned 4 CPUs and 6 GB of RAM to this VM, and OPNsense reports that it is nowhere near using that many resources. I have two distributed vSwitches, one for the LAN and one for the WAN. On the LAN and WAN ports associated with the OPNsense VM, I relaxed all security settings (promiscuous mode, MAC address changes, and forged transmits are all set to “accept”) and I enabled VLAN trunking (0-4094). I do run a few VLANs, so the VLAN trunking is needed. I don’t think the relaxed security is needed, but I set everything to “accept” in case it was causing this problem. I have gigabit fiber on the WAN physical uplink, connected to the AT&T gateway. Inside OPNsense, I disabled all hardware offloading (TSO, LRO, checksum, and VLAN checksum).


**Problem**: Inside OPNsense, I have two gateways: one for the WAN and one through Mullvad VPN (I use OpenVPN). There is no problem on the WAN gateway. I’ve tested large downloads of tens of GBs from the internet and get sustained full gigabit speed (around 90 MB/s) with 0% packet loss. This indicates to me that the VM and the vSphere distributed switches themselves are fine. However, when I use firewall rules to direct traffic through the VPN gateway, I have problems. When the speed is lowish, around 10 MB/s, there is no packet loss according to OPNsense. When the speed is higher, around 20 MB/s, packet loss climbs very quickly to 30%, then 50%, and pretty soon the VPN connection just stops responding. If I stop the download, the packet loss goes back down to 0% and the connection works again. To be clear, the VPN connection is capable of much more than 20 MB/s: when I run OPNsense bare metal, I can easily get 50 MB/s on the VPN gateway with 0% packet loss.


So here is what is confusing me. When I run OPNsense in the VM, everything on the WAN works perfectly, at full gigabit speed with 0% packet loss. But when I direct traffic through the VPN, I get huge packet loss that shuts down the gateway.

However, if I run OPNsense bare metal, I don’t get any packet loss on either the WAN or the VPN gateway. This indicates to me that the problem is not the VPN itself. There seems to be some weird interaction with running the VPN inside a VM that is causing the problem. I’ve tried everything I can think of, so what could it be?

  1. FYI, I don’t know OPNsense or OpenVPN, but this sounds like a fragmentation issue. See if the config has an option to set the MSS (or MSS clamping) to 1380 or something like that. If it is a fragmentation issue, you should be able to see it with a packet capture on the interface that terminates the VPN tunnel. If you view the capture in Wireshark, you might have to disable IPv4 fragment reassembly so that the individual fragments are shown.
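
To make the 1380 figure a bit more concrete, here is a rough Python sketch, not anything taken from OPNsense or OpenVPN. It assumes plain IPv4 and TCP headers without options (20 bytes each), in which case an MSS of 1380 corresponds to an effective tunnel MTU of 1420. The real OpenVPN overhead depends on the cipher and transport in use, so treat 1420 as an assumed example value. The function names `mss_for_mtu` and `is_ipv4_fragment` are made up for illustration; the second one shows what "fragmented" means at the header level if you want to check raw packets from a capture yourself.

```python
import struct

IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
TCP_HEADER_BYTES = 20   # TCP header without options (assumption)


def mss_for_mtu(path_mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return path_mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def is_ipv4_fragment(ip_header: bytes) -> bool:
    """True if the header has More Fragments set or a non-zero fragment offset."""
    # Bytes 6-7 of the IPv4 header hold 3 flag bits and a 13-bit fragment offset.
    (flags_and_offset,) = struct.unpack_from("!H", ip_header, 6)
    more_fragments = bool(flags_and_offset & 0x2000)
    fragment_offset = flags_and_offset & 0x1FFF
    return more_fragments or fragment_offset != 0


if __name__ == "__main__":
    print(mss_for_mtu(1500))  # standard Ethernet MTU
    print(mss_for_mtu(1420))  # assumed effective tunnel MTU
```

If the capture shows many packets for which a check like `is_ipv4_fragment` is true while the download is running, that would support the fragmentation theory; if it shows none, the cause is probably elsewhere.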

