Hi, for several weeks now I have had the following issue: when I am playing a game, ESXi 6.7.0 Update 3 (Build 14320388) randomly crashes with a PSOD and the error message:
>LINT1/NMI (motherboard nonmaskable interrupt), undiagnosed. This may be a hardware problem;
My system is:
* Dell R720 (Got super lucky with that one and I am super happy to have it)
* 2x Xeon E5-2650 v2
* 128GB RAM
* perc 710p mini with 5x 900GB SAS HDD (RAID5)
* 512GB NVME (using CPU2)
* 64GB SATA SSD (for testing out vFlash)
* GTX 980 passthrough (tried on different slots, CPU1 and CPU2)
* According to the iDRAC internal web panel, all firmware is up to date
The thing is, it’s super random. So far I have been playing XCOM 2 and Borderlands 3, and both have triggered the crash.
iDRAC reports the following after such a crash:
>A bus fatal error was detected on a component at bus 64 device 2 function 0.
>A bus fatal error was detected on a component at slot 4.
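For anyone who wants to match that iDRAC message against the host's PCI device list, here is a minimal sketch. It assumes iDRAC prints bus, device and function in decimal, while `lspci` on the ESXi shell prints them in hexadecimal as `segment:bus:device.function`. The function name `idrac_to_lspci` and the default segment of 0 are assumptions made for illustration, not anything taken from Dell or VMware documentation.

```python
def idrac_to_lspci(bus: int, device: int, function: int, segment: int = 0) -> str:
    """Format a decimal bus/device/function triple as a hex PCI address.

    Assumption: the iDRAC log uses decimal numbers and the PCI segment is 0.
    """
    if not (0 <= bus <= 255 and 0 <= device <= 31 and 0 <= function <= 7):
        raise ValueError("bus/device/function out of PCI range")
    return f"{segment:04x}:{bus:02x}:{device:02x}.{function:x}"


# "bus 64 device 2 function 0" from the iDRAC log above
print(idrac_to_lspci(64, 2, 0))  # prints 0000:40:02.0
```

If the decimal assumption holds, that address can be looked up in the `lspci` output on the ESXi shell to see which device or bridge the error was reported on.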
I have literally zero clue why this is happening. The most annoying part is that it also crashes all the other VMs that I use for testing and for teaching myself about virtualization.
So far I have really liked ESXi, but this is a real issue, as I have no tower to put the GTX into to play games 🙁
The gaming VM is configured as follows:
* Windows 10
* 8 vCPUs (initially 16)
* 24GB RAM (16GB did not help)
* 100 GB NVME VHD for OS
* 200 GB NVME VHD for Games
* 200 GB perc VHD for Games
* GTX 980 with passthrough
* GTX 980 HiDef Audio passthrough (not added for now, as I suspected it was causing issues)
I tried the following, to no avail:
* Disabling C1E (read it somewhere)
* Low latency with frequency reservation (assumed it maybe was a timing issue)
* hypervisor.cpuid.v0 = FALSE (otherwise the driver install is not possible)
* pciPassthru.64bitMMIOSizeGB = 64 (also tried 4 and 24, as I was not sure whether it refers to RAM, VRAM, or both)
* pciPassthru.use64bitMMIO = TRUE
* pciHole.start = 2048
* pciHole.end = 6144 (also tried 8196, as somewhere it was stated to increase stability, but other places said that ESXi 6.5+ does this automatically? Additionally, I am not sure whether the hole should be exactly 4GB or more)
* pciPassthru0.msiEnabled = FALSE
* Moving the GPU to a slot handled by the other CPU (this only changes the slot and bus numbers in the iDRAC log)
* Simplifying the VM (not using the NVMe, and with fewer connected VHDs)
* Running memtest
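To make the VM-level settings in the list above easier to compare, here is a rough sketch that parses `.vmx`-style `key = "value"` lines and computes the size of the PCI hole. It assumes that `pciHole.start` and `pciHole.end` are given in MB, which is not confirmed here. The helper names `parse_vmx` and `pci_hole_size_mb` are made up for illustration.

```python
# Sample settings copied from the list above, written in .vmx style.
SAMPLE_VMX = '''
hypervisor.cpuid.v0 = "FALSE"
pciPassthru.use64bitMMIO = "TRUE"
pciPassthru.64bitMMIOSizeGB = "64"
pciHole.start = "2048"
pciHole.end = "6144"
pciPassthru0.msiEnabled = "FALSE"
'''


def parse_vmx(text: str) -> dict:
    """Parse key = "value" lines into a dict, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def pci_hole_size_mb(settings: dict) -> int:
    """Return pciHole.end - pciHole.start (assumed to be in MB)."""
    return int(settings["pciHole.end"]) - int(settings["pciHole.start"])


settings = parse_vmx(SAMPLE_VMX)
print(pci_hole_size_mb(settings))  # prints 4096
```

Under that assumption, the values above give a hole of 4096 MB, i.e. exactly 4GB. With the end set to 8196 instead, it would be 6148 MB.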
What is really weird is that running **folding@home GPU folding** causes **no issue at all**. It has never crashed while FAH is running, and I have it running nearly all the time (get that covid).
The best option would be to get a desktop and put the GTX in there on bare metal, but due to unforeseen circumstances I can’t afford one, and I am desperately trying to get this to work. The GTX itself was a present from a friend so that I could play some games again, but unfortunately those PSODs happened.
I am not sure if this is the right sub, but r/esxi does not have a lot of users and r/VFIO is unfortunately the wrong one, as they handle Proxmox, KVM and similar.
Any clues what I could try? I can’t find anything useful on the internet anymore.
View Reddit by KeMushi – View Source