
Question about vSAN approach/architecture

I've joined a group that has a vSAN design I think is a bit wonky from a disk-failure standpoint.

The vSAN cluster is composed of five ESXi servers, each with five disks and ESXi installed on a thumb drive. The disks in each host are striped by the server's hardware RAID controller:


-> ESXi01 hdd0 hdd1 hdd2 hdd3 hdd4 (Raid-0/Stripe)

-> ESXi02 hdd0 hdd1 hdd2 hdd3 hdd4 (Raid-0/Stripe)

-> ESXi03 hdd0 hdd1 hdd2 hdd3 hdd4 (Raid-0/Stripe)

-> ESXi04 hdd0 hdd1 hdd2 hdd3 hdd4 (Raid-0/Stripe)

-> ESXi05 hdd0 hdd1 hdd2 hdd3 hdd4 (Raid-0/Stripe)

vSAN = (Stripe0 Stripe1 Stripe2 Stripe3 Stripe4)

If a single disk fails, the whole stripe on that host has to be rebuilt just to replace one drive. I don't know much about vSAN, but this sounds counterproductive.
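To put a rough number on that concern, here is an illustrative back-of-the-envelope calculation (the 2 TB disk size is an assumption for the example, not from the post): when the whole stripe is presented as a single device, one failed disk takes the entire stripe offline and all data on it has to be rebuilt elsewhere, whereas with pass-through disks only the failed disk's data is affected.

```python
# Illustrative arithmetic only -- the disk size is an assumed example.
disk_gb = 2000          # capacity of one HDD (assumption)
disks_per_host = 5

# Hardware RAID-0: the host presents one big device, so one failed
# disk invalidates the whole stripe and everything on it.
resync_after_raid0_failure_gb = disk_gb * disks_per_host

# Pass-through/HBA: each disk is its own device, so only the data
# on the failed disk needs to be rebuilt.
resync_after_passthrough_failure_gb = disk_gb

print(resync_after_raid0_failure_gb)        # 10000
print(resync_after_passthrough_failure_gb)  # 2000
```

In other words, under these assumptions a single drive failure forces five times as much data movement in the striped design.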


A better plan, I would think, would be to do nothing at the hardware RAID level and let vSAN determine the RAID layout based on each VM's design.

-> ESXi01 hdd0 hdd1 hdd2 hdd3 hdd4 (no Raid)

-> ESXi02 hdd0 hdd1 hdd2 hdd3 hdd4 (no Raid)

-> ESXi03 hdd0 hdd1 hdd2 hdd3 hdd4 (no Raid)

-> ESXi04 hdd0 hdd1 hdd2 hdd3 hdd4 (no Raid)

-> ESXi05 hdd0 hdd1 hdd2 hdd3 hdd4 (no Raid)


vSAN = (Raid-5 utilizing all the hdds above)


If a disk fails, vSAN will mark it as failed and it can be hot-swapped (maybe leave some disks as "spares" to automate replacement).

Any ideas? Recommendations? Other than getting rid of vSAN! 🙂


Thanks,


Posted on Reddit by AliveInPhilly


5 Comments

  1. Your post is confusing. Are you referring to VMware vSAN, or some other virtual SAN solution?

    Your current config and proposed config don’t make sense for VMware vSAN – you’re missing a caching tier. You also refer to VMware vSAN features that don’t exist (e.g. warm spare drives).

  2. From my knowledge, you’re 100% correct.
    vSAN wants to manage the disks.
    Get HBAs, not raid controllers.

    Also from your example, it does not look like you have cache disks. vSAN will require a cache tier of SSD disks.

    And with vSAN, Erasure Coding (I think that is what they call their redundancy scheme) is applied on a per-object basis. So, on the same vSAN datastore, you can have a VMDK protected at a RAID-5 equivalent sitting right beside a VMDK protected by a RAID-1 equivalent. That is the beauty of vSAN: redundancy is managed at the software level, not the storage-controller level.

  3. Hi, I’m from the HCI group within VMware.

    Several have pointed you in the right direction here: do not use disk RAID across disks in vSAN. For an authority on this, see the [“Storage Controllers” section in the vSAN docs](https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.vsan-planning.doc/GUID-4B738A10-4506-4D70-8339-28D8C8331A15.html).

    You don’t need “spares” in vSAN. It will not automatically claim or use them for safety/sanity purposes. Instead, use all devices for cache or capacity. Additionally, vSAN requires flash cache disks to function, so it is good to understand the types of disks involved in vSAN. I suggest familiarizing yourself with the [vSAN Design & Sizing Guide](https://storagehub.vmware.com/t/vmware-vsan/vmware-r-vsan-tm-design-and-sizing-guide-2/), as it’s a nice resource for understanding overall vSAN architectural concepts. I’d be remiss not to point out that many of the resources I linked here come from the excellent [storagehub.vmware.com](https://storagehub.vmware.com), a place for information on all things VMware storage, including vSAN.

    I would open a case with VMware support, who can best advise you on how to sort this cluster out. I suggest doing a few things ahead of time to accelerate your case:

    1. **Verify hardware compatibility:** you can do this via the [built-in health checks](https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.vsan-monitoring.doc/GUID-68CDE86F-C5A7-4B3E-9DA8-BD8165D3A9AF.html), which will tell you if the hardware is compatible. It is crucial that all disks and disk controllers used are on the vSAN HCL. If any incompatible components exist, replace them via your vendor or a third party who can provide compatible components. Incompatible hardware is the usual cause of poor vSAN behavior, which is why we have solutions out there like [ReadyNodes](http://vsanreadynode.vmware.com) and [VxRail](https://www.dellemc.com/en-us/converged-infrastructure/vxrail/index.htm).

    2. [Enable CEIP to allow leveraging vSAN Support Insight](https://storagehub.vmware.com/t/vmware-vsan/vsan-support-insight/)

    3. [Export a log bundle](https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.vcsa.doc/GUID-C54CA3F8-BD74-4339-A2A5-AE89F1C55175.html). Include the hosts in the vSAN cluster. You shouldn’t need the vCenter logs.

    Hope this helps!

  4. The rule for any software-defined storage is to not use hardware RAID; use just an HBA or NVMe. The other rule I follow for VMware is: if it's prod, use vSAN ReadyNodes, or there is a high likelihood of a bad time in your future. If you were using a vSAN ReadyNode, it would have come with an HBA instead of a RAID card, because there is no point in spending money on a RAID card just to put it in HBA mode.

    The other thing with vSAN is you will want to read the documentation on the technology, as there are many different ways to shoot yourself in the foot through misconfiguration. You are on the right path, though, in having VMware own the disks and not the RAID card.

  5. I’m not a fan of vSAN; however, in your case I would do neither of what you propose. I wouldn’t trust vSAN to correctly do RAID 5 across 20 drives on 4 different hosts.

    What I would do is set up hardware RAID 5 on each host, then just use vSAN as a stripe to tie them all together. Then if you have an HDD failure, vSAN won’t even know about it; the hardware RAID controller will take care of it. You replace the drive in the host that had the failure, and it rebuilds.

    Everything I’ve seen and read says that vSAN really has issues recovering from failed “members”. So the more you can keep vSAN from seeing a failure, the better off you are.
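A quick sketch of the capacity tradeoff behind the per-VMDK policies mentioned in the comments (the 100 GB VMDK size is an assumed example): RAID-1 mirroring with failures-to-tolerate = 1 stores two full copies of the data, while RAID-5 erasure coding (3 data + 1 parity) stores roughly 1.33x.

```python
# Illustrative arithmetic only -- the VMDK size is an assumed example.
vmdk_gb = 100

# RAID-1 mirror, FTT=1: two full replicas of the object.
raid1_footprint_gb = vmdk_gb * 2

# RAID-5 erasure coding (3+1): four components holding three
# components' worth of data, i.e. a 4/3 overhead factor.
raid5_footprint_gb = vmdk_gb * 4 / 3

print(raid1_footprint_gb)         # 200
print(round(raid5_footprint_gb))  # 133
```

This is why per-object policies matter: capacity-sensitive VMDKs can use the erasure-coded policy while latency-sensitive ones stay mirrored, on the same datastore.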
