Disk consolidation extremely slow – high CMD/sec low MB/sec read/write.

I recently discovered a very old snapshot on a server we use for monitoring (Zabbix). The base disk is 800 GB and the SEsparse delta disks are 272 GB and 26 GB (two snapshots, apparently). The snapshot delete failed (not really surprising), and disk consolidation was needed.

The host eventually stops responding when disk consolidation runs, but I have it isolated and don’t really care if it takes another 24 hours … however, I think I might have a lost cause here. Reviewing the stats in esxtop and on my storage array, I see an average of 250 IOPS for read/write (combined, with some bursting), but a VERY low figure for both MBREAD/s and MBWRTN/s: the two average about 0.50.
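To put those numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes MBREAD/s and MBWRTN/s are each about 0.5 (so roughly 1.0 MB/s combined), that the 250 IOPS figure covers reads and writes together, and, as a worst case, that all 272 GB + 26 GB of delta data has to be written back at the observed write rate. The function names are just illustrative.

```python
def avg_io_size_kib(throughput_mb_s: float, iops: float) -> float:
    """Average size of one I/O in KiB, given throughput (MB/s) and IOPS."""
    return throughput_mb_s * 1024 / iops


def hours_to_move(size_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to move size_gb of data at a steady throughput (MB/s)."""
    return size_gb * 1024 / throughput_mb_s / 3600


# Assumption: ~0.5 MB/s read plus ~0.5 MB/s write, 250 IOPS combined.
combined_mb_s = 0.5 + 0.5
io_kib = avg_io_size_kib(combined_mb_s, 250)

# Assumption (worst case): every GB in both delta disks must be written
# back at the observed ~0.5 MB/s write rate.
delta_gb = 272 + 26
eta_hours = hours_to_move(delta_gb, 0.5)

print(f"average I/O size: {io_kib:.1f} KiB")
print(f"worst-case consolidation time: {eta_hours:.0f} h ({eta_hours / 24:.1f} days)")
```

Under those assumptions the average I/O works out to only about 4 KiB, and the consolidation would need on the order of a week rather than 24 hours. The real figure depends on how much of the delta disks is actually populated and whether the rate stays flat.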

I have never seen a consolidation this slow, and I have seen much larger snapshots go crazy in the past. Our SAN is a Nimble CS500 with four 10 Gbit uplinks (two active), and the hosts are on 40 Gbit combined. Performance is not suffering elsewhere, and the volume isn’t limited in IOPS or MiB/sec.

Currently the VM is powered off and the host is about 90% unresponsive (many commands won’t run, but esxtop will, etc.). Any ideas on this? I’m thinking it will probably be a burn-it-down-and-rebuild, but I’d love to avoid that if possible.

One Comment

  1. I had a client who had a 3-year-old snapshot on a 200 GB database server. Consolidating the snapshot took 4 days and 2.7 TB of space to complete. In hindsight, it was faster to shut the server down, run a differential backup, then restore to a fresh server and scrap the original one. That process took less than 2 hours.

