USENIX ATC ’13 – Optimizing VM Checkpointing for Restore Performance in VMware ESXi

Irene Zhang, University of Washington and VMware; Tyler Denniston, MIT CSAIL and VMware; Yury Baskakov, VMware; Alex Garthwaite, CloudPhysics and VMware

Cloud providers are increasingly looking to use virtual machine checkpointing for new applications beyond fault tolerance. Existing checkpointing systems designed for fault tolerance optimize only for saving checkpointed state, so they cannot support these new applications, which require better restore performance. Improving restore performance requires a predictive technique that reduces the number of disk accesses needed to bring in the VM’s memory on restore. However, complex VM workloads can diverge at any time due to external inputs, background processes, and timing variation, so it is impossible to predict exactly which pages the VM will access on restore and thereby avoid faults to disk. Instead, we focus on predicting which pages the VM will access together on restore, to improve the efficiency of each disk access.
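The abstract does not say how co-access is predicted. As one hypothetical illustration (not necessarily the authors' method), the sketch below assumes a trace of page numbers recorded while the VM ran before the checkpoint. It orders pages by first access, on the assumption that pages touched close together in time will be touched together again, and then chunks that order into fixed-size groups. The function name and parameters are made up for this example.

```python
from typing import List, Sequence


def group_pages_by_first_access(
    access_trace: Sequence[int], num_pages: int, block_size: int
) -> List[List[int]]:
    """Group guest page numbers into fixed-size groups of likely co-accessed pages.

    Hypothetical heuristic: pages are ordered by when they first appear in
    `access_trace`. Pages never seen in the trace are appended afterwards
    in page-number order, since there is no access information for them.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    order: List[int] = []
    seen = set()
    for page in access_trace:
        if page not in seen:
            seen.add(page)
            order.append(page)
    # Cold pages: no access information, so fall back to address order.
    order.extend(p for p in range(num_pages) if p not in seen)
    # Chunk the ordering into consecutive groups of `block_size` pages.
    return [order[i:i + block_size] for i in range(0, len(order), block_size)]
```

For example, `group_pages_by_first_access([5, 3, 5, 9, 1, 3, 7], num_pages=10, block_size=2)` returns `[[5, 3], [9, 1], [7, 0], [2, 4], [6, 8]]`. Repeated accesses do not change the order, and the untouched pages 0, 2, 4, 6 and 8 end up at the tail.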

To reduce the number of faults to disk on restore, we group memory pages likely to be accessed together into locality blocks. On each fault, we can load a block of pages that are likely to be accessed with the faulting page, eliminating future faults and increasing disk efficiency. We implement support for locality blocks, along with several other optimizations, in a new checkpointing system for VMware ESXi Server called Halite. Our experiments show that Halite reduces restore overhead by up to 94% for a range of workloads.
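To show why loading a whole locality block on each fault reduces disk accesses, here is a minimal simulation of a lazy restore. It is an illustrative sketch, not Halite's implementation. It assumes every page belongs to exactly one block, and that each fault costs one disk read that brings in the faulting page's entire block. The name `count_restore_disk_reads` is hypothetical.

```python
from typing import Dict, Iterable, Sequence, Set


def count_restore_disk_reads(
    accesses: Iterable[int], blocks: Sequence[Sequence[int]]
) -> int:
    """Simulate a lazy restore and count how many disk reads it issues.

    Assumes each page appears in exactly one block. A page that is not yet
    resident causes a fault, which reads its whole block from disk.
    """
    # Map each page to the index of the block that contains it.
    block_of: Dict[int, int] = {}
    for block_id, pages in enumerate(blocks):
        for page in pages:
            block_of[page] = block_id

    resident: Set[int] = set()
    reads = 0
    for page in accesses:
        if page in resident:
            continue  # already loaded by an earlier block read; no fault
        # Fault: read the whole locality block containing this page.
        reads += 1
        resident.update(blocks[block_of[page]])
    return reads
```

With blocks `[[0, 1, 2], [3, 4, 5]]` and the access sequence `[0, 1, 2, 3, 4]`, the simulation issues 2 reads. With one page per block, the same sequence needs 5. If the grouping is poor, for example `[[0, 3], [1, 4], [2, 5]]` for accesses `[0, 1, 2]`, every access still faults and each read also brings in a page that is never used. This is why the quality of the co-access prediction matters. The model deliberately ignores the larger transfer size of a block read.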
