Deduplication and Compression (DD&C) with VMware vSAN

Deduplication and compression (DD&C) are simple and effective ways to save space. The DD&C architecture in vSAN provides a data path designed to keep guest VM latency to a minimum. While vSAN’s two-tier architecture helps limit the performance impact, an inadequate hardware configuration or a collection of high-demand applications may overwhelm it, causing VMs to experience higher latency. Thanks to vSAN’s flexibility, you can adjust settings and hardware combinations to meet the needs of your workloads.

What are deduplication and compression:

Data deduplication detects blocks that occur more than once. It uses a hash table to refer to a single stored copy of a block rather than storing the same block multiple times. Data compression uses encoding techniques to store a given quantity of data, such as the contents of a data block, more efficiently. The two techniques are different, yet they share the same goal: space efficiency.
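The hash-table idea can be illustrated with a toy block store. This is a minimal sketch, not vSAN's implementation: the `DedupStore` class, the SHA-256 hash, and the 4 KB block size are assumptions chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # bytes; illustrative block size


class DedupStore:
    """Toy content-addressed block store: one stored copy per unique block."""

    def __init__(self):
        self.blocks = {}    # digest -> block bytes (the single stored copy)
        self.refcount = {}  # digest -> number of logical references
        self.logical = []   # logical block address -> digest

    def write(self, block: bytes) -> int:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = block  # first occurrence: store the data
            self.refcount[digest] = 0
        self.refcount[digest] += 1       # a duplicate only adds a reference
        self.logical.append(digest)
        return len(self.logical) - 1     # logical block address

    def read(self, lba: int) -> bytes:
        return self.blocks[self.logical[lba]]

    def dedup_ratio(self) -> float:
        # Logical blocks written divided by unique blocks actually stored.
        return len(self.logical) / len(self.blocks)


store = DedupStore()
for payload in (b"A", b"B", b"A", b"A"):
    store.write(payload * BLOCK_SIZE)

print(store.dedup_ratio())  # 4 logical blocks over 2 stored blocks -> 2.0
```

The ratio depends entirely on how many duplicates the data happens to contain.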

How space efficiency is implemented depends on the solution, and it affects both the amount of space saved and the effort required to achieve it. Regardless of how they are implemented, deduplication and compression are opportunistic space efficiency features: the amount of capacity saved cannot be guaranteed. Data placement schemes that use erasure coding, such as RAID-5 or RAID-6, are by contrast deterministic: they guarantee a certain level of space efficiency for data stored resiliently.
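The deterministic nature of erasure coding can be shown with simple arithmetic. The sketch below assumes a 3+1 layout for RAID-5 and a 4+2 layout for RAID-6, compared against a two-copy mirror. These layouts and the helper name `raw_per_usable` are assumptions for illustration, not taken from the text above.

```python
from fractions import Fraction


def raw_per_usable(data_parts: int, redundancy_parts: int) -> Fraction:
    """Raw capacity consumed per unit of usable data for a given layout."""
    return Fraction(data_parts + redundancy_parts, data_parts)


# Assumed layouts: (data components, redundancy components)
layouts = {
    "Mirror (1 data + 1 copy)": (1, 1),
    "RAID-5 (3 data + 1 parity)": (3, 1),
    "RAID-6 (4 data + 2 parity)": (4, 2),
}

for name, (data, redundancy) in layouts.items():
    overhead = raw_per_usable(data, redundancy)
    print(f"{name}: {float(overhead):.2f}x raw capacity per usable GB")
```

Unlike a deduplication ratio, these multipliers are fixed by the layout and do not depend on the data being stored.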

How is it implemented:

In vSAN, DD&C is enabled as a single space-saving feature at the cluster level. The process takes place as data is destaged to the capacity tier, after the write acknowledgments have been returned to the VM. Deferring any manipulation of the data until after the acknowledgment has been issued helps keep the guest VM’s write latency low.

As data is destaged, deduplication looks for opportunities to deduplicate the 4KB blocks it finds within a disk group, which is vSAN’s deduplication domain. Compression follows this step. If a 4KB block can be compressed by 50% or more, vSAN stores it compressed. Otherwise, it destages the block as-is.
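The destage sequence described above (deduplicate within the disk group, then compress only if the block shrinks by at least half) can be sketched as follows. This is a simplified model: the `destage` function, the dictionary used as a per-disk-group index, and the use of SHA-256 and zlib are stand-ins chosen for illustration, not the algorithms vSAN uses internally.

```python
import hashlib
import os
import zlib

BLOCK_SIZE = 4096
COMPRESSED_LIMIT = BLOCK_SIZE // 2  # keep the compressed form only if <= 2 KB


def destage(block: bytes, disk_group_index: dict) -> str:
    """Classify how one 4 KB block is handled when it leaves the write buffer."""
    assert len(block) == BLOCK_SIZE
    digest = hashlib.sha256(block).digest()

    # Step 1: deduplication, scoped to a single disk group's index.
    if digest in disk_group_index:
        return "deduplicated"

    # Step 2: compression, kept only if it saves at least 50%.
    compressed = zlib.compress(block)
    if len(compressed) <= COMPRESSED_LIMIT:
        disk_group_index[digest] = compressed
        return "compressed"

    # Step 3: otherwise the block is written unmodified.
    disk_group_index[digest] = block
    return "as-is"


index = {}  # hypothetical index for one disk group
text_block = (b"vSAN " * 1000)[:BLOCK_SIZE]  # highly repetitive data
random_block = os.urandom(BLOCK_SIZE)        # effectively incompressible

print(destage(text_block, index))    # compressed
print(destage(text_block, index))    # deduplicated
print(destage(random_block, index))  # as-is
```

Because the index is per disk group, the same block written to two different disk groups would be stored twice in this model.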

Performance:

Deduplication and compression require effort: computation, memory, and additional I/O. The only questions are when, where, and how that work occurs. In vSAN, the work happens as data in the write buffer begins to destage, so it limits the effective destaging throughput to the capacity tier. In other words, a cluster with DD&C enabled may behave like a cluster with DD&C disabled that has significantly lower-performing capacity tier devices. As a result, the cluster’s maximum steady-state write rate is reduced.
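A rough way to reason about this effect is to treat the write buffer as a tank that fills at the ingest rate and drains at the destage rate. The sketch below is a deliberately simplified model; the function names and all figures (buffer size, ingest rate, destage rates) are hypothetical and not measured vSAN values.

```python
def seconds_until_buffer_full(buffer_gb, ingest_mbps, destage_mbps):
    """Time a write burst can run at full speed; None if the buffer never fills."""
    if ingest_mbps <= destage_mbps:
        return None
    return buffer_gb * 1024 / (ingest_mbps - destage_mbps)


def steady_state_rate(ingest_mbps, destage_mbps):
    """Once the buffer is full, writes can be sustained only at the drain rate."""
    return min(ingest_mbps, destage_mbps)


# Hypothetical figures: 600 GB buffer, 1000 MB/s of incoming writes.
BUFFER_GB, INGEST = 600, 1000

for label, destage_rate in (("DD&C off", 800), ("DD&C on", 500)):
    burst = seconds_until_buffer_full(BUFFER_GB, INGEST, destage_rate)
    sustained = steady_state_rate(INGEST, destage_rate)
    print(f"{label}: burst lasts {burst:.0f} s, then {sustained} MB/s sustained")
```

In this model a lower effective destage rate both shortens the time a burst can be absorbed and lowers the sustained rate afterwards, which is why faster capacity devices or more buffer capacity can help.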

Options:

Adjustments can be made to fit the environment’s needs after the priorities have been identified. This might involve the following:

Use faster devices in the capacity tier. One of the most common causes of performance problems is insufficiently performing devices at the capacity tier. If you have enabled DD&C and see higher than expected VM latency, consider faster capacity devices. This can help compensate for the capacity tier’s lower effective performance when DD&C is enabled.

Increase the number of disk groups. This adds buffer capacity, which increases the room for hot working set data and reduces the urgency with which data must be destaged. Two disk groups would be the minimum, with three being the ideal arrangement.

Look at modern high-density storage devices for the capacity tier. If they meet your performance requirements, their higher densities may eliminate the need for DD&C. See vSAN Design for more information.

Install the most recent version of vSAN. Recent vSAN versions have focused on improving performance for clusters running DD&C, using software improvements to increase the destage rate and make VM latency more consistent.

Enable DD&C only in select clusters. Enable it only in clusters whose hardware can support the workloads’ performance requirements. Alternatively, space-efficient storage policies such as RAID-5/6 can be applied to individual workloads where it makes sense. Note that erasure coding has its own performance considerations.