In this module, learn about the methods the system supports for data reduction and how those methods work together to save capacity.
As the amount of stored data grows, a system can run out of capacity. Data reduction lowers the amount of data that is stored on the system by using several methods, which can increase storage efficiency and performance and reduce storage costs. The system supports data reduction pools, which contain thin-provisioned, compressed, and deduplicated volumes.
When you create a volume, you can designate it as thin-provisioned. With thin provisioning, storage administrators can configure volumes that grow into the physical capacity they need based on the storage needs of the host. This flexibility reduces the cost of managing storage capacity for host applications. Apart from fully allocated volumes, all volume types provide the capacity savings and benefits of thin-provisioning. Thin-provisioning can be used with compression and deduplication, or it can be used by itself. Thin-provisioning is supported in both standard pools and data reduction pools. However, data reduction pools also support reclaiming capacity when it is no longer needed by hosts and can then redistribute it automatically for other uses.
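The allocate-on-write behavior described above can be illustrated with a minimal sketch. It is not the system's implementation: the `ThinVolume` class, the 4 KiB grain size, and the `unmap` method are hypothetical names and values chosen only for illustration.

```python
class ThinVolume:
    """Toy thin-provisioned volume: physical grains are allocated on first write.

    All names and the grain size are illustrative assumptions.
    """

    def __init__(self, virtual_size, grain_size=4096):
        self.virtual_size = virtual_size  # capacity presented to the host
        self.grain_size = grain_size      # allocation unit (assumed value)
        self.grains = {}                  # grain index -> bytearray, only written grains

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > self.virtual_size:
            raise ValueError("write outside the virtual capacity")
        pos = 0
        while pos < len(data):
            index, start = divmod(offset + pos, self.grain_size)
            # Allocate the grain only when it is first written.
            grain = self.grains.setdefault(index, bytearray(self.grain_size))
            n = min(self.grain_size - start, len(data) - pos)
            grain[start:start + n] = data[pos:pos + n]
            pos += n

    def unmap(self, index):
        """Release a grain that the host no longer needs (capacity reclamation)."""
        self.grains.pop(index, None)

    @property
    def used_capacity(self):
        return len(self.grains) * self.grain_size


vol = ThinVolume(virtual_size=1024 * 1024)  # host sees 1 MiB
vol.write(0, b"x" * 5000)                   # touches grains 0 and 1
print(vol.used_capacity)                    # 8192
vol.unmap(1)                                # host frees the second grain
print(vol.used_capacity)                    # 4096
```

In this sketch the host sees the full virtual size while physical capacity is consumed only for grains that were written, and the `unmap` call stands in for the capacity reclamation that data reduction pools support.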
Unlike thin-provisioning, which removes unused capacity, compression increases the effective capacity of physical storage by reducing the size of the data itself. When you create volumes, you can specify compression as a method to save capacity for the volume. With compressed volumes, data is compressed as it is written to disk, saving more space. When hosts read the data, it is decompressed. If you want volumes to use compression as part of data reduction support, compressed volumes must belong to data reduction pools. Compressed volumes can also be created in standard pools, but data reduction pools support additional capacity savings functions, such as automatic redistribution of reclaimed storage. If compressed volumes exist in both standard pools and data reduction pools, total storage savings values include the savings from both.
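The compress-on-write, decompress-on-read flow can be sketched as follows. This is only an illustration that uses Python's standard `zlib` module; the `CompressedStore` class is a hypothetical name, and the source does not state which compression algorithm the system uses.

```python
import zlib


class CompressedStore:
    """Toy block store that compresses on write and decompresses on read.

    zlib is an assumption for illustration, not the system's algorithm.
    """

    def __init__(self):
        self.blocks = {}  # block id -> compressed bytes

    def write(self, block_id, data):
        # Data is compressed as it is written.
        self.blocks[block_id] = zlib.compress(data)

    def read(self, block_id):
        # Data is decompressed when the host reads it.
        return zlib.decompress(self.blocks[block_id])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


store = CompressedStore()
payload = b"customer-record;" * 1000  # repetitive data compresses well
store.write(0, payload)
assert store.read(0) == payload       # the host gets the original data back
print(len(payload), store.stored_bytes())
```

How much space is saved depends on the data: repetitive data such as the payload above shrinks considerably, while data that is already compressed or encrypted typically does not.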
Deduplication can be configured with thin-provisioned and compressed volumes in data reduction pools. Deduplication is a type of data reduction that eliminates duplicate copies of data. With deduplication, the system computes a signature for each chunk of data and uses it to determine whether new data needs to be written to storage. Deduplication is a hash-based solution, which means that chunks of data are compared by their signatures rather than by the data itself. If the signature of the new data matches an existing signature that is stored on the system, the system stores a reference to the existing data instead of writing the new data. This process saves capacity on the backend storage by avoiding writes of duplicate data. Deduplicated volumes must be created in data reduction pools. If you have existing volumes in standard pools, you can migrate them to data reduction pools and add deduplication to increase capacity savings for the volume.
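The signature lookup described above can be sketched in a few lines. The example is a simplified illustration under stated assumptions: the `DedupStore` class, the 8 KiB chunk size, and the use of SHA-256 as the signature are all hypothetical choices, because the source does not specify the chunk size or hash function that the system uses.

```python
import hashlib


class DedupStore:
    """Toy hash-based deduplication: each unique chunk is stored once.

    Chunk size and hash function are illustrative assumptions.
    """

    def __init__(self, chunk_size=8192):
        self.chunk_size = chunk_size
        self.chunks = {}   # signature -> chunk data, stored once
        self.volumes = {}  # volume name -> ordered list of signatures (references)

    def write(self, volume, data):
        refs = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            signature = hashlib.sha256(chunk).digest()
            if signature not in self.chunks:
                # New signature: the chunk is written to storage.
                self.chunks[signature] = chunk
            # Known or new, the volume keeps only a reference.
            refs.append(signature)
        self.volumes[volume] = refs

    def read(self, volume):
        return b"".join(self.chunks[s] for s in self.volumes[volume])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = DedupStore()
data = b"A" * 8192 * 3 + b"B" * 8192  # four chunks, two of them unique
store.write("vol0", data)
assert store.read("vol0") == data
print(store.stored_bytes())           # 16384
```

Four chunks are written by the host, but only the two unique chunks consume capacity; the remaining two are recorded as references to data that is already stored.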
If the system contains volumes that use all of these data reduction methods, overall savings are calculated by applying the methods in sequence. The system applies thin-provisioning first and removes the unwritten capacity. Deduplication is applied next: data is compared, and matching data is removed from the write operation. Compression is applied last, so only unique data is compressed on the storage.
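The order of operations can be illustrated by tracking capacity after each stage. The sketch below is a simplified model, not the system's accounting: the `reduce_capacity` function is a hypothetical name, unwritten blocks are modeled as `None`, and SHA-256 and `zlib` stand in for the unspecified signature and compression algorithms.

```python
import hashlib
import zlib


def reduce_capacity(blocks, block_size):
    """Return the capacity that remains after each reduction stage.

    `blocks` is a list of equal-sized byte strings; None marks a block
    that the host never wrote. All names here are illustrative.
    """
    provisioned = len(blocks) * block_size

    # 1. Thin-provisioning: unwritten capacity is removed.
    written = [b for b in blocks if b is not None]

    # 2. Deduplication: blocks with matching signatures are kept once.
    unique = {hashlib.sha256(b).digest(): b for b in written}

    # 3. Compression: only the unique data is compressed.
    compressed = sum(len(zlib.compress(b)) for b in unique.values())

    return {
        "provisioned": provisioned,
        "after_thin": sum(len(b) for b in written),
        "after_dedup": sum(len(b) for b in unique.values()),
        "after_compression": compressed,
    }


blocks = [b"a" * 4096, None, b"a" * 4096, b"b" * 4096, None, None, None, None]
for stage, size in reduce_capacity(blocks, block_size=4096).items():
    print(stage, size)
```

Compressing last means that the compression work is spent only on data that survives the first two stages, which matches the ordering described above.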
You can monitor the overall capacity savings of these data reduction technologies in the management GUI. Each of these capacity savings methods requires capacity monitoring to ensure efficient capacity provisioning on the system. Actual savings depend on how these methods are used and on the capacity management practices in your organization. However, compression and deduplication might reduce data by a factor of two compared with not using these methods, and thin-provisioning might also provide a twofold reduction. When these data reduction methods are used together, overall capacity savings are combined.
To learn more about data reduction, see the topics in IBM(R) Knowledge Center.