HP 12000 HP 12200 Gateway Virtual Library System User Guide (BW403-10001, June - Page 70
6 Deduplication

Deduplication is the functionality in which only a single copy of a data block is stored on a device. Duplicate information is removed, allowing you to store more data in a given amount of space and restore data using lower-bandwidth links. The HP StorageWorks virtual library system uses Accelerated deduplication.

This section describes deduplication, including getting deduplication running on your system, configuring deduplication, and viewing reports.

NOTE: See the HP StorageWorks VLS and D2D Solutions Guide for more detailed information.

How It Works

HP Accelerated deduplication compares the most recent version of a backup to the previous version using object-level differencing code. It places pointers in the earlier version that identify duplicated content in the new version. Deduplication then eliminates the redundant data in the earlier version while retaining the complete, new version. You can improve deduplication performance by adding nodes.

NOTE: Deduplication takes place after the data has been processed to the backup tapes. Therefore, any data backed up to compression-enabled virtual tape drives (both software and hardware compression) is compressed before it is deduplicated.

The following is an overview of the deduplication process. See the HP StorageWorks VLS and D2D Solutions Guide for more detailed information.

1. When a backup runs, data grooming is performed on the fly. Using metadata attached by the backup application, data grooming maps the content, or "objects," of the backup and assembles a content database. This process has minimal performance impact.
2. After the scheduled backups have completed, the content database is used to "delta-difference" (compare) objects in current and previous backups from the same hosts. There are different levels of comparison. For example, files may be compared using a strong hashing function, while other objects may be compared at a byte level.
3. When duplicate data is found in an older backup, it is replaced by a pointer to the most recent copy of the same data. Because the most recent backup is a full version, you achieve the fastest possible restores.
4. Space reclamation occurs when duplicate data from previous backups is removed from the disk. This can take some time, but results in previously consumed capacity being returned to a free pool on the device.

Getting Deduplication Running on the VLS

This section explains how to get deduplication running on your VLS system, including some considerations for setting up the system, installing the firmware, and installing the deduplication licenses.

Considerations

To make the most of the deduplication benefits, review these considerations before setting it up on your VLS system:

• Virtual cartridge sizing - The system cannot deduplicate versions of a backup that are on the same cartridge; the versions are not deduplicated until a new version is written to a different virtual cartridge. Therefore, you want the cartridges to be sized large enough to contain an