HP StoreOnce 4430 HP StoreOnce Backup System Concepts and Configuration Guidel - Page 14

HP StoreOnce technology, Data deduplication

2 HP StoreOnce technology

A basic understanding of how HP StoreOnce Technology works is necessary to understand the
factors that may affect the performance of the overall system and to ensure optimal
performance of your backup solution.

Data deduplication

HP StoreOnce Technology is an “inline” data deduplication process. It uses hash-based chunking
technology, which analyzes incoming backup data in “chunks” that average 4 KB in size. The hashing
algorithm generates a unique hash value that identifies each chunk and points to its location in the
deduplication store.

Hash values are stored in an index that is referenced when subsequent backups are performed.
When data generates a hash value that already exists in the index, the data is not stored a second
time; instead, a count is incremented to record how many times that hash value has been seen.
Unique data generates a new hash value, and that data is stored on the appliance. Typically about 2%
of every new backup is new data that generates new hash values.
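
The index-and-count behavior described above can be sketched in a few lines of Python. This is
an illustration only, not the StoreOnce implementation: the fixed 4 KiB chunk size, the use of
SHA-256, and the names DedupStore and estimated_dedup_ratio are all assumptions made for the
example (the guide states only that chunks average 4K and that a hash identifies each chunk).

```python
import hashlib

# Assumption: fixed-size 4 KiB chunks. The guide says only that chunks
# *average* 4K, so the real chunking scheme is likely different.
CHUNK_SIZE = 4096


class DedupStore:
    """Toy inline deduplication store (hypothetical, for illustration)."""

    def __init__(self):
        self.chunks = {}      # hash value -> chunk bytes, stored once
        self.ref_counts = {}  # hash value -> times this chunk has been seen

    def backup(self, data: bytes) -> list:
        """Ingest a backup stream and return its ordered list of hash values."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            # Assumption: SHA-256 stands in for the unspecified hash algorithm.
            digest = hashlib.sha256(chunk).hexdigest()
            if digest in self.chunks:
                # Known chunk: do not store it again, just bump the count.
                self.ref_counts[digest] += 1
            else:
                # Unique chunk: store the data and start its count at 1.
                self.chunks[digest] = chunk
                self.ref_counts[digest] = 1
            recipe.append(digest)
        return recipe

    def restore(self, recipe: list) -> bytes:
        """Rebuild the original stream from its stored chunks."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self) -> int:
        """Bytes of unique chunk data held in the store."""
        return sum(len(chunk) for chunk in self.chunks.values())


def estimated_dedup_ratio(num_backups: int, new_fraction: float = 0.02) -> float:
    """Rough ratio of data backed up to data stored.

    Assumptions: every backup is a full backup of the same size, each backup
    after the first adds `new_fraction` of new data, and compression and
    index overhead are ignored.
    """
    stored = 1 + new_fraction * (num_backups - 1)
    return num_backups / stored


store = DedupStore()
first = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"C" * CHUNK_SIZE
second = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"D" * CHUNK_SIZE
recipe_one = store.backup(first)
recipe_two = store.backup(second)
print(store.stored_bytes())            # four unique chunks are held, not six
print(store.restore(recipe_two) == second)
```

In this sketch the second backup adds only one new chunk, and the counts for the two repeated
chunks rise to 2. Under the stated assumptions, estimated_dedup_ratio(20) evaluates to
20 / 1.38, or roughly 14.5; real ratios depend on the data and the backup schedule.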

With Virtual Tape Library and NAS shares, deduplication always occurs on the StoreOnce Backup
system. With Catalyst stores, deduplication may be configured to occur on the media server
(recommended) or on the StoreOnce Backup system.

Key performance factors with deduplication that occurs on the StoreOnce Backup system

The inline nature of the deduplication process means that it is a very processor- and
memory-intensive task. HP StoreOnce appliances have been designed with appropriate processing
power and memory to minimize the backup performance impact of deduplication.

• Best performance will be obtained by configuring a larger number of libraries/shares/Catalyst
  stores with multiple backup streams to each device, although this involves a trade-off with
  the overall deduplication ratio.
  ◦ If servers with lots of similar data are to be backed up, a higher deduplication ratio can
    be achieved by backing them all up to the same library/share/Catalyst store, even if this
    means directing different media servers to the same data-type device configured on the
    StoreOnce appliance.
  ◦ If servers contain dissimilar data types, the best compromise between deduplication ratio
    and performance will be achieved by grouping servers with similar data types together into
    their own dedicated libraries/shares/Catalyst stores. For example, a requirement to back
    up a set of Exchange servers, SQL database servers, file servers and application servers
    would be best served by creating four virtual libraries, NAS shares or Catalyst stores,
    one for each server data type.
• The best backup performance to a device configured on a StoreOnce appliance is achieved
  using somewhere below the maximum number of streams per device (the maximum number
  of streams varies between models).
• When restoring data, a deduplicating device must reconstruct the original un-deduplicated
  data stream from all of the data chunks contained in the deduplication stores. This can result
  in lower performance than that of the backup process (typically 80% of backup performance).
  Restores also typically use only a single stream.
• Full backup jobs will result in higher deduplication ratios and better restore performance.
  Incremental and differential backups will not deduplicate as well.
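
The grouping advice above (one library, share or Catalyst store per server data type) can be
sketched as a small planning helper. This is a hypothetical illustration in Python: the
function plan_stores, the server names and the data type labels are invented for the example
and do not correspond to any StoreOnce tool or API.

```python
from collections import defaultdict


def plan_stores(servers):
    """Group servers by data type, one dedicated store per type.

    `servers` is a list of (server_name, data_type) pairs. The returned
    dict maps each data type to the servers that would share one
    library, NAS share or Catalyst store.
    """
    plan = defaultdict(list)
    for name, data_type in servers:
        plan[data_type].append(name)
    return dict(plan)


# Hypothetical inventory matching the example in the text.
servers = [
    ("mail01", "exchange"),
    ("mail02", "exchange"),
    ("db01", "sql"),
    ("fs01", "file"),
    ("app01", "application"),
]

for data_type, members in plan_stores(servers).items():
    print(f"store for {data_type}: {', '.join(members)}")
```

For the inventory shown, the helper yields four groups, which matches the four virtual
libraries, NAS shares or Catalyst stores recommended in the example. How many streams to send
to each device is a separate decision that depends on the model.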