HP StoreOnce 4430 HP StoreOnce Backup System Concepts and Configuration Guide - Page 16

reclamation). The process of removing chunks of data is not an inline operation because this would
significantly impact performance. This process, termed “housekeeping”, runs on the appliance as
a background operation.
Housekeeping is triggered in different ways depending on the device type and backup application:
• VTL: media whose data retention period has expired will be overwritten by the backup
application. The act of overwriting triggers housekeeping of the expired data. If media is not
overwritten (for example, if the backup application uses blank media in preference to
overwriting), the expired media continues to occupy disk space.
• NAS shares: some backup applications overwrite files with the same file names after expiration;
others run an expiry check before writing new data to the share; others might run a quota check
before overwriting. Any of these actions triggers housekeeping.
• Catalyst stores: the backup application's clean-up process, whose schedule is configurable,
regularly checks for expired backups and removes their catalog entries. This provides a much more
structured space reclamation process.
See Housekeeping (page 106) for more information about configuring housekeeping and best
practices.
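The sketch below illustrates why reclamation is deferred and why, on a VTL, expired data keeps occupying disk space until its cartridge is overwritten. It is a hypothetical toy model, assuming fixed 4-byte chunks and per-cartridge reference counting; the class `ToyVtl` and its methods are invented for illustration and do not describe StoreOnce internals.

```python
import hashlib
from collections import Counter


class ToyVtl:
    """Hypothetical deduplicating virtual tape library (illustration only).

    Unique chunks are stored once and reference-counted per cartridge.
    """

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}          # chunk hash -> chunk data
        self.refs: Counter = Counter()              # chunk hash -> references from cartridges
        self.cartridges: dict[str, list[str]] = {}  # barcode -> chunk hashes of its backup

    def write(self, barcode: str, data: bytes) -> None:
        # Overwriting a cartridge drops the references held by its old backup.
        # Nothing is deleted here: physical removal is left to housekeeping.
        for h in self.cartridges.pop(barcode, []):
            self.refs[h] -= 1
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            piece = data[i:i + self.chunk_size]
            h = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(h, piece)  # inline dedupe: store each chunk once
            self.refs[h] += 1
            recipe.append(h)
        self.cartridges[barcode] = recipe

    def housekeeping(self) -> int:
        """Background pass: delete chunks no cartridge references; return bytes freed."""
        dead = [h for h, n in self.refs.items() if n == 0]
        for h in dead:
            del self.refs[h]
        return sum(len(self.chunks.pop(h)) for h in dead)

    def used_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    vtl = ToyVtl()
    vtl.write("CART01", b"AAAABBBBCCCC")  # first backup
    # CART01's retention expires, but only in the backup application's catalog.
    vtl.write("CART02", b"DDDDEEEEFFFF")  # application prefers blank media
    print(vtl.housekeeping(), vtl.used_bytes())  # 0 24: expired data still on disk
    vtl.write("CART01", b"GGGGHHHHIIII")  # overwriting the expired cartridge
    print(vtl.housekeeping(), vtl.used_bytes())  # 12 24: old chunks reclaimed
```

Splitting the work this way keeps the inline write path cheap (only reference counts change), while the costlier deletion runs later as a separate pass.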
Backup Application considerations
“Multiplexing” data streams from different sources into a single stream to get higher throughput
used to be a common best practice with physical tape drives. It was necessary to keep the physical
tape drive running in streaming mode, especially if the individual hosts could not supply data fast
enough. Multiplexing is not required with HP StoreOnce deduplication devices, and is in fact a BAD
practice when used with them.
Multi-stream or multiplex, what do they mean?
Multi-streaming is often confused with multiplexing; however, they are two different (but related)
terms. Multi-streaming is when multiple data streams are sent to the StoreOnce Backup system
simultaneously but separately. Multiplexing is a configuration whereby data from multiple sources
(for example, multiple client servers) is backed up to a single tape drive device by interleaving
blocks of data from each server simultaneously into a single combined stream. Multiplexing is a
hangover from physical tape devices; it was required to maintain good performance when source
servers were slow, because aggregating multiple source server backups into a single stream kept
the tape drive supplied with data.
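A minimal sketch of the difference, assuming each source delivers a list of labelled blocks and that the multiplexer takes one block from each source in turn (real backup applications interleave in whatever order the sources happen to deliver data). The function names `multi_stream` and `multiplex` are hypothetical.

```python
from itertools import zip_longest


def multi_stream(sources: dict[str, list[str]]) -> list[list[str]]:
    """Multi-streaming: each source is sent as its own, separate stream."""
    return [list(blocks) for blocks in sources.values()]


def multiplex(sources: dict[str, list[str]]) -> list[str]:
    """Multiplexing: blocks from every source are interleaved into ONE stream.

    Round-robin order is an assumption made for this illustration.
    """
    stream = []
    for row in zip_longest(*sources.values()):
        stream.extend(block for block in row if block is not None)
    return stream


if __name__ == "__main__":
    sources = {"srvA": ["a1", "a2", "a3"], "srvB": ["b1", "b2"]}
    print(multi_stream(sources))  # [['a1', 'a2', 'a3'], ['b1', 'b2']]
    print(multiplex(sources))     # ['a1', 'b1', 'a2', 'b2', 'a3']
```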
A multiplexed data stream configuration is NOT recommended for use with a StoreOnce system
or any other deduplicating device. The interleaving of data from multiple sources is not consistent
from one backup to the next, which significantly reduces the ability of the deduplication process
to work effectively; it also reduces restore performance. Take care to ensure that multiplexing is
not enabled by default in the backup application configuration. For example, when HP Data
Protector backs up multiple client servers in a single backup job, by default it multiplexes four
concurrent servers into a single stream. Disable this by reducing the “Concurrency” configuration
value for the tape device from 4 to 1.
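The check described above can be sketched as a small audit over device settings. This is a hypothetical representation: `DeviceSettings`, its fields, and the device names are invented for illustration and are NOT a Data Protector API or file format; only the “Concurrency” value and the 4-to-1 change come from the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DeviceSettings:
    """Hypothetical snapshot of one backup device's settings (not a Data Protector API)."""
    name: str
    concurrency: int  # number of client streams interleaved onto this device


def find_multiplexing(devices: list[DeviceSettings]) -> list[str]:
    """Names of devices whose concurrency would multiplex clients into one stream."""
    return [d.name for d in devices if d.concurrency > 1]


def disable_multiplexing(devices: list[DeviceSettings]) -> list[DeviceSettings]:
    """Return copies of the settings with concurrency reduced to 1 where needed."""
    return [replace(d, concurrency=1) if d.concurrency > 1 else d for d in devices]


if __name__ == "__main__":
    devices = [DeviceSettings("vtl_drive_1", 4), DeviceSettings("vtl_drive_2", 1)]
    print(find_multiplexing(devices))                        # ['vtl_drive_1']
    print(find_multiplexing(disable_multiplexing(devices)))  # []
```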
Why multiplexing is a bad practice
HP StoreOnce Backup systems rely on very similar, repetitive data streams to deduplicate data
effectively. When multiplexing is used, the backup data streams are not guaranteed to be similar,
because multiplexing can interleave the source streams differently from one backup to the next;
this drastically reduces the deduplication ratios.
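To make the effect concrete, the sketch below backs up two unchanged servers on two days and counts how many of the second day's chunks are new. It assumes fixed 8-byte chunks and a multiplexer whose slice length changes between runs because of host timing (8 bytes on Monday, 12 on Tuesday); real deduplication engines typically use variable-size chunking, so the exact numbers differ, but chunks that straddle interleave boundaries still fail to match. All names are hypothetical.

```python
import hashlib


def chunk_hashes(stream: bytes, size: int = 8) -> set[str]:
    """Fixed-size chunking (a simplification for illustration)."""
    return {hashlib.sha256(stream[i:i + size]).hexdigest()
            for i in range(0, len(stream), size)}


def interleave(sources: list[bytes], slice_len: int) -> bytes:
    """Multiplex sources into one stream, taking slice_len bytes from each in turn."""
    out = bytearray()
    offset = 0
    while any(offset < len(s) for s in sources):
        for s in sources:
            out += s[offset:offset + slice_len]
        offset += slice_len
    return bytes(out)


def new_chunks(first: set[str], second: set[str]) -> int:
    """Chunks in the second backup that the first backup did not already store."""
    return len(second - first)


# Two servers whose data does not change between Monday and Tuesday.
server_a = bytes(range(0, 64))
server_b = bytes(range(100, 164))

# Multi-streaming: each server is chunked as its own stream.
mon = chunk_hashes(server_a) | chunk_hashes(server_b)
tue = chunk_hashes(server_a) | chunk_hashes(server_b)

# Multiplexing: same data, but the interleave pattern differs between runs.
mon_mux = chunk_hashes(interleave([server_a, server_b], 8))
tue_mux = chunk_hashes(interleave([server_a, server_b], 12))

print("multi-stream new chunks:", new_chunks(mon, tue))             # 0
print("multiplexed new chunks:", new_chunks(mon_mux, tue_mux), "of", len(tue_mux))  # 11 of 16
```

Under these assumptions, Tuesday's multi-stream backup deduplicates completely, whereas 11 of the 16 multiplexed chunks must be stored again even though neither server changed.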
Multiplexing is not needed to get higher performance; quite the contrary. The best way to get
performance from any HP StoreOnce Backup system is to send multiple streams in parallel. Sending
only a single multiplexed stream actually reduces performance.
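A toy throughput model shows why parallel streams can outperform one multiplexed stream. It assumes each stream has its own ingest ceiling and that the appliance has an overall ceiling; the function `aggregate_throughput` and all figures (100 MB/s hosts, 150 MB/s per stream, 1000 MB/s appliance) are illustrative assumptions, not StoreOnce specifications.

```python
def aggregate_throughput(stream_rates: list[float], per_stream_cap: float,
                         appliance_cap: float) -> float:
    """Toy throughput model in MB/s; all limits are assumptions.

    Each stream is limited by how fast its source delivers data and by an
    assumed per-stream ingest ceiling; the sum is limited by the appliance.
    """
    total = sum(min(rate, per_stream_cap) for rate in stream_rates)
    return min(total, appliance_cap)


if __name__ == "__main__":
    hosts = [100.0, 100.0, 100.0, 100.0]  # four hosts at 100 MB/s each (illustrative)
    per_stream_cap = 150.0                # assumed per-stream ceiling
    appliance_cap = 1000.0                # assumed appliance ceiling
    multiplexed = aggregate_throughput([sum(hosts)], per_stream_cap, appliance_cap)
    parallel = aggregate_throughput(hosts, per_stream_cap, appliance_cap)
    print(multiplexed, parallel)  # 150.0 400.0
```

In this model the single multiplexed stream is capped at one stream's ceiling, while separate streams each get their own ceiling until the appliance limit is reached.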
16
HP StoreOnce technology