HP StoreOnce 4430 HP StoreOnce Backup System Concepts and Configuration Guidel - Page 84

Seeding and why it is required

As a general rule of thumb, however, allow a minimum bandwidth of 2 Mb/s per replication job.
For example, if a replication target can accept 8 concurrent replication jobs (HP 4220) and there
are enough concurrently running source jobs to reach that maximum, the WAN link needs to provide
16 Mb/s (8 x 2 Mb/s) for replication to run correctly at maximum efficiency. Below this threshold,
replication jobs may begin to pause and restart due to link contention. Note that this minimum
value does not ensure that replication will meet the performance requirements of the replication
solution; considerably more bandwidth may be required to deliver optimal performance.
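
The rule of thumb above is a simple multiplication, sketched here in Python. The function name and its parameters are illustrative only and are not part of any StoreOnce tool; the 2 Mb/s default is the per-job minimum quoted in the text.

```python
def min_wan_bandwidth_mbps(concurrent_jobs: int, per_job_mbps: float = 2.0) -> float:
    """Minimum WAN bandwidth (Mb/s) for a given number of concurrent replication jobs.

    This is only the floor below which jobs may pause and restart; it says
    nothing about whether replication finishes within the required window.
    """
    if concurrent_jobs < 0:
        raise ValueError("concurrent_jobs must be non-negative")
    return concurrent_jobs * per_job_mbps


# Example from the text: a target that accepts 8 concurrent replication jobs.
print(min_wan_bandwidth_mbps(8))  # 16.0
```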
Seeding and why it is required
One of the benefits of deduplication is the ability to identify unique data, which then enables us
to replicate between a source and a target StoreOnce Backup system, only transferring the unique
data identified. This process only requires low bandwidth WAN links, which is a great advantage
to the customer because it delivers automated disaster recovery in a very cost-effective manner.
The StoreOnce Management GUI reports bandwidth saving as a key metric of the replication
process and in general it is around the 95-98% mark (depending on data change rate).
However, before only unique data can be replicated between the source and target StoreOnce Backup
systems, we must first ensure that each site has the same hash codes or “bulk data” loaded on it.
This can be thought of as the reference data against which future backups are compared to see
whether their hash codes already exist on the target. The process of loading the same bulk data or
reference data onto the StoreOnce source and StoreOnce target is known as “seeding”.
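
The idea of hash-based replication, and why an unseeded target receives everything on the first pass, can be illustrated with a toy model. This is a minimal sketch under stated assumptions: fixed-size chunks, SHA-256 hashes, and an in-memory dictionary standing in for the target are all choices made for the demo, and they do not describe the chunking or hashing that StoreOnce actually uses.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small, chosen only to keep the demo readable


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Split data into fixed-size chunks and key each chunk by its SHA-256 hash."""
    chunks = (data[i:i + chunk_size] for i in range(0, len(data), chunk_size))
    return {hashlib.sha256(c).hexdigest(): c for c in chunks}


def replicate(source_data: bytes, target_store: dict) -> int:
    """Send only the chunks whose hash the target lacks; return the bytes sent."""
    sent = 0
    for digest, chunk in chunk_hashes(source_data).items():
        if digest not in target_store:
            target_store[digest] = chunk
            sent += len(chunk)
    return sent


target = {}                          # hypothetical, initially unseeded target
first_backup = b"AAAABBBBCCCCDDDD"
second_backup = b"AAAABBBBCCCCEEEE"  # one chunk differs from the first backup

seed_bytes = replicate(first_backup, target)    # unseeded: all 16 bytes are sent
delta_bytes = replicate(second_backup, target)  # seeded: only the new 4-byte chunk
saving = 1 - delta_bytes / len(second_backup)   # 0.75 in this toy example
```

The first call plays the role of seeding: until the target holds the reference hashes, there is nothing to match against, so the full data set crosses the link.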
NOTE: With Catalyst, the very first low-bandwidth backup effectively performs its own seeding
operation.
Seeding is generally a one-time operation that must take place before steady-state, low-bandwidth
replication can commence. Seeding can take place in a number of ways:
• Over the WAN link, although this can take some time for large volumes of data. A temporary
  increase in WAN bandwidth provision by your telco can often alleviate this problem.
• Using co-location, where two devices are physically in the same location and can use a GbE
  replication link for seeding (this is best for Active/Active and Active/Passive configurations).
  After seeding is complete, one unit is physically shipped to its permanent destination.
• Using a “Floating” StoreOnce device that moves between multiple remote sites (best for
  many-to-one replication scenarios).
• Using a form of removable media (physical tape or portable USB disks) to “ship data” between
  sites.
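
To get a feel for why seeding over the WAN can be slow compared with a co-located GbE link, a back-of-the-envelope transfer-time estimate helps. The sketch below is not the Sizing Tool's method; the function name, the decimal-terabyte convention, and the 0.8 link-efficiency factor are assumptions, and it ignores any reduction from deduplication or compression during the first transfer.

```python
def seeding_hours(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Rough hours needed to move data_tb terabytes (10**12 bytes each) over a link.

    link_mbps is the nominal link speed in Mb/s; efficiency is an assumed
    fraction of that speed actually available for the transfer.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


# Hypothetical 10 TB seed: a 16 Mb/s WAN link versus a 1000 Mb/s (GbE) LAN link.
print(f"WAN: {seeding_hours(10, 16):.1f} h")    # WAN: 1736.1 h
print(f"GbE: {seeding_hours(10, 1000):.1f} h")  # GbE: 27.8 h
```

Under these assumptions the same data set takes roughly ten weeks over the WAN and a little over a day on the LAN, which is the motivation for co-location, floating devices, or removable media.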
The recommended way to accelerate seeding is to co-locate the source and target systems on the
same LAN while performing the first replication. This process involves moving one or both of the
appliances and will therefore prevent them from running their normal backup routines.
To minimize disruption, seeding should ideally be done only once; for this to work, all backup
jobs that are going to be replicated must have completed their first full backup to the source
appliance before the seeding operation commences.
Once seeding is complete, there will typically be a 90+% hit rate, meaning that most of the hash
codes are already loaded on both the source and the target and only the unique data will be
transferred during replication.
It is good practice to allow for seeding time in your StoreOnce Backup system deployment plan, as
seeding can sometimes be very time-consuming or manually intensive work. The Sizing Tool calculates
expected seeding times over WAN and LAN to help set expectations for how long seeding will
take. In practice, a gradual migration of backup jobs to the StoreOnce appliance ensures that
seeding requirements grow gradually rather than in a sudden surge, with weekends being used
to perform high-volume seeding jobs.
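
One way to think about a gradual migration is as a packing problem: each weekend offers a limited seeding window, and backup jobs are assigned to weekends so that no window is overfilled. The sketch below uses a simple first-fit, largest-first heuristic. The job names, sizes, and per-weekend capacity are hypothetical, and this is only an illustration of the planning idea, not a feature of StoreOnce or the Sizing Tool.

```python
def plan_weekend_seeding(job_sizes_tb: dict, weekend_capacity_tb: float) -> list:
    """Assign backup jobs to weekends, largest job first, using first-fit packing.

    Returns one list of job names per weekend, in scheduling order.
    """
    weekends = []  # each entry: {"jobs": [names], "used_tb": float}
    ordered = sorted(job_sizes_tb.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        if size > weekend_capacity_tb:
            raise ValueError(f"{name} does not fit in a single weekend window")
        for weekend in weekends:
            if weekend["used_tb"] + size <= weekend_capacity_tb:
                weekend["jobs"].append(name)
                weekend["used_tb"] += size
                break
        else:
            # No existing weekend has room, so open a new one.
            weekends.append({"jobs": [name], "used_tb": size})
    return [weekend["jobs"] for weekend in weekends]


# Hypothetical jobs (sizes in TB) and an assumed 5 TB seeding capacity per weekend.
jobs = {"mail": 3.0, "files": 4.0, "db": 2.0, "vm": 1.0}
print(plan_weekend_seeding(jobs, 5.0))  # [['files', 'vm'], ['mail', 'db']]
```

The per-weekend capacity would in practice come from an estimate of how much data the available link can move in the window.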