HP D2D D2D Best Practices for VTL, NAS and Replication implementations (EH985- - Page 55


Amount of data in each backup

Data change per backup (deduplication ratio)

Number of D2D systems replicating

Number of concurrent replication jobs from each source

Number of concurrent replication jobs to each target

As a general rule of thumb, however, a minimum bandwidth of 2 Mb/s per replication job should be allowed. For example, if a replication target is capable of accepting 8 concurrent replication jobs (HP D2D4112) and there are enough concurrently running source jobs to reach that maximum, the WAN link needs to be able to provide 16 Mb/s to ensure that replication will run correctly at maximum efficiency; below this threshold, replication jobs will begin to pause and restart due to link contention. It is important to note that this minimum value does not ensure that replication will meet the performance requirements of the replication solution; considerably more bandwidth may be required to deliver optimal performance.
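The rule of thumb above is simple arithmetic, and a minimal sketch makes it easy to apply to other job counts. The function name `min_wan_bandwidth_mbps` and its parameters are illustrative assumptions, not part of any HP tool; only the 2 Mb/s per-job figure and the 8-job example come from the text.

```python
# Rule-of-thumb minimum from the text: 2 Mb/s per concurrent replication job.
MIN_MBPS_PER_JOB = 2


def min_wan_bandwidth_mbps(concurrent_jobs, per_job_mbps=MIN_MBPS_PER_JOB):
    """Return the minimum WAN bandwidth (Mb/s) needed to avoid link contention.

    This is only the floor below which jobs pause and restart; it says
    nothing about meeting a replication window.
    """
    if concurrent_jobs < 0:
        raise ValueError("concurrent_jobs must be non-negative")
    return concurrent_jobs * per_job_mbps


# The example from the text: a target accepting 8 concurrent jobs.
print(min_wan_bandwidth_mbps(8))  # 16
```

The result is a lower bound for stable operation, not a sizing figure for replication performance.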

Seeding and why it is required

One of the benefits of deduplication is the ability to identify unique data, which then enables us to replicate

between a source and a target D2D, only transferring the unique data identified. This process only requires low

bandwidth WAN links, which is a great advantage to the customer because it delivers automated disaster

recovery in a very cost-effective manner.

However, prior to being able to replicate only unique data between source and target D2D, we must first ensure that each site has the same hash codes or "bulk data" loaded on it. This can be thought of as the reference data against which future backups are compared to see if the hash codes exist already on either source or target. The process of getting the same bulk data or reference data loaded on the D2D source and D2D target is known as "seeding".

Seeding is generally a one-time operation which must take place before steady-state, low-bandwidth replication can commence. Seeding can take place in a number of ways:

Over the WAN link, although this can take some time for large volumes of data.

Using co-location, where two devices are physically in the same location and can use a GbE replication link for seeding. After seeding is complete, one unit is physically shipped to its permanent destination.

Using a form of removable media (physical tape or portable USB disks) to "ship data" between sites.

Once seeding is complete there will typically be a 90+% hit rate, meaning most of the hash codes are already

loaded on the source and target and only the unique data will be transferred during replication.
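The idea of a hit rate can be illustrated with a small sketch. It assumes fixed-size chunks and SHA-256 hashes purely for illustration; the chunking and hashing that the D2D appliance actually uses are not described here, and the names `chunk_hashes`, `replication_plan` and `CHUNK_SIZE` are hypothetical.

```python
import hashlib

# Hypothetical fixed chunk size; the appliance's real chunking is not described here.
CHUNK_SIZE = 4096


def chunk_hashes(data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size chunks and return one SHA-256 hex digest per chunk."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def replication_plan(backup, target_hashes):
    """Return (hit_rate, missing), where missing lists hashes the target lacks."""
    hashes = chunk_hashes(backup)
    if not hashes:
        return 1.0, []
    missing = [h for h in hashes if h not in target_hashes]
    hit_rate = (len(hashes) - len(missing)) / len(hashes)
    return hit_rate, missing


# Seeded reference data: ten distinct chunks already present on the target.
seeded = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
target_hashes = set(chunk_hashes(seeded))

# A later backup in which only the last chunk has changed.
new_backup = seeded[:-CHUNK_SIZE] + bytes([99]) * CHUNK_SIZE
hit_rate, missing = replication_plan(new_backup, target_hashes)
print(f"hit rate: {hit_rate:.0%}, chunks to send: {len(missing)}")
```

Without seeding, `target_hashes` would be empty, every chunk would be a miss, and the whole backup would have to cross the link.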

It is good practice to plan for seeding time in your D2D deployment plan as it can sometimes be very time

consuming or manually intensive work.
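For planning purposes, a rough transfer-time estimate can help when comparing seeding over the WAN with seeding over a co-located GbE link. The sketch below is a simple size-over-bandwidth calculation under stated assumptions: decimal units, no compression, and an optional `efficiency` factor for protocol overhead. The function name, the 1000 GB data volume and the link speeds are hypothetical examples, not measured D2D figures.

```python
def seeding_time_hours(data_gb, link_mbps, efficiency=1.0):
    """Estimate hours needed to move data_gb (decimal GB) over a link_mbps (Mb/s) link.

    efficiency is the assumed usable fraction of the link (0 < efficiency <= 1).
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid data size, link speed or efficiency")
    megabits = data_gb * 1000 * 8  # GB -> MB -> megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical 1000 GB of seed data over two different links.
wan_hours = seeding_time_hours(1000, 16)    # 16 Mb/s WAN link
gbe_hours = seeding_time_hours(1000, 1000)  # co-located GbE link at line rate
print(f"WAN: {wan_hours:.1f} h, GbE: {gbe_hours:.1f} h")
```

Under these assumptions the WAN transfer takes close to six days while the co-located transfer takes a little over two hours, which is why the choice of seeding method belongs in the deployment plan. Real throughput is likely to be lower than line rate.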

During the seeding process it is recommended that no other operations are taking place on the source D2D, such

as further backups or tape copies. It is also important to ensure that the D2D has no failed disks and that RAID

parity initialization is complete because these will impact performance.

When seeding over fast networks (co-located D2D devices) it should be expected that performance to replicate a

cartridge or file is similar to the performance of the original backup. If, however, a lot of replication jobs are

running to a single target appliance from several source appliances, performance will be reduced due to the

amount of disk activity required on the target system.

Replication models and seeding

The diagrams in Replication usage models starting on page 49 indicate the different replication models supported by HP D2D Backup Systems; the complexity of the replication models has a direct influence on which seeding process is best. For example, an Active-Passive replication model can easily use co-location to quickly seed the target device, whereas co-location may not be the best seeding method to use with a 50:1, many-to-1 replication model.
