HP D2D Backup System Concepts guide (EH985-90915, March 2011) - Page 33



6 Housekeeping
In this chapter:
• What is housekeeping?
• What effect does housekeeping have on performance?
• Why is housekeeping important?
• What do I need to do?
What is housekeeping?
If data is deleted from the D2D system (e.g. a virtual cartridge is overwritten or erased), any unique
chunks are marked for removal, and any non-unique chunks are de-referenced and their reference
count is decremented. Chunks are not removed inline because
this would significantly impact performance. Instead, this process, termed “housekeeping”, runs on the
appliance as a background operation. It runs on a per-cartridge and per-NAS-file basis and is triggered
as soon as the cartridge is unloaded and returned to its storage slot, or a NAS file has completed
writing and has been closed by the appliance.
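The reference counting described above can be illustrated with a minimal sketch. It is a toy model only, not the appliance's implementation: the class ChunkStore, its method names and the single-letter chunk identifiers are all hypothetical.

```python
from collections import Counter


class ChunkStore:
    """Toy model of reference-counted chunks (hypothetical, not the D2D code)."""

    def __init__(self):
        self.ref_counts = Counter()      # chunk id -> number of references
        self.marked_for_removal = set()  # chunks waiting for housekeeping

    def write(self, chunk_ids):
        """Add one reference for every chunk of a cartridge or NAS file."""
        for chunk in chunk_ids:
            self.ref_counts[chunk] += 1
            self.marked_for_removal.discard(chunk)  # chunk is in use again

    def delete(self, chunk_ids):
        """De-reference chunks; nothing is physically removed inline."""
        for chunk in chunk_ids:
            if chunk not in self.ref_counts:
                continue  # unknown chunk, nothing to de-reference
            self.ref_counts[chunk] -= 1
            if self.ref_counts[chunk] == 0:
                # No remaining references: the chunk was unique to the deleted data.
                del self.ref_counts[chunk]
                self.marked_for_removal.add(chunk)

    def housekeep(self):
        """Background step: reclaim marked chunks and return how many were freed."""
        reclaimed = len(self.marked_for_removal)
        self.marked_for_removal.clear()
        return reclaimed


store = ChunkStore()
store.write(["a", "b", "c"])   # cartridge 1
store.write(["b", "c", "d"])   # cartridge 2 shares chunks b and c
store.delete(["a", "b", "c"])  # cartridge 1 is erased
```

In this sketch, erasing cartridge 1 marks only chunk "a" for removal, because "b" and "c" are still referenced by cartridge 2. Space is reclaimed only when housekeep() runs later.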
What effect does housekeeping have on performance?
Whilst the housekeeping process can run as soon as a virtual cartridge is returned to its slot, this
could cause a high level of disk access and processing overhead, which would affect other
operations such as further backups, restores, tape offload jobs or replication.
To avoid this problem, the housekeeping process checks for available resources before
running and, if other operations are in progress, dynamically holds off to
prevent impacting their performance. It is, however, important to note that the
hold-off is not binary (i.e. simply on or off), so even if backup jobs are in progress, some low level of
housekeeping still takes place, which may have a slight impact on backup performance.
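One way to picture a non-binary hold-off is as a throttle that shrinks as other load grows but never reaches zero. The guide does not describe the appliance's actual algorithm, so the sketch below is purely illustrative: the function housekeeping_share, the linear formula and the 5% floor are all assumptions.

```python
def housekeeping_share(load, floor=0.05):
    """Return the fraction of full speed that housekeeping may use.

    load  -- utilisation by other operations, from 0.0 (idle) to 1.0 (saturated)
    floor -- hypothetical minimum share kept even under full load
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0.0 and 1.0")
    # Back off in proportion to the load, but never stop completely.
    return max(floor, 1.0 - load)
```

With these assumed values, an idle system gives housekeeping its full share (1.0), while a fully loaded system still leaves it the small floor (0.05). That floor corresponds to the slight impact on backup performance noted above.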
Why is housekeeping important?
Housekeeping is needed to maximize the deduplication efficiency of the
appliance, so it is important to ensure that it has enough time to complete. Running
backup, restore, tape offload and replication operations with no break (i.e. 24 hours a day) will
result in housekeeping never being able to complete.
As a general rule, a number of minutes per day should be allowed for every 100 GB of data
overwritten on a virtual cartridge or NAS share. For example, if the backup
application overwrites two cartridges in different virtual libraries every day, with 400 GB of data on each
cartridge (800 GB in total), an HP D2D4106 appliance would need approximately 30 minutes of quiescent time
over the course of the next 24 hours to run housekeeping, de-reference data and reclaim
any free space.
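The arithmetic in this example can be sketched as follows. The guide gives no fixed per-100 GB figure, so the rate used here is only what the single D2D4106 example implies (30 minutes for 800 GB). Linear scaling is an assumption, and the function name quiescent_minutes is hypothetical. Other models and workloads may differ.

```python
def quiescent_minutes(overwritten_gb, minutes_per_100_gb):
    """Estimate the daily quiescent time needed, assuming linear scaling."""
    return overwritten_gb / 100 * minutes_per_100_gb


# Rate implied by the worked example only: 30 minutes for 2 x 400 GB.
EXAMPLE_RATE = 30 / (2 * 400 / 100)  # minutes per 100 GB on a D2D4106

estimate = quiescent_minutes(2 * 400, EXAMPLE_RATE)
```

Here EXAMPLE_RATE works out to 3.75 minutes per 100 GB, and estimate reproduces the 30 minutes quoted for the example.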
What do I need to do?
Configuring backup rotation schemes correctly is very important to ensure the maximum efficiency
of the product; doing so reduces the amount of housekeeping that is required and creates a
predictable load. As a backup on one library or directory in a NAS share finishes, it triggers
housekeeping, which then impacts the performance of the backup on the next library or NAS
share. If backup jobs can be scheduled to complete at the same time, the impact of housekeeping
on backup performance will be greatly reduced.
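As a rough illustration of scheduling jobs to complete together, start times can be worked backwards from a common finish time. This is a planning sketch only: the job names, the durations and the function start_times are hypothetical, and real job durations would have to be estimated from previous runs.

```python
from datetime import datetime, timedelta


def start_times(finish_at, durations):
    """Return a start time per job so that every job ends at finish_at."""
    return {name: finish_at - duration for name, duration in durations.items()}


finish = datetime(2011, 3, 1, 6, 0)  # target completion time for all jobs
jobs = {
    "library_1": timedelta(hours=3),
    "nas_share_1": timedelta(hours=1, minutes=30),
}
plan = start_times(finish, jobs)
```

In this plan the longer job starts at 03:00 and the shorter one at 04:30, so both finish at 06:00. Housekeeping is then triggered once backups are over, not while another backup is still running.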
Large housekeeping loads are created when large numbers of cartridges are manually erased or
re-formatted. In general, all media overwrites should be controlled by the backup rotation scheme.