Dell DX6004S DX Object Storage Administration Guide - Page 57




Copyright © 2010 Caringo, Inc.
All rights reserved
52
Version 5.0
December 2010
B.3.3.5. volumeRecoverySuspend
Writing to this object allows an administrator to suspend volume recovery behavior in the cluster
during an upgrade or a network outage.
B.3.4. Practical SNMP with DX Storage
This section outlines practical approaches to using the built-in SNMP agent to monitor the health
and operation of a DX Storage cluster. Although an administrator may set up a simple ICMP ping
monitor of a DX Storage node, the SNMP variables give more detailed indications of disk and
capacity problems.
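As an illustration of polling a single variable, the following minimal sketch wraps the Net-SNMP `snmpget` command-line tool from Python. Several things here are assumptions rather than anything stated in this guide: that `snmpget` is installed on the monitoring host, that the agent answers SNMP v2c requests, that the community string is `public`, and that the caller supplies an OID the tool can resolve. The function names are hypothetical.

```python
import subprocess


def build_snmpget_command(host, oid, community="public", timeout_s=2, retries=1):
    """Return the argv list for a Net-SNMP `snmpget` call that prints only the value.

    The community string, SNMP version, timeout, and retry count are
    illustrative defaults, not values taken from the DX Storage guide.
    """
    return [
        "snmpget",
        "-v", "2c",            # SNMP protocol version (assumed)
        "-c", community,       # community string (assumed)
        "-t", str(timeout_s),  # per-request timeout in seconds
        "-r", str(retries),    # number of retries before giving up
        "-Oqv",                # quick output, value only
        host,
        oid,
    ]


def read_variable(host, oid, **kwargs):
    """Return the variable's value as text, or None if the agent did not answer."""
    cmd = build_snmpget_command(host, oid, **kwargs)
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # snmpget is missing, or the call hung past the overall limit.
        return None
    if result.returncode != 0:
        # snmpget exits non-zero on timeouts and other errors.
        return None
    return result.stdout.strip().strip('"')
```

Returning `None` on any failure lets the caller treat a timeout as its own condition, separate from a state value that was read successfully but is not "ok".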
B.3.4.1. Health Monitoring
The following variables are useful for monitoring the basic health of a DX Storage node. In the
volume table, the index n runs from 1 to the number of disk volumes.
• caringo.castor.castorState : should equal “ok”
• caringo.castor.castorVolTable.volEntry.volState.n : should equal “ok”
• caringo.castor.castorVolTable.volEntry.volErrors.n : should be zero
If the monitoring console receives timeouts when trying to read these variables, the node is not
responding and should be investigated. If a state value is anything other than “ok”, the node or
the disk volume has left its normal operating state.
The valid states for a node are: ok, retiring, retired.
The valid states for a disk volume are: ok, retiring, retired, unavailable.
Any non-zero value in a volume’s error count indicates that a hard error has surfaced from the disk
hardware, through the OS driver, to the DX Storage process.
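The health rules above can be sketched as a small check over values that have already been read from the agent. This is an illustrative sketch only: the function name and the input layout (a node state plus a dictionary keyed by volume index n, with `None` standing for a read that timed out) are assumptions made for the example, not part of DX Storage.

```python
def check_node_health(node_state, volumes):
    """Return a list of problem descriptions; an empty list means healthy.

    node_state: value read from castorState, or None if the read timed out.
    volumes: dict mapping volume index n to a (volState, volErrors) pair;
             either element is None if the read timed out.
    """
    problems = []

    if node_state is None:
        problems.append("node: timeout reading castorState")
    elif node_state != "ok":
        # Per the guide, the other valid node states are retiring and retired.
        problems.append(f"node: state is {node_state!r}")

    for n in sorted(volumes):
        vol_state, vol_errors = volumes[n]
        if vol_state is None or vol_errors is None:
            problems.append(f"volume {n}: timeout reading volume table")
            continue
        if vol_state != "ok":
            problems.append(f"volume {n}: state is {vol_state!r}")
        if int(vol_errors) != 0:
            # Any non-zero count means a hard error reached the DX Storage process.
            problems.append(f"volume {n}: {int(vol_errors)} hard error(s) reported")

    return problems
```

For example, `check_node_health("ok", {1: ("ok", 0)})` returns an empty list, while a node in state "retiring" or a volume with a non-zero error count each add one entry.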
B.3.4.2. Capacity Monitoring
The following variables can be monitored and collected for capacity alerting and reporting. In the
volume table, the index n runs from 1 to the number of disk volumes.
• caringo.castor.castorFreeSlots : should be greater than 0
• caringo.castor.castorVolTable.volEntry.volMaxMbytes.n
• caringo.castor.castorVolTable.volEntry.volFreeMbytes.n
• caringo.castor.castorVolTable.volEntry.volTrappedMbytes.n
The castorFreeSlots variable indicates how many more objects a node can hold before it exhausts
its memory index. If this happens, the node will be unable to store additional objects until streams
are deleted or moved to other cluster nodes.
To compute the amount of disk space that is available for writing content, add the values
volFreeMbytes and volTrappedMbytes. Thus, the percent free space on a disk volume is:
100 × (volFreeMbytes + volTrappedMbytes) / volMaxMbytes
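The free-space calculation and the castorFreeSlots rule can be sketched as follows, with the result expressed as a percentage. The function names, the input layout, and the 10% alert threshold are assumptions chosen for illustration; the guide does not prescribe a threshold.

```python
def percent_free(vol_free_mbytes, vol_trapped_mbytes, vol_max_mbytes):
    """Percent of a volume available for writing content (free plus trapped space)."""
    if vol_max_mbytes <= 0:
        raise ValueError("volMaxMbytes must be positive")
    return 100.0 * (vol_free_mbytes + vol_trapped_mbytes) / vol_max_mbytes


def capacity_alerts(free_slots, volumes, min_percent_free=10.0):
    """Return a list of capacity warnings; an empty list means no alert.

    free_slots: value read from castorFreeSlots.
    volumes: dict mapping volume index n to a
             (volFreeMbytes, volTrappedMbytes, volMaxMbytes) triple.
    min_percent_free: hypothetical alert threshold, not taken from the guide.
    """
    alerts = []
    if free_slots <= 0:
        alerts.append("node: castorFreeSlots is not greater than 0")
    for n in sorted(volumes):
        free_mb, trapped_mb, max_mb = volumes[n]
        pct = percent_free(free_mb, trapped_mb, max_mb)
        if pct < min_percent_free:
            alerts.append(f"volume {n}: {pct:.1f}% free, below {min_percent_free:.1f}%")
    return alerts
```

With volFreeMbytes = 200, volTrappedMbytes = 50, and volMaxMbytes = 1000, `percent_free` returns 25.0.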
Similarly, the percent of space being used by current content is: