Computer Associates SQLSTQ99000600 Diagnostics Guide - Page 25




Separate the Problem from the Symptoms
Where It Happened
The next step is to further isolate the location of the error to determine how
widespread it is. Useful questions to ask include the following:
■ Which machines are affected (for example, was software inventory
  collected from all but three computers in a selected computer group, or is
  the problem specific to a particular computer or subnet)?
  Note: Limit the number of functions being performed by the suspect
  machine to further isolate the problem. This also reduces the amount
  of data that must be sifted through in the log files.
■ If more than one machine is affected, what do they have in common (for
  example, does the installation fail only on computers using a dial-up
  connection)?
■ Which specific component had the error (for example, DSM Agent on
  MachineA)?
■ If more than one component is affected, how do these components relate
  to one another?
By isolating the specific machines on which the error is observed, you can find
other similarities that may identify the root of the problem.
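The comparison described above can be sketched in a few lines of Python. This is only an illustration, not part of the product: the attribute names (connection, subnet, os), the sample values, and the helper functions are all hypothetical, and the machine data would have to come from your own inventory. The sketch keeps the attributes that every affected machine shares and then discards any that an unaffected machine also has.

```python
from typing import Dict, List

# Hypothetical inventory records; the keys and values are illustrative only.
Machine = Dict[str, str]


def common_attributes(machines: List[Machine]) -> Dict[str, str]:
    """Return the attribute/value pairs shared by every machine in the list."""
    if not machines:
        return {}
    shared = set(machines[0].items())
    for machine in machines[1:]:
        shared &= set(machine.items())
    return dict(shared)


def distinguishing_attributes(
    affected: List[Machine], unaffected: List[Machine]
) -> Dict[str, str]:
    """Attributes common to all affected machines and absent from every unaffected one."""
    shared = common_attributes(affected)
    return {
        key: value
        for key, value in shared.items()
        if all(machine.get(key) != value for machine in unaffected)
    }


affected = [
    {"name": "MachineA", "connection": "dial-up", "subnet": "10.0.1", "os": "Windows XP"},
    {"name": "MachineB", "connection": "dial-up", "subnet": "10.0.2", "os": "Windows XP"},
]
unaffected = [
    {"name": "MachineC", "connection": "lan", "subnet": "10.0.1", "os": "Windows XP"},
]

print(distinguishing_attributes(affected, unaffected))  # {'connection': 'dial-up'}
```

In this made-up data set the only attribute that separates the failing machines from the working one is the dial-up connection, which mirrors the example question in the list above. A result like this is a lead to investigate, not proof of the cause.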
When It Happened
Another crucial step is to identify the time the problem occurred. This can help
you limit potential causes to only those events that happened during that time.
Useful questions to ask include the following:
■ What function or functions were being performed at the time?
■ When did the error first occur (for example, the Monday after a long
  weekend, shortly after the router was replaced, after a change to daylight
  saving time, and so on)?
■ Has the problem been repeated since that first observation? If so, is there
  a pattern to that repetition (for example, every Friday after the weekly
  backup is performed)?
■ What changes were made to the product before the problem occurred (for
  example, was an upgrade recently applied)?
■ How often does the problem recur (for example, if it occurs during the
  execution of a particular process, does it ALWAYS occur when that process
  executes)?
■ What other events occurred at that time (for example, does the error
  occur only during times of heavy network traffic and disappear when the
  load is lighter)?
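Two of the questions above, whether the repetition follows a pattern and what else happened at that time, lend themselves to a quick script once you have noted the error times. The following Python sketch is an assumption-laden illustration: the timestamps, the event descriptions, and the function names weekday_counts and events_near are invented for the example and do not correspond to any product log format.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Tuple

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_counts(occurrences: List[datetime]) -> Counter:
    """Count how many times the error occurred on each day of the week."""
    return Counter(WEEKDAYS[ts.weekday()] for ts in occurrences)


def events_near(
    events: List[Tuple[datetime, str]], when: datetime, window: timedelta
) -> List[str]:
    """Return descriptions of events that fall within +/- window of a given time."""
    return [desc for ts, desc in events if abs(ts - when) <= window]


# Hypothetical observations: three error times and two site events.
errors = [
    datetime(2024, 3, 1, 22, 30),
    datetime(2024, 3, 8, 22, 35),
    datetime(2024, 3, 15, 22, 28),
]
events = [
    (datetime(2024, 2, 28, 9, 0), "router replaced"),
    (datetime(2024, 3, 1, 22, 0), "weekly backup started"),
]

print(weekday_counts(errors))
print(events_near(events, errors[0], timedelta(hours=1)))
```

With this sample data every error falls on a Friday, and the only event within an hour of the first error is the weekly backup. The window size is a judgment call: too narrow hides slow-acting causes, too wide brings back the noise you are trying to filter out.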
Chapter 4: Basic Troubleshooting
4–3