IBM E027SLL-H Troubleshooting Guide - Page 142

• During startup, do not start any user interaction when no console is connected.
• Ensure that the Korn shell (ksh) is available. In general, any shell can be used for .profile except csh, which has problems with output redirection.
• Eliminate any logic that can produce an error from evaluating an undefined variable, or use Korn shell controls to suppress those errors.
• Set the PATH statements to what is needed for the environment.
• Ensure that the .profile completes and does not loop.
If any of these requirements is violated, the result can be a failure to start, or even a failure of normal server processes to start. The .profile should be simple and clear. This might require creating a special user ID for this purpose to avoid impacting other users.
Heartbeat issues when running on a Linux guest using VMware
When the Linux operating system is run as a guest using VMware, the clock of the Linux guest can run faster or slower than real-world time. If any IBM Tivoli Monitoring products are installed on Linux guests whose clocks are not running correctly, the result can be erratic system behavior.
For example, if the Linux OS monitoring agent is installed on a Linux guest whose clock runs too slowly, the agent does not produce its heartbeats on time. The agent then repeatedly goes OFFLINE and ONLINE at the Tivoli Enterprise Monitoring Server, because each heartbeat arrives after the expected heartbeat interval has expired.
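The effect of a slow guest clock on heartbeats can be illustrated with a small model. The Python below is a minimal sketch under stated assumptions, not a description of how the agent or the monitoring server actually schedules or checks heartbeats. It assumes that the agent sends a heartbeat every `interval_s` seconds as measured by the guest clock. It also assumes that the server treats a heartbeat as late if more than `interval_s + tolerance_s` real seconds have passed since the previous one. The function names, the 600-second interval, and the 60-second tolerance are all hypothetical. The clock rate is computed from two paired readings, such as the ones recorded in the procedure in this section.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two HH:MM:SS readings.

    Assumes the readings are less than 24 hours apart; if `end` is earlier
    than `start`, the interval is taken to cross midnight.
    """
    fmt = "%H:%M:%S"
    delta = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    if delta < 0:
        delta += 24 * 60 * 60  # the readings straddle midnight
    return delta


def clock_rate(real_start: str, real_end: str, guest_start: str, guest_end: str) -> float:
    """Guest-clock seconds that pass per real second (1.0 means the guest keeps time)."""
    return elapsed_seconds(guest_start, guest_end) / elapsed_seconds(real_start, real_end)


def heartbeat_is_late(rate: float, interval_s: float, tolerance_s: float) -> bool:
    """Hypothetical check: does a heartbeat sent every `interval_s` guest seconds
    arrive more than `interval_s + tolerance_s` real seconds after the previous one?"""
    real_gap_s = interval_s / rate  # a slow clock (rate < 1) stretches the real gap
    return real_gap_s > interval_s + tolerance_s


if __name__ == "__main__":
    # Example readings: real clock 10:30:00 -> 10:40:00, Linux clock 10:20:35 -> 10:26:35.
    rate = clock_rate("10:30:00", "10:40:00", "10:20:35", "10:26:35")
    print(f"guest clock rate: {rate:.2f}")  # prints: guest clock rate: 0.60
    # Hypothetical 600 s heartbeat interval with a 60 s tolerance.
    print("heartbeat late:", heartbeat_is_late(rate, interval_s=600, tolerance_s=60))  # prints: heartbeat late: True
```

With these readings the guest clock advances only 0.6 seconds per real second. In this model, each heartbeat therefore arrives about 1000 real seconds after the previous one instead of 600, well past the assumed tolerance. A rate above 1.0 (a fast clock) would make heartbeats arrive early rather than late.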
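VMware is aware of this issue and has published several articles that address it. Search on “linux guest clock” in the VMware Knowledge Base (http://kb.vmware.com/selfservice/microsites/microsite.do). See also the VMware paper on timekeeping in virtual machines (http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf).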
How to tell if you have this problem:
A simple way to determine whether your Linux guest has a clock problem is to benchmark it against a real-world clock. Here is an example of a procedure that you can use:
1. From a Linux shell prompt, type "date" to get the current system date and time. As you press Enter, look at a "real" clock (wall clock, watch, and so on) to get the real-world time in minutes and seconds. Record the time from both your Linux guest and the "real" clock.
   Example: Real Clock = 10:30:00, Linux Clock = 10:20:35
2. After 10 real-time minutes have elapsed, type the "date" command again. (Type the "date" command ahead of time so that you only have to press Enter when the 10 minutes are up.) Record the new times from both your Linux guest and the "real" clock.
   Example: Real Clock = 10:40:00, Linux Clock = 10:26:35
3. Compute the elapsed time for both your Linux guest and the "real" clock. If the elapsed times are not the same, your Linux guest has a clock problem.
   Because we waited exactly 10 minutes by the "real" clock, we would expect the elapsed time on the Linux clock to also be 10 minutes. Using the figures above, the elapsed time for the Linux guest is 6 minutes (10:26:35 - 10:20:35). Since this is less than the real-world time, this means that
124
IBM Tivoli Monitoring: Troubleshooting Guide