HP ProLiant DL380G5-WSS 3.7.0 HP StorageWorks HP Scalable NAS File Serving Software - Page 370



unlikely; however, if HP Scalable NAS cannot be started on any server in the
cluster, you can use the following command to determine whether all membership
partitions have a valid Cluster-ID:

mprepair --sync-clusterids

The command displays the Cluster-IDs found in each membership partition and
flags those partitions containing an invalid ID. You can then specify whether
you want the command to repair the partitions having a mismatched Cluster-ID.
The mprepair --get_current_mps command can also be used to obtain more
information about the membership partitions.
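A typical check-and-repair pass, using only the two options described above,
might look like the following sketch (mprepair's prompts and output vary by
site and release, so none are shown here):

```
# Inspect the Cluster-ID stored in each membership partition; mismatched
# partitions are flagged and you are asked whether to repair them.
mprepair --sync-clusterids

# List additional detail about the current membership partitions.
mprepair --get_current_mps
```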
Increase the membership partition timeout
Under heavy I/O load, I/O timeouts can occur on membership partition accesses.
The I/O timeouts are reported as SCSI error : <...> return code = 50000 in
the file /var/log/messages. The I/O timeouts can cause problems such as the
following:

• Excessive path switching.
• Filesystems appearing to be hung when a node crashes. Large numbers of I/O
  timeouts can extend the time it takes to fence the node, and filesystem
  operations cannot resume until the node is fenced.
If your site is experiencing the above problems due to I/O timeouts, you may want
to increase the I/O timeout parameter for accessing membership partitions. You will
need to set the timeout on each node currently in the cluster and on any nodes added
to the cluster.
Before setting the timeout, be sure to stop HP Scalable NAS.
To increase the timeout, edit the file /etc/opt/hpcfs/mxinit.conf. Locate the
following line in the file:

# sanpulse_start_options = { "--mxinit" };

You will need to add the parameter "-o sdmp_io_timeout=<millisec>" to the
start options. Also remove the comment character (#) from the beginning of
the line:

sanpulse_start_options = { "--mxinit","-o sdmp_io_timeout=<millisec>" };

<millisec> is the number of milliseconds to be used as the I/O timeout for
accessing membership partitions. The default value is 30,000 ms (30 seconds).
Be sure to increase the timeout value in small increments, such as 5,000 ms.
If the timeout
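As a concrete illustration of the edit above, the following sketch performs
the uncomment-and-add change with sed. The 35000 ms value (the 30,000 ms
default plus one 5,000 ms increment) is illustrative, and the sketch works on
a scratch copy of the line so it is safe to run anywhere; on a real cluster
node you would edit /etc/opt/hpcfs/mxinit.conf directly, with HP Scalable NAS
stopped first:

```shell
# Work on a scratch copy so this sketch is safe to run anywhere; on a
# cluster node the target would be /etc/opt/hpcfs/mxinit.conf.
conf=$(mktemp)
echo '# sanpulse_start_options = { "--mxinit" };' > "$conf"

# Uncomment the line and append the sdmp_io_timeout option.
# 35000 ms = the 30000 ms default plus one 5000 ms increment (illustrative).
sed -i 's|^# *sanpulse_start_options = { "--mxinit" };|sanpulse_start_options = { "--mxinit","-o sdmp_io_timeout=35000" };|' "$conf"

cat "$conf"
rm -f "$conf"
```

Remember that the same edit must be made on every node currently in the
cluster and on any node later added to it.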
SAN maintenance
370