IBM BS029ML Self Help Guide - Page 174

5.3 Problem determination

Dealing with WebSphere Portal Server problems can at first seem a daunting prospect, even to the most accomplished Portal Administrator. However, with a little knowledge and direction, you can take control of the situation, quickly identifying the problem and bringing it to a successful resolution.

In this section, we endeavour to share some of the techniques commonly used and endorsed by the IBM WebSphere Support Team for determining the root cause of problems and solving them. Understanding the components involved with WebSphere Portal Server will greatly help your diagnostic and problem-solving skills. We therefore strongly recommend that, to debug a problem efficiently, you first understand how the WebSphere Portal Server components work and how they are integrated.

This section of the Redpaper complements WebSphere Portal Version 6 Enterprise Scale Deployment Best Practices, SG24-7387, and the InfoCenter, both of which contain good information about problem determination and troubleshooting. Refer to:

http://publib.boulder.ibm.com/infocenter/wpdoc/v6r0/index.jsp?topic=/com.ibm.wp.ent.doc/wps/pd_stepone.html
http://www.redbooks.ibm.com/redbooks/pdfs/sg247387.pdf

5.3.1 Identify the failing component

WebSphere Portal Server should be considered a horizontal framework rather than a single application, because a complete Portal solution comprises many different components. With all these components in place, it is very important to narrow down exactly which component is failing when there is a problem.

5.3.2 JVM problems

Understanding the JVM is very important because the IBM WebSphere platform is built on Java and the Application Server is a Java-servlet-based application deployment environment for server-side applications and JavaBeans. Let us discuss some common problems with the JVM and how we can diagnose them.

When is a crash a crash and not a hang

The properties of a crash and a hang at either level are basically the same. A hang occurs when a process or thread gets stuck waiting for something (usually a lock of some kind or some software or hardware resource) to become free. Waiting for a lock or a resource is not uncommon; a hang occurs only when that lock or resource never becomes available.

It is also important to note that hangs can sometimes be diagnosed too early. For example, if a resource is very busy at a given time, a process or thread that needs that resource may have to wait an unusually long time for it to become free. A user may be unaware that the resource is busy and see only the process waiting, and so interpret this as a hang when the system is actually working as designed, albeit slowly.

A crash is very different from a hang and occurs when an unexpected hardware or software error occurs. When such an error occurs, special error handling is ideally invoked to dump diagnostic information and reports that can help track down the cause of the error. Crashes can be thought of as point-in-time problems that require post-mortem analysis, and hangs can be thought of as real-time problems that one can analyze live.
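
As a rough illustration of analyzing a hang live, the following Java sketch asks the running JVM whether any threads are stuck in a lock cycle that can never clear, which is one way to tell a true hang from a thread that is merely waiting on a busy resource. It is not a WebSphere or IBM Support tool and the source does not prescribe it. The class name HangCheck, its methods, and the thread names are hypothetical and chosen only for this example. It assumes a Java 8 or later JVM and uses only the standard java.lang.management API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class: not part of WebSphere Portal or any IBM tooling.
public class HangCheck {

    /** Returns the names of threads deadlocked on object monitors (empty if none). */
    public static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock exists
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // a thread may have ended since the check
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    /** Polls until a deadlock is reported or the timeout expires. */
    public static List<String> waitForDeadlock(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        List<String> names = deadlockedThreadNames();
        while (names.isEmpty() && System.nanoTime() < deadline) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            names = deadlockedThreadNames();
        }
        return names;
    }

    /** Starts two daemon threads that take the same two locks in opposite order. */
    public static void startDeadlock(Object a, Object b) {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockBoth(a, b, bothHoldFirstLock), "demo-worker-1");
        Thread t2 = new Thread(() -> lockBoth(b, a, bothHoldFirstLock), "demo-worker-2");
        t1.setDaemon(true); // daemon threads let the JVM exit despite the hang
        t2.setDaemon(true);
        t1.start();
        t2.start();
        try {
            bothHoldFirstLock.await(); // each worker now owns its first lock
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void lockBoth(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other worker holds its first lock
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) {
                // Never reached: the other worker owns this lock and waits for ours.
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Before: " + deadlockedThreadNames());
        startDeadlock(new Object(), new Object());
        System.out.println("After:  " + waitForDeadlock(5000));
    }
}
```

A thread that is only waiting on a busy resource does not show up in this check, because no lock cycle exists. That is the distinction drawn above between a slow system and a genuine hang. A crash, by contrast, leaves no live process to query, so it has to be analyzed from whatever diagnostic output was dumped at the time.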


160
IBM WebSphere Portal V6 Self Help Guide