HP Cluster Platform Interconnects v2010 Quadrics QsNetII Interconnect - Page 120
since the last time that all the boards and chips were selected and cleared. When using this raw format of error data, you must decide whether the registers are reporting genuine link errors or simply errors due to node reboots. Look for a link that shows errors repeatedly, every day, during normal production mode testing.

Use the following procedure to run this test:

1. Open a connection to the interconnect's master control card, or launch the jtest utility remotely as described in Section 11.2.

2. At the jtest utility prompt, select all boards as follows:

   # jtest> b -1
   board in slot 0 is of type QM501_CU
   board in slot 4 is of type QM502_CU
   board in slot 8 is of type QM503
   board in slot 9 is of type QM503

3. At the jtest utility prompt, select all switch chips as follows:

   # jtest> c -1

4. At the jtest utility prompt, enter the error command:

   # jtest> error
   jtest: no errors on boards 0 4 8 9 chips : 0 1 2 3 4 5 6 7
   jtest>

If you see the same error recurring on a link, that error indicates a potential fault. The error registers do not count the number of errors; they indicate only that at least one error has occurred since the register was last cleared.

The jtest error command generates the following information:

• B:C:L  The board, chip, and link being reported.
• E  An error has occurred.
• RtCRC  CRC error on route byte (packet and transaction error). This indicates some bit errors on the route values.
• TrCRC  CRC error on transaction (packet and transaction error). This indicates some bit errors in one of the transactions.
• RcvLk  Receiver lock error (low level line error). Problems with the received or local clock.
• Dskew  Deskew error (low level line error). Only likely to be caused by a hard failing data bit.
• Phase  Phase error (low level line error). Probably a missed clock on the incoming link.
• DataE  Data error (low level line error). Not a valid data value or a valid token.
• ChM45  Mod 4/5 change detected on link (low level line error).
• Fifo0  FIFO overrun on virtual channel 0 (protocol error).
• Fifo1  FIFO overrun on virtual channel 1 (protocol error).
• OpenT  Packet has been open at the input for too long (protocol error).
• PktRT  Packet acknowledge return error (protocol error).

Protocol errors are normally caused by very high rates of errors on another link. They can only be caused by double or triple bit errors converting one type of token into another valid token. Note that data errors occur when a node is reset.

The following example demonstrates a protocol error:

   B:C:L  E  RtCRC  TrCRC  RcvLk  Dskew  Phase  Fifo0  Fifo1  OpenT  PktRT  ChM45  DataE  Value
   0:0:0  1  0      0      10     0      1      1      1      0      0      0      1      00f022

12-18 Maintenance and Diagnostic Procedures
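The daily comparison described above could be partly automated once the output of the error command has been saved to text. The following Python sketch is an illustration only and is not part of jtest. It assumes each report is a whitespace-separated header line followed by one row per link, as in the example above. The helper names (parse_error_report, flagged_categories, repeat_offenders) and the min_days threshold are hypothetical. The category mapping is taken from the field descriptions in this section.

```python
from collections import Counter

# Error class for each column, as given in the field descriptions above.
CATEGORIES = {
    "RtCRC": "packet and transaction",
    "TrCRC": "packet and transaction",
    "RcvLk": "low level line",
    "Dskew": "low level line",
    "Phase": "low level line",
    "DataE": "low level line",
    "ChM45": "low level line",
    "Fifo0": "protocol",
    "Fifo1": "protocol",
    "OpenT": "protocol",
    "PktRT": "protocol",
}

# The example report from this section, copied token for token.
SAMPLE = """
B:C:L E RtCRC TrCRC RcvLk Dskew Phase Fifo0 Fifo1 OpenT PktRT ChM45 DataE Value
0:0:0 1 0 0 10 0 1 1 1 0 0 0 1 00f022
"""


def parse_error_report(text):
    """Return {link: {column: token}} for one saved report (assumed layout)."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    report = {}
    for row in rows:
        if len(row) != len(header):
            continue  # skip rows that do not match the header layout
        fields = dict(zip(header, row))
        report[fields["B:C:L"]] = fields
    return report


def flagged_categories(fields):
    """Return the error classes that have at least one non-zero column."""
    return {
        CATEGORIES[col]
        for col, token in fields.items()
        if col in CATEGORIES and token != "0"
    }


def repeat_offenders(daily_reports, min_days=2):
    """Return links whose E column is set in at least min_days reports."""
    counts = Counter()
    for report in daily_reports:
        for link, fields in report.items():
            if fields.get("E") != "0":
                counts[link] += 1
    return sorted(link for link, n in counts.items() if n >= min_days)


report = parse_error_report(SAMPLE)
print(sorted(flagged_categories(report["0:0:0"])))
```

Passing one parsed report per day to repeat_offenders gives a list of links that reported errors on several days. Those are the links to investigate first, since a single report cannot distinguish a genuine link fault from a node reboot.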