Compaq ProLiant CL1850 ServerNet II SAN Interconnect for Scalable Computing Cl - Page 7

Low Latency Techniques Improve Performance, ServerNet II Performance Summary



WHITE PAPER (cont.)
Doc Number TC000602WP

Low Latency Techniques Improve Performance
ServerNet II uses two techniques to ensure low-latency data transmission: wormhole routing and the push/pull approach. In wormhole routing, ServerNet II retrieves the data and divides it into 512-byte packets. The destination address is added to the front of each packet, which allows the switch to route every packet to the destination node. If necessary, packets from the same data file can travel different paths to reach the destination. As the first bytes of a packet reach the router, the router decodes the packet address and begins routing the head of the packet toward its destination before the entire packet has been received.
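
The packetizing and early-forwarding steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ServerNet II implementation: the 2-byte header width, the routing-table layout, and the names packetize and cut_through are all hypothetical. Only the 512-byte packet size comes from the text.

```python
import itertools

PAYLOAD_BYTES = 512   # packet payload size stated in the white paper
HEADER_BYTES = 2      # assumed width of the destination address field

def packetize(dest: int, data: bytes) -> list[bytes]:
    """Split data into packets, each prefixed with the destination address."""
    header = dest.to_bytes(HEADER_BYTES, "big")
    return [header + data[i:i + PAYLOAD_BYTES]
            for i in range(0, len(data), PAYLOAD_BYTES)]

def cut_through(incoming, routing_table: dict[int, int]):
    """Pick the output port from the header alone, then stream the rest."""
    stream = iter(incoming)
    header = bytes(itertools.islice(stream, HEADER_BYTES))
    # The port is chosen here, before any payload byte has been read.
    port = routing_table[int.from_bytes(header, "big")]

    def forwarded():
        yield from header   # head of the packet leaves first
        yield from stream   # remaining bytes follow as they arrive
    return port, forwarded()

# Hypothetical usage: 1300 bytes bound for node 5, reachable via port 3.
packets = packetize(dest=5, data=bytes(1300))
port, out = cut_through(packets[0], routing_table={5: 3})
```

Because every packet carries its own destination address, each one can be routed independently, which is what lets packets from the same data take different paths.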
The push/pull (write/read) approach of ServerNet II allows the burden of data movement to be absorbed by either the source or the target server. At the beginning of a push (write) transaction, the source notifies the destination to allocate enough buffers to receive a large message. Before sending the data, the source waits for acknowledgment from the destination that the buffers are available. To pull (read) data, the destination allocates buffers before it requests the data. It then transfers the data through the ServerNet II PCI Adapters without interrupting the OS or application.
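
The ordering of the push and pull transactions can be modeled with a small in-memory sketch. The Node class, the transfer identifiers, and the push and pull functions are hypothetical names chosen for illustration. The real transfers are carried out by the adapter hardware, which this model does not attempt to represent.

```python
class Node:
    """A server with receive buffers and some data it can serve."""
    def __init__(self, name: str):
        self.name = name
        self.buffers: dict[int, bytearray] = {}  # transfer id -> receive buffer
        self.store: dict[str, bytes] = {}        # data available to be pulled

    def allocate(self, xfer_id: int, size: int) -> bool:
        """Reserve a receive buffer; the return value acts as the acknowledgment."""
        self.buffers[xfer_id] = bytearray(size)
        return True

def push(source: Node, dest: Node, xfer_id: int, data: bytes) -> None:
    """Write: the source asks for buffers and waits for the ack before sending."""
    if not dest.allocate(xfer_id, len(data)):
        raise RuntimeError("destination could not allocate buffers")
    dest.buffers[xfer_id][:len(data)] = data

def pull(dest: Node, source: Node, xfer_id: int, key: str, size: int) -> None:
    """Read: the destination allocates buffers first, then requests the data."""
    dest.allocate(xfer_id, size)
    chunk = source.store[key][:size]
    dest.buffers[xfer_id][:len(chunk)] = chunk

# Hypothetical usage with two nodes.
a, b = Node("a"), Node("b")
push(a, b, xfer_id=1, data=b"hello")
a.store["k"] = b"world!"
pull(b, a, xfer_id=2, key="k", size=6)
```

In both directions the receive buffer exists before any data moves, so the receiver never has to stage an unexpected message.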
ServerNet II Performance Summary
The efficiency of messaging is defined in terms of CPU utilization, latency, and bandwidth. ServerNet II communication protocols are implemented in native hardware to reduce CPU utilization to a small fraction of that experienced with traditional protocols such as TCP/IP. ServerNet II protocols also reduce the typical operating system services necessary to support traditional protocols. For small messages (64 bytes or less), latency and CPU measurements show that ServerNet II provides a three-fold improvement over Gigabit Ethernet and TCP/IP. The latency of small messages approaches ten microseconds. For large messages (16 KB or greater), measurements show that each ServerNet II link delivers a bidirectional bandwidth of 180 MB/s while consuming less than 2 percent of CPU resources. By comparison, Gigabit Ethernet bandwidth can approach 100 MB/s, but only by consuming intolerable amounts of CPU resources in both the source and the destination nodes.
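
A first-order model helps explain why the text treats small and large messages separately: transfer time is roughly a fixed latency plus size divided by bandwidth. The sketch below is not taken from the white paper. The 10 µs latency is the figure quoted above, while the 90 MB/s per-direction bandwidth is an assumption (half of the 180 MB/s bidirectional figure), and the function names are hypothetical.

```python
def transfer_time_us(size_bytes: int, latency_us: float, bandwidth_mb_s: float) -> float:
    """Latency plus serialization time; bytes / (MB/s) gives microseconds."""
    return latency_us + size_bytes / bandwidth_mb_s

def latency_share(size_bytes: int, latency_us: float, bandwidth_mb_s: float) -> float:
    """Fraction of the total transfer time spent on fixed latency."""
    return latency_us / transfer_time_us(size_bytes, latency_us, bandwidth_mb_s)

LATENCY_US = 10.0       # small-message latency quoted in the text
BANDWIDTH_MB_S = 90.0   # assumed one-way share of the 180 MB/s figure

small = latency_share(64, LATENCY_US, BANDWIDTH_MB_S)         # 64-byte message
large = latency_share(16 * 1024, LATENCY_US, BANDWIDTH_MB_S)  # 16 KB message
print(f"latency share: small={small:.0%}, large={large:.0%}")
```

Under these assumptions a 64-byte message is dominated by latency and a 16 KB message by bandwidth, which is why the two cases are measured with different metrics.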
The table below summarizes results of various ServerNet II hardware tests performed using a Compaq ProLiant 8500 server configured with eight 500-MHz processors. The server was processing 512-byte bursts over a 64-bit, 66-MHz PCI bus.

ServerNet II Performance Summary*

Performance Test                                     Measured
8-byte delivery latency with Send/Receive poll       12 μs
8-byte delivery latency with Send/Receive wait       32 μs
64-byte CPU cost with Send/Receive wait              27 μs
64-byte CPU cost with Lazy Send/Receive wait         29 μs
64K 1-way throughput RDMA writes (1 VI-4 VIs)        92-132 MB/s
64K 1-way throughput RDMA reads (1 VI-4 VIs)         129-134 MB/s
64K 2-way throughput RDMA (reads-writes)             181-194 MB/s
64K RDMA throughput test CPU utilization             ~0%

* This number was measured on 33MHz, 32-bit PCI with a 500MHz CPU.
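
For context, the theoretical peak of the two PCI configurations mentioned on this page can be worked out from bus width and clock rate. The sketch below assumes one bus-width transfer per clock and nominal clocks of exactly 66 and 33 MHz. The function name is hypothetical, and the white paper does not say which table entries correspond to which bus, so the comparison is indicative only.

```python
def pci_peak_mb_s(bus_bits: int, clock_mhz: float) -> float:
    """Theoretical peak in MB/s: bytes per transfer times transfers per microsecond."""
    return bus_bits / 8 * clock_mhz

peak_64_66 = pci_peak_mb_s(64, 66)   # bus cited in the text
peak_32_33 = pci_peak_mb_s(32, 33)   # bus cited in the table footnote

# Upper ends of the measured throughput ranges from the table, in MB/s.
measured_mb_s = {
    "1-way RDMA writes": 132,
    "1-way RDMA reads": 134,
    "2-way RDMA": 194,
}

for test, rate in measured_mb_s.items():
    print(f"{test}: {rate / peak_64_66:.0%} of 64-bit/66-MHz peak")
```

The 2-way figure of 194 MB/s exceeds the 132 MB/s peak computed for a 32-bit, 33-MHz bus, so it fits only the 64-bit, 66-MHz configuration.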