HP DL740 HP F8 Architecture Technology Brief - Page 8

Architectural Differences From Storage Subsystem RAID, F8 Crossbar Switch, Buffer Design



Architectural Differences From Storage Subsystem RAID
The technology used in HP Hot-Plug RAID Memory is conceptually similar to RAID technology
that provides fault tolerance and high availability in storage subsystems for servers.
However, there are some key performance and implementation differences between Hot-Plug
RAID Memory and typical storage subsystem RAID.
Hot-Plug RAID Memory does not have the mechanical delays of seek time and rotational
latency associated with hard disk drive arrays. Storage subsystem arrays use a single bus to
write the stripes sequentially across multiple drives. In contrast, HP Hot-Plug RAID Memory
uses parallel point-to-point connections so that data is written simultaneously across multiple
memory cartridges.
Also, HP Hot-Plug RAID Memory eliminates the write bottleneck associated with typical
storage subsystem RAID implementations. In a storage array, the RAID controller generally
performs a read operation of existing parity before a write operation can be completed. If a
dedicated parity drive is being used, a bottleneck occurs. But because HP Hot-Plug RAID
Memory operates on an entire cache line of data, there is no need to read existing parity
before a write operation, thus eliminating this performance bottleneck.
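The difference between the two write paths can be sketched in a few lines of Python. This is a minimal illustration only: it assumes simple XOR parity and four hypothetical data cartridges holding one byte each, neither of which is stated on this page, and the function names are invented for the example.

```python
from functools import reduce
from operator import xor


def full_line_parity(chunks):
    """Parity for a complete cache line: XOR of all chunks, no prior read."""
    return reduce(xor, chunks)


def read_modify_write_parity(old_data, old_parity, new_data):
    """Partial-stripe update: old data and old parity must be read first."""
    return old_parity ^ old_data ^ new_data


# Four hypothetical data cartridges, one 8-bit chunk each (assumption).
line = [0x12, 0x34, 0x56, 0x78]
parity = full_line_parity(line)

# Storage-style update of a single chunk needs the old chunk and old parity.
new_chunk = 0xAB
updated_parity = read_modify_write_parity(line[2], parity, new_chunk)
line[2] = new_chunk

# Both paths agree, but only the second one required reads before the write.
assert updated_parity == full_line_parity(line)
```

Because the whole line is available at write time, the first function needs nothing from the array itself, which is the property the paragraph above relies on.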
When a traditional striped RAID storage subsystem rebuilds data, there is no data protection
should another drive fail. However, the F8 chipset operates in a typical (nonredundant) ECC
mode while data is being rebuilt. As a result, even if a secondary memory failure occurs
during a rebuild operation, the data is protected by ECC.
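As a rough illustration of what a rebuild computes, the sketch below regenerates one missing chunk from the surviving chunks and the parity chunk. It assumes plain XOR parity and a four-plus-one cartridge layout, which this page does not specify, and it does not model the ECC protection that remains active during the rebuild; `rebuild_chunk` is a hypothetical name.

```python
from functools import reduce
from operator import xor


def rebuild_chunk(surviving_chunks, parity):
    """Recover one missing chunk as the XOR of parity and all survivors."""
    return reduce(xor, surviving_chunks, parity)


# Hypothetical cache line striped over four data cartridges (assumption).
line = [0x12, 0x34, 0x56, 0x78]
parity = reduce(xor, line)

# Pretend cartridge 1 was replaced and its contents must be regenerated.
failed = 1
survivors = [chunk for i, chunk in enumerate(line) if i != failed]
recovered = rebuild_chunk(survivors, parity)

assert recovered == line[failed]
```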
F8 Crossbar Switch
One of the key advantages that the Profusion architecture has over other 8-way designs is its
use of a nonblocking, multiported crossbar switch. This switch allows simultaneous
communication among the processors, I/O, and memory. The F8 architecture also uses a
nonblocking, multiported crossbar switch that provides even higher performance than the
Profusion crossbar switch and accommodates increased processor speeds and peripheral
bandwidths. The F8 chipset also includes a cache coherency filter, or cache accelerator,
similar to that in the Profusion architecture. The cache coherency filter removes (or filters)
unnecessary snoop cycles on the processor buses.
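The filtering idea can be shown with a toy model. The sketch below is not the F8 or Profusion design; it simply assumes the filter records which processor bus may hold each cache line and forwards a snoop only to those buses. The class and method names are hypothetical.

```python
class SnoopFilter:
    """Toy coherency filter: tracks which bus may cache each line address."""

    def __init__(self):
        # line address -> set of bus ids that may hold a copy
        self.holders = {}

    def record_fill(self, bus, addr):
        """Note that a cache on `bus` has loaded the line at `addr`."""
        self.holders.setdefault(addr, set()).add(bus)

    def buses_to_snoop(self, requester, addr):
        """Return only the other buses that might hold the line."""
        return sorted(self.holders.get(addr, set()) - {requester})


filt = SnoopFilter()
filt.record_fill(bus=0, addr=0x1000)

# Bus 1 asks for a line that bus 0 may hold: one snoop is required.
needed = filt.buses_to_snoop(requester=1, addr=0x1000)

# Bus 1 asks for a line nobody holds: the snoop cycle is filtered out.
filtered = filt.buses_to_snoop(requester=1, addr=0x2000)
```

In this model, every request for an untracked line avoids a snoop cycle on the other processor bus, which is the saving the paragraph above describes.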
HP engineers designed the F8 crossbar switch to increase bus efficiency far beyond that of
the Profusion crossbar switch. The design includes:
• Larger and reorganized buffers. The F8 crossbar switch can hold 128 cache lines, twice
  the number that the Profusion chipset can hold in its buffers.
• More ports. The F8 crossbar switch has thirteen read and four write ports, compared
  with five read and five write ports used in the Profusion chipset. This increases the
  number of transactions that can run concurrently.
• Optimized cross-bus traffic through a patent-pending algorithm. Optimizing the
  cross-bus traffic significantly enhances the ability to scale beyond 4-way
  multiprocessing.
Buffer Design
The Profusion chipset uses a single centralized buffer, or queue, for storing data requests. In
certain cases, a processor on one bus could request the same address as a processor on the
other bus, resulting in the need to arbitrate for which request could be granted first. One of
the requests has to go through a retry process, using up additional bandwidth on the
processor bus.
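The retry cost of a shared queue can be approximated with a small counting model. This is a simplified sketch, not a description of the Profusion arbitration logic: it assumes that when several buses request the same address in the same cycle, exactly one request is granted and each of the others costs one retry. The function name and the request format are hypothetical.

```python
from collections import Counter


def count_retries(cycles):
    """Count retries caused by same-address requests in a shared queue.

    `cycles` is a list of cycles; each cycle is a list of (bus, address)
    pairs issued together. Per address, one request is granted and every
    additional request in that cycle is counted as a retry.
    """
    retries = 0
    for cycle in cycles:
        per_address = Counter(addr for _bus, addr in cycle)
        retries += sum(count - 1 for count in per_address.values())
    return retries


cycles = [
    [(0, 0x100), (1, 0x100)],  # both buses hit the same address: one retry
    [(0, 0x200), (1, 0x300)],  # different addresses: no retry
]
total_retries = count_retries(cycles)
```

Each counted retry stands for a transaction that occupies the processor bus a second time, which is the extra bandwidth the paragraph above refers to.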
In the F8 architecture, the crossbar switch (Figure 5) contains a separate buffer for each of
the processor buses, the I/O subsystem, and the memory subsystem. The buffers in the