Compaq ProLiant 8000 Server Technology - Page 7

TC000603TB TECHNOLOGY BRIEF (cont.)


The memory subsystem uses uniform memory access, which reduces latency and gives all processors equal access times to either memory bus. In systems using nonuniform memory access architectures, a processor has quick access to one memory bus but incurs a lag time (or latency) when accessing a second memory bus.

The ProLiant 8000 server supports up to 16 GB of error checking and correcting SDRAM that corrects all single-bit errors and detects double-bit errors. Memory is divided into eight banks, each consisting of two dual inline memory modules.

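The "corrects all single-bit errors and detects double-bit errors" behavior of error checking and correcting (ECC) memory can be illustrated with a toy extended Hamming code. The sketch below protects 4 data bits with 4 check bits; this width and the function names `encode` and `decode` are assumptions chosen for illustration, and the brief does not describe the code actually used in the server's memory controller.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0: overall parity, 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the total number of 1s even
    return code


def decode(code):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of set positions is 0 for a valid word
    overall_bad = sum(code) % 2 == 1    # True when an odd number of bits flipped
    fixed = list(code)
    if syndrome == 0 and not overall_bad:
        status = "ok"
    elif overall_bad:
        fixed[syndrome] ^= 1            # single-bit error; syndrome 0 is the parity bit
        status = "corrected"
    else:
        return "double-bit error", None # syndrome set but parity even: two bits flipped
    nibble = fixed[3] | fixed[5] << 1 | fixed[6] << 2 | fixed[7] << 3
    return status, nibble


word = encode(0b1011)
word[6] ^= 1                            # flip one stored bit
print(decode(word))                     # ('corrected', 11)
```

Production ECC memory typically applies the same idea to much wider words, so the ratio of check bits to data bits is far lower than in this toy example.
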
Although the Profusion chipset supports up to 32 GB of memory, industry-standard operating systems provide only minimal support and scalability for this memory capacity, and enhanced support will not be available for an extended period. Also, in discussions with customers, Compaq learned that very few server implementations are fully configured with memory. With this in mind, Compaq used the internal server space to provide additional drive capacity in the ProLiant 8000 server. As customer requirements and OS capabilities grow, Compaq will continue to modify servers to match these requirements.

Cache Accelerators
One of the main challenges of designing an efficient SMP architecture is maintaining cache coherency. To allow faster access to memory, most processors write data to cache memory rather than to main memory. When a processor writes data to its cache, the cache has a newer copy of the data than main memory. Cache coherency ensures that the most recent copy of the data is read by any device that requests it. The cache coherency protocol essentially makes the cache look like main memory. Cache coherency is critical for the proper operation of an SMP architecture, and the performance and scalability of the architecture are affected by how efficiently it maintains cache coherency.

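The stale-memory problem described above can be made concrete with a small model. The following is a minimal sketch, assuming a simple invalidate-on-write scheme on a single shared bus; the `Cache` and `Bus` classes are hypothetical and do not represent the protocol implemented by the processors or the Profusion chipset.

```python
class Cache:
    """A write-back cache: address -> (value, dirty flag)."""

    def __init__(self, name):
        self.name = name
        self.lines = {}


class Bus:
    """Shared bus that snoops every attached cache on each access."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def write(self, requester, addr, value):
        # Invalidate every other copy, then keep the new data only in the
        # requester's cache (main memory is NOT updated yet).
        for cache in self.caches:
            if cache is not requester:
                cache.lines.pop(addr, None)
        requester.lines[addr] = (value, True)

    def read(self, requester, addr):
        if addr in requester.lines:
            return requester.lines[addr][0]
        # Snoop the other caches; a dirty copy is newer than main memory
        # and must be written back before the read is satisfied.
        for cache in self.caches:
            if cache is not requester and addr in cache.lines:
                value, dirty = cache.lines[addr]
                if dirty:
                    self.memory[addr] = value
                    cache.lines[addr] = (value, False)
        value = self.memory[addr]
        requester.lines[addr] = (value, False)
        return value


memory = {0x100: 1}
bus = Bus(memory)
cpu_a, cpu_b = Cache("A"), Cache("B")
bus.attach(cpu_a)
bus.attach(cpu_b)

bus.write(cpu_a, 0x100, 2)
print(memory[0x100])           # 1: main memory is stale after the cached write
print(bus.read(cpu_b, 0x100))  # 2: the snoop returns the most recent copy
```

Every read miss in this model snoops every other cache, which is the traffic that grows as processors and buses are added.
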
With multiple processor buses and a separate I/O bus, it is extremely challenging to maintain cache coherency in the 8-way architecture. Each memory access must look at, or snoop, the caches on its local processor bus and snoop all caches on the remote processor bus and the I/O bus. The amount of snoop traffic can significantly impact the scalability of the system.

The ProLiant 8000 architecture uses cache accelerators to minimize snoop traffic to the remote processor bus and I/O bus. The cache accelerators store the address and state of the data for all caches on their respective buses. The Profusion crossbar switch uses this information to determine whether to snoop the remote processor and I/O buses. Depending on how often a software application shares data, the reduction in snoop traffic can significantly improve overall system performance and scalability.

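The filtering role of the cache accelerators can be sketched as a per-bus tag store that the crossbar consults before sending a snoop. This is a simplified model under stated assumptions: the class names, the bus names, and the two states ("shared" and "modified") are hypothetical, and the real accelerators track more detail than is shown here.

```python
class CacheAccelerator:
    """Records the address and state of every line cached on one bus."""

    def __init__(self):
        self.tags = {}

    def record(self, addr, state):
        self.tags[addr] = state

    def evict(self, addr):
        self.tags.pop(addr, None)

    def may_hold(self, addr):
        return addr in self.tags


class Crossbar:
    """Decides per access which other buses actually need a snoop."""

    def __init__(self, bus_names):
        self.accelerators = {name: CacheAccelerator() for name in bus_names}
        self.snoops_sent = 0
        self.snoops_avoided = 0

    def access(self, source_bus, addr, state="shared"):
        snooped = []
        for name, accelerator in self.accelerators.items():
            if name == source_bus:
                continue                      # the local bus is snooped directly
            if accelerator.may_hold(addr):
                snooped.append(name)
                self.snoops_sent += 1
                if state == "modified":
                    accelerator.evict(addr)   # the snoop invalidates the remote copy
            else:
                self.snoops_avoided += 1      # tag miss: no snoop traffic needed
        self.accelerators[source_bus].record(addr, state)
        return snooped


xbar = Crossbar(["cpu_left", "cpu_right", "io"])
print(xbar.access("cpu_left", 0x1000))              # []
print(xbar.access("cpu_right", 0x1000))             # ['cpu_left']
print(xbar.access("cpu_left", 0x2000, "modified"))  # []
print(xbar.snoops_sent, xbar.snoops_avoided)        # 1 5
```

In this run only the one access to a line that another bus holds generates a snoop, which mirrors the point that the benefit depends on how often an application shares data.
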
I/O Filter
The ProLiant 8000 server also includes three Compaq host-to-PCI bridges with prefetch buffers, so they act as caching bridges. The Profusion chipset contains a built-in I/O filter for the caching bridges on the I/O bus. The I/O filter enhances performance by reducing snoop traffic on the I/O bus. This I/O filter is designed to work with all three of the Compaq host-to-PCI bridges.

When a processor requests a cache line with the intent to modify it, the MAC performs a lookup into the I/O filter to determine if that line resides in one of the caching bridges. If it does reside there, the MAC initiates a transaction on the I/O bus to invalidate that cache line. If the cache line is not present in one of the bridges, then no transaction is run on the bus.
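
The lookup-then-invalidate sequence above can be sketched as follows. The `IOFilter` and `MAC` classes, their method names, and the bridge labels are assumptions made for illustration; the sketch shows only the decision logic described in the text, not the chipset's actual data structures.

```python
class IOFilter:
    """Tracks which cache lines currently sit in the caching bridges."""

    def __init__(self):
        self.lines = {}                       # line address -> set of bridge ids

    def note_prefetch(self, bridge, line_addr):
        self.lines.setdefault(line_addr, set()).add(bridge)

    def holders(self, line_addr):
        return self.lines.get(line_addr, set())

    def clear(self, line_addr):
        self.lines.pop(line_addr, None)


class MAC:
    """Models only the I/O-filter check made on a read with intent to modify."""

    def __init__(self, io_filter):
        self.io_filter = io_filter
        self.io_bus_transactions = []

    def read_for_ownership(self, line_addr):
        """Return True if an invalidate had to be run on the I/O bus."""
        if not self.io_filter.holders(line_addr):
            return False                      # filter miss: the I/O bus stays idle
        self.io_bus_transactions.append(("invalidate", line_addr))
        self.io_filter.clear(line_addr)       # the bridges no longer hold the line
        return True


io_filter = IOFilter()
mac = MAC(io_filter)
io_filter.note_prefetch("bridge0", 0x4000)

print(mac.read_for_ownership(0x8000))   # False: line is not in any bridge
print(mac.read_for_ownership(0x4000))   # True: one invalidate on the I/O bus
print(mac.io_bus_transactions)          # [('invalidate', 16384)]
```

Because most processor writes touch lines that no bridge has prefetched, the common case in this model is the early return that generates no I/O bus traffic.
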