HP DL740 HP F8 Architecture Technology Brief - Page 10


Figure 6. Comparison of a snoop cycle with and without a cache coherency filter
The F8 chipset uses a cache coherency filter to reduce the number of snoop cycles on the
remote processor bus. The cache coherency filter is also known as a cache accelerator. It
holds the addresses of data stored in all of the L2 processor caches, as well as information
about the state of the data. For example, the state information may describe whether the
data is owned by a particular L2 cache or shared between multiple caches.
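The filter can be thought of as a directory keyed by cache-line address. The following is a hypothetical Python model, not HP's implementation: it assumes two processor buses, integer L2 cache IDs, and only the two states named in the text (owned by one cache, or shared). Its point is the decision the filter enables: a snoop cycle is needed only on a bus whose caches actually hold the line.

```python
from dataclasses import dataclass, field
from enum import Enum


class LineState(Enum):
    # Simplified states taken from the example in the text; a real
    # coherency protocol tracks more states than these two.
    OWNED = "owned"
    SHARED = "shared"


@dataclass
class FilterEntry:
    state: LineState
    holders: set[int] = field(default_factory=set)  # IDs of L2 caches holding the line


class CoherencyFilter:
    """Hypothetical cache coherency filter model.

    Assumes each L2 cache has an integer ID and that cache_to_bus records
    which processor bus each cache sits on.
    """

    def __init__(self, cache_to_bus: dict[int, int]) -> None:
        self.cache_to_bus = cache_to_bus
        self.entries: dict[int, FilterEntry] = {}

    def record_fill(self, addr: int, cache_id: int, exclusive: bool) -> None:
        if exclusive:
            # cache_id becomes the sole owner; other copies are assumed invalidated.
            self.entries[addr] = FilterEntry(LineState.OWNED, {cache_id})
        else:
            entry = self.entries.setdefault(addr, FilterEntry(LineState.SHARED))
            entry.state = LineState.SHARED
            entry.holders.add(cache_id)

    def buses_to_snoop(self, addr: int, requester_bus: int) -> set[int]:
        # Only buses other than the requester's that hold the line need a
        # snoop cycle; an address absent from the filter needs none.
        entry = self.entries.get(addr)
        if entry is None:
            return set()
        return {self.cache_to_bus[c] for c in entry.holders} - {requester_bus}


# Hypothetical layout: caches 0-3 on bus 0, caches 4-7 on bus 1.
cf = CoherencyFilter({c: (0 if c < 4 else 1) for c in range(8)})
cf.record_fill(0x1000, cache_id=5, exclusive=True)
print(cf.buses_to_snoop(0x1000, requester_bus=0))  # {1}: line is on the remote bus
print(cf.buses_to_snoop(0x2000, requester_bus=0))  # set(): no snoop cycle needed
```

Without a filter, every request would have to snoop the remote bus; with it, the second lookup generates no remote-bus traffic at all.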
The cache coherency filter also acts as a filter for the I/O bus, keeping track of which cache
lines are owned on the I/O bus for the PCI devices. When a processor requests a cache
line, the crossbar switch snoops the I/O filter to determine if that cache line resides in one of
the PCI bridges on the I/O bus. If the cache line is not present in any of the bridges, no
transaction is run on the I/O bus. This reduces snoop traffic on the I/O bus whenever a
processor requests data.
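The I/O filter check can be sketched the same way. This is a hypothetical model that treats the filter as the set of cache lines currently held by PCI bridges; the class and function names are illustrative, and the comparison assumes that without a filter every processor request would snoop the I/O bus.

```python
from typing import Optional


class IOFilter:
    """Hypothetical I/O filter: the set of cache lines held by PCI bridges."""

    def __init__(self) -> None:
        self.bridge_lines: set[int] = set()

    def bridge_takes(self, addr: int) -> None:
        self.bridge_lines.add(addr)

    def bridge_releases(self, addr: int) -> None:
        self.bridge_lines.discard(addr)

    def needs_io_snoop(self, addr: int) -> bool:
        # The crossbar runs an I/O bus transaction only if a bridge holds the line.
        return addr in self.bridge_lines


def io_snoops(requests: list[int], io_filter: Optional[IOFilter]) -> int:
    """Count I/O bus snoop transactions for a list of processor requests.

    With no filter, every request is assumed to snoop the I/O bus.
    """
    if io_filter is None:
        return len(requests)
    return sum(io_filter.needs_io_snoop(a) for a in requests)


f = IOFilter()
f.bridge_takes(0x40)
trace = [0x40, 0x80, 0xC0, 0x40]  # hypothetical processor requests
print(io_snoops(trace, None))  # 4: every request snoops the I/O bus
print(io_snoops(trace, f))     # 2: only the requests for 0x40 reach the I/O bus
```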
Optimizing Cross-bus Traffic
The F8 chipset alleviates an inefficiency in the Profusion chipset that occurs when snoop traffic
must cross to the remote processor bus. When a processor requests data, the Profusion
chipset checks the cache coherency filter to determine the specific location of the data it
needs. If the data is located in an L2 cache on the remote bus, the chipset snoops the remote
bus to obtain the data, causing cross-bus traffic. In the Profusion chipset, a read request that
requires a snoop cycle on the remote bus is automatically deferred,⁴ so the reply is sent at a
later time. This situation generates two cycles on the processor bus (the request and the
deferred reply) for every such read request.
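The cost of the Profusion behavior can be counted directly. This sketch assumes a read that needs a remote-bus snoop costs two processor-bus cycles (request plus deferred reply), as the text states, and additionally assumes any other read completes in one cycle; the `Read` type and the trace are hypothetical. It does not model how the F8 chipset improves on this.

```python
from dataclasses import dataclass


@dataclass
class Read:
    addr: int
    remote_snoop: bool  # True if the filter places the line in an L2 cache on the remote bus


def processor_bus_cycles(reads: list[Read]) -> int:
    """Count processor-bus cycles under the Profusion behavior.

    A read needing a remote-bus snoop is deferred: one request cycle plus
    one deferred-reply cycle. Other reads are assumed to take one cycle.
    """
    return sum(2 if r.remote_snoop else 1 for r in reads)


trace = [Read(0x100, False), Read(0x140, True), Read(0x180, True)]  # hypothetical
print(processor_bus_cycles(trace))  # 5
```

Under these assumptions, if a fraction p of n reads need a remote snoop, the processor bus carries n(1 + p) cycles, so every cross-bus read adds one extra cycle.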
[Figure 6 diagram: without a filter, processors connected through a host controller send the snoop cycle to the remote bus; with a filter, processors connected through the F8 crossbar switch and its cache coherency filter keep the snoop cycle off the remote bus.]
4 A deferred request is split into two transactions: the processor makes a read request and releases the bus, and a reply is sent later when the data is available.