GPU Database Acceleration on Dell PowerEdge R940xa - Page 11





In a GPU database management system (GDBMS), there are two major I/O bottlenecks: the first is disk I/O, and the second is the PCIe bus.
2.6.1 Disk-IO bottleneck
GPUs will not improve performance for disk-based database systems, since most of the time is spent in disk I/O. GPUs improve performance only when the data is in main system memory, so it is much better to keep hot data in main memory.
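To make the disk-versus-memory gap concrete, here is a back-of-envelope sketch in Python of how long a full scan of a 1 TB table would take at each storage tier. The bandwidth figures are assumed ballpark numbers for illustration, not measurements of the R940xa:

```python
# Rough scan-time comparison for a 1 TB table at three storage tiers.
# All bandwidth values are assumed, typical-order-of-magnitude figures.
TABLE_SIZE_GB = 1024

bandwidth_gb_s = {
    "NVMe SSD (disk)": 3,    # assumed sequential read bandwidth
    "Host DRAM": 100,        # assumed aggregate host memory bandwidth
    "GPU HBM2": 900,         # assumed on-GPU memory bandwidth
}

for tier, bw in bandwidth_gb_s.items():
    # Scan time = data size / bandwidth
    print(f"{tier}: {TABLE_SIZE_GB / bw:.1f} s to scan 1 TB")
```

Under these assumptions the disk-resident scan is two orders of magnitude slower than the in-memory one, which is why a GPU cannot help a disk-bound workload: the accelerator would simply wait on the disk.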
2.6.2 PCIe bottleneck
Data transfers can be significantly accelerated by keeping 'semi-hot' data in host memory and hot data in GPU RAM. But because GPU RAM (GBs) is much smaller than host memory (TBs), data must still be transferred over the x16 PCIe bus.
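As a rough illustration of that transfer penalty, the sketch below compares the time to ship a working set from host memory to the GPU over PCIe with the time to re-read the same data from GPU RAM. The figures are assumptions (roughly 12 GB/s effective throughput on a PCIe 3.0 x16 link and roughly 900 GB/s for GPU HBM2), and the 64 GB working set is a hypothetical example:

```python
# Estimate the cost of moving 'semi-hot' data from host memory into
# GPU RAM versus re-reading data already resident on the GPU.
# Bandwidth values are assumed ballpark figures, not measurements.
PCIE3_X16_GB_S = 12   # assumed effective host-to-GPU PCIe 3.0 x16 bandwidth
GPU_HBM_GB_S = 900    # assumed on-GPU HBM2 memory bandwidth

def transfer_seconds(data_gb, bw_gb_s):
    """Time in seconds to move data_gb gigabytes at bw_gb_s GB/s."""
    return data_gb / bw_gb_s

working_set_gb = 64   # hypothetical slice of 'semi-hot' data in host RAM

print(f"Host -> GPU over PCIe: {transfer_seconds(working_set_gb, PCIE3_X16_GB_S):.1f} s")
print(f"Re-read from GPU RAM:  {transfer_seconds(working_set_gb, GPU_HBM_GB_S):.3f} s")
```

Under these assumptions the PCIe copy is nearly two orders of magnitude slower than reading the same data from GPU RAM, which is why minimizing trips over the bus matters so much.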
Because GPUs have a large number of cores, they are faster than CPUs at certain tasks, such as numerical computations, but only when the task can be parallelized. To avoid PCIe bottlenecks and use the full capabilities of both the CPU and the GPU, it is better to have a 1:1 CPU-to-GPU ratio, which allows optimal processing for a given operation.
To overcome some of the bottlenecks explained in the sections above, we chose a different architecture for the R940xa: we wanted to maximize performance between CPU and GPU and avoid the PCIe bottleneck, which is why we kept a 1:1 ratio. We also wanted a larger memory capacity so that we could take advantage of both in-memory processing and the GPU, supporting large databases within RAM and moving data into the GPU without paying the PCIe penalty.