Dell PowerEdge C4140 Deep Learning Performance Comparison - Scale-up vs. Scale - Page 32

Deep Learning Performance: Scale-up vs Scale-out

Architectures & Technologies

Dell EMC | Infrastructure Solutions Group


7.2 Throughput images/s – Multi Node

7.2.1 PowerEdge C4130-P100 16GB PCIe – Multi Node

PowerEdge C4130 servers, each with four P100-PCIe GPUs, were configured as a multi-node cluster using InfiniBand RDMA to run TensorFlow in distributed mode.

Figure 25: Training with PowerEdge C4130-P100-16GB-PCIe in multi-node

The PowerEdge C4130 server scales very well, with 97% efficiency within a node and 92% across nodes. The ideal performance is computed by multiplying the single-GPU throughput by the number of GPUs in the system. See Figure 26.
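The efficiency figures above follow from comparing measured throughput against the ideal (linear) throughput. A minimal sketch of that arithmetic is shown below. The function names and the sample throughput values are hypothetical and chosen only for illustration; they are not measurements from this report.

```python
def ideal_throughput(single_gpu_images_per_s: float, num_gpus: int) -> float:
    """Ideal (linear) throughput: single-GPU throughput times the GPU count."""
    return single_gpu_images_per_s * num_gpus


def scaling_efficiency(measured_images_per_s: float,
                       single_gpu_images_per_s: float,
                       num_gpus: int) -> float:
    """Fraction of the ideal throughput that was actually achieved."""
    if single_gpu_images_per_s <= 0 or num_gpus < 1:
        raise ValueError("single-GPU throughput must be > 0 and num_gpus >= 1")
    return measured_images_per_s / ideal_throughput(single_gpu_images_per_s, num_gpus)


if __name__ == "__main__":
    # Hypothetical values for illustration only (not taken from the report).
    single_gpu = 200.0        # images/s on one GPU
    measured_one_node = 776.0 # images/s on one node with 4 GPUs
    eff = scaling_efficiency(measured_one_node, single_gpu, num_gpus=4)
    print(f"Scaling efficiency: {eff:.0%}")
```

With these assumed inputs the ideal throughput is 800 images/s, so a measured 776 images/s corresponds to 97% efficiency. The same calculation applies across nodes by using the total number of GPUs in the cluster.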
