Dell PowerEdge C4140 Deep Learning Performance Comparison - Scale-up vs. Scale-out


Deep Learning Performance: Scale-up vs Scale-out

7.2.2 PowerEdge C4140-K-V100-16GB and V100-32GB: SXM2 Multi Node

Figure 27: Training with PowerEdge C4140-V100-16&32GB-SXM2 in multi-node

The PowerEdge C4140-V100-16GB-SXM2 and PowerEdge C4140-V100-32GB-SXM2, with 4 GPUs each, were configured in a multi-node setup to run TensorFlow in distributed mode, extract throughput performance, and determine scaling efficiency. The GPUs scale very well: up to 97% within a node and 90% across nodes. The ideal performance is computed by multiplying the single-GPU throughput by the number of GPUs in the system. See Figure 28.

Architectures & Technologies | Dell EMC | Infrastructure Solutions Group | 33
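The scaling-efficiency calculation described above can be sketched in a few lines. This is a minimal illustration, not code from the benchmark itself; the single-GPU throughput value below is a hypothetical placeholder, since the document does not state it here. Only the formula (ideal = single-GPU throughput × GPU count; efficiency = measured / ideal) comes from the text.

```python
def ideal_throughput(single_gpu_tput: float, num_gpus: int) -> float:
    """Ideal (linear-scaling) throughput: single-GPU throughput times GPU count."""
    return single_gpu_tput * num_gpus

def scaling_efficiency(measured_tput: float, single_gpu_tput: float, num_gpus: int) -> float:
    """Measured throughput as a fraction of the ideal linear-scaling throughput."""
    return measured_tput / ideal_throughput(single_gpu_tput, num_gpus)

# Hypothetical numbers for illustration only (images/sec):
single_gpu = 300.0          # assumed single-GPU throughput
measured_4gpu = 1164.0      # assumed measured 4-GPU throughput within one node

eff = scaling_efficiency(measured_4gpu, single_gpu, num_gpus=4)
print(f"Scaling efficiency: {eff:.0%}")  # 97% intra-node, matching the figure cited
```

With these placeholder inputs, the ideal 4-GPU throughput is 1200 images/sec, and a measured 1164 images/sec corresponds to the 97% intra-node efficiency reported above.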

