Dell PowerEdge C4140 Deep Learning Performance Comparison - Scale-up vs. Scale - Page 41

Deep Learning Performance: Scale-up vs Scale-out

Architectures & Technologies, Dell EMC | Infrastructure Solutions Group

7.2.6 PowerEdge C4140 Multi Node Training with Different CPU Models vs 8x V100-16GB-SXM2

For the results shown in Figure 31 and Figure 33, we configured the multi-node system with PowerEdge C4140-V100-SXM2 servers in Configuration-K (Intel Xeon 4116 CPU) and Configuration-M (Intel Xeon 6148 CPU), respectively, and compared them against single-node training on a non-Dell EMC 8x V100-16GB-SXM2 server.

To show the impact of the CPU on the training of deep learning workloads, we ran additional tests with the multi-node system built from PowerEdge C4140-V100-SXM2 Configuration-M servers with the Intel Xeon 6148 CPU. Figure 34 shows how a more advanced CPU model further boosts GPU performance: most of the data loading, data preprocessing, and batch transformation tasks occur at the CPU level, whereas the training tasks occur at the GPU level.
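The split between CPU-side input work and GPU-side training can be illustrated with a simple steady-state model. The sketch below is not taken from the benchmark code; it assumes the input pipeline and the training step overlap fully, so that sustained throughput is limited by the slower of the two stages. The function names and all rates are hypothetical values chosen for illustration, not measured results.

```python
def pipeline_throughput(cpu_images_per_sec: float, gpu_images_per_sec: float) -> float:
    """Sustained images/sec when CPU preprocessing and GPU training overlap.

    Assumption: the two stages run concurrently, so the slower stage
    sets the overall rate.
    """
    return min(cpu_images_per_sec, gpu_images_per_sec)


def gpu_busy_fraction(cpu_images_per_sec: float, gpu_images_per_sec: float) -> float:
    """Fraction of time the GPU has data to train on under the same assumption."""
    achieved = pipeline_throughput(cpu_images_per_sec, gpu_images_per_sec)
    return achieved / gpu_images_per_sec


if __name__ == "__main__":
    gpu_rate = 3000.0  # hypothetical rate the GPUs could train at
    # Hypothetical preprocessing rates for a slower and a faster CPU.
    for label, cpu_rate in [("slower CPU", 2000.0), ("faster CPU", 3500.0)]:
        rate = pipeline_throughput(cpu_rate, gpu_rate)
        busy = gpu_busy_fraction(cpu_rate, gpu_rate)
        print(f"{label}: {rate:.0f} images/sec, GPU busy {busy:.0%}")
```

Under this model a faster CPU raises throughput only while the input pipeline is the bottleneck; once preprocessing outpaces the GPUs, additional CPU capacity no longer changes the result.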

Figure 34. Multi-node training on PowerEdge C4140-V100-SXM2 Configuration-K with Intel Xeon 4116 CPU and on PowerEdge C4140-V100-SXM2 Configuration-M with Intel Xeon 6148 CPU, versus single-node training on a non-Dell 8x V100-16GB-SXM2
