Dell PowerEdge C4140 Deep Learning Performance Comparison - Scale-up vs. Scale-out - Page 41

PowerEdge C4140 Multi Node Training with Different CPU Models vs 8x V100-16GB

Page 41 highlights

Deep Learning Performance: Scale-up vs Scale-out

7.2.6 PowerEdge C4140 Multi Node Training with Different CPU Models vs 8x V100-16GB-SXM2

For the results shown in Figure 31 and Figure 33, we configured the multi-node system with PowerEdge C4140-V100-SXM2 servers in Configuration-K (Intel Xeon 4116 CPU) and Configuration-M (Intel Xeon 6148 CPU) respectively, and compared each against single-node training on a non-Dell EMC 8x V100-16GB-SXM2 server.

To show the impact of the CPU on the training of deep learning workloads, we ran additional tests with the multi-node system built from PowerEdge C4140-V100-SXM2 Configuration-M servers with the Intel Xeon 6148 CPU. Figure 34 shows that the more advanced CPU model boosts GPU performance further: most of the data loading, data preprocessing, and batch transformation tasks run on the CPU, whereas the training tasks run on the GPU, so a faster CPU keeps the GPUs supplied with data.

Figure 34. Multi-node training on PowerEdge C4140-V100-SXM2 Configuration-K with Intel Xeon 4116 CPU and on PowerEdge C4140-V100-SXM2 Configuration-M with Intel Xeon 6148 CPU, versus single-node training on a non-Dell EMC 8x V100-16GB-SXM2 server

Architectures & Technologies Dell EMC | Infrastructure Solutions Group 40
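The CPU/GPU split described in this section can be illustrated with a minimal throughput model. This is only a sketch, not the benchmark method used in the paper: it assumes the CPU input stage (loading, preprocessing, batch transformation) scales linearly with core count and fully overlaps with GPU training, so the steady-state rate is limited by the slower stage. The function name, the per-core rate, the core counts, and the GPU rate are all hypothetical values chosen for illustration, not measurements.

```python
def pipeline_throughput(cpu_images_per_sec_per_core, cpu_cores, gpu_images_per_sec):
    """Steady-state images/sec of a two-stage pipeline (illustrative model).

    Stage 1 (CPU): data loading, preprocessing, batch transformation.
    Stage 2 (GPU): training step.
    Assumes the stages overlap perfectly, so the slower one sets the rate.
    """
    cpu_rate = cpu_images_per_sec_per_core * cpu_cores
    return min(cpu_rate, gpu_images_per_sec)


# Hypothetical numbers, for illustration only (not taken from the paper).
PER_CORE_RATE = 150   # images/sec one CPU core can prepare
GPU_RATE = 2000       # images/sec the GPUs could train on if never starved

for cores in (8, 16):
    rate = pipeline_throughput(PER_CORE_RATE, cores, GPU_RATE)
    limiter = "CPU input stage" if rate < GPU_RATE else "GPU training stage"
    print(f"{cores} cores: {rate} images/sec, limited by the {limiter}")
```

Under these assumed numbers the smaller CPU leaves the GPUs waiting for data, while the larger one moves the bottleneck to the GPUs, which is the qualitative effect the section attributes to the CPU model.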

