Dell PowerEdge C4140 Deep Learning Performance Comparison: Scale-up vs. Scale-out


7.2.5 PowerEdge C4140-M Multi-Node Training vs. Non-Dell EMC 8x V100-16GB-SXM2

Figure 33. Training with PowerEdge C4140-M-V100-16GB-SXM2 (8 GPUs), multi-node, versus Non-Dell EMC SN_8x-V100-16GB-SXM2

Figure 33 above shows the throughput improvement gained by using a server with a higher-capacity CPU. As the table below shows, almost all the models trained with the C4140-M-V100-16GB-SXM2 (8 GPUs, Intel Xeon 6148 CPU) in multi-node mode outperformed the SN_8x-V100. The exception was AlexNet, which still performed below the SN_8x-V100; however, its throughput improved significantly compared with training on the server with the C4140-K-V100-16GB-SXM2 (Intel Xeon 4116 CPU). See the summary in the table below. Throughput is in images/second; % Diff is computed relative to the C4140-M throughput.

Model          SN_8x V100-16GB-SXM2    MN PowerEdge C4140-M-V100-SXM2 16GB    % Diff
Inception-v4          1606                         1993                        19%
VGG-19                2449                         3205                        24%
VGG-16                2762                         3734                        26%
Inception-v3          3077                         3685                        16%
ResNet-50             4852                         5904                        18%
GoogLeNet             7894                        10801                        27%
AlexNet              16977                        14969                       -13%

Table 6: 8x GPU comparison between PowerEdge C4140-M multi-node and 8x SXM2

Architectures & Technologies | Dell EMC | Infrastructure Solutions Group


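The % Diff column in Table 6 can be reproduced from the two throughput columns. A minimal sketch (throughput values and model names taken from the table above; the percentage is computed relative to the C4140-M throughput, which is how the source numbers round out):

```python
# Throughput (images/sec) from Table 6: (SN_8x V100, C4140-M multi-node).
results = {
    "Inception-v4": (1606, 1993),
    "VGG-19":       (2449, 3205),
    "VGG-16":       (2762, 3734),
    "Inception-v3": (3077, 3685),
    "ResNet-50":    (4852, 5904),
    "GoogLeNet":    (7894, 10801),
    "AlexNet":      (16977, 14969),
}

def pct_diff(baseline: int, c4140: int) -> int:
    """Percent difference relative to the C4140-M throughput,
    rounded to whole percent as in Table 6."""
    return round((c4140 - baseline) / c4140 * 100)

for model, (sn_8x, c4140_m) in results.items():
    print(f"{model}: {pct_diff(sn_8x, c4140_m):+d}%")
```

Running this reproduces the column, including AlexNet's -13% (the one model where the C4140-M multi-node configuration trails the 8x SXM2 baseline).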