Dell PowerEdge C4140 Deep Learning Performance Comparison: Scale-up vs. Scale-out


7.2.5 PowerEdge C4140-M Multi-Node Training vs. Non-Dell EMC 8x V100-16GB-SXM2

Figure 33. Training with PowerEdge C4140-M-V100-16GB-SXM2 (8 GPUs), multi-node, versus Non-Dell EMC SN_8x-V100-16GB-SXM2

Figure 33 above shows the throughput improvement gained by using a server with a higher-capacity CPU. As the table below shows, almost all the models trained with the C4140-M-V100-16GB-SXM2 (8 GPUs, Intel Xeon 6148 CPU) in multi-node mode outperformed the SN_8x-V100. The exception was AlexNet, which still performed below the SN_8x-V100; however, its throughput improved significantly compared with training on the server with the C4140-K-V100-16GB-SXM2 (Intel Xeon 4116 CPU). See the summary in the table below. Throughput is in images/second; % Diff is computed relative to the C4140-M throughput.

Model          SN_8x V100-16GB-SXM2    MN PowerEdge C4140-M-V100-SXM2 16GB    % Diff
Inception-v4          1606                         1993                        19%
VGG-19                2449                         3205                        24%
VGG-16                2762                         3734                        26%
Inception-v3          3077                         3685                        16%
ResNet-50             4852                         5904                        18%
GoogLeNet             7894                        10801                        27%
AlexNet              16977                        14969                       -13%

Table 6: 8x GPU comparison between PowerEdge C4140-M multi-node and 8x SXM2

Architectures & Technologies | Dell EMC | Infrastructure Solutions Group


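The % Diff column in Table 6 can be reproduced from the two throughput columns. A minimal sketch (throughput values and model names taken from the table above; the percentage is computed relative to the C4140-M throughput, which is how the source numbers round out):

```python
# Throughput (images/sec) from Table 6: (SN_8x V100, C4140-M multi-node).
results = {
    "Inception-v4": (1606, 1993),
    "VGG-19":       (2449, 3205),
    "VGG-16":       (2762, 3734),
    "Inception-v3": (3077, 3685),
    "ResNet-50":    (4852, 5904),
    "GoogLeNet":    (7894, 10801),
    "AlexNet":      (16977, 14969),
}

def pct_diff(baseline: int, c4140: int) -> int:
    """Percent difference relative to the C4140-M throughput,
    rounded to whole percent as in Table 6."""
    return round((c4140 - baseline) / c4140 * 100)

for model, (sn_8x, c4140_m) in results.items():
    print(f"{model}: {pct_diff(sn_8x, c4140_m):+d}%")
```

Running this reproduces the column, including AlexNet's -13% (the one model where the C4140-M multi-node configuration trails the 8x SXM2 baseline).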