Figure 20: PowerEdge C4140-V100-SXM2 Configuration-K vs PowerEdge C4140-V100-SXM2 Configuration-M
As shown in Figure 21 below, the number of CPU cores does play a role in throughput, and the biggest difference appears when running AlexNet.
7.1.8 What role does the CPU play in Deep Learning?
The CPU plays a major role in the initial phase, data preprocessing. The steps below show a pipeline in which the following four operations happen in parallel (a framework-level sketch follows the list):
a. Train on batch n (on the GPUs)
b. Copy batch n+1 to GPU memory
c. Transform batch n+2 (on the CPU)
d. Load batch n+3 from disk (on the CPU)
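To make the overlap concrete, the following is a minimal tf.data sketch, an illustration only and not the benchmark code used in this paper; the file pattern, record features, and image size are hypothetical placeholders. Background worker threads load and transform upcoming batches on the CPU while the GPUs train on the current batch.

# Minimal tf.data sketch of the overlapped input pipeline described above
# (illustration only; file pattern, feature names, and image size are
# hypothetical placeholders).
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE  # let the runtime pick parallelism and prefetch depth

def parse_and_augment(serialized_example):
    # CPU-side transform step: decode, resize, and augment one example.
    features = tf.io.parse_single_example(
        serialized_example,
        {"image": tf.io.FixedLenFeature([], tf.string),
         "label": tf.io.FixedLenFeature([], tf.int64)})
    image = tf.io.decode_jpeg(features["image"], channels=3)
    image = tf.image.resize(image, [224, 224])
    image = tf.image.random_flip_left_right(image)
    return image, features["label"]

dataset = (
    tf.data.TFRecordDataset(tf.io.gfile.glob("train-*.tfrecord"))  # d. load from disk (CPU)
    .map(parse_and_augment, num_parallel_calls=AUTOTUNE)           # c. transform (CPU)
    .batch(256)
    .prefetch(AUTOTUNE))                                           # b. stage upcoming batches
# a. the training step then consumes these batches on the GPUs while the CPU
#    keeps working ahead on the next ones.

The more CPU cores are available, the more of these map() workers can run concurrently, which is why CPU core count affects throughput, most visibly for a preprocessing-heavy model such as AlexNet.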
The data-processing loop during training is (see the sketch after this list):
a. Load mini-batch
b. Preprocess mini-batch
c. Train on mini-batch
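A minimal sketch of this load / preprocess / train loop follows, again as an illustration only; the model, optimizer, and synthetic input data are hypothetical placeholders, not the benchmark configuration used in this paper.

# Minimal sketch of the load -> preprocess -> train loop (illustration only;
# model, optimizer, and synthetic data are hypothetical placeholders).
import tensorflow as tf

# Synthetic stand-in for a real input pipeline such as the one sketched above.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform([8, 224, 224, 3]),
     tf.random.uniform([8], maxval=1000, dtype=tf.int64))
).batch(4).prefetch(tf.data.AUTOTUNE)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1000)])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

@tf.function
def train_step(images, labels):
    # c. train on the mini-batch (runs on the GPU when one is available)
    with tf.GradientTape() as tape:
        loss = loss_fn(labels, model(images, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for images, labels in dataset:        # a. load mini-batch (prefetched by tf.data)
    images = images / 255.0           # b. preprocess (here, a trivial normalization)
    loss = train_step(images, labels)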