Dell PowerEdge C4140 Deep Learning Performance Comparison - Scale-up vs. Scale-out - Page 49
Deep Learning Performance: Scale-up vs Scale-out

Figure 41: Effect of hyper-parameter tuning on throughput performance

7.4.2 Learning Rate Effect in Distributed Mode

In this experiment, we used the following learning rate schedule: the initial learning rate was set to 0.4 for the first 10 epochs, then decreased to 0.04 until the model reached 60 epochs of training, and finally decreased to 0.004 until the end of training at 90 epochs. Figure 42 and Figure 43 show how the learning rate has a different effect in single-node versus multi-node mode. In multi-node mode, a learning rate update is not reflected immediately; there is a delay until the change is propagated to every processor.

Architectures & Technologies Dell EMC | Infrastructure Solutions Group 48
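The three-step schedule described above can be sketched as a piecewise-constant function of the epoch. This is a minimal, framework-agnostic illustration; the function name `learning_rate` is ours, not from the paper, and the actual experiment would pass such a schedule to the training framework's scheduler or callback mechanism.

```python
def learning_rate(epoch):
    """Piecewise-constant schedule from the experiment:
    0.4 for epochs 0-9, 0.04 for epochs 10-59, 0.004 for epochs 60-89."""
    if epoch < 10:
        return 0.4
    elif epoch < 60:
        return 0.04
    else:
        return 0.004
```

In a distributed run, each worker evaluates this schedule locally, which is one reason a rate change may not take effect on every processor at the same instant.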