Dell PowerEdge C4140 Deep Learning Performance Comparison - Scale-up vs. Scale - Page 49

Learning Rate Effect in Distributed Mode



Deep Learning Performance: Scale-up vs Scale-out
Architectures & Technologies, Dell EMC | Infrastructure Solutions Group, page 48
Figure 41: Effect of the hyper-parameter tuning in the throughput performance
7.4.2 Learning Rate Effect in Distributed Mode
In this experiment, we used the following learning rate schedule: the initial learning rate was
set to 0.4 for the first 10 epochs, then decreased to 0.04 until the model reached 60 epochs of
training, and finally decreased to 0.004 until training ended at 90 epochs.
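
The schedule described here can be written as a small step function. The following is a minimal sketch, not the code used in the experiment: the names learning_rate, LR_STAGES, and TOTAL_EPOCHS are hypothetical, and epochs are assumed to be numbered from 0, so that "the first 10 epochs" means epochs 0 through 9.

```python
# Illustrative sketch only. All names are hypothetical, and epochs are
# assumed to be zero-based (epoch 0 is the first epoch of training).

TOTAL_EPOCHS = 90

# (first epoch of the stage, learning rate), values taken from the text
LR_STAGES = [(0, 0.4), (10, 0.04), (60, 0.004)]


def learning_rate(epoch: int) -> float:
    """Return the learning rate for a zero-based epoch index."""
    if not 0 <= epoch < TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in [0, {TOTAL_EPOCHS}), got {epoch}")
    rate = LR_STAGES[0][1]
    # Stages are sorted by start epoch, so the last one reached applies.
    for start, stage_rate in LR_STAGES:
        if epoch >= start:
            rate = stage_rate
    return rate


if __name__ == "__main__":
    # Print the rate on both sides of each stage boundary.
    for epoch in (0, 9, 10, 59, 60, 89):
        print(epoch, learning_rate(epoch))
```

In a training loop, a function like this would be called once per epoch and its result passed to the optimizer. How that value is propagated to each worker in multi-node mode depends on the framework and is not shown here.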
Figure 42 and Figure 43 show that the learning rate has a different effect in single-node versus
multi-node mode. In multi-node mode, a learning rate update is not reflected immediately; there
is a delay until the change is reflected in every processor.