Dell PowerEdge C4140 Deep Learning Performance Comparison - Scale-up vs. Scale-out - Page 49
Deep Learning Performance: Scale-up vs Scale-out

Figure 41: Effect of hyper-parameter tuning on throughput performance

7.4.2 Learning Rate Effect in Distributed Mode

In this experiment, we used the following learning rate schedule: the initial learning rate was set to 0.4 for the first 10 epochs, then decreased to 0.04 until the model reached 60 epochs of training, and finally decreased to 0.004 until the end of training at 90 epochs. Figure 42 and Figure 43 show how the learning rate has a different effect in single-node versus multi-node mode. In multi-node mode, a learning rate update is not reflected immediately; there is a delay until the change is propagated to every processor.

Architectures & Technologies Dell EMC | Infrastructure Solutions Group 48
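The three-step schedule described above can be sketched as a piecewise-constant function of the epoch. This is a minimal, framework-agnostic illustration; the function name `learning_rate` is ours, not from the paper, and the actual experiment would pass such a schedule to the training framework's scheduler or callback mechanism.

```python
def learning_rate(epoch):
    """Piecewise-constant schedule from the experiment:
    0.4 for epochs 0-9, 0.04 for epochs 10-59, 0.004 for epochs 60-89."""
    if epoch < 10:
        return 0.4
    elif epoch < 60:
        return 0.04
    else:
        return 0.004
```

In a distributed run, each worker evaluates this schedule locally, which is one reason a rate change may not take effect on every processor at the same instant.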