Dell PowerEdge C4140 Deep Learning Performance Comparison: Scale-up vs. Scale-out
Conclusion and Future Work
The PowerEdge C4140, using Nvidia's 4x NVLink architecture, scales relatively well when using the Uber Horovod distributed training library with Mellanox InfiniBand RDMA as the high-speed link between nodes. Table 5 shows that for ResNet-50, the most widely used model, the PowerEdge C4140 in a multi-node configuration is within 7.8% of a single-node non-Dell EMC 8x NVLink system. Moreover, the C4140-M in a multi-node configuration outperforms the single-node 8x NVLink system by at least 18% on ResNet-50. The only disclaimer is that the C4140-M results use the latest versions of the NCCL and TensorFlow containers.

Performance improvements continue to be added at the GPU, library, and framework levels, and we are continuously looking to improve our results by experimenting with different hyperparameters. Future work in this area will include exploring the latest software optimizations released by Nvidia and investigating the fast.ai library, with which Jeremy Howard and the researchers at fast.ai achieved a ResNet-50 training time of 3 hours on 8x V100 GPUs.

Architectures & Technologies Dell EMC | Infrastructure Solutions Group
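The relative figures quoted from Table 5 can be derived directly from measured training throughput (images/sec). The sketch below is illustrative only: the throughput numbers are hypothetical placeholders chosen to reproduce the 7.8% gap and 18% gain, not the actual measurements from Table 5.

```python
# Illustrative sketch: deriving the relative throughput figures quoted above
# from measured training throughput (images/sec).
# The numeric inputs below are hypothetical placeholders, not measurements.

def relative_gap(reference: float, measured: float) -> float:
    """Fraction by which `measured` trails `reference` (positive = slower)."""
    return (reference - measured) / reference

def speedup_over(reference: float, measured: float) -> float:
    """Fraction by which `measured` exceeds `reference` (positive = faster)."""
    return (measured - reference) / reference

# Hypothetical throughputs (images/sec) for illustration only:
single_node_8x_nvlink = 1000.0   # non-Dell EMC 8x NVLink baseline
c4140_multi_node = 922.0         # within 7.8% of the baseline
c4140m_multi_node = 1180.0       # at least 18% above the baseline

print(f"gap:  {relative_gap(single_node_8x_nvlink, c4140_multi_node):.1%}")   # gap:  7.8%
print(f"gain: {speedup_over(single_node_8x_nvlink, c4140m_multi_node):.1%}")  # gain: 18.0%
```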