Dell PowerEdge C4140 Deep Learning Performance Comparison - Scale-up vs. Scale-out - Page 46
Deep Learning Performance: Scale-up vs Scale-out

Figure 39: Long training tests to extract accuracy convergence and training time with PowerEdge C4140 Configuration-K multi-node and single-node 8x V100-SXM2 with different models

Figure 39 above compares the 8x V100-SXM2 scale-up server with the PowerEdge C4140 Configuration-K in a multi-node configuration using ResNet-50. The training-time difference between the two is within 7%, which shows that Mellanox InfiniBand RDMA allows the multi-node PowerEdge C4140 to achieve performance similar to a scale-up server.

To show the impact of the CPU on training deep learning workloads to accuracy convergence, we ran additional tests configuring the multi-node system with PowerEdge C4140-V100-SXM2 Configuration-M servers and the Intel Xeon 6148 CPU. In Figure 40, this multi-node system trains ResNet-50 1.3x faster than SN-8xV100 across several batch sizes. This again shows the relationship between the CPU model and deep learning performance: most of the data loading, data preprocessing, and batch-transformation tasks occur on the CPU, whereas the training tasks occur on the GPU.

Architectures & Technologies Dell EMC | Infrastructure Solutions Group 45
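The CPU-vs-GPU division of labor described above can be sketched as a producer/consumer pipeline. This is an illustrative stand-in (not Dell's benchmark code): CPU-side preprocessing feeds batches through a bounded prefetch queue to a training consumer, which is why a faster CPU such as the Xeon 6148 can raise end-to-end throughput when the data pipeline is the bottleneck. All function names here are hypothetical.

```python
import queue
import threading

def preprocess_batches(num_batches, out_q):
    """CPU stage: simulate loading and transforming input batches."""
    for i in range(num_batches):
        batch = [x * 0.5 for x in range(8)]  # stand-in for decode/augment work
        out_q.put((i, batch))
    out_q.put(None)  # sentinel: no more data

def train(in_q):
    """GPU stage: consume batches and run training steps (simulated)."""
    steps = 0
    while True:
        item = in_q.get()
        if item is None:
            break
        steps += 1  # a real trainer would run forward/backward passes here
    return steps

def run_pipeline(num_batches=16, queue_depth=4):
    # Bounded queue acts as the prefetch buffer between CPU and GPU stages;
    # if preprocessing is slow, the consumer starves and throughput drops.
    q = queue.Queue(maxsize=queue_depth)
    producer = threading.Thread(target=preprocess_batches, args=(num_batches, q))
    producer.start()
    steps = train(q)
    producer.join()
    return steps

print(run_pipeline())  # prints 16: one training step per preprocessed batch
```

Frameworks implement this overlap for real (e.g. background data-loader workers), but the structure is the same: preprocessing and training run concurrently, so the slower stage sets the pace.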