Dell PowerEdge C4140 Deep Learning Performance Comparison - Scale-up vs. Scale-out

Testing Methodology

Figure 7: Testing Methodology Workflow (Phase 1 → Phase 2 → Phase 3)

1) Phase 1 - In this phase we performed basic tests, such as PCIe bandwidth and latency tests, to ensure the results align with what we expect based on theoretical numbers (a worked example of such a theoretical number appears after this list). We then ran the Baidu DeepBench benchmarks to evaluate the deep learning performance of the accelerators and the system. The results for this phase are presented in a separate whitepaper.
2) Phase 2 - We used the official TensorFlow benchmarks, which were run across several servers.
3) Phase 3 - To benchmark the servers in distributed mode, we used Horovod, which is a distributed training framework for TensorFlow (a minimal sketch of the required script changes follows this list).
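
For context on what "theoretical numbers" means for the Phase 1 PCIe tests, the peak bandwidth of a link can be computed directly. Below is a minimal sketch in Python, assuming a PCIe Gen3 x16 link (an assumption on our part, typical for GPU servers of this class); measured bandwidth lands below this once protocol overhead is included:

    # Theoretical peak bandwidth of a PCIe Gen3 x16 link (assumed configuration).
    lanes = 16                # x16 link width
    transfer_rate = 8e9       # PCIe Gen3: 8 GT/s per lane
    encoding = 128 / 130      # 128b/130b line encoding
    peak_bytes_per_s = lanes * transfer_rate * encoding / 8
    print(f"Theoretical peak: {peak_bytes_per_s / 1e9:.2f} GB/s")  # ~15.75 GB/s
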
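To illustrate the Phase 3 setup, the sketch below shows the standard pattern for adapting a TensorFlow (1.x-era) training script to Horovod. The model, optimizer, learning rate, and step count here are placeholder choices for illustration, not the benchmark's actual code:

    import tensorflow as tf
    import horovod.tensorflow as hvd

    hvd.init()  # Horovod runs one process per GPU

    # Pin each worker process to a single local GPU.
    config = tf.ConfigProto()
    config.gpu_options.visible_device_list = str(hvd.local_rank())

    loss = build_model()  # placeholder: model definition goes here

    # Scale the learning rate by the worker count (a common Horovod convention)
    # and wrap the optimizer so gradients are averaged across workers.
    opt = tf.train.MomentumOptimizer(0.01 * hvd.size(), momentum=0.9)
    opt = hvd.DistributedOptimizer(opt)
    global_step = tf.train.get_or_create_global_step()
    train_op = opt.minimize(loss, global_step=global_step)

    hooks = [
        # Broadcast initial variables from rank 0 so all workers start identically.
        hvd.BroadcastGlobalVariablesHook(0),
        tf.train.StopAtStepHook(last_step=1000),  # illustrative step count
    ]
    with tf.train.MonitoredTrainingSession(config=config, hooks=hooks) as sess:
        while not sess.should_stop():
            sess.run(train_op)

Such a script is launched with one process per GPU across the servers, for example with mpirun or Horovod's horovodrun launcher.
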
4.1 Testing Methodology

The testing methodology for Phase 1, together with its results, is covered in detail in a separate whitepaper. Here we describe the methodology we used in Phase 2 and Phase 3. To establish a baseline, we divided the testing into short tests and long tests.

4.1.1 Short Test

The short tests consisted of 10 warm-up steps followed by another 100 steps, which were averaged to obtain the actual throughput. The benchmarks were first run with 1 GPU to establish a baseline number of images/sec, and the GPU count under test was then increased up to the number of GPUs the system supports (a sketch of this measurement loop follows).
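
As a concrete illustration of the short-test procedure, the sketch below separates the warm-up steps from the timed steps; run_step and the batch size are placeholders of ours, not the benchmark's actual interface:

    import time

    WARMUP_STEPS = 10   # excluded from the measurement
    TIMED_STEPS = 100   # steps averaged into the reported throughput
    BATCH_SIZE = 64     # illustrative number of images processed per step

    def short_test(run_step):
        # Warm-up: lets GPU clocks, caches, and autotuning settle.
        for _ in range(WARMUP_STEPS):
            run_step()

        # Timed run: only these steps contribute to the reported number.
        start = time.time()
        for _ in range(TIMED_STEPS):
            run_step()
        elapsed = time.time() - start

        # Average throughput over the timed steps, in images/sec.
        return TIMED_STEPS * BATCH_SIZE / elapsed

The official TensorFlow benchmark scripts expose the same idea through warm-up and step-count options, so the reported images/sec reflects steady-state throughput rather than startup cost.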