


Deep Learning Performance: Scale-up vs Scale-out
1 Overview
The objective of this whitepaper is to compare Dell's PowerEdge acceleration-optimized servers and determine their performance when running deep learning workloads. The purpose is to highlight how Dell's scale-out solution is ideally suited for these emerging workloads.

We compare how the PowerEdge C4140 performs using TensorFlow, one of the most popular deep learning frameworks, with various neural network architectures, and measure it against other acceleration-optimized servers in the market that target the same workloads. The idea is to investigate whether its architectural implementation helps the PowerEdge C4140 utilize its accelerators more effectively, both in hardware-level benchmarks such as Baidu DeepBench and in TensorFlow-based benchmarks.
Baidu DeepBench lets us profile kernel operations, the lowest-level compute and communication primitives of deep learning (DL) applications, and thus see how the accelerators perform at the component level in different server systems. This is important because it shows which hardware provides the best performance on the basic operations used by deep neural networks. DeepBench includes operations and workloads that are important to both training and inference.
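To make the component-level idea concrete, the following is a minimal sketch of a DeepBench-style microbenchmark: timing a single GEMM kernel in TensorFlow. The matrix shape and iteration count here are illustrative assumptions; the actual DeepBench suite runs a fixed set of kernel shapes drawn from production DL models.

```python
import time
import tensorflow as tf

M, N, K = 5124, 700, 2048   # illustrative GEMM shape, not DeepBench's official list

a = tf.random.normal([M, K])
b = tf.random.normal([K, N])

@tf.function
def gemm(x, y):
    return tf.matmul(x, y)

_ = gemm(a, b)              # warm-up: compile/autotune outside the timed region

iters = 100
start = time.perf_counter()
for _ in range(iters):
    out = gemm(a, b)
_ = out.numpy()             # block until the device stream drains before stopping the clock
elapsed = time.perf_counter() - start

flops = 2.0 * M * N * K * iters   # 2*M*N*K floating-point operations per GEMM
print(f"avg kernel time: {elapsed / iters * 1e3:.3f} ms, "
      f"{flops / elapsed / 1e12:.2f} TFLOP/s")
```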
Using TensorFlow as the primary framework, we compare performance in terms of throughput and the training time needed to reach a target accuracy on the ImageNet dataset. We look at performance at both the single-node and the multi-node level, using popular neural network architectures such as ResNet-50, VGGNet, GoogLeNet, and AlexNet.
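As a rough illustration of how such a throughput number can be obtained, here is a minimal sketch that measures training images/sec for ResNet-50 on synthetic data with tf.keras. The batch size and step counts are assumptions chosen for illustration; the measurements in this whitepaper use the real ImageNet dataset and established TensorFlow benchmark scripts rather than this loop.

```python
import time
import tensorflow as tf

batch_size = 64   # illustrative; real runs tune this per GPU
model = tf.keras.applications.ResNet50(weights=None, classes=1000)
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

# Synthetic ImageNet-shaped batch, repeated so only compute (not I/O) is measured.
images = tf.random.uniform([batch_size, 224, 224, 3])
labels = tf.random.uniform([batch_size], maxval=1000, dtype=tf.int32)
data = tf.data.Dataset.from_tensors((images, labels)).repeat()

model.fit(data, steps_per_epoch=10, epochs=1, verbose=0)   # warm-up

steps = 50
start = time.perf_counter()
model.fit(data, steps_per_epoch=steps, epochs=1, verbose=0)
elapsed = time.perf_counter() - start
print(f"throughput: {steps * batch_size / elapsed:.1f} images/sec")
```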
1.1 Definitions
Scale up: Scale up is achieved by putting the workload on a bigger, more powerful server (e.g., migrating from a two-socket server to a four- or eight-socket x86 server in a rack-based or blade form factor). This is a common way to scale databases and several other workloads. It has the advantage of allowing organizations to avoid making significant changes to the workload; IT managers can simply install the workload on a bigger box and keep running it the way they always have.
Scale out: Scale out refers to expanding to multiple servers rather than to a single bigger server. A prime example is the use of availability and clustering software (ACS) and its server node management, which enables IT managers to move workloads from one server to another or to combine them into a single computing resource. Scale out adds flexibility by allowing IT organizations to add nodes as the number of users or workloads increases, which helps keep IT budgets under better control.
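In the deep learning context of this whitepaper, scale out corresponds to multi-node training. As a minimal sketch of what that can look like in TensorFlow, the snippet below wires two workers together with MultiWorkerMirroredStrategy; the host names, port, and cluster layout are hypothetical placeholders, and the whitepaper does not prescribe this particular API.

```python
import json
import os
import tensorflow as tf

# Each node sets TF_CONFIG with the full cluster spec and its own index
# (index 0 on node1, index 1 on node2). Hosts and port are hypothetical.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["node1:12345", "node2:12345"]},
    "task": {"type": "worker", "index": 0},
})

# Must be created early, before other TensorFlow ops run.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Variables created under the strategy scope are replicated, and
    # gradients are all-reduced across the worker nodes every step.
    model = tf.keras.applications.ResNet50(weights=None, classes=1000)
    model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

# model.fit(...) would then run data-parallel training across both workers.
```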