Dell PowerEdge SDS 100: Improving NFS Performance on HPC Clusters with Dell Fluid Cache for DAS
Dell PowerEdge SDS 100 manual content summary (truncated page excerpts; ellipses mark text cut at page boundaries):
Page 1: Improving NFS Performance on HPC Clusters with Dell Fluid Cache for DAS. This Dell technical white paper explains how to improve Network File System I/O performance by using Dell Fluid Cache for Direct Attached Storage in a High Performance Computing cluster. Garima Kochhar, Dell HPC Engineering, March 2013, Version 1.0.
Page 2: … or omissions in typography or photography. Dell, the Dell logo, PowerVault, and PowerEdge are trademarks of Dell Inc. Intel and Xeon are registered trademarks of Intel Corporation in the U.S. and other countries. Microsoft, Windows, and Windows Server are either trademarks or registered trademarks …
Page 3 (table of contents excerpt):
2.4. Solution tuning 12
2.4.1. Storage 13
2.4.2. NFS server 13
2.4.3. Dell Fluid Cache for DAS 14
2.5. Tracking solution health and performance 14
2.5.1. Server health and monitoring 14
2.5.2. Dell PowerEdge Express Flash PCIe SSD health and monitoring 14
2.5.3. Dell Fluid Cache for DAS …
Page 4 (lists of tables and figures, excerpt):
Table 1. NFS server and storage hardware configuration 8
Table 2. NFS server software and firmware configuration 9
Table 3. Hardware configuration for DFC 10
Table 4. Software and firmware configuration 11
… Metadata file remove performance 22
Figure 12 …
Page 5: … at a reasonable cost, but with an inherent performance limitation for random I/O patterns. This technical white paper describes how to improve I/O performance in such an NFS storage solution with the use of Dell Fluid Cache for DAS (DFC) technology. It describes the solution and presents cluster …
Page 6: … the DFC technology. The baseline for comparison is an NFS server with direct-attached external SAS storage. The configuration of this NFS server is augmented with PCIe SSDs and DFC software for the DFC comparison. A 64-server Dell PowerEdge cluster was used as I/O clients to provide I/O load to the …
Page 7: … sections provide details on each of these components, as well as information on tuning and monitoring the solution.
2.1. NFS storage solution (baseline)
The baseline in this study is an NFS configuration. One PowerEdge R720 is used as the NFS server. PowerVault …
Page 8: … 2. NFS server
Table 1. NFS server and storage hardware configuration
NFS server: PowerEdge R720
Processors: Dual Intel Xeon E5-2680 @ 2.70 GHz
Memory: 128 GB (16 x 8 GB 1600 MT/s RDIMMs)
Internal disks: 5 x 300 GB 15K SAS disks; two drives configured in RAID 0 for …
Page 9: … 00.00.06.14-rh1
InfiniBand firmware: 2.11.500
InfiniBand driver: Mellanox OFED 1.5.3-3.1.0
The baseline described in this section is very similar to the Dell NSS. One key difference is the use of a single RAID controller to connect to all four storage arrays. In a pure-NSS environment, two …
Page 10: …
NFS server: PowerEdge R720
Cache pool: Two 350 GB Dell PowerEdge Express Flash PCIe SSDs
SSD controller: Internal (slot 4)
The rest of the server configuration is the same as the baseline, as described in Table 1.
Storage configuration: same as the baseline, as described in Table 1.
Table 4. Software and firmware …
Page 11: … master node.
Client I/O compute node configuration
Compute node: PowerEdge M420 blade server
Processors: Dual Intel Xeon E5-2470 @ 2.30 GHz
Memory: 48 GB (6 x 8 GB 1600 MT/s RDIMMs)
Internal disk: 1 x 50 GB SATA SSD
Internal RAID controller: PERC H310 Embedded
Cluster administration interconnect: …
Page 12: … server and the attached storage arrays are configured and tuned for optimal performance. These options were selected based on extensive studies of this Dell storage solution. Detailed instructions on configuring this storage solution are provided in Appendix A: Step-by-step configuration of Dell …
Page 13: … for large capacity at a cost-effective price point.
• Virtual disks are created using a RAID 60 layout. Each RAID 6 span consists of 10 data disks and 2 parity disks, and the stripe runs across all four storage enclosures. This RAID configuration provides a good balance between capacity, reliability …
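The capacity arithmetic implied by this layout can be sketched as follows. The 48-drive count and 3 TB drive size are assumptions inferred from the four 12-bay MD1200 enclosures and the 144 TB raw capacity quoted elsewhere in the paper; they are not stated on this page.

```shell
# Capacity math for the RAID 60 layout described above:
# four RAID 6 spans of 10 data + 2 parity disks, striped across the enclosures.
# Assumes 48 x 3 TB drives (144 TB raw) -- inferred, not quoted from this page.
DISKS=48; DISK_TB=3
SPANS=4; DATA_PER_SPAN=10
RAW_TB=$((DISKS * DISK_TB))                    # all spindles
USABLE_TB=$((SPANS * DATA_PER_SPAN * DISK_TB)) # data spindles only
echo "raw=${RAW_TB}TB usable=${USABLE_TB}TB"
```

Under these assumptions the layout trades 24 TB (two parity disks per span) for double-disk-failure protection within each span.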
Page 14: … server and storage management components be installed on the NFS server in this solution. OMSA v7.1.2 provides support for DFC.
2.5.2. Dell PowerEdge Express Flash PCIe SSD health and monitoring
Configuring the PCIe SSDs in the NFS server is straightforward. The drivers are controlled …
Page 15: … (PBW). For the recommended 350 GB SSD drive, the standard warranty is 3 years, 25 PBW. The health of the device can be monitored using Dell OMSA utilities. OMSA reports the SSD "Device Life Remaining" and "Failure Predicted". "Device Life Remaining" is an indication of the amount of data written …
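To put the 3-year, 25 PBW warranty in perspective, it can be restated as full drive writes per day. This arithmetic is our illustration, not a Dell specification, and uses decimal units and integer division.

```shell
# Restate the 350 GB drive's 25 PB-written warranty as drive writes per day.
# An estimate in decimal units; not a figure from the white paper.
PBW=25
DRIVE_GB=350
DAYS=$((3 * 365))                  # 3-year warranty period
TOTAL_GB=$((PBW * 1000 * 1000))    # 25 PB expressed in GB
DWPD=$((TOTAL_GB / DRIVE_GB / DAYS))
echo "full-drive-writes-per-day=${DWPD}"
```

Roughly 65 full drive writes per day over three years, which is why SSD wear is rarely the limiting factor in a cache-pool deployment like this one.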
Page 16: … amount of data written was 256 GB. (Recall from Table 1 that the NFS server RAM is 128 GB.) This ensures that the total I/O exceeds the NFS server memory, since the goal is to test the disk and storage solution performance. The small random tests were performed with 4 KB record sizes, since the …
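Because the 256 GB aggregate is fixed, the per-thread file size shrinks as the client count grows. A sketch of that sizing rule; the 64-thread count is a hypothetical example, not the paper's exact run:

```shell
# Per-thread IOzone file size so that the aggregate (256 GB) is twice the
# NFS server's 128 GB of RAM, defeating server-side caching.
AGGREGATE_GB=256
THREADS=64                            # example thread count (hypothetical)
PER_THREAD_GB=$((AGGREGATE_GB / THREADS))
echo "per-thread-file-size=${PER_THREAD_GB}GB"
```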
Page 17: … configuration builds on the baseline, but here DFC is configured in Write-Through (WT) mode. WT mode forces writes to both the cache and the backend virtual disk. … due to two factors: the pure sequential write performance of the SSDs is lower than that of the storage array, and the write-back cache has to replicate dirty …
Page 18: [Figure 5. Large sequential write performance ("Sequential writes"; y-axis: throughput in MiB/s).] … show the peak read throughput for the baseline is ~2,500 MiB/s. With the DFC configuration, reads are 13% to 60% …
Page 19: … better than the baseline. … random writes to their files. The baseline configuration can sustain ~1,600 IOPS on writes. Random writes are limited by the capability of the RAID controller and the seek latency of the disks in the backend …
Page 20: … unlinked concurrently from multiple NFS clients on the NFS server. These results are presented in Figure 9 and Figure 10 … threads of the benchmark. From the figures it can be seen that the DFC-WT and DFC-WB create and remove tests have similar performance. This indicates …
Page 21: [Figure legend: x-axis, number of concurrent clients; series: baseline, DFC-WB, DFC-WT.] File create and file remove tests show similar results, with the baseline outperforming the DFC configuration. File stat is a read operation, and here the DFC configuration outperforms the baseline by up to 80%. Figure 10. Metadata file stat …
Page 22: … test iterations. This was done to eliminate the impact of caching in client and server RAM, and to present true disk performance. However, with DFC there is another layer of caching: the DFC cache pool, that is, the SSDs. DFC treats the SSDs plus the backend virtual disk as part of the DFC …
Page 23: … Figure 12 shows that on a cold-cache read for the sequential tests, the throughput of the DFC configuration drops from a peak of ~3,050 MiB/s to ~1,050 MiB/s. Data needs to be pulled from backend storage, hence the drop in performance …
Page 24: … Figure 13 shows that on a cold-cache read for the random tests, the peak IOPS of the DFC configurations drop from … activity is being served mostly by the cache, while a negative value indicates more disk I/O is being performed than cache I/O. Figure 14 plots …
Page 25 (references, excerpt):
… Solid State Drive vs. Hard Disk Drive Price and Performance Study, http://www.dell.com/downloads/global/products/pvaul/en/ssd_vs_hdd_price_and_performance_study.pdf
3. Dell Fluid Cache for DAS: Dell Fluid Cache for DAS User's Guide at www.dell.com/support
http://www.dell.com/us/enterprise/p/poweredge-r720 …
Page 26 (references, excerpt):
… -management/w/wiki/1760.openmanageserver-administrator-omsa.aspx
8. Dell PowerEdge Express Flash PCIe SSD: www.dell.com/poweredge/expressflash
http://support.dell.com/support/edocs/storage/Storlink/PCIe%20SSD/UG/en/index.htm
http://content.dell.com/us/en/home/d/solutions/limited-hardware-warranties …
Page 27: … Dell Fluid Cache for DAS (DFC) software. A PowerEdge R720 is used as the NFS server. This solution uses four PowerVault MD1200s as the attached storage to provide 144 TB of raw capacity. Refer to Table 1, Table 2, Table 3, and Table 4 for the complete hardware and firmware configuration of the server …
Page 28: … disks of the PowerEdge R720. This can be done through the Ctrl+R menu on server boot-up.
• One RAID 1 virtual disk on two drives. This will be used for the operating system. Configure one additional drive as the hot spare for this RAID group.
• One RAID 0 virtual disk on two drives. This will be …
Page 29: … fldc is running...
[root@nss-rna ~]#
9. Install the appropriate Dell drivers for the PowerEdge R720. At a minimum, the PCIe SSD driver must be updated; v2.1.0 is the recommended version.
10. Install Mellanox OFED 1.5.3-3.1.0 for RHEL 6.3 on the server.
• First build Mellanox OFED for the errata kernel …
Page 30: … the device name for the RAID 0 virtual disk (/dev/sdb) from the output of the command below.
[root@nfs-dfc ~]# omreport storage controller
List of Controllers in the system
Controllers
ID : 0
Status : Ok
Name : PERC H710P …
Page 31: … all these changes to take effect.
A.4. Virtual disk configuration
Create a virtual disk on the PowerVault MD1200s. This disk will be used as the NFS storage. All the commands in this section are executed on the PowerEdge R720 NFS server.
1. Note the controller ID of the PERC H810 adapter. From the …
Page 32: …
Status : Ok
Name : PERC H810 Adapter
Slot ID : …
State : …
Firmware Version : …
Minimum Required Firmware Version : …
Driver Version : …
Minimum Required Driver Version : …
Storport Driver Version : …
Minimum Required Storport Driver Version : …
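Once the H810's controller ID is known, the RAID 60 virtual disk can be created with omconfig. The sketch below is an assumption based on standard OMSA CLI syntax; the controller ID, physical-disk list, and cache policies are placeholders and must be taken from the actual omreport output and the paper's appendix, not from this excerpt.

```shell
# Hypothetical omconfig invocation -- verify controller ID, pdisk list, and
# policies on the real system before running. Not a command quoted from the
# white paper.
omconfig storage controller action=createvdisk controller=1 \
    raid=r60 size=max stripesize=512kb \
    readpolicy=ara writepolicy=wb \
    pdisk=0:0:0,0:0:1,0:0:2   # ...continue the list for all member disks
```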
Page 33: … Policy : 512 KB : Disabled
A.5. XFS and DFC configuration
In this final step of the configuration on the server, the XFS file system is created, DFC is configured, and the storage is exported to the I/O clients via NFS.
1. Create the XFS file system on the RAID 60 virtual disk attached to the PERC …
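The XFS-and-export step might look like the following sketch. The device name, mount point, and export options are placeholders, and the su/sw values assume the 512 KB stripe element size shown above with 40 data disks (4 spans x 10) in the RAID 60 set; the paper's appendix has the authoritative commands.

```shell
# Hypothetical sketch of A.5 -- device name and mount point are placeholders.
# su matches the 512 KB stripe element; sw=40 assumes 4 spans x 10 data disks.
mkfs.xfs -d su=512k,sw=40 /dev/sdc
mkdir -p /nfs/share
mount -t xfs /dev/sdc /nfs/share
echo '/nfs/share *(rw,no_root_squash)' >> /etc/exports
exportfs -ra
service nfs start
```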
Page 34: …
5. Dell OpenManage Server Administrator provides a GUI to configure, administer, and monitor the server. Browse to https://localhost:1311 on the NFS server to see this GUI.
6. Enable IP ports on both the cluster servers. The list of ports to be enabled is in the Red Hat Storage Administration Guide …
Page 35: … for these tests and installed on the compute nodes. … server.
• Mount the XFS file system on the server and start the NFS service.
• Mount the NFS share on the clients.
In addition, for the cold-cache tests described in Section 3.4, the disk managed by Dell Fluid Cache for DAS was disabled and the SSDs … of threads … -e …
Page 36: IOzone arguments:
-t : Number of threads
-+m : Location of clients to run IOzone on when in clustered mode
-w : Does not unlink (…
-I : … bypass the cache on the compute node on which we are running the IOzone thread
-O : …
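Putting these arguments together, a clustered sequential-write run might be launched as below. The thread count, record size, and client-list path are illustrative assumptions (the paper's exact invocations are not in this excerpt); the per-thread size follows the 256 GB aggregate discussed earlier.

```shell
# Build a hypothetical clustered IOzone command from the arguments above.
# 64 threads and the 1024 KB record size are examples, not the paper's run.
THREADS=64
PER_THREAD_GB=$((256 / THREADS))   # keep the aggregate at 256 GB
CMD="iozone -i 0 -c -e -w -I -s ${PER_THREAD_GB}g -r 1024k -t ${THREADS} -+m ./clientlist"
echo "$CMD"
```

Here -i 0 selects the write/rewrite test, -c and -e include close and flush times in the measurement, and ./clientlist is the -+m file naming the client nodes.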
Page 37: … B.2. mdtest
mdtest can be downloaded from http://sourceforge.net/projects/mdtest/. Version 1.8.3 was used in these tests. It was compiled and installed … Remove the service and umount the XFS file system on the server. • Mount the XFS file system on the server and start the NFS service …
Page 38: Metadata file and directory creation test:
# … /share/filedir -i 6 -b 320 -z 1 -L -I 3000 -y -u -t -R -T
Metadata file and directory removal test:
# mpirun -np 32 -rr --hostfile ./hosts /nfs/share/mdtest -d /nfs/share/filedir -i 6 …