Dell PowerEdge External Media System 1434 Improving NFS performance on HPC clu - Page 24
Improving NFS Performance on HPC Clusters with Dell Fluid Cache for DAS

Figure 13 shows that on a cold-cache read for the random tests, peak IOPS of the DFC configurations drops from ~123,000 IOPS to ~80,000 IOPS. Interestingly, this is still higher than the baseline of ~9,300 IOPS, for reasons explained below.

Figure 14 helps explain why the cold-cache read behavior differs between sequential and random I/O. It uses the output of fldcstat, a utility provided by DFC that displays statistics for the DFC configuration.

[Figure 14. CacheIO-DiskIO on cold-cache reads. Chart title: "fldcstat - CacheIO-DiskIO". Y-axis: CacheIO-DiskIO in MiB, from -10000 to 8000 (negative values are from disk, positive values are from cache). Series: Random Reads-coldcache, Sequential Reads-coldcache.]

One piece of information provided by fldcstat is CacheIO-DiskIO. This is the number of bytes read from and written to the cache device minus the number of bytes read from and written to the disk. A positive value shows that I/O is being served mostly by the cache, while a negative value indicates that more disk I/O than cache I/O is being performed.

Figure 14 plots CacheIO-DiskIO over the duration of the cold-cache sequential read and random read tests for the 64 concurrent client test case. The figure shows that during sequential reads a burst of disk I/O is followed by cache I/O, and that this pattern repeats for the duration of the test: data is pulled from the backend disk to the SSDs, served from the SSDs, and the cycle starts again. For the random read tests, however, there is backend disk I/O at the start, but the cache quickly warms up and subsequent read requests are satisfied directly from the cache. This helps explain why cold-cache sequential reads have low throughput: lower than the DFC-WB configuration, and also lower than the baseline.
Cold-cache random reads do take a hit in IOPS compared to the DFC-WB configuration. However, because of the small I/O block size (4k) and the longer duration of the test itself, they show significantly better performance than the baseline, since the DFC cache warms up very quickly.
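As a rough illustration of how the CacheIO-DiskIO value described above can be read, the following Python sketch computes the difference from four byte counters and reports which side dominates. The function names, the counter layout, and the sample numbers are assumptions made for illustration only. They are not fldcstat's actual output format and they are not measurements from these tests.

```python
MIB = 1024 * 1024  # bytes per MiB


def cache_minus_disk_mib(cache_read, cache_written, disk_read, disk_written):
    """Return (cache bytes - disk bytes) in MiB for one sampling interval.

    All four arguments are byte counts. This layout is hypothetical; it only
    mirrors the definition "bytes read and written to the cache device minus
    bytes read and written to the disk".
    """
    cache_io = cache_read + cache_written
    disk_io = disk_read + disk_written
    return (cache_io - disk_io) / MIB


def dominant_source(delta_mib):
    """Positive: mostly cache I/O. Negative: mostly disk I/O."""
    if delta_mib > 0:
        return "cache"
    if delta_mib < 0:
        return "disk"
    return "balanced"


# Made-up samples (cache_read, cache_written, disk_read, disk_written) in bytes.
samples = [
    (0, 512 * MIB, 4096 * MIB, 0),    # interval dominated by backend disk reads
    (6144 * MIB, 0, 1024 * MIB, 0),   # interval dominated by reads from the cache
]

for sample in samples:
    delta = cache_minus_disk_mib(*sample)
    print(f"{delta:+.0f} MiB -> mostly {dominant_source(delta)}")
```

A series of such values that swings between negative and positive would correspond to the repeating disk-then-cache cycle described for sequential reads. A series that starts negative and then stays positive would correspond to the quick warm-up described for random reads.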