HP StorageWorks Scalable File Share User Manual, Page 72

A.4 One Shared File
Frequently in HPC clusters, a number of clients share one file for either reading or writing. For
example, each of N clients could write 1/N of a large file as a contiguous segment. Throughput
in such a case depends on the interaction of several parameters, including the number of clients,
the number of OSTs, the stripe size, and the I/O size.
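As an illustrative sketch (not taken from this manual), the contiguous-segment pattern amounts to each client seeking to its own offset before writing. The mount point, client count, and segment size below are assumptions, and the per-client command is printed rather than executed:

```shell
# Hypothetical sketch: each of N clients writes a contiguous 1/N segment
# of one shared file. Client i seeks to i * SEG_MB megabytes.
# N, SEG_MB, and FILE are illustrative assumptions, not values from the manual.
N=4
SEG_MB=256
FILE=/mnt/sfs/shared_file
for i in $(seq 0 $((N - 1))); do
  # Each client would run a command like this (printed here, not executed):
  echo "client $i: dd if=/dev/zero of=$FILE bs=1M count=$SEG_MB seek=$((i * SEG_MB)) conv=notrunc"
done
```

With `conv=notrunc`, each client's `dd` writes its own region without truncating the parts of the file written by the other clients.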
Generally, when all the clients share one file striped over all the OSTs, throughput is roughly
comparable to when each client writes its own file striped over all the OSTs. In both cases, every
client talks to every OST at some point, and there will inevitably be busier and quieter OSTs at
any given time. OSTs slightly slower than the average tend to develop a queue of waiting requests,
while slightly faster OSTs do not. Throughput is limited by the slowest OST. Random distribution
of the load is not the same as even distribution of the load.
In specific situations, performance can be improved by carefully choosing the stripe count, stripe
size, and I/O size so that each client talks to only one OST, or to a subset of the OSTs.
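On Lustre, striping is set with the `lfs setstripe` command; a minimal sketch follows. The directory path and stripe size are illustrative assumptions, and the option letter for stripe size has varied across Lustre releases (`-s` on older versions, `-S` on newer ones), so check `lfs help setstripe` on your system:

```shell
# Hypothetical sketch: give files created under this directory a stripe
# count of 1, so each client's I/O touches only a single OST.
# The directory path and stripe size are illustrative assumptions.
STRIPE_COUNT=1
STRIPE_SIZE=4M          # match the application's I/O size where possible
DIR=/mnt/sfs/app_output
# Printed rather than executed here; run the command on a live file system:
echo "lfs setstripe -c $STRIPE_COUNT -S $STRIPE_SIZE $DIR"
```

Files subsequently created in that directory inherit the layout, so each client writing its own file there talks to one OST only.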
Another situation in which a file is shared among clients involves all the clients reading the same
file at the same time. In a test of this situation, 16 clients read the same 20 GB file simultaneously
at a rate of 4200 MB/s. The file must be read from the storage array multiple times, because Lustre
does not cache data on the OSS nodes. These reads might benefit from the read cache of the
arrays themselves, but not from caching on the server nodes.
A.5 Stragglers and Stonewalling
All independent processes involved in a performance test are synchronized to start simultaneously.
However, they normally do not all end at the same time for a number of reasons. The I/O load
might not be evenly distributed over the OSTs, for example if the number of clients is not a
multiple of the number of OSTs. Congestion in the interconnect might affect some clients more
than others. Also, random fluctuations in the throughput of individual clients might cause some
clients to finish before others.
Figure A-8 shows this behavior. Here, 16 processes read individual files. For most of the test run,
throughput is about 4000 MB/s, but as the fastest clients finish, the remaining stragglers
generate less load and the total throughput tails off.
Figure A-8 Stonewalling
The standard measure of throughput is the total amount of data moved divided by the total
elapsed time until the last straggler finishes. This average over the entire elapsed time is shown
by the lower, wider box in Figure A-8. Clearly, the system can sustain a higher throughput while
all clients are active, but the time average is pulled down by the stragglers. In effect, the result
is the number of clients multiplied by the throughput of the slowest client. This is the throughput
that would be seen by an application that has to wait at a barrier for all I/O to complete.
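The arithmetic can be sketched as follows; the data volume and finish time below are illustrative assumptions in the spirit of Figure A-8, not measured values from this manual:

```shell
# Hypothetical numbers illustrating why stragglers pull the average down.
DATA_MB=$((16 * 20480))   # 16 clients x 20 GB each (assumption)
ACTIVE_RATE=4000          # MB/s sustained while all clients are active
LAST_FINISH=90            # seconds until the slowest client finishes (assumption)
AVG=$((DATA_MB / LAST_FINISH))
echo "time-average: $AVG MB/s vs $ACTIVE_RATE MB/s while all clients active"
```

With these numbers, the time average comes out noticeably below the rate sustained while all clients were active, which is the stonewalling gap the figure illustrates.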
72 HP SFS G3 Performance