Friday, May 15, 2009

Is it really a random load?

I'm running some benchmarks in the background, and I wanted to verify whether the workload filebench is generating is actually random within a large file. The file is 100GB in size, but the workload is supposed to do random reads only to the first 70GB of it.

# dtrace -n io:::start'/args[2]->fi_name == "00000001"/ \
{@=lquantize(args[2]->fi_offset/(1024*1024*1024),0,100,10);}' \
-n tick-3s'{printa(@);}'
[...]
0 49035 :tick-3s

           value  ------------- Distribution ------------- count
             < 0 |                                         0
               0 |@@@@@@                                   218788
              10 |@@@@@@                                   219156
              20 |@@@@@@                                   219233
              30 |@@@@@@                                   218420
              40 |@@@@@@                                   218628
              50 |@@@@@@                                   217932
              60 |@@@@@@                                   217572
              70 |                                         0

So both assumptions hold: we can see evenly distributed access across the first 70GB of the file, and no I/O beyond it.
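The lquantize() aggregation above simply divides each offset by 1GB and buckets the result into 10GB-wide bins. As a rough sketch of that same bucketing, here is a small Python simulation (using generated random offsets, not the real trace data) showing why a uniform random workload over the first 70GB of a 100GB file produces exactly this flat histogram:

```python
import random

GiB = 2**30
WORKSET = 70 * GiB   # reads confined to the first 70GB

# Simulated offsets; a real check would feed in the fi_offset values from DTrace.
random.seed(1)
offsets = [random.randrange(WORKSET) for _ in range(1_000_000)]

# Equivalent of lquantize(offset / GiB, 0, 100, 10): 10GB-wide buckets 0..90.
buckets = [0] * 10
for off in offsets:
    buckets[(off // GiB) // 10] += 1

for i, count in enumerate(buckets):
    print(f"{i * 10:3d} GB  {count}")
```

The first seven buckets come out nearly equal (around one seventh of the samples each) and the last three stay at zero, matching the shape of the DTrace output.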

2 comments:

Rand Huntzinger said...

I don't see how you can determine from this whether or not the load is random. The number of accesses in each interval is roughly the same, but you provide no information as to whether or not the data was accessed in a random order.

milek said...

OK, maybe I oversimplified here.
Knowing the working-set size, and the fact that the distribution stays even across it every few seconds, you know that at least it is not sequential (you can't read that much data that quickly in my configuration). A more granular histogram would make this even more convincing. Of course, this is not a smoking gun for true randomness, but I didn't need one here.