Wednesday, December 21, 2005

Writing Zeros to ZFS files

I looked into some of the ZFS code and found a nice little feature. In the function zio_compress_data() there is:
/*
 * If the data is all zeroes, we don't even need to allocate
 * a block for it. We indicate this by setting *destsizep = 0.
 */
allzero = 1;
word = src;
word_end = (uint64_t *)(uintptr_t)((uintptr_t)word + srcsize);
while (word < word_end) {
        if (*word++ != 0) {
                allzero = 0;
                break;
        }
}
if (allzero) {
        *destp = NULL;
        *destsizep = 0;
        *destbufsizep = 0;
        return (1);
}



So if you set compression=on (or any other available compression method; right now only lzjb is available) and a block contains all zeros, the actual compression routine is never called (less CPU consumed) and no I/Os for those zeros are generated.

I did a small test: I created two filesystems on a RAID-Z pool, one with compression set to on and the other set to off, and then ran 'dd if=/dev/zero of=/test/fs1/q1 bs=1024k count=1024', which writes 1GB of data (only zeros). With compression off, iostat shows lots of I/Os to the underlying disks. With compression on, only about 600KB is written to the underlying disks (not to mention that the whole operation took about 2s on a 1.5GHz USIIIi). Well, this is a really clever little thing (not the most clever thing in ZFS, of course).
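
For anyone who wants to reproduce this, the whole setup is just a few commands. A minimal sketch, assuming a pool named test (to match the paths above) and placeholder disk names; any three devices will do for the RAID-Z:

# Hypothetical pool and disks.
zpool create test raidz c1t0d0 c1t1d0 c1t2d0
zfs create test/fs1
zfs create test/fs2
zfs set compression=on test/fs1     # fs1: compression on
zfs set compression=off test/fs2    # fs2: compression off (the default)

# Write 1GB of zeros to each filesystem.
dd if=/dev/zero of=/test/fs1/q1 bs=1024k count=1024
dd if=/dev/zero of=/test/fs2/q1 bs=1024k count=1024

# In another terminal, watch the disk traffic while each dd runs:
# the fs2 run generates a flood of writes, the fs1 run almost none.
iostat -xn 1

The paths work as written because ZFS mounts the filesystems at /test/fs1 and /test/fs2 by default.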

Now I wonder whether it would be beneficial to do the same check even when there's no compression: basically, moving this code up, so that regardless of whether compression is on or off, a block of all zeros is simply not written.

2 comments:

Anonymous said...

I haven't looked at the code, but I'm assuming "not written" is equivalent to "not allocated".

I wouldn't expect a filesystem without compression to fail to rewrite blocks in a file, which could happen if the zeroes weren't allocated initially and the filesystem later filled up.

--
Darren

milek said...

Actually, it also means not allocated. So if, on a compressed filesystem, you create a big file filled with zeros that (in theory) consumes 99% of the storage, you can then add other files (not filled with zeros) which can consume another 99%. But when you try to actually write some data to the first file, you will quickly run out of space.

But this will occur only when you use compression, which is correct behaviour.
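
A quick way to see the over-commit in action; a hypothetical sketch, with made-up dataset names and a small quota just so the numbers stay manageable:

# Hypothetical dataset; the quota just makes the effect quick to see.
zfs create test/demo
zfs set compression=on test/demo
zfs set quota=100m test/demo

# A 1GB file of zeros "fits" because no blocks are allocated for it.
dd if=/dev/zero of=/test/demo/zeros bs=1024k count=1024

# Incompressible data can still eat almost the whole quota...
dd if=/dev/urandom of=/test/demo/real bs=1024k count=90

# ...and now rewriting the zero file with real data runs out of space.
dd if=/dev/urandom of=/test/demo/zeros bs=1024k count=1024 conv=notrunc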

However, I'm not sure whether such behaviour is desired on non-compressed filesystems. It's not the case right now anyway, but it looks like it's planned.