A much shorter follow-up to my post yesterday: Robert Cheetah Wolf Love and I talked a bit about it, and we concluded that calling fsync() from the UnixStream.Flush() method isn’t the right thing to do. fsync() does indeed ensure that the data is written to the disk. This is usually a Bad Thing, however, because it circumvents the kernel’s own internal buffering of data. Since UnixStream uses the unbuffered read() and write() syscalls instead of standard I/O, there is no user-space buffer to flush, so Flush() should be a no-op instead of calling fsync(). (If it were using fread() and fwrite(), it would make sense to call fflush() in the Flush() method.)
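To make the distinction concrete, here is a minimal sketch in Python, whose os.write() and os.fsync() are thin wrappers over the same syscalls. RawFdStream and demo() are hypothetical names made up for illustration; this is not the Mono.Unix code, only the shape of the idea: a stream built directly on write() has nothing sitting in user space, so its flush can be empty.

```python
import os
import tempfile


class RawFdStream:
    """Hypothetical stand-in for a stream built directly on read()/write()."""

    def __init__(self, fd):
        self.fd = fd

    def write(self, data):
        # os.write() issues the write() syscall directly. There is no
        # user-space buffer, so the bytes are handed to the kernel at once.
        # Loop because write() may accept fewer bytes than requested.
        view = memoryview(data)
        while view:
            written = os.write(self.fd, view)
            view = view[written:]

    def flush(self):
        # Nothing is buffered in user space, so there is nothing to flush.
        # Deliberately no os.fsync() here: the kernel decides when the
        # data actually goes out to the disk.
        pass

    def close(self):
        os.close(self.fd)


def demo():
    fd, path = tempfile.mkstemp()
    try:
        stream = RawFdStream(fd)
        stream.write(b"lock")
        stream.flush()
        # A second, independent open of the same file already sees the
        # data, because it lives in the kernel's cache after write().
        with open(path, "rb") as reader:
            seen = reader.read()
        stream.close()
        return seen
    finally:
        os.unlink(path)
```

Calling demo() reads the data back through a separate file handle without any fsync() having happened, which is the point: other processes see the write as soon as the kernel has it, and forcing it to the platter only matters if you need durability across a crash.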
For most applications this wouldn’t be a big deal, but in the case of Beagle’s Lucene lockfiles we were repeatedly creating very small files. Normally the kernel would just buffer all of this in memory and write the changes out to disk when convenient, but the fsync() meant it was written to the hardware every time. 300 ms is a perceptible amount of time on its own, but multiplying it by the number of lockfiles created quickly adds up to a very obvious performance degradation. Moreover, if you were doing this on a laptop on battery, your battery life would have plummeted. Linux has a “laptop mode” you can enable which causes the kernel to write out changes to disk only very infrequently, allowing the disk to spin down and use less power. fsync() kills that optimization.
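If you want to see the effect on your own machine, a rough sketch like the following times the creation of many tiny files with and without an fsync() on each. The function names, the "lock-N" file names and the default count are all made up for illustration; this is not how Beagle or Lucene create their lockfiles.

```python
import os
import tempfile
import time


def create_small_files(directory, count, sync):
    """Create `count` one-byte files; fsync() each one if `sync` is true.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"lock-{i}")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.write(fd, b"x")
            if sync:
                # Forces this file's data out to the storage device
                # before the call returns.
                os.fsync(fd)
        finally:
            os.close(fd)
    return time.perf_counter() - start


def compare(count=50):
    """Time both variants in separate temporary directories."""
    with tempfile.TemporaryDirectory() as buffered_dir, \
            tempfile.TemporaryDirectory() as synced_dir:
        buffered = create_small_files(buffered_dir, count, sync=False)
        synced = create_small_files(synced_dir, count, sync=True)
        created = len(os.listdir(synced_dir))
    return buffered, synced, created
```

The numbers depend heavily on the disk and filesystem, so treat them as a rough indication only. If your temporary directory lives on tmpfs, for example, fsync() costs next to nothing and the two timings will look about the same.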
Anyway, Jonathan Pryor, the Mono.Unix maintainer and the author of the UnixStream class, almost immediately popped into the Beagle IRC channel. We had a short chat about it, and he agreed to make the change. The fix is now checked into the trunk and will go into Mono 1.1.17. The code was also fixed so that UnixStream.Close() no longer unconditionally closes the file descriptor. Thanks Jon!
