Skip to content

21 August 2006

The other day I was trying to hunt down a bug that was reported against Beagle in SLED 10. Normally, I try to duplicate these sorts of bugs in a sandbox using my CVS checkout, which was just after I released 0.2.8. I wasn’t able to duplicate the bug there, so I tried 0.2.3, which is the version (albeit heavily patched) that we ship in SLED. I wasn’t able to find the bug there either, but I did notice something awry: indexing was noticably faster in 0.2.3 than 0.2.8.

Thousands of lines of code have been changed between 0.2.3 and 0.2.8, so I checked out and built each version to narrow the bottleneck down, and I found that it happened between 0.2.6 and 0.2.7. That, fortunately, is a much smaller diff, but nothing from the diff immediately jumped out at me as a potential bottleneck. In fact, there’s basically no change to the indexing code between 0.2.6 and 0.2.7 at all.

In Beagle we have a simple class called Stopwatch that we use in a lot of places for doing pretty simple wall clock time measurements. All profiling adds some overhead, but Stopwatch is constant and I’d say it’s accurate within one or two hundredths of a second. Anyway, I added calls to that in a few different places of the code trying to narrow down what exactly was going slowly. It didn’t lead me very far. I could see that calls to Lucene’s IndexWriter.AddDocument() went from an (aggregate) 0.1 second to 0.4 seconds. Drilling down in the Lucene code led me nowhere.

So then I decided to try to use Mono’s built-in profiler for this. I hadn’t had very good experiences with it in the past, but I decided to give it another shot. I ran Beagle like so:

BEAGLE_HOME=/tmp/sandbox BEAGLE_EXERCISE_THE_DOG=1 MONO_EXTRA_ARGS=”–profile=default:time,file=profile-slow.out” ./beagled –fg –debug –allow-backend files –indexing-test-mode

BEAGLE_HOME tells it to run entirely within the /tmp/sandbox directory, which contained something like 3 directories and 35 files. MONO_EXTRA_ARGS is a convention we use in the Beagle wrapper scripts to pass in additional arguments to the mono command that is run. This tells us to use the default, built-in profiler with the “time” option (to measure execution time) and where to write it out. The –indexing-test-mode argument tells the Beagle daemon to run tasks in the scheduler immediately and sequentially and to shut down when the queue is empty. That gave us a fairly constant amount of time to index the sandbox.

The results took a little while to interpret — I feel that the data isn’t presented in the best way — but still very useful. Essentially I stepped through call traces as best I could, starting by looking at IndexWriter.AddDocument():

Time(ms) Count   P/call(ms) Method name
 17192.532      34  505.663   Lucene.Net.Index.IndexWriter::AddDocument(Document,Analyzer)
  Callers (with count) that contribute at least for 1%:
          34  100 % Lucene.Net.Index.IndexWriter::AddDocument(Document)

 10767.008      34  316.677   Lucene.Net.Index.IndexWriter::AddDocument(Document,Analyzer)
  Callers (with count) that contribute at least for 1%:
          34  100 % Lucene.Net.Index.IndexWriter::AddDocument(Document)

The top one is the slower one: 17.2 seconds cumulative, 506ms per call. The latter faster: 10.7 cum, 317ms per call. The next step is to see what AddDocument() calls. This was a pain in the butt, because I had to manually search the file for this. It would be much nicer if the time profiler instead showed times of the callees rather than the callers. Sysprof has a helpful tree view UI for this. It’s great to know that 100% of my AddDocument() calls come from another AddDocument() call, but where is the 100% of my time going?

Drilling down, we find the time in AddDocument() split up among several functions:

 8274.756      34  243.375   Lucene.Net.Index.DocumentWriter::InvertDocument(Document)
 1280.747      34   37.669   Lucene.Net.Index.DocumentWriter::SortPostingTable()
 650.023      34   19.118   Lucene.Net.Index.DocumentWriter::WritePostings(Posting[],string)
 520.404      34   15.306   Lucene.Net.Index.DocumentWriter::WriteNorms(string)
 454.174     104    4.367   Lucene.Net.Index.FieldsWriter::AddDocument(Document)
(and many others of decreasing time)

 4898.335      34  144.069   Lucene.Net.Index.DocumentWriter::InvertDocument(Document)
 614.078      34   18.061   Lucene.Net.Index.DocumentWriter::SortPostingTable()
 455.776      34   13.405   Lucene.Net.Index.DocumentWriter::WriteNorms(string)
 423.392      34   12.453   Lucene.Net.Index.DocumentWriter::WritePostings(Posting[],string)
 316.489     104    3.043   Lucene.Net.Index.FieldsWriter::AddDocument(Document)

Ok, the most obvious here is that DocumentWriter.InvertDocument() takes almost half the time in the faster profile, but all of the other methods here take marginally longer and that tells us something too. Most likely there is something very low-level that is slowing down all of these methods. That should have thrown up a red flag to me, but I didn’t realize it until afterward.

Now, I could probably descend further into the call stack and ultimately find the issue, but Lucene is pretty complicated and that was going to take forever. I also probably should have just written a tool to reorder the information the way I wanted it, but really my perl isn’t very good and although it would save me work in the long run, I am just plain lazy.

What I decided to do is to put the profiles up side by side and step through them, looking for the anomalies. By and large the profiles matched up. Some functions affected by the bug took a little longer than totally unrelated ones, but it was immediately obvious when I hit it:

1630.753 34 47.963 Lucene.Net.Store.Lock::Obtain(long)
Callers (with count) that contribute at least for 1%:
28 82 % .With::Run()
6 17 % Lucene.Net.Index.IndexWriter::.ctor(Directory,Analyzer,bool,bool)

23.932 34 0.704 Lucene.Net.Store.Lock::Obtain(long)
Callers (with count) that contribute at least for 1%:
28 82 % .With::Run()
6 17 % Lucene.Net.Index.IndexWriter::.ctor(Directory,Analyzer,bool,bool)

Wow. 1.6 seconds in the slow profile, 23.9 ms in the fast profile. Both are called exactly 34 times. Something is definitely amiss, and it turns out that the difference between 0.2.6 and 0.2.7 was a fix to write out the process’s PID to the lockfile for debugging purposes:

System.IO.StreamWriter w = new System.IO.StreamWriter (new Mono.Unix.UnixStream (fd, false));
w.WriteLine (System.Diagnostics.Process.GetCurrentProcess ().Id);
w.Close ();

It’s not obvious from that code what’s actually slow, so I turned back to the profile:

 1563.926      68   22.999   Mono.Unix.Native.Syscall::fsync(int)
  Callers (with count) that contribute at least for 1%:
          68  100 % Mono.Unix.UnixStream::Flush()

The calls to fysnc() takes 1.5 seconds cumulatively. That’s called by UnixStream.Flush(), which is called by UnixStream.Close(). Why does it take so long? My only guess is that the buffer cache is actually flushed out to disk on the fsync() call. That’s totally unnecessary for what we’re doing. The docs for UnixStream say that passing false in for the second parameter means that the stream is actually flushed, not closed, when Close() is called.

That, it turns out, is a lie. Looking at the code, we see that regardless of who owns the file descriptor if Close() is called, the fd is closed. Turns out that was a good thing for us; passing in false is a bug in Beagle and should have resulted in us leaking file descriptors.

I changed the code to use only Mono.Unix.Native calls directly:

string s = System.Diagnostics.Process.GetCurrentProcess ().Id.ToString () + "\n";
ptr = Mono.Unix.UnixMarshal.StringToHeap (s);

do {
        ret = Mono.Unix.Native.Syscall.write (fd, ptr, (ulong) s.Length);
} while (Mono.Unix.UnixMarshal.ShouldRetrySyscall ((int) ret));

Those calls don’t even show up on the profile, and indexing is once again fast.

Questions for Mono guys:

  • Is closing the file descriptor in UnixStream.Close() regardless of ownership a bug in the code or in the docs?
  • Should UnixStream.Close() really call Flush() or, in light of the poor performance, is that a bug? And if it should, should its performance implications be documented?

One Trackback/Pingback

  1. joe shaw on 22 August 2006 at 10:49 am

    [...] Much shorter follow-up to my post yesterday: Robert Cheetah Wolf Love and I talked a bit about it, and we came to the determination that calling fsync() from the UnixStream.Flush() method isn’t the right thing to do. fsync() does indeed ensure that the data is written to the disk. This is usually a Bad Thing, however, because it circumvents the kernel’s own internal buffering of data. We determined that since UnixStream uses the unbuffered read() and write() syscalls instead of standard I/O, Flush() should be a no-op instead of calling fsync(). (If it were using fread() and fwrite(), it would make sense to call fflush() in the Flush() method.) [...]