Beagle 0.2.3 is out. The big news about this release is that it no longer has the plague of growing memory usage that has been a problem since its inception. I’ll go into more detail later on how I tracked that down (using heap-buddy).
This release includes a few new filters: a BMP image filter from Alexander Macdonald; a new video filter which uses mplayer to extract metadata, also from Alexander Macdonald; and a new “external” filter, written by me.
The idea behind the external filter is that you may want to extract data from files — maybe a specialized, local format — but you don’t want to have to learn C# to do it. Today is your lucky day! You’ll need some tool to extract text content from a file. Although Beagle already has PDF filtering, a good example of this is pdftotext from xpdf. You give it a PDF file, and it spits out just the text from the file. To set one of these up, you just add a simple XML blob to /etc/beagle/external-filters.xml:
<filter>
<mimetype>text/plain</mimetype>
<extension>.txt</extension>
<command>cat</command>
<arguments>%s</arguments>
</filter>
And that’s it! Now any file with MIME type “text/plain” or extension “.txt” will be fed into “cat” and the output will be indexed. Of course, Beagle already indexes plain text, so that’s not a good example either. But whatever, you get the idea.
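For something closer to the pdftotext case mentioned above, an entry might look like the sketch below. It assumes the same element names as the cat example, that pdftotext is on your PATH, and that extra arguments are allowed after %s; I haven't checked that last assumption against the filter itself. The trailing "-" asks pdftotext to write to standard output rather than to a .txt file next to the PDF.

```xml
<!-- Hypothetical entry: pipes PDF text to stdout via pdftotext (xpdf). -->
<filter>
<mimetype>application/pdf</mimetype>
<extension>.pdf</extension>
<command>pdftotext</command>
<!-- %s is the file being indexed; "-" means "write text to stdout" -->
<arguments>%s -</arguments>
</filter>
```

Since Beagle already filters PDFs natively, treat this purely as a template for your own format and tool.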
All the information you need is in the external-filters.xml file that ships with the release.
