
beagle and metadata

7 March 2007

Indexing external metadata
Yesterday I landed a pretty major step forward for Beagle’s handling of metadata from different sources. Up to this point it wasn’t easy to add metadata to indexed documents without adding hacks into things like the file backend or a specific filter. Now we can build Beagle backends that aren’t backed by a Lucene index, but that can still pull from a data source and generate meaningful, indexable metadata independent of the actual data’s original source.

The canonical example is Nautilus metadata. Up to this point, we had a hack in our file system backend that matched up each file being indexed with its entry in the Nautilus metadata. This meant a bunch of extra code, slowed down file indexing a little, and didn’t work when Nautilus metadata was updated after a file had already been indexed.

Yesterday’s checkin changes that.

[Screenshot: Adding some notes in Nautilus]

We now have a simple backend dedicated to Nautilus metadata, and we avoid lame hacks in the file system backend. We watch the Nautilus metadata with inotify, and thanks to the metadata timestamping I added to Nautilus a while back, processing the changes in real time is easy.
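To give a rough feel for why the timestamps make this easy, here is a minimal sketch in Python (not Beagle's actual C# code). It assumes a metafile is XML with one `file` element per entry, carrying `name`, `annotation` and `timestamp` attributes plus `keyword` children for emblems. Those names, the sample data and the `changed_entries` helper are all assumptions for illustration. The idea is that when a change notification arrives for a metafile, only the entries stamped after the last pass need reprocessing.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Hypothetical metafile content; element and attribute names are assumed.
SAMPLE = """<directory>
  <file name="foo.txt" timestamp="1173200000" annotation="call%20JP">
    <keyword name="important"/>
  </file>
  <file name="old.txt" timestamp="1170000000" annotation="stale"/>
</directory>
"""

def changed_entries(xml_text, last_seen):
    """Return (entries, newest) for entries stamped after last_seen."""
    entries = []
    newest = last_seen
    for node in ET.fromstring(xml_text).iter("file"):
        stamp = int(node.get("timestamp", "0"))
        if stamp <= last_seen:
            continue  # already handled on an earlier pass
        entries.append({
            "name": unquote(node.get("name", "")),
            "notes": unquote(node.get("annotation", "")),
            "emblems": [k.get("name") for k in node.findall("keyword")],
        })
        newest = max(newest, stamp)
    return entries, newest

# Only foo.txt is newer than the last timestamp we processed.
entries, newest = changed_entries(SAMPLE, 1172000000)
```

Keeping the newest timestamp around means the next notification can skip everything that was already indexed.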

[Screenshot: Displaying this in beagle-search]

The impressive thing about this, which I didn’t realize until JP pointed it out to me yesterday, is that it only took a few days and a little over 1000 lines of code to do. And if I may toot the horn of the Beagle development community past and present, this is a strong testament to the quality of Beagle’s design.

Next steps
I would like to move our indexing of F-Spot’s tags over from a hack in the JPEG filter to this new system. It suffers from the same problems as the Nautilus metadata did: if you change tags in F-Spot after we’ve indexed the image, and you don’t have F-Spot configured to save those tags in the file, we’ll miss them.

I’d like to extend this beyond just F-Spot’s tags and additionally use something like Leaftag to make everything taggable.

Today we already allow any application to send data and attached metadata to Beagle to be indexed, through an easy-to-use API in C#, C, or Python. (I’ll blog more about that later.) It makes a lot of sense to extend these APIs so that any application can send metadata about any document, not just the ones it created. Again, F-Spot is a good candidate. Rather than Beagle pulling changes from the F-Spot database, F-Spot itself could simply push those changes into Beagle. And since F-Spot understands its own data better than Beagle ever could, it can do so much more efficiently.
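As a thought experiment, the push model might look something like the following Python sketch. This is not Beagle's API: `MetadataIndex`, `push` and `lookup` are hypothetical names, and the "index" is just an in-memory dictionary. It only illustrates the shape of the idea, where several sources each contribute properties about the same document without stepping on each other.

```python
class MetadataIndex:
    """Toy in-memory stand-in for an index; not Beagle's real API."""

    def __init__(self):
        # uri -> source name -> {property: value}
        self._props = {}

    def push(self, uri, source, properties):
        """Replace everything `source` has said about `uri`."""
        self._props.setdefault(uri, {})[source] = dict(properties)

    def lookup(self, uri):
        """Merge properties from all sources, prefixing keys by source."""
        merged = {}
        for source, props in self._props.get(uri, {}).items():
            for key, value in props.items():
                merged[f"{source}:{key}"] = value
        return merged

index = MetadataIndex()
photo = "file:///home/joe/photo.jpg"

# The file system backend indexed the file itself...
index.push(photo, "filesystem", {"mimetype": "image/jpeg"})
# ...and later the photo application pushes its own tags for the same file.
index.push(photo, "fspot", {"tags": ["beach", "family"]})
# A retag replaces only that application's contribution.
index.push(photo, "fspot", {"tags": ["beach"]})
```

Keying properties by source is one possible design choice: it lets an application resend its full view of a document without needing to know what anyone else has said about it.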

Lastly, I want to add a mechanism so that important changes to metadata can be pushed back to the source. I don’t buy into the world view that there is (or ever can be) one repository for all your metadata, so an agent that can push metadata changes back to their source would be a very valuable utility. Today if you use something other than Nautilus to rename a file that has notes and emblems, they disappear from Nautilus but persist in Beagle. Beagle should be able to push that information (“foo.txt is now bar.txt”) back into the Nautilus metadata. You can easily imagine editing tags in a Beagle-enabled application and having them automatically pushed back into F-Spot or Leaftag. Adding notes to a document could automatically push them back into the Nautilus metadata. Or fixing a typo in the artist information for an MP3 or OGG file could actually write the correction into the file itself.
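For the rename case, the push-back step could be as small as the following Python sketch. It is purely illustrative: it assumes the metadata lives in an XML document with one `file` element per entry keyed by a `name` attribute, and `rename_entry` is a hypothetical helper, not something that exists in Beagle or Nautilus.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata for a directory; the layout is an assumption.
BEFORE = """<directory>
  <file name="foo.txt" annotation="draft for JP">
    <keyword name="important"/>
  </file>
</directory>
"""

def rename_entry(xml_text, old_name, new_name):
    """Carry notes and emblems over from old_name to new_name."""
    root = ET.fromstring(xml_text)
    for node in root.iter("file"):
        if node.get("name") == old_name:
            # Only the key changes; attributes and children are untouched.
            node.set("name", new_name)
    return ET.tostring(root, encoding="unicode")

# "foo.txt is now bar.txt"
AFTER = rename_entry(BEFORE, "foo.txt", "bar.txt")
```

A real agent would also have to worry about who else is writing to that file at the same time, which this sketch ignores entirely.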
