the “not null” is important

7 February 2007

I just released Beagle 0.2.16 unto the world. No new features in this release, just bug fixes. After doing a few test builds on SUSE 10.1, I rolled back the Mono requirement from 1.1.18 to 1.1.13.5. Beagle appears to run fine on those older releases, and the lower requirement lets us push 0.2.16 packages to released distros like SUSE 10.1 and Ubuntu Edgy without having to push a whole new Mono stack.

Things in this release include more robust JPEG, PDF, and SVG filters; a fix for a potential deadlock in Mono when running external text extractors; a fix for a KMail backend bug that prevented some folders from being indexed; and more. You can read the release notes for all the goodies. I suggest that anyone running 0.2.15.1 upgrade to this version.

One of the most annoying bugs fixed in this release was the looping directory of death: Beagle would repeatedly crawl a single directory and never move on to the next one. I thought we had fixed this for 0.2.15, but another rare case popped up. It could only be triggered if all three of these conditions held:

  1. You couldn’t write extended attributes for some reason (no support in the filesystem or no write access to the directory)
  2. You were using Sqlite 3, not Sqlite 2
  3. The directory's name began with zeros

The problem was that our nearly three-year-old Sqlite schema was wrong. Sqlite doesn't enforce column types, so when you create a table you can declare whatever type names you want:

CREATE TABLE my_table (
    col1   TEXT UNIQUE,      -- a type name Sqlite recognizes
    col2   FOO,              -- made-up type name, accepted anyway
    col3   BUTT NOT NULL     -- ditto
);

and Sqlite will go happily on its way. You can insert whatever you want into these columns. Sqlite 3 has "type affinity", however, which means that each column has a preferred storage class derived from its declared type, and values inserted into that column are converted to that class when possible. You can find all the nitty-gritty details in the Sqlite datatype documentation.
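To make the affinity rules concrete, here is a small sketch in Python using only the standard library's sqlite3 module (Python is my choice for illustration; Beagle itself is not written in it). column_affinity is a hypothetical helper that applies the rules from the Sqlite datatype documentation in their documented order, and the rest of the script feeds the same text value, '0009', into each column of the my_table example and asks Sqlite which storage class it actually used.

```python
import sqlite3


def column_affinity(declared_type: str) -> str:
    """Hypothetical helper: the affinity SQLite 3 derives from a declared type.

    The rules are checked in order as case-insensitive substring matches,
    following the column-affinity rules in the SQLite datatype docs.
    """
    t = declared_type.upper()
    if "INT" in t:
        return "INTEGER"
    if "CHAR" in t or "CLOB" in t or "TEXT" in t:
        return "TEXT"
    if "BLOB" in t or not t:
        return "BLOB"  # older docs call this affinity NONE
    if "REAL" in t or "FLOA" in t or "DOUB" in t:
        return "REAL"
    return "NUMERIC"  # anything else, including made-up names


# Sqlite accepts the made-up type names without complaint...
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_table (
        col1   TEXT UNIQUE,
        col2   FOO,
        col3   BUTT NOT NULL
    )
""")

# ...and applies each column's affinity to the same text value on insert.
conn.execute("INSERT INTO my_table VALUES ('0009', '0009', '0009')")
storage = conn.execute(
    "SELECT typeof(col1), typeof(col2), typeof(col3) FROM my_table"
).fetchone()

# col1 reports 'text'; col2 and col3 report 'integer'.
for name, decl, cls in zip(("col1", "col2", "col3"), ("TEXT", "FOO", "BUTT"), storage):
    print(f"{name} {decl:<5} affinity={column_affinity(decl):<8} stored as {cls}")
```

The made-up types FOO and BUTT match none of the earlier rules, so both fall through to NUMERIC affinity, which is why col2 and col3 store '0009' as an integer.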

Suffice it to say, 3 years ago we didn’t know what we were doing. We had created a table that looked somewhat like this:

CREATE TABLE file_attributes (
    unique_id   STRING UNIQUE,
    directory   STRING NOT NULL,
    filename    STRING NOT NULL,
    ...
);

Well, STRING isn't one of the type names Sqlite maps to text (we should have been using TEXT), and a declared type that matches none of Sqlite's rules gets NUMERIC affinity. That's not a problem most of the time, but it breaks down when the value you insert looks like a number with leading zeros: Sqlite converts it to an integer and the zeros are lost. While "foobar" goes into the table as "foobar", "0009" goes into the table as "9".
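A minimal reproduction of the bug, again in Python with the standard sqlite3 module. The two tables are simplified, hypothetical stand-ins for file_attributes (only directory and filename are kept), and the sample names are made up; the point is only the difference between declaring the columns STRING and TEXT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified versions of the file_attributes schema.
conn.execute("CREATE TABLE broken (directory STRING NOT NULL, filename STRING NOT NULL)")
conn.execute("CREATE TABLE fixed  (directory TEXT   NOT NULL, filename TEXT   NOT NULL)")

names = ["foobar", "0009"]  # made-up sample names
for table in ("broken", "fixed"):
    conn.executemany(
        f"INSERT INTO {table} (directory, filename) VALUES ('/home/user', ?)",
        [(n,) for n in names],
    )


def read_back(table):
    """Return (value, storage class) for each filename, in insertion order."""
    return conn.execute(
        f"SELECT filename, typeof(filename) FROM {table} ORDER BY rowid"
    ).fetchall()


broken = read_back("broken")  # [('foobar', 'text'), (9, 'integer')]
fixed = read_back("fixed")    # [('foobar', 'text'), ('0009', 'text')]
print(broken)
print(fixed)

# Comparing what was written with what comes back flags the mangled name.
mangled = [n for n, (v, _) in zip(names, broken) if str(v) != n]
print(mangled)  # ['0009']
```

Because the conversion happens when the row is written, the stored integer 9 no longer carries the leading zeros; declaring the column TEXT protects new rows but cannot restore values that were already converted.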

Bad things happen when the value you retrieve later doesn't match what you expected. I actually think I ran into this problem more than a year ago, but I didn't quite understand the issue and ended up treating the symptom of the bug rather than the problem itself. If the file attributes couldn't be found, I just rescheduled the directory to be indexed again! Since then I've tweaked the order in which directories are indexed, and when this bug popped back up, the result was looping on a single directory.

dBera did a great job noticing that "0009" became "9" in the database, and once we tracked down the causes, the fix was straightforward. I also removed the broken workaround I had added a long time ago, so even if attributes are magically missing in the future, we'll print an error and move on.