This weekend I read that the MBTA and the Massachusetts Department of Transportation had released a trial real-time data feed of vehicle positions for five of their bus routes. This is very important data to have, and while obviously everyone would like to see more routes added, it’s a start.

I decided to hack together a mashup of this data and Google Maps to see how easy it would be. In the end it took me a few hours on Saturday to get the site up and running, and a couple more on Sunday to add features: drawing routes on the map, coloring markers for inbound vs. outbound buses, and reverse geocoding each bus’s position.

MBTA Real-time bus info

To do this I used three technologies (Google App Engine, jQuery, Google Maps) and two data sources (the real-time XML feed and the MBTA’s Google Transit Feed Specification files).

Google App Engine

App Engine is so perfectly suited to small, playtime hacks like this that it’s hard to imagine how anyone got anything done before it existed. In the past, the tedious up-front bootstrapping that so many programming projects require has occasionally been enough to turn me off small spare-time projects entirely. The brilliance of a hosted software environment is obvious, but building a safe, hosted system with a fairly comprehensive set of APIs is such a mountain of work that I find it surprising that anyone (even, perhaps especially, Google) built it at all.

I chose the Python SDK, and the programming was straightforward and easy. It borrows some elements from Django, which I am familiar with from work.

jQuery

A no-brainer. Hands down the best JavaScript toolkit available. Making the AJAX calls to get route and vehicle location information was a breeze, and its transparent handling of the real-time feed’s XML kept me from losing the will to live, a common feeling when dealing with XML.

My only complaint is the documentation. The API reference is good for any individual piece of the API, but the examples are a little light, and there is absolutely zero cross-referencing to other parts, especially parts that aren’t in jQuery itself. For example, it was not obvious how to deal with the XML document returned by the AJAX call. It sounds like the docs are getting some work, though, so this will hopefully improve.

Google Maps

This was my first endeavor with the Maps API, and it’s good. It’s not the best API in the world, but it’s hardly the worst either. Adding markers of different colors is annoying, but not so onerous as to make it tedious. The breadth of functionality provided is impressive, but then again it has been around for a few years at this point. Markers are easy to add, drawing the route map is absolutely trivial with a KML file, and even the reverse geocoding — which gives you a street address given a latitude/longitude pair — is straightforward.

The docs suck, though. Nothing indicates that a size or anchor position is required when creating the icon for a custom marker (which you need for any color other than red), and because the JS files are minified, tracking down that error took longer than any other task in the project. The reverse geocoding docs mention that a Placemark object will be returned, but that class doesn’t appear anywhere in the reference documentation.

Real-time feed data

Lots to like: it’s straightforward and easy to parse. It’d be nice if the feed included street addresses so I didn’t have to reverse geocode them myself, but that’s not a dealbreaker. The main downside is that it’s XML rather than JSON. And of course, it covers only 5 bus routes and zero subway or commuter rail routes.
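To give a sense of how little parsing the feed needs, here is a minimal server-side sketch in Python (the mashup itself handled the XML in the browser with jQuery). The element and attribute names (`vehicle`, `id`, `routeTag`, `dirTag`, `lat`, `lon`), the direction values, and the marker colors are all assumptions for illustration, not the documented schema of the MBTA feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real feed's element and attribute names may differ.
SAMPLE_FEED = """<body>
  <vehicle id="0123" routeTag="1" dirTag="inbound" lat="42.3512" lon="-71.0712"/>
  <vehicle id="0456" routeTag="1" dirTag="outbound" lat="42.3601" lon="-71.0589"/>
</body>"""


def parse_vehicles(xml_text):
    """Return one dict per <vehicle> element, with lat/lon converted to floats."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": v.get("id"),
            "route": v.get("routeTag"),
            "direction": v.get("dirTag"),
            "lat": float(v.get("lat")),
            "lon": float(v.get("lon")),
        }
        for v in root.iter("vehicle")
    ]


def marker_color(vehicle):
    """Pick a marker color by direction (hypothetical inbound/outbound colors)."""
    return "blue" if vehicle["direction"] == "inbound" else "red"
```

Each dict carries everything a map marker needs: a position, a label, and a color chosen by direction.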

MBTA Google Transit Feed Specification files

A comprehensive set of data describing every route, every stop, and every scheduled trip in the MBTA system, encoded in a format designed for Google Transit. There is a set of example tools to view and manipulate this data, and one of them translates it into a KML file for use with Google Maps. I should have tweaked the tools to output KML for only the routes I cared about, but I did this by hand instead… not a big deal for only 5 bus lines. These KML files are fed into the Google Maps API to draw the selected route as a blue line on the map.
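Here is roughly what that tweak could look like, sketched without the example tools at all: it reads three standard GTFS files directly (`routes.txt`, `trips.txt` with its optional `shape_id` column, and `shapes.txt`) and writes one KML `LineString` per shape. It assumes the feed actually includes shapes and that routes are picked by `route_short_name`; the function names are hypothetical.

```python
import csv
from collections import defaultdict
from xml.sax.saxutils import escape


def route_shapes(routes_txt, trips_txt, shapes_txt, wanted_short_names):
    """Map each wanted route short name to a list of shapes, each a list of (lat, lon)."""
    route_ids = {}  # route_id -> short name, for the routes we care about
    for row in csv.DictReader(routes_txt):
        if row["route_short_name"] in wanted_short_names:
            route_ids[row["route_id"]] = row["route_short_name"]

    shape_ids = defaultdict(set)  # short name -> shape_ids used by its trips
    for row in csv.DictReader(trips_txt):
        name = route_ids.get(row["route_id"])
        if name is not None and row.get("shape_id"):
            shape_ids[name].add(row["shape_id"])

    points = defaultdict(list)  # shape_id -> [(sequence, lat, lon), ...]
    for row in csv.DictReader(shapes_txt):
        points[row["shape_id"]].append((
            int(row["shape_pt_sequence"]),
            float(row["shape_pt_lat"]),
            float(row["shape_pt_lon"]),
        ))

    # Sorting the tuples orders each shape's points by shape_pt_sequence.
    return {
        name: [[(lat, lon) for _, lat, lon in sorted(points[sid])] for sid in sorted(ids)]
        for name, ids in shape_ids.items()
    }


def to_kml(name, shapes):
    """Render shapes as KML placemarks; KML wants "lon,lat", longitude first."""
    placemarks = "".join(
        f"<Placemark><name>{escape(name)}</name><LineString><coordinates>"
        + " ".join(f"{lon},{lat}" for lat, lon in shape)
        + "</coordinates></LineString></Placemark>"
        for shape in shapes
    )
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
            + placemarks + "</Document></kml>")
```

`csv.DictReader` accepts any iterable of lines, so the three arguments can be file objects opened with `newline=""`, as the `csv` module recommends.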

POKE 47196, 201

This is what a lot of programming is like now, for better and for worse.

On the one hand it is the perfect example of high-level, component-oriented programming. Data is formatted in easily parseable interchange formats and plugged into well-defined interfaces. These interfaces plug into other interfaces. The result is a zoomable, pannable map with real-time bus location information that updates every 15 seconds. The line count is around 100, including both Python and JavaScript. With a few hours’ work, I built something modestly useful out of nothing. I stand on the shoulders of giants.

On the other hand I didn’t really build anything. This is just assembly line programming. It was not a particularly creative endeavor, and it wasn’t challenging intellectually. Anybody could have done it. It’s cool, but there is little sense of accomplishment in the end product. It feels a little hollow.

Which is not to say that I didn’t enjoy it, or that it wasn’t worth the effort. I learned new technology, I played with software and data that I hadn’t had the opportunity to before. I broadened my horizons, however slightly. And it got me to write this blog post.

I discovered a way to copy podcasts onto my 5th-generation iPod Nano without syncing through iTunes, since I was not at the Mac I normally sync it with. It uses the voice memos feature, so it probably won’t work on other iPod models. It probably will work on Windows, though.

Connect your iPod to the Mac, and turn on hard drive access in iTunes if it’s not already on.

You’ll need your music or podcast to be in AAC (.m4a) format. If you have an MP3, go into iTunes preferences -> General -> Import Settings… and set the “Import using” encoder to AAC. Then select the tracks and go to Advanced -> Create AAC Version.

Copy the new .m4a file into the Recordings directory on the iPod, and give it a name like “20090101 120000.m4a”.

Eject your iPod and open the voice memos feature. In your list of voice memos you should see an entry like “1/1/2009 12:00 PM”, which corresponds to the filename above. Select it and press play. Voilà!
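If you do this often, the renaming and copying steps are easy to script. This is a sketch under a few assumptions: the iPod is mounted as a disk (on a Mac, somewhere under `/Volumes/`), its `Recordings` directory already exists, and the track has already been converted to AAC. The function names are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def memo_filename(when):
    """Voice-memo style name, e.g. '20090101 120000.m4a' for 1/1/2009 12:00 PM."""
    return when.strftime("%Y%m%d %H%M%S") + ".m4a"


def copy_as_voice_memo(m4a_path, recordings_dir, when=None):
    """Copy an AAC file into the iPod's Recordings directory under a memo-style name."""
    src = Path(m4a_path)
    if src.suffix.lower() != ".m4a":
        raise ValueError("convert the track to AAC (.m4a) in iTunes first")
    dest = Path(recordings_dir) / memo_filename(when or datetime.now())
    shutil.copy2(src, dest)  # copies the contents and the file timestamps
    return dest
```

For example, `copy_as_voice_memo("episode.m4a", "/Volumes/IPOD/Recordings")`, where the mount point is a placeholder for wherever your iPod appears. Since the filename is the memo’s identity, pass a different `when` for each file when copying several at once.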

It’s not an elegant solution, but it’ll give you something new to listen to on your subway ride home when you’re desperate.

I have ranted about the iPhone’s horrible iPod interface in the past, and any improvement Apple can make is certainly welcome. But the improvements in the iPhone OS 3.0 update seem more half-assed than a true solution. Yes, the ability to skip back 30 seconds will be nice, but it’s still a ham-fisted answer to the problem of seeking precisely within a 70-minute podcast. And the “scrubber” interface seems complicated and error-prone.

The frustrating thing is that Apple already has the One True User Interface for playing audio: the click-wheel. Thanks to its handling of acceleration, it lets you seek through hours of audio extremely quickly while still giving you the one-second resolution to land on the exact point you want, and I don’t understand why it isn’t emulated on the iPhone and iPod touch. At this point, I have to believe that some limitation of the touchscreen hardware prevents it. Sigh.
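To make the idea concrete, here is one way such acceleration can work. This is a sketch with made-up parameters, not Apple’s actual algorithm: slow turns move a fixed one second per wheel tick, so you can land on an exact point, while faster turns grow the step quadratically up to a cap, so quick spins cover long stretches of audio.

```python
def seek_step_seconds(ticks_per_second, base=1.0, threshold=4.0, max_step=60.0):
    """Seconds to move per wheel tick, as a function of rotation speed.

    At or below `threshold` ticks/s every tick moves exactly `base` seconds;
    above it the step grows with the square of the speed ratio, capped at
    `max_step`. All three parameters are hypothetical.
    """
    if ticks_per_second <= threshold:
        return base
    return min(base * (ticks_per_second / threshold) ** 2, max_step)


def seek(position, ticks, ticks_per_second, duration):
    """Move `ticks` clicks (negative means backwards), clamped to [0, duration]."""
    new_position = position + ticks * seek_step_seconds(ticks_per_second)
    return max(0.0, min(new_position, duration))
```

With a 70-minute (4200-second) podcast, slow clicks step one second at a time, and fast spinning jumps up to a minute per click.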

The other day at work we encountered an unusual exception in our nightly pounder test run after landing some new code to expose some internal state via a monitoring API. The problem occurred on shutdown. The new monitoring code was trying to log some information, but was encountering an exception. Our logging code was built on top of Python’s logging module, and we thought perhaps that something was shutting down the logging system without us knowing. We ourselves never explicitly shut it down, since we wanted it to live until the process exited.

The monitoring was done inside a daemon thread. The Python docs say only:

A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left.

Which sounds pretty good, right? This thread is just occasionally grabbing some data, and we don’t need to do anything special when the program shuts down. Yeah, I remember when I used to believe in things too.

Despite the global interpreter lock, which keeps Python from being truly concurrent anyway, there is a very real possibility that daemon threads are still executing after the Python runtime has begun its own tear-down. One step of that tear-down appears to be setting the values in each module’s globals() to None, so any attribute lookup through a module name raises an AttributeError as it tries to dereference NoneType. Other variations on this raise TypeError instead.

The code which triggered this looked something like this, although with more abstraction layers which made hunting it down a little harder:

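import logging
import somemodule  # placeholder for the module that defines SomeException

log = logging.getLogger(__name__)

def monitor_thread_body():
    try:
        log.info("Some thread started!")
        try:
            do_something_every_so_often_in_a_loop_and_sleep()
        except somemodule.SomeException:
            # At teardown somemodule may already be None, so evaluating
            # this except clause raises AttributeError itself.
            pass
    finally:
        # log may be None by now too, raising a second AttributeError.
        log.info("Some thread exiting!")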

The exception we were seeing was an AttributeError on the last line, the log.info() call. But that wasn’t even the original exception. It was actually another AttributeError caused by the somemodule.SomeException dereference. Because all the modules had been reset, somemodule was None too.
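The masking is easy to reproduce without waiting for interpreter shutdown. This sketch fakes the teardown by having the work function reset a namespace object’s attributes to None, standing in for module globals. Real shutdown behavior differs between Python versions, so treat it as an illustration only. In Python 3 the final AttributeError carries the earlier exceptions in its `__context__` chain, which the traceback prints as “During handling of the above exception, another exception occurred”.

```python
import logging
import types


class SomeException(Exception):
    """Stand-in for somemodule.SomeException."""


def thread_body(g, work):
    # Same shape as the snippet above, but every "global" is reached through g.
    try:
        g.log.info("Some thread started!")
        try:
            work(g)
        except g.somemodule.SomeException:  # only evaluated once work() raises
            pass
    finally:
        g.log.info("Some thread exiting!")


def work_during_teardown(g):
    # Simulated teardown: the "module globals" are reset to None...
    g.log = None
    g.somemodule = None
    # ...and then the loop fails for some unrelated reason.
    raise RuntimeError("interrupted mid-loop")


g = types.SimpleNamespace(
    log=logging.getLogger("demo"),
    somemodule=types.SimpleNamespace(SomeException=SomeException),
)
try:
    thread_body(g, work_during_teardown)
except AttributeError as err:
    final = err  # the exception you actually see: None has no attribute 'info'
```

The RuntimeError is the real failure, the AttributeError from the `except` clause hides it, and the one from `finally` hides that in turn.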

Unfortunately the docs are completely devoid of this information, at least in the threading sections which you would actually reference. The best information I was able to find was this email to python-list a few years back, and a few other emails which don’t really put the issue front and center.

In the end the solution for us was simply to make them non-daemon threads, notice when the app is being shut down, and join them from the main thread. Another possibility was to catch AttributeError in our thread wrapper class (which is what the author of the aforementioned email does), but that seems like papering over a real bug and a real error. Because of this misbehavior, daemon threads lose almost all of their appeal, but oddly I can’t find people publicly saying “don’t use them” except in scattered emails. It seems like underground information known only to the Python cabal. (There is no cabal.)
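A minimal sketch of that fix: a non-daemon thread that polls on an interval, with a `threading.Event` serving as both the stop flag and an interruptible sleep. The class and method names are hypothetical. Call `stop()` from the main thread’s own shutdown path, such as a `try`/`finally` around the main loop, rather than from an `atexit` handler: CPython 3 waits for non-daemon threads to finish before it runs `atexit` handlers, so stopping the thread there would hang.

```python
import threading


class PollingThread(threading.Thread):
    """Non-daemon worker that calls `poll` every `interval` seconds until stopped."""

    def __init__(self, poll, interval=15.0):
        super().__init__(name="monitor", daemon=False)
        self._poll = poll
        self._interval = interval
        # Deliberately not named `_stop`: CPython's Thread has used that name internally.
        self._stop_event = threading.Event()

    def run(self):
        while not self._stop_event.is_set():
            self._poll()
            # Sleeps up to `interval`, but returns early as soon as stop() is called.
            self._stop_event.wait(self._interval)

    def stop(self, timeout=None):
        """Ask the loop to exit, then join the thread from the caller (the main thread)."""
        self._stop_event.set()
        self.join(timeout)
```

Because the thread is never left running during interpreter teardown, its logging and module lookups always see live objects, and nothing has to catch AttributeError.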

So, I am going to say it. When I went searching there weren’t any helpful hints in a Google search of “python daemon threads considered harmful”. So, I am staking claim to that phrase. People of The Future: You’re welcome.


From the Safari 4 beta release:

* Full History Search, where users search through titles, web addresses and the complete text of recently viewed pages to easily return to sites they’ve seen before;

If you’ve been a Beagle user in the last 3 years, you’ve already had this for Firefox and Epiphany. But I wouldn’t mind seeing Firefox build this sort of indexing and search in, either… the AwesomeBar was a great first step in that direction.

