For my final project, I’ve been collaborating with David Gochfeld. After abandoning my initial idea, I came up with a new direction that was much more rooted in music composition, at around the same time that David was moving in a similar direction, so it made sense to join forces. David has a much better write-up of the project idea, so to put it briefly: we wanted to use the Kinect to track participants’ movement through a room and then feed that movement into a music generation engine. Here, I’m going to focus on a few of the technical details he didn’t cover, in particular how we tracked participants.
The problem: We’re working with the Kinect’s depth camera, but we’re mounting it overhead, so most of the Kinect’s built-in person identification/tracking tools are useless to us. (Related problem: I have a somewhat stubborn interest in implementing things from scratch.)
The solution: write our own lightweight blob detection algorithm based purely on depth data. It didn’t need the sophistication of something like OpenCV, but it did need to be fairly adaptable, since we wanted to quantify different aspects of the data. Looking around, I found this page to be a particularly helpful reference, and much of my work draws on the concepts described there.
Simple Blob Detection
First, apply a basic depth threshold so that everything beyond a given cutoff plane (in this case, a few inches above the floor) is ignored. With a top-down view of an empty space, this isolates people pretty cleanly.
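The thresholding step can be sketched roughly like this (plain Java rather than an actual Processing sketch; the method name, the millimeter units, and the specific cutoff values are all illustrative assumptions, not our project’s real code):

```java
public class DepthThreshold {
    // Build a mask of "interesting" pixels: anything that rises more than
    // marginMm above the floor plane. With an overhead Kinect, depth grows
    // toward the floor, so foreground pixels have *smaller* depth values.
    static boolean[] thresholdDepth(int[] depthMm, int floorDepthMm, int marginMm) {
        boolean[] mask = new boolean[depthMm.length];
        for (int i = 0; i < depthMm.length; i++) {
            // A depth of 0 means "no reading" on the Kinect, so skip those too.
            mask[i] = depthMm[i] > 0 && depthMm[i] < floorDepthMm - marginMm;
        }
        return mask;
    }
}
```

Everything the mask rejects is treated as floor; everything it keeps is a candidate blob pixel for the next step.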
Now, iterate through the pixels until you find one that passes the threshold but hasn’t been assigned a blob index yet. Assign this pixel to blob number 1 (in this case, red), and then perform a recursive flood fill in all directions to find the other pixels in the blob. Continue where you left off, checking each pixel, but now skipping any pixels that already belong to blob number 1.
When you encounter another unassigned blob pixel, assign it to blob number 2. Flood filling in all directions matters here: it’s what catches things like that long bit extending off to the left.
Continue this process, and soon you’ll have all the blobs filled in. One caveat: if you actually implement this recursively (i.e. with a setBlobIndex function that calls setBlobIndex on each of its neighbors), you may get stack overflow exceptions once enough people enter the frame (and the resulting Processing error message is inscrutable). To get around this, we stored a list of cells we wanted to fill next, and then filled them in order.
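Put together, the queue-based version of the fill might look something like this (again a sketch in plain Java; the class name and the 4-connected neighborhood are my assumptions, not the actual project code):

```java
import java.util.ArrayDeque;

public class BlobLabeler {
    // Assigns each masked pixel a blob index (1, 2, ...); 0 means background.
    // An explicit FIFO queue replaces recursion, so a large blob can't
    // overflow the call stack the way a deeply recursive fill can.
    static int[] label(boolean[] mask, int w, int h) {
        int[] blob = new int[mask.length];
        int next = 0;
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        for (int start = 0; start < mask.length; start++) {
            if (!mask[start] || blob[start] != 0) continue;
            next++;              // found the seed of a new blob
            blob[start] = next;
            queue.add(start);
            while (!queue.isEmpty()) {
                int i = queue.poll();
                int x = i % w, y = i / w;
                // Visit the 4-connected neighbors of this pixel.
                int[][] nbrs = {{x - 1, y}, {x + 1, y}, {x, y - 1}, {x, y + 1}};
                for (int[] n : nbrs) {
                    if (n[0] < 0 || n[0] >= w || n[1] < 0 || n[1] >= h) continue;
                    int j = n[1] * w + n[0];
                    if (mask[j] && blob[j] == 0) {
                        blob[j] = next;   // claim before enqueueing, so no pixel
                        queue.add(j);     // is ever added to the queue twice
                    }
                }
            }
        }
        return blob;
    }
}
```

Marking a pixel as claimed *before* enqueueing it keeps the queue bounded by the image size, which is what makes this version safe where the recursive one wasn’t.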
In the video below, we’ve implemented exactly that. The color of the blobs is mapped to their relative size (the biggest one will always be red and a blob of size 0 would be purple-red), and the white circle is the highest point (determined right after the blobs have been assigned). Note that the system currently doesn’t have a persistent sense of who people are or how they move. That will be fixed next.
Tracking Individual People
For everything we were doing, we realized that a persistent sense of who people were was crucial. This meant creating a class for participants that could keep track of changes in velocity, the assigned musical instrument, and so on, and maintaining a list of the people currently in the scene. First, we calculate the center of mass of each blob identified at the end of the last step by averaging its pixels’ x and y values. Next, we discard any blobs that are too small. Then, for each remaining blob, we find the existing person whose last center of mass is closest and update that person with the new blob data; if no person matches, we create a new person from the blob, and any people left unmatched are deleted. From there, we were able to quickly measure things such as velocity, position, bounding box dimensions, and so on. Building our own person tracking also let us keep the tracking much simpler, and it freed us from the Kinect’s five-person limit for skeleton tracking.
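The matching logic described above could be sketched like this (the Person class, field names, and greedy nearest-centroid matching are my assumptions about one reasonable implementation, not a transcription of our code):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PersonTracker {
    static class Person {
        float x, y;    // last known center of mass
        float vx, vy;  // frame-to-frame velocity
        Person(float x, float y) { this.x = x; this.y = y; }
    }

    List<Person> people = new ArrayList<>();

    // blobCenters holds this frame's centers of mass (tiny blobs already
    // discarded). Each blob greedily claims the nearest unclaimed person;
    // unmatched blobs become new people, unmatched people are removed.
    void update(float[][] blobCenters) {
        Set<Person> matched = new HashSet<>();
        for (float[] c : blobCenters) {
            Person best = null;
            float bestD = Float.MAX_VALUE;
            for (Person p : people) {
                if (matched.contains(p)) continue;
                float dx = p.x - c[0], dy = p.y - c[1];
                float d = dx * dx + dy * dy;  // squared distance is enough
                if (d < bestD) { bestD = d; best = p; }
            }
            if (best != null) {
                best.vx = c[0] - best.x;      // velocity from the position delta
                best.vy = c[1] - best.y;
                best.x = c[0];
                best.y = c[1];
                matched.add(best);
            } else {
                Person p = new Person(c[0], c[1]);
                people.add(p);
                matched.add(p);
            }
        }
        people.retainAll(matched);  // drop anyone no blob claimed this frame
    }
}
```

Greedy nearest matching like this can mis-assign identities when two people cross paths, but for a handful of participants in an open room it stays simple and fast.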
The code is on GitHub, if you want to peruse it.