Matthew Kaney at ITP

(Specifically, Physical Computing projects)

Final Project: Play Space

Physical Computing


David Gochfeld’s and my final project was “Play Space”, a musically activated space that used a Kinect to track the motion of people and transform that motion into musical output. We envisioned the project as a sort of collaborative instrument, one which would be usable even by people who just wander into the space. Along those lines, I think the project was generally successful: it quickly engaged people, but we also found that we, at least, enjoyed teasing out more sophisticated control of our musical output.

System Diagram

As far as materials were concerned, the project was quite simple. We used the Kinect camera for our tracking and a Processing sketch with the OpenNI library for interpreting the data. From there, we sent our musical output (in MIDI form) to a copy of Logic Pro X, which handled the musical instrument synthesis and final audio output. Besides the Kinect, our only hardware requirements were a decently fast computer, speakers, and the necessary cabling.

We set ourselves the challenge of creating a space that was immediately engaging, but also interesting and rich as a musical instrument. The biggest challenge was communicating to our users what was going on using only audio cues, and indeed many of our first user tests were confusing for people. Our simple instructions (a taped rectangle around the active area and the injunction to “PLAY”) seemed to help a bit. The people who moved in the most interesting ways (not coincidentally, the people with dance/movement backgrounds) created more interesting sounds and were able to feel out the possibilities for the interface more quickly. One remaining challenge is how to encourage more people to move and explore in this way.

We didn’t have very clear ideas of the exact mappings we wanted between movement and music, so we experimented in many directions through trial and error. Because of this working process, I feel like there are so many ways the ideas here could be adapted for various uses. I’ve begun to conceive of this as less of a discrete project, and more as a solid foundation for future research.

Final Project: Musical Mappings

Physical Computing

As part of David’s and my final project, we experimented with various methods for mapping position and movement of people in a space to musical output. As we explored, we were fundamentally trying to solve this list of (sometimes contradictory) challenges:

With this in mind, we partially implemented a variety of different mapping philosophies, most of which didn’t even survive our own personal testing. Not all worked for what we needed, but I think all have potential as musical interfaces. Here are approaches we tried, with a few notes:

Option 1: Position Based Pitch, No User Tracking

This is the simplest method. If there is something interesting (however your algorithm defines that) in a specific part of the screen, then play the pitch associated with that part of the screen. This can be done with just the shapes of users, or it can be done with the centers of blobs. However, this is just based on checking the on/off state of each cell every frame, so it doesn’t have a persistent memory of where people are or how they move over time.

Drawbacks: Without tracking users, we’re incredibly limited in what we can do. In particular, we have very little information over time, so we don’t have much we can do with interesting types of note attacks or decays.
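To make the idea concrete, here is a minimal sketch of a stateless grid-to-pitch mapping. It is written in Python rather than the Processing we actually used, and the scale, the grid layout, and the name `active_pitches` are all assumptions made up for illustration, not the project's code.

```python
# Hypothetical pentatonic scale as MIDI note numbers (an assumption;
# the actual pitches we used are not listed in this post).
SCALE = [60, 62, 64, 67, 69]


def active_pitches(occupied, cols, rows):
    """Return the sorted MIDI pitches for every occupied grid cell.

    `occupied` is a set of (col, row) cells that contain something
    interesting in the current frame. Nothing is remembered between calls.
    """
    pitches = set()
    for col, row in occupied:
        if 0 <= col < cols and 0 <= row < rows:
            index = row * cols + col
            # Walk up the scale, moving up an octave each time it wraps.
            octave, degree = divmod(index, len(SCALE))
            pitches.add(SCALE[degree] + 12 * octave)
    return sorted(pitches)
```

Because the function only sees one frame at a time, it has no way to tell a held note from a new attack, which is exactly the limitation described above.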

Option 2: Position Based Pitch, User Tracking

This adds a sense of distinct users to the previous technique. This way, you can choose to attack notes only when the user moves, enters a new cell, or meets some other condition. Also, you can assign different users different instruments or voices, which we discovered was crucial for user understanding. In the end, this was the approach we took, with multiple adjustments.

Drawbacks: Because the mapping of pitches is based on screen, rather than physical, space, it’s sometimes difficult to tell when one is between pitch areas. Also, by creating a large enough pitch grid to be legible, it forces the user to move around quite a bit for a small amount of output control. We addressed this by allowing certain gestures that could adjust the pitch relative to the root pitch of the area. Also, by limiting the system to a small number of pitches, the music can be more harmonic, but it also limits the potential range, adding to the audible confusion of too many people playing notes within a constrained pitch space.
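As a rough illustration of Option 2, the sketch below attacks a note only when a tracked user enters a new cell. It is Python, not our Processing code, and the class name `TrackedUser`, the `channel` field (one voice per user), and the 100-unit cell size are hypothetical.

```python
def cell_of(x, y, cell_size):
    """Map a position to the (col, row) grid cell that contains it."""
    return (int(x // cell_size), int(y // cell_size))


class TrackedUser:
    """Remembers which cell a user was last seen in."""

    def __init__(self, user_id, channel):
        self.user_id = user_id
        self.channel = channel  # hypothetical: one instrument/voice per user
        self.cell = None

    def update(self, x, y, cell_size=100):
        """Return the new cell if the user just entered it, else None."""
        new_cell = cell_of(x, y, cell_size)
        if new_cell != self.cell:
            self.cell = new_cell
            return new_cell
        return None
```

A caller would send a note-on for the returned cell's pitch on `user.channel` whenever `update` returns something other than `None`.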

Option 3: Direction-Based Pitch

I briefly experimented with a system where movement in one direction would cause a series of ascending notes and movement in an opposing direction would cause descending notes. This allowed us to break up the rigid pitch grid, and the fact that the directions didn’t have to directly oppose (just mostly oppose) meant that the user could move in more interesting patterns, while running up and down the scale.

Drawbacks: Because pitch was movement-based, there was less control over when specific notes played, which confused the mapping. Also, it was difficult to play larger intervals, or specific sequences of notes, given the limitation of moving up and down relative to the user’s previously played note.
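One way to express "mostly opposing" is with the cosine of the angle between the movement and a reference direction. The sketch below is a guess at that idea in Python, not the code I wrote at the time: the fixed reference `axis`, the `min_speed` cutoff, and the `alignment` threshold are all assumed values.

```python
import math


def scale_step(velocity, axis=(1.0, 0.0), min_speed=0.05, alignment=0.5):
    """Return +1 (ascend), -1 (descend), or 0 (no note) for one movement.

    A movement counts as "mostly along" the axis when the cosine of the
    angle between them is at least `alignment`, and "mostly opposing"
    when it is at most -alignment.
    """
    vx, vy = velocity
    speed = math.hypot(vx, vy)
    if speed < min_speed:
        return 0  # too slow to count as a deliberate movement
    ax, ay = axis
    cos_angle = (vx * ax + vy * ay) / (speed * math.hypot(ax, ay))
    if cos_angle >= alignment:
        return 1
    if cos_angle <= -alignment:
        return -1
    return 0
```

The returned step would be added to the index of the user's previously played note, which is why larger intervals are hard to reach with this scheme.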

Option 4: Human Sequencer, Non-Position Based

With this system, each participant is assigned a note in a sequence. The pitch of that note could change based on position, and the length of the note could vary based on the amount of space the person occupies, but the fundamental sequence would be that Person 1’s note plays, Person 2’s note plays, and so on until it loops back to Person 1 again. By spacing out people’s contributions over time (rather than on different pitches or voices), more people can contribute to a more complex melody, without the music becoming too chaotic.

Drawbacks: The initial state (where one person enters and gets a loop of their own note, which they can then modify) is quite discoverable, I think, but also really boring. Without having three or more people, there’s little that can be done with this system. Also, by adding and removing notes to the sequence (and varying the notes’ durations), the system has a pattern, but very irregular rhythms.
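The round-robin structure is simple enough to sketch in a few lines of Python. The dictionary keys (`id`, `pitch`, `area`) and the rule that duration grows by one millisecond per unit of occupied area are assumptions for illustration only.

```python
def sequencer_steps(people, loops=1, base_ms=250):
    """Return (person_id, pitch, duration_ms) tuples in join order.

    `people` is a list of dicts with hypothetical keys: "id", "pitch"
    (a MIDI note number) and "area" (how much space the person occupies).
    """
    steps = []
    for _ in range(loops):
        for person in people:
            # Assumed mapping: a larger blob holds its note for longer.
            duration_ms = base_ms + person["area"]
            steps.append((person["id"], person["pitch"], duration_ms))
    return steps
```

Since every note can have a different duration, the loop length changes whenever someone joins, leaves, or changes size, which is where the irregular rhythm comes from.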

Option 5: Human Sequencer, Position Based

As a variant on the previous idea, we also tried out a much more literal sequencer: we mapped the space on the floor to time and looped across, playing notes for each person the system encountered with time offsets relative to their position. To help communicate this, we added a basic beat that would play when users were in the space.

Drawbacks: This, in particular, suffered from the lack of visual feedback. It was clear that something was happening, but since the virtual “playhead” was immaterial, it was hard to tell when one’s note was about to be played. In addition, having to wait until the loop reaches the user means that exploratory gestures don’t necessarily have prompt feedback. This has potential, but it requires a user that’s more experienced/knowledgeable than our intended users.
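Mapping floor position to time can be sketched as a linear scaling from position to an offset within the loop. This Python version is only an illustration: the function name, the choice of the x axis as the time axis, and the units are assumptions.

```python
def loop_schedule(positions, floor_width, loop_seconds):
    """Return (offset_seconds, person_id) pairs in playback order.

    `positions` maps a person id to that person's x position on the
    floor, measured in the same units as `floor_width`.
    """
    schedule = []
    for person_id, x in positions.items():
        clamped = min(max(x, 0.0), floor_width)  # keep people inside the loop
        offset = (clamped / floor_width) * loop_seconds
        schedule.append((offset, person_id))
    return sorted(schedule)
```

With an 8-second loop, someone standing three quarters of the way across waits 6 seconds for their note, which shows why exploratory gestures got such delayed feedback.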

Final Project: Blob Detection

Physical Computing

For my final project, I’ve begun working in collaboration with David Gochfeld. After abandoning my initial idea, I came up with a new direction that was much more music-composition-based, at much the same time that David was moving in a similar direction, so it made sense to combine forces. David’s got a much better write-up of the project idea, so to put it briefly: we wanted to use the Kinect to track participants’ movement through a room and then feed that movement into a music generation engine. Here, I’m going to focus instead on a few of the technical details he didn’t cover, in particular how we tracked participants.

The problem: We’re working with the Kinect’s depth camera, but we’re mounting it overhead, so most of the Kinect’s built-in person identification/tracking tools are useless to us. (Related problem: I have a somewhat stubborn interest in implementing things from scratch.)

The solution: write our own light-weight blob detection algorithm, purely based on depth data. It didn’t need the sophistication of something like OpenCV, but it did need to be fairly adaptable as we wanted to quantify different aspects of the data. Looking around, I found this page to be a particularly helpful reference, and much of my work draws from the concepts as described there.

Simple Blob Detection


First, apply a basic depth threshold so that everything below a given line (in this case, a few inches above the floor) is ignored. With a top-down view of an empty space, this groups people pretty cleanly.



Now, iterate through all the pixels until you find a pixel that is above the threshold, but that hasn’t been assigned a blob index yet. Set this pixel as belonging to blob number 1 (in this case, red), and then perform a recursive flood fill in all directions to find the other pixels in this blob. Continue where you left off, checking each pixel, but now ignoring any pixels that are already in blob number 1.



When we encounter another unassigned blob pixel, assign that one to blob number 2. You’ll need to flood fill in all directions to catch things like that long bit that extends off to the left.


Continue this process, and soon you’ll have all the blobs filled in. One caveat: If you actually implement this recursively (i.e. with a setBlobIndex function that calls a bunch of setBlobIndex functions on its neighbors), then you may get stack overflow exceptions when enough people enter the frame (also the resulting Processing error message is inscrutable). To get around this, we stored a list of cells that we wanted to fill next, and then filled them in order.
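For reference, here is a small sketch of the non-recursive version of that fill, using an explicit queue of cells to visit next. It is Python rather than our Processing code, it assumes the depth threshold has already been applied to produce a grid of true/false values, and it only connects pixels that touch horizontally or vertically (an assumption; diagonal neighbors could be added).

```python
from collections import deque


def label_blobs(mask):
    """Label connected regions of truthy cells.

    `mask` is a list of rows; a truthy cell is above the depth threshold.
    Returns (labels, blob_count). Labels start at 1 and 0 means "no blob".
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue  # background, or already part of a blob
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])  # cells we still want to fill from
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and mask[nr][nc] and not labels[nr][nc]:
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return labels, count
```

Marking a cell as labeled at the moment it is queued, rather than when it is processed, keeps any cell from being added to the queue twice.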

In the video below, we’ve implemented exactly that. The color of the blobs is mapped to their relative size (the biggest one will always be red and a blob of size 0 would be purple-red), and the white circle is the highest point (determined right after the blobs have been assigned). Note that the system currently doesn’t have a persistent sense of who people are or how they move. That will be fixed next.

Tracking Individual People

For everything we were doing, we realized that having a persistent sense of who people were was crucial. This involved creating a class for participants that could keep track of changes in velocity, assigned musical instrument, and so on, and keeping a list of people in the scene. First, we calculate the centers of mass for all of the blobs identified at the end of the last step, by averaging their pixel x and y values. Then, we get rid of all blobs that are too small. For each remaining blob, we look for the existing person whose last center of mass was closest, and update that person with the new blob data. If no person matches, then create a new person with the blob, and if people are left over, then delete them. From here, we were able to quickly measure things such as velocity, position, bounding box dimensions, and so on. In addition, building our own person tracking system allowed us to do simpler person tracking, and also exceed the Kinect’s five-person limit for skeleton tracking.
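The matching step might look something like the Python sketch below. It is a simplified greedy version, not our actual class-based Processing code: people are reduced to an id and a last-known center, and the `min_size` and `max_jump` cutoffs (how small a blob can be, and how far a person may move between frames and still count as a match) are assumed values.

```python
import math


def centroid(pixels):
    """Average the x and y values of a blob's pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def update_people(people, blobs, next_id, min_size=3, max_jump=50.0):
    """Match blobs to people by nearest last-known center of mass.

    `people` maps id -> (x, y); `blobs` is a list of pixel lists.
    Returns (new_people, next_id). Unmatched people are dropped.
    """
    unmatched = dict(people)
    updated = {}
    for pixels in blobs:
        if len(pixels) < min_size:
            continue  # too small to be a person
        center = centroid(pixels)
        best = min(unmatched, key=lambda pid: math.dist(unmatched[pid], center), default=None)
        if best is not None and math.dist(unmatched[best], center) <= max_jump:
            del unmatched[best]  # each person can claim only one blob
            updated[best] = center
        else:
            updated[next_id] = center  # nobody close enough: new person
            next_id += 1
    return updated, next_id
```

Velocity then falls out as the difference between a person's old and new centers, divided by the time between frames.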

The code is on GitHub, if you want to peruse.

Glove Runner

Physical Computing

For my Physical Computing midterm project, I worked with Lirong Liu on a glove-based game controller.

Our initial discussions revolved around a variety of gestural control schemes. In particular, we discussed various ways that hand gestures in space could be tracked, using gloves, computer vision, and other approaches. Some topics of discussion were pointing/moving in 3-d space, and various gestures for instrument control. After much discussion, we settled on a controller for a Mario-style side-scrolling game, where the player would make their character run by physically “running” their index and middle fingers along a table top. I think this gesture is attractive for a number of reasons. It has a very clear correspondence with the action performed on screen, and although the controller gives little physical feedback, the assumption that users would run in place on a table top helps ground their actions. Also, it seemed like a lot of fun.

Glove Mockup

From there, I began working with a physical prototype. To begin with, I took a pair of flex sensors (shown at left with a plug we added for better interfacing) and attached them to my fingers using strips of elastic. From this prototype, it was clear that the best sensor response came when the tip of the flex sensor was firmly attached to the fingertip and the rest of the flex sensor could slide forward and back along the finger as it bends.

Test Processing Output

Reading this sensor data into Processing, I was able to quickly map the movement to a pair of “legs” and get a sense of the motion in the context of a running character. For a standing character, we found that just two changing bend values (one for each sensor) could produce some very sophisticated, lifelike motion. Meanwhile, as I worked on the physical prototype, Lirong set up a basic game engine in JavaFX with a scrolling world and three different obstacle types.

At this point, we both worked on the software for a while, with Lirong setting up a system for converting various finger movements to discrete events (for steps, jumps, etc) and me working on various issues related to graphics and animation. In the end, we wound up using our sensor input with two different levels of abstraction: the high level (the specific running and jumping events) controls the actual game logic, while the low level (the actual bend values of the sensors) controls the character’s animation.

After that, I sewed up the two pairs of gloves shown in the video above, allowing the flex sensors to slide back and forth along the fingers. As we worked on the glove design, we tested with various users to identify potential sizing issues. From there, we built a simple, self-contained system for doing basic user control, and wired everything up.

Code can be found here:

A few challenges we faced:

A few things I think we did well:

Physical Computing Project Plan

Physical Computing

Prototype Music Composer

Last week, I put together a simple paper prototype of my musical composition machine and tried it out with my classmates to see if the interaction was understandable. In my original sketch of the design, I thought that the device would have both a musical keyboard and a text keyboard. That seemed excessive, so for this iteration, I assumed that the user would use a separate MIDI-based music input of their own choosing, and then enter text with a built-in QWERTY keyboard. Even this proved problematic, however—for many users, the small size of the keyboard seemed uncomfortable to use. It was suggested that my device could instead interface with a USB keyboard. In principle, that suggestion makes sense to me, but the idea of farming out so much of the interaction to pre-existing interfaces makes me uncomfortable with the project.

In general, the direction of my user testing tended towards complexity. Because the output medium (punched paper tape) is so linear, I decided that the interface should allow for recording a finite number of musical bars before writing out those bars to paper and recording the next set. In general, testers seemed uncomfortable with this mode of working, and wanted more sophisticated score editing tools. As the interface is already too complex, I decided that my efforts would be better focused on building a tool that could convert MIDI scores to my punched paper format and letting users compose music using any of the already available programs for composing MIDI scores. A good, solid solution, but now out of scope for the class.

So, what about physical computing?

Project Plan

For my final, I think I want to shift in a related direction, which is interfaces for sequencing music. Most sequencers and arpeggiators take musical input and then transform those notes into complex patterns. For my project, I’m interested in a system that begins with a random sequence of notes and allows the user to filter that random sequence into something cohesive. The knobs on the left will control variables for the interpretation of a randomly-generated sequence, controlling aspects such as playback speed, root tone and the distribution of high vs low notes. The buttons on the bottom can be used to optionally filter the sequence to specific notes on the chromatic scale, and the knobs on the right can be used to control the rules for how the sequence is constructed. The output of the device is a standard MIDI signal, which can then be recorded or synthesized.
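As a very rough sketch of the filtering idea (in Python, for illustration only, since nothing is built yet), the function below generates a random run of MIDI notes above a root and keeps only the ones whose pitch class is switched on. The two-octave `span`, the fixed `seed`, and the choice to drop disallowed notes rather than snap them to the nearest allowed one are all assumptions.

```python
import random


def filtered_sequence(length, root, allowed_pitch_classes, seed=0, span=24):
    """Generate a random note sequence and keep only the allowed notes.

    `allowed_pitch_classes` is a set of numbers 0-11 (0 = C, 1 = C sharp,
    and so on), standing in for the row of chromatic-scale buttons.
    """
    rng = random.Random(seed)  # seeded so the same settings replay the same sequence
    raw = [root + rng.randrange(span) for _ in range(length)]
    return [note for note in raw if note % 12 in allowed_pitch_classes]
```

Keeping the seed fixed while the filter changes is what would let the user hear one underlying sequence being shaped, instead of a new random sequence on every adjustment.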

I’ve worked with MIDI and the Arduino before, so I foresee the greatest challenges with this project being the layout and selection of different controls. I intend to expose a lot of different choices to the user, so laying everything out cleanly is imperative. In addition, these controls may exceed the capacity of the ATmega, so I’m interested in exploring strategies for multiplexing or alternating between inputs.


System Diagram


Initial Bill of Materials

Physical Computing Final Project Concepts

Physical Computing

Right now, I’m generally interested in interfaces for musical controllers or sequencers. Are there specific movements that could lend themselves to interesting tonal changes? What about some sort of individual components that can be physically manipulated to control some sort of musical structure?


For example, for my midterm project in Automata, I’m building a music box that transforms data punched in paper tape into voice. For that project, I’ll probably be die-cutting/laser-cutting my punched paper strips, but I’m interested in the challenge of how to build a device for punching these strips. Specifically, we have decent interfaces for inputting text and musical notation, but how would a hybrid work (in order to both perform text and lyrics)? Is it possible without a lot of tedious labor for the operator?

Blog 2: Observation

Physical Computing


In order to observe interactive technology use in the wild, I staked out a pair of document scanning stations in the NYU library. These stations, with a scanner attached to a specialized computer kiosk, allow for scanning books, chapters, articles, notes, or any of the other paper information one finds in the library. I assumed that the users (as a sample of University students and faculty) would not be experts, but would be more computer literate than the general public. I assumed the system would be used mostly for books and multi-page scans, and that it would support various digital file transfer systems (given that it wasn’t a photocopy system).

Having used many frustrating scanners in the past, I didn’t have high expectations for usability, so I was pleased that most of the people I observed could operate the mechanism with relative ease. One tap on the touch screen, and you’re prompted to pick a delivery mechanism (email, Google Drive, etc.). Another tap and you’re faced with a page of scan settings. Here users diverged, between those who immediately accepted the defaults and those who scrutinized their options at length. The deliberators were often those who were scanning large documents and wanted to perfect the settings at the outset (a tricky task without visual indication of the effect of a given setting). Once settings were approved, the user pressed one more button and the machine scanned the page, automatically cropped and rotated it, and displayed it for the user to edit further. On one extreme, I saw a user approach and successfully scan a single-page document in less than 90 seconds.

At this point, I noticed one major problem with the workflow. With a scanned page open, it was unclear for many people whether the “scan” button in the corner (which they’d initially pressed) would now replace the current scan with a new scan, or add a new scan as a second page. Assuming the former, I saw multiple people instead push the “Next” button, only to backtrack when they were confronted with an interface for emailing their scanned document. In fact, the workflow for multiple pages was quite tidy—keep hitting scan on the same screen and it keeps adding pages. Once they settled into this rhythm, users generally took no more than 10 seconds a page, but the interface did little to suggest that this flow was possible.

This process constitutes a very clear interaction by Crawford’s definition: the user instructs the computer to scan a page, the computer processes and outputs an image of a scanned page, the user evaluates this image and responds to the computer accordingly. On the large scale, I felt like all the users understood the interaction. However, on the small scale, certain details caused problems. In particular, I think the touchscreen interface is lacking. It worked pretty well early on in the process, when one must choose one large button from four or six. Inevitably, though, the user wishes to actually save the scanned file, and this means typing an email address, or password, or file name on a vertically-mounted touchscreen keyboard. The users I observed were typing maybe ten words per minute, in one of the most inefficient parts of the whole process. Often, the button wouldn’t even register the tap. Whatever visual feedback was supposed to work here had failed, causing the user to wait (in hopes that the computer was stalled) before trying to press the button again. Without the physical affordances of buttons, these virtual buttons gave little indication of whether they’d been pressed, let alone what angle, force, etc., was required to effectively activate them.

Blog 1: What is physical interaction?

Physical Computing


In The Art of Interactive Design, Chris Crawford defines interaction as the process through which two actors take turns listening, thinking, and speaking, like in a conversation. After defining his terms, he explains himself with some nice examples: computers are interactive, books and movies aren’t; dancing with another person is interactive, dancing to music is not; live performances can be interactive, but are hardly ever interestingly so. This definition is pretty cut and dried, and it makes a lot of sense, especially given Crawford’s bias toward computer interfaces. If one is imagining information conveyed through sight or sound (most computer output, to be sure), it’s easy to think of discrete moments of communication back and forth with processing time in between. We imagine a horror movie viewer shouting at the unlistening screen and appreciate our interactive media all the more.

Unfortunately, once we start considering touch, everything goes murky. I can shout at a movie screen to no effect, but if I touch it, things happen. The screen will move, or resist, or transform its shape. This process of action and response sounds kind of like our interaction cycle—are all physical objects interactive?

Crawford has no patience with this line of reasoning. No, such actions don’t count. In his definition, the actors must be “purposeful”—a deceptively philosophical concept used here as shorthand for “a computer.” He dismisses the refrigerator light switch as a trivially simple interactive mechanism, before promptly changing the subject. Left unsaid: is there a machine sufficiently complex to count as interactive? A piano? A typewriter? A mechanical calculator? Do any of these machines “think” enough to really interact? In this light, Bret Victor’s “Brief Rant on the Future of Interaction Design” seems relevant. Victor is a champion of physical objects. He admires the way we manipulate them, the subtle information their weights, temperatures, textures and so on impart. Physical objects can be acted upon, and they can communicate back, but Victor never actually uses the i-word when describing hammers or jar lids.

My interpretation is that both writers are driving at the same continuum from opposite ends. On the one end are simple physical objects, which I would call reactive, and on the other are full interactive systems. And what separates the two? Nothing specific, I’d argue, especially not Crawford’s requirement that an interactive agent think or act purposely.

So, what defines interaction, then?

Most everything is reactive, meaning that it behaves (broadly speaking) like an interactive object. A hammer may not interact, strictly speaking, but it will react in specific ways, which is almost as interesting. However, if we must propose an arbitrary distinction, I believe it is this: interactivity is not determined by how elaborate an actor’s thinking ability is, but rather by how much novelty the actor injects into the interaction. We don’t think of a rock as interactive because when we move it, it moves as expected, no more and no less. The simplest interactive system may therefore be a handful of dice. Roll the dice, and the result is unexpected, forcing us to reevaluate and keeping us engaged. A system can be sophisticated or not, predictable or not, but as long as each new round adds new information, we (the humans) don’t get bored, and the system has successfully interacted.

At the same time, we shouldn’t discount merely reactive objects. As Bret Victor points out, there’s a wealth of information to be found in the ways that physical controls react to human touch. Ideally, the physical interactions we design should blend both ends of the spectrum. Small, reactive details make things more comfortable and understandable, and a broad interactive design keeps the process interesting. The “pictures under glass” that Victor criticizes are great case studies of a design that’s interactive, but insufficiently reactive.

Of course, a designed object can also be reactive instead of interactive. The computer mouse is often cited as a pinnacle of interaction, but I contend that it is less interactive (by my explanation of Crawford’s definition) than purely reactive. The mouse rarely gives the user new information. Rather, it does what it’s told, accurately, but boringly. If the person and the mouse are having a conversation, that conversation is pretty one-sided.

In light of this idea, I thought back to an electronic artwork I read about years ago, Rafael Lozano-Hemmer’s Pulse Room. This piece (where a user’s pulse controls the flickering of a lightbulb in a large array) still attracts me for the reasons it did then—it’s impressive, yet elegant, and so simple to use. However, I now wonder whether it should really be considered interactive. Crawford would probably say it was, but ultimately, it responds in exactly one way to any type of input. It’s a good response, to be sure, but when designing interactions, I feel we could stretch further. How do we respond in interesting ways that keep our users engaged? That, to me, is the fundamental question of interaction design.