Museums are a challenging environment for installations with gestural input.
In 2014, Trimpin and I collaborated on a piece called REDHOT. Trimpin prepared a baby grand piano with his usual assortment of solenoids, pluckers, magnetic resonators, and spinners, painted it bright red, flipped the entire thing upside down, and suspended it a meter off the ground on an aluminum tripod.
One thing Trimpin and I talk about often is how to hold visitors' attention for more than a few minutes. After 30+ years of exhibiting installations in diverse environments, Trimpin's come up with a few ideas. One simple trick he likes to use is a coin collector. As soon as people commit something tangible, like a quarter, they expect something worthwhile in return. Having made that investment, they give the installation their attention, and it holds their imagination for just a bit longer.
In a museum, visitors are already there for an experience. They've paid $10, $20, or $50 for admission, a significant upfront investment, so coins just don't work as well.
For installations where the piece is "through-composed" (i.e. it starts and stops and isn't interactive in the sense of real-time input), Trimpin and I have been experimenting with a simple but ubiquitous gesture to replace the coin: conducting. Conducting could be considered an intuitive gesture since it's culturally pervasive, but it isn't necessarily natural in the HCI sense of the word.
Conducting a downbeat triggers REDHOT to play a composition lasting anywhere from about 30 seconds to two minutes. Here's a picture of the conducting stand that was used at the Portland Art Museum:
The interaction is simple, but the range of movement you'll see from visitors can only be described as chaotic. Participants range in age from the very young to the very old. Talk about a diverse audience! Despite advances in gesture recognition and camera technology over the past few decades, robustly detecting conducting across such a diverse audience remains challenging.
The stand for REDHOT has an Intel depth camera embedded in the top. Hand tracking with depth gives a fairly accurate estimate of hand position, but with any camera, frame rate and field of view can be problematic. Frame rate is an issue because kids tend to conduct too quickly; they also make unusual hand poses that confuse garden-variety tracking algorithms.
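The post doesn't describe how REDHOT's downbeat detector actually works, so what follows is only a hypothetical sketch of one simple approach, assuming the depth camera yields a stream of timestamped hand heights in meters. It treats a downbeat as a descent of at least `min_drop` below the most recent peak, followed by a rebound of at least `rebound` off the bottom. `Sample`, `find_downbeat`, `sample_beat`, and both threshold values are invented for illustration. The sketch also makes the frame-rate problem concrete: the detector can only see the bottom of a beat if some frame lands near it.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    t: float  # timestamp in seconds
    y: float  # hand height in meters, larger = higher (assumed camera output)


def find_downbeat(samples: List[Sample], min_drop: float = 0.15,
                  rebound: float = 0.02) -> Optional[float]:
    """Return the time of the first downbeat, or None if none is found.

    Hypothetical rule: the hand falls at least `min_drop` meters below its
    most recent peak, then rises at least `rebound` meters off the bottom.
    The `rebound` margin keeps small tracking jitter from counting as a turn.
    """
    if not samples:
        return None
    peak_y = samples[0].y
    trough = samples[0]
    for s in samples[1:]:
        if s.y >= peak_y:
            # New high point: restart the descent from here.
            peak_y = s.y
            trough = s
        elif s.y < trough.y:
            # Still moving down: remember the lowest point so far.
            trough = s
        elif s.y - trough.y >= rebound:
            # The hand has clearly turned back up.
            if peak_y - trough.y >= min_drop:
                return trough.t
            # Too shallow to be a beat: treat it as a wiggle and start over.
            peak_y = s.y
            trough = s
    return None


def sample_beat(rate_hz: float, fps: float, duration_s: float,
                center: float = 1.2, amplitude: float = 0.2) -> List[Sample]:
    """Synthesize an idealized cosine 'beat' as a camera at `fps` would see it."""
    n = int(duration_s * fps) + 1
    samples = []
    for i in range(n):
        t = i / fps
        y = center + amplitude * math.cos(2 * math.pi * rate_hz * t)
        samples.append(Sample(t, y))
    return samples


print(find_downbeat(sample_beat(rate_hz=1.0, fps=30, duration_s=1.0)))  # 0.5
print(find_downbeat(sample_beat(rate_hz=6.0, fps=6, duration_s=1.0)))   # None
```

Sampling an idealized one-beat-per-second motion at 30 fps finds the downbeat at 0.5 s, the bottom of the stroke. Sampling a 6 Hz wiggle at 6 fps, by contrast, catches the hand at the same height in every frame, so nothing is reported. Real fast conducting won't alias this neatly, but the underlying risk, the low point of a quick stroke falling between frames, is the same.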
Another issue is field of view (FoV). Trimpin and I didn't want a screen in the stand because it might have been too distracting, but leaving it out cost us a visual channel for giving instructions. To compensate, we put a small illustration on the stand to guide visitors. With the first iteration of the diagram, we observed visitors doing what we dubbed the jumping jack: they followed the arrows too literally, and their hands missed the camera's FoV entirely.
In a later iteration installed at the Tech Museum of Innovation in San Jose, CA, we put foot markers on the ground in front of the stand so that field of view was less of an issue (although skeletal tracking with a Kinect might have solved this as well, given the range of that camera)*. The graphic was changed to what you see above.
The biggest takeaway from building REDHOT is that interaction testing in the field is important and challenging. Luckily, we were able to tune and tweak across several planned exhibitions. In the same way a startup might iterate on a minimum viable product, it's critical to observe an audience using the product early on.
(* Well, a Kinect may have worked. An installation adjacent to REDHOT was using a Kinect to control an LED wall. I noticed a handwritten note on the side saying the interactivity wasn't working. Just as I was turning around, a young boy came up to the Kinect, thinking it was a joystick, and gave it a really good jostle, ripping it from the wall. Well, there's your problem!)