In mid-2014, Google released a new camera app with a compelling computational photography feature for simulating a shallow depth of field.
Traditionally, capturing this effect requires a thick, multi-element lens with a wide aperture. High-end smartphones like the iPhone 5s or Nexus 5 have demonstrated better results in this area (for instance, it's possible to get a shallow-ish depth of field on macro shots), but the general rule of thumb is that these devices are limited by sensor area and by the height of the optical stack in an ever-thinner phone form factor.
Computational photography describes a growing research field in which light is captured, modified, or processed, either in real time or after capture, to achieve new effects. One focused area of computer vision research deals with deconstructing scenes to recover depth.
With depth data, a simulation of the bokeh effect is possible with perceptually reasonable results. In short, each RGB pixel has a variable blur applied to it according to its distance. Although a variable blur is not the same process by which a lens produces this physical phenomenon, an untrained eye might not notice the difference. A more complex problem is generating a depth map from a monocular camera. Below is a sample from an Intel colleague showing the original image (without bokeh) alongside the extracted depth.
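To make the idea concrete, here's a minimal sketch of a depth-dependent blur, assuming an aligned depth map normalized to [0, 1] is already available. The filenames, kernel sizes, and depth convention are placeholders for illustration, not what Google's app actually does:

```python
import cv2
import numpy as np

# Load an RGB image and a per-pixel depth map (assumed normalized to [0, 1],
# where 0 is the focal plane and 1 is the farthest point). Both are
# assumptions for illustration; a real depth map needs alignment first.
image = cv2.imread("photo.jpg").astype(np.float32)
depth = cv2.imread("depth.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

# Pre-compute a few blur levels; each pixel will pick between them by depth.
blur_levels = [image] + [cv2.GaussianBlur(image, (k, k), 0) for k in (5, 11, 21)]

# Map depth to a fractional index into the blur stack and linearly
# interpolate, so blur strength increases smoothly with distance.
idx = depth * (len(blur_levels) - 1)
lo = np.floor(idx).astype(int)
hi = np.minimum(lo + 1, len(blur_levels) - 1)
frac = (idx - lo)[..., None]

stack = np.stack(blur_levels)               # (levels, H, W, 3)
rows, cols = np.indices(depth.shape)
result = (1 - frac) * stack[lo, rows, cols] + frac * stack[hi, rows, cols]

cv2.imwrite("fake_bokeh.jpg", result.astype(np.uint8))
```

A real implementation would also respect occlusion boundaries so that sharp foreground pixels don't bleed into the blurred background, but the per-pixel idea is the same.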
The HTC One M8 also recently introduced a lens blur feature. This implementation relies on extra hardware: the M8 has two cameras on the back of the phone for stereo vision. With a calibrated stereo pair, solving for depth becomes a much more straightforward problem.
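As a rough sketch of why stereo helps: disparity between the two views maps directly to depth via similar triangles. The snippet below uses OpenCV's block matcher; the focal length and baseline values are placeholders, not the M8's actual calibration:

```python
import cv2
import numpy as np

# Rectified left/right frames from a stereo pair (hypothetical filenames).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching gives per-pixel disparity; OpenCV returns fixed-point
# values scaled by 16, hence the division.
matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

# Depth follows from similar triangles: Z = f * B / d, where f is the focal
# length in pixels and B the baseline between the two cameras.
focal_px, baseline_m = 700.0, 0.02   # placeholder calibration values
depth = np.where(disparity > 0, focal_px * baseline_m / disparity, 0)
```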
But how does Google get similar or better results with a monocular camera? The crux of the feature uses a technique known to computer vision researchers as structure-from-motion (SfM).
The primary difference between SfM and a dual-camera capture is that SfM solves for 3D geometry using a multi-view stereo approach. This method complicates the user experience: an explicit scan gesture is needed, during which the app collects the set of photos that constitute the multiple views. Google recently published a paper at CVPR 2014 showing how SfM can be leveraged to build 3D data from motion as insignificant as hand jitter (roughly 3 mm of movement on average). While an explicit guided motion will yield better data, this research illustrates a solid attempt at mitigating the additional cognitive overhead of moving while framing a shot.
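For a sense of what the core of such a pipeline looks like, here's a minimal two-view sketch using OpenCV: match features, estimate the essential matrix, recover the relative camera pose, and triangulate a sparse point cloud. The filenames and intrinsics are assumptions for illustration, and this is a generic textbook pipeline rather than Google's actual method, which works over many views and much smaller baselines:

```python
import cv2
import numpy as np

# Two frames from a short handheld burst (hypothetical filenames) and an
# assumed pinhole intrinsic matrix K; a real pipeline would calibrate this.
img1 = cv2.imread("frame1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])

# 1. Detect and match features across the two views.
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 2. Estimate the essential matrix and recover the relative camera motion.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# 3. Triangulate matched points into a sparse 3D cloud (up to scale).
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
points_4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
points_3d = (points_4d[:3] / points_4d[3]).T
```

The recovered geometry is only defined up to scale, which is one reason tiny hand-jitter baselines are a hard case to handle robustly.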
It's also worth noting that SfM on mobile has been tried before, by none other than Stanford professor and imaging wizard Marc Levoy with SynthCam, although SynthCam requires a very manual capture process.
New Implementations
Other computational photography apps have also been published recently, particularly on iOS. Notable in this area is Seene, an app designed to create "3D" photos. Seene appears to use view interpolation (rather than more complex mesh or depth computation), since photos are processed within a few milliseconds. Both Google's app and Seene guide the user through the scanning process. There's a slight learning curve in understanding the failure modes during capture: neither handles too much motion or textureless scenes well.
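View interpolation can be surprisingly cheap. As a rough guess at what it might look like (this is not Seene's actual method), the sketch below estimates dense optical flow between two nearby captures and warps one frame partway along the flow field to fake an in-between viewpoint; the filenames and the 0.5 blend factor are placeholders:

```python
import cv2
import numpy as np

# Two nearby frames from a short capture (hypothetical filenames).
a = cv2.imread("view_a.jpg")
b = cv2.imread("view_b.jpg")
gray_a = cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)
gray_b = cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)

# Dense optical flow from view A to view B.
flow = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

# Crude backward-warping approximation: sample A partway along the flow
# field to synthesize an in-between viewpoint (t = 0.5 is the halfway view).
h, w = gray_a.shape
t = 0.5
grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
map_x = (grid_x - t * flow[..., 0]).astype(np.float32)
map_y = (grid_y - t * flow[..., 1]).astype(np.float32)
midpoint = cv2.remap(a, map_x, map_y, cv2.INTER_LINEAR)
cv2.imwrite("interpolated_view.jpg", midpoint)
```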
All said, these algorithms bring new types of functionality to existing devices without additional hardware. I view the genesis of 'software-defined' computational photography applications as a function of the growing power and efficiency of smartphone processors; it's easy to see how a sluggish experience would kill the enjoyment of these apps. Specialized camera hardware side-steps issues involving motion, performance, and data quality (in reference to the cameras I work with at Intel, or Google's Tango phone), but there's still a lot to enjoy about novel camera functionality without the extra hardware.
As computational photography APIs become first-class citizens on platforms like Android (see the HAL v3 API), there are many more unrealized possibilities (e.g. multi-flash). After all, many of these techniques and algorithms have been kicking around the computer vision community for years. It's quite exciting to watch as photographic effects beyond Instagram filters make their way through social media.