Lens Blur and Beyond

Google recently released a new camera app with a compelling feature called Lens Blur, which simulates the shallow depth of field usually seen in photos taken with a DSLR.

Traditionally, capturing a shot with a shallow depth of field requires a good lens with a wide aperture. High-end smartphones like the iPhone 5s or Nexus 5 have demonstrably improved in this area (for instance, it's possible to get a shallow-ish depth of field on macro shots), but the general rule of thumb is that thin devices are limited by sensor area and the height of the optical stack.

Computational photography describes a growing research field where light may be captured, modified, or processed to achieve new effects, either in real time or in post-processing. One focused area of computer vision research deals with deconstructing scenes to understand depth. By applying some well-known algorithms from this domain to photography, a simulated depth of field (among other effects) is possible.

With depth data, a physically based simulation of the bokeh effect of a lens aperture is possible. Each RGB pixel has a variable blur applied to it according to its z-distance and a few other parameters, such as the focal plane. This algorithm is generally well known, so the harder problem is generating depth from a monocular camera. Below is a sample image by a colleague showing the original image (without post-processed blur) and the extracted dense depth map.

Lens Blur
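To make the depth-dependent blur described above concrete, here is a minimal sketch in Python. It assumes the standard thin-lens circle-of-confusion formula and a simple square averaging window; a real renderer would use a disc-shaped kernel and handle occlusion edges, and this is not a description of how Google's app works. All names (coc_diameter, blur_radius_px, lens_blur) and the max_radius clamp are hypothetical, chosen for illustration.

```python
def coc_diameter(z, focus_dist, focal_len, aperture_diam):
    """Thin-lens circle-of-confusion diameter on the sensor.

    All arguments share one length unit (e.g. meters). The result is zero
    for a point exactly on the focal plane and grows as z moves away from it.
    """
    return aperture_diam * focal_len * abs(z - focus_dist) / (z * (focus_dist - focal_len))


def blur_radius_px(z, focus_dist, focal_len, aperture_diam, pixel_pitch, max_radius=8):
    """Convert the CoC diameter into a blur radius in pixels, clamped to max_radius."""
    radius = 0.5 * coc_diameter(z, focus_dist, focal_len, aperture_diam) / pixel_pitch
    return min(max_radius, int(round(radius)))


def lens_blur(image, depth, focus_dist, focal_len, aperture_diam, pixel_pitch):
    """Gather-style variable blur on a grayscale image (list of rows).

    Each output pixel is the average of a square window whose half-size is
    chosen from that pixel's depth. For RGB, run it once per channel.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            r = blur_radius_px(depth[y][x], focus_dist, focal_len, aperture_diam, pixel_pitch)
            total, count = 0.0, 0
            for yy in range(max(0, y - r), min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    total += image[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out
```

Pixels whose depth equals the focus distance get a radius of zero and pass through untouched, which is what keeps the subject sharp while the background softens.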

The HTC One M8, announced a few weeks ago, similarly introduces a lens blur feature. This implementation relies on extra hardware: the M8 has two cameras on the back of the phone to enable stereo vision. With two views, solving for depth becomes tractable. So what's going on in the new Google camera application? It uses a technique known to computer vision researchers as structure-from-motion (SfM). In terms of mobile photography, however, SfM does place a burden on the user.
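As a rough illustration of why two cameras make depth tractable, the sketch below applies the textbook relation for a rectified stereo pair: depth equals focal length times baseline divided by disparity. It assumes ideal pinhole cameras with a known baseline; the function name and the numbers are hypothetical and are not the M8's actual parameters or pipeline.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth (meters) of a point seen by two rectified pinhole cameras.

    disparity_px: horizontal shift of the point between the two images, in pixels
    focal_px:     focal length expressed in pixels
    baseline_m:   distance between the two camera centers, in meters
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means the point is at infinity)")
    return focal_px * baseline_m / disparity_px


# Illustrative values only (assumed, not measured from any real device).
near = depth_from_disparity(disparity_px=40.0, focal_px=1000.0, baseline_m=0.02)
far = depth_from_disparity(disparity_px=10.0, focal_px=1000.0, baseline_m=0.02)
print(f"near point: {near:.2f} m, far point: {far:.2f} m")
```

Nearby objects shift more between the two views than distant ones, so a larger disparity maps to a smaller depth.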

The main difference between SfM and a dual-camera capture is that SfM solves for 3D geometry using a multi-view-stereo approach. This method complicates the user experience because it requires an explicit scan gesture, during which the app collects a set of photos constituting the multiple views. Though SfM is a well-studied method, Google recently published a paper at CVPR 2014 showing how it can be leveraged to build 3D data from motion as insignificant as hand jitter (about 3mm of movement on average). Naturally, a scan gesture will yield better data, but this research illustrates how the UX impact might be mitigated with smarter algorithms.
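A back-of-the-envelope sketch helps show why a scan gesture yields better data than hand jitter. For two views separated by a baseline, a first-order estimate of depth error is depth squared times the matching error, divided by focal length times baseline. The focal length, subject distance, matching error, and the 30mm "scan gesture" baseline below are all assumed values for illustration; only the roughly 3mm jitter figure comes from the text above.

```python
def disparity_px(depth_m, focal_px, baseline_m):
    """Pixel shift of a point at depth_m between two views baseline_m apart."""
    return focal_px * baseline_m / depth_m


def depth_error_m(depth_m, focal_px, baseline_m, disparity_err_px):
    """First-order depth error caused by a given disparity (matching) error.

    Follows from differentiating Z = f * B / d with respect to d.
    """
    return depth_m ** 2 * disparity_err_px / (focal_px * baseline_m)


FOCAL_PX = 1000.0    # assumed focal length in pixels
SUBJECT_M = 2.0      # assumed subject distance
MATCH_ERR_PX = 0.25  # assumed sub-pixel matching error

# 0.003 m is the hand-jitter figure from the text; 0.03 m is a hypothetical scan gesture.
for label, baseline in (("hand jitter", 0.003), ("scan gesture", 0.03)):
    d = disparity_px(SUBJECT_M, FOCAL_PX, baseline)
    err = depth_error_m(SUBJECT_M, FOCAL_PX, baseline, MATCH_ERR_PX)
    print(f"{label}: disparity {d:.2f} px, depth error about {err:.3f} m")
```

Under these assumptions, shrinking the baseline by a factor of ten inflates the depth error by the same factor, which is why recovering usable depth from jitter alone calls for smarter algorithms.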

It's also worth noting that SfM on mobile has been tried in the past, by none other than Stanford professor and imaging wizard Marc Levoy with SynthCam, although SynthCam requires a very manual process during capture.

New Implementations

Other computational photography apps have also been published recently, particularly on iOS. Notable in this area is Seene, an app designed to create "3D" photos. Seene appears to use view interpolation (instead of more complex mesh or depth computation), since photos are processed within a few milliseconds. Both Google's app and Seene guide the user through the scanning process. There's a slight learning curve in understanding the failure modes during capture: neither handles too much motion or textureless scenes.

All said, these algorithms help bring new types of functionality to existing devices without additional hardware. I view the genesis of 'software-defined' computational photography applications as a function of the growing power and efficiency of smartphone processors. It's easy to see how a sluggish experience could kill the enjoyment of these apps. Specialized camera hardware side-steps issues involving motion, performance, and data quality (in reference to the cameras I work with at Intel, or Google's Tango phone), but there's still a lot to enjoy about novel camera functionality without the extra hardware.

As computational photography APIs emerge as first-class citizens on platforms like Android (see the HAL v3 API), there are many more unrealized possibilities (e.g. multi-flash). After all, many of these techniques and algorithms have been kicking around the computer vision community for years. It's quite exciting to watch as photographic effects beyond Instagram filters make their way through social media.


Dimitri Diakopoulos