Google recently released a new camera app with a compelling feature called Lens Blur, which simulates the shallow depth of field usually seen in photos taken with a DSLR.
Traditionally, a shallow depth of field calls for a nice lens with a wide aperture. High-end smartphones like the iPhone 5s or Nexus 5 have been getting better in this area — for instance, it's possible to get a shallow-ish depth of field on macro shots — but the general rule of thumb is that thin devices are limited by sensor area and the height of the optic stack.
Computational photography describes a growing research field where light may be captured, modified, or post-processed to achieve new effects, either in real time or after capture. One active area of computer vision research deals with deconstructing scenes to understand depth. By applying some well-known algorithms from this domain to photography, a simulated depth of field (among other effects) becomes possible.
With depth data, a physically based simulation of the bokeh effect of a lens aperture is possible: each RGB pixel receives a variable blur according to its z-distance and a few other parameters (like the focal plane). This algorithm is generally well known, so the harder problem is generating depth from a monocular camera. Below is a sample image by a colleague showing the original image (without post-processed blur) and the extracted dense depth map.
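As a rough illustration of the idea (a minimal sketch, not Google's actual implementation), here's a NumPy function that applies a variable box blur whose radius grows with each pixel's distance from a chosen focal plane. A real renderer would use an aperture-shaped kernel and handle occlusion edges, but the principle is the same:

```python
import numpy as np

def lens_blur(image, depth, focal_plane, max_radius=5):
    """Apply a depth-dependent box blur: pixels whose depth is far
    from the focal plane receive a larger blur radius, while pixels
    on the focal plane are left sharp."""
    h, w, _ = image.shape
    out = np.empty_like(image, dtype=float)
    # Blur radius grows with |depth - focal_plane|, capped at max_radius.
    radius = np.clip(np.abs(depth - focal_plane) * max_radius, 0, max_radius)
    padded = np.pad(image.astype(float),
                    ((max_radius, max_radius), (max_radius, max_radius), (0, 0)),
                    mode='edge')
    for y in range(h):
        for x in range(w):
            r = int(radius[y, x])
            # Average over a (2r+1) x (2r+1) window centered on the pixel.
            win = padded[y + max_radius - r : y + max_radius + r + 1,
                         x + max_radius - r : x + max_radius + r + 1]
            out[y, x] = win.mean(axis=(0, 1))
    return out
```

The per-pixel Python loop is slow but keeps the physics readable; a production version would vectorize this, for example by blending between a few pre-blurred copies of the image.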
The HTC One M8, announced a few weeks ago, similarly introduces a lens blur feature. That implementation relies on extra hardware: the M8 has two cameras on the back of the phone to enable stereo vision, which makes the problem of solving for depth tractable. So what about the new Google camera application? It uses a technique known to computer vision researchers as structure-from-motion (SfM). For mobile photography, however, SfM places a burden on the user.
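In the dual-camera case, recovering depth reduces to triangulation: for a rectified pinhole stereo pair, depth is inversely proportional to disparity. A minimal sketch of that relationship (the numbers below are illustrative, not the M8's actual calibration):

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Triangulate depth for a rectified stereo pair under the
    pinhole model: Z = f * B / d, so depth shrinks as disparity grows."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# A feature shifted 10 px between views, with a 1000 px focal length
# and a 2 cm baseline, sits about 2 m from the camera:
z = depth_from_disparity(disparity_px=10, focal_length_px=1000, baseline_m=0.02)
```

The hard part in practice is producing the disparity itself, which requires matching corresponding pixels between the two views.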
The main difference between SfM and dual-camera capture is that SfM solves for 3D geometry using a multi-view stereo approach. This complicates the user experience: an explicit scan gesture is necessary while the app collects the set of photos that constitute the multiple views. Though SfM is a well-studied method, Google recently published a paper at CVPR 2014 showing how it can be leveraged to build 3D data from motion as insignificant as hand jitter (roughly 3 mm of movement on average). Naturally a scan gesture will yield better data, but this research illustrates how the UX impact might be mitigated with smarter algorithms. It's also worth noting that SfM on mobile has been tried before — by none other than imaging wizard Mark Levoy with his SynthCam app — although SynthCam requires a very manual capture process.
Around the same time as the release of the app, Google published an open specification for their depth data format, which is also shared with the Tango device. A few posts on Google+ by a product manager on one of Google's vision teams confirmed that photos taken with the Lens Blur feature embed the depth metadata in the original image. In the weeks following the app's release, several creative web applications appeared showcasing depth-enabled effects beyond blur — like 3D parallax or complete scenes rendered with a mesh. Below is a GIF made from Lens Blur data with the Depthy web app:
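Concretely, the depth map travels as a base64-encoded image inside the photo's XMP metadata. Here's a hedged sketch of pulling that payload out, assuming the `GDepth:Data` attribute name from the published format (real files may split the XMP across multiple APP1 segments, which a robust parser would reassemble first; this sketch assumes a contiguous payload):

```python
import base64
import re

def extract_depth_payload(image_bytes):
    """Pull the base64-encoded depth image out of a photo's XMP
    metadata. Returns the decoded bytes, or None if no depth data
    is found. Assumes the payload is contiguous in the file."""
    match = re.search(rb'GDepth:Data="([^"]+)"', image_bytes)
    if match is None:
        return None
    return base64.b64decode(match.group(1))

# Demo on a synthetic blob standing in for a real Lens Blur JPEG:
fake_jpeg = (b'\xff\xd8...<rdf:Description GDepth:Data="'
             + base64.b64encode(b'fake-depth-png') + b'"/>...')
```

The decoded bytes would themselves be an image (the spec allows formats like PNG), ready to hand to any image decoder.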
I think this kind of data transparency is critical to create compelling experiences and creative remixes. From the perspective of a data artist or creative coder, new tools to collect or generate data are some of the key enabling technologies. As another concrete example, take this point-cloud rendered city generated from Google's Street View API. Open data is great.
Other computational photography apps have been published recently, particularly on iOS. Notable in this area is Seene, an app designed to create "3D" photos. Seene seems to be using view interpolation (instead of more complex mesh or depth computation), since photos are processed within a few milliseconds. Both Google's app and Seene guide the user through the scanning process, and there's a slight learning curve in understanding the failure modes during capture: neither handles excessive motion or textureless scenes well.
All said, these algorithms help bring new types of functionality to existing devices without additional hardware. I view the genesis of 'software-defined' computational photography applications as a function of the growing power and efficiency of smartphone processors. It's easy to see how a sluggish experience could kill the enjoyment of these apps. Specialized camera hardware sidesteps issues involving motion, performance, and data quality (in reference to the cameras I work with at Intel, or Google's Tango phone), but there's still a lot to enjoy about novel camera functionality without the extra hardware.
As computational photography APIs emerge as first-class citizens on platforms like Android (see the HAL v3 API), there are many more unrealized possibilities (e.g. multi-flash). After all, many of these techniques and algorithms have been kicking around the computer vision community for years. It's quite exciting to watch photographic effects beyond Instagram filters make their way through social media.