MENLO PARK — In 2018, Facebook launched its immersive 3D Photos feature, which initially relied on the dual-lens "portrait mode" capabilities found only in high-end smartphones. With advancements in machine learning, however, the feature now works with standard 2D photos, whether captured on a single-lens smartphone or uploaded years ago from any device.
Using state-of-the-art AI techniques, Facebook developed a system that infers the 3D structure of any 2D image. This breakthrough allows users to convert everyday photos, old family pictures, and even selfies into 3D experiences, significantly expanding the feature’s availability. It can now be accessed by anyone with a compatible iPhone or Android device through the Facebook app.
The development of this technology required overcoming several challenges, such as training a convolutional neural network (CNN) on millions of pairs of 3D images and their accompanying depth maps, and ensuring the system could run efficiently on mobile devices. Facebook AI's architecture-optimization tools, including FBNet and ChamNet, were instrumental in reaching that performance target.
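The article does not specify the training objective, but supervised depth estimation is commonly trained by comparing the network's predicted depth map against the ground-truth map, restricted to pixels where ground truth exists (depth maps from real captures are often sparse). The sketch below shows one standard choice, a masked L1 depth loss; the function name and the use of L1 are illustrative assumptions, not Facebook's disclosed method.

```python
import numpy as np

def masked_depth_loss(pred, target, valid_mask):
    """L1 loss between predicted and ground-truth depth maps,
    averaged only over pixels marked valid in the mask.

    Note: this is a generic, commonly used objective for supervised
    depth estimation, not the specific loss used by Facebook AI.
    """
    diff = np.abs(pred - target) * valid_mask
    return diff.sum() / max(valid_mask.sum(), 1)

# Toy example: 4x4 depth maps, one pixel lacking ground truth.
pred = np.full((4, 4), 2.0)     # network output (meters)
target = np.full((4, 4), 1.5)   # ground-truth depth (meters)
mask = np.ones((4, 4))
mask[0, 0] = 0.0                # no ground truth at this pixel
loss = masked_depth_loss(pred, target, mask)
```

During training, a loss like this would be minimized over millions of image/depth-map pairs by backpropagation through the CNN.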
The system estimates the depth of each pixel in an image, enabling the creation of a 3D representation. Facebook employed mobile-optimized neural building blocks and an automated architecture search to develop a configuration that performs this task in under a second across various devices.
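Once a depth value is estimated for each pixel, turning the image into a 3D representation amounts to back-projecting each pixel into camera space. The article does not describe Facebook's exact geometry pipeline; the sketch below uses the standard pinhole camera model, where a pixel (u, v) with depth Z maps to X = (u − cx)·Z/fx, Y = (v − cy)·Z/fy. The function name and intrinsics are illustrative assumptions.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a per-pixel depth map into 3D camera-space points
    using the pinhole model (a standard technique, assumed here):
      X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth.

    Returns an (h, w, 3) array of XYZ coordinates.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

# Toy 2x2 depth map of constant depth 1, unit focal length,
# principal point at the image origin.
pts = depth_to_points(np.ones((2, 2)), fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

The resulting point set (typically meshed into triangles) is what the viewer displaces as the user tilts or scrolls, producing the parallax effect of a 3D photo.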
Looking forward, Facebook aims to enhance this AI technology further, enabling real-time 3D depth estimation for videos and exploring augmented reality applications. This work represents a significant step in 3D scene understanding, which could benefit a range of fields, including robotics and interactive technologies.