Apple Expands AI Capabilities with Open-Source Model for 3D Image Conversion

Published
December 18, 2025
Category
Technology
Word Count
236 words

Full Transcript

Apple has released an open-source model named SHARP that reconstructs a photorealistic 3D scene from a single 2D image in under a second. According to 9to5Mac, the accompanying paper, "Sharp Monocular View Synthesis in Less Than a Second," describes how SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene.

This process runs in under a second on a standard GPU via a single feedforward pass of a neural network. The 3D Gaussian representation SHARP produces supports real-time rendering, yielding high-resolution images for nearby views while keeping distances and scales consistent in real-world terms.
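To make the idea concrete, here is a minimal illustrative sketch, not Apple's actual SHARP architecture: a toy per-pixel linear "network" standing in for the real model, showing how a single feedforward pass can map every pixel of an image directly to the parameters of one 3D Gaussian (position, scale, rotation, opacity, color). The function name, parameter count, and weights are all hypothetical.

```python
import numpy as np

# Each 3D Gaussian is parameterized by: position (3), scale (3),
# rotation quaternion (4), opacity (1), and RGB color (3) = 14 values.
PARAMS_PER_GAUSSIAN = 14

def predict_gaussians(image: np.ndarray, weights: np.ndarray,
                      bias: np.ndarray) -> np.ndarray:
    """Toy stand-in for SHARP: image (H, W, 3) -> (H*W, 14) Gaussian params."""
    h, w, c = image.shape
    features = image.reshape(h * w, c)   # one feature vector per pixel
    return features @ weights + bias     # single feedforward pass, no iteration

rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))              # tiny stand-in for a photo
W_mat = rng.standard_normal((3, PARAMS_PER_GAUSSIAN)) * 0.1
b = np.zeros(PARAMS_PER_GAUSSIAN)

gaussians = predict_gaussians(img, W_mat, b)
print(gaussians.shape)                   # one Gaussian per pixel: (16, 14)
```

The key property this sketch shares with the approach described in the paper is that the scene representation comes out of one forward pass rather than a per-scene optimization loop, which is what makes sub-second reconstruction possible.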

Experimental results demonstrate that SHARP achieves robust zero-shot generalization across datasets, setting new state-of-the-art benchmarks by reducing LPIPS by 25 to 34 percent and DISTS by 21 to 43 percent compared to the best prior models.

The model predicts a full 3D scene representation from a single image, allowing plausible rendering from nearby viewpoints. It does not synthesize entirely unseen parts of the scene, a restriction that keeps the model efficient.

Apple trained SHARP on extensive synthetic and real-world data to recognize common patterns of depth and geometry. The model is available on GitHub, and users have begun to share their own results from testing it, highlighting its impressive capabilities.

The project page and the accompanying paper are also linked from the GitHub repository, giving anyone the opportunity to experiment with the technology.
