Nvidia’s New AI Model Can Convert Still Images to 3D Graphics

Nvidia's technology can help train robots and self-driving cars, and create 3D settings for games and animations with greater ease.

Nvidia has made another attempt to add depth to shallow graphics. After converting 2D images into 3D scenes, models, and videos, the company has turned its focus to editing. The GPU giant today unveiled a new AI method that transforms still photos into 3D objects that creators can modify with ease. Nvidia researchers have developed a new inverse rendering pipeline, Nvidia 3D MoMa, that allows users to reconstruct a series of still photos into a 3D computer model of an object or even a scene. The key benefit of this workflow, compared to more traditional photogrammetry methods, is its ability to output clean 3D models that can be imported and edited out of the box by 3D gaming and visual engines.

According to reports, while other photogrammetry programs can turn 2D images into 3D models, Nvidia's 3D MoMa technology takes it a step further by producing mesh, material, and lighting information for the subjects and outputting it in a format compatible with existing 3D graphics engines and modeling tools. And it all happens in a relatively short timeframe, with Nvidia saying 3D MoMa can generate triangle mesh models within an hour on a single Nvidia Tensor Core GPU.
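
For a sense of what that interoperability means in practice, here is a minimal sketch, assuming a hypothetical exported file and using the open-source trimesh Python library (a generic mesh tool, not anything Nvidia specifies), of inspecting and editing such a triangle mesh:

# A minimal sketch: inspecting and editing a reconstructed triangle mesh
# with the open-source trimesh library. The filenames are hypothetical;
# 3D MoMa's actual export path and format are not specified here.
import trimesh

mesh = trimesh.load_mesh("reconstructed_object.obj")  # hypothetical output file

print(f"vertices:  {len(mesh.vertices)}")  # (V, 3) float positions
print(f"triangles: {len(mesh.faces)}")     # (F, 3) vertex indices

# Because the output is an ordinary triangle mesh, standard edits apply
# directly, e.g. uniform scaling before re-export to a game or visual engine.
mesh.apply_scale(2.0)
mesh.export("reconstructed_object_scaled.obj")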

David Luebke, Nvidia's VP of graphics research, described the technique to India Today as "a holy grail unifying computer vision and computer graphics."

"By formulating every piece of the inverse rendering problem as a GPU-accelerated differentiable component, the NVIDIA 3D MoMa rendering pipeline uses the machinery of modern AI and the raw computational horsepower of NVIDIA GPUs to quickly produce 3D objects that creators can import, edit, and extend without limitation in existing tools," said Lubeke.

Nvidia also says its Instant NeRF technology is "one of the first models of its kind to combine ultra-fast neural network training and rapid rendering." As mentioned in its blog, Instant NeRF can learn a high-resolution 3D scene in seconds and "can render images of that scene in a few milliseconds." That is touted as "more than 1,000x speedups" over regular NeRF processes seen to date.

What Is a NeRF?

According to Nvidia, NeRFs use neural networks to represent and render realistic 3D scenes based on an input collection of 2D images. Collecting data to feed a NeRF is a bit like being a red carpet photographer trying to capture a celebrity's outfit from every angle — the neural network requires a few dozen images taken from multiple positions around the scene, as well as the camera position of each of those shots.
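
To make the idea concrete, here is a minimal PyTorch sketch of the mapping at the heart of a NeRF: a small network that takes a 3D position and a viewing direction and predicts a color and a volume density. The layer sizes and frequency encoding below are illustrative choices, not Nvidia's exact architecture.

# A minimal sketch of the core NeRF mapping: (3D point, view direction)
# -> (RGB color, volume density). Layer sizes and the frequency encoding
# are illustrative choices, not Nvidia's exact architecture.
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs=6):
    # Lift raw coordinates to sines/cosines at increasing frequencies,
    # which helps a small MLP represent fine spatial detail.
    feats = [x]
    for i in range(num_freqs):
        feats.append(torch.sin((2.0 ** i) * x))
        feats.append(torch.cos((2.0 ** i) * x))
    return torch.cat(feats, dim=-1)

class TinyNeRF(nn.Module):
    def __init__(self, num_freqs=6, hidden=128):
        super().__init__()
        self.num_freqs = num_freqs
        in_dim = 2 * (3 + 2 * num_freqs * 3)  # encoded position + encoded direction
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 color channels + 1 density
        )

    def forward(self, xyz, view_dir):
        h = torch.cat([positional_encoding(xyz, self.num_freqs),
                       positional_encoding(view_dir, self.num_freqs)], dim=-1)
        out = self.mlp(h)
        rgb = torch.sigmoid(out[..., :3])  # colors in [0, 1]
        sigma = torch.relu(out[..., 3])    # non-negative density
        return rgb, sigma

# Query the learned field at 1,024 random points and directions.
model = TinyNeRF()
rgb, sigma = model(torch.rand(1024, 3), torch.rand(1024, 3))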

In a scene that includes people or other moving elements, the quicker these shots are captured, the better. If there's too much motion during the 2D image capture process, the AI-generated 3D scene will be blurry. From there, a NeRF essentially fills in the blanks, training a small neural network to reconstruct the scene by predicting the color of light radiating in any direction, from any point in 3D space. The technique can even work around occlusions — when objects seen in some images are blocked by obstructions such as pillars in other images.
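
The "filling in the blanks" step is classical volume rendering: the colors and densities the network predicts at samples along each camera ray are composited into a single pixel. Here is a minimal NumPy sketch of that compositing, using made-up sample values:

# A minimal sketch of the volume-rendering step a NeRF uses to turn
# per-point colors and densities along a camera ray into one pixel color.
# The sample values here are random, purely for illustration.
import numpy as np

def composite_ray(colors, densities, deltas):
    # colors:    (N, 3) RGB predicted at N samples along the ray
    # densities: (N,)   volume density at each sample
    # deltas:    (N,)   distance between consecutive samples
    alpha = 1.0 - np.exp(-densities * deltas)  # opacity of each segment
    # Transmittance: fraction of light surviving to reach each sample.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    return (weights[:, None] * colors).sum(axis=0)  # final pixel RGB

rng = np.random.default_rng(0)
pixel = composite_ray(rng.random((64, 3)), rng.random(64), np.full(64, 0.05))
print(pixel)  # one RGB value in [0, 1]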

The technology could be used to train robots and self-driving cars to understand the size and shape of real-world objects by capturing 2D images or video footage of them. It could also be used in architecture and entertainment to rapidly generate digital representations of real environments that creators can modify and build on. Beyond NeRFs, Nvidia researchers are exploring how this input encoding technique might be used to accelerate multiple AI challenges including reinforcement learning, language translation, and general-purpose deep learning algorithms.
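
The input encoding in question, described in Nvidia's Instant NGP research that underpins Instant NeRF, is a multiresolution hash encoding: grid coordinates index small learned feature tables through a spatial hash. Below is a simplified NumPy sketch of one level of that lookup; the hash primes follow the paper, while the table, feature, and batch sizes are illustrative.

# A simplified sketch of the spatial hash behind a multiresolution hash
# encoding: integer grid coordinates index a small learned feature table.
# The primes follow the Instant NGP paper; sizes here are illustrative.
import numpy as np

PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_grid_lookup(points, table, resolution):
    # points: (N, 3) coordinates in [0, 1); table: (T, F) learned features.
    cells = (points * resolution).astype(np.uint64)  # grid cell per point
    # XOR the prime-scaled coordinates, then wrap into the table.
    idx = np.bitwise_xor.reduce(cells * PRIMES, axis=1) % table.shape[0]
    return table[idx]  # (N, F) feature vectors fed to a tiny MLP

rng = np.random.default_rng(0)
table = rng.normal(size=(2**14, 2)).astype(np.float32)  # one level's table
feats = hash_grid_lookup(rng.random((1024, 3)), table, resolution=64)
print(feats.shape)  # (1024, 2)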
