NVIDIA is looking to take the sting out of creating virtual 3D worlds with a new artificial intelligence model. GET3D can generate characters, buildings, vehicles and other types of 3D objects, NVIDIA says. The model should be quick, too: the company notes that GET3D can generate around 20 objects per second using a single GPU.
Researchers trained the model using synthetic 2D images of 3D shapes taken from multiple angles. NVIDIA says it took just two days to train GET3D on around 1 million images using A100 Tensor Core GPUs.
The model can create objects with “high-fidelity textures and complex geometric details,” NVIDIA’s Isha Salian wrote in a blog post. The shapes GET3D makes “are in the form of a triangle mesh, like a papier-mâché model, covered with a textured material,” Salian added.
Users should be able to swiftly import the objects into game engines, 3D modelers and film renderers for editing, as GET3D will create them in compatible formats. That means it could be much easier for developers to create dense virtual worlds for games and the metaverse. NVIDIA cited robotics and architecture as other use cases.
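To make the "triangle mesh covered with a textured material" idea a bit more concrete, here is a minimal Python sketch of what such an object can look like as data. It is not GET3D's code or output format. The `TexturedMesh` class and `make_quad` helper are hypothetical names invented for illustration, and Wavefront OBJ is used only as an assumed example of a widely supported format, since NVIDIA's post does not say which formats GET3D writes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TexturedMesh:
    """Hypothetical container: a triangle mesh plus texture (UV) coordinates."""
    vertices: List[Tuple[float, float, float]]  # 3D corner positions
    uvs: List[Tuple[float, float]]              # where each corner lands on the texture image
    faces: List[Tuple[int, int, int]]           # triangles as 0-based vertex indices

    def to_obj(self) -> str:
        """Serialize to Wavefront OBJ text (assumed here as an example format)."""
        lines = [f"v {x} {y} {z}" for x, y, z in self.vertices]
        lines += [f"vt {u} {v}" for u, v in self.uvs]
        # OBJ indices are 1-based; each corner is written as "vertex/uv".
        lines += [
            "f " + " ".join(f"{i + 1}/{i + 1}" for i in face)
            for face in self.faces
        ]
        return "\n".join(lines) + "\n"


def make_quad() -> TexturedMesh:
    """Build a unit square out of two triangles, with the texture stretched across it."""
    return TexturedMesh(
        vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)],
        uvs=[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
        faces=[(0, 1, 2), (0, 2, 3)],
    )


if __name__ == "__main__":
    print(make_quad().to_obj())
```

A generated car or animal would have the same three ingredients, just with thousands of vertices and triangles instead of four and two, plus an image file for the texture itself.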
The company said that, based on a training dataset of car images, GET3D was able to generate sedans, trucks, race cars and vans. It can also churn out foxes, rhinos, horses and bears after being trained on animal images. As you might expect, NVIDIA notes that the larger and more diverse the training set that’s fed into GET3D, “the more varied and detailed the output.”
With the help of another NVIDIA AI tool, StyleGAN-NADA, it’s possible to apply various styles to an object with text-based prompts. You might apply a burned-out look to a car, convert a model of a home into a haunted house or, as a video showing off the tech suggests, apply tiger stripes to any animal.
The NVIDIA Research team that created GET3D believes future versions could be trained on real-world images instead of synthetic data. It may also be possible to train the model on various types of 3D shapes at once, rather than having to focus on one object category at a given time.
