Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting

While text-to-3D and image-to-3D generation have received considerable attention, an important but under-explored task between them is controllable text-to-3D generation, which is the focus of this work. To address it, 1) we introduce Multi-view ControlNet (MVControl), a novel neural network architecture that enhances existing pre-trained multi-view diffusion models by integrating additional input conditions such as edge, depth, normal, and scribble maps. Our innovation lies in a conditioning module that controls the base diffusion model using both local and global embeddings, computed from the input condition images and camera poses. Once trained, MVControl can provide 3D diffusion guidance for optimization-based 3D generation. 2) We propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and the score distillation algorithm. Building upon the MVControl architecture, we employ a hybrid diffusion guidance method to direct the optimization process. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations. We also pioneer the use of SuGaR, a hybrid representation that binds Gaussians to mesh triangle faces. This alleviates the poor geometry of raw 3D Gaussians and enables direct sculpting of fine-grained geometry on the mesh. Extensive experiments demonstrate that our method generalizes robustly and enables controllable generation of high-quality 3D content.
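As a rough illustration of the conditioning module described in the abstract, the PyTorch sketch below combines a local (spatial) embedding computed from the condition image with a global embedding computed from pooled image features and the camera pose. All module names, layer sizes, and the fusion scheme are illustrative assumptions, not the released MVControl implementation.

```python
# Sketch of a ControlNet-style conditioning module that produces a local spatial
# hint plus a global embedding from a condition image and camera pose.
# Assumption: shapes, layer sizes, and how the outputs are injected into the frozen
# base diffusion model are placeholders, not the actual MVControl code.
import torch
import torch.nn as nn

class ConditioningModule(nn.Module):
    def __init__(self, cond_channels=3, feat_channels=320, pose_dim=16, global_dim=768):
        super().__init__()
        # Local branch: spatial features from the condition image
        # (edge/depth/normal/scribble map), added to U-Net features as a residual hint.
        self.local_encoder = nn.Sequential(
            nn.Conv2d(cond_channels, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, feat_channels, 3, padding=1),
        )
        # Global branch: a single vector from pooled image features and the camera pose,
        # injected alongside the text embedding (e.g. as extra cross-attention tokens).
        self.global_encoder = nn.Sequential(
            nn.Linear(feat_channels + pose_dim, global_dim), nn.SiLU(),
            nn.Linear(global_dim, global_dim),
        )

    def forward(self, cond_image, camera_pose):
        # cond_image: (B, 3, H, W); camera_pose: (B, pose_dim) flattened camera parameters
        local_feat = self.local_encoder(cond_image)          # (B, C, H, W) spatial hint
        pooled = local_feat.mean(dim=(2, 3))                 # (B, C) global image summary
        global_emb = self.global_encoder(torch.cat([pooled, camera_pose], dim=-1))
        return local_feat, global_emb                        # consumed by the frozen base model
```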
Text-To-3D Generation Methods
Summary
Text-to-3D generation methods are cutting-edge technologies that transform text descriptions into three-dimensional models, making 3D design more accessible and automating complex modeling tasks. These methods use advanced AI techniques like diffusion models to create detailed, customizable, and often 3D-printable outputs from simple text prompts.
- Explore advanced models: Utilize new architectures like Multi-view ControlNet or MVDream, which improve 3D generation by addressing challenges such as multi-view geometric consistency and by offering personalized outputs.
- Simplify 3D creation: Save time and effort by employing AI tools such as DeepShape3D that generate solid, printable 3D shapes directly from text descriptions, eliminating the need for intricate manual design work.
- Optimize for printing: Look for generation methods that automatically refine geometry, ensuring the 3D model is free of errors and ready for real-world applications like 3D printing.
🐶 Diffusion into 3D models has been an active area of research for some time. Luma AI is doing it with "Imagine 3D," Common Sense Machines is doing it on their Discord [https://lnkd.in/eq6asrZ9], etc. Getting consistent geometry and a texture that looks aesthetically pleasing is the trick. Much of the work at this point has produced generative models that are too contrasty to be viable for much beyond a proof of concept. This appears to be one step closer to mass-adoption viability.

abs: We propose MVDream, a multi-view diffusion model that is able to generate geometrically consistent multi-view images from a given text prompt. By leveraging image diffusion models pre-trained on large-scale web datasets and a multi-view dataset rendered from 3D assets, the resulting multi-view diffusion model can achieve both the generalizability of 2D diffusion and the consistency of 3D data. Such a model can thus be applied as a multi-view prior for 3D generation via Score Distillation Sampling, where it greatly improves the stability of existing 2D-lifting methods by solving the 3D consistency problem. Finally, we show that the multi-view diffusion model can also be fine-tuned in a few-shot setting for personalized 3D generation (a DreamBooth3D application), where consistency is maintained after learning the subject identity.

Project: https://lnkd.in/e4Ub9hmQ
Gallery: https://lnkd.in/eepWHPb2
arXiv: https://lnkd.in/ebpt8CF9
GitHub: https://lnkd.in/ewU5s82w **the GitHub page is a placeholder for now; could change**

For more like this ⤵︎ 👉 Follow Orbis Tabula • GenAI × VP × Consulting
#generativeai #diffusion #3dmodeling
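Since the post leans on Score Distillation Sampling (SDS) as the 2D-lifting step, here is a minimal sketch of an SDS update in PyTorch. The `diffusion_model` callable and its signature are assumptions standing in for a frozen multi-view prior such as MVDream; the gradient form `w(t) * (eps_pred - eps)` follows the standard DreamFusion-style formulation rather than any particular released codebase.

```python
# Minimal SDS sketch: the rendered image of the 3D representation is pushed toward
# regions the (frozen) diffusion prior considers likely, without backpropagating
# through the prior itself. `diffusion_model` is a hypothetical noise predictor.
import torch

def sds_loss(diffusion_model, rendered_images, text_embeddings, alphas_cumprod):
    """rendered_images: (B, 3, H, W) differentiable renders of the 3D representation."""
    device = rendered_images.device
    alphas_cumprod = alphas_cumprod.to(device)                        # 1D schedule tensor
    b = rendered_images.shape[0]
    t = torch.randint(20, 980, (b,), device=device)                   # random timesteps
    noise = torch.randn_like(rendered_images)
    a_t = alphas_cumprod[t].view(b, 1, 1, 1)
    noisy = a_t.sqrt() * rendered_images + (1 - a_t).sqrt() * noise   # forward diffusion

    with torch.no_grad():                                             # prior stays frozen
        eps_pred = diffusion_model(noisy, t, text_embeddings)         # predicted noise

    w = 1 - a_t                                                       # common weighting choice
    grad = w * (eps_pred - noise)                                     # SDS gradient
    # Surrogate loss whose gradient w.r.t. rendered_images equals `grad`.
    return (grad.detach() * rendered_images).sum() / b
```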
My latest AI research project, DeepShape3D, focuses on generating high-quality, ready-to-print 3D shapes directly from text or image prompts. I love 3D printing at home, but designing objects with tools like Fusion 360, Blender, or ZBrush is very time-consuming, and failed prints after hours of waiting are extremely discouraging. So I took inspiration from stable diffusion models that produce amazing 2D images and set out to train a new model to generate similarly impressive 3D-printable shapes. With DeepShape3D, you just describe the shape you want to 3D print and the model produces a clean, solid, printable shape ready for the real world.

Here's a simplified overview of how DeepShape3D works. The model first interprets your prompt, identifying the intended structure, features, and design details of your desired shape. Next, it applies both diffusion- and transformer-based architectures to construct the shape in 3D space by calculating surfaces, contours, and details. Then it automatically refines the shape, correcting geometry issues like holes, overhangs, and non-manifold edges so that the shape is ideal for 3D printing. Finally, you receive a downloadable .stl file that is compatible with most 3D printers.

Creating an AI model like this is very resource-intensive (and therefore very expensive), so it required a lot of detailed planning broken into clear phases.

Phase 1: Core Model Research (COMPLETED)
* Initial experiments with diffusion-based shape-generation models on a laptop
* Test small-scale inference using personal hardware and Google Colab GPUs

Phase 2: Core Shape Generation (COMPLETED)
* Develop an automated batch pipeline to generate 5,000 shapes from diverse prompts
* Streamline cloud GPU provisioning, data handling, and checkpoint management

Phase 3: 3D Shape Post-processing and Export (IN PROGRESS)
* Convert raw 3D mesh data into 3D-printable files
* Create tools for geometry repair, simplification, and validation

Phase 4: Model Fine-Tuning & Style Control
* Fine-tune the base AI model on curated prompt/shape datasets
* Enable anyone to specify output shape style, complexity, and resolution

I'm extremely grateful to the Google Cloud for Startups team for their support. Both Damian H. Moncada and Solomon Sam have provided valuable access to powerful A100 GPU compute along with helpful technical insights and guidance. Without them, this project wouldn't have been possible. I highly encourage AI startups to apply to join the program here: https://lnkd.in/edru7Zwr

More details about the DeepShape3D AI model and how to test it are on the way!

#AI #3DPrinting #Makerspace #GenerativeDesign #CloudCreativity #PromptToPrint #DeepShape3D
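The repair-and-export step the post describes (closing holes, fixing orientation, producing a watertight .stl) can be illustrated with the open-source `trimesh` library. This is a hedged sketch of that kind of post-processing under generic assumptions, not the DeepShape3D code; the file paths are placeholders.

```python
# Sketch of a generic mesh-repair / STL-export pass using trimesh.
# Assumption: the generator has already produced a raw mesh file at `in_path`;
# repair steps and thresholds here are illustrative, not the project's pipeline.
import trimesh

def repair_and_export(in_path: str, out_path: str = "print_ready.stl") -> bool:
    mesh = trimesh.load(in_path, force="mesh")   # load whatever the generator produced
    mesh.process(validate=True)                  # merge duplicate vertices, drop degenerate faces
    trimesh.repair.fix_winding(mesh)             # make face winding consistent
    trimesh.repair.fix_inversion(mesh)           # flip if normals point inward
    trimesh.repair.fill_holes(mesh)              # close small gaps toward a watertight solid
    if not mesh.is_watertight:
        print("warning: mesh still has open boundaries; slicers may reject it")
    mesh.export(out_path)                        # STL readable by most slicers/printers
    return mesh.is_watertight
```

Non-manifold edges and unsupported overhangs generally need heavier-weight remeshing or slicer-side supports; a pass like the one above only covers the basic watertightness and orientation checks mentioned in the post.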