🎨✨ Niantic, Inc. Research just dropped a game-changer in 3D splat stylization: Morpheus, a new method for text-driven stylization of 3D Gaussian Splats! Creating immersive, stylized 3D worlds from real-world scenes has always been exciting—but convincingly changing geometry and appearance simultaneously? That's been the tough part. Until now.
Morpheus highlights:
✅ Independent shape & color control: adjust geometry and appearance separately—unlocking limitless creativity!
✅ Depth-guided cross-attention & warp ControlNet: keeps your stylizations consistent across views.
✅ Autoregressive RGBD diffusion model: stylizes each frame based on previously edited views for seamless immersion.
✅ Outperforms state-of-the-art methods in both aesthetics and prompt adherence, validated by extensive user studies.
Imagine turning your neighborhood into a neon cyberpunk cityscape 🌃, a cozy winter lodge ❄️, or even a Minecraft village 🧱—all from just a simple text prompt! This isn't just about stunning visuals—it's about reshaping geometry and appearance independently, opening endless possibilities for immersive experiences.
📝 Paper: https://lnkd.in/gGtbWQr3
👉 Project: https://lnkd.in/gWWSPNAe
🎥 Video: https://lnkd.in/g_KMEMe2
#AI #MachineLearning #ComputerVision #3D #Innovation #Metaverse #GaussianSplats #GenerativeAI
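For readers who want a concrete picture of the autoregressive idea, here is a minimal Python sketch of such a view-by-view stylization loop. It is an illustration only, not Niantic's code: every function body is a placeholder, and the names (`render_rgbd`, `warp_to_view`, `rgbd_diffusion`) are hypothetical.

```python
# Sketch of an autoregressive RGBD stylization loop: each new view is stylized
# conditioned on the previously edited view, warped into the current camera via depth.
import numpy as np

H, W = 64, 64  # tiny resolution for illustration

def render_rgbd(splats, camera):
    """Placeholder: render RGB and depth of the splat scene from `camera`."""
    return np.zeros((H, W, 3)), np.ones((H, W))

def warp_to_view(rgb, depth, src_cam, dst_cam):
    """Placeholder: reproject an already-stylized view into the new camera using its depth."""
    return rgb

def rgbd_diffusion(rgb, depth, warped_prev, prompt):
    """Placeholder for an RGBD diffusion model with depth-guided cross-attention
    and a warp-conditioned ControlNet branch."""
    return rgb, depth

def stylize_scene(splats, cameras, prompt):
    edited = []  # (camera, stylized_rgb, stylized_depth) per view
    for cam in cameras:
        rgb, depth = render_rgbd(splats, cam)
        warped_prev = None
        if edited:
            prev_cam, prev_rgb, prev_depth = edited[-1]
            warped_prev = warp_to_view(prev_rgb, prev_depth, prev_cam, cam)
        s_rgb, s_depth = rgbd_diffusion(rgb, depth, warped_prev, prompt)
        edited.append((cam, s_rgb, s_depth))
    return edited  # the edited RGBD views would then be distilled back into the splats

stylize_scene(splats=None, cameras=["cam0", "cam1"], prompt="neon cyberpunk city")
```

The point of the structure is simply that each view sees the previous edits, which is what keeps the stylization consistent as the camera moves.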
Innovations in 3D Scene Generation
Explore top LinkedIn content from expert professionals.
Summary
Innovations in 3D scene generation are revolutionizing how we create, transform, and interact with digital environments. By combining advanced AI methods such as neural networks, diffusion models, and text-to-3D generation with representations like Gaussian splatting, these breakthroughs enable dynamic, highly customizable, and immersive 3D experiences.
- Explore creative control: Use techniques that separate geometry and appearance, enabling you to transform 3D scenes into stylized environments with simple text prompts.
- Bridge 2D and 3D: Leverage methods that convert 2D descriptions or sketches into detailed 3D representations for realistic object placement or scene enhancement.
- Rethink traditional models: Challenge conventional 3D approaches by embracing methods that allow dynamic, free-flowing representations for more natural and lifelike motion.
-
Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting
While text-to-3D and image-to-3D generation have received considerable attention, an important but under-explored field between them is controllable text-to-3D generation, which is the focus of this work. To address this task, 1) we introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in a conditioning module that controls the base diffusion model using both local and global embeddings, computed from the input condition images and camera poses. Once trained, MVControl can provide 3D diffusion guidance for optimization-based 3D generation. 2) We propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and score distillation. Building on the MVControl architecture, we employ a hybrid diffusion guidance method to direct the optimization process. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations. We also pioneer the use of SuGaR, a hybrid representation that binds Gaussians to mesh triangle faces. This approach alleviates the issue of poor geometry in 3D Gaussians and enables direct sculpting of fine-grained geometry on the mesh. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content.
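As a rough illustration of the conditioning idea (not the authors' implementation), the sketch below shows a module that extracts local spatial features from a condition image (edge/depth/normal/scribble map) and a global embedding that fuses those features with the camera pose; both would be injected into a frozen multi-view diffusion backbone. All layer sizes and names here are assumptions.

```python
# Toy sketch of a ControlNet-style conditioning module with local + global branches.
import torch
import torch.nn as nn

class ConditioningModule(nn.Module):
    def __init__(self, cond_channels=3, pose_dim=12, embed_dim=256):
        super().__init__()
        # Local branch: convolutional features spatially aligned with the latent grid.
        self.local = nn.Sequential(
            nn.Conv2d(cond_channels, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, embed_dim, 3, stride=2, padding=1),
        )
        # Global branch: pools the condition features and fuses them with the camera pose.
        self.global_mlp = nn.Sequential(
            nn.Linear(embed_dim + pose_dim, embed_dim), nn.SiLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, cond_img, camera_pose):
        local_feat = self.local(cond_img)              # (B, C, H/4, W/4) spatial control features
        pooled = local_feat.mean(dim=(2, 3))           # (B, C) global summary
        global_emb = self.global_mlp(torch.cat([pooled, camera_pose], dim=-1))
        return local_feat, global_emb                  # both injected into the frozen diffusion UNet

module = ConditioningModule()
local, glob = module(torch.randn(4, 3, 256, 256), torch.randn(4, 12))
print(local.shape, glob.shape)  # torch.Size([4, 256, 64, 64]) torch.Size([4, 256])
```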
-
What if we stopped forcing 3D objects to have a "home base" in computer vision? 🎯 Researchers just achieved a 4.1 dB improvement in dynamic scene reconstruction by letting Gaussian primitives roam free in space and time. Traditional methods anchor 3D Gaussians in a canonical space, then deform them to match observations, like trying to model a dancer by stretching a statue. FreeTimeGS breaks this paradigm: Gaussians can appear anywhere, anytime, with their own motion functions. Think of it as the difference between animating a rigid skeleton versus capturing fireflies in motion. The results are striking:
- 29.38 dB PSNR on dynamic regions (vs. 25.32 dB for the previous SOTA)
- Real-time rendering at 450 FPS on a single RTX 4090
- Handles complex motions like dancing and cycling that break other methods
This matters beyond academic metrics. Real-time dynamic scene reconstruction enables everything from better AR/VR experiences to more natural video conferencing. Sometimes constraints we think are necessary (like canonical representations) are actually holding us back. One limitation: the method still requires dense multi-view capture. But as we move toward a world of ubiquitous cameras, this approach could reshape how we capture and recreate reality. What rigid assumptions in your field might be worth questioning? Full paper in comments. #ComputerVision #3DReconstruction #AIResearch #MachineLearning #DeepLearning
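To make the "no canonical space" idea concrete, here is a toy Python sketch of a space-time Gaussian primitive with its own motion function and a temporal opacity window. This reflects my reading of the post, not the FreeTimeGS codebase; the linear motion model and Gaussian temporal window are simplifying assumptions.

```python
# Each primitive lives at its own (position, time) with a per-primitive motion function,
# and its opacity is gated by a temporal window, so it can "appear anywhere, anytime"
# instead of being deformed from a single canonical frame.
import numpy as np

class FreeTimeGaussian:
    def __init__(self, mu, velocity, t_center, t_scale, opacity):
        self.mu = np.asarray(mu, dtype=float)          # 3D center at the primitive's own time
        self.velocity = np.asarray(velocity, float)    # simple linear motion function
        self.t_center = t_center                       # time where the primitive is "alive"
        self.t_scale = t_scale                         # temporal extent
        self.opacity = opacity

    def position(self, t):
        # Per-primitive motion: linear in time around its own temporal center.
        return self.mu + self.velocity * (t - self.t_center)

    def temporal_opacity(self, t):
        # Gaussian falloff in time gates the spatial opacity used at render time.
        w = np.exp(-0.5 * ((t - self.t_center) / self.t_scale) ** 2)
        return self.opacity * w

g = FreeTimeGaussian(mu=[0, 0, 1], velocity=[0.1, 0, 0], t_center=0.5, t_scale=0.1, opacity=0.9)
print(g.position(0.6), g.temporal_opacity(0.6))
```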
-
🚀 Introducing InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes.
📚 InseRF is a groundbreaking method for generative object insertion in NeRF reconstructions of 3D scenes. By leveraging user-provided textual descriptions and a 2D bounding box in a reference viewpoint, InseRF brings your imagination to life in 3D!
💡 The approach grounds 3D object insertion in a 2D edit of a reference view, then lifts it to 3D through a single-view object reconstruction method. The result? Newly generated objects seamlessly integrated into the scene, guided by monocular depth estimation priors.
🔗 Dive into the details with the paper: [InseRF Paper](https://lnkd.in/g2x665sr)
🌐 Explore further at: [InseRF Project](https://lnkd.in/gYxCFSkG)
#nerf #generativeaitools #InseRF #3DScenes #GenerativeAI #NeuralNetworks #ResearchInnovation Google
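The pipeline described above can be summarized in a few steps; the sketch below lays them out in Python with placeholder functions. None of these names come from the InseRF code; they are hypothetical stand-ins for the 2D edit, single-view reconstruction, and depth-guided placement stages.

```python
# High-level sketch of text-driven object insertion: edit one reference view in 2D,
# lift the edited object to 3D, then place it using a monocular depth prior.
import numpy as np

def edit_reference_view(image, prompt, bbox_2d):
    """Placeholder: generate the new object inside the 2D bounding box of the reference view."""
    return image

def lift_to_3d(edited_view):
    """Placeholder: single-view object reconstruction (image-to-3D)."""
    return {"object_3d": None}

def place_with_mono_depth(obj3d, depth_map, bbox_2d, camera):
    """Placeholder: use monocular depth inside the bbox to pick scale and distance."""
    z = float(np.median(depth_map))    # crude depth prior for placement
    return {"object": obj3d, "translation": [0.0, 0.0, z]}

def insert_object(scene_nerf, ref_image, ref_camera, prompt, bbox_2d):
    edited = edit_reference_view(ref_image, prompt, bbox_2d)
    obj3d = lift_to_3d(edited)
    depth = np.ones((64, 64))          # stand-in for estimated monocular depth
    placed = place_with_mono_depth(obj3d, depth, bbox_2d, ref_camera)
    return placed                      # finally fused and refined with the scene NeRF

insert_object(None, np.zeros((64, 64, 3)), None, "a flower pot", bbox_2d=(10, 10, 30, 30))
```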
-
Researchers at the Universitat Politècnica de Catalunya trained neural networks on thousands of archival sketches and photographs of Gaudí's buildings. The models can now "guess" how an unfinished curve might wrap into a column or how a roofline could rise if the architect had kept sketching. In blind tests, some architects rated these AI-completed designs as more imaginative than Gaudí's published drawings. Apple's recent GAUDI research project also converts a single line drawing or a short text prompt into an immersive 3D scene you can walk through in VR. The system combines two AI techniques:
- Diffusion models to fill in missing visual detail, and
- NeRFs (Neural Radiance Fields) to map those details onto a volumetric space.
The result is a room or an entire facade that you can orbit around, inspect from new angles, and even relight. Digital artist Sofia Crespo recently projected AI-generated marine patterns onto the undulating facade of Casa Batlló in Barcelona. Interactive projections and VR tours make a 19th-century visionary feel contemporary to new audiences. Gaudí sketched in silence; today, algorithms can let those sketches speak - and even improvise. If you could breathe new life into any historical artwork or structure with AI, what story would you retell? #innovation #technology #future #management #startups
-
GenAI + Gaussian splats in game development. Could this become a new form of procedural level generation? With Gaussian splats, a single picture can be turned into a full explorable world. While generating such 3D scenes is not new, doing it close to real time is so far unprecedented. #WonderWorld is an interesting project that is slowly getting closer to real-time generation. Based on their paper, it takes 10 seconds to generate a complete scene on an A6000 GPU. Although this is not ultra-fast, it's a considerable step toward optimization. According to the paper, it's a mix of fast Gaussian splats, depth maps, and outpainting. The process involves taking an initial image and extracting a depth map from it. Then, similar to standard outpainting with ControlNet, a world is generated around the initial image. The difference is that the depth map is simplified to a limited number of depth levels for optimization. The splat models are then trained on the generated images, and we can explore the newly created scene. Adding PhysDreamer (https://lnkd.in/gm5f3TGS), which allows physical interaction with splats, will make it even more impressive. How far do you all think we are from 30-60-120 FPS? Project page (https://lnkd.in/gX7ZGAqE), which even has a demo for exploration. However, the scene rendering is done directly in the browser, which might take a while to load. Paper (https://lnkd.in/gwCggnjY) #GameDevelopment #3DScenes #GaussianSplatting #RealTimeRendering #AIGeneratedWorlds #PhysDreamer #NeuralRendering #TechInnovation #GamingFuture #AIInGames
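Here is a minimal sketch of that loop under the assumptions stated in the post (monocular depth estimation, depth quantized into a few levels, ControlNet-style outpainting). The function names are placeholders, not the WonderWorld API; only the depth-quantization step is fleshed out.

```python
# Estimate depth, bucket it into a few layers for fast layered splats,
# then outpaint beyond the image borders to grow the world.
import numpy as np

def estimate_depth(image):
    """Placeholder for a monocular depth estimator."""
    return np.random.rand(*image.shape[:2])

def quantize_depth(depth, n_layers=4):
    # "Limited number of depths": bucket continuous depth into a few planes.
    edges = np.quantile(depth, np.linspace(0, 1, n_layers + 1))
    layers = np.clip(np.digitize(depth, edges[1:-1]), 0, n_layers - 1)
    centers = [(edges[i] + edges[i + 1]) / 2 for i in range(n_layers)]
    return layers, centers

def outpaint(image, depth, direction):
    """Placeholder for depth-conditioned, ControlNet-style outpainting."""
    return image

image = np.zeros((128, 128, 3))
depth = estimate_depth(image)
layers, centers = quantize_depth(depth)
for direction in ["left", "right", "up", "down"]:
    image = outpaint(image, depth, direction)   # grow the scene view by view
print(layers.shape, centers)
```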
-
**Reimagining 3D Avatars: A Leap Beyond Traditional Models**
Imagine stepping into a virtual world where the characters you encounter are not just lifelike but dynamically expressive, replicating the intricate nuances of human expressions. This is not a distant dream; it's a reality we're crafting with the latest advancements in 3D avatar technology, today!
*Stay tuned for the beta release of our #EmotionAI #API, which will help you automate the creation of lifelike TTS with expressive voice synthesis, and embodied Digital Humans, thanks to #Microsoft.*
In the realm of virtual human applications, the quest to create high-fidelity, controllable 3D head #avatars from 2D videos is reaching an exciting frontier. Traditional methods like the 3D Morphable Model (3DMM) have laid a solid foundation, but they fall short in capturing the fine-scale, asymmetric facial expressions that bring virtual characters to life. This is where their groundbreaking approach comes into play.
Their innovation hinges on neural implicit fields, a technique that transcends the limitations of 3DMM by offering a more personalized representation of head avatars. This includes intricate details of facial parts like hair and the mouth interior, which are often overlooked. The real game-changer, however, is how they address the challenge of modeling faces with those elusive, fine-scale features and providing local control of facial parts, particularly for asymmetric expressions derived from monocular videos.
By leveraging part-based implicit shape models, they've developed a novel formulation that breaks down a global deformation field into multiple local ones. This allows for local, semantic, rig-like control, utilizing 3DMM-based parameters and facial landmarks. Key to their approach is the use of an attention mask mechanism and a local control loss, promoting the sparsity of each learned deformation field. The result? Sharper, more nuanced, and locally controllable non-linear deformations, particularly noticeable in the mouth interior and asymmetric facial expressions.
So, how does it work? They begin with a face tracker on an input video sequence, capturing key parameters of a linear 3DMM at each frame, including expression and pose codes, along with sparse 3D facial landmarks. An attention mask, pre-computed from the 3DMM, is applied to give local spatial support. Their model interprets dynamic deformations as translations of observed points to a canonical space, breaking the global deformation into localized fields. This process is refined using RGB information, geometric regularization, and their novel local control loss.
In essence, they're not just reconstructing 3D head avatars; together, we're breathing life into them, enabling a level of expression and realism previously unattainable in virtual human applications. As you dive into this cutting-edge field, remember, you're not just creating avatars; you're shaping the future of virtual interactions.
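As a conceptual sketch (my own simplification, not the authors' code), the part-based idea can be written as a mask-weighted sum of small local deformation fields, each conditioned on its facial part's 3DMM/landmark parameters. All sizes and names below are assumptions.

```python
# The global deformation is decomposed into per-part local fields; precomputed
# attention masks weight each field's contribution at every observed point.
import torch
import torch.nn as nn

class LocalDeformField(nn.Module):
    def __init__(self, param_dim=16, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + param_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, points, part_params):
        # Translate observed points toward canonical space for this facial part.
        inp = torch.cat([points, part_params.expand(points.shape[0], -1)], dim=-1)
        return self.net(inp)

class PartBasedDeformer(nn.Module):
    def __init__(self, n_parts=4, param_dim=16):
        super().__init__()
        self.fields = nn.ModuleList([LocalDeformField(param_dim) for _ in range(n_parts)])

    def forward(self, points, part_params, attention_masks):
        # attention_masks: (N, n_parts) per-point weights, precomputed from the 3DMM.
        # A sparsity penalty on each field (the "local control loss") would be added in training.
        deltas = torch.stack([f(points, p) for f, p in zip(self.fields, part_params)], dim=1)
        return points + (attention_masks.unsqueeze(-1) * deltas).sum(dim=1)

deformer = PartBasedDeformer()
pts = torch.randn(1024, 3)
params = [torch.randn(1, 16) for _ in range(4)]
masks = torch.softmax(torch.randn(1024, 4), dim=-1)
print(deformer(pts, params, masks).shape)  # torch.Size([1024, 3])
```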
-
The internet is flush with creatives debating the merits of text-to-video models like Sora, but this is the type of AI everyone can get behind. Teleporting your talent into virtual environments is typically a tall order. Green screen is easy, but re-lighting your subject to match a dynamic 3D environment is painful. Skin, hair, and clothing all interact with lighting differently. Professionals often rely on a fancy light stage or LED capture volume (à la The Mandalorian), combined with a ton of manual compositing. Meanwhile, ML-based approaches use simplified physical models and are limited by training data. Beeble + New York University are pushing the boundaries of virtual production – making these advanced techniques accessible to all creators while giving them fine-grained control. It's not just a PBR shader; they use neural rendering to emulate light-transport effects like subsurface scattering – so when light interacts with your skin, you don't look like a waxed-up cadaver at Madame Tussauds :) Paper link below!
-
🚨 Have you seen the latest LLaMA-Mesh AI tooling published this week? LLaMA-Mesh is a novel approach that integrates large language models (LLMs) with 3D mesh generation by representing 3D meshes as text. This method enables LLMs to both interpret and generate 3D meshes, effectively unifying text and 3D data within a single model. By fine-tuning LLMs to handle 3D mesh data formatted as text, LLaMA-Mesh preserves the language understanding capabilities of the models while introducing the ability to create and comprehend 3D structures. This advancement facilitates conversational 3D creation, allowing users to interact with and generate 3D content through natural language prompts. Congratulations on pushing this research area forward Zhengyi Wang, Jonathan Lorraine, Yikai Wang, Hang Su, Jun ZHU, Sanja Fidler, Xiaohui Zheng, Tsinghua University, and NVIDIA.
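The core trick is that a mesh can be serialized as ordinary text, so a language model can read and emit it like any other token sequence. Below is a generic OBJ-style round-trip in Python to illustrate the representation; it is not the exact tokenization or coordinate quantization LLaMA-Mesh uses.

```python
# Serialize a mesh as plain text (OBJ-style vertex and face lines) and parse it back.
def mesh_to_text(vertices, faces, decimals=2):
    lines = [f"v {x:.{decimals}f} {y:.{decimals}f} {z:.{decimals}f}" for x, y, z in vertices]
    lines += [f"f {a} {b} {c}" for a, b, c in faces]   # 1-indexed, as in OBJ
    return "\n".join(lines)

def text_to_mesh(text):
    vertices, faces = [], []
    for line in text.splitlines():
        tag, *vals = line.split()
        if tag == "v":
            vertices.append(tuple(float(v) for v in vals))
        elif tag == "f":
            faces.append(tuple(int(v) for v in vals))
    return vertices, faces

# A single triangle, round-tripped through the text representation an LLM would see.
tri_text = mesh_to_text([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(1, 2, 3)])
print(tri_text)
print(text_to_mesh(tri_text))
```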
-
3D world-building used to be a headache. Hours of modeling, texturing, and rendering. Now? You type a few words, and AI does the heavy lifting. Meet Intangible, the world's first AI-native 3D platform that simplifies design in ways we never imagined:
↳ Generate and manipulate 3D environments using simple text prompts.
↳ A knowledge graph that actually understands human 3D concepts.
↳ AI agents that automate the most tedious workflows.
Translation: It's like ChatGPT for 3D design. And it's no toy. Intangible was built by veterans from Apple, Pixar, and Unity—people who know 3D. They just raised $4M in seed funding from a16z speedrun, Crosslink Capital, and industry angels. Why is this a big deal?
↳ 3D world-building is about to get way faster and more accessible.
↳ The learning curve? Practically gone.
↳ Anyone—from game devs to marketers—can bring 3D ideas to life in minutes.
AI reshaped writing, art, and video. Now it's coming for 3D. Ludovit Nastisin Giulio Fulchignoni Phill Turner