Wow. Recreating the Shawshank Redemption prison in 3D from a single video, in real time (!). Just read the MASt3R-SLAM paper and it's pretty neat. Here's the TL;DR:
→ These folks basically built a real-time dense SLAM system on top of MASt3R, a transformer-based neural network that does 3D reconstruction and localization from uncalibrated image pairs.
→ The cool part is they don't need a fixed camera model -- it just works with arbitrary cameras -- think different focal lengths, sensor sizes, even zooming mid-video (FMV drone video, anyone?!). If you've done photogrammetry or played with NeRFs, you know that's a HUGE deal.
→ They've solved some tricky problems like efficient point matching and tracking, plus they've figured out how to fuse point clouds and handle loop closures in real time.
→ Their system runs at about 15 FPS on a 4090 and produces both camera poses and dense geometry. With known camera calibration they get SOTA results across several benchmarks, and even without calibration they still perform well.
→ What's interesting is the approach -- most recent SLAM work has built on DROID-SLAM's architecture, but these folks went a different direction by leveraging a strong 3D reconstruction prior. That seems to give them more coherent geometry, which makes sense since that's what MASt3R was designed for.
→ For anyone who cares about monocular SLAM and 3D reconstruction, this feels like a significant step toward plug-and-play dense SLAM without calibration headaches -- perfect for drones, robots, AR/VR -- the works!
Link to code release and paper in comments below.
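Not from the paper's code release -- just a rough structural sketch of how a pointmap-based SLAM loop like this can be organized. Every function name here (predict_pointmaps, estimate_pose, etc.) is a hypothetical placeholder, not MASt3R-SLAM's actual API:

```python
# Conceptual sketch of a MASt3R-style dense SLAM loop (hypothetical names,
# not the authors' API): pair each incoming frame with the current keyframe,
# let a two-view network predict dense pointmaps + matches, estimate the pose
# from those correspondences, and spawn new keyframes when tracking weakens.
from dataclasses import dataclass
import numpy as np

@dataclass
class Keyframe:
    image: np.ndarray     # H x W x 3 RGB
    pointmap: np.ndarray  # H x W x 3 points in keyframe coordinates
    pose: np.ndarray      # 4 x 4 world-from-camera transform

def predict_pointmaps(img_a, img_b):
    """Placeholder for the two-view network: dense pointmaps for both images
    plus pixel correspondences, from an *uncalibrated* image pair."""
    h, w = img_a.shape[:2]
    pts_a = np.zeros((h, w, 3))
    pts_b = np.zeros((h, w, 3))
    matches = np.zeros((0, 4), dtype=int)  # rows of (u_a, v_a, u_b, v_b)
    return pts_a, pts_b, matches

def estimate_pose(pts_a, pts_b, matches):
    """Placeholder: align matched 3D points (e.g. a robust Procrustes/ICP-style
    solve) to get the relative camera pose between the two views."""
    return np.eye(4)

def track(frame, keyframe):
    pts_kf, pts_frame, matches = predict_pointmaps(keyframe.image, frame)
    T_kf_frame = estimate_pose(pts_kf, pts_frame, matches)
    return keyframe.pose @ T_kf_frame, pts_frame, matches

def run_slam(frames, min_matches=500):
    keyframes = []
    for frame in frames:
        if not keyframes:
            keyframes.append(Keyframe(frame, np.zeros(frame.shape), np.eye(4)))
            continue
        pose, pointmap, matches = track(frame, keyframes[-1])
        # A real system would also fuse pointmaps into a global model and
        # search for loop closures here; this sketch only handles keyframing.
        if len(matches) < min_matches:
            keyframes.append(Keyframe(frame, pointmap, pose))
    return keyframes
```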
Innovations Advancing 3D Scene Reconstruction
Summary
Innovations in 3D scene reconstruction are transforming how we digitally recreate and understand the real world. This field leverages advanced technologies like neural networks, Gaussian splatting, and monocular dynamic reconstruction to build high-quality, real-time 3D models for applications in AR/VR, robotics, and more.
- Explore flexible models: New methodologies, such as allowing Gaussian primitives to move freely in space and time, are enabling accurate reconstructions of dynamic scenes with complex motions.
- Focus on real-time processing: Advancements like MASt3R-SLAM achieve real-time dense 3D reconstruction, revolutionizing fields like drone navigation, virtual reality, and robotics.
- Adopt hybrid techniques: Combining radiance fields and traditional 3D modeling methods is driving faster rendering speeds and achieving high-quality visualizations for large-scale and dynamic scenes.
-
What if we stopped forcing 3D objects to have a "home base" in computer vision? 🎯 Researchers just achieved a 4.1dB improvement in dynamic scene reconstruction by letting Gaussian primitives roam free in space and time.

Traditional methods anchor 3D Gaussians in a canonical space, then deform them to match observations, like trying to model a dancer by stretching a statue. FreeTimeGS breaks this paradigm: Gaussians can appear anywhere, anytime, with their own motion functions. Think of it as the difference between animating a rigid skeleton versus capturing fireflies in motion.

The results are striking:
- 29.38dB PSNR on dynamic regions (vs 25.32dB for previous SOTA)
- Real-time rendering at 450 FPS on a single RTX 4090
- Handles complex motions like dancing and cycling that break other methods

This matters beyond academic metrics. Real-time dynamic scene reconstruction enables everything from better AR/VR experiences to more natural video conferencing. Sometimes constraints we think are necessary (like canonical representations) are actually holding us back.

One limitation: the method still requires dense multi-view capture. But as we move toward a world of ubiquitous cameras, this approach could reshape how we capture and recreate reality.

What rigid assumptions in your field might be worth questioning? Full paper in comments.

#ComputerVision #3DReconstruction #AIResearch #MachineLearning #DeepLearning
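To make the "Gaussians with their own motion functions" idea concrete, here's a toy sketch (my illustration, not the paper's implementation): each primitive carries its own anchor time, temporal extent, and a simple linear velocity, so both its position and its visibility are functions of time rather than deformations of a canonical copy.

```python
# Toy space-time Gaussian primitive: it can "appear anywhere, anytime" because
# it owns an anchor time, a temporal opacity window, and a simple motion model.
# Illustration only -- the paper's parameterization is richer than this.
from dataclasses import dataclass
import numpy as np

@dataclass
class FreeTimeGaussian:
    mu: np.ndarray        # 3D center at the anchor time t0
    t0: float             # anchor time where the primitive is most visible
    sigma_t: float        # temporal extent (how long it stays visible)
    velocity: np.ndarray  # per-primitive linear motion function
    opacity: float        # peak opacity

    def center_at(self, t: float) -> np.ndarray:
        # Position follows the primitive's own motion function.
        return self.mu + self.velocity * (t - self.t0)

    def opacity_at(self, t: float) -> float:
        # Visibility fades away from the anchor time (temporal Gaussian window).
        return self.opacity * np.exp(-0.5 * ((t - self.t0) / self.sigma_t) ** 2)

g = FreeTimeGaussian(mu=np.array([0.0, 1.0, 2.0]), t0=0.5, sigma_t=0.1,
                     velocity=np.array([0.2, 0.0, 0.0]), opacity=0.9)
print(g.center_at(0.7), g.opacity_at(0.7))
```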
-
Here's my 2024 LinkedIn Rewind, by Coauthor:

2024 proved that 3D Gaussian splatting isn't just another tech trend - it's transforming how we capture and understand the world around us. From real-time architectural visualization to autonomous vehicle training, we're seeing practical implementations I could only dream about a year ago.

Through my "100 Days of Splats" project, I witnessed this technology evolve from research papers to real-world applications. We saw:
→ Large-scale scene reconstruction becoming practical
→ Real-time rendering reaching 60+ FPS
→ Integration with game engines and VFX pipelines
→ Adoption by major companies like Meta, Nvidia, and Varjo

Three posts that captured pivotal developments:

"VastGaussians - First Method for High-Quality Large Scene Reconstruction"
Finally bridging the gap between research and AEC industry needs
"This research is specifically tailored for visualization of large scenes such as commercial and industrial buildings, quarries, and landscapes."
https://lnkd.in/gvgpqMNe

"2D Gaussian Splatting vs Photogrammetry"
The first radiance fields project producing truly accurate geometry
"All in one pipeline I can generate a radiance field, textured mesh, and fly renderings - all in less than an hour"
https://lnkd.in/geprBw6j

"HybridNeRF Development"
Pushing rendering speeds while maintaining quality
"HybridNeRF looks better than 3DGS and can achieve over 60 FPS framerate"
https://lnkd.in/gcqdE4iD

Speaking at Geo Week showed me how hungry the industry is for practical applications of these technologies. We're no longer asking if Gaussian splatting will be useful - we're discovering new uses every day.

2025 will be about scaling practical applications - from AEC to geospatial to virtual production. The foundation is laid; now it's time to build.

To everyone exploring and pushing the boundaries of 3D visualization - your experiments today are tomorrow's innovations. Keep building, keep sharing, keep pushing what's possible.

#ComputerVision #3D #AI #GaussianSplatting #LinkedInRewind
-
Shape of Motion: 4D Reconstruction from a Single Video
Paper page: https://buff.ly/3S9Zroj

Monocular dynamic reconstruction is a challenging and long-standing vision problem due to the highly ill-posed nature of the task. Existing approaches are limited in that they either depend on templates, are effective only in quasi-static scenes, or fail to model 3D motion explicitly. In this work, we introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. We tackle the under-constrained nature of the problem with two key insights: First, we exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE(3) motion bases. Each point's motion is expressed as a linear combination of these bases, facilitating soft decomposition of the scene into multiple rigidly-moving groups. Second, we utilize a comprehensive set of data-driven priors, including monocular depth maps and long-range 2D tracks, and devise a method to effectively consolidate these noisy supervisory signals, resulting in a globally consistent representation of the dynamic scene. Experiments show that our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
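The SE(3)-basis idea is easy to picture in code. Below is a small illustrative sketch (mine, not the authors' implementation): each point blends a handful of shared rigid motions with per-point weights, in the spirit of linear blend skinning, which is what gives the soft decomposition into rigidly-moving groups.

```python
# Sketch of "compact SE(3) motion bases": every point's motion at a timestep is
# a weighted combination of a few shared rigid transforms (linear-blend-skinning
# style). Illustration only; the paper's exact blending may differ.
import numpy as np

def se3(R, t):
    """Build a 4x4 rigid transform from a rotation matrix and translation."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def blend_point(p, bases, weights):
    """Move a canonical 3D point with a convex combination of rigid motions.

    bases   : (B, 4, 4) SE(3) transforms for one timestep
    weights : (B,) per-point weights summing to 1 (soft group assignment)
    """
    p_h = np.append(p, 1.0)                    # homogeneous coordinates
    moved = bases @ p_h                        # (B, 4): the point under each basis
    return (weights[:, None] * moved[:, :3]).sum(axis=0)

# Two rigid "groups": one static, one translating along x.
bases_t = np.stack([se3(np.eye(3), np.zeros(3)),
                    se3(np.eye(3), np.array([0.3, 0.0, 0.0]))])
point = np.array([1.0, 0.0, 0.0])
weights = np.array([0.2, 0.8])                 # this point mostly follows group 2
print(blend_point(point, bases_t, weights))    # -> [1.24, 0.0, 0.0]
```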
-
🚀 Exciting Research in Computer Vision: Depth Anything V2 🚀

I recently explored the latest research on Depth Anything V2, a cutting-edge model that significantly enhances how computers perceive depth in images using just a single photo. Here's a summary of the impressive advancements:

🔍 Key Improvements:
- Training with Synthetic Data: The model leverages highly detailed, computer-generated images for training, resulting in enhanced accuracy.
- Enhanced Teacher-Student Model: A powerful 'teacher' model is used to generate labeled images, and smaller, more efficient 'student' models are then trained on them to deliver superior performance.
- Faster and More Accurate: Depth Anything V2 outperforms previous models in speed and accuracy while requiring less computational power.

🧪 My Experimentation Insights:
- The model excels with synthetic data, likely due to its synthetic training dataset.
- Performance on real images is good, but there's room for improvement, especially with non-human objects.

💡 Possible Applications of This Research:
- Autonomous Driving: Helps self-driving cars understand their surroundings and estimate distances to objects for better obstacle detection, navigation, and collision avoidance.
- Robotics: Improves robot perception for better navigation and interaction with environments, aiding in understanding space layout and object manipulation.
- Medical Imaging: Enhances MRI and CT scans by providing better depth perception from 2D slices, improving diagnostic accuracy and surgical planning.

In summary, Depth Anything V2 is a significant step forward in depth perception for images, offering enhanced accuracy and efficiency. I look forward to seeing how it continues to evolve and impact the field of computer vision.

#AI #ComputerVision #Innovation #Research #DepthAnythingV2 #Depth #DeepLearning #Data #DataScience #SyntheticData #Training #MedicalImaging #Robotics #AutonomousDriving
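If you want to poke at it yourself, the checkpoints are published on the Hugging Face hub. Here's a minimal sketch using the transformers depth-estimation pipeline; the model id is my assumption, so double-check the model card for the exact name:

```python
# Run single-image depth estimation with a Depth Anything V2 checkpoint.
# The model id below is assumed -- verify it on the Hugging Face model card.
from transformers import pipeline
from PIL import Image

depth = pipeline("depth-estimation",
                 model="depth-anything/Depth-Anything-V2-Small-hf")

image = Image.open("street.jpg")            # any RGB photo
result = depth(image)
result["depth"].save("street_depth.png")    # per-pixel depth as a PIL image
```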
-
I thought you all might appreciate seeing the evolution of NeRFs and Splats, from Nvidia InstantNerf way back in June 2022 to the impressive results we achieve today with Splat Markov Chain Monte Carlo (MCMC)!

The first two photos showcase an InstantNerf created from a video I took on an iPhone 13 Pro while drifting down a river in France, aligned using COLMAP. While the rendition is beautiful, the watercolor style of the NeRF left much to be desired... Fast forward just two years, and that same dataset now produces a remarkably realistic second set of photos using Splat MCMC with a high-quality alignment via RealityCapture.

Here's a short GIF that gives a glimpse of the camera movement freedom this river dataset provides after being trained: https://lnkd.in/gyNH4F3a

The results have come so far that this simple dataset is now almost a plausible virtual production environment. I'm really excited to see the excellent papers from this year's SIGGRAPH, CVPR, and ECCV bring these scenes even further with higher fidelity, multi-GPU training, and especially animatable/dynamic environments. The future looks bright!

#VirtualProduction #Nerfs #Gaussiansplat #SplatMCMC #NvidiaInstantNerf #3DRendering #RealityCapture #Colmap #VirtualScouting #VFX #TechInnovation #VisualEffects #3DModeling #DigitalProduction #Innovation #FutureOfFilm #Film