Introducing SAM 3D: Powerful 3D Reconstruction for Physical World Images

November 19, 2025

Takeaways:

  • We’re announcing SAM 3D. This release includes two new state-of-the-art models: SAM 3D Objects for object and scene reconstruction, and SAM 3D Body for human body and shape estimation. SAM 3D sets a new standard for grounded 3D reconstruction in physical world scenarios.
  • As part of this release, we’re sharing training and evaluation data, an evaluation benchmark, model checkpoints, inference code, and a parametric human model. This work has the potential to support applications in fields like robotics, interactive media, science, and sports medicine.
  • We’re also introducing the Segment Anything Playground, a new platform that makes it easy for everyone to try out the capabilities of our models and experiment with cutting-edge AI for creative media modification. Together with today's launch of SAM 3, SAM 3D will be available on the Playground for everyone to explore using their own images.
  • We’re also translating our research breakthroughs into product innovation. SAM 3D and SAM 3 are powering Facebook Marketplace’s new View in Room feature, helping people visualize the style and fit of home decor items, like a lamp or a table, in their spaces before purchasing.

Today, we’re excited to introduce SAM 3D — a first-of-its-kind addition to the SAM collection of models, bringing common-sense 3D understanding to natural images. Whether you’re a researcher exploring new frontiers in AR/VR, a creator looking to generate assets for a game, or simply curious about the possibilities of AI-enabled 3D modeling, SAM 3D opens up new ways to interact with and understand the visual world.

This release marks a significant step forward in leveraging large scale real-world data to address the complexity and richness of the physical world. With SAM 3D, we’re introducing two new models: SAM 3D Objects, which enables object and scene reconstruction, and SAM 3D Body, which focuses on human body and shape estimation. Both models deliver robust, state-of-the-art performance, transforming static 2D images into detailed 3D reconstructions.

As part of this release, we're sharing SAM 3D model checkpoints and inference code. Coming soon, we look forward to also sharing our new SAM 3D Artist Objects (SA-3DAO) dataset for visually grounded 3D reconstruction in real world images. This novel evaluation dataset features a diverse array of paired images and object meshes, offering a level of realism and challenge that surpasses existing 3D benchmarks.

To make these advancements widely accessible, we’re introducing Segment Anything Playground, the simplest way for anyone to experiment with our state-of-the-art models for media modification. Anyone can upload their own images, select humans and objects, generate detailed 3D reconstructions, and explore the full range of features offered by our new models. The Playground also includes SAM 3, our latest foundation model that advances understanding across images and video. More information about this release can be found in the SAM 3 blog post.

At Meta, we’re using these advancements in our products. SAM 3D and SAM 3 are enabling the new View in Room feature on Facebook Marketplace, helping people visualize the style and fit of home decor items in their spaces before purchasing. By broadening access to these models, we hope to inspire new possibilities for everyone — including creative projects, research, and interactive applications.

SAM 3D Objects: From a Still Image to Virtual Objects in a 3D Scene

SAM 3D Objects represents a new approach to tackling robust, visually grounded 3D reconstruction and object pose estimation from a single natural image, reconstructing detailed 3D shapes, textures, and layouts of objects from everyday images. In these images, small objects, indirect views, and occlusion are frequent, but recognition and context can assist the reconstruction where pixels alone are insufficient. Using SAM 3D Objects, people can start from an image, select any objects, and quickly generate posed 3D models. This makes it easy to precisely manipulate individual objects in a reconstructed 3D scene, or freely control the camera to view from different perspectives.
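
For a concrete picture of what a “posed 3D model” gives you, here is a minimal sketch of how per-object reconstructions, each a mesh plus a 6-DoF pose, can be placed in a shared scene and re-expressed in an arbitrary camera frame. The data structures and function names are illustrative assumptions, not the released SAM 3D Objects inference API.

```python
# Minimal sketch: composing per-object reconstructions into a shared 3D scene.
# Illustrative only; not the SAM 3D Objects inference API.
from dataclasses import dataclass
import numpy as np

@dataclass
class PosedObject:
    vertices: np.ndarray     # (N, 3) mesh vertices in object-local coordinates
    faces: np.ndarray        # (M, 3) triangle indices
    rotation: np.ndarray     # (3, 3) object-to-world rotation
    translation: np.ndarray  # (3,) object-to-world translation

    def world_vertices(self) -> np.ndarray:
        """Place the object in the shared scene using its predicted pose."""
        return self.vertices @ self.rotation.T + self.translation

def view_from_camera(points_world: np.ndarray,
                     cam_rotation: np.ndarray,
                     cam_translation: np.ndarray) -> np.ndarray:
    """Re-express world-space points in a camera frame.
    cam_rotation maps world axes to camera axes; cam_translation is the camera center."""
    return (points_world - cam_translation) @ cam_rotation.T

# Example: a unit cube "reconstruction" placed two meters in front of the camera.
cube = PosedObject(
    vertices=np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], float),
    faces=np.zeros((0, 3), int),  # faces omitted in this toy example
    rotation=np.eye(3),
    translation=np.array([0.0, 0.0, 2.0]),
)
print(view_from_camera(cube.world_vertices(), np.eye(3), np.zeros(3)).shape)  # (8, 3)
```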

Past 3D models have been significantly limited by data availability. Compared to other modalities like text or images, the amount of rich 3D ground truth is multiple orders of magnitude smaller, and what exists consists primarily of isolated synthetic 3D assets. This has resulted in models that can generate high-quality isolated 3D assets but, as 3D reconstruction models, are limited to synthetic or staged settings, often a single high-resolution object on a simple background. Training on large-scale isolated 3D asset datasets provides a strong starting point, but moving beyond these simplified settings to the more challenging scenarios common in everyday environments requires a new approach.

The innovation behind SAM 3D Objects comes from shattering the longstanding barrier to 3D data from the physical world with a powerful data annotation engine, and tightly coupling that with a new multistage training recipe for 3D. By building upon modern techniques recently pioneered by large language models, SAM 3D Objects demonstrates the viability of such paradigms for 3D perception, to great effect.

Unlike other modalities like text, images, or video, creating 3D ground truth from scratch requires highly specialized skills, limited primarily to 3D artists. This makes data collection in 3D significantly slower and more expensive. However, our key insight is that verifying or ranking meshes is a much more accessible skill. We can thus scale by building a data engine that asks annotators to rate multiple options generated by a suite of models in the loop, while routing the hardest examples to expert 3D artists to fill data blind spots. Using this data engine, we annotate physical world images with 3D object shape, texture, and layout at a scale unprecedented for 3D, covering almost 1 million distinct images and generating approximately 3.14 million model-in-the-loop meshes.
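
The verify-or-route workflow can be sketched as follows. The Candidate structure, rating scale, and acceptance threshold are assumptions for illustration; this is not Meta’s actual annotation pipeline.

```python
# Schematic of the model-in-the-loop annotation engine described above.
# Data structures, rating scale, and threshold are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Candidate:
    mesh_id: str
    rating: float  # annotator score, e.g., 1 (poor) to 5 (excellent)

def annotate_image(candidates: list[Candidate], accept_threshold: float = 4.0):
    """Keep the best model-generated mesh if annotators rate it highly enough;
    otherwise route the image to an expert 3D artist to fill the blind spot."""
    best = max(candidates, key=lambda c: c.rating)
    if best.rating >= accept_threshold:
        return ("accepted", best.mesh_id)
    return ("route_to_artist", None)

print(annotate_image([Candidate("a", 2.5), Candidate("b", 4.5)]))  # ('accepted', 'b')
print(annotate_image([Candidate("c", 1.0), Candidate("d", 3.0)]))  # ('route_to_artist', None)
```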

At the same time, adapting terminology from recent LLM training recipes, we recast learning from synthetic data as pre-training for 3D. For our model to work on natural images, a subsequent post-training (alignment) stage is required to overcome the sim-to-real gap. Our data engine provides the data to fuel this post-training process. In turn, improvements to our model’s robustness and output quality make the data engine better at generating data, creating a positive feedback loop that we repeat. This tight coupling of the data engine and post-training allows us to use general human expertise to steer the model toward capabilities beyond what’s possible through any one approach alone.
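
Schematically, the recipe looks like the loop below, where each function is a placeholder standing in for a full training or annotation stage rather than real training code.

```python
# Schematic of the multistage recipe: synthetic pre-training, then repeated
# rounds of data-engine annotation and post-training (alignment).
# All functions are placeholders, not real training code.

def pretrain_on_synthetic_assets(model_stages):
    return model_stages + ["pretrained_on_synthetic_assets"]

def run_data_engine(model_stages, round_idx):
    # A stronger model proposes better candidate meshes for annotators to verify.
    return [f"round{round_idx}_labels_from_{model_stages[-1]}"]

def post_train(model_stages, real_world_data):
    return model_stages + [f"aligned_on_{len(real_world_data)}_annotation_batches"]

model_stages = pretrain_on_synthetic_assets([])
for round_idx in range(3):                       # repeat the positive feedback loop
    real_data = run_data_engine(model_stages, round_idx)
    model_stages = post_train(model_stages, real_data)
print(model_stages)
```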

Benchmarks for single-image 3D reconstruction of physical world objects that cover natural image distributions are scarce, so we collaborated with artists to build the SAM 3D Artist Objects dataset (SA-3DAO), a first-of-its-kind evaluation dataset for visually grounded 3D reconstruction in physical world images. With diverse images and objects that are significantly more challenging than existing 3D benchmarks, this evaluation set offers a new way to measure research progress in 3D, pushing the field away from staged images and synthetic assets and towards physical world 3D perception.

SAM 3D Objects significantly outperforms existing methods, generalizing well across many types of images and supporting dense scene reconstructions. In head-to-head human preference tests, it achieves at least a 5:1 win rate over other leading models (its outputs are preferred in at least roughly five of every six comparisons). Through diffusion shortcuts and other engineering optimizations, the model returns full textured reconstructions of comparable quality within a few seconds, enabling near real-time applications of 3D, such as serving as a 3D perception module for robotics.

Limitations

While SAM 3D Objects is an exciting step forward, there are several areas where the model is limited. The current moderate output resolution limits detail in complex objects. For example, attempts to reconstruct a whole person can exhibit distortion or lose detail. A natural next step would be to increase the output resolution.

Object layouts are another area where improvements can be made. SAM 3D Objects currently predicts objects one at a time and isn’t trained to reason about physical interactions, such as contact or interpenetration. Predicting multiple objects combined with appropriate losses would allow joint reasoning about multiple objects in a scene.

SAM 3D Body: Robust, Accurate and Interactive 3D Human Reconstruction

SAM 3D Body addresses the need for accurate 3D human pose and shape estimation from a single image, even in complex situations that involve unusual postures, occluded body parts, or multiple people. We designed SAM 3D Body to be promptable, supporting interactive inputs like segmentation masks and 2D keypoints so people can guide and control what the model predicts.
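
At the interface level, prompt-based prediction can be pictured as in the following sketch. The container names, fields, and parameter dimensions are assumptions made for illustration and may differ from the released SAM 3D Body code.

```python
# Sketch of promptable 3D human reconstruction inputs and outputs.
# Names, fields, and dimensions are assumptions, not the released SAM 3D Body API.
from dataclasses import dataclass
import numpy as np

@dataclass
class BodyPrompt:
    mask: np.ndarray                         # (H, W) binary mask selecting one person
    keypoints_2d: np.ndarray | None = None   # (K, 3) as (x, y, confidence), optional

@dataclass
class BodyPrediction:
    pose_params: np.ndarray    # skeletal joint parameters (MHR-style)
    shape_params: np.ndarray   # soft-tissue body-shape parameters (MHR-style)

def predict_body(image: np.ndarray, prompt: BodyPrompt) -> BodyPrediction:
    """Placeholder predictor: a real model would run its encoder-decoder here.
    Parameter counts below are illustrative, not the actual MHR dimensions."""
    assert image.shape[:2] == prompt.mask.shape
    return BodyPrediction(pose_params=np.zeros(63), shape_params=np.zeros(16))

image = np.zeros((512, 384, 3), np.uint8)
prompt = BodyPrompt(mask=np.ones((512, 384), bool),
                    keypoints_2d=np.array([[200.0, 150.0, 0.9]]))
pred = predict_body(image, prompt)
print(pred.pose_params.shape, pred.shape_params.shape)
```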

The model leverages a new open source 3D mesh format called Meta Momentum Human Rig (MHR), which offers enhanced interpretability by separating the skeletal structure and the soft tissue shape of a human body. We build upon the transformer encoder-decoder architecture to predict MHR mesh parameters — the image encoder adopts a multi-input design to capture high-resolution details of body parts, while the mesh decoder is extended to support prompt-based prediction.
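
To illustrate the kind of separation MHR provides, the toy model below keeps soft-tissue shape (blend coefficients on a template mesh) and skeletal pose (here reduced to a single root rotation) as independent parameters. This is a didactic simplification, not the MHR model itself.

```python
# Toy parametric body: shape coefficients deform a template, then the skeleton
# pose articulates it. Didactic simplification of a skeleton/shape split, not MHR.
import numpy as np

def toy_body_mesh(template: np.ndarray,      # (V, 3) template vertices
                  shape_basis: np.ndarray,   # (S, V, 3) shape blend directions
                  shape_coeffs: np.ndarray,  # (S,) soft-tissue shape parameters
                  root_rotation: np.ndarray  # (3, 3) skeletal (root) rotation
                  ) -> np.ndarray:
    # 1) Soft-tissue shape: linear combination of shape directions added to the template.
    shaped = template + np.einsum("s,svd->vd", shape_coeffs, shape_basis)
    # 2) Skeletal pose: articulate the shaped mesh (a single joint in this toy).
    return shaped @ root_rotation.T

V, S = 100, 4
mesh = toy_body_mesh(
    template=np.random.rand(V, 3),
    shape_basis=np.random.rand(S, V, 3) * 0.01,
    shape_coeffs=np.array([1.0, -0.5, 0.0, 0.2]),
    root_rotation=np.eye(3),
)
print(mesh.shape)  # (100, 3)
```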

SAM 3D Body delivers accurate and robust 3D human pose and shape estimation by leveraging large-scale, high-quality data and a robust training strategy. We start with a pool of billions of images drawn from large-scale, diverse photo collections, high-quality videos from various multi-camera capture systems, and professionally constructed synthetic data. A scalable, automated data engine then mines this pool for high-value images, selecting those with unusual poses and rare capture conditions. The result is a high-quality training dataset of approximately 8 million images, which we use to train the model to be robust to occlusions, rare postures, and diverse clothing. The model is trained with prompt-based guidance and multi-step refinement, enabling flexible user interaction and improving 2D alignment with the visual evidence in the image.
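
One way to picture the mining step is rarity-based selection: score each candidate image’s rough pose estimate by how far it sits from the typical pose distribution and keep the most unusual ones. The Mahalanobis-style scoring rule below is an assumption chosen for illustration, not the actual selection criterion.

```python
# Illustrative rarity-based mining: keep images whose estimated poses are
# farthest from the typical pose. The scoring rule is an assumption.
import numpy as np

def mine_rare_poses(pose_estimates: np.ndarray, keep: int) -> np.ndarray:
    """pose_estimates: (N, D) rough pose vectors, one per candidate image.
    Returns indices of the `keep` most unusual poses."""
    mean = pose_estimates.mean(axis=0)
    cov = np.cov(pose_estimates, rowvar=False) + 1e-6 * np.eye(pose_estimates.shape[1])
    inv_cov = np.linalg.inv(cov)
    diffs = pose_estimates - mean
    scores = np.einsum("nd,de,ne->n", diffs, inv_cov, diffs)  # squared Mahalanobis distance
    return np.argsort(scores)[::-1][:keep]

rng = np.random.default_rng(0)
poses = rng.normal(size=(1000, 8))
print(mine_rare_poses(poses, keep=5))  # indices of the five most unusual poses
```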

SAM 3D Body stands out for its step change in accuracy and robustness, outperforming previous models on multiple 3D benchmarks. With this release, we’re also sharing MHR, the parametric human model enabling Meta’s technologies like Codec Avatars, under a permissive commercial license.

Limitations

There are several areas that warrant further improvement. Currently, SAM 3D Body processes each individual separately, without considering multi-person or human-object interactions, which limits its ability to reason accurately about relative positions and physical contact. A natural next step is to incorporate interactions among humans, objects, and the environment into model training. Another area for improvement is hand pose estimation: while the model achieves significant gains in hand pose estimation as part of the whole-body estimation task, its accuracy does not yet surpass specialized hand-only pose estimation methods.

Get started with SAM 3D Objects and SAM 3D Body

We encourage everyone to explore the capabilities of SAM 3D on the Playground, where they can upload their own images and reconstruct humans and objects in 3D. Looking ahead, our model has the potential to enhance the work of industries that rely on visual engagement and spatial understanding. We believe this impact will be especially profound in the areas of gaming, film, and robotics. We can’t wait to see the new possibilities that SAM 3D will unlock for creators, developers, and researchers everywhere.

Visit the SAM 3D Website
Read the SAM 3D Objects Research Paper
Read the SAM 3D Body Research Paper
Download SAM 3D Objects
Download SAM 3D Body
Download MHR
Explore the Playground
