We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions. By capitalizing on large-scale diffusion models and transferring their representations to this task, we learn a conditional diffusion model for reconstructing whole objects in challenging zero-shot cases, including examples that break natural and physical priors, such as art. As training data, we use a synthetically curated dataset containing occluded objects paired with their whole counterparts. Experiments show that our approach outperforms supervised baselines on established benchmarks. Our model can furthermore be used to significantly improve the performance of existing object recognition and 3D reconstruction methods in the presence of occlusions.
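To make the idea of pairing occluded objects with their whole counterparts concrete, here is a minimal sketch of how one such synthetic pair could be composed. It is an illustration under stated assumptions, not the authors' data pipeline: the function name `make_occluded_pair`, its arguments, and the toy arrays are all hypothetical, and the sketch assumes the whole object and the occluder are available as aligned images with binary masks.

```python
import numpy as np

def make_occluded_pair(whole_rgb, whole_mask, occluder_rgb, occluder_mask):
    """Paste an occluder over a whole object (hypothetical helper).

    Returns the occluded image, the visible (modal) mask of the object,
    and the fraction of the object hidden by the occluder.
    """
    occ = occluder_mask.astype(bool)
    obj = whole_mask.astype(bool)
    # Occluder pixels overwrite the scene; everything else is unchanged.
    occluded_rgb = np.where(occ[..., None], occluder_rgb, whole_rgb)
    # The visible mask keeps only object pixels the occluder does not cover.
    visible_mask = obj & ~occ
    # Share of the whole object that is no longer visible.
    occlusion_rate = 1.0 - visible_mask.sum() / max(obj.sum(), 1)
    return occluded_rgb, visible_mask, occlusion_rate

# Toy 4x4 example: the object spans rows 1-2, the occluder covers columns 2-3.
whole_rgb = np.zeros((4, 4, 3), dtype=np.uint8)
whole_mask = np.zeros((4, 4), dtype=bool)
whole_mask[1:3, :] = True
whole_rgb[whole_mask] = 200

occluder_rgb = np.full((4, 4, 3), 50, dtype=np.uint8)
occluder_mask = np.zeros((4, 4), dtype=bool)
occluder_mask[:, 2:] = True

occluded_rgb, visible_mask, occlusion_rate = make_occluded_pair(
    whole_rgb, whole_mask, occluder_rgb, occluder_mask
)
```

In this sketch, `occluded_rgb` and `visible_mask` would play the role of the conditioning input, while `whole_rgb` and `whole_mask` serve as the reconstruction target.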
We demonstrate pix2gestalt's zero-shot amodal completion & segmentation capability on occlusions ranging from natural images to post-impressionist art. Please click on the videos to visualize our amodal reconstructions.
We show qualitative results on Amodal COCO.
We show qualitative results on the Amodal Berkeley Segmentation Dataset.
Occlusions in visual art often challenge priors on the natural image manifold. We showcase examples ranging from surrealist paintings to black & white photographs.
While occluders are often elements of positive space, they can function as negative space too. Such cases challenge physical priors. We show negative space occlusions within sculptures, Magritte’s paintings, and one created by Harry Potter’s invisibility cloak.
pix2gestalt can often recover plausible physical states of the world hidden behind occlusions. In these examples, it captures the water accumulating on the toothbrush and the impressions left by the baby's weight on the sofa.
While older methods can solve the checker-shadow illusion too, we sanity-check whether our method can de-occlude its checkerboard with the correct texture.
Amodal perception is crucial for many downstream applications in vision, graphics, and robotics. We show practical examples from robotics and autonomous driving.
We found that our approach has limitations in situations that require commonsense or physical reasoning. Notice the car going in the wrong direction, or the following samples that contradict physics.
@article{ozguroglu2024pix2gestalt,
title={pix2gestalt: Amodal Segmentation by Synthesizing Wholes},
author={Ege Ozguroglu and Ruoshi Liu and D\'idac Sur\'is and Dian Chen and Achal Dave and Pavel Tokmakov and Carl Vondrick},
journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}