PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing

Technical University of Munich
PercHead reconstructs 3D heads from a single input image and enables disentangled 3D editing using semantic maps combined with image or text-based style inputs.

Abstract

We present PercHead, a method for single-image 3D head reconstruction and semantic 3D editing, two tasks that are inherently challenging due to severe view occlusions, weak perceptual supervision, and the ambiguity of editing in 3D space. We develop a unified base model for reconstructing view-consistent 3D heads from a single input image. The model employs a dual-branch encoder followed by a ViT-based decoder that lifts 2D features into 3D space through iterative cross-attention. Rendering is performed using Gaussian Splatting. At the heart of our approach is a novel perceptual supervision strategy based on DINOv2 and SAM2.1, which provides rich, generalized signals for both geometric and appearance fidelity. Our model achieves state-of-the-art performance in novel-view synthesis and exhibits exceptional robustness to extreme viewing angles compared to established baselines. Moreover, the base model can be seamlessly extended to semantic 3D editing by swapping the encoder and finetuning the network. In this variant, we disentangle geometry and style through two distinct input modalities: a segmentation map to control geometry and either a text prompt or a reference image to specify appearance. We highlight the intuitive and powerful 3D editing capabilities of our model through a lightweight, interactive GUI, where users can effortlessly sculpt geometry by drawing segmentation maps and stylize appearance via natural language or image prompts.
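To make the perceptual supervision idea concrete, below is a minimal, hypothetical PyTorch sketch of a DINOv2-based perceptual loss: the rendered view and the ground-truth view are passed through a frozen DINOv2 backbone and compared in patch-feature space. The backbone loading and `forward_features` call follow the public facebookresearch/dinov2 release; the cosine distance, normalization, and function name are our illustrative assumptions, not the paper's exact loss, and the SAM2.1 branch of the supervision is omitted here.

```python
import torch
import torch.nn.functional as F

# Frozen DINOv2 ViT-S/14 as a perceptual feature extractor (public release).
dinov2 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').eval()
for p in dinov2.parameters():
    p.requires_grad_(False)

# ImageNet statistics used by DINOv2 preprocessing.
_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def dino_perceptual_loss(rendered, target):
    """Cosine distance between DINOv2 patch tokens of two image batches.

    rendered, target: (B, 3, H, W) in [0, 1], H and W multiples of 14.
    The loss stays differentiable w.r.t. `rendered`, so gradients flow
    back into the renderer; target features could also be precomputed.
    """
    feats = []
    for img in (rendered, target):
        img = (img - _MEAN.to(img.device)) / _STD.to(img.device)
        out = dinov2.forward_features(img)       # dict of token tensors
        feats.append(out['x_norm_patchtokens'])  # (B, N_patches, C)
    f_render, f_target = feats
    return (1.0 - F.cosine_similarity(f_render, f_target, dim=-1)).mean()
```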

Method


Overview of Our Method. Our framework supports 3D Reconstruction from a single image and 3D Editing from a segmentation map and style input. Both tasks share a 3D ViT decoder that lifts 2D features via iterative cross-attention, differing only in the encoder. The reconstruction model uses a dual-branch encoder with DINOv2 and a task-specific ViT; the editing model uses a segmentation ViT and injects a global CLIP style token. Outputs are rendered via Gaussian Splatting and refined with a 2D CNN, with supervision from DINOv2 and SAM2.1.
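The caption above outlines the lifting mechanism but not its shape, so here is a minimal sketch of how learnable 3D query tokens might repeatedly cross-attend to 2D encoder features before a linear head predicts per-Gaussian parameters. All module names, depths, token counts, and the 14-parameter Gaussian layout are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LiftBlock(nn.Module):
    """One decoder block: 3D query tokens cross-attend to 2D image features."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, q3d, feats2d):
        # Pull 2D evidence into the 3D tokens, then let the tokens interact.
        q3d = q3d + self.cross(self.norm1(q3d), feats2d, feats2d)[0]
        h = self.norm2(q3d)
        q3d = q3d + self.self_attn(h, h, h)[0]
        return q3d + self.mlp(self.norm3(q3d))

class LiftingDecoder(nn.Module):
    """Lifts 2D encoder features into a set of 3D Gaussian primitives."""
    def __init__(self, dim=384, depth=6, n_gaussians=4096):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_gaussians, dim) * 0.02)
        self.blocks = nn.ModuleList(LiftBlock(dim) for _ in range(depth))
        # Per Gaussian: 3 position + 3 scale + 4 rotation (quaternion)
        # + 1 opacity + 3 color = 14 values; a real model may differ.
        self.head = nn.Linear(dim, 14)

    def forward(self, feats2d):  # feats2d: (B, N_patches, dim)
        q = self.queries.unsqueeze(0).expand(feats2d.shape[0], -1, -1)
        for blk in self.blocks:  # iterative cross-attention lifting
            q = blk(q, feats2d)
        return self.head(q)      # (B, n_gaussians, 14)
```

The predicted parameters would then be handed to a Gaussian Splatting renderer and refined by the 2D CNN mentioned in the caption.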
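Because CLIP embeds images and text into a shared space, a single "style token" interface can accept either modality, which is presumably what makes image and text style inputs interchangeable in the editing model. A hedged sketch using the public OpenAI clip package (the paper may use a different CLIP variant or an extra projection):

```python
import torch
import clip  # OpenAI CLIP; an assumption -- the paper may use another variant

# Frozen CLIP: images and text map into one shared embedding space.
model, preprocess = clip.load("ViT-B/32", device="cpu")
model.eval()

@torch.no_grad()
def style_token(style, is_text):
    """Embed a style reference (a string or a PIL image) as one global token."""
    if is_text:
        emb = model.encode_text(clip.tokenize([style]))
    else:
        emb = model.encode_image(preprocess(style).unsqueeze(0))
    return emb / emb.norm(dim=-1, keepdim=True)  # (1, 512) unit-norm token

# e.g. style_token("female with curly hair", is_text=True)
```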

Video

Reconstruction Results

[Gallery: novel-view reconstructions for twelve input images (Input 01–12).]

Frame-by-Frame Video Reconstruction

[Two side-by-side video viewers; columns: Input, Reconstructed Input View, Reconstructed 3D View.]

Interactive 3D Editing UI

Video playback is sped up.

Disentangled Editing: Geometry via Segmentation & Style via Image

[Gallery: twelve edits, each pairing a geometry input (segmentation map) with a style reference image (Geometry/Style Inputs 01–12).]

Disentangled Editing: Geometry via Segmentation & Style via Prompt

[Gallery: twelve geometry inputs, each stylized with a text prompt: "female with curly hair", "female with dark hair", "female with red hair", "female with gray hair", "male kid", "male adult", "female adult", "middle-aged female", "male with no beard", "male with gray beard", "man with dark skin", "serious looking female".]

BibTeX


@misc{oroz2025percheadperceptualheadmodel,
  title={PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction \& Editing},
  author={Antonio Oroz and Matthias Nießner and Tobias Kirschstein},
  year={2025},
  eprint={2511.02777},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.02777},
}