ProteusNeRF: Fast Lightweight NeRF Editing using 3D-Aware Image Context

I3D 2024

Binglun Wang Niladri Shekhar Dutt Niloy J. Mitra

University College London


ProteusNeRF is a fast and lightweight framework for interactive editing of NeRF assets via existing image manipulation tools or generative frameworks. This is enabled by a novel 3D-aware image context that allows linking edits across multiple views. These edits take 10-70 seconds.

Abstract

Neural Radiance Fields (NeRFs) have recently emerged as a popular option for photo-realistic object capture due to their ability to faithfully capture high-fidelity volumetric content even from handheld video input. Although much research has been devoted to efficient optimization leading to real-time training and rendering, options for interactively editing NeRFs remain limited. We present a very simple but effective neural network architecture that is fast and efficient while maintaining a low memory footprint. This architecture can be incrementally guided through user-friendly image-based edits. Our representation allows straightforward object selection via semantic feature distillation at the training stage. More importantly, we propose a local 3D-aware image context to facilitate view-consistent image editing that can then be distilled into fine-tuned NeRFs via geometric and appearance adjustments. We evaluate our setup on a variety of examples to demonstrate appearance and geometric edits and report a 10-30x speedup over concurrent work focusing on text-guided NeRF editing.

Methodology

TriplaneLite

ProteusNeRF takes a set of posed images and encodes them as a feature-distilled NeRF in a TriplaneLite representation. The user can easily select a part (e.g., the yellow Lego bricks), which is converted to a 3D mask. We then generate a novel 3D-aware image context that allows editing with standard imaging tools while still producing view-coherent edits. The edited context is distilled back into a view-consistent NeRF by fine-tuning the TriplaneLite. The context image is then updated and the process is repeated (2-3 times in our examples). Editing, particularly appearance editing, runs at interactive frame rates. Please refer to Section 4 of our paper for more details.
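To give a feel for what a triplane-style representation stores, here is a minimal sketch of a generic triplane feature lookup. It is not the TriplaneLite implementation: the plane names, the `[0, 1]^3` coordinate convention, and combining the three samples by summation are all assumptions made for illustration; Section 4 of the paper describes the actual design.

```python
from math import floor


def bilinear(plane, u, v):
    """Bilinearly sample a grid of feature vectors at (u, v) in [0, 1]^2.

    `plane[row][col]` is a list of floats; the grid must be at least 2x2.
    """
    h, w = len(plane), len(plane[0])
    x, y = u * (w - 1), v * (h - 1)
    # Clamp the cell index so that u == 1 or v == 1 stays inside the grid.
    x0, y0 = min(int(floor(x)), w - 2), min(int(floor(y)), h - 2)
    tx, ty = x - x0, y - y0

    def lerp(a, b, t):
        return [ai + (bi - ai) * t for ai, bi in zip(a, b)]

    top = lerp(plane[y0][x0], plane[y0][x0 + 1], tx)
    bottom = lerp(plane[y0 + 1][x0], plane[y0 + 1][x0 + 1], tx)
    return lerp(top, bottom, ty)


def triplane_features(planes, point):
    """Project a 3D point onto the XY, XZ and YZ planes and sum the samples.

    Summation is an assumption; concatenation is another common choice.
    """
    x, y, z = point
    f_xy = bilinear(planes["xy"], x, y)
    f_xz = bilinear(planes["xz"], x, z)
    f_yz = bilinear(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]
```

In a full pipeline, the returned feature vector would be passed to a small MLP that predicts density and color; that decoder is omitted here.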

Results

Original NeRF

"Fern in low light"

"Fern in vibrant green"

"Fern in snow weather"

"Fern in Van Gogh style"

Original NeRF

"Horns in fire"

"Horns in glowing ice"

"Horns in gold"

"Horns in paper"

Original NeRF

"T-rex made of wood"

"Remove T-rex"

Original NeRF

"Orchid in purple"

Original NeRF

"Turn the bear into a grizzly bear"

"Turn the bear into a sloth bear"

Original NeRF

"Turn the bear doll into a polar bear"

Recoloring comparison

Original NeRF

CLIP-NeRF [1]
(~10 seconds)

DFF [2] (selection with DFF + CLIP-NeRF)

RecolorNeRF [3]
(~120 seconds)

Ours
(10 seconds)

Comparison with InstructNeRF2NeRF [2]

Our 2×2 3D-aware image context can be used with InstructNeRF2NeRF to significantly reduce editing time.

Original NeRF

"Give him a cowboy hat" -IN2N

"Give him a cowboy hat" -Ours

"Give him a mustache" -IN2N

"Give him a mustache" -Ours

Layered editing

Our TriplaneLite architecture allows each edit to be stored in a single MLP φ of only 4-36 KB. We can therefore perform layered editing of NeRFs, akin to layers in image editors, to enable controlled and creative workflows.

c: Original NeRF

L1(c): "Orange flower"

L2(c): "Tone mapping"

Combined: L2(L1(c))
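The composition L2(L1(c)) above can be sketched as plain function composition over per-edit layers. In the sketch below, each layer is a hand-written color function standing in for a learned MLP φ; the function names, the orange target color, the gamma value, and the layer widths used in the size estimate are all hypothetical and chosen only for illustration.

```python
from functools import reduce

ORANGE = (1.0, 0.5, 0.0)  # hypothetical target color for the "Orange flower" layer


def orange_layer(color):
    """Stand-in for L1: blend an RGB color halfway toward orange."""
    return tuple(0.5 * c + 0.5 * o for c, o in zip(color, ORANGE))


def tone_map_layer(color, gamma=2.2):
    """Stand-in for L2: a simple gamma curve as a tone-mapping edit."""
    return tuple(c ** (1.0 / gamma) for c in color)


def compose(*layers):
    """Apply layers left to right, so compose(L1, L2)(c) == L2(L1(c))."""
    return lambda color: reduce(lambda c, layer: layer(c), layers, color)


def mlp_size_bytes(widths, bytes_per_param=4):
    """Size of a dense MLP with biases, given layer widths and float32 storage."""
    params = sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))
    return params * bytes_per_param


combined = compose(orange_layer, tone_map_layer)
```

Because every layer is stored separately, layers can be reordered, toggled, or dropped without retraining the others. As a rough size check, a hypothetical 3-32-32-3 MLP in float32 has 1,283 parameters, about 5 KB, which falls within the 4-36 KB range quoted above.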

Ablation

Results using our 2×2 3D-aware image context for NeRF editing show consistent geometry and appearance.

Single view

2×2 Iterative 3D-aware image context

Single view

2×2 Iterative 3D-aware image context
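The difference between single-view editing and the 2×2 iterative context can be illustrated with a small sketch. It assumes the context is formed by tiling four rendered views into one image so that a 2D editing tool processes them jointly, after which the tiles are split apart and used to fine-tune the NeRF. The callables `render_views`, `edit_image`, and `finetune` are hypothetical placeholders, not functions from the released code, and images are plain nested lists to keep the example self-contained.

```python
def make_context(views):
    """Tile four equally sized images (lists of pixel rows) into a 2x2 grid."""
    if len(views) != 4:
        raise ValueError("expected exactly four views")
    a, b, c, d = views
    top = [row_a + row_b for row_a, row_b in zip(a, b)]
    bottom = [row_c + row_d for row_c, row_d in zip(c, d)]
    return top + bottom


def split_context(grid):
    """Inverse of make_context: cut a 2x2 grid image back into four views."""
    h, w = len(grid) // 2, len(grid[0]) // 2
    return [
        [row[:w] for row in grid[:h]],  # top-left
        [row[w:] for row in grid[:h]],  # top-right
        [row[:w] for row in grid[h:]],  # bottom-left
        [row[w:] for row in grid[h:]],  # bottom-right
    ]


def iterative_edit(render_views, edit_image, finetune, rounds=3):
    """Repeat render -> tile -> joint 2D edit -> split -> fine-tune."""
    for _ in range(rounds):
        views = render_views()                    # four renders of the current NeRF
        edited = edit_image(make_context(views))  # one edit applied to all four tiles
        finetune(split_context(edited))           # distil the edited views back
```

Editing the four tiles as one image gives the 2D tool shared context across views, and re-rendering before each round lets later edits start from the already fine-tuned NeRF.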

BibTeX


      @article{10.1145/3651290,
        author = {Wang, Binglun and Dutt, Niladri Shekhar and Mitra, Niloy J.},
        title = {ProteusNeRF: Fast Lightweight NeRF Editing using 3D-Aware Image Context},
        year = {2024},
        issue_date = {May 2024},
        publisher = {Association for Computing Machinery},
        address = {New York, NY, USA},
        volume = {7},
        number = {1},
        url = {https://doi.org/10.1145/3651290},
        doi = {10.1145/3651290},
        abstract = {Neural Radiance Fields (NeRFs) have recently emerged as a popular option for photo-realistic object capture due to their ability to faithfully capture high-fidelity volumetric content even from handheld video input. Although much research has been devoted to efficient optimization leading to real-time training and rendering, options for interactive editing NeRFs remain limited. We present a very simple but effective neural network architecture that is fast and efficient while maintaining a low memory footprint. This architecture can be incrementally guided through user-friendly image-based edits. Our representation allows straightforward object selection via semantic feature distillation at the training stage. More importantly, we propose a local 3D-aware image context to facilitate view-consistent image editing that can then be distilled into fine-tuned NeRFs, via geometric and appearance adjustments. We evaluate our setup on a variety of examples to demonstrate appearance and geometric edits and report 10-30x speedup over concurrent work focusing on text-guided NeRF editing. Video results and code can be found on our project webpage at https://proteusnerf.github.io.},
        journal = {Proc. ACM Comput. Graph. Interact. Tech.},
        month = {may},
        articleno = {22},
        numpages = {17},
        keywords = {Generative AI, Interactive 3D Editing, Neural Editing, Neural Radiance Field, ProteusNeRF, Stable Diffusion Model}
      }