DONeRF: Towards Real-Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks

Computer Graphics Forum (EGSR 2021)

Thomas Neff
Graz University of Technology

Pascal Stadlbauer
Graz University of Technology

Mathias Parger
Graz University of Technology

Andreas Kurz
Graz University of Technology

Joerg H. Mueller
Graz University of Technology

Chakravarty R. Alla Chaitanya
Facebook Reality Labs

Anton Kaplanyan
Facebook Reality Labs

Markus Steinberger
Graz University of Technology

[Paper on CGF]

[ArXiv]

[EGSR Presentation (at 1:37:23)]

[EGSR Slides]

[Code, Datasets]

[Two Minute Papers]

Abstract & Method

The recent research explosion around Neural Radiance Fields (NeRFs) shows that there is immense potential for implicitly storing scene and lighting information in neural networks, e.g., for novel view generation. However, one major limitation preventing the widespread use of NeRFs is the prohibitive computational cost of excessive network evaluations along each view ray, requiring dozens of petaFLOPS when aiming for real-time rendering on current devices. We show that the number of samples required for each view ray can be significantly reduced when local samples are placed around surfaces in the scene. To this end, we propose a depth oracle network, which predicts ray sample locations for each view ray with a single network evaluation. We show that using a classification network around logarithmically discretized and spherically warped depth values is essential to encode surface locations rather than directly estimating depth. The combination of these techniques leads to DONeRF, a dual network design with a depth oracle network as a first step and a locally sampled shading network for ray accumulation. With our design, we reduce the inference costs by up to 48x compared to NeRF. Using an off-the-shelf inference API in combination with simple compute kernels, we are the first to render raymarching-based neural representations at interactive frame rates (15 frames per second at 800x800) on a single GPU. At the same time, since we focus on the important parts of the scene around surfaces, we achieve equal or better quality compared to NeRF.

Real-Time View Synthesis

Due to our novel depth oracle sampling scheme, DONeRF achieves quality similar to NeRF, which uses a total of 256 samples. At only 4 samples (comparison to NeRF below), DONeRF achieves a speedup of 20x-48x at the same quality. Click / Drag the Sliders to compare various outputs between DONeRF, NeRF and Ground Truth Blender renderings.

Depth Oracle Predicition

Our Depth Oracle predicts multiple potential sampling candidates along each ray by discretizing the space along rays and predicting sampling probabilities along rays. The 3 color channels encode the 3 highest probabilities along the ray - gray values illustrate that there is likely only a single surface that should be sampled, while colorful values indicate that samples need to be spread out in depth. Even a relatively coarse depth prediction is sufficient for DONeRF to place samples efficiently.

FLIP Comparison

We use the FLIP error estimator to produce error maps that model how likely humans would perceive errors when "flipping" between an image and the target output. DONeRF shows similar or better results at significantly lower performance requirements.

Video Summary

Paper and Supplementary Material

T. Neff, P. Stadlbauer, M. Parger, A. Kurz, J. H. Mueller, C. R. A. Chaitanya, A. Kaplanyan, M. Steinberger
DONeRF: Towards Real-Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks
(published in Computer Graphics Forum (EGSR 2021))

[Bibtex]

Acknowledgements

This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.