Yonsei University AI Society YAI

Instant Neural Graphics Primitives with Multiresolution Hash Encoding

_YAI_ 2022. 9. 26. 22:04

https://arxiv.org/abs/2201.05989

 


*This review was written by 조정빈 (9th cohort) of the 3D team.

1. Introduction

Computer graphics primitives are represented by mathematical functions → MLPs are used as neural graphics primitives, e.g. NeRF, Neural Sparse Voxel Fields, DeepSDF, ACORN

  • The inputs of the neural network need to be encoded (mapped) into higher dimensions to extract high approximation quality from compact models
    • [-] heuristic, structural modifications that complicate training → task-specific, limit GPU performance
  • Multiresolution hash encoding
    • Adaptivity
      • Maps a cascade of grids to fixed-size arrays of feature vectors → no structural updates needed
      • coarse resolutions → 1:1 mapping
      • fine resolutions → a spatial hash function automatically prioritizes the sparse areas with the most important fine detail
    • Efficiency
      • $O(1)$ hash table lookup
      • No pointer-chasing
    • Independent of task
      • Gigapixel images, neural SDFs, NRC (neural radiance caching), NeRF

2. Background and Related Work

Frequency Encodings

  • The Transformer introduced an encoding of scalar positions as a multiresolution sequence of $L \in \mathbb{N}$ sine and cosine functions

$$enc(x) = \Bigl( \sin(2^0x),\sin(2^1x),\dots ,\sin(2^{L-1}x), \ \cos(2^0x),\cos(2^1x),\dots ,\cos(2^{L-1}x) \Bigr)$$
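A minimal NumPy sketch of this frequency encoding (the function name and the default $L$ are my own choices, not from the paper):

```python
import numpy as np

def frequency_encoding(x, L=10):
    """Encode positions x component-wise with L sine and L cosine frequencies."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    freqs = 2.0 ** np.arange(L)                    # 2^0, 2^1, ..., 2^(L-1)
    angles = x[:, None] * freqs[None, :]           # shape (len(x), L)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1).ravel()

print(frequency_encoding(0.5, L=4))
# -> [sin(0.5) sin(1.0) sin(2.0) sin(4.0) cos(0.5) cos(1.0) cos(2.0) cos(4.0)]
```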

Parametric Encodings

  • Arrange additional trainable parameters in an auxiliary data structure, e.g. a grid or a tree, and look up and interpolate these parameters depending on the input vector
  • Larger memory footprint for a smaller computational cost
    • For each gradient step, every parameter of the MLP needs to be updated, but only a small number of the trainable encoding parameters are affected (see the sketch after this list)
    • By reducing the size of the MLP, such parametric models can typically be trained to convergence much faster without sacrificing approximation quality
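To illustrate the gradient-sparsity argument above, here is a toy 1D sketch (the grid size `R`, feature dimension `F`, and names are my own): a lookup with linear interpolation touches only two of the `R` trainable rows, so only those rows receive gradients for a given sample, whereas every weight of the MLP is touched by every sample.

```python
import numpy as np

R, F = 64, 2                       # hypothetical 1D feature grid: R entries, F features each
table = np.zeros((R, F))           # trainable encoding parameters (auxiliary data structure)

def lookup(x):
    """Linearly interpolate the feature vector for x in [0, 1)."""
    xs = x * (R - 1)
    i = int(np.floor(xs))
    w = xs - i
    # Only rows i and i+1 participate, so only they receive gradients for this sample.
    return (1.0 - w) * table[i] + w * table[i + 1]

print(lookup(0.42))                # [0. 0.] until the table has been trained
```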

Coordinate Encoder

  • A large auxiliary coordinate encoder neural network (ACORN) is trained to output dense feature grids around $\textbf{x}$

Sparse Parametric Encodings

  • Dense grids → consume a lot of memory and are limited to a fixed resolution
    • Allocate too much memory to empty space
    • Natural scenes exhibit smoothness → motivates a multiresolution decomposition

Multiresolution Hash Encoding

  • A compact spatial hash table whose size $T$ can be tuned; does not rely on a priori knowledge of the data or on pruning during training
  • Multiple separate hash tables indexed at different resolutions

3. Method

| Parameter | Symbol | Value |
| --- | --- | --- |
| Number of levels | $L$ | $16$ |
| Max. entries per level (hash table size) | $T$ | $2^{14}$ to $2^{24}$ |
| Number of feature dimensions per entry | $F$ | $2$ |
| Coarsest resolution | $N_{min}$ | $16$ |
| Finest resolution | $N_{max}$ | $512$ to $524288$ |

Multi-Resolution Hash Encoding

Given a fully connected neural network $m(\textbf{y};\Phi)$, we are interested in an encoding of its inputs $\textbf{y}=\text{enc}(\textbf{x}; \theta)$ that improves the approximation quality and training speed across a wide range of applications.

Procedure

  1. The input coordinate $\textbf{x} \in \mathbb{R}^d$ is scaled by each level’s grid resolution and rounded down and up
    • $\lfloor \textbf{x}_l \rfloor:= \lfloor \textbf{x} \cdot N_l \rfloor$, $\lceil \textbf{x}_l \rceil := \lceil \textbf{x} \cdot N_l \rceil$
  2. $\lfloor \textbf{x}_l \rfloor , \lceil \textbf{x}_l \rceil$ span a voxel with $2^d$ integer vertices
    • Coarse levels: $(N_l + 1)^d \le T$
      • the mapping is 1:1
    • Fine levels: $(N_l + 1)^d > T$
      • a spatial hash function indexes the array
      • $h(\textbf{x})=\Bigl(\bigoplus^{d}_{i=1} x_i \pi_i \Bigr) \mod T$, where $\oplus$ is a bit-wise XOR and the $\pi_i$ are unique, large prime numbers
      • $h : \mathbb{Z}^d \rightarrow \mathbb{Z}_T$
      • No explicit collision handling; the subsequent MLP $m(\textbf{y};\Phi)$ learns to resolve collisions
  3. Feature vectors are $d$-linearly interpolated according to the relative position of $\textbf{x}$ within its hypercube
  4. The interpolated feature vectors of all $L$ levels, as well as auxiliary inputs $\xi \in \mathbb{R}^E$, are concatenated (a sketch of the full lookup follows the output equation below)

$$\text{enc}(\textbf{x}; \theta) \rightarrow\textbf{y} \in \mathbb{R}^{LF+E}$$
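Putting the procedure together, the following NumPy sketch performs the lookup for a single input coordinate. It is illustrative only: the paper’s implementation is a fully fused CUDA kernel, and the table initialization, function names, and the way the growth factor $b$ (defined in the next subsection) is passed in are my own choices; the primes in the hash follow those listed in the paper.

```python
import numpy as np

PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)  # primes from the paper

def spatial_hash(coords, T):
    """Hash integer grid coordinates of shape (n, d) to indices in [0, T)."""
    h = np.zeros(coords.shape[0], dtype=np.uint64)
    for i in range(coords.shape[1]):
        h ^= coords[:, i].astype(np.uint64) * PRIMES[i]          # XOR of coordinate * prime
    return (h % np.uint64(T)).astype(np.int64)

def encode(x, tables, N_min=16, b=1.5):
    """x: input in [0, 1)^d. tables: list of L arrays of shape (T, F). Returns (L*F,) features."""
    d = x.shape[0]
    # All 2^d corner offsets of a voxel, e.g. (0,0), (0,1), (1,0), (1,1) for d = 2.
    corners = np.array(np.meshgrid(*[[0, 1]] * d, indexing="ij")).reshape(d, -1).T
    feats = []
    for l, table in enumerate(tables):
        T, F = table.shape
        N_l = int(np.floor(N_min * b ** l))            # grid resolution of level l
        xl = x * N_l
        lo = np.floor(xl).astype(np.int64)             # rounded-down corner
        w = xl - lo                                    # relative position inside the voxel
        vertices = lo[None, :] + corners               # the 2^d integer vertices
        if (N_l + 1) ** d <= T:                        # coarse level: 1:1 mapping
            idx = np.ravel_multi_index(vertices.T, dims=(N_l + 1,) * d)
        else:                                          # fine level: spatial hash (collisions allowed)
            idx = spatial_hash(vertices, T)
        corner_feats = table[idx]                      # (2^d, F) looked-up feature vectors
        weights = np.prod(np.where(corners == 1, w, 1.0 - w), axis=1)  # d-linear weights
        feats.append(weights @ corner_feats)           # interpolated (F,) vector for this level
    return np.concatenate(feats)                       # (L*F,); auxiliary inputs would be appended

# Usage: L = 16 levels, T = 2^14 entries, F = 2 features, d = 2 input coordinate.
rng = np.random.default_rng(0)
tables = [rng.uniform(-1e-4, 1e-4, size=(2**14, 2)) for _ in range(16)]
print(encode(np.array([0.3, 0.7]), tables).shape)      # (32,)
```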

Choice of Grid Resolution for Each Level

  • The resolution of each level is chosen to be a geometric progression between the coarsest and finest resolutions $[N_{min},N_{max}]$

$$N_l :=\lfloor N_{min} \cdot b^l \rfloor$$

$$b :=\exp \Bigl( \frac{\ln N_{max}-\ln N_{min}}{L-1} \Bigr)$$

  • $N_{max}$ is chosen to match the finest detail in the training data
  • The growth factor $b$ is usually small, $b\in [1.26,2]$ (a small numeric check follows this list)
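For concreteness, a quick numeric check of these two formulas, assuming the defaults $N_{min}=16$, $L=16$ and picking $N_{max}=512$ from the lower end of the table above:

```python
import numpy as np

N_min, N_max, L = 16, 512, 16
b = np.exp((np.log(N_max) - np.log(N_min)) / (L - 1))    # growth factor, here b ≈ 1.26
N = [int(np.floor(N_min * b ** l)) for l in range(L)]
print(round(float(b), 4), N)    # resolutions grow geometrically from 16 up to ~512
```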

Performance vs Quality

  • Memory grows linearly with $T$, while quality and performance scale sub-linearly

Implicit Hash Collision Resolution

  • Low levels
    • No collisions, since the mapping is 1:1
    • Low resolution, as features are interpolated from a widely spaced grid of points
  • High levels
    • Many collisions → the gradients of colliding points are averaged
      • Points on a visible surface contribute strongly and thus have larger gradients

As a result, the gradients of the more important samples dominate the collision average, and the aliased table entry will naturally be optimized in such a way that it reflects the needs of the higher-weighted point (a toy example appears below).

  • Capture small features thanks to fine grid resolution
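As a toy illustration of this averaging (my own example, not from the paper): suppose two training points collide at a single table entry $e$ and pull it toward targets $t_1$ and $t_2$ with loss weights $w_1 \gg w_2$, e.g. a point on a visible surface versus a point in empty space. Minimizing the combined squared error

$$\mathcal{L}(e) = w_1 (e - t_1)^2 + w_2 (e - t_2)^2 \quad\Rightarrow\quad e^\ast = \frac{w_1 t_1 + w_2 t_2}{w_1 + w_2} \approx t_1$$

shows that the colliding entry ends up reflecting the higher-weighted point, as described above.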

Online Adaptivity

  • If the distribution of inputs $\textbf{x}$ concentrates in a smaller region during training → finer levels experience fewer collisions → a more accurate function can be learned
  • Multiresolution hash encoding automatically adapts to the training data distribution

D-Linear Interpolation

  • To ensure that the encoding and its composition with the MLP are continuous:

$$m(\text{enc}(\textbf{x};\theta) ; \Phi)$$

4. Implementation

  • Hash table entries are stored at half precision (2 bytes per entry), while a master copy of the parameters is maintained at full precision for stable mixed-precision parameter updates (a rough sketch follows this list)
  • Evaluate hash tables level by level
    • the first level of the multiresolution hash encoding is evaluated for all inputs, then the second level, and so on
    • → only a small number of hash tables have to reside in caches at any time
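A rough NumPy sketch of the mixed-precision update pattern described above (a stand-in for the paper’s fused CUDA implementation; the table shape, learning rate, and function name are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
# Full-precision "master" copy of one hash table; the fp16 copy is what the forward pass reads.
master = rng.uniform(-1e-4, 1e-4, size=(2**14, 2)).astype(np.float32)
table_fp16 = master.astype(np.float16)                   # 2 bytes per entry

def apply_gradients(grad, lr=1e-2):
    """Accumulate gradients into the fp32 master, then refresh the half-precision copy."""
    master[...] -= lr * grad.astype(np.float32)           # stable update in full precision
    table_fp16[...] = master                              # cast back down to half precision
```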

5. Results

6. Discussion and Future Work

Concatenation vs Reduction

Concatenation allows

  • Fully parallel processing of each resolution
  • Keeps the dimensions of every level, which helps the MLP make use of per-level information, compared with reduction

Reduction can be favorable when

  • the MLP that follows is so large that the cost of increasing $F$ is insignificant (a tiny comparison sketch follows)
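A tiny sketch of the two options, assuming $L=16$ levels with $F=2$ features each (my own illustration):

```python
import numpy as np

feats = [np.random.randn(2) for _ in range(16)]    # per-level feature vectors (L = 16, F = 2)
y_concat = np.concatenate(feats)                   # shape (32,): keeps each level's information
y_reduce = np.sum(feats, axis=0)                   # shape (2,): cheaper for the MLP, but lossy
```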