Instant Neural Graphics Primitives with Multiresolution Hash Encoding
_YAI_ 2022. 9. 26. 22:04
https://arxiv.org/abs/2201.05989
*This review was written by 조정빈 of the 3D team (YAI 9th cohort).
1. Introduction
Computer graphics primitives are traditionally represented by mathematical functions → MLPs are now used as neural graphics primitives, e.g. NeRF, Neural Sparse Voxel Fields, DeepSDF, ACORN
- The inputs of the neural network need to be encoded (mapped) into higher dimensions to extract high approximation quality from compact models
- [-] Heuristic, structural modifications complicate training, are task-specific, and limit GPU performance
- Multiresolution hash encoding
  - Adaptivity
    - Maps a cascade of grids to fixed-size arrays of feature vectors → no structural updates needed
    - Coarse resolutions → 1:1 mapping
    - Fine resolutions → a spatial hash function automatically prioritizes sparse areas with the most important fine detail
  - Efficiency
    - $O(1)$ hash table lookup
    - No pointer-chasing
  - Independent of task
    - Gigapixel image, Neural SDF, NRC, NeRF
2. Background and Related work
Frequency Encodings
- The Transformer introduced encoding scalar positions as a multiresolution sequence of $L \in \mathbb{N}$ sine and cosine functions:
$$enc(x) = \Bigl( \sin(2^0x),\sin(2^1x),\dots ,\sin(2^{L-1}x), \ \cos(2^0x),\cos(2^1x),\dots ,\cos(2^{L-1}x) \Bigr)$$
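A minimal PyTorch sketch of this frequency encoding, applied independently to each input dimension (the function name and defaults are illustrative):

```python
import torch

def frequency_encode(x: torch.Tensor, L: int = 10) -> torch.Tensor:
    """Encode each input coordinate as L sine/cosine pairs (Transformer/NeRF-style)."""
    freqs = 2.0 ** torch.arange(L, dtype=x.dtype)              # 2^0, 2^1, ..., 2^(L-1)
    angles = x[..., None] * freqs                               # shape (..., d, L)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)                            # shape (..., d * 2L)

y = frequency_encode(torch.tensor([[0.1, 0.5, 0.9]]), L=4)
print(y.shape)  # torch.Size([1, 24])
```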
Parametric Encodings
- Arrange additional trainable parameters in an auxiliary data structure, e.g. a grid or a tree, and look up and interpolate these parameters depending on the input vector (a minimal dense-grid sketch is shown below)
- Trades a larger memory footprint for a smaller computational cost
- For each gradient step, every parameter in the MLP needs to be updated, but only a small number of the trainable input encoding parameters are affected
- By reducing the size of the MLP, such parametric models can typically be trained to convergence much faster without sacrificing approximation quality
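As an illustration of such a look-up-and-interpolate scheme, here is a minimal PyTorch sketch of a dense 2D feature grid queried with bilinear interpolation (sizes and names are illustrative; only the vertices each sample touches receive gradients):

```python
import torch
import torch.nn.functional as F

# Trainable 2D feature grid: F_dim features at each of (N+1) x (N+1) vertices
N, F_dim = 128, 2
grid = torch.nn.Parameter(torch.zeros(1, F_dim, N + 1, N + 1))

def dense_grid_encode(x: torch.Tensor) -> torch.Tensor:
    """x in [0, 1]^2 -> bilinearly interpolated feature vectors of shape (batch, F_dim)."""
    coords = (x * 2 - 1).view(1, -1, 1, 2)     # grid_sample expects coordinates in [-1, 1]
    feats = F.grid_sample(grid, coords, mode="bilinear", align_corners=True)
    return feats.view(F_dim, -1).t()           # gradients flow only into the touched vertices
```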
Coordinate Encoder
- A large auxiliary coordinate encoder neural network (ACORN) is trained to output dense feature grids around $\textbf{x}$
Sparse Parametric Encodings
- Dense grids → consume a lot of memory and have a fixed resolution
- Allocate too much memory to empty regions of space
- Natural scenes exhibit smoothness → motivates a multiresolution decomposition
Multiresolution Hash Encoding
- Compact spatial hash table whose size can be tuned; does not rely on a priori knowledge of the scene or on pruning during training
- Multiple separate hash tables indexed at different resolutions
3. Method
Parameter | Symbol | Value |
---|---|---|
Number of levels | $L$ | 16 |
Max. entries per level (hash table size) | $T$ | $2^{14}$ ~ $2^{24}$ |
Number of feature dimensions per entry | $F$ | 2 |
Coarsest resolution | $N_{min}$ | 16 |
Finest resolution | $N_{max}$ | 512 ~ 524288 |
Multi-Resolution Hash Encoding
Given a fully connected neural network $m(\textbf{y};\Phi)$, we are interested in an encoding of its inputs $\textbf{y}=\text{enc}(\textbf{x}; \theta)$ that improves the approximation quality and training speed across a wide range of applications.
Procedure
- The input coordinate $\textbf{x} \in \mathbb{R}^d$ is scaled by each level's grid resolution $N_l$ and rounded down and up
  - $\lfloor \textbf{x}_l \rfloor := \lfloor \textbf{x} \cdot N_l \rfloor$, $\lceil \textbf{x}_l \rceil := \lceil \textbf{x} \cdot N_l \rceil$
  - $\lfloor \textbf{x}_l \rfloor , \lceil \textbf{x}_l \rceil$ span a voxel with $2^d$ integer vertices
- Coarse levels : $V: (N_l + 1)^d \le T$
- mapping is 1 : 1
- Fine levels : $V: (N_l + 1)^d > T$
- hash function to index the array
- $h(\textbf{x})=\Biggl(\bigoplus^{d}_{i=1} x_i \pi_i \Biggr) \ \ \ \mod T$
- $h : \mathbb{Z}^d \rightarrow \mathbb{Z}_T$
- No explicit collision handling as the following $m(\textbf{y};\Phi)$ handles it
- Coarse levels : $V: (N_l + 1)^d \le T$
- Feature vectors are d-linearly interpolated according to the relative position of $\textbf{x}$ within its hypercube
- Interpolated feature vectors at each level and auxiliary inputs are concatenated
$$\text{enc}(\textbf{x}; \theta) \rightarrow\textbf{y} \in \mathbb{R}^{LF+E}$$
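Putting the procedure above together, here is a simplified, untrained PyTorch sketch of the encoding for the 2D case. The hyperparameters follow the table above and the hash primes are those reported in the paper ($\pi_1 = 1$); for simplicity every level is hashed here, whereas the actual method uses a direct 1:1 index at coarse levels, and all function and variable names are illustrative.

```python
import math
import torch

# Hyperparameters from the table above (N_max kept small for a toy demo)
L_levels, T, F = 16, 2**14, 2
N_min, N_max = 16, 512
b = math.exp((math.log(N_max) - math.log(N_min)) / (L_levels - 1))  # growth factor b, defined in the next subsection

# Spatial-hash primes reported in the paper (pi_1 = 1); one trainable (T, F) table per level
PRIMES = torch.tensor([1, 2654435761], dtype=torch.int64)
tables = torch.nn.ParameterList(
    [torch.nn.Parameter(torch.empty(T, F).uniform_(-1e-4, 1e-4)) for _ in range(L_levels)]
)

def hash_index(corner: torch.Tensor) -> torch.Tensor:
    """h(x) = (XOR_i x_i * pi_i) mod T for integer vertex coordinates."""
    h = torch.zeros(corner.shape[:-1], dtype=torch.int64)
    for i in range(corner.shape[-1]):
        h = h ^ (corner[..., i] * PRIMES[i])
    return h % T

def encode(x: torch.Tensor) -> torch.Tensor:
    """Multiresolution hash encoding: points x in [0, 1]^2 -> features of shape (batch, L*F)."""
    features = []
    for l in range(L_levels):
        N_l = math.floor(N_min * b ** l)               # grid resolution of this level
        xl = x * N_l
        x0 = xl.floor().long()                         # lower voxel corner
        w = xl - x0                                    # d-linear interpolation weights
        f = 0.0
        for dx in (0, 1):                              # visit the 2^d corners of the voxel
            for dy in (0, 1):
                corner = x0 + torch.tensor([dx, dy])
                weight = ((w[..., 0] if dx else 1 - w[..., 0]) *
                          (w[..., 1] if dy else 1 - w[..., 1])).unsqueeze(-1)
                f = f + weight * tables[l][hash_index(corner)]
        features.append(f)
    return torch.cat(features, dim=-1)                  # y = enc(x; theta), fed to the MLP m(y; Phi)

y = encode(torch.rand(4, 2))
print(y.shape)  # torch.Size([4, 32]) == (batch, L*F)
```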
Choice of Grid Resolution for Each Level
- The resolution of each level is chosen to be a geometric progression between coarsest and finest resolutions $[N_{min},N_{max}]$
$$N_l :=\lfloor N_{min} \cdot b^l \rfloor$$
$$b :=\exp \Bigl( \frac{\ln N_{max}-\ln N_{min}}{L-1} \Bigr)$$
- $N_{max}$ is chosen to match the finest detail in training data
- Growth factor $b$ is usually small, $b\in [1.26,2]$
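For instance, with the default configuration above ($N_{min}=16$, $N_{max}=512$, $L=16$):

$$b = \exp \Bigl( \frac{\ln 512 - \ln 16}{16-1} \Bigr) = 32^{1/15} \approx 1.26, \qquad N_0 = 16,\ N_1 = \lfloor 16 \cdot 1.26\ldots \rfloor = 20,\ \dots,\ N_{15} = 512$$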
Performance vs Quality
- Memory grows linearly with $T$, while quality and performance scale sub-linearly
Implicit hash collision resolution
- Low levels
  - No collisions, as the mapping is 1:1
  - Low resolution: features are interpolated from a widely spaced grid of points
- High levels
  - Lots of collisions → gradients are averaged
  - Points on visible surfaces that contribute highly will have larger gradients
As a result, the gradients of the more important samples dominate the collision average, and the aliased table entry is naturally optimized in such a way that it reflects the needs of the higher-weighted point.
- Small features can still be captured thanks to the fine grid resolution
Online Adaptivity
- If the inputs $\textbf{x}$ become concentrated in a smaller region during training → fewer collisions at the fine levels → a more accurate function can be learned
- Multiresolution hash encoding automatically adapts to the training data distribution
D-Linear Interpolation
- To ensure that the encoding and its composition with the MLP are continuous
$$m(\text{enc}(\textbf{x};\theta) ; \Phi)$$
4. Implementation
- Hash table entries are stored at half precision (2 bytes per entry), while a master copy of the parameters is maintained in full precision for stable mixed-precision parameter updates
- Evaluate the hash tables level by level
  - First the first level of the multiresolution hash encoding is evaluated for all inputs, then the second level, and so on
  - → only a small number of hash tables need to reside in caches at any time
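The mixed-precision idea in a rough, hypothetical PyTorch sketch (this is not the paper's fused CUDA implementation; `master`, `table`, and `apply_update` are illustrative names):

```python
import torch

# Keep an fp32 master copy of the table; compute the forward pass with an fp16 working copy.
master = torch.zeros(2**14, 2, dtype=torch.float32)   # full-precision master parameters
table = master.half()                                   # half-precision copy used for lookups

def apply_update(grad: torch.Tensor, lr: float = 1e-2) -> None:
    """Accumulate the (half-precision) gradient into the fp32 master, then refresh the fp16 copy."""
    global table
    master.add_(grad.float(), alpha=-lr)                # stable update in full precision
    table = master.half()                               # working copy for the next lookup
```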
5. Results
6. Discussion and Future work
Concatenation vs Reduction
Concatenation allows:
- Fully parallel processing of each resolution level
- Keeping each level's dimensions, which helps encode useful information compared with a reduction (e.g. summing the levels)
Reduction can be favorable when:
- The MLP following the encoding is so large that the cost of increasing $F$ is insignificant in comparison
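A toy illustration of the two aggregation schemes (shapes only; the values are random and purely illustrative):

```python
import torch

# Per-level features for a batch of 4 points, L = 16 levels, F = 2 features each
per_level = [torch.randn(4, 2) for _ in range(16)]

concat  = torch.cat(per_level, dim=-1)              # (4, 32): keeps every level's dimensions
reduced = torch.stack(per_level, dim=0).sum(dim=0)  # (4, 2):  sums levels into a single vector
print(concat.shape, reduced.shape)
```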