[Paper Review] "VOS: LEARNING WHAT YOU DON’T KNOW BY VIRTUAL OUTLIER SYNTHESIS"

래훈

2024. 7. 23. 10:36

Paper link : VOS

This is a review of the paper VOS: Learning What You Don't Know by Virtual Outlier Synthesis.

What

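The paper proposes VOS (Virtual Outlier Synthesis), a new framework that improves OOD detection by synthesizing virtual outliers to refine the decision boundary between ID and OOD data.
VOS samples virtual outliers from the low-likelihood region of the class-conditional distribution in the feature space, and introduces a new unknown-aware training objective that contrastively shapes the uncertainty surface between ID data and the synthesized outliers.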

Why

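Neural networks are optimized only on in-distribution (ID) data, which leaves them vulnerable to OOD inputs. In safety-critical settings such as autonomous driving, this can lead to catastrophic outcomes.
The goal is to separate ID data from OOD data effectively, improving the reliability and safety of neural networks deployed in the real world.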

How

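Virtual outlier synthesis : estimate the class-conditional distribution in the feature space and sample virtual outliers from its low-likelihood region. This is more tractable than generating outliers in the high-dimensional pixel space.
Unknown-aware training objective : contrastively shape the uncertainty surface between ID data and the synthesized outliers, so that the ID task and OOD detection are optimized jointly. ID and OOD data are separated based on the energy score.
Inference-time OOD detection : a logistic regression uncertainty branch produces a probability score that distinguishes ID objects from OOD objects.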

1. Introduction

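Modern deep neural networks have achieved unprecedented success in 'known' contexts, but they often struggle to handle the 'unknown'.
In particular, neural networks have been shown to produce high posterior probabilities for out-of-distribution (OOD) test inputs, which come from 'unknown' categories that the model should not predict.
In panel (a) of the figure, a moose (OOD) is predicted as a pedestrian (ID).
Because neural networks are optimized only on in-distribution (ID) data, they lack explicit knowledge of the unknown, which makes them vulnerable to OOD inputs.
ID data : the gray points in (b) and (c) are drawn from three class-conditional Gaussians, on which a three-way softmax classifier is trained.
In (b), the red shade marks regions that receive a high ID score.
- The classifier is overconfident in regions far away from the ID data, which causes problems for OOD detection.
As shown in (c), the model should instead learn a decision boundary that assigns low uncertainty to ID data and high OOD uncertainty everywhere else.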

Proposal

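The paper proposes VOS (Virtual Outlier Synthesis), a new unknown-aware learning framework.

It optimizes both the ID task and OOD detection.
The framework consists of three components that address outlier synthesis and effective model regularization with the synthesized outliers.
To synthesize outliers, VOS estimates the class-conditional distribution in the feature space and samples outliers from the low-likelihood region of the ID (in-distribution) classes.
- The key insight is that sampling in the feature space is more tractable than synthesizing images in the high-dimensional pixel space.
VOS proposes a new unknown-aware training objective that contrastively shapes the uncertainty surface between ID data and the synthesized outliers.
During training, VOS performs the ID task (e.g., classification or object detection) and OOD uncertainty regularization simultaneously.
At inference time, the uncertainty estimation branch produces a larger probability score for ID data and a smaller one for OOD data, which enables effective OOD detection.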

Advantages

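VOS offers several compelling advantages over existing solutions.

Whereas previous methods focused mainly on image classification, VOS is effective for both object detection and image classification.
VOS enables adaptive outlier synthesis, so it can be used flexibly and conveniently without data collection or cleaning. In contrast, previous methods require a sufficiently large auxiliary image dataset, plus data cleaning to make sure it does not overlap with the ID data.
VOS synthesizes outliers that allow a compact decision boundary between ID and OOD data to be estimated. In contrast, previous methods use outliers that are either too trivial to regularize the OOD estimator or too hard to separate from the ID data, which leads to suboptimal performance.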

2. Problem Setup

VOS generalizes easily to image classification when the bounding box is the entire image.

input space : $\mathcal{X} = \mathbb{R}^d$
input image : $\mathbf{x} \in \mathcal{X}$
bounding box coordinates : $\mathbf{b} \in \mathbb{R}^4$
label space : $\mathcal{Y} = \{1,2,\dots, K\}$
semantic label : $y \in \mathcal{Y}$
In-distribution data drawn from an unknown joint distribution $\mathcal{P}$ : $\mathcal{D}=\{(\mathbf{x}_{i}, \mathbf{b}_{i}, y_i)\}^N_{i=1}$
Parameter : $\theta$
- bounding box regression : $p_\theta(\mathbf{b}|\mathbf{x})$
- classification : $p_\theta(y|\mathbf{x},\mathbf{b})$

OOD detection can be formulated as a binary classification problem that distinguishes in-distribution objects from out-of-distribution objects.

$P_{\mathcal{X}}$ : marginal probability distribution over $\mathcal{X}$
test input : $\mathbf{x}^{\ast} \sim P_{\mathcal{X}}$
Object Detector Prediction : $\mathbf{b}^\ast$

Goal : predict $p_\theta(g|\mathbf{x}^{\ast},\mathbf{b}^\ast)$

$g = 1$ : in-distribution
$g = 0$ : out-of-distribution (not contained in $\mathcal{Y}$)

3. Method

How to synthesize the virtual outliers (3.1)
How to leverage the synthesized outliers for effective model regularization (3.2)
How to perform OOD detection during inference time (3.3)

3.1 VOS: Virtual Outliers Synthesis

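VOS generates virtual outliers for model regularization without relying on external data.
Generative models such as GANs synthesize images in the high-dimensional pixel space, which is difficult to optimize. VOS instead synthesizes virtual outliers in the feature space, which is lower-dimensional and therefore more tractable.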

assumption

The feature representation of object instances is assumed to form a class-conditional multivariate Gaussian distribution.
$$p_\theta(h(\mathbf{x}, \mathbf{b})|y = k) = \mathcal{N}(\pmb {\mu}_k ,\Sigma)$$
$\pmb{\mu}_k$ : Gaussian mean of class $k \in \{1,2, \dots, K\}$
$\Sigma$ : tied covariance matrix
$h(\mathbf{x}, \mathbf{b}) \in \mathbb{R}^m$ : latent representation of the object instance $(\mathbf{x}, \mathbf{b})$, where $m < d$ ($d$ is the input dimension)
The latent representation is extracted from the penultimate layer of the neural network.

To estimate the parameters of the class-conditional Gaussians, $\widehat{\pmb{\mu}}_{k}$ and $\widehat{\Sigma}$ are computed from $\{(\mathbf{x}_{i}, \mathbf{b}_{i}, y_i)\}^N_{i=1}$.
$$
\begin{align}
& \widehat{\pmb{\mu}}_{k} = \frac{1}{N_k} \sum_{i: y_{i}= k} h(\mathbf{x}_{i}, \mathbf{b}_{i}) \tag{1} \\
& \widehat{\Sigma} = \frac{1}{N} \sum_{k} \sum_{i:y_i=k}(h(\mathbf{x}_{i}, \mathbf{b}_{i}) - \widehat{\pmb{\mu}}_k)(h(\mathbf{x}_{i}, \mathbf{b}_{i}) - \widehat{\pmb{\mu}}_{k})^\top \tag{2}
\end{align}
$$

$N_k$ : number of objects in class $k$
$N$ : total number of objects
Online estimation is used for efficient training.
A class-conditional queue holding $\vert{Q_k}\vert$ (queue size) object instances is maintained for each class.
At each iteration, the object embeddings are enqueued into their class-conditional queues, and the same number of object embeddings is dequeued so that the queue size stays fixed.
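The estimation in Eq. (1) and (2), combined with the class-conditional queues, can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the class name `ClassConditionalQueues`, the FIFO eviction policy, and the toy 2-D features are all assumptions made for the example, and $N$ here is the number of objects currently held in the queues.

```python
from collections import deque

import numpy as np


class ClassConditionalQueues:
    """Fixed-size queue of feature vectors per class (FIFO eviction is an assumption)."""

    def __init__(self, num_classes, dim, queue_size):
        self.dim = dim
        # deque(maxlen=...) drops the oldest entry when appending to a full queue
        self.queues = [deque(maxlen=queue_size) for _ in range(num_classes)]

    def push(self, feature, label):
        self.queues[label].append(np.asarray(feature, dtype=float))

    def estimate(self):
        """Return class means (Eq. 1) and the tied covariance (Eq. 2) of the queued features."""
        means = np.zeros((len(self.queues), self.dim))
        scatter = np.zeros((self.dim, self.dim))
        total = 0
        for k, queue in enumerate(self.queues):
            if not queue:
                continue  # no samples for this class yet
            feats = np.stack(list(queue))
            means[k] = feats.mean(axis=0)
            centered = feats - means[k]
            scatter += centered.T @ centered  # sum of outer products for class k
            total += len(feats)
        return means, scatter / total


# Toy data: two classes centered at (-2, -2) and (2, 2) with unit variance
rng = np.random.default_rng(0)
queues = ClassConditionalQueues(num_classes=2, dim=2, queue_size=200)
for _ in range(500):
    queues.push(rng.normal(-2.0, 1.0, size=2), label=0)
    queues.push(rng.normal(2.0, 1.0, size=2), label=1)
means, cov = queues.estimate()
```

Only the most recent 200 embeddings per class contribute to the estimate, so the statistics follow the features as the network changes during training.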

Sampling from the feature representation space

Virtual outliers sampled from the feature representation space should help estimate a more compact decision boundary between ID and OOD data.
To this end, the virtual outliers $\mathcal{V}_k$ are sampled from the $\epsilon$-likelihood region of the estimated class-conditional distribution.

$$
\begin{align}
\mathcal{V}_k = \{\mathbf{v}_k \mid \frac{1}{(2\pi)^{m/2}|\widehat\Sigma|^{1/2}} \exp \left( -\frac{1}{2} (\mathbf{v}_k - \widehat{\pmb{\mu}}_k)^\top \widehat\Sigma^{-1} (\mathbf{v}_k - \widehat{\pmb{\mu}}_k) \right) < \epsilon\} \tag{3}
\end{align}
$$

$\mathbf{v}_{k} \sim \mathcal{N}(\widehat{\pmb{\mu}}_k,\widehat\Sigma)$ : virtual outlier sampled for class $k$
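One way to read Eq. (3) is as rejection sampling: draw candidates from the estimated Gaussian and keep only those whose density falls below $\epsilon$. The sketch below follows that reading in NumPy. The function name `sample_virtual_outliers`, the candidate count, and the choice of $\epsilon$ are assumptions for illustration, and the comparison is done in log space because densities in high dimensions underflow easily.

```python
import numpy as np


def sample_virtual_outliers(mean, cov, log_epsilon, num_candidates=10000, seed=0):
    """Draw candidates from N(mean, cov) and keep those with log-density < log_epsilon (Eq. 3)."""
    rng = np.random.default_rng(seed)
    m = mean.shape[0]
    candidates = rng.multivariate_normal(mean, cov, size=num_candidates)
    diff = candidates - mean
    # Squared Mahalanobis distance (v - mu)^T Sigma^{-1} (v - mu) for every candidate
    maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    _, logdet = np.linalg.slogdet(cov)
    log_density = -0.5 * (maha + m * np.log(2.0 * np.pi) + logdet)
    return candidates[log_density < log_epsilon]


# Toy setup: standard 2-D Gaussian, epsilon set to the density at Mahalanobis distance 3
mean = np.zeros(2)
cov = np.eye(2)
log_eps = -0.5 * (3.0 ** 2 + 2 * np.log(2.0 * np.pi))
outliers = sample_virtual_outliers(mean, cov, log_eps)
```

With this choice of $\epsilon$, every kept sample lies more than three standard deviations from the class mean, that is, on the boundary side of the class distribution.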

Classification outputs for virtual outliers

$\mathbf{v} \in \mathbb{R}^m$ : sampled virtual outlier
The output of the classification branch is computed with $\mathbf{v}$ as input.
$$
f(\mathbf{v};\theta) = W_{cls}^\top\mathbf{v}\tag{4}
$$

$W_{cls} \in \mathbb{R}^{m \times K}$ : weight matrix of the last fully connected layer

3.2 Unknown-Aware Training Objective

While performing the visual recognition task, the model is regularized to assign a low OOD score to in-distribution data and a high OOD score to the virtual outliers.

Uncertainty regularization for classification

Regularization for multi-class classification
Log partition function : $E(\mathbf{x};\theta) := - \log \sum_{k=1}^{K} \exp(f_k(\mathbf{x};\theta))$
$$
p(y|\mathbf{x}) = \frac{p(\mathbf{x}, y)}{p(\mathbf{x})} = \frac{e^{f_y(\mathbf{x};\theta)}}{\sum_{k=1}^{K}e^{f_k(\mathbf{x};\theta)}}
$$
$f_y(\mathbf{x};\theta)$ : the $y$-th element of the logit output, corresponding to label $y$
$f_k(\mathbf{x};\theta)$ : logit output for class $k$
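The two definitions above are linked by the identity $\log p(y|\mathbf{x}) = f_y(\mathbf{x};\theta) + E(\mathbf{x};\theta)$, which follows directly from taking the log of the softmax. A small pure-Python check, with made-up logits chosen only for illustration:

```python
import math


def energy(logits):
    """E(x; theta) = -log sum_k exp(f_k(x; theta)), computed with the max-shift trick."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))


def softmax(logits):
    """p(y | x) for every class y."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


confident_logits = [10.0, 0.0, 0.0]  # one dominant logit: strongly negative energy
flat_logits = [-2.0, -2.0, -2.0]     # uniformly small logits: positive energy

e_confident = energy(confident_logits)
e_flat = energy(flat_logits)

# Softmax probability recovered from the logit and the energy
p0_from_energy = math.exp(confident_logits[0] + e_confident)
p0_from_softmax = softmax(confident_logits)[0]
```

Note that the softmax output for `flat_logits` is uniform regardless of the logit scale, while the energy still changes with it. That is the extra information the energy carries beyond the posterior.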

The goal is a level-set estimation based on the energy function (threshold at 0), in which ID data takes negative energy values and the synthesized outliers take positive energy values.

$$
L_{\text{uncertainty}} = \mathbb{E}_{\mathbf{v} \sim \mathcal{V}} \, \mathbf{1} \{ E(\mathbf{v};\theta) > 0 \} + \mathbb{E}_{\mathbf{x} \sim \mathcal{D}} \, \mathbf{1} \{ E(\mathbf{x};\theta) \le 0 \}
$$

This is a simpler objective than estimating the density.
Because the 0/1 loss is intractable to optimize, it is replaced by the binary sigmoid loss, a smooth approximation of the 0/1 loss.

$$
L_{\text{uncertainty}} = \mathbb{E}_{\mathbf{v} \sim \mathcal{V}} \left[ - \log \frac{1}{1 + \exp(-\phi(E(\mathbf{v}; \theta)))} \right]+\mathbb{E}_{\mathbf{x} \sim \mathcal{D}} \left[ - \log \frac{\exp(-\phi(E(\mathbf{x}; \theta)))}{1 + \exp(-\phi(E(\mathbf{x}; \theta)))} \right] \tag{5}
$$

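$\phi(\cdot)$ : nonlinear MLP function, which allows a flexible energy surface to be learned
During training, the uncertainty surface is shaped so that the model predicts a high probability for ID data and a low probability for the virtual outliers.
The uncertainty regularization loss has no hyperparameters of its own, which makes it easier to use in practice.
VOS produces a probability score for OOD detection.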
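Eq. (5) is a binary logistic loss on $\phi(E(\cdot))$, with the virtual outliers in one class and the ID samples in the other. A rough pure-Python sketch, which takes the outputs of $\phi$ as plain numbers instead of modelling the MLP itself (the function names and the sample values are assumptions for illustration):

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def uncertainty_loss(phi_outlier, phi_id):
    """Eq. (5), given phi(E(v)) for virtual outliers and phi(E(x)) for ID samples."""
    # First term: -log sigmoid(phi(E(v))), averaged over virtual outliers
    outlier_term = sum(-log_sigmoid(z) for z in phi_outlier) / len(phi_outlier)
    # Second term: exp(-z) / (1 + exp(-z)) equals sigmoid(-z)
    id_term = sum(-log_sigmoid(-z) for z in phi_id) / len(phi_id)
    return outlier_term + id_term


# Well separated, uninformative, and inverted outputs of phi
good = uncertainty_loss(phi_outlier=[5.0, 6.0], phi_id=[-5.0, -6.0])
chance = uncertainty_loss(phi_outlier=[0.0, 0.0], phi_id=[0.0, 0.0])
bad = uncertainty_loss(phi_outlier=[-5.0, -6.0], phi_id=[5.0, 6.0])
```

The loss is close to zero when $\phi(E(\mathbf{v}))$ is large for the outliers and $\phi(E(\mathbf{x}))$ is small for the ID samples, equals $2\log 2$ when $\phi$ outputs zero everywhere, and grows when the two groups are swapped.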

Object-level energy score

In the object detection case, the image-level energy can be replaced with an object-level energy score.
The energy score for an ID object $(\mathbf{x}, \mathbf{b})$ is defined as

$$
E(\mathbf{x}, \mathbf{b};\theta)= -\log\sum_{k=1}^{K}w_{k}\cdot \exp\left(f_k((\mathbf{x},\mathbf{b});\theta)\right) \tag{6}
$$

$f_{k}((\mathbf{x}, \mathbf{b});\theta)$ : logit output of the classification branch for class $k$, i.e. the $k$-th element of $W_{\text{cls}}^{\top}h(\mathbf{x}, \mathbf{b})$
The energy score for a virtual outlier can be defined in a similar way.
$w_k$ : weight that addresses the class imbalance in the dataset
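Since $w_k \cdot \exp(f_k) = \exp(f_k + \log w_k)$, Eq. (6) can be computed as an ordinary log-sum-exp over shifted logits. The sketch below assumes strictly positive weights so that $\log w_k$ is defined. That restriction, the function name, and the example numbers are assumptions of this illustration.

```python
import math


def object_energy(logits, weights):
    """E(x, b; theta) = -log sum_k w_k * exp(f_k((x, b); theta)), assuming every w_k > 0."""
    shifted = [z + math.log(w) for z, w in zip(logits, weights)]
    m = max(shifted)
    return -(m + math.log(sum(math.exp(s - m) for s in shifted)))


logits = [1.0, 2.0, 3.0]

# With unit weights the score reduces to the unweighted energy
unweighted = object_energy(logits, [1.0, 1.0, 1.0])

# Up-weighting the first class lowers the energy of objects that activate it
weights = [2.0, 1.0, 1.0]
weighted = object_energy(logits, weights)
direct = -math.log(sum(w * math.exp(z) for z, w in zip(logits, weights)))
```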

Overall training objective

The overall objective combines the standard object detection loss with the uncertainty regularization loss.

$$
\min_\theta\mathbb{E}_{(\mathbf{x},\mathbf{b},y) \sim \mathcal{D}} \quad [\mathcal{L}_{\text{cls}} + \mathcal{L}_{\text{loc}}] + \beta \cdot L_{\text{uncertainty}} \tag{7}
$$

$\beta$ : weight of the uncertainty regularization
$\mathcal{L}_{\text{cls}} , \mathcal{L}_{\text{loc}}$ : classification loss and bounding box regression loss
Without $\mathcal{L}_\text{loc}$, the objective reduces to the classification task.

3.3 Inference-time OOD Detection

During inference, the output of the logistic regression uncertainty branch is used for OOD detection.
Given a test input $\mathbf{x}^\ast$, the object detector produces a bounding box prediction $\mathbf{b}^\ast$.

$$
p_\theta(g|\mathbf{x}^{\ast}, \mathbf{b}^{\ast}) = \frac{\exp\left(-\phi(E(\mathbf{x}^{\ast}, \mathbf{b}^{\ast}))\right)}{1+\exp\left(-\phi(E(\mathbf{x}^{\ast}, \mathbf{b}^{\ast}))\right)} \tag{8}
$$

$p_\theta(g|\mathbf{x}^{\ast}, \mathbf{b}^{\ast})$ : OOD uncertainty score

A thresholding mechanism is used to distinguish ID objects from OOD objects.
The threshold $\gamma$ is chosen so that a high fraction (95%) of the ID data is correctly classified.
$$
G(\mathbf{x}^{\ast}, \mathbf{b}^{\ast}) = \begin{cases}
1 & \text{if } p_\theta(g|\mathbf{x}^{\ast},\mathbf{b}^{\ast}) \ge \gamma \\
0 & \text{if } p_\theta(g|\mathbf{x}^{\ast},\mathbf{b}^{\ast}) < \gamma
\end{cases} \tag{9}
$$
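Eq. (8) and (9), together with the 95% rule for $\gamma$, can be sketched as follows. The outputs of $\phi$ are taken as plain numbers here, and the helper names and the toy values are assumptions made for the example.

```python
import math


def id_probability(phi_value):
    """Eq. (8): exp(-phi) / (1 + exp(-phi)), written in a numerically stable form."""
    if phi_value >= 0:
        e = math.exp(-phi_value)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(phi_value))


def choose_threshold(id_scores, keep_fraction=0.95):
    """Largest gamma such that at least keep_fraction of the ID scores are >= gamma."""
    ranked = sorted(id_scores, reverse=True)
    index = math.ceil(keep_fraction * len(ranked)) - 1
    return ranked[index]


def detect(score, gamma):
    """Eq. (9): 1 = in-distribution, 0 = out-of-distribution."""
    return 1 if score >= gamma else 0


# Toy validation set of 20 ID objects, represented by their phi(E(x, b)) values
id_scores = [id_probability(-4.0 + 0.1 * i) for i in range(20)]
gamma = choose_threshold(id_scores)
num_id_kept = sum(detect(s, gamma) for s in id_scores)

# A test object whose phi output is large gets a low score and is flagged as OOD
ood_decision = detect(id_probability(4.0), gamma)
```

With 20 ID scores, the threshold lands on the 19th largest one, so 19 of the 20 ID objects (95%) are kept as in-distribution.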

VOS Algorithm 1.

4. Experiment

Evaluation On Object Detection

ID training data : PASCAL VOC, BDD-100K
OOD data : MS-COCO, OpenImages(validation)
Models were trained with ResNet-50 and RegNetX-4.0GF backbone architectures.

Metrics

FPR95 (false positive rate) : fraction of OOD samples classified as ID when the true positive rate on ID samples is 95%
AUROC : measures how well the model separates ID from OOD
mAP : evaluates object detection performance on the ID task
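The two OOD metrics can be computed directly from score lists. The pure-Python sketch below assumes that a higher score means "more in-distribution", which matches Eq. (8). The function names and the eight toy scores are made up for illustration, and the pairwise AUROC is quadratic in the number of samples, so it is only meant for small examples.

```python
import math


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples at or above the threshold that keeps 95% of the ID samples."""
    ranked = sorted(id_scores, reverse=True)
    gamma = ranked[math.ceil(0.95 * len(ranked)) - 1]
    return sum(s >= gamma for s in ood_scores) / len(ood_scores)


def auroc(id_scores, ood_scores):
    """Probability that a random ID sample outscores a random OOD sample (ties count 0.5)."""
    wins = 0.0
    for i in id_scores:
        for o in ood_scores:
            if i > o:
                wins += 1.0
            elif i == o:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


id_scores = [0.9, 0.8, 0.7, 0.6]
ood_scores = [0.65, 0.2, 0.1, 0.05]

fpr95 = fpr_at_95_tpr(id_scores, ood_scores)  # one OOD score (0.65) passes gamma = 0.6
area = auroc(id_scores, ood_scores)           # 15 of the 16 ID/OOD pairs are ordered correctly
```

Lower FPR95 and higher AUROC both indicate better separation. In this toy case they come out to 0.25 and 0.9375.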

Table 1.

VOS outperforms the other OOD detection methods it is compared against.

Compared with GAN-synthesis, VOS improves OOD detection performance by 13.40% on PASCAL VOC and by 12.76% on BDD-100K.
VOS achieves stronger OOD detection while maintaining high accuracy on the original in-distribution task (mAP).

Table 2.

Comparison of VOS with other outlier synthesis approaches

$\text{i}^\diamond$ : outliers synthesized in the pixel space

GAN-based synthesis was unstable, and Mixup hurt object detection performance.

$\text{ii}^\natural$ : noise used as outliers

Gaussian noise was promising, but it is relatively simple and may not regularize the decision boundary between ID and OOD as effectively as VOS does.

$\text{iii}^\clubsuit$ : negative proposals (regions that contain no object) used as outliers

These are not effective because they are distributionally close to the ID data.

Qualitative Analysis

Blue : objects that the model detected and classified as an ID class
Green : OOD objects detected by the VOS model

With VOS, OOD objects are identified more reliably and false detections are reduced.

5. Conclusion

VOS (Virtual Outlier Synthesis) is a new unknown-aware learning framework for OOD detection.
During training, it synthesizes outliers by sampling virtual outliers from the low-likelihood region of the class-conditional distribution.
These outliers are used to refine the decision boundary between ID and OOD data, which gives strong OOD detection performance while preserving performance on the ID task.
VOS is effective and applicable to both object detection and classification tasks.