
[Paper Review] (ODIN) Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks

래훈 | 2024. 8. 8. 14:24


Paper link: ODIN

This is a review of the paper "Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks".

What

The paper studies how to improve OOD detection performance in neural networks.
It proposes a new method, ODIN (Out-of-Distribution detector for Neural Networks), which performs OOD detection effectively without modifying the pre-trained network.

Why

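Modern neural networks generalize well when the training and test data come from the same distribution, but in real applications the test distribution often cannot be controlled.
As a result, a network can make wrong predictions with high confidence on inputs unlike anything it has seen.
OOD detection is therefore important for addressing this problem:
it lets the network recognize its uncertainty about new kinds of input and make more reliable predictions.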

How

Temperature Scaling
- Adjust the temperature scaling parameter $T$ in the softmax function to widen the gap between the softmax scores of ID and OOD inputs.
Input Preprocessing
- Add a small perturbation to the input to further separate the softmax scores of ID and OOD images.

Introduction

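Modern neural networks generally perform well when the training and test data share the same distribution.
In the real world, however, the distribution of the test data often cannot be controlled.
Accurately detecting out-of-distribution examples can therefore be of real practical importance in visual recognition tasks.
A naive approach is to train on OOD examples, but the real world contains countless OOD examples, which makes this computationally expensive and intractable. Moreover, the model must still classify ID examples well, so also training it to detect OOD examples enlarges the architecture and complicates the training procedure.
The baseline method paper distinguished ID examples from OOD examples using the softmax score.
This paper applies temperature scaling to the softmax function and adds a small perturbation to the input, which widens the gap between ID and OOD examples.
The parameters are set empirically, and a simple analysis conveys the intuition behind the method.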

Problem Statement

$P_{\textbf{X} \times Z}$ : joint distribution over $\mathcal{X} \times \{0, 1\}$
$P_{\textbf{X}} = P_{\textbf{X}|Z=0}$ : in-distribution over the image space $\mathcal{X}$
$Q_{\textbf{X}} = P_{\textbf{X}|Z=1}$ : out-distribution over the image space $\mathcal{X}$

We consider the problem of distinguishing in-distribution images from out-of-distribution images using a pre-trained network.
If an image is in-distribution, the network only needs to classify the original image, so the paper focuses on improving out-of-distribution image detection.

ODIN: Out-of-distribution Detector

ODIN is built on two components: temperature scaling and input preprocessing.

Temperature Scaling

$\boldsymbol{f} = (f_{1}, \dots, f_{N})$ : a neural network trained to classify $N$ classes
$\pmb{x}$ : input image
$S_{i}(\pmb{x};T)$ : softmax output for class $i$ (Eq. 1); the predicted label is $\hat{y}(\pmb{x}) = \text{argmax}_{i}S_{i}(\pmb{x};T)$
$T \in \mathbb{R}^+$ : temperature scaling parameter; $T = 1$ during training.

$$
\begin{align}
S_i(\pmb{x};T) = \frac{\text{exp}(f_{i}(\pmb{x})/T)}{\sum_{j=1}^{N} \text{exp}(f_{j}(\pmb{x})/T)}
\tag{1}
\end{align}
$$

softmax score

Given an input $\pmb{x}$, the softmax score is $S_{\hat{y}}(\pmb{x};T) = \max_{i}S_{i}(\pmb{x};T)$.
The paper abbreviates this as $S(\pmb{x};T) = S_{\hat{y}}(\pmb{x};T)$.
Temperature scaling pushes the softmax scores of ID and OOD images further apart, which makes OOD detection more effective.
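A minimal Python sketch of Eq. (1) and the softmax score, assuming the network's logits $f_i(\pmb{x})$ are already available as a plain list (the logit values are made up for illustration). It shows how the max softmax score changes with $T$ while the predicted label $\hat{y}$ stays the same.

```python
import math


def softmax_with_temperature(logits, T=1.0):
    """Eq. (1): S_i(x; T) = exp(f_i(x) / T) / sum_j exp(f_j(x) / T)."""
    scaled = [f / T for f in logits]
    m = max(scaled)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_score(logits, T=1.0):
    """S(x; T) = max_i S_i(x; T), the score ODIN compares with a threshold."""
    return max(softmax_with_temperature(logits, T))


def predicted_label(logits, T=1.0):
    """y_hat(x) = argmax_i S_i(x; T); dividing by T > 0 never changes the argmax."""
    probs = softmax_with_temperature(logits, T)
    return max(range(len(probs)), key=probs.__getitem__)


logits = [10.0, 6.0, 0.0]  # hypothetical logits f(x) for N = 3 classes
for T in (1.0, 10.0, 1000.0):
    print(T, predicted_label(logits, T), round(softmax_score(logits, T), 4))
```

As $T$ grows the scores are compressed toward $1/N$, so absolute score values are only comparable at a fixed $T$, and the threshold has to be chosen for the $T$ actually in use.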

Input Preprocessing

$$
\begin{align}
\tilde{\pmb{x}} = \pmb{x} - \varepsilon \text{sign}(-\nabla_{\pmb{x}}\text{log}S_{\hat{y}}(\pmb{x};T)) \tag{2}
\end{align}
$$

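$\varepsilon$ : perturbation magnitude
A small perturbation is added to the input.
The idea is inspired by adversarial examples, where a small perturbation decreases the softmax score of the true label and forces a wrong prediction; here the goal is the opposite, namely to increase the softmax score.
Because the perturbation affects in-distribution images more strongly than out-of-distribution images, the two become easier to separate.
The perturbation is easy to compute: $-\log S_{\hat{y}}(\pmb{x};T)$ is the cross-entropy loss with the predicted label $\hat{y}$ as the target, so its gradient with respect to the input is obtained by back-propagation.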
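To make Eq. (2) concrete without a deep-learning framework, the sketch below swaps in a hypothetical linear classifier $f(\pmb{x}) = W\pmb{x} + b$, whose input gradient has the closed form $\nabla_{\pmb{x}} \log S_{\hat{y}}(\pmb{x};T) = \frac{1}{T}\left(W_{\hat{y}} - \sum_j S_j(\pmb{x};T) W_j\right)$. With a real network this gradient would come from back-propagation instead. `W`, `b`, `x` and the helper names are illustrative assumptions, and $T = 1000$, $\varepsilon = 0.0014$ merely reuse the DenseNet/CIFAR-10 values from the experiments.

```python
import math


def softmax_with_temperature(logits, T):
    """Eq. (1), with the max subtracted for numerical stability."""
    scaled = [f / T for f in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def linear_logits(W, b, x):
    """Hypothetical stand-in for the network: f(x) = W x + b."""
    return [sum(w * xk for w, xk in zip(row, x)) + bi for row, bi in zip(W, b)]


def grad_log_score(W, b, x, T):
    """Closed-form grad_x log S_yhat(x; T) for the linear model:
    (1/T) * (W[yhat] - sum_j S_j(x; T) * W[j])."""
    probs = softmax_with_temperature(linear_logits(W, b, x), T)
    yhat = max(range(len(probs)), key=probs.__getitem__)
    return [
        (W[yhat][k] - sum(p * W[j][k] for j, p in enumerate(probs))) / T
        for k in range(len(x))
    ]


def sign(v):
    return (v > 0) - (v < 0)


def preprocess(W, b, x, T, eps):
    """Eq. (2): x_tilde = x - eps * sign(-grad_x log S_yhat(x; T))."""
    g = grad_log_score(W, b, x, T)
    return [xk - eps * sign(-gk) for xk, gk in zip(x, g)]


W = [[1.0, -0.5], [-0.3, 0.8], [0.2, 0.1]]  # hypothetical 3-class, 2-feature weights
b = [0.0, 0.0, 0.0]
x = [0.6, 0.4]
T, eps = 1000.0, 0.0014

x_tilde = preprocess(W, b, x, T, eps)
before = max(softmax_with_temperature(linear_logits(W, b, x), T))
after = max(softmax_with_temperature(linear_logits(W, b, x_tilde), T))
print(before, after)  # the step moves x in the direction that raises the score
```

Each coordinate moves by exactly $\varepsilon$ in the direction that increases $\log S_{\hat{y}}$, which is the sign-gradient step of Eq. (2).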

Out-of-distribution Detector

The detector combines the two components above.

For each image $\pmb{x}$, compute the preprocessed input $\tilde{\pmb{x}}$ with Eq. (2).
Feed $\tilde{\pmb{x}}$ through the network, compute the softmax score $S(\tilde{\pmb{x}};T)$, and compare it with a threshold $\delta$ as follows:
$$
g(\pmb{x};\delta,T,\varepsilon) =
\begin{cases} 1 & \text{if} \; \max_{i} S_{i}(\tilde{\pmb{x}};T)\le \delta, \\
0 & \text{if} \; \max_{i} S_{i}(\tilde{\pmb{x}};T) > \delta.
\end{cases}
$$

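$T$, $\varepsilon$, and $\delta$ are chosen so that the TPR (the fraction of in-distribution images correctly classified as in-distribution) is 95%. An output of 1 flags the input as out-of-distribution ($Z = 1$); an output of 0 accepts it as in-distribution.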
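The detector $g$ and the choice of $\delta$ can be sketched as follows, assuming the preprocessed scores $S(\tilde{\pmb{x}};T)$ have already been computed for held-out in-distribution images. `choose_delta` is a hypothetical helper that picks the largest threshold still keeping the TPR at or above 95%; the scores are synthetic.

```python
import math


def tpr_at(delta, id_scores):
    """Fraction of in-distribution scores above delta (i.e. accepted as ID)."""
    return sum(s > delta for s in id_scores) / len(id_scores)


def choose_delta(id_scores, target_tpr=0.95):
    """Hypothetical helper: the largest candidate threshold whose TPR is still >= target."""
    candidates = [-math.inf] + sorted(id_scores)
    return max(d for d in candidates if tpr_at(d, id_scores) >= target_tpr)


def g(score, delta):
    """Detector: 1 (out-of-distribution) if S(x_tilde; T) <= delta, else 0 (in-distribution)."""
    return 1 if score <= delta else 0


# Synthetic preprocessed scores S(x_tilde; T) for 20 held-out ID images.
id_scores = [0.5 + i / 100 for i in range(20)]
delta = choose_delta(id_scores)
print(delta, tpr_at(delta, id_scores))
print(g(0.45, delta), g(0.55, delta))
```

Scanning the sorted ID scores as candidates keeps the sketch simple (quadratic time); a real evaluation would sort once and read the threshold off directly.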

Experiments

Training Setup

The network architectures are DenseNet (depth L = 100, growth rate k = 12, dropout rate 0) and Wide ResNet (depth 28, width 10, dropout rate 0).

Out-of-distribution Dataset

CIFAR-10 and CIFAR-100 serve as in-distribution (positive) data, and TinyImageNet, LSUN, Gaussian noise, and uniform noise serve as out-of-distribution (negative) data.

Evaluation Metrics

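The evaluation metrics are FPR at 95% TPR, detection error, AUROC, and AUPR.
- FPR at 95% TPR: the probability that an OOD example is misclassified as positive (in-distribution) when the TPR is 95%.
- Detection Error: the misclassification probability when the TPR is 95%, computed as $P_{e}= 0.5(1-\text{TPR}) + 0.5\,\text{FPR}$, assuming positive and negative examples are equally likely.
- AUROC: the area under the ROC curve.
- AUPR: the area under the precision-recall curve.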
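A small sketch of the first three metrics, treating in-distribution as the positive class and predicting positive when the score exceeds the threshold. AUROC is computed with the pairwise (Mann-Whitney) formulation, i.e. the probability that a random ID score exceeds a random OOD score; AUPR is omitted. The helper names and score lists are illustrative assumptions, not the paper's evaluation code.

```python
import math


def rates(delta, id_scores, ood_scores):
    """ID is the positive class; a sample is predicted positive when its score > delta."""
    tpr = sum(s > delta for s in id_scores) / len(id_scores)
    fpr = sum(s > delta for s in ood_scores) / len(ood_scores)
    return tpr, fpr


def rates_at_tpr(id_scores, ood_scores, target_tpr=0.95):
    """(TPR, FPR) at the largest threshold that keeps TPR >= target_tpr."""
    candidates = [-math.inf] + sorted(id_scores)
    delta = max(d for d in candidates if rates(d, id_scores, ood_scores)[0] >= target_tpr)
    return rates(delta, id_scores, ood_scores)


def detection_error(tpr, fpr):
    """P_e = 0.5 (1 - TPR) + 0.5 FPR, assuming equal priors for ID and OOD."""
    return 0.5 * (1 - tpr) + 0.5 * fpr


def auroc(id_scores, ood_scores):
    """P(random ID score > random OOD score), with ties counted as 1/2."""
    wins = 0.0
    for a in id_scores:
        for o in ood_scores:
            wins += 1.0 if a > o else (0.5 if a == o else 0.0)
    return wins / (len(id_scores) * len(ood_scores))


# Synthetic scores: 20 ID images, 12 OOD images (two of which score high).
id_scores = [0.5 + i / 100 for i in range(20)]
ood_scores = [0.3 + i / 50 for i in range(10)] + [0.625, 0.675]

tpr, fpr = rates_at_tpr(id_scores, ood_scores)
print("TPR", tpr, "FPR at 95% TPR", fpr)
print("Detection error", detection_error(tpr, fpr))
print("AUROC", auroc(id_scores, ood_scores))
```

Here the two high-scoring OOD samples survive the 95%-TPR threshold, giving FPR = 2/12, while AUROC summarizes the ranking over all thresholds.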

Experimental Results

In the ROC curve figure (red curve: baseline method, blue curve: ODIN), the two methods differ substantially.
At 95% TPR, the FPR drops from 34% to 4.2%.

iSUN was used as the validation dataset, and T = 1000 was used in all settings.
For DenseNet, $\varepsilon = 0.0014$ (CIFAR-10) and $\varepsilon = 0.002$ (CIFAR-100) were used.
Substantial performance improvements were observed on all datasets.

Discussions

Analysis on Temperature Scaling

Taylor expansion of the softmax score:

$$\begin{align}
S_{\hat{y}}(\pmb{x};T) &= \frac{\mathrm{exp}(f_{\hat{y}}(\pmb{x})/T)}{\sum_{i=1}^{N}\mathrm{exp}(f_{i}(\pmb{x})/T)}
\\
&= \frac{1}{\sum_{i=1}^{N}\mathrm{exp} \left (\frac{f_{i}(\pmb{x})-f_{\hat{y}}(\pmb{x})}{T} \right )}
\\
&= \frac{1}{\sum_{i=1}^{N}\left[1+\frac{f_{i}(\pmb{x})-f_{\hat{y}}(\pmb{x})}{T} + \frac{1}{2!} \frac{(f_i(\pmb{x})-f_{\hat{y}}(\pmb{x}))^2}{T^{2}}+o\left(\frac{1}{T^2}\right)\right]}
\\
& \approx \frac{1}{N - \frac{1}{T} \sum_i[f_{\hat{y}}(\pmb{x}) - f_i(\pmb{x})] + \frac{1}{2T^2} \sum_i[f_{\hat{y}}(\pmb{x}) - f_i(\pmb{x})]^2}
\tag{3}
\end{align}
$$
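A quick numerical check of Eq. (3), comparing the exact softmax score with its second-order approximation on an arbitrary (made-up) logit vector. It illustrates that the expansion is only accurate when $T$ is large relative to the logit gaps, which is the $T = 1000$ regime the analysis targets.

```python
import math


def exact_score(logits, T):
    """S_yhat(x; T) = 1 / sum_i exp((f_i - f_yhat) / T), the second line of Eq. (3)."""
    top = max(logits)
    return 1.0 / sum(math.exp((f - top) / T) for f in logits)


def second_order_score(logits, T):
    """Last line of Eq. (3): 1 / (N - sum_i d_i / T + sum_i d_i^2 / (2 T^2)), d_i = f_yhat - f_i."""
    top = max(logits)
    d = [top - f for f in logits]  # d_i = 0 for i = yhat
    return 1.0 / (len(logits) - sum(d) / T + sum(di * di for di in d) / (2 * T * T))


logits = [10.0, 6.0, 0.0]  # hypothetical logits
for T in (1.0, 10.0, 1000.0):
    print(T, exact_score(logits, T), second_order_score(logits, T))
```

At $T = 1$ the approximation is far off, because the expansion assumes $|f_{\hat{y}} - f_i|/T$ is small; at $T = 1000$ it agrees closely with the exact score.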

Interpreting $U_1$ and $U_2$

$$
U_1(\pmb{x})=\frac{1}{N-1}\sum_{i\neq\hat{y}}[f_{\hat{y}}(\pmb{x})-f_{i}(\pmb{x})]\quad\text{ and }\quad U_2(\pmb{x})=\frac{1}{N-1}\sum_{i\neq\hat{y}}[f_{\hat{y}}(\pmb{x})-f_{i}(\pmb{x})]^2\tag{4}
$$

$U_1$ measures how far the network's largest unnormalized output deviates from the remaining outputs, while $U_2$ measures how much the remaining smaller outputs deviate from each other.
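Eq. (4) computed for two made-up logit vectors over four hypothetical classes (dog, cat, car, truck): an ID-like one, where the non-top outputs are spread out in the spirit of panels (f)/(g), and an OOD-like one, where they are bunched together. The values are illustrative, not taken from the paper.

```python
def u1_u2(logits):
    """Eq. (4): mean and mean-squared gap between the largest logit and the others."""
    yhat = max(range(len(logits)), key=logits.__getitem__)
    gaps = [logits[yhat] - f for i, f in enumerate(logits) if i != yhat]
    u1 = sum(gaps) / len(gaps)  # len(gaps) == N - 1
    u2 = sum(g * g for g in gaps) / len(gaps)
    return u1, u2


# Hypothetical logits over (dog, cat, car, truck).
id_like = [9.0, 7.0, 1.0, 1.0]   # "cat" close to "dog", vehicles far away
ood_like = [9.0, 3.0, 3.0, 3.0]  # remaining outputs bunched together
print(u1_u2(id_like), u1_u2(ood_like))
```

Both vectors share $U_1 = 6$, but the ID-like one has the larger $U_2$ (44 vs. 36). This follows from $U_2 = U_1^2 + \mathrm{Var}(\text{gaps})$: for a fixed $U_1$, $U_2$ grows with the spread of the remaining outputs (variance 8 vs. 0 here).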

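Panel (a), which plots the distribution of $U_1$ for in-distribution and out-of-distribution data, shows that for in-distribution data the largest output deviates more from the remaining outputs.
Panel (b) shows that, for a given $U_1$, the expected value of $U_2$ is larger for in-distribution images. That is, of two images with similar $U_1$ values, the in-distribution image is likely to have the higher $U_2$.
For in-distribution images, the remaining outputs therefore tend to be more spread out from one another.
Panels (f) and (g) illustrate this: for a CIFAR-10 dog image, the dog and cat outputs are close to each other but far from car and truck, whereas for TinyImageNet (crop) the outputs other than the largest one lie close together.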

The effects of T

$$
S \propto \left(U_1 - \frac{U_2}{2T}\right) \Big/ T
$$

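Rewriting Eq. (3) in terms of Eq. (4) gives the relation above. The $i = \hat{y}$ term of each sum is zero, so $\sum_i[f_{\hat{y}}(\pmb{x}) - f_i(\pmb{x})] = (N-1)U_1$ and $\sum_i[f_{\hat{y}}(\pmb{x}) - f_i(\pmb{x})]^2 = (N-1)U_2$, hence $S(\pmb{x};T) \approx 1 \big/ \left[N - \frac{N-1}{T}\left(U_1 - \frac{U_2}{2T}\right)\right]$, which increases with $(U_1 - U_2/2T)/T$.
The softmax score is therefore governed by $U_1$ and $U_2/2T$.
$U_1$ pushes in-distribution images toward higher softmax scores, whereas $U_2$ enters with a negative sign and has the opposite effect, lowering the scores of in-distribution images, which tend to have larger $U_2$.
A sufficiently large $T$ therefore suppresses the negative effect of the $U_2/2T$ term, so the softmax scores of in-distribution and out-of-distribution images become easier to tell apart.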
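A toy illustration of the effect of $T$ with hand-picked logits: the ID-like vector has the larger $U_1$ (7 vs. 6) but also the larger $U_2$ (58 vs. 36). At $T = 1$ the $U_2$ penalty makes it score lower than the OOD-like vector; at $T = 1000$ the $U_2/2T$ term is suppressed and the ranking follows $U_1$. This is a constructed example, not evidence from the paper.

```python
import math


def score(logits, T):
    """Exact softmax score S(x; T) = 1 / sum_i exp((f_i - f_yhat) / T)."""
    top = max(logits)
    return 1.0 / sum(math.exp((f - top) / T) for f in logits)


def u1_u2(logits):
    """Eq. (4) for the given logits."""
    top = max(logits)
    gaps = sorted(top - f for f in logits)[1:]  # drop the zero gap of the top class itself
    return sum(gaps) / len(gaps), sum(g * g for g in gaps) / len(gaps)


id_like = [10.0, 6.0, 0.0]  # U1 = 7, U2 = 58
ood_like = [6.0, 0.0, 0.0]  # U1 = 6, U2 = 36
print(u1_u2(id_like), u1_u2(ood_like))
for T in (1.0, 1000.0):
    print(T, score(id_like, T) > score(ood_like, T))
# 1.0 False
# 1000.0 True
```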

Analysis on Input Preprocessing

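Temperature scaling alone improves performance, but the gain diminishes once $T$ becomes very large.
Input preprocessing is combined with it for further improvement.
In the paper's corresponding figure, with $T=1000$, choosing an appropriate $\varepsilon$ substantially improves performance on most datasets.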

The effects of gradient

First-order Taylor expansion of the log softmax score at the perturbed image $\tilde{\pmb{x}}$:
$$\mathrm{log}S_{\hat{y}}(\tilde{\pmb{x}};T)=\mathrm{log}S_{\hat{y}}(\pmb{x};T)+\varepsilon \Vert \nabla_{\pmb{x}}\mathrm{log}S_{\hat{y}}(\pmb{x};T) \Vert_{1}+o(\varepsilon)$$
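A numerical check of this first-order expansion, again on a hypothetical linear classifier $f(\pmb{x}) = W\pmb{x} + b$ whose input gradient is available in closed form. It compares the actual change in $\log S_{\hat{y}}$ after the Eq. (2) step with the predicted $\varepsilon \Vert \nabla_{\pmb{x}} \log S_{\hat{y}}(\pmb{x};T) \Vert_1$; all values and helper names are illustrative assumptions.

```python
import math


def log_softmax_at(logits, T, i):
    """log S_i(x; T), computed stably via log-sum-exp."""
    scaled = [f / T for f in logits]
    m = max(scaled)
    return scaled[i] - (m + math.log(sum(math.exp(s - m) for s in scaled)))


def linear_logits(W, b, x):
    """Hypothetical stand-in network: f(x) = W x + b."""
    return [sum(w * xk for w, xk in zip(row, x)) + bi for row, bi in zip(W, b)]


def grad_log_softmax_at(W, b, x, T, i):
    """Closed-form grad_x log S_i(x; T) = (1/T) * (W[i] - sum_j S_j(x; T) W[j])."""
    logits = linear_logits(W, b, x)
    probs = [math.exp(log_softmax_at(logits, T, j)) for j in range(len(logits))]
    return [(W[i][k] - sum(p * W[j][k] for j, p in enumerate(probs))) / T for k in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


W = [[1.0, -0.5], [-0.3, 0.8], [0.2, 0.1]]  # hypothetical weights
b = [0.0, 0.0, 0.0]
x = [0.6, 0.4]
T, eps = 1000.0, 0.0014

logits = linear_logits(W, b, x)
yhat = max(range(len(logits)), key=logits.__getitem__)  # fixed at the unperturbed prediction
grad = grad_log_softmax_at(W, b, x, T, yhat)
x_tilde = [xk - eps * sign(-gk) for xk, gk in zip(x, grad)]  # Eq. (2)

actual = log_softmax_at(linear_logits(W, b, x_tilde), T, yhat) - log_softmax_at(logits, T, yhat)
predicted = eps * sum(abs(gk) for gk in grad)  # eps * ||grad||_1
print(actual, predicted)
```

For this linear model $\log S_{\hat{y}}$ is concave in $\pmb{x}$, so the actual gain sits just below the first-order prediction; the two agree to well under 0.1% here.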

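Panel (c) shows that in-distribution images tend to have a larger gradient norm $\Vert \nabla_{\pmb{x}}\mathrm{log}S_{\hat{y}}(\pmb{x};T) \Vert_{1}$ than out-of-distribution images.
Panel (d) shows that when an in-distribution image and an out-of-distribution image have the same softmax score, the in-distribution image is more likely to have the larger $\Vert \nabla_{\pmb{x}}\mathrm{log}S_{\hat{y}}(\pmb{x};T) \Vert_{1}$.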

Suppose an in-distribution image $\pmb{x}_1$ and an out-of-distribution image $\pmb{x}_2$ have similar softmax scores, i.e., $S(\pmb{x}_1) \approx S(\pmb{x}_2)$. After input preprocessing, the in-distribution image ends up with a larger softmax score than the out-of-distribution image.
This is because $\pmb{x}_1$ has a larger softmax gradient norm than $\pmb{x}_2$, and by the expansion above the increase in the log score is proportional to that norm. Input preprocessing therefore separates ID and OOD images more clearly.

The effect of $\varepsilon$

When $\varepsilon$ is sufficiently small, adding the perturbation does not change the predictions; once it is non-negligible, the gap between the softmax scores of in-distribution and out-of-distribution images can be affected by $\Vert \nabla_{\pmb{x}}\mathrm{log}S_{\hat{y}}(\pmb{x};T) \Vert_{1}$.
Using a very large $\varepsilon$, however, can degrade performance.

Conclusion

The paper proposes a simple and effective method for detecting OOD data in neural networks.
It requires no retraining and outperforms the baseline method.
It also analyzes the various parameters empirically and offers insight into why the approach works.