[ Paper Review ] Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning



kalelpark 2023. 1. 14. 21:42

For code and many more paper reviews, please see my GitHub at the link below!
(Stars and follows are always appreciated..!)

https://github.com/kalelpark/Awesome-ComputerVision


Abstract

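Bootstrap Your Own Latent (BYOL) relies on two neural networks, an online network and a target network, that interact and learn from each other.
From an augmented view of an image, the online network is trained to predict the target network's representation of the same image under a different augmented view.

At the same time, the target network is updated with a slow-moving average of the online network's weights.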
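The target update described above is an exponential moving average (EMA) of the online weights, ξ ← τξ + (1 - τ)θ. Below is a minimal sketch, assuming plain Python lists stand in for weight tensors. The cosine schedule for τ (starting at τ_base = 0.996 and rising to 1 by the end of training) follows the paper's description; the function names are hypothetical.

```python
import math
from typing import List


def tau_schedule(step: int, total_steps: int, tau_base: float = 0.996) -> float:
    """Cosine schedule: tau = 1 - (1 - tau_base) * (cos(pi * k / K) + 1) / 2.

    Equals tau_base at k = 0 and 1.0 at k = K.
    """
    return 1.0 - (1.0 - tau_base) * (math.cos(math.pi * step / total_steps) + 1.0) / 2.0


def ema_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Return xi <- tau * xi + (1 - tau) * theta, element-wise.

    The target weights never receive a gradient; they only track the online weights.
    """
    return [tau * xi + (1.0 - tau) * theta for xi, theta in zip(target, online)]


# Usage: at the start of training the target moves only 0.4% of the way
# toward the online weights per step.
target = ema_update([0.0, 1.0], [1.0, 1.0], tau=tau_schedule(0, 1000))
```

Because τ stays close to 1, the target lags behind the online network, which is what makes it a slowly moving target rather than a copy.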

Introduction

Learning good image representations is a key problem in computer vision, and many methods have been proposed for it.

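BYOL iteratively bootstraps the outputs of a network to serve as targets for an enhanced representation.
BYOL is more robust than contrastive methods to changes in batch size and in the set of image augmentations. The authors suspect that not relying on negative pairs is one of the main reasons for this robustness.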

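The BYOL objective admits collapsed solutions, such as outputting the same vector for every image, but the authors show that BYOL does not converge to such solutions.
BYOL is also less sensitive to the choice of augmentations: when the set of image augmentations is reduced, it suffers a much smaller performance drop than SimCLR.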

Related work

Most unsupervised representation learning methods are either generative or discriminative.

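Generative methods typically model a distribution over the data and a latent embedding, and then use the learned embedding as the image representation.
They generally operate directly in pixel space.
However, this is computationally expensive, and the pixel-level detail required for image generation may not be necessary for learning high-level image representations.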

Among discriminative methods, contrastive methods avoid the costly generation step in pixel space. They learn by pulling the representations of positive pairs together and pushing the representations of negative pairs apart.
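To make "pull positives together, push negatives apart" concrete, here is a minimal sketch of an InfoNCE-style contrastive loss for a single anchor, similar in spirit to the NT-Xent loss used by SimCLR. The cosine similarity, the temperature of 0.1, and the function names are illustrative assumptions; this is the contrastive family that BYOL moves away from, not BYOL itself.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor: List[float], positive: List[float],
             negatives: List[List[float]], temperature: float = 0.1) -> float:
    """-log( exp(s_pos / T) / (exp(s_pos / T) + sum_n exp(s_n / T)) )."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# The loss is small when the positive is close to the anchor and the negatives are far.
good = info_nce([1.0, 0.0], [1.0, 0.1], [[-1.0, 0.0], [0.0, 1.0]])
bad = info_nce([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.1], [0.0, 1.0]])
```

The loss only becomes small when the anchor is much more similar to its positive than to every negative in the denominator, so the quality of the signal depends on how many (and how informative) the negatives are.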

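The idea of this paper is inspired by Predictions of Bootstrapped Latents (PBL), a reinforcement learning method. PBL jointly trains a representation of the agent's past observations (its history) and an encoding of future observations.
The observation encoding is used as the target for learning the agent's representation.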

Method

Many successful self-supervised learning methods build on the cross-view prediction framework: they learn representations by predicting one augmented view of an image from another view of the same image.

Predicting directly in representation space, however, can lead to collapsed representations: for example, a representation that is constant across all views is always perfectly predictive of itself.
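The collapse problem can be checked numerically. In this hypothetical sketch, a "collapsed" encoder ignores its input and always returns the same vector. The cross-view prediction error, measured here as the squared distance between l2-normalized vectors (an assumption chosen to match the loss BYOL uses), is then exactly zero for every pair of views, so a plain prediction loss cannot distinguish this useless solution from a good one.

```python
import math
from typing import List


def normalized_sq_error(p: List[float], z: List[float]) -> float:
    """Squared distance between l2-normalized vectors (equals 2 - 2 * cosine)."""
    p_norm = math.sqrt(sum(x * x for x in p))
    z_norm = math.sqrt(sum(x * x for x in z))
    return sum((a / p_norm - b / z_norm) ** 2 for a, b in zip(p, z))


def collapsed_encoder(view: List[float]) -> List[float]:
    """Hypothetical degenerate encoder: ignores the input entirely."""
    return [1.0, 2.0, 3.0]


views = [[0.1, 0.5], [0.9, -0.3], [2.0, 0.0]]  # stand-ins for augmented views
errors = [normalized_sq_error(collapsed_encoder(a), collapsed_encoder(b))
          for a in views for b in views]
print(max(errors))  # 0.0: every view is "perfectly predicted"
```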

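To get around this, contrastive methods turn the prediction problem into a discrimination problem: from the representation of one augmented view, they learn to tell the representation of another augmented view of the same image (a positive pair) apart from the representations of augmented views of other images (negative pairs). However, this requires many negative examples, which in turn calls for large batch sizes, memory banks, or dedicated mining strategies such as semi-hard negative mining.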

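This work studies how to prevent collapse without negative pairs while preserving high performance.
The simplest way to prevent collapse is to use a fixed, randomly initialized network to produce the targets for the predictions. This does not yield very good representations, but notably the learned representation is already much better than the fixed initial one: predicting a fixed random network reaches 18.8% top-1 accuracy under linear evaluation on ImageNet, while the random network itself reaches only 1.4%.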
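The fixed-random-target baseline can be sketched as plain regression. In this toy setup (all names hypothetical, and linear maps standing in for deep networks), the "target network" is a frozen random linear map and the "online network" is a linear map trained by gradient descent on the mean squared error between their outputs. Because the target never changes, the online network cannot pull it toward a collapsed constant; it simply learns to reproduce a fixed random function.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def mse(online_w: Matrix, target_w: Matrix, data: List[List[float]]) -> float:
    total = 0.0
    for x in data:
        total += sum((o - t) ** 2 for o, t in zip(matvec(online_w, x), matvec(target_w, x)))
    return total / len(data)


def fit_to_fixed_random_target(dim: int = 3, steps: int = 500, lr: float = 0.1,
                               seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    target_w = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]  # frozen
    online_w = [[0.0] * dim for _ in range(dim)]  # trained
    data = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(32)]

    initial = mse(online_w, target_w, data)
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(dim)]
        for x in data:
            err = [o - t for o, t in zip(matvec(online_w, x), matvec(target_w, x))]
            for i in range(dim):
                for j in range(dim):
                    grad[i][j] += 2.0 * err[i] * x[j] / len(data)
        # Only the online weights move; target_w is never updated.
        for i in range(dim):
            for j in range(dim):
                online_w[i][j] -= lr * grad[i][j]
    return initial, mse(online_w, target_w, data)


initial_loss, final_loss = fit_to_fixed_random_target()
```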

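BYOL generalizes this bootstrapping procedure: instead of a fixed target, it iteratively refines the representation by using a slowly moving average of the online network as the target network.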

Description of BYOL

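BYOL aims to learn a representation that can be used for downstream tasks. It trains two neural networks, an online network and a target network. The online network, with weights θ, consists of an encoder, a projector, and a predictor. The target network has the same architecture as the online network except for the predictor, but uses a different set of weights ξ.

The target network provides the regression targets used to train the online network.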
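The asymmetry between the two branches can be written down as a configuration. This is a sketch under stated assumptions: the encoder is a ResNet-50 with a 2048-dimensional output, and the projector and predictor are MLPs with a 4096-unit hidden layer and a 256-dimensional output, as reported in the paper; the dataclass and function names are hypothetical. The point it shows is that only the online branch has a predictor.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MLPHead:
    """Linear(in_dim, hidden_dim) -> BatchNorm -> ReLU -> Linear(hidden_dim, out_dim)."""
    in_dim: int
    hidden_dim: int
    out_dim: int


@dataclass(frozen=True)
class Branch:
    encoder: str                  # backbone name, e.g. "resnet50"
    repr_dim: int                 # size of the representation
    projector: MLPHead
    predictor: Optional[MLPHead]  # only the online branch has one


def make_byol_branches(repr_dim: int = 2048, hidden_dim: int = 4096,
                       proj_dim: int = 256) -> Tuple[Branch, Branch]:
    projector = MLPHead(repr_dim, hidden_dim, proj_dim)
    online = Branch("resnet50", repr_dim, projector,
                    predictor=MLPHead(proj_dim, hidden_dim, proj_dim))
    # Same encoder/projector architecture, no predictor. In a real model the
    # target weights xi are a separate copy, updated only by moving average.
    target = Branch("resnet50", repr_dim, projector, predictor=None)
    return online, target


def output_dims(branch: Branch) -> List[int]:
    """Feature sizes after each stage: representation, projection, [prediction]."""
    dims = [branch.repr_dim, branch.projector.out_dim]
    if branch.predictor is not None:
        dims.append(branch.predictor.out_dim)
    return dims


online, target = make_byol_branches()
print(output_dims(online), output_dims(target))  # [2048, 256, 256] [2048, 256]
```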

First, two augmentations t ~ T and t' ~ T' are sampled from two distributions of image augmentations and applied to the image x, producing two views v = t(x) and v' = t'(x).

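The online network then outputs a projection z_θ from the first view, and the target network outputs a target projection z'_ξ from the second view. Finally, the online network outputs a prediction q_θ(z_θ) of the target projection, and is trained to minimize the mean squared error between the l2-normalized prediction and the l2-normalized target projection.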
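The regression objective can be sketched as follows. Following the paper, the prediction and the target projection are l2-normalized before taking the squared error, which makes the loss equal to 2 - 2 * cos(q, z'); the paper also symmetrizes it by feeding v' to the online network and v to the target network and summing both terms. This plain-Python version (function names hypothetical) has no autodiff; in a framework such as PyTorch, the target projections would be detached so that no gradient reaches ξ.

```python
import math
from typing import List


def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def regression_loss(prediction: List[float], target_projection: List[float]) -> float:
    """||q_bar - z'_bar||^2, which equals 2 - 2 * cos(q, z'); ranges over [0, 4]."""
    q = l2_normalize(prediction)
    z = l2_normalize(target_projection)
    return sum((a - b) ** 2 for a, b in zip(q, z))


def symmetric_byol_loss(pred_from_v: List[float], target_proj_of_v_prime: List[float],
                        pred_from_v_prime: List[float], target_proj_of_v: List[float]) -> float:
    """Prediction from view v vs. target projection of v', plus the swapped term.

    The target projections must be treated as constants (stop-gradient).
    """
    return (regression_loss(pred_from_v, target_proj_of_v_prime)
            + regression_loss(pred_from_v_prime, target_proj_of_v))


print(regression_loss([3.0, 4.0], [4.0, 3.0]))  # about 0.08, i.e. 2 - 2 * 0.96
```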

Intuitions on BYOL Behavior

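Since BYOL has no explicit term to prevent collapse while it minimizes the loss with respect to θ, one might expect it to converge to a minimum of the loss with respect to both θ and ξ, for example a collapsed constant representation. However, the target parameters ξ are not updated in the direction of the gradient of the loss with respect to ξ; they are updated as a moving average of θ. BYOL's dynamics are therefore not gradient descent on a single loss over (θ, ξ), and in practice they do not converge to collapsed solutions.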
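The point that ξ does not follow the gradient of the loss can be illustrated with a toy update step on scalars. This is a hypothetical simplification: the online encoder is a single weight w, the predictor a single weight p, the target encoder a single weight w_t, and the loss is an unnormalized squared error. The online parameters take a gradient step first, and only then does the target move toward the updated online weight via the moving average, mirroring the update order described in the paper.

```python
from typing import Dict


def byol_step(params: Dict[str, float], x1: float, x2: float,
              lr: float = 0.1, tau: float = 0.99) -> Dict[str, float]:
    """One toy update. Loss = (p * w * x1 - w_t * x2) ** 2 (unnormalized for brevity)."""
    w, p, w_t = params["w"], params["p"], params["w_t"]
    err = p * w * x1 - w_t * x2  # the target term w_t * x2 is treated as a constant
    grad_w = 2.0 * err * p * x1
    grad_p = 2.0 * err * w * x1
    # 1) Gradient step on the online parameters theta = (w, p) only.
    w_new = w - lr * grad_w
    p_new = p - lr * grad_p
    # 2) The target follows the *updated* online encoder; it gets no gradient.
    w_t_new = tau * w_t + (1.0 - tau) * w_new
    return {"w": w_new, "p": p_new, "w_t": w_t_new}


next_params = byol_step({"w": 1.0, "p": 1.0, "w_t": 0.5}, x1=1.0, x2=1.0)
```

If w_t were instead trained by gradient descent on the same loss, it would receive the gradient -2 * err * x2 = -1.0 here and jump from 0.5 to 0.6; the moving average moves it only to 0.504.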

Implementation Details

Image Augmentations

- Uses the same set of augmentations as SimCLR: a random crop resized to 224x224, a random horizontal flip, color distortion, and so on. In addition, Gaussian blur and solarization are applied to the patches.
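A sketch of this augmentation setup as a configuration, assuming the per-view probabilities reported in the BYOL paper's appendix (worth double-checking against the paper): t ~ T and t' ~ T' share most operations, but Gaussian blur is always applied to the first view and only rarely to the second, while solarization is applied only to the second. The dictionary keys and the sampling helper are hypothetical; they only decide which operations fire and do not transform any image.

```python
import random
from typing import Dict, List

# "p" holds (probability for view t ~ T, probability for view t' ~ T').
BYOL_AUGMENTATIONS: Dict[str, dict] = {
    "random_resized_crop": {"size": 224, "p": (1.0, 1.0)},
    "horizontal_flip": {"p": (0.5, 0.5)},
    "color_jitter": {"brightness": 0.4, "contrast": 0.4,
                     "saturation": 0.2, "hue": 0.1, "p": (0.8, 0.8)},
    "grayscale": {"p": (0.2, 0.2)},
    "gaussian_blur": {"p": (1.0, 0.1)},
    "solarize": {"p": (0.0, 0.2)},
}


def sample_ops(view: int, rng: random.Random) -> List[str]:
    """Return, in order, the operations applied to view 0 (from T) or view 1 (from T')."""
    return [name for name, cfg in BYOL_AUGMENTATIONS.items()
            if rng.random() < cfg["p"][view]]


rng = random.Random(0)
first_view_ops = sample_ops(0, rng)
second_view_ops = sample_ops(1, rng)
```

Keeping the two distributions asymmetric means the two views of an image usually differ in blur and solarization, on top of the shared random crops and color distortion.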

Experiments

Conclusion

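This paper presents BYOL, a new self-supervised representation learning method. BYOL learns its representation by predicting previous versions of its own outputs, without using negative pairs. However, BYOL still relies on augmentations that are specific to images; automating the search for such augmentations would therefore be an important step toward generalizing BYOL to other modalities.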

References

https://arxiv.org/abs/2006.07733

Bootstrap your own latent: A new approach to self-supervised Learning


Attribution, NonCommercial
