[Week6] Object Detection [Day5]

*Object Detection

1.1 Fundamental image recognition tasks

Semantic < Instance < Panoptic
Beyond semantic segmentation, individual objects are separated even within the same class
Performing these tasks requires object detection

1.2 What is object detection?

Object detection = classification + bounding box localization

1.3 What are the applications of object detection?

Autonomous driving
Optical Character Recognition(OCR)

2. Two-stage detector

2.0 Traditional methods - hand-crafted techniques

Gradient-based detector (e.g., HOG)
Selective search
- Over-segmentation
- Iteratively merging similar regions
- Extracting candidate boxes from all remaining segmentations

2.1 R-CNN

region proposal (~2k)
warped region (reshape)
classifier : SVM
Very slow, because every region produced by the region proposal step is passed through the CNN

2.2 Fast R-CNN

Recycle a pre-computed feature for multiple object detection
Conv. feature map from the original image
ROI feature extraction from the feature map through ROI pooling
Class and box prediction for each ROI
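
The ROI pooling step above can be sketched in plain Python. This is a minimal illustration, not the Fast R-CNN implementation: the function name `roi_max_pool`, the 2x2 output size, and the toy 6x6 feature map are all assumptions made for the example. It shows how one feature map, computed once, is reused for every ROI and how each ROI is reduced to a fixed-size grid.

```python
def roi_max_pool(feature_map, roi, output_size=2):
    """Max-pool one ROI of a 2-D feature map into an output_size x output_size grid.

    feature_map: list of rows (H x W); roi: (x1, y1, x2, y2) in feature-map
    cells, end-exclusive. Assumes the ROI covers at least one cell.
    """
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    # Integer bin edges that split the ROI as evenly as possible.
    ys = [y1 + (i * h) // output_size for i in range(output_size + 1)]
    xs = [x1 + (j * w) // output_size for j in range(output_size + 1)]
    pooled = []
    for i in range(output_size):
        row = []
        for j in range(output_size):
            # Each bin keeps at least one cell, even for very small ROIs.
            y_hi = max(ys[i + 1], ys[i] + 1)
            x_hi = max(xs[j + 1], xs[j] + 1)
            row.append(max(feature_map[y][x]
                           for y in range(ys[i], y_hi)
                           for x in range(xs[j], x_hi)))
        pooled.append(row)
    return pooled

# One shared 6x6 "feature map", computed once and reused for every ROI.
feature_map = [[6 * r + c for c in range(6)] for r in range(6)]
rois = [(0, 0, 4, 4), (2, 1, 6, 6)]
pooled = [roi_max_pool(feature_map, roi) for roi in rois]
```

Both ROIs have different sizes (4x4 and 4x5 cells) but come out as 2x2 grids, which is what lets a single fully connected head handle every ROI.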

2.3 Faster R-CNN

End-to-End object detection by neural region proposal
IoU
Anchor boxes
Region Proposal Network (RPN)
- A single feature map is extracted from the image and fed to the RPN, which generates the region proposals
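
As a rough sketch of the anchor boxes the RPN scores, the snippet below places a fixed set of boxes at every feature-map cell. The function name `make_anchors`, the stride of 16, and the 3 scales x 3 aspect ratios are assumptions for illustration (they follow the commonly cited Faster R-CNN setup, not anything stated in these notes).

```python
import math

def make_anchors(feat_h, feat_w, stride,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) in image pixels for every feature-map cell."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this cell projected back to image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays close to scale ** 2.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

# A tiny 2x3 feature map gives 2 * 3 * 9 = 54 anchors.
anchors = make_anchors(feat_h=2, feat_w=3, stride=16)
```

The RPN then predicts, for each anchor, an objectness score and a box refinement.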
Non-Maximum Suppression (NMS)
- Step 1: Select the box with the highest objectness score
- Step 2: Compare the IoU of this box with the other boxes
- Step 3: Remove the bounding boxes with IoU ≥ 50%
- Step 4: Move to the box with the next highest objectness score
- Step 5: Repeat steps 2-4
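
The IoU measure and the NMS steps above can be sketched as follows. This is a minimal pure-Python version under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, the function names `iou` and `nms` are chosen for the example, and the 0.5 threshold mirrors the 50% in the steps.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)              # Steps 1 and 4: highest remaining score
        keep.append(best)
        # Steps 2-3: drop every remaining box that overlaps `best` too much.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (21, 21, 31, 31)]
scores = [0.9, 0.8, 0.7, 0.6]
kept = nms(boxes, scores)   # boxes 1 and 3 are suppressed by boxes 0 and 2
```

In practice NMS is usually run per class, so that overlapping boxes of different classes do not suppress each other.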
Summary of the R-CNN family

3. Single-stage detector

3.0 Comparison with two-stage detectors

One-stage vs. two-stage
No explicit ROI pooling

3.1 You Only Look Once (YOLO)

YOLO Architecture
- The last layer has 30 dimensions per grid cell (length: 5B + C, with B=2, C=20)
- SxS grid (S=7) -> the spatial resolution of the last CNN layer
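
To make the 5B + C arithmetic concrete, here is a small sketch of how one grid cell's 30 values could be split up. The memory layout (B consecutive `(x, y, w, h, confidence)` groups followed by C class probabilities) and the helper names are assumptions for illustration; actual implementations may order the values differently.

```python
S, B, C = 7, 2, 20             # grid size, boxes per cell, classes (from the notes)
cell_len = 5 * B + C           # 30 values per grid cell
output_len = S * S * cell_len  # 7 * 7 * 30 = 1470 values for the whole image

def split_cell(cell):
    """Split one 30-value cell vector into B boxes and C class probabilities.

    Assumed layout: B groups of (x, y, w, h, confidence), then C class probs.
    """
    boxes = [tuple(cell[5 * b: 5 * b + 5]) for b in range(B)]
    class_probs = list(cell[5 * B:])
    return boxes, class_probs

def best_detection(cell):
    """Return (box, class index, score), with score = confidence * class prob."""
    boxes, class_probs = split_cell(cell)
    best_class = max(range(C), key=lambda c: class_probs[c])
    best_box = max(boxes, key=lambda box: box[4])
    return best_box, best_class, best_box[4] * class_probs[best_class]

# A hand-made cell: box 0 is confident (0.9), class 7 has probability 0.8.
cell = [0.5, 0.5, 0.2, 0.3, 0.9,  0.4, 0.6, 0.1, 0.1, 0.2] + [0.0] * C
cell[5 * B + 7] = 0.8
box, cls, score = best_detection(cell)
```

Note that the class probabilities are shared by both boxes of a cell, which is why the per-cell length is 5B + C rather than B(5 + C).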
Performance

3.2 Single Shot MultiBox Detector (SSD)

YOLO is fast, but its localization accuracy is comparatively low
SSD therefore proposes a way to handle multi-scale objects better
SSD Architecture
Performance
- Although the input sizes differ, both mAP and FPS improve
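
A quick piece of arithmetic shows what "multi-scale" buys. The layer sizes below are the SSD300 configuration as reported in the SSD paper; they are not given in these notes, so treat them as an assumption of this sketch.

```python
# (feature-map side length, default boxes per location) for each prediction layer
ssd300_layers = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

# Every location of every layer predicts its own set of default boxes.
boxes_per_layer = [side * side * k for side, k in ssd300_layers]
total_boxes = sum(boxes_per_layer)   # 8732

# YOLO, by comparison, predicts B=2 boxes on a single 7x7 grid.
yolo_boxes = 7 * 7 * 2               # 98
```

The fine 38x38 map supplies most of the boxes and targets small objects, while the coarse 1x1 map covers objects that fill the image.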

4. Two-stage detector vs. one-stage detector

4.1 Focal loss

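Because a single-stage detector has no ROI pooling, a loss is computed over every region of the image, and each region contributes some gradient.

Background regions typically dominate while positive regions are comparatively few, so information from many unnecessary negative samples accumulates and causes a class imbalance problem.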

Class imbalance problem
- Focal loss puts a probability-dependent term in front of the cross-entropy loss
- Well-classified examples get a low loss, and poorly classified examples get a high loss
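
The probability term can be written as FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The sketch below compares it with plain cross-entropy; gamma = 2 is a commonly used value and is an assumption here, as are the function names.

```python
import math

def cross_entropy(p_t):
    """Standard cross-entropy for the true class probability p_t."""
    return -math.log(p_t)

def focal_loss(p_t, gamma=2.0):
    """Cross-entropy scaled by the modulating factor (1 - p_t) ** gamma."""
    return (1.0 - p_t) ** gamma * cross_entropy(p_t)

# Ratio focal / cross-entropy for an easy, a medium, and a hard example.
ratios = {p_t: focal_loss(p_t) / cross_entropy(p_t) for p_t in (0.9, 0.5, 0.1)}
```

With gamma = 2, an easy example (p_t = 0.9) keeps only about 1% of its cross-entropy loss, while a hard one (p_t = 0.1) keeps about 81%, so the many easy background samples no longer dominate the gradient. Setting gamma = 0 recovers plain cross-entropy.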

4.2 RetinaNet

RetinaNet is a one-stage network
Class subnet, box subnet
Performance

5. Detection with Transformer

*Transformer

Transformer has shown a great success in NLP
Why not extend the Transformer to computer vision tasks?
- ViT (Vision Transformer) by Google
- DeiT (Data-efficient image Transformer) by Facebook
- DETR (DEtection TRansformer) by Facebook

*DETR

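Input tokens are built by pairing each CNN feature with a multi-dimensional positional encoding of its location
These tokens are fed to the transformer as input
The encoded features are passed to the decoder (the decoder is queried)
Predictions (class, bbox) are made from the decoder output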
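
The first step, turning a CNN feature map into position-aware tokens, might look like the sketch below. It assumes a sinusoidal encoding that is added to the feature vector (one common way to pair the two); the function names and the toy 2x3x8 feature map are hypothetical, and the transformer itself is left out.

```python
import math

def sine_encoding(pos, dim):
    """1-D sinusoidal encoding of an integer position into `dim` values (dim even)."""
    enc = []
    for k in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * k / dim))
        enc += [math.sin(pos * freq), math.cos(pos * freq)]
    return enc

def make_tokens(feature_map):
    """feature_map: H x W x D nested lists (D divisible by 4) -> list of H*W tokens."""
    tokens = []
    for y, row in enumerate(feature_map):
        for x, feat in enumerate(row):
            d = len(feat)
            # Half of the channels encode the row index, the other half the column.
            pos = sine_encoding(y, d // 2) + sine_encoding(x, d // 2)
            tokens.append([f + p for f, p in zip(feat, pos)])
    return tokens

H, W, D = 2, 3, 8
feature_map = [[[0.0] * D for _ in range(W)] for _ in range(H)]  # all-zero toy features
tokens = make_tokens(feature_map)   # H * W = 6 tokens of length D
```

Even with identical (all-zero) features, every token ends up different, which is how the transformer can tell locations apart after the spatial grid has been flattened.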

*Further reading

Another trend in object detection
- A bounding box can be represented in other ways (top-left, bottom-right, centroid & size)
- Idea: Let’s detect objects using corresponding points!
- CornerNet/CenterNet will be covered in Lecture 7
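
The alternative box representations mentioned above carry the same information and convert into one another directly. A minimal sketch, with function names chosen for the example:

```python
def corners_to_center(box):
    """(x1, y1, x2, y2) -> (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

def center_to_corners(box):
    """(cx, cy, w, h) -> (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

corner_box = (10, 20, 50, 80)                  # top-left and bottom-right points
center_box = corners_to_center(corner_box)     # centroid (30, 50), size 40 x 60
round_trip = center_to_corners(center_box)     # back to the corner form
```

Point-based detectors exploit this: predicting two corner points, or a centre point plus a size, is enough to recover the full box.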


백chef
