
[Week6] Semantic segmentation [Day4]

*What is semantic segmentation?

  • Classify each pixel of an image into a category
  • Don't care about instances. Only care about semantic category
  • Applications
    • Medical images
    • Autonomous driving
    • Computational photography

 

 

*Semantic segmentation architectures

  • Fully convolutional networks
    • The first end-to-end architecture for semantic segmentation
    • Take an image of arbitrary size as input, and output a segmentation map whose size corresponds to the input
  • Fully connected vs Fully convolutional
    • Fully connected layer: Output a fixed dimensional vector and discard spatial coordinates
    • Fully convolutional layer: Output a classification map which has spatial coordinates
  • Interpreting fully connected layers as 1x1 convolutions
    • A fully connected layer classifies a single feature vector
    • A 1x1 convolution layer classifies every feature vector of the convolutional feature map (see the sketch after this list)
    • Limitation: the predicted score map has very low resolution
    • Why?
      • To obtain a large receptive field, several spatial pooling layers are deployed
    • Solution: enlarge the score map by upsampling!
  • Upsampling
    • The network reduces the input image to a smaller feature map
    • Upsample it back to the size of the input image
      • Unpooling
      • Transposed convolution
      • Upsample and convolution
    • Removing the pooling layers or using a larger stride increases the output resolution, but the receptive field shrinks and the overall context of the image can no longer be captured, so performance drops
    • Therefore, the receptive field is kept large and upsampling layers are added to bring the resolution back up
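
The sketch referenced above is a minimal PyTorch example (the class count and channel width are assumptions, not from the lecture) of reinterpreting a fully connected classifier as a 1x1 convolution: the same weights then classify every spatial location of the feature map, yielding a coarse score map instead of a single vector.

```python
import torch
import torch.nn as nn

num_classes = 21       # assumption: PASCAL VOC-style class count
feat_channels = 512    # assumption: channels of the last conv feature map

# Fully connected head: needs a fixed-size vector, discards spatial coordinates.
fc_head = nn.Linear(feat_channels, num_classes)

# Equivalent 1x1 convolution head: classifies every feature vector of the map.
conv_head = nn.Conv2d(feat_channels, num_classes, kernel_size=1)

# Copy the FC weights into the 1x1 conv so both heads compute identical scores.
with torch.no_grad():
    conv_head.weight.copy_(fc_head.weight.view(num_classes, feat_channels, 1, 1))
    conv_head.bias.copy_(fc_head.bias)

feat_map = torch.randn(1, feat_channels, 7, 7)   # low-resolution conv features
score_map = conv_head(feat_map)                  # (1, 21, 7, 7): coarse score map
vector_score = fc_head(feat_map[0, :, 0, 0])     # FC applied at one location
print(torch.allclose(score_map[0, :, 0, 0], vector_score, atol=1e-6))  # True
```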

 

*Transposed convolution

  • Transposed convolutions work by swapping the forward and backward passes of convolution
  • Checkerboard artifacts due to uneven overlaps
    • Neighboring kernel footprints overlap unevenly, as the sketch below shows
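
A minimal sketch, assuming PyTorch, of the uneven-overlap effect: with kernel_size=3 and stride=2 the transposed kernel footprints overlap at every other output position, so even a constant input produces an alternating (checkerboard-like) output.

```python
import torch
import torch.nn as nn

# kernel size not divisible by the stride -> uneven overlap of kernel footprints
up = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, bias=False)
nn.init.constant_(up.weight, 1.0)   # all-ones kernel makes the overlap count visible

x = torch.ones(1, 1, 4, 4)          # constant input
y = up(x)                           # output values count how many footprints overlap
print(y[0, 0])                      # alternating 1/2/4 pattern -> checkerboard artifact
```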

*Upsample convolution

  • Better approaches for upsampling
  • Avoid overlap issues in transposed convolution
  • Decompose into spatial upsampling and feature convolution
    • {Nearest-neighbor (NN), Bilinear} interpolation followed by convolution
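
A minimal sketch of this decomposition: fixed bilinear (or nearest-neighbor) interpolation handles the spatial upsampling and a regular convolution handles the feature transform, so no learned kernel overlaps unevenly. The module name and channel sizes are illustrative.

```python
import torch
import torch.nn as nn

class UpsampleConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        # interpolation first (no overlap issue), then feature convolution
        return self.conv(self.up(x))

x = torch.randn(1, 64, 16, 16)
print(UpsampleConv(64, 32)(x).shape)   # torch.Size([1, 32, 32, 32])
```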

 

 

*Back to FCN

  • Low layers have a small receptive field, so they are sensitive to fine details and small local changes
  • Conversely, high layers have a lower resolution but a large receptive field, so they capture the overall context and global tendencies
  • Semantic segmentation needs both of these
  • Therefore, a fusion of the two is required

 

  • Feature maps from intermediate layers are upsampled and used (see the sketch after this list)
  • Integrates activations from lower layers into prediction
  • Preserves higher spatial resolution
  • Captures lower-level semantics at the same time
  • Scores are computed from each of the upsampled predictions
  • Features of FCN
    • Faster
      • An end-to-end architecture that does not depend on other hand-crafted components
    • Accurate
      • Feature representation and classifiers are jointly optimized
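
The sketch referenced above illustrates FCN-style skip fusion with assumed VGG-like channel sizes (512-channel pool4 features, 4096-channel deepest features) and 21 classes; bilinear interpolation stands in for the learned upsampling of the actual FCN.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 21                                 # assumption
score_pool4 = nn.Conv2d(512, num_classes, 1)     # 1x1 score head on intermediate features
score_final = nn.Conv2d(4096, num_classes, 1)    # 1x1 score head on the deepest features

pool4_feat = torch.randn(1, 512, 28, 28)         # higher resolution, lower-level semantics
deep_feat = torch.randn(1, 4096, 14, 14)         # lower resolution, large receptive field

coarse = score_final(deep_feat)                  # (1, 21, 14, 14) coarse prediction
coarse_up = F.interpolate(coarse, scale_factor=2,
                          mode="bilinear", align_corners=False)   # (1, 21, 28, 28)
fused = coarse_up + score_pool4(pool4_feat)      # fuse with the intermediate-layer scores
out = F.interpolate(fused, scale_factor=16,
                    mode="bilinear", align_corners=False)         # back to input resolution
print(out.shape)                                 # torch.Size([1, 21, 448, 448])
```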

 

 

2.2 Hypercolumns for object segmentation (a similar but different approach)

  • Fully convolutional networks
    • CNN layers typically use the output of the last layer as feature representation
      • Too coarse spatially
  • Overall architecture
    • Very similar to FCN
    • Difference : Apply to each bounding box
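
A minimal sketch of the hypercolumn idea with assumed channel sizes: feature maps from several depths are upsampled to a common resolution and stacked along the channel dimension, so each pixel carries both fine low-level and coarse high-level features.

```python
import torch
import torch.nn.functional as F

# Assumed feature maps from three depths of a CNN for the same input region.
f_low = torch.randn(1, 64, 64, 64)     # fine, low-level features
f_mid = torch.randn(1, 256, 32, 32)    # intermediate features
f_high = torch.randn(1, 512, 16, 16)   # coarse, high-level features

# Upsample everything to the finest resolution and concatenate channel-wise:
# each pixel now holds a "hypercolumn" of features from all selected layers.
target = f_low.shape[-2:]
hypercolumn = torch.cat([
    f_low,
    F.interpolate(f_mid, size=target, mode="bilinear", align_corners=False),
    F.interpolate(f_high, size=target, mode="bilinear", align_corners=False),
], dim=1)
print(hypercolumn.shape)               # torch.Size([1, 832, 64, 64])
```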

 

2.3 U-Net

  • Built upon “fully convolutional networks”
    • Share the same FCN property
  • Predict a dense map by concatenating feature maps from contracting path
    • Similar to skip connections in FCN
  • Yield more precise segmentations
  • Overall architecture
    • Contracting path
      • Repeatedly applying 3x3 convolutions
      • Doubling the number of feature channels
      • Being used to capture holistic context
    • Expanding path

      • Repeatedly applying 2x2 up-convolutions (transposed convolutions)
      • Halving the number of feature channels
      • Concatenating the corresponding feature maps from the contracting path
    • Overall
      • Concatenation of feature maps provides localized information
        • Important localized information is passed directly to the later layers through these skip connections, so fine boundaries are segmented well (see the sketch after this list)
    • What if the spatial size of the feature map is an odd number?
      • An even number is required for input and feature sizes
      • Downsampling typically floors the size, e.g. 7x7 -> 3x3
      • Upsampling 3x3 then yields 6x6, so information is lost
      • Therefore, care must be taken so that odd-sized feature maps do not occur
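
The sketch referenced above shows one U-Net level under simplifying assumptions: padded 3x3 convolutions keep the spatial sizes even and aligned (the original paper uses unpadded convolutions with cropping), channels double on the way down, the 2x2 up-convolution halves them on the way up, and the skip feature map is concatenated before the decoder convolutions.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # two 3x3 convolutions; padding=1 keeps the spatial size unchanged
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

enc1 = double_conv(3, 64)                                   # contracting path: 3 -> 64
pool = nn.MaxPool2d(2)                                      # halves the spatial resolution
enc2 = double_conv(64, 128)                                 # doubling the channels: 64 -> 128
up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)   # 2x2 up-conv, halves the channels
dec1 = double_conv(128, 64)                                 # 128 = 64 (skip) + 64 (upsampled)

x = torch.randn(1, 3, 256, 256)          # even size: 256 -> 128 -> 256 without rounding loss
f1 = enc1(x)                             # (1, 64, 256, 256), kept for the skip connection
f2 = enc2(pool(f1))                      # (1, 128, 128, 128)
u1 = up(f2)                              # (1, 64, 256, 256)
out = dec1(torch.cat([f1, u1], dim=1))   # concatenate the skip features, then convolve
print(out.shape)                         # torch.Size([1, 64, 256, 256])
```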

 

2.4 DeepLab

 

• DeepLab v1 (2015): Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. ICLR 2015.

• DeepLab v2 (2017): DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. TPAMI 2017.

• DeepLab v3 (2017): Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017.

• DeepLab v3+ (2018): Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. ECCV 2018.

 

  • Conditional Random Fields (CRFs)
    • CRF post-processes a segmentation map to be refined to follow image boundaries
    • In the accompanying figure, the 1st row shows the score map (before softmax) and the 2nd row the belief map (after softmax)
  • Dilated convolution

    • Atrous convolution
    • Inflate the kernel by inserting spaces between the kernel elements (dilation factor)
    • Enable exponential expansion of the receptive field
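
A minimal PyTorch sketch of dilated (atrous) convolution: spacing out the kernel elements enlarges the receptive field of a 3x3 kernel to 5x5 without adding parameters or reducing the output resolution.

```python
import torch
import torch.nn as nn

conv_std = nn.Conv2d(64, 64, kernel_size=3, padding=1, dilation=1)     # 3x3 receptive field
conv_atrous = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)  # 5x5 receptive field

x = torch.randn(1, 64, 32, 32)
print(conv_std(x).shape, conv_atrous(x).shape)             # same spatial resolution
print(conv_std.weight.shape == conv_atrous.weight.shape)   # True: same parameter count
```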

 

  • Depthwise separable convolution (proposed by Howard et al.)
    • To reduce computation, the standard convolution is split into two steps: a depthwise convolution followed by a pointwise convolution
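
A minimal sketch comparing a standard convolution with its depthwise separable counterpart (a depthwise 3x3 convolution via groups=in_ch, followed by a pointwise 1x1 convolution); the channel sizes are illustrative.

```python
import torch
import torch.nn as nn

in_ch, out_ch = 64, 128

standard = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
separable = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch),  # depthwise: per-channel 3x3
    nn.Conv2d(in_ch, out_ch, kernel_size=1),                          # pointwise: 1x1 channel mixing
)

x = torch.randn(1, in_ch, 32, 32)
print(standard(x).shape, separable(x).shape)    # same output shape: (1, 128, 32, 32)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))        # 73856 vs 8960 parameters
```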

 

  • DeepLab v3+