

[Week2] DL Basic - CNN [Day3]

*Convolution

  • Continuous convolution


  • Discrete convolution


  • 2D image convolution
  • 2D convolution in action
  • RGB Image Convolution
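
As a quick sanity check on the discrete 2D convolution above, here is a minimal NumPy sketch (NumPy assumed; the kernel flip is omitted, so strictly this computes the cross-correlation that deep learning "convolution" layers actually use):

    import numpy as np

    def conv2d(image, kernel):
        """Valid-mode 2D convolution (no kernel flip, no padding)."""
        H, W = image.shape
        kH, kW = kernel.shape
        out = np.zeros((H - kH + 1, W - kW + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # multiply the kernel element-wise with the patch it covers and sum
                out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
        return out

    image = np.arange(25, dtype=float).reshape(5, 5)   # 5x5 input
    kernel = np.ones((3, 3)) / 9.0                     # 3x3 averaging filter
    print(conv2d(image, kernel).shape)                 # (3, 3): 5 - 3 + 1 = 3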

 

 

 

*Convolutional Neural Networks

  • Convolution and pooling layers : feature extraction
  • Fully connected layer : decision making (e.g., classification) 
    • The trend is to minimize the fully connected part (parameter dependency: with many parameters, training becomes harder and generalization performance drops)
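
A minimal sketch of this split, written in PyTorch (framework and layer sizes are my own choices for illustration): convolution and pooling layers extract features, and a single small fully connected layer makes the decision.

    import torch
    import torch.nn as nn

    class SimpleCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            # feature extraction: convolution + pooling layers
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),   # 32x32 -> 16x16
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),   # 16x16 -> 8x8
            )
            # decision making: keep the fully connected part as small as possible
            self.classifier = nn.Linear(32 * 8 * 8, num_classes)

        def forward(self, x):
            x = self.features(x)
            return self.classifier(x.flatten(1))

    model = SimpleCNN()
    print(sum(p.numel() for p in model.parameters()))   # most parameters still sit in the FC layer
    print(model(torch.randn(1, 3, 32, 32)).shape)        # torch.Size([1, 10])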

 

 

 

*Stride

 

 

 

 

 

*Padding
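
The stride and padding figures are not reproduced here, but their effect on the output size follows the usual formula output = floor((input + 2*padding - kernel) / stride) + 1; a small plain-Python check with illustrative numbers:

    def conv_output_size(input_size, kernel_size, stride=1, padding=0):
        # standard formula: floor((n + 2p - k) / s) + 1
        return (input_size + 2 * padding - kernel_size) // stride + 1

    print(conv_output_size(7, 3, stride=1, padding=0))  # 5: a 3x3 kernel shrinks the input
    print(conv_output_size(7, 3, stride=1, padding=1))  # 7: padding=1 keeps the size
    print(conv_output_size(7, 3, stride=2, padding=1))  # 4: stride=2 roughly halves it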

 

 

 

*Convolution Arithmetic

  • Padding (1),  Stride (1),  3 x 3 Kernel
  • What is the number of parameters of this model?
    • The answer is 3 x 3 x 128 x 64 = 73,728
  • Exercise
    number of parameters : 2048 * 2 x 1000 ≈ 4M
    • To reduce the number of parameters, the dense layers must be shrunk
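
These counts can be verified directly; a small PyTorch sketch (the 128 -> 64 channel sizes come from the example above, the dense-layer sizes from the exercise, biases ignored):

    import torch.nn as nn

    # 3x3 convolution, 128 input channels -> 64 output channels
    conv = nn.Conv2d(128, 64, kernel_size=3, stride=1, padding=1, bias=False)
    print(sum(p.numel() for p in conv.parameters()))   # 73728 = 3*3*128*64

    # dense layer from the exercise: 2048*2 inputs -> 1000 outputs
    fc = nn.Linear(2048 * 2, 1000, bias=False)
    print(sum(p.numel() for p in fc.parameters()))     # 4096000, roughly 4M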

 

 

 

 

*Why 1x1 Convolution?

  • Dimension reduction
  • To reduce the number of parameters while increasing the depth
  • e.g., bottleneck architecture
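
A small PyTorch sketch of the idea (channel counts chosen for illustration): a 1x1 convolution only changes the channel dimension, so it can squeeze the depth before an expensive 3x3 convolution while keeping the overall input/output shape the same.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 256, 28, 28)   # 256-channel feature map

    # direct 3x3 convolution on 256 channels
    direct = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)

    # bottleneck: 1x1 reduces channels, 3x3 works on the reduced depth, 1x1 restores them
    bottleneck = nn.Sequential(
        nn.Conv2d(256, 64, kernel_size=1, bias=False),   # dimension reduction
        nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
        nn.Conv2d(64, 256, kernel_size=1, bias=False),
    )

    count = lambda m: sum(p.numel() for p in m.parameters())
    print(count(direct))        # 589824
    print(count(bottleneck))    # 69632: far fewer parameters, deeper network
    print(bottleneck(x).shape)  # torch.Size([1, 256, 28, 28]): same shape as the input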

 

 

 

 

 

*Modern CNN

 

*AlexNet

  • Key ideas
    • ReLU activation
      1. Preserves properties of linear models
      2. Easy to optimize with gradient descent
      3. Good generalization
      4. Overcome the vanishing gradient problem
    • GPU implementation (2 GPUs)
    • Local response normalization, Overlapping pooling
    • Data augmentation
    • Dropout
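
The vanishing-gradient point can be seen numerically: the sigmoid's gradient shrinks toward zero as the input grows, while ReLU's gradient stays 1 for any positive input. A tiny PyTorch check (values chosen for illustration):

    import torch

    x = torch.tensor([0.5, 3.0, 8.0], requires_grad=True)

    # sigmoid: gradient sigma(x) * (1 - sigma(x)) vanishes for large inputs
    torch.sigmoid(x).sum().backward()
    print(x.grad)   # tensor([0.2350, 0.0452, 0.0003])

    x.grad = None
    # ReLU: gradient is exactly 1 for every positive input
    torch.relu(x).sum().backward()
    print(x.grad)   # tensor([1., 1., 1.])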

 

 

 

 

*VGGNet

  • Increasing depth with 3 x 3 convolution filters (with stride 1)
    • Receptive field : the size of the input region a unit sees, which grows as the kernel gets larger
    • Why 3 x 3 convolution? Two stacked 3 x 3 layers cover the same 5 x 5 receptive field with fewer parameters and an extra nonlinearity (see the sketch after this list)


  • 1 x 1 convolution for fully connected layers
  • Dropout (p=0.5)
  • VGG16, VGG19
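
The parameter argument behind 3 x 3 filters, checked in PyTorch (channel count chosen for illustration): two stacked 3 x 3 convolutions see the same 5 x 5 receptive field as a single 5 x 5 convolution, with fewer parameters.

    import torch.nn as nn

    C = 128   # number of channels, illustrative

    conv5 = nn.Conv2d(C, C, kernel_size=5, padding=2, bias=False)
    conv3x2 = nn.Sequential(
        nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
        nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
    )

    count = lambda m: sum(p.numel() for p in m.parameters())
    print(count(conv5))    # 409600 = 5*5*128*128
    print(count(conv3x2))  # 294912 = 2*3*3*128*128 -> fewer parameters, more nonlinearity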

 

 

 

*GoogLeNet

  • network-in-network (NiN) with inception blocks
  • Inception blocks

    • Mixing in 1 x 1 convolutions appropriately reduces the number of parameters through dimension reduction
      1 x 1 convolution enables roughly a 30% reduction in the number of parameters
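
A minimal inception-style block in PyTorch (channel sizes are illustrative, not the exact GoogLeNet numbers): 1x1 convolutions reduce the channel dimension before the expensive 3x3 and 5x5 branches, and the branch outputs are concatenated.

    import torch
    import torch.nn as nn

    class InceptionBlock(nn.Module):
        def __init__(self, in_ch):
            super().__init__()
            self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)                        # 1x1 branch
            self.b2 = nn.Sequential(nn.Conv2d(in_ch, 32, kernel_size=1),         # reduce, then 3x3
                                    nn.Conv2d(32, 64, kernel_size=3, padding=1))
            self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, kernel_size=1),         # reduce, then 5x5
                                    nn.Conv2d(16, 32, kernel_size=5, padding=2))
            self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),        # pool, then 1x1
                                    nn.Conv2d(in_ch, 32, kernel_size=1))

        def forward(self, x):
            # every branch keeps the spatial size; outputs are concatenated along channels
            return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

    block = InceptionBlock(192)
    print(block(torch.randn(1, 192, 28, 28)).shape)   # torch.Size([1, 192, 28, 28])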


 

 

*ResNet

  • Deeper neural networks are hard to train
    • It is not overfitting, but training does not improve just because the layers get deeper
  • Add an identity map (skip connection)

  • Bottleneck architecture
    • dimension reduction
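
A minimal residual block sketch in PyTorch (channel sizes illustrative): the input is added back onto the block's output through the identity skip connection.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualBlock(nn.Module):
        """Basic residual block: output = F(x) + x."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return F.relu(out + x)   # skip connection: add the identity map back

    block = ResidualBlock(64)
    print(block(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])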

 

*DenseNet

  • DenseNet uses concatenation instead of addition


    • Dense Block
      • concatenates the feature maps of all preceding layers
      • The number of channels increases geometrically
    • Transition Block
      • BatchNorm -> 1x1 Conv -> 2x2 AvgPooling
      • Dimension reduction
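
A minimal dense block plus transition block sketch in PyTorch (growth rate and channel counts are illustrative): each layer's output is concatenated with all preceding feature maps, and the transition block squeezes the channels back down.

    import torch
    import torch.nn as nn

    class DenseBlock(nn.Module):
        """Each layer sees the concatenation of all preceding feature maps."""
        def __init__(self, in_ch, growth=32, num_layers=3):
            super().__init__()
            self.layers = nn.ModuleList([
                nn.Conv2d(in_ch + i * growth, growth, kernel_size=3, padding=1)
                for i in range(num_layers)
            ])

        def forward(self, x):
            for layer in self.layers:
                x = torch.cat([x, layer(x)], dim=1)   # concatenate, don't add
            return x

    def transition(in_ch, out_ch):
        # BatchNorm -> 1x1 Conv (dimension reduction) -> 2x2 average pooling
        return nn.Sequential(nn.BatchNorm2d(in_ch),
                             nn.Conv2d(in_ch, out_ch, kernel_size=1),
                             nn.AvgPool2d(2))

    x = torch.randn(1, 64, 32, 32)
    out = DenseBlock(64)(x)                  # 64 -> 64 + 3*32 = 160 channels
    print(out.shape)                         # torch.Size([1, 160, 32, 32])
    print(transition(160, 80)(out).shape)    # torch.Size([1, 80, 16, 16])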

 

 

 

 

*Computer Vision Applications

 

*Semantic Segmentation

 

 

  • Fully Convolutional Network
    • Convolutionalization

      • Left : 4 x 4 x 16 x 10 = 2,560
      • Right : 4 x 4 x 16 x 10 = 2,560
    • Transforming fully connected layers into convolution layers enables a classification net to output a heat map
    • Deconvolution (conv transpose)
      • After convolutionalization the number of parameters stays the same, but the spatial dimensions shrink
      • Therefore, deconvolution is applied to grow the spatial dimensions back
    • Result
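
A small PyTorch sketch of convolutionalization and deconvolution (sizes taken from the 4 x 4 x 16 x 10 example above): the dense layer and the 4x4 convolution have the same 2,560 parameters, the convolution's output loses spatial size, and a transposed convolution grows it back.

    import torch
    import torch.nn as nn

    feat = torch.randn(1, 16, 4, 4)   # a 4x4 feature map with 16 channels

    # left: fully connected layer over the flattened feature map
    fc = nn.Linear(4 * 4 * 16, 10, bias=False)
    # right: the same computation as a 4x4 convolution (convolutionalization)
    conv = nn.Conv2d(16, 10, kernel_size=4, bias=False)

    count = lambda m: sum(p.numel() for p in m.parameters())
    print(count(fc), count(conv))    # 2560 2560: identical parameter counts
    print(conv(feat).shape)          # torch.Size([1, 10, 1, 1]): spatial dimensions shrink

    # deconvolution (transposed convolution) enlarges the spatial dimensions again
    deconv = nn.ConvTranspose2d(10, 10, kernel_size=4, stride=4)
    print(deconv(conv(feat)).shape)  # torch.Size([1, 10, 4, 4])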

 

 

*Detection

  • R-CNN

 

  • SPPNet
    • R-CNN has to pass every bbox in the image through the CNN separately
    • SPPNet runs the CNN only once


  • Fast R-CNN
    • Proposes bboxes via selective search
    • Runs the CNN once (same as SPPNet)
    • For each region, get a fixed length feature from ROI pooling (see the sketch below)
    • Two outputs : class and bounding-box regressor 
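
A small sketch of the ROI pooling step (torchvision assumed; coordinates and sizes are illustrative): regions of different sizes are pooled into the same fixed-size feature.

    import torch
    from torchvision.ops import roi_pool

    feature_map = torch.randn(1, 256, 50, 50)   # the CNN runs once over the whole image

    # two proposed regions as (batch_index, x1, y1, x2, y2) in feature-map coordinates
    rois = torch.tensor([[0., 5., 5., 20., 20.],
                         [0., 10., 15., 45., 40.]])

    # ROI pooling turns each differently sized region into a fixed 7x7 feature
    pooled = roi_pool(feature_map, rois, output_size=(7, 7))
    print(pooled.shape)   # torch.Size([2, 256, 7, 7])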


  • Faster R-CNN
    • Replaces the selective search step with a Region Proposal Network
    • Region Proposal Network



    • 9 : Three different region sizes (128, 256, 512) with three different ratios (1:1, 1:2, 2:1)
    • 4 : four bounding box regression parameters
    • 2 : box classification (whether to use it or not)
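
Putting those numbers together, at every spatial location the RPN predicts 9 anchors x (4 box-regression parameters + 2 objectness scores) = 54 values. A simplified single-head sketch in PyTorch (backbone channel count and feature-map size are illustrative):

    import torch
    import torch.nn as nn

    num_anchors = 9   # 3 sizes (128, 256, 512) x 3 ratios (1:1, 1:2, 2:1)

    rpn_head = nn.Sequential(
        nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(),
        # per location: 9 anchors * (4 box regression params + 2 objectness scores)
        nn.Conv2d(512, num_anchors * (4 + 2), kernel_size=1),
    )

    feat = torch.randn(1, 512, 38, 50)   # backbone feature map
    print(rpn_head(feat).shape)          # torch.Size([1, 54, 38, 50])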

 

 

  • YOLO
    • No explicit bounding box sampling (compared with Faster R-CNN) -> speed up
    • Given an image, YOLO divides it into an SxS grid
    • Each cell predicts B bounding boxes (B=5)
      • box refinement (x / y / w / h)
      • confidence (of objectness)
    • Each cell predicts C class probabilities
    • In total, it becomes a tensor with SxSx(B*5+C) size
      • SxS : Number of cells of the grid
      • B*5 : B bounding boxes with offsets(x,y,w,h) and confidence
      • C : Number of classes
    • Result
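
The output tensor size can be checked with concrete numbers; S=7 and C=20 are assumptions here (typical YOLO v1 / Pascal VOC values), while B=5 is taken from the notes above.

    S, B, C = 7, 5, 20   # S and C assumed; B = 5 from the notes

    # each cell: B boxes * (x, y, w, h, confidence) + C class probabilities
    per_cell = B * 5 + C
    print(per_cell)            # 45
    print((S, S, per_cell))    # (7, 7, 45): the full output tensor shape
    print(S * S * per_cell)    # 2205 values in total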