
Boostcamp AI Tech/[Week7] Computer Vision

[Week7] 3D Understanding [Day5]

1. Seeing the world in 3D perspective

 

1.1 Why is 3D important?

  • AI agents operate in the real world, which is a 3D space
  • 3D applications - AR/VR
  • 3D applications - 3D printing
  • 3D applications - Medical applications

1.2 The way we observe 3D

  • An image is a projection of the 3D world onto a 2D space
  • Triangulation - The way to obtain a 3D point from 2D images
    • If the same point is matched in two images and the relative positions of the cameras are known, its 3D position can be recovered
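
Triangulation as described above can be sketched with the linear (DLT) method. This is a minimal illustration, not something taken from the lecture: the intrinsics `K`, the two camera matrices `P1` and `P2`, and the helper names `project` and `triangulate` are all assumptions chosen for a toy setup with noise-free correspondences.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X (3,) to pixel coordinates with a 3x4 camera matrix P."""
    x_h = P @ np.append(X, 1.0)      # homogeneous image point
    return x_h[:2] / x_h[2]          # perspective divide


def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two views."""
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]


# Toy setup (assumed values): shared intrinsics, second camera shifted 1 unit along x.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, -0.2, 4.0])
x1, x2 = project(P1, X_true), project(P2, X_true)   # the "same point" in both images
X_est = triangulate(P1, P2, x1, x2)
```

With exact correspondences `X_est` matches `X_true` up to floating-point error. With noisy matches the same code returns a least-squares estimate, which is usually refined further in practice.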

 

1.3 3D data representation

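  • 3D data representation is not unique: the same shape can be stored as multi-view images, a voxel grid, a point cloud, a mesh, or an implicit function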
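
To make the point about non-unique representations concrete, the sketch below converts one representation (a point cloud) into another (a voxel occupancy grid). The function name `voxelize`, the grid resolution, and the sphere-shaped toy data are assumptions for illustration only.

```python
import numpy as np


def voxelize(points, resolution):
    """Convert an (N, 3) point cloud into a boolean occupancy grid of shape (R, R, R)."""
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    # Normalize into [0, 1] using the longest bounding-box side to keep the aspect ratio.
    scale = (hi - lo).max()
    normalized = (points - lo) / scale
    # Map to integer voxel indices; clip so points on the upper face stay inside the grid.
    idx = np.clip((normalized * resolution).astype(int), 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid


# Toy point cloud: points sampled on the surface of a unit sphere.
rng = np.random.default_rng(0)
pts = rng.normal(size=(2000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

grid = voxelize(pts, resolution=16)
```

The two forms trade off differently: the point cloud keeps exact coordinates but has no fixed grid structure, while the voxel grid is regular (convenient for 3D convolutions) but loses detail below the voxel size.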


1.4 3D datasets

  • ShapeNet
    • Large-scale synthetic objects (51,300 3D models in 55 categories)
  • PartNet (ShapeNetPart2019)
    • Fine-grained dataset, useful for segmentation (573,585 part instances in 26,671 3D models)


  • SceneNet
    • 5 million RGB-Depth synthetic indoor images


  • ScanNet
    • RGB-Depth dataset with 2.5 million views obtained from more than 1,500 scans


  • Outdoor 3D scene datasets (typically for autonomous vehicle applications)

 

2. 3D tasks

 

 

2.1 3D recognition

  • Various tasks for 3D data



  • 3D object recognition
    • Recognizing a 3D object, analogous to object recognition in 2D images



2.2 3D object detection

  • Detecting the locations of 3D objects, either in images or in 3D space
  • Useful for autonomous driving applications
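
A detected 3D location is commonly expressed as a 3D bounding box, and predictions are compared with ground truth by volume overlap. The sketch below computes 3D IoU for axis-aligned boxes. This is a simplification: detectors for driving scenes usually predict oriented boxes, and the `Box3D` type and helper names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box3D:
    """Axis-aligned 3D box given by its center and full side lengths (hypothetical type)."""
    cx: float
    cy: float
    cz: float
    dx: float
    dy: float
    dz: float

    def volume(self):
        return self.dx * self.dy * self.dz


def overlap_1d(c1, d1, c2, d2):
    """Length of the overlap between two intervals given by center and length."""
    lo = max(c1 - d1 / 2, c2 - d2 / 2)
    hi = min(c1 + d1 / 2, c2 + d2 / 2)
    return max(0.0, hi - lo)


def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = (overlap_1d(a.cx, a.dx, b.cx, b.dx)
             * overlap_1d(a.cy, a.dy, b.cy, b.dy)
             * overlap_1d(a.cz, a.dz, b.cz, b.dz))
    union = a.volume() + b.volume() - inter
    return inter / union if union > 0 else 0.0


pred = Box3D(0.0, 0.0, 0.0, 2.0, 2.0, 2.0)   # predicted box
gt = Box3D(1.0, 0.0, 0.0, 2.0, 2.0, 2.0)     # ground-truth box, shifted by 1 along x
score = iou_3d(pred, gt)
```

Here the boxes share half of their volume each, so the intersection is 4, the union is 12, and `score` is one third.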



2.3 3D semantic segmentation

  • Semantic segmentation of 3D data, such as neuroimaging


2.4 Conditional 3D generation

  • Mesh R-CNN
    • Input: a 2D image; output: 3D meshes of the detected objects
    • Can be implemented by modifying Mask R-CNN



  • Recap : Branches in Mask R-CNN
    • Mask R-CNN segments objects by predicting a "box", a "class", and a "mask" for each RoI
    • Branches infer each output from a shared feature corresponding to each RoI
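
The branch structure in this recap can be sketched as several heads reading the same per-RoI feature. The code below is a shape-level toy, not the Mask R-CNN implementation: the heads are random linear maps standing in for trained networks (the real mask branch is convolutional), and every size and name is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed): 4 RoIs, 256-d pooled features, 5 classes, 14x14 masks.
NUM_ROIS, FEAT_DIM, NUM_CLASSES, MASK_SIZE = 4, 256, 5, 14


def linear(in_dim, out_dim):
    """Return a randomly initialized linear layer as a function (stand-in for a trained head)."""
    w = rng.normal(scale=0.01, size=(in_dim, out_dim))
    b = np.zeros(out_dim)
    return lambda x: x @ w + b


box_head = linear(FEAT_DIM, 4)                        # box regression per RoI
class_head = linear(FEAT_DIM, NUM_CLASSES)            # class scores per RoI
mask_head = linear(FEAT_DIM, MASK_SIZE * MASK_SIZE)   # mask logits per RoI

# One pooled feature vector per RoI, shared by every branch.
roi_features = rng.normal(size=(NUM_ROIS, FEAT_DIM))

boxes = box_head(roi_features)
class_scores = class_head(roi_features)
masks = mask_head(roi_features).reshape(NUM_ROIS, MASK_SIZE, MASK_SIZE)
```

Because every head consumes `roi_features`, adding a new output type only requires attaching another head to the same shared feature.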



  • Mask R-CNN vs. Mesh R-CNN
    • Mesh R-CNN: a "3D branch" is added to Mask R-CNN
    • The 3D branch outputs a 3D mesh of an object
  • More complex 3D reconstruction models
    • Decomposing 3D object reconstruction into multiple sub-problems
    • Sub-problems: physically meaningful disentanglement (Surface normal, depth, silhouette, …)
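
The intermediate quantities named above are geometrically related, which is part of why they make meaningful sub-problems. As a rough illustration (not the method of any particular reconstruction model), the sketch below derives a silhouette and surface normals from a depth map. It assumes orthographic projection, unit pixel spacing, and a background marked by a fixed far depth; the function names and toy values are hypothetical.

```python
import numpy as np

BACKGROUND_DEPTH = 10.0  # assumed marker value for pixels with no object


def silhouette_from_depth(depth):
    """Binary object mask: pixels closer than the background depth."""
    return depth < BACKGROUND_DEPTH


def normals_from_depth(depth):
    """Estimate unit surface normals from an (H, W) depth map by finite differences."""
    dz_dy, dz_dx = np.gradient(depth)          # derivatives along rows (y) and columns (x)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)


# Toy depth map: a tilted plane (depth grows with x) inside a 4x4 square.
x = np.arange(8)
depth = np.full((8, 8), BACKGROUND_DEPTH)
depth[2:6, 2:6] = 2.0 + 0.5 * x[2:6]

silhouette = silhouette_from_depth(depth)
normals = normals_from_depth(depth)
interior_normal = normals[3, 3]   # a pixel whose neighbours all lie on the plane
```

Normals at the object boundary mix object and background depths, so only interior pixels such as `normals[3, 3]` are meaningful in this toy example.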