1. Seeing the world in 3D perspective
1.1 Why is 3D important?
- AI agents operate in the real world, which is a 3D space
- 3D applications - AR/VR
- 3D applications - 3D printing
- 3D applications - Medical applications
1.2 The way we observe 3D
- An image is a projection of the 3D world onto a 2D space
- Triangulation - recovering a 3D point from its 2D projections in multiple images
- Given the same point observed in two images and the relative positions of the two cameras, its 3D location (and hence the 3D shape) can be reconstructed
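Projection and triangulation can be sketched in the simplest setting. This is a minimal sketch under stated assumptions: a rectified stereo pair with a known focal length `f`, principal point `(cx, cy)` and horizontal baseline, and a pixel correspondence that is already known. Real systems handle general calibrated cameras. The function names `project` and `triangulate_rectified` are hypothetical.

```python
from typing import Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project(point: Point3D, f: float, cx: float, cy: float,
            baseline_x: float = 0.0) -> Pixel:
    """Pinhole projection of a 3D point (in the left-camera frame) to pixels.

    baseline_x is the camera's offset along x (assumption: 0 for the left
    camera, the stereo baseline for the right camera of a rectified pair).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    x_cam = x - baseline_x          # express x in this camera's own frame
    return (f * x_cam / z + cx, f * y / z + cy)


def triangulate_rectified(left: Pixel, right: Pixel, f: float, cx: float,
                          cy: float, baseline: float) -> Point3D:
    """Recover a 3D point from one pixel correspondence in a rectified stereo pair."""
    disparity = left[0] - right[0]  # horizontal shift between the two views
    if disparity <= 0:
        raise ValueError("expected a positive disparity")
    z = f * baseline / disparity    # depth from similar triangles
    x = (left[0] - cx) * z / f      # back-project through the left camera
    y = (left[1] - cy) * z / f
    return (x, y, z)


if __name__ == "__main__":
    f, cx, cy, baseline = 500.0, 320.0, 240.0, 0.1
    point = (0.2, -0.1, 2.0)
    left_px = project(point, f, cx, cy)
    right_px = project(point, f, cx, cy, baseline_x=baseline)
    print(left_px, right_px)
    print(triangulate_rectified(left_px, right_px, f, cx, cy, baseline))  # ≈ (0.2, -0.1, 2.0)
```

Each image alone loses depth (the 3D-to-2D projection). The second view's known offset turns the pixel disparity back into depth.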
1.3 3D data representation
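- 3D data representation is not unique: the same shape can be stored as multi-view images, a voxel grid, a point cloud, a polygon mesh, an implicit function, etc.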
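To make "not unique" concrete, the following illustrative sketch describes one shape three ways. The shape is a sphere, chosen only because it is easy to write down. It appears as an implicit function (signed distance), a voxel occupancy grid, and a point cloud. The function names, grid resolution and sampling density are hypothetical choices.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def sphere_sdf(p: Vec3, radius: float = 1.0) -> float:
    """Implicit representation: signed distance to a sphere centred at the origin
    (negative inside, zero on the surface, positive outside)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def to_voxels(radius: float = 1.0, res: int = 8) -> List[List[List[bool]]]:
    """Volumetric representation: occupancy of a res^3 grid covering [-radius, radius]^3."""
    step = 2.0 * radius / res
    centres = [-radius + (i + 0.5) * step for i in range(res)]
    # a voxel is occupied if the implicit function is <= 0 at its centre
    return [[[sphere_sdf((x, y, z), radius) <= 0.0 for z in centres]
             for y in centres] for x in centres]


def to_point_cloud(radius: float = 1.0, n_lat: int = 8, n_lon: int = 16) -> List[Vec3]:
    """Point-cloud representation: points sampled on the sphere's surface."""
    points = []
    for a in range(1, n_lat):            # polar angles, poles skipped
        theta = math.pi * a / n_lat
        for b in range(n_lon):           # azimuth angles
            phi = 2.0 * math.pi * b / n_lon
            points.append((radius * math.sin(theta) * math.cos(phi),
                           radius * math.sin(theta) * math.sin(phi),
                           radius * math.cos(theta)))
    return points


if __name__ == "__main__":
    grid = to_voxels()
    occupied = sum(v for plane in grid for row in plane for v in row)
    cloud = to_point_cloud()
    print("occupied voxels:", occupied, "| points:", len(cloud))
```

Each representation trades off differently. The implicit function is exact but must be queried, the voxel grid is regular but coarse, and the point cloud is compact but has no connectivity.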
1.4 3D datasets
- ShapeNet
- Large-scale synthetic objects (51,300 3D models in 55 categories)
- PartNet (ShapeNetPart2019)
- Fine-grained dataset, useful for segmentation (573,585 part instances in 26,671 3D models)
- SceneNet
- 5 million RGB-Depth synthetic indoor images
- ScanNet
- RGB-Depth dataset with 2.5 million views obtained from more than 1500 scans
- Outdoor 3D scene datasets (typically for autonomous vehicle applications)
2. 3D tasks
2.1 3D recognition
- Various tasks for 3D data
- 3D object recognition
- Recognizing (classifying) a 3D object, analogous to object recognition in 2D images
2.2 3D object detection
- Detecting the locations of objects in 3D (e.g., as 3D bounding boxes) from images or 3D data
- Useful for autonomous driving applications
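3D detectors output 3D boxes, and predictions are typically matched to ground truth by 3D intersection-over-union (IoU). The sketch below assumes axis-aligned boxes stored as min/max corners, a simplification. Driving benchmarks usually use boxes rotated about the vertical axis, which need a polygon intersection not shown here. `Box3D` and `iou_3d` are hypothetical names.

```python
from typing import NamedTuple


class Box3D(NamedTuple):
    """Axis-aligned 3D box given by its min and max corners (hypothetical format)."""
    x_min: float
    y_min: float
    z_min: float
    x_max: float
    y_max: float
    z_max: float

    def volume(self) -> float:
        return (max(0.0, self.x_max - self.x_min)
                * max(0.0, self.y_max - self.y_min)
                * max(0.0, self.z_max - self.z_min))


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Intersection-over-union of two axis-aligned 3D boxes."""
    # overlap length along each axis (zero if the boxes are disjoint on that axis)
    dx = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    dy = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    dz = max(0.0, min(a.z_max, b.z_max) - max(a.z_min, b.z_min))
    inter = dx * dy * dz
    union = a.volume() + b.volume() - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    pred = Box3D(0, 0, 0, 2, 2, 2)
    gt = Box3D(1, 1, 1, 3, 3, 3)
    print(iou_3d(pred, gt))  # overlap 1 / union 15
```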
2.3 3D semantic segmentation
- Semantic segmentation of 3D data: assigning a class label to each element (e.g., each voxel of a neuroimaging volume)
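The output of 3D semantic segmentation is one label per voxel (or per point). A common way to score it is per-class IoU averaged into mean IoU (mIoU). The sketch below assumes labels are small integers and volumes are nested lists. `flatten`, `per_class_iou` and `mean_iou` are hypothetical helpers, not a specific library's API.

```python
from typing import Dict, List, Sequence


def flatten(volume: Sequence[Sequence[Sequence[int]]]) -> List[int]:
    """Flatten a D x H x W label volume into one list of per-voxel labels."""
    return [label for plane in volume for row in plane for label in row]


def per_class_iou(pred: Sequence[int], target: Sequence[int],
                  num_classes: int) -> Dict[int, float]:
    """IoU for each class over flattened per-voxel labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = {}
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious[c] = inter / union
    return ious


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    ious = per_class_iou(pred, target, num_classes)
    return sum(ious.values()) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # toy 2x2x2 volumes with 3 classes
    target = flatten([[[0, 0], [0, 0]], [[1, 1], [2, 2]]])
    pred = flatten([[[0, 0], [0, 1]], [[1, 1], [2, 0]]])
    print(per_class_iou(pred, target, 3), mean_iou(pred, target, 3))
```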
2.4 Conditional 3D generation
- Mesh R-CNN
- Input: a 2D image; output: 3D meshes of the detected objects
- Can be implemented by modifying Mask R-CNN
- Recap: Branches in Mask R-CNN
- Mask R-CNN segments objects by predicting "box", "classes", and "mask"
- Each branch infers its output from a shared feature extracted for each RoI
- Mask R-CNN vs. Mesh R-CNN
- Mesh R-CNN: a "3D branch" is added to Mask R-CNN
- The 3D branch outputs a 3D mesh of an object
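The branch structure above can be sketched as plain Python: several heads read the same per-RoI feature, and Mesh R-CNN adds one more head that returns a mesh. This shows only the structure. Every head is a hypothetical placeholder function rather than the learned network used in the real models, and the toy feature layout `[x1, y1, x2, y2, s0, s1, s2, s3]` is an assumption.

```python
from typing import Callable, Dict, List, Tuple

Feature = List[float]   # toy shared RoI feature: [x1, y1, x2, y2, s0, s1, s2, s3]
Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


def box_head(feat: Feature) -> Tuple[float, ...]:
    # placeholder: read the 2D box (x1, y1, x2, y2) straight from the feature
    return tuple(feat[:4])


def class_head(feat: Feature) -> int:
    # placeholder: the class is the index of the highest score
    scores = feat[4:]
    return max(range(len(scores)), key=lambda i: scores[i])


def mask_head(feat: Feature) -> List[List[bool]]:
    # placeholder: a 2x2 binary mask obtained by thresholding the scores
    return [[s > 0.5 for s in feat[4:6]], [s > 0.5 for s in feat[6:8]]]


def mesh_head(feat: Feature) -> Tuple[List[Vertex], List[Face]]:
    # placeholder 3D branch: a tetrahedron (4 vertices, 4 triangles) scaled by the box width
    s = feat[2] - feat[0]
    vertices = [(0.0, 0.0, 0.0), (s, 0.0, 0.0), (0.0, s, 0.0), (0.0, 0.0, s)]
    faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
    return vertices, faces


class MaskRCNNHeads:
    """Toy Mask R-CNN RoI heads: every branch reads the same shared RoI feature."""

    def __init__(self) -> None:
        self.branches: Dict[str, Callable[[Feature], object]] = {
            "box": box_head, "class": class_head, "mask": mask_head}

    def __call__(self, roi_feature: Feature) -> Dict[str, object]:
        # each branch infers its own output from the shared feature
        return {name: head(roi_feature) for name, head in self.branches.items()}


class MeshRCNNHeads(MaskRCNNHeads):
    """Mask R-CNN heads plus one extra 3D branch that outputs a mesh."""

    def __init__(self) -> None:
        super().__init__()
        self.branches["mesh"] = mesh_head


if __name__ == "__main__":
    roi_feature = [10.0, 20.0, 30.0, 50.0, 0.1, 0.9, 0.7, 0.2]
    print(sorted(MaskRCNNHeads()(roi_feature)))   # ['box', 'class', 'mask']
    print(sorted(MeshRCNNHeads()(roi_feature)))   # ['box', 'class', 'mask', 'mesh']
```

Subclassing keeps the existing branches untouched. This mirrors the idea that Mesh R-CNN extends Mask R-CNN rather than replacing it.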
- More complex 3D reconstruction models
- Decomposing 3D object reconstruction into multiple sub-problems
- Sub-problems: physically meaningful, disentangled intermediate representations (surface normals, depth, silhouette, …)
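The intermediate quantities listed above are physically related. As an illustration, a silhouette and surface normals can both be derived from a depth map. The sketch assumes orthographic projection, unit pixel spacing, and `None` marking background pixels. In a learned reconstruction pipeline each quantity would instead be predicted by a network. `silhouette` and `surface_normals` are hypothetical names.

```python
import math
from typing import List, Optional, Tuple

DepthMap = List[List[Optional[float]]]   # None marks background pixels (assumption)
Normal = Tuple[float, float, float]


def silhouette(depth: DepthMap) -> List[List[bool]]:
    """Foreground mask: pixels where the object has a depth value."""
    return [[d is not None for d in row] for row in depth]


def surface_normals(depth: DepthMap) -> List[List[Optional[Normal]]]:
    """Per-pixel unit normals from central differences (orthographic, unit pixel spacing)."""
    h, w = len(depth), len(depth[0])
    normals: List[List[Optional[Normal]]] = [[None] * w for _ in range(h)]
    for v in range(1, h - 1):
        for u in range(1, w - 1):
            nbrs = (depth[v][u - 1], depth[v][u + 1], depth[v - 1][u], depth[v + 1][u])
            if depth[v][u] is None or any(d is None for d in nbrs):
                continue  # skip background and pixels on the silhouette boundary
            dz_du = (depth[v][u + 1] - depth[v][u - 1]) / 2.0
            dz_dv = (depth[v + 1][u] - depth[v - 1][u]) / 2.0
            n = (-dz_du, -dz_dv, 1.0)  # normal of the surface z = depth(u, v)
            norm = math.sqrt(sum(c * c for c in n))
            normals[v][u] = (n[0] / norm, n[1] / norm, n[2] / norm)
    return normals


if __name__ == "__main__":
    # a plane tilted along u, with one background pixel in the corner
    depth: DepthMap = [[2.0 + 0.5 * u for u in range(5)] for _ in range(5)]
    depth[0][0] = None
    print(silhouette(depth)[0])
    print(surface_normals(depth)[2][2])  # ≈ (-0.447, 0.0, 0.894)
```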