vault backup: 2025-02-03 07:04:13

2025-02-03 07:04:14 +01:00
parent e158386068
commit 7909836706
2019 changed files with 59 additions and 26816 deletions

@@ -1,138 +0,0 @@
---
title: Computer Vision
created_date: 2024-10-22
updated_date: 2024-10-22
aliases:
tags:
---
# Computer Vision
---
- [ ] 3d reconstruction
- [ ] camera calibration
- [ ] photogrammetry
- [ ] image segmentation
- [ ] facial recognition and eigenfaces
- [ ] image stitching
- [ ] feature recognition
- [ ] connection to [[LLM]]s and [[Multi Modal Models]]
- [ ] [[Convolutional Neural Networks]]
- [ ] [[Deep Learning]]
- [ ] [[Signal Processing]]
- [ ] Vision transformer (VT)
- [ ] Tactile feedback sensors through CV
- [ ] Structured-light 3D scanners
- [ ] thermal cameras
- [ ] radar imaging
- [ ] lidar scanners
- [ ] MRI
- [ ] Sonar
- [ ]
---
## Introduction
- Computer Vision acquires, processes, analyzes and understands digital images
- CV works with high-dimensional data and extracts useful information from it: it transforms visual information into descriptions of the world that make sense and can lead to appropriate decisions and actions.
- Many subdomains are known:
	- Object detection and recognition
	- Event detection
	- 3D pose estimation
	- Motion estimation
	- Image restoration
- Definition:
> Computer vision is a field of AI that enables computers to interpret, understand and analyze visual data from images or videos, simulating human vision. It involves tasks like object detection, image classification, and facial recognition, with applications in areas like autonomous vehicles and medical imaging.
### Distinctions
- [[Image Processing]] focuses on 2D images and how to transform one image into another, so both the input and the output of image processing are images. Image processing therefore neither interprets the image content nor requires assumptions about it.
- [[Machine Vision]] focuses on image-based automation of inspection, process control, and robot guidance in industrial applications. Image sensor technologies and [[control theory]] are often closely intertwined with machine vision, and the system often interacts with the world, e.g. the lighting can be altered.
- [[Imaging]] focuses primarily on producing images and sometimes also interpreting them. E.g. [[medical imaging]] focuses on producing medical images and detecting diseases through them.
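The distinction between image processing and computer vision can be made concrete with a minimal sketch. It assumes a grayscale image stored as nested Python lists, and the function names `threshold` and `contains_bright_object` are made up for this illustration: the first returns an image, the second returns a statement about the scene.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0-255 intensities


def threshold(image: Image, level: int) -> Image:
    """Image processing: image in, image out, no interpretation."""
    return [[255 if px >= level else 0 for px in row] for row in image]


def contains_bright_object(image: Image, level: int, min_pixels: int) -> bool:
    """Computer-vision flavour: image in, a decision about the content out."""
    binary = threshold(image, level)
    bright = sum(1 for row in binary for px in row if px == 255)
    return bright >= min_pixels


img = [
    [10, 20, 200],
    [15, 220, 210],
    [12, 18, 25],
]

binary = threshold(img, 128)
print(binary)                               # still an image
print(contains_bright_object(img, 128, 3))  # a description of the image
```

The second function already needs an assumption about the content (here: "bright pixels belong to the object"), which is exactly what pure image processing avoids.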
### Foundational Techniques
- Edge detection
- line labelling
- non-polyhedral and polyhedral modelling
- optical flow
- motion estimation
- [[Divide and Conquer]] strategies: run CV algorithms on interesting regions of interest (ROIs) instead of the entire image.
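Two of the techniques above, edge detection and restricting work to an ROI, can be sketched together. This is a minimal dependency-free illustration with plain Python lists, not how a real pipeline would do it (that would typically use NumPy or OpenCV). The Sobel operator is one common choice of edge detector, and the helper names `crop` and `sobel_magnitude` are assumptions for this example.

```python
import math
from typing import List

Image = List[List[float]]

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Cut a rectangular region of interest out of the image."""
    return [row[left:left + width] for row in image[top:top + height]]


def sobel_magnitude(image: Image) -> Image:
    """Gradient magnitude for interior pixels; border pixels stay 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    px = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * px
                    gy += SOBEL_Y[dy][dx] * px
            out[y][x] = math.hypot(gx, gy)
    return out


# Synthetic 6x6 image with a vertical step edge between columns 2 and 3.
image = [[0.0] * 3 + [100.0] * 3 for _ in range(6)]

# Only the 4x4 ROI is processed, not the full image.
roi = crop(image, top=1, left=1, height=4, width=4)
edges = sobel_magnitude(roi)
for row in edges:
    print(row)
```

The interior pixels next to the step get a large gradient magnitude, while the untouched border of the ROI stays at 0.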
### Applications and Tasks
- Automate inspection
- Identification tasks: e.g. species id
- Controlling processes: e.g. robot
- Detecting events: surveillance, counting, etc
- Monitoring: health, disease, state of object, color graduation, etc.
- modeling objects
- navigation
- organisation of information: indexing existing photos
- tracking of objects, surfaces, edges
- tactile feedback sensor: put a silicone dome with known elastic properties over a camera, with markers on the inside. When the silicone dome touches something, the markers move, and from that movement a model can calculate the forces and the interaction with the object.
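The tactile sensor idea can be sketched roughly as follows. This is a strong simplification and not a real sensor model: it assumes the markers are already detected and matched by index between the rest image and the current image, and that the dome behaves like a linear spring with a single made-up stiffness `STIFFNESS_N_PER_PX`. All names and numbers are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical calibration constant: newtons of shear force per pixel
# of mean marker displacement (assumed linear-elastic dome).
STIFFNESS_N_PER_PX = 0.5


def marker_displacements(rest: List[Point], current: List[Point]) -> List[Point]:
    """Per-marker displacement in pixels (markers matched by list index)."""
    return [(cx - rx, cy - ry) for (rx, ry), (cx, cy) in zip(rest, current)]


def estimate_shear_force(displacements: List[Point], stiffness: float) -> Point:
    """Linear-spring estimate: force = stiffness * mean displacement."""
    n = len(displacements)
    mean_dx = sum(dx for dx, _ in displacements) / n
    mean_dy = sum(dy for _, dy in displacements) / n
    return (stiffness * mean_dx, stiffness * mean_dy)


rest_markers = [(10.0, 10.0), (20.0, 10.0), (10.0, 20.0), (20.0, 20.0)]
# All markers shifted by 2 px in x, e.g. the dome is dragged sideways.
touch_markers = [(12.0, 10.0), (22.0, 10.0), (12.0, 20.0), (22.0, 20.0)]

disp = marker_displacements(rest_markers, touch_markers)
force = estimate_shear_force(disp, STIFFNESS_N_PER_PX)
print(disp)
print(force)
```

A real sensor would need marker detection and tracking in the camera image as well as a calibrated, probably non-linear, deformation model.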
---
## Recognition
- Object recognition: objects from predefined classes are identified, but individual instances of a class are not told apart.
- Identification: individual instances are recognized and tracked, e.g. two different people can be told apart.
- Detection: objects are found together with their location in the image, e.g. [[Obstacle Detection]] for robots.
[[Convolutional Neural Networks|CNN]]s are currently the state-of-the-art algorithms for object detection in images. They are nearly as good as humans (only very thin objects don't work well) and even better than humans in fine-grained subcategories (such as breeds of dogs or species of birds).
### Specialized Tasks based on recognition
- Content-based image retrieval: give me all images with multiple dogs in them
- Pose estimation: estimate the pose of an object relative to the camera: e.g. robot arm, human pose, obstacle, etc.
- [[Optical Character Recognition]]: identify characters in images. It is used by many phones and nowadays even by Obsidian. Reading QR codes is a similar task.
- [[Facial Recognition]]: matching of faces
- Emotion recognition
- Shape Recognition Technology (SRT)
- (Human) Activity Recognition
## Motion Analysis
Image sequences are used to estimate the velocity of objects, which makes it possible to track objects (or the camera itself).
- Egomotion: tracking the rigid 3D-motion of the camera
- Tracking: follow the movements of objects in the frames (humans, cars, obstacles)
- [[Optical Flow]]: determine how each point moves relative to the image plane. The apparent motion combines the movement of the observed point with the movement of the camera. It can be used, for example, for state estimation of a [[Drone]].
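Optical flow can be sketched with the Lucas-Kanade idea: assume brightness constancy, $I_x u + I_y v + I_t = 0$, and solve for one flow vector $(u, v)$ per window by least squares. The sketch below is a single-window toy version in plain Python, not a full implementation. The two frames are synthetic (a quadratic brightness pattern shifted by a known amount), and the names `render` and `lucas_kanade_window` are assumptions for this example.

```python
from typing import List, Tuple

Image = List[List[float]]


def render(shift_x: float, shift_y: float, size: int = 7) -> Image:
    """Synthetic frame: quadratic brightness pattern moved by (shift_x, shift_y)."""
    return [[(x - shift_x) ** 2 + (y - shift_y) ** 2 for x in range(size)]
            for y in range(size)]


def lucas_kanade_window(frame1: Image, frame2: Image) -> Tuple[float, float]:
    """Least-squares flow (u, v) for one window covering the whole frame."""
    h, w = len(frame1), len(frame1[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (frame1[y][x + 1] - frame1[y][x - 1]) / 2.0  # spatial gradient x
            iy = (frame1[y + 1][x] - frame1[y - 1][x]) / 2.0  # spatial gradient y
            it = frame2[y][x] - frame1[y][x]                  # temporal gradient
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("gradients too uniform to resolve the flow (aperture problem)")
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = [-sxt, -syt]
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


frame_a = render(0.0, 0.0)
frame_b = render(0.2, -0.1)  # true motion: 0.2 px right, 0.1 px up
u, v = lucas_kanade_window(frame_a, frame_b)
print(u, v)  # close to the true motion, not exact (linearization error)
```

The estimate does not say whether the pattern or the camera moved; it only describes the apparent motion in the image plane.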
## Others
- Scene reconstruction: the goal is to compute a 3D-Model of a scene from images.
- Image restoration:
---
## Courses
### Udacity
Udacity's course on [computer vision](https://www.udacity.com/course/computer-vision-nanodegree--nd891). 2-week free trial.
1. Image Representation and Classification: numeric representation of images, color masking, binary classification
2. Convolutional Filters and Edge Detection: frequency in images, image filters for detecting edges and shapes in images, using OpenCV for face detection
3. Types of Features & Image Segmentation: corner detector, k-means clustering for segmenting an image into unique parts
4. feature vectors: describe objects and images using feature vectors
5. CNN layers and feature visualization: define and train your own CNN for clothing recognition, use feature visualization techniques to see what a network has learned
6. Project: Facial Keypoint detection: create CNN for facial keypoint (eyes, mouth, nose, etc.) detection
7. Cloud Computing with AWS: train networks on Amazon's GPUs
8. Advanced CNN architectures: region based CNNs, Faster R-CNN --> fast localized object recognition in images
9. YOLO: multi object detection model
10. RNNs: incorporate memory into deep learning models using recurrent neural networks; how they learn from and generate ordered sequences of data
11. Long Short-Term Memory Networks (LSTMs): dive into architecture and benefits of preserving long term memory
12. Hyperparameters: what hyperparameters are used in deep learning?
13. Attention Mechanisms: Attention models: how do they work?
14. Image Captioning: combine CNN and RNN to build automatic image captioning model
15. Project: Image Captioning Model: predict captions for a given image: implement an effective RNN decoder for a CNN encoder
16. Motion: mathematical representation of motion, introduction of optical flow
17. Robot Localization: Bayesian filter, uncertainty in robot motion
18. Mini-Project: 2D Histogram filter: sense and move functions of a 2D histogram filter
19. Kalman Filters: intuition behind the Kalman filter, vehicle tracking algorithm, one-dimensional tracker implementation
20. State and Motion: represent the state of a car as a vector that can be modified using linear algebra
21. Matrices and Transformation of State: learn the matrix operations needed for multidimensional Kalman filters
22. SLAM: implement SLAM for an autonomous vehicle and create a map of landmarks
23. Vehicle Motion and Calculus
24. Project: Landmark Detection & Tracking: implement SLAM using probability, motion models and linalg
25. Apply Deep Learning Models: style transfer using pre-trained models that others have provided on GitHub
26. Feedforward and Backpropagation: introduction to neural networks feedforward pass and backpropagation
27. Training Neural Networks: techniques to improve training
28. Deep Learning with PyTorch: build deep learning models with PyTorch
29. Deep learning for Cancer detection: CNN detects skin cancer
30. Sentiment Analysis: CNN for sentiment analysis
31. Fully-convolutional neural networks: classify every pixel in an image
32. C++ programming: getting started
33. C++: vectors
34. C++: local compilation
35. C++: OOP
36. Python and C++ Speed
37. C++ Intro into Optimization
38. C++ Optimization Practice
39. Project: Optimize Histogram Filter
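The one-dimensional Kalman tracker from lesson 19 can be sketched as follows. This is not the course's own code, just the standard textbook form for Gaussian beliefs: a measurement update fuses two Gaussians, and a motion prediction shifts the mean and adds uncertainty. The function names and all numbers are assumptions for this example.

```python
from typing import Tuple


def update(mean: float, var: float, meas: float, meas_var: float) -> Tuple[float, float]:
    """Measurement update: fuse the prior belief with a noisy measurement."""
    new_mean = (meas_var * mean + var * meas) / (var + meas_var)
    new_var = (var * meas_var) / (var + meas_var)
    return new_mean, new_var


def predict(mean: float, var: float, motion: float, motion_var: float) -> Tuple[float, float]:
    """Motion prediction: shift the belief and grow its uncertainty."""
    return mean + motion, var + motion_var


# Very uncertain initial belief about the position.
mean, var = 0.0, 1000.0

measurements = [5.0, 6.0, 7.0]  # noisy position readings
motions = [1.0, 1.0, 1.0]       # commanded movement per step
MEAS_VAR = 4.0                  # assumed sensor noise variance
MOTION_VAR = 2.0                # assumed motion noise variance

history = []
for z, m in zip(measurements, motions):
    mean, var = update(mean, var, z, MEAS_VAR)
    mean, var = predict(mean, var, m, MOTION_VAR)
    history.append((mean, var))
    print(f"position estimate {mean:.3f}, variance {var:.3f}")
```

After the first measurement the variance collapses from 1000 to roughly the sensor variance, which shows how quickly a single reading dominates a vague prior.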