Claudio Fritsche ad227f2d69 vault backup: 2025-02-03 08:29:48


title	created_date	updated_date	aliases	tags
Computer Vision	2024-10-22	2024-10-22

Computer Vision

3d reconstruction
camera calibration
photogrammetry
image segmentation
facial recognition and eigenfaces
image stitching
feature recognition
connection to LLMs and Multi Modal Models
Convolutional Neural Networks
Deep Learning
Signal Processing
Vision Transformer (ViT)
Tactile feedback sensors through CV
Structured-light 3D scanners
thermal cameras
radar imaging
lidar scanners
MRI
Sonar

Introduction

Computer Vision acquires, processes, analyzes and understands digital images.
CV works with high-dimensional data and extracts useful information from it: it transforms visual information into descriptions of the world that make sense and can lead to appropriate decisions and actions.
Many subdomains are known:
- Object detection and recognition
- Event detection
- 3D pose estimation
- motion estimation
- image restoration
Definition:

Computer vision is a field of AI that enables computers to interpret, understand and analyze visual data from images or videos, simulating human vision. It involves tasks like object detection, image classification, and facial recognition, with applications in areas like autonomous vehicles and medical imaging.

Distinctions

Image Processing focuses on 2D images and on transforming one image into another, so both its input and its output are images. It neither interprets the image content nor requires assumptions about it.
Machine Vision focuses on image-based automation of inspection, process control and robot guidance in industrial applications. Image sensor technology and control theory are often closely intertwined with machine vision, and the system often interacts with the world, e.g. by altering the lighting.
Imaging focuses primarily on producing images and sometimes also on interpreting them. E.g. medical imaging focuses on producing medical images and detecting diseases through them.
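
The difference between image processing (image in, image out) and computer vision (image in, description out) can be illustrated with a minimal pure-Python sketch. The function names `threshold` and `describe`, the tiny grayscale image and the cutoff value are assumptions made up for illustration only.

```python
def threshold(image, cutoff):
    """Image processing step: takes an image and returns another image (a binary mask)."""
    return [[1 if pixel >= cutoff else 0 for pixel in row] for row in image]


def describe(mask):
    """Vision-style step: takes an image and returns a description of its content."""
    bright = sum(sum(row) for row in mask)
    return {"bright_pixels": bright, "object_present": bright > 0}


# Hypothetical 3x3 grayscale image with a bright blob in the upper right corner.
image = [
    [10, 12, 200],
    [11, 220, 210],
    [9, 10, 12],
]

mask = threshold(image, 128)   # still an image
summary = describe(mask)       # no longer an image, but a statement about the scene
```

The first function never needs to know what the image shows, while the second one only makes sense under an assumption about the content (here: "bright pixels belong to an object").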

Foundational Techniques

Edge detection
line labelling
non-polyhedral and polyhedral modelling
optical flow
motion estimation
Divide and conquer strategies: run CV algorithms on an interesting region of interest (ROI) instead of the entire image.
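
As a rough sketch of two items from this list, the following pure-Python example crops a region of interest and runs a horizontal Sobel filter (one common edge detector, chosen here as an assumption) only on that crop. The helper names `crop` and `sobel_x` and the synthetic image are hypothetical; a real pipeline would more likely use a library such as OpenCV.

```python
def crop(image, top, left, height, width):
    """Return the region of interest as a new, smaller image."""
    return [row[left:left + width] for row in image[top:top + height]]


def sobel_x(image):
    """Horizontal Sobel response for all interior pixels (no padding)."""
    kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    height, width = len(image), len(image[0])
    result = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * image[y + ky - 1][x + kx - 1]
            row.append(acc)
        result.append(row)
    return result


# Synthetic 6x6 image: dark left half, bright right half (a vertical edge).
image = [[0, 0, 0, 100, 100, 100] for _ in range(6)]

roi = crop(image, top=1, left=1, height=4, width=4)   # contains the edge
edges = sobel_x(roi)

flat = crop(image, top=0, left=0, height=3, width=3)  # uniform dark area
no_edges = sobel_x(flat)
```

Processing only the 4x4 crop touches far fewer pixels than the full image, which is the point of the ROI strategy; the edge response is large inside the crop that contains the brightness step and zero in the uniform crop.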

Applications and Tasks

Automate inspection
Identification tasks: e.g. species id
Controlling processes: e.g. robot
Detecting events: surveillance, counting, etc
Monitoring: health, disease, state of object, color graduation, etc.
modeling objects
navigation
organisation of information: indexing existing photos
tracking of objects, surfaces, edges
tactile feedback sensor: put a silicone dome with known elastic properties over a camera. On the inside are markers. When the silicone dome touches something, the markers move, and a model can calculate the forces and the interaction with the object from their displacement.
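
The tactile sensor idea can be sketched in a few lines, assuming a deliberately simple linear (Hooke-like) model in which the shear force is proportional to the mean marker displacement. The stiffness value, the marker coordinates and the function names are hypothetical; a real sensor would need a calibrated or learned model of the silicone.

```python
def marker_displacements(rest, pressed):
    """Per-marker displacement (dx, dy) in pixels between two camera frames."""
    return [(px - rx, py - ry) for (rx, ry), (px, py) in zip(rest, pressed)]


def estimate_shear_force(displacements, stiffness_n_per_px):
    """Assumed linear model: force = stiffness * mean marker displacement."""
    count = len(displacements)
    mean_dx = sum(dx for dx, _ in displacements) / count
    mean_dy = sum(dy for _, dy in displacements) / count
    return (stiffness_n_per_px * mean_dx, stiffness_n_per_px * mean_dy)


# Hypothetical marker positions (pixels) before and during contact.
rest = [(10, 10), (20, 10), (10, 20), (20, 20)]
pressed = [(12, 10), (22, 10), (12, 20), (22, 20)]

displacements = marker_displacements(rest, pressed)
force = estimate_shear_force(displacements, stiffness_n_per_px=0.5)  # assumed stiffness
```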

Recognition

Object recognition: objects of predefined classes are identified, but individual instances of a class are not told apart.
Identification: specific objects are detected and individually tracked, so two different people can be differentiated.
Detection: objects are detected together with their location, e.g. obstacle detection for robots.

Convolutional Neural Networks (CNNs) are among the state-of-the-art algorithms for object detection in images. They are nearly as good as humans (only very thin objects don't work well) and even better than humans in fine-grained subcategories (such as breeds of dogs or species of birds).

Specialized Tasks based on recognition

Content-based image retrieval: give me all images with multiple dogs in them
Pose estimation: estimate the pose of an object relative to the camera: e.g. robot arm, human pose, obstacle, etc.
Optical Character Recognition: identify characters in images. It is used by many phones and nowadays even by Obsidian. Reading QR codes is a similar task.
Facial Recognition: matching of faces
Emotion recognition
Shape Recognition Technology (SRT)
(Human) Activity Recognition
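
The retrieval query "give me all images with multiple dogs in them" can be sketched as a filter over the output of an object recognizer. The `detections` dictionary below is made-up example data standing in for that output, and `images_with_at_least` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical recognizer output: image name -> labels of the detected objects.
detections = {
    "park.jpg": ["dog", "dog", "person"],
    "sofa.jpg": ["cat"],
    "beach.jpg": ["dog"],
    "kennel.jpg": ["dog", "dog", "dog"],
}


def images_with_at_least(detections, label, minimum):
    """Return the names of all images containing at least `minimum` objects of `label`."""
    return sorted(
        name
        for name, labels in detections.items()
        if Counter(labels)[label] >= minimum
    )


multi_dog_images = images_with_at_least(detections, "dog", 2)
```

The query works on recognized content rather than on file names or manual tags, which is what distinguishes content-based retrieval from ordinary metadata search.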

Motion Analysis

Using image sequences to estimate the velocity of an object makes it possible to track objects (or the camera itself).

Egomotion: tracking the rigid 3D-motion of the camera
Tracking: follow the movements of objects in the frames (humans, cars, obstacles)
Optical Flow: determine how each point moves relative to the image plane. The apparent motion combines the movement of the observed point and the movement of the camera. It can be used, for example, for state estimation of a drone.
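
A minimal sketch of the velocity estimate mentioned at the start of this section, assuming the object has already been located in each frame (the `track` positions and the frame rate are made-up values). The velocity is a simple finite difference between consecutive frames, expressed in pixels per second.

```python
def velocities(track, fps):
    """Finite-difference velocity (pixels/second) between consecutive frames."""
    return [
        ((x1 - x0) * fps, (y1 - y0) * fps)
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]


# Hypothetical object centre (x, y) in three consecutive frames of a 25 fps video.
track = [(100, 50), (104, 50), (108, 53)]
speeds = velocities(track, fps=25)
```

These values describe motion in the image plane only. If the camera itself moves, its motion is mixed into the measured displacement, which is why egomotion has to be estimated separately.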

Others

Scene reconstruction: the goal is to compute a 3D-Model of a scene from images.
Image restoration:

Courses

Udacity

The Udacity course about computer vision. 2-week free trial.

Image Representation and Classification: numeric representation of images, color masking, binary classification
Convolutional Filters and Edge Detection: frequency in images, image filters for detecting edges and shapes in images, use OpenCV for face detection
Types of Features & Image Segmentation: corner detector, k-means clustering for segmenting an image into unique parts
feature vectors: describe objects and images using feature vectors
CNN layers and feature visualization: define and train your own CNN for clothing recognition, use feature visualization techniques to see what a network has learned
Project: Facial Keypoint detection: create CNN for facial keypoint (eyes, mouth, nose, etc.) detection
Cloud Computing with AWS: train networks on Amazon's GPUs
Advanced CNN architectures: region based CNNs, Faster R-CNN --> fast localized object recognition in images
YOLO: multi object detection model
RNNs: incorporate memory into deep learning models using recurrent neural networks. How do they learn from and generate ordered sequences of data?
Long Short-Term Memory Networks (LSTMs): dive into architecture and benefits of preserving long term memory
Hyperparameters: what hyperparameters are used in deep learning?
Attention Mechanisms: Attention models: how do they work?
Image Captioning: combine CNN and RNN to build automatic image captioning model
Project: Image Captioning Model: predict captions for a given image: implement an effective RNN decoder for a CNN encoder
Motion: mathematical representation of motion, introduction of optical flow
Robot Localization: Bayesian filter, uncertainty in robot motion
Mini-Project: 2D Histogram filter: write sense and move functions for a 2D histogram filter
Kalman Filters: intuition behind the Kalman filter, vehicle tracking algorithm, one-dimensional tracker implementation
State and Motion: represent the state of a car in a vector that can be modified using linear algebra
Matrices and Transformation of State: LinAlg: learn matrix operations for multidimensional Kalman Filters
SLAM: implement SLAM for an autonomous vehicle and create a map of landmarks
Vehicle Motion and Calculus
Project: Landmark Detection & Tracking: implement SLAM using probability, motion models and linalg
Apply Deep Learning Models: style transfer using pre-trained models that others have provided on GitHub
Feedforward and Backpropagation: introduction to the neural network feedforward pass and backpropagation
Training Neural Networks: techniques to improve training
Deep Learning with PyTorch: build deep learning models with PyTorch
Deep learning for Cancer detection: CNN detects skin cancer
Sentiment Analysis: CNN for sentiment analysis
Fully-convolutional neural networks: classify every pixel in an image
C++ programming: getting started
C++: vectors
C++: local compilation
C++: OOP
Python and C++ Speed
C++ Intro into Optimization
C++ Optimization Practice
Project: Optimize Histogram Filter
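
To give an impression of the "one-dimensional tracker implementation" listed under Kalman Filters above, here is a minimal sketch of a 1D Kalman filter with Gaussian beliefs. It is not taken from the course material; the function names, noise variances and measurements are assumptions chosen for illustration.

```python
def update(mean, var, measurement, measurement_var):
    """Measurement step: fuse the current belief with a noisy measurement."""
    new_mean = (measurement_var * mean + var * measurement) / (var + measurement_var)
    new_var = (var * measurement_var) / (var + measurement_var)
    return new_mean, new_var


def predict(mean, var, motion, motion_var):
    """Motion step: shift the belief by the motion and add its uncertainty."""
    return mean + motion, var + motion_var


def track(measurements, motions, mean, var, measurement_var, motion_var):
    """Alternate update and predict for each (measurement, motion) pair."""
    for measurement, motion in zip(measurements, motions):
        mean, var = update(mean, var, measurement, measurement_var)
        mean, var = predict(mean, var, motion, motion_var)
    return mean, var


# Hypothetical vehicle moving about 1 unit per step, starting from a very uncertain belief.
final_mean, final_var = track(
    measurements=[5.0, 6.0, 7.0],
    motions=[1.0, 1.0, 1.0],
    mean=0.0,
    var=1000.0,
    measurement_var=4.0,
    motion_var=2.0,
)
```

Each measurement shrinks the variance and each motion step grows it again, so the filter settles on a belief that is far more certain than the initial guess and sits close to the position expected after the last move.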
