title: Computer Vision
created_date: 2024-10-22
updated_date: 2024-10-22
aliases:
tags:

Computer Vision



Introduction

  • Computer Vision acquires, processes, analyzes and understands digital images
  • CV works with high-dimensional data and extracts useful information from it: it transforms visual information into descriptions of the world that make sense and can lead to appropriate decisions and actions.
  • Many subdomains are known
    • Object detection and recognition
    • Event detection
    • 3D pose estimation
    • motion estimation
    • image restoration
  • Definition:

Computer vision is a field of AI that enables computers to interpret, understand and analyze visual data from images or videos, simulating human vision. It involves tasks like object detection, image classification, and facial recognition, with applications in areas like autonomous vehicles and medical imaging.

Distinctions

  • Image Processing focuses on 2D images and how to transform one image into another, so both the input and the output of image processing are images. Image processing therefore neither interprets the image content nor requires assumptions about it.
  • Machine Vision focuses on image-based automation of inspection, process control, and robot guidance in industrial applications. Image sensor technologies and control theory are closely intertwined with machine vision, and the system often interacts with the world, e.g. by altering the lighting.
  • Imaging focuses primarily on producing images and sometimes also interpreting them. E.g. medical imaging focuses on producing medical images and detecting diseases through them.
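The image-in, image-out nature of image processing can be illustrated with a minimal sketch. It assumes a grayscale image stored as a nested list of 0-255 integers (a representation chosen here only to avoid dependencies), and the function name `threshold` and the cutoff value are illustrative, not taken from any library.

```python
def threshold(image, cutoff=128):
    """Map a grayscale image (rows of 0-255 ints) to a binary image of the same size."""
    return [[255 if pixel >= cutoff else 0 for pixel in row] for row in image]


# Tiny hypothetical 2x3 grayscale image.
gray = [
    [10, 200, 30],
    [140, 90, 250],
]
binary = threshold(gray)
print(binary)  # [[0, 255, 0], [255, 0, 255]]
```

The function never asks what the image shows. It only maps pixel values to pixel values, which is the distinction from computer vision drawn above.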

Foundational Techniques

  • Edge detection
  • line labelling
  • non-polyhedral and polyhedral modelling
  • optical flow
  • motion estimation
  • Divide and Conquer strategies: run CV algorithms on an interesting sub-region (region of interest, ROI) instead of the entire image.
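Two of the techniques above, edge detection and restricting work to an ROI, can be combined in a small sketch. It assumes a grayscale image as a nested list and uses the Sobel operator as one possible edge detector. The helper names `crop` and `sobel_magnitude` and the ROI coordinates are hypothetical. In practice a library such as OpenCV would likely be used instead.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def crop(image, top, left, height, width):
    """Return the region of interest as a new nested list."""
    return [row[left:left + width] for row in image[top:top + height]]


def sobel_magnitude(image):
    """Gradient magnitude for interior pixels (output is 2 smaller in each dimension)."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            gx = gy = 0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            out_row.append(math.hypot(gx, gy))
        out.append(out_row)
    return out


# Synthetic 6x8 image: dark left half, bright right half (vertical edge in the middle).
image = [[0] * 4 + [255] * 4 for _ in range(6)]

# Only process a 5x5 window around the expected edge instead of the whole image.
roi = crop(image, top=0, left=2, height=5, width=5)
edges = sobel_magnitude(roi)
for row in edges:
    print(row)
```

The response is large in the two columns next to the brightness step and zero in the uniform bright area, and only 25 of the 48 pixels had to be touched.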

Applications and Tasks

  • Automate inspection
  • Identification tasks: e.g. species id
  • Controlling processes: e.g. robot
  • Detecting events: surveillance, counting, etc
  • Monitoring: health, disease, state of object, color graduation, etc.
  • modeling objects
  • navigation
  • organisation of information: indexing existing photos
  • tracking of objects, surfaces, edges
  • tactile feedback sensor: put a silicone dome with known elastic properties over a camera, with markers on the inside. When the silicone dome touches something, the markers move, and a model can calculate forces and the interaction with the object from that movement.
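The tactile sensor idea can be sketched under strong simplifying assumptions: marker positions have already been extracted from a rest frame and a contact frame, and the dome behaves like a linear spring, so force is proportional to the mean marker displacement. The stiffness value, its unit (newtons per pixel), and both function names are hypothetical. A real sensor would need a calibrated or learned model.

```python
def marker_displacements(rest, touched):
    """Per-marker (dx, dy) displacement between the rest frame and the contact frame."""
    return [(tx - rx, ty - ry) for (rx, ry), (tx, ty) in zip(rest, touched)]


def estimate_shear_force(displacements, stiffness):
    """Assumed linear-elastic model: force = stiffness * mean displacement."""
    n = len(displacements)
    mean_dx = sum(dx for dx, _ in displacements) / n
    mean_dy = sum(dy for _, dy in displacements) / n
    return stiffness * mean_dx, stiffness * mean_dy


# Hypothetical marker positions in pixels: every marker shifts 1 px to the right on contact.
rest = [(10, 10), (20, 10), (10, 20), (20, 20)]
touched = [(11, 10), (21, 10), (11, 20), (21, 20)]

shifts = marker_displacements(rest, touched)
force = estimate_shear_force(shifts, stiffness=0.5)  # 0.5 N per pixel is an assumed value
print(force)  # (0.5, 0.0)
```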

Recognition

  • Object recognition: predefined object classes can be identified, but individual instances of a class are not differentiated.
  • Identification: specific objects are detected and individually tracked, so that e.g. two different people can be differentiated.
  • Detection: object recognition together with location, e.g. obstacle detection for robots.

Convolutional Neural Networks (CNNs) are currently the state-of-the-art algorithms for object detection in images. They are nearly as good as humans (only very thin objects don't work well), and even better than humans in fine-grained subcategories (such as breeds of dogs or species of birds).
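The operation that gives CNNs their name can be sketched in a few lines. This is a minimal, dependency-free illustration of a single convolution followed by a ReLU, not a trainable network. The kernel is not flipped (cross-correlation), which is what deep learning frameworks such as PyTorch compute in their convolution layers. The function names and the example kernel are illustrative.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Elementwise max(0, x), the usual nonlinearity after a convolution."""
    return [[max(0, v) for v in row] for row in feature_map]


# 3x4 image with a dark-to-bright step and a hand-picked 1x2 kernel that responds to it.
image = [[0, 0, 1, 1] for _ in range(3)]
kernel = [[-1, 1]]

feature_map = relu(conv2d_valid(image, kernel))
print(feature_map)  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

In a trained CNN the kernel values are learned from data rather than chosen by hand, and many such kernels are stacked in layers.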

Specialized Tasks based on recognition

  • Content-based image retrieval: give me all images with multiple dogs in them
  • Pose estimation: estimate the pose of an object relative to the camera: e.g. robot arm, human pose, obstacle, etc.
  • Optical Character Recognition (OCR): identify characters in images. It is used by many phones and nowadays even by Obsidian. Reading QR codes is a similar task.
  • Facial Recognition: matching of faces
  • Emotion recognition
  • Shape Recognition Technology (SRT)
  • (Human) Activity Recognition
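The content-based retrieval query from the list above ("all images with multiple dogs") can be sketched once a recognizer has produced labels. The sketch assumes a hypothetical detector output in the form of a dict from file name to a list of detected labels. The file names, labels, and the function name are made up for illustration.

```python
from collections import Counter


def images_with_at_least(detections, label, minimum=2):
    """Return the names of images in which `label` was detected at least `minimum` times."""
    return sorted(
        name for name, labels in detections.items()
        if Counter(labels)[label] >= minimum
    )


# Hypothetical per-image detection results.
detections = {
    "park.jpg": ["dog", "dog", "person"],
    "sofa.jpg": ["cat"],
    "beach.jpg": ["dog"],
    "kennel.jpg": ["dog", "dog", "dog"],
}

matches = images_with_at_least(detections, "dog")
print(matches)  # ['kennel.jpg', 'park.jpg']
```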

Motion Analysis

Image sequences are used to estimate the velocity of an object, which makes it possible to track objects (or the camera itself).

  • Egomotion: tracking the rigid 3D-motion of the camera
  • Tracking: follow the movements of objects in the frames (humans, cars, obstacles)
  • Optical Flow: determine how each point is moving relative to the image plane. The result combines the movement of the observed point with the camera movement. It can be used, for example, for state estimation of a drone.
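Velocity estimation from an image sequence can be sketched with a very simple tracker: take the centroid of a segmented object in each frame and difference it over time. The sketch assumes binary masks as nested lists, a static camera, and a known frame interval. The names `centroid`, `velocities`, and `make_frame` are illustrative, and this is centroid tracking, not dense optical flow.

```python
def centroid(mask):
    """Centroid (x, y) of the nonzero pixels in a binary mask."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n


def velocities(centroids, dt):
    """Finite-difference velocity in pixels per second between consecutive frames."""
    return [
        ((x1 - x0) / dt, (y1 - y0) / dt)
        for (x0, y0), (x1, y1) in zip(centroids, centroids[1:])
    ]


def make_frame(x, width=8, height=4):
    """Synthetic frame with a 2-pixel-wide blob in row 1 starting at column x."""
    frame = [[0] * width for _ in range(height)]
    frame[1][x] = frame[1][x + 1] = 1
    return frame


# The blob moves 2 px to the right per frame; frames are assumed to be 0.5 s apart.
frames = [make_frame(0), make_frame(2), make_frame(4)]
track = [centroid(f) for f in frames]
print(velocities(track, dt=0.5))  # [(4.0, 0.0), (4.0, 0.0)]
```

If the camera itself moves, the measured image velocity mixes object motion and egomotion, as noted for optical flow above.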

Others

  • Scene reconstruction: the goal is to compute a 3D-Model of a scene from images.
  • Image restoration:

Courses

Udacity

A course about computer vision. 2-week free trial.

  1. Image Representation and Classification: numeric representation of images, color masking, binary classification
  2. Convolutional Filters and Edge Detection: frequency in images, image filters for detecting edges and shapes in images, use OpenCV for face detection
  3. Types of Features & Image Segmentation: corner detector, k-means clustering for segmenting an image into unique parts
  4. feature vectors: describe objects and images using feature vectors
  5. CNN layers and feature visualization: define and train your own CNN for clothing recognition, use feature visualization techniques to see what a network has learned
  6. Project: Facial Keypoint detection: create CNN for facial keypoint (eyes, mouth, nose, etc.) detection
  7. Cloud Computing with AWS: train networks on Amazon's GPUs
  8. Advanced CNN architectures: region based CNNs, Faster R-CNN --> fast localized object recognition in images
  9. YOLO: multi object detection model
  10. RNNs: incorporate memory into deep learning models using recurrent neural networks, and how they learn from and generate ordered sequences of data
  11. Long Short-Term Memory Networks (LSTMs): dive into architecture and benefits of preserving long term memory
  12. Hyperparameters: what hyperparameters are used in deep learning?
  13. Attention Mechanisms: Attention models: how do they work?
  14. Image Captioning: combine CNN and RNN to build automatic image captioning model
  15. Project: Image Captioning Model: predict captions for a given image: implement an effective RNN decoder for a CNN encoder
  16. Motion: mathematical representation of motion, introduction of optical flow
  17. Robot Localization: Bayesian filter, uncertainty in robot motion
  18. Mini-Project: 2D Histogram filter: implement the sense and move functions of a 2D histogram filter
  19. Kalman Filters: intuition behind the Kalman filter, vehicle tracking algorithm, one-dimensional tracker implementation
  20. State and Motion: represent the state of a car as a vector that can be modified using linear algebra
  21. Matrices and Transformation of State: LinAlg: learn matrix operations for multidimensional Kalman Filters
  22. SLAM: implement SLAM for an autonomous vehicle and create a map of landmarks
  23. Vehicle Motion and Calculus
  24. Project: Landmark Detection & Tracking: implement SLAM using probability, motion models and linear algebra
  25. Apply Deep Learning Models: Style transfer using pre-trained models that others have provided on GitHub
  26. Feedforward and Backpropagation: introduction to neural networks feedforward pass and backpropagation
  27. Training Neural Networks: techniques to improve training
  28. Deep Learning with PyTorch: build deep learning models with PyTorch
  29. Deep learning for Cancer detection: CNN detects skin cancer
  30. Sentiment Analysis: CNN for sentiment analysis
  31. Fully-convolutional neural networks: classify every pixel in an image
  32. C++ programming: getting started
  33. C++: vectors
  34. C++: local compilation
  35. C++: OOP
  36. Python and C++ Speed
  37. C++ Intro into Optimization
  38. C++ Optimization Practice
  39. Project: Optimize Histogram Filter
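The histogram filter that appears in items 18 and 39 can be sketched in one dimension. This is a minimal illustration, assuming a cyclic world of colored cells, a noisy color sensor, and noisy motion. It is not the course's 2D implementation, and all probabilities (`p_hit`, `p_miss`, `p_exact`, `p_overshoot`, `p_undershoot`) are assumed example values.

```python
def sense(belief, world, measurement, p_hit=0.6, p_miss=0.2):
    """Bayes update: weight each cell by how well it explains the measurement, then normalize."""
    weighted = [
        b * (p_hit if cell == measurement else p_miss)
        for b, cell in zip(belief, world)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


def move(belief, step, p_exact=0.8, p_overshoot=0.1, p_undershoot=0.1):
    """Shift the belief by `step` cells in a cyclic world, spreading mass for motion noise."""
    n = len(belief)
    return [
        p_exact * belief[(i - step) % n]
        + p_overshoot * belief[(i - step - 1) % n]
        + p_undershoot * belief[(i - step + 1) % n]
        for i in range(n)
    ]


world = ["green", "red", "red", "green", "green"]
belief = [1 / len(world)] * len(world)  # start with no knowledge of the position

after_sense = sense(belief, world, "red")  # the robot sees red
after_move = move(after_sense, step=1)     # then drives one cell to the right

print([round(p, 3) for p in after_sense])
print([round(p, 3) for p in after_move])
```

Sensing concentrates probability on the red cells, and moving shifts and slightly blurs that belief. Alternating the two steps is the localization loop that the course projects build on.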