Computer vision is a field of artificial intelligence (AI) that uses machine learning and neural networks to teach computers and systems to derive meaningful information from digital images, videos and other visual inputs—and to make recommendations or take actions when they see defects or issues.  

If AI enables computers to think, computer vision enables them to see, observe and understand. 

Computer vision works much the same as human vision, except humans have a head start. Human sight has the advantage of a lifetime of context that trains it to tell objects apart, judge how far away they are, determine whether they are moving and recognize when something is wrong with an image.

Computer vision trains machines to perform these functions, but it must do so in much less time, with cameras, data and algorithms rather than retinas, optic nerves and a visual cortex. Because a system trained to inspect products or watch a production asset can analyze thousands of products or processes a minute, noticing defects or issues imperceptible to the human eye, it can quickly surpass human capabilities.

Computer vision is used in industries ranging from energy and utilities to manufacturing and automotive, and the market is continuing to grow. It is expected to reach USD 48.6 billion by 2022.1

Computer vision needs lots of data. It runs analyses of the data over and over until it discerns distinctions and ultimately recognizes images. For example, to train a computer to recognize automobile tires, it needs to be fed vast quantities of tire images and tire-related items to learn the differences and recognize a tire, especially one with no defects.

Two essential technologies are used to accomplish this: a type of machine learning called deep learning and a convolutional neural network (CNN).

Machine learning uses algorithmic models that enable a computer to teach itself about the context of visual data. If enough data is fed through the model, the computer will “look” at the data and teach itself to tell one image from another. Algorithms enable the machine to learn by itself, rather than someone programming it to recognize an image.
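As a toy illustration of learning from data rather than explicit programming, the sketch below classifies tiny "images" (four brightness values each) with a nearest-neighbour rule. This is a deliberately simple stand-in for the deep learning models the article describes, and all data and names here are invented for the example:

```python
import numpy as np

# Toy "images": four-pixel brightness vectors.
# Class 0 = dark images, class 1 = bright images.
train_X = np.array([[0.1, 0.2, 0.1, 0.0],
                    [0.0, 0.1, 0.2, 0.1],
                    [0.9, 0.8, 1.0, 0.9],
                    [1.0, 0.9, 0.8, 1.0]])
train_y = np.array([0, 0, 1, 1])

def predict(x):
    """Nearest-neighbour rule: the label of the closest training
    example wins. No rule for 'dark' or 'bright' is programmed in;
    the distinction comes entirely from the examples."""
    dists = np.linalg.norm(train_X - x, axis=1)
    return train_y[np.argmin(dists)]
```

Feeding the model more and more varied examples is what sharpens the distinction, which is the essence of the "teach itself" loop described above.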

A CNN helps a machine learning or deep learning model “look” by breaking images down into pixels that are given tags or labels. It uses the labels to perform convolutions (a mathematical operation on two functions to produce a third function) and makes predictions about what it is “seeing.” The neural network runs convolutions and checks the accuracy of its predictions in a series of iterations until the predictions start to come true. It is then recognizing or seeing images in a way similar to humans.

Much like a human making out an image at a distance, a CNN first discerns hard edges and simple shapes, then fills in information as it runs iterations of its predictions. A CNN is used to understand single images. A recurrent neural network (RNN) is used in a similar way for video applications to help computers understand how pictures in a series of frames are related to one another.
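The convolution operation itself can be sketched in a few lines of Python. This is a minimal, unoptimized illustration, not how production CNN libraries implement it (most deep learning frameworks also compute cross-correlation, skipping the kernel flip of a true convolution). The Sobel-style kernel below responds strongly to the vertical edge in a toy image, echoing how a CNN's early layers pick out hard edges first:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: slide the kernel over the image and
    sum the elementwise products at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A tiny image: dark left half, bright right half (a vertical edge).
img = np.zeros((5, 5))
img[:, 3:] = 1.0

# Sobel-style kernel that responds to vertical edges.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

response = conv2d(img, kernel)
# The response is zero over flat regions and large where the edge is.
```

In a trained CNN the kernel values are not hand-picked like this; they are learned from data, which is what lets deeper layers respond to progressively more complex shapes.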

Scientists and engineers have been trying to develop ways for machines to see and understand visual data for about 60 years. Experimentation began in 1959 when neurophysiologists showed a cat an array of images, attempting to correlate a response in its brain. They discovered that it responded first to hard edges or lines and scientifically, this meant that image processing starts with simple shapes like straight edges. 2

At about the same time, the first computer image scanning technology was developed, enabling computers to digitize and acquire images. Another milestone was reached in 1963 when computers were able to transform two-dimensional images into three-dimensional forms. In the 1960s, AI emerged as an academic field of study and it also marked the beginning of the AI quest to solve the human vision problem.

1974 saw the introduction of optical character recognition (OCR) technology, which could recognize text printed in any font or typeface.3 Similarly, intelligent character recognition (ICR) could decipher handwritten text using neural networks.4 Since then, OCR and ICR have found their way into document and invoice processing, vehicle plate recognition, mobile payments, machine translation and other common applications.

In 1982, neuroscientist David Marr established that vision works hierarchically and introduced algorithms for machines to detect edges, corners, curves and similar basic shapes. Concurrently, computer scientist Kunihiko Fukushima developed a network of cells that could recognize patterns. The network, called the Neocognitron, included convolutional layers in a neural network.

By 2000, the focus of study was on object recognition, and by 2001 the first real-time face recognition applications appeared. Standardization of how visual data sets are tagged and annotated emerged through the 2000s. In 2010, the ImageNet data set became available. It contained millions of tagged images across a thousand object classes and provided a foundation for the CNNs and deep learning models used today. In 2012, a team from the University of Toronto entered a CNN into an image recognition contest. The model, called AlexNet, significantly reduced the error rate for image recognition. Since this breakthrough, error rates have fallen to just a few percent.5

A great deal of research is being done in the computer vision field, but it doesn't stop there. Real-world applications demonstrate how important computer vision is to endeavors in business, entertainment, transportation, healthcare and everyday life. A key driver for the growth of these applications is the flood of visual information flowing from smartphones, security systems, traffic cameras and other visually instrumented devices. This data could play a major role in operations across industries but largely goes unused today. The information creates a test bed to train computer vision applications and a launchpad for them to become part of a range of human activities:

  • IBM used computer vision to create My Moments for the 2018 Masters golf tournament. IBM Watson® watched hundreds of hours of Masters footage and could identify the sights (and sounds) of significant shots. It curated these key moments and delivered them to fans as personalized highlight reels.
  • Google Translate lets users point a smartphone camera at a sign in another language and almost immediately obtain a translation of the sign in their preferred language. 6
  • The development of self-driving vehicles relies on computer vision to make sense of the visual input from a car’s cameras and other sensors. It’s essential to identify other cars, traffic signs, lane markers, pedestrians, bicycles and all of the other visual information encountered on the road.
  • IBM is applying computer vision technology with partners like Verizon to bring intelligent AI to the edge and to help automotive manufacturers identify quality defects before a vehicle leaves the factory.

Many organizations don’t have the resources to fund computer vision labs and create deep learning models and neural networks. They may also lack the computing power that is required to process huge sets of visual data. Companies such as IBM are helping by offering computer vision software development services. These services deliver pre-built learning models available from the cloud—and also ease demand on computing resources. Users connect to the services through an application programming interface (API) and use them to develop computer vision applications.
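A client of such a service typically encodes an image and sends it to an HTTP endpoint. The sketch below only builds the request payload; the endpoint URL, parameter names and schema are hypothetical placeholders, since each provider (including IBM) defines its own API and documents it separately:

```python
import base64

# Hypothetical endpoint -- real cloud vision services define their own
# URLs, authentication and schemas; consult the provider's documentation.
ENDPOINT = "https://api.example.com/v1/vision/classify"

def build_request(image_bytes, model="defect-detector"):
    """Package raw image bytes for a cloud vision API call.
    Images are base64-encoded so they can travel inside JSON."""
    return {
        "model": model,  # hypothetical parameter name
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }

payload = build_request(b"\x89PNG fake image data")
# A real client would now POST this payload as JSON to ENDPOINT,
# with an API key in the request headers, and parse the predicted
# labels and confidence scores from the response.
```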

IBM has also introduced a computer vision platform that addresses both developmental and computing resource concerns. IBM Maximo® Visual Inspection includes tools that enable subject matter experts to label, train and deploy deep learning vision models—without coding or deep learning expertise. The vision models can be deployed in local data centers, the cloud and edge devices.

While it’s getting easier to obtain resources to develop computer vision applications, an important question to answer early on is: What exactly will these applications do? Understanding and defining specific computer vision tasks can focus and validate projects and applications and make it easier to get started.

Here are a few examples of established computer vision tasks:

  • Image classification sees an image and can classify it (a dog, an apple, a person’s face). More precisely, it is able to accurately predict that a given image belongs to a certain class. For example, a social media company might want to use it to automatically identify and segregate objectionable images uploaded by users.
  • Object detection can use image classification to identify a certain class of image and then detect and tabulate its appearances in an image or video. Examples include detecting damage on an assembly line or identifying machinery that requires maintenance.
  • Object tracking follows or tracks an object once it is detected. This task is often executed with images captured in sequence or real-time video feeds. Autonomous vehicles, for example, need not only to classify and detect objects such as pedestrians, other cars and road infrastructure, but also to track them in motion to avoid collisions and obey traffic laws.7
  • Content-based image retrieval uses computer vision to browse, search and retrieve images from large data stores, based on the content of the images rather than metadata tags associated with them. This task can incorporate automatic image annotation that replaces manual image tagging. These tasks can be used for digital asset management systems and can increase the accuracy of search and retrieval.
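A building block shared by object detection and tracking is intersection over union (IoU): the overlap between a predicted bounding box and a reference box, used to decide whether a detection is correct or whether two boxes in consecutive frames are the same object. A minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x1, y1, x2, y2). Returns a value between 0 (no overlap) and 1
    (identical boxes)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (empty if the boxes are disjoint).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

Detection benchmarks commonly count a prediction as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5; simple trackers match boxes across frames the same way.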

Put the power of computer vision into the hands of your quality and inspection teams. IBM Maximo Visual Inspection makes computer vision with deep learning more accessible to business users with visual inspection tools that empower them.

IBM Research is one of the world’s largest corporate research labs. Learn more about research being done across industries.

Learn about the evolution of visual inspection and how artificial intelligence is improving safety and quality.

Learn more about getting started with visual recognition and IBM Maximo Visual Inspection. Explore resources and courses for developers.

Read how Sund & Baelt used computer vision technology to streamline inspections and improve productivity.

Learn how computer vision technology can improve quality inspections in manufacturing.

Unleash the power of no-code computer vision for automated visual inspection with IBM Maximo Visual Inspection—an intuitive toolset for labelling, training, and deploying artificial intelligence vision models.

1. 7 Amazing Examples of Computer And Machine Vision In Practice, Bernard Marr, Forbes, April 8, 2019: https://www.forbes.com/sites/bernardmarr/2019/04/08/7-amazing-examples-of-computer-and-machine-vision-in-practice/#3dbb3f751018 (link resides outside ibm.com)

2. A Brief History of Computer Vision (and Convolutional Neural Networks), Rostyslav Demush, Hacker Noon, February 27, 2019: https://hackernoon.com/a-brief-history-of-computer-vision-and-convolutional-neural-networks-8fe8aacc79f3 (link resides outside ibm.com)

3. Optical character recognition, Wikipedia (link resides outside ibm.com)

4. Intelligent character recognition, Wikipedia (link resides outside ibm.com)

5. A Brief History of Computer Vision (and Convolutional Neural Networks), Rostyslav Demush, Hacker Noon, February 27, 2019 (link resides outside ibm.com)

6. 7 Amazing Examples of Computer And Machine Vision In Practice, Bernard Marr, Forbes, April 8, 2019 (link resides outside ibm.com)

7. The 5 Computer Vision Techniques That Will Change How You See The World, James Le, Heartbeat, April 12, 2018 (link resides outside ibm.com)


Event-Based Video Reconstruction

Event-based motion estimation, human parsing.

essay on computer vision

Multi-Human Parsing

Pose tracking.

essay on computer vision

3D Human Pose Tracking

Disease prediction, disease trajectory forecasting, weakly supervised segmentation.

essay on computer vision

3D Multi-Person Pose Estimation (absolute)

essay on computer vision

3D Multi-Person Mesh Recovery

essay on computer vision

3D Multi-Person Pose Estimation (root-relative)

Text-to-video generation, text-to-video editing, subject-driven video generation.

essay on computer vision

Dichotomous Image Segmentation

Scene segmentation, multi-label image classification.

essay on computer vision

Multi-label Image Recognition with Partial Labels

Facial landmark detection.

essay on computer vision

Unsupervised Facial Landmark Detection

essay on computer vision

3D Facial Landmark Localization

3d character animation from a single photo, activity detection, temporal localization.

essay on computer vision

Language-Based Temporal Localization

Temporal defect localization, 3d object tracking.

essay on computer vision

3D Single Object Tracking

Camera localization.

essay on computer vision

Camera Relocalization

Lidar semantic segmentation, knowledge distillation.

essay on computer vision

Data-free Knowledge Distillation

Self-knowledge distillation, few-shot class-incremental learning, class-incremental semantic segmentation, non-exemplar-based class incremental learning, moment retrieval.

essay on computer vision

Zero-shot Moment Retrieval

Template matching, intelligent surveillance.

essay on computer vision

Vehicle Re-Identification

Disparity estimation.

essay on computer vision

Multimodal Large Language Model

Visual dialog.

essay on computer vision

Motion Segmentation

Relation network, text spotting.

essay on computer vision

Handwritten Text Recognition

Handwritten document recognition, unsupervised text recognition, decision making under uncertainty.

essay on computer vision

Uncertainty Visualization

Text to video retrieval, partially relevant video retrieval.

essay on computer vision

3D Multi-Object Tracking

Real-time multi-object tracking, referring multi-object tracking, multi-animal tracking with identification, trajectory long-tail distribution for muti-object tracking, shadow detection.

essay on computer vision

Shadow Detection And Removal

Person search, semi-supervised object detection.

essay on computer vision

Zero Shot Segmentation

Video enhancement, video inpainting.

essay on computer vision

Mixed Reality

Physics-informed machine learning, soil moisture estimation.

essay on computer vision

Unconstrained Lip-synchronization

Human mesh recovery, open vocabulary semantic segmentation, zero-guidance segmentation, future prediction, overlapped 10-1, overlapped 15-1, overlapped 15-5, disjoint 15-1, disjoint 15-5.

essay on computer vision

Face Image Quality Assessment

Lightweight face recognition.

essay on computer vision

Age-Invariant Face Recognition

Synthetic face recognition, face quality assessement.

essay on computer vision

Cross-corpus

Micro-expression recognition, micro-expression spotting.

essay on computer vision

3D Facial Expression Recognition

essay on computer vision

Smile Recognition

essay on computer vision

Stereo Image Super-Resolution

Burst image super-resolution, satellite image super-resolution, multispectral image super-resolution, video reconstruction.

essay on computer vision

Image Categorization

Fine-grained visual categorization, key information extraction, key-value pair extraction, line detection, sign language translation.

essay on computer vision

Color Constancy

essay on computer vision

Few-Shot Camera-Adaptive Color Constancy

Tone mapping, hdr reconstruction, multi-exposure image fusion, visual recognition.

essay on computer vision

Fine-Grained Visual Recognition

Deep attention, image cropping, stereo matching hand.

essay on computer vision

Zero-Shot Action Recognition

Natural language transduction, video restoration.

essay on computer vision

Analog Video Restoration

Image forensics, infrared and visible image fusion.

essay on computer vision

Novel Class Discovery

essay on computer vision

Image Animation

essay on computer vision

Breast Cancer Histology Image Classification (20% labels)

Landmark-based lipreading, abnormal event detection in video.

essay on computer vision

Semi-supervised Anomaly Detection

Cross-domain few-shot learning, vision-language navigation.

essay on computer vision

Transparent Object Detection

Transparent objects.

essay on computer vision

Grasp Generation

essay on computer vision

hand-object pose

essay on computer vision

3D Canonical Hand Pose Estimation

Action quality assessment, surface normals estimation.

essay on computer vision

Object Segmentation

essay on computer vision

Camouflaged Object Segmentation

Landslide segmentation, text-line extraction, highlight detection, pedestrian attribute recognition.

essay on computer vision

Probabilistic Deep Learning

Segmentation, open-vocabulary semantic segmentation, steganalysis, computer vision techniques adopted in 3d cryogenic electron microscopy, single particle analysis, cryogenic electron tomography, dense captioning, texture classification.

essay on computer vision

Camouflaged Object Segmentation with a Single Task-generic Prompt

Image to video generation.

essay on computer vision

Unconditional Video Generation

Person retrieval, spoof detection, face presentation attack detection, detecting image manipulation, cross-domain iris presentation attack detection, finger dorsal image spoof detection, unsupervised few-shot image classification, generalized few-shot classification, iris recognition, pupil dilation, action understanding.

essay on computer vision

Sketch Recognition

essay on computer vision

Face Sketch Synthesis

Drawing pictures.

essay on computer vision

Photo-To-Caricature Translation

Meme classification, hateful meme classification.

essay on computer vision

Unbiased Scene Graph Generation

essay on computer vision

Panoptic Scene Graph Generation

Severity prediction, intubation support prediction, image stitching.

essay on computer vision

Multi-View 3D Reconstruction

Surgical phase recognition, online surgical phase recognition, offline surgical phase recognition, document image classification.

essay on computer vision

One-shot visual object segmentation

Universal domain adaptation, zero-shot semantic segmentation, automatic post-editing.

essay on computer vision

Face Reenactment

essay on computer vision

Text based Person Retrieval

Text-to-image, story visualization, complex scene breaking and synthesis, human dynamics.

essay on computer vision

3D Human Dynamics

Image fusion, pansharpening, blind face restoration.

essay on computer vision

Cloud Detection

essay on computer vision

Geometric Matching

Human action generation.

essay on computer vision

Action Generation

Object categorization, table recognition, diffusion personalization.

essay on computer vision

Diffusion Personalization Tuning Free

essay on computer vision

Efficient Diffusion Personalization

Image deconvolution.

essay on computer vision

Image Outpainting

essay on computer vision

Sports Analytics

Image shadow removal, intrinsic image decomposition, single-source domain generalization, evolving domain generalization, source-free domain generalization, point clouds, point cloud video understanding, point cloud rrepresentation learning.

essay on computer vision

Semantic SLAM

essay on computer vision

Object SLAM

Image steganography, lane detection.

essay on computer vision

3D Lane Detection

Line segment detection, person identification, situation recognition, grounded situation recognition, face image quality, layout design, multi-target domain adaptation, visual prompt tuning, weakly-supervised instance segmentation, fake image detection.

essay on computer vision

Fake Image Attribution

essay on computer vision

Robot Pose Estimation

Image morphing, motion detection, rotated mnist, image smoothing, drone navigation, drone-view target localization, contour detection.

essay on computer vision

Crop Classification

License plate detection.

essay on computer vision

Occlusion Handling

essay on computer vision

Video Panoptic Segmentation

Value prediction, body mass index (bmi) prediction, crop yield prediction, personalized image generation, viewpoint estimation.

essay on computer vision

motion retargeting

3d point cloud linear classification, gaze prediction, multi-object tracking and segmentation.

essay on computer vision

Multiview Learning

essay on computer vision

Document Shadow Removal

Zero-shot transfer image classification.

essay on computer vision

3D Object Reconstruction From A Single Image

essay on computer vision

CAD Reconstruction

Bird's-eye view semantic segmentation.

essay on computer vision

Zero-Shot Composed Image Retrieval (ZS-CIR)

Human part segmentation.

essay on computer vision

Material Classification

essay on computer vision

Person Recognition

essay on computer vision

Photo Retouching

Space-time video super-resolution, symmetry detection, shape representation of 3d point clouds, dense pixel correspondence estimation, image forgery detection, precipitation forecasting, synthetic image detection, traffic sign detection, video style transfer, referring image matting.

essay on computer vision

Referring Image Matting (Expression-based)

essay on computer vision

Referring Image Matting (Keyword-based)

essay on computer vision

Referring Image Matting (RefMatte-RW100)

Referring image matting (prompt-based), human interaction recognition, one-shot 3d action recognition, mutual gaze, semi-supervised image classification.

essay on computer vision

Open-World Semi-Supervised Learning

Semi-supervised image classification (cold start), affordance detection.

essay on computer vision

Hand Detection

Image instance retrieval, amodal instance segmentation, image quality estimation.

essay on computer vision

Image Similarity Search

essay on computer vision

Multispectral Object Detection

Referring expression generation, road damage detection.

essay on computer vision

Video Matting

essay on computer vision

inverse tone mapping

Art analysis, facial editing.

essay on computer vision

Holdout Set

essay on computer vision

Open Vocabulary Attribute Detection

Binary classification, llm-generated text detection, cancer-no cancer per breast classification, cancer-no cancer per image classification, stable mci vs progressive mci, suspicous (birads 4,5)-no suspicous (birads 1,2,3) per image classification, image/document clustering, self-organized clustering, lung nodule detection, lung nodule 3d detection, 3d scene reconstruction, 3d shape modeling.

essay on computer vision

Action Analysis

Anatomical landmark detection, event segmentation, generic event boundary detection, food recognition.

essay on computer vision

Motion Magnification

Scanpath prediction, semi-supervised instance segmentation, video deraining, video segmentation, camera shot boundary detection, open-vocabulary video segmentation, open-world video segmentation, 2d pose estimation, category-agnostic pose estimation, overlapping pose estimation, deception detection, deception detection in videos, instance search.

essay on computer vision

Audio Fingerprint

Lung nodule classification, lung nodule 3d classification, image comprehension, image manipulation localization, image retouching, image-variation, jpeg artifact removal, point cloud super resolution, pose retrieval, short-term object interaction anticipation, skills assessment.

essay on computer vision

Text-based Person Retrieval

essay on computer vision

Sensor Modeling

Highlight removal, handwriting verification, bangla spelling error correction, video prediction, earth surface forecasting, predict future video frames.

essay on computer vision

Video Visual Relation Detection

Human-object relationship detection, 3d open-vocabulary instance segmentation.

essay on computer vision

Ad-hoc video search

Audio-visual synchronization, handwriting generation, scene change detection.

essay on computer vision

Semi-Supervised Domain Generalization

Sketch-to-image translation, skills evaluation, 3d shape reconstruction from a single 2d image.

essay on computer vision

Shape from Texture

3d shape representation.

essay on computer vision

3D Dense Shape Correspondence

Birds eye view object detection, event data classification, few-shot instance segmentation, multiple people tracking.

essay on computer vision

Network Interpretation

Open vocabulary panoptic segmentation, rgb-d reconstruction, seeing beyond the visible, single-object discovery, unsupervised semantic segmentation.

essay on computer vision

Unsupervised Semantic Segmentation with Language-image Pre-training

essay on computer vision

Sequential Place Recognition

Autonomous flight (dense forest), autonomous web navigation, vietnamese visual question answering, explanatory visual question answering, multiple object tracking with transformer.

essay on computer vision

Multiple Object Track and Segmentation

Constrained lip-synchronization, face dubbing, 2d semantic segmentation task 3 (25 classes), document enhancement, 3d shape reconstruction, 4d panoptic segmentation, defocus blur detection, face anonymization, horizon line estimation, instance shadow detection, kinship verification, medical image enhancement, spatio-temporal video grounding, training-free 3d point cloud classification, video forensics.

essay on computer vision

Generative 3D Object Classification

Cube engraving classification, enf (electric network frequency) extraction, enf (electric network frequency) extraction from video, facial expression recognition, cross-domain facial expression recognition, zero-shot facial expression recognition, landmark tracking, muscle tendon junction identification, multimodal machine translation.

essay on computer vision

Face to Face Translation

Multimodal lexical translation, 3d scene editing, action assessment, bokeh effect rendering, drivable area detection, font recognition, stochastic human motion prediction, image imputation.

essay on computer vision

Long Video Retrieval (Background Removed)

Medical image denoising.

essay on computer vision

Occlusion Estimation

Physiological computing.

essay on computer vision

Lake Ice Monitoring

Text-based person retrieval with noisy correspondence.

essay on computer vision

Unsupervised 3D Point Cloud Linear Evaluation

Visual speech recognition, lip to speech synthesis, wireframe parsing, gaze redirection, single-image-generation, unsupervised anomaly detection with specified settings -- 30% anomaly, root cause ranking, anomaly detection at 30% anomaly, anomaly detection at various anomaly percentages.

essay on computer vision

Unsupervised Contextual Anomaly Detection

Mistake detection, online mistake detection, 3d object captioning, 3d semantic occupancy prediction, animated gif generation.

essay on computer vision

Occluded Face Detection

Generalized referring expression comprehension, image colorization, sketch colorization, image deblocking, image retargeting, infrared image super-resolution, motion disentanglement, online vectorized hd map construction, personality trait recognition, personalized segmentation, persuasion strategies, scene text editing, image to sketch recognition, traffic accident detection, accident anticipation, unsupervised landmark detection, vcgbench-diverse, vehicle speed estimation, visual analogies, continual anomaly detection, text-guided-generation.

essay on computer vision

Human-Object Interaction Generation

Image-guided composition, noisy semantic image synthesis, weakly supervised action segmentation (transcript), weakly supervised action segmentation (action set)), calving front delineation in synthetic aperture radar imagery, calving front delineation in synthetic aperture radar imagery with fixed training amount, continual semantic segmentation, overlapped 5-3, overlapped 25-25.

essay on computer vision

Handwritten Line Segmentation

Handwritten word segmentation.

essay on computer vision

General Action Video Anomaly Detection

Physical video anomaly detection, road scene understanding, monocular cross-view road scene parsing(road), monocular cross-view road scene parsing(vehicle).

essay on computer vision

Transparent Object Depth Estimation

Age and gender estimation, data ablation, fingertip detection, gait identification, historical color image dating, image and video forgery detection, keypoint detection and image matching, motion captioning, part-aware panoptic segmentation.

essay on computer vision

Part-based Representation Learning

Unsupervised part discovery, portrait animation, repetitive action counting, scene-aware dialogue, spatial relation recognition, spatial token mixer, steganographics, story continuation.

essay on computer vision

Supervised Image Retrieval

Unsupervised anomaly detection with specified settings -- 0.1% anomaly, unsupervised anomaly detection with specified settings -- 1% anomaly, unsupervised anomaly detection with specified settings -- 10% anomaly, unsupervised anomaly detection with specified settings -- 20% anomaly, visual social relationship recognition, zero-shot text-to-video generation, video frame interpolation, 3d video frame interpolation, unsupervised video frame interpolation.

essay on computer vision

eXtreme-Video-Frame-Interpolation

Micro-expression generation, micro-expression generation (megc2021), period estimation, art period estimation (544 artists), unsupervised panoptic segmentation, unsupervised zero-shot panoptic segmentation, 2d tiny object detection.

essay on computer vision

Insulator Defect Detection

3d rotation estimation, camera auto-calibration, defocus estimation, derendering, grounded multimodal named entity recognition, hierarchical text segmentation, human-object interaction concept discovery.

essay on computer vision

One-Shot Face Stylization

Speaker-specific lip to speech synthesis, multi-person pose estimation, multi-modal image segmentation, neural stylization.

essay on computer vision

Population Mapping

Pornography detection, prediction of occupancy grid maps, raw reconstruction, svbrdf estimation, semi-supervised video classification, spectrum cartography, synthetic image attribution, training-free 3d part segmentation, unsupervised image decomposition, video individual counting, video propagation, vietnamese multimodal learning, weakly supervised 3d point cloud segmentation, weakly-supervised panoptic segmentation, drone-based object tracking, brain visual reconstruction, brain visual reconstruction from fmri, fashion understanding, semi-supervised fashion compatibility.

essay on computer vision

intensity image denoising

Lifetime image denoising, observation completion, active observation completion, boundary grounding.

essay on computer vision

Video Narrative Grounding

3d inpainting, 3d scene graph alignment, 4d spatio temporal semantic segmentation.

essay on computer vision

Age Estimation

essay on computer vision

Few-shot Age Estimation

Animal action recognition, cow identification, brdf estimation, camouflage segmentation, clothing attribute recognition, damaged building detection, depth image estimation, detecting shadows, dynamic texture recognition, face verification.

essay on computer vision

Disguised Face Verification

Few shot open set object detection, gaze target estimation, generalized zero-shot learning - unseen, hd semantic map learning, human-object interaction anticipation, image deep networks, manufacturing quality control, materials imaging, micro-gesture recognition, multi-person pose estimation and tracking.

essay on computer vision

Multi-object discovery

Neural radiance caching.

essay on computer vision

Parking Space Occupancy

essay on computer vision

Partial Video Copy Detection

essay on computer vision

Multimodal Patch Matching

Perpetual view generation, procedure learning, prompt-driven zero-shot domain adaptation, safety perception recognition, jersey number recognition, photo to rest generalization, single-shot hdr reconstruction, on-the-fly sketch based image retrieval, specular reflection mitigation, thermal image denoising, trademark retrieval, unsupervised instance segmentation, unsupervised zero-shot instance segmentation, vehicle key-point and orientation estimation.

essay on computer vision

Video-Adverb Retrieval (Unseen Compositions)

Video-to-image affordance grounding.

essay on computer vision

Vietnamese Scene Text

Visual sentiment prediction, human-scene contact detection, localization in video forgery, controllable grasp generation, grasp rectangle generation, video classification, student engagement level detection (four class video classification), multi class classification (four-level video classification), 3d canonicalization, 3d surface generation.

essay on computer vision

Visibility Estimation from Point Cloud

Amodal layout estimation, blink estimation, camera absolute pose regression, change data generation, constrained diffeomorphic image registration, continuous affect estimation, deep feature inversion, disjoint 19-1, document image skew estimation, earthquake prediction, fashion compatibility learning.

essay on computer vision

Displaced People Recognition

Fine-grained vehicle classification, vehicle color recognition, finger vein recognition, flooded building segmentation.

essay on computer vision

Future Hand Prediction

Generative temporal nursing, house generation, human fmri response prediction, hurricane forecasting, ifc entity classification, image declipping, image similarity detection.

essay on computer vision

Image Text Removal

Image-to-gps verification.

essay on computer vision

Image-based Automatic Meter Reading

Dial meter reading, indoor scene reconstruction, jpeg decompression.

essay on computer vision

Kiss Detection

Laminar-turbulent flow localisation.

essay on computer vision

Landmark Recognition

Brain landmark detection, corpus video moment retrieval, linear probing object-level 3d awareness, mllm evaluation: aesthetics, marine animal segmentation, medical image deblurring, mental workload estimation, meter reading, mirror detection, motion expressions guided video segmentation, natural image orientation angle detection, multi-object colocalization, multilingual text-to-image generation, video emotion detection, nwp post-processing, occluded 3d object symmetry detection, one-shot segmentation.

essay on computer vision

Patient-Specific Segmentation

Open set video captioning, open-vocabulary panoramic semantic segmentation, pso-convnets dynamics 1, pso-convnets dynamics 2, partial point cloud matching.

essay on computer vision

Partially View-aligned Multi-view Learning

essay on computer vision

Pedestrian Detection

essay on computer vision

Thermal Infrared Pedestrian Detection

Personality trait recognition by face, physical attribute prediction, point cloud semantic completion, point cloud classification dataset, point- of-no-return (pnr) temporal localization, pose contrastive learning, potrait generation, procedure step recognition, prostate zones segmentation, pulmorary vessel segmentation, pulmonary artery–vein classification, pupil diameter estimation, reference expression generation, interspecies facial keypoint transfer, specular segmentation, state change object detection, surface normals estimation from point clouds, train ego-path detection.

essay on computer vision

Transform A Video Into A Comics

Transparency separation, typeface completion.

essay on computer vision

Unbalanced Segmentation

essay on computer vision

Unsupervised Long Term Person Re-Identification

Video correspondence flow.

essay on computer vision

Key-Frame-based Video Super-Resolution (K = 15)

Zero-shot single object tracking, yield mapping in apple orchards, lidar absolute pose regression, opd: single-view 3d openable part detection, self-supervised scene text recognition, spatial-aware image editing, video narration captioning, spectral estimation, spectral estimation from a single rgb image, 3d prostate segmentation, aggregate xview3 metric, atomic action recognition, composite action recognition, calving front delineation from synthetic aperture radar imagery, computer vision transduction, crosslingual text-to-image generation, zero-shot dense video captioning, document to image conversion, frame duplication detection, geometrical view, hyperview challenge.

essay on computer vision

Image Operation Chain Detection

Kinematic based workflow recognition, logo recognition.

essay on computer vision

MLLM Aesthetic Evaluation

Motion detection in non-stationary scenes, open-set video tagging, retinal vessel segmentation.

essay on computer vision

Artery/Veins Retinal Vessel Segmentation

Satellite orbit determination.

essay on computer vision

Segmentation Based Workflow Recognition

2d particle picking, small object detection.

essay on computer vision

Rice Grain Disease Detection

Sperm morphology classification, video & kinematic base workflow recognition, video based workflow recognition, video, kinematic & segmentation base workflow recognition, animal pose estimation.


OpenCV

Open Computer Vision Library

Deep Learning For Computer Vision: Essential Models and Practical Real-World Applications

Farooq Alvi | November 29, 2023 | AI Careers

Illustration: deep learning models and computer vision applications, including ResNet-50, YOLO, Vision Transformers, and Stable Diffusion V2.

The advancement of computer vision, a field blending machine learning with computer science, has been significantly accelerated by the emergence of deep learning. This article on deep learning for computer vision traces the journey from traditional computer vision methods to modern deep learning approaches.

We begin with an overview of foundational techniques like thresholding and edge detection and the critical role of OpenCV in traditional approaches.

Brief History and Evolution of Traditional Computer Vision

Computer vision, a field at the intersection of machine learning and computer science, has its roots in the 1960s when researchers first attempted to enable computers to interpret visual data. The journey began with simple tasks like distinguishing shapes and progressed to more complex functions. Key milestones include the development of the first algorithm for digital image processing in the early 1970s and the subsequent evolution of feature detection methods. These early advancements laid the groundwork for modern computer vision, enabling computers to perform tasks ranging from object detection to complex scene understanding.

Core Techniques in Traditional Computer Vision

Thresholding: This technique is fundamental in image processing and segmentation. It converts a grayscale image into a binary image, where each pixel is marked as either foreground or background depending on whether its value exceeds a threshold. For instance, in a basic application, thresholding can be used to separate an object from its background in a black-and-white image.
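As a minimal sketch of the idea, the per-pixel comparison can be written directly in NumPy; in practice OpenCV's `cv2.threshold` does the same thing in a single call. The image here is a small synthetic array rather than a real photograph.

```python
import numpy as np

def threshold_binary(gray: np.ndarray, thresh: int = 127, maxval: int = 255) -> np.ndarray:
    """Mark pixels above `thresh` as foreground (maxval) and the rest as background (0).

    Equivalent in spirit to OpenCV's one-liner:
        _, binary = cv2.threshold(gray, thresh, maxval, cv2.THRESH_BINARY)
    """
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)

# A tiny synthetic grayscale "image": a bright square on a dark background.
img = np.zeros((6, 6), dtype=np.uint8)
img[2:4, 2:4] = 200

binary = threshold_binary(img)
print(binary[2, 2], binary[0, 0])  # prints: 255 0
```

The choice of threshold value drives everything here; adaptive variants (e.g. `cv2.adaptiveThreshold`) pick it per neighborhood instead of globally.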

Edge Detection: Critical in feature detection and image analysis, edge-detection algorithms like the Canny edge detector identify the boundaries of objects within an image. By detecting discontinuities in brightness, these algorithms help reveal the shapes and positions of objects in the image, laying the foundation for more advanced analysis.
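The "discontinuity in brightness" step can be sketched with Sobel gradient filters in plain NumPy. This is only the gradient stage: a full Canny detector (`cv2.Canny`) adds Gaussian smoothing, non-maximum suppression, and hysteresis thresholding on top of it.

```python
import numpy as np

def sobel_edges(gray: np.ndarray, thresh: float = 100.0) -> np.ndarray:
    """Mark pixels where the Sobel gradient magnitude exceeds `thresh`."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal gradient
    ky = kx.T                                                          # vertical gradient
    g = gray.astype(float)
    h, w = g.shape
    mag = np.zeros((h, w))
    for y in range(1, h - 1):          # skip the one-pixel border
        for x in range(1, w - 1):
            patch = g[y - 1:y + 2, x - 1:x + 2]
            gx = np.sum(kx * patch)    # brightness change left-to-right
            gy = np.sum(ky * patch)    # brightness change top-to-bottom
            mag[y, x] = np.hypot(gx, gy)
    return (mag > thresh).astype(np.uint8) * 255

# A vertical step edge: dark left half, bright right half.
img = np.zeros((8, 8), dtype=np.uint8)
img[:, 4:] = 255

edges = sobel_edges(img)
print(edges[4, 4])  # prints: 255 (the step column is marked as an edge)
```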


The Dominance of OpenCV

OpenCV (Open Source Computer Vision Library) is a key player in computer vision, offering over 2500 optimized algorithms since the late 1990s. Its ease of use and versatility in tasks like facial recognition and traffic monitoring have made it a favorite in academia and industry, especially in real-time applications.

The field of computer vision has evolved significantly with the advent of deep learning, shifting from traditional, rule-based methods to more advanced and adaptable systems. Earlier techniques, such as thresholding and edge detection, had limitations in complex scenarios. Deep learning, particularly Convolutional Neural Networks (CNNs), overcomes these by learning directly from data, allowing for more accurate and versatile image recognition and classification.

This advancement, propelled by increased computational power and large datasets, has led to significant breakthroughs in areas like autonomous vehicles and medical imaging, making deep learning a fundamental aspect of modern computer vision.

Deep Learning Models:

ResNet-50 for Image Classification

ResNet-50 is a variant of the ResNet (Residual Network) model, which has been a breakthrough in the field of deep learning for computer vision, particularly in image classification tasks. The "50" in ResNet-50 refers to the number of layers in the network: it is 50 layers deep, a significant increase over previous models.


Key Features of ResNet-50:

1. Residual Blocks: The core idea behind ResNet-50 is its use of residual blocks. These blocks allow the model to skip one or more layers through what are known as "skip connections" or "shortcut connections." This design addresses the vanishing gradient problem, a common issue in deep networks where gradients get smaller and smaller as they backpropagate through layers, making it hard to train very deep networks.

2. Improved Training : Thanks to these residual blocks, ResNet-50 can be trained at much greater depth without suffering from the vanishing gradient problem. This depth enables the network to learn more complex features at various levels, a key factor in its improved performance on image classification tasks.

3. Versatility and Efficiency : Despite its depth, ResNet-50 is relatively efficient in terms of computational resources compared to other deep models. It achieves excellent accuracy on various image classification benchmarks like ImageNet, making it a popular choice in the research community and industry.

4. Applications : ResNet-50 has been widely used in various real-world applications. Its ability to classify images into thousands of categories makes it suitable for tasks like object recognition in autonomous vehicles, content categorization in social media platforms, and aiding diagnostic procedures in healthcare by analyzing medical images.
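The skip-connection idea in point 1 can be sketched in a few lines of NumPy. The linear-plus-ReLU layer here is a toy stand-in for the block's convolutions, not the actual ResNet-50 implementation:

```python
import numpy as np

def residual_block(x, weight):
    """A toy residual block: output = activation(F(x) + x).
    `weight` plays the role of the block's learned layers."""
    fx = np.maximum(0, x @ weight)   # F(x): a linear map + ReLU stand-in
    return np.maximum(0, fx + x)     # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w = np.zeros((8, 8))                 # with zero weights, F(x) = 0 ...
out = residual_block(x, w)
# ... so the block reduces to ReLU(x): the signal always survives via the skip path
print(np.allclose(out, np.maximum(0, x)))  # True
```

Because the identity path bypasses the learned layers, gradients can flow backward through it unimpeded, which is exactly why very deep stacks of these blocks remain trainable.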

Impact on Computer Vision:

ResNet-50 has significantly advanced the field of image classification. Its architecture serves as a foundation for many subsequent innovations in deep learning and computer vision. By enabling the training of deeper neural networks, ResNet-50 opened up new possibilities in the accuracy and complexity of tasks that computer vision systems can handle.

YOLO (You Only Look Once) Model

The YOLO (You Only Look Once) model is a revolutionary approach in the field of computer vision, particularly for object detection tasks. YOLO stands out for its speed and efficiency, making real-time object detection a reality.

YOLO: Deep Learning For Computer Vision

Key Features of YOLO

Single Neural Network for Detection : Unlike traditional object detection methods which typically involve separate steps for generating region proposals and classifying these regions, YOLO uses a single convolutional neural network (CNN) to do both simultaneously. This unified approach allows it to process images in real-time.

Speed and Real-Time Processing : YOLO’s architecture allows it to process images extremely fast, making it suitable for applications that require real-time detection, such as video surveillance and autonomous vehicles.

Global Contextual Understanding : YOLO looks at the entire image during training and testing, allowing it to learn and predict with context. This global perspective helps in reducing false positives in object detection.

Version Evolution: Recent iterations such as YOLOv5, YOLOv6, YOLOv7, and the latest YOLOv8 have introduced significant improvements. These newer models focus on refining the architecture with more layers and advanced features, enhancing their performance in various real-world applications.
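Detectors like YOLO are scored by the intersection-over-union (IoU) between predicted and ground-truth boxes. Here is a minimal pure-Python sketch, assuming the common (x1, y1, x2, y2) corner format for boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Corners of the overlap rectangle (empty if the boxes don't intersect).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes overlapping by half: IoU = 50 / 150.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.3333...
```

The same metric drives non-maximum suppression, the post-processing step detectors use to discard duplicate boxes for the same object.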

Impact on Computer Vision

YOLO’s contribution to the field of deep learning for computer vision has been significant. Its ability to perform object detection in real-time, accurately, and efficiently has opened up numerous possibilities for practical applications that were previously limited by slower detection speeds. Its evolution over time also reflects the rapid advancement and innovation within the field of deep learning in computer vision.

Real-World Applications of YOLO

Traffic Management and Surveillance Systems: A pertinent real-world application of the YOLO model is in the domain of traffic management and surveillance systems. This application showcases the model’s ability to process visual data in real time, a critical requirement for managing and monitoring urban traffic flow.

Implementation in Traffic Surveillance: Vehicle and Pedestrian Detection – YOLO is employed to detect and track vehicles and pedestrians in real-time through traffic cameras. Its ability to process video feeds quickly allows for the immediate identification of different types of vehicles, pedestrians, and even anomalies like jaywalking.

Traffic Flow Analysis: By continuously monitoring traffic, YOLO helps in analyzing traffic patterns and densities. This data can be used to optimize traffic light control, reducing congestion and improving traffic flow.

Accident Detection and Response: The model can detect potential accidents or unusual events on roads. In case of an accident, it can alert the concerned authorities promptly, enabling faster emergency response.

Enforcement of Traffic Rules: YOLO can also assist in enforcing traffic rules by detecting violations like speeding, illegal lane changes, or running red lights. Automated ticketing systems can be integrated with YOLO to streamline enforcement procedures.

Vision Transformers

This model applies the principles of transformers, originally designed for natural language processing, to image classification and detection tasks. It involves splitting an image into fixed-size patches, embedding these patches, adding positional information, and then feeding them into a transformer encoder. 

The model uses a combination of Multi-head Attention Networks and Multi-Layer Perceptrons within its architecture to process these image patches and perform classification.

Vision Transformers: Deep Learning for Computer Vision

Key Features

Patch-based Image Processing : ViT divides an image into patches and linearly embeds them, treating the image as a sequence of patches.

Positional Embeddings : To maintain the spatial relationship of image parts, positional embeddings are added to the patch embeddings.

Multi-head Attention Mechanism : It utilizes a multi-head attention network to focus on critical regions within the image and understand the relationships between different patches.

Layer Normalization : This feature ensures stable training by normalizing the inputs across the layers.

Multilayer Perceptron (MLP) Head : The final stage of the ViT model, where the outputs of the transformer encoder are processed for classification.

Class Embedding : ViT includes a learnable class embedding, enhancing its capability to classify images accurately.

Enhanced Accuracy and Efficiency : ViT models have demonstrated significant improvements in accuracy and computational efficiency over traditional CNNs in image classification.

Adaptability to Different Tasks : Beyond image classification, ViTs are effectively applied in object detection, image segmentation, and other complex vision tasks.

Scalability : The patch-based approach and attention mechanism make ViT scalable for processing large and complex images.

Innovative Approach : By applying the transformer architecture to images, ViT represents a paradigm shift in how machine learning models perceive and process visual information.

The Vision Transformer marks a significant advancement in the field of computer vision, offering a powerful alternative to conventional CNNs and paving the way for more sophisticated image analysis techniques.
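The patch-splitting and embedding steps described above can be sketched with NumPy. The 16-pixel patch size matches the original ViT setup, but the random projection and the additive positional signal below are simplified stand-ins for the learned embeddings:

```python
import numpy as np

def image_to_patch_embeddings(img, patch=16, dim=64, seed=0):
    """Split an HxWxC image into non-overlapping patches, flatten each one,
    and linearly project it to `dim` features (a stand-in for the learned
    patch embedding), then add a crude positional signal."""
    h, w, c = img.shape
    rows, cols = h // patch, w // patch
    # Rearrange into (num_patches, patch*patch*c): one flat vector per patch.
    patches = (img[:rows * patch, :cols * patch]
               .reshape(rows, patch, cols, patch, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(rows * cols, patch * patch * c))
    proj = np.random.default_rng(seed).normal(size=(patches.shape[1], dim))
    tokens = patches @ proj                       # linear embedding of each patch
    tokens += np.arange(len(tokens))[:, None]     # illustrative positional term
    return tokens

img = np.zeros((224, 224, 3))
print(image_to_patch_embeddings(img).shape)  # (196, 64): 14x14 patches as a token sequence
```

The resulting 196-token sequence is what the transformer encoder consumes, exactly as it would a sentence of word embeddings in NLP.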

Vision Transformers (ViTs) are increasingly being used in a variety of real-world applications across different fields due to their efficiency and accuracy in handling complex image data. 

Real World Applications

Image Classification and Object Detection : ViTs are highly effective in image classification, categorizing images into predefined classes by learning intricate patterns and relationships within the image. In object detection, they not only classify objects within an image but also localize their positions precisely. This makes them suitable for applications in autonomous driving and surveillance, where accurate detection and positioning of objects are crucial​​​​.


Action Recognition : ViTs are being utilized in action recognition to understand and classify human actions in videos. Their robust image-processing capabilities make them useful in areas such as video surveillance and human-computer interaction​​.

Generative Modeling and Multi-Modal Tasks : ViTs have applications in generative modeling and multi-modal tasks, including visual grounding (linking textual descriptions to corresponding image regions), visual-question answering, and visual reasoning. This reflects their versatility in integrating visual and textual information for comprehensive analysis and interpretation​​​​.

Transfer Learning : An important feature of ViTs is their capacity for transfer learning. By leveraging pre-trained models on large datasets, ViTs can be fine-tuned for specific tasks with relatively small datasets. This significantly reduces the need for extensive labeled data, making ViTs practical for a wide range of applications​​.

Industrial Monitoring and Inspection : In a practical application, the DINO pre-trained ViT was integrated into Boston Dynamics’ Spot robot for monitoring and inspection of industrial sites. This application showcased the ability of ViTs to automate tasks like reading measurements from industrial processes and taking data-driven actions, demonstrating their utility in complex, real-world environments​​.

Stable Diffusion V2: Key Features and Impact on Computer Vision

Image generated using Stable Diffusion.

Key Features of Stable Diffusion V2

Advanced Text-to-Image Models : Stable Diffusion V2 incorporates robust text-to-image models, utilizing a new text encoder (OpenCLIP) that enhances the quality of generated images. These models can produce images with resolutions like 512×512 pixels and 768×768 pixels, offering significant improvements over previous versions​​.

Super-resolution Upscaler : A notable addition in V2 is the Upscaler Diffusion model that can increase the resolution of images by a factor of 4. This feature allows for converting low-resolution images into much higher-resolution versions, up to 2048×2048 pixels or more when combined with text-to-image models​​.

Depth-to-Image Diffusion Model : This new model, known as depth2img, extends the image-to-image feature from the earlier version. It can infer the depth of an input image and then generate new images using both text and depth information. This feature opens up possibilities for creative applications in structure-preserving image-to-image and shape-conditional image synthesis​​.

Enhanced Inpainting Model : Stable Diffusion V2 includes an updated text-guided inpainting model, allowing for intelligent and quick modification of parts of an image. This makes it easier to edit and enhance images with high precision​​.

Optimized for Accessibility : The model is optimized to run on a single GPU, making it more accessible to a wider range of users. This optimization reflects a commitment to democratizing access to advanced AI technologies​​.

Revolutionizing Image Generation: Stable Diffusion V2’s enhanced capabilities in generating high-quality, high-resolution images from textual descriptions represent a leap forward in computer-generated imagery. This opens new avenues in various fields like digital art, graphic design, and content creation.

Facilitating Creative Applications : With features like depth-to-image and upscaling, Stable Diffusion V2 enables more complex and creative applications. Artists and designers can experiment with depth information and high-resolution outputs, pushing the boundaries of digital creativity.

Improving Image Editing and Manipulation : The advanced inpainting capabilities of Stable Diffusion V2 allow for more sophisticated image editing and manipulation. This can have practical applications in fields like advertising, where quick and intelligent image modifications are often required.

Enhancing Accessibility and Collaboration : By optimizing the model for single GPU use, Stable Diffusion V2 becomes accessible to a broader audience. This democratization could lead to more collaborative and innovative uses of AI in visual tasks, fostering a community-driven approach to AI development.

Setting a New Benchmark in AI : Stable Diffusion V2’s combination of advanced features and accessibility may set new standards in the AI and computer vision community, encouraging further innovations and applications in these fields.

Real-world Applications:

Medical and Health Education : MultiMed, a health technology company, uses Stable Diffusion technology to provide accessible and accurate medical guidance and public health education in multiple languages​​.

Audio Transcription and Image Generation : AudioSonic project transforms audio narratives into images, enhancing the listening experience with corresponding visuals​​.

Interior Design : A web application utilizes Stable Diffusion to empower individuals with AI in home design, allowing customers to create and visualize interior designs quickly and efficiently​​.

Comic Book Production : AI-Comic-Factory combines Falcon AI and SDXL technology with Stable Diffusion to revolutionize comic book production, enhancing both narratives and visuals​​.

Educational Summarization Tool : Summerize, a web application, offers structured information retrieval and summarization from online articles, along with relevant image prompts, aiding research and presentations​​.

Interactive Storytelling in Gaming : SonicVision integrates generative music and dynamic art with storytelling, creating an immersive gaming experience​​.

Cooking and Recipe Generation : DishForge uses Stable Diffusion to visualize ingredients and generate personalized recipes based on user preferences and dietary needs​​.

Marketing and Advertising : EvoMate, an autonomous marketing agent, creates targeted campaigns and content, leveraging Stable Diffusion for content creation​​.

Podcast Fact-Checking and Media Enhancement : TrueCast uses AI algorithms for real-time fact-checking and media presentation during live podcasts​​.

Personal AI Assistants : Projects like Shadow AI and BlaBlaLand use Stable Diffusion for generating relevant images and creating immersive, personalized AI interactions​​.

3D Meditation and Learning Platforms : Applications like 3D Meditation and PhenoVis utilize Stable Diffusion for creating immersive meditation experiences and educational 3D simulations​​.

AI in Medical Education : Patient Simulator aids medical professionals in practicing patient interactions, using Stable Diffusion for enhanced communication and training​​.

Advertising Production Efficiency : ADS AI aims to improve advertising production time by using AI technologies, including Stable Diffusion, for creative product image and content generation​​.

Creative Content and World Building : Platforms like Text2Room and The Universe use Stable Diffusion for generating 3D content and immersive game worlds​​.

Enhanced Online Meetings : Baatcheet.AI revolutionizes online meetings with voice cloning and AI-generated backgrounds, improving focus and communication efficiency​​.

These applications demonstrate the versatility and potential of Stable Diffusion V2 in enhancing various industries by providing innovative solutions to complex problems.

Popular Frameworks – PyTorch and Keras

PyTorch is an open source machine learning (ML) framework based on the Python programming language and the Torch library.

Developed by Facebook’s AI Research lab, PyTorch is known for its flexibility, ease of use, and native support for dynamic computation graphs, which makes it particularly suitable for research and prototyping. It also provides strong support for GPU acceleration, which is essential for training large neural networks efficiently.

Check out: Getting started with PyTorch.

Keras is a high-level deep learning API developed by Google for implementing neural networks.

Now integrated with TensorFlow (Google’s AI framework), Keras is a high-level neural networks API designed for simplicity and ease of use. Initially developed as an independent project, it focuses on enabling fast experimentation and prototyping through a user-friendly interface. It supports the essential features needed for building deep learning models while abstracting away many of the complex details, making it very accessible for beginners.

Check out: Getting started with Keras

Both frameworks are extensively used in academic and industrial settings for a variety of machine learning and AI applications, from simple regression models to complex deep neural networks.

PyTorch is often preferred for research and development due to its flexibility, while Keras is favored for its simplicity and ease of use, especially for beginners.

Conclusion: The Ever-Evolving Landscape of AI Models

As we look towards the future of AI and machine learning, it’s crucial to acknowledge that one model does not fit all. Even a decade from now, we might still see the use of classic models like ResNet alongside contemporary ones like Vision Transformers or Stable Diffusion V2. 

The field of AI is characterized by continuous evolution and innovation. It reminds us that the tools and models we use must adapt and diversify to meet the ever-changing demands of technology and society.


What Is Computer Vision?

Computer vision is a field of artificial intelligence (AI) that trains computers to see, interpret and understand the world around them through machine learning techniques

Jye Sawtell-Rickson

Computer vision is a field of artificial intelligence (AI) that applies machine learning to images and videos to understand media and make decisions about them. With computer vision, we can, in a sense, give vision to software and technology.

How Does Computer Vision Work?

Computer vision programs use a combination of techniques to process raw images and turn them into usable data and insights.

The basis for much computer vision work is 2D images. While images may seem like a complex input, we can decompose them into raw numbers: an image is really just a grid of individual pixels, and each pixel can be represented by a single number (grayscale) or a combination of numbers such as (255, 0, 0) for RGB.

Once we’ve translated an image to a set of numbers, a computer vision algorithm applies processing. One way to do this is a classic technique called convolutional neural networks (CNNs) that uses layers to group together the pixels in order to create successively more meaningful representations of the data. A CNN may first translate pixels into lines, which are then combined to form features such as eyes and finally combined to create more complex items such as face shapes.
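To make the pixels-to-features idea concrete, here is a tiny convolution in plain Python; the 3×3 Sobel kernel is a classic illustrative choice for detecting vertical edges, standing in for the filters a CNN would learn from data:

```python
def convolve2d(image, kernel):
    """Slide a kernel over a grayscale image (lists of lists), no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kernel-sized neighborhood at (i, j).
            acc = sum(image[i + a][j + b] * kernel[a][b]
                      for a in range(kh) for b in range(kw))
            row.append(acc)
        out.append(row)
    return out

# A vertical edge: dark (0) on the left, bright (255) on the right.
img = [[0, 0, 255, 255]] * 4
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # responds to left-to-right change
print(convolve2d(img, sobel_x))  # [[1020, 1020], [1020, 1020]]
```

Every output value is large precisely because the kernel straddles the dark-to-bright boundary; on a flat region the same sum would be zero. Stacking many such learned filters, layer after layer, is what lets a CNN build up from edges to eyes to faces.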

Why Is Computer Vision Important?

Computer vision has been around since as early as the 1950s and continues to be a popular field of research with many applications. According to the deep learning research group BitRefine, the computer vision industry was expected to grow to nearly 50 billion USD in 2022, with 75 percent of the revenue deriving from hardware.

The importance of computer vision comes from the increasing need for computers to be able to understand the human environment. To understand the environment, it helps if computers can see what we do, which means mimicking the sense of human vision. This is especially important as we develop more complex AI systems that are more human-like in their abilities.

On That Note. . . How Do Self-Driving Cars Work?

Computer Vision Examples

Computer vision is often used in everyday life and its applications range from simple to very complex.

Optical character recognition (OCR) is one of the most widespread applications of computer vision. The best-known example today is Google Translate, which can take an image of anything — from menus to signboards — and convert it into text that the program then translates into the user’s native language. OCR also appears in other use cases, such as automated tolling of cars on highways and converting handwritten documents into digital counterparts.

A more recent application, which is still under development and will play a big role in the future of transportation, is object recognition. Here, an algorithm takes an input image and searches for a set of objects within it, drawing boundaries around each object and labelling it. This application is critical in self-driving cars, which need to quickly identify their surroundings in order to decide on the best course of action.

Computer Vision Applications

  • Facial recognition
  • Self-driving cars
  • Robotic automation
  • Medical anomaly detection 
  • Sports performance analysis
  • Manufacturing fault detection
  • Agricultural monitoring
  • Plant species classification
  • Text parsing

What Are the Risks of Computer Vision?

As with all technology, computer vision is a tool, which means that it can have benefits, but also risks. Computer vision has many applications in everyday life that make it a useful part of modern society but recent concerns have been raised around privacy. The issue that we see most often in the media is around facial recognition. Facial recognition technology uses computer vision to identify specific people in photos and videos. In its lightest form it’s used by companies such as Meta or Google to suggest people to tag in photos, but it can also be used by law enforcement agencies to track suspicious individuals. Some people feel facial recognition violates privacy, especially when private companies may use it to track customers to learn their movements and buying patterns.


Top Computer Vision Papers of All Time (Updated 2024)

  • Nico Klingler
  • March 12, 2024



Today’s boom in computer vision (CV) started at the beginning of the 21st century with the breakthrough of deep learning models and convolutional neural networks (CNNs). The main CV methods include image classification, image localization, object detection, and segmentation.

In this article, we dive into some of the most significant research papers that triggered the rapid development of computer vision. We split them into two categories – classical CV approaches, and papers based on deep-learning. We chose the following papers based on their influence, quality, and applicability.

The papers covered are:

  • Gradient-Based Learning Applied to Document Recognition (1998)
  • Distinctive Image Features from Scale-Invariant Keypoints (2004)
  • Histograms of Oriented Gradients for Human Detection (2005)
  • SURF: Speeded Up Robust Features (2006)
  • ImageNet Classification with Deep Convolutional Neural Networks (2012)
  • Very Deep Convolutional Networks for Large-Scale Image Recognition (2014)
  • GoogLeNet – Going Deeper with Convolutions (2014)
  • ResNet – Deep Residual Learning for Image Recognition (2015)
  • Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2015)
  • YOLO: You Only Look Once: Unified, Real-Time Object Detection (2016)
  • Mask R-CNN (2017)
  • EfficientNet – Rethinking Model Scaling for Convolutional Neural Networks (2019)


Classic Computer Vision Papers

The authors Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner published the LeNet paper in 1998. They introduced the concept of a trainable Graph Transformer Network (GTN) for handwritten character and word recognition, and investigated gradient-based techniques for training the recognizer globally, without manual segmentation and labeling.

LeNet CNN architecture digits recognition

Characteristics of the model:

  • The LeNet-5 CNN contains a stack of convolutional and subsampling layers with multiple feature maps; its first convolutional layer has 6 feature maps (156 trainable parameters).
  • The input is a 32×32 pixel image, and the output layer is composed of Euclidean Radial Basis Function (RBF) units, one for each class.
  • The training set consists of 60,000 examples, and the authors achieved a 0.35% error rate on the training set (after 19 passes).

Find the LeNet paper here .

David Lowe (2004) proposed a method for extracting distinctive invariant features from images and used them to perform reliable matching between different views of an object or scene. The paper introduced the Scale-Invariant Feature Transform (SIFT), which transforms image data into scale-invariant coordinates relative to local features.

SIFT method keypoints detection

Model characteristics:

  • The method generates large numbers of features that densely cover the image over the full range of scales and locations.
  • The model needs to match at least 3 features from each object in order to reliably detect small objects in cluttered backgrounds.
  • For image matching and recognition, the model extracts SIFT features from a set of reference images stored in a database.
  • The SIFT model matches a new image by individually comparing each of its features to this database (Euclidean distance).

Find the SIFT paper here .
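The database-matching step can be sketched in pure Python. The 2-D toy descriptors below stand in for real 128-dimensional SIFT vectors, and the 0.8 ratio is the rejection threshold suggested in Lowe's paper:

```python
def match_features(query_descs, db_descs, ratio=0.8):
    """Match each query descriptor to its nearest database descriptor by
    Euclidean distance, keeping only matches that pass Lowe's ratio test."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    matches = []
    for qi, q in enumerate(query_descs):
        ranked = sorted(range(len(db_descs)), key=lambda i: dist(q, db_descs[i]))
        best, second = ranked[0], ranked[1]
        # Ratio test: the best match must clearly beat the runner-up.
        if dist(q, db_descs[best]) < ratio * dist(q, db_descs[second]):
            matches.append((qi, best))
    return matches

db = [(0.0, 0.0), (10.0, 10.0), (10.0, 0.0)]
# The second query is equidistant from everything, so it is rejected as ambiguous.
print(match_features([(0.1, 0.0), (5.0, 5.0)], db))  # [(0, 0)]
```

A production system would replace the linear scan with an approximate nearest-neighbor index, but the ratio test is used essentially as written here.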

The authors Navneet Dalal and Bill Triggs studied feature sets for robust visual object recognition, using linear SVM-based human detection as a test case. They experimented with grids of Histograms of Oriented Gradients (HOG) descriptors that significantly outperform existing feature sets for human detection.

histogram object detection

Authors achievements:

  • The histogram method gave near-perfect separation on the original MIT pedestrian database.
  • For good results the model requires fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks.
  • The researchers also examined a more challenging dataset containing over 1800 annotated human images with many pose variations and backgrounds.
  • In the standard detector, each HOG cell appears four times with different normalizations, which improves performance to 89%.

Find the HOG paper here .
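The core of a HOG cell, a magnitude-weighted histogram of gradient orientations, can be sketched in plain Python (9 unsigned orientation bins over 0–180°, as in the paper; the flat gradient lists are a simplification of a real cell's 2-D layout):

```python
import math

def orientation_histogram(gx, gy, bins=9):
    """Bin gradient orientations (0-180 deg, unsigned) weighted by magnitude,
    for one HOG cell given per-pixel x/y gradients as flat lists."""
    hist = [0.0] * bins
    for dx, dy in zip(gx, gy):
        mag = math.hypot(dx, dy)                          # gradient strength
        ang = math.degrees(math.atan2(dy, dx)) % 180.0    # unsigned orientation
        hist[min(int(ang / (180.0 / bins)), bins - 1)] += mag
    return hist

# Two pixels with a purely horizontal gradient, one with a vertical gradient.
print(orientation_histogram([1.0, 1.0, 0.0], [0.0, 0.0, 1.0]))
# [2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

The full descriptor concatenates these cell histograms over overlapping blocks and contrast-normalizes each block, which is exactly the step the second bullet above identifies as critical.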

Herbert Bay, Tinne Tuytelaars, and Luc Van Gool presented a scale- and rotation-invariant interest point detector and descriptor called SURF (Speeded Up Robust Features). It outperforms previously proposed schemes in repeatability, distinctiveness, and robustness, while computing much faster. The authors relied on integral images for image convolutions and built on the strengths of the leading existing detectors and descriptors.

surf detecting interest points

  • Applied a Hessian matrix-based measure for the detector, and a distribution-based descriptor, simplifying these methods to the essential.
  • Presented experimental results on a standard evaluation set, as well as on imagery obtained in the context of a real-life object recognition application.
  • SURF showed strong performance – SURF-128 with an 85.7% recognition rate, followed by U-SURF (83.8%) and SURF (82.6%).

Find the SURF paper here .

Papers Based on Deep-Learning Models

Alex Krizhevsky and his team won the ImageNet Challenge in 2012 with a deep convolutional neural network (AlexNet). They trained one of the largest CNNs of the time on the ImageNet data used in the ILSVRC-2010/2012 challenges and achieved the best results reported on these datasets. They wrote a highly optimized GPU implementation of 2D convolution, including all the steps required for CNN training, and published the results.

alexnet CNN architecture

  • The final CNN contained five convolutional and three fully connected layers, and the depth was quite significant.
  • They found that removing any convolutional layer (each containing less than 1% of the model’s parameters) resulted in inferior performance.
  • The same CNN, with an extra sixth convolutional layer, was used to classify the entire ImageNet Fall 2011 release (15M images, 22K categories).
  • After fine-tuning on ImageNet-2012 it gave an error rate of 16.6%.

Find the ImageNet paper here .

Karen Simonyan and Andrew Zisserman (Oxford University) investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Their main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3×3) convolution filters, specifically focusing on very deep convolutional networks (VGG) . They proved that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16–19 weight layers.

 image classification CNN results VOC-2007, VOC-2012

  • Their ImageNet Challenge 2014 submission secured the first and second places in the localization and classification tracks respectively.
  • They showed that their representations generalize well to other datasets, where they achieved state-of-the-art results.
  • They made two best-performing ConvNet models publicly available, in addition to the deep visual representations in CV.

Find the VGG paper here .

The Google team (Christian Szegedy, Wei Liu, et al.) proposed a deep convolutional neural network architecture codenamed Inception. They intended to set the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of their architecture was the improved utilization of the computing resources inside the network.

GoogleNet Inception CNN

  • A carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant.
  • Their submission for ILSVRC14 was called GoogLeNet , a 22-layer deep network. Its quality was assessed in the context of classification and detection.
  • They added 200 region proposals coming from multi-box, increasing the coverage from 92% to 93%.
  • Lastly, they used an ensemble of 6 ConvNets when classifying each region, which improved results from 40% to 43.9% accuracy.

Find the GoogLeNet paper here .

Microsoft researchers Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun presented a residual learning framework (ResNet) to ease the training of networks that are substantially deeper than those used previously. They reformulated the layers as learning residual functions concerning the layer inputs, instead of learning unreferenced functions.

resnet error rates

  • They evaluated residual nets with a depth of up to 152 layers – 8× deeper than VGG nets, but still having lower complexity.
  • This result won 1st place on the ILSVRC 2015 classification task.
  • The team also analyzed CIFAR-10 with networks of 100 and 1000 layers, and obtained a 28% relative improvement on the COCO object detection dataset.
  • Moreover, in the ILSVRC & COCO 2015 competitions, they won 1st place on the tasks of ImageNet detection, ImageNet localization, and COCO detection/segmentation.

Find the ResNet paper here.
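The residual reformulation is easy to see in code. Below is a minimal numpy sketch, not the paper's two-convolution block (which also uses batch normalization): the block computes a residual F(x) and adds the identity shortcut before the final ReLU, so with zero weights it defaults to a (rectified) identity mapping, which is what makes very deep stacks trainable.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Toy residual block: y = relu(F(x) + x), where F is a small two-layer
    transformation. The skip connection lets the block learn a *residual*
    F(x) = H(x) - x instead of the full mapping H(x)."""
    f = relu(x @ w1) @ w2        # the residual function F(x)
    return relu(f + x)           # identity shortcut added before the final ReLU

# With zero weights, F(x) = 0 and the block reduces to the identity
# (after the final ReLU, for non-negative inputs):
x = np.array([[1.0, 2.0]])
w = np.zeros((2, 2))
print(residual_block(x, w, w))   # → [[1. 2.]]
```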

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun introduced the Region Proposal Network (RPN), which shares full-image convolutional features with the detection network, thereby enabling nearly cost-free region proposals. Their RPN was a fully convolutional network that simultaneously predicted object bounds and objectness scores at each position. They trained the RPN end-to-end to generate high-quality region proposals, which Fast R-CNN then used for detection.

Faster R-CNN object detection

  • They merged RPN and Fast R-CNN into a single network by sharing their convolutional features. In addition, they applied neural networks with “attention” mechanisms.
  • For the very deep VGG-16 model, their detection system had a frame rate of 5fps on a GPU.
  • Achieved state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image.
  • In ILSVRC and COCO 2015 competitions, faster R-CNN and RPN were the foundations of the 1st-place winning entries in several tracks.

Find the Faster R-CNN paper here.
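Region proposals in pipelines like this are matched to ground-truth objects by intersection-over-union (IoU), the standard box-overlap metric. A minimal sketch of that metric, assuming the common (x0, y0, x1, y1) corner convention for boxes:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])   # intersection's top-left
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])   # intersection's bottom-right
    iw, ih = max(0.0, ix1 - ix0), max(0.0, iy1 - iy0)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # → 0.14285714285714285 (i.e. 1/7)
```

Proposals with high IoU against a ground-truth box are treated as positives during training; near-zero IoU marks background.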

Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi developed YOLO, an innovative approach to object detection. Instead of repurposing classifiers to perform detection, the authors framed object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

YOLO CNN architecture

  • The base YOLO model processed images in real-time at 45 frames per second.
  • A smaller version of the network, Fast YOLO, processed 155 frames per second, while still achieving double the mAP of other real-time detectors.
  • Compared to state-of-the-art detection systems, YOLO made more localization errors but was less likely to predict false positives in the background.
  • YOLO learned very general representations of objects and outperformed other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains.

Find the YOLO paper here.
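Because detectors like YOLO predict many candidate boxes per image, duplicate detections of the same object are pruned with non-maximum suppression, a standard post-processing step in most detection pipelines. A minimal greedy sketch, again assuming the (x0, y0, x1, y1) box convention:

```python
def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and drop any remaining box that overlaps it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 2, 2), (0.1, 0, 2.1, 2), (5, 5, 6, 6)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: the near-duplicate of box 0 is suppressed
```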

Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Girshick (Facebook) presented a conceptually simple, flexible, and general framework for object instance segmentation. Their approach could detect objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extended Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding-box recognition.

Mask R-CNN framework

  • Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps.
  • It showed strong results in all three tracks of the COCO suite of challenges: instance segmentation, bounding-box object detection, and person keypoint detection.
  • Mask R-CNN outperformed all existing, single-model entries on every task, including the COCO 2016 challenge winners.
  • The model served as a solid baseline and eased future research in instance-level recognition.

Find the Mask R-CNN paper here.
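The mask branch predicts a small fixed-size soft mask per detection (28×28 in the paper), which is then resized to the detected box and thresholded to produce the full-image instance mask. A rough numpy sketch of that pasting step, using nearest-neighbour resizing for brevity (the real implementation interpolates bilinearly):

```python
import numpy as np

def paste_mask(soft_mask, box, img_h, img_w, thresh=0.5):
    """Resize a low-res soft mask to its box and paste it into a full-image
    binary mask. box = (x0, y0, x1, y1) in integer pixel coordinates."""
    x0, y0, x1, y1 = box
    bh, bw = y1 - y0, x1 - x0
    mh, mw = soft_mask.shape
    ys = np.arange(bh) * mh // bh          # nearest-neighbour source rows
    xs = np.arange(bw) * mw // bw          # nearest-neighbour source columns
    resized = soft_mask[ys[:, None], xs[None, :]]
    full = np.zeros((img_h, img_w), dtype=bool)
    full[y0:y1, x0:x1] = resized >= thresh  # threshold the soft mask
    return full

m = paste_mask(np.full((2, 2), 0.9), box=(1, 1, 3, 3), img_h=4, img_w=4)
print(int(m.sum()))  # → 4: only the pixels inside the box are set
```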

The authors of EfficientNet (Mingxing Tan and Quoc V. Le) studied model scaling and identified that carefully balancing network depth, width, and resolution can lead to better performance. They proposed a new scaling method that uniformly scales all three dimensions of depth, width, and resolution using a simple but effective compound coefficient. They demonstrated the effectiveness of this method in scaling up MobileNet and ResNet.

EfficientNet model scaling CNN

  • They designed a new baseline network and scaled it up to obtain a family of models, called EfficientNets, which achieved much better accuracy and efficiency than previous ConvNets.
  • EfficientNet-B7 achieved state-of-the-art 84.3% top-1 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet.
  • It also transferred well and achieved state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and 3 other transfer learning datasets, with much fewer parameters.

Find the EfficientNet paper here.
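The compound-scaling rule itself is tiny: network-specific constants α, β, γ are found by a small grid search on the base network (the paper reports α = 1.2, β = 1.1, γ = 1.15 under the constraint α·β²·γ² ≈ 2), and then a single coefficient φ scales depth, width, and resolution together:

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # values reported in the EfficientNet paper

def compound_scale(phi, base_depth, base_width, base_resolution):
    """Scale depth, width, and input resolution together with one coefficient.
    Each unit increase of phi roughly doubles the FLOPS budget, because
    ALPHA * BETA**2 * GAMMA**2 is approximately 2."""
    return (base_depth * ALPHA ** phi,
            base_width * BETA ** phi,
            base_resolution * GAMMA ** phi)

# phi = 0 recovers the baseline network unchanged:
print(compound_scale(0, 18, 64, 224))  # → (18.0, 64.0, 224.0)
```

The base depth/width/resolution values above are placeholders for illustration; the point is that one knob (φ) replaces three independent scaling decisions.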




A list of papers and other resources on computer vision and deep learning.

tzxiang/awesome-computer-vision-papers


Awesome-computer-vision-papers.

  • A Survey on Deep Learning Techniques for Stereo-based Depth Estimation. arXiv202006
  • Deep Learning for LiDAR Point Clouds in Autonomous Driving: A Review. arXiv202005
  • A Gentle Introduction to Deep Learning for Graphs. arXiv201912 [Note]
  • A Comprehensive Survey on Graph Neural Networks. arXiv201912 [Note]
  • Research Guide: Model Distillation Techniques for Deep Learning, Derrick Mwiti, 2019.11 [Blog]
  • Graph Neural Networks: A Review of Methods and Applications, arXiv2019.7 [Intro-Chinese]
  • A Review on Deep Learning in Medical Image Reconstruction, arXiv2019.6
  • MNIST-C: A Robustness Benchmark for Computer Vision, arXiv2019.6 [Code&Dataset]
  • Going Deep in Medical Image Analysis: Concepts, Methods, Challenges and Future Directions, arXiv2019.2
  • Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art. arXiv201704 [Resourses]
  • [2019TIV] A Survey of Autonomous Driving: Common Practices and Emerging Technologies
  • [2014JMLR] Do we need hundreds of classifiers to solve real world classification problems

SemanticSeg

  • awesome-semantic-segmentation
  • SemanticSegPaperCollection
  • SegLoss: A collection of loss functions for medical image segmentation
  • Efficient-Segmentation-Networks
  • U-Net and its variant code
  • A survey of semantic segmentation with deep learning (in Chinese) [Page] [Notes]
  • An overview and summary of 3D semantic segmentation (in Chinese) [Page]
  • Unpooling/upsampling deconvolution [Note]
  • Some basic points: align_corners
  • Code: Semantic Segmentation Suite in TensorFlow
  • A Survey on Instance Segmentation: State of the art, arXiv202007
  • Unsupervised Domain Adaptation in Semantic Segmentation: a Review, arXiv202005
  • Image Segmentation Using Deep Learning: A Survey. arXiv202001
  • Recent progress in semantic image segmentation, Artificial Intelligence Review, 2019
  • Review of Deep Learning Algorithms for Image Semantic Segmentation, 2018 [Blog]
  • Divided We Stand: A Novel Residual Group Attention Mechanism for Medical Image Segmentation, arXiv2019.12
  • Hard Pixels Mining: Learning Using Privileged Information for Semantic Segmentation, arXiv2019.11
  • Hierarchical Attention Networks for Medical Image Segmentation, arXiv2019.11 [eye line seg]
  • Multi-scale guided attention for medical image segmentation, arXiv2019.10 [Code]
  • Adaptive Class Weight based Dual Focal Loss for Improved Semantic Segmentation, arXiv2019.10
  • ELKPPNet: An Edge-aware Neural Network with Large Kernel Pyramid Pooling for Learning Discriminative Features in Semantic Segmentation, arXiv2019.6
  • ESNet: An Efficient Symmetric Network for Real-time Semantic Segmentation, arXiv2019.6 [Code]
  • FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation, arXiv2019.3 [Proj] [Code] [Note] [JPU: Joint Pyramid Upsampling]
  • ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation, arXiv2016.6 [Code]

Journal/Proceedings

[2019IJCV] AdapNet++ : Self-Supervised Model Adaptation for Multimodal Semantic Segmentation [Code]

[2019NIPS] Zero-Shot Semantic Segmentation [Code]

[2019NIPS] Grid Saliency for Context Explanations of Semantic Segmentation [github]

[2019NIPS] Region Mutual Information Loss for Semantic Segmentation

[2019NIPS] Improving Semantic Segmentation via Dilated Affinity

[2019NIPS] Correlation Maximized Structural Similarity Loss for Semantic Segmentation

[2019NIPS] Multi-source Domain Adaptation for Semantic Segmentation

[2019ICCV] Boundary-Aware Feature Propagation for Scene Segmentation

[2019ICCV] [Adaptive-sampling] Efficient Segmentation: Learning Downsampling Near Semantic Boundaries [github] (Reference: LIP: Local Importance-based Pooling, ICCV2019 [github] [Notes] )

[2019ICCV] Selectivity or Invariance: Boundary-aware Salient Object Detection [Proj&Code]

[2019ICCV] Recurrent U-Net for Resource-Constrained Segmentation

[2019ICCV] Gated-SCNN: Gated Shape CNNs for Semantic Segmentation [Code] [Proj]

[2019ICCV] Visualizing the Invisible: Occluded Vehicle Segmentation and Recovery

[2019ICCV] ACE: Adapting to Changing Environments for Semantic Segmentation

[2019ICCV] Asymmetric Non-local Neural Networks for Semantic Segmentation

[2019ICCV] DADA: Depth-Aware Domain Adaptation in Semantic Segmentation

[2019ICCV] ACFNet: Attentional Class Feature Network for Semantic Segmentation

[2019ICCV] [EMANet] Expectation-Maximization Attention Networks for Semantic Segmentation [github]

[2019ICCV] CCNet : Criss-Cross Attention for Semantic Segmentation [github]

[2019ICCV] Gated-SCNN: Gated Shape CNNs for Semantic Segmentation

[2019CVPR] ESPNetv2 : A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network [Code]

[2019CVPR] Not All Areas Are Equal: Transfer Learning for Semantic Segmentation via Hierarchical Region Selection

[2019CVPR] Beyond Gradient Descent for Regularized Segmentation Losses [Code]

[2019CVPR] Co-occurrent Features in Semantic Segmentation

[2019CVPR] Context-aware Spatio-recurrent Curvilinear Structure Segmentation [line structure seg]

[2019CVPR] Dual attention network for scene segmentation

[2019CVPR] Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation.

[2019AAAI] Learning Fully Dense Neural Networks for Image Semantic Segmentation

[2019MICCAI] ET-Net: A Generic Edge-Attention Guidance Network for Medical Image Segmentation [Code]

[2019MICCAI] Attention Guided Network for Retinal Image Segmentation [Code]

[2019MICCAIW] CU-Net: Cascaded U-Net with Loss Weighted Sampling for Brain Tumor Segmentation

[2018CVPR] [EncNet] Context Encoding for Semantic Segmentation (oral) [Code-Pytorch] [Slides]

[2018CVPR] Learning a Discriminative Feature Network for Semantic Segmentation

[2018CVPR] DenseASPP for Semantic Segmentation in Street Scenes [Code]

[2018CVPR] Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation

[2018ECCV] ESPNet : Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

[2018ECCV] ICNet for Real-Time Semantic Segmentation on High-Resolution Images [Proj] [Code]

[2018ECCV] PSANet : Point-wise Spatial Attention Network for Scene Parsing

[2018ECCV] Bisenet : Bilateral segmentation network for real-time semantic segmentation [Code]

[2018ECCV] [ DeepLabv3+ ] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation [Code]

[2018BMVC] Pyramid Attention Network for Semantic Segmentation

[2018DLMIA] UNet++: A Nested U-Net Architecture for Medical Image Segmentation [Code]

[2018MIDL] Attention U-Net: Learning Where to Look for the Pancreas

[2017arXiv] [ DeepLabv3 ] Rethinking Atrous Convolution for Semantic Image Segmentation

[2017PAMI] [ DeepLabv2 ] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

[2017PAMI] SegNet : A deep convolutional encoder-decoder architecture for image segmentation

[2017CVPR] [ GCN ] Large Kernel Matters-Improve Semantic Segmentation by Global Convolutional Network [Code] [Note]

[2017CVPR] [ PSPNet ] Pyramid Scene Parsing Network

[2017CVPR] RefineNet : Multi-path refinement networks for high-resolution semantic segmentation

[2017CVPR] [ FCIS ] Fully convolutional instance-aware semantic segmentation

[2017CVPR] [ FRRN ] Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes [Code]

[2017CVPRW] The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation [Code]

[2017ICRA] AdapNet : Adaptive semantic segmentation in adverse environmental conditions [Code]

[2016ICLR] Multi-Scale Context Aggregation by Dilated Convolutions

[2016ICLR] ParseNet: Looking Wider to See Better

[2016CVPR] Instance-aware semantic segmentation via multi-task network cascades

[2016CVPR] Attention to Scale: Scale-Aware Semantic Image Segmentation

[2016ECCV] What's the Point: Semantic Segmentation with Point Supervision

[2016ECCV] Instance-sensitive fully convolutional networks

[2016DLMIA] [UNet+ResNet] The Importance of Skip Connections in Biomedical Image Segmentation

[2015ICLR] [ DeepLabv1 ] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

[2015ICCV] Conditional random fields as recurrent neural networks

[2015ICCV] [DeconvNet] Learning Deconvolution Network for Semantic Segmentation

[2015MICCAI] U-Net : Convolutional networks for biomedical image segmentation [Note]

[2015CVPR/2017PAMI] [ FCN ] Fully convolutional networks for semantic segmentation

PanopticSeg

awesome-panoptic-segmentation

Real-Time Panoptic Segmentation from Dense Detections, arXiv2019.12

Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation, arXiv2019.12

PanDA: Panoptic Data Augmentation, arXiv2019.11

Learning Instance Occlusion for Panoptic Segmentation, arXiv2019.11

Panoptic Edge Detection, arXiv2019.6

[2020ICRA] DS-PASS : Detail-Sensitive Panoramic Annular Semantic Segmentation through SwaftNet for Surrounding Sensing [Code]

[2020AAAI] SOGNet : Scene Overlap Graph Network for Panoptic Segmentation

[2019CVPR] Panoptic Segmentation

[2019CVPR] Attention-guided Unified Network for Panoptic Segmentation

[2019CVPR] Panoptic Feature Pyramid Networks (oral) [ unofficial code ] [detectron2]

[2019CVPR] UPSNet : A Unified Panoptic Segmentation Network [Code]

[2019CVPR] [ OANet ] An End-to-end Network for Panoptic Segmentation

[2019CVPR] DeeperLab : Single-Shot Image Parser (oral) [project] [code]

[2019CVPR] Interactive Full Image Segmentation by Considering All Regions Jointly

[2019CVPR] Seamless Scene Segmentation [code]

3D Reconstruction

  • awesome image-based 3D reconstruction
  • awesome-point-cloud-analysis
  • [Blog] A survey of monocular-vision-based 3D reconstruction algorithms (in Chinese)
  • [Blog] A roundup of the world's top labs in 3D vision and SLAM (in Chinese)
  • Camera Calibration [Note] [Note2] [Hub]
  • Visual SLAM Related Research
BigSFM: Reconstructing the World from Internet Photos, summary of Noah Snavely works [Proj&Code] (Bundler, 1DSfM, sfm-dismbig, DISCO, LocalSymmetry, dataset ...)

A Survey on Deep Learning Architectures for Image-based Depth Reconstruction, arXiv2019.6

[2019PAMI] Image-based 3D Object Reconstruction: State-of-the-Art and Trends in the Deep Learning Era

[2017Robot] Keyframe-based monocular SLAM: design, survey, and future directions, Robotics and Autonomous Systems

  • [2017CVPR] Geometric loss functions for camera pose regression with deep learning [Proj-with PoseNet+Modelling]
  • [2016ICRA] Modelling Uncertainty in Deep Learning for Camera Relocalization
  • [2015ICCV] PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization
  • [2016ECCV] [ LineSfM ] Robust and Accurate Line- and/or Point-Based Pose Estimation without Manhattan Assumptions [Code]

Depth/StereoMatching

  • [2020PAMI] SurfaceNet+: An End-to-end 3D Neural Network for Very Sparse Multi-view Stereopsis [Code]
  • [2020ICLR] Pseudo-LiDAR++ : Accurate Depth for 3D Object Detection in Autonomous Driving, arXiv2019.8 [Code]
  • [2019NIPS] [ SC-SfMLearner ] Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video [Proj] [Code]
  • [2019ICCV] How do neural networks see depth in single images? [Note]
  • [2019ICCV] DeepPruner : Learning Efficient Stereo Matching via Differentiable PatchMatch [Code]
  • [2019CVPR] Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving [Code]
  • [2019CVPR] DeepLiDAR : Deep Surface Normal Guided Depth Prediction for Outdoor Scene from Sparse LiDAR Data and Single Color Image
  • [2019CVPR] [ R-MVSNet ] Recurrent MVSNet for High-resolution Multi-view Stereo Depth Inference [Code]
  • [2019ToG] 3D Ken Burns Effect from a Single Image [Homepage] [Code]
  • [2019IROS] SuMa++ : Efficient LiDAR-based Semantic SLAM [Code]
  • [2019ICCVW] Self-Supervised Learning of Depth and Motion Under Photometric Inconsistency
  • [2019WACV] SfMLearner++ : Learning Monocular Depth & Ego-Motion using Meaningful Geometric Constraints
  • [2018CVPR] Automatic 3D Indoor Scene Modeling From Single Panorama
  • [2018CVPR] LEGO : Learning Edge with Geometry all at Once by Watching Videos (spotlight) [Code]
  • [2018CVPR] [ vid2depth ] Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints [Proj&Code]
  • [2018CVPR] GeoNet : Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose [Code]
  • [2018CVPR] DeepMVS : Learning Multi-View Stereopsis [Proj] [Code]
  • [2018ECCV] MVSNet : Depth Inference for Unstructured Multi-view Stereo
  • [2017ICCV] SurfaceNet : An End-to-end 3D Neural Network for Multiview Stereopsis [Code]
  • [2017CVPR] [ SfMLearner ] Unsupervised Learning of Depth and Ego-Motion from Video, Oral [Proj] [TF] [Pytorch] [ClassProj]
  • [2017CVPR] SGM-Nets : Semi-Global Matching With Neural Networks
  • [2016JMLR] [ MC-CNN ] Stereo matching by training a convolutional neural network to compare image patches [Code]

Surface Reconstruction

  • [ICCV15/IJCV17] Global, Dense Multiscale Reconstruction for a Billion Points [Proj] [Code]
  • [2014ECCV] Let there be color! Large-scale texturing of 3D reconstructions [Code]

[2017WACV] Pano2CAD: Room Layout From A Single Panorama Image

[2014ECCV] PanoContext : A Whole-room 3D Context Model for Panoramic Scene Understanding, Oral [Homepage&Code] [PanoBasic]

3D SemanticSeg

Kimera: an Open-Source Library for Real-Time Metric-Semantic Localization and Mapping, arXiv2019.12 [Code]

Rotation Invariant Point Cloud Classification: Where Local Geometry Meets Global Topology, arXiv2019.11

SalsaNet: Fast Road and Vehicle Segmentation in LiDAR Point Clouds for Autonomous Driving, arXiv2019.9 [Code]

Going Deeper with Point Networks, arXiv2019.7 [Code]

[2020GRSM] A Review of Point Cloud Semantic Segmentation

[2019NIPS] [ PVCNN ] Point-Voxel CNN for Efficient 3D Deep Learning (Spotlight) [Proj] [Code]

[2019IROS] RangeNet++ : Fast and Accurate LiDAR Semantic Segmentation [Code]

[2019ICCV] SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences

[2019ICCV] Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation

[2019ICCV] Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion (oral)

[2019CVPR] ClusterNet : Deep Hierarchical Cluster Network With Rigorously Rotation-Invariant Representation for Point Cloud Analysis

[2018NIPS] PointCNN : Convolution On X-Transformed Points [Code]

[2018ECCV] Efficient Semantic Scene Completion Network with Spatial Group Convolution [Code]

[2017NIPS] PointNet++ : Deep Hierarchical Feature Learning on Point Sets in a Metric Space [Code]

[2017CVPR] PointNet : Deep Learning on Point Sets for 3D Classification and Segmentation [Code]

LowLevelVision

Tutorial&Reviews

  • ICCV2019 Tutorial: Understanding Color and the In-Camera Image Processing Pipeline for Computer Vision, Michael S. Brown [Homepage] [Slides]
  • CVPR2016 Tutorial: Understanding the In-Camera Image Processing Pipeline for Computer Vision, Michael S. Brown [Slides]
  • NIPS2011 Tutorial: Modeling the Digital Camera Pipeline: From RAW to sRGB and Back, Michael S Brown [Slides]
  • [2018IJCV] RAW Image Reconstruction Using a Self-contained sRGB–JPEG Image with Small Memory Overhead [Michael S. Brown]
  • [2016CVPR] RAW Image Reconstruction using a Self-Contained sRGB-JPEG Image with only 64 KB Overhead
  • [2014CVPR] Raw-to-raw: Mapping between image sensor color responses

Super-Resolution

[Blog] A gentle introduction to deep-learning super-resolution (in Chinese): https://mp.weixin.qq.com/s/o-I6T8f4AcETJqlDNZs9ug

A Deep Journey into Super-resolution: A survey, arXiv2019.9

[2020PAMI] Deep Learning for Image Super-resolution: A Survey

[2019IJAC] Deep Learning Based Single Image Super-resolution: A Survey

Densely Residual Laplacian Super-resolution, arXiv2019.7 [Code]

Lightweight Image Super-Resolution with Adaptive Weighted Learning Network, arXiv2019.4 [Code]

[2019SIGG] Handheld Multi-Frame Super-Resolution

[2019CVPR] Deep Plug-and-Play Super-Resolution for Arbitrary Blur Kernels

[2019CVPR] Zoom To Learn, Learn To Zoom [ProjPage] [Code]

[2019CVPR] Towards Real Scene Super-Resolution with Raw Images [Code]

[2019CVPR] 3D Appearance Super-Resolution with Deep Learning [Code]

[2019CVPR] Learning Parallax Attention for Stereo Image Super-Resolution [Code]

[2019CVPR] Meta-SR: A Magnification-Arbitrary Network for Super-Resolution [github]

[2019CVPRW] Hierarchical Back Projection Network for Image Super-Resolution [Code]

[2019ICCVW] Edge-Informed Single Image Super-Resolution [Code]

[2017CVPRW] Enhanced Deep Residual Networks for Single Image Super-Resolution [Code]

[2016PAMI] [ SRCNN ] Image Super-Resolution Using Deep Convolutional Networks

[2016NIPS] Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections

[2016CVPR] [ ESPCN ] Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
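ESPCN upsamples at the end of the network with a sub-pixel convolution: an ordinary convolution produces C·r² channels, which are rearranged depth-to-space into a C-channel output r× larger. A NumPy sketch of just the rearrangement step:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Depth-to-space: rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r),
    as in ESPCN's sub-pixel convolution layer."""
    c, h, w = x.shape
    out_c = c // (r * r)
    x = x.reshape(out_c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)        # (C, H, r, W, r)
    return x.reshape(out_c, h * r, w * r)

x = np.arange(16.0).reshape(4, 2, 2)      # 4 channels of 2x2
y = pixel_shuffle(x, 2)                   # 1 channel of 4x4
print(y.shape)
```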

[2016CVPR] [ VDSR ] Accurate Image Super-Resolution Using Very Deep Convolutional Networks

[2016ECCV] [ FSRCNN ] Accelerating the Super-Resolution Convolutional Neural Network

[2014ECCV] [ SRCNN ] Learning a Deep Convolutional Network for Image Super-Resolution

Enhancement

  • Diving Deeper into Underwater Image Enhancement: A Survey, arXiv2019.7
  • [2018CVPR] Deep Photo Enhancer: Unpaired Learning for Image Enhancement from Photographs with GANs, [Homepage] [Code]
  • [2018CVPR] Classification-Driven Dynamic Image Enhancement
  • [2017CVPR] Forget Luminance Conversion and Do Something Better
  • [2016CVPR] Two Illuminant Estimation and User Correction Preference
  • Low-light Enhancement Repo [github]
  • A Summary of Deep-Learning-Based Low-Light Image Enhancement Methods (2017-2019) [Note]
  • Learning to see, Antonio Torralba, 2016 [Slides]
  • Attention-guided Low-light Image Enhancement, arXiv2019.8
  • Low-light Image Enhancement Algorithm Based on Retinex and Generative Adversarial Network, arXiv2019.6
  • LED2Net: Deep Illumination-aware Dehazing with Low-light and Detail Enhancement, arXiv2019.6
  • EnlightenGAN: Deep Light Enhancement without Paired Supervision, arXiv2019.6 [Code]
  • Kindling the Darkness: A Practical Low-light Image Enhancer, arXiv2019.5
  • MSR-net: Low-light Image Enhancement Using Deep Convolutional Network, arXiv2017.11
  • [2019TOG] Handheld Mobile Photography in Very Low Light
  • [2019ICCV] Learning to See Moving Objects in the Dark
  • [2019CVPR] Underexposed Photo Enhancement using Deep Illumination Estimation [Code]
  • [2019CVPR] All-Weather Deep Outdoor Lighting Estimation
  • [2019MMM] Progressive Retinex: Mutually Reinforced Illumination-Noise Perception Network for Low Light Image Enhancement
  • [2018TIP] Learning a Deep Single Image Contrast Enhancer from Multi-Exposure Images
  • [2018TMM] Naturalness preserved nonuniform illumination estimation for image enhancement based on retinex
  • [2018PRL] LightenNet: A Convolutional Neural Network for weakly illuminated image enhancement
  • [2018CVPR] Learning to See in the Dark
  • [2018BMVC] MBLLEN: Low-light Image/Video Enhancement Using CNNs
  • [2018BMVC] Deep Retinex Decomposition for Low-Light Enhancement (Oral) [Proj] [Code]
  • [2017TIP] LIME: Low-light image enhancement via illumination map estimation
  • [2017PR] LLNet: A deep autoencoder approach to natural low-light image enhancement [Code] [Code2]
  • [2017CVPR] Deep Outdoor Illumination Estimation
  • [2016ECCV] Deep Specialized Network for Illuminant Estimation
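LIME (above) estimates the illumination map as the per-pixel maximum over the RGB channels, refines it with a structure-aware prior, and recovers the enhanced image via the Retinex relation R = I / L. A hedged sketch with the refinement step omitted:

```python
import numpy as np

def lime_initial_illumination(img, eps=1e-3):
    """LIME's initial illumination estimate: per-pixel max over R, G, B.
    (The paper refines this map with a structure-aware prior; omitted here.)"""
    return np.clip(img.max(axis=2), eps, 1.0)

def enhance(img, gamma=0.8):
    L = lime_initial_illumination(img) ** gamma   # soften the illumination map
    return np.clip(img / L[..., None], 0.0, 1.0)  # Retinex: R = I / L

dark = np.full((2, 2, 3), 0.1)            # uniformly dark input in [0, 1]
out = enhance(dark)
print(float(out.mean()) > float(dark.mean()))     # brightened
```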

Reflection Removal

  • [2019CVPR] Single Image Reflection Removal Beyond Linearity
  • [2019CVPR] Reflection Removal Using A Dual-Pixel Sensor
  • [2013ICCV] Exploiting Reflection Change for Automatic Reflection Removal

Denoising/Deblurring

  • Deep Learning on Image Denoising: An overview, arXiv2020.1 [Proj]
  • [2020NN] Attention-guided CNN for image denoising [Code]
  • [2019CVPR] Toward Convolutional Blind Denoising of Real Photographs
  • [SelfDeblur] Neural Blind Deconvolution Using Deep Priors, arXiv2019.8 [Code]
  • [2019ICCV] DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better [Code]
  • [2018CVPR] DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks [Code]
  • [2018CVPR] Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks

Single Image Deraining/Rain Removal

  • [2019CVPR] Single Image Deraining: A Comprehensive Benchmark Analysis
  • [2018ECCV] Recurrent Squeeze-and-Excitation Context Aggregation Net for Single Image Deraining [Code]

Inpainting/Completion

  • Image inpainting: A review, arXiv2019.9
  • Consistent Generative Query Networks, arXiv2019.4 [Proj]
  • [2019Scirobotics] Emergence of exploratory look-around behaviors through active observation completion [Proj]
  • [2019ICCV] An Internal Learning Approach to Video Inpainting [Homepage] [Code] [Note]
  • [2019ICCV] StructureFlow: Image Inpainting via Structure-aware Appearance Flow [Code]
  • [2018Science] [GQN] Neural scene representation and rendering, DeepMind [Code] [Note]
  • [2018CVPR] Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks [Code]
  • [2018CVPR] Deep Image Prior [github] [Note]
  • [2018Proj] Painting outside the box: image outpainting with GANs, Mark Sabini, Stanford CS230 Project, arXiv2018.8 [Code] [PDF] [Model] [Note]

Image/Video Transfer

Style Transfer Scholars: Dongdong Chen, Dmitry Ulyanov

[2018TOG] Progressive Color Transfer with Dense Semantic Correspondences ⭐️⭐️⭐️⭐️

[2017CVPR] Improved Texture Networks: Maximizing Quality and Diversity in Feed-forward Stylization and Texture Synthesis

[2016ICML] Texture Networks: Feed-forward Synthesis of Textures and Stylized Images [IN] [Code] [Slides]

[2016CVPR] Image Style Transfer Using Convolutional Neural Networks, Gatys [Code]

[2016ECCV] Perceptual Losses for Real-Time Style Transfer and Super-Resolution

[2015] A neural algorithm of artistic style, Gatys, arXiv2015.9 [Code]
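The style loss in Gatys et al. compares Gram matrices of CNN feature maps: channel-wise inner products that discard spatial layout but keep texture statistics. In NumPy:

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: G[i, j] is the inner
    product of channels i and j, normalized by the number of positions."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

feats = np.random.default_rng(0).normal(size=(3, 4, 4))
G = gram_matrix(feats)
print(G.shape)                            # a symmetric C x C matrix
```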

Blending/Fusion

  • Deep Image Blending, arXiv201910 [Code]
  • [2019MMM] GP-GAN: Towards Realistic High-Resolution Image Blending [Code] [Homepage]
  • [2018ECCV] Learning to Blend Photos [Homepage]
  • [2018SIGGA] Deep Blending for Free-Viewpoint Image-Based Rendering [Homepage]

Pedestrian/Crowd

PedestrianDetection

Pedestrian Detection collection

Deep Learning for Person Re-identification: A Survey and Outlook, arXiv2020.1 [Code]

Pedestrian Attribute Recognition: A Survey, arXiv2019.1 [Proj]

CrowdHuman: A Benchmark for Detecting Human in a Crowd, arXiv201804 [Proj] [Note]

PedHunter: Occlusion Robust Pedestrian Detector in Crowded Scenes, arXiv2019.9

[2020TMM/2019CVPRW] Bag of Tricks and A Strong Baseline for Deep Person Re-identification [Code]

[2019ICCV] Mask-Guided Attention Network for Occluded Pedestrian Detection [Code]

[2019CVPR] VRSTC: Occlusion-Free Video Person Re-Identification [ occlusion ]

[2018CVPR] Repulsion Loss: Detecting Pedestrians in a Crowd, CVPR2018 [ occlusion ]

[2016ECCV] Stacked Hourglass Networks for Human Pose Estimation

CrowdCounting

awesome crowd counting

Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detection, arXiv2019.6 [Code]

W-Net: Reinforced U-Net for Density Map Estimation, arXiv2019.3 [Unofficial Code]

[2019TIP] HA-CCN: Hierarchical Attention-based Crowd Counting Network

[2019ICCV] Bayesian Loss for Crowd Count Estimation with Point Supervision [Code]

[2019ICCV] Crowd Counting with Deep Structured Scale Integration Network (oral) [github]

[2019ICCV] Learning Spatial Awareness to Improve Crowd Counting (oral)

[2019ICCV] Perspective-Guided Convolution Networks for Crowd Counting [Code] [Dataset]

[2019ICCV] Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting

[2019ICCV] Pushing the Frontiers of Unconstrained Crowd Counting: New Dataset and Benchmark Method

[2019ICCV] Counting with Focus for Free [Code]

[2019ICCVW] Crowd Counting on Images with Scale Variation and Isolated Clusters

[2019CVPR] Learning from Synthetic Data for Crowd Counting in the Wild [Homepage] [Dataset]

[2019MMM] Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting

[2019ICME] Locality-constrained Spatial Transformer Network for Video Crowd Counting

[2019SciAdvance] Number detectors spontaneously emerge in a deep neural network designed for visual object recognition [Note]

[2019TII] Automated Steel Bar Counting and Center Localization with Convolutional Neural Networks [Code]

[2018MICCAIW] Microscopy Cell Counting with Fully Convolutional Regression Networks [Code]

[2010NIPS] Learning to count objects in images [Code]
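Most counting networks, following the 2010 NIPS paper above, regress a density map built by placing a unit-mass Gaussian at each annotated head, so that integrating the map recovers the count. A sketch with a fixed kernel width (real pipelines often use geometry-adaptive kernels):

```python
import numpy as np

def density_map(points, shape, sigma=1.5):
    """Sum one normalized Gaussian per annotated (row, col) point;
    the map then sums to the number of people."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dmap = np.zeros(shape)
    for py, px in points:
        g = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2 * sigma ** 2))
        dmap += g / g.sum()               # each head contributes mass 1
    return dmap

dmap = density_map([(10, 10), (30, 40)], (64, 64))
print(round(float(dmap.sum())))           # 2
```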

GenerativeNet

GAN Learning Roadmap: Papers, Applications, Courses, and Books [Page]. An Overview of the Most Common GAN Models in Deep Learning: GAN, DCGAN, CGAN, InfoGAN, ACGAN, CycleGAN, StackGAN ... Blog: One Day One GAN

Training Tricks

How to Train a GAN? Tips and tricks to make GANs work [Page]

17 GAN training tips and tricks, starting from NIPS2016, by Soumith Chintala, Emily Denton, Martin Arjovsky, and Michael Mathieu: How to Train a GAN, NeurIPS2016

Advances in Generative Adversarial Networks (GANs): A summary of the latest advances in Generative Adversarial Networks [Page] [Note]

Keep Calm and train a GAN. Pitfalls and Tips on training Generative Adversarial Networks [Page]

Image Augmentations for GAN Training. arXiv202006

[Blog] A Beginner's Guide to Generative Adversarial Networks (GANs), 2019

Generative Adversarial Networks: A Survey and Taxonomy, arXiv2020.2 [GANReview]

A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications, arXiv202001

[2019ACMCS] How Generative Adversarial Networks and Their Variants Work: An Overview

StarGAN v2: Diverse Image Synthesis for Multiple Domains. arXiv201912 [Code]

This dataset does not exist: training models from generated images, arXiv2019.11

Landmark Assisted CycleGAN for Cartoon Face Generation. arXiv201907

Maximum Entropy Generators for Energy-Based Models, arXiv2019.5 [Code]

[2019NIPS] Few-shot Video-to-Video Synthesis [Code]

[2018NIPS] [ vid2vid ] Video-to-Video Synthesis [Code]

[2019CVPR] Semantic Image Synthesis with Spatially-Adaptive Normalization [Proj] [Code]

[2019CVPR] [ seg2vid ] Video Generation from Single Semantic Label Map [Code]

[2019BMVC] The Art of Food: Meal Image Synthesis from Ingredients

[2018ICLR] Spectral Normalization for Generative Adversarial Networks [Code] [Supp1] [Supp2]
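Spectral normalization (above) divides each weight matrix by its largest singular value so the discriminator stays roughly 1-Lipschitz; the value is estimated by power iteration, one step per training update. A sketch using more iterations for clarity:

```python
import numpy as np

def spectral_normalize(W, n_iter=30):
    """Estimate sigma_max(W) by power iteration and return W / sigma
    (Miyato et al.; training reuses u across steps, so a single
    iteration per update suffices there)."""
    u = np.random.default_rng(0).normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v
    return W / sigma

Wn = spectral_normalize(np.diag([3.0, 1.0]))
print(np.linalg.svd(Wn, compute_uv=False)[0])   # largest singular value ~1
```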

[2018CVPR] [ pix2pixHD ] High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

[2018CVPR] StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation (oral) [Code]

[2018ECCV] [FE-GAN] Fashion Editing with Multi-scale Attention Normalization [Notes]

[2018ECCV] Image Inpainting for Irregular Holes Using Partial Convolutions [Code] [Code2] [used for DeepNude]

[2017ICCV] [ CycleGAN ] Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks [Proj]

[2017CVPR] [ Pix2Pix ] Image-to-Image Translation with Conditional Adversarial Networks [Demo]

[2016ICLR] [ DCGAN ] Unsupervised representation learning with deep convolutional generative adversarial networks

[2016ICML] A Theory of Generative ConvNet [S-C Zhu] [Proj/Code]

[2014NIPS] Generative Adversarial Nets

Video Understanding

[YOWO] You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization, arXiv201911 [Code]

[2019CVPR] Learning Video Representations from Correspondence Proposals

Existing deep video architectures typically rely on operations such as 3D convolution, self-correlation, and non-local modules, which struggle to capture long-range motion and correlations between frames. The proposed CPNet learns long-range correspondences between frames in a video, addressing this limitation of prior methods.

Video Object Detection

  • Object Detection in Video with Spatial-temporal Context Aggregation, arXiv2019.7
  • Looking Fast and Slow: Memory-Guided Mobile Video Object Detection, arXiv2019.3 [TF] [PyTorch]
  • [2019ICCV] [ MGAN ] Motion Guided Attention for Video Salient Object Detection
  • [2019CVPR] Shifting More Attention to Video Salient Objection Detection [Code]
  • [2019CVPR] Activity Driven Weakly Supervised Object Detection [Code]
  • [2019SysML] AdaScale: Towards Real-time Video Object Detection Using Adaptive Scaling
  • [2019KDDW] Understanding Video Content: Efficient Hero Detection and Recognition for the Game "Honor of Kings" [Notes](https://flashgene.com/archives/28803.html)
  • [2018CVPR] Mobile Video Object Detection With Temporally-Aware Feature Maps

Video Object segmentation

[2019ICCV] RANet: Ranking Attention Network for Fast Video Object Segmentation

[2019CVPR] See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks [Code]

[2019CVPR] Improving Semantic Segmentation via Video Propagation and Label Relaxation [Code]

[2016CVPR] A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation

  • Deep Learning Papers Reading Roadmap

Optimization

Summary of SGD, AdaGrad, Adadelta, Adam, Adamax, Nadam

Why Momentum Really Works, 2017

Optimization for deep learning: theory and algorithms. arXiv201912 [OptimizationCourse: Optimization Theory for Deep Learning]

Why Adam Beats SGD for Attention Models. arXiv201912

Momentum Contrast for Unsupervised Visual Representation Learning, Kaiming He arXiv2019.11

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources, Amazon, arXiv2019.5

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, arXiv2018.4 [Notes]

[2019NIPS] Uniform convergence may be unable to explain generalization in deep learning

[2019NIPS] Understanding the Role of Momentum in Stochastic Gradient Methods

[2019NIPS] Lookahead optimizer: k steps forward, 1 step back [Code] [Pytorch] [TF]

[2019ICLR] [AdaBound] Adaptive gradient methods with dynamic bound of learning rate [Pytorch] [TF-example]

AdaBound combines SGD and Adam: it adapts quickly like Adam early in training and converges like SGD later. Usage (requires Python 3.6+): `pip install adabound`, then `optimizer = adabound.AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)`. A TensorFlow version is coming.
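The idea can be sketched in a few lines (my own scalar simplification, not the authors' implementation): an Adam step whose per-parameter learning rate is clipped into bounds that tighten toward final_lr:

```python
import numpy as np

def adabound_step(theta, grad, state, lr=1e-3, final_lr=0.1,
                  betas=(0.9, 0.999), gamma=1e-3, eps=1e-8):
    """One AdaBound update: an Adam step whose effective learning rate
    is clipped into [lower, upper] bounds converging to final_lr."""
    b1, b2 = betas
    state["t"] += 1
    t = state["t"]
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    step = lr * np.sqrt(1 - b2 ** t) / (1 - b1 ** t)   # bias correction
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    eta = np.clip(step / (np.sqrt(state["v"]) + eps), lower, upper)
    return theta - eta * state["m"]

state = {"t": 0, "m": 0.0, "v": 0.0}
theta = 5.0
for _ in range(2000):                     # minimize f(x) = x^2
    theta = adabound_step(theta, 2 * theta, state)
print(abs(theta) < 0.5)                   # heads toward the minimum
```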

[2019CVPRW] The Indirect Convolution Algorithm

[2019ISCAW] Accelerated CNN Training Through Gradient Approximation

Fast training for neural networks, You Yang, Jiangmen Talk [Video]

Training Tricks in Object Detection
  • Student Specialization in Deep ReLU Networks With Finite Width and Input Dimension, arXiv2019.11
  • Accelerating CNN Training by Sparsifying Activation Gradients, arXiv2019.8
  • Luck Matters: Understanding Training Dynamics of Deep ReLU Networks, arXiv2019.6 [Code]
  • Bag of Freebies for Training Object Detection Neural Networks, Amazon, arXiv2019.4 [Code](https://github.com/dmlc/gluon-cv)
  • Deep Double Descent: Where Bigger Models and More Data Hurt, ICLR2020Review
  • [2019ICCV] Rethinking ImageNet Pre-training, FAIR [Notes]
  • [2019CVPR] Bag of Tricks for Image Classification with Convolutional Neural Networks, Amazon [Code] [Note]
  • [2019CVPR] Accelerating Convolutional Neural Networks via Activation Map Compression
  • [2019CVPR] RePr: Improved Training of Convolutional Filters [Note]
  • [2019BMVC] Dynamic Neural Network Channel Execution for Efficient Training
  • [2018ICPP] Imagenet training in minutes
Activation

[Blog] Activation Functions in Deep Learning; the Dying ReLU Problem [Notes]

[2019CVPR] Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem (oral) [Code]

[2018] [ GELU ] Gaussian Error Linear Units (GELUs). arXiv201811 [Note]
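The GELU paper defines GELU(x) = x·Φ(x), with Φ the standard normal CDF, and gives a tanh approximation; both fit in a few stdlib lines:

```python
from math import erf, sqrt, pi, tanh

def gelu_exact(x):
    """GELU(x) = x * Phi(x), with Phi the standard normal CDF."""
    return x * 0.5 * (1.0 + erf(x / sqrt(2.0)))

def gelu_tanh(x):
    """The tanh approximation from the GELU paper."""
    return 0.5 * x * (1.0 + tanh(sqrt(2.0 / pi) * (x + 0.044715 * x ** 3)))

print(gelu_exact(1.0), gelu_tanh(1.0))    # both ~0.841
```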

[2016ICML] [CReLU] Understanding and improving convolutional neural networks via concatenated rectified linear units

[2015ICCV] [PReLU-Net/msra Initialization] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Normalization

Normalization Scholar: Ping Luo

[Blog] Introduction to Normalization [Page] [Note]

[Blog] Introduction to BN/LN/IN/GN [Page] [Page2]

[Talk] Devils in BatchNorm, Jiangmen Talk, 2019 [Page]

[Blog] An Overview of Normalization Methods in Deep Learning, 2018.11 [Page]

Attentive Normalization. [Tianfu Wu] arXiv2019.11 [Code]

Network Deconvolution [an alternative to Batch Normalization]. arXiv2019.9 [Proj]

Weight Standardization. arXiv2019.3 [Code]

[ IN ] Instance Normalization: The Missing Ingredient for Fast Stylization. arXiv2017.11 [Code]

[ LN ] Layer Normalization. [Hinton] arXiv2016.7 [Note]

[2019NIPS] Understanding and Improving Layer Normalization

[2019NIPS] Positional Normalization [Code] [Supp]

[2018NIPS] How Does Batch Normalization Help Optimization? [arXiv19v] [Ref]

[2018NIPS] [ BIN ] Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks [Code]

[2018ECCV] [ GN ] Group normalization

[2017NIPS] Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models

[2016NIPS] [ WN ] Weight normalization: A simple reparameterization to accelerate training of deep neural networks

[2015ICML] [ BN ] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
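Training-mode batch normalization is two lines of NumPy: standardize each channel over the batch, then apply the learned scale γ and shift β (inference uses running statistics instead):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch norm over a (N, C) activation matrix: per-channel
    standardization across the batch, then learned scale and shift."""
    x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(5.0, 3.0, size=(256, 4))
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(6), y.std(axis=0).round(3))  # ~0 and ~1 per channel
```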

Dropout [Note1] [Note2]

[2014JMLR] Dropout: a simple way to prevent neural networks from overfitting
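Inverted dropout, the formulation used in practice: drop units with probability p during training and rescale the survivors by 1/(1-p), so activations keep their expectation and inference is a no-op:

```python
import numpy as np

def dropout(x, p=0.5, training=True, seed=0):
    """Inverted dropout: zero each unit with probability p and scale
    the rest by 1/(1-p); at test time, return x unchanged."""
    if not training or p == 0.0:
        return x
    mask = np.random.default_rng(seed).random(x.shape) >= p
    return x * mask / (1.0 - p)

x = np.ones(100_000)
y = dropout(x, p=0.5)
print(abs(float(y.mean()) - 1.0) < 0.02)  # expectation is preserved
```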

[2012NIPS] ImageNet Classification with Deep Convolutional Neural Networks

Augmentation

[Blog] Research Guide: Data Augmentation for Deep Learning. 201910

[Blog] Data Augmentation: How to use Deep Learning when you have Limited Data. 201805 [Page]

[2019JBD] A survey on Image Data Augmentation for Deep Learning. [PDF] [Notes]

Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data. arXiv2019.11

FMix: Enhancing Mixed Sample Data Augmentation arXiv202006 [Code]

GridMask Data Augmentation. arXiv202001 [Code] [Note]

Let’s Get Dirty: GAN Based Data Augmentation for Soiling and Adverse Weather Classification in Autonomous Driving. arXiv2019.12

Faster AutoAugment: Learning augmentation strategies using backpropagation. arXiv201911

Automatic Data Augmentation by Learning the Deterministic Policy. arXiv201910

Greedy AutoAugment, arXiv2019.8

Safe Augmentation: Learning Task-Specific Transformations from Data, arXiv2019.7 [Code]

Learning Data Augmentation Strategies for Object Detection. arXiv201906 [Code]

[2020ICLR] AugMix : A Simple Data Processing Method to Improve Robustness and Uncertainty [Code]

[2019NIPS] Implicit Semantic Data Augmentation for Deep Networks

[2019NIPS] Fast AutoAugment

[2019ICML] Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules [Code] [Examples]

[2019ICCV] CutMix : Regularization Strategy to Train Strong Classifiers with Localizable Features [Code]
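CutMix (above) pastes a random box from one image into another and mixes the labels by the pasted-area ratio, with λ drawn from Beta(α, α). A hedged NumPy sketch (helper name is mine):

```python
import numpy as np

def cutmix(xa, ya, xb, yb, alpha=1.0, seed=0):
    """Paste a random box of xb into xa; mix one-hot labels by area."""
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha)
    h, w = xa.shape[:2]
    rh, rw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y0, y1 = np.clip([cy - rh // 2, cy + rh // 2], 0, h)
    x0, x1 = np.clip([cx - rw // 2, cx + rw // 2], 0, w)
    mixed = xa.copy()
    mixed[y0:y1, x0:x1] = xb[y0:y1, x0:x1]
    lam_adj = 1 - (y1 - y0) * (x1 - x0) / (h * w)   # actual kept-area ratio
    return mixed, lam_adj * ya + (1 - lam_adj) * yb

black, white = np.zeros((32, 32, 3)), np.ones((32, 32, 3))
mixed, label = cutmix(black, np.array([1.0, 0.0]), white, np.array([0.0, 1.0]))
print(mixed.shape, float(label.sum()))    # label stays a distribution
```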

[2019ICCVW] Occlusions for Effective Data Augmentation in Image Classification

[2019ICCVW] Style Augmentation : Data Augmentation via Style Randomization

[2019CVPR] AutoAugment : Learning Augmentation Policies from Data [Code]

[2018ICLR] Mixup : Beyond empirical risk minimization
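Mixup is even simpler: convex-combine two inputs and their one-hot labels with the same λ ~ Beta(α, α):

```python
import numpy as np

def mixup(xa, ya, xb, yb, alpha=0.2, seed=0):
    """mixup: blend both inputs and one-hot labels with the same lambda."""
    lam = float(np.random.default_rng(seed).beta(alpha, alpha))
    return lam * xa + (1 - lam) * xb, lam * ya + (1 - lam) * yb

x, y = mixup(np.zeros(4), np.array([1.0, 0.0]), np.ones(4), np.array([0.0, 1.0]))
print(x, y)                               # label mass still sums to 1
```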

[2018ACML] RICAP : Random Image Cropping and Patching Data Augmentation for Deep CNNs [Code]

[2018ICANN] Further advantages of data augmentation on convolutional neural networks (best paper)

[Blog] From Softmax to AM-Softmax

[Blog] Convolutional Neural Network Structures

[Blog] A Survey of the Recent Architectures of Deep Convolutional Neural Networks, 2019

[Blog] CNN Downsampling/Upsampling Explained
  • A closer look at network resolution for efficient network design. arXiv201909 [Code]
  • [2019NIPS] Is Deeper Better only when Shallow is Good? [Code]
  • [2015Nature] Deep Learning Review
  • [2014BMVC] Return of the Devil in the Details: Delving Deep into Convolutional Nets

Pooling

ViP: Virtual Pooling for Accelerating CNN-based Image Classification and Object Detection, arXiv201906

Learning Spatial Pyramid Attentive Pooling in Image Synthesis and Image-to-Image Translation, arXiv201901

[2020AAAI] Revisiting Bilinear Pooling: A coding Perspective [Note]

[2019ICCV] LIP: Local Importance-based Pooling [Code] [Notes]

[2018ECCV] Grassmann Pooling as Compact Homogeneous Bilinear Pooling for Fine-Grained Visual Classification

[2017CVPR] Low-rank bilinear pooling for fine-grained classification

[2016EMNLP] Multimodal compact bilinear pooling for visual question answering and visual grounding

[2016CVPR] Compact bilinear pooling

[2015ICCV] [bilinear pooling] Bilinear CNN Models for Fine-grained Visual Recognition
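Bilinear pooling sums outer products of two feature vectors over all spatial locations, capturing second-order interactions; Bilinear CNNs then apply signed square-root and L2 normalization:

```python
import numpy as np

def bilinear_pool(fa, fb):
    """Sum of per-location outer products for (C, H*W) feature maps,
    followed by signed sqrt + L2 normalization as in Bilinear CNNs."""
    phi = fa @ fb.T                       # (Ca, Cb) pooled interactions
    phi = np.sign(phi) * np.sqrt(np.abs(phi))
    return phi / (np.linalg.norm(phi) + 1e-12)

rng = np.random.default_rng(0)
fa, fb = rng.normal(size=(8, 49)), rng.normal(size=(8, 49))
z = bilinear_pool(fa, fb)
print(z.shape)                            # (Ca, Cb) descriptor, unit norm
```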

[2012ECCV] Semantic segmentation with second-order pooling

Convolution

Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference, arXiv201912

Rethinking Softmax with Cross-Entropy: Neural Network Classifier as Mutual Information Estimator, arXiv201911

Rethinking the Number of Channels for the Convolutional Neural Network, arXiv201909

AutoGrow: Automatic Layer Growing in Deep Convolutional Networks, arXiv201909 [Code]

Mapped Convolutions. [For 2D/3D/Spherical]. arXiv201906 [Code]

Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks. arXiv201905 [Code] [Note]

[2019ICCV] ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks [Code]

[2019CVPRW] Convolutions on Spherical Images

[2017ICML] Warped Convolutions : Efficient Invariance to Spatial Transformations

Attention module

ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks, arXiv201910 [Code] [Chinese]

[2020ICLR] On the Relationship between Self-Attention and Convolutional Layers [Proj] [Code] [Intro]

[2019TIP] Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition [Code]

[2017CVPR] SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning
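The SE/ECA line of work above implements channel attention: global-average-pool to a per-channel descriptor, pass it through a small bottleneck MLP, and gate each channel with a sigmoid. A sketch with random (untrained) weights and reduction r = 4:

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) map: squeeze (global average
    pool), excite (bottleneck MLP + sigmoid), then rescale channels."""
    z = x.mean(axis=(1, 2))                          # squeeze: (C,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))
    return x * s[:, None, None]                      # channel gates in (0, 1)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4, 4))
w1, w2 = rng.normal(size=(2, 8)), rng.normal(size=(8, 2))   # r = 4 bottleneck
y = se_block(x, w1, w2)
print(y.shape)
```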

  • Comb Convolution for Efficient Convolutional Architecture. arXiv201911
  • [2019ICML] EfficientNet : Rethinking Model Scaling for Convolutional Neural Networks [Code]
  • [2019ICCVW] GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond [Code]
  • [2018CSVT] [ RoR ] Residual Networks of Residual Networks: Multilevel Residual Networks
  • [2018CVPR] [ SENet ] Squeeze-and-excitation networks
  • [2017ICLR] FractalNet: Ultra-Deep Neural Networks without Residuals
  • [2017CVPR] [PyramidNet] Deep Pyramidal Residual Networks
  • [2017CVPR] [ DenseNet ] Densely Connected Convolutional Networks
  • [2017CVPR] [ ResNeXt ] Aggregated Residual Transformations for Deep Neural Networks
  • [2017CVPR] Xception : Deep Learning with Depthwise Separable Convolutions
  • [2017CVPR] PolyNet: A Pursuit of Structural Diversity in Very Deep Networks [Slides]
  • [2017AAAI] Inception-v4 , Inception-ResNet and the Impact of Residual Connections on Learning
  • [2016CVPR] [ ResNet ] Deep Residual Learning for Image Recognition [Note1] [Note2]
  • [2016CVPR] [ Inception-v3 ] Rethinking the Inception Architecture for Computer Vision
  • [2016ECCV] Good Practices for Deep Feature Fusion
  • [2016ECCV] Deep Networks with Stochastic Depth
  • [2016ECCV] [ Identity ResNet ] Identity Mappings in Deep Residual Networks [Over 1000 Layers ]
  • [2016ICLRW] ResNet in ResNet: Generalizing Residual Architectures
  • [2015NIPS] [ STN ] Spatial Transformer Networks
  • [2015ICML] [ BN-Inception /Inception-v2 ] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
  • [2015CVPR] [ GoogLeNet/Inception-v1 ] Going Deeper with Convolutions
  • [2015ICLR] [ VGGNet ] Very Deep Convolutional Networks for Large-Scale Image Recognition
  • [2014ICLR] [ NIN ] Network in Network
  • [2014ECCV] [ ZFNet ] Visualizing and Understanding Convolutional Networks
  • [2014ACMMM] [ CaffeNet ] Caffe: Convolutional Architecture for Fast Feature Embedding
  • [2012NIPS] [ AlexNet ] Imagenet classification with deep convolutional neural networks
  • [1998ProcIEEE] [ LeNet ] Gradient-Based Learning Applied to Document Recognition [LeNet Notes]

Light-weightCNN

[Blog] Introduction to Light-weight CNNs

[Blog] Lightweight Convolutional Neural Networks: SqueezeNet, MobileNet, ShuffleNet, Xception

SeesawNet: Convolution Neural Network With Uneven Group Convolution. arXiv201912 [Code]

HGC: Hierarchical Group Convolution for Highly Efficient Neural Network, arXiv201906

[2020CVPR] GhostNet : More Features from Cheap Operations [Code]

[2019CVPRW] Depth-wise Decomposition for Accelerating Separable Convolutions in Efficient Convolutional Neural Networks

[2019BMVC] MixNet: Mixed Depthwise Convolutional Kernels [Code] [Notes]

[2018NIPS] ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions [Code]

[2018NIPS] Learning Versatile Filters for Efficient Convolutional Neural Networks [Code]

[2018BMVC] IGCV3 : Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks [Code] [Pytorch]

[2018CVPR] IGCV2 : Interleaved Structured Sparse Convolutional Neural Networks

[2017ICCV] [ IGVC1 ] Interleaved Group Convolutions for Deep Neural Networks

MobileNet Series:

[Blog] Introduction for MobileNet and Its Variants

[2019ICCV] Searching for MobileNetV3. [Note]

[2018CVPR] MobileNetV2: Inverted Residuals and Linear Bottlenecks. [Note]

[2017] MobileNets : Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv201704
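MobileNet's core trick is to factorize a k×k convolution into a depthwise convolution plus a 1×1 pointwise convolution, cutting parameters (and multiply-adds) by roughly a factor of 1/C_out + 1/k². The parameter arithmetic:

```python
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k           # standard convolution weights

def depthwise_separable_params(c_in, c_out, k):
    return c_in * k * k + c_in * c_out    # depthwise + 1x1 pointwise

std = conv_params(128, 128, 3)            # 147456
sep = depthwise_separable_params(128, 128, 3)   # 1152 + 16384 = 17536
print(std, sep, round(std / sep, 1))      # ~8.4x fewer parameters
```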

ShuffleNet Series [Note]

[Code] ShuffleNet Series by Megvii: ShuffleNetV1, V2/V2+/V2.Large/V2.ExLarge, OneShot, DetNAS

[2018ECCV] ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

[2018CVPR] ShuffleNet : An Extremely Efficient Convolutional Neural Network for Mobile Devices
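ShuffleNet's channel shuffle lets information cross group-convolution groups, and is just a reshape and transpose:

```python
import numpy as np

def channel_shuffle(x, groups):
    """ShuffleNet channel shuffle on a (C, H, W) map: split channels
    into groups, transpose, and flatten back, interleaving the groups."""
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

x = np.arange(6.0)[:, None, None] * np.ones((6, 1, 1))
print(channel_shuffle(x, 2)[:, 0, 0])     # [0. 3. 1. 4. 2. 5.]
```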

Interpretation

[Blog] A Summary of Interpretability Methods for Deep Neural Networks (with TensorFlow implementations)
  • Analysis of Explainers of Black Box Deep Neural Networks for Computer Vision: A Survey. arXiv2019.11
  • [2019NIPS] Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent [Code] [Note]
  • [2019NIPS] Weight Agnostic Neural Networks (spotlight). [Proj] [Note]
  • [2018AAAI] Interpreting CNN Knowledge via An Explanatory Graph
  • [2018Access] Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)

RetinaFace: Single-stage Dense Face Localisation in the Wild. arXiv201905 [Code-MXNet] [Code-TF]

[2019CVPR] Group Sampling for Scale Invariant Face Detection [Note]

[2019ICCV] Learning to Paint with Model-based Deep Reinforcement Learning [Code] [Note]

[2019ICCV] Fashion++: Minimal Edits for Outfit Improvement (FAIR) [Proj] [Code]

[2019ICCV] SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition [Code&Dataset]

[2018BMVC] Learning Geo-Temporal Image Features [Proj]

  • [2018ISMIR] MIDI-VAE: Modeling Dynamics and Instrumentation of Music with Applications to Style Transfer [Code]
  • Music continuation: MuseNet; Bach AI Music by Google; Generating Piano Music with Transformer, by Google
  • On the Measure of Intelligence. arXiv201911 [Intro]

Unsupervised Learning

  • A Simple Framework for Contrastive Learning of Visual Representations. arXiv202002
  • Pose Trainer: Correcting Exercise Posture using Pose Estimation, arXiv [github]

AI+Application

  • MetNet: A Neural Weather Model for Precipitation Forecasting, arXiv202003 [Blog] [Intro]
  • Computer Vision and Image Processing: A Paper Review, Victor Wiley and Thomas Lucas, International Journal of Artificial Intelligence Research, June 2018 [DOI: 10.29099/IJAIR.V2I1.42]
