Facial-recognition algorithms: A literature review

Affiliations

  • 1 Centre for Systems Biology and Bioinformatics, Panjab University, Chandigarh, India.
  • 2 Department of Anthropology, Panjab University, Chandigarh, India.
  • 3 Department of Statistics, Panjab University, Chandigarh, India.
  • 4 Department of Forensic Medicine and Toxicology, All India Institute of Medical Sciences, Jodhpur, India.
  • PMID: 31964224
  • DOI: 10.1177/0025802419893168

Keywords: Forensic science; appearance-based methods; biometrics; computer-based facial recognition; criminalistics; human face; knowledge-based methods.



Research Article

Systematic Literature Review on the Accuracy of Face Recognition Algorithms

  • @ARTICLE{10.4108/eetiot.v8i30.2346, author={Marcos Agenor Lazarini and Rog\'{e}rio Rossi and Kechi Hirama}, title={Systematic Literature Review on the Accuracy of Face Recognition Algorithms}, journal={EAI Endorsed Transactions on Internet of Things}, volume={8}, number={30}, publisher={EAI}, journal_a={IOT}, year={2022}, month={9}, keywords={Accuracy, Convolutional Neural Networks, Facial Recognition, Viola-Jones Algorithm}, doi={10.4108/eetiot.v8i30.2346} }
  • 1: Centro Universitário da FEI
  • 2: Universidade de São Paulo

Real-time facial recognition systems have been increasingly used, making it relevant to address the accuracy of these systems given the credibility and trust they must offer. Therefore, this article seeks to identify the algorithms currently used by facial recognition systems through a Systematic Literature Review that considers recent scientific articles, published between 2018 and 2021. From the initial collection of ninety-three articles, a subset of thirteen was selected after applying the inclusion and exclusion procedures. One of the outstanding results of this research corresponds to the use of algorithms based on Artificial Neural Networks (ANN) considered in 21% of the solutions, highlighting the use of Convolutional Neural Network (CNN). Another relevant result is the identification of the use of the Viola-Jones algorithm, present in 19% of the solutions. In addition, from this research, two specific facial recognition solutions associated with access control were found considering the principles of the Internet of Things, one being applied to access control to environments and the other applied to smart cities.

Copyright © 2022 M. A. Lazarini et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.

Face Recognition by Humans and Machines: Three Fundamental Advances from Deep Learning

Alice J. O’Toole

1 School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, Texas 75080, USA;

Carlos D. Castillo

2 Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA;

Deep learning models currently achieve human levels of performance on real-world face recognition tasks. We review scientific progress in understanding human face processing using computational approaches based on deep learning. This review is organized around three fundamental advances. First, deep networks trained for face identification generate a representation that retains structured information about the face (e.g., identity, demographics, appearance, social traits, expression) and the input image (e.g., viewpoint, illumination). This forces us to rethink the universe of possible solutions to the problem of inverse optics in vision. Second, deep learning models indicate that high-level visual representations of faces cannot be understood in terms of interpretable features. This has implications for understanding neural tuning and population coding in the high-level visual cortex. Third, learning in deep networks is a multistep process that forces theoretical consideration of diverse categories of learning that can overlap, accumulate over time, and interact. Diverse learning types are needed to model the development of human face processing skills, cross-race effects, and familiarity with individual faces.

1. INTRODUCTION

The fields of vision science, computer vision, and neuroscience are at an unlikely point of convergence. Deep convolutional neural networks (DCNNs) now define the state of the art in computer-based face recognition and have achieved human levels of performance on real-world face recognition tasks ( Jacquet & Champod 2020 , Phillips et al. 2018 , Taigman et al. 2014 ). This behavioral parity allows for meaningful comparisons of representations in two successful systems. DCNNs also emulate computational aspects of the ventral visual system ( Fukushima 1988 , Krizhevsky et al. 2012 , LeCun et al. 2015 ) and support surprisingly direct, layer-to-layer comparisons with primate visual areas ( Yamins et al. 2014 ). Nonlinear, local convolutions, executed in cascaded layers of neuron-like units, form the computational engine of both biological and artificial neural networks for human and machine-based face recognition. Enormous numbers of parameters, diverse learning mechanisms, and high-capacity storage in deep networks enable a wide variety of experiments at multiple levels of analysis, from reductionist to abstract. This makes it possible to investigate how systems and subsystems of computations support face processing tasks.
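The computational engine described above, cascaded layers of nonlinear, local convolutions, can be illustrated with a toy two-layer example. This is a minimal NumPy sketch with random, untrained filters; real DCNNs learn millions of parameters over tens to hundreds of layers, so nothing here is meant as an actual face recognizer.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution of a single-channel image with a small kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Pointwise nonlinearity applied after each convolution."""
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
face = rng.standard_normal((32, 32))      # stand-in for a face image
k1 = rng.standard_normal((3, 3)) * 0.1    # first-layer local filter (untrained)
k2 = rng.standard_normal((3, 3)) * 0.1    # second-layer local filter (untrained)

layer1 = relu(conv2d(face, k1))           # 30x30 feature map
layer2 = relu(conv2d(layer1, k2))         # 28x28 feature map
embedding = layer2.flatten()              # toy "face representation" vector
```

Each unit sees only a local patch of the layer below, and the nonlinearity between layers is what lets the cascade compute functions a single linear filter cannot.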

Our goal is to review scientific progress in understanding human face processing with computational approaches based on deep learning. As we proceed, we bear in mind wise words written decades ago in a paper on science and statistics: “All models are wrong, but some are useful” ( Box 1979 , p. 202) (see the sidebar titled Perspective: Theories and Models of Face Processing and the sidebar titled Caveat: Iteration Between Theory and Practice ). Since all models are wrong, in this review, we focus on what is useful. For present purposes, computational models are useful when they give us insight into the human visual and perceptual system. This review is organized around three fundamental advances in understanding human face perception, using knowledge generated from deep learning models. The main elements of these advances are as follows.

PERSPECTIVE: THEORIES AND MODELS OF FACE PROCESSING

Box (1976) reminds us that scientific progress comes from motivated iteration between theory and practice. In understanding human face processing, theories should be used to generate the questions, and machines (as models) should be used to answer the questions. Three elemental concepts are required for scientific progress. The first is flexibility. Effective iteration between theory and practice requires feedback between what the theory predicts and what the model reveals. The second is parsimony. Because all models are wrong, excessive elaboration will not find the correct model. Instead, economical descriptions of a phenomenon should be preferred over complex descriptions that capture less fundamental elements of human perception. Third, Box (1976 , p. 792) cautions us to avoid “worrying selectivity” in model evaluation. As he puts it, “since all models are wrong, the scientist must be alert to what is importantly wrong.”

These principles represent a scientific ideal, rather than a reality in the field of face perception by humans and machines. Applying scientific principles to computational modeling of human face perception is challenging for diverse reasons (see the sidebar titled Caveat: Iteration Between Theory and Practice below). We argue, as Cichy & Kaiser (2019) have, that although the utility of scientific models is usually seen in terms of prediction and explanation, their function for exploration should not be underrated. As scientific models, DCNNs carry out high-level visual tasks in neurally inspired ways. They are at a level of development that is ripe for exploring computational and representational principles that actually work but are not understood. This is a classic problem in reverse engineering—yet the use of deep learning as a model introduces a dilemma. The goal of reverse engineering is to understand how a functional but highly complex system (e.g., the brain and human visual system) solves a problem (e.g., recognizes a face). To accomplish this, a well-understood model is used to test hypotheses about the underlying mechanisms of the complex system. A prerequisite of reverse engineering is that we understand how the model works. Failing that, we risk using one poorly understood system to test hypotheses about another poorly understood system. Although deep networks are not black boxes (every parameter is knowable) ( Hasson et al. 2020 ), we do not fully understand how they recognize faces ( Poggio et al. 2020 ). Therefore, the primary goal should be to understand deep networks for face recognition at a conceptual and representational level.

CAVEAT: ITERATION BETWEEN THEORY AND PRACTICE

Box (1976) noted that scientific progress depends on motivated iteration between theory and practice. Unfortunately, a motivation to iterate between theory and practice is not a reasonable expectation for the field of computer-based face recognition. Automated face recognition is big business, and the best models were not developed to study human face processing. DCNNs provide a neurally inspired, but not copied, solution to face processing tasks. Computer scientists formulated DCNNs at an abstract level, based on neural networks from the 1980s ( Fukushima 1988 ). Current DCNN-based models of human face processing are computationally refined, scaled-up versions of these older networks. Algorithm developers make design and training decisions for performance and computational efficiency. In using DCNNs to model human face perception, researchers must choose between smaller, controlled models and larger-scale, uncontrolled networks (see also Richards et al. 2019 ). Controlled models are easier to analyze but can be limited in computational power and training data diversity. Uncontrolled models better emulate real neural systems but may be intractable. The easy availability of cutting-edge pretrained face recognition models, with a variety of architectures, has been the deciding factor for many research labs with limited resources and expertise to develop networks. Given the widespread use of these models in vision science, brain-similarity metrics for artificial neural networks have been developed ( Schrimpf et al. 2018 ). These produce a Brain-Score made up of a composite of neural and behavioral benchmarks. Some large-scale (uncontrolled) network architectures used in modeling human face processing (See Section 2.1 ) score well on these metrics.

A promising long-term strategy is to increase the neural accuracy of deep networks ( Grill-Spector et al. 2018 ). The ventral visual stream and DCNNs both enable hierarchical and feedforward processing. This offers two computational benefits consistent with DCNNs as models of human face processing. First, the universal approximation theorem ( Hornik et al. 1989 ) ensures that both types of networks can approximate any complex continuous function relating the input (visual image) to the output (face identity). Second, linear and nonlinear feedforward connections enable fast computation consistent with the speed of human facial recognition ( Grill-Spector et al. 2018 , Thorpe et al. 1996 ). Although current DCNNs lack other properties of the ventral visual system, these can be implemented as the field progresses.
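The universal approximation guarantee cited above can be stated concretely. This is a standard one-hidden-layer form; the symbols ($f$, $K$, $\sigma$, $v_i$, $w_i$, $b_i$) are ours, not notation from the review: for any continuous function $f$ on a compact set $K \subset \mathbb{R}^d$ and any $\varepsilon > 0$, there exist a width $N$ and weights such that

```latex
F(x) \;=\; \sum_{i=1}^{N} v_i \,\sigma\!\left(w_i^{\top} x + b_i\right),
\qquad
\sup_{x \in K} \bigl| F(x) - f(x) \bigr| \;<\; \varepsilon ,
```

where $\sigma$ is a fixed nonconstant, bounded, continuous activation. The relevance here is that a feedforward network mapping image pixels to identity is not limited in principle by its functional form, only by its size, training data, and learning procedure.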

  • Deep networks force us to rethink the universe of possible solutions to the problem of inverse optics in vision. The face representations that emerge from deep networks trained for identification operate invariantly across changes in image and appearance, but they are not themselves invariant.
  • Computational theory and simulation studies of deep learning indicate a reconsideration of a long-standing axiom in vision science that face or object representations can be understood in terms of interpretable features. Instead, in deep learning models, the concept of a nameable deep feature, localized in an output unit of the network or in the latent variables of the space, should be reevaluated.
  • Natural environments provide highly variable training data that can structure the development of face processing systems using a variety of learning mechanisms that overlap, accumulate over time, and interact. It is no longer possible to invoke learning as a generic theoretical account of a behavioral or neural phenomenon.

We focus on deep learning findings that are relevant for understanding human face processing—broadly construed. The human face provides us with diverse information, including identity, gender, race or ethnicity, age, and emotional state. We use the face to make inferences about a person’s social traits ( Oosterhof & Todorov 2008 ). As we discuss below, deep networks trained for identification retain much of this diverse facial information (e.g., Colón et al. 2021 , Dhar et al. 2020 , Hill et al. 2019 , Parde et al. 2017 , Terhörst et al. 2020 ). The use of face recognition algorithms in applied settings (e.g., law enforcement) has spurred detailed performance comparisons between DCNNs and humans (e.g., Phillips et al. 2018 ). For analogous reasons, the problem of human-like race bias in DCNNs has also been studied (e.g., Cavazos et al. 2020 ; El Khiyari & Wechsler 2016 ; Grother et al. 2019 ; Krishnapriya et al. 2019 , 2020 ). Developmental data on infants’ exposure to faces in the first year(s) of life offer insight into how to structure the training of deep networks ( Smith & Slone 2017 ). These topics are within the scope of this review. Although we consider general points of comparison between DCNNs and neural responses in face-selective areas of the primate inferotemporal (IT) cortex, a detailed discussion of this topic is beyond the scope of this review. (For a review of primate face-selective areas that considers computational perspectives, see Hesse & Tsao 2020 ). In this review, we focus on the computational and representational principles of neural coding from a deep learning perspective.

The review is organized as follows. We begin with a brief review of where machine performance on face identification stands relative to humans in quantitative terms. Qualitative performance comparisons on identification and other face processing tasks (e.g., expression classification, social perception, development) are integrated into Sections 2 – 4 . These sections consider advances in understanding human face processing from deep learning approaches. We close with a discussion of where the next steps might lead.

1.1. Where We Are Now: Human Versus Machine Face Recognition

Deep learning models of face identification map widely variable images of a face onto a representation that supports identification accuracy comparable to that of humans. The steady progress of machines over the past 15 years can be summarized in terms of the increasingly challenging face images that they can recognize ( Figure 1 ). By 2007, the best algorithms surpassed humans on a task of identity matching for unfamiliar faces in frontal images taken indoors ( O’Toole et al. 2007 ). By 2012, well-established algorithms exceeded human performance on frontal images with moderate changes in illumination and appearance ( Kumar et al. 2009 , Phillips & O’Toole 2014 ). Machine ability to match identity for in-the-wild images appeared with the advent of DCNNs in 2013–2014. Human face recognition was marginally more accurate than DeepFace ( Taigman et al. 2014 ), an early DCNN, on the Labeled Faces in the Wild (LFW) data set ( Huang et al. 2008 ). LFW contains in-the-wild images taken mostly from the front. DCNNs now fare well on in-the-wild images with significant pose variation (e.g., Maze et al. 2018 , data set). Sengupta et al. (2016) found parity between humans and machines on frontal-to-frontal identity matching but human superiority on frontal-to-profile matching.

Figure 1: The progress of computer-based face recognition systems can be tracked by their ability to recognize faces with increasing levels of image and appearance variability. In 2006, highly controlled, cropped face images with moderate variability, such as the images of the same person shown, were challenging (images adapted with permission from Sim et al. 2002). In 2012, algorithms could tackle moderate image and appearance variability (the top four images are extreme examples adapted with permission from Huang et al. 2012; the bottom two images adapted with permission from Phillips et al. 2011). By 2018, deep convolutional neural networks (DCNNs) began to tackle wide variation in image and appearance (images adapted with permission from the database in Maze et al. 2018). In the 2012 and 2018 images, all side-by-side images show the same person except the bottom pair of 2018 panels.

Identity matching: the process of determining whether two or more images show the same identity or different identities; this is the most common task performed by machines.

Human face recognition: the ability to determine whether a face is known.
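In DCNN-based systems, identity matching reduces to comparing the embeddings of two images against a similarity threshold. The sketch below uses cosine similarity, a common choice for this comparison; the embedding values and the threshold are illustrative assumptions, not taken from any cited system.

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity of two face embeddings; higher means more likely same identity."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_identity(emb1, emb2, threshold=0.5):
    """Identity matching: decide whether two images show the same person."""
    return cosine_similarity(emb1, emb2) >= threshold

# Toy embeddings: two images of person A, one image of person B.
a1 = np.array([0.9, 0.1, 0.2])
a2 = np.array([0.8, 0.2, 0.1])
b1 = np.array([0.1, 0.9, 0.3])

print(same_identity(a1, a2))  # True: high similarity, same person
print(same_identity(a1, b1))  # False: low similarity, different people
```

In practice the threshold is tuned on verification data to trade off false matches against false non-matches.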

1.2. Expert Humans and State-of-the-Art Machines Work Together

DCNNs can sometimes even surpass normal human performance. Phillips et al. (2018) compared humans and machines matching the identity of faces in high-quality frontal images. Although this is generally considered an easy task, the images tested were chosen to be highly challenging based on previous human and machine studies. Four DCNNs developed between 2015 and 2017 were compared to human participants from five groups: professional forensic face examiners, professional forensic face reviewers, superrecognizers (Noyes et al. 2017, Russell et al. 2009), professional fingerprint examiners, and students. Face examiners, reviewers, and superrecognizers performed more accurately than fingerprint examiners, and fingerprint examiners performed more accurately than students. Machine performance, from 2015 to 2017, tracked human skill levels. The 2015 algorithm (Parkhi et al. 2015) performed at the level of the students; the 2016 algorithm (Chen et al. 2016) performed at the level of the fingerprint examiners (Ranjan et al. 2017c); and the two 2017 algorithms (Ranjan et al. 2017a, c) performed at the level of professional face reviewers and examiners, respectively. Notably, combining the judgments of individual professional face examiners with those of the best algorithm (Ranjan et al. 2017) yielded perfect performance. This suggests a degree of strategic diversity for the face examiners and the DCNN and demonstrates the potential for effective human–machine collaboration (Phillips et al. 2018).
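One simple fusion rule consistent with the idea of combining examiner and algorithm judgments is to average their same-identity ratings. The scale, the averaging rule, and all scores below are illustrative assumptions, not the exact procedure of Phillips et al. (2018).

```python
import numpy as np

def fuse_judgments(examiner_scores, algorithm_score):
    """Average human and machine same-identity ratings.

    Ratings are assumed scaled to [-1, 1], where +1 means
    "certain same person" and -1 means "certain different people".
    """
    return float(np.mean(list(examiner_scores) + [algorithm_score]))

# Hypothetical ratings for one challenging image pair that does show
# the same person: examiners disagree, the DCNN is fairly confident.
examiners = [0.4, -0.2, 0.6]
algorithm = 0.8

fused = fuse_judgments(examiners, algorithm)
print(fused > 0)  # the fused judgment comes out on the "same person" side
```

The benefit of fusion comes from strategic diversity: when human and machine errors are not perfectly correlated, averaging cancels some of each.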

Combined, the data indicate that machine performance has improved from a level comparable to that of a person recognizing unfamiliar faces to one comparable to that of a person recognizing more familiar faces ( Burton et al. 1999 , Hancock et al. 2000 , Jenkins et al. 2011 ) (see Section 4.1 ).

2. RETHINKING INVERSE OPTICS AND FACE REPRESENTATIONS

Deep networks force us to rethink the universe of possible solutions to the problem of inverse optics in vision. These networks operate with a degree of invariance to image and appearance that was unimaginable by researchers less than a decade ago. Invariance refers to the model’s ability to consistently identify a face when image conditions (e.g., viewpoint, illumination) and appearance (e.g., glasses, facial hair) vary. The nature of the representation that accomplishes this is not well understood. The inscrutability of DCNN codes is due to the enormous number of computations involved in generating a face representation from an image and the uncontrolled training data. To create a face representation, millions of nonlinear, local convolutions are executed over tens (to hundreds) of layers of units. Researchers exert little or no control over the training data, but instead source face images from the web with the goal of finding as much labeled training data as possible. The number of images per identity and the types of images (e.g., viewpoint, expression, illumination, appearance, quality) are left (mostly) to what is found through web scraping. Nevertheless, DCNNs produce a surprisingly structured and rich face representation that we are beginning to understand.

2.1. Mining the Face Identity Code in Deep Networks

The face representation generated by DCNNs for the purpose of identifying a face also retains detailed information about the characteristics of the input image (e.g., viewpoint, illumination) and the person pictured (e.g., gender, age). As shown below, this unified representation can solve multiple face processing tasks in addition to identification.

2.1.1. Image characteristics.

Face representations generated by deep networks both are and are not invariant to image variation. These codes can identify faces invariantly over image change, but they are not themselves invariant. Instead, face representations of a single identity vary systematically as a function of the characteristics of the input image. The representations generated by DCNNs are, in fact, representations of face images.

Work to dissect face identity codes draws on the metaphor of a face space ( Valentine 1991 ) adapted to representations generated by a DCNN. Visualization and simulation analyses demonstrate that identity codes for face images retain ordered information about the input image ( Dhar et al. 2020 , Hill et al. 2019 , Parde et al. 2017 ). Viewpoint (yaw and pitch) can be predicted accurately from the identity code, as can media source (still image or video frame) ( Parde et al. 2017 ). Image quality (blur, usability, occlusion) is also available as the identity code norm (vector length). Poor-quality images produce face representations centered in the face space, creating a DCNN “garbage dump.” This organizational structure was replicated in two DCNNs with different architectures, one developed by Chen et al. (2016) with seven convolutional layers and three fully connected layers and another developed by Sankaranarayanan et al. (2016) with 11 convolutional layers and one fully connected layer. Image quality estimates can also be optimized directly in a DCNN using human ratings ( Best-Rowden & Jain 2018 ).
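The norm-as-quality observation lends itself to a one-line check: compute the vector length of an embedding and compare across images. The embeddings below are invented for illustration; the sketch only shows the direction of the reported effect, not calibrated quality scores.

```python
import numpy as np

def embedding_norm(embedding):
    """Vector length of a face representation.

    Parde et al. (2017) report that low norms are associated with
    poor-quality (blurred, occluded) input images, which land near
    the center of the DCNN face space.
    """
    return float(np.linalg.norm(embedding))

good_image_emb = np.array([0.8, -0.6, 0.9, 0.5])       # toy sharp-image embedding
blurry_image_emb = np.array([0.1, -0.05, 0.08, 0.04])  # toy blurred-image embedding

print(embedding_norm(good_image_emb) > embedding_norm(blurry_image_emb))  # True
```

Because the norm falls out of the identity code for free, it can serve as a quality filter without training a separate quality predictor.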

Face space: a representation of the similarity of faces in a multidimensional space.

For a closer look at the structure of DCNN face representations, Hill et al. (2019) examined the representations of highly controlled face images in a face space generated by a deep network trained with in-the-wild images. The network processed images of three-dimensional laser scans of human heads rendered from five viewpoints under two illumination conditions (ambient, harsh spotlight). Visualization of these representations in the resulting face space showed a highly ordered pattern (see Figure 2 ). Consistent with the network’s high accuracy at face identification, images clustered by identity. Identity clusters separated into regions of male and female faces (see Section 2.1.2 ). Within each identity cluster, the images separated by illumination condition—visible in the face space as chains of images. Within each illumination chain, the image representations were arranged in the space by viewpoint, which varied systematically along the image chain. To further probe the coding of identity, Hill et al. (2019) processed images of caricatures of the 3D heads (see also Blanz & Vetter 1999 ). Caricature representations were centered in each identity cluster, indicating that the network perceived a caricature as a good likeness of the identity.

Figure 2: Visualization of the top-level deep convolutional neural network (DCNN) similarity space for all images from Hill et al. (2019). (a–f) Points are colored according to different variables. Grey polygonal borders are for illustration purposes only and show the convex hull of all images of each identity. These convex hulls are expanded by a margin for visibility. The network separates identities accurately. In panels a and d, the space is divided into male and female sections. In panels b and e, illumination conditions subdivide within identity groupings. In panels c and f, the viewpoint varies sequentially within illumination clusters. Dotted-line boxes in panels a–c show areas enlarged in panels d–f. Figure adapted with permission from Hill et al. (2019).

DCNN face representation: the output vector produced for a face image processed through a deep network trained for faces.

All results from Hill et al. (2019) were replicated using two networks with starkly different architectures. The first, developed by Ranjan et al. (2019) , was based on a ResNet-101 with 101 layers and skip connections; the second, developed by Chen et al. (2016) , had 15 convolution and pooling layers, a dropout layer, and one fully connected top layer. As measured using the brain-similarity metrics developed in Brain-Score ( Schrimpf et al. 2018 ), one of these architectures (ResNet-101) was the third most brain-like of the 25 networks tested. The ResNet-101 network scored well on both neural (V4 and IT cortex) and behavioral predictability for object recognition. Hill et al.’s (2019) replication of this face space using a shallower network ( Chen et al. 2016 ), however, suggests that network architecture may be less important than computational capacity in understanding high-level visual codes for faces (see Section 3.2 ).

Brain-Score: neural and behavioral benchmarks that score an artificial neural network on its similarity to brain mechanisms for object recognition.

Returning to the issue of human-like view invariance in a DCNN, Abudarham & Yovel (2020) compared the similarity of face representations computed within and across identities and viewpoints. Consistent with view-invariant performance, same-identity, different-view face pairs were more similar than different-identity, same-view face pairs. Consistent with a noninvariant face representation, correlations between similarity scores across head view decreased monotonically with increasing view disparity. These results support the characterization of DCNN codes as being functionally view invariant but with a view-specific code. Notably, earlier layers in the network showed view specificity, whereas higher layers showed view invariance.
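The comparison by Abudarham & Yovel (2020) can be mimicked on toy vectors. All embeddings below are invented for illustration; the point is the ordering they report: functionally view-invariant codes make same-identity, different-view pairs more similar than different-identity, same-view pairs, even though each representation still carries view information.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two face-image embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings loosely mimicking the identity/view structure:
# the same identity seen from two views produces nearby (but not
# identical) representations; a different identity is farther away.
id1_frontal = np.array([1.0, 0.2, 0.1])
id1_profile = np.array([0.9, 0.4, 0.2])   # same person, different view
id2_frontal = np.array([0.2, 1.0, 0.3])   # different person, same view

same_id_diff_view = cosine(id1_frontal, id1_profile)
diff_id_same_view = cosine(id1_frontal, id2_frontal)

# Functional view invariance: identity dominates view in the code.
print(same_id_diff_view > diff_id_same_view)  # True
```

A view-specific residue survives in the code (the two same-identity vectors are not equal), which is exactly the "functionally view invariant but view-specific" characterization in the text.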

It is worth digressing briefly to consider invariance in the context of neural approaches to face processing. An underlying assumption of neural approaches is that “a major purpose of the face patches is thus to construct a representation of individual identity invariant to view direction” ( Hesse & Tsao 2020 , p. 703). Ideas about how this is accomplished have evolved. Freiwald & Tsao (2010) posited the progressive computation of invariance via the pooling of neurons across face patches, as follows. In early patches, a neuron responds to a specific identity from specific views; in middle face patches, greater invariance is achieved by pooling the responses of mirror-symmetric views of an identity; in later face patches, each neuron pools inputs representing all views of the same individual to create a fully view-invariant representation. More recently, Chang & Tsao (2017) proposed that the brain computes a view-invariant face code using shape and appearance parameters analogous to those used in a computer graphics model of face synthesis ( Cootes et al. 1995 ) (see the sidebar titled Neurons, Neural Tuning, Population Codes, Features, and Perceptual Constancy ). This code retains information about the face, but not about the particular image viewed.

NEURONS, NEURAL TUNING, POPULATION CODES, FEATURES, AND PERCEPTUAL CONSTANCY

Barlow (1972 , p. 371) wrote, “Results obtained by recording from single neurons in sensory pathways…obviously tell us something important about how we sense the world around us; but what exactly have we been told?” In answer, Barlow (1972 , p. 371) proposed that “our perceptions are caused by the activity of a rather small number of neurons selected from a very large population of predominantly silent cells. The activity of each single cell is thus an important perceptual event and it is thought to be related quite simply to our subjective experience.” Although this proposal is sometimes caricatured as the grandmother cell doctrine (see also Gross 2002 ), Barlow simply asserts that single-unit activity can be interpreted in perceptual terms, and that the responses of small numbers of units, in combination, underlie subjective perceptual experience. This proposal reflects ideas gleaned from studies of early visual areas that have been translated, at least in part, to studies of high-level vision.

Over the past decade, single neurons in face patches have been characterized as selective for facial features (e.g., aspect ratio, hair length, eyebrow height) ( Freiwald et al. 2009 ), face viewpoint and identity ( Freiwald & Tsao 2010 ), eyes ( Issa & DiCarlo 2012 ), and shape or appearance parameters from an active appearance model of facial synthesis ( Chang & Tsao 2017 ). Neurophysiological studies of face and object processing also employ techniques aimed at understanding neural population codes. Using the pattern of neural responses in a population of neurons (e.g., IT), linear classifiers are often used to predict subjective percepts (commonly defined as the image viewed). For example, Chang & Tsao (2017) showed that face images viewed by a macaque could be reconstructed using a linear combination of the activity of just 205 face cells in face patches ML–MF and AM. This classifier provides a real neural network model of the face-selective cortex that can be interpreted in simple terms.

Population code models generated from real neural data (a few hundred units), however, differ substantially in scale from the face- and object-selective cortical regions that they model (1 mm³ of the cerebral cortex contains approximately 50,000 neurons and 300 million adjustable parameters; Azevedo et al. 2009 , Kandel et al. 2000 , Hasson et al. 2020 ). This difference in scale is at the core of a tension between model interpretability and real-world task generalizability ( Hasson et al. 2020 ). It also creates tension between the neural coding hypotheses suggested by deep learning and the limitations of current neuroscience techniques for testing these hypotheses. To model neural function, an electrode gives access to single neurons and (with multi-unit recordings) to relatively small numbers of neurons (a few hundred). Neurocomputational theory based on direct fit models posits that overparameterization (i.e., the extremely high number of parameters available for neural computation) is critical to the brain’s solution to real-world problems (see Section 3.2 ). Bridging the gap between the computational and neural scale of these perspectives remains an ongoing challenge for the field.

Deep networks suggest an alternative that is largely consistent with neurophysiological data but interprets the data in a different light. Neurocomputational theory posits that the ventral visual system untangles face identity information from image parameters ( DiCarlo & Cox 2007 ). The idea is that visual processing starts in the image domain, where identity and viewpoint information are entangled. With successive levels of neural processing, manifolds corresponding to individual identities are untangled from image variation. This creates a representational space where identities can be separated with hyperplanes. Image information is not lost, but rather, is rearranged (for object recognition results, see Hong et al. 2016 ). The retention of image and identity information in DCNN face representations is consistent with this theory. It is also consistent with basic neuroscience findings indicating the emergence of a representation dominated by identity that retains sensitivity to image features (See Section 2.2 ).

2.1.2. Appearance and demographics.

Faces can be described using what computer vision researchers have called attributes or soft biometrics (hairstyle, hair color, facial hair, and accessories such as makeup and glasses). The definition of attributes in the computational literature is vague and can include demographics (e.g., gender, age, race) and even facial expression. Identity codes from deep networks retain a wide variety of face attributes. For example, Terhörst et al. (2020) built a massive attribute classifier (MAC) to test whether 113 attributes could be predicted from the face representations produced by deep networks [ArcFace ( Deng et al. 2019 ) or FaceNet ( Schroff et al. 2015 )] for images from in-the-wild data sets ( Huang et al. 2008 , Liu et al. 2015 ). The MAC learned to map from DCNN-generated face representations to attribute labels. Cross-validated results showed that 39 of the attributes were easily predictable, and 74 of the 113 were predictable at reliable levels. Hairstyle, hair color, beard, and accessories were predicted easily. Attributes such as face geometry (e.g., round), periocular characteristics (e.g., arched eyebrows), and nose were moderately predictable. Skin and mouth attributes were not well predicted.
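The probing logic of the MAC study can be illustrated with a minimal linear probe on synthetic embeddings (the attribute, the dimensionality, and the data here are all invented stand-ins): if an attribute direction leaks into the identity code, a simple classifier trained on frozen embeddings can read it out.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic "face embeddings" in which a hypothetical binary attribute
# (say, wears-glasses) varies along one hidden direction of the code.
dim, n = 128, 400
attr_axis = rng.standard_normal(dim)
embeddings = rng.standard_normal((n, dim))
labels = (embeddings @ attr_axis > 0).astype(float)

# A linear probe (logistic regression by gradient descent) on frozen embeddings.
w = np.zeros(dim)
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(embeddings @ w)))   # sigmoid predictions
    w -= 0.1 * embeddings.T @ (p - labels) / n    # gradient step

accuracy = float(((embeddings @ w > 0) == (labels > 0.5)).mean())
```

High probe accuracy means the attribute is linearly decodable from the representation, which is the operational sense in which attributes are "retained" by an identity-trained network.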

The continuous shuffling of identity, attribute, and image information across layers of the network was demonstrated by Dhar et al. (2020) . They tracked the expressivity of attributes (identity, sex, age, pose) across layers of a deep network. Expressivity was defined as the degree to which a feature vector, from any given layer of a network, specified an attribute. Dhar et al. (2020) computed expressivity using a second neural network that estimated the mutual information between attributes and DCNN features. Expressivity order in the final fully connected layer of both networks (ResNet-101 and Inception ResNet v2; Ranjan et al. 2019 ) indicated that identity was most expressed, followed by age, sex, and yaw. Identity expressivity increased dramatically from the final pooling layer to the last fully connected layer. This echoes the progressive increase in the detectability of view-invariant face identity representations seen across face patches in the macaque ( Freiwald & Tsao 2010 ). It also raises the computational possibility of undetected viewpoint sensitivity in these neurons (see Section 3.1 ).

Mutual information:

a statistical term from information theory that quantifies the codependence of information between two random variables
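For the discrete case, mutual information can be computed directly from a joint probability table, as this minimal sketch shows (the two binary variables are invented examples):

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits, computed from a joint probability table p(x, y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = joint > 0                          # skip zero cells (0 log 0 = 0)
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

# Two fully codependent binary variables share 1 bit; independent ones share 0.
dependent = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]
```

Estimating mutual information between continuous DCNN features and attributes, as in Dhar et al. (2020) , requires approximation (there, a second neural network), but the quantity being approximated is this one.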

2.1.3. Social traits.

People make consistent (albeit invalid) inferences about a person’s social traits based on their face ( Todorov 2017 ). These judgments have profound consequences. For example, competence judgments about faces predict election success at levels far above chance ( Todorov et al. 2005 ). The physical structure of the face supports these trait inferences ( Oosterhof & Todorov 2008 , Walker & Vetter 2009 ), and thus it is not surprising that deep networks retain this information. Using face representations produced by a network trained for face identification ( Sankaranarayanan et al. 2016 ), 11 traits (e.g., shy, warm, impulsive, artistic, lazy), rated by human participants, were predicted at levels well above chance ( Parde et al. 2019 ). Song et al. (2017) found that more than half of 40 attributes were predicted accurately by a network trained for object recognition (VGG-16; Simonyan & Zisserman 2014 ). Human and machine trait ratings were highly correlated.

Other studies show that deep networks can be optimized to predict traits from images. Lewenberg et al. (2016) crowd-sourced large numbers of objective (e.g., hair color) and subjective (e.g., attractiveness) attribute ratings from faces. DCNNs were trained to classify images for the presence or absence of each attribute. They found highly accurate classification for the objective attributes and somewhat less accurate classification for the subjective attributes. McCurrie et al. (2017) trained a DCNN to classify faces according to trustworthiness, dominance, and IQ. They found significant accord with human ratings, with higher agreement for trustworthiness and dominance than for IQ.

2.1.4. Facial expressions.

Facial expressions are also detectable in face representations produced by identity-trained deep networks. Colón et al. (2021) found that expression classification was well above chance for face representations of images from the Karolinska data set ( Lundqvist et al. 1998 ), which includes seven facial expressions (happy, sad, angry, surprised, fearful, disgusted, neutral) seen from five viewpoints (frontal and 90- and 45-degree left and right profiles). Consistent with human data, happiness was classified most accurately, followed by surprise, disgust, anger, neutral, sadness, and fear. Notably, accuracy did not vary across viewpoint. Visualization of the identities in the emergent face space showed a structured ordering of similarity in which viewpoint dominated over expression.

2.2. Functional Invariance, Useful Variability

The emergent code from identity-trained DCNNs can be used to recognize faces robustly, but it also retains extraneous information that is of limited, or no, value for identification. Although demographic and trait information offers weak hints to identity, image characteristics and facial expression are not useful for identification. Attributes such as glasses, hairstyle, and facial hair are, at best, weak identity cues and, at worst, misleading cues that will not remain constant over extended time periods. In purely computational terms, the variability of face representations for different images of an identity can lead to errors. Although this is problematic in security applications, coincidental features and attributes can be diagnostic enough to support acceptably accurate identification performance in day-to-day face recognition ( Yovel & O’Toole 2016 ). (For related arguments based on adversarial images for object recognition, see Ilyas et al. 2019 , Xie et al. 2020 , Yuan et al. 2020 .) A less-than-perfect identification system in computational terms, however, can be a surprisingly efficient, multipurpose face processing system that supports identification and the detection of visually derived semantic information [called attributes by Bruce & Young (1986) ].

What do we learn from these studies that can be useful in understanding human visual processing of faces? First, we learn that it is computationally feasible to accommodate diverse information about faces (identity, demographics, visually derived semantic information), images (viewpoint, illumination, quality), and emotions (expression) in a unified representation. Furthermore, this diverse information can be accessed selectively from the representation. Thus, identity, image parameters, and attributes are all untangled when learning prioritizes the difficult within-category discrimination problem of face identification.

Second, we learn that to understand high-level visual representations for faces, we need to think in terms of categorical codes unbound from a spatial frame of reference. Although remnants of retinotopy and image characteristics remain in high-level visual areas (e.g., Grill-Spector et al. 1999 , Kay et al. 2015 , Kietzmann et al. 2012 , Natu et al. 2010 , Yue et al. 2010 ), the expressivity of spatial layout weakens dramatically from early visual areas to categorically structured areas in the IT cortex. Categorical face representations should capture what cognitive and perceptual psychologists call facial features (e.g., face shape, eye color). Indeed, altering these types of features in a face affects identity perception similarly for humans and deep networks ( Abudarham et al. 2019 ). However, neurocomputational theory suggests that finding these features in the neural code will likely require rethinking the interpretation of neural tuning and population coding (see Section 3.2 ).

Third, if the ventral stream untangles information across layers of computations, then we should expect traces of identity, image data, and attributes at many, if not all, neural network layers. These may variously dominate the strength of the neural signal at different layers (see Section 3.1 ). Thus, various layers in the network will likely succeed in predicting several types of information about the face and/or image, though with differing accuracy. For now, we should not ascribe too much importance to findings about which specific layer(s) of a particular network predict specific attributes. Instead, we should pay attention to the pattern of prediction accuracy across layers. We would expect the following pattern. Clearly, for the optimized attribute (identity), the output offers the clearest access. For subject-related attributes (e.g., demographics), this may also be the case. For image-related attributes, we would expect every layer in the network to retain some degree of prediction ability. Exactly how, where, and whether the neural system makes use of these attributes for specific tasks remain open questions.

3. RETHINKING VISUAL FEATURES: IMPLICATIONS FOR NEURAL CODES

Deep learning models force us to rethink the definition and interpretation of facial features in high-level representations. Theoretical ideas about the brain’s solution to complex real-world tasks such as face recognition must be reconciled at the level of neural units and representational spaces. Deep learning models can be used to test hypotheses about how faces are stored in the high-dimensional representational space defined by the pattern of responses of large numbers of neurons.

3.1. Units Confound Information that Separates in the Representation Space

Insight into interpreting facial features comes from deep network simulations aimed at understanding the relationship between unit responses and the information retained in the face representation. Parde et al. (2021) compared identification, gender classification, and viewpoint estimation in subspaces of a DCNN face space. Using an identity-trained network capable of all three tasks, they tested performance on the tasks using randomly sampled subsets of output units. Beginning at full dimensionality (512 units) and progressively decreasing sample size, they found no notable decline in identification accuracy for more than 3,000 in-the-wild faces until the sample size reached 16 randomly chosen units (3% of full dimensionality). Correlations between unit responses across representations were near zero, indicating that individual units captured nonredundant identity cues. Statistical power for identification (i.e., separating identities) was uniformly high for all output units, demonstrating that units used their entire response range to separate identities. A unit firing at its maximum provided no more, and no less, information than any other response value. This distinction may seem trivial, but it is not. The data suggest that every output unit acts to separate identities to the maximum degree possible. As such, all units participate in coding all identities. In information theory terms, this is an ideal use of neural resources.
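The unit-deletion analysis can be replicated in miniature on synthetic data (identity counts, image counts, and noise levels here are all illustrative choices, not the study's data): when identity information is spread across all units, nearest-neighbor identification survives heavy random subsampling of the code.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic face space: 50 identities x 4 images each, 512 units, with
# identity information distributed across every unit.
n_ids, imgs_per_id, dim = 50, 4, 512
centers = rng.standard_normal((n_ids, dim))
reps = np.repeat(centers, imgs_per_id, axis=0)
reps = reps + 0.3 * rng.standard_normal(reps.shape)      # image variation
labels = np.repeat(np.arange(n_ids), imgs_per_id)

def identification_accuracy(units):
    # Nearest-neighbor identification using only the sampled units.
    sub = reps[:, units]
    sub = sub / np.linalg.norm(sub, axis=1, keepdims=True)
    sims = sub @ sub.T
    np.fill_diagonal(sims, -np.inf)   # exclude each image matching itself
    return float((labels[sims.argmax(axis=1)] == labels).mean())

full = identification_accuracy(np.arange(dim))
subsampled = identification_accuracy(rng.choice(dim, 16, replace=False))
```

Because no single unit is privileged, a small random subset still spans enough of the identity-separating directions to support accurate matching.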

For gender classification and viewpoint estimation, performance declined at a much faster rate than for identification as units were deleted ( Parde et al. 2021 ). Statistical power for predicting gender and viewpoint was strong in the distributed code but weak at the level of the unit. Prediction power for these attributes was again roughly equivalent for all units. Thus, individual units contributed to coding all three attributes, but identity modulated individual unit responses far more strongly than did gender or viewpoint. Notably, a principal component (PC) analysis of representations in the full-dimensional space revealed subspaces aligned with identity, gender, and viewpoint ( Figure 3 ). Consistent with the strength of the categorical identity code in the representation, identity information dominated PCs explaining large amounts of variance, gender dominated the middle range of PCs, and viewpoint dominated PCs explaining small amounts of variation.

Figure 3

Illustration of the separation of the task-relevant information into subspaces for an identity-trained deep convolutional neural network (DCNN). Each plot shows the similarity (cosine) between principal components (PCs) of the face space and directional vectors in the space that are diagnostic of identity ( top ), gender ( middle ), and viewpoint ( bottom ). Figure adapted with permission from Parde et al. (2021) .

The emergence and effectiveness of these codes in DCNNs suggest that caution is needed in ascribing significance only to stimuli that drive a neuron to high rates of response. Small-scale modulations of neural responses can also be meaningful. Let us consider a concrete example. A neurophysiologist probing the network used by Parde et al. (2021) would find some neurons that respond strongly to a few identities. Interpreting this as identity tuning, however, would be an incorrect characterization of a code in which all units participate in coding all identities. Concomitantly, few units in the network would appear responsive to viewpoint or gender variations because unit firing rates would modulate only slightly with changes in viewpoint or gender. Thus, the distributed coding of view and gender across units would likely be missed. The finding that neurons in macaque face patch AM respond selectively (i.e., with high response rates) to identity over variable views ( Freiwald & Tsao 2010 ) is consistent with DCNN face representations. It is possible, however, that these units also encode other face and image attributes, but with differential degrees of expressivity. This would be computationally consistent with the untangling theory and with DCNN codes.

Macaque face patches:

regions of the macaque cortex that respond selectively to faces, including the posterior lateral (PL), middle lateral (ML), middle fundus (MF), anterior lateral (AL), anterior fundus (AF), and anterior medial (AM)

Another example comes from the use of generative adversarial networks and related techniques to characterize the response properties of single (or multiple) neuron(s) in the primate visual cortex ( Bashivan et al. 2019 , Ponce et al. 2019 , Yuan et al. 2020 ). These techniques have examined neurons in areas V4 ( Bashivan et al. 2019 ) and IT ( Ponce et al. 2019 , Yuan et al. 2020 ). The goal is to progressively evolve images that drive neurons to their maximum response or that selectively (in)activate subsets of neurons. Evolved images show complex mosaics of textures, shapes, and colors. They sometimes show animals or people and sometimes reveal spatial patterns that are not semantically interpretable. However, these techniques rely on two strong assumptions. First, they assume that a neuron’s response can be characterized completely in terms of the stimuli that activate it maximally, thereby discounting other response rates as noninformative. The computational utility of a unit’s full response range in DCNNs suggests that reconsideration of this assumption is necessary. Second, these techniques assume that a neuron’s response properties can be visualized accurately as a two-dimensional image. Given the categorical, nonretinotopic nature of representations in high-level visual areas, this seems problematic. If the representation under consideration is not in the image or pixel domain, then image-based visualization may offer limited, and possibly misleading, insight into the underlying nature of the code.

3.2. Direct-Fit Models and Deep Learning

In rethinking visual features at a theoretical level, direct-fit models of neural coding appear to best explain deep learning findings in multiple domains (e.g., face recognition, language) ( Hasson et al. 2020 ). These models posit that neural computation fits densely sampled data from the environment. Implementation is accomplished using “overparameterized optimization algorithms that increase predictive (generalization) power, without explicitly modeling the underlying generative structure of the world” ( Hasson et al. 2020 , p. 418). Hasson et al. (2020) begin with an ideal model in a small-parameter space ( Figure 4 ). When the underlying structure of the world is simple, a small-parameter model will find the underlying generative function, thereby supporting generalization via interpolation and extrapolation. Despite decades of effort, small-parameter functions have not solved real-world face recognition with performance anywhere near that of humans.

Figure 4

( a ) A model with too few parameters fails to fit the data. ( b ) The ideal-fit model fits with a small number of parameters and has generative power that supports interpolation and extrapolation. ( c ) An overfit function can model noise in the training data. ( d ) An overparameterized model generalizes well to new stimuli within the scope of the training samples. Figure adapted with permission from Hasson et al. (2020) .

When the underlying structure of the world is complex and multivariate, direct-fit models offer an alternative to models based on small-parameter functions. With densely sampled real-world training data, each new observation can be placed in the context of past experience. More formally, direct-fit models solve the problem of generalization to new exemplars by experience-scaffolded interpolation ( Hasson et al. 2020 ). This produces face recognition performance in the range of that of humans. A fundamental element of the success of deep networks is that they model the environment with big data, which can be structured in overparameterized spaces. The scale of the parameterization and the requirement to operate on real-world data are pivotal. Once the network is sufficiently parameterized to fit the data, the exact details of its architecture are not important. This may explain why starkly different network architectures arrive at similarly structured representations ( Hill et al. 2019 , Parde et al. 2017 , Storrs et al. 2020 ).
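The interpolation-versus-extrapolation distinction at the heart of direct fit can be made concrete with a one-dimensional toy (the function, sample sizes, and query points are arbitrary illustrations): a model that simply interpolates densely sampled training data generalizes well inside the sampled range and fails beyond it.

```python
import numpy as np

rng = np.random.default_rng(6)

# Densely sampled training data from a 1D "world" on [-1, 1].
x_train = np.sort(rng.uniform(-1, 1, 200))
y_train = np.sin(3 * x_train) + 0.05 * rng.standard_normal(200)

def predict(x_query):
    # Nearest-neighbor lookup: a minimal stand-in for an overparameterized
    # direct fit that interpolates between stored examples.
    q = np.asarray(x_query, dtype=float)
    idx = np.abs(x_train[None, :] - q[:, None]).argmin(axis=1)
    return y_train[idx]

inside = float(np.abs(predict([0.3]) - np.sin(0.9))[0])    # interpolation
outside = float(np.abs(predict([2.5]) - np.sin(7.5))[0])   # extrapolation
```

Within the training range the prediction error stays near the noise floor; outside it, the model can only return the nearest boundary value, so error grows with distance from experience.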

Returning to the issue of features, in neurocomputational terms, the strength of connectivity between neurons at synapses is the primary locus of information, just as weights between units in a deep network comprise information. We expect features, whatever they are, to be housed in the combination of connection strengths among units, not in the units themselves. In a high-dimensional multivariate encoding space, they are hyperplane directions through the space. Thus, features are represented across many computing elements, and each computing element participates in encoding many features ( Hasson et al. 2020 , Parde et al. 2021 ). If features are directions in a high-dimensional coding space ( Goodfellow et al. 2014 ), then units act as an arbitrary projection surface from which this information can be accessed—albeit in a nontransparent form.
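The features-as-directions idea can be demonstrated with a synthetic code (classes, dimensions, and effect sizes invented for illustration): an attribute that cleanly separates along one direction of a high-dimensional space may be nearly invisible in any individual unit.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two synthetic attribute classes separated along one hidden direction in a
# 128-dimensional code; no single unit is a reliable "feature detector".
dim = 128
axis = rng.standard_normal(dim)
axis /= np.linalg.norm(axis)
class_a = rng.standard_normal((1000, dim)) + 0.8 * axis
class_b = rng.standard_normal((1000, dim)) - 0.8 * axis

# The feature as a direction: the unit vector between class means.
direction = class_a.mean(axis=0) - class_b.mean(axis=0)
direction /= np.linalg.norm(direction)

# Class separation along the learned direction vs. along the best single unit.
gap_direction = float(class_a.mean(axis=0) @ direction
                      - class_b.mean(axis=0) @ direction)
gap_best_unit = float(np.abs(class_a.mean(axis=0) - class_b.mean(axis=0)).max())
```

Projection onto the direction separates the classes by a wide margin, while even the single most informative unit shows only a small mean shift relative to its variance, which is why unit-by-unit probing can miss a robustly encoded feature.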

A downside of direct-fit models is that they cannot generalize via extrapolation. The other-race effect is an example of how face recognition may fail due to limited experience ( Malpass & Kravitz 1969 ) (see Section 4.3.2 ). The extrapolation limit may be countered, however, by the capacity of direct-fit models to acquire expertise within the confines of experience. For example, in human perception, category experience selectively structures representations as new exemplars are learned. Collins & Behrmann (2020) show that this occurs in a way that reflects the greater experience that humans have with faces than with computer-generated objects from novel, made-up categories, which the authors call YUFOs. They tracked the perceived similarity of pairs of other-race faces and YUFOs as people learned novel exemplars of each. Experience changed perceived similarities more selectively for faces than for YUFOs, enabling more nuanced discrimination of exemplars from the experienced category of faces.

In summary, direct-fit models offer a framework for thinking about high-level visual codes for faces in a way that unifies disparate data on single units and high-dimensional coding spaces. These models are fueled by the rich experience that we (models) gain from learning (training on) real-world data. They solve complex visual tasks with interpolated solutions that elude transparent semantic interpretation.

4. RETHINKING LEARNING IN HUMANS AND DEEP NETWORKS

Deep network models of human face processing force us to consider learning as a complex and diverse set of mechanisms that can overlap, accumulate over time, and interact. Learning in both humans and artificial neural networks can refer to qualitatively different phenomena. In both cases, learning involves multiple steps. For DCNNs, these steps are fundamental to a network’s ability to recognize faces across image and appearance variation. Human visual learning is likewise diverse and unfolds across the developmental lifespan in a process governed by genetics and environmental input ( Goodman & Shatz 1993 ). The stepwise implementation of learning is one way that DCNNs differ from previous face recognition networks. Considered as manipulable modeling tools, the learning steps in DCNNs force us to think in concrete and nuanced ways about how humans learn faces.

In this section, we outline the learning layers in human face processing ( Section 4.1 ), introduce the layers of learning used in training machines ( Section 4.2 ), and consider the relationship between the two in the context of human behavior ( Section 4.3.1 ). The human learning layers support a complex, biologically realized face processing system. The machine learning layers can be thought of as building blocks that can be combined in a variety of ways to model human behavioral phenomena. At the outset, we note that machine learning is designed to maximize performance—not to model the development of the human face processing system ( Smith & Slone 2017 ). Concomitantly, the sequential presentation of training data in DCNNs differs from the pattern of exposure that infants and young children have with faces and objects ( Jayaraman et al. 2015 ). The machine learning steps, however, can be modified to model human learning more closely. In practical terms, fully trained DCNNs, available on the web, are used (almost exclusively) to model human neural systems (see the sidebar titled Caveat: Iteration Between Theory and Practice ). It is important, therefore, to understand how (and why) these models are configured as they are and to understand the types of learning tools available for modeling human face processing. These steps may provide computational grounding for basic learning mechanisms hypothesized in humans.

4.1. Human Learning for Face Processing

To model human face processing, researchers need to consider the following types of learning. The most specific form of learning is familiar face recognition. People learn the faces of specific familiar individuals (e.g., friends, family, celebrities). Familiar faces are recognized robustly over challenging changes in appearance and image characteristics. The second-most specific is local population tuning. People recognize own-race faces more accurately than other-race faces, a phenomenon referred to as the other-race effect (e.g., Malpass & Kravitz 1969 ). This likely results from tuning to the statistical properties of the faces that we see most frequently—typically faces of our own race. The third-most specific is unfamiliar face recognition. People can differentiate unfamiliar faces perceptually. Unfamiliar refers to faces that a person has not encountered previously or has encountered infrequently. Unfamiliar face recognition is less robust to image and appearance change than is familiar face recognition. The least specific form of learning is object recognition. At a fundamental level of analysis, faces are objects, and both share early visual processing wetware.

4.2. How Deep Convolutional Neural Networks Learn Face Identification

Training DCNNs for face recognition involves a sequence of learning stages, each with a concrete objective. Unlike human learning, machine learning stages are executed in strict sequence. The goal across all stages of training is to build an effective method for converting images of faces into points in a high-dimensional space. The resulting high-dimensional space allows for easy comparison among faces, search, and clustering. In this section, we sketch out the engineering approach to learning, working forward from the most general to the most specific form of learning. This follows the implementation order used by engineers.
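The payoff of the points-in-a-space formulation is that comparing two face images reduces to comparing two vectors. A minimal sketch of a verification check on two image descriptors (the 0.6 threshold and the descriptor statistics are illustrative assumptions, not a standard operating point):

```python
import numpy as np

rng = np.random.default_rng(7)

def same_identity(desc1, desc2, threshold=0.6):
    # Cosine similarity between two unit-normalized image descriptors.
    a = desc1 / np.linalg.norm(desc1)
    b = desc2 / np.linalg.norm(desc2)
    return float(a @ b) >= threshold

anchor = rng.standard_normal(512)                       # descriptor of image 1
same_person = anchor + 0.3 * rng.standard_normal(512)   # same person, new image
other_person = rng.standard_normal(512)                 # unrelated identity
```

Search and clustering follow from the same primitive: nearest neighbors under cosine similarity for search, and similarity-graph clustering for grouping images by identity.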

4.2.1. Object classification (between-category learning): Stage 1.

Deep networks for face identification are commonly built on top of DCNNs that have been pretrained for object classification. Pretraining is carried out using large data sets of objects, such as those available in ImageNet ( Russakovsky et al. 2015 ), which contains more than 14 million images of over 1,000 classes of objects (e.g., volcanoes, cups, chihuahuas). The object categorization training procedure involves adjusting the weights on all layers of the network. For training to converge, a large training set is required. The loss function optimized in this procedure typically uses the well-understood cross-entropy loss + Softmax combination. Most practitioners do not execute this step because it has been performed already in a pretrained model downloaded from a public repository in a format compatible with DCNN software libraries [e.g., PyTorch ( Paszke et al. 2019 ), TensorFlow ( Abadi et al. 2016 )]. Networks trained for object recognition have proven better for face identification than networks that start with a random configuration ( Liu et al. 2015 , Yi et al. 2014 ).
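The softmax + cross-entropy combination at the core of Stage 1 training is compact enough to write out directly (the logits below are made-up examples):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-probability assigned to each example's true class.
    probs = softmax(logits)
    return float(-np.log(probs[np.arange(len(labels)), labels]).mean())

# A confident correct prediction gives near-zero loss; a uniform prediction
# over K classes gives log(K).
confident = np.array([[10.0, 0.0, 0.0]])
uniform = np.array([[1.0, 1.0, 1.0]])
```

Minimizing this loss over millions of labeled object images is what adjusts the weights of every layer during pretraining.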

4.2.2. Face recognition (within-category learning): Stage 2.

Face recognition training is implemented in a second stage of training. In this stage, the last fully connected layer that connects to object-category nodes (e.g., volcanoes, cups) is removed from the results of the Stage 1 training. Next, a fully connected layer that maps to the number of face identities available for face training is connected. Depending on the size of the face training set, the weights of either all layers or all but a few layers at the beginning of the network are updated. The former is common when very large numbers of face identities are available for training. In academic laboratories, data sets include 5–10 million face images of 40,000–100,000 identities. In industry, far larger data sets are often used ( Schroff et al. 2015 ). A technical difficulty encountered in retraining an object classification network to a face recognition network is the large increase in the number of categories involved (approximately 1,000 objects versus 50,000+ faces). Special loss functions can address this issue [e.g., L2-Softmax/crystal loss ( Ranjan et al. 2017 ), NormFace ( Wang et al. 2017 ), angular Softmax ( Li et al. 2018 ), additive Softmax ( Wang et al. 2018 ), additive angular margins ( Deng et al. 2019 )].
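The head swap described above can be sketched at the level of shapes (everything here is a placeholder: the "trunk" is a random matrix standing in for the pretrained convolutional layers, the identity count is scaled down from the 50,000+ in real training sets, and no actual training occurs):

```python
import numpy as np

rng = np.random.default_rng(4)

# Stage 2 in miniature: keep the pretrained trunk, discard the 1,000-way
# object head, and attach a new identity head (scaled down to 5,000 classes).
embed_dim, n_objects, n_identities = 512, 1000, 5_000

trunk_weights = 0.01 * rng.standard_normal((2048, embed_dim))
object_head = rng.standard_normal((embed_dim, n_objects))              # removed
identity_head = 0.01 * rng.standard_normal((embed_dim, n_identities))  # new layer

images = rng.standard_normal((8, 2048))   # a batch of 8 image feature vectors
embeddings = images @ trunk_weights       # the 512-dimensional face codes
identity_logits = embeddings @ identity_head
```

In a real framework such as PyTorch or TensorFlow, the same operation is performed by replacing the final fully connected module of a pretrained model and then resuming optimization on face-identity labels.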

When the Stage 2 face training is complete, the last fully connected layer that connects to the 50,000+ face identity nodes is removed, leaving below it a relatively low-dimensional (128- to 5,000-unit) layer of output units. This can be thought of as the face representation. This output represents a face image, not a face identity. At this point in training, any arbitrary face image from any identity (known or unknown to the network) can be processed by the DCNN to produce a compact face image descriptor across the units of this layer. If the network functions perfectly, then it will produce identical codes for all images of the same person. This would amount to perfect image and appearance generalization. This is not usually achieved, even when the network is highly accurate (see Section 2 ).
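Comparing two face images then reduces to comparing their descriptors, typically with cosine similarity. In the sketch below, random vectors stand in for DCNN-generated descriptors; the dimensionality and noise level are illustrative assumptions:

```python
import numpy as np

def cosine_similarity(a, b):
    # Standard comparison for face descriptors: the cosine of the angle
    # between two embedding vectors (1.0 = identical direction).
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(42)
img_a = rng.normal(size=512)                 # stand-in descriptor, image 1
img_b = img_a + 0.1 * rng.normal(size=512)   # similar image, same "identity"
img_c = rng.normal(size=512)                 # unrelated image

same_id = cosine_similarity(img_a, img_b)
diff_id = cosine_similarity(img_a, img_c)
```

A network that generalized perfectly would give a same-identity similarity of 1.0 regardless of image; in practice, as noted above, same-identity similarities are merely higher than different-identity similarities.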

In this state, the network is commonly employed to recognize faces not seen in training (unfamiliar faces). Stage 2 training supports a surprising degree of generalization (e.g., pose, expression, illumination, and appearance) for images of unfamiliar faces. This general face learning gives the system special knowledge of faces and enables it to perform within-category face discrimination for unfamiliar faces ( O’Toole et al. 2018 ). With or without Stage 3 training, the network is now capable of converting images of faces into points in a high-dimensional space, which, as noted above, is the primary goal of training. In practice, however, Stages 3 and 4 can provide a critical bridge to modeling behavioral characteristics of the human face processing system.

4.2.3. Adapting to local statistics of people and visual environments: Stage 3.

The objective of Stage 3 training is to finalize the modification of the DCNN weights to better adapt to the application domain. The term application domain can refer to faces from a particular race or ethnicity or, as it is commonly used in industry, to the type of images to be processed (e.g., in-the-wild faces, passport photographs). This training is a crucial step in many applications because there will be no further transformation of the weights. Special care is needed in this training to avoid collapsing the representation into a form that is too specific. Training at this stage can improve performance for some faces and decrease it for others.

Whereas Stages 1 and 2 are used in the vast majority of published computational work, in Stage 3, researchers diverge. Although there is no standard implementation for this training, fine-tuning and learning a triplet loss embedding ( van der Maaten & Weinberger 2012 ) are common methods. These methods are conceptually similar but differ in implementation. In both methods, ( a ) new layers are added to the network, ( b ) specific subsets of layers are frozen or unfrozen, and ( c ) optimization continues with an appropriate loss function using a new data set with the desired domain characteristics. Fine-tuning starts from an already-viable network state and updates a nonempty subset of weights, or possibly all weights. It is typically implemented with smaller learning rates and can use smaller training sets than those needed for full training. Triplet loss is implemented by freezing all layers and adding a new, fully connected layer. Minimization is done with the triplet loss, again on a new (smaller) data set with the desired domain characteristics.
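The triplet loss itself is compact: it takes an anchor image, a positive image of the same identity, and a negative image of a different identity. The embeddings and margin below are illustrative assumptions:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull the anchor toward the positive (same identity) and push it away
    # from the negative (different identity) by at least `margin`.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

anchor   = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])   # same identity, nearby embedding
negative = np.array([-1.0, 0.5])  # different identity, far away

loss = triplet_loss(anchor, positive, negative)
```

When a triplet already satisfies the margin, its loss is zero and it contributes no gradient; only violating triplets reshape the embedding toward the desired domain characteristics.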

A natural question is why Stage 2 (general face training) is not considered fine-tuning. The answer, in practice, comes down to viability and volume. When the training for Stage 2 starts, the network is not in a viable state to perform face recognition. Therefore, it requires a voluminous, diverse data set to function. Stage 3 begins with a functional network and can be tuned effectively with a small targeted data set.

This face knowledge history provides a tool for adapting to local face statistics (e.g., race) ( O’Toole et al. 2018 ).

4.2.4. Learning individual people: Stage 4.

In psychological terms, learning individual familiar faces involves seeing multiple, diverse images of the individuals to whom the faces belong. As we see more images of a person, we become more familiar with their face and can recognize it from increasingly variable images ( Dowsett et al. 2016 , Murphy et al. 2015 , Ritchie & Burton 2017 ). In computational terms, this translates into the question of how a network can learn to recognize a random set of special (familiar) faces with greater accuracy and robustness than other nonspecial (unfamiliar) faces—assuming, of course, the availability of multiple, variable images of the special faces. This stage of learning is defined, in nearly all cases, outside of the DCNN, with no change to weights within the DCNN.

The problem is as follows. The network starts with multiple images of each familiar identity and can produce a representation for each of those images, but what then? There is no standard familiarization protocol, but several approaches exist. We first categorize these approaches and then link them to theoretical accounts of face familiarity in Section 4.3.3.

The first approach is averaging identity codes, or 1-class learning. It is common in machine learning to use an average (or weighted average) of the DCNN-generated face image representations as an identity code (see also Crosswhite et al. 2018 , Su et al. 2015 ). Averaging creates a person-identity prototype ( Noyes et al. 2021 ) for each familiar face.
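A minimal sketch of this averaging approach: the mean of many image descriptors of one person, renormalized, serves as the person-identity prototype. The random vectors below stand in for DCNN outputs, and the dimensions and noise level are illustrative assumptions:

```python
import numpy as np

def identity_prototype(embeddings):
    # Average the descriptors of many images of one person, then
    # renormalize to unit length to give a single identity code.
    proto = np.mean(embeddings, axis=0)
    return proto / np.linalg.norm(proto)

rng = np.random.default_rng(7)
base = rng.normal(size=128)                      # the person's "true" direction
images = [base + 0.5 * rng.normal(size=128) for _ in range(10)]
proto = identity_prototype(np.stack(images))
```

Averaging cancels image-specific variation, so the prototype lies closer to the identity's underlying direction than any single image descriptor does.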

The second is individual face contrast, or 2-class learning. This technique employs direct learning of individual identities by contrasting them with all other identities. There are two classes because the model learns what makes each identity (the positive class) different from all other identities (the negative class). The distinctiveness of each familiar face is enhanced relative to all other known faces (e.g., Noyes et al. 2021).

The third is multiple face contrast, or K-class learning. This refers to the use of identification training for a random set of (familiar) faces with a simple network (often a one-layer network). The network learns to map DCNN-generated face representations of the available images onto identity nodes.
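Multiple face contrast can be sketched as a one-layer softmax network trained on frozen descriptors. Everything below (dimensions, cluster structure, learning rate, iteration count) is an illustrative assumption, with random clusters standing in for DCNN-generated representations:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, N = 64, 5, 20          # descriptor dim, familiar identities, images each

# Frozen DCNN descriptors: each identity forms a cluster around its own center
centers = rng.normal(size=(K, D))
X = np.vstack([c + 0.3 * rng.normal(size=(N, D)) for c in centers])
y = np.repeat(np.arange(K), N)

W = np.zeros((K, D))          # the one-layer network's only weights
for _ in range(200):          # plain gradient descent on cross-entropy
    logits = X @ W.T
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0          # softmax gradient (p - one-hot)
    W -= 0.1 * (p.T @ X) / len(y)

accuracy = float((np.argmax(X @ W.T, axis=1) == y).mean())
```

Because the DCNN weights are frozen, only this small mapping network learns the familiar identities; the deep network's face representation is untouched.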

The fourth approach is fine-tuning individual face representations. Fine-tuning has also been used for learning familiar identities ( Blauch et al. 2020a ). It is an unusual method because it alters weights within the DCNN itself. This can improve performance for the familiarized faces but can limit the network’s ability to represent other faces.

These methods create a personal face learning history that supports more accurate and robust face processing for familiar people ( O’Toole et al. 2018 ).

4.3. Mapping Learning Between Humans and Machines

Deep networks rely on multiple types of learning that can be useful in formulating and testing complex, nuanced hypotheses about human face learning. Manipulable variables include order of learning, training data, and network plasticity at different learning stages. We consider a sample of topics in human face processing that can be investigated by manipulating learning in deep networks. Because these investigations are just beginning, we provide an overview of the work in progress and discuss possible next steps in modeling.

4.3.1. Development of face processing.

Infants’ early experience with faces is critical for the development of face processing skills (Maurer et al. 2002). The timing of this experience has become increasingly clear with the availability of data sets gathered using head-mounted cameras in infants (1–15 months of age) (e.g., Jayaraman et al. 2015, Yoshida & Smith 2008). In seeing the world from the perspective of the infant, it becomes clear that the development of sensorimotor abilities drives visual experience. Infants’ experience transitions from seeing only what is made available to them (often faces in the near range), to seeing the world from the perspective of a crawler (objects and environments), to seeing hands and the objects that they manipulate (Fausey et al. 2016, Jayaraman et al. 2015, Smith & Slone 2017, Sugden & Moulson 2017). Between 1 and 3 months of age, faces are frequent, temporally persistent, and viewed frontally at close range. This early experience with faces is limited to a few individuals. Faces become less frequent as the child’s first year progresses and attention shifts to the environment, to objects, and later to hands (Jayaraman & Smith 2019).

The prevalence of a few important faces in the infant’s visual world suggests that early face learning may have an outsized influence on structuring visual recognition systems. Infants’ visual experience of objects, faces, and environments can provide a curriculum for teaching machines (Smith et al. 2018). DCNNs can be used to test hypotheses about the emergence of competence on different face processing tasks. Some basic computational challenges, however, need to be addressed. Training with very large numbers of objects (or faces) is required for deep network learning to converge (see Section 4.2.1). Starting small and building competence on multiple domains (faces, objects, environments) might require basic changes to deep network training. Alternatively, the small number of special faces in an infant’s life might be considered familiar faces. Perception and memory of these faces may be better modeled using tools that operate outside the deep network on representations that develop within the network (Stage 4 learning; Section 4.2.4). In this case, the quality of the representation produced at different points in a network’s development of more general visual knowledge varies (Stages 1 and 2 of training; Sections 4.2.1 and 4.2.2). The learning of these special faces early in development might interact with the learning of objects and scenes at the categorical level (Rosch et al. 1976, Yovel et al. 2012). A promising approach would involve pausing training in Stages 1 and 2 to test face representation quality at various points along the way to convergence.

4.3.2. Race bias in the performance of humans and deep networks.

People recognize own-race faces more accurately than other-race faces. For humans, this other-race effect begins in infancy ( Kelly et al. 2005 , 2007 ) and is manifest in children ( Pezdek et al. 2003 ). Although it is possible to reverse these effects in childhood ( Sangrigoli et al. 2005 ), training adults to recognize other-race faces yields only modest gains (e.g., Cavazos et al. 2019 , Hayward et al. 2017 , Laurence et al. 2016 , Matthews & Mondloch 2018 , Tanaka & Pierce 2009 ). Concomitantly, evidence for the experience-based contact hypothesis is weak when it is evaluated in adulthood ( Levin 2000 ). Clearly, the timing of experience is critical in the other-race effect. Developmental learning, which results in perceptual narrowing during a critical childhood period, may provide a partial account of the other-race effect ( Kelly et al. 2007 , Sangrigoli et al. 2005 , Scott & Monesson 2010 ).

Perceptual narrowing: sculpting of neural and perceptual processing via experience during a critical period in child development

Face recognition algorithms from the 1990s and present-day DCNNs differ in accuracy for faces of different races (for a review, see Cavazos et al. 2020 ; for a comprehensive test of race bias in DCNNs, see Grother et al. 2019 ). Although training with faces of different races is often cited as a cause of race effects, it is unclear which training stage(s) contribute to the bias. It is likely that biased learning affects all learning stages. From the human perspective, for many people, experience favors own-race faces across the lifespan, potentially impacting performance through multiple learning mechanisms (developmental, unfamiliar, and familiar face learning). DCNN training may also use race-biased data at all stages. For humans, understanding the role of different types of learning in the other-race effect is challenging because experience with faces cannot be controlled. DCNNs can serve as a tool for studying critical periods and perceptual narrowing. It is possible to compare the face representations that emerge from training regimes that vary in the time course of exposure to faces of different races. The ability to manipulate training stage order, network plasticity, and training set diversity in deep networks offers an opportunity to test hypotheses about how bias emerges. The major challenge for DCNNs is the limited availability of face databases that represent the diversity of humans.

4.3.3. Familiar versus unfamiliar face recognition.

Face familiarity in a deep network can be modeled in more ways than we can count. The approaches presented in Section 4.2.4 are just a beginning. Researchers should focus first on the big questions. How do familiar and unfamiliar face representations differ—beyond simple accuracy and robustness? This has been much debated recently, and many questions remain ( Blauch et al. 2020a , b ; Young & Burton 2020 ; Yovel & Abudarham 2020 ). One approach is to ask where in the learning process representations for familiar and unfamiliar faces diverge. The methods outlined in Section 4.2.4 make some predictions.

In the individual and multiple face contrast methods, familiar and unfamiliar face representations are not differentiated within the deep network. Instead, familiar face representations generated by the DCNN are enhanced in another, simpler network populated with known faces. A familiar face’s representation is affected, therefore, by the other faces that we know well. Contrast techniques have preliminary empirical support. In the work of Noyes et al. (2021) , familiarization using individual-face contrast improved identification for both evasion and impersonation disguise. It also produced a pattern of accuracy similar to that seen for people familiar with the disguised individuals ( Noyes & Jenkins 2019 ). For humans who were unfamiliar with the disguised faces, the pattern of accuracy resembled that seen after general face training inside of the DCNN. There is also support for multiple-face contrast familiarization. Perceptual expertise findings that emphasize the selective effects of the exemplars experienced during highly skilled learning are consistent with this approach ( Collins & Behrmann 2020 ) (see Section 3.2 ).

Familiarization by averaging and fine-tuning both improve performance, but at a cost. For example, averaging the DCNN representations increased performance for evasion disguise by increasing tolerance for appearance variation (Noyes et al. 2021). It decreased performance, however, for impersonation disguise by allowing too much tolerance for appearance variation. Averaging methods highlight the need to balance the perception of identity across variable images with an ability to tell similar faces apart.

Familiarization via fine-tuning was explored by Blauch et al. (2020a) , who varied the number of layers tuned (all layers, fully connected layers, only the fully connected layer mapping the perceptual layer to identity nodes). Fine-tuning applied at lower layers alters the weights within the deep network to produce a perceptual representation potentially affected by familiar faces. Fine-tuning in the mapping layer is equivalent to multiclass face contrast learning ( Blauch et al. 2020b ). Blauch et al. (2020b) show that fine-tuning the perceptual representation, which they consider analogous to perceptual learning, is not necessary for producing a familiarity effect ( Blauch et al. 2020a ).

These approaches are not (necessarily) mutually exclusive and therefore can be combined to exploit useful features of each.

4.3.4. Objects, faces, both.

The organization of face-, body-, and object-selective areas in the ventral temporal cortex has been studied intensively (cf. Grill-Spector & Weiner 2014 ). Neuroimaging studies in childhood reveal the developmental time course of face selectivity and other high-level visual tasks (e.g., Natu et al. 2016 ; Nordt et al. 2019 , 2020 ). How these systems interact during development in the context of constantly changing input from the environment is an open question. DCNNs can be used to test functional hypotheses about the development of object and face learning (see also Grill-Spector et al. 2018 ).

In the case of machine learning, face recognition networks are more accurate when pretrained to categorize objects (Liu et al. 2015, Yi et al. 2014), and networks trained with only faces are more accurate for face recognition than networks trained with only objects (Abudarham & Yovel 2020, Blauch et al. 2020a). Human-like viewpoint invariance was found in a DCNN trained for face recognition but not in one trained for object recognition (Abudarham & Yovel 2020). In standard machine-learning practice, networks are trained first with objects and then with faces. Networks can, however, learn object and face recognition simultaneously (Dobs et al. 2020), with minimal duplication of neural resources.

4.4. New Tools, New Questions, New Data, and a New Look at Old Data

Psychologists have long posited diverse and complex learning mechanisms for faces. Deep networks provide new tools that can be used to model human face learning with greater precision than was possible previously. This is useful because it encourages theoreticians to articulate hypotheses in ways specific enough to model. It may no longer be sufficient to explain a phenomenon in terms of generic learning or contact. Concepts such as perceptual narrowing should include ideas about where and how in the learning process this narrowing occurs. A major challenge ahead is the sheer number of knobs to be set in deep networks. Plasticity, for example, can be dialed up or down, applied to selected network layers, or combined with specific face diets administered across multiple learning stages (in sequence or simultaneously). The list goes on. In all of the topics discussed, and others not discussed, theoretical ideas should specify the manipulations thought to be most critical. We should follow the counsel of Box (1976) to worry selectively and focus on what is most important. New tools succeed when they facilitate the discovery of things that we did not know or had not hypothesized. Testing these hypotheses will require new data and may suggest a reevaluation of existing data.

5. THE PATH FORWARD

In this review, we highlight fundamental advances in thinking brought about by deep learning approaches. These networks solve the inverse optics problem for face identification by untangling image, appearance, and identity over layers of neural-like processing. This demonstrates that robust face identification can be achieved with a representation that includes specific information about the face image(s) actually experienced. These representations retain information about appearance, perceived traits, expressions, and identity.

Direct-fit models posit that deep networks operate by placing new observations into the context of past experience. These models depend on overparameterized networks that create a high-dimensional space from real-world training data. Face representations housed within this space project onto units, thereby confounding stimulus features that (may) separate in the high-dimensional space. This raises questions about the transparency and interpretability of information gained by examining the response properties of network units. Deep networks can be studied at both the micro- and macroscale simultaneously and can be used to formulate hypotheses about the underlying neural code for faces. A key to understanding face representations is to reconcile the responses of neurons to the structure of the code in the high-dimensional space. This is a challenging problem best approached by combining psychological, neural, and computational methods.

The process of training a deep network is complex and layered. It draws on learning mechanisms aimed at objects and faces, visual categories of faces (e.g., race), and special familiar faces. Psychological and neural theory considers the many ways in which people and brains learn faces from real-world visual experience. DCNNs offer the potential to implement and test sophisticated hypotheses about how humans learn faces across the lifespan.

We should not lose sight of the fact that a compelling reason to study deep networks is that they actually work, i.e., they perform nearly as well as humans, on face recognition tasks that have stymied computational modelers for decades. This might qualify as a property of deep networks that is importantly right ( Box 1976 ). There is a difference, of course, between working and working like humans. Determining whether a deep network can work like humans, or could be made to do so by manipulating other properties of the network (e.g., architectures, training data, learning rules), is work that is just beginning.

SUMMARY POINTS

  • Face representations generated by DCNNs trained for identification retain information about the face (e.g., identity, demographics, attributes, traits, expression) and the image (e.g., viewpoint).
  • Deep learning face networks generate a surprisingly structured face representation from unstructured training with in-the-wild face images.
  • Individual output units from deep networks are unlikely to signal the presence of interpretable features.
  • Fundamental structural aspects of high-level visual codes for faces in deep networks replicate over a wide variety of network architectures.
  • Diverse learning mechanisms in DCNNs, applied simultaneously or in sequence, can be used to model human face perception across the lifespan.

FUTURE ISSUES

  • Large-scale systematic manipulations of training data (race, ethnicity, image variability) are needed to give insight into the role of experience in structuring face representations.
  • Fundamental challenges remain in understanding how to combine deep networks for face, object, and scene recognition in ways analogous to the human visual system.
  • Deep networks model the ventral visual stream at a generic level, arguably up to the level of the IT cortex. Future work should examine how downstream systems, such as face patches, could be connected into this system.
  • In rethinking the goals of face processing, we argue in this review that some longstanding assumptions about visual representations should be reconsidered. Future work should consider novel experimental questions and employ methods that do not rely on these assumptions.

ACKNOWLEDGMENTS

The authors are supported by funding provided by National Eye Institute grant R01EY029692-03 to A.J.O. and C.D.C.

DISCLOSURE STATEMENT

C.D.C. is an equity holder in Mukh Technologies, which may potentially benefit from research results.

1 This is the case in networks trained with the Softmax objective function.

LITERATURE CITED

  • Abadi M, Barham P, Chen J, Chen Z, Davis A, et al. 2016. TensorFlow: a system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–83. Berkeley, CA: USENIX
  • Abudarham N, Shkiller L, Yovel G. 2019. Critical features for face recognition. Cognition 182:73–83
  • Abudarham N, Yovel G. 2020. Face recognition depends on specialized mechanisms tuned to view-invariant facial features: insights from deep neural networks optimized for face or object recognition. bioRxiv 2020.01.01.890277. doi:10.1101/2020.01.01.890277
  • Azevedo FA, Carvalho LR, Grinberg LT, Farfel JM, Ferretti RE, et al. 2009. Equal numbers of neuronal and nonneuronal cells make the human brain an isometrically scaled-up primate brain. J. Comp. Neurol. 513(5):532–41
  • Barlow HB. 1972. Single units and sensation: a neuron doctrine for perceptual psychology? Perception 1(4):371–94
  • Bashivan P, Kar K, DiCarlo JJ. 2019. Neural population control via deep image synthesis. Science 364(6439):eaav9436
  • Best-Rowden L, Jain AK. 2018. Learning face image quality from human assessments. IEEE Trans. Inform. Forensics Secur. 13(12):3064–77
  • Blanz V, Vetter T. 1999. A morphable model for the synthesis of 3D faces. In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, pp. 187–94. New York: ACM
  • Blauch NM, Behrmann M, Plaut DC. 2020a. Computational insights into human perceptual expertise for familiar and unfamiliar face recognition. Cognition 208:104341
  • Blauch NM, Behrmann M, Plaut DC. 2020b. Deep learning of shared perceptual representations for familiar and unfamiliar faces: reply to commentaries. Cognition 208:104484
  • Box GE. 1976. Science and statistics. J. Am. Stat. Assoc. 71(356):791–99
  • Box GEP. 1979. Robustness in the strategy of scientific model building. In Robustness in Statistics, ed. Launer RL, Wilkinson GN, pp. 201–36. Cambridge, MA: Academic Press
  • Bruce V, Young A. 1986. Understanding face recognition. Br. J. Psychol. 77(3):305–27
  • Burton AM, Bruce V, Hancock PJ. 1999. From pixels to people: a model of familiar face recognition. Cogn. Sci. 23(1):1–31
  • Cavazos JG, Noyes E, O’Toole AJ. 2019. Learning context and the other-race effect: strategies for improving face recognition. Vis. Res. 157:169–83
  • Cavazos JG, Phillips PJ, Castillo CD, O’Toole AJ. 2020. Accuracy comparison across face recognition algorithms: Where are we on measuring race bias? IEEE Trans. Biom. Behav. Identity Sci. 3(1):101–11
  • Chang L, Tsao DY. 2017. The code for facial identity in the primate brain. Cell 169(6):1013–28
  • Chen JC, Patel VM, Chellappa R. 2016. Unconstrained face verification using deep CNN features. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–9. Piscataway, NJ: IEEE
  • Cichy RM, Kaiser D. 2019. Deep neural networks as scientific models. Trends Cogn. Sci. 23(4):305–17
  • Collins E, Behrmann M. 2020. Exemplar learning reveals the representational origins of expert category perception. PNAS 117(20):11167–77
  • Colón YI, Castillo CD, O’Toole AJ. 2021. Facial expression is retained in deep networks trained for face identification. J. Vis. 21(4):4
  • Cootes TF, Taylor CJ, Cooper DH, Graham J. 1995. Active shape models: their training and application. Comput. Vis. Image Underst. 61(1):38–59
  • Crosswhite N, Byrne J, Stauffer C, Parkhi O, Cao Q, Zisserman A. 2018. Template adaptation for face verification and identification. Image Vis. Comput. 79:35–48
  • Deng J, Guo J, Xue N, Zafeiriou S. 2019. ArcFace: additive angular margin loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4690–99. Piscataway, NJ: IEEE
  • Dhar P, Bansal A, Castillo CD, Gleason J, Phillips P, Chellappa R. 2020. How are attributes expressed in face DCNNs? In Proceedings of the 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pp. 61–68. Piscataway, NJ: IEEE
  • DiCarlo JJ, Cox DD. 2007. Untangling invariant object recognition. Trends Cogn. Sci. 11(8):333–41
  • Dobs K, Kell AJ, Martinez J, Cohen M, Kanwisher N. 2020. Using task-optimized neural networks to understand why brains have specialized processing for faces. J. Vis. 20(11):660
  • Dowsett A, Sandford A, Burton AM. 2016. Face learning with multiple images leads to fast acquisition of familiarity for specific individuals. Q. J. Exp. Psychol. 69(1):1–10
  • El Khiyari H, Wechsler H. 2016. Face verification subject to varying (age, ethnicity, and gender) demographics using deep learning. J. Biom. Biostat. 7:323
  • Fausey CM, Jayaraman S, Smith LB. 2016. From faces to hands: changing visual input in the first two years. Cognition 152:101–7
  • Freiwald WA, Tsao DY. 2010. Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science 330(6005):845–51
  • Freiwald WA, Tsao DY, Livingstone MS. 2009. A face feature space in the macaque temporal lobe. Nat. Neurosci. 12(9):1187–96
  • Fukushima K. 1988. Neocognitron: a hierarchical neural network capable of visual pattern recognition. Neural Netw. 1(2):119–30
  • Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, et al. 2014. Generative adversarial nets. In NIPS’14: Proceedings of the 27th International Conference on Neural Information Processing Systems, pp. 2672–80. New York: ACM
  • Goodman CS, Shatz CJ. 1993. Developmental mechanisms that generate precise patterns of neuronal connectivity. Cell 72:77–98
  • Grill-Spector K, Kushnir T, Edelman S, Avidan G, Itzchak Y, Malach R. 1999. Differential processing of objects under various viewing conditions in the human lateral occipital complex. Neuron 24(1):187–203
  • Grill-Spector K, Weiner KS. 2014. The functional architecture of the ventral temporal cortex and its role in categorization. Nat. Rev. Neurosci. 15(8):536–48
  • Grill-Spector K, Weiner KS, Gomez J, Stigliani A, Natu VS. 2018. The functional neuroanatomy of face perception: from brain measurements to deep neural networks. Interface Focus 8(4):20180013
  • Gross CG. 2002. Genealogy of the “grandmother cell.” Neuroscientist 8(5):512–18
  • Grother P, Ngan M, Hanaoka K. 2019. Face recognition vendor test (FRVT) part 3: demographic effects. Rep., Natl. Inst. Stand. Technol., US Dept. Commerce, Gaithersburg, MD
  • Hancock PJ, Bruce V, Burton AM. 2000. Recognition of unfamiliar faces. Trends Cogn. Sci. 4(9):330–37
  • Hasson U, Nastase SA, Goldstein A. 2020. Direct fit to nature: an evolutionary perspective on biological and artificial neural networks. Neuron 105(3):416–34
  • Hayward WG, Favelle SK, Oxner M, Chu MH, Lam SM. 2017. The other-race effect in face learning: using naturalistic images to investigate face ethnicity effects in a learning paradigm. Q. J. Exp. Psychol. 70(5):890–96
  • Hesse JK, Tsao DY. 2020. The macaque face patch system: a turtle’s underbelly for the brain. Nat. Rev. Neurosci. 21(12):695–716
  • Hill MQ, Parde CJ, Castillo CD, Colon YI, Ranjan R, et al. 2019. Deep convolutional neural networks in the face of caricature. Nat. Mach. Intell. 1(11):522–29
  • Hong H, Yamins DL, Majaj NJ, DiCarlo JJ. 2016. Explicit information for category-orthogonal object properties increases along the ventral stream. Nat. Neurosci. 19(4):613–22
  • Hornik K, Stinchcombe M, White H. 1989. Multilayer feedforward networks are universal approximators. Neural Netw. 2(5):359–66
  • Huang GB, Lee H, Learned-Miller E. 2012. Learning hierarchical representations for face verification with convolutional deep belief networks. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2518–25. Piscataway, NJ: IEEE
  • Huang GB, Mattar M, Berg T, Learned-Miller E. 2008. Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Paper presented at the Workshop on Faces in “Real-Life” Images: Detection, Alignment, and Recognition, Marseille, France
  • Ilyas A, Santurkar S, Tsipras D, Engstrom L, Tran B, Madry A. 2019. Adversarial examples are not bugs, they are features. arXiv:1905.02175 [stat.ML]
  • Issa EB, DiCarlo JJ. 2012. Precedence of the eye region in neural processing of faces. J. Neurosci. 32(47):16666–82
  • Jacquet M, Champod C. 2020. Automated face recognition in forensic science: review and perspectives. Forensic Sci. Int. 307:110124
  • Jayaraman S, Fausey CM, Smith LB. 2015. The faces in infant-perspective scenes change over the first year of life. PLOS ONE 10(5):e0123780
  • Jayaraman S, Smith LB. 2019. Faces in early visual environments are persistent not just frequent. Vis. Res. 157:213–21
  • Jenkins R, White D, Van Montfort X, Burton AM. 2011. Variability in photos of the same face. Cognition 121(3):313–23
  • Kandel ER, Schwartz JH, Jessell TM, Siegelbaum S, Hudspeth AJ, Mack S, eds. 2000. Principles of Neural Science, Vol. 4. New York: McGraw-Hill
  • Kay KN, Weiner KS, Grill-Spector K. 2015. Attention reduces spatial uncertainty in human ventral temporal cortex. Curr. Biol. 25(5):595–600
  • Kelly DJ, Quinn PC, Slater AM, Lee K, Ge L, Pascalis O. 2007. The other-race effect develops during infancy: evidence of perceptual narrowing . Psychol. Sci 18 ( 12 ):1084–89 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Kelly DJ, Quinn PC, Slater AM, Lee K, Gibson A, et al. 2005. Three-month-olds, but not newborns, prefer own-race faces . Dev. Sci 8 ( 6 ):F31–36 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Kietzmann TC, Swisher JD, König P, Tong F. 2012. Prevalence of selectivity for mirror-symmetric views of faces in the ventral and dorsal visual pathways . J. Neurosci 32 ( 34 ):11763–72 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Krishnapriya KS, Albiero V, Vangara K, King MC, Bowyer KW. 2020. Issues related to face recognition accuracy varying based on race and skin tone . IEEE Trans. Technol. Soc 1 ( 1 ):8–20 [ Google Scholar ]
  • Krishnapriya K, Vangara K, King MC, Albiero V, Bowyer K. 2019. Characterizing the variability in face recognition accuracy relative to race. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops , Vol. 1 , pp. 2278–85. Piscataway, NJ: IEEE [ Google Scholar ]
  • Krizhevsky A, Sutskever I, Hinton GE. 2012. Imagenet classification with deep convolutional neural networks. In NIPS’12: Proceedings of the 25th International Conference on Neural Information Processing Systems , pp. 1097–105. New York: ACM [ Google Scholar ]
  • Kumar N, Berg AC, Belhumeur PN, Nayar SK. 2009. Attribute and simile classifiers for face verification. In Proceedings of the 2009 IEEE International Conference on Computer Vision , pp. 365–72. Piscataway, NJ: IEEE [ Google Scholar ]
  • Laurence S, Zhou X, Mondloch CJ. 2016. The flip side of the other-race coin: They all look different to me . Br. J. Psychol 107 ( 2 ):374–88 [ PubMed ] [ Google Scholar ]
  • LeCun Y, Bengio Y, Hinton G. 2015. Deep learning . Nature 521 ( 7553 ):436–44 [ PubMed ] [ Google Scholar ]
  • Levin DT. 2000. Race as a visual feature: using visual search and perceptual discrimination tasks to understand face categories and the cross-race recognition deficit . J. Exp. Psychol. Gen 129 ( 4 ):559–74 [ PubMed ] [ Google Scholar ]
  • Lewenberg Y, Bachrach Y, Shankar S, Criminisi A. 2016. Predicting personal traits from facial images using convolutional neural networks augmented with facial landmark information . arXiv:1605.09062 [cs.CV] [ Google Scholar ]
  • Li Y, Gao F, Ou Z, Sun J. 2018. Angular softmax loss for end-to-end speaker verification. In Proceedings of the 11th International Symposium on Chinese Spoken Language Processing (ISCSLP) , pp. 190–94. Baixas, France: ISCA [ Google Scholar ]
  • Liu Z, Luo P, Wang X, Tang X. 2015. Deep learning face attributes in the wild. In Proceedings of the 2015 IEEE International Conference on Computer Vision , pp. 3730–38. Piscataway, NJ: IEEE [ Google Scholar ]
  • Lundqvist D, Flykt A, Ohman A. 1998. Karolinska directed emotional faces . Database of standardized facial images, Psychol. Sect., Dept. Clin. Neurosci. Karolinska Hosp., Solna, Swed. https://www.kdef.se/#:~:text=The%20Karolinska%20Directed%20Emotional%20Faces,from%20the%20original%20KDEF%20images [ Google Scholar ]
  • Malpass RS, Kravitz J. 1969. Recognition for faces of own and other race . J. Personal. Soc. Psychol 13 ( 4 ):330–34 [ PubMed ] [ Google Scholar ]
  • Matthews CM, Mondloch CJ. 2018. Improving identity matching of newly encountered faces: effects of multi-image training . J. Appl. Res. Mem. Cogn 7 ( 2 ):280–90 [ Google Scholar ]
  • Maurer D, Le Grand R, Mondloch CJ. 2002. The many faces of configural processing . Trends Cogn. Sci 6 ( 6 ):255–60 [ PubMed ] [ Google Scholar ]
  • Maze B, Adams J, Duncan JA, Kalka N, Miller T, et al. 2018. IARPA Janus Benchmark—C: face dataset and protocol. In Proceedings of the 2018 International Conference on Biometrics (ICB) , pp. 158–65. Piscataway, NJ: IEEE [ Google Scholar ]
  • McCurrie M, Beletti F, Parzianello L, Westendorp A, Anthony S, Scheirer WJ. 2017. Predicting first impressions with deep learning. In Proceedings of the 2017 IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pp. 518–25. Piscataway, NJ: IEEE [ Google Scholar ]
  • Murphy J, Ipser A, Gaigg SB, Cook R. 2015. Exemplar variance supports robust learning of facial identity . J. Exp. Psychol. Hum. Percept. Perform 41 ( 3 ):577–81 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Natu VS, Barnett MA, Hartley J, Gomez J, Stigliani A, Grill-Spector K. 2016. Development of neural sensitivity to face identity correlates with perceptual discriminability . J. Neurosci 36 ( 42 ):10893–907 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Natu VS, Jiang F, Narvekar A, Keshvari S, Blanz V, O’Toole AJ. 2010. Dissociable neural patterns of facial identity across changes in viewpoint . J. Cogn. Neurosci 22 ( 7 ):1570–82 [ PubMed ] [ Google Scholar ]
  • Nordt M, Gomez J, Natu V, Jeska B, Barnett M, Grill-Spector K. 2019. Learning to read increases the informativeness of distributed ventral temporal responses . Cereb. Cortex 29 ( 7 ):3124–39 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Nordt M, Gomez J, Natu VS, Rezai AA, Finzi D, Grill-Spector K. 2020. Selectivity to limbs in ventral temporal cortex decreases during childhood as selectivity to faces and words increases . J. Vis 20 ( 11 ):152 [ Google Scholar ]
  • Noyes E, Jenkins R. 2019. Deliberate disguise in face identification . J. Exp. Psychol. Appl 25 ( 2 ):280–90 [ PubMed ] [ Google Scholar ]
  • Noyes E, Parde C, Colon Y, Hill M, Castillo C, et al. 2021. Seeing through disguise: getting to know you with a deep convolutional neural network . Cognition . In press [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Noyes E, Phillips P, O’Toole A. 2017. What is a super-recogniser. In Face Processing: Systems, Disorders and Cultural Differences , ed. Bindemann M, pp. 173–201. Hauppage, NY: Nova Sci. Publ. [ Google Scholar ]
  • Oosterhof NN, Todorov A. 2008. The functional basis of face evaluation . PNAS 105 ( 32 ):11087–92 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • O’Toole AJ, Castillo CD, Parde CJ, Hill MQ, Chellappa R. 2018. Face space representations in deep convolutional neural networks . Trends Cogn. Sci 22 ( 9 ):794–809 [ PubMed ] [ Google Scholar ]
  • O’Toole AJ, Phillips PJ, Jiang F, Ayyad J, Pénard N, Abdi H. 2007. Face recognition algorithms surpass humans matching faces over changes in illumination . IEEE Trans. Pattern Anal. Mach. Intel ( 9 ):1642–46 [ PubMed ] [ Google Scholar ]
  • Parde CJ, Castillo C, Hill MQ, Colon YI, Sankaranarayanan S, et al. 2017. Face and image representation in deep CNN features. In Proceedings of the 2017 IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017) , pp. 673–80. Piscataway, NJ: IEEE [ Google Scholar ]
  • Parde CJ, Colón YI, Hill MQ, Castillo CD, Dhar P, O’Toole AJ. 2021. Face recognition by humans and machines: closing the gap between single-unit and neural population codes—insights from deep learning in face recognition . J. Vis In press [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Parde CJ, Hu Y, Castillo C, Sankaranarayanan S, O’Toole AJ. 2019. Social trait information in deep convolutional neural networks trained for face identification . Cogn. Sci 43 ( 6 ):e12729. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Parkhi OM, Vedaldi A, Zisserman A. 2015. Deep face recognition . Rep., Vis. Geom. Group, Dept. Eng. Sci., Univ. Oxford, UK [ Google Scholar ]
  • Paszke A, Gross S, Massa F, Lerer A, Bradbury J, et al. 2019. Pytorch: an imperative style, high-performance deep learning library. In NeurIPS 2019: Proceedings of the 32nd International Conference on Neural Information Processing Systems , pp. 8024–35. New York: ACM [ Google Scholar ]
  • Pezdek K, Blandon-Gitlin I, Moore C. 2003. Children’s face recognition memory: more evidence for the cross-race effect . J. Appl. Psychol 88 ( 4 ):760–63 [ PubMed ] [ Google Scholar ]
  • Phillips PJ, Beveridge JR, Draper BA, Givens G, O’Toole AJ, et al. 2011. An introduction to the good, the bad, & the ugly face recognition challenge problem. In Proceedings of the 2011 IEEE International Conference on Automatic Face & Gesture Recognition (FG) , pp. 346–53. Piscataway, NJ: IEEE [ Google Scholar ]
  • Phillips PJ, O’Toole AJ. 2014. Comparison of human and computer performance across face recognition experiments . Image Vis. Comput 32 ( 1 ):74–85 [ Google Scholar ]
  • Phillips PJ, Yates AN, Hu Y, Hahn CA, Noyes E, et al. 2018. Face recognition accuracy of forensic examiners, superrecognizers, and face recognition algorithms . PNAS 115 ( 24 ):6171–76 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Poggio T, Banburski A, Liao Q. 2020. Theoretical issues in deep networks . PNAS 117 ( 48 ):30039–45 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Ponce CR, Xiao W, Schade PF, Hartmann TS, Kreiman G, Livingstone MS. 2019. Evolving images for visual neurons using a deep generative network reveals coding principles and neuronal preferences . Cell 177 ( 4 ):999–1009 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Ranjan R, Bansal A, Zheng J, Xu H, Gleason J, et al. 2019. A fast and accurate system for face detection, identification, and verification . IEEE Trans. Biom. Behav. Identity Sci 1 ( 2 ):82–96 [ Google Scholar ]
  • Ranjan R, Castillo CD, Chellappa R. 2017. L2-constrained softmax loss for discriminative face verification . arXiv:1703.09507 [cs.CV] [ Google Scholar ]
  • Ranjan R, Sankaranarayanan S, Castillo CD, Chellappa R. 2017c. An all-in-one convolutional neural network for face analysis. In Proceedings of the 2017 IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017) , pp. 17–24. Piscataway, NJ: IEEE [ Google Scholar ]
  • Richards BA, Lillicrap TP, Beaudoin P, Bengio Y, Bogacz R, et al. 2019. A deep learning framework for neuroscience . Nat. Neurosci 22 ( 11 ):1761–70 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Ritchie KL, Burton AM. 2017. Learning faces from variability . Q. J. Exp. Psychol 70 ( 5 ):897–905 [ PubMed ] [ Google Scholar ]
  • Rosch E, Mervis CB, Gray WD, Johnson DM, Boyes-Braem P. 1976. Basic objects in natural categories . Cogn. Psychol 8 ( 3 ):382–439 [ Google Scholar ]
  • Russakovsky O, Deng J, Su H, Krause J, Satheesh S, et al. 2015. ImageNet Large Scale Visual Recognition Challenge . Int. J. Comput. Vis 115 ( 3 ):211–52 [ Google Scholar ]
  • Russell R, Duchaine B, Nakayama K. 2009. Super-recognizers: people with extraordinary face recognition ability . Psychon. Bull. Rev 16 ( 2 ):252–57 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Sangrigoli S, Pallier C, Argenti AM, Ventureyra V, de Schonen S. 2005. Reversibility of the other-race effect in face recognition during childhood . Psychol. Sci 16 ( 6 ):440–44 [ PubMed ] [ Google Scholar ]
  • Sankaranarayanan S, Alavi A, Castillo C, Chellappa R. 2016. Triplet probabilistic embedding for face verification and clustering . arXiv:1604.05417 [cs.CV] [ Google Scholar ]
  • Schrimpf M, Kubilius J, Hong H, Majaj NJ, Rajalingham R, et al. 2018. Brain-score: Which artificial neural network for object recognition is most brain-like? bioRxiv 407007 . 10.1101/407007 [ CrossRef ] [ Google Scholar ]
  • Schroff F, Kalenichenko D, Philbin J. 2015. Facenet: a unified embedding for face recognition and clustering. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition , pp. 815–23. Piscataway, NJ: IEEE [ Google Scholar ]
  • Scott LS, Monesson A. 2010. Experience-dependent neural specialization during infancy . Neuropsychologia 48 ( 6 ):1857–61 [ PubMed ] [ Google Scholar ]
  • Sengupta S, Chen JC, Castillo C, Patel VM, Chellappa R, Jacobs DW. 2016. Frontal to profile face verification in the wild. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV) , pp. 1–9. Piscataway, NJ: IEEE [ Google Scholar ]
  • Sim T, Baker S, Bsat M. 2002. The CMU pose, illumination, and expression (PIE) database. In Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition , pp. 53–58. Piscataway, NJ: IEEE [ Google Scholar ]
  • Simonyan K, Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition . arXiv:1409.1556 [cs.CV] [ Google Scholar ]
  • Smith LB, Jayaraman S, Clerkin E, Yu C. 2018. The developing infant creates a curriculum for statistical learning . Trends Cogn. Sci 22 ( 4 ):325–36 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Smith LB, Slone LK. 2017. A developmental approach to machine learning? Front. Psychol 8 :2124. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Song A, Linjie L, Atalla C, Gottrell G. 2017. Learning to see people like people: predicting social impressions of faces . Cogn. Sci 2017 :1096–101 [ Google Scholar ]
  • Storrs KR, Kietzmann TC, Walther A, Mehrer J, Kriegeskorte N. 2020. Diverse deep neural networks all predict human it well, after training and fitting . bioRxiv 2020.05.07.082743 . 10.1101/2020.05.07.082743 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Su H, Maji S, Kalogerakis E, Learned-Miller E. 2015. Multi-view convolutional neural networks for 3d shape recognition. In Proceedings of the 2015 IEEE International Conference on Computer Vision , pp. 945–53. Piscataway, NJ: IEEE [ Google Scholar ]
  • Sugden NA, Moulson MC. 2017. Hey baby, what’s “up”? One-and 3-month-olds experience faces primarily upright but non-upright faces offer the best views . Q. J. Exp. Psychol 70 ( 5 ):959–69 [ PubMed ] [ Google Scholar ]
  • Taigman Y, Yang M, Ranzato M, Wolf L. 2014. Deepface: closing the gap to human-level performance in face verification. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition , pp. 1701–8. Piscataway, NJ: IEEE [ Google Scholar ]
  • Tanaka JW, Pierce LJ. 2009. The neural plasticity of other-race face recognition . Cogn. Affect. Behav. Neurosci 9 ( 1 ):122–31 [ PubMed ] [ Google Scholar ]
  • Terhörst P, Fährmann D, Damer N, Kirchbuchner F, Kuijper A. 2020. Beyond identity: What information is stored in biometric face templates? arXiv:2009.09918 [cs.CV] [ Google Scholar ]
  • Thorpe S, Fize D, Marlot C. 1996. Speed of processing in the human visual system . Nature 381 ( 6582 ):520–22 [ PubMed ] [ Google Scholar ]
  • Todorov A 2017. Face Value: The Irresistible Influence of First Impressions . Princeton, NJ: Princeton Univ. Press [ Google Scholar ]
  • Todorov A, Mandisodza AN, Goren A, Hall CC. 2005. Inferences of competence from faces predict election outcomes . Science 308 ( 5728 ):1623–26 [ PubMed ] [ Google Scholar ]
  • Valentine T 1991. A unified account of the effects of distinctiveness, inversion, and race in face recognition . Q. J. Exp. Psychol. A 43 ( 2 ):161–204 [ PubMed ] [ Google Scholar ]
  • van der Maaten L, Weinberger K. 2012. Stochastic triplet embedding. In Proceedings of the 2012 IEEE International Workshop on Machine Learning for Signal Processing , pp. 1–6. Piscataway, NJ: IEEE [ Google Scholar ]
  • Walker M, Vetter T. 2009. Portraits made to measure: manipulating social judgments about individuals with a statistical face model . J. Vis 9 ( 11 ):12 [ PubMed ] [ Google Scholar ]
  • Wang F, Liu W, Liu H, Cheng J. 2018. Additive margin softmax for face verification . IEEE Signal Process. Lett 25 :926–30 [ Google Scholar ]
  • Wang F, Xiang X, Cheng J, Yuille AL. 2017. Normface: L 2 hypersphere embedding for face verification. In MM ‘17: Proceedings of the 25th ACM International Conference on Multimedia , pp. 1041–49. New York: ACM [ Google Scholar ]
  • Xie C, Tan M, Gong B, Wang J, Yuille AL, Le QV. 2020. Adversarial examples improve image recognition. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition , pp. 819–28. Piscataway, NJ: IEEE [ Google Scholar ]
  • Yamins DL, Hong H, Cadieu CF, Solomon EA, Seibert D, DiCarlo JJ. 2014. Performance-optimized hierarchical models predict neural responses in higher visual cortex . PNAS 111 ( 23 ):8619–24 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Yi D, Lei Z, Liao S, Li SZ. 2014. Learning face representation from scratch . arXiv:1411.7923 [cs.CV] [ Google Scholar ]
  • Yoshida H, Smith LB. 2008. What’s in view for toddlers? Using a head camera to study visual experience . Infancy 13 ( 3 ):229–48 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Young AW, Burton AM. 2020. Insights from computational models of face recognition: a reply to Blauch, Behrmann and Plaut . Cognition 208 :104422. [ PubMed ] [ Google Scholar ]
  • Yovel G, Abudarham N. 2020. From concepts to percepts in human and machine face recognition: a reply to Blauch, Behrmann & Plaut . Cognition 208 :104424. [ PubMed ] [ Google Scholar ]
  • Yovel G, Halsband K, Pelleg M, Farkash N, Gal B, Goshen-Gottstein Y. 2012. Can massive but passive exposure to faces contribute to face recognition abilities? J. Exp. Psychol. Hum. Percept. Perform 38 ( 2 ):285–89 [ PubMed ] [ Google Scholar ]
  • Yovel G, O’Toole AJ. 2016. Recognizing people in motion . Trends Cogn. Sci 20 ( 5 ):383–95 [ PubMed ] [ Google Scholar ]
  • Yuan L, Xiao W, Kreiman G, Tay FE, Feng J, Livingstone MS. 2020. Adversarial images for the primate brain . arXiv:2011.05623 [q-bio.NC] [ Google Scholar ]
  • Yue X, Cassidy BS, Devaney KJ, Holt DJ, Tootell RB. 2010. Lower-level stimulus features strongly influence responses in the fusiform face area . Cereb. Cortex 21 ( 1 ):35–47 [ PMC free article ] [ PubMed ] [ Google Scholar ]



Title: Going Deeper Into Face Detection: A Survey

Abstract: Face detection is a crucial first step in many facial recognition and face analysis systems. Early approaches to face detection were mainly based on classifiers built on top of hand-crafted features extracted from local image regions, such as Haar Cascades and Histograms of Oriented Gradients. However, these approaches were not powerful enough to achieve high accuracy on images from uncontrolled environments. With the breakthrough work in image classification using deep neural networks in 2012, there has been a huge paradigm shift in face detection. Inspired by the rapid progress of deep learning in computer vision, many deep-learning-based frameworks have been proposed for face detection over the past few years, achieving significant improvements in accuracy. In this work, we provide a detailed overview of some of the most representative deep-learning-based face detection methods by grouping them into a few major categories, and present their core architectural designs and accuracies on popular benchmarks. We also describe some of the most popular face detection datasets. Finally, we discuss some current challenges in the field and suggest potential future research directions.
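The hand-crafted features the abstract mentions can be made concrete with a toy Histogram of Oriented Gradients (HOG) computation for a single cell. This is a minimal didactic sketch, not the full Dalal-Triggs pipeline: real HOG detectors add block normalization, a sliding window, and a trained classifier, and the 8x8 cell and 9 bins here are just conventional defaults.

```python
import numpy as np

def hog_cell_histogram(patch, n_bins=9):
    """Orientation histogram for one HOG cell of a grayscale patch."""
    patch = patch.astype(float)
    # Centered finite-difference gradients (borders left at zero).
    gx = np.zeros_like(patch)
    gy = np.zeros_like(patch)
    gx[:, 1:-1] = patch[:, 2:] - patch[:, :-2]
    gy[1:-1, :] = patch[2:, :] - patch[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, quantized into n_bins.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((orientation * n_bins / 180.0).astype(int), n_bins - 1)
    # Each pixel votes its gradient magnitude into its orientation bin.
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist

# A vertical step edge: all gradient energy is horizontal (bin 0).
cell = np.tile(np.r_[np.zeros(4), np.full(4, 255.0)], (8, 1))
h = hog_cell_histogram(cell)
```

For a vertical step edge all of the gradient energy is horizontal, so every vote lands in the first orientation bin; a classifier trained on such histograms is what the pre-deep-learning detectors relied on.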
Subjects: Computer Vision and Pattern Recognition (cs.CV)


A review of the development of YOLO object detection algorithm

Junbiao Liang. Published in Applied and Computational…, 27 August 2024. DOI: 10.54254/2755-2721/71/20241642


Systematic Literature Review on the Accuracy of Face Recognition Algorithms

Marcos Lazarini

EAI Endorsed Transactions on Internet of Things

Real-time facial recognition systems have been increasingly used, making it relevant to address the accuracy of these systems given the credibility and trust they must offer. Therefore, this article seeks to identify the algorithms currently used by facial recognition systems through a Systematic Literature Review that considers recent scientific articles, published between 2018 and 2021. From the initial collection of ninety-three articles, a subset of thirteen was selected after applying the inclusion and exclusion procedures. One of the outstanding results of this research corresponds to the use of algorithms based on Artificial Neural Networks (ANN) considered in 21% of the solutions, highlighting the use of Convolutional Neural Network (CNN). Another relevant result is the identification of the use of the Viola-Jones algorithm, present in 19% of the solutions. In addition, from this research, two specific facial recognition solutions associated with access control were found co...
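The Viola-Jones detector identified in the review owes its speed to the integral image (summed-area table), which lets any rectangular Haar-like feature be evaluated with four array lookups regardless of its size. A minimal sketch of that data structure:

```python
import numpy as np

def integral_image(img):
    """Summed-area table, zero-padded on the first row/column so that
    rectangle sums need no boundary special-casing."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in O(1): four lookups, any rectangle size."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

img = np.arange(16).reshape(4, 4)   # stand-in for a grayscale window
ii = integral_image(img)
```

A Haar-like feature is then just a signed combination of a few such rectangle sums, which is what makes evaluating thousands of features per window in the Viola-Jones cascade cheap.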

Related Papers

HAL (Le Centre pour la Communication Scientifique Directe)

Kadri Chaibou


International Journal of Interactive Mobile Technologies (iJIM)

International Journal of Machine Learning and Computing

Imran Mumtaz

Feature extraction and model training can be performed using neural-learning techniques for face recognition; such techniques are widely employed to extract features from images of human faces. Some biometric systems can scan the full body, and others perform iris or fingerprint detection. These systems have been deployed for safety and security purposes. In this research work, we compare different machine learning algorithms for face recognition. Four supervised face-recognition classifiers are considered: Principal Component Analysis (PCA), 1-nearest neighbor (1-NN), Linear Discriminant Analysis (LDA), and Support Vector Machine (SVM). The efficiency of multiple classification systems is also demonstrated and tested in terms of their ability to identify a face correctly. Face recognition is a technique to identify people whose images are stored in databases and available in the form of datasets. Extensive experiments conducted on these d...
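The PCA-plus-1-NN combination compared in this abstract (the classical eigenface recipe) can be sketched on synthetic "identities". The data, dimensions, and component count below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_fit(X, n_components):
    """Fit PCA on row-vector samples via SVD; returns mean face and axes."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_project(X, mean, components):
    """Project samples into the eigenface subspace."""
    return (X - mean) @ components.T

def one_nn_predict(train_proj, train_labels, test_proj):
    """1-NN: each test projection takes the label of its nearest neighbor."""
    dists = np.linalg.norm(train_proj[None, :, :] - test_proj[:, None, :], axis=2)
    return train_labels[dists.argmin(axis=1)]

# Two synthetic "identities": 64-dim vectors clustered around 0 and 5.
ids = np.array([0] * 10 + [1] * 10)
X = np.where(ids[:, None] == 0, 0.0, 5.0) + rng.normal(0.0, 0.3, size=(20, 64))

mean, comps = pca_fit(X, n_components=2)
train_proj = pca_project(X, mean, comps)

# Probe each cluster center: 1-NN in the subspace recovers the identity.
queries = np.vstack([np.zeros(64), np.full(64, 5.0)])
preds = one_nn_predict(train_proj, ids, pca_project(queries, mean, comps))
```

LDA and SVM slot into the same pipeline in place of the 1-NN step, which is why the four methods are directly comparable on the same projected features.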

Darcan, E & Aydogan, H., Facial Recognition Technology. Harvey, K. (Ed.). (2014). Encyclopedia of Social Media and Politics (Vol. 1). SAGE Publications.

Emirhan Darcan PhD.

Face recognition systems are still an effective law-enforcement tool, despite their flaws in accuracy of identification. It is still a very new technology that is being developed, and researchers are constantly trying to optimize its efficiency in accurately identifying people. In the future, face recognition systems may be perfected and become more functional. They may help find missing persons, track down fugitives, and secure national borders. Face recognition systems could provide a more convenient and safe world in the future. Hakan Aydogan and Emirhan Darcan, Rutgers, The State University of New Jersey

IAEME PUBLICATION

IAEME Publication

Facial recognition using Artificial Intelligence (AI) has become a ubiquitous technology with numerous applications in the modern world. The technology involves analyzing and identifying human faces in digital images or video footage through algorithms and machine learning techniques. This process involves face detection, face alignment, feature extraction, and face matching. While facial recognition technology has numerous benefits, such as enhancing security and streamlining identification processes, there are also concerns about its potential misuse, invasion of privacy, and bias. Therefore, it is essential to use facial recognition technology responsibly and with proper oversight to ensure that it is used for ethical purposes. This article provides an overview of facial recognition using AI, its benefits and drawbacks, and the importance of using it ethically

International Journal of Scientific Research in Computer Science, Engineering and Information Technology

International Journal of Scientific Research in Computer Science, Engineering and Information Technology IJSRCSEIT

In the digital world, biometrics is used for authentication or recognition to examine and confirm a person's distinguishing physical or behavioral attributes. There are many authentication systems available today that use iris, fingerprint, and face features for identification and verification. Face recognition-based systems are the most popular since they don't always need the user's assistance, are more automated, and are simple to use. Face recognition paves the way for an innovative way to perceive a human face. Face recognition and identification have been used in access control systems, which have become widely used in security frameworks during the past few years. With the help of biometrics, a facial recognition system can extract facial details from a picture or video. The data is compared to a database of recognized faces to identify a match. Personal identity can be confirmed through facial recognition. This review paper offers a comparison of various facial recognition methods.
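The matching step described here, comparing extracted facial details against a database of known faces, is commonly implemented as nearest-neighbor search over embedding vectors. In this sketch the embedding dimension (128), the gallery names, and the acceptance threshold are illustrative assumptions, not values from the paper:

```python
import numpy as np

def identify(probe, gallery, threshold=0.8):
    """Match a probe embedding against an enrolled gallery by cosine
    similarity; return (best identity, similarity), or (None, similarity)
    when the best match falls below the acceptance threshold."""
    best_name, best_sim = None, -1.0
    p = probe / np.linalg.norm(probe)
    for name, emb in gallery.items():
        sim = float(p @ (emb / np.linalg.norm(emb)))
        if sim > best_sim:
            best_name, best_sim = name, sim
    return (best_name, best_sim) if best_sim >= threshold else (None, best_sim)

# Hypothetical 128-dim embeddings standing in for a face-descriptor network.
rng = np.random.default_rng(1)
gallery = {"alice": rng.normal(size=128), "bob": rng.normal(size=128)}

# A probe near alice's enrolled template should be accepted as alice.
probe = gallery["alice"] + rng.normal(0.0, 0.05, size=128)
name, sim = identify(probe, gallery)
```

The threshold is the knob that trades false accepts against false rejects; production systems tune it per deployment rather than fixing one value.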

International Journal of Information Processing and Communication (IJIPC)

Olayemi M Olaniyi , Shefiu Olusegun Ganiyu , TERFA AKPAGHER

Crime control in human societies continues to pose significant problems and requires dynamic approaches supported by effective investigation mechanisms. Notable among these approaches is biometric facial recognition, which has proven ideal due to its flexible and non-intrusive nature. This research conducted a systematic review of algorithms and approaches for facial recognition to aid security operatives in crime investigations. For the first time, the review coined and described three operational environments, namely regulated, unregulated and semi-regulated, to which facial recognition is applicable; the semi-regulated environment is yet to be addressed based on its peculiar characteristics. Subsequently, this study proposed the design of a facial recognition system premised on deep learning and the Local Binary Patterns Histograms (LBPH) algorithm. Future implementation of the design will help to identify and document known and unknown individuals, creating a more efficient and effective approach to crime control.
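The LBPH algorithm on which the proposed design is premised encodes each pixel by thresholding its eight neighbors against the center and histogramming the resulting 8-bit codes. The sketch below computes one global histogram; the real LBPH recognizer concatenates per-cell histograms over a grid and compares them with a histogram distance:

```python
import numpy as np

def lbp_histogram(img):
    """Normalized Local Binary Patterns histogram of a grayscale image.

    Each interior pixel becomes an 8-bit code: one bit per neighbor,
    set when that neighbor is >= the center pixel.
    """
    img = img.astype(int)
    center = img[1:-1, 1:-1]
    # Neighbor offsets, clockwise from the top-left neighbor.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dr, dc) in enumerate(offsets):
        neighbor = img[1 + dr: img.shape[0] - 1 + dr,
                       1 + dc: img.shape[1] - 1 + dc]
        codes |= (neighbor >= center).astype(int) << bit
    hist = np.bincount(codes.ravel(), minlength=256)
    return hist / hist.sum()

# On a flat image every neighbor ties the center, so every code is 255.
flat = np.full((8, 8), 7)
h = lbp_histogram(flat)
```

Because the codes depend only on local intensity ordering, the descriptor is fairly robust to monotonic illumination changes, which is one reason LBPH remains popular for low-cost recognition systems.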

Nigerian Journal of Technology

John A Popoola

Systems and applications embedded with facial detection and recognition capabilities are founded on the notion that there are differences in face structures among individuals, and as such, we can perform face-matching using the facial symmetry. A widely used application of facial detection and recognition is in security. It is important that the images be processed correctly for computer-based facial recognition, hence, the usage of efficient, cost-effective algorithms and a robust database. This research work puts these measures into consideration and attempts to determine a cost-effective and reliable algorithm out of three algorithms examined.

IJESRT Journal

Face recognition is a widely used method in security and appears in various other applications; it is highly relied upon as a real-time technique for victim identification. This paper studies the evolution of face recognition methods across each period and analyzes the merits and demerits of each evolutionary period, helping the reader appreciate the importance of the face recognition technique.

Law, technology and humans

Pedro Zucchetti Filho



  • Google Scholar
  • on Google Scholar
  • Table of Contents

Find support for a specific problem in the support section of our website.

Please let us know what you think of our products and services.

Visit our dedicated information section to learn more about MDPI.

JSmol Viewer

Applications of GANs to Aid Target Detection in SAR Operations: A Systematic Literature Review


1. Introduction

2. Research Method

2.1. Research Definition

  • How can GAN algorithms help detect edges or objects in images generated by UAVs?
  • How can GANs be applied in search and rescue (SAR) operations?
  • What benefits are gained from using a pre-trained model rather than training one from scratch?
  • Which metrics are most suitable for validating these algorithms?

Search string: (“edge detection” OR “object detection”) AND (uav OR drones) AND (gan OR “generative adversarial networks”)
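Applied programmatically, the search string above amounts to a boolean filter over titles and abstracts. A minimal sketch in Python (illustrative only; the review itself ran this query against the IEEE Xplore and SCOPUS search engines, not a local filter):

```python
import re

def matches_query(text: str) -> bool:
    """Boolean filter mirroring the review's search string."""
    t = text.lower()
    has_detection = "edge detection" in t or "object detection" in t
    # Word boundaries avoid false hits like "uavionics" or "organ".
    has_platform = re.search(r"\buavs?\b|\bdrones?\b", t) is not None
    has_gan = (re.search(r"\bgans?\b", t) is not None
               or "generative adversarial network" in t)
    return has_detection and has_platform and has_gan

print(matches_query("Object detection for UAV imagery with GAN-based super-resolution"))  # True
print(matches_query("Edge detection in satellite images"))  # False
```

Real bibliographic databases add field restrictions (title/abstract/keywords) and year filters on top of this boolean core.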

2.2. Abstract Analysis

2.3. Quality Assessment

Study | Year | IC1 | IC2 | IC3 | IC4 | IC5 | IC6 | IC7 | Total | Citations | Base | Pub. Type | Target
[ ] | 2017 | 0.5 | 1.0 | 0.5 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | - | IEEE Xplore | b | Roads
[ ] | 2017 | 0.5 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.5 | - | IEEE Xplore | b | Transmission Lines
[ ] | 2018 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 3.0 | - | IEEE Xplore | b | Stingrays
[ ] | 2019 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | - | IEEE Xplore | a | Diverse entities
[ ] | 2019 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | - | IEEE Xplore | b | Cars
[ ] | 2019 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 33 | SCOPUS | a | Diverse entities
[ ] | 2019 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | 8 | SCOPUS | b | Vehicles
[ ] | 2019 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | - | IEEE Xplore | b | Vehicles
[ ] | 2020 | 0.0 | 0.0 | 1.0 | 0.5 | 0.0 | 0.0 | 1.0 | 2.5 | 12 | SCOPUS | a | Markers
[ ] | 2020 | 0.5 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.5 | 15 | SCOPUS | a | Diverse entities
[ ] | 2020 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | 7 | SCOPUS | a | Insulators
[ ] | 2020 | 0.5 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.5 | - | IEEE Xplore | b | Vehicles
[ ] | 2021 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 4.0 | 5 | SCOPUS | a | Pedestrians
[ ] | 2021 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 5.0 | 1 | SCOPUS | a | Diverse entities
[ ] | 2021 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 23 | SCOPUS | a | Plants
[ ] | 2021 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0 | SCOPUS | a | Diverse entities
[ ] | 2021 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 3.0 | 26 | SCOPUS | a | Small Entities
[ ] | 2021 | 0.5 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.5 | 10 | SCOPUS | a | Insulators
[ ] | 2021 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 6 | SCOPUS | a | Diverse entities
[ ] | 2021 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 3.0 | - | IEEE Xplore | b | Pedestrians
[ ] | 2021 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 1.0 | 1.0 | 2.5 | - | IEEE Xplore | b | Living beings
[ ] | 2022 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 5.0 | 1 | SCOPUS | b | Debris
[ ] | 2022 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 4.0 | 23 | SCOPUS | a | Pavement cracks
[ ] | 2022 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0 | SCOPUS | b | Transm. Lines defects
[ ] | 2022 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 4.0 | 8 | SCOPUS | a | Wildfire
[ ] | 2022 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 8 | SCOPUS | a | Anomaly Entities
[ ] | 2022 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 3.0 | 8 | SCOPUS | a | Small drones
[ ] | 2022 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | 1 | SCOPUS | a | Peach tree crowns
[ ] | 2022 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | - | IEEE Xplore | b | UAVs
[ ] | 2022 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | - | IEEE Xplore | b | Human faces
[ ] | 2022 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 2.0 | - | IEEE Xplore | b | Object distances from UAV
[ ] | 2022 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | - | IEEE Xplore | b | Diverse entities
[ ] | 2023 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 4.0 | 1 | SCOPUS | b | Vehicles
[ ] | 2023 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 3.0 | 2 | SCOPUS | a | Diverse entities
[ ] | 2023 | 0.5 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 2.5 | - | IEEE Xplore | b | Small entities
[ ] | 2023 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 5.0 | - | IEEE Xplore | b | Drones
[ ] | 2023 | 1.0 | 0.0 | 1.0 | 0.5 | 0.0 | 0.0 | 1.0 | 3.5 | - | IEEE Xplore | b | Small objects

3.1. Human and Animal Detection

3.2. Object Detection

3.3. Infrared Spectrum

3.4. YOLO Versions

3.5. Pre-Trained Models

4. Benchmark

5. Discussion

6. Conclusions

Author Contributions

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

  • McIntosh, S.; Brillhart, A.; Dow, J.; Grissom, C. Search and Rescue Activity on Denali, 1990 to 2008. Wilderness Environ. Med. 2010 , 21 , 103–108. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Curran-Sills, G.; Karahalios, A.; Dow, J.; Grissom, C. Epidemiological Trends in Search and Rescue Incidents Documented by the Alpine Club of Canada From 1970 to 2005. Wilderness Environ. Med. 2015 , 26 , 536–543. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Heggie, T.; Heggie, T.; Dow, J.; Grissom, C. Search and Rescue Trends and the Emergency Medical Service Workload in Utah’s National Parks. Wilderness Environ. Med. 2015 , 19 , 164–171. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Ciesa, M.; Grigolato, S.; Cavalli, R. Retrospective Study on Search and Rescue Operations in Two Prealps Areas of Italy. Wilderness Environ. Med. 2015 , 26 , 150–158. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Freitas, C.; Barcellos, C.; Asmus, C.; Silva, M.; Xavier, D. From Samarco in Mariana to Vale in Brumadinho: Mining dam disasters and Public Health. Cad. Saúde Pública 2019 , 35 , e00052519. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Sássi, C.; Carvalho, G.; De Castro, L.; Junior, C.; Nunes, V.; Do Nascimento, A. Gonçalves, One decade of environmental disasters in Brazil: The action of veterinary rescue teams. Front. Public Health 2021 , 9 , 624975. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Stoddard, M.; Pelot, R. Historical maritime search and rescue incident data analysis. In Governance of Arctic Shipping: Rethinking Risk, Human Impacts and Regulation ; Chircop, A., Goerlandt, F., Aporta, C., Pelot, R., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 43–62. [ Google Scholar ]
  • Wajeeha, N.; Torres, R.; Gundersen, O.; Karlsen, A. The Use of Decision Support in Search and Rescue: A Systematic Literature Review. ISPRS Int. J. Geo-Inf. 2023 , 12 , 182. [ Google Scholar ] [ CrossRef ]
  • Levine, A.; Feinn, R.; Foggle, J.; Karlsen, A. Search and Rescue in California: The Need for a Centralized Reporting System. Wilderness Environ. Med. 2023 , 34 , 164–171. [ Google Scholar ] [ CrossRef ]
  • Prata, I.; Almeida, A.; de Souza, F.; Rosa, P.; dos Santos, A. Developing a UAV platform for victim localization on search and rescue operations. In Proceedings of the 2022 IEEE 31st International Symposium on Industrial Electronics (ISIE), Anchorage, AK, USA, 1–3 June 2022; pp. 721–726. [ Google Scholar ]
  • Lyu, M.; Zhao, Y.; Huang, C.; Huang, H. Unmanned aerial vehicles for search and rescue: A survey. Remote Sens. 2023 , 15 , 3266. [ Google Scholar ] [ CrossRef ]
  • Braga, J.; Shiguemori, E.; Velho, H. Odometria Visual para a Navegação Autônoma de VANT. Rev. Cereus 2019 , 11 , 184–194. [ Google Scholar ] [ CrossRef ]
  • Cho, S.; Matsushita, Y.; Lee, S. Removing non-uniform motion blur from images. In Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 14–21 October 2007; pp. 1–8. [ Google Scholar ]
  • Li, Z.; Gao, Z.; Yi, H.; Fu, Y.; Chen, B. Image Deblurring with Image Blurring. IEEE Trans. Image Process. 2023 , 32 , 5595–5609. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Fan, L.; Zhang, F.; Fan, H.; Zhang, C. Brief review of image denoising techniques. Vis. Comput. Ind. Biomed. Art 2019 , 2 , 7. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Ibrahim, H.; Kong, N. Brightness preserving dynamic histogram equalization for image contrast enhancement. IEEE Trans. Consum. Electron. 2007 , 53 , 1752–1758. [ Google Scholar ] [ CrossRef ]
  • Park, S.; Park, M.; Kang, M. Super-resolution image reconstruction: A technical overview. IEEE Signal Process. Mag. 2003 , 20 , 21–36. [ Google Scholar ] [ CrossRef ]
  • Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020 , 63 , 139–144. [ Google Scholar ] [ CrossRef ]
  • Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Shi, W. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [ Google Scholar ]
  • Yuan, Y.; Liu, S.; Zhang, J.; Zhang, Y.; Dong, C.; Lin, L. Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 701–710. [ Google Scholar ]
  • Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Loy, C. Esrgan: Enhanced super-resolution generative adversarial networks. Lect. Notes Comput. Sci. 2019 , 11133 , 63–79. [ Google Scholar ]
  • Bell-Kligler, S.; Shocher, A.; Irani, M. Blind super-resolution kernel estimation using an internal-gan. Adv. Neural Inf. Process. Syst. 2019 , 32 . [ Google Scholar ]
  • Zhang, K.; Gool, L.; Timofte, R. Deep unfolding network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 3217–3226. [ Google Scholar ]
  • Zhang, M.; Ling, Q. Supervised pixel-wise GAN for face super-resolution. IEEE Trans. Multimed. 2020 , 23 , 1938–1950. [ Google Scholar ] [ CrossRef ]
  • Zhang, K.; Liang, J.; Van Gool, L.; Timofte, R. Designing a practical degradation model for deep blind image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 4791–4800. [ Google Scholar ]
  • Wang, X.; Xie, L.; Dong, C.; Shan, Y. Real-esrgan: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 1905–1914. [ Google Scholar ]
  • Wang, X.; Sun, L.; Chehri, A.; Song, Y. A Review of GAN-Based Super-Resolution Reconstruction for Optical Remote Sensing Images. Remote Sens. 2023 , 15 , 5062. [ Google Scholar ] [ CrossRef ]
  • Bok, V.; Langr, J. GANs in Action: Deep Learning with Generative Adversarial Networks ; Manning Publishing: New York, NY, USA, 2019. [ Google Scholar ]
  • Parsif.al. Available online: https://parsif.al (accessed on 25 April 2024).
  • Yu, J.; Xue, H.; Liu, B.; Wang, Y.; Zhu, S.; Ding, M. GAN-Based Differential Private Image Privacy Protection Framework for the Internet of Multimedia Things. Sensors 2021 , 21 , 58. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Gipiškis, R.; Chiaro, D.; Preziosi, M.; Prezioso, E.; Piccialli, F. The impact of adversarial attacks on interpretable semantic segmentation in cyber–physical systems. IEEE Syst. J. 2023 , 17 , 5327–5334. [ Google Scholar ] [ CrossRef ]
  • Hu, M.; Ju, X. Two-stage insulator self-explosion defect detection method based on Mask R-CNN. In Proceedings of the 2nd IEEE International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI), Shenyang, China, 17–19 November 2021; pp. 13–18. [ Google Scholar ]
  • Bouguettaya, A.; Zarzour, H.; Kechida, A.; Taberkit, A. Vehicle detection from UAV imagery with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2021 , 33 , 6047–6067. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Gan, Z.; Xu, H.; He, Y.; Cao, W.; Chen, G. Autonomous landing point retrieval algorithm for uavs based on 3d environment perception. In Proceedings of the 2021 IEEE 7th International Conference on Virtual Reality (ICVR), Foshan, China, 20–22 May 2021; pp. 104–108. [ Google Scholar ]
  • Shen, Y.; Lee, H.; Kwon, H.; Bhattacharyya, S. Progressive transformation learning for leveraging virtual images in training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 835–844. [ Google Scholar ]
  • Chen, X.; Li, T.; Liu, H.; Huang, Q.; Gan, X. A GPS Spoofing Detection Algorithm for UAVs Based on Trust Evaluation. In Proceedings of the IEEE 13th International Conference on CYBER Technology in Automation Control and Intelligent Systems (CYBER), Qinhuangdao, China, 11–14 July 2023; pp. 315–319. [ Google Scholar ]
  • More, D.; Acharya, S.; Aryan, S. SRGAN-TQT, an Improved Motion Tracking Technique for UAVs with Super-Resolution Generative Adversarial Network (SRGAN) and Temporal Quad-Tree (TQT) ; SAE Technical Paper; SAE: London, UK, 2022. [ Google Scholar ]
  • Gong, Y.; Liu, Q.; Que, L.; Jia, C.; Huang, J.; Liu, Y.; Zhou, J. Raodat: An energy-efficient reconfigurable AI-based object detection and tracking processor with online learning. In Proceedings of the 2021 IEEE Asian Solid-State Circuits Conference (A-SSCC), Busan, Republic of Korea, 7–10 November 2021; pp. 1–3. [ Google Scholar ]
  • Gong, Y.; Zhang, T.; Guo, H.; Liu, Q.; Que, L.; Jia, C.; Zhou, J. An energy-efficient reconfigurable AI-based object detection and tracking processor supporting online object learning. IEEE Solid-State Circuits Lett. 2022 , 5 , 78–81. [ Google Scholar ] [ CrossRef ]
  • Kostin, A.; Gorbachev, V. Dataset Expansion by Generative Adversarial Networks for Detectors Quality Improvement. In Proceedings of the CEUR Workshop Proceedings, Saint Petersburg, Russia, 22–25 September 2020. [ Google Scholar ]
  • Shu, X.; Cheng, X.; Xu, S.; Chen, Y.; Ma, T.; Zhang, W. How to construct low-altitude aerial image datasets for deep learning. Math. Biosci. Eng. 2021 , 18 , 986–999. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Costea, D.; Marcu, A.; Slusanschi, E.; Leordeanu, M. Creating roadmaps in aerial images with generative adversarial networks and smoothing-based optimization. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 2100–2109. [ Google Scholar ]
  • Tian, B.; Yan, W.; Wang, W.; Su, Q.; Liu, Y.; Liu, G.; Wang, W. Super-Resolution Deblurring Algorithm for Generative Adversarial Networks. In Proceedings of the 2017 Second International Conference on Mechanical, Control and Computer Engineering (ICMCCE), Harbin, China, 8–10 December 2017; pp. 135–140. [ Google Scholar ]
  • Chou, Y.; Chen, C.; Liu, K.; Chen, C. Stingray detection of aerial images using augmented training images generated by a conditional generative model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1403–1409. [ Google Scholar ]
  • Bi, F.; Lei, M.; Wang, Y.; Huang, D. Remote sensing target tracking in UAV aerial video based on saliency enhanced MDnet. IEEE Access 2019 , 7 , 76731–76740. [ Google Scholar ] [ CrossRef ]
  • Krajewski, R.; Moers, T.; Eckstein, L. VeGAN: Using GANs for augmentation in latent space to improve the semantic segmentation of vehicles in images from an aerial perspective. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 7–11 January 2019; pp. 1440–1448. [ Google Scholar ]
  • Zhou, J.; Vong, C.; Liu, Q.; Wang, Z. Scale adaptive image cropping for UAV object detection. Neurocomputing 2019 , 366 , 305–313. [ Google Scholar ] [ CrossRef ]
  • Chen, Y.; Li, J.; Niu, Y.; He, J. Small object detection networks based on classification-oriented super-resolution GAN for UAV aerial imagery. In Proceedings of the 2019 Chinese Control and Decision Conference (CCDC), Nanchang, China, 3–5 June 2019; pp. 4610–4615. [ Google Scholar ]
  • Xing, C.; Liang, X.; Bao, Z. A small object detection solution by using super-resolution recovery. In Proceedings of the 2019 IEEE 7th International Conference on Computer Science and Network Technology (ICCSNT), Dalian, China, 19–20 October 2019; pp. 313–316. [ Google Scholar ]
  • Truong, N.; Lee, Y.; Owais, M.; Nguyen, D.; Batchuluun, G.; Pham, T.; Park, K. SlimDeblurGAN-based motion deblurring and marker detection for autonomous drone landing. Sensors 2020 , 20 , 3918. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Yu, H.; Li, G.; Su, L.; Zhong, B.; Yao, H.; Huang, Q. Conditional GAN based individual and global motion fusion for multiple object tracking in UAV videos. Pattern Recognit. Lett. 2020 , 131 , 219–226. [ Google Scholar ] [ CrossRef ]
  • Wang, D.; Li, Y. Insulator object detection based on image deblurring by WGAN. Dianli Zidonghua Shebei/Electr. Power Autom. Equip. 2020 , 40 , 188–194. [ Google Scholar ]
  • Song, W.; Li, S.; Chang, T.; Hao, A.; Zhao, Q.; Qin, H. Cross-view contextual relation transferred network for unsupervised vehicle tracking in drone videos. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass, CO, USA, 1–5 March 2020; pp. 1707–1716. [ Google Scholar ]
  • Hou, X.; Zhang, K.; Xu, J.; Huang, W.; Yu, X.; Xu, H. Object detection in drone imagery via sample balance strategies and local feature enhancement. Appl. Sci. 2021 , 11 , 3547. [ Google Scholar ] [ CrossRef ]
  • Kniaz, V.; Moshkantseva, P. Object re-identification using multimodal aerial imagery and conditional adversarial networks. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2021 , 54 , 131–136. [ Google Scholar ] [ CrossRef ]
  • Velumani, K.; Lopez-Lozano, R.; Madec, S.; Guo, W.; Gillet, J.; Comar, A.; Baret, F. Estimates of maize plant density from UAV RGB images using faster-RCNN detection model: Impact of the spatial resolution. Plant Phenomics 2021 , 2021 , 9824843. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Mu, J.; Li, S.; Liu, Z.; Zhou, Y. Integration of gradient guidance and edge enhancement into super-resolution for small object detection in aerial images. IET Image Process. 2021 , 15 , 3037–3052. [ Google Scholar ] [ CrossRef ]
  • Zhao, W.; Yamada, W.; Li, T.; Digman, M.; Runge, T. Augmenting crop detection for precision agriculture with deep visual transfer learning—a case study of bale detection. Remote Sens. 2021 , 13 , 23. [ Google Scholar ] [ CrossRef ]
  • Chen, W.; Li, Y.; Zhao, Z. InsulatorGAN: A transmission line insulator detection model using multi-granularity conditional generative adversarial nets for UAV inspection. Remote Sens. 2021 , 13 , 3971. [ Google Scholar ] [ CrossRef ]
  • Wang, J.; Yang, Y.; Chen, Y.; Han, Y. LighterGAN: An illumination enhancement method for urban UAV imagery. Remote Sens. 2021 , 13 , 1371. [ Google Scholar ] [ CrossRef ]
  • Chen, L.; Liu, G.; Tan, Y.; Sun, Z.; Ge, H.; Duan, F.; Zhu, C. A UA-net based Salient Object Detection Method for UAV. In Proceedings of the 2021 IEEE International Conference on Robotics and Biomimetics (ROBIO), Sanya, China, 27–31 December 2021; pp. 1662–1667. [ Google Scholar ]
  • Liu, G.; Tan, Y.; Chen, L.; Kuang, W.; Li, B.; Duan, F.; Zhu, C. The development of a UAV target tracking system based on YOLOv3-tiny object detection algorithm. In Proceedings of the 2021 IEEE International Conference on Robotics and Biomimetics (ROBIO), Sanya, China, 27–31 December 2021; pp. 1636–1641. [ Google Scholar ]
  • Ahmed, M.; Khan, N.; Ovi, P.; Roy, N.; Purushotham, S.; Gangopadhyay, A.; You, S. Gadan: Generative adversarial domain adaptation network for debris detection using drone. In Proceedings of the 2022 18th International Conference on Distributed Computing in Sensor Systems (DCOSS), Los Angeles, CA, USA, 30 May–1 June 2022; pp. 277–282. [ Google Scholar ]
  • Ma, D.; Fang, H.; Wang, N.; Zhang, C.; Dong, J.; Hu, H. Automatic detection and counting system for pavement cracks based on PCGAN and YOLO-MF. IEEE Trans. Intell. Transp. Syst. 2022 , 23 , 22166–22178. [ Google Scholar ] [ CrossRef ]
  • Wang, W.; Huang, W.; Zhao, H.; Zhang, M.; Qiao, J.; Zhang, Y. Data Enhancement Method Based on Generative Adversarial Network for Small Transmission Line Detection. In Proceedings of the International Conference on Neural Computing for Advanced Applications, Jinan, China, 8–10 July 2022; pp. 413–426. [ Google Scholar ]
  • Park, M.; Bak, J.; Park, S. Advanced wildfire detection using generative adversarial network-based augmented datasets and weakly supervised object localization. Int. J. Appl. Earth Obs. Geoinf. 2022 , 114 , 103052. [ Google Scholar ]
  • Avola, D.; Cannistraci, I.; Cascio, M.; Cinque, L.; Diko, A.; Fagioli, A.; Pannone, D. A novel GAN-based anomaly detection and localization method for aerial video surveillance at low altitude. Remote Sens. 2022 , 14 , 4110. [ Google Scholar ] [ CrossRef ]
  • Ren, K.; Gao, Y.; Wan, M.; Gu, G.; Chen, Q. Infrared small target detection via region super resolution generative adversarial network. Appl. Intell. 2022 , 52 , 11725–11737. [ Google Scholar ] [ CrossRef ]
  • Hu, J.; Zhang, Y.; Zhao, D.; Yang, G.; Chen, F.; Zhou, C.; Chen, W. A robust deep learning approach for the quantitative characterization and clustering of peach tree crowns based on UAV images. IEEE Trans. Geosci. Remote Sens. 2022 , 60 , 4408613. [ Google Scholar ] [ CrossRef ]
  • Balachandran, V.; Sarath, S. A novel approach to detect unmanned aerial vehicle using Pix2Pix generative adversarial network. In Proceedings of the 2022 Second International Conference on Artificial Intelligence and Smart Energy (ICAIS), Coimbatore, India, 23–25 February 2022; pp. 1368–1373. [ Google Scholar ]
  • Xu, Y.; Luan, F.; Liu, X.; Li, X. Edge4fr: A novel device-edge collaborative framework for facial recognition in smart uav delivery systems. In Proceedings of the 2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS), Chengdu, China, 26–28 November 2022; pp. 95–101. [ Google Scholar ]
  • Shimada, T.; Nishikawa, H.; Kong, X.; Tomiyama, H. Depth Estimation from Monocular Infrared Images for Autonomous Flight of Drones. In Proceedings of the 2022 International Conference on Electronics, Information, and Communication (ICEIC), Jeju, Republic of Korea, 6–9 February 2022; pp. 1–6. [ Google Scholar ]
  • Marathe, A.; Jain, P.; Walambe, R.; Kotecha, K. Restorex-AI: A contrastive approach towards guiding image restoration via explainable AI systems. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 3030–3039. [ Google Scholar ]
  • Du, Y.; Yi, Y.; Guo, H.; Tian, X. Vehicle detection in UAV traffic videos using GAN online augmentation: A transfer learning approach. In Proceedings of the Third International Conference on Computer Vision and Data Mining (ICCVDM), Hulunbuir, China, 19–21 August 2022; p. 1251132. [ Google Scholar ]
  • Zhu, B.; Lv, Q.; Tan, Z. Adaptive Multi-Scale Fusion Blind Deblurred Generative Adversarial Network Method for Sharpening Image Data. Drones 2022 , 7 , 96. [ Google Scholar ] [ CrossRef ]
  • Sigillo, L.; Grassucci, E.; Comminiello, D. StawGAN: Structural-aware generative adversarial networks for infrared image translation. In Proceedings of the 2023 IEEE International Symposium on Circuits and Systems (ISCAS), Monterey, CA, USA, 21–25 May 2023; pp. 1–5. [ Google Scholar ]
  • Li, R.; Peng, Y.; Yang, Q. Fusion enhancement: UAV target detection based on multi-modal GAN. In Proceedings of the 2023 IEEE 7th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 15–17 September 2023; pp. 1953–1957. [ Google Scholar ]
  • Wu, H. Research on Motion Trend Enhanced 2D Detection on Drones. In Proceedings of the 2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI), Taiyuan, China, 26–28 May 2023; pp. 221–226. [ Google Scholar ]
Aspect | Scope
Population | Studies that use GAN algorithms applied to images generated by UAVs.
Intervention | Algorithms and image-enhancement techniques for detecting people in search and rescue operations carried out by UAVs.
Comparison | N/A; this is not a comparative study but rather a review of articles utilizing GAN algorithms.
Outcome | Validation of the best GAN solutions for improving edge or target detection; detection of objects and targets specifically focused on the search and localization of people.
Context | Publications centered on the utilization of Generative Adversarial Network (GAN) algorithms for image analysis from Unmanned Aerial Vehicles (UAVs), particularly emphasizing edge detection, object detection, and classification tasks.

Code | Criteria
EX1 | False positives
EX2 | Secondary studies
EX3 | Not directly related

Code | Criteria
IC1 | Does the GAN algorithm aim to assist in detecting edges or objects?
IC2 | Do the authors employ a pre-trained model?
IC3 | Does the paper provide the metrics used for the applied model?
IC4 | Is the solution proposed in the study aimed at images within the visible light spectrum?
IC5 | Does the solution presented in the study target images within the infrared spectrum?
IC6 | Was the SAR algorithm designed to detect people or animals?
IC7 | Does the study utilize any version of YOLO in its development?
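The quality scores in the assessment table follow directly from these criteria: each IC is answered 1.0 (yes), 0.5 (partially), or 0.0 (no), and the Total column is their sum. A minimal sketch of that scoring, with a hypothetical example study (the 0.0/0.5/1.0 scheme is inferred from the values appearing in the table):

```python
def quality_score(ic_scores: dict) -> float:
    """Sum the IC1..IC7 criterion scores into a study's quality total."""
    assert set(ic_scores) == {f"IC{i}" for i in range(1, 8)}, "need IC1..IC7"
    assert all(v in (0.0, 0.5, 1.0) for v in ic_scores.values()), "scores are 0/0.5/1"
    return sum(ic_scores.values())

# Hypothetical study: detects objects, reports metrics, visible + infrared, uses YOLO.
example = {"IC1": 1.0, "IC2": 0.0, "IC3": 1.0, "IC4": 1.0,
           "IC5": 1.0, "IC6": 0.0, "IC7": 1.0}
print(quality_score(example))  # 5.0
```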
Study | Accuracy | Precision | AP | Recall | PSNR | SSIM | AUC | MAP | F1-Score
[Table: per-study marks indicating which of these metrics each study reports; the marks did not survive extraction.]
Study | PSNR | SSIM | Other | Other | Other
[Marks in the PSNR and SSIM columns did not survive extraction; the named "Other" metrics per study were:]
[ ] | SMD, EAV
[ ] | IOU
[ ] | PI
[ ] | AR
[ ] | ROC
[ ] | AG, NIQE
[ ] | PIQE
[ ] | FDR, FNR
[ ] | DSC, FID, ISS-Score, MAE
[ ] | MAP 0.5, MAP 0.95
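PSNR, one of the most commonly reported metrics in these studies, is computed directly from the mean squared error between a reference image and a reconstructed one; SSIM is usually taken from a library such as scikit-image (`skimage.metrics.structural_similarity`). A minimal PSNR sketch with toy 8-bit images (illustrative; not taken from any reviewed study):

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between reference and test images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: identical images give infinite PSNR, a corrupted one a finite score.
ref = np.full((8, 8), 128, dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 138  # single corrupted pixel
print(psnr(ref, ref))                # inf
print(round(psnr(ref, noisy), 1))    # ≈ 46.2 dB
```

Higher PSNR means the GAN-enhanced output is closer to the reference; super-resolution papers typically report it alongside SSIM, which also accounts for structural similarity.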
Cluster | Study | Method Used | Main Advantage | Main Disadvantage | Applicability to SAR
A | [ ] | C-GLO + Faster-RCNN | Increases dataset for stingray detection | Complex setup | Indirect (methodology can be applied to increase dataset for SAR targets)
A | [ ] | GAN for negative samples | Improves target distinction in remote sensing | Needs large data | Indirect (improves training for target distinction in UAV aerial videos)
A | [ ] | VeGAN | Enhances segmentation accuracy in aerial images | High computational demand | Yes (segmentation of vehicles and potential targets in aerial images)
A | [ ] | GAN for real-time data augmentation | Increases accuracy in entity detection | High resource needs | Yes (vehicle, pedestrian and bicycle data augmentation)
A | [ ] | Weight-GAN | Enhances detection | High complexity | Yes (detecting small targets in complex scenes)
A | [ ] | GAN for wildfire images | Addresses data scarcity | Needs tuning | Yes (detecting wildfires to locate potential SAR areas)
A | [ ] | GAN for motion prediction | Improves individual prediction from last image frames | High complexity | Yes (predicting the motion of objects/individuals in SAR scenarios)
B | [ ] | GAN for super-resolution | Enhances resolution in UAV images | High computational demand | Yes (improving resolution for better object detection on visible light spectrum)
B | [ ] | GAN for deblurring | Improves clarity of motion-blurred images | High processing demand | Yes (enhancing clarity of UAV-captured images)
B | [ ] | AMD-GAN + YOLOv5 | Improves deblurring | High complexity | Yes (enhancing image quality for better detection in SAR)
B | [ ] | Multi-task GAN | Combines SR and detection tasks | Extensive training needed | Yes (improving small-object detection in UAV images)
B | [ ] | LighterGAN | Enhances low-light images | Sensitive to lighting | Yes (enhancing visibility in low-light conditions, especially during night-time)

Cluster | Study | Method Used | Main Advantage | Main Disadvantage | Applicability to SAR
C | [ ] | CSRGAN | Improves classification of small objects | High computational demand | Yes (identifying small objects in aerial images)
C | [ ] | GAN for small-object detection | Enhances resolution | High computational demand | Yes (improving detection of small objects)
C | [ ] | RSRGAN | Enhances small-target detection in infrared images | High complexity | Yes (enhancing detection of small targets in infrared imagery)
C | [ ] | Two-branch GAN | Enhances anomaly detection and localization on RGB images | High processing power | Yes (detecting anomalies and potential threats like explosives)
C | [ ] | CycleGAN with CFM | Enhances small-object features | High computational demand | Yes (improving detection of small objects)
C | [ ] | GAN for fusing and harmonizing thermal and RGB images with differing characteristics | Harmonizes modalities | High computational demand | Yes (fusing data from different modalities for better detection of targets)
D | [ ] | GAN for image color-to-thermal translation | Improves cross-modality ReID | Needs calibration | Yes (improving identification in cross-modality scenarios)
D | [ ] | StawGAN | Improves image translation from night-time thermal data to daytime | Sensitive to input | Yes (enhancing night-time image translation for better visibility)
D | [ ] | GAN for fusion | Enhances image quality for detection | High computational demand | Yes (fusing images to improve detection)
E | [ ] | Weather-RainGAN and Weather-NightGAN | Restores weather-corrupted images | High computational demand | Yes (improving detection in adverse weather conditions)
E | [ ] | RSRGAN | Enhances detection in adverse weather | High complexity | Yes (enhancing detection in adverse conditions)
E | [ ] | GAN for deblurring | Improves clarity of images | High processing demand | Yes (enhancing image quality in adverse conditions)
E | [ ] | GAN for adverse weather issues on images | Restores weather-corrupted images | High computational demand | Yes (restoring image quality in adverse weather)

Share and Cite

Correa, V.; Funk, P.; Sundelius, N.; Sohlberg, R.; Ramos, A. Applications of GANs to Aid Target Detection in SAR Operations: A Systematic Literature Review. Drones 2024 , 8 , 448. https://doi.org/10.3390/drones8090448



  • Open access
  • Published: 28 August 2024

Optimized deep CNN for detection and classification of diabetic retinopathy and diabetic macular edema

  • V Thanikachalam,
  • K Kabilan &
  • Sudheer Kumar Erramchetty

BMC Medical Imaging volume 24, Article number: 227 (2024)


Diabetic Retinopathy (DR) and Diabetic Macular Edema (DME) are vision-related complications prominently found in diabetic patients. The early identification of DR/DME grades facilitates the devising of an appropriate treatment plan, which can ultimately prevent visual impairment in more than 90% of diabetic patients. An automatic DR/DME grade detection approach based on image processing is therefore proposed in this work. The retinal fundus image provided as input is pre-processed using the Discrete Wavelet Transform (DWT) with the aim of enhancing its visual quality. The precise detection of DR/DME is supported further by the application of a suitable Artificial Neural Network (ANN) based segmentation technique. The segmented images are subsequently subjected to feature extraction using an Adaptive Gabor Filter (AGF) and feature selection using the Random Forest (RF) technique; the former has excellent retinal vein recognition capability, while the latter has exceptional generalization capability. The RF approach also assists with improving the classification accuracy of the Deep Convolutional Neural Network (CNN) classifier. Moreover, the Chicken Swarm Algorithm (CSA) is used to further enhance classifier performance by optimizing the weights of both the convolution and fully connected layers. The entire approach is validated for its accuracy in determining DR/DME grades using MATLAB software, and displays an excellent accuracy of 97.91%.


Introduction

Diabetes Mellitus (DM) has reached epidemic proportions in terms of global incidence and prevalence in recent years; studies project that by 2030 more than 360 million people worldwide will be affected by DM [ 1 ]. DM is a condition in which the blood glucose level increases excessively in response to insulin insufficiency, leading to impairment of the functioning of the retina, nerves, heart and kidneys. With changes in lifestyle and dietary habits, coupled with factors such as physical inactivity and obesity, DM has become more prevalent and is no longer a disease confined to the affluent [ 2 , 3 ]. DM patients are highly susceptible to developing DR, which results in abnormal retinal blood vessel growth and has a debilitating effect on vision. This progressive microvascular disorder leads to complications such as Diabetic Macular Edema (DME), retinal neovascularization, retinal permeability and retinal ischemia. In DR, abnormal blood vessel growth is driven by the need to supply oxygenated blood to the hypoxic retina. In addition, retinal thickening in the macular region causes DME. It is an indisputable fact that medical treatments are more successful when diseases are discovered in their early stages.

It is therefore crucial to treat DR and DME in their early stages to prevent the serious consequence of vision loss. Moreover, prior to complete blindness, there are rarely any visual or ophthalmic symptoms related to DR [ 4 , 5 , 6 ]. The high blood sugar levels seen in a DM patient damage the retinal blood vessels, resulting in the leakage and accumulation of fluids such as soft exudates, hard exudates, haemorrhages and microaneurysms in the eye. The volume of these accumulated fluids defines the grade of DR, while the distance between the macula and the hard exudates defines the degree of DME [ 7 ]. Through early detection of DR, almost 90% of visual impairment cases can be prevented. Additionally, proper classification of DR/DME severity makes it possible to devise a suitable treatment for DM patients [ 8 ].

Consequently, patients with diabetes are recommended to undergo regular retinal fundus photography, in which retinal images are gathered and analysed by an ophthalmologist. Following the Airlie House DR classification, the Early Treatment Diabetic Retinopathy Study (ETDRS) group and the literature of the Diabetic Retinopathy Study (DRS) group present the classification of DR grades using retinal fundus imaging. A conventional film camera was used in earlier days for capturing fundus images, later substituted by a digital camera; fundus photography captured using a Scanning Laser Ophthalmoscope (SLO) is popular nowadays [ 9 , 10 ]. The manual analysis of fundus images by ophthalmologists is ineffectual in terms of high-throughput screening; therefore, several automatic machine learning and deep learning fundus-photography-based DR/DME screening techniques have been introduced [ 11 , 12 , 13 ].

The image processing approach is the most effective technique for identifying the grades of DR/DME owing to its promising attributes of excellent adaptability, quick processing time and maximum reliability. In an image processing approach, the input retinal fundus image undergoes five different stages, namely pre-processing, segmentation, feature extraction, feature selection and classification. Pre-processing is carried out with the intention of enhancing the quality of the input image by minimizing noise. The mean filter is one of the most prominently used filters for pre-processing owing to its effectiveness in lessening pixel intensity variations and removing redundant pixels.

However, its application is limited by the drawback of introducing pseudo-noise edges [ 14 ]. Linear filters are inept for pre-processing since they blur the edges and contrast of the image, while non-linear filters such as the median filter [ 15 ] and adaptive mean filter [ 16 ] are effective in minimizing noise in the image; on the downside, however, the blurring of vital and edge regions leads to information loss. Therefore, to overcome these drawbacks, DWT is used as the pre-processing technique. The accuracy of identification of DR/DME grades is further improved with the aid of an appropriate segmentation technique that accurately segments the retinal vessels and lesions. The segmentation of the retinal fundus image is hindered by several obstacles such as non-uniform illumination, undefined artefacts, improper image acquisition, complex components and lesion shape variability [ 17 ].

The Fuzzy C-Means clustering method presented in [ 18 ] is a predominantly used segmentation technique in recent research, which forms diverse clusters through image pixel division. The complex nature of this technique, however, prevents its wide-scale implementation. Here, ANN is used for segmentation owing to its simple structure and high segmentation accuracy. Some of the commonly used feature extraction techniques are sparse representation [ 19 ], global histogram normalization [ 20 ] and the Fourier Transform [ 21 ]. However, these techniques are inept in terms of retinal vein recognition. The Gabor filter is suitable for retinal vein extraction, but its application is hindered by the difficulty of parameter configuration. Hence, the Adaptive Gabor Filter (AGF), which resolves the complications in parameter configuration of the conventional Gabor filter, is used in this work for feature extraction.

The choice of an appropriate feature selection technique significantly improves the classification accuracy of the classifier. Feature selection approaches like Maximize Relevancy and Minimize Redundancy (mRMR) and Relief operate with excellent computational efficiency but low accuracy in terms of feature selection. The Genetic Algorithm [ 22 ] is also a commonly used approach for feature selection, but it is inefficient in handling large input samples due to computational complexity. Neural network techniques like the Recurrent Neural Network (RNN) and Probabilistic Neural Network (PNN) require large training data sets and display weak interpretability. Thereby, in this work, RF is selected for feature selection in view of its implementational ease and robust generalization capability. After feature selection comes the process of classification. The machine learning based Logistic Regression [ 23 ] classifier is an efficient technique with excellent discriminative potential, but it is incapable of solving non-linear problems. The CNN [ 24 , 25 ] is a highly accurate technique, capable of quickly identifying and classifying many medical disorders; however, it requires a large number of training images. Hence, a Deep CNN based classification is proposed in this work for the accurate classification of DR/DME grades. Moreover, the Deep CNN classifier is optimized using the Chicken Swarm Algorithm (CSA).

A novel automatic DR/DME detection approach using optimized Deep CNN is proposed in this work. The different phases of the proposed image processing approach involve DWT for pre-processing, ANN for segmentation, AGF for feature extraction, RF for feature selection and finally CS optimized Deep CNN for classification. The retinal fundus images are provided as input for the proposed diagnosis model, and it is evaluated for its performance using MATLAB software.

The major contributions of this work, which improve the model's efficacy and applicability for the identification of DR and DME, are as follows:

While some literature utilizes various optimization techniques, such as Genetic Algorithms or Harris Hawks Optimization, this paper uses the Chicken Swarm Algorithm (CSA) to optimize the deep CNN model, which is unique.

The paper combines several techniques, including DWT for preprocessing, AGF for feature extraction, and RF for feature selection. While these methods have been individually used in other studies, the combination and the specific workflow are distinct.

The novelty lies in the integrated approach combining DWT, ANN for segmentation, AGF, RF, and CSA-optimized Deep CNN for classifying the grades of DR/DME. This combination of methods aims to enhance the detection accuracy.

The proposed method achieves a high accuracy rate of 97.91% in detecting and classifying DR/DME grades, which is presented as an improvement over existing methods.

The paper highlights the effectiveness of using CSA to optimize the Deep CNN classifier, which is a novel application of this algorithm in this context.

Literature study

DR and DME are two common complications of diabetes that can lead to vision loss and blindness if not detected and treated early. In recent research, the application of CNNs has shown promising results in the early detection and classification of DR and DME, ultimately contributing to the development of more effective and automated screening processes in diabetic eye care. Sundaram et al. [ 26 ] discuss an artificial intelligence-based approach for the detection of DR and DME. The model utilizes preprocessing, blood vessel segmentation, feature extraction and classification techniques, and introduces a contrast enhancement methodology using the Harris hawks optimization technique. The model was tested on two datasets, IDRiR and Messidor, and evaluated on its accuracy, precision, recall, F-score, computational time and error rate. This technology aims to assist in the early detection of these severe eye conditions, which are common causes of vision impairment in the working population, and suggests a significant positive impact on the healthcare sector by enabling timely and cost-effective diagnosis.

He et al. [ 27 ] discuss a deep learning approach to classify DR severity and DME risk from fundus images. Three independent CNNs were developed for classifying DR grade, DME risk, and a combination of both. They introduced a fusion method to combine features extracted by the CNNs, aiming to assist clinicians with real-time, accurate assessments of DR. The paper highlights the potential for automated systems to enhance early detection and treatment, and reports classification accuracy rates of 0.65 for DR grade and 0.72 for DME risk. Reyes et al. [ 28 ] discuss a system designed to classify DR and DME, which are common causes of blindness in diabetic patients. The system employs the Inception v3 transfer learning model and MATLAB digital image processing to analyze retinal images without the need for dilating drops, which can have side effects. Tested by medical professionals in the Philippines, the system showed reliable and accurate results, indicating its potential as an assistive diagnostic device for endocrinologists and ophthalmologists.

Kiruthikadevi et al. [ 29 ] discuss the development and implementation of a system designed to detect and assess DR and DME from color fundus images using CNNs. The system aims to automate the detection process to support early diagnosis and effective treatment, as manual diagnosis by clinicians is not feasible at scale, particularly in resource-limited settings. The proposed two-stage approach first verifies the presence of haemorrhages and exudates in fundus images, and then evaluates the macular region to determine the risk of DME. The methodology includes image preprocessing to reduce noise, extraction of regions of interest focusing on the macular area, and generation of motion patterns to imitate the human visual system, all with the broader goal of contributing to the prevention of vision loss due to diabetes-related complications.

Sudha Abirami R and Suresh Kumar G [ 30 ] provide a comprehensive overview of the application of deep learning and machine learning models for the detection and classification of diabetic eye diseases, with a primary focus on DR. Various public datasets, like EyePACS and Messidor, and image preprocessing techniques are used to enhance the images before they are input into machine learning models like CNNs. Transfer learning is emphasized as a critical technique to improve model performance, with most of the past work highlighting the need for classification of all types of diabetic eye diseases, not just DR. Despite powerful commercial AI solutions being available, the review identifies a gap in affordable methods and suggests further development of computer-aided diagnostic models that are efficient and reliable for categorizing various diabetic eye conditions.

Lihteh Wu et al. [ 31 ] discuss the importance of categorizing and staging the severity of DR to provide adequate treatment and prevent visual loss. The paper emphasizes the global epidemic of diabetes mellitus and the associated risk of DR, a leading cause of blindness in the working-age population. DR is characterized by progressive microvascular changes leading to retinal ischemia, neovascularization and macular edema. The International Clinical Disease Severity Scale for DR is highlighted as a simple and evidence-based classification system that facilitates communication among the various healthcare providers involved in diabetes care without the need for specialized examinations. The scale is based on the Early Treatment of DR Study's 4:2:1 rule, relying on clinical examination.

This work [ 32 ] introduces a new framework for classifying DR and DME from retinal images. Using deep learning methods, particularly CNNs, coupled with a modified Grey Wolf Optimizer (GWO) algorithm with variable weights, the research seeks to improve the precision and performance of the classification. This approach addresses the urgent problem of early detection and treatment of diabetic eye diseases, which are major causes of blindness worldwide. The experimental results show that the suggested approach is an effective method for the accurate diagnosis of DR and DME, highlighting its potential for improving diagnostic capabilities and patient care in ophthalmology.

The paper [ 33 ] proposes a robust framework for classifying retinopathy grade and assessing the risk of macular edema in DR images. The study introduces a comprehensive approach that integrates image preprocessing, feature extraction and machine learning algorithms to accurately classify retinal images and predict the likelihood of macular edema. By leveraging a combination of handcrafted features and deep learning techniques, such as CNNs, the framework achieves high classification accuracy and robustness. The proposed methodology addresses the urgent need for automated and accurate diagnosis of DR, providing a valuable tool for clinicians in assessing disease severity and guiding treatment decisions. Experimental results demonstrate the effectiveness of the proposed framework in accurately classifying retinopathy grade and predicting macular edema risk, highlighting its potential for enhancing clinical workflows and improving patient outcomes in diabetic eye care.

In summary, CNNs are a highly effective method for the classification and grading of DR and DME, with various approaches, including feature reduction, attention mechanisms and network fusion methods, contributing to their success. The integration of deep learning techniques with traditional image processing methods and novel architectures has led to significant improvements in the accuracy and efficiency of diagnosing these conditions.

Proposed system framework

DM has become a prominent disorder among middle-aged and older people due to the drastic unhealthy changes witnessed in human food habits and lifestyle; it is no longer considered a disease confined to the affluent. A person who develops DM is affected by many complications, among which DR and DME are the ones with a direct impact on vision. The effects of DR and DME are highly critical, since they eventually lead to complete blindness. Through timely, accurate identification of the degree of DR/DME in a diabetic patient, blindness can largely be prevented [ 34 ]. Thereby, an accurate DR/DME grade detection approach, as illustrated in Fig.  1 , is proposed in this work.

figure 1

Automatic DR/DME grade detection using optimized Deep CNN architecture

The proposed approach uses DWT for pre-processing of the retinal fundus image. Through pre-processing, the unwanted noise that affects the retinal photograph is removed and an enhanced image with uniform resolution is obtained as output. Next, the pre-processed image is subjected to ANN segmentation, which is highly effective in isolating the required region of interest. Subsequently, AGF, with its high retinal vein recognition capability, is used for feature extraction. The vital features that assist classification are then selected from all the extracted features using the RF approach. Finally, the degree of DR/DME is accurately detected using the CS optimized Deep CNN classifier. The CSA is used for optimizing the weights of both the convolution and fully connected layers, resulting in improved classification performance of the Deep CNN. The entire technique is validated in MATLAB software to ascertain its significance in identifying DR/DME grades.

Preprocessing using DWT

Pre-processing is one of the crucial steps undertaken in image processing to improve image quality and thereby enhance the accuracy of DR and DME identification. Here, the pre-processing of fundus images is done using DWT [ 35 ], which is characterised by an excellent image decomposition property. Initially the images are resized to obtain uniform resolution and increased processing speed. Then the green channel image, which carries the vital information, is extracted before undergoing histogram equalization. The resultant image, with improved dynamic range and contrast, is made noise free through filtering.

The fundus image is decomposed into several sub-band images. At each decomposition level, the frequency resolution doubles while the time resolution is halved. The products of decomposition are detail coefficients and approximation coefficients, where the latter are further decomposed into detail and approximation coefficients at every subsequent level. The approximation coefficient forms the first sub-band image, while the remaining coefficients are detail coefficients, resulting in the formation of several sub-band images. The translation parameters and the discrete set of scales used in DWT are τ = n·2^(−m) and s = 2^(−m) respectively. The wavelet family is given as,

The decomposition of x[n] is given as,

where the scaling and wavelet coefficients are specified as d_{j,k}, j = 1…J and c_{j,k}, j = 1…J respectively,

and where the scaling sequence, the wavelet and the complex conjugate are expressed as h_J[n − 2^J k], g_j[n − 2^j k] and (*) respectively. The DWT is applied separately to every column and row of the image. The image X is decomposed into the high-frequency detail coefficients X_H^1, X_V^1 and X_D^1 and the low-frequency approximation coefficient X_A^1.

The image after the N-th level of decomposition is expressed as,

The preprocessed image is then segmented using ANN.
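As a concrete illustration of the decomposition above, the sketch below implements one level of a 2-D DWT in plain NumPy. The Haar wavelet is an assumption chosen for simplicity (the paper does not state its mother wavelet); the function returns the approximation band X_A and the horizontal, vertical and diagonal detail bands X_H, X_V, X_D.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar DWT: returns the approximation (LL)
    band and the horizontal/vertical/diagonal detail sub-bands."""
    img = img.astype(float)
    # Pairwise averages/differences along columns, then rows.
    lo_r = (img[:, 0::2] + img[:, 1::2]) / 2.0
    hi_r = (img[:, 0::2] - img[:, 1::2]) / 2.0
    ll = (lo_r[0::2, :] + lo_r[1::2, :]) / 2.0   # approximation X_A
    lh = (lo_r[0::2, :] - lo_r[1::2, :]) / 2.0   # horizontal detail X_H
    hl = (hi_r[0::2, :] + hi_r[1::2, :]) / 2.0   # vertical detail X_V
    hh = (hi_r[0::2, :] - hi_r[1::2, :]) / 2.0   # diagonal detail X_D
    return ll, lh, hl, hh

# A constant image has all its energy in the approximation band.
flat = np.full((4, 4), 8.0)
ll, lh, hl, hh = haar_dwt2(flat)
print(ll[0, 0], lh.max(), hl.max(), hh.max())  # 8.0 0.0 0.0 0.0
```

Repeating the call on the returned `ll` band gives the multi-level decomposition described in the text.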

Segmentation using ANN

The process of segmentation is, like pre-processing, a crucial procedure, and it is vital for the precise detection of DR and DME owing to its significant role in delineating the complex areas of interest in retinal fundus images. This image subdivision process ends with the complete isolation of the required object of interest. In this work, ANN is used for segmentation; it segments the pre-processed fundus images into areas and pixel groups that stand for microaneurysms, lesions such as haemorrhages, retinal blood vessels, the optic disc and the fovea, in addition to hard and soft exudates. The ANN mimics the working of the human brain in resolving complicated real-world problems, and its structure encompasses three connected sequential layers, normally called the input layer, hidden layer and output layer, as presented in Fig.  2 [ 36 ].

figure 2

Structure of ANN

The number of multipliers in an ANN characterised by N output nodes, W hidden-layer nodes and M inputs is given as,

The computational complexity of the operations and calculations in each layer is reduced by implementing the multipliers using add and shift operations rather than floating-point multiplications. Weights are quantized on the assumption that only a small number of shift and add operations are permitted, owing to the complexity of the hardware implementation. As a result, the quantization value chosen is the one closest to the original number. Consider the following scenario: the maximum number of shift and add operations is 3, and the weights in the ANN are 0.8735 and 0.3811. These numbers may be represented in the following shift-and-add form:

In this form, every weight is converted into a sum of power-of-two terms that can be executed using shift and add operations. The ANN's multiplier modules are therefore broken down into a few adder and shifter modules, one set for each multiplier that is necessary. Even though the computational complexity is reduced by a straightforward quantization with respect to the number of power-of-two operations, an error is still produced, which might be problematic in some circumstances. To solve this issue, a potential error compensation approach is shown below.

Average quantization error reduction

In the typical kind of quantization, weights are quantized using only their own values. As a result, there can be a considerable loss of accuracy due to accumulated quantization errors. Consequently, a compensating error approach is suggested [ 37 ]. There might be some accuracy decrease with each quantization; however, neighbouring image regions are similar, and subsequent weight quantizations can make up for the accuracy loss caused by earlier ones. By doing this, both the average error and the accuracy loss may be decreased. This is accomplished by distributing the generated error into the subsequent weight quantization, after each weight has been quantized. Take the following instance into consideration: three weight coefficients of 0.8000, 0.4250 and 0.4050 are considered, and only three shift and add operations are permitted. The nearest quantized value attainable as a shift-and-add number is shown for each.

Consequently, the average quantization error is

Diffusion of each quantization error into the subsequent phases of weight quantization can lower the average quantization error, as in the example that follows.

The current quantization step takes into account all quantization errors from the earlier steps. Consequently, +0.0500 is added to the current value 0.4250. The following quantization then considers the accumulated values (+0.0500 and 0.0750); that is, they are added to 0.4050 before it is quantized. Because the prior quantization errors are accounted for in the current weight quantization, the average error is lowered. The overall quantization error can be decreased using this method.
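The error-diffusing quantization described above can be sketched as follows. This is an illustrative greedy implementation, not the authors' exact procedure: `quantize_pow2` approximates a weight by at most three signed powers of two (each implementable as a shift), and `quantize_with_diffusion` carries each residual error into the next weight, as in the 0.8000/0.4250/0.4050 example.

```python
import numpy as np

def quantize_pow2(w, n_terms=3):
    """Greedily approximate w as a sum of at most n_terms signed
    powers of two (each term implementable as a shift)."""
    q, r = 0.0, float(w)
    for _ in range(n_terms):
        if r == 0.0:
            break
        e = np.round(np.log2(abs(r)))   # nearest power-of-two exponent
        term = np.sign(r) * 2.0 ** e
        q += term
        r -= term
    return q

def quantize_with_diffusion(weights, n_terms=3):
    """Quantize a weight list, feeding each quantization error into
    the next weight so the average error stays small."""
    out, carry = [], 0.0
    for w in weights:
        q = quantize_pow2(w + carry, n_terms)
        carry = (w + carry) - q          # residual error to diffuse
        out.append(q)
    return out

ws = [0.8000, 0.4250, 0.4050]
plain = [quantize_pow2(w) for w in ws]
diffused = quantize_with_diffusion(ws)
err_plain = sum(w - q for w, q in zip(ws, plain)) / len(ws)
err_diff = sum(w - q for w, q in zip(ws, diffused)) / len(ws)
print(abs(err_diff) <= abs(err_plain))   # True: diffusion lowers the average error
```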

Activation function linearization

The most popular ANN activation function is the hyperbolic tangent, which has the following form.

Thus, both a floating-point division and an exponential operation need to be computed. The overall computation volume can be lowered by linearizing and simplifying the activation function. The domain of the tanh(x) function is divided into four intervals, and a linear approximation function is created in each interval.

With the aid of the piecewise linear function, computation is accomplished without division and multiplication, and all operations take shift or addition form.
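A four-interval piecewise-linear tanh can be sketched as below. The breakpoints and slopes here are illustrative assumptions (the paper's exact intervals are given in its equations, which are not reproduced here); each segment needs only one constant multiply and one add.

```python
import numpy as np

def tanh_pwl(x):
    """Illustrative 4-interval piecewise-linear tanh approximation.
    Breakpoints/slopes are hypothetical, picked for continuity."""
    ax = abs(x)
    if ax <= 0.5:
        y = 0.92 * ax           # near-unity slope around the origin
    elif ax <= 1.0:
        y = 0.60 * ax + 0.16    # shallower middle segment
    elif ax <= 2.0:
        y = 0.20 * ax + 0.56    # approach to saturation
    else:
        y = 1.0                 # saturated region
    return np.sign(x) * min(y, 1.0)

xs = np.linspace(-3, 3, 61)
max_err = max(abs(tanh_pwl(x) - np.tanh(x)) for x in xs)
print(max_err < 0.1)            # True: under 0.1 absolute error on this grid
```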

Feature extraction using adaptive gabor filter (AGF)

The AGF is used for feature extraction from the ANN-segmented retinal fundus images [ 38 ]. Because it resembles the receptive field profiles of simple cells in the human cortex, the Gabor filter is an effective feature analysis function in computer vision. Gabor filters have been effectively used by earlier researchers to exploit a variety of biometric traits. A circular AGF is a complex sinusoidal grating that is oriented and modulated by a 2D Gaussian function.

where j = √−1 and g_σ(x, y) refers to the Gaussian envelope.

Here μ is the frequency of the span-limited sinusoidal grating, θ is the direction in the range 0°–180°, and σ is the standard deviation of the Gaussian envelope. Using Euler's formula, the term G_{σ,μ,θ}(x, y) may be divided into a real part R_{σ,μ,θ}(x, y) and an imaginary part I_{σ,μ,θ}(x, y), as illustrated in (6)–(8). In an image, the real part may be used for ridge detection, while the imaginary part is useful for edge detection.

Regions of uniform brightness, however, should elicit a negligible response from the AGF. The response to such regions is the direct-current (DC) component, and it is eliminated using Eq. (9) so that the Gabor filter is insensitive to illumination:

where (2k + 1)² is the 2D Gabor filter size. As a result, the illumination-robust Gabor transform is defined in (26), where I(x, y) is an image.

According to earlier studies, AGF-based edge identification performs best when the filter parameters match the direction θ, variance σ and centre frequency μ of the input image texture. After AGF-based feature extraction, feature selection using RF is carried out.
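The real-part Gabor kernel with DC removal can be sketched as follows; the parameter values are arbitrary illustrations, and the DC subtraction mirrors the role of Eq. (9) so that uniform-brightness regions produce no response.

```python
import numpy as np

def gabor_kernel(k, sigma, mu, theta):
    """Real part of a (2k+1)x(2k+1) Gabor kernel with the DC
    component subtracted, so uniform regions give zero response."""
    y, x = np.mgrid[-k:k + 1, -k:k + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)     # rotated coordinate
    gauss = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    g = gauss * np.cos(2.0 * np.pi * mu * xr)      # real (ridge-detecting) part
    return g - g.mean()                            # remove DC component

kern = gabor_kernel(k=7, sigma=3.0, mu=0.15, theta=np.pi / 4)
flat = np.full((15, 15), 100.0)                    # uniform-brightness patch
resp = float((kern * flat).sum())
print(abs(resp) < 1e-6)                            # True: no response to flat regions
```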

Feature selection using random forest

The feature selection process aids in the identification of the smallest feature subset, which is pivotal for predicting DR and DME with a high degree of accuracy by eliminating irrelevant or redundant features. Thus, the choice of an effective feature selection process complements the classifier performance in identifying the DR/DME grades. The RF technique is adopted in this work for feature selection on account of its robust anti-interference and generalization capability [ 39 ]. This model-aggregation-based machine learning algorithm is well suited to ill-posed and high-dimensional regression tasks. When employed for feature selection, the RF evaluates the importance score of every feature and determines its impact on the classification prediction. The RF builds decision trees using the Gini index and determines the final class in every tree. The impurity of node v is estimated using the Gini index,

where the fraction of class-i records is specified as f_i. For splitting the tree node v, the Gini information gain of feature X_i is given as,

where the right and left child nodes of node v are specified as v^R and v^L respectively, while the impurity of node v is specified as Gini(X_i, v). The child nodes are assigned fractions of examples referred to as W_R and W_L. The splitting feature is the one that maximizes the impurity reduction. The gain(X_i, v) is used for calculating the importance score of X_i,

where the split nodes and the ensemble size are specified as k ∈ S_{X_i} and n_tree respectively. The normalization of the importance score is,

Here, the maximum importance is specified as Imp_max [0 ≤ Imp_max ≤ 1]. The weighting of gain(X_i, v) utilizes the importance scores of preliminary RFs; thereby the penalized Gini information gain is estimated as,

The regularization level is regulated by the base coefficient of X_i, which is represented as λ_i ∈ [0, 1].

The weight of Imp_norm is controlled by the importance coefficient, represented as γ ∈ [0, 1]. For an X_i without maximum Imp_norm, a smaller λ_i is effectuated by a larger γ, ultimately leading to a larger penalty on gain_G(X_i, v). In the case of maximum penalty,

the gain_G(X_i, v) is,

By injecting the normalized importance score, the weighting of the Gini information gain is achieved. Thus, the smallest appropriate feature subset is selected using RF, and these features are used to enhance classification with the CS optimized Deep CNN.
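The basic quantities above, Gini impurity and the child-weighted Gini information gain, can be computed as in the following plain-NumPy sketch (the regularization and importance-score weighting are omitted for brevity):

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_i f_i^2 of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    f = counts / counts.sum()
    return 1.0 - np.sum(f ** 2)

def gini_gain(parent, left, right):
    """Impurity reduction from splitting `parent` into `left`/`right`,
    with each child weighted by its fraction of the samples (W_L, W_R)."""
    wl = len(left) / len(parent)
    wr = len(right) / len(parent)
    return gini(parent) - wl * gini(left) - wr * gini(right)

# A perfectly separating split removes all impurity.
parent = np.array([0, 0, 0, 1, 1, 1])
gain = gini_gain(parent, parent[:3], parent[3:])
print(round(gain, 3))  # 0.5: parent impurity 0.5, both children pure
```

In an RF, this gain is accumulated per feature over all split nodes and all trees to give the importance score described above.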

Classification using chicken swarm optimized deep CNN

The CS optimized Deep CNN model, widely used for detection tasks, is employed for classifying the grades of DME and DR. The CS algorithm is employed for optimizing the kernel values of the convolution layers and the weights of the fully connected layer [ 40 ]. The features selected using RF are provided as input to the CS optimized Deep CNN. The architecture of the CNN comprises distinct layers, such as convolution and pooling layers, which are grouped into modules. These modules are followed by the fully connected layer that ultimately provides the class labels as outcomes. Modules are usually stacked on top of each other to build a deep model, an increasingly popular practice. The structure of the CS optimized Deep CNN used for the detection of DR/DME grades is given in Fig.  3 .

figure 3

Architecture of CNN

Convolution layers

The convolution layer observes and analyses the features of the given input and performs the operation of a feature extractor. This layer comprises several neurons that are grouped into feature maps. Each neuron belonging to a particular feature map is connected to neurons in its vicinity in the previous layer through its receptive field and a filter bank, which is a trainable weight set. In this layer, the weights and inputs are combined, and the output is passed to the successive layer through a non-linear activation function. The neurons grouped in a feature map share uniform weights, while distinct feature maps carry different weights, enabling the extraction of multiple features from a specific region. The e-th output feature map is expressed as,

Where, the terms \(\:F{M}_{e},*and\:{I}_{M}^{seg}\) represents the  \(\:{\:e}^{th}\) feature map associated convolution filter, convolution operator and the input image respectively. The non-linear activation function is represented using the term \(\:f(\bullet\:)\) .
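As a toy illustration of this feature-map computation, the following sketch applies one shared kernel over an image and a ReLU-style activation (pure Python, illustrative only; like most CNN implementations it computes cross-correlation rather than a flipped-kernel convolution):

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D convolution: slide one shared (weight-sharing) kernel over the image."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            s = sum(image[r + i][c + j] * kernel[i][j]
                    for i in range(kh) for j in range(kw))
            row.append(s)
        out.append(row)
    return out

def relu(fm):
    """Non-linear activation applied element-wise to a feature map."""
    return [[max(0.0, v) for v in row] for row in fm]
```

`relu(conv2d_valid(img, k))` then corresponds to one output feature map; a real layer holds one kernel per feature map.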

Pooling layers

The pooling layer helps attain spatial invariance to translation and distortion in the input, and it decreases the spatial resolution of the feature maps. A common norm is to employ an average pooling layer, which broadcasts the average of a small region of the input to the successive layer. The pooling layer output is given as,

\(x_{i}^{PL}=f\left(\sum_{j\in M_{j}}x_{j}^{PL-1}*K_{ij}+Bi^{PL}\right)\)

where the down-sampling layer and the convolution layer are specified as \(PL-1\) and \(PL\), respectively. The input features of the down-sampling layer are represented as \(x^{PL-1}\), while the additive bias and kernel maps of the convolution layer are specified as \(Bi^{PL}\) and \(K_{ij}\), respectively. The input map selection is referred to as \(M_{j}\), and the output and input indices are indicated as \(i\) and \(j\), respectively. Max pooling instead chooses the crucial (maximum) element of each field.
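The two pooling variants described above can be sketched as follows (non-overlapping windows, pure Python; a simplified illustration rather than the paper's implementation):

```python
def max_pool(fm, size=2):
    """Non-overlapping max pooling: keep the crucial element of each field."""
    return [[max(fm[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fm[0]) - size + 1, size)]
            for r in range(0, len(fm) - size + 1, size)]

def avg_pool(fm, size=2):
    """Non-overlapping average pooling: broadcast the mean of each small region."""
    return [[sum(fm[r + i][c + j] for i in range(size) for j in range(size)) / (size * size)
             for c in range(0, len(fm[0]) - size + 1, size)]
            for r in range(0, len(fm) - size + 1, size)]
```

Either variant halves each spatial dimension for `size=2`, which is what yields the spatial invariance and reduced resolution discussed above.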

Fully connected layers

Several convolution and pooling layers are stacked to obtain an optimal feature representation. The fully connected layer then analyses these representations to perform high-level reasoning. The accuracy of the Deep CNN is further improved with the aid of CS optimization. The flowchart of the CS optimized Deep CNN for identification of DR/DME grades is shown in Fig.  4 .

figure 4

Flowchart of CS optimized Deep CNN
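The final stage, flattening the pooled maps and applying a fully connected layer over the grade labels, can be sketched as follows (layer sizes, function names and the softmax output are illustrative assumptions; the paper does not specify them):

```python
import math

def flatten(feature_maps):
    """Concatenate all pooled feature maps into a single feature vector."""
    return [v for fm in feature_maps for row in fm for v in row]

def dense(vector, weights, biases):
    """Fully connected layer: one weighted sum per output class."""
    return [sum(w * v for w, v in zip(ws, vector)) + b
            for ws, b in zip(weights, biases)]

def softmax(scores):
    """Turn class scores into probabilities over the DR/DME grades."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted grade is then the index of the largest probability; the CS algorithm's role is to tune the `weights` and `biases` of this layer along with the convolution kernels.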

Chicken swarm (CS) optimization

The CS optimization algorithm enhances the classification accuracy of the Deep CNN by optimizing the fully connected and convolution layers. It is modelled on the characteristic behaviour of a chicken swarm, which encompasses roosters, hens and chicks. The rules associated with the algorithm are given as:

The rooster is the head of a chicken swarm, which comprises numerous chicks and hens.

The fitness value of a chicken determines its role and distinguishes it from the others. The rooster is the chicken with the best fitness value, while the chicks are those with the worst fitness values. The rest are termed hens, and a casual mother-child relationship is created between the chicks and hens.

After several time steps, the status of each chicken gets updated.

The rooster guides the others in search of food, while a chick forages for its food by staying in the vicinity of its mother. In a \(DS\)-dimensional space, at time step \(ts\), the positions of the \(N\) virtual chickens are represented as,

\(x_{l,m}^{ts},\; l\in[1,\dots,N],\; m\in[1,\dots,DS]\)

where the mother hens, the chicks, the hens and the roosters are counted using the terms \(NM\), \(NC\), \(NH\) and \(NR\), respectively. The rooster with the best fitness value has a greater chance of obtaining food, and its position is updated as,

\(x_{l,m}^{ts+1}=x_{l,m}^{ts}\times\left(1+Randn\left(0,\sigma^{2}\right)\right)\)

\(\sigma^{2}=\begin{cases}1, & fv_{l}\le fv_{A}\\ \exp\left(\dfrac{fv_{A}-fv_{l}}{\left|fv_{l}\right|+\epsilon}\right), & \text{otherwise}\end{cases}\)

where \(fv\) denotes the fitness value, \(l\) is the rooster index, \(A\) is a rooster index randomly selected from the rooster group with \(A\ne l\), the smallest constant used for evading the zero-division error is \(\epsilon\), and \(Randn(0,\sigma^{2})\) is a Gaussian distribution with SD \(\sigma^{2}\) and mean 0.

The hen position is updated as,

\(x_{l,m}^{ts+1}=x_{l,m}^{ts}+S1\times Rand\times\left(x_{ro1,m}^{ts}-x_{l,m}^{ts}\right)+S2\times Rand\times\left(x_{ro2,m}^{ts}-x_{l,m}^{ts}\right)\)

\(S1=\exp\left(\dfrac{fv_{l}-fv_{ro1}}{\left|fv_{l}\right|+\epsilon}\right),\quad S2=\exp\left(fv_{ro2}-fv_{l}\right)\)

where \(Rand\) is a random number in [0, 1]. The rooster index of the \(l^{th}\) hen's group-mate and an index randomly selected from the swarm are represented as \(ro1\in[1,\dots,N]\) and \(ro2\in[1,\dots,N]\), respectively. When \(fv_{l}>fv_{ro1}\) and \(fv_{l}>fv_{ro2}\), it follows that \(S2<1<S1\). Finally, a chick follows its mother according to

\(x_{l,m}^{ts+1}=x_{l,m}^{ts}+FL\times\left(x_{nm,m}^{ts}-x_{l,m}^{ts}\right)\)

where \(nm\) is the index of the \(l^{th}\) chick's mother, and \(FL\), which lies in [0, 2], governs how closely the chick stays nearby its mother.
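The three update rules can be sketched in Python following the standard chicken swarm formulation cited as [40] (variable names are ours and this is an illustrative sketch, not the authors' code; here the sigma-squared term is treated as the variance of the Gaussian draw):

```python
import math
import random

def rooster_update(x, fv_self, fv_other, eps=1e-12):
    """Rooster move: multiplicative Gaussian perturbation scaled by relative fitness."""
    if fv_self <= fv_other:
        sigma2 = 1.0
    else:
        sigma2 = math.exp((fv_other - fv_self) / (abs(fv_self) + eps))
    return [xi * (1.0 + random.gauss(0.0, math.sqrt(sigma2))) for xi in x]

def hen_update(x, x_r1, x_r2, fv, fv_r1, fv_r2, eps=1e-12):
    """Hen move: follow the group rooster (S1) and a random swarm member (S2)."""
    s1 = math.exp((fv - fv_r1) / (abs(fv) + eps))
    s2 = math.exp(fv_r2 - fv)
    return [xi + s1 * random.random() * (a - xi) + s2 * random.random() * (b - xi)
            for xi, a, b in zip(x, x_r1, x_r2)]

def chick_update(x, x_mother, fl):
    """Chick move: stay in the vicinity of the mother, with FL in [0, 2]."""
    return [xi + fl * (m - xi) for xi, m in zip(x, x_mother)]
```

In the classifier, each position vector would encode candidate kernel values and fully connected weights, and the swarm minimizes the classification error as its fitness.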

Results and discussion

The effectiveness of the proposed automatic DR/DME grade detection model was verified by implementing it in MATLAB. A dataset of 2072 high-resolution retinal fundus images was collected from MESSIDOR [ 41 ] to assess the performance of the proposed CS optimized Deep CNN based diagnostic technique. Among the 2072 image samples, 1402 belong to healthy people without a diabetic condition, while 520 belong to diabetic patients with DR/DME. A total of 150 retinal fundus images were set aside as testing data. The overall details of the selected dataset are tabulated in Table  1 .

figure 5

Input Image

The input retinal fundus image shown in Fig.  5 first undergoes pre-processing. The stages involved in pre-processing are displayed in Fig.  6 . The images are resized to support a uniform resolution. The resized input image then undergoes grayscale conversion, noise reduction and filtering to obtain a pre-processed retinal fundus image of enhanced quality. In addition to producing a pristine noise-free image, the DWT-based pre-processing also reduces the processing time required for executing the entire technique.

figure 6

Stages of Pre-processing
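To illustrate the idea behind DWT-based denoising, here is a one-level 1D Haar transform with hard thresholding of the detail coefficients (a simplified sketch assuming an even-length signal; the actual pre-processing operates on 2D fundus images, and the paper does not state the wavelet or threshold used):

```python
def haar_dwt(signal):
    """One-level Haar DWT: pairwise averages (approximation) and differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse one-level Haar DWT: perfect reconstruction from both bands."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out

def denoise(signal, threshold):
    """Zero out small detail coefficients (noise) and reconstruct."""
    approx, detail = haar_dwt(signal)
    detail = [d if abs(d) > threshold else 0.0 for d in detail]
    return haar_idwt(approx, detail)
```

Large coefficients (edges, vessels) survive the threshold while small high-frequency fluctuations (noise) are suppressed, which is what yields the noise-free, sharp-contrast output described above.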

The DWT pre-processing is compared against prominent filtering techniques, namely the Mean filter, Median filter, Wiener filter and Hilbert Transform, in terms of Root Mean Square Error (RMSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM) and Mean Square Error (MSE). The results obtained are compared in Table  2 .

On analysing the observations given in Table  2 , it is concluded that DWT performs better than the other commonly used pre-processing techniques, and thus succeeds in enhancing the accuracy of the proposed automatic DR/DME diagnostic system.
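For reference, the simpler of these image-quality metrics can be computed directly; a sketch for grayscale images stored as nested lists (SSIM is omitted because it requires windowed local statistics):

```python
import math

def mse(img_a, img_b):
    """Mean Square Error between two equal-sized grayscale images."""
    diffs = [(a - b) ** 2 for ra, rb in zip(img_a, img_b) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def rmse(img_a, img_b):
    """Root Mean Square Error: square root of the MSE."""
    return math.sqrt(mse(img_a, img_b))

def psnr(img_a, img_b, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher means less distortion."""
    m = mse(img_a, img_b)
    if m == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / m)
```

Lower MSE/RMSE and higher PSNR against a clean reference are what Table 2 uses to rank the denoising methods.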

figure 7

Segmentation using ANN outputs

The output obtained using ANN based segmentation is provided in Fig.  7 . From the segmented retinal image, it is noted that the ANN accurately segments the lesions affecting the eyes, and that it segments the DR/DME affected regions without compromising image clarity. The grades of DR are Proliferative DR, Severe Non-Proliferative DR (NPDR), Moderate NPDR and Mild NPDR, while DME is categorized into three grades, namely mild, moderate and severe DME. The final classified output of the CS optimized Deep CNN classifier is shown in Fig.  8 .

As seen in Fig.  8 , the Deep CNN accurately classifies the retinal fundus image as the Severe NPDR condition. The influence of the CS optimized CNN on classification is verified by comparison with existing classifier techniques; the results are tabulated in Table  3 and graphically represented in Fig.  9 . The developed CS optimized Deep CNN attains an enhanced accuracy of 97.91%, sensitivity of 97.82%, specificity of 98.64%, precision of 0.97 and F1 score of 0.98. It is also noted that the CSA is effective in improving the overall performance of the Deep CNN.

figure 8

CS optimized Deep CNN classifier output

figure 9

Classifier comparison in terms of ( a ) Accuracy ( b ) Sensitivity ( c ) Specificity ( d ) Precision and ( e ) F1 Score

To assess the effect of the Random Forest feature selection procedure on model performance, we conducted an ablation study. The findings in Table  4 show that adding feature selection increased the accuracy of the model from 93.85 to 97.91%, along with gains in precision, recall, and F1-score. This demonstrates how effectively the feature selection process improves the model's ability to correctly categorize the various grades of diabetic macular oedema (DME) and diabetic retinopathy (DR), underscoring the crucial role of feature selection in the overall classification performance.

Recent work in deep learning and medical imaging, such as Zhang et al. [ 42 ] and Zhang et al. [ 43 ], has shown the usefulness of region-based integration-and-recalibration networks for nuclear cataract categorization from AS-OCT images. These investigations emphasize the increasing significance of advanced image processing methods in raising diagnostic precision, as does the work of Xiao et al. [ 44 ], who presented a multi-style spatial attention module for cortical cataract classification.

In contrast with existing research, which mainly concentrates on AS-OCT images, our study improves feature extraction from retinal fundus images by using CNNs in conjunction with the Discrete Wavelet Transform (DWT). To further set our method apart, we also used the Chicken Swarm Algorithm (CSA) for model weight optimization. Our strategy provides a unique combination of DWT and CSA, exceeding the performance metrics reported in the referenced publications, which focus on attention mechanisms and recalibration.

Furthermore, our results highlight the potential of deep learning methods in real-time clinical settings, especially for automated DR and DME detection, which, to the best of our knowledge, has not been thoroughly studied with the attention mechanisms employed in existing work. This demonstrates the novelty of bringing these approaches to a new setting in medical imaging and advances the field of automated medical diagnosis.

Conclusion

An automatic DR/DME grade detection approach using an optimized Deep CNN is introduced in this article. The rise in patients affected by DM in recent times has in turn increased the risk of early-age blindness due to DR and DME, so the proposed work aids the earlier detection of this serious medical condition. Through prompt detection and proper treatment, a substantial number of DM patients can be saved from potential blindness. In this approach, the input retinal fundus images are first pre-processed using DWT, delivering noise-free, sharp-contrast retinal images. Then, with the application of an ANN, the exact region of interest is found and segmented. The vital features that support effective classification are obtained using AGF, while RF is used as the feature selection technique. Ultimately, the grades of DR/DME are identified using the CS optimized Deep CNN classifier. The entire approach is evaluated for its accuracy in MATLAB, and from the derived results it is concluded that the CSA is successful in improving the classification accuracy of the Deep CNN classifier. The proposed automatic DR/DME grade detection technique achieves an outstanding accuracy of 97.91%.

Availability of data and materials

IDRiD Dataset: https://ieee-dataport.org/open-access/indian-diabetic-retinopathy-image-dataset-idrid . Messidor Dataset: https://www.adcis.net/en/third-party/messidor/ .

Abbreviations

  • DR: Diabetic Retinopathy
  • DME: Diabetic Macular Edema
  • DWT: Discrete Wavelet Transform
  • ANN: Artificial Neural Network
  • AGF: Adaptive Gabor Filter
  • RF: Random Forest
  • CNN: Convolutional Neural Network
  • CSA: Chicken Swarm Algorithm
  • DM: Diabetes Mellitus
  • SLO: Scanning Laser Ophthalmoscope
  • MRMR: Maximize Relevancy and Minimize Redundancy
  • RNN: Recurrent Neural Network
  • PNN: Probabilistic Neural Network
  • GWO: Grey Wolf Optimizer
  • RMSE: Root Mean Square Error
  • PSNR: Peak Signal to Noise Ratio
  • MSE: Mean Square Error
  • SSIM: Structural Similarity Index

Wu L, Fernandez-Loaiza P, Sauma J, Hernandez-Bogantes E, Masis M. Classification of diabetic retinopathy and diabetic macular edema. World J Diabetes. 2013;4(6):290.


Zou Q, Qu K, Luo Y, Yin D, Ju Y, Tang H. Predicting diabetes mellitus with machine learning techniques. Front Genet. 2018;9:515.

Cole JB, Florez JC. Genetics of diabetes mellitus and diabetes complications. Nat Rev Nephrol. 2020;16:377–90.

Li X, Zhu XHLYL, Fu C-W, Pheng-Ann H. CANet: cross-disease attention network for joint diabetic retinopathy and diabetic macular edema grading. IEEE Trans Med Imaging. 2019;39(5):1483–93.


Markan A, Agarwal A, Arora A, Bazgain K, Rana V, Gupta V. Novel imaging biomarkers in diabetic retinopathy and diabetic macular edema. Ther Adv Ophthalmol. 2020;12:2515841420950513.


Everett LA, Paulus YM. Laser therapy in the treatment of diabetic retinopathy and diabetic macular edema. Curr Diab Rep. 2021;21(9):1–12.

Chaudhary PK, Pachori RB. Automatic diagnosis of different grades of diabetic retinopathy and diabetic macular edema using 2-D-FBSE-FAWT. IEEE Transact Instrument Measure. 2022;71:1–9.

Tu Z, Gao S, Zhou K, Chen X, Fu H, Gu Z, Cheng J, Zehao Yu, Liu J. SUNet: A lesion regularized model for simultaneous diabetic retinopathy and diabetic macular edema grading. In: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). 2020. p. 1378–82.


Kobat SG, Baygin N, Yusufoglu E, Baygin M, Barua PD, Dogan S, Yaman O, et al. Automated diabetic retinopathy detection using horizontal and vertical patch division-based pre-trained DenseNET with digital fundus images. Diagnostics. 2022;12(8):1975.

Horie S, Ohno-Matsui K. Progress of imaging in diabetic retinopathy—from the past to the present. Diagnostics. 2022;12:1684.

Mustafa H, Ali SF, Bilal M, Hanif MS. Multi-Stream Deep Neural Network for Diabetic Retinopathy Severity Classification Under a Boosting Framework. IEEE Access. 2022;10:113172–83.

Wang J, Bai Y, Xia B. Simultaneous diagnosis of severity and features of diabetic retinopathy in fundus photography using deep learning. IEEE J Biomed Health Inform. 2020;24(12):3397–407.

Abdelsalam MM, Zahran MA. A novel approach of diabetic retinopathy early detection based on multifractal geometry analysis for OCTA macular images using support vector machine. In IEEE Access. 2021;9:22844–58.

Thanh DNH, Engínoğlu S. An iterative mean filter for image denoising. IEEE Access. 2019;7:167847–59.

Tang H, Ni R, Zhao Y, Li X. Median filtering detection of small-size image based on CNN. J Vis Commun Image Represent. 2018;51:162–8.

Rakshit M. An efficient ECG denoising methodology using empirical mode decomposition and adaptive switching mean filter. Biomed Signal Process Control. 2018;40:140–8.

He Y, Jiao W, Shi Y, Lian J, Zhao B, Zou W, Zhu Y, Zheng Y. Segmenting diabetic retinopathy lesions in multispectral images using low-dimensional spatial-spectral matrix representation. IEEE J Biomed Health Inform. 2019;24(2):493–502.

Cai W, Zhai B, Liu Y, Liu R, Ning X. Quadratic polynomial guided fuzzy C-means and dual attention mechanism for medical image segmentation. Displays. 2021;70:102106.

Zhai S, Jiang T. Sparse representation-based feature extraction combined with support vector machine for sense‐through‐foliage target detection and recognition. IET Signal Proc. 2014;8(5):458–66.

Menotti D, Najman L, Facon J, de Araújo AA. Multi-histogram equalization methods for contrast enhancement and brightness preserving. IEEE Trans Consum Electron. 2007;53(3):1186–94.

Islam MN, Sulaiman N, Rashid M, Bari BS, Hasan MJ, Mustafa M, Jadin MS. Empirical mode decomposition coupled with fast Fourier transform based feature extraction method for motor imagery tasks classification. In: 2020 IEEE 10th International Conference on System Engineering and Technology (ICSET). 2020. p. 256–61.

Ullah N, Mohmand MI, Ullah K, Gismalla MSM, Ali L, Khan SU, Ullah N. Diabetic Retinopathy Detection Using Genetic Algorithm-Based CNN Features and Error Correction Output Code SVM Framework Classification Model. In: Wireless Communications and Mobile Computing 2022. 2022.


Leontidis G, Hunter A. A new unified framework for the early detection of the progression to diabetic retinopathy from fundus images. Comput Biol Med. 2017;90:98–115.

Khalil H, El-Hag N, Sedik A, El-Shafie W, Mohamed AE, Khalaf AAM, El-Banby GM, Abd El-Samie FI, El-Fishawy AS. Classification of Diabetic Retinopathy types based on Convolution Neural Network (CNN). Menoufia Journal of Electronic Engineering Research, 28(ICEEM2019-Special Issue). 2019:126–53. https://doi.org/10.21608/mjeer.2019.76962 .

Khan S, Haris Z, Abbas, Danish Rizvi SM. Classification of diabetic retinopathy images based on customised CNN architecture. In: 2019 Amity International conference on artificial intelligence (AICAI). 2019. p. 244–8.

Sundaram S, et al. Diabetic retinopathy and diabetic macular edema detection using ensemble based convolutional neural networks. Diagnostics. 2023;13(5):1001. https://doi.org/10.3390/diagnostics13051001 .

He J, Shen L, Ai X, Li X. Diabetic retinopathy grade and macular edema risk classification using convolutional neural networks. 2019. https://doi.org/10.1109/icpics47731.2019.8942426 .

Reyes ACS, et al. SBC based diabetic retinopathy and diabetic macular edema classification system using deep convolutional neural network. 2020;9(3):9–16. https://doi.org/10.35940/ijrte.c4195.099320 .

Kiruthikadevi K. Convolutional neural networks for diabetic retinopathy macular edema from color fundus image. Int J Res Appl Sci Eng Technol (IJRASET). 2021;9(3):1436–40. https://doi.org/10.22214/ijraset.2021.33514 .

Kumar GS, SSAR 1. “A comprehensive review on detecting diabetic eye diseases using deep learning and machine learning models.” Int J Res Appl Sci Eng Technol (IJRASET). 2023;11(9):49–58. https://doi.org/10.22214/ijraset.2023.55596 .

Wu L. Classification of diabetic retinopathy and diabetic macular edema. World J Diabetes. 2013;4(6):290. https://doi.org/10.4239/wjd.v4.i6.290 .

Reddy VPC, Gurrala KK. Joint DR-DME classification using deep learning-CNN based modified grey-wolf optimizer with variable weights. Biomed Signal Process Control. 2022;73:103439.

Balasuganya B, Chinnasamy A, Sheela D. An effective framework for the classification of retinopathy grade and risk of macular edema for diabetic retinopathy images. J Med Imaging Health Inf. 2022;12:138–48. https://doi.org/10.1166/jmihi.2022.3933 .

Gangaputra S, Lovato JF, Hubbard L, Davis MD, Esser BA, Ambrosius WT, Chew EY, Greven C, Perdue LH, Wong WT, Condren A, Wilkinson CP, Agrón E, Adler S, Danis RP; ACCORD Eye Research Group. Comparison of standardized clinical classification with fundus photograph grading for the assessment of diabetic retinopathy and diabetic macular edema severity. Retina. 2013;33(7):1393–9. https://doi.org/10.1097/IAE.0b013e318286c952 .

Xu J, Yang W, Wan C, Shen J. Weakly supervised detection of central serous chorioretinopathy based on local binary patterns and discrete wavelet transform. Comput Biol Med. 2020;127: 104056. https://doi.org/10.1016/j.compbiomed.2020.104056 . Epub 2020 Oct 14. PMID: 33096297.

Abiodun OI, Jantan A, Omolara AE, Dada KV, Mohamed NA, Arshad H. State-of-the-art in artificial neural network applications: a survey. Heliyon. 2018;4(11): e00938. https://doi.org/10.1016/j.heliyon.2018.e00938 . PMID: 30519653; PMCID: PMC6260436.

Shen H, Mellempudi N, He X, Gao Q, Wang C, Wang M. Efficient post-training quantization with fp8 formats. ArXiv. /abs/2309.14592. 2023.

Belgacem R, et al. Applying a set of gabor filter to 2D- retinal Fundus image to detect the Optic nerve Head (ONH). Ann Med Health Sci Res. 2018;8:48–58.

Chen RC, Dewi C, Huang SW, et al. Selecting critical features for data classification based on machine learning methods. J Big Data. 2020;7:52. https://doi.org/10.1186/s40537-020-00327-4 .


Wang H, Chen Z, Liu G. An Improved Chicken Swarm Optimization Algorithm for Feature Selection. In: Qian, Z., Jabbar, M., Li, X, editors Proceeding of 2021 International Conference on Wireless Communications, Networking and Applications. WCNA 2021. Lecture Notes in Electrical Engineering. Springer, Singapore; 2022. https://doi.org/10.1007/978-981-19-2456-9_19 .

Decencière E, et al. Feedback on a publicly distributed image database: the Messidor database. Image Anal Stereol. 2014;33(3):231–4. ISSN 1854-5165.

Zhang X, Xiao Z, Fu H, et al. Attention to region: region-based integration-and-recalibration networks for nuclear cataract classification using AS-OCT images. Med Image Anal. 2022;80: 102499.

Zhang X, Xiao Z, Yang B, et al. Regional context-based recalibration network for cataract recognition in AS-OCT. Pattern Recogn. 2024;147: 110069.

Xiao Z, Zhang X, Zheng B, et al. Multi-style spatial attention module for cortical cataract classification in AS-OCT image with supervised contrastive learning. Comput Methods Programs Biomed. 2024;244: 107958.

Download references

Acknowledgements

We would like to thank VIT Chennai for providing funding for open access publication.

Funding

Open access funding provided by Vellore Institute of Technology. This research received funding from VIT Chennai for open access publication.

Author information

Authors and affiliations.

School of Computer Science & Engineering, Vellore Institute of Technology, Chennai, India

V Thanikachalam, K Kabilan & Sudheer Kumar Erramchetty


Contributions

All authors contributed significantly to the development and completion of this manuscript. Their specific contributions are detailed below:

  • Thanikachalam V: Conceptualization, methodology, formal analysis, investigation, and writing (original draft preparation).
  • Kabilan K: Data curation, software implementation, visualization, and writing (review and editing).
  • Sudheer Kumar Erramchetty: Supervision, project administration, funding acquisition, and writing (review and editing).

Each author has approved the submitted version and has agreed to be personally accountable for their contributions to the work, ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding authors

Correspondence to V Thanikachalam or Sudheer Kumar Erramchetty .

Ethics declarations

Ethics approval and consent to participate.

Not applicable. This study did not involve any human or animal subjects that require ethics approval.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Thanikachalam, V., Kabilan, K. & Erramchetty, S.K. Optimized deep CNN for detection and classification of diabetic retinopathy and diabetic macular edema. BMC Med Imaging 24 , 227 (2024). https://doi.org/10.1186/s12880-024-01406-1

Download citation

Received : 03 July 2024

Accepted : 21 August 2024

Published : 28 August 2024

DOI : https://doi.org/10.1186/s12880-024-01406-1


  • Retinal Fundus Image
  • Discrete Wavelet Transform
  • Artificial neural network
  • Deep convolutional neural network

BMC Medical Imaging

ISSN: 1471-2342


A systematic literature review for load balancing and task scheduling techniques in cloud computing

  • Open access
  • Published: 05 September 2024
  • Volume 57 , article number  276 , ( 2024 )

Cite this article



  • Nisha Devi 1 ,
  • Sandeep Dalal 1 ,
  • Kamna Solanki 2 ,
  • Surjeet Dalal 3 ,
  • Umesh Kumar Lilhore 4 ,
  • Sarita Simaiya 4 &
  • Nasratullah Nuristani 5  

Cloud computing is an emerging technology composed of several key components that work together to create a seamless network of interconnected devices. These interconnected devices, such as sensors, routers, smartphones, and smart appliances, are the foundation of the Internet of Everything (IoE). Huge volumes of data generated by IoE devices are processed and accumulated in the cloud, allowing for real-time analysis and insights. As a result, there is a dire need for load-balancing and task-scheduling techniques in cloud computing. The primary objective of these techniques is to divide the workload evenly across all available resources while also handling other issues, such as reducing execution time and response time, increasing throughput, and detecting faults. This systematic literature review (SLR) analyzes various technologies, comprising optimization and machine learning algorithms, used for load-balancing and task-scheduling problems in a cloud computing environment. To analyze the load-balancing patterns and task-scheduling techniques, we selected a representative set of 63 research articles written in English from 2014 to 2024 using suitable inclusion-exclusion criteria. The SLR aims to minimize bias and increase objectivity by designing research questions about the topic. We focus on the technologies used, the merits and demerits of diverse technologies, gaps within the research, insights into tools, forthcoming opportunities, performance metrics, and an in-depth investigation into ML-based optimization techniques.


1 Introduction

The surge in IoT device usage has led to the emergence of cloud computing as a significant research focus. Cloud computing offers a variety of services across many application areas, with the highest level of flexibility and scalability. The rapid growth of information and communication technologies (ICT) has resulted in the integration of big data with the IoT, revolutionizing cloud services. Within this transformative framework, cloud computing is pivotal in enabling efficient and scalable solutions for managing big data. Numerous cloud service providers enable organizations to obtain the optimal software, storage, and hardware facilities needed to accomplish their goals at a much more affordable cost. Customers subscribe to the services they require under the cloud computing paradigm and sign a service level agreement (SLA) with the cloud vendor, outlining the quality of service (QoS) and conditions of service provision. Table 1 presents the service control that the various cloud service models offer to end-users.

Load balancing is a method that distributes tasks among virtual machines (VMs) using a Virtual Machine Manager (VMM). It assists in handling different types of workloads, such as CPU, network, and memory demands (Buyya 2018; Mishra and Majhi 2020). The cloud computing infrastructure faces three significant challenges: virtualization, distributed frameworks, and load balancing. The load-balancing problem is defined as the allocation of workloads among the processing modules. In a multi-node environment, it is quite probable that certain nodes will experience an excessive workload while others remain inactive. Load unbalancing is a harmful event for cloud service providers (CSPs), as it diminishes the dependability and effectiveness of computing services while also putting at risk the quality of service (QoS) guaranteed in the service level agreement (SLA) between the customer and the cloud service provider (Oduwole et al. 2022).

Verma et al. (2024) introduced a load-balancing methodology, utilizing genetic algorithms (GA), to improve the quality of the telemedicine industry by efficiently adapting to changing workloads and network conditions at the fog level. This adaptability can enhance patient care and provide scalability for future healthcare systems. Walia et al. (2023) cover several emerging technologies in their survey, including Software-Defined Networking (SDN), Blockchain, Digital Twins, Industrial IoT (IIoT), 5G, serverless computing, and quantum computing. These technologies can be incorporated with the current fog/edge-of-things models for improved analysis and to provide business intelligence for IoT platforms. Adaptive resource management strategies are necessary for efficient scheduling and decision-offloading, given the infrastructural characteristics of these computing paradigms.
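The core idea of distributing tasks among VMs can be sketched with a greedy least-loaded policy (a minimal illustration of the problem, not one of the surveyed algorithms; names are ours):

```python
def assign_least_loaded(task_load, vm_loads):
    """Send an incoming task to the currently least-loaded VM; returns its index."""
    target = min(range(len(vm_loads)), key=lambda i: vm_loads[i])
    vm_loads[target] += task_load
    return target

def imbalance(vm_loads):
    """Spread between the busiest and idlest VM; 0 means perfectly balanced."""
    return max(vm_loads) - min(vm_loads)
```

Real balancers must additionally account for heterogeneous VM capacities, migration costs, and SLA constraints, which is what makes the general problem hard.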

1.1 Need for load balancing, factors affecting and associated challenges

Intelligent Computing Resource Management (ICRM) is rapidly evolving to meet the increasing needs of businesses and sectors, driven by the proliferation of Internet-based technologies, cloud computing, and cyber-physical systems. With the rise of information-intensive applications, artificial intelligence, cloud computing, and IoT, intelligent computing monitoring and resource allocation have become crucial (Biswas et al. 2024). Cloud data centers typically need optimization because they are built to handle hundreds of loads, which can result in low resource utilization and energy waste. The goals of load balancing include reduced job execution times, optimal resource utilization, and high system throughput. Load balancing reduces the overall resource waiting time and avoids resource overload (Apat et al. 2023). In terms of equilibrium load distribution, load balancing between virtual machines (VMs) is an NP-hard problem; its difficulty stems from two elements, the huge solution space and the lack of polynomial-bounded computation. The load in a cloud computing environment can be characterized as under-loaded, overloaded, or balanced; identifying overloaded and under-loaded nodes and then redistributing the load across them is critical to load balancing (Santhanakrishnan and Valarmathi 2022). The emergence of these technologies has ushered in a sequence of challenges, including storage capacity, high processing speed, low latency, fast transmission, load balancing, efficient routing, and cost efficiency. Load balancing is a crucial optimisation procedure in cloud computing, and achieving this objective depends on dynamic resource allocation. Some factors that affect load balancing in cloud computing are as follows:

Workload patterns: The variating workload, unpredictable traffic patterns, and heterogeneous applications may affect the efficiency of the cloud system.

Geographical distribution: The cloud data centres are generally located in remote areas that contribute to transmission delays. So, fog computing and edge computing are required to reduce these delays. We must efficiently manage the limited resources of the fog and edge devices.

Cost and budget constraints: Cost considerations have a big impact on load-balancing strategies. It frequently aims to use less expensive resources or minimize idle assets.

Elasticity and monitoring: The dynamic nature of applications necessitates the elasticity and scalability of cloud services. In addition, inadequate monitoring makes it challenging to balance the load.

SLA agreements and breaches: SLA violations are impacted by the services offered by cloud service providers. It is quite necessary to maintain the quality without compromising other factors like throughput, makespan, energy consumption, and cost.

Virtual Machine (VM) migrations: While VM migration can be beneficial to some extent, frequent migrations degrade service quality and add overhead, since transferring a VM, including copying its memory pages to the destination host, takes considerable time.

Resource availability: Insufficient resources, such as CPU, memory, or bandwidth, limit the load balancing efficiency.

Energy consumption: Energy consumption is a critical factor in data centers; load balancing helps reduce it by migrating VMs from overloaded hosts to under-loaded ones.
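The under-loaded/overloaded/balanced classification of hosts, and the migration of load from overloaded to under-loaded hosts mentioned in the energy item above, can be sketched as follows. The 30%/80% thresholds and the idea of shifting fractional load units are illustrative assumptions, not values from the reviewed studies:

```python
def classify(util, low=0.3, high=0.8):
    """Label a host as under-loaded, balanced, or overloaded by CPU utilisation."""
    if util < low:
        return "under-loaded"
    if util > high:
        return "overloaded"
    return "balanced"

def rebalance(hosts, low=0.3, high=0.8):
    """Greedily shift load from overloaded to under-loaded hosts (illustrative)."""
    hosts = dict(hosts)
    over = [h for h, u in hosts.items() if u > high]
    under = [h for h, u in hosts.items() if u < low]
    for o in over:
        for u in under:
            # move just enough load to bring the overloaded host to the threshold
            moved = min(hosts[o] - high, high - hosts[u])
            hosts[o] -= moved
            hosts[u] += moved
            if hosts[o] <= high:
                break
    return hosts
```

For example, `rebalance({"vm1": 0.95, "vm2": 0.10})` moves 15% of capacity from `vm1` to `vm2`, leaving both inside the balanced band.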

Other factors, such as fault tolerance, predictive analytics, network latency, and data security, also affect load balancing in a cloud system. We have divided the technologies reviewed through this SLR into five categories: conventional/traditional, heuristic, meta-heuristic, ML-centric, and hybrid. Traditional approaches to cloud resource allocation and load balancing are time-consuming, unable to yield fast results, and frequently trapped in local optima (Mousavi et al. 2018). In cloud systems where resource requirements are only known at runtime, static load balancing algorithms might not be successful. Dynamic load balancing algorithms, such as ESCE and the Throttled mechanism, analyse resource requirements and usage during runtime, yet they may incur extra cost and overhead. Traditional algorithms also often struggle to scale with the size and complexity of problems. Several articles explore traditional task scheduling algorithms, including Min-min, First come-first serve (FCFS), and Shortest-job-first (SJF); these are now rarely used because of their slow processing and time-consuming behaviour. Heuristic approaches emerged to overcome these shortcomings of conventional methods. Kumar and Sharma (2018) propose a resource provisioning and de-provisioning algorithm that outperforms FCFS, SJF, and Min-min in terms of makespan time and task acceptance ratio; however, task priority is poorly considered, highlighting a limitation in its allocation strategy. Heuristic algorithms demonstrate remarkable scalability and are highly suitable for large-scale optimisation challenges in industries such as manufacturing, banking, and logistics, owing to their efficiency in locating approximate solutions even in enormous search spaces (Mishra and Majhi 2020). Kumar et al.
( 2018 ) presented another heuristic method named ‘Dynamic Load Balancing Algorithm with Elasticity’, showcasing reduced makespan time and an increased task completion ratio. Dubey et al. (2018) introduced a Modified Heterogeneous Earliest Finish Time (HEFT) algorithm, demonstrating improved server workload distribution to reduce makespan time. While promising, both studies lack comprehensive performance evaluations and give limited attention to other Quality of Service (QoS) metrics, such as response time and cost efficiency. Hung et al. (2019) proposed an Improved Max–min algorithm, achieving the lowest completion and optimal response times; it outperformed the conventional RR, Max–min, and Min-min algorithms.
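For concreteness, the classic Min-min heuristic named above repeatedly assigns the task with the smallest earliest completion time to the VM that achieves it. A minimal sketch, with task lengths and VM speeds in arbitrary illustrative units:

```python
def min_min(task_lengths, vm_speeds):
    """Min-min: repeatedly schedule the task with the smallest earliest
    completion time onto the VM that achieves it."""
    ready = [0.0] * len(vm_speeds)           # ready time per VM
    pending = dict(enumerate(task_lengths))  # task id -> length
    schedule = {}
    while pending:
        # earliest completion time over every (task, VM) pair
        t, v, finish = min(
            ((t, v, ready[v] + length / vm_speeds[v])
             for t, length in pending.items()
             for v in range(len(vm_speeds))),
            key=lambda x: x[2],
        )
        schedule[t] = v
        ready[v] = finish
        del pending[t]
    return schedule, max(ready)
```

On tasks `[4, 2, 8]` with VM speeds `[1, 2]`, Min-min piles every task onto the fast VM (makespan 7), which illustrates the load-imbalance weakness often cited against it.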

The development of meta-heuristic algorithms aimed to address the shortcomings of heuristic algorithms, which typically produce approximate rather than optimal solutions. Hybrid techniques have gained traction in recent years, combining heuristic, traditional, and machine-learning approaches. Mousavi et al. (2018) propose a hybrid technique combining Teaching Learning-Based Optimization (TLBO) and Grey Wolf Optimization (GWO), achieving maximized throughput without falling into local optima. Similarly, Behera and Sobhanayak (2024) propose a hybrid GWO-GA algorithm, outperforming GWO, GA (Rekha and Dakshayini 2019), and PSO in terms of makespan, cost, and energy consumption. We also discuss the cloud and fog architecture and its working principles in the upcoming sections.

1.2 Motivation for the study

The Industrial Internet of Things (IIoT) has seen significant advancement and adoption due to the rapid progress of artificial intelligence techniques. In Industry 5.0, hyper-automation involves the deployment of intelligent devices connected to the IIoT, cloud computing, smart robots, agile software, and embedded components. These systems leverage the Industry 5.0 concept, which generates massive amounts of data for hyper-automated communication across cloud computing, digital transformation, human sectors, intelligent robots, and industrial production; managing such big data requires cloud and fog technology (Souri et al. 2024). Similarly, telemedicine, facilitated by fog computing, has revolutionized the healthcare industry by providing remote access to medical treatments; however, ensuring minimal latency and effective resource utilization is essential for providing high-quality healthcare (Verma et al. 2024). Big data in the industrial sector is crucial for predictive maintenance, enabling informed decisions and enhancing task allocation in Industry 4.0, thus necessitating a proficient resource management system (Teoh et al. 2023). The growing demand for load balancing across industries that use cloud/fog services motivated us to compose this review of resource management technologies. Its core contribution is to provide insights into innovative algorithms, their strengths and weaknesses, dataset details, simulation tools, research gaps, and future research directions.

1.3 Objectives of the SLR

After a detailed review of the selected studies, we formulated the following objectives:

To systematically identify and categorise the different load balancing and task scheduling algorithms used in cloud computing.

To address fundamental research questions, such as the effectiveness of different algorithmic approaches, simulation tools, metrics evaluation, etc.

To analyse trends and patterns in the literature, such as the prevalence of Meta-heuristic, Hybrid, and ML-centric approaches, and identify any shifts or emerging paradigms in algorithm design.

To conduct a comparative analysis of the different algorithm categories, identifying strengths, weaknesses, research limitations and trade-offs between them.

To lay the groundwork for future technological advancements by identifying areas where further research and development are needed.

1.4 Research contributions of the SLR

Through this SLR, we have attempted to contribute the following insights, which are based on authentic, selected study material:

We have examined selected articles to identify the research patterns and technological advancements related to resource load balancing in cloud computing. We have devised research questions and attempted to ascertain their solutions.

Using this SLR, we presented a taxonomy of algorithms that provide solutions to the chosen problem.

We provided an in-depth examination of the limitations and advantages of different strategies, along with a thorough comparative study of the techniques, summarised in Table  5 , Table  7 , and Table  8 .

We have discussed the performance metrics related to load balancing and task scheduling in the cloud system. We have also explored the simulation tools that the authors in this field prefer.

We have tabulated some benchmarked datasets (Table  6 ) utilized by various authors to achieve several performance metrics.

Finally, we compiled the research gaps and potential areas for future research.

The paper is structured into nine sections, as shown in Fig.  1 .

figure 1

Various sections and subsections of the SLR

2 Methodology of the systematic literature review

This section lays out the components of a systematic literature review, including the search criteria, review methodology, and research questions. This process involves defining research questions or objectives, identifying relevant databases and sources, and systematically searching and screening for eligible studies. The search term constitutes a string encompassing all essential keywords in the research questions and their corresponding synonyms.

2.1 Search criteria and quality assessment

The keywords utilized to form the search strings are “load balancing”, “task scheduling”, “cloud computing”, and “machine learning.” To extract relevant papers, the following advanced search query was used in the Scopus database:

figure a

We manually searched several computer science publication libraries: the Scopus database, IEEE Computer Society, ResearchGate, ScienceDirect, Springer, and the ACM Digital Library.

A total of 550 papers were found initially using the above-mentioned advanced query. We then applied the inclusion–exclusion criteria provided in Table  2 . Approximately 122 papers were excluded for having zero citations or being behind a paywall. To obtain a more comprehensive analysis, we incorporated cross-referenced studies, manually choosing 35 cross-references from the extracted set that strictly adhered to the search criteria. A final selection of 96 papers was made, of which 63 research articles were considered for the technology survey.

2.2 Inclusion–exclusion criteria

The criteria for accepting or rejecting a research paper for the study are explained in Table  2 below.

Data extraction has been performed to capture key information from each study, such as design, methods or techniques, research limitations, future scope, tools, evaluation metrics, and other significant findings. This captured information was then synthesized and analyzed through a systematic and structured approach and placed in a tabular format to provide insights and draw conclusions about the research questions.

2.3 Research questions

This study aims to answer the following research questions by investigating, comprehending, and evaluating the methods, models, and algorithms utilized to achieve task scheduling and load balancing.

What are the current load balancing and task scheduling techniques commonly used in cloud computing environments?

What are the key factors influencing the performance of load-balancing mechanisms in cloud computing?

Which evaluation metrics are predominantly utilized for assessing the efficacy of load-balancing techniques in cloud computing environments?

Which categories of algorithms dominate recent research on load balancing in cloud computing environments?

Which simulation software tools have garnered prominence in recent scholarly analyses within the domain of cloud computing research?

What insights do the future perspectives within the reviewed literature offer in terms of potential avenues for exploration and advancement within the field?

The next section explores the working principle and architecture of cloud computing, which consists of fog and IoT application layers.

3 Cloud-fog architecture and relevant frameworks

Cloud-fog architecture extends the centralized infrastructure of cloud computing towards the network’s edge. It leverages fog computing, an intermediate layer between cloud servers and end devices, to enable real-time processing, data storage, and analytics closer to the data source. Fog nodes, deployed at the network edge, act as mediators between end devices and the cloud, reducing latency and bandwidth consumption. These nodes can be physical or virtual entities, such as routers, switches, gateways, or even edge servers.

3.1 Working principles

The working principles of cloud-fog architecture involve collaboration between cloud servers, fog nodes, and end devices, creating a distributed computing environment. An end device initiates a request, which first passes through the nearest fog node. The fog node performs initial processing, filtering, and aggregation of the data before sending a subset of it to the cloud for further analysis or storage. By offloading some processing tasks to the fog nodes, cloud-fog architecture reduces the burden on the cloud, improves response times, and enhances overall system performance. During task execution, dynamic cloud load balancing techniques assign tasks to virtual machines and adjust the load on these machines based on the system’s conditions (Tawfeeg et al. 2022). Alatoun et al. (2022) presented the EEIoMT framework for executing critical tasks in the shortest time in smart medical services while balancing energy consumption across tasks; the authors utilized ECG sensors for health monitoring at home. Similarly, Swarna Priya et al. (2020) proposed an energy-efficient framework known as the ‘EECloudIoE framework’ for retrieving information from the IoE cloud network. The authors adopted the Wind Driven Optimization algorithm to form clusters of sensor nodes in the IoE network; the Firefly algorithm is then utilized to select the ‘cluster head’ (CH) for each cluster. Sensor nodes are also used to track physical events across widely dispersed geographic locations. These nodes assist in gathering crucial data from such sites over extended periods; however, they suffer from low battery power, so it is essential to implement energy-efficient systems in the wireless sensor networks that collect this data. Still, cloud computing has some limitations, such as the geographical locations of cloud data centers, network connectivity with end nodes, weather conditions, etc.
To overcome these issues, fog computing emerged as a solution, acting as an arbitrator between end devices and cloud computing and providing storage, networking, and computation services closer to edge devices. The introduction of edge computing has brought about further computing paradigms, such as Mobile Edge Computing (MEC) and Mobile Cloud Computing (MCC). MEC primarily emphasizes 2- or 3-tier applications in the network and mobile devices equipped with contemporary cellular base stations; it improves network efficiency by optimizing content distribution and facilitating the creation of applications (Sabireen and Neelanarayanan 2021). Figure  2 shows how the cloud, fog, and IoT layers work in collaboration.
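The filter-and-aggregate role of a fog node described above can be sketched as follows. The function name, the validity range, and the summary fields are all hypothetical; the point is that only a compact summary, not the raw stream, is forwarded to the cloud:

```python
def fog_preprocess(readings, threshold=100.0):
    """Drop obviously faulty sensor readings and summarise the rest,
    so that only a small aggregate is forwarded to the cloud layer."""
    valid = [r for r in readings if 0.0 <= r <= threshold]
    if not valid:
        return None  # nothing worth forwarding upstream
    return {
        "count": len(valid),
        "mean": sum(valid) / len(valid),
        "max": max(valid),
    }
```

For a burst of five raw readings, two of which are out of range, the fog node would forward a three-field summary instead of five values, which is the latency- and bandwidth-saving behaviour the paragraph describes.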

figure 2

The fog extends the cloud closer to the devices producing data (Swarna Priya, et al 2020 ; Vergara et al. 2023 )

3.2 Cloud computing layer

Cloud computing facilitates virtualization technology, which combines distributed and parallel processing. Using centralized data centers, it transfers computations from on-premises infrastructure to off-premises facilities. It has become an advanced technology within the swiftly expanding realm of computing paradigms owing to two principles: (1) ‘Dynamic Provisioning’ and (2) ‘Virtualization Technology’ (Tripathy et al. 2023). Dynamic provisioning is a fundamental concept in cloud computing: the automated process of allocating and adjusting computing resources to meet the changing needs of cloud-based applications and services. Virtual network embedding is essential to load balancing in cloud computing, as it ensures that virtual network requests are mapped onto physical resources in an effective and balanced manner. By embedding virtual networks onto physical machines effectively, load-balancing algorithms can divide network traffic and workload evenly across the network infrastructure, preventing any single resource from becoming overloaded. Virtual network embedding may be combined with load-balancing strategies such as least connections, weighted round-robin, and round-robin to maximize resource usage and network performance (Apat et al. 2023; Santhanakrishnan and Valarmathi 2022).
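A minimal sketch of weighted round-robin, one of the dispatching strategies named above. The VM names and weights are illustrative, and this coarse expand-and-cycle form (rather than the smoother interleaved variant used by production load balancers) is chosen for brevity:

```python
import itertools

def weighted_round_robin(vms):
    """Yield VM names in proportion to their integer weights.
    vms: list of (name, weight) pairs."""
    expanded = [name for name, weight in vms for _ in range(weight)]
    return itertools.cycle(expanded)
```

With weights 2:1, `vm1` receives two of every three requests:

```python
rr = weighted_round_robin([("vm1", 2), ("vm2", 1)])
[next(rr) for _ in range(6)]  # ['vm1', 'vm1', 'vm2', 'vm1', 'vm1', 'vm2']
```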

3.3 Fog computing layer

Cisco researchers first used the term fog computing in 2012 to address the shortcomings of cloud computing. To offer fast and reliable services to mobile consumers, fog computing enhances their experiences by introducing a middle fog layer between consumers and the cloud. It is an improvement over cloud-based networking and computing services. The architecture of fog computing consists of a fog server as a fog device or fog node deployed in the proximity of IoT devices to provide resources for different applications. As a promising concept, fog computing introduces a decentralized architecture that enhances data processing capabilities at the network’s edge (Goel and Tiwari 2023 ). However, the limited resources in the fog computing model undoubtedly make it difficult to support several services for these Internet of Things applications. A prompt choice must be made regarding load balancing and application placement in the fog layer due to the diverse and ever-changing nature of application requests from IoT devices. Therefore, it is crucial to allocate resources optimally to maintain service continuity for end customers (Vergara et al. 2023 ). Unlike cloud computing, fog utilizes distributed computing with devices near clients with good computing capacity and diverse organizations for global connectivity. Mahmoud et al. ( 2018 ) introduced a new fog-enabled cloud IoT model by observing that cloud IoT is not the best option in situations where energy usage and latency are important considerations, such as the healthcare sector, where patients need to be monitored in real-time without delay. The energy allocation method used to load jobs into a fog device serves as the foundation for the entire concept. Table 3 presents a comparison between the features of cloud and fog computing paradigms.

3.4 IoT applications layer

Cloud-fog architecture finds applications in various domains, including IoT, healthcare (Alatoun et al. 2022 ), transportation, smart cities, and industrial automation (Dogo et al. 2019 ). Healthcare providers can leverage fog nodes for real-time patient monitoring, while industrial automation systems can benefit from edge analytics for predictive maintenance. Telemedicine, smart agriculture, and Industry 4.0 and 5.0 are other areas that employ IoT applications.

4 Literature review on load balancing (LB) and task scheduling

We have curated a representative collection of 63 research articles for a technology review, covering the period from 2014 to 2024. The main target of LB is to spread the workload across available resources and optimize the overall turnaround time. Before 2014, traditional methods such as FCFS, SJF, Min-min, Max–min, and RR were recognized for their poor processing speeds and time-consuming job scheduling and load balancing. Konjaang et al. (2018) examine the difficulties associated with the conventional Max–min algorithm and propose the Expa-Max–Min method as a possible solution; the algorithm prioritizes cloudlets with the longest and shortest execution times to schedule them efficiently. The workload can be divided into memory capacity, CPU load, and network load. Meanwhile, load balancing techniques combined with virtual machine management (VMM) are employed in cloud computing to distribute the load among virtual machines (Velpula et al. 2022). Hung et al. (2019) introduced an enhanced Max–min algorithm called MMSIA, which aims to improve completion time in cloud computing by using machine learning to cluster requests and optimize the utilization of virtual machines. The system allocates large requests to the virtual machines (VMs) with the lowest utilization percentage, improving processing efficiency, and integrates supervised learning into the Max–min scheduling algorithm to enhance clustering efficiency. Kumar et al. (2018) state that the updated HEFT algorithm creates a Directed Acyclic Graph (DAG) for all jobs submitted to the cloud and assigns computation costs and communication edges across processing resources.

The ordering of tasks is determined by their execution priority, which considers the average time to complete each task on all processors and the communication costs between predecessor tasks. The tasks are then placed in a list in decreasing order of priority and assigned to processors based on the shortest execution time. Similarly, Seth and Singh (2019) propose the Dynamic Heterogeneous Shortest Job First (DHSJF) model for work scheduling in cloud computing systems with varying capabilities. The approach entails establishing a heterogeneous cloud computing environment, dynamically generating cloudlet lists, and analysing workload and resource heterogeneity to minimize the makespan. The DHSJF algorithm efficiently schedules dynamic requests to various resources, resulting in optimized resource utilization, and overcomes the limitations of the conventional Shortest Job First (SJF) method. A task scheduling process is shown graphically in Fig.  3 .
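The conventional Max–min policy discussed above (the baseline that Expa-Max–Min and MMSIA refine) can be sketched as follows. This is the textbook heuristic, not any paper's augmented variant; task lengths and VM speeds are arbitrary illustrative units:

```python
def max_min(task_lengths, vm_speeds):
    """Max-min: compute each task's earliest completion time across VMs,
    then schedule the task whose earliest completion time is largest first."""
    ready = [0.0] * len(vm_speeds)           # ready time per VM
    pending = dict(enumerate(task_lengths))  # task id -> length
    order = []
    while pending:
        # for each task, the VM giving its minimum completion time
        best_vm = {
            t: min(range(len(vm_speeds)),
                   key=lambda v, L=length: ready[v] + L / vm_speeds[v])
            for t, length in pending.items()
        }
        # select the task whose minimum completion time is the maximum
        t = max(pending,
                key=lambda t: ready[best_vm[t]] + pending[t] / vm_speeds[best_vm[t]])
        v = best_vm[t]
        ready[v] += pending.pop(t) / vm_speeds[v]
        order.append((t, v))
    return order, max(ready)
```

On tasks `[4, 2, 8]` with VM speeds `[1, 2]`, Max–min schedules the long task first and achieves makespan 5, beating Min-min's 7 on the same instance, which is why long-task-first variants are favoured for balance.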

figure 3

Working of task scheduling in cloud computing

Another technique that many authors increasingly employ is GWO. The GWO technique maps the roles of grey wolves onto candidate solutions for distributing jobs or equalizing workloads within a network or computing system. The alpha wolf leads the pack, representing the best solution found so far. The beta and delta wolves, representing the second and third best solutions, assist the alpha in decision-making, while the omega wolves, representing the remaining solutions, follow the top three. The algorithm models the exploration and exploitation stages of the search through a repetitive process of encircling, hunting, and attacking the prey. In 2020, Farrag et al. (2020) published a work examining the application of the Ant-Lion Optimizer (ALO) and Grey Wolf Optimizer (GWO) to job scheduling in cloud computing. The objective of ALO and GWO is to optimize the makespan of tasks in cloud systems by effectively dividing the workload. Although ALO and GWO surpass the Firefly Algorithm (FFA) in minimizing makespan, their performance relative to PSO varies depending on the specific conditions. Reddy et al. (2022) introduced the AVS-PGWO-RDA scheme, which utilizes Probabilistic Grey Wolf Optimization (PGWO) in the load balancer unit to find the ideal fitness value for selecting user tasks and allocating resources for tasks with lower complexity and time consumption. The AVS approach is employed to cluster related workloads, and the RDA-based scheduler ultimately assigns these clusters to suitable virtual machines (VMs) in the cloud environment. Similarly, Janakiraman and Priya (2023) introduced the Hybrid Grey Wolf and Improved Particle Swarm Optimization Algorithm with Adaptive Inertial Weight-based multi-dimensional Learning Strategy (HGWIPSOA).
This algorithm combines the Grey Wolf Optimization Algorithm (GWOA) with Particle Swarm Optimization (PSO) to efficiently assign tasks to Virtual Machines (VMs) and improve the accuracy and speed of task scheduling and resource allocation in cloud environments. The suggested system effectively tackles the limitations of previous LB approaches by preventing premature convergence and enhancing global search capability. As a result, it provides several benefits, including improved throughput, reduced makespan, reduced degree of imbalance, decreased latency, and reduced execution time. The combination of GWO with GA, as demonstrated by Behera and Sobhanayak ( 2024 ), yields superior results. It provides faster convergence and minimum makespan in large task scheduling scenarios.
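The encircling/hunting behaviour described above reduces, in one dimension, to the update below. This is a generic GWO step on scalar positions, not any specific paper's scheduler; the coefficient handling is simplified, and the control parameter `a` would normally decrease from 2 to 0 over iterations:

```python
import random

def gwo_step(wolves, fitness, a=1.0):
    """One Grey Wolf Optimizer iteration on 1-D positions (illustrative,
    minimisation). Alpha/beta/delta are the three fittest wolves; every
    wolf moves toward the average of three leader-guided positions."""
    ranked = sorted(wolves, key=fitness)
    leaders = ranked[:3]                     # alpha, beta, delta
    new_wolves = []
    for x in wolves:
        guided = []
        for leader in leaders:
            r1, r2 = random.random(), random.random()
            A = 2 * a * r1 - a               # |A| > 1 explores, |A| < 1 exploits
            C = 2 * r2
            D = abs(C * leader - x)          # distance to the leader
            guided.append(leader - A * D)
        new_wolves.append(sum(guided) / 3)
    return new_wolves
```

As `a` shrinks toward 0, `A` vanishes and every wolf collapses onto the mean of the three leaders, which is the exploitation (attacking) phase of the metaphor.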

Beginning around 2014, metaheuristic and hybrid metaheuristic algorithms were used to address cloud computing optimization and load-balancing challenges. Zhan et al. (2014) suggested a load-aware genetic algorithm called LAGA, a modified version of the genetic algorithm (GA). LAGA employs the TLB model to optimize makespan and load balance, defining a new fitness function to find suitable schedules that preserve makespan while maintaining load balance. Rekha and Dakshayini (2019) introduced a task allocation method for cloud environments that utilizes a Genetic Algorithm. The purpose of this strategy is to minimize job completion time and enhance overall performance. The algorithm considers multiple objectives, such as energy consumption and quick responses, to make the best resource allocation decisions, and the evaluation findings exhibit superior throughput, indicating its efficacy in task allocation decision-making. In 2023, Mishra and Majhi (2023) proposed a hybrid meta-heuristic technique called GAYA, which combines the Genetic Algorithm (GA) and JAYA algorithm to efficiently schedule dynamically independent biological data. The GAYA algorithm showcases improved exploitation and exploration abilities, rendering it a highly viable solution for scheduling dynamic medical data in cloud-based systems. Brahmam and Vijay Anand (2024) developed a model called VMMISD, combining a Genetic Algorithm (GA) with Ant Colony Optimization (ACO) for resource allocation. The system also utilizes combined optimization techniques, iterative security protocols, and deep learning algorithms to enhance the efficiency of load balancing during virtual machine migrations.
The model employs K-means clustering, Fuzzy Logic, Long Short-Term Memory (LSTM) networks, and Graph Networks to anticipate workloads, make decisions, and measure the affinity between virtual machines (VMs) and physical machines. Behera and Sobhanayak (2024) also proposed a hybrid approach combining the Grey Wolf Optimizer (GWO) and Genetic Algorithm (GA). The hybrid GWO-GA algorithm effectively reduces makespan, energy consumption, and computing costs, surpassing conventional algorithms in performance, and exhibits accelerated convergence on extensive scheduling problems, offering an edge over earlier techniques.
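A LAGA-style fitness that trades off makespan against load balance might look like the sketch below. The equal weighting and the max-minus-min imbalance measure are our assumptions for illustration; the paper's exact fitness function is not reproduced here:

```python
def fitness(chromosome, task_lengths, n_vms, w=0.5):
    """Hypothetical GA fitness for scheduling: chromosome[i] is the VM
    assigned to task i. Combines makespan with a load-imbalance penalty;
    lower values are better."""
    loads = [0.0] * n_vms
    for task, vm in enumerate(chromosome):
        loads[vm] += task_lengths[task]
    makespan = max(loads)
    imbalance = max(loads) - min(loads)
    return w * makespan + (1 - w) * imbalance
```

A balanced assignment of two equal tasks across two VMs scores 2.0, while dumping both on one VM scores 8.0, so selection pressure in the GA favours schedules that keep makespan low without skewing the load.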

The combination of autoscaling and reinforcement learning (RL) has garnered significant attention in recent years due to its ability to allocate resources proactively in dynamic environments (Joshi et al. 2024). Deep reinforcement learning (DRL) is a promising technique that automates the prediction of workloads. DRL can make immediate resource allocation decisions based on real-time monitoring of the system’s workload and performance parameters, effectively fulfilling the system’s present demands. Ran et al. (2019) introduced a task-scheduling strategy based on deep reinforcement learning (DRL) in 2019; the working of the DRL-based load balancer is shown in Fig.  4 . This method assigns tasks to various virtual machines (VMs) dynamically, decreasing average response time and ensuring load balancing. The technique is examined on a tower server with specific configurations and software tools, showcasing its efficacy in balancing load across VMs while adhering to service level agreement (SLA) limits. The approach employs DRL and deep deterministic policy gradient (DDPG) networks to make optimal scheduling decisions by learning directly from experience without prior knowledge. In addition, Jyoti and Shrimali (2020) employed DRL in their research and proposed a technique called Multi-agent ‘Deep Reinforcement Learning-Dynamic Resource Allocation’ (MADRL-DRA) in the Local User Agent (LUA) and Dynamic Optimal Load-Aware Service Broker (DOLASB) in the Global User Agent (GUA) to improve quality of service (QoS) metrics by allocating resources dynamically. The method demonstrates enhanced performance in terms of execution time, waiting time, energy efficiency, throughput, resource utilization, and makespan when compared to traditional approaches. Tong et al.
( 2021 ) present a new technique for task scheduling using deep reinforcement learning (DRL) that aims to reduce the imbalance of virtual machines (VMs) load and the rate of job rejection while also considering service-level agreement limitations. The proposed DDMTS method exhibits stability and outperforms other algorithms in effectively balancing the Degree of Imbalance (DI) and minimizing job rejection rate. The precise configurations of state, action, and reward in the DDMTS algorithm are essential for its efficacy in resolving task scheduling difficulties using the DQN algorithm.
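The state/action/reward framing that these papers stress can be illustrated with a tabular Q-learning stand-in (the reviewed methods use deep networks; the encodings here, bucketed VM loads as state, the chosen VM as action, and negative load imbalance as reward, are illustrative assumptions rather than any paper's exact design):

```python
import random
from collections import defaultdict

def train_scheduler(n_vms=3, episodes=200, alpha=0.1, gamma=0.9, eps=0.2, seed=1):
    """Toy RL scheduler: learns Q-values for assigning tasks to VMs so as
    to keep the per-VM loads balanced. Returns the learned Q-table."""
    rng = random.Random(seed)
    Q = defaultdict(float)

    def state_of(loads):
        # coarse state: each VM's load bucketed into 4 levels
        return tuple(min(int(l // 2), 3) for l in loads)

    for _ in range(episodes):
        loads = [0.0] * n_vms
        for _ in range(12):                        # 12 tasks per episode
            s = state_of(loads)
            if rng.random() < eps:                 # epsilon-greedy choice
                a = rng.randrange(n_vms)
            else:
                a = max(range(n_vms), key=lambda v: Q[(s, v)])
            loads[a] += rng.uniform(0.5, 1.5)      # simulated task length
            r = -(max(loads) - min(loads))         # reward: penalise imbalance
            s2 = state_of(loads)
            best_next = max(Q[(s2, v)] for v in range(n_vms))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q
```

The design mirrors the paragraph's point: the usefulness of the learned policy hinges entirely on how state, action, and reward are configured.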

figure 4

Working of load balancer in cloud computing

Double Deep Q-learning has been employed to address load-balancing concerns. Swarup et al. (2021) introduced a method utilizing Deep Reinforcement Learning (DRL) to address job scheduling in cloud computing. Their approach employs a Clipped Double Deep Q-learning algorithm to minimize computational costs while adhering to resource and deadline constraints. The algorithm employs target network and experience replay techniques to maximize its objective function, and balances exploration and exploitation using the ε-greedy policy: actions are selected randomly for exploration or based on Q-values for exploitation, maintaining a balance between attempting new alternatives and utilizing existing knowledge. In the same way, Kruekaew et al. employ Q-learning to optimize job scheduling and resource utilization. Their method, MOABCQ, integrates the Artificial Bee Colony (ABC) algorithm with Q-learning to optimize task scheduling, resource utilization, and load balancing in cloud environments; it exhibited superior throughput and a higher Average Resource Utilization Ratio (ARUR) than alternative algorithms, with Q-learning enhancing the efficiency of the ABC algorithm. Figure  5 presents the hybridisation trend of various techniques observed in the literature review.
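The clipped double-estimator target and the ε-greedy policy described above can be written in tabular form as follows (the actual method operates on deep networks with replay buffers; this stripped-down tabular version only illustrates the update rule):

```python
import random

def clipped_double_q_update(Q1, Q2, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    """One clipped Double Q-learning update: the target takes the minimum
    of both value estimates at the greedy next action, curbing the
    over-estimation bias of plain Q-learning."""
    a_star = max(actions, key=lambda x: Q1.get((s2, x), 0.0))
    target = r + gamma * min(Q1.get((s2, a_star), 0.0), Q2.get((s2, a_star), 0.0))
    Q1[(s, a)] = Q1.get((s, a), 0.0) + alpha * (target - Q1.get((s, a), 0.0))

def epsilon_greedy(Q, s, actions, eps, rng=random):
    """Pick a random action with probability eps, otherwise the greedy one."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))
```

In practice the two tables swap roles (or the update alternates between them) so that both estimators keep learning; that bookkeeping is omitted here for brevity.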

figure 5

Hybridization trend of some techniques as observed in SLR

Furthermore, the swarm-based technique known as Particle Swarm Optimisation (PSO) is increasingly being adopted by researchers to address challenges related to load balancing in cloud computing. Combining PSO with other prominent methods helps attain near-optimal solutions through extensive exploration of the search space. Panwar et al. (2019) introduced a TOPSIS-PSO method designed for non-preemptive task scheduling in cloud systems. The approach tackles task scheduling challenges by employing the TOPSIS method to evaluate tasks according to execution time, transmission time, and cost, with optimisation subsequently performed using PSO. The proposed method optimises the makespan, execution time, transmission time, and cost metrics. In 2020, Agarwal et al. (2020) introduced a Mutation-based Particle Swarm Optimization (PSO) algorithm to tackle issues such as premature convergence, decreased convergence speed, and being trapped in local optima. The suggested method seeks to minimise performance characteristics such as makespan time and enhance the fitness function in cloud computing. In 2021, Negi et al. (2021) introduced a hybrid load-balancing algorithm in cloud computing called CMODLB, which combines machine learning and soft computing techniques. The method employs artificial neural networks, fuzzy logic, and clustering techniques to distribute the workload evenly. The system utilises Bayesian optimization-based augmented K-means for virtual machine clustering and the TOPSIS-PSO method for work scheduling, while VM migration decisions are determined with an interval type-2 fuzzy logic system that relies on load conditions. Although these algorithms demonstrated strong performance, they do not consider the specific type of content used by users. Adil et al. (2022) found that knowledge of the type of content in tasks can significantly enhance scheduling efficiency and reduce the workload on virtual machines (VMs).
The PSO-CALBA system categorises user tasks into content types such as video, audio, image, and text using a Support Vector Machine (SVM) classifier. Classification starts from file fragments, since a task may comprise fragments of different content types, and applies the Radial Basis Function (RBF) kernel to handle the challenge of high-dimensional feature data. Pradhan et al. ( 2022 ) addressed the related problem of handling complex, high-dimensional data in a cloud setting by combining deep reinforcement learning (DRL) with parallel particle swarm optimisation (PSO). The technique optimises rewards by minimising both makespan and energy consumption while maintaining high accuracy and fast execution; it iteratively improves accuracy, performs well in dynamic environments, and can handle varied tasks in cloud environments. Jena et al. ( 2022 ) found that the QMPSO algorithm distributes the workload evenly among virtual machines, improving makespan, throughput, and energy utilisation while reducing task waiting time. QMPSO hybridises modified Particle Swarm Optimisation (MPSO) with improved Q-learning, modifying particle velocity based on the best action generated through Q-learning. It employs dynamic resource allocation to distribute tasks among VMs with varying priorities, minimising task waiting time and maximising VM throughput; this strategy is highly efficient for independent tasks.
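To illustrate the RBF-kernel idea behind content-aware classification, a fragment can be reduced to a byte-histogram feature vector and matched against labelled training fragments. Note that PSO-CALBA uses a full SVM; the simplified prototype classifier below is only a hedged stand-in for that idea, and every identifier here is illustrative.

```python
import math

def byte_histogram(data, bins=16):
    # Coarse byte-value histogram as the fragment's feature vector.
    hist = [0.0] * bins
    for b in data:
        hist[b * bins // 256] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def rbf_kernel(x, y, gamma=8.0):
    # Radial Basis Function kernel: exp(-gamma * ||x - y||^2).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

class RbfContentClassifier:
    """Label a file fragment with the content type whose training
    fragments score highest under the RBF kernel."""

    def __init__(self, gamma=8.0):
        self.gamma = gamma
        self.samples = []  # (feature vector, label) pairs

    def fit(self, fragments, labels):
        self.samples = [(byte_histogram(f), y)
                        for f, y in zip(fragments, labels)]
        return self

    def predict(self, fragment):
        x = byte_histogram(fragment)
        scores = {}
        for feat, label in self.samples:
            scores[label] = scores.get(label, 0.0) + rbf_kernel(x, feat, self.gamma)
        return max(scores, key=scores.get)
```

Text fragments concentrate in the printable-ASCII bins while compressed media fragments spread nearly uniformly, which is why even this crude kernel comparison separates them.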

Load balancing poses a significant challenge in fog computing due to limited resources. Talaat et al. ( 2022 ) introduced Effective Dynamic Load Balancing (EDLB), which combines Convolutional Neural Networks (CNN) and Multi-Objective Particle Swarm Optimisation (MPSO) to optimise resource allocation in fog environments and maximise resource utilisation. EDLB comprises three primary modules: the Fog Resource Monitor (FRM), the CNN-Based Classifier (CBC), and the Optimised Dynamic Scheduler (ODS). The FRM monitors server resource utilisation, the CBC classifies fog servers, and the ODS allocates incoming tasks to the most appropriate server, reducing response time and enhancing resource utilisation. Similarly, Nabi et al. ( 2022 ) presented an adaptive PSO-based task-scheduling approach for cloud computing that explicitly targets load balance and optimisation. The solution incorporates a Linearly Descending and Adaptive Inertia Weight (LDAIW) to improve scheduling efficiency. The methodology is a population-based scheduling scheme inspired by swarm intelligence: particles represent candidate schedules, and their updates are governed by the inertia weight and the personal and global best positions. The method reduces task execution time, increases throughput, and better balances local and global search.
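The linearly descending component of the LDAIW idea can be sketched as follows; the linear descent is the standard PSO schedule, while the full adaptive rule of Nabi et al. may differ in detail, so treat this as an illustrative assumption.

```python
def ldaiw_inertia(t, t_max, w_max=0.9, w_min=0.4):
    """Linearly descending inertia weight for PSO.

    Early iterations use a large weight (broad, global exploration);
    later iterations shrink it (fine, local exploitation). The bounds
    0.9 and 0.4 are common defaults, not values from the cited paper.
    """
    return w_max - (w_max - w_min) * (t / t_max)
```

Plugging `ldaiw_inertia(t, iters)` in place of a fixed `w` in a PSO velocity update is what shifts the search from exploration to exploitation over time.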

Table 4 gives an overview of the advantages and disadvantages of the state-of-the-art techniques. A comparative analysis of state-of-the-art methods on publicly benchmarked datasets is presented in Table  5 .

4.1 Some essential load balancing metrics

Meticulous monitoring and analysis of metrics enhance resource utilization, minimize downtime, and ensure a seamless user experience, ultimately boosting overall system reliability and scalability. Several metrics employed for assessing the balance of loads in the cloud are illustrated in Fig.  6 .

Throughput:  In cloud load balancing, throughput refers to the rate at which a cloud infrastructure can process and serve data or requests: the amount of work accomplished within a given time frame, reflecting how efficiently the system handles concurrent user demands. High throughput ensures that data or requests are processed quickly and reliably, minimising latency and optimising resource utilisation. Throughput (t_p) can be calculated using Eq. (1):

$$t_p = \frac{n}{\sum_{j=1}^{n} ExT_j} \tag{1}$$

where n is the number of tasks, and ExT_j is the execution time of the j th task.

Makespan: Makespan denotes the overall duration needed to finish a given set of tasks or jobs within a cloud computing environment; a small makespan reflects an efficient system. It can be calculated with the help of the following formula:

$$\text{Makespan} = \max_{j} \, ExT_j \tag{2}$$

In Eq. (2), ExT_j is the execution time of the j th virtual machine. A robust and efficient load-balancing algorithm yields a minimal makespan.

Response time: Response time is the interval between the moment a user makes a request and the moment the cloud infrastructure delivers a response. Minimizing response time is crucial to providing a seamless user experience and ensuring optimal performance.

Reliability:  Reliability indicates the system’s ability to handle failures effectively, prevent downtime, and maintain continuous service availability. A reliable load balancer detects and mitigates failures promptly, ensures seamless failover, and provides continuous service to users even during disruptions or high-load conditions.

Migration time:  Migration time refers to the duration required to transfer workloads or applications from one server or data center to another within the cloud infrastructure. It encompasses the process of migrating virtual machines, containers, or services to optimize resource allocation and handle changes in demand.

Bandwidth:  It represents the capacity or available channel for data communication. It also refers to the maximum data capacity that may be transferred across a network connection within a specific period. Adequate bandwidth is essential for efficient load balancing, as it ensures the smooth and timely flow of data between servers and clients.

Resource utilization:  Resource utilization refers to the efficient allocation and management of computing resources within a cloud infrastructure to meet the demands of varying workloads. It involves optimizing the utilization of servers, storage, network bandwidth, and other resources to maximize performance and minimize waste. It can be measured with the help of a mathematical formula, as given in Eq. ( 3 ):

$$ResU_k = \frac{\sum_{j=1}^{n} CT_{jk}}{\text{Makespan}} \tag{3}$$

figure 6

Classification of load balancing algorithms

In Eq. (3), ResU_k is the resource utilization of the k th virtual machine (VM), and CT_jk is the completion time of the j th  job on the k th  VM.
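Assuming the usual formulations in this literature (throughput as tasks over total execution time, makespan as the slowest VM's finish time, and per-VM utilization as busy time over makespan), the three metrics can be computed from per-task execution times and a task-to-VM assignment. The function below is an illustrative sketch, not code from any reviewed system.

```python
def load_metrics(exec_times, task_vm):
    """Compute throughput, makespan, and per-VM utilization.

    exec_times[j] : execution time of task j
    task_vm[j]    : index of the VM that runs task j
    """
    n_vms = max(task_vm) + 1
    loads = [0.0] * n_vms
    for j, vm in enumerate(task_vm):
        loads[vm] += exec_times[j]
    throughput = len(exec_times) / sum(exec_times)    # tasks per unit work
    span = max(loads)                                 # slowest VM's finish time
    utilization = [load / span for load in loads]     # busy fraction per VM
    return throughput, span, utilization
```

A perfectly balanced assignment drives every entry of `utilization` toward 1.0, which is exactly what a good load balancer aims for.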

Energy consumption:  It can be defined as the ability of a cloud infrastructure to optimize its power consumption while maintaining optimal performance. It reduces energy consumption by dynamically allocating computing resources and powering down underutilized servers during low-demand periods. By minimizing power usage, cloud load balancing systems contribute to reducing carbon footprints, operational costs, and environmental impact while ensuring sustainable and eco-friendly operations in cloud computing environments.

Fault tolerance: Fault tolerance is the ability of a system to continue functioning uninterrupted in the presence of failures or errors. It involves designing load-balancing algorithms and mechanisms that can withstand and recover from various faults, such as server failures, network outages, or traffic spikes (Tawfeeg et al. 2022 ).

4.2 Taxonomy of load balancing algorithms and challenges associated with them

Mishra and Majhi ( 2020 ) have categorized the load balancing algorithms into four broad classes: Traditional, Heuristic, Meta-heuristic, and Hybrid, and have also explained the subcategories of meta-heuristic and hybrid algorithms based on their nature. Tawfeeg et al. ( 2022 ) have discussed three main categories of load-balancing algorithms, namely static, dynamic, and hybrid. Tripathy et al. ( 2023 ) note in their review that load-balancing algorithms, based on their environment, are generally classified into three main classes: static, dynamic, and nature-inspired. In this systematic review, we have tried to cover the full range of algorithms across all categories and sub-categories. Figure  6 represents all categories of load-balancing algorithms (Table  6 ).

Traditional Algorithms: Traditional algorithms are mainly classified into preemptive and non-preemptive. Preemptive scheduling forcefully stops an ongoing execution to serve a higher-priority task; after the higher-priority job completes, the preempted job resumes. The priority of a task can be internal or external. Traditional algorithms commonly employed for load balancing include Round Robin (RR), Weighted Round Robin, Least Connection, and Weighted Least Connection. Round Robin assigns requests cyclically to each server, ensuring an equal distribution. Weighted Round Robin considers server weights, allocating a proportionate number of requests to each server based on its capabilities and performance (Praditha et al. 2023 ). The Least Connection (LC) algorithm assigns requests to the server with the fewest active connections, promoting efficient load distribution. Weighted Least Connection (WLC) enhances LC by considering server weights, scaling the distribution based on server capabilities. Preemptive scheduling algorithms include round-robin and priority-based scheduling; non-preemptive algorithms include Shortest Job First (SJF) and First Come First Serve (FCFS).
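The selection rules of Least Connection and Weighted Round Robin are simple enough to sketch directly; the helper names below are illustrative.

```python
from itertools import cycle

def least_connection(active):
    """Least Connection (LC): pick the server with the fewest active
    connections. `active` maps server name -> current connection count."""
    return min(active, key=active.get)

def weighted_round_robin(weights):
    """Weighted Round Robin (WRR): yield server names in proportion to
    their integer weights; a server with weight 2 appears twice per cycle."""
    expanded = [server for server, w in weights.items() for _ in range(w)]
    return cycle(expanded)
```

In practice WLC simply divides each server's active-connection count by its weight before applying the LC rule, combining both ideas.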

Heuristic-based Algorithms: Heuristic algorithms are problem-solving techniques that rely on practical rules, intuition, and experience rather than precise mathematical models, and are used to find approximate solutions in a reasonable amount of time. In load balancing, they aim to distribute the workload efficiently among cloud and fog nodes. Compared with hybrid and meta-heuristic algorithms, heuristic algorithms are relatively straightforward and have lower computational complexity; they often provide reasonable solutions but offer no guarantee of optimality. Heuristics may be static or dynamic: a static heuristic is used when a task’s estimated completion time is known in advance, whereas a dynamic heuristic applies when tasks arrive dynamically. Algorithms like Min-min, Max-min (Mao et al. 2014 ), RASA, Modified Heterogeneous Earliest Finish Time (HEFT) (Dubey et al. 2018 ), Improved Max-min (Hung et al. 2019 ) and DHSJF (Seth and Singh 2019 ) are prominent examples of the heuristic category.
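A minimal Min-min sketch shows the heuristic's core step of repeatedly picking the task with the smallest earliest completion time; the execution-time matrix and names are illustrative, not taken from any cited implementation.

```python
def min_min(exec_time):
    """Min-min heuristic for static scheduling.

    exec_time[i][k] : estimated time of task i on machine k.
    Returns (assignment dict task -> machine, machine ready times).
    """
    n_machines = len(exec_time[0])
    ready = [0.0] * n_machines
    unassigned = set(range(len(exec_time)))
    assignment = {}
    while unassigned:
        # Earliest completion time over every unassigned task and machine.
        ct, task, machine = min(
            (ready[k] + exec_time[i][k], i, k)
            for i in unassigned for k in range(n_machines))
        assignment[task] = machine
        ready[machine] = ct
        unassigned.remove(task)
    return assignment, ready
```

Max-min is the mirror image: it selects, among the per-task minimum completion times, the task whose minimum is largest, which helps keep one long task from dominating the makespan.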

Meta-heuristic based algorithms: Meta-heuristic algorithms are good at finding a global solution without falling into local optima. A meta-heuristic algorithm is a problem-solving technique that guides the search process by iteratively refining potential solutions. It is used to find approximate solutions for complex optimization problems, especially in cloud computing, where traditional algorithms often struggle due to the inherent complexity and dynamic nature of the environment. One meta-heuristic that has proven effective in cloud computing is the Genetic Algorithm (GA) (Rekha and Dakshayini 2019 ). GA mimics the process of natural selection, evolving a population of solutions to find strong candidates. By employing genetic operators like selection, crossover, and mutation, GA explores the solution space intelligently, adapting to changing conditions and providing near-optimal solutions for resource allocation, task scheduling, and load balancing in cloud computing environments. Other examples from the reviewed literature are GWO (Reddy et al. 2022 ), ACO (Dhaya and Kanthavel 2022 ), TBSLB PSO (Ramezani et al. 2014 ), TOPSIS-PSO (Konjaang et al. 2018 ), and Modified BAT (Latchoumi and Parthiban 2022 ). When two meta-heuristic methods are combined, the result is a hybrid meta-heuristic; an example is Ant Colony Optimization with Particle Swarm (ACOPS) (Cho et al. 2015 ).
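A bare-bones GA for task scheduling with the operators named above (tournament selection, one-point crossover, random-reset mutation) can be sketched as follows; all parameter values are illustrative, not taken from any cited study.

```python
import random

def ga_schedule(task_times, n_vms, pop=30, gens=60, seed=1):
    """Genetic Algorithm sketch: chromosomes are task->VM assignments,
    and fitness is the (negated) makespan of the assignment."""
    rng = random.Random(seed)
    n = len(task_times)

    def span(chrom):
        loads = [0.0] * n_vms
        for task, vm in enumerate(chrom):
            loads[vm] += task_times[task]
        return max(loads)

    population = [[rng.randrange(n_vms) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        def pick():
            # Tournament selection of size 2: keep the fitter chromosome.
            a, b = rng.sample(population, 2)
            return a if span(a) < span(b) else b
        nxt = []
        while len(nxt) < pop:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n)            # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.2:               # random-reset mutation
                child[rng.randrange(n)] = rng.randrange(n_vms)
            nxt.append(child)
        population = nxt
    best = min(population, key=span)
    return best, span(best)
```

Hybrids such as GWO-GA swap in a different neighbourhood operator while keeping this select-recombine-mutate loop.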

Hybrid-based algorithms: Hybrid algorithms integrate the advantages of centralized and distributed load balancing to achieve better performance and scalability. They leverage the centralized approach to monitor and collect real-time information about the system’s state, workload, and resource availability (Geetha et al. 2024 ), while incorporating distributed load-balancing techniques to divide the workload efficiently among fog nodes. This hybrid approach enhances overall load-balancing efficiency, reduces network congestion, and improves the system’s response time. By dynamically adapting to changing workload patterns and resource availability, hybrid algorithms ensure optimal resource utilization and enhance user satisfaction. A hybrid method that combines the Genetic Algorithm (GA) and the Grey Wolf Optimization Algorithm (GWO) is proposed by Behera and Sobhanayak ( 2024 ); the hybrid GWO-GA algorithm minimizes cost, energy usage, and makespan. Other examples from the literature are GAYA (Mishra and Majhi 2023 ), VMMSID (Brahmam and Vijay Anand 2024 ), and DTSO-TS (Ledmi et al. 2024 ).

ML-Centric algorithms: These algorithms combine machine learning with existing scheduling techniques to automate load-balancing decisions. This is one of the latest directions in the research area and has proven particularly effective for real-time scenarios. To address the challenges of load balancing, researchers have been focusing increasingly on machine-learning-centric algorithms, which dynamically allocate tasks based on workload characteristics and resource availability. These algorithms leverage ML techniques such as reinforcement learning, deep learning, and clustering to intelligently predict and distribute the workload across cloud and fog computing environments. By continuously learning from historical data and adapting to changing conditions, ML-centric algorithms deliver improved performance, reduced response time, and enhanced resource utilization; many also consider energy consumption and network traffic, ensuring a holistic load-balancing approach (Muchori and Peter 2022 ). Examples of ML-centric algorithms from the reviewed literature are DRL (Ran et al. 2019 ), MADRL-DRA (Jyoti and Shrimali 2020 ), TS-DT (Mahmoud et al. 2022 ), and FF-NWRDLB (Prabhakara et al. 2023 ).
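As a hedged illustration of the reinforcement-learning idea behind this category (not the DRL or MADRL-DRA algorithms from the reviewed papers), a tabular Q-learner can be trained so that routing the next task keeps VM loads balanced; the state, reward, and parameters below are all illustrative choices.

```python
import random

def q_learning_balancer(episodes=500, n_vms=3, seed=0):
    """Tabular Q-learning sketch for task dispatch.

    State  : index of the currently least-loaded VM.
    Action : VM chosen for the next incoming task.
    Reward : negative load imbalance after the dispatch.
    """
    rng = random.Random(seed)
    alpha, gamma, eps = 0.5, 0.9, 0.1
    q = [[0.0] * n_vms for _ in range(n_vms)]

    loads = [0.0] * n_vms
    for _ in range(episodes):
        state = loads.index(min(loads))
        if rng.random() < eps:
            action = rng.randrange(n_vms)        # explore
        else:
            action = max(range(n_vms), key=lambda a: q[state][a])
        loads[action] += rng.uniform(0.5, 1.5)   # dispatch one task
        reward = -(max(loads) - min(loads))      # penalise imbalance
        new_state = loads.index(min(loads))
        q[state][action] += alpha * (
            reward + gamma * max(q[new_state]) - q[state][action])
    return q, loads
```

The surveyed DRL systems replace the small Q-table with a neural network so the same learn-from-reward loop scales to high-dimensional cloud state.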

Table 7 provides a comprehensive overview of recent load balancing and task scheduling algorithms, presenting information on the technology proposed, comparing technologies, research limitations, results, tools used, and potential future directions. Additionally, Table  8 outlines the evaluation metrics, advantages/disadvantages of the technologies reviewed, and objectives of the study.

5 Applications areas of load balancing in cloud and fog computing

There are various application areas where load balancing is crucial. The healthcare sector is one area where efficient resource utilization and load balancing are highly desirable. According to Mahmoud et al. ( 2018 ), fog computing integrated with IoT-based healthcare architecture improves latency, energy consumption, mobility, and Quality of Service, enabling efficient healthcare services regardless of location; fog-enabled Cloud-of-Things (CoT) system models with energy-aware allocation strategies yield more energy-efficient operations, which is crucial for healthcare applications sensitive to delays and energy consumption. Yong et al. ( 2016 ) propose a dynamic load-balancing approach using SDN technology in a cloud data center, enabling real-time monitoring of service-node flow and load state, as well as global resource assignment for uneven system load distribution. Dogo et al. ( 2019 ) introduced a mist computing system for better connectivity and resource utilization in smart cities and industries; according to the authors, mist computing enables smart cities to adapt intelligently to dynamic events and changes, enhancing urban operations, and is well suited to smart-city solutions where streets adapt to different conditions, promoting energy conservation and efficient operations. Similarly, Sharif et al. ( 2023 ) discuss the rapid growth of IoT devices and applications, emphasizing the need for efficient task scheduling and resource allocation in edge computing for health surveillance systems; their Priority-based Task Scheduling and Resource Allocation (PTS-RA) mechanism aims to manage emergency conditions efficiently, meeting the requirements of latency-sensitive tasks at reduced bandwidth cost. Along the same lines, Aqeel et al. ( 2023 ) proposed the CHROA model for energy-efficient and intelligent load balancing in cloud-enabled IoT environments, particularly in healthcare, where real-time applications generate large volumes of data. Sah Tyagi et al. ( 2021 ) presented a neural-network-based resource allocation model for an energy-efficient WSN-based smart Agri-IoT framework that improves dynamic clustering and optimizes cluster size; the approach combines a Backpropagation Neural Network (BPNN), Adaptive Particle Swarm Optimization (APSO), and a Binary Neural Network (BNN) to allocate agricultural resources effectively, showing notable progress in cooperative networking and overall resource optimization. Likewise, Dhaya and Kanthavel ( 2022 ) emphasize the importance of energy efficiency in agriculture and the challenges in resource allocation, introducing a novel ‘Naive Multi-Phase Resource Allocation Algorithm’ to enhance energy efficiency and optimize agricultural resources effectively in a dynamic environment. In this way, there are several application areas where load balancing and resource scheduling are crucial. In the future, transportation, Industry 4.0 and 5.0, IoT network systems, smart cities, smart agriculture, and healthcare systems will be hotspots for research on load balancing. The following are the areas where resource allocation and utilization are critical, and where cloud service utilization is highest:

Telemedicine (Verma et al. 2024 )

Industry 4.0 and Industry 5.0 (Teoh et al. 2023 )

Healthcare system (Talaat et al. 2022 )

Agriculture (Agri-IoT) (Dhaya and Kanthavel 2022 ; Sah Tyagi et al. 2021 )

Real-time monitoring services (Yong et al. 2016 )

Smart cities (Alam 2021 )

Digital twining (Zhou et al. 2022 ; Adibi et al. 2024 )

Smart business and analytics (Nag et al. 2022 )

E-commerce (Sugan and Isaac Sajan 2024 )

6 Research queries and inferences

After a detailed literature review, the answers to the research questions were inferred directly from the reviewed studies, without bias or the addition of the authors' own views. We elucidate the answers below to provide a thorough understanding based on the examination of the existing material.

Q1. What load balancing and task scheduling techniques are commonly used in cloud computing environments?

This SLR divides the current techniques into five categories: traditional, heuristic, meta-heuristic, ML-centric, and hybrid. We employed the content analysis method to determine the category of each technique used in the literature study, as shown in Table  7 . From the literature review, it is inferred that hybrid, meta-heuristic, and ML-centric algorithms are researchers’ preferred choices for solving load-balancing issues in cloud computing systems. The percentage-wise utilization of the various techniques is depicted in Fig.  7 . In the future, ML/DL-based load-balancing algorithms will be a hotspot for researchers, as there is an emerging trend of hybridising ML-centric approaches with existing ones.

figure 7

Percentage-wise utilisation of various categories of load balancing algorithm from 2014 to 2024 based on SLR

Q2. What are the key factors influencing the performance of load-balancing mechanisms in cloud computing?

The performance of load balancing in the cloud is influenced by several aspects, including the availability of resources such as CPU, memory, storage, and network bandwidth, the nature of the workload, network latency, the load balancer algorithm, and the health of the server as well as fault detection and tolerance. The selection of the load balancing algorithm can significantly influence performance, as different algorithms vary in complexity and efficiency, affecting how resources are distributed. In cases of server overload or issues, the load balancer must be able to identify these problems and redirect traffic to other servers to maintain optimal performance.

Q3. Which evaluation matrices are predominantly utilized for assessing the efficacy of load-balancing techniques in cloud computing environments?

The utilization trend of various metrics over the period 2014–2024 is shown graphically in Fig.  8 . We have employed the frequency analysis method to determine the year-wise utilization of each performance metric. Table 8 provides an in-depth analysis of the performance metrics attained in every study. The year-wise categorization of each metric is shown in Table  9 . The metrics most frequently used to gauge load balancing in cloud computing environments are Makespan, resource utilization (RU), Degree of Imbalance (DI), cost efficiency, throughput, and execution time. Evaluation metrics like fault tolerance, QoS, reliability and migration rate require additional attention without compromising other factors. The row named ‘other’ in Table  9 includes parameters like convergence speed, network longevity, fitness function, packet loss ratio, success rate, task scheduling efficiency, scalability, clustering phase duration, standard deviation of load, accuracy, precision and time complexity.
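Of these metrics, the Degree of Imbalance is the least self-explanatory; it is commonly defined in this literature as DI = (T_max − T_min) / T_avg over the VMs' execution times, though individual papers may vary slightly. A direct computation, as a hedged sketch:

```python
def degree_of_imbalance(loads):
    """Degree of Imbalance (DI): spread of per-VM execution times
    relative to their mean. Lower values indicate better balance."""
    t_max, t_min = max(loads), min(loads)
    t_avg = sum(loads) / len(loads)
    return (t_max - t_min) / t_avg
```

A perfectly balanced system has DI = 0, which is why DI complements makespan: two schedules with equal makespan can still differ in how evenly they spread the remaining load.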

figure 8

The analysis of performance metrics used in load balancing based on SLR

Q4. Which categories of algorithms have been used more in recent research trends in the cloud computing environment for solving load-balancing issues?

According to Fig.  9 , researchers prefer hybrid algorithms for addressing load balancing and task scheduling problems in cloud computing. This preference arises because hybrid algorithms combine the functionalities of several algorithms, yielding a precise, multi-objective solution to task scheduling and load-balancing challenges. Around 2014, heuristic approaches were common, but meta-heuristic approaches later replaced them, and by 2022 the hybrid approach had become the dominant method. Interestingly, many of these hybrid techniques incorporate machine learning combined with other optimization methods.

figure 9

Year-wise utilisation trend of various techniques used in load balancing

Q5. Which simulation software tools have garnered prominence in recent scholarly analyses within the domain of cloud computing research?

Figure  10 shows that 51% of researchers use the CloudSim tool for simulation purposes, followed by Python at 11%. We employed the frequency analysis method to quantify and compare the utilization of different simulation tools within each study. According to the literature review, CloudSim is the first choice of researchers, with 51% utilization, and its use has grown in recent years. It allows users to model and simulate cloud computing infrastructure, resource-provisioning policies, and application-scheduling algorithms. CloudSim is an external framework, available for download, that can be imported into programming environments such as Eclipse, NetBeans IDE, and Maven. To simulate the cloud computing environment, the CloudSim toolkit has been integrated with NetBeans IDE 8.2 on Windows 10 (Vergara et al. 2023 ).

figure 10

Analysis of simulation tools based on SLR

Q6. What insights do the future perspectives within the reviewed literature offer in terms of potential avenues for exploration and advancement within the field?

According to this article, the future directions of this field focus on developing more advanced algorithms that harness the potential of machine learning and deep learning, enabling enhanced energy efficiency and overall system performance in cloud computing environments. Real-time monitoring and automation of systems using the AI approach are also hot topics to explore in future research. The future scopes recorded during the literature review are shown in Table  7 .

All the responses in this study are deduced and documented from the literature review above. These responses are impartial inferences from the reviewed studies rather than the authors' own views.

8 Statistical analysis

This SLR includes a bibliographic analysis to understand the development and present condition of research in various domains, investigating the dissemination of scholarly materials, which can unveil both dominant patterns and possible gaps within the academic body of work. We used the Scopus academic database to collect relevant records based on the keywords “load balancing and task scheduling in cloud computing using machine learning”. A total of 129 items were found. This analysis centres on this dataset of 129 items, illustrating the distribution of documents published across critical subject areas and offering valuable insight into the current priorities and interests of the academic community.

These publications are distributed across various subjects, providing insights into the interdisciplinary nature of this field, as shown in Fig.  11 .

figure 11

Subject-wise analysis of publications from 2014 to 2024 related to used keywords

9 Discussion

Our extensive literature study has discovered valuable insights and emerging trends crucial for advancing cloud computing technology. This discussion summarizes the research findings, answering the initial research questions and making conclusions based on a thorough examination of chosen studies conducted between 2014 and 2024.

9.1 Research gaps

Most research efforts concentrate on a single aspect of load balancing; many systems are limited to either data-center or network load balancing. There is an urgent need to address multiple aspects together.

A centralized load balancer can itself become a single point of failure. Furthermore, most research concentrates solely on a limited number of performance parameters, such as makespan, throughput, and completion time; the Degree of Imbalance (DI) is a crucial parameter that deserves more attention.

There is a significant need to enhance quality measures such as QoS (Quality of Service), fault tolerance, network delay, VM (Virtual Machine) migration and risk assessment.

Fog and edge computing should be integrated to mitigate the need for massive data transfers. This will improve the flexibility and usefulness of cloud computing in multiple sectors.

Finally, power conservation has received little attention from researchers. There is a shortage of innovative work on power-aware load balancing.

Geographical barriers impose network delay and data transmission delay issues. We need to focus on the development of cutting-edge technologies to overcome distance-related and delay-related issues (Muchori and Peter 2022 ).

Virtual Machine Migration (VMM) is also a challenge that strongly affects the efficacy of cloud services. There is a dire need to design technologies that require fewer VM migrations.

Despite the advancements, applying machine learning algorithms in cloud computing is complicated. The intricacy of these algorithms, combined with the requirement for extensive training data, presents substantial obstacles. The dynamic nature of cloud environments requires constant learning and adjustment of these models, which raises questions about their ability to handle large-scale operations and maintain long-term viability.

9.2 Integration of machine learning for enhanced load balancing and task scheduling

One key insight from this analysis is a growing reliance on machine learning methods to enhance load balancing and task scheduling processes. Although somewhat successful, conventional algorithms generally struggle in dynamic cloud systems where data and workload patterns continuously change. Due to their capacity to acquire knowledge and adjust accordingly, machine learning algorithms have demonstrated potential in forecasting workload patterns, enabling the implementation of more effective resource allocation strategies. This enhances efficiency and substantially decreases execution time and energy consumption, aligning with the objectives of achieving optimal resource utilisation and high system throughput (Janakiraman and Priya 2023 ; Edward Gerald et al. 2023 ).

9.3 Future directions

The future of cloud computing rests on advancing auto-adaptive systems capable of independently handling load balancing and task scheduling without human involvement. Fusing artificial intelligence (AI) and cloud computing can create systems that provide unparalleled efficiency and reliability. Creating efficient cloud services could be significantly improved by developing lightweight machine learning models that require minimum training data and can quickly adapt to changing conditions. Moreover, investigating unsupervised learning algorithms can potentially eliminate the requirement for large, labelled data, enhancing the application’s practicality. These are some of the most frequently observed future scopes based on this SLR:

Deployment of deep learning (DL) and machine learning (ML) techniques to predict load patterns: The predictive analysis of workload patterns can prevent resource underutilization or overloading. We can also use ML to reduce energy consumption and predict faults in cloud computing (Reddy et al. 2022 ; Mishra and Majhi 2023 ; Agarwal et al. 2020 ; Negi et al. 2021 ; Latchoumi and Parthiban 2022 ; Shuaib, et al. 2023 ).

Development of fault tolerance techniques integrated with load balancing: Only a small number of research studies examine security concerns on cloud computing services, like load balancing and fault tolerance, without elaborating on the connection between the two (Behera and Sobhanayak 2024 ; Tawfeeg et al. 2022 ; Brahmam and Vijay Anand 2024 ).

To extend the existing techniques for data security and privacy by incorporating blockchain technology with cloud computing (Edward Gerald et al. 2023 ; Saba et al. 2023 ; Li et al. 2020 ).

Achieving further QoS metrics such as scalability, elasticity, and applicability across extensive domains is another avenue for extending this research (Adil et al. 2022 ; Talaat et al. 2022 ; Sultana et al. 2024 ).

Most researchers have focused on the energy consumption aspect. Future research should aim for energy efficiency, as energy will be one of the scarcest resources in the future (Rekha and Dakshayini 2019 ; Farrag et al. 2020 ; Panwar et al. 2019 ; Mahmoud et al. 2022 ; Asghari and Sohrabi 2021 ).

Cost-effectiveness and real-time load balancing are prominent research areas. Most researchers plan to extend their work to real-time analytics and dynamic cloud networks (Kumar and Sharma 2018 ; Ni et al. 2021 ).

Response delays in real-time applications are crucial. Real-time analytics in a complex and dynamic environment is a hotspot for researchers. Healthcare systems, telemedicine domains and real-time monitoring or surveillance services are examples of delay-sensitive applications (Verma et al. 2024 ; Pradhan et al. 2022 ; Nabi et al. 2022 ; Shahakar et al. 2023 ).

Dynamic reallocation of dependent tasks is another avenue for future research. Task-priority-based scheduling optimizes cloud performance (Ran et al. 2019; Jena et al. 2022; Prabhakara et al. 2023).

Fog and edge computing architectures have limited resources, so optimal resource scheduling is essential. Many authors have identified resource scheduling in fog and edge computing as a potential future area of study (Swarup et al. 2021; Kruekaew and Kimpan 2022).

This SLR records the future research scopes mentioned above, and Table  7 provides detailed information.

10 Conclusion

The study of the computational cloud is vast and comes with numerous challenges. By giving end users on-demand access to computational resources, cloud computing has achieved widespread adoption and become an essential part of many businesses, notably online shopping sites. This increased usage puts greater strain on cloud resources such as hardware, software, and network devices; consequently, load-balancing solutions are needed to utilize these resources efficiently. This SLR categorizes the techniques into five classes: conventional/traditional, heuristic, meta-heuristic, ML-centric, and hybrid. Traditional approaches struggle to scale with problem size and complexity, which makes them slow and time-consuming, and they often become stuck in local optima. Heuristic algorithms demonstrate remarkable scalability and suit large-scale optimization challenges in industries such as manufacturing, banking, and logistics, but they often produce approximate rather than optimal answers; meta-heuristic algorithms emerged to address these drawbacks. In recent years, hybrid strategies, which combine heuristic, conventional, and machine-learning approaches, have become increasingly popular; they aim to exploit the strengths of several algorithms to overcome individual limitations and improve performance. This systematic literature review of efficient load balancing and task scheduling in cloud computing environments has provided valuable insights into the available algorithms, their limitations, evaluation metrics, challenges, simulation tools, and potential future directions. The analysis demonstrates that the current trend is to apply ML-centric and hybrid algorithms to address load balancing and job/task scheduling effectively.
Furthermore, the findings indicate growing interest among researchers in ML-centric techniques, reflecting a shift towards incorporating ML/DL approaches. Our study explained the fundamental structure of cloud computing and its operational principles, examined evaluation metrics and simulation tools impartially, and addressed the research questions that formed the basis of this literature review with well-supported answers derived from the gathered evidence. This systematic review is a foundational resource for future work in this domain and offers valuable information to researchers and practitioners involved in load balancing for cloud computing architectures. This SLR does not delve into security and privacy considerations or related load-balancing issues; we retain these as topics for future investigation. Table 10 provides abbreviations for several terms.

Data availability

No datasets were generated or analysed during the current study.

Adibi S, Rajabifard A, Shojaei D, Wickramasinghe N (2024) Enhancing healthcare through sensor-enabled digital twins in smart environments: a comprehensive analysis. Sensors. https://doi.org/10.3390/s24092793


Adil M, Nabi S, Raza S (2022) PSO-CALBA: Particle swarm optimization based content-aware load balancing algorithm in cloud computing environment. Comput Inform 41(5):1157–1185. https://doi.org/10.31577/cai_2022_5_1157

Adil M, Nabi S, Aleem M, Diaz VG, Lin JC-W (2023) CA-MLBS: content-aware machine learning based load balancing scheduler in the cloud environment. Expert Syst. https://doi.org/10.1111/exsy.13150

Agarwal R, Baghel N, Khan MA (2020) Load balancing in cloud computing using mutation based particle swarm optimization. In: presented at the 2020 International Conference on Contemporary Computing and Applications, IC3A 2020, pp 191–195 https://doi.org/10.1109/IC3A48958.2020.233295

Alahmad Y, Agarwal A (2024) Multiple objectives dynamic VM placement for application service availability in cloud networks. J Cloud Comput. https://doi.org/10.1186/s13677-024-00610-2

Alam T (2021) Cloud-based iot applications and their roles in smart cities. Smart Cities 4(3):1196–1219. https://doi.org/10.3390/smartcities4030064

Alatoun K, Matrouk K, Mohammed MA, Nedoma J, Martinek R, Zmij P (2022) A novel low-latency and energy-efficient task scheduling framework for internet of medical things in an edge fog cloud system. Sensors. https://doi.org/10.3390/s22145327

Apat HK, Nayak R, Sahoo B (2023) A comprehensive review on internet of things application placement in Fog computing environment. InteRnet Things Neth. https://doi.org/10.1016/j.iot.2023.100866

Aqeel I et al (2023) Load balancing using artificial intelligence for cloud-enabled internet of everything in healthcare domain. Sensors. https://doi.org/10.3390/s23115349

Asghari A, Sohrabi MK (2021) Combined use of coral reefs optimization and reinforcement learning for improving resource utilization and load balancing in cloud environments. Computing 103(7):1545–1567. https://doi.org/10.1007/s00607-021-00920-2

Behera I, Sobhanayak S (2024) Task scheduling optimization in heterogeneous cloud computing environments: a hybrid GA-GWO approach. J Parallel Distrib Comput. https://doi.org/10.1016/j.jpdc.2023.104766

Biswas D, Dutta A, Ghosh S, Roy P (2024) Future trends and significant solutions for intelligent computing resource management, pp 187–208 https://doi.org/10.4018/979-8-3693-1552-1.ch010

Brahmam MG, Vijay Anand R (2024) VMMISD: an efficient load balancing model for virtual machine migrations via fused metaheuristics with iterative security measures and deep learning optimizations. IEEE Access. https://doi.org/10.1109/ACCESS.2024.3373465

Buyya R et al (2018) A manifesto for future generation cloud computing: research directions for the next decade. ACM Comput Surv. https://doi.org/10.1145/3241737

Cho K-M, Tsai P-W, Tsai C-W, Yang C-S (2015) A hybrid meta-heuristic algorithm for VM scheduling with load balancing in cloud computing. Neural Comput Appl 26(6):1297–1309. https://doi.org/10.1007/s00521-014-1804-9

Dhaya R, Kanthavel R (2022) Energy efficient resource allocation algorithm for agriculture IoT. Wirel Pers Commun 125(2):1361–1383. https://doi.org/10.1007/s11277-022-09607-z

Dogo EM, Salami AF, Aigbavboa CO, Nkonyana T (2019) Taking cloud computing to the extreme edge: a review of mist computing for smart cities and industry 4.0 in Africa. In: EAI/Springer Innovations in Communication and Computing, pp 107–132 https://doi.org/10.1007/978-3-319-99061-3_7

Dubey K, Kumar M, Sharma SC (2018) Modified HEFT algorithm for task scheduling in cloud environment. In: presented at the Procedia Computer Science, pp 725–732 https://doi.org/10.1016/j.procs.2017.12.093

Edward Gerald B, Geetha P, Ramaraj E (2023) A fruitfly-based optimal resource sharing and load balancing for the better cloud services. Soft Comput 27(10):6507–6520. https://doi.org/10.1007/s00500-023-07873-y

Farrag AAS, Mohamad SA, El-Horbaty ESM (2020) Swarm optimization for solving load balancing in cloud computing. In: presented at the advances in intelligent systems and computing, pp 102–113 https://doi.org/10.1007/978-3-030-14118-9_11

Geetha P, Vivekanandan SJ, Yogitha R, Jeyalakshmi MS (2024) Optimal load balancing in cloud: Introduction to hybrid optimization algorithm. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2023.121450

Goel G, Tiwari R (2023) Resource scheduling techniques for optimal quality of service in fog computing environment: a review. Wirel Pers Commun 131(1):141–164. https://doi.org/10.1007/s11277-023-10421-4

Hashem W, Nashaat H, Rizk R (2017) Honey bee based load balancing in cloud computing. KSII Trans Internet Inf Syst 11(12):5694–5711. https://doi.org/10.3837/tiis.2017.12.001

Hung TC, Hy PT, Hieu LN, Phi NX (2019) MMSIA: improved max-min scheduling algorithm for load balancing on cloud computing. In: presented at the ACM International Conference Proceeding Series, pp 60–64 https://doi.org/10.1145/3310986.3311017

Huo L, Shao P, Ying F, Luo L (2019) The research on task scheduling algorithm for the cloud management platform of mimic common operating environment. In: presented at the Proceedings - 2019 18th International Symposium on Distributed Computing and Applications for Business Engineering and Science, DCABES 2019, pp 167–171 https://doi.org/10.1109/DCABES48411.2019.00049

Jalalian Z, Sharifi M (2022) A hierarchical multi-objective task scheduling approach for fast big data processing. J Supercomput 78(2):2307–2336. https://doi.org/10.1007/s11227-021-03960-9

Janakiraman S, Priya MD (2023) Hybrid grey wolf and improved particle swarm optimization with adaptive intertial weight-based multi-dimensional learning strategy for load balancing in cloud environments. Sustain Comput Inform Syst. https://doi.org/10.1016/j.suscom.2023.100875

Jena UK, Das PK, Kabat MR (2022) Hybridization of meta-heuristic algorithm for load balancing in cloud computing environment. J King Saud Univ Comput Inf Sci 34(6):2332–2342. https://doi.org/10.1016/j.jksuci.2020.01.012

Joshi S, Panday N, Mishra A (2024) Reinforcement learning based auto scaling strategy used in cloud environment: State of Art, p 736 https://doi.org/10.1109/CSNT60213.2024.10545922

Jyoti A, Shrimali M (2020) Dynamic provisioning of resources based on load balancing and service broker policy in cloud computing. Clust Comput 23(1):377–395. https://doi.org/10.1007/s10586-019-02928-y

Khodar A, Chernenkaya LV, Alkhayat I, Fadhil Al-Afare HA, Desyatirikova EN (2020) Design model to improve task scheduling in cloud computing based on particle swarm optimization. In: presented at the Proceedings of the 2020 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering, EIConRus 2020, pp 345–350 https://doi.org/10.1109/EIConRus49466.2020.9039501

Kiruthiga G, Maryvennila S (2020) Robust resource scheduling with optimized load balancing using grasshopper behavior empowered intuitionistic fuzzy clustering in cloud paradigm. Int J Comput Netw Appl 7(5):137–145. https://doi.org/10.22247/ijcna/2020/203851

Konjaang JK, Ayob FH, Muhammed A (2018) Cost effective Expa-Max-Min scientific workflow allocation and load balancing strategy in cloud computing. J Comput Sci 14(5):623–638. https://doi.org/10.3844/jcssp.2018.623.638

Kruekaew B, Kimpan W (2022) Multi-objective task scheduling optimization for load balancing in cloud computing environment using hybrid artificial bee colony algorithm with reinforcement learning. IEEE Access 10:17803–17818. https://doi.org/10.1109/ACCESS.2022.3149955

Kumar M, Dubey K, Sharma SC (2018) Elastic and flexible deadline constraint load Balancing algorithm for Cloud Computing. In: presented at the Procedia Computer Science, pp 717–724 https://doi.org/10.1016/j.procs.2017.12.092

Kumar M, Sharma SC (2018) Deadline constrained based dynamic load balancing algorithm with elasticity in cloud environment. Comput Electr Eng 69:395–411. https://doi.org/10.1016/j.compeleceng.2017.11.018

Latchoumi TP, Parthiban L (2022) Quasi oppositional dragonfly algorithm for load balancing in cloud computing environment. Wirel Pers Commun 122(3):2639–2656. https://doi.org/10.1007/s11277-021-09022-w

Ledmi A, Ledmi M, Souidi MEH, Haouassi H, Bardou D (2024) Optimizing task scheduling in cloud computing using discrete tuna swarm optimization. Ing Syst Inf 29(1):323–335. https://doi.org/10.18280/isi.290132

Li X, Qin Y, Zhou H, Chen D, Yang S, Zhang Z (2020) An intelligent adaptive algorithm for servers balancing and tasks scheduling over mobile fog computing networks. Wirel Commun Mob Comput. https://doi.org/10.1155/2020/8863865

Liu X, Qiu T, Wang T (2019) Load-balanced data dissemination for wireless sensor networks: a nature-inspired approach. IEEE Internet Things J 6(6):9256–9265. https://doi.org/10.1109/JIOT.2019.2900763

Mahmoud MME, Rodrigues JJPC, Saleem K, Al-Muhtadi J, Kumar N, Korotaev V (2018) Towards energy-aware fog-enabled cloud of things for healthcare. Comput Electr Eng 67:58–69. https://doi.org/10.1016/j.compeleceng.2018.02.047

Mahmoud H, Thabet M, Khafagy MH, Omara FA (2022) Multiobjective task scheduling in cloud environment using decision tree algorithm. IEEE Access 10:36140–36151. https://doi.org/10.1109/ACCESS.2022.3163273

Mao Y, Chen X, Li X (2014) Max–min task scheduling algorithm for load balance in cloud computing. Adv Intell Syst Comput 255:457–465. https://doi.org/10.1007/978-81-322-1759-6_53

Mishra K, Majhi SK (2020) A state-of-art on cloud load balancing algorithms. Int J Comput Digit Syst 9(2):201–220. https://doi.org/10.12785/IJCDS/090206

Mishra K, Majhi SK (2023) A novel improved hybrid optimization algorithm for efficient dynamic medical data scheduling in cloud-based systems for biomedical applications. Multimed Tools Appl 82(18):27087–27121. https://doi.org/10.1007/s11042-023-14448-4

Mousavi S, Mosavi A, Varkonyi-Koczy AR (2018) A load balancing algorithm for resource allocation in cloud computing. In: presented at the advances in intelligent systems and computing, pp 289–296 https://doi.org/10.1007/978-3-319-67459-9_36

Muchori J, Peter M (2022) Machine learning load balancing techniques in cloud computing: a review. Int J Comput Appl Technol Res 11:179–186. https://doi.org/10.7753/IJCATR1106.1002

Nabi S, Ahmad M, Ibrahim M, Hamam H (2022) AdPSO: adaptive PSO-based task scheduling approach for cloud computing. Sensors. https://doi.org/10.3390/s22030920

Nag A, Sen M, Saha J (2022) Integration of predictive analytics and cloud computing for mental health prediction. In: Predictive Analytics in Cloud, Fog, and Edge Computing: Perspectives and Practices of Blockchain, IoT, and 5G, pp 133–160 https://doi.org/10.1007/978-3-031-18034-7_8

Neelakantan P, Yadav NS (2023) An optimized load balancing strategy for an enhancement of cloud computing environment. Wirel Pers Commun 131(3):1745–1765. https://doi.org/10.1007/s11277-023-10520-2

Negi S, Rauthan MMS, Vaisla KS, Panwar N (2021) CMODLB: an efficient load balancing approach in cloud computing environment. J Supercomput 77(8):8787–8839. https://doi.org/10.1007/s11227-020-03601-7

Ni L, Sun X, Li X, Zhang J (2021) GCWOAS2: multiobjective task scheduling strategy based on gaussian cloud-whale optimization in cloud computing. Comput Intell Neurosci. https://doi.org/10.1155/2021/5546758

Oduwole O, Akinboro S, Lala O, Fayemiwo M, Olabiyisi S (2022) Cloud computing load balancing techniques: retrospect and recommendations. FUOYE J Eng Technol 7:17–22. https://doi.org/10.46792/fuoyejet.v7i1.753

Pabitha P, Nivitha K, Gunavathi C, Panjavarnam B (2024) A chameleon and remora search optimization algorithm for handling task scheduling uncertainty problem in cloud computing. Sustain Comput Inform Syst. https://doi.org/10.1016/j.suscom.2023.100944

Pang S, Zhang W, Ma T, Gao Q (2017) Ant colony optimization algorithm to dynamic energy management in cloud data center. Math Probl Eng. https://doi.org/10.1155/2017/4810514

Panwar N, Negi S, Rauthan MMS, Vaisla KS (2019) TOPSIS–PSO inspired non-preemptive tasks scheduling algorithm in cloud environment. Clust Comput 22(4):1379–1396. https://doi.org/10.1007/s10586-019-02915-3

Prabhakara BK, Naikodi C, Suresh L (2023) Ford fulkerson and Newey West regression based dynamic load balancing in cloud computing for data communication. Int J Comput Netw Inf Secur 15(5):81–95. https://doi.org/10.5815/IJCNIS.2023.05.08

Pradhan A, Bisoy SK, Kautish S, Jasser MB, Mohamed AW (2022) Intelligent decision-making of load balancing using deep reinforcement learning and parallel PSO in cloud environment. IEEE Access 10:76939–76952. https://doi.org/10.1109/ACCESS.2022.3192628

Praditha VS et al (2023) A Systematical review on round robin as task scheduling algorithms in cloud computing. In: presented at the 2023 6th International Conference on Information and Communications Technology, ICOIACT 2023, pp 516–521 https://doi.org/10.1109/ICOIACT59844.2023.10455832

Prashanth SK, Raman D (2021) Optimized dynamic load balancing in cloud environment using B+ Tree. In: presented at the Advances in Intelligent Systems and Computing, pp 391–401 https://doi.org/10.1007/978-981-33-4859-2_39

Ramezani F, Lu J, Hussain FK (2014) Task-based system load balancing in cloud computing using particle swarm optimization. Int J Parallel Prog 42(5):739–754. https://doi.org/10.1007/s10766-013-0275-4

Ran L, Shi X, Shang M (2019) SLAs-aware online task scheduling based on deep reinforcement learning method in cloud environment. In: presented at the Proceedings - 21st IEEE International Conference on High Performance Computing and Communications, 17th IEEE International Conference on Smart City and 5th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2019, pp 1518–1525 https://doi.org/10.1109/HPCC/SmartCity/DSS.2019.00209

Reddy KL, Lathigara A, Aluvalu R, Viswanadhula UM (2022) PGWO-AVS-RDA: An intelligent optimization and clustering based load balancing model in cloud. Concurr Comput Pract Exp. https://doi.org/10.1002/cpe.7136

Rekha PM, Dakshayini M (2019) Efficient task allocation approach using genetic algorithm for cloud environment. Clust Comput 22(4):1241–1251. https://doi.org/10.1007/s10586-019-02909-1

Rostami S, Broumandnia A, Khademzadeh A (2024) An energy-efficient task scheduling method for heterogeneous cloud computing systems using capuchin search and inverted ant colony optimization algorithm. J Supercomput 80(6):7812–7848. https://doi.org/10.1007/s11227-023-05725-y

Saba T, Rehman A, Haseeb K, Alam T, Jeon G (2023) Cloud-edge load balancing distributed protocol for IoE services using swarm intelligence. Clust Comput 26(5):2921–2931. https://doi.org/10.1007/s10586-022-03916-5

Sabireen H, Neelanarayanan V (2021) A Review on Fog computing: architecture, Fog with IoT, algorithms and research challenges. ICT Express 7(2):162–176. https://doi.org/10.1016/j.icte.2021.05.004

Sah Tyagi SK, Mukherjee A, Pokhrel SR, Hiran KK (2021) An intelligent and optimal resource allocation approach in sensor networks for smart Agri-IoT. IEEE Sens J 21(16):17439–17446. https://doi.org/10.1109/JSEN.2020.3020889

Santhanakrishnan M, Valarmathi K (2022) Load balancing techniques in cloud environment - a big picture analysis, p 310 https://doi.org/10.1109/ICCST55948.2022.10040387

Seth S, Singh N (2019) Dynamic heterogeneous shortest job first (DHSJF): a task scheduling approach for heterogeneous cloud computing systems. Int J Inf Technol Singap 11(4):653–657. https://doi.org/10.1007/s41870-018-0156-6

Shafiq DA, Jhanjhi N, Abdullah A (2019) Proposing a load balancing algorithm for the optimization of cloud computing applications. In: presented at the MACS 2019 - 13th International Conference on Mathematics, Actuarial Science, Computer Science and Statistics, Proceedings https://doi.org/10.1109/MACS48846.2019.9024785

Shahakar M, Mahajan S, Patil L (2023) Load balancing in distributed cloud computing: a reinforcement learning algorithms in heterogeneous environment. Int J Recent Innov Trends Comput Commun 11(2):65–74. https://doi.org/10.17762/ijritcc.v11i2.6130

Shakkeera L, Tamilselvan L (2016) QoS and load balancing aware task scheduling framework for mobile cloud computing environment. Int J Wirel Mob Comput 10(4):309–316. https://doi.org/10.1504/IJWMC.2016.078201

Sharif Z, Tang Jung L, Ayaz M, Yahya M, Pitafi S (2023) Priority-based task scheduling and resource allocation in edge computing for health monitoring system. J King Saud Univ Comput Inf Sci 35(2):544–559. https://doi.org/10.1016/j.jksuci.2023.01.001

Shetty S, Shetty S (2019) Analysis of load balancing in cloud data centers. J Ambient Intell Humaniz Comput 15:1–9. https://doi.org/10.1007/s12652-018-1106-7

Shuaib M et al (2023) An optimized, dynamic, and efficient load-balancing framework for resource management in the internet of things (IoT) environment. Electron SwiTz. https://doi.org/10.3390/electronics12051104

Souri A, Norouzi M, Alsenani Y (2024) A new cloud-based cyber-attack detection architecture for hyper-automation process in industrial internet of things. Clust Comput 27(3):3639–3655. https://doi.org/10.1007/s10586-023-04163-y

Sugan J, Isaac Sajan R (2024) PredictOptiCloud: A hybrid framework for predictive optimization in hybrid workload cloud task scheduling. Simul Model Pract Theory. https://doi.org/10.1016/j.simpat.2024.102946

Sultana Z, Gulmeher R, Sarwath A (2024) Methods for optimizing the assignment of cloud computing resources and the scheduling of related tasks. Indones J Electr Eng Comput Sci 33(2):1092–1099. https://doi.org/10.11591/ijeecs.v33.i2.pp1092-1099

Swarna Priya RM et al (2020) Load balancing of energy cloud using wind driven and firefly algorithms in internet of everything. J Parallel Distrib Comput 142:16–26. https://doi.org/10.1016/j.jpdc.2020.02.010

Swarup S, Shakshuki EM, Yasar A (2021) Task scheduling in cloud using deep reinforcement learning. In: presented at the Procedia Computer Science, pp 42–51 https://doi.org/10.1016/j.procs.2021.03.016

Talaat FM, Ali HA, Saraya MS, Saleh AI (2022) Effective scheduling algorithm for load balancing in fog environment using CNN and MPSO. Knowl Inf Syst 64(3):773–797. https://doi.org/10.1007/s10115-021-01649-2

Tawfeeg TM et al (2022) Cloud dynamic load balancing and reactive fault tolerance techniques: a systematic literature review (SLR). IEEE Access 10:71853–71873. https://doi.org/10.1109/ACCESS.2022.3188645

Teoh YK, Gill SS, Parlikad AK (2023) IoT and Fog-computing-based predictive maintenance model for effective asset management in industry 4.0 using machine learning. IEEE Internet Things J 10(3):2087–2094. https://doi.org/10.1109/JIOT.2021.3050441

Tong Z, Deng X, Chen H, Mei J (2021) DDMTS: a novel dynamic load balancing scheduling scheme under SLA constraints in cloud computing. J Parallel Distrib Comput 149:138–148. https://doi.org/10.1016/j.jpdc.2020.11.007

Tripathy SS et al (2023) State-of-the-art load balancing algorithms for mist-fog-cloud assisted paradigm: a review and future directions. Arch Comput Methods Eng 30(4):2725–2760. https://doi.org/10.1007/s11831-023-09885-1

Ullah A, Chakir A (2022) Improvement for tasks allocation system in VM for cloud datacenter using modified bat algorithm. Multimed Tools Appl 81(20):29443–29457. https://doi.org/10.1007/s11042-022-12904-1

Vasile M-A, Pop F, Tutueanu R-I, Cristea V, Kołodziej J (2015) Resource-aware hybrid scheduling algorithm in heterogeneous distributed computing. Future Gener Comput Syst 51:61–71. https://doi.org/10.1016/j.future.2014.11.019

Velpula P, Pamula R, Jain PK, Shaik A (2022) Heterogeneous load balancing using predictive load summarization. Wirel Pers Commun 125(2):1075–1093. https://doi.org/10.1007/s11277-022-09589-y

Vergara J, Botero J, Fletscher L (2023) A comprehensive survey on resource allocation strategies in fog/cloud environments. Sensors. https://doi.org/10.3390/s23094413

Verma R, Singh PD, Singh KD, Maurya S (2024) Dynamic load balancing in telemedicine using genetic algorithms and fog computing. In: presented at the AIP Conference Proceedings https://doi.org/10.1063/5.0223933

Walia R, Kansal L, Singh M, Kumar KS, Mastan Shareef RM, Talwar S (2023) Optimization of load balancing algorithm in cloud computing. In: presented at the 2023 3rd International Conference on Advance Computing and Innovative Technologies in Engineering, ICACITE 2023, pp 2802–2806 https://doi.org/10.1109/ICACITE57410.2023.10182878

Yong W, Xiaoling T, Qian H, Yuwen K (2016) A dynamic load balancing method of cloud-center based on SDN. China Commun 13(2):130–137. https://doi.org/10.1109/CC.2016.7405731

Zhan ZH, Zhang GY, Gong YJ, Zhang J (2014) Load balance aware genetic algorithm for task scheduling in cloud computing. Lecture Notes in Computer Science (including Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol 8886, pp 644–655 https://doi.org/10.1007/978-3-319-13563-2_54

Zhou X et al (2022) Intelligent small object detection for digital twin in smart manufacturing with industrial cyber-physical systems. IEEE Trans Ind Inform 18(2):1377–1386. https://doi.org/10.1109/TII.2021.3061419


Acknowledgements

We would like to express our gratitude and appreciation to Rabdan Academy Abu Dhabi UAE for their generous support and funding that made this research possible. Their contribution has been invaluable in enabling us to carry out this work to a high standard.

There is no funding associated with this work.

Author information

Authors and Affiliations

Department of Computer Science & Applications, Maharshi Dayanand University, Rohtak, Haryana, India

Nisha Devi & Sandeep Dalal

Department of CSE, UIET, Maharshi Dayanand University, Rohtak, Haryana, India

Kamna Solanki

Department of Computer Science and Engineering, Amity University Haryana, Gurugram, India

Surjeet Dalal

Department of Computer Science and Engineering, Galgotias University, Greater Noida, UP, India

Umesh Kumar Lilhore & Sarita Simaiya

Department of Spectrum Management, Afghanistan Telecommunication Regulatory Authority, Kabul, 2496300, Afghanistan

Nasratullah Nuristani


Contributions

SD & UKL: design and methods; KS & SSD: conclusion and review of the first draft; SS & NN: introduction and background; SSD & KS: results and analysis; NM & NN: discussion and review of the final draft; NN & SD: conceptualization and corresponding authors. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Surjeet Dalal or Nasratullah Nuristani .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


About this article

Devi, N., Dalal, S., Solanki, K. et al. A systematic literature review for load balancing and task scheduling techniques in cloud computing. Artif Intell Rev 57 , 276 (2024). https://doi.org/10.1007/s10462-024-10925-w


Accepted : 22 August 2024

Published : 05 September 2024

DOI : https://doi.org/10.1007/s10462-024-10925-w


  • Cloud computing
  • Task scheduling
  • Load balancing
  • Machine learning
  • Optimization techniques

IMAGES

  1. Types of face detection algorithms.

    literature review on face detection algorithms

  2. (PDF) A Review of Facial Feature Detection Algorithms

    literature review on face detection algorithms

  3. (PDF) Face Detection and Counting Algorithms Evaluation using OpenCV

    literature review on face detection algorithms

  4. (PDF) Review and comparison of face detection algorithms

    literature review on face detection algorithms

  5. (PDF) Face Recognition: A Literature Review

    literature review on face detection algorithms

  6. Face detection analysis.

    literature review on face detection algorithms

VIDEO

  1. Face Detection Comparison

  2. Unveiling the Secrets How to Detect Deception in Interrogations

  3. Bio Inspired Feature Selection Algorithms With Their Applications A Systematic Literature Review

  4. Session 7- Face Recognition Algorithms and implementation with Python

  5. Face Detection and Tracking using MTCNN and SORT

  6. Leveraging Artificial Intelligence for Literature Review

COMMENTS

  1. (PDF) Face Recognition: A Literature Review

    Face Recognition: A Literature Review. January 2005; 2(2):88-103; ... Description and limitations of face databases which are used to test the performance of these face recognition algorithms are ...

  2. Facial-recognition algorithms: A literature review

    Face recognition: a literature review. Int J Inf Commun Eng 2006; 2: 88-103. Google Scholar. 8. Thorat SB, Head SKN, Jyoti M, et al. Facial recognition technology: an analysis with scope in India. Int J Comput Sci Inf Secur 2010; 8: 325-330. ... Dave S. Comparison of face recognition algorithms and its subsequent impact on side face. In: ...

  3. Face recognition: Past, present and future (a review)☆

    We give an up-to-date, comprehensive and compact overview of the vast amount of work on image and video based face recognition in the literature including the image and video databases and evaluation methods. Approximately 300 papers, which were published between 1990s and the beginning of 2020 have been reviewed.

  4. A Systematic Literature Review on the Accuracy of Face Recognition

    According to [6] the facial recognition process considers. three phases: 1) Face detection - responsible for identifying. and locating the image as a human face; 2) Featu re. extraction - deals ...

  5. Human Face Detection Techniques: A Comprehensive Review and Future

    Many recent research papers on face detection are also available in the literature [5,6,7,8,9], which, closely related to our work, attempted to review face detection algorithms. Our study, on the other hand, has conducted a more thorough review with more technical details than these reviews and is multi-dimensional as shown in Table 1 .

  6. A review on face recognition systems: recent approaches and ...

    Face recognition is an efficient technique and one of the most preferred biometric modalities for the identification and verification of individuals as compared to voice, fingerprint, iris, retina eye scan, gait, ear and hand geometry. This has over the years necessitated researchers in both the academia and industry to come up with several face recognition techniques making it one of the most ...

  7. Facial-recognition algorithms: A literature review

    Review article Facial-recognition algorithms: A literature review Paramjit Kaur1, Kewal Krishan2, Suresh K. Sharma3 and Tanuj Kanchan4 Abstract The face is an important part of the human body, distinguishing individuals in large groups of people. Thus, because of its

  8. Face Recognition: From Traditional to Deep Learning Methods

    The first face recognition algorithms were developed in the early seventies [1], [2]. Since then, their accuracy ... component of a face recognition system and the focus of the literature review in Section II. Fig. 3: (a) Bounding boxes found by a face detector. (b) and (c) Aligned faces and reference points.

  9. DEEP LEARNING FOR FACE RECOGNITION: A CRITICAL ANALYSIS

    Current research in both face detection and recognition algorithms is focused on Deep ... this paper will review all relevant literature for the period from 2003-2018 focusing on the contribution of deep neural networks in drastically improving accuracy. Furthermore, it will

  10. A Systematic Literature Review of Face Recognition Algorithms

    A Systematic Literature Review on the Accuracy of Face Recognition Algorithms. M. A. Lazarini, R. Rossi and K. Hirama. Electrical Engineering and Computer Science Department of Inacian Educational Foundation (FEI), São Paulo, Brazil. Digital and Computer Systems Engineering Department of Polytechnic School of University of São Paulo ...

  11. Facial-recognition algorithms: A literature review

    This review presents the broad range of methods used for face recognition and attempts to discuss their advantages and disadvantages, and presents the possibilities and future implications for further advancing the field. The face is an important part of the human body, distinguishing individuals in large groups of people. Thus, because of its universality and uniqueness, it has become the ...

  12. Facial-recognition algorithms: A literature review

    Facial-recognition algorithms: A literature review Med Sci Law. 2020 Apr;60(2):131-139. doi: 10.1177/0025802419893168. ... computer-based facial recognition; criminalistics; human face; knowledge-based methods. Publication types Historical Article Research Support, Non-U.S. Gov't Review

  13. Systematic Literature Review on the Accuracy of Face Recognition Algorithms

    Therefore, this article seeks to identify the algorithms currently used by facial recognition systems through a Systematic Literature Review that considers recent scientific articles, published between 2018 and 2021. From the initial collection of ninety-three articles, a subset of thirteen was selected after applying the inclusion and ...

  14. Face Recognition by Humans and Machines: Three Fundamental Advances

    Face recognition algorithms from the 1990s and present-day DCNNs differ in accuracy for faces of different races (for a review, see Cavazos et al. 2020; for a comprehensive test of race bias in DCNNs, see Grother et al. 2019). Although training with faces of different races is often cited as a cause of race effects, it is unclear which training ...

  15. (PDF) Face Detection Techniques: A Review

    interaction etc. Face detection is a computer technology that determines the location and size of a human face in a digital image. Face detection has been a standout topic in the ...

  16. PDF Face Recognition Algorithms: A Review

    The engineering approaches were introduced in the 1980s, using simple measurements, like the distance between the eyes and the forms of the lines that connect facial features, for face recognition. The overall methods became very common in the 1990s with the famous method of Eigenfaces [3]. Feature-based approaches initially process the input image to ...
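The Eigenfaces method named in this snippet (principal component analysis over flattened face images) can be sketched in a few lines of NumPy. This is a minimal illustration on tiny synthetic "faces"; the function names and toy data are ours, not from the cited paper.

```python
import numpy as np

def eigenfaces(faces, k):
    """Top-k eigenfaces of a stack of flattened face images.

    faces: (n_samples, n_pixels) array. Returns (mean, components), where
    components is (k, n_pixels) with unit-norm rows (principal directions).
    """
    mean = faces.mean(axis=0)
    centered = faces - mean
    # SVD of the centered data yields the principal components directly,
    # avoiding the explicit n_pixels x n_pixels covariance matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def project(face, mean, components):
    """Represent a face by its k coefficients in eigenface space."""
    return components @ (face - mean)

# Toy data: 6 "faces" of 16 pixels each, lying near a 1-D subspace.
rng = np.random.default_rng(1)
direction = rng.normal(size=16)
coefs = np.linspace(1.0, 2.0, 6)
faces = np.outer(coefs, direction) + 0.01 * rng.normal(size=(6, 16))
mean, comps = eigenfaces(faces, k=2)
coeffs = project(faces[0], mean, comps)
recon = mean + coeffs @ comps  # reconstruction from just 2 coefficients
err = np.linalg.norm(recon - faces[0]) / np.linalg.norm(faces[0])
```

Recognition then reduces to comparing the low-dimensional coefficient vectors instead of raw pixels.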

  17. Face Recognition: A Literature Review

    An up-to-date review of major human face recognition research is provided, and a literature review of the most recent face recognition techniques is presented. The task of face recognition has been actively researched in recent years. This paper provides an up-to-date review of major human face recognition research. We first present an overview of face recognition and its applications. Then ...

  18. Face detection techniques: a review

    Face recognition and face detection by Lambda Labs With over 1000 calls per month in the free pricing tier, and only $0.0024 per extra API call, this API is a really affordable option for developers wanting to use a facial recognition API. EmoVu by Eyeris This API was created by Eyeris and it is a deep learning-based emotion recognition API ...

  19. Student attendance with face recognition (LBPH or CNN): Systematic

    This study analyzes the best algorithm in face recognition by conducting literature reviews and summarizes how researchers minimize errors when implementing face recognition. 2. Literature Review. Face recognition is a technology created by Woodrow Wilson Bledsoe in 1966 that works to match human faces through digital images or video ...
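LBPH, one of the two algorithms this study compares, describes a face by a histogram of Local Binary Pattern codes. Below is a simplified sketch of the basic 8-neighbour LBP operator in NumPy; it omits the circular sampling, grid cells and uniform-pattern refinements used in practical LBPH recognizers.

```python
import numpy as np

def lbp_image(gray):
    """Basic 8-neighbour Local Binary Pattern code per interior pixel.

    Each neighbour that is at least as bright as the centre contributes one
    bit, yielding a code in 0..255 that encodes local texture.
    """
    # Neighbour offsets in a fixed clockwise order from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = gray.shape
    centre = gray[1:h - 1, 1:w - 1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.uint8) << bit
    return codes

def lbph(gray, bins=256):
    """Normalized histogram of LBP codes: the descriptor compared between faces."""
    hist, _ = np.histogram(lbp_image(gray), bins=bins, range=(0, bins))
    return hist / hist.sum()

# On a perfectly flat patch every neighbour equals the centre,
# so every bit is set and every code is 255.
flat = np.full((5, 5), 7, dtype=np.uint8)
codes = lbp_image(flat)
desc = lbph(flat)
```

Two faces are then matched by comparing their histograms, e.g. with a chi-squared distance.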

  20. [2103.14983] Going Deeper Into Face Detection: A Survey

    Going Deeper Into Face Detection: A Survey. Face detection is a crucial first step in many facial recognition and face analysis systems. Early approaches for face detection were mainly based on classifiers built on top of hand-crafted features extracted from local image regions, such as Haar Cascades and Histogram of Oriented Gradients. However ...
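The hand-crafted features this survey names lend themselves to a short sketch: Haar-like features owe their speed to the integral image, which reduces any rectangle sum to four table lookups. Below is a minimal illustration of a two-rectangle feature; it is a simplified stand-in, not the full Viola-Jones cascade.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y, x, h, w):
    """Sum of the h-by-w rectangle at (y, x) in O(1) via four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect(ii, y, x, h, w):
    """Two-rectangle Haar-like feature: left-half sum minus right-half sum."""
    half = w // 2
    return box_sum(ii, y, x, h, half) - box_sum(ii, y, x + half, h, half)

# A patch that is dark on the left and bright on the right responds strongly.
patch = np.zeros((4, 4))
patch[:, 2:] = 10.0
ii = integral_image(patch)
val = haar_two_rect(ii, 0, 0, 4, 4)  # left sum 0 minus right sum 80
```

A cascade evaluates thousands of such features per window, which is only feasible because each one costs a handful of lookups.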

  21. PDF Face Recognition: A Literature Review

    Face recognition is a crucial and rapidly evolving field within computer vision and artificial intelligence. Various algorithms have been implemented and face recognition applications have been developed. Over the past decade, there have been significant advancements in both the accuracy and applicability of face recognition systems.

  22. IoT-MFaceNet: Internet-of-Things-Based Face Recognition Using

    Significant advancements are being achieved in biometric security, surveillance, and personalized user experiences. This is attributed to the intricate algorithms present in deep-learning models, resulting in a substantial enhancement of our ability to comprehend facial features [11,12,13]. Facial recognition technology powered by deep-learning algorithms is integrated into IoT systems, thereby ...

  23. The future of skin cancer diagnosis: a comprehensive systematic

    1.2.2. Search criteria. This systematic literature review, covering the past decade, focuses on advancements in skin cancer classification using ML, DL, and other techniques, aiming to provide a comprehensive overview of the current state of the field and potential solutions for this critical and timely issue.

  24. A review of the development of YOLO object detection algorithm

    A thorough performance evaluation of the YOLO-series algorithms is offered to identify potential areas for future improvement, to advance YOLO technology, and to address some of the challenges faced by the YOLO algorithms, along with potential future research directions. The You Only Look Once (YOLO) algorithm series, as the forefront of object detection technology, has evolved from YOLOv1 to YOLOv10 ...

  25. Systematic Literature Review on the Accuracy of Face Recognition Algorithms

    EAI Endorsed Transactions on Internet of Things Research Article A Systematic Literature Review on the Accuracy of Face Recognition Algorithms M. A. Lazarini1, R. Rossi2,* and K. Hirama2 1 Electrical Engineering and Computer Science Department of Inacian Educational Foundation (FEI), São Paulo, Brazil 2 Digital and Computer Systems Engineering ...

  26. A Review of Existing Face Detection & Recognition Algorithms and The

    2.1.2 Fisherface ... Fisherface is one of the popular algorithms used in face recognition, and is widely believed to be superior to other techniques, such as eigenface ...
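The Fisherface comparison above rests on Fisher's linear discriminant, which seeks the projection that maximizes between-class separation relative to within-class scatter (in practice applied after a PCA step so the scatter matrix is invertible). Below is a minimal two-class NumPy sketch; the small ridge term stands in for that PCA step, and the toy data are ours.

```python
import numpy as np

def fisher_direction(class_a, class_b):
    """Two-class Fisher discriminant direction: w ∝ Sw^{-1} (mu_a - mu_b).

    Sw is the pooled within-class scatter; the returned unit vector is the
    projection along which the two classes are best separated.
    """
    mu_a, mu_b = class_a.mean(axis=0), class_b.mean(axis=0)
    sw = np.cov(class_a, rowvar=False) + np.cov(class_b, rowvar=False)
    # Small ridge keeps Sw invertible (Fisherfaces use PCA first for this).
    w = np.linalg.solve(sw + 1e-6 * np.eye(sw.shape[0]), mu_a - mu_b)
    return w / np.linalg.norm(w)

# Toy data: two well-separated 2-D Gaussian classes.
rng = np.random.default_rng(2)
a = rng.normal(loc=[2.0, 0.0], scale=0.3, size=(50, 2))
b = rng.normal(loc=[-2.0, 0.0], scale=0.3, size=(50, 2))
w = fisher_direction(a, b)
```

Unlike Eigenfaces, which keeps the directions of largest overall variance, this criterion keeps the direction most useful for telling the classes apart.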

  27. Applications of GANs to Aid Target Detection in SAR Operations: A

    Considering the applicability of GAN networks for image enhancement, we conducted a systematic literature review focused on the utilization of GAN algorithms for improvements in target detection in images captured by UAVs, aiming to gain insights into the techniques and metrics employed in this task and potential adaptations for search and ...

  28. Optimized deep CNN for detection and classification of diabetic

    While some literature utilizes various optimization techniques, such as Genetic Algorithms or Harris Hawks Optimization, this paper uses the Chicken Swarm Algorithm (CSA) to optimize the deep CNN model, which is unique. The paper combines several techniques, including DWT for preprocessing, AGF for feature extraction, and RF for feature selection.

  29. A systematic literature review for load balancing and task scheduling

    Cloud computing is an emerging technology composed of several key components that work together to create a seamless network of interconnected devices. These interconnected devices, such as sensors, routers, smartphones, and smart appliances, are the foundation of the Internet of Everything (IoE). Huge volumes of data generated by IoE devices are processed and accumulated in the cloud ...