
A Comprehensive Survey of Loss Functions in Machine Learning

  • Annals of Data Science 9(5500)

Yue Ma, Zhengzhou University

Yingjie Tian, Chinese Academy of Sciences

  • DOI: 10.48550/arXiv.2312.05391
  • Corpus ID: 266162529

Loss Functions in the Era of Semantic Segmentation: A Survey and Outlook

  • Reza Azad , Moein Heidary , +5 authors D. Merhof
  • Published in arXiv.org 8 December 2023
  • Computer Science


PLOS ONE

Novel loss functions for ensemble-based medical image classification

Sivaramakrishnan Rajaraman

National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America

Ghada Zamzmi

Sameer K. Antani

Associated Data

Information on the datasets used in this study can be found within the article in the Materials and methods section, under the heading Datasets. The data are third-party data and were accessed as described in the article. The authors had no special access privileges.

Medical images commonly exhibit multiple abnormalities. Predicting them requires multi-class classifiers whose training and reliable performance can be affected by a combination of factors, such as dataset size, data source, distribution, and the loss function used to train deep neural networks. Currently, the cross-entropy loss remains the de facto loss function for training deep learning classifiers. This loss function, however, asserts equal learning from all classes, leading to a bias toward the majority class. Although the choice of the loss function impacts model performance, to the best of our knowledge, no literature exists that performs a comprehensive analysis and selection of an appropriate loss function for the classification task under study. In this work, we benchmark various state-of-the-art loss functions, critically analyze model performance, and propose improved loss functions for a multi-class classification task. We select a pediatric chest X-ray (CXR) dataset that includes images with no abnormality (normal), and those exhibiting manifestations consistent with bacterial and viral pneumonia. We construct prediction-level and model-level ensembles to improve classification performance. Our results show that, compared to the individual models and the state-of-the-art literature, the weighted averaging of the predictions for top-3 and top-5 model-level ensembles delivered significantly superior classification performance (p < 0.05) in terms of the Matthews correlation coefficient (MCC) metric (0.9068; 95% confidence interval: 0.8839, 0.9297). Finally, we performed localization studies to interpret model behavior and confirm that the individual models and ensembles learned task-specific features and highlighted disease-specific regions of interest. The code is available at https://github.com/sivaramakrishnan-rajaraman/multiloss_ensemble_models .

Introduction

Deep learning (DL) has demonstrated superior performance in natural and medical computer vision tasks. Computer-aided diagnostic tools developed with DL models have been widely used in analyzing medical images, including chest X-rays (CXRs) and computerized tomography (CT). CXRs have been studied extensively, where the models are used to predict manifestations of cardiopulmonary diseases such as pneumonia opacities, pneumothorax, cardiomegaly, tuberculosis (TB), lung nodules, and, more recently, COVID-19 [1, 2]. Such tools are extremely helpful, particularly in resource-constrained regions where expert radiologists are scarce.

The DL model parameters are iteratively modified to minimize the training error using optimization methods such as stochastic gradient descent. This error is computed using a loss function, also called a cost function, that maps model predictions to their associated costs. Cross-entropy loss is the most commonly used loss function in medical image classification tasks, including CXRs [3–7]. This loss function takes the predicted class probabilities, each between 0 and 1, and returns a non-negative value, where high values indicate high disagreement of the predicted class with the ground truth label. In class-imbalanced medical image classification tasks, training a model to minimize the cross-entropy loss might lead to biased learning since (i) the loss assigns equal weights to all the classes, and (ii) the model would predict the majority of test samples as belonging to the dominant normal class. To mitigate these issues, the authors of [8] proposed a loss function, called focal loss, for object detection tasks. Here, the standard cross-entropy loss function is modified to down-weight the majority background class so the model would focus on learning the minority object samples. Following this study, the focal loss function has been used in several medical image classification studies. For example, the authors of [9] trained DL models to minimize the focal loss and improve pulmonary nodule detection and classification performance using CT scans. They observed that the model trained with the focal loss resulted in superior performance with 97.2% accuracy and 96.0% sensitivity. Another study [10] used the focal loss to train the models toward classifying CXRs into normal, bacterial pneumonia, viral pneumonia, or COVID-19 categories. It was observed that the models trained with the focal loss outperformed other models by demonstrating superior values for precision (78.33%), recall (86.09%), and F-score (81.68%). Aside from these studies, the literature does not have a comprehensive study that investigates the effects of loss functions on medical image classification, particularly CXRs.

DL models learn a mapping function through error backpropagation and update model weights to minimize error. They can vary in their architecture, hyper-parameters, and training strategy, thereby resulting in varying degrees of bias and variance errors. Ensemble learning, a paradigm of machine learning, helps to (i) reduce prediction variance and achieve improved performance over any individual constituent model, and (ii) increase robustness by reducing the range (spread) of the predictions. There are several ensemble methods reported in the literature including majority voting, simple averaging, weighted averaging, and stacking, among others [ 11 ]. Ensemble models have been widely used in medical image classification tasks including CXRs [ 2 , 7 , 12 – 16 ]. However, these studies trained ensemble models to minimize the de-facto cross-entropy loss in their respective classification tasks. To the best of our knowledge, we observed that no studies reported evaluations on the performance of ensemble DL models trained with other loss functions toward improving classification performance.

In this study, we aim to demonstrate the benefits of (i) training DL classification models using existing and proposed loss functions and (ii) constructing model ensembles to improve performance in a multi-class classification task that classifies pediatric CXRs as showing normal lungs, bacterial pneumonia, or viral pneumonia manifestations. This systematic study is performed as follows. First, we train an EfficientNet-B0-based U-Net model on a collection of CXRs and their associated lung masks [ 17 ] to segment lungs in the pediatric pneumonia CXR collection [ 6 ]. Lung segmentation helps to exclude irrelevant image regions and learn lung region-specific features. We select the EfficientNet-B0-based model because it delivered state-of-the-art (SOTA) performance in ImageNet classification tasks, with reduced computational complexity [ 18 ]. Next, the encoder from the trained EfficientNet-B0-based U-Net model is truncated and appended with classification layers. This is done to transfer CXR modality-specific knowledge for improving performance in the task of classifying CXRs in the pediatric pneumonia CXR dataset into normal, bacterial pneumonia, or viral pneumonia categories. Finally, the top-K (K = 3, 5) performing models are used to construct prediction-level and model-level ensembles. The performance of the individual models, prediction-level, and model-level ensembles are further analyzed for statistical significance. We also performed localization studies to ensure that the individual models and their ensembles learned task-specific features and highlighted the disease-manifested regions of interest (ROIs) in the CXRs.

Materials and methods

This retrospective study uses the following two datasets:

  • Montgomery TB CXRs [ 19 ]: This is a publicly available collection of 58 CXRs showing TB-related manifestations and radiologist readings and 80 CXRs showing lungs with no findings. The images and their associated lung masks are deidentified and exempted from the National Institutes of Health (NIH) IRB review (OHSRP#5357). We use this as an independent test set to evaluate the segmentation model proposed in this study.
  • Pediatric pneumonia [ 6 ]: A set of 4273 CXRs showing lungs infected with bacterial and viral pneumonia and 1583 CXRs showing normal lungs are collected from children of 1 to 5 years of age at the Guangzhou Medical Center in China. The author-defined [ 6 ] training set contains 1349, 2538, and 1345 CXRs and the test set contains 234, 242, and 148 CXRs showing normal lungs, bacterial pneumonia, and viral pneumonia manifestations, respectively. The CXRs are acquired as a part of routine clinical care, curated by expert radiologists, and made publicly available with IRB approvals. We use this dataset toward classifying CXRs as showing normal lungs, bacterial pneumonia, or viral pneumonia manifestations.
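The split sizes quoted above can be cross-checked with a few lines of Python. This is only an arithmetic sketch; the dictionary keys and the helper name `class_fractions` are illustrative and do not come from the original study.

```python
# Class counts copied from the dataset description above.
TRAIN = {"normal": 1349, "bacterial": 2538, "viral": 1345}
TEST = {"normal": 234, "bacterial": 242, "viral": 148}


def class_fractions(counts):
    """Return each class's share of the images in one split."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Totals implied by the splits: pneumonia = bacterial + viral over both splits.
pneumonia_total = sum(s["bacterial"] + s["viral"] for s in (TRAIN, TEST))
normal_total = TRAIN["normal"] + TEST["normal"]

if __name__ == "__main__":
    print("pneumonia:", pneumonia_total, "normal:", normal_total)
    for name, frac in class_fractions(TRAIN).items():
        print(f"train {name}: {frac:.3f}")
```

The totals agree with the 4273 pneumonia and 1583 normal CXRs stated in the text, and the training fractions make the class imbalance explicit.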

Lung segmentation and cropping

As CXR images contain irrelevant regions that do not help in learning classification task-specific features, we segmented the ROI, i.e., the lungs from the CXRs, and used the lung-segmented images for training the classification models. Our review of the literature reveals that U-Net [ 20 ] is widely used for segmenting ROIs in natural and medical images. Further, the study of the literature shows that EfficientNet [ 18 ] models have achieved superior performance in natural and medical computer vision tasks, as compared to other models, in terms of accuracy, efficiency, and computational complexity. Hence, we used an EfficientNet-B0-based U-Net model [ 21 ] to perform pixel-wise segmentation. The EfficientNet-B0-based U-Net model is trained using the CXR collection and their associated lung masks discussed in [ 17 ] to minimize the following loss functions: (i) Binary cross-entropy (BCE), (ii) Weighted BCE-Dice [ 2 ], (iii) Focal [ 8 ], (iv) Tversky [ 22 ], and (v) Focal Tversky [ 23 ]. We used 10% of the training data for validation with a fixed seed. Each mini-batch of the training data is augmented using random affine transformations such as pixel shifting [-2 +2], horizontal flipping, and rotations [-5 +5] to introduce variability into the training process. The model is trained using an Adam optimizer with an initial learning rate of 1e-3. The learning rate is reduced whenever the validation loss ceased to improve. The model demonstrating the least validation loss is used to predict lung masks of a reduced 512×512 pixel resolution for the CXRs in the Montgomery TB CXR collection. The images are resized using bicubic interpolation from the OpenCV software library. The performance of the segmentation models is evaluated using the following metrics: (i) Segmentation accuracy; (ii) Dice coefficient, and (iii) Intersection over union (IoU). 
We selected the top-3 segmentation models from those trained using the aforementioned loss functions based on segmentation accuracy, Dice coefficient, and IoU metrics. The selected models are used to predict the lung masks for the CXRs in the Montgomery CXR collection. These masks are then bitwise-ANDed to produce the final lung mask. The bitwise-AND operation compares each pixel across the masks predicted by the top-3 performing models. Only if the pixel is 1 in all three masks, i.e., every model assigns it to the lung ROI, is the corresponding pixel in the final mask set to 1; otherwise, it is set to 0. The final lung mask is then overlaid on the original CXR image to delineate the lung boundaries, and the bounding box containing the lung pixels is cropped. The resulting lung-cropped image is resized to 512×512 pixel resolution. Then, the cropped CXRs are contrast-enhanced by saturating the top and bottom 1% of all the image pixels, followed by normalizing the pixels to the range [0, 1]. Fig 1 shows the diagram of the segmentation module proposed in this study.
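The mask combination and cropping steps can be sketched in plain Python. This is a minimal illustration, assuming masks and images are nested lists of equal size; the function names are hypothetical, and the resizing and contrast-enhancement steps are omitted.

```python
def bitwise_and_masks(masks):
    """Pixel-wise AND of equally sized binary masks (lists of rows of 0/1)."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[int(all(m[r][c] == 1 for m in masks)) for c in range(cols)]
            for r in range(rows)]


def bounding_box(mask):
    """Return (top, bottom, left, right), inclusive, of the pixels set to 1."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None  # empty mask: nothing to crop
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), max(rows), min(cols), max(cols)


def crop(image, box):
    """Cut the bounding box out of an image stored as a list of rows."""
    top, bottom, left, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


if __name__ == "__main__":
    m1 = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
    m2 = [[0, 1, 1], [1, 1, 1], [0, 0, 0]]
    m3 = [[0, 1, 0], [0, 1, 1], [0, 0, 1]]
    final = bitwise_and_masks([m1, m2, m3])
    image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
    print(final)
    print(crop(image, bounding_box(final)))
```

Because a pixel survives only when every model agrees, the AND ensemble is conservative: it can shrink the lung region but never grow it beyond any single model's prediction.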

Fig 1. The U-Net constructed with an EfficientNet-B0-based encoder and symmetrical decoder is trained to minimize the following losses: (i) BCE; (ii) Weighted BCE-Dice, (iii) Focal, (iv) Tversky, and (v) Focal Tversky. The trained models predict lung masks in the Montgomery TB CXR collection. The predictions of the top-3 performing models are bitwise-ANDed to produce the final lung mask.

Classification module

The encoder from the trained EfficientNet-B0-based U-Net model is truncated at the ‘block5c_add’ layer (TensorFlow Keras naming convention) with feature map dimensions of [16, 16, 512]. This approach is followed to transfer CXR modality-specific knowledge to improve performance in the current CXR classification task. The truncated model is appended with the following layers: (i) a zero-padding (ZP) layer; (ii) a convolutional layer with 512 filters, each of size 3×3; (iii) a global average pooling (GAP) layer; and (iv) a final dense layer with three neurons and Softmax activation, to classify the pediatric CXRs as showing normal lungs, bacterial pneumonia, or viral pneumonia manifestations.

We used the train and test splits published in [ 6 ] to compare our model performance with the SOTA literature [ 6 , 24 ]. We allocated 10% of the training data for validation with a fixed seed. The model is trained using a stochastic gradient descent optimizer with an initial learning rate of 1e-3 and momentum of 0.9, to minimize the loss functions discussed in this study. The best-performing model is selected based on the least loss obtained with the validation data. These models are evaluated with the test set, and the performance is recorded in terms of the following metrics: (a) accuracy; (b) AUROC; (c) area under the precision-recall curve (AUPRC); (d) precision; (e) recall; (f) F-score; and (g) MCC.

The top-K (K = 3, 5) models that deliver superior performance with the test set are used to construct the ensembles. We constructed prediction-level and model-level ensembles. At the prediction level, the models’ predictions are combined using various ensemble strategies such as majority voting, simple averaging, weighted averaging, and stacking. In a majority voting ensemble, the most voted predictions are considered final for classifying CXRs to their respective classes. In a simple averaging ensemble, the individual model predictions are averaged to generate the final prediction. For the weighted averaging ensemble, we propose to optimize the weights that minimize the total logarithmic loss so that the predicted labels converge to the target labels. We iteratively minimized the logarithmic loss using the Sequential Least-Squares Programming (SLSQP) algorithm [ 25 ]. In a stacking ensemble, the predictions are fed into a meta-learner that consists of a single hidden layer with 9 and 15 neurons respectively, for the top-3 and top-5 performing models. The weights of the top-K models are frozen and only the meta-learner is trained to optimally combine the models’ predictions. A dense layer with three neurons and Softmax activation is appended to output prediction probabilities. Fig 2 shows the classification and ensemble frameworks proposed in this study.
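The prediction-level strategies described above can be illustrated with a short, standard-library-only sketch. It covers majority voting, simple and weighted averaging, and a search for weights that minimize the logarithmic loss. Note the substitution: the study uses SLSQP, whereas this sketch uses a coarse grid search over the weight simplex purely as a stand-in. All function names and the toy probabilities are hypothetical, and stacking is not shown.

```python
import math
from collections import Counter
from itertools import product


def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def majority_vote(model_probs):
    """model_probs[m][i] is model m's probability vector for sample i."""
    labels = []
    for i in range(len(model_probs[0])):
        votes = Counter(argmax(probs[i]) for probs in model_probs)
        labels.append(votes.most_common(1)[0][0])  # ties: first label seen wins
    return labels


def weighted_average(model_probs, weights):
    """Weighted mean of the models' probability vectors, per sample."""
    n, k = len(model_probs[0]), len(model_probs[0][0])
    return [[sum(w * probs[i][c] for w, probs in zip(weights, model_probs))
             for c in range(k)] for i in range(n)]


def log_loss(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true labels."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def grid_search_weights(model_probs, labels, steps=12):
    """Coarse simplex search; a stand-in for SLSQP, not the study's optimizer."""
    best_w, best_loss = None, float("inf")
    for combo in product(range(steps + 1), repeat=len(model_probs)):
        if sum(combo) != steps:
            continue  # keep only non-negative weights that sum to 1
        w = [c / steps for c in combo]
        loss = log_loss(weighted_average(model_probs, w), labels)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss


if __name__ == "__main__":
    models = [
        [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]],
        [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.5, 0.3, 0.2]],
        [[0.2, 0.5, 0.3], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]],
    ]
    labels = [0, 1, 2]
    print(majority_vote(models))
    print(grid_search_weights(models, labels))
```

Simple averaging is the special case of equal weights, so with a grid that contains the equal-weight point the searched weights can never give a higher log loss than simple averaging on the data used for the search.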

Fig 2. The EfficientNet-B0-based encoder is truncated at the block-5c-add layer and appended with the classification layers to output multi-class prediction probabilities. GAP denotes the global average pooling layer and DCL denotes the deepest convolutional layer in the trained models. The classification model is trained to minimize the various loss functions discussed in this study. The top-K (K = 3, 5) performing models are used to construct prediction-level and model-level ensembles.

For the model level ensemble, the top-K models are instantiated with their trained weights and truncated at their deepest convolutional layer. The features from these layers are concatenated and appended with a 1×1 convolutional layer, to reduce feature dimensions. This is followed by appending a GAP layer and a dense layer with three neurons and Softmax activation to classify the CXRs as showing normal lungs, bacterial pneumonia, or viral pneumonia manifestations. The performance of the individual models, prediction-level ensembles, and model-level ensembles are further compared for statistical significance. All the models are trained and evaluated using Tensorflow Keras 2.4 on a Windows system with an Intel Xeon 3.80 GHz CPU, NVIDIA GeForce GTX 1050 Ti GPU, and CUDA dependencies for GPU acceleration. Statistical significance analysis is performed using R software version 4.1.1.

Classification losses

We experimented with the following loss functions to provide a comprehensive evaluation of their impact on the multi-class classification task under study: (i) Categorical cross-entropy (CCE) loss; (ii) Categorical focal loss [ 8 ]; (iii) Kullback-Leibler (KL) divergence loss [ 26 ]; (iv) Categorical Hinge loss [ 27 ]; (v) Label-smoothed CCE loss [ 28 ]; (vi) Label-smoothed categorical focal loss [ 28 ], and (vii) Calibrated CCE loss [ 29 ]. We also propose several loss functions, as follows, that mitigate the issues with the existing loss functions when applied to the multi-class classification task under study: (i) CCE loss with entropy-based regularization; (ii) Calibrated negative entropy loss, (iii) Calibrated KL divergence loss; (iv) Calibrated categorical focal loss, and (v) Calibrated categorical Hinge loss. The details of the proposed loss functions are discussed below.

(i) CCE with entropy-based regularization

DL models demonstrate low entropy values for the output distributions when they are confident about their predictions [29]. However, under class-imbalanced training conditions, the models might be overconfident about the majority class and classify most of the samples as belonging to this dominant class. This may lead to model overfitting and adversely impact generalization performance. Under these circumstances, a penalty could be introduced in the form of a regularization term that penalizes peaked distributions, thereby reducing overfitting and improving generalization. A model produces a conditional distribution p_Ω(y | x) through the Softmax function, over a set of classes y given an input x. The entropy of this conditional distribution is given by,

Here, H denotes the entropy term. A regularization term is proposed where the negative entropy is added to the negative log-likelihood to penalize over-confident output distributions. It is given by,

Here, β controls the intensity of the penalty. Through empirical evaluations, we set the value of β = 2. We used this regularization term in the final dense layer as an activity regularizer and trained the model to minimize the CCE loss.
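As a minimal sketch of the idea in this subsection, the code below computes the Shannon entropy of a softmax output and subtracts β times that entropy from the negative log-likelihood. It assumes the standard entropy definition, H(p) = −Σ p_i log p_i, and a per-sample loss of the form −log p(y | x) − β·H(p); the function names are hypothetical and this is not the authors' Keras activity regularizer.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def penalized_nll(p, label, beta=2.0):
    """Negative log-likelihood plus beta times the negative entropy (assumed form)."""
    return -math.log(p[label]) - beta * entropy(p)


if __name__ == "__main__":
    peaked = [0.98, 0.01, 0.01]
    softer = [0.80, 0.10, 0.10]
    print(entropy(peaked), entropy(softer))
    print(penalized_nll(peaked, 0), penalized_nll(softer, 0))
```

With β = 2, the softer distribution receives the lower penalized loss even though its plain negative log-likelihood is higher, which is the intended discouragement of peaked outputs.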

(ii) Calibrated negative entropy loss

We propose an entropy-based loss function where the negative entropy is added as an auxiliary term to the negative log-likelihood term as shown in Eqs [ 1 ] and [ 2 ] to penalize over-confident output distributions. A model is said to demonstrate poor calibration if it is overconfident or underconfident about its predictions and would not reflect the true occurrence likelihood of the class events. Motivated by [ 29 ], we propose to add a regularization term that computes the difference between the accuracy and the predicted probabilities to the entropy-based loss function. This regularization term helps to penalize the model when the entropy-based loss function reduces without a corresponding change in the accuracy. The regularization term forces the accuracy to match the average predicted probabilities, thereby (i) acting as a smoothing parameter that smoothens overconfident or underconfident predictions and (ii) pushing the model to converge to the ideal condition when the accuracy would reflect the true occurrence likelihood. The calibrated negative entropy loss is given by,

Here, β controls the penalty intensity. The auxiliary term difference is calculated for each mini-batch, as given by,

Here, y′_k denotes the predicted label. The value of c_k is 1 if y′_k = y_k; otherwise, c_k is 0. This auxiliary term forces the average value of the predicted probabilities to match the accuracy over all training examples. This pushes the model closer to the ideal situation, where the model accuracy would reflect the true occurrence likelihood of the samples. The auxiliary term serves as a smoothing parameter for predictions with extremely low or high prediction confidences. We tested different values for β = [0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 2] and λ = [0.5, 1, 2, 5, 10, 15, 20]. After empirical evaluations, we set β = 0.001 and λ = 10.
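The auxiliary term can be sketched as the absolute difference between mini-batch accuracy and mean prediction confidence. Two assumptions are made here and are not confirmed by the text: the "predicted probability" of a sample is taken to be its largest softmax output, and the full loss is taken to be mean NLL − β·(mean entropy) + λ·(auxiliary term). Function names are hypothetical.

```python
import math


def calibration_gap(probs, labels):
    """|accuracy - mean confidence| over one mini-batch.

    Assumption: a sample's confidence is its largest predicted probability.
    """
    n = len(labels)
    correct = sum(int(max(range(len(p)), key=p.__getitem__) == y)
                  for p, y in zip(probs, labels))
    confidence = sum(max(p) for p in probs)
    return abs(correct / n - confidence / n)


def calibrated_negative_entropy_loss(probs, labels, beta=1e-3, lam=10.0, eps=1e-12):
    """Mean NLL minus beta * mean entropy plus lam * calibration gap (assumed form)."""
    n = len(labels)
    nll = -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / n
    ent = -sum(pi * math.log(pi) for p in probs for pi in p if pi > 0) / n
    return nll - beta * ent + lam * calibration_gap(probs, labels)


if __name__ == "__main__":
    batch = [[0.9, 0.05, 0.05], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1]]
    labels = [0, 1, 1, 0]
    print(calibration_gap(batch, labels))
    print(calibrated_negative_entropy_loss(batch, labels))
```

In the toy batch, three of four predictions are correct (accuracy 0.75) while the mean confidence is 0.675, so the gap is 0.075 and the model is slightly underconfident.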

(iii) Calibrated KL divergence loss

The KL divergence, also called relative entropy, measures the difference between the observed and actual probability distributions. The KL divergence between two distributions A ( x ) and B ( x ) is given by,

We propose to benefit from the regularization term mentioned in Eq [ 4 ] to smoothen model predictions when trained to minimize the KL divergence loss. We propose the calibrated KL divergence loss where the regularization term in Eq [ 4 ] is added to the KL divergence loss. This is done to penalize the model when the KL divergence loss reduces without a corresponding change in the accuracy. The calibrated KL divergence loss is given by,

The auxiliary term difference is calculated for each mini-batch and is given by Eq [ 4 ]. We tested with different weights for λ = [0.5, 1, 2, 5, 10, 15, 20]. After empirical evaluations, the value of λ is set to 1.
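A small sketch of the KL divergence and its calibrated variant follows. It uses the standard definition D_KL(A ‖ B) = Σ_x A(x) log(A(x)/B(x)) and assumes, as an illustration only, that the calibrated loss is the mean KL divergence plus λ times the accuracy-versus-confidence gap. Function names are hypothetical.

```python
import math


def kl_divergence(a, b, eps=1e-12):
    """D_KL(A || B) = sum of A(x) * log(A(x) / B(x)); terms with A(x) = 0 are skipped."""
    return sum(ai * math.log(ai / max(bi, eps)) for ai, bi in zip(a, b) if ai > 0)


def calibration_gap(targets, probs):
    """|accuracy - mean top probability| over one mini-batch (assumed form)."""
    n = len(targets)
    hits = sum(int(t.index(max(t)) == p.index(max(p))) for t, p in zip(targets, probs))
    return abs(hits / n - sum(max(p) for p in probs) / n)


def calibrated_kl_loss(targets, probs, lam=1.0):
    """Mean KL divergence between targets and predictions plus lam * calibration gap."""
    mean_kl = sum(kl_divergence(t, p) for t, p in zip(targets, probs)) / len(targets)
    return mean_kl + lam * calibration_gap(targets, probs)


if __name__ == "__main__":
    targets = [[1, 0, 0], [0, 1, 0]]
    probs = [[0.8, 0.1, 0.1], [0.3, 0.6, 0.1]]
    print(kl_divergence(targets[0], probs[0]))
    print(calibrated_kl_loss(targets, probs))
```

With one-hot targets the KL divergence reduces to −log of the probability assigned to the true class, which is the same value the CCE loss gives for that sample.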

(iv) Calibrated categorical focal loss

The principal limitation of CCE loss is that the loss asserts equal learning from all the classes. This adversely impacts training and classification performance during class-imbalanced training. This holds for medical images, particularly CXRs, where a class imbalance exists between the majority normal class and other minority disease classes. In this regard, the authors of [ 8 ] proposed the focal loss for object detection tasks, in which the standard cross-entropy loss function is modified to down weight the majority class so that the model would focus on learning the minority classes. In a multi-class classification setting, the categorical focal loss is given by,

Here, K = 3 denotes the number of classes, k ∈ {0, 1, K − 1} denotes the class labels for the bacterial pneumonia, normal, and viral pneumonia classes, respectively, and p = (p′_0, p′_1, p′_2) ∈ [0, 1]^3 is a vector representing an estimated probability distribution over the three classes. The value γ denotes the rate at which the easy samples are down-weighted. The categorical focal loss reduces to the CCE loss at γ = 0. We propose the calibrated categorical focal loss, where the difference between the accuracy and predicted probabilities is added as a regularization term to penalize the model for overconfident and underconfident predictions when trained to minimize the categorical focal loss. The calibrated categorical focal loss is given by,

The auxiliary term difference is calculated for each mini-batch and is given by Eq [4]. We tested with different weights for γ = [0.5, 1, 2, 5] and λ = [0.5, 1, 2, 5, 10, 15, 20]. After empirical evaluations, the values of γ and λ are both set to 1.
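The down-weighting behavior of the focal loss can be checked numerically. The sketch below assumes the common multi-class form, −Σ_k y_k (1 − p′_k)^γ log p′_k, without a class-balancing α factor; whether the study uses an α term is not stated in this section. The function name is hypothetical.

```python
import math


def categorical_focal_loss(probs, y_onehot, gamma=1.0, eps=1e-12):
    """Per-sample focal loss: -sum_k y_k * (1 - p_k)^gamma * log(p_k) (assumed form)."""
    return -sum(yk * (1.0 - pk) ** gamma * math.log(max(pk, eps))
                for yk, pk in zip(y_onehot, probs) if yk > 0)


if __name__ == "__main__":
    easy = [0.9, 0.05, 0.05]   # confident and correct
    hard = [0.3, 0.5, 0.2]     # true class gets low probability
    y = [1, 0, 0]
    for gamma in (0.0, 1.0, 2.0):
        print(gamma, categorical_focal_loss(easy, y, gamma),
              categorical_focal_loss(hard, y, gamma))
```

At γ = 0 the modulating factor equals 1 and the value matches the CCE loss; as γ grows, the loss of the easy sample shrinks much faster than that of the hard sample.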

(v) Calibrated categorical Hinge loss

The Hinge loss is widely used in binary classification problems to produce “maximum-margin” classification [ 27 ], particularly with SVM classifiers. This loss could be used in a multi-class classification setting and is given by,

Here, y_true and y_pred denote the ground truth one-hot encoded labels and predictions, respectively. We propose the calibrated categorical Hinge loss, where the difference between the accuracy and predicted probabilities is added as an auxiliary term to the categorical Hinge loss. This auxiliary term penalizes the model when the categorical Hinge loss reduces without a corresponding change in the accuracy. The calibrated categorical Hinge loss is given by,

The negative and positive terms are given by Eqs [ 10 ] and [ 11 ]. The auxiliary term difference is calculated for each mini-batch and is given by Eq [ 4 ]. We tested with different weights for λ = [0.5, 1, 2, 5, 10, 15, 20]. After empirical evaluations, the value of λ is set to 10.
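For reference, the categorical Hinge loss with its "negative" and "positive" terms can be sketched as follows. The form used here, max(0, neg − pos + 1) with neg = max((1 − y_true)·y_pred) and pos = Σ(y_true·y_pred), follows the Keras `categorical_hinge` definition; that the study uses exactly this form is an assumption based on its use of TensorFlow Keras. The calibrated variant would add λ times the auxiliary term and is not repeated here.

```python
def categorical_hinge(y_true, y_pred):
    """Categorical hinge loss for one sample with a one-hot y_true."""
    # Positive term: score assigned to the true class.
    pos = sum(t * p for t, p in zip(y_true, y_pred))
    # Negative term: largest score assigned to any wrong class.
    neg = max((1 - t) * p for t, p in zip(y_true, y_pred))
    return max(0.0, neg - pos + 1.0)


if __name__ == "__main__":
    y = [0, 1, 0]
    print(categorical_hinge(y, [0.1, 0.8, 0.1]))  # correct and confident
    print(categorical_hinge(y, [0.7, 0.2, 0.1]))  # wrong class dominates
```

With softmax outputs the margin of 1 can only be met when the true class receives probability 1, so the loss stays positive for any less confident prediction and grows as a wrong class overtakes the true one.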

CXR lung segmentation

Recall that an EfficientNet-B0-based U-Net model is trained to minimize BCE, weighted BCE-Dice, focal, Tversky, and focal Tversky loss functions and predict lung masks for the CXRs in the Montgomery TB CXR collection. The lung masks predicted by the top-3 performing models are bitwise-ANDed to produce the final lung mask. The performance of the individual models and the bitwise ANDed model ensemble is evaluated using segmentation accuracy, IoU, and Dice coefficient as shown in Table 1 . We observed that the segmentation model demonstrated higher values for the Dice coefficient compared to the IoU metrics due to the way the two functions are defined. The Dice coefficient value is given by twice the area of the intersection of two masks, divided by the sum of the areas of the masks. It is observed from Table 1 that, considering individual models, the segmentation model trained to minimize the focal Tversky loss demonstrated superior performance in terms of IoU, Dice coefficient, and accuracy metrics, followed by those trained with Tversky and weighted BCE-Dice losses. These top-3 performing models are used to construct the ensemble. Here, the lung masks predicted by the top-3 performing models are bitwise-ANDed to produce the final lung mask. We observed that the IoU, Dice coefficient, and accuracy, achieved using the bitwise-ANDed model ensemble are superior compared to any individual constituent model. However, we observed no statistically significant difference in performance ( p > 0.05) between the individual models and the ensemble.
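The remark that Dice values exceed IoU values follows directly from the definitions. For a single pair of masks with intersection I and areas A and B, IoU = I / (A + B − I) and Dice = 2I / (A + B), so Dice = 2·IoU / (1 + IoU), which is never smaller than IoU. A minimal sketch with hypothetical names and toy masks:

```python
def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two binary masks given as lists of rows of 0/1."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in truth for v in row]
    inter = sum(p & t for p, t in zip(flat_p, flat_t))
    area_p, area_t = sum(flat_p), sum(flat_t)
    union = area_p + area_t - inter
    if union == 0:
        return 1.0, 1.0  # both masks empty: treat as perfect agreement
    return inter / union, 2 * inter / (area_p + area_t)


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    truth = [[1, 0, 0], [0, 1, 1]]
    iou, dice = iou_and_dice(pred, truth)
    print(iou, dice, 2 * iou / (1 + iou))
```

The identity holds per image; averages over a test set, such as those reported in the table, need not satisfy it exactly.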

Loss/Method | IoU | Dice | Accuracy
BCE | 0.8186±0.0384 | 0.9571±0.0361 | 0.9720±0.0096
Weighted BCE-Dice | 0.8465±0.0401 | 0.9601±0.0396 | 0.9732±0.0104
Focal | 0.2601±0.0621 | 0.9189±0.0527 | 0.7788±0.0485
Tversky | 0.9360±0.0368 | 0.9624±0.0225 | 0.9912±0.0102
Focal Tversky | 0.9510±0.0415 | 0.9637±0.0271 | 0.9925±0.0130
Ensemble | | |

The bold numerical values denote the best performance in respective columns.

We used the top-3 performing models and the bitwise-ANDed ensemble approach to predict lung masks for the CXRs in the pediatric pneumonia CXR collection. As the ground truth lung masks for these CXRs are not made available by the authors of [ 6 ], the segmentation performance could not be validated. The predicted lung masks are overlaid on the original CXRs to delineate the lung boundaries and are cropped. The cropped images are resized to 512×512 pixel resolution and used for further analysis (i.e., disease classification).

CXR disease classification

Recall that the encoder from the trained EfficientNet-B0-based U-Net model is truncated and appended with classification layers. This approach performs a CXR modality-specific knowledge transfer [2, 15, 16, 30] to improve performance in the related task of classifying the CXRs in the pediatric pneumonia CXR collection into normal, bacterial pneumonia, or viral pneumonia categories. The classification models are trained to minimize the existing and proposed loss functions in this study. Table 2 summarizes the classification performance achieved by these models. We measured the 95% CI as the exact Clopper–Pearson interval for the MCC metric to test for statistical significance. The classification models demonstrated higher values for the F-score than for the MCC metric. The F-score provides a balanced measure of precision and recall but could provide a biased estimate since it does not consider TN values, whereas MCC considers TPs, TNs, FPs, and FNs in its computation. The MCC score lies in the range [-1, +1], where +1 indicates a perfect model, 0 indicates performance no better than chance, and -1 indicates total disagreement between predictions and ground truth. The authors of [31] discuss the benefits of using the MCC metric over F-score and accuracy in evaluating classification models. Table 2 shows that the model trained to minimize the calibrated CCE loss demonstrated superior values for accuracy (0.9343), AUROC (0.9928), AUPRC (0.9869), precision (0.9345), recall (0.9343), F-score (0.9338), and MCC (0.8996) metrics. The 95% CI for the MCC metric demonstrated a tighter error margin and hence higher precision as compared to other models. The performance achieved with the calibrated CCE loss is significantly superior (p < 0.05) to that achieved by the models trained to minimize the categorical focal and calibrated categorical focal loss functions. This performance is followed by that of the models trained to minimize the CCE with entropy-based regularization, calibrated negative entropy, label-smoothed categorical focal, and calibrated categorical Hinge loss functions. Fig 3 shows the confusion matrix, AUROC, and AUPRC curves obtained with the calibrated CCE loss-trained model.
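To make the contrast between the two metrics concrete, the following minimal sketch computes an F-score and the MCC from a confusion matrix. The multi-class MCC uses the standard generalization of the binary formula. The function names and the toy counts are illustrative assumptions and are not taken from the study.

```python
import math

def multiclass_mcc(cm):
    """MCC from a square confusion matrix (rows: true class, columns: predicted)."""
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)                # number of samples
    correct = sum(cm[k][k] for k in range(n_classes))  # correctly classified samples
    true_counts = [sum(cm[k]) for k in range(n_classes)]
    pred_counts = [sum(cm[r][k] for r in range(n_classes)) for k in range(n_classes)]
    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    return numerator / denominator if denominator else 0.0

def binary_f_score(tp, fp, fn):
    """F-score (F1) for the positive class; note that TN never appears."""
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical imbalanced binary case: the model labels almost everything positive.
tn, fp, fn, tp = 0, 9, 1, 90
f_score = binary_f_score(tp, fp, fn)
mcc = multiclass_mcc([[tn, fp], [fn, tp]])
print(f"F-score = {f_score:.3f}, MCC = {mcc:.3f}")
```

In this toy case the F-score stays high while the MCC falls slightly below zero, because only the MCC accounts for the missing true negatives.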

Fig 3. Confusion matrix, AUROC, and AUPRC curves obtained with the calibrated CCE loss-trained model.

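Table 2. Classification performance achieved by the models trained with the existing and proposed loss functions.

| Loss | Accuracy | AUROC | AUPRC | Precision | Recall | F-score | MCC | 95% CI (MCC) |
|---|---|---|---|---|---|---|---|---|
| CCE | 0.9279 | 0.9921 | 0.9857 | 0.9292 | 0.9279 | 0.9282 | 0.8899 | (0.8653, 0.9145) |
| CCE with entropy-based regularization (β = 2.0) | 0.9311 | 0.9913 | 0.9844 | 0.9337 | 0.9311 | 0.9319 | 0.8953 | (0.8712, 0.9194) |
| KL divergence | 0.9231 | 0.99 | 0.9825 | 0.9261 | 0.9231 | 0.924 | 0.8831 | (0.8578, 0.9084) |
| Categorical focal (γ = 1) | 0.9054 | 0.984 | 0.9753 | 0.9079 | 0.9054 | 0.9054 | 0.8562 | (0.8286, 0.8838) |
| Categorical Hinge | 0.9247 | 0.9892 | 0.9803 | 0.928 | 0.9247 | 0.9255 | 0.8858 | (0.8608, 0.9108) |
| Smoothed-CCE (σ = 0.2) | 0.9231 | 0.9899 | 0.9821 | 0.9252 | 0.9231 | 0.9237 | 0.8829 | (0.8576, 0.9082) |
| Smoothed-focal (σ = 0.2) | 0.9279 | 0.9847 | 0.9744 | 0.9317 | 0.9279 | 0.9287 | 0.8909 | (0.8664, 0.9154) |
| Calibrated-CCE (λ = 10) | 0.9343 | 0.9928 | 0.9869 | 0.9345 | 0.9343 | 0.9338 | 0.8996 | |
| Calibrated-KL divergence (λ = 1) | 0.9215 | 0.9895 | 0.9817 | 0.9239 | 0.9215 | 0.9217 | 0.8807 | (0.8552, 0.9062) |
| Calibrated focal (γ = λ = 1) | 0.9167 | 0.986 | 0.9777 | 0.9187 | 0.9167 | 0.9164 | 0.8734 | (0.8473, 0.8995) |
| Calibrated Hinge (λ = 10) | 0.9279 | 0.9894 | 0.9803 | 0.9292 | 0.9279 | 0.9275 | 0.8903 | (0.8657, 0.9149) |
| Calibrated negative entropy (β = 1e-3; λ = 10) | 0.9311 | 0.9917 | 0.9851 | 0.9316 | 0.9311 | 0.9308 | 0.8947 | (0.8706, 0.9188) |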

The top-K (K = 3, 5) models are selected based on the MCC metric. The values in parentheses denote the 95% CI measured as the exact Clopper–Pearson interval for the MCC metric. Bold numerical values denote superior performance in respective columns.
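For readers unfamiliar with the exact Clopper–Pearson interval, the sketch below computes it for a generic binomial proportion (k successes out of n trials) by inverting the binomial tail probabilities with bisection. How the interval was applied to the MCC metric is not specified in this section, so the counts used here are hypothetical and the helper names are assumptions made for illustration.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term evaluated in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion k / n."""
    def solve(increasing_fn):
        # Bisection for the root of a function that increases with p on (0, 1).
        lo, hi = 0.0, 1.0
        while hi - lo > 1e-12:
            mid = (lo + hi) / 2
            if increasing_fn(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha / 2 (the left side grows with p).
    lower = 0.0 if k == 0 else solve(lambda p: 1.0 - binom_cdf(k - 1, n, p) - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2 (the left side shrinks as p grows).
    upper = 1.0 if k == n else solve(lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper

# Hypothetical example: 90 successes in 100 trials.
low, high = clopper_pearson(90, 100)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

Unlike a normal-approximation interval, the exact interval is generally not symmetric around k / n.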

The top-3 models (i.e., those trained to minimize the calibrated CCE, CCE with entropy-based regularization, and calibrated negative entropy losses) and the top-5 models (i.e., those trained to minimize the calibrated CCE, CCE with entropy-based regularization, calibrated negative entropy, label-smoothed categorical focal, and calibrated categorical Hinge losses) are used to construct prediction-level and model-level ensembles. Recall that for the prediction-level ensemble, the models’ predictions are combined using majority voting, simple averaging, weighted averaging, and stacking-based ensemble methods. Table 3 summarizes the classification performance achieved by the prediction-level ensembles.
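As a rough illustration of how three of these prediction-level methods differ, the sketch below combines per-model class probabilities by majority voting, simple averaging, and weighted averaging. The probabilities, the weights, and the function names are hypothetical and chosen only so that the three methods disagree; stacking is omitted because it requires training a meta-learner.

```python
from collections import Counter

def argmax_labels(probs):
    """Index of the largest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]

def majority_vote(prob_sets):
    """Per sample, the class predicted by the most models (ties: first seen wins)."""
    per_model = [argmax_labels(p) for p in prob_sets]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_model)]

def weighted_average(prob_sets, weights):
    """Per sample, the weighted mean of the models' class probabilities."""
    n_samples, n_classes = len(prob_sets[0]), len(prob_sets[0][0])
    return [
        [sum(w * p[i][c] for w, p in zip(weights, prob_sets)) for c in range(n_classes)]
        for i in range(n_samples)
    ]

# Hypothetical softmax outputs of three models for two samples and three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
model_c = [[0.2, 0.7, 0.1], [0.3, 0.3, 0.4]]
models = [model_a, model_b, model_c]

voted = majority_vote(models)
simple = argmax_labels(weighted_average(models, [1 / 3] * 3))  # simple averaging
weighted = argmax_labels(weighted_average(models, [0.6, 0.3, 0.1]))
print(voted, simple, weighted)
```

Voting discards each model's confidence, whereas the averaging variants keep it, which is why the three methods can return different labels for the same inputs.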

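Table 3. Classification performance achieved by the prediction-level ensembles.

| Models | Method | Accuracy | AUROC | AUPRC | Precision | Recall | F-score | MCC | 95% CI (MCC) |
|---|---|---|---|---|---|---|---|---|---|
| Top-3 | Max voting | 0.9295 | 0.9471 | 0.9412 | 0.9305 | 0.9295 | 0.9297 | 0.8923 | (0.8679, 0.9167) |
| Top-3 | Simple averaging | 0.9279 | 0.9924 | 0.9863 | 0.9287 | 0.9279 | 0.9281 | 0.8898 | (0.8652, 0.9144) |
| Top-3 | Weighted averaging | 0.9343 | | | 0.9345 | 0.9343 | 0.9338 | 0.8996 | (0.876, 0.9232) |
| Top-3 | Stacking | 0.9263 | 0.99 | 0.9831 | 0.9284 | 0.9263 | 0.9269 | 0.8877 | (0.8629, 0.9125) |
| Top-5 | Max voting | 0.9327 | 0.9495 | 0.9439 | 0.9334 | 0.9327 | 0.9327 | 0.8972 | (0.8733, 0.9211) |
| Top-5 | Simple averaging | 0.9295 | 0.9923 | 0.9863 | 0.9311 | 0.9295 | 0.9298 | 0.8926 | (0.8683, 0.9169) |
| Top-5 | Weighted averaging | | | | | | | | |
| Top-5 | Stacking | 0.9279 | 0.9873 | 0.9801 | 0.9303 | 0.9279 | 0.9286 | 0.8903 | (0.8657, 0.9149) |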

The values in parentheses denote the 95% CI measured as the exact Clopper–Pearson interval for the MCC metric. Bold numerical values denote superior performance in respective columns.

Table 3 shows that the prediction-level ensembles constructed using the top-3 and top-5 performing models demonstrated higher values for the F-score than for the MCC metric, for the reasons discussed before. The weighted averaging ensemble of the top-5 performing models, using the optimal weights [0.40560531, 0.192276399, 0.00356809023, 0.3985502, 1.10927275e-16] calculated using the SLSQP method, achieved superior performance compared to other ensembles. Its 95% CI for the MCC metric demonstrated a tighter error margin and hence higher precision compared to other ensemble methods. However, we observed no statistically significant difference (p > 0.05) in performance across the ensemble methods. Fig 4 shows the confusion matrix, AUROC, and AUPRC curves achieved using the top-5 weighted averaging ensemble.
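One way such weights can be derived is sketched below using SciPy's `minimize` with `method="SLSQP"`. The objective (log loss of the blended predictions), the constraints (weights in [0, 1] that sum to 1), and the synthetic predictions are all assumptions made for illustration; this section does not state which objective or data were used to obtain the weights reported above.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_samples, n_classes, n_models = 200, 3, 5
labels = rng.integers(0, n_classes, size=n_samples)

# Synthetic softmax outputs: each successive model gets noisier logits.
probs = []
for m in range(n_models):
    logits = rng.normal(0.0, 1.0 + 0.5 * m, size=(n_samples, n_classes))
    logits[np.arange(n_samples), labels] += 2.0  # push mass toward the true class
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs.append(e / e.sum(axis=1, keepdims=True))
probs = np.stack(probs)  # shape: (n_models, n_samples, n_classes)

def log_loss(weights):
    """Mean negative log-likelihood of the weighted-average prediction."""
    blended = np.tensordot(weights, probs, axes=1)  # (n_samples, n_classes)
    picked = blended[np.arange(n_samples), labels]
    return -np.mean(np.log(np.clip(picked, 1e-12, 1.0)))

w0 = np.full(n_models, 1.0 / n_models)  # start from simple averaging
result = minimize(
    log_loss,
    w0,
    method="SLSQP",
    bounds=[(0.0, 1.0)] * n_models,
    constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
)
weights = result.x
print(np.round(weights, 4), log_loss(w0), log_loss(weights))
```

Because the optimizer starts from equal weights, the comparison of the two printed losses shows how much the learned weighting improves on simple averaging for this synthetic data. Weights driven to (numerically) zero, as in the last entry reported above, indicate a model that contributes nothing to the blend.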

Fig 4. Confusion matrix, AUROC, and AUPRC curves achieved using the top-5 weighted averaging ensemble.

Recall that the model-level ensembles are constructed using the top-K (K = 3, 5) models by instantiating them with their trained weights and truncating them at their deepest convolutional layers. The feature maps from these layers are concatenated and appended with a 1×1 convolutional layer for feature dimensionality reduction. In our study, the feature maps of the deepest convolutional layers of the models have [16, 16, 512] dimensions. Hence, after concatenation, the feature maps for the top-3 models have [16, 16, 1536] dimensions, and those for the top-5 models have [16, 16, 2560] dimensions. We used 1×1 convolutions to reduce these dimensions to [16, 16, 512]. The 1×1 convolutional layer is appended with a GAP layer and a dense layer with three neurons to classify the CXRs into normal, bacterial pneumonia, or viral pneumonia categories. Table 4 shows the classification performance achieved in this regard. We observed no statistically significant difference (p > 0.05) in performance between the top-3 and top-5 model-level ensembles. We further performed a weighted averaging of the predictions of the top-3 and top-5 model-level ensembles, calculating the optimal weights [0.3764, 0.6236] using the SLSQP method to improve performance. Fig 5 shows the confusion matrix, AUROC, and AUPRC curves obtained by the weighted averaging ensemble using the predictions of the top-3 and top-5 model-level ensembles. We observed that this ensemble approach demonstrated superior performance for all metrics compared to the individual models and all ensemble methods discussed in this study.
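The shape arithmetic of this fusion head can be checked with a small NumPy sketch. It treats the 1×1 convolution as a per-pixel linear map over channels, followed by global average pooling and a three-way softmax. The random feature maps and weights are stand-ins for the trained models, biases and activations are omitted, and the function name is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_feature_maps(feature_maps, conv_weights, dense_weights):
    """Concatenate along channels, apply a 1x1 convolution, GAP, and a dense softmax."""
    stacked = np.concatenate(feature_maps, axis=-1)            # (H, W, total channels)
    reduced = np.einsum("hwc,cd->hwd", stacked, conv_weights)  # 1x1 conv as a per-pixel map
    pooled = reduced.mean(axis=(0, 1))                         # global average pooling
    logits = pooled @ dense_weights                            # dense layer, three outputs
    exp = np.exp(logits - logits.max())
    return stacked.shape, reduced.shape, exp / exp.sum()

# Random stand-ins for the deepest-layer outputs of the top-3 models.
maps = [rng.normal(size=(16, 16, 512)) for _ in range(3)]
conv_w = rng.normal(scale=0.02, size=(3 * 512, 512))
dense_w = rng.normal(scale=0.02, size=(512, 3))

stacked_shape, reduced_shape, class_probs = fuse_feature_maps(maps, conv_w, dense_w)
print(stacked_shape, reduced_shape, class_probs.shape)
```

For the top-5 case, the same function applies with five feature maps and a [2560, 512] weight matrix for the 1×1 convolution.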

Fig 5. Confusion matrix, AUROC, and AUPRC curves obtained by the weighted averaging ensemble using the predictions of the top-3 and top-5 model-level ensembles.

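Table 4. Classification performance achieved by the model-level ensembles.

| Method | Accuracy | AUROC | AUPRC | Precision | Recall | F-score | MCC | 95% CI (MCC) |
|---|---|---|---|---|---|---|---|---|
| Top-3 | 0.9327 | 0.9933 | 0.9881 | 0.9334 | 0.9327 | 0.933 | 0.897 | (0.8731, 0.9209) |
| Top-5 | 0.9359 | 0.9928 | 0.9872 | 0.9365 | 0.9359 | 0.936 | 0.9019 | (0.8785, 0.9253) |
| Weighted averaging | | | | | | | | |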

The values in parentheses denote the 95% CI measured as the exact Clopper–Pearson interval for the MCC metric.

Table 5 shows a comparison of the performance achieved with (i) the weighted averaging ensemble of top-3 and top-5 model-level predictions and (ii) SOTA literature.

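Table 5. Comparison of the weighted averaging ensemble of top-3 and top-5 model-level predictions with the SOTA literature.

| Study | Acc. | AUROC | AUPRC | Prec. | Rec. | F | MCC | 95% CI (MCC) |
|---|---|---|---|---|---|---|---|---|
| Kermany et al. [ ] | NA | NA | NA | NA | NA | NA | NA | |
| Rajaraman et al. [ ] | 0.918 | 0.939 | NA | 0.92 | 0.9 | 0.91 | 0.87 | (0.8436, 0.8964) |
| Proposed | | | | | | | | |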

The authors of [6], who released the pediatric pneumonia CXR dataset, performed binary classification to classify the CXRs as showing normal lungs or other abnormal manifestations. To the best of our knowledge, only the authors of [24] performed a multi-class classification using the train and test splits released by the authors of [6]. We observed that the MCC metric achieved by the weighted averaging ensemble of top-3 and top-5 model-level predictions is significantly superior (p < 0.05) to the MCC metric reported in the literature [24].

Disease ROI localization

We used Grad-CAM tools [32] to localize the disease-manifested ROIs and verify that the models learned meaningful features. Fig 6 shows instances of pediatric CXRs with expert ground truth annotations for bacterial and viral pneumonia manifestations, together with the Grad-CAM localizations of the top-5 performing models and the top-5 model-level ensemble. Fig 6 shows that the classification models trained using the existing and proposed loss functions and the top-5 model-level ensemble highlighted the ROIs showing disease manifestations. The highest activations, observed as the hottest regions in the heatmap, contribute the most toward the models’ decision to classify the CXRs into their respective categories.
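The core Grad-CAM computation can be sketched in a few lines of NumPy: the gradients of the class score with respect to the last convolutional feature maps are averaged per channel, used to weight those feature maps, and the weighted sum is passed through a ReLU. The arrays below are random stand-ins rather than outputs of the trained models, the function name is an assumption, and the final upsampling of the heatmap to the input image size is omitted.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from feature maps (H, W, K) and class-score gradients (H, W, K)."""
    channel_weights = gradients.mean(axis=(0, 1))                       # one weight per channel
    cam = np.tensordot(activations, channel_weights, axes=([2], [0]))   # weighted channel sum
    cam = np.maximum(cam, 0.0)                                          # keep positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam                              # scale to [0, 1]

rng = np.random.default_rng(0)
acts = rng.random((16, 16, 512))        # stand-in for the deepest convolutional feature maps
grads = rng.normal(size=(16, 16, 512))  # stand-in for d(class score) / d(feature maps)
heatmap = grad_cam(acts, grads)
print(heatmap.shape, float(heatmap.min()), float(heatmap.max()))
```

The hottest cells of the resulting map correspond to the spatial locations that contribute most to the class score.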

Fig 6. Grad-CAM-based ROI localization for bacterial and viral pneumonia manifestations.

(a) and (h) denote instances of CXR with expert annotations showing bacterial and viral pneumonia manifestations, respectively. The sub-parts (b), (c), (d), (e), (f), and (g) show Grad-CAM-based ROI localization achieved using the models trained with calibrated CCE, CCE with entropy-based regularization, calibrated negative entropy, label-smoothed categorical focal, calibrated categorical Hinge loss functions, and the top-5 model-level ensemble, respectively, highlighting regions of bacterial pneumonia manifestations. The sub-parts (i), (j), (k), (l), (m), and (n) show the localization achieved using the models in the same order as above, highlighting viral pneumonia manifestations.

Discussion and conclusions

While several studies [33, 34] report using the pediatric pneumonia CXR dataset [6] in a binary classification setting, only the authors of [24] trained models for a multi-class classification task. Further, the studies in [33, 34] used ImageNet-pretrained models to transfer knowledge to a target CXR classification task as opposed to a CXR modality-specific pretrained model. Such transfer of knowledge may not be relevant since the characteristics of natural images are distinct from those of medical images. In this work, we propose to resolve the aforementioned issues by transferring knowledge from a CXR modality-specific pretrained model to improve performance in a related CXR classification task. We trained the models using existing loss functions and also proposed several loss functions. Our experimental results showed that the model trained to minimize the calibrated CCE loss demonstrated superior values for all metrics. This performance is followed by that of the models trained to minimize the proposed losses such as CCE with entropy-based regularization, calibrated negative entropy, label-smoothed categorical focal, and calibrated categorical Hinge loss.

We evaluated the performance of both prediction-level and model-level ensembles. We observed from the experiments that the model-level ensembles demonstrated markedly better performance than the prediction-level ensembles. We further improved performance by (i) deriving optimal weights using the SLSQP method, and (ii) using the derived weights to perform weighted averaging of the predictions of the top-3 and top-5 model-level ensembles. We observed that the weighted averaging ensemble demonstrated superior performance for all metrics compared to the individual models, their ensembles, and the SOTA literature. Finally, we used Grad-CAM-based visualization tools to interpret the learned behavior of the individual models and model-level ensembles. We observed that these models precisely localized the ROIs showing disease manifestations, in agreement with the experts’ annotations.

Our study combined the benefits of (i) performing CXR modality-specific knowledge transfer, (ii) proposing loss functions that delivered superior classification performance in a multi-class classification setting, and (iii) constructing prediction-level and model-level ensembles to achieve SOTA performance as shown in Table 5. However, there are a few limitations to this study. For example, novel loss functions could be proposed for classification tasks to train models and their ensembles. Other ensemble methods such as blending and snapshot ensembles could also be attempted to improve performance. It is becoming increasingly viable to deploy ensemble models in real time for image and video analysis with the advent of low-cost computation, storage solutions, and cloud technology [35]. The methods proposed in this study could be extended to the classification and detection of cardiopulmonary abnormalities [36] including COVID-19, TB, cardiomegaly, and lung nodules, among others.

Funding Statement

This study is supported by the Intramural Research Program (IRP) of the National Library of Medicine (NLM) and the National Institutes of Health (NIH). The funder provided support in the form of salaries for authors SR, GZ, and SA, but did not have any additional role in the study design, data collection, and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.

Data Availability

  • PLoS One. 2021; 16(12): e0261307.

Decision Letter 0

PONE-D-21-32872

Deep model ensembles with novel loss functions for multi-class medical image classification

PLOS ONE

Dear Dr. Rajaraman,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

ACADEMIC EDITOR: Based on the comments from the reviewers and my own observation I recommend major revisions.

Please submit your revised manuscript by Dec 23 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Thippa Reddy Gadekallu

Academic Editor

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. 

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

3. Thank you for stating the following financial disclosure: 

"This study is supported by the Intramural Research Program (IRP) of the National Library of Medicine (NLM) and the National Institutes of Health (NIH). The intramural research scientists (authors) at the NIH dictated study design, data collection, data analysis, decision to publish and preparation of the manuscript."

We note that one or more of the authors is affiliated with the funding organization, indicating the funder may have had some role in the design, data collection, analysis or preparation of your manuscript for publication; in other words, the funder played an indirect role through the participation of the co-authors. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please do the following:

a. Review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. These amendments should be made in the online form.

b. Confirm in your cover letter that you agree with the following statement, and we will change the online submission form on your behalf: 

“The funder provided support in the form of salaries for authors SR, GZ and SA, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

4. Thank you for stating the following in the Acknowledgments Section of your manuscript: 

"This study is supported by the Intramural Research Program (IRP) of the National Library of Medicine (NLM) and the National Institutes of Health (NIH)."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. 

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: 

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

Additional Editor Comments:

The authors are suggested to address all the comments carefully. The authors can cite the references suggested by the reviewers only if they are relevant and strengthen the references section.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Partly

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: I Don't Know

Reviewer #3: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The proposed work presents a deep model ensembling approach with loss functions for multiclass image classification. Numerous research works have already been conducted in this domain. However, the approach brings some interesting discussions about the application of loss functions in medical images. The following revisions are required.

• The title of the paper needs revision. The length of the paper is very long, and I recommend authors to focus on the essential parts and discard the basic stuff such as what is ensemble deep learning, what is statistical analysis, etc.

• Secondly, the manuscript should be focused on the loss functions for multiclass image classification. But the paper discusses so many other things, such as disease classification details. If authors want to keep these contents, then organize the manuscript so that it is easy for readers to read. I recommend, authors to revise the entire manuscript with the focus on “with loss functions for multiclass image classification.”

• The literature review carried out for the proposed work is not up to date. The proposed research demands the referral of some of the latest research works published recently, such as “ReCognizing SUspect and PredictiNg ThE SpRead of Contagion Based on Mobile Phone LoCation DaTa (COUNTERACT): A System of identifying COVID-19 infectious and hazardous sites, detecting disease outbreaks based on the internet of things, edge computing, and artificial intelligence”, “Histogram of Oriented Gradient-Based Fusion of Features for Human Action Recognition in Action Video Sequences

Reviewer #2: The work lacks novelty. The literature review is poor. The proposed work should be described clearly, with a clear diagram/algorithm along with discussion. The introduction section and conclusion need to be revised.

I strongly suggest the authors to format the content and structure of the paper before submission. This article does not look like a research paper, just like a manual. Simultaneously, the figures included in this paper are not obvious and casual. The authors should refer to some related papers in some venue for revision.

Reviewer #3: The abstract should be concise yet give a complete overview of the work and study.

Abstract should reflect the background knowledge on the problem addressed need to be added.

Abstract should reflect the wide range of applications and its possible solutions need to be added.

In Introduction section, the drawbacks of each conventional technique should be described clearly.

Introduction section can be extended to add the issues in the context of the existing work

What is the motivation of the proposed work?

Literature review techniques have to be strengthened by including the issues in the current system and how the author proposes to overcome the same

Research gaps, objectives of the proposed work should be clearly justified.

The writing of the paper needs a lot of improvement in terms of grammar, spellings, and presentations. The paper needs careful English polishing since there are many typos and poorly written sentences.

Authors can use latest related works from reputed journals like IEEE/ACM Transactions, MDPI, Elsevier, Inderscience, Springer, Taylor & Francis etc. and write the references in proper format, from year 2020-21.

The authors seem to disregard or neglect some important finding in results that have been achieved in paper. So, elaborate and explain the results in more details.

Improve the results and discussion section in paragraph.

The conclusion should state scope for future work.

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy .

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Author response to Decision Letter 0

19 Nov 2021

Response to the Editor:

We render our sincere thanks to the Editor for arranging peer review and encouraging resubmission of our manuscript. To the best of our knowledge and belief, we have addressed the concerns of the Editor and the reviewers in the revised manuscript.

Q1: Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.

Author response: We have formatted the manuscript per the templates recommended by the Editor.

Q2: We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section. Thank you for stating the following financial disclosure: "This study is supported by the Intramural Research Program (IRP) of the National Library of Medicine (NLM) and the National Institutes of Health (NIH). The intramural research scientists (authors) at the NIH dictated study design, data collection, data analysis, decision to publish, and preparation of the manuscript." We note that one or more of the authors is affiliated with the funding organization, indicating the funder may have had some role in the design, data collection, analysis, or preparation of your manuscript for publication; in other words, the funder played an indirect role through the participation of the co-authors. If the funding organization did not play a role in the study design, data collection, and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please do the following: a. Review your statements relating to the author contributions and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. These amendments should be made in the online form. b. Confirm in your cover letter that you agree with the following statement, and we will change the online submission form on your behalf: “The funder provided support in the form of salaries for authors SR, GZ, and SA, but did not have any additional role in the study design, data collection, and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.

Author response: All authors of this manuscript are employed by the National Library of Medicine. Our research is supported by the Intramural Research Program (IRP) of the National Library of Medicine (NLM) and the National Institutes of Health (NIH). We do not have a specific grant number. All authors reviewed the contributions listed in the manuscript. We hereby agree to include the following statements under the “Funding Information and Financial Disclosure” sections in the online submission form.

“This study is supported by the Intramural Research Program (IRP) of the National Library of Medicine (NLM) and the National Institutes of Health (NIH). The funder provided support in the form of salaries for authors SR, GZ, and SA, but did not have any additional role in the study design, data collection, and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

Q3: Thank you for stating the following in the Acknowledgments Section of your manuscript: "This study is supported by the Intramural Research Program (IRP) of the National Library of Medicine (NLM) and the National Institutes of Health (NIH).” We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement

Author response: We have removed the Acknowledgment section (and included text) per the Editor’s recommendation.

Response to Reviewer #1:

We render our sincere thanks to the reviewer for the valuable comments and appreciation of our study. To the best of our knowledge and belief, we have addressed the reviewer’s concerns.

Q1: Is the manuscript technically sound, and do the data support the conclusions? Yes; Has the statistical analysis been performed appropriately and rigorously? No; Have the authors made all data underlying the findings in their manuscript fully available? No; Is the manuscript presented in an intelligible fashion and written in standard English? Yes.

Author response:

We wish to reiterate, as indicated in the manuscript, that the data used in this study is publicly available without restriction. The details of the data and their availability are discussed under the Materials and methods section. We compared models’ performance and reported statistical significance in the results. We computed the binomial confidence intervals as the exact Clopper–Pearson interval for the MCC metric to analyze statistical significance. These results are comprehensively discussed in the revised manuscript.

Q2: The proposed work presents a deep model ensembling approach with loss functions for multiclass image classification. There have been numerous research works is already conducted in this domain. However, the approach brings some interesting discussions about the application of loss functions in medical images.

Author response: We sincerely thank the reviewer for the words of appreciation. The study aims to compare multi-class classification performance using the models trained on existing and novel loss functions proposed in this study. We propose several loss functions including cross-entropy loss, negative entropy loss, KL divergence loss, categorical focal loss, and categorical hinge loss, each added with a calibration component (to penalize overconfident/underconfident predictions) and a cross-entropy loss with entropy-based regularization. We demonstrate that, compared to using the de-facto cross-entropy loss function, the proposed loss functions demonstrated superior performance toward this classification task. We further improved performance by constructing prediction- and model-level ensembles. In the process, we obtained state-of-the-art performance in classifying the pediatric CXR dataset into normal, bacterial pneumonia, and viral pneumonia classes.

Q3: However, the following revisions are required. The title of the paper needs revision.

Author response: Agreed. The title is modified as “Novel loss functions for ensemble-based medical image classification” to make it simpler and convey clarity.

Q4: The length of the paper is very long, and I recommend authors to focus on the essential parts and discard the basic stuff such as what is ensemble deep learning, what is statistical analysis, etc.

Author response: Thanks for these insightful comments. Indeed, addressing this comment has helped improve readability in the revised manuscript. The following changes are made to the revised manuscript:

(i) Discussions regarding CXR modality-specific knowledge transfer, deep ensemble learning, and statistical analysis are removed from the introduction section for redundancy.

(ii) Discussions regarding the existing segmentation losses, segmentation evaluation metrics, and existing classification loss functions are removed but adequate references are provided.

(iii) The polar plots are removed for redundancy.

Q5: Secondly, the manuscript should be focused on the loss functions for multiclass image classification. But the paper discusses so many other things, such as disease classification details. If authors want to keep these contents, then organize the manuscript so that it is easy for readers to read. I recommend, authors to revise the entire manuscript with the focus on “with loss functions for multiclass image classification.”

Author response: We sincerely thank the reviewer for these valuable comments. Addressing Q4 has helped remove redundant information and improve readability. However, this systematic study includes several steps toward attaining state-of-the-art performance in classifying the pediatric CXR data which provides an objective way of evaluating the benefits of this method. These steps include (i) performing lung segmentation to prevent learning irrelevant features in the background, (ii) training and evaluating models with the existing and proposed loss functions, (iii) improving performance by constructing prediction- and model-level ensembles, and (iv) using visualization tools to interpret the learned behavior of the models and the ensemble. We illustrate these steps in Fig. 1 and Fig. 2. We observed that the weighted averaging of the predictions of the top-3 and top-5 model-level ensembles obtained state-of-the-art performance using the pediatric CXR data.
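
The weighted averaging of model predictions mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the three-model example, and the weights are hypothetical, and the procedure the study used to choose the weights is not described in this response.

```python
def weighted_average_predictions(model_probs, weights):
    """Combine per-model class probabilities with a weighted average.

    model_probs: one entry per model, each a list of per-sample probability
                 rows (all models score the same samples and classes).
    weights:     one non-negative weight per model; normalized to sum to 1.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    ensemble = []
    for s in range(n_samples):
        row = [
            sum(norm[m] * model_probs[m][s][c] for m in range(len(model_probs)))
            for c in range(n_classes)
        ]
        ensemble.append(row)
    return ensemble


def predict_labels(probs):
    """Return the index of the most probable class for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Hypothetical example: three models, one sample, classes ordered as
# (normal, bacterial pneumonia, viral pneumonia).
probs = [
    [[0.6, 0.3, 0.1]],
    [[0.2, 0.5, 0.3]],
    [[0.1, 0.7, 0.2]],
]
combined = weighted_average_predictions(probs, weights=[1.0, 1.0, 2.0])
labels = predict_labels(combined)
```

Because the weights are normalized, each combined row still sums to one whenever the individual model rows do.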

Q6: The literature review carried out for the proposed work is not up to date. The proposed research demands the referral of some of the latest research works published recently, such as “ReCognizing SUspect and PredictiNg ThE SpRead of Contagion Based on Mobile Phone LoCation DaTa (COUNTERACT): A System of identifying COVID-19 infectious and hazardous sites, detecting disease outbreaks based on the internet of things, edge computing, and artificial intelligence”, “Histogram of Oriented Gradient-Based Fusion of Features for Human Action Recognition in Action Video Sequences”.

Author response: Thanks. We have cited the COVID-19 study per the reviewer’s suggestions.

[35] Ghayvat H, Awais M, Gope P, Pandya S, Majumdar S. 2021. Recognizing suspect and predicting the spread of contagion based on mobile phone location data (counteract): a system of identifying covid-19 infectious and hazardous sites, detecting disease outbreaks based on the internet of things, edge computing, and artificial intelligence. Sustainable Cities and Society 69(12):102798

Response to Reviewer #2:

We thank the reviewer for the valuable comments on this study.

Q1: The work lacks novelty. Literature review is poor. Proposed work should be described clearly with clear diagram/algorithm along with discussion. Introduction section and conclusion need to be revised.

Author response: The principal limitation of the de-facto cross-entropy loss is that it asserts equal learning from all the classes. This adversely impacts training and classification performance during class-imbalanced training. This holds for medical images, particularly CXRs, where a class imbalance exists between the majority normal class and other minority disease classes. Although the choice of the loss function impacts model performance, to the best of our knowledge, we observed that no literature exists that performs a comprehensive analysis and selection of an appropriate loss function toward the classification task under study. The contribution of this study includes a comprehensive statistical evaluation of several existing and proposed loss functions toward a medical image classification task. This guides the researchers regarding making an appropriate selection of a loss function for the task under study. The proposed loss functions could be applied for binary, multi-class, and multi-label classification tasks. We further improve classification performance by constructing an ensemble of models trained with the existing and proposed loss functions. In the process, we observed that the ensemble delivered superior performance compared to the individual models.

We made sure to include relevant references from the current year. The references are formatted per PLOS ONE requirements. The citations include those published in reputed journals like IEEE, Elsevier, Springer, and MDPI.

The proposed work is briefly discussed in the introduction in lines 80 – 96. Fig. 1 and Fig. 2 illustrate the steps involved in this systematic study.

The introduction section has been revised to remove contents for redundancy. The conclusion discusses the benefits and limitations of the current study and the scope for future study.

Q2: I strongly suggest the authors to format the content and structure of the paper before submission. This article does not look like a research paper, just like a manual. Simultaneously, the figures included in this paper are not obvious and casual. The authors should refer to some related papers in some venue for revision.

Author response: Thanks for these insightful comments. We have made the following changes to the manuscript to improve readability.

(i) Discussions regarding CXR modality-specific knowledge transfer, deep ensemble learning, and statistical analysis are removed from the introduction section for redundancy.

(ii) Discussions regarding the existing segmentation losses, segmentation evaluation metrics, and existing classification loss functions are removed but adequate references are provided.

(iii) The polar plots are removed for redundancy.

(iv) We made sure to include relevant references from the current year (2021). The references are formatted per PLOS ONE requirements. The citations include those published in reputed journals from publishers such as IEEE, Elsevier, Springer, and MDPI.

(v) The figures (Fig. 1 and Fig. 2) illustrate the steps involved in this systematic study. Fig. 3, Fig. 4, and Fig. 5 illustrate the performances (in terms of AUROC, AUPRC, and confusion matrix) obtained by the individual models and the prediction- and model-level ensembles. Fig. 6 illustrates Grad-CAM-based localization of the disease ROIs achieved using the trained models and the ensemble. This provides a qualitative analysis of the learned behavior by the individual models and the ensemble.

Response to Reviewer #3:

We thank the reviewer for the appreciative and constructive comments on this study.

Q1: Abstract should be concise yet. But should give complete overview of the work and study. Abstract should reflect the background knowledge on the problem addressed need to be added. Abstract should reflect the wide range of applications and its possible solutions need to be added.

Author response: Thanks for these insightful comments. We confirmed that the abstract does not exceed the 300-word limit recommended in the PLOS ONE submission guidelines. We modified the abstract to include the background knowledge about the problem and the proposed solution. The revised abstract is given below. Note that while we provide the link for our code, we will open the site only after the manuscript is published.

Medical images commonly exhibit multiple abnormalities. Predicting them requires multi-class classifiers whose training and desired reliable performance can be affected by a combination of factors, such as, dataset size, data source, distribution, and the loss function used to train deep neural networks. Currently, the cross-entropy loss remains the de-facto loss function for training deep learning classifiers. This loss function, however, asserts equal learning from all classes, leading to a bias toward the majority class. Although the choice of the loss function impacts model performance, to the best of our knowledge, we observed that no literature exists that performs a comprehensive analysis and selection of an appropriate loss function toward the classification task under study. In this work, we benchmark various state-of-the-art loss functions, critically analyze model performance, and propose improved loss functions for a multi-class classification task. We select a pediatric chest X-ray (CXR) dataset that includes images with no abnormality (normal), and those exhibiting manifestations consistent with bacterial and viral pneumonia. We construct prediction-level and model-level ensembles to improve classification performance. Our results show that compared to the individual models and the state-of-the-art literature, the weighted averaging of the predictions for top-3 and top-5 model-level ensembles delivered significantly superior classification performance (p < 0.05) in terms of MCC (0.9068, 95% confidence interval (0.8839, 0.9297)) metric. Finally, we performed localization studies to interpret model behavior and confirm that the individual models and ensembles learned task-specific features and highlighted disease-specific regions of interest. The code is available at https://github.com/sivaramakrishnan-rajaraman/multiloss_ensemble_models .
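
Since the abstract reports performance in terms of MCC, a minimal sketch of the multi-class Matthews correlation coefficient may be helpful. It follows the standard multi-class generalization computed from class counts; the function name and the toy labels are illustrative assumptions and do not reproduce the study's evaluation code.

```python
import math


def multiclass_mcc(y_true, y_pred, n_classes):
    """Multi-class MCC from integer class labels in range(n_classes)."""
    s = len(y_true)                                       # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_k = [y_true.count(k) for k in range(n_classes)]     # true count per class
    p_k = [y_pred.count(k) for k in range(n_classes)]     # predicted count per class
    numerator = c * s - sum(t * p for t, p in zip(t_k, p_k))
    denominator = math.sqrt(
        (s * s - sum(p * p for p in p_k)) * (s * s - sum(t * t for t in t_k))
    )
    # By convention, return 0 when the denominator vanishes
    # (for example, when every sample is assigned to a single class).
    return numerator / denominator if denominator else 0.0


# Toy example with three classes (0 = normal, 1 = bacterial, 2 = viral).
score = multiclass_mcc([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 2, 2], n_classes=3)
```

Unlike accuracy, this score accounts for the distribution of both true and predicted labels, which makes it informative for class-imbalanced data.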

Q2: In Introduction section, the drawbacks of each conventional technique should be described clearly. Introduction section can be extended to add the issues in the context of the existing work

Author response: Thanks for these comments. The drawbacks of using the de-facto cross-entropy loss function for model training and the need to propose novel loss functions are described in lines 49 – 68. The need for ensemble learning is discussed in lines 69 – 79. A brief overview of the proposed methodology is given in lines 80 – 96 in the revised manuscript. The merits, limitations, and scope for future work are discussed in lines 429 – 459.

Q3: What is the motivation of the proposed work?

Author response: The principal limitation of the de-facto cross-entropy loss is that it asserts equal learning from all the classes. This adversely impacts training and classification performance during class-imbalanced training. This holds for medical images, particularly CXRs, where a class imbalance exists between the majority normal class and other minority disease classes. Although the choice of the loss function impacts model performance, to the best of our knowledge, we observed that no literature exists that performs a comprehensive analysis and selection of an appropriate loss function toward the classification task under study. The contribution of this study includes a comprehensive statistical evaluation of several existing and proposed loss functions toward a medical image classification task. We further improve performance by constructing an ensemble of models trained with diverse loss functions. We observed that, unlike individual models, the weighted averaging of the predictions of top-3 and top-5 model-level ensembles delivered superior performance toward this task. This underscores that an ensemble of models trained with diverse loss functions improves performance compared to using individual models. We demonstrated these results with statistical significance analysis.

Q4: Literature review techniques have to be strengthened by including the issues in the current system and how the author proposes to overcome the same. Research gaps, objectives of the proposed work should be clearly justified.

Author response: The issues with training the models with the de-facto cross-entropy loss function are discussed in lines 49 – 68. Considering class-imbalanced classification tasks that are common in medical images, using the cross-entropy loss and asserting equal learning to all classes would lead to a biased estimate of the performance. To overcome these limitations, the authors of [8] proposed the focal loss function that down weights the majority class and improves the learning of the minority class. Aside from the literature discussed in lines 49 – 68, the literature does not include a comprehensive study that investigates the effects of loss functions on medical image classification, particularly using CXRs. This study aims to provide a comprehensive analysis of using the existing and proposed loss functions to improve performance in a multi-class CXR classification task. We further improved performance through constructing ensembles of models trained with various loss functions. This systematic procedure is discussed in lines 80 – 96 in the revised manuscript. We observed that the models trained with the proposed loss functions delivered superior classification performance compared to the model trained on the de-facto cross-entropy loss function. The ensemble of the models trained with diverse loss functions achieved state-of-the-art performance using the pediatric CXR data used in this study.
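
To illustrate the contrast between the cross-entropy loss and the focal loss discussed above, here is a minimal per-sample sketch. It uses the commonly cited form FL = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The function names and the values of gamma and alpha are illustrative assumptions and are not the settings used in the study.

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail of this sketch


def cross_entropy(probs, label):
    """Cross-entropy for one sample given predicted class probabilities."""
    return -math.log(max(probs[label], EPS))


def focal_loss(probs, label, gamma=2.0, alpha=1.0):
    """Categorical focal loss for one sample.

    The factor (1 - p_t) ** gamma shrinks the loss of confidently correct
    predictions, so harder (often minority-class) samples dominate training.
    """
    p_t = max(probs[label], EPS)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical predictions: one easy sample and one hard sample.
easy = [0.9, 0.05, 0.05]   # true class 0, predicted confidently
hard = [0.2, 0.5, 0.3]     # true class 0, predicted poorly
easy_ratio = focal_loss(easy, 0) / cross_entropy(easy, 0)
hard_ratio = focal_loss(hard, 0) / cross_entropy(hard, 0)
```

With gamma set to zero and alpha set to one, the focal loss reduces to the cross-entropy loss, which makes the relationship between the two easy to check.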

Q5: The writing of the paper needs a lot of improvement in terms of grammar, spellings, and presentations. The paper needs careful English polishing since there are many typos and poorly written sentences.

Author response: Thanks for these comments. We made sure to rectify the typos and grammatical errors and the revised manuscript has been proofread by a native English speaker.

Q6: Authors can use latest related works from reputed journals like IEEE/ACM Transactions, MDPI, Elsevier, Inderscience, Springer, Taylor & Francis etc. and write the references in proper format, from year 2020-21.

Author response: Thanks for these insightful comments. The revised manuscript includes several citations from the current year 2021. The references are formatted per PLOS ONE requirements. The citations include those published in reputed journals like IEEE, Elsevier, Springer, and MDPI.

Q7: The authors seem to disregard or neglect some important finding in results that have been achieved in paper. So, elaborate and explain the results in more details. Improve the results and discussion section in paragraph.

Author response: Thanks for these valuable comments. We made sure to discuss the results obtained in every step of this systematic study, with statistical significance analysis. We also performed qualitative analyses to interpret the learned behavior of the trained models and the ensemble.

Q8: The conclusion should state scope for future work.

Author response: Thanks. We have discussed the scope for future work in lines 453 – 459 in the revised manuscript.

Submitted filename: Response to Reviewers.docx

Decision Letter 1

PONE-D-21-32872R1

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/ , click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org .

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org .

Additional Editor Comments (optional):

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #3: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions?

3. Has the statistical analysis been performed appropriately and rigorously?

4. Have the authors made all data underlying the findings in their manuscript fully available?

5. Is the manuscript presented in an intelligible fashion and written in standard English?

6. Review Comments to the Author

Reviewer #1: Authors have addressed all the concerns. The research work should be shared with the science community.

Reviewer #3: All the comments made by the reviewers are addressed well by the authors.

No further comments are required.

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

Reviewer #1:  Yes:  Sharnil Pandya

Acceptance letter

21 Dec 2021

Dear Dr. Rajaraman:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org .

If we can help with anything else, please email us at plosone@plos.org .

Thank you for submitting your work to PLOS ONE and supporting open access.

PLOS ONE Editorial Office Staff

on behalf of

Dr. Thippa Reddy Gadekallu


A review of small object and movement detection based loss function and optimized technique

The objective of this study is to supply an overview of research work based on video-based networks and tiny object identification. The identification of tiny items and video objects, as well as research on current technologies, are discussed first. The detection, loss function, and optimization techniques are classified and described in the form of a comparison table. These comparison tables are designed to help you identify differences in research utility, accuracy, and calculations. Finally, it highlights some future trends in video and small object detection (people, cars, animals, etc.), loss functions, and optimization techniques for solving new problems.

1 Introduction

Object detection is a computer technology connected to object recognition and networks. It can recognize instances of specific semantic object categories (for example, people, buildings, or cars) in digital pictures and videos [ 1 ]. Well-studied areas of object recognition include the recognition of faces and pedestrians. Object recognition is used in many image-processing applications, including image search and video surveillance [ 2 ]. Each object class has distinguishing characteristics that help in categorizing the class; for example, all circles are round. These characteristics are employed in the identification of object classes. When looking for a circle, for example, one looks for points that lie at a fixed distance from a given point (i.e., the center). Similarly, when looking for a square, one looks for objects that have perpendicular corners and equal side lengths. A similar method is used for facial recognition, which can identify the eyes, nose, and mouth as well as skin color and the distance between the eyes [ 3 , 4 ]. In image processing, the challenge of predicting the types and locations of the distinct objects present in a picture is known as object recognition. In contrast to classification, each instance of an object is recognized in the object recognition task, so object recognition is a measurement task for each instance. Features used for this task include the scale-invariant feature transform [ 5 ] and the histogram of oriented gradients [ 6 ].

Remote sensing image analysis has gained considerable interest as remote sensing technology has advanced. At the same time, the identification of ships and airplanes using optical remote sensing images [ 7 ] is significant in a wide range of applications [ 8 ]. Over the last 10 years, many region-based convolutional neural network (R-CNN) techniques [ 9 ], particularly faster R-CNN [ 10 ], have been used to identify high-resolution objects in the PASCAL VOC dataset. In general, however, they cannot detect extremely small objects, because such objects are hard to evaluate in complex pictures owing to their limited structure and appearance information [ 11 ]. Objects in optical remote sensing images, moreover, are typically small, which poses more problems than conventional object detection, and there are still few good solutions [ 12 , 13 ]. There have been some attempts to overcome the issue of small object detection (SOD). Simply raising the input image resolution enhances the object’s fine details, but generally at a substantial cost in training and testing [ 14 , 15 ]. Another method aimed to create a multi-scale representation that combined multiple lower-level features to enhance the higher-level features, thereby naively enlarging the feature size [ 16 , 17 ].

Video motion detection is a function of IP cameras, recording software, and NVRs used to trigger an alarm by detecting physical movement in a designated area. In real time, the data from the current image is compared to the data from the previous image. Every major change triggers a camera warning [ 18 ]. This alarm can be used to trigger many operations, such as sending current real-time images via email, tilting or moving the camera to a certain point, and controlling external devices (such as turning on lights or beeping). Regular object recognition uses still images. There is increasing interest in the recognition of video objects (VOs) for autonomous driving [ 19 ] and video surveillance [ 20 , 21 ]. In 2015, the use of video for object recognition became a new challenge for surveillance [ 22 ]. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC2015) is a large-scale visual recognition competition. With the help of ILSVRC2015, the research on VO identification has progressed. Identifying objects in each frame was one of the first attempts to recognize VOs.
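
The comparison of the current image with the previous image described above can be sketched as simple frame differencing. This is a minimal illustration, assuming grayscale frames stored as nested lists of intensities from 0 to 255; the function name and both threshold values are hypothetical and would need tuning for a real camera.

```python
def motion_detected(prev_frame, curr_frame, pixel_threshold=25, changed_fraction=0.01):
    """Return True when enough pixels differ between two equally sized frames.

    pixel_threshold:  minimum absolute intensity change for a pixel to count.
    changed_fraction: minimum share of changed pixels that raises the alarm.
    """
    changed = 0
    total = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for before, after in zip(prev_row, curr_row):
            total += 1
            if abs(after - before) > pixel_threshold:
                changed += 1
    return total > 0 and changed / total >= changed_fraction


# Hypothetical 4 x 4 frames: a static background and one bright moving spot.
background = [[10] * 4 for _ in range(4)]
with_object = [row[:] for row in background]
with_object[1][1] = 200
alarm = motion_detected(background, with_object)
```

The per-pixel threshold suppresses sensor noise, while the fraction threshold avoids alarms caused by a handful of isolated pixels.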

This work studies and analyses the abovementioned network-based small target method, video detection, loss function, and optimization methods at different stages.

The main objective of this study is to give a summary of the studies on the identification of small objects and video-based networks. The first topic covered is the detection of small objects and VOs, as well as a study on current technology. The classification and description of the detection, loss function, and optimization strategies are presented as a comparison table.

2 Literature review

Various approaches exist for small object and movement detection. Some of the most important literature on object detection is discussed below.

Chen et al. [ 23 ] proposed using deep learning to identify small objects. The study begins with a short overview of the four pillars of small object detection: multi-scale rendering, contextual information, super-resolution, and range. It then surveys a range of modern datasets for detecting small objects. Finally, it examines current small object detection systems, with an emphasis on the modifications and tweaks that improve detection efficiency in comparison to conventional object recognition technologies.

Ren et al. [ 24 ] studied how to tackle the challenge of employing remote sensing technology to identify tiny objects in optical imaging, and an enhanced faster R-CNN approach was developed. As a consequence of common characteristics, the study built a comparable architecture that used downlink and avoided the use of connections to produce a single high-resolution, high-level feature map. This is critical so that all the identified items can be viewed.

Huang et al. [ 25 ] created a model for recognizing prominent objects in hyperspectral pictures on wireless networks, thereby using visibility optimization to CNN characteristics. The model first uses a two-channel CNN to extract the spatial and spectral properties of the same measurement and then employs functional combinations to produce the final bump map, which optimizes the bump value of the foreground and foreground signals. The CNN function is used to compute the background.

Hua et al. [ 26 ] proposed a real-time object recognition framework for cascaded convolutional networks using visual attention mechanisms, convolutional storage network inference methods, and semantic object relevance, combined with the fast and exact functions of deep learning algorithms, and performed ablation and comparative experiments. By testing the cascade network introduced in this study, different datasets can be used and more complex detection results can be obtained.

Yundong et al. [ 27 ] proposed a new method, that is, multi-block SSDs that add sub-layers to detect and extend local context information. The test results of multiple SSDs and conventional SSDs are compared. The algorithm shown increases the detection rate of small objects to 23.2%.

Bosquet et al. [ 28 ] proposed STDnet, a ConvNet that identifies tiny objects with a size of less than 16 × 16 pixels based on region proposals. STDnet relies on an additional visual attention process called RCN, which chooses the most likely candidate areas, each consisting of one or more tiny items and their surroundings. The areas selected by RCN are more accurate and economical to process, which improves accuracy while conserving memory and increasing the frame rate. This study also incorporates automated k-means anchoring, which improves on traditional heuristics.
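
As an illustration of k-means anchoring in general, the sketch below clusters the (width, height) pairs of labeled boxes and uses the cluster means as anchor sizes. It is not the procedure from [ 28 ]: the IoU-based assignment is a widely used variant adopted here as an assumption, and the function names, the deterministic initialization, and the toy boxes are hypothetical.

```python
def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height) and aligned at one corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=50):
    """Cluster box sizes into k anchors, returned sorted by area."""
    # Deterministic initialization: spread the seeds over the area-sorted boxes.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    if k == 1:
        anchors = [ordered[len(ordered) // 2]]
    else:
        anchors = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Assign each box to the anchor it overlaps most (highest IoU).
            best = max(range(k), key=lambda j: iou_wh(box, anchors[j]))
            clusters[best].append(box)
        new_anchors = []
        for j, members in enumerate(clusters):
            if members:
                mean_w = sum(b[0] for b in members) / len(members)
                mean_h = sum(b[1] for b in members) / len(members)
                new_anchors.append((mean_w, mean_h))
            else:
                new_anchors.append(anchors[j])  # keep the anchor of an empty cluster
        if new_anchors == anchors:
            break  # assignments are stable
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Hypothetical box sizes in pixels: a group of tiny objects and a larger group.
sizes = [(4, 4), (5, 5), (4, 5), (12, 12), (13, 11), (12, 13)]
anchors = kmeans_anchors(sizes, k=2)
```

Deriving anchors from the data in this way replaces hand-picked scales and aspect ratios with sizes that match the objects actually present in the training set.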

Kunfu et al. [ 29 ] proposed a fully integrated framework for identifying objects in any orientation in remote sensing pictures. The web provides a functional aggregation architecture for obtaining functional representations for ROI discovery and ROI provision. The combination of quality recommendations and ROI-O is used to process recommendations for effective implementation.

Zheng et al. [ 30 ] introduced a new framework for large-scale target recognition, namely, HyNet for MSR remote sensing imaging, which opens up a new avenue for research of the depiction of scale-invariant functions. Display zoom functions are elements with pyramid-shaped detection areas, which are used to detect objects more accurately with multiple scales in MSR remote sensing images.

Tian et al. [ 31 ] provided a 3D recognition network that can provide a wide range of local functions from images, BEV maps, to point clouds. The adaptive merging network provides an effective method to merge multi-mode data functions. Whenever a vast number of objects appear, the adaptive weighting component restricts the intensity of each signal and chooses information for further evaluation, while the spatial fusion module includes azimuth and geometry info.

Li et al. [ 32 ] reported that PDF-Net is an optical RSI-specific SOD network that may employ mapping and cross-path data, as well as multi-resolution features, to efficiently and accurately identify outgoing objects of various sizes in optical RSIs. PDF-Net has always outperformed the modern SOD method in the ORSSD dataset in terms of visual comparison and quantification. Furthermore, ablation analysis verified the efficacy of the main components.

Fadl et al. [ 33 ] proposed a system that uses spatio–temporal information and fusion of two-dimensional convolutional neural networks (2D-CNNs) to detect inter-frame operations (delete frames, insert frames, and copy frames). RBF-Gaussian support vector machine (SVM) is utilized in the classification phase before automatically extracting depth characteristics.

Zhu et al. [ 34 ] outlined the approaches that have been discovered thus far for detecting VOs. This research examines the available datasets, scoring criteria, and provides an overview of the various classes of deep learning-based methods for identifying VOs. Depending on how time and space information is used, detection methods have been developed. These categories include flow-based technology, LSTM, nursing technology, and follow-up technology.

Alhimale et al. [ 35 ] researched and successfully developed a fall detection system that can fulfill the demands of the elderly (especially indoors). As a result, the video-based fall detection system decreases the likelihood that older individuals will be concerned about falling and will limit their activities at home or in solitude. Furthermore, fall detection systems have been created to preserve people’s privacy, even when their everyday activities are dangerous, by tracking in real-time.

Lee et al. [ 36 ] proposed a new method using advanced neural network ART2 to detect scene changes. To capture the smooth interval, the suggested technique extracts the CC sequence from the video and then generates a gray-scale variance sequence. A typical progressively shifting local minimum sequence will develop during this procedure. It will be deleted from the softbox after being recovered by the local minimum detection method. Then, the resulting smooth intervals are combined to form a new sequence. From the new sequence, feature components such as pixel differences, histogram differences, and correlation coefficients can be extracted.
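
The three kinds of feature components named above can be illustrated with a minimal sketch that compares two frames. It assumes each frame is a flat list of grayscale intensities from 0 to 255; the function names and the number of histogram bins are hypothetical, and the sketch does not reproduce the feature definitions used in [ 36 ].

```python
import math


def pixel_difference(frame_a, frame_b):
    """Mean absolute intensity difference between two frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def histogram_difference(frame_a, frame_b, bins=8):
    """Sum of absolute differences between normalized intensity histograms."""
    def histogram(frame):
        counts = [0] * bins
        for value in frame:
            counts[min(value * bins // 256, bins - 1)] += 1
        return [count / len(frame) for count in counts]
    return sum(abs(x - y) for x, y in zip(histogram(frame_a), histogram(frame_b)))


def correlation_coefficient(frame_a, frame_b):
    """Pearson correlation of pixel intensities (0.0 if a frame is constant)."""
    n = len(frame_a)
    mean_a = sum(frame_a) / n
    mean_b = sum(frame_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(frame_a, frame_b))
    var_a = sum((a - mean_a) ** 2 for a in frame_a)
    var_b = sum((b - mean_b) ** 2 for b in frame_b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Hypothetical frames: a frame and its inverted copy, mimicking an abrupt change.
frame = [0, 64, 128, 255]
inverted = [255 - value for value in frame]
features = (
    pixel_difference(frame, inverted),
    histogram_difference(frame, inverted),
    correlation_coefficient(frame, inverted),
)
```

Within a single scene, consecutive frames tend to give small differences and a correlation close to one, so a sudden jump in these values is a natural cue for a scene change.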

Kousik et al. [ 37 ] developed a deep learning model that combines CNNs with recurrent neural networks in a new framework to detect saliency in videos. A convolutional recurrent neural network (CRNN) captures temporal, spatial, and locally constrained features to find salient objects in dynamic benchmark video datasets. Compared with conventional video saliency methods, evaluation on the benchmark datasets shows advantages in accuracy, F-measure, mean absolute error, and computational cost.

Xu et al. [ 38 ] presented a video smoke detection system based on a deep saliency network. The goal of saliency detection is to highlight the most important object regions in an image. To generate an informative smoke saliency map, salient CNNs at the pixel level and the object level are combined. An end-to-end architecture for salient smoke detection and smoke existence prediction is then applied to video smoke detection.

Yang et al. [ 39 ] described a digital video intrusion detection method based on the narrowband Internet of Things (NB-IoT) and constructed a corresponding NB-IoT-based detection system. Intelligent classification is accomplished with the IoT and the SVM algorithm. The classification time, accuracy, and false alarm rate of the model were examined. The longest classification time is 40.80 s and the shortest is 27 s; the best recognition rate is 87.60% and the worst is 83.70%. The false detection rate may reach 15% but always stays below 20%, which the authors take as evidence that the classification system is reliable and accurate.

Yamazaki et al. [ 40 ] proposed a method for automatically identifying surgical tools in video footage of laparoscopic gastrectomy. The automated approach, based on the open-source neural network framework YOLOv3, was validated for detecting surgical instrument usage in laparoscopic gastrectomy videos.

Yue et al. [ 41 ] used YOLO-GD (GhostNet and depthwise convolution) to detect dishes such as cups, chopsticks, and bowls in images and to distinguish the different types of dishes ( Table 1 ).

Table 1: Comparative study of SOD and movement detection techniques

Author | Year | Methodology | Advantage | Accuracy
Guang Chen, Zida Song | 2020 | DL-based SOD | The network benefits from integrating multi-resolution context information from available modules rather than directly from a single layer, resulting in more efficient computing. | Accuracy could be significantly improved.
Yun Ren, Changren Zhu | 2018 | Modified faster R-CNN method | Information extended up to some extent. | MER gather more region than bounding box method.
Chen Huang, Tingfa Xu | 2020 | Hyper spectral images in wireless network | Less noise. | Improves its accuracy and efficiency.
Xia Hua, Xinqing Wang | 2020 | A cascaded CNN | Has high accuracy and adaptability. |
Yundong Li, Han Dong | 2019 | Multi-block SSD | SSD offers two benefits: real-time processing and excellent precision. | The SSD technique yields an overall accuracy of 96.6%.
Brais Bosquet, Manuel Mucientes | 2020 | STDnet | Enhancements to speed and memory optimization. | Small-item detection accuracy using CNNs falls behind other bigger things.
Kun Fu, Zhonghan Chang | 2020 | Rotation-aware and multi-scale CNN | |
Zhuo Zheng, Yanfei Zhong | 2020 | HyNet framework | Improves the scale-invariant properties, making it ideal for object recognition in large-scale MSR remote sensing pictures. | Improves multi-scale item recognition accuracy in HSR remote sensing images.
Yonglin Tian, Kunfeng Wang | 2020 | Adaptive azimuth-aware fusion network | The advantage of this technique is that it makes use of point cloud data and the fusion style. |
Chongyi Li, Runmin Cong | 2020 | A parallel down-up fusion network | In optical RSIs, the combination of various resolutions has a benefit in addressing size variation and seeing. | Increases the generalization capabilities and accuracy of networks.
Sondos Fadl, Qi Han | 2020 | CNN spatiotemporal features and fusion | Reduces the number of post-processing steps. | CNN system achieves high accuracy.
Laila Alhimale, Hussein Zedan | 2014 | Video-based fall detection system using a NN | A trustworthy, accurate, user-friendly, and low-cost fall detection system that protects the privacy. | High sensitivity in detecting falls (greater than 90%).
Man-Hee Lee, Hun-Woo Yoo | 2006 | Improved ART2 | The benefit of quick detection since no decompression time is required. | This method has a high degree of accuracy.
Nalliyanna V, Kousik | 2020 | Hybrid CRNN | The proposed approach is successful in terms of accuracy as well as speed. |
Gao Xu, Yongming Zhang | 2019 | Video smoke detection based on DSNN | CNN was used to automatically learn the interaction between several low-level saliency signals and to capitalize on the information gained from these classical saliency detections. | Achieves the best result in terms of existing forecast accuracy.
Aimin Yang, Huixiang Liu | 2020 | A digital video intrusion detection method | Consistent performance, minimal power usage, and dependable data. | Low accuracy.
Yuta Yamazaki, MD, Shingo Kanaji | 2020 | Automated surgical instrument detection | Complex image analysis for medical professionals. |
Xuebin Yue | 2022 | YOLO-GD (GhostNet and depthwise convolution) | Function of detection dish will lead to further development. | Higher accuracy.

The above comparison table covers several small object and movement detection techniques. Among them, the Multi-block SSD approach reports 96.6% overall accuracy, while CNN spatiotemporal features and fusion for surveillance video forgery detection also yields high accuracy.

3 Studies related to SOD

Increasing picture capture resolution.

Increasing the input resolution of the model.

Using tiling on the pictures.

Increasing data generation through augmentation.

Model anchoring for self-learning.

Eliminating superfluous classifications.
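The tiling item in the list above can be made concrete with a short sketch. It computes overlapping crop rectangles for an image of a given size so that each tile can be passed to a detector at full resolution. The function name `tile_boxes`, the tile size of 512, and the overlap of 64 pixels are illustrative assumptions, not values taken from the reviewed studies.

```python
def tile_boxes(width, height, tile, overlap):
    """Return (left, top, right, bottom) crops that cover a width x height image.

    Hypothetical helper: tile size and overlap are assumptions chosen by the caller.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    stride = tile - overlap

    def starts(size):
        # Slide by `stride`, then add a final start so the last tile touches the border.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)
        return positions

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


boxes = tile_boxes(width=1000, height=600, tile=512, overlap=64)
print(len(boxes), boxes[0], boxes[-1])
```

Detections found in each tile would then need to be shifted back by the tile's `(left, top)` offset and merged, for example with non-maximum suppression, which this sketch does not cover.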

Figure 1 specifies the simplest way of detecting small objects.

Figure 1 
               Structure of SOD.


Zhang et al. [ 44 ] proposed the boundary-aware high-resolution network (BHNet), a novel salient object detection method. BHNet uses a parallel architecture that keeps a high-resolution branch for extracting information from low-level features and reinforces it with semantics from parallel low-resolution branches. Several multipath channel estimators and region extenders capture more precise context-sensitive features. A boundary-aware loss function helps delineate the borders of salient objects precisely. BHNet is therefore well suited to locating salient objects and has strong multi-feature extraction capabilities.

Liang et al. [ 45 ] proposed a context-aware network for detecting salient objects in RGB-D images. The approach has three components: feature extraction, multi-modal context fusion, and context-sensitive expansion. The first component extracts hierarchical features from color and depth, applying a CNN to each image. The second component uses an LSTM variant to model multi-modal spatial correlation in context. Experiments on two publicly available benchmark datasets show that the technique achieves state-of-the-art performance for salient object detection on stereo RGB-D data.

Kumar and Srivastava [ 46 ] developed an object detection method that recognizes objects in images using deep neural networks. To obtain high detection accuracy in real time, the study integrates the Single Shot Multi-Box Detection (SSD) method with a faster CNN. The method is suitable for both still images and videos. The proposed model’s accuracy is reported to be greater than 75%, and training takes around 5–6 h. The model uses a CNN to extract feature information from the images and then applies feature mapping to classify the class labels. By default, the technique uses separate filters with different default boxes to handle aspect ratio differences, as well as multi-scale feature maps for object detection.

Jiao et al. [ 47 ] developed a new object detection network, RFP-Net, which was the first to apply the concepts of the receptive field (RF) and the effective receptive field (eRF) to region proposal generation. The RF of each sliding window serves as a reference box, and the eRF range is used to filter out low-quality proposals. In addition, the authors developed an eRF-based matching strategy to define the positive and negative samples used to train RFP-Net, thereby addressing the imbalance between positive and negative samples as well as the scale problem in object detection.

Liang et al. [ 48 ] proposed a multi-style attention fusion network (MAFNet). MAFNet comprises a dual signal spatial attention (DSA) module, a dual attention intermediate representation (DAIR) module, a high-level channel attention (HCA) module, and a multi-level feature fusion (MLFF) module. DSA seeks to strengthen low-level features while filtering out background noise. DAIR uses two branches to adaptively integrate spatial and semantic information from intermediate-layer features. HCA preserves high-level semantic features via two distinct channel operations. MLFF then integrates these multi-level features in a trainable manner.

Liu et al. [ 49 ] reviewed machine vision-based traffic sign detection. The review covers five main categories of detection methods: color-based, shape-based, color- and shape-based, LIDAR-based, and machine learning-based methods. To explain and summarize the mechanics of the different techniques, the methods in each category are further split into sub-categories. Some of the comparison techniques have been implemented in some updated methods that are not compared in public records.

Pollara et al. [ 50 ] described different ways of detecting and monitoring low-cost, low-power devices using certain hydrophones. The ship’s acoustic properties were thoroughly examined to establish its physical specifications. These variables can be used to categorize ships. The Stevens Acoustic Library is a collection of acoustic instruments.

The study by Wang et al. [ 51 ] is divided into two parts: a dataset captured from a drone’s point of view is built, and a variety of approaches are used to detect small objects. Through a series of comparative experiments, a machine learning technique based on SVM and a deep learning method based on the YOLO network were implemented. The SVM-based method uses fewer computing resources and saves time; however, because it depends on the selection of the region of interest, its accuracy and reliability are hard to improve in some scenarios. Deep learning based on neural networks, on the other hand, can give higher accuracy.

Xue et al. [ 52 ] presented an improved approach for identifying small things, which improves the performance of different scales and integrates contextual semantic information across them. The results of tests on the large MS COCO dataset show that this method can improve the accuracy of small object identification while staying reasonably quick.

Zhiqiang and Jun [ 53 ] introduced CNN-based object recognition, CNN structure, features of CNN-based object recognition structure, and methods to improve recognition efficiency. CNN has a powerful feature extraction function, which can make up for the inconvenience caused by using it. Compared with traditional real-time methods, CNN also has more advantages, accuracy, and adaptability, but there is still room for improvement. This can reduce the loss of functional information, make full use of object relationships, and context and fuzzy inference can help computers deal better with issues such as occlusion and low resolution.

Elakkiya et al. [ 54 ] gave an idea of how the cervical lesions can be found and categorized. The proposed method used the tiny object identification mechanism to identify the cervical closure from the colposcopy pictures because the cervical cells are much smaller than the uterine cells. The proposed strategy also used Bayesian optimization to optimize the SOD-GAN’s hyper parameters, which reduced time complexity and improved performance in terms of efficient classification. The proposed improved SOD-GAN uses eight alternative colposcopy images as inputs and eight randomly generated noise images as outputs to produce the right colposcopy image.

Ji et al. [ 55 ] combined YOLOv4 with two other techniques, multi-scale contextual information and Soft-CIOU, and called the result MCS-YOLOv4. Extra scales were added to obtain more detailed information. The authors also incorporated a perception block into the structure of the model.

Sun et al. [ 56 ] addressed real-time detection of small objects, especially moving vehicles. The approach aims to obtain better results from shallower networks by weighting the extracted features so as to improve the quantitative results ( Table 2 ).

Table 2: Comparative study of SOD

Author | Year | Methodology | Advantage | Object type | Accuracy
Xue Zhang, Zheng Wang | 2020 | Boundary-aware high-resolution network | The conspicuous object is successfully detected | Salient object |
Fangfang Liang, Lijuan Duan | 2020 | RGB-D | The capacity of these structures to enhance feature maps is given an edge. | People, cars, building, and animals |
Ashwani Kumar, Sonam Srivastava | 2020 | CNN | Multi-box detection at various layers gives different outcomes | Salient object | Achieves 65% accuracy only
Lin Jiao, Shengyu Zhang | 2020 | RFP-Net | | Salient object | Largely improving the accuracy
Yanhua Liang, Guihe Qin | 2020 | MAFNet | To some extent efforts have been made to get worldwide information | People and cars | Improve accuracy while reducing network space redundancy
Chunsheng Liu, Shuang L | 2019 | Machine vision-based traffic sign detection | | Salient object | Highest accuracy of 88%
Alexander Pollara, Dr. Alexander Sutin | 2017 | Small boat detection, tracking, and classification | | People and cars |
Jianhang Wang, Sitan Jiang | 2019 | Comparison of small objects | | | High detection accuracy
Zhijun Xue, Wenjie Chen | 2020 | Enhancement and fusion of multi-scale feature maps | Increase the network’s size | People and cars | Enhance accuracy while minimizing network redundancy
Wang Zhiqiang, Liu Jun | 2017 | CNN | Reduce the loss of feature information | People, cars, buildings, and animals | Excellent outcomes in terms of accuracy and speed
Elakkiya R, Teja KS, Jegatha Deborah L, Bisogni C, Medaglia C | 2021 | RCNN | Reduced the complexity and improved the efficiency | Microscope-based object | Highest accuracy of 97.08%
Ji, Shu-Jun, Qing-Hua Ling, and Fei Han | 2023 | YOLOv4 | Multi-scale contextual information and Soft-CIOU | Small object | 65.7 at AP50
Wei Sun, Liang Dai, Xiaorui Zhang, Pengshuai Chang, Xiaozheng He | 2021 | RSOD | Traffic monitoring | Small object | 52.7 at mAP50

The table above compares several approaches for tiny item identification. In comparison to the preceding approaches, RFP-Net, the object detection technique, employs a receptive field-based proposal generation network, which results in significantly improved accuracy.

4 Studies related to moving object detection

VO detection [ 57 ] is the task of detecting objects in videos rather than in still images. VOs are free-form video segments with semantic meaning. A two-dimensional snapshot of a VO at a certain point in time is called a video object plane (VOP). A VOP is defined by its texture (luminance and chroma values) and shape.

4.1 Methods for detecting objects in videos

As seen in Figure 2 , VO detectors may be categorized, based on how they use temporal dependencies and aggregate features extracted from video clips, as flow-based, LSTM-based [ 58 ], attention-based [ 59 ], and tracking-based detectors. These categories of VO detection are shown schematically in Figure 2 [ 60 ].

Figure 2 
                  Categories of VO detection.


4.2 Video forgery detection

Activity removal: removing the frames in question using frame deletion.

Activity addition: introducing foreign frames from another video using frame insertion.

Activity replication: the process of repeating an event by using frame duplication.
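The three inter-frame forgery operations listed above can be illustrated with a minimal sketch that treats a video as a Python list of frames. The frame labels and the helper names `delete_frames`, `insert_frames`, and `duplicate_frames` are hypothetical and stand in for real decoded frames; the sketch shows only the manipulations themselves, not how they are detected.

```python
def delete_frames(frames, start, count):
    """Activity removal: drop `count` frames beginning at index `start`."""
    return frames[:start] + frames[start + count:]


def insert_frames(frames, foreign, at):
    """Activity addition: splice frames taken from another video in at index `at`."""
    return frames[:at] + list(foreign) + frames[at:]


def duplicate_frames(frames, start, count, at):
    """Activity replication: copy frames[start:start + count] and re-insert them at `at`."""
    return frames[:at] + frames[start:start + count] + frames[at:]


# Placeholder labels stand in for decoded frames (an assumption for illustration).
video = ["f0", "f1", "f2", "f3", "f4", "f5"]
removed = delete_frames(video, start=2, count=2)
added = insert_frames(video, ["x0", "x1"], at=3)
replicated = duplicate_frames(video, start=0, count=2, at=4)
print(removed, added, replicated, sep="\n")
```

Each operation changes the temporal continuity between neighboring frames, which is the kind of cue that spatio–temporal forgery detectors look for.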

Figure 3 
                  Inter-frame forgeries.


Salvadori et al. [ 62 ] reduced the transmission capacity of uncompressed video streams and thereby boosted frame rate using a low-complexity approach based on background removal and error recovery technologies. JPEG is a modern solution. The findings of this study will be taken into account while designing next-generation smart cameras for 6LoWPAN.

Amosov et al. [ 63 ] proposed to employ a set of deep neural networks (DNNs) to develop an intelligent context classifier that can recognize and discriminate between regular and critical occurrences in the security service system’s continuous video feed. Their artworks are examined by utilizing cutting-edge technologies. A probability score for each video segment is the outcome of computer vision and software technologies. To identify and detect normal and abnormal situations, a Python software module was built.

El Kaid et al. [ 64 ] proposed a CNN model that can be used to reduce the false alarm rate: it filters out 98% of the images of a person in a wheelchair and reduces false alarms by about 17%. However, there are numerous false positives in the blank space image, and none of the evaluated CNN models can identify them owing to the image’s complexity. As a result, another approach is needed to increase the accuracy of the fall detection system.

Najva and Bijoy [ 65 ] presented a method for detecting and classifying objects in videos, in which a DNN classifies the detected objects using tensor features and SIFT. A DNN, like the human brain, is capable of analyzing massive quantities of high-dimensional data with billions of variables. The results of this study show that the proposed classifier and most of the existing techniques for feature extraction and classification combine SIFT and tensor features.

Yan and Xu [ 66 ] proposed an end-to-end pipeline for video caption detection and recognition. The Connectionist Text Proposal Network (CTPN) detects the subtitles, while a residual network (ResNet), a gated recurrent unit (GRU), and connectionist temporal classification (CTC) are used to recognize Chinese and English subtitles in video frames. First, the CTPN locates the subtitle region in the video frame. The detected subtitle region is then fed into ResNet to extract a feature sequence. Next, a bidirectional GRU layer models the feature sequence.
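To show what the CTC stage at the end of such a pipeline does, the following is a minimal sketch of greedy (best-path) CTC decoding: take the highest-scoring symbol in each frame, merge consecutive repeats, and drop the blank symbol. The three-symbol alphabet and the score values are made-up assumptions for illustration and are not taken from [ 66 ].

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded, previous = [], blank
    for index in best:
        # Emit a symbol only when it differs from the previous frame and is not blank.
        if index != previous and index != blank:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)


# Hypothetical per-frame scores over the alphabet [blank, "a", "b"].
alphabet = ["-", "a", "b"]
scores = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.1, 0.7, 0.2],    # a
    [0.1, 0.2, 0.7],    # b
    [0.1, 0.1, 0.8],    # b (repeat, merged)
]
text = ctc_greedy_decode(scores, alphabet)
print(text)
```

The blank between the second and fourth frames is what allows the repeated character to survive decoding, which is why CTC suits subtitle text of unknown length.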

Wu et al. [ 67 ] proposed an end-to-end pipeline to detect video captions. The CTPN detects the subtitles, while ResNet, GRU, and CTC are used to recognize Chinese and English subtitles in video frames. To begin, the CTPN locates the subtitle region in the video frame. ResNet then extracts a feature sequence from the detected subtitle region. After that, a bidirectional GRU layer models the feature sequence.

Fang et al. [ 68 ] introduced the Deep Video Saliency Network (DevsNet), a new deep learning framework for predicting the saliency of video streams. DevsNet consists mainly of two parts: a 3D convolutional network (3D-ConvNet) and a bidirectional convolutional long short-term memory network (B-ConvLSTM). The 3D-ConvNet learns short-term spatio–temporal information, while the B-ConvLSTM learns long-term spatio–temporal features.

Wang et al. [ 69 ] proposed a completely scalable network with a communication structure (SCNet) for high-precision VO recognition at low computational cost. A scale recognition module is added to capture features under larger variations. The RoI structure module extracts and combines the location and context features of each RoI. Feature aggregation with flow warping is also used to improve the features of the reference frame. SCNet’s efficacy has been demonstrated in several experiments. As a possible extension, another auxiliary branch with a paired structure for extracting RoI features, similar to the local function block in B-ConvLSTM, could be added to the RoI module. In addition, SCNet currently focuses mainly on accuracy, so there is still considerable room for speed improvement.

Zhu and Yan [ 70 ] proposed traffic sign recognition using YOLOv5 and compared with SSD with some extended features ( Table 3 ).

Table 3: Comparative study of moving object detection

Author | Year | Methodology | Advantage | Accuracy
Claudio Salvadori, Matteo Petracca | 2013 | Video streaming 6LoWPAN | Reduce the number of discarded packets |
O.S Amosov, S.G. Amosova | 2019 | Ensemble of DNN stream of the security system | | Accuracy 96% for 80 epochs
Amal El Kaid, Karim Baïna | 2019 | Video detection algorithm by CNN | | Improve its accuracy
Najva N, Edet Bijoy K | 2016 | SIFT and DNN | The major benefit of SIFT is that they are resistant to distortion, noise addition, and changes in light | It requires high accuracy
Hongyu Yan, Xin Xu | 2020 | DRNN | Its advantage is more flexible | 89.2% recognition accuracy
Peng Wu, Jing Liu | 2020 | Fast Sparse Coding Networks | Encoding-decoding neural networks have several advantages, including efficient inference | Higher accuracy
Yuming Fang, Chi Zhang | 2020 | DevsNet | |
Fengchao Wang, Zhewei Xu | 2020 | SCNet | High-precision and low-cost calculation |
Yanzhao Zhu | 2022 | Traffic sign recognition by CNN | | Improve its accuracy

The above comparison table represents some moving object detection techniques.

5 Studies related to loss function

In object recognition tasks, the loss function is a key element in determining detection accuracy. For example, the link between localization and classification can be established by multiplying the standard cross-entropy classification loss by an IoU-based factor [ 71 ]. The mean squared error (MSE) [ 72 ] is the workhorse of basic loss functions. It is simple to understand and apply, and it works well in most cases. To compute the MSE, take the difference between the prediction and the ground truth, square it, and average over the whole dataset. In statistics, a loss function is typically used for parameter estimation, and the event in question is a function of the difference between the estimated and true values for an instance of data. The concept, as old as Laplace, was reintroduced into statistics by Abraham Wald in the middle of the 20th century [ 73 , 74 ]. In an economic context, for example, the loss is usually economic loss or regret. In classification, it is the penalty for misclassifying an example. In actuarial science, especially since Harald Cramér’s work in the 1920s, it is used in the insurance industry to model premium payments. In optimal control, the loss is the price of not meeting expectations. In financial risk management, the function is mapped to a monetary loss [ 75 – 77 ]. Some important studies on objective-based loss functions are discussed below.
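The two losses mentioned in this paragraph can be sketched in a few lines of Python. The MSE follows its standard definition. The IoU-weighted cross-entropy is only an illustration: the exact form of the IoU-based factor in [ 71 ] is not given here, so the sketch assumes the simplest choice, the IoU itself, and the function names are hypothetical.

```python
import math


def mse(predictions, targets):
    """Mean squared error: average of squared prediction errors."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_weighted_cross_entropy(prob_true_class, pred_box, gt_box):
    """Cross-entropy for one sample, scaled by an IoU-based factor.

    Assumption: the factor is the plain IoU; the cited work may use another form.
    """
    return iou(pred_box, gt_box) * -math.log(prob_true_class)


print(mse([2.0, 4.0], [1.0, 6.0]))
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(iou_weighted_cross_entropy(0.5, (0, 0, 2, 2), (0, 0, 2, 2)))
```

With this choice of factor, a well-localized box contributes its full classification loss, while a poorly localized one contributes little; other factors would shift that balance differently.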

Fang et al. [ 78 ] proposed a conditional patch-based adversarial network, which uses a generator network operating on sampled data patches and a conditional discriminator network with additional loss functions to capture both fine blood vessels and coarse structure. Experiments on the public STARE and DRIVE datasets show that the proposed model outperforms state-of-the-art methods.

Fan and Liu [ 79 ] investigated GAN training with various federated synchronization strategies and found that synchronizing both the discriminator and the generator across clients gives the best results on two different tasks. The study also found empirically that federated learning is generally robust to the number of clients when their training data are IID or only moderately non-IID. However, if the data distribution is highly skewed, existing federated learning schemes (such as FedAvg) can fail owing to weight divergence.
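The FedAvg scheme named above aggregates client models by averaging their parameters, weighted by the amount of local data. The sketch below shows only that aggregation step; it assumes each client's parameters are flattened into a plain list of floats, and the numbers are invented for illustration rather than taken from [ 79 ].

```python
def fed_avg(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical clients holding 100 and 300 training samples.
global_weights = fed_avg([[1.0, 2.0], [3.0, 6.0]], [100, 300])
print(global_weights)
```

In a federated GAN, the same averaging could be applied to the generator and the discriminator separately; when client data are highly skewed, the client vectors drift apart and their average can represent none of them well, which is the weight divergence problem noted above.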

Liu et al. [ 80 ] proposed a model based on a two-layer backbone architecture that provides end-to-end 6D pose estimation at the category level together with bounding box detection. The 6D pose is produced directly by the network, so no further steps or post-processing, such as Perspective-n-Point, are needed. The loss function and the two-layer CNN architecture make joint multi-task learning fast and effective. The study increases pose estimation accuracy by replacing fully connected layers with fully convolutional layers. The network casts pose estimation as a classification problem and a regression problem, termed Pose-cls and Pose-reg.

Sharma and Mir [ 81 ] developed a technique for segmenting VOs using unsupervised learning. The process is divided into two stages, each of which considers the base frame and the current frame for segmentation. In the first stage, dense region proposals, bounding boxes, and scores are generated. A feature extraction technique that uses an attention network for feature encoding is then applied. Finally, using the Softmax technique, these features are scaled and combined to generate the object segmentation.

Liu et al. [ 82 ] proposed a deep network based on hybrid upsampling and hybrid loss computation to detect salient objects. Hybrid upsampling not only integrates original and upsampled features but also obtains a wider receptive field by using atrous convolution. The hybrid loss function, which combines cross-entropy loss and area loss, further narrows the gap between the saliency map and the ground truth. A fully connected CRF model can be used to improve spatial coherence and contour localization even further.

Steno et al. [ 83 ] sought to improve the accuracy of threat object localization and reduce detection time by using an enhanced Faster R-CNN with a modified region proposal network (RPN). The RPN was modified with a new anchor box design that makes objects easier to find, and the improved RPN gives a more comprehensive summary of features. Furthermore, by including sample weights in the classification loss function, an enhanced cross-entropy function is created, which improves the classification loss and the performance of the multi-task loss function. In MATLAB, the average accuracy is improved to 0.27, the average processing time is lowered, and the average processing time is increased by 0.27.
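The idea of adding sample weights to the classification loss can be sketched as follows. This is a generic weighted cross-entropy, not the specific formulation of [ 83 ]: the normalization by the sum of weights and the example probabilities are assumptions made for illustration.

```python
import math


def weighted_cross_entropy(probs, labels, sample_weights):
    """Weighted mean of -log p(true class), one weight per sample.

    Assumption: the loss is normalized by the sum of the weights.
    """
    total = sum(w * -math.log(p[y]) for p, y, w in zip(probs, labels, sample_weights))
    return total / sum(sample_weights)


# Hypothetical predicted class probabilities for two samples.
probs = [[0.9, 0.1], [0.4, 0.6]]
labels = [0, 1]
uniform = weighted_cross_entropy(probs, labels, [1.0, 1.0])
emphasized = weighted_cross_entropy(probs, labels, [1.0, 3.0])
print(uniform, emphasized)
```

Raising the weight of the harder second sample increases its share of the loss, so gradient updates focus more on it; with equal weights the function reduces to the ordinary mean cross-entropy.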

Gu et al. [ 84 ] proposed Context-aware Dense Feature Distillation for improved lightweight detection, using rich contextual features for SOD ( Table 4 ).

Table 4: Comparative study of loss function

Author | Year | Methodology | Advantage | Accuracy
Yunchun Fang, Yilu Cao, Wei Zhang, Quilong Yuan | 2019 | Dual networks for attribute prediction | | Improve its accuracy
Chenyou Fan, Ping Liu | 2020 | Federated generative adversarial learning | The capacity to forecast a trajectory with high accuracy in a limited number of trials | 90% accuracy
Fuchang Liu, Pengfei Fang | 2018 | Recovering 6D object pose from RGB | | Achieve increased accuracy
Vipul Sharma, Roohie Naaz Mir | 2020 | SSFNET-VOS | | Accuracy is improved while computing complexity is kept to a minimum
Zhengui Liu, Jiting Tang, Peng Zhao | 2019 | Hybrid upsampling and hybrid loss computing | | Less accuracy
Priscilla Steno, Abeer Alsadoon | 2020 | DNN | | Improve the accuracy
Lingyun Gu | 2023 | Context-aware Dense Feature Distillation (CDFD) | |

The above comparison table represents some loss functions and their calculation techniques. Compared to the above techniques federated generative adversarial learning produces a higher accuracy and has the advantage of accurate trajectory prediction with few attempts.

6 Studies related to optimization technique

In the network, optimization methods are employed to minimize a function known as the loss function or error function. The optimization approach may generate the smallest difference between the actual output and the predicted output by minimizing the loss function, allowing our model to accomplish the task more correctly.
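As a minimal illustration of minimizing a loss function, the sketch below fits a single slope parameter by gradient descent on the mean squared error. The data, learning rate, and step count are assumptions chosen so that the example converges quickly; real networks use the same principle with many parameters and more elaborate optimizers.

```python
def fit_slope(xs, ys, lr=0.01, steps=500):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x).
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


# Toy data generated by y = 2x (an assumption for illustration).
w = fit_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(w)
```

Each step moves the parameter against the gradient of the loss, so the gap between predicted and actual outputs shrinks until the slope settles near the value that generated the data.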

Dumitru et al. [ 85 ] proposed an edge detector and compared it against one of the most established techniques, the Canny edge detector. The methodology combines particle swarm optimization with supervised optimization of cellular automata rules. The authors derived transferable rules that can be applied to a variety of images with similar features. On average, the proposed approach outperforms Canny on the authors’ dataset.
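Particle swarm optimization itself can be sketched compactly. The version below is a generic textbook form applied to a toy objective (the sphere function), not the cellular automata rule search of [ 85 ]; the inertia and acceleration coefficients, swarm size, and bounds are common default choices assumed here for illustration.

```python
import random


def pso_minimize(objective, dim, bounds, particles=30, iterations=200,
                 inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimal particle swarm optimizer; returns (best_position, best_value)."""
    rng = random.Random(seed)
    low, high = bounds
    pos = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(particles), key=best_val.__getitem__)
    global_pos, global_val = best_pos[g][:], best_val[g]
    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                # Pull each particle toward its own best and the swarm's best position.
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * rng.random() * (best_pos[i][d] - pos[i][d])
                             + social * rng.random() * (global_pos[d] - pos[i][d]))
                pos[i][d] = min(high, max(low, pos[i][d] + vel[i][d]))
            value = objective(pos[i])
            if value < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], value
                if value < global_val:
                    global_pos, global_val = pos[i][:], value
    return global_pos, global_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


position, value = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(position, value)
```

To search for cellular automata rules instead, the objective would score a candidate rule against reference edge maps, which requires an encoding of rules as particle positions that this sketch does not attempt.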

Huang et al. [ 25 ] proposed a model for detecting salient objects in hyperspectral images on wireless networks, which applies saliency optimization to CNN features. To prepare for the final fusion, a two-channel CNN first extracts spatial and spectral features of the same size. The final saliency map is then generated by optimizing the saliency values of the foreground and background cues derived from the CNN features. The findings of this study show that the approach is effective and performs well on hyperspectral images.

Sasikala et al. [ 86 ] used a classifier in conjunction with an optimization model. Even with hundreds of vessel images, this experimental model outperforms previous detection techniques. The hybrid adaptive optimization approach based on cuckoo search produces the best results in dynamic regions affected by the ocean, and the findings indicate a reduced false alarm rate for ports and other coastal surveillance locations.

Jain et al. [ 87 ] presented a novel whale optimization algorithm for social networks that identifies the top-N opinion leaders by analyzing user reputation with several standard optimization functions. The approach is based on the bubble-net hunting behavior of humpback whales. As the number of users on the network grew, the algorithm still determined the optimal solution, and the method’s total complexity remained constant. The authors also proposed a novel community partitioning method based on a similarity index that includes the clustering coefficient and neighbor similarity as key components. Local and global opinion leaders were identified using priorities together with the proposed methods and optimization functions. The method was applied to real-world, large-scale datasets, and the outcomes were compared in terms of precision, accuracy, recall, and F1 score.

Rammurthy and Mahesh [ 88 ] proposed the Whale Harris Hawks Optimization (WHHO) technique for detecting brain tumors in magnetic resonance images. Segmentation uses cellular automata and rough set theory. Features such as tumor size, local optical oriented pattern, mean, variance, and kurtosis are then extracted from the segments. Brain tumor detection is performed with a deep CNN trained using the proposed WHHO, which combines the Whale Optimization Algorithm (WOA) and the Harris Hawks Optimization (HHO) algorithm. The WHHO-based deep CNN outperformed the alternative techniques, with a maximum accuracy of 0.816, a maximum specificity of 0.791, and a maximum sensitivity of 0.974.
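The evaluation metrics quoted in this section (accuracy, specificity, sensitivity, precision, recall, and F1 score) all derive from the four counts of a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not results from any of the reviewed studies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion matrix counts."""
    sensitivity = tp / (tp + fn)   # also called recall or true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, score in metrics.items():
    print(f"{name}: {score:.4f}")
```

Reporting several of these together matters because a detector can score high on sensitivity while scoring noticeably lower on specificity, as the figures reported for [ 88 ] illustrate.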

Zhang et al. [ 89 ] proposed the whale optimization-based community detection algorithm (WOCDA) as a novel community discovery technique. WOCDA’s initialization strategy and three optimization operations simulate the hunting behavior of humpback whales to find the communities in a network. Experiments on synthetic and real networks show that, in most cases, WOCDA identifies communities more accurately than modern meta-heuristic algorithms. WOCDA’s efficacy declines, however, as the number of nodes in the network grows, because the random search process takes a long time when the search space is large.

Luo et al. [90] suggested a multi-scale traffic vehicle detection approach for recognizing vehicles in complex natural scenes. The images in the dataset are first enhanced with a Retinex-based adaptive image correction approach to reduce the influence of shadows and highlights. The study then describes a multi-layer feature extraction approach that uses neural architecture search to find the best connections between layers, which strengthens the basic feature representation of the Faster R-CNN model and is intended to improve performance on multi-scale vehicles. Finally, a target feature enhancement approach integrates the multi-layer feature information with context information from the final layer, after the layers are connected, to enrich the target information and improve the model's reliability in recognizing both large and small targets (Table 5).

Table 5: Comparative study of optimization techniques

Author Methodology Advantage Sensitivity Specificity Precision Recall F-measure Accuracy
Delia Dumitru, Anca Andreica Cellular automata rules optimized Capable of discovering a specific rule for a picture or group of images Good Medium Medium Good
Chen Huang, Tingfa Xu CNN and saliency optimization Less noise Good Good Medium Good
J. Sasikala, D. S. Juliet The optimized hybrid adaptive cuckoo search algorithm Good Good Good Good Good
Lokesh Jain, Rahul Katarya Whale optimization algorithm Better performance Good Medium Good Medium Medium
D. Rammurthy, P. K. Mahesh WHOA-DNN Good Good Medium Good
Yun Zhang, Yongguo Liu WOCDA Good Good Good Good
J. Q. Luo, H. S. Fang Faster R-CNN with NAS optimization Good Good Good Good

The comparison table above summarizes several optimization techniques. Among them, salient object detection on hyperspectral images in wireless networks using a CNN with saliency optimization yields improved accuracy and efficiency, with the added benefit of less noise.

7 Conclusion

This study reviewed small object and movement detection methods, loss functions, and optimization techniques. The review is intended to support improvements in small object and movement detection through new ideas. It covers 84 research articles that share the background of this article, selected from various journals. Individual articles were chosen by examining the overviews and reference sections of earlier research articles. The selected research supports the detection of small moving objects through performance analysis, loss functions, and optimization techniques. After careful analysis of the previous work, some landmark articles were identified that may be useful for further research in this area.

8 Future scope

Over the past few years, the computer vision and pattern recognition communities have paid a lot of attention to object detection in images and videos. Although numerous object detection methods have been developed, deep learning applications promise greater accuracy for a wider range of object types. In the future, we would like to implement and compare models for aerial images and video frames. There is also a need for methods that not only detect objects but also analyze them for further investigation. It will be crucial to make use of this technology, which draws on computer vision and image processing to recognize and characterize items such as people, cars, and animals in digital images and videos.

Author contributions: Ravi Prakash Chaturvedi collected, filtered, organized, compared and worked upon the data. Udayan Ghose validated and analyzed the results. He also audited the approach and results.

Conflict of interest: The authors declare no conflict of interest.

Data availability statement: Data were collected from various research papers that are listed as references in this paper.

[1] Dasiopoulou S, Mezaris V, Kompatsiaris I, Papastathis VK, Strintzis MG. Knowledge-assisted semantic video object detection. IEEE Trans Circuits Syst Video Technol. 2005;15(10):1210–24. 10.1109/TCSVT.2005.854238

[2] Meng Q, Song H, Li G, Zhang Y, Zhang X. A block object detection method based on feature fusion networks for autonomous vehicles. Complexity. Feb. 2019;2019:1–14. 10.1155/2019/4042624

[3] Guan L, He Y, Kung SY. Multimedia image and video processing. CRC Press; 1 March 2012. p. 331. ISBN 978-1-4398-3087-1.

[4] Wu J, Osuntogun A, Choudhury T, Philipose M, Rehg JM. A scalable approach to activity recognition based on object use. 2007 IEEE 11th International Conference on computer Vision. IEEE; 2007. 10.1109/ICCV.2007.4408865

[5] Hassan T, Akram MU, Hassan B, Nasim A, Bazaz SA. Review of OCT and fundus images for detection of Macular Edema. In 2015 IEEE International Conference on Imaging Systems and Techniques (IST). IEEE; 2015, September. p. 1–4. 10.1109/IST.2015.7294517

[6] Bagci AM, Ansari R, Shahidi M. A method for detection of retinal layers by optical coherence tomography image segmentation. In 2007 IEEE/NIH Life Science Systems and Applications Workshop. IEEE; 2007, November. p. 144–7. 10.1109/LSSA.2007.4400905

[7] Dong C, Liu J, Xu F. Ship detection in optical remote sensing images based on saliency and a rotation-invariant descriptor. Remote Sens. 2018;10:400. 10.3390/rs10030400

[8] Yang X, Sun H, Fu K, Yang J, Sun X, Yan M, et al. Automatic ship detection in remote sensing images from google earth of complex scenes based on multiscale rotation dense feature pyramid networks. Remote Sens. 2018;10:132. 10.3390/rs10010132

[9] Girshick R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: 7–13 December 2015. p. 1440–8. 10.1109/ICCV.2015.169

[10] Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the International Conference on Neural Information Processing Systems, Montreal, QC, Canada. Cambridge, MA, USA: MIT Press; 7–12 December 2015. p. 91–9.

[11] He K, Zhang X, Ren S, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. In European Conference on Computer Vision, Proceedings of the 13th European Conference, Zurich, Switzerland. Cham, Switzerland: Springer; 6–12 September 2014. p. 346–61. 10.1007/978-3-319-10578-9_23

[12] Xu F, Liu J, Sun M, Zeng D, Wang X. A hierarchical maritime target detection method for optical remote sensing imagery. Remote Sens. 2017;9:280. 10.3390/rs9030280

[13] Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: 23–28 June 2014. p. 580–7. 10.1109/CVPR.2014.81

[14] Chen X, Kundu K, Zhu Y, Berneshawi AG, Ma H, Fidler S, et al. 3D object proposals for accurate object class detection. Lect Notes Bus Inf Process. 2015;122:34–45.

[15] Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, et al. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016. Cham, Switzerland: Springer; 2016. p. 21–37. 10.1007/978-3-319-46448-0_2

[16] Li H, Lin Z, Shen X, Brandt J, Hua G. A convolutional neural network cascade for face detection. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: 7–12 June 2015. p. 5325–34. 10.1109/CVPR.2015.7299170

[17] Yang F, Choi W, Lin Y. Exploit all the layers: Fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: 27–30 June 2016. p. 2129–37. 10.1109/CVPR.2016.234

[18] Bateni S, Wang Z, Zhu Y, Hu Y, Liu C. Co-optimizing performance and memory footprint via integrated CPU/GPU memory management, an implementation on autonomous driving platform. In Proceedings of the 2020 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS). Sydney, Australia: 21–24 April 2020. 10.1109/RTAS48715.2020.00007

[19] Lu J, Tang S, Wang J, Zhu H, Wang Y. A review on object detection based on deep convolutional neural networks for autonomous driving. In Proceedings of the 2019 Chinese Control and Decision Conference (CCDC). Nanchang, China: 3–5 June 2019. 10.1109/CCDC.2019.8832398

[20] Wei H, Laszewski M, Kehtarnavaz N. Deep learning-based person detection and classification for far field video surveillance. In Proceedings of the 2018 IEEE 13th Dallas Circuits and Systems Conference. Dallas, TX, USA: November 2018. 10.1109/DCAS.2018.8620111

[21] Guillermo M, Tobias RR, De Jesus LC, Billones RK, Sybingco E, Dadios EP, et al. Detection and classification of public security threats in the philippines using neural networks. In Proceedings of the 2020 IEEE 2nd Global Conference on Life Sciences and Technologies (LifeTech). Kyoto, Japan: 10–12 March 2020. p. 1–4. 10.1109/LifeTech48969.2020.1570619075

[22] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. Imagenet large scale visual recognition challenge. Int J Comput Vis. 2015;115:211–52. 10.1007/s11263-015-0816-y

[23] Chen G, Wang H, Chen K, Li Z, Song Z, Liu Y, et al. A survey of the four pillars for small object detection: multiscale representation, contextual information, super-resolution, and region proposal. IEEE Transactions on Systems, Man, and Cybernetics: Systems; 2020 Jul 17.

[24] Ren Y, Zhu C, Xiao S. Small object detection in optical remote sensing images via modified faster R-CNN. Appl Sci. 2018 May;8(5):813.

[25] Huang C, Xu T, Zhang Y, Pan C, Hao J, Li X. Salient object detection on hyperspectral images in wireless network using CNN and saliency optimization. Ad Hoc Netw. 2021 Mar 1;112:102369. 10.1016/j.adhoc.2020.102369

[26] Hua X, Wang X, Rui T, Zhang H, Wang D. A fast self-attention cascaded network for object detection in large scene remote sensing images. Appl Soft Comput. 2020 Sep 1;94:106495. 10.1016/j.asoc.2020.106495

[27] Yundong LI, Han DO, Hongguang LI, Zhang X, Zhang B, Zhifeng XI. Multi-block SSD based on small object detection for UAV railway scene surveillance. Chin J Aeronautics. 2020 Jun 1;33(6):1747–55. 10.1016/j.cja.2020.02.024

[28] Bosquet B, Mucientes M, Brea VM. STDnet: Exploiting high resolution feature maps for small object detection. Eng Appl Artif Intell. 2020 May 1;91:103615. 10.1016/j.engappai.2020.103615

[29] Kunfu K, Chang Z, Zhang Y, Xu G, Zhang K, Sun X. Rotation-aware and multi-scale convolutional neural network for object detection in remote sensing images. ISPRS J Photogramm Remote Sens. 2020 Mar 1;161:294–308. 10.1016/j.isprsjprs.2020.01.025

[30] Zheng Z, Zhong Y, Ma A, Han X, Zhao J, Liu Y, et al. HyNet: Hyper-scale object detection network framework for multiple spatial resolution remote sensing imagery. ISPRS J Photogramm Remote Sens. 2020 Aug 1;166:1–4. 10.1016/j.isprsjprs.2020.04.019

[31] Tian Y, Wang K, Wang Y, Tian Y, Wang Z, Wang FY. Adaptive and azimuth-aware fusion network of multimodal local features for 3D object detection. Neurocomputing. 2020 Oct 21;411:32–44. 10.1016/j.neucom.2020.05.086

[32] Li C, Cong R, Guo C, Li H, Zhang C, Zheng F, et al. A parallel down-up fusion network for salient object detection in optical remote sensing images. Neurocomputing. 2020 Nov 20;415:411–20. 10.1016/j.neucom.2020.05.108

[33] Fadl S, Han Q, Li Q. CNN spatiotemporal features and fusion for surveillance video forgery detection. Signal Processing Image Commun. 2021 Jan;90:116066. 10.1016/j.image.2020.116066

[34] Zhu H, Wei H, Li B, Yuan X, Kehtarnavaz N. A review of video object detection: datasets, metrics and methods. Appl Sci. 2020 Jan;10(21):7834. 10.3390/app10217834

[35] Alhimale L, Zedan H, Al-Bayatti A. The implementation of an intelligent and video-based fall detection system using a neural network. Appl Soft Comput. 2014 May 1;18:59–69. 10.1016/j.asoc.2014.01.024

[36] Lee MH, Yoo HW, Jang DS. Video scene change detection using neural network: Improved ART2. Expert Syst Appl. 2006 Jul 1;31(1):13–25. 10.1016/j.eswa.2005.09.031

[37] Kousik N, Natarajan Y, Raja RA, Kallam S, Patan R, Gandomi AH. Improved salient object detection using hybrid convolution recurrent neural network. Expert Syst Appl. 2021;166:114064. 10.1016/j.eswa.2020.114064

[38] Xu G, Zhang Y, Zhang Q, Lin G, Wang Z, Jia Y, et al. Video smoke detection based on deep saliency network. Fire Saf J. 2019 Apr 1;105:277–85. 10.1016/j.firesaf.2019.03.004

[39] Yang A, Liu H, Chen Y, Zhang C, Yang K. Digital video intrusion intelligent detection method based on narrowband Internet of Things and its application. Image Vis Comput. 2020 May 1;97:103914. 10.1016/j.imavis.2020.103914

[40] Yamazaki Y, Kanaji S, Matsuda T, Oshikiri T, Nakamura T, Suzuki S, et al. Automated surgical instrument detection from laparoscopic gastrectomy video images using an open source convolutional neural network platform. J Am Coll Surg. 2020 May 1;230(5):725–32. 10.1016/j.jamcollsurg.2020.01.037

[41] Yue X, Li H, Shimizu M, Kawamura S, Meng L. YOLO-GD: a deep learning-based object detection algorithm for empty-dish recycling robots. Machines. 2022;10(5):294. 10.3390/machines10050294

[42] Hu GX, Yang Z, Hu L, Huang L, Han JM. Small object detection with multiscale features. Int J Digital Multimed Broadcasting. 2018 Sep 30;2018. 10.1155/2018/4546896

[43] Ren Y, Zhu C, Xiao S. Small object detection in optical remote sensing images via modified faster R-CNN. Appl Sci. 2018;8(5):813. 10.3390/app8050813

[44] Zhang X, Wang Z, Hu Q, Ren J, Sun M. Boundary-aware High-resolution Network with region enhancement for salient object detection. Neurocomputing. 2020 Dec 22;418:91–101. 10.1016/j.neucom.2020.08.038

[45] Liang F, Duan L, Ma W, Qiao Y, Miao J, Ye Q. Context-aware network for RGB-D salient object detection. Pattern Recognit. 2021;111:107630. 10.1016/j.patcog.2020.107630

[46] Kumar A, Srivastava S. Object detection system based on convolution neural networks using single shot multi-box detector. Procedia Comput Sci. 2020 Jan 1;171:2610–7. 10.1016/j.procs.2020.04.283

[47] Jiao L, Zhang S, Dong S, Wang H. RFP-Net: Receptive field-based proposal generation network for object detection. Neurocomputing. 2020 Sep 10;405:138–48. 10.1016/j.neucom.2020.04.106

[48] Liang Y, Qin G, Sun M, Yan J, Jiang H. MAFNet: Multi-style attention fusion network for salient object detection. Neurocomputing. 2021 Jan;422:22–33. 10.1016/j.neucom.2020.09.033

[49] Liu C, Li S, Chang F, Wang Y. Machine vision based traffic sign detection methods: Review, analyses and perspectives. IEEE Access. 2019 Jun 26;7:86578–96. 10.1109/ACCESS.2019.2924947

[50] Pollara A, Sutin A, Salloum H. Passive acoustic methods of small boat detection, tracking and classification. In 2017 IEEE International Symposium on Technologies for Homeland Security (HST). IEEE; 2017 Apr 25. p. 1–6. 10.1109/THS.2017.7943488

[51] Wang J, Jiang S, Song W, Yang Y. A comparative study of small object detection algorithms. In 2019 Chinese Control Conference (CCC). IEEE; 2019 Jul 27. p. 8507–12. 10.23919/ChiCC.2019.8865157

[52] Xue Z, Chen W, Li J. Enhancement and fusion of multi-scale feature maps for small object detection. In 2020 39th Chinese Control Conference (CCC). IEEE; 2020 Jul 27. p. 7212–7. 10.23919/CCC50068.2020.9189352

[53] Zhiqiang W, Jun L. A review of object detection based on convolutional neural network. In 2017 36th Chinese Control Conference (CCC). IEEE; 2017 Jul 26. p. 11104–9. 10.23919/ChiCC.2017.8029130

[54] Elakkiya R, Teja KS, Jegatha Deborah L, Bisogni C, Medaglia C. Imaging based cervical cancer diagnostics using small object detection-generative adversarial networks. Multimed Tools Appl. 2022;81:191–207. 10.1007/s11042-021-10627-3

[55] Ji SJ, Ling QH, Han F. An improved algorithm for small object detection based on YOLO v4 and multi-scale contextual information. Comput Electr Eng. 2023;105:108490. 10.1016/j.compeleceng.2022.108490

[56] Sun W, Dai L, Zhang X, Chang P, He X. RSOD: Real-time small object detection algorithm in UAV-based traffic monitoring. Appl Intell. 2021;1–16. 10.1007/s10489-021-02893-3

[57] Wu H, Chen Y, Wang N, Zhang Z. Sequence level semantics aggregation for video object detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Gangnam-gu, Seoul, Korea: 27 October–2 November 2019. 10.1109/ICCV.2019.00931

[58] Lu Y, Lu C, Tang C-K. Online video object detection using association LSTM. In Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: 22–29 October 2017. p. 2363–71. 10.1109/ICCV.2017.257

[59] Chen Y, Cao Y, Hu H, Wang L. Memory enhanced global-local aggregation for video object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: June 2020. p. 16–8. 10.1109/CVPR42600.2020.01035

[60] IEEE International Conference on Multimedia and Expo. Shanghai, China: 8–12 July 2019. p. 1750–5.

[61] Afchar D, Nozick V, Yamagishi J, Echizen I. Mesonet: a compact facial video forgery detection network. In 2018 IEEE International Workshop on Information Forensics and Security (WIFS). IEEE; 2018 Dec 11. p. 1–7. 10.1109/WIFS.2018.8630761

[62] Salvadori C, Petracca M, Madeo S, Bocchino S, Pagano P. Video streaming applications in wireless camera networks: A change detection based approach targeted to 6LoWPAN. J Syst Architecture. 2013 Nov 1;59(10):859–69. 10.1016/j.sysarc.2013.05.009

[63] Amosov OS, Amosova SG, Ivanov YS, Zhiganov SV. Using the ensemble of deep neural networks for normal and abnormal situations detection and recognition in the continuous video stream of the security system. Procedia Comput Sci. 2019 Jan 1;150:532–9. 10.1016/j.procs.2019.02.089

[64] El Kaid A, Baïna K, Baïna J. Reduce false positive alerts for elderly person fall video-detection algorithm by convolutional neural network model. Procedia Comput Sci. 2019 Jan 1;148:2–11. 10.1016/j.procs.2019.01.004

[65] Najva N, Bijoy KE. SIFT and tensor based object detection and classification in videos using deep neural networks. Procedia Comput Sci. 2016 Jan 1;93:351–8. 10.1016/j.procs.2016.07.220

[66] Yan H, Xu X. End-to-end video subtitle recognition via a deep residual neural network. Pattern Recognit Lett. 2020 Mar 1;131:368–75. 10.1016/j.patrec.2020.01.019

[67] Wu P, Liu J, Li M, Sun Y, Shen F. Fast sparse coding networks for anomaly detection in videos. Pattern Recognit. 2020 Nov 1;107:107515. 10.1016/j.patcog.2020.107515

[68] Fang Y, Zhang C, Min X, Huang H, Yi Y, Zhai G, et al. DevsNet: Deep video saliency network using short-term and long-term cues. Pattern Recognit. 2020 Jul 1;103:107294. 10.1016/j.patcog.2020.107294

[69] Wang F, Xu Z, Gan Y, Vong CM, Liu Q. SCNet: Scale-aware coupling-structure network for efficient video object detection. Neuro Comput. 2020 Sep 3;404:283–93. 10.1016/j.neucom.2020.03.110

[70] Zhu Y, Yan WQ. Traffic sign recognition based on deep learning. Multimed Tools Appl. 2022;81(13):17779–91. 10.1007/s11042-022-12163-0

[71] Hou S, Wang C, Quan W, Jiang J, Yan DM. Text-aware single image specular highlight removal. In Pattern Recognition and Computer Vision: 4th Chinese Conference, PRCV 2021, Beijing, China, October 29–November 1, 2021, Proceedings, Part IV 4. Springer International Publishing; 2021. p. 115–27. 10.1007/978-3-030-88013-2_10

[72] Temlioglu E, Erer I, Kumlu D. A least mean square approach to buried object detection in ground penetrating radar. In 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). IEEE; 2017 Jul 23. p. 4833–6. 10.1109/IGARSS.2017.8128084

[73] Wald A. Statistical decision functions. Wiley; 1950. 10.2307/2280105

[74] Cramér CH. On the mathematical theory of risk. Centraltryckeriet, Stockholm: Forsakringsaktiebolaget Skandias Festskrift; 1930.

[75] Hermans A, Beyer L, Leibe B. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737; 2017.

[76] Wen Y, Zhang K, Li Z, Qiao Y. A discriminative feature learning approach for deep face recognition. In: Leibe B, Matas J, Sebe N, Welling M, (eds.). ECCV 2016. LNCS. 9911, Cham: Springer; 2016. p. 499–515. 10.1007/978-3-319-46478-7_31

[77] Chaturvedi RP, Ghose U. Small object detection using retinanet with hybrid anchor box hyper tuning using interface of Bayesian mathematics. J Inf Optim Sci. 2022;43(8):2099–110. 10.1080/02522667.2022.2133217

[78] Fang Y, Cao Y, Zhang W, Yuan Q. Enhance feature representation of dual networks for attribute prediction. In International Conference on Neural Information Processing. Cham: Springer; 2019 Dec 12. p. 13–20. 10.1007/978-3-030-36808-1_2

[79] Fan C, Liu P. Federated generative adversarial learning. In Chinese Conference on Pattern Recognition and Computer Vision (PRCV). Cham: Springer; 2020 Oct 16. p. 3–15. 10.1007/978-3-030-60636-7_1

[80] Liu F, Fang P, Yao Z, Fan R, Pan Z, Sheng W, et al. Recovering 6D object pose from RGB indoor image based on two-stage detection network with multi-task loss. Neurocomputing. 2019 Apr 14;337:15–23. 10.1016/j.neucom.2018.12.061

[81] Sharma V, Mir RN. SSFNET-VOS: Semantic segmentation and fusion network for video object segmentation. Pattern Recognit Lett. 2020 Dec 1;140:49–58. 10.1016/j.patrec.2020.09.028

[82] Liu Z, Tang J, Zhao P. Salient object detection via hybrid upsampling and hybrid loss computing. Vis Computer. 2019;36(4):843–53. 10.1007/s00371-019-01659-w

[83] Steno P, Alsadoon A, Prasad PW, Al-Dala’in T, Alsadoon OH. A novel enhanced region proposal network and modified loss function: threat object detection in secure screening using deep learning. J Supercomputing. 2021 Apr;77(4):3840–69. 10.1007/s11227-020-03418-4

[84] Gu L, Fang Q, Wang Z, Popov E, Dong G. Learning lightweight and superior detectors with feature distillation for onboard remote sensing object detection. Remote Sens. 2023;15(2):370. 10.3390/rs15020370

[85] Dumitru D, Andreica A, Dioşan L, Bálint Z. Robustness analysis of transferable cellular automata rules optimized for edge detection. Procedia Comput Sci. 2020 Jan 1;176:713–22. 10.1016/j.procs.2020.09.044

[86] Sasikala J, Juliet DS. Optimized vessel detection in marine environment using hybrid adaptive cuckoo search algorithm. Comput Electr Eng. 2019 Sep 1;78:482–92. 10.1016/j.compeleceng.2019.08.009

[87] Jain L, Katarya R, Sachdeva S. Opinion leader detection using whale optimization algorithm in online social network. Expert Syst Appl. 2020 Mar 15;142:113016. 10.1016/j.eswa.2019.113016

[88] Rammurthy D, Mahesh PK. Whale Harris hawks Optimization based deep learning classifier for brain tumor detection using MRI images. J King Saud University-Comput Inf Sci. 2022;34(6):3259–72. 10.1016/j.jksuci.2020.08.006

[89] Zhang Y, Liu Y, Li J, Zhu J, Yang C, Yang W, et al. WOCDA: A whale optimization based community detection algorithm. Phys A: Stat Mech Appl. 2020 Feb 1;539:122937. 10.1016/j.physa.2019.122937

[90] Luo JQ, Fang HS, Shao FM, Zhong Y, Hua X. Multi-scale traffic vehicle detection based on faster R-CNN with NAS optimization and feature enrichment. Def Technol. 2022;17(4):1542–54. 10.1016/j.dt.2020.10.006

© 2023 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

Journal of Intelligent Systems

Physics-Informed Neural Network (PINN) Evolution and Beyond: A Systematic Literature Review and Bibliometric Analysis

1. Introduction

2. Background

2.1. Physics-Informed Neural Networks

2.2. Modeling and Computation

  • Data-driven solutions.
  • Data-driven discovery.
  • Data-Driven solutions of Partial Differential Equations
  • Data Discovery of Partial Differential Equations

3. Methodology

3.1. Quality Assessment

3.2. Qualitative Synthesis Used in the Literature Review

3.3. Quantitative Synthesis (Meta-Analysis)

4. Result of Bibliometric Analyses

4.1. Newly Proposed PINN Methods

4.1.1. Extended PINNs

4.1.2. Hybrid PINNs

4.1.3. Minimized Loss Techniques

5. Future Research Direction

6. Conclusions

Supplementary Materials

Author Contributions

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

  • Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019 , 378 , 686–707. [ Google Scholar ] [ CrossRef ]
  • Hu, Z.; Jagtap, A.D.; Karniadakis, G.E.; Kawaguchi, K. When Do Extended Physics-Informed Neural Networks (XPINNs) Improve Generalization? arXiv 2021 , arXiv:2109.09444. [ Google Scholar ] [ CrossRef ]
  • Shukla, K.; Jagtap, A.D.; Karniadakis, G.E. Parallel physics-informed neural networks via domain decomposition. J. Comput. Phys. 2021 , 447 , 110683. [ Google Scholar ] [ CrossRef ]
  • Ang, E.; Ng, B.F. Physics-Informed Neural Networks for Flow Around Airfoil. In AIAA SCITECH 2022 Forum ; American Institute of Aeronautics and Astronautics: Fairfax, VA, USA, 2021. [ Google Scholar ] [ CrossRef ]
  • Gnanasambandam, R.; Shen, B.; Chung, J.; Yue, X. Self-scalable Tanh (Stan): Faster Convergence and Better Generalization in Physics-informed Neural Networks. arXiv 2022 , arXiv:2204.12589. [ Google Scholar ]
  • Cai, S.; Wang, Z.; Wang, S.; Perdikaris, P.; Karniadakis, G.E. Physics-Informed Neural Networks for Heat Transfer Problems. J. Heat Transf. 2021 , 143 , 060801. [ Google Scholar ] [ CrossRef ]
  • Chiu, P.-H.; Wong, J.C.; Ooi, C.; Dao, M.H.; Ong, Y.-S. CAN-PINN: A fast physics-informed neural network based on coupled-automatic–numerical differentiation method. Comput. Methods Appl. Mech. Eng. 2022 , 395 , 114909. [ Google Scholar ] [ CrossRef ]
  • Liu, X.; Zhang, X.; Peng, W.; Zhou, W.; Yao, W. A novel meta-learning initialization method for physics-informed neural networks. arXiv 2022 , arXiv:2107.10991. [ Google Scholar ] [ CrossRef ]
  • Yang, S.; Chen, H.-C.; Wu, C.-H.; Wu, M.-N.; Yang, C.-H. Forecasting of the Prevalence of Dementia Using the LSTM Neural Network in Taiwan. Mathematics 2021 , 9 , 488. [ Google Scholar ] [ CrossRef ]
  • Huang, B.; Wang, J. Applications of Physics-Informed Neural Networks in Power Systems—A Review. IEEE Trans. Power Syst. 2022 , 1. [ Google Scholar ] [ CrossRef ]
  • Chen, W.; Wang, Q.; Hesthaven, J.S.; Zhang, C. Physics-informed machine learning for reduced-order modeling of nonlinear problems. J. Comput. Phys. 2021 , 446 , 110666. [ Google Scholar ] [ CrossRef ]
  • Chen, Z.; Liu, Y.; Sun, H. Physics-informed learning of governing equations from scarce data. Nat. Commun. 2021, 12, 6136.
  • Karakusak, M.Z.; Kivrak, H.; Ates, H.F.; Ozdemir, M.K. RSS-Based Wireless LAN Indoor Localization and Tracking Using Deep Architectures. Big Data Cogn. Comput. 2022, 6, 84.
  • De Ryck, T.; Jagtap, A.D.; Mishra, S. Error estimates for physics informed neural networks approximating the Navier-Stokes equations. arXiv 2022, arXiv:2203.09346.
  • Zhai, H.; Sands, T. Controlling Chaos in Van Der Pol Dynamics Using Signal-Encoded Deep Learning. Mathematics 2022, 10, 453.
  • Zhang, T.; Xu, H.; Guo, L.; Feng, X. A non-intrusive neural network model order reduction algorithm for parameterized parabolic PDEs. Comput. Math. Appl. 2022, 119, 59–67.
  • Ankita; Rani, S.; Singh, A.; Elkamchouchi, D.H.; Noya, I.D. Lightweight Hybrid Deep Learning Architecture and Model for Security in IIOT. Appl. Sci. 2022, 12, 6442.
  • Wight, C.L.; Zhao, J. Solving Allen-Cahn and Cahn-Hilliard Equations using the Adaptive Physics Informed Neural Networks. arXiv 2020, arXiv:2007.04542.
  • Rasht-Behesht, M.; Huber, C.; Shukla, K.; Karniadakis, G.E. Physics-Informed Neural Networks (PINNs) for Wave Propagation and Full Waveform Inversions. J. Geophys. Res. Solid Earth 2022, 127, e2021JB023120.
  • Nasiri, P.; Dargazany, R. Reduced-PINN: An Integration-Based Physics-Informed Neural Networks for Stiff ODEs. arXiv 2022, arXiv:2208.12045.
  • Schiassi, E.; De Florio, M.; D’Ambrosio, A.; Mortari, D.; Furfaro, R. Physics-Informed Neural Networks and Functional Interpolation for Data-Driven Parameters Discovery of Epidemiological Compartmental Models. Mathematics 2021, 9, 2069.
  • Zhang, Z.; Li, Y.; Zhou, W.; Chen, X.; Yao, W.; Zhao, Y. TONR: An exploration for a novel way combining neural network with topology optimization. Comput. Methods Appl. Mech. Eng. 2021, 386, 114083.
  • Wang, S.; Teng, Y.; Perdikaris, P. Understanding and mitigating gradient pathologies in physics-informed neural networks. arXiv 2020, arXiv:2001.04536.
  • Fujita, K. Physics-Informed Neural Network Method for Space Charge Effect in Particle Accelerators. IEEE Access 2021, 9, 164017–164025.
  • Yu, J.; de Antonio, A.; Villalba-Mora, E. Deep Learning (CNN, RNN) Applications for Smart Homes: A Systematic Review. Computers 2022, 11, 26.
  • Dwivedi, V.; Srinivasan, B. A Normal Equation-Based Extreme Learning Machine for Solving Linear Partial Differential Equations. J. Comput. Inf. Sci. Eng. 2021, 22, 014502.
  • Haghighat, E.; Amini, D.; Juanes, R. Physics-informed neural network simulation of multiphase poroelasticity using stress-split sequential training. Comput. Methods Appl. Mech. Eng. 2021, 397, 115141.
  • Berg, J.; Nyström, K. A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing 2018, 317, 28–41.
  • Mahesh, R.B.; Leandro, J.; Lin, Q. Physics informed neural network for spatial-temporal flood forecasting. In Climate Change and Water Security; Lecture Notes in Civil Engineering; Springer Nature Singapore Pte Ltd.: Singapore, 2022; Volume 178.
  • Ngo, P.; Tejedor, M.; Tayefi, M.; Chomutare, T.; Godtliebsen, F. Risk-Averse Food Recommendation Using Bayesian Feedforward Neural Networks for Patients with Type 1 Diabetes Doing Physical Activities. Appl. Sci. 2020, 10, 8037.
  • Henkes, A.; Wessels, H.; Mahnken, R. Physics informed neural networks for continuum micromechanics. Comput. Methods Appl. Mech. Eng. 2022, 393, 114790.
  • Patel, R.G.; Manickam, I.; Trask, N.A.; Wood, M.A.; Lee, M.; Tomas, I.; Cyr, E.C. Thermodynamically consistent physics-informed neural networks for hyperbolic systems. J. Comput. Phys. 2020, 449, 110754.
  • Fang, Z. A High-Efficient Hybrid Physics-Informed Neural Networks Based on Convolutional Neural Network. IEEE Trans. Neural Networks Learn. Syst. 2021, 33, 5514–5526.
  • Lawal, Z.K.; Yassin, H.; Zakari, R.Y. Flood Prediction Using Machine Learning Models: A Case Study of Kebbi State Nigeria. In Proceedings of the 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), Brisbane, Australia, 8–10 December 2021.
  • Jagtap, A.D.; Kharazmi, E.; Karniadakis, G.E. Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems. Comput. Methods Appl. Mech. Eng. 2020, 365, 113028.
  • Mao, Z.; Jagtap, A.D.; Karniadakis, G.E. Physics-informed neural networks for high-speed flows. Comput. Methods Appl. Mech. Eng. 2020, 360, 112789.
  • Bihlo, A.; Popovych, R.O. Physics-informed neural networks for the shallow-water equations on the sphere. J. Comput. Phys. 2022, 456, 111024.
  • Lagaris, I.; Likas, A.; Fotiadis, D. Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Networks 1998, 9, 987–1000.
  • Zjavka, L. Construction and adjustment of differential polynomial neural network. J. Eng. Comput. Innov. 2011, 2, 40–50.
  • Zjavka, L. Approximation of multi-parametric functions using the differential polynomial neural network. Math. Sci. 2013, 7, 33.
  • Zjavka, L.; Snasel, V. Composing and Solving General Differential Equations Using Extended Polynomial Networks. In Proceedings of the 2015 International Conference on Intelligent Networking and Collaborative Systems, IEEE INCoS 2015, Taipei, Taiwan, 2–4 September 2015; pp. 110–115.
  • Raissi, M.; Karniadakis, G.E. Hidden physics models: Machine learning of nonlinear partial differential equations. J. Comput. Phys. 2018, 357, 125–141.
  • Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations. arXiv 2017, arXiv:1711.10561.
  • Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Multistep Neural Networks for Data-driven Discovery of Nonlinear Dynamical Systems. arXiv 2018, arXiv:1801.01236.
  • Raissi, M. Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations. 2018. Available online: http://jmlr.org/papers/v19/18-046.html (accessed on 3 June 2022).
  • Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations. arXiv 2017, arXiv:1703.10230.
  • Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-informed machine learning. Nat. Rev. Phys. 2021, 3, 422–440.
  • Lazovskaya, T.; Malykhina, G.; Tarkhov, D. Physics-Based Neural Network Methods for Solving Parameterized Singular Perturbation Problem. Computation 2021, 9, 97.
  • Bati, G.F.; Singh, V.K. Nadal: A neighbor-aware deep learning approach for inferring interpersonal trust using smartphone data. Computers 2021, 10, 3.
  • Klyuchinskiy, D.; Novikov, N.; Shishlenin, M. A Modification of Gradient Descent Method for Solving Coefficient Inverse Problem for Acoustics Equations. Computation 2020, 8, 73.
  • Li, J.; Zheng, L. DEEPWAVE: Deep Learning based Real-time Water Wave Simulation. Available online: https://jinningli.cn/cv/DeepWavePaper.pdf (accessed on 25 May 2022).
  • Nascimento, R.G.; Fricke, K.; Viana, F.A. A tutorial on solving ordinary differential equations using Python and hybrid physics-informed neural network. Eng. Appl. Artif. Intell. 2020, 96, 103996.
  • Cheng, Y.; Huang, Y.; Pang, B.; Zhang, W. ThermalNet: A deep reinforcement learning-based combustion optimization system for coal-fired boiler. Eng. Appl. Artif. Intell. 2018, 74, 303–311.
  • D’Ambrosio, A.; Schiassi, E.; Curti, F.; Furfaro, R. Pontryagin Neural Networks with Functional Interpolation for Optimal Intercept Problems. Mathematics 2021, 9, 996.
  • Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Inferring solutions of differential equations using noisy multi-fidelity data. J. Comput. Phys. 2017, 335, 736–746.
  • Lawal, Z.K.; Yassin, H.; Zakari, R.Y. Stock Market Prediction using Supervised Machine Learning Techniques: An Overview. In Proceedings of the 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), Gold Coast, Australia, 16–18 December 2020; pp. 1–6.
  • Deng, R.; Duzhin, F. Topological Data Analysis Helps to Improve Accuracy of Deep Learning Models for Fake News Detection Trained on Very Small Training Sets. Big Data Cogn. Comput. 2022, 6, 74.
  • Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366.
  • Dong, S.; Li, Z. Local extreme learning machines and domain decomposition for solving linear and nonlinear partial differential equations. Comput. Methods Appl. Mech. Eng. 2021, 387, 114129.
  • Alavizadeh, H.; Alavizadeh, H.; Jang-Jaccard, J. Deep Q-Learning Based Reinforcement Learning Approach for Network Intrusion Detection. Computers 2022, 11, 41.
  • Arzani, A.; Dawson, S.T.M. Data-driven cardiovascular flow modelling: Examples and opportunities. J. R. Soc. Interface 2020, 18, 20200802.
  • Berrone, S.; Della Santa, F.; Mastropietro, A.; Pieraccini, S.; Vaccarino, F. Graph-Informed Neural Networks for Regressions on Graph-Structured Data. Mathematics 2022, 10, 786.
  • Gutiérrez-Muñoz, M.; Coto-Jiménez, M. An Experimental Study on Speech Enhancement Based on a Combination of Wavelets and Deep Learning. Computation 2022, 10, 102.
  • Mousavi, S.M.; Ghasemi, M.; Dehghan Manshadi, M.; Mosavi, A. Deep Learning for Wave Energy Converter Modeling Using Long Short-Term Memory. Mathematics 2021, 9, 871.
  • Viana, F.A.; Nascimento, R.G.; Dourado, A.; Yucesan, Y.A. Estimating model inadequacy in ordinary differential equations with physics-informed neural networks. Comput. Struct. 2021, 245, 106458.
  • Li, W.; Bazant, M.Z.; Zhu, J. A physics-guided neural network framework for elastic plates: Comparison of governing equations-based and energy-based approaches. Comput. Methods Appl. Mech. Eng. 2021, 383, 113933.
  • Reyes, B.; Howard, A.A.; Perdikaris, P.; Tartakovsky, A.M. Learning unknown physics of non-Newtonian fluids. Phys. Rev. Fluids 2020, 6, 073301.
  • Zhu, J.-A.; Jia, Y.; Lei, J.; Liu, Z. Deep Learning Approach to Mechanical Property Prediction of Single-Network Hydrogel. Mathematics 2021, 9, 2804.
  • Rodrigues, P.J.; Gomes, W.; Pinto, M.A. DeepWings©: Automatic Wing Geometric Morphometrics Classification of Honey Bee (Apis mellifera) Subspecies Using Deep Learning for Detecting Landmarks. Big Data Cogn. Comput. 2022, 6, 70.
  • Ji, W.; Qiu, W.; Shi, Z.; Pan, S.; Deng, S. Stiff-PINN: Physics-Informed Neural Network for Stiff Chemical Kinetics. J. Phys. Chem. A 2021, 125, 8098–8106.
  • James, S.C.; Zhang, Y.; O’Donncha, F. A machine learning framework to forecast wave conditions. Coast. Eng. 2018, 137, 1–10.
  • Hu, X.; Buris, N.E. A Deep Learning Framework for Solving Rectangular Waveguide Problems. In Proceedings of the Asia-Pacific Microwave Conference Proceedings, APMC, Hong Kong, 8–11 December 2020; pp. 409–411.
  • Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; The MIT Press: Cambridge, MA, USA, 2016.
  • Lim, S.; Shin, J. Application of a Deep Neural Network to Phase Retrieval in Inverse Medium Scattering Problems. Computation 2021, 9, 56.
  • Wang, D.-L.; Sun, Q.-Y.; Li, Y.-Y.; Liu, X.-R. Optimal Energy Routing Design in Energy Internet with Multiple Energy Routing Centers Using Artificial Neural Network-Based Reinforcement Learning Method. Appl. Sci. 2019, 9, 520.
  • Su, B.; Xu, C.; Li, J. A Deep Neural Network Approach to Solving for Seal’s Type Partial Integro-Differential Equation. Mathematics 2022, 10, 1504.
  • Seo, J.-K. A pretraining domain decomposition method using artificial neural networks to solve elliptic PDE boundary value problems. Sci. Rep. 2022, 12, 13939.
  • Mishra, S.; Molinaro, R. Estimates on the generalization error of Physics Informed Neural Networks (PINNs) for approximating a class of inverse problems for PDEs. arXiv 2020, arXiv:2007.01138.
  • Li, Y.; Wang, J.; Huang, Z.; Gao, R.X. Physics-informed meta learning for machining tool wear prediction. J. Manuf. Syst. 2022, 62, 17–27.
  • Arzani, A.; Wang, J.-X.; D’Souza, R.M. Uncovering near-wall blood flow from sparse data with physics-informed neural networks. Phys. Fluids 2021, 33, 071905.
  • Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; Volume 9, pp. 249–256. Available online: https://proceedings.mlr.press/v9/glorot10a.html (accessed on 17 June 2022).
  • Doan, N.; Polifke, W.; Magri, L. Physics-informed echo state networks. J. Comput. Sci. 2020, 47, 101237.
  • Falas, S.; Konstantinou, C.; Michael, M.K. Special Session: Physics-Informed Neural Networks for Securing Water Distribution Systems. In Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors, Hartford, CT, USA, 18–21 October 2020; pp. 37–40.
  • Filgöz, A.; Demirezen, G.; Demirezen, M.U. Applying Novel Adaptive Activation Function Theory for Launch Acceptability Region Estimation with Neural Networks in Constrained Hardware Environments: Performance Comparison. In Proceedings of the 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC), San Antonio, TX, USA, 3–7 October 2021; pp. 1–10.
  • Fülöp, A.; Horváth, A. End-to-End Training of Deep Neural Networks in the Fourier Domain. Mathematics 2022, 10, 2132.
  • Fang, Z.; Zhan, J. A Physics-Informed Neural Network Framework for PDEs on 3D Surfaces: Time Independent Problems. IEEE Access 2020, 8, 26328–26335.
  • Markidis, S. The Old and the New: Can Physics-Informed Deep-Learning Replace Traditional Linear Solvers? arXiv 2021, arXiv:2103.09655.
  • Baydin, A.G.; Pearlmutter, B.A.; Radul, A.A.; Siskind, J.M. Automatic differentiation in machine learning: A survey. arXiv 2015, arXiv:1502.05767.
  • Niaki, S.A.; Haghighat, E.; Campbell, T.; Poursartip, A.; Vaziri, R. Physics-informed neural network for modelling the thermochemical curing process of composite-tool systems during manufacture. Comput. Methods Appl. Mech. Eng. 2021, 384, 113959.
  • Li, Y.; Xu, L.; Ying, S. DWNN: Deep Wavelet Neural Network for Solving Partial Differential Equations. Mathematics 2022, 10, 1976.
  • De Wolff, T.; Carrillo, H.; Martí, L.; Sanchez-Pi, N. Assessing Physics Informed Neural Networks in Ocean Modelling and Climate Change Applications. In Proceedings of the AI: Modeling Oceans and Climate Change Workshop at ICLR 2021, Santiago, Chile, 7 May 2021; Available online: https://hal.inria.fr/hal-03262684 (accessed on 17 June 2022).
  • Rao, C.; Sun, H.; Liu, Y. Physics informed deep learning for computational elastodynamics without labeled data. arXiv 2020, arXiv:2006.08472.
  • Liu, X.; Almekkawy, M. Ultrasound Computed Tomography using physical-informed Neural Network. In Proceedings of the 2021 IEEE International Ultrasonics Symposium (IUS), Xi’an, China, 11–16 September 2021; pp. 1–4.
  • Vitanov, N.K.; Dimitrova, Z.I.; Vitanov, K.N. On the Use of Composite Functions in the Simple Equations Method to Obtain Exact Solutions of Nonlinear Differential Equations. Computation 2021, 9, 104.
  • Guo, Y.; Cao, X.; Liu, B.; Gao, M. Solving Partial Differential Equations Using Deep Learning and Physical Constraints. Appl. Sci. 2020, 10, 5917.
  • Li, J.; Tartakovsky, A.M. Physics-informed Karhunen-Loéve and neural network approximations for solving inverse differential equation problems. J. Comput. Phys. 2022, 462, 111230.
  • Qureshi, M.; Khan, N.; Qayyum, S.; Malik, S.; Sanil, H.; Ramayah, T. Classifications of Sustainable Manufacturing Practices in ASEAN Region: A Systematic Review and Bibliometric Analysis of the Past Decade of Research. Sustainability 2020, 12, 8950.
  • Keathley-Herring, H.; Van Aken, E.; Gonzalez-Aleu, F.; Deschamps, F.; Letens, G.; Orlandini, P.C. Assessing the maturity of a research area: Bibliometric review and proposed framework. Scientometrics 2016, 109, 927–951.
  • Zaccaria, V.; Rahman, M.; Aslanidou, I.; Kyprianidis, K. A Review of Information Fusion Methods for Gas Turbine Diagnostics. Sustainability 2019, 11, 6202.
  • Shu, F.; Julien, C.-A.; Zhang, L.; Qiu, J.; Zhang, J.; Larivière, V. Comparing journal and paper level classifications of science. J. Inf. 2019, 13, 202–225.
  • Leiva, M.A.; García, A.J.; Shakarian, P.; Simari, G.I. Argumentation-Based Query Answering under Uncertainty with Application to Cybersecurity. Big Data Cogn. Comput. 2022, 6, 91.
  • Yang, L.; Meng, X.; Karniadakis, G.E. B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data. J. Comput. Phys. 2021, 425, 109913.
  • Goswami, S.; Anitescu, C.; Rabczuk, T. Adaptive fourth-order phase field analysis using deep energy minimization. Theor. Appl. Fract. Mech. 2020, 107, 102527.
  • Costabal, F.S.; Yang, Y.; Perdikaris, P.; Hurtado, D.E.; Kuhl, E. Physics-Informed Neural Networks for Cardiac Activation Mapping. Front. Phys. 2020, 8, 42.
  • Jagtap, A.D.; Mao, Z.; Adams, N.; Karniadakis, G.E. Physics-informed neural networks for inverse problems in supersonic flows. arXiv 2022, arXiv:2202.11821.
  • Meng, X.; Li, Z.; Zhang, D.; Karniadakis, G.E. PPINN: Parareal physics-informed neural network for time-dependent PDEs. Comput. Methods Appl. Mech. Eng. 2020, 370, 113250.
  • Haghighat, E.; Raissi, M.; Moure, A.; Gomez, H.; Juanes, R. A physics-informed deep learning framework for inversion and surrogate modeling in solid mechanics. Comput. Methods Appl. Mech. Eng. 2021, 379, 113741.
  • Kharazmi, E.; Zhang, Z.; Karniadakis, G.E. hp-VPINNs: Variational physics-informed neural networks with domain decomposition. Comput. Methods Appl. Mech. Eng. 2021, 374, 113547.
  • Fang, Y.; Wu, G.Z.; Wang, Y.Y.; Dai, C.Q. Data-driven femtosecond optical soliton excitations and parameters discovery of the high-order NLSE using the PINN. Nonlinear Dyn. 2021, 105, 603–616.
  • Dourado, A.; Viana, F.A.C. Physics-Informed Neural Networks for Missing Physics Estimation in Cumulative Damage Models: A Case Study in Corrosion Fatigue. J. Comput. Inf. Sci. Eng. 2020, 20, 061007.
  • Shin, Y.; Darbon, J.; Karniadakis, G.E. On the convergence of physics informed neural networks for linear second-order elliptic and parabolic type PDEs. arXiv 2020, arXiv:2004.01806.
  • Zobeiry, N.; Humfeld, K.D. A physics-informed machine learning approach for solving heat transfer equation in advanced manufacturing and engineering applications. Eng. Appl. Artif. Intell. 2021, 101, 104232.
  • Mehta, P.P.; Pang, G.; Song, F.; Karniadakis, G.E. Discovering a universal variable-order fractional model for turbulent Couette flow using a physics-informed neural network. Fract. Calc. Appl. Anal. 2019, 22, 1675–1688.
  • Liu, M.; Liang, L.; Sun, W. A generic physics-informed neural network-based constitutive model for soft biological tissues. Comput. Methods Appl. Mech. Eng. 2020, 372, 113402.
  • Pu, J.; Li, J.; Chen, Y. Solving localized wave solutions of the derivative nonlinear Schrodinger equation using an improved PINN method. arXiv 2021, arXiv:2101.08593.
  • Meng, X.; Babaee, H.; Karniadakis, G.E. Multi-fidelity Bayesian neural networks: Algorithms and applications. J. Comput. Phys. 2021, 438, 110361.
  • Jagtap, A.D.; Karniadakis, G.E. Extended physics-informed neural networks (XPINNs): A generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. Commun. Comput. Phys. 2020, 28, 2002–2041.
  • Pang, G.; D’Elia, M.; Parks, M.; Karniadakis, G. nPINNs: Nonlocal physics-informed neural networks for a parametrized nonlocal universal Laplacian operator. Algorithms and applications. J. Comput. Phys. 2020, 422, 109760.
  • Rafiq, M.; Rafiq, G.; Choi, G.S. DSFA-PINN: Deep Spectral Feature Aggregation Physics Informed Neural Network. IEEE Access 2022, 10, 22247–22259.
  • Raynaud, G.; Houde, S.; Gosselin, F.P. ModalPINN: An extension of physics-informed Neural Networks with enforced truncated Fourier decomposition for periodic flow reconstruction using a limited number of imperfect sensors. J. Comput. Phys. 2022, 464, 111271.
  • Haitsiukevich, K.; Ilin, A. Improved Training of Physics-Informed Neural Networks with Model Ensembles. arXiv 2022, arXiv:2204.05108.
  • Lahariya, M.; Karami, F.; Develder, C.; Crevecoeur, G. Physics-informed Recurrent Neural Networks for The Identification of a Generic Energy Buffer System. In Proceedings of the 2021 IEEE 10th Data Driven Control and Learning Systems Conference (DDCLS), Suzhou, China, 14–16 May 2021; pp. 1044–1049.
  • Zhang, X.; Zhu, Y.; Wang, J.; Ju, L.; Qian, Y.; Ye, M.; Yang, J. GW-PINN: A deep learning algorithm for solving groundwater flow equations. Adv. Water Resour. 2022, 165, 104243.
  • Yang, M.; Foster, J.T. Multi-output physics-informed neural networks for forward and inverse PDE problems with uncertainties. Comput. Methods Appl. Mech. Eng. 2022, 115041.
  • Psaros, A.F.; Kawaguchi, K.; Karniadakis, G.E. Meta-learning PINN loss functions. J. Comput. Phys. 2022, 458, 111121.
  • Habib, A.; Yildirim, U. Developing a physics-informed and physics-penalized neural network model for preliminary design of multi-stage friction pendulum bearings. Eng. Appl. Artif. Intell. 2022, 113, 104953.
  • Xiang, Z.; Peng, W.; Liu, X.; Yao, W. Self-adaptive loss balanced Physics-informed neural networks. Neurocomputing 2022, 496, 11–34.
  • Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.

| Group | Population | Percentage (%) |
|---|---|---|
| Journal by Specialization | | |
| 1. Computer Science | 29 | 24.167 |
| 2. Engineering | 31 | 25.833 |
| 3. Mathematics | 35 | 29.166 |
| 4. Physics | 25 | 20.833 |
| Journal by Type | | |
| 1. Conference Article | 21 | 17.500 |
| 2. Journal Article | 99 | 82.500 |
| Journal by Methods | | |
| 1. Conventional PINNs | 97 | 80.833 |
| 2. Extended PINNs | 12 | 10.000 |
| 3. Hybrid PINNs | 7 | 5.833 |
| 4. Minimized Loss PINNs | 4 | 3.333 |

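
The percentages in the table above can be recomputed from the population counts. The following is a minimal sketch, assuming each percentage is the group's count divided by the total for its category; the total of 120 studies is not stated in the table and is inferred here from the column sums. The names `groups` and `percentages` are illustrative only.

```python
# Counts copied from the table above. The per-category total (120) is an
# inference from the column sums, not a figure stated in the table.
groups = {
    "Journal by Specialization": {
        "Computer Science": 29,
        "Engineering": 31,
        "Mathematics": 35,
        "Physics": 25,
    },
    "Journal by Type": {
        "Conference Article": 21,
        "Journal Article": 99,
    },
    "Journal by Methods": {
        "Conventional PINNs": 97,
        "Extended PINNs": 12,
        "Hybrid PINNs": 7,
        "Minimized Loss PINNs": 4,
    },
}


def percentages(counts):
    """Return each count as a percentage of the category total, to 3 decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 3) for name, n in counts.items()}


for group, counts in groups.items():
    print(f"{group} (total = {sum(counts.values())})")
    for name, pct in percentages(counts).items():
        print(f"  {name}: {pct:.3f}")
```

Under this assumption every category sums to 120 and the recomputed values agree with the table, except that 35/120 rounds to 29.167 where the table lists 29.166, which suggests that entry was truncated rather than rounded.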

| Authors | Source Title | Number of Citations |
|---|---|---|
| Raissi et al. [ ] | Journal of Computational Physics | 3442 |
| Costabal et al. [ ] | Frontiers in Physics | 122 |
| Jagtap A.D. et al. [ ] | Communications in Computational Physics | 118 |
| Meng, Xuhui et al. [ ] | Computer Methods in Applied Mechanics and Engineering | 143 |
| Yang, Liu et al. [ ] | Journal of Computational Physics | 183 |
| Haghighat E. et al. [ ] | Computer Methods in Applied Mechanics and Engineering | 161 |
| Kharazmi E. et al. [ ] | Computer Methods in Applied Mechanics and Engineering | 111 |
| Fang, Yin et al. [ ] | Nonlinear Dynamics | 29 |
| Dourado A. et al. [ ] | Journal of Computing and Information Science in Engineering | 37 |
| Shin Y. et al. [ ] | Communications in Computational Physics | 137 |
| Zobeiry N. et al. [ ] | Engineering Applications of Artificial Intelligence | 52 |
| Goswami et al. [ ] | Theoretical and Applied Fracture Mechanics | 177 |
| Mehta, Pavan et al. [ ] | Fractional Calculus and Applied Analysis | 25 |
| Wight, Colby et al. [ ] | Communications in Computational Physics | 54 |
| Liu, Minliang et al. [ ] | Computer Methods in Applied Mechanics and Engineering | 26 |
| Doan N.A.K. et al. [ ] | Journal of Computational Science | 28 |
| Rao, Chengping et al. [ ] | Journal of Engineering Mechanics | 57 |
| Pu, Juncai et al. [ ] | Nonlinear Dynamics | 21 |
| Meng, Xuhui et al. [ ] | Journal of Computational Physics | 31 |
| Li W. et al. [ ] | Computer Methods in Applied Mechanics and Engineering | 21 |


| Author | Objective(s) | Technique | Limitation(s) |
|---|---|---|---|
| Jagtap A.D. et al. [ ] | The main goal of this study was to develop a unique conservative physics-informed neural network (cPINN) for solving complicated problems. | Conservative physics-informed neural network (cPINN) | Despite the parallelization of the cPINN, it cannot be used for parallel computation. |
| Jagtap A.D. et al. [ ] | The main objective of this study was to introduce an XPINN model that improved the generalization capabilities of PINNs. | Extended physics-informed neural networks (XPINNs) | XPINNs enhance generalization in exceptional conditions. Decomposition results in less training data, which makes the model more likely to overfit and lose generalizability. |
| De Ryck et al. [ ] | The main goal of this study was to precisely constrain the errors arising from the use of XPINNs to approximate incompressible Navier–Stokes equations. | PINN error estimates | The authors’ estimates in their experiment gave no indication of training errors. |
| G. Pang et al. [ ] | This study aimed to extend PINNs to the inference of parameters and functions for integral equations, such as nonlocal Poisson and nonlocal turbulence models (nPINNs). A wide range of datasets must be adaptable to fit the nPINNs. | Nonlocal physics-informed neural networks (nPINNs) | nPINNs require more residual points. Increasing the number of discretization points, on the other hand, makes optimization more challenging and ineffective, and causes error stagnation. |
| Liu Yang et al. [ ] | The aim of this study was to introduce a novel method for solving both forward and inverse nonlinear problems described by PDEs with noisy data, which aimed to be more accurate and much faster than a simple PINN. | Bayesian physics-informed neural networks (B-PINNs) | The proposed B-PINNs in this work were only tested in scenarios where data size was up to several hundreds, and no tests were performed with large datasets. |
| Ehsan Kharazmi et al. [ ] | The purpose of this research was to bring together current developments in deep learning techniques for PDEs based on residuals of least-squares equations using a newly developed method. | Variational physics-informed neural networks (hp-VPINNs) | Although VPINN performance on inverse problems is encouraging, no comparison was made to classical approaches. |
| Juncai Pu et al. [ ] | The goal of the study was to provide an improved PINN approach for localized wave solutions of the derivative nonlinear Schrödinger equation in complex space with faster convergence and optimum simulation performance. | Improved PINN method | Complex integrable equations were not really considered in this study. |
| Enrico Schiassi et al. [ ] | The main objective of this study was to propose a novel model for providing solutions to problems with parametric differential equations (DEs) that is more accurate and robust. | Physics-informed neural network theory of functional connections (PINN-TFC) | The proposed technique cannot be applied to data-driven discovery of problems when solving ODEs using both a deterministic and probabilistic approach. |
| Rafiq et al. [ ] | The main goal of this experiment was to propose a unique deep Fourier neural network that expands information using spectral feature combination and a Fourier neural operator as the principal component. | Deep spectral feature aggregation physics-informed neural network (DSFA-PINN) | Other mathematical functions, such as the Laplace transform coupled with a Fourier transform, as well as the conventional CNN, cannot be used to generalize models using this method. |
| Gaétan Raynaud et al. [ ] | The major objective of this experiment was to design a robust model architecture for reconstructing periodic flows with a small number of imperfect sensors by extending PINNs with enforced truncated Fourier decomposition. | Modal physics-informed neural networks (ModalPINNs) | The application of ModalPINNs is restricted to fluid mechanics only. |
| Colby Wight et al. [ ] | The primary objective of this study was to present an Extended PINN method which is more effective and accurate in solving larger PDE problems. | Adaptive physics informed neural networks | This study focused primarily on the problem of solving differential equations. |
| Katsiaryna Haitsiukevich et al. [ ] | The objective of this experiment was to determine an acceptable time window for expanding the solution interval using an ensemble of PINNs. | PINNs with ensemble models | The ensemble algorithm seems to be more computationally intensive than the standard PINN and is not applicable to complex systems. |


| Author | Objective(s) | Technique | Limitation(s) |
|---|---|---|---|
| Meng et al. [ ] | The main goal of this research was to introduce a new hybrid technique that can exploit the high-level computational efficacy of training a neural network with small datasets to significantly speed up the time taken to find solutions to challenging physical problems. | Parareal physics-informed neural network (PPINN) | Domain decomposition of fundamental problems with huge spatial databases cannot be solved with PPINNs. |
| Zhiwei Fang et al. [ ] | This paper aimed to present a Hybrid PINN for PDEs and a differential operator approximation for solving the PDEs using a convolutional neural network (CNN). | Hybrid physics-informed neural network (Hybrid PINN) | This Hybrid PINN is not applicable to nonlinear operators. |
| Lahariya M. [ ] | The goal of this research was to propose a physics-informed neural network based on grey-box modeling methods for identifying energy buffers using a recurrent neural network. | Physics-informed recurrent neural networks | The proposed model was not validated with real-world industrial processes. |
| Wenqian Chen et al. [ ] | The main goal of this research was to develop a reduced-order model that uses high-accuracy snapshots to generate reduced basis information from the accurate network while reducing the weighted sum of residual losses from the reduced-order equation. | Physics-reinforced neural network (PRNN) | The reduced basis set must be small to outperform the Proper Orthogonal Decomposition–Galerkin (POD–G) method in terms of accuracy, as the numerical results of the experiment showed. |
| Xiaoping Zhang [ ] | The main objective of this study was to develop a novel method for solving groundwater flow equations using deep learning techniques. | Ground Water-PINN (GW-PINN) | The proposed model cannot be used to predict groundwater flow in more complex and larger areas. |
| Dourado et al. [ ] | The major goal of this experiment was to develop a hybrid technique for missing physics estimates in cumulative damage models by combining data-driven and physics-informed layers in deep neural networks. | PINNs for missing physics | Even if the proposed additional levels are used to initialize the neural network, suboptimal setting of these parameters may lead to the failure of the training. |
| Mingyuan Yang [ ] | The goal of this experiment was to develop a new hybrid model for uncertain forward and inverse PDE problems. | Multi-Output physics-informed neural network (MO-PINN) | The proposed method cannot be used to solve problems involving multi-fidelity data. |

Author | Objective(s) | Technique | Limitation(s)
Apostolos F. et al. [ ] | The main goal of this study was to provide a gradient-based meta-learning method for offline discovery that uses data from task distributions created using parameterized PDEs with numerous benchmarks to meta-learn PINN loss functions. | Meta-learning PINN loss functions | Optimizing the performance of methods such as RMSProp and Adam for handling inner optimizers with memory was not considered in this experiment.
Liu X. et al. [ ] | The major objective of this experiment was to use multiple sample tasks from parameterized PDEs and modify the loss penalty term to introduce a novel method that depends on labeled data. | New reptile initialization-based physics-informed neural network (NRPINN) | NRPINNs cannot be used to solve problems in the absence of prior knowledge.
Habib et al. [ ] | The main goal of this experiment was to develop a model that expresses physical constraints and integrates the regulating physical laws into its loss function (physics-informed), which the model penalizes when they are violated (physics-penalized). | Physics-informed and physics-penalized neural network model (PI-PP-NN) | The proposed model can only be used to create friction pendulum bearings. For any other isolation system, the theoretical basis must be adapted accordingly before it can be used for design.
Zixue Xiang [ ] | The main goal of this experiment was to develop a technique that allows PINNs to perfectly and efficiently learn PDEs using Gaussian probabilistic models. | Loss-balanced physics-informed neural networks (lbPINNs) | In this experiment, the adaptive weight of PDE loss gradually decreased. Therefore, a theoretical investigation of this paradigm is necessary to increase the robustness and scalability of the technique.
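
Several of the techniques listed above share one idea: the loss combines a data-fit term with a penalty on violations of the governing equation. The following is a minimal sketch of that idea only and is not taken from any of the cited papers. The toy equation du/dt = -u, the function name `pinn_style_loss`, the `weight` parameter, and the finite-difference derivative (standing in for the automatic differentiation a real PINN would typically use) are all illustrative assumptions.

```python
import math

def pinn_style_loss(u, t_data, u_data, t_colloc, weight=1.0, h=1e-4):
    """Data-fit MSE plus a weighted penalty on the residual of du/dt + u = 0.

    `u` is any callable candidate solution (hypothetical stand-in for a network).
    """
    # Data term: mean squared error against the observed values.
    data_term = sum((u(t) - y) ** 2 for t, y in zip(t_data, u_data)) / len(t_data)

    # Physics term: squared residual of the toy ODE at collocation points,
    # with du/dt approximated by a central finite difference (an assumption).
    def residual(t):
        dudt = (u(t + h) - u(t - h)) / (2.0 * h)
        return dudt + u(t)

    physics_term = sum(residual(t) ** 2 for t in t_colloc) / len(t_colloc)
    return data_term + weight * physics_term

# Toy observations generated from the exact solution u(t) = exp(-t).
t_data = [0.0, 0.5, 1.0]
u_data = [math.exp(-t) for t in t_data]
t_colloc = [0.25, 0.75]

loss_exact = pinn_style_loss(lambda t: math.exp(-t), t_data, u_data, t_colloc)
loss_wrong = pinn_style_loss(lambda t: 1.0 - t, t_data, u_data, t_colloc)
print(f"exact solution: {loss_exact:.3e}, linear guess: {loss_wrong:.3e}")
```

A candidate that satisfies both the data and the equation drives both terms toward zero, while the linear guess is penalized by both. How `weight` is chosen or adapted is exactly where approaches such as loss balancing differ, and this sketch does not attempt to reproduce any of them.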

Lawal, Z.K.; Yassin, H.; Lai, D.T.C.; Che Idris, A. Physics-Informed Neural Network (PINN) Evolution and Beyond: A Systematic Literature Review and Bibliometric Analysis. Big Data Cogn. Comput. 2022, 6, 140. https://doi.org/10.3390/bdcc6040140

Comparative Study of Loss Functions for Imbalanced Dataset of Online Reviews

  • Conference paper
  • First Online: 16 May 2023

  • Parth Vyas,
  • Manish Sharma,
  • Akhtar Rasool (ORCID: orcid.org/0000-0001-9964-2414) &
  • Aditya Dubey (ORCID: orcid.org/0000-0002-4885-0632)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 998)

Included in the following conference series:

  • International Conference on Machine Intelligence and Signal Processing

Loss functions play a critical role in evaluating the performance of a model trained on specific parameters of a dataset. In simpler terms, a loss function penalizes bad predictions so that the model's predictions move closer to the true values. For an imbalanced dataset, however, the loss function must be modified so that the weight of the loss incurred by misclassified examples stays unchanged, while the weight of the loss contributed by correctly classified examples is reduced. Several loss functions have been proposed to address the problems caused by imbalanced datasets, one of them being the focal loss. This paper compares the focal loss with the standard cross-entropy loss, which is widely used in evaluation. The comparison between the two loss functions helps determine how much difference a change of loss function makes to model performance.
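
The reweighting described above can be sketched for a single example as follows. This is a minimal illustration, not the chapter's experimental code: it uses the standard focal loss form, FL = -alpha * (1 - p)^gamma * log(p), where p is the probability the model assigns to the true class. The function names and the default values gamma = 2 and alpha = 1 are assumptions chosen for illustration.

```python
import math

def cross_entropy(p_true: float) -> float:
    """Cross-entropy loss for one example, given the probability of its true class."""
    return -math.log(p_true)

def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss: cross-entropy scaled by alpha * (1 - p_true) ** gamma."""
    return alpha * (1.0 - p_true) ** gamma * cross_entropy(p_true)

# Compare the two losses for a confident-correct, an uncertain,
# and a badly misclassified example.
for p in (0.9, 0.5, 0.1):
    ce = cross_entropy(p)
    fl = focal_loss(p)
    print(f"p_true={p:.1f}  CE={ce:.4f}  focal={fl:.4f}  focal/CE={fl / ce:.2f}")
```

With gamma = 2, the scaling factor is (1 - p)^2, so the loss of a well-classified example (p = 0.9) is cut to about 1% of its cross-entropy value, while a badly misclassified example (p = 0.1) keeps about 81% of it. Setting gamma = 0 recovers plain cross-entropy.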

Malavolta I, Ruberto S, Soru T, Terragni V (2015) Hybrid mobile apps in the google play store: an exploratory investigation. In: 2nd ACM international conference on mobile software engineering and systems, pp. 56–59

Google Scholar  

Viennot N, Garcia E, Nieh J (2014) A measurement study of google play. ACM SIGMETRICS Perform Eval Rev 42(1), 221–233

McIlroy S, Shang W, Ali N, Hassan AE (2017) Is it worth responding to reviews? Studying the top free apps in Google Play. IEEE Softw 34(3):64–71

Article   Google Scholar  

Shashank S, Naidu B (2020) Google play store apps—data analysis and ratings prediction. Int Res J Eng Technol (IRJET) 7:265–274

Arxiv A Longitudinal study of Google Play page, https://arxiv.org/abs/1802.02996 , Accessed 21 Dec 2021

Patil HP, Atique M (2015) Sentiment analysis for social media: a survey. In: 2nd international conference on information science and security (ICISS), pp. 1–4

Zainuddin N, Selamat, A.: Sentiment analysis using support vector machine. In: International conference on computer, communications, and control technology (I4CT) 2014, pp. 333–337

Dubey A, Rasool A (2021) Efficient technique of microarray missing data imputation using clustering and weighted nearest neighbor. Sci Rep 11(1)

Li X, Wang X, Liu H (2021) Research on fine-tuning strategy of sentiment analysis model based on BERT. In: International conference on communications, information system and computer engineering (CISCE), pp. 798–802

Mohammadian S, Karsaz A, Roshan YM (2017) A comparative analysis of classification algorithms in diabetic retinopathy screening. In: 7th international conference on computer and knowledge engineering (ICCKE) 2017, pp. 84–89

Latif R, Talha Abdullah M, Aslam Shah SU, Farhan M, Ijaz F, Karim A (2019) Data scraping from Google Play Store and visualization of its content for analytics. In: 2nd international conference on computing, mathematics and engineering technologies (iCoMET) 2019, pp. 1–8

Day M, Lin Y (2017) Deep learning for sentiment analysis on Google Play consumer review. IEEE Int Conf Inf Reuse Integr (IRI) 2017:382–388

Abdul Khalid KA, Leong TJ, Mohamed K (2016) Review on thermionic energy converters. IEEE Trans Electron Devices 63(6):2231–2241

Regulin D, Aicher T, Vogel-Heuser B (2016) Improving transferability between different engineering stages in the development of automated material flow modules. IEEE Trans Autom Sci Eng 13(4):1422–1432

Li D, Qian J (2016) Text sentiment analysis based on long short-term memory. In: First IEEE international conference on computer communication and the internet (ICCCI) 2016, pp. 471–475 (2016)

Lin T, Goyal P, Girshick R, He K, Dollár P (2020) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 42(2):318–327

Arxiv A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts, https://arxiv.org/abs/cs/0409058 , Accessed 21 Dec 2021

Sfu Webpage Methods for Creating Semantic Orientation Dictionaries, https://www.sfu.ca/~mtaboada/docs/publications/Taboada_et_al_LREC_2006.pdf , Accessed 21 Dec 2021

Sudhir P, Suresh VD (2021) Comparative study of various approaches, applications and classifiers for sentiment analysis. Glob TransitS Proc 2(2):205–211

Gillioz A, Casas J, Mugellini E, Khaled OA (2020) Overview of the transformer-based models for NLP tasks. In: 15th conference on computer science and information systems (FedCSIS) 2020, pp. 179–183

Zhou Y, Li M (2020) Online course quality evaluation based on BERT. In: 2020 International conference on communications, information system and computer engineering (CISCE) 2020, pp. 255–258

Truong TL, Le HL, Le-Dang TP (2020) Sentiment analysis implementing BERT-based pre-trained language model for Vietnamese. In: 7th NAFOSTED conference on information and computer science (NICS) 2020, pp. 362–367 (2020)

Kano T, Sakti S, Nakamura S (2021) Transformer-based direct speech-to-speech translation with transcoder. IEEE spoken language technology workshop (SLT) 2021, pp. 958–965

Arxiv Comparing BERT against traditional machine learning text classification, https://arxiv.org/abs/2005.13012 , Accessed 21 Dec 2021

Arxiv A Comparison of LSTM and BERT for Small Corpus, https://arxiv.org/abs/2009.05451 , Accessed 21 Dec 2021

Arxiv BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, https://arxiv.org/abs/1810.04805 , Accessed 21 Dec 2021

Naseer M, Asvial M, Sari RF (2021) An empirical comparison of BERT, RoBERTa, and Electra for fact verification. In: International conference on artificial intelligence in information and communication (ICAIIC) 2021, pp. 241–246

Ho Y, Wookey S (2020) The real-world-weight cross-entropy loss function: modeling the costs of mislabeling. IEEE Access 8:4806–4813

Zhou Y, Wang X, Zhang M, Zhu J, Zheng R, Wu Q (2019) MPCE: a maximum probability based cross entropy loss function for neural network classification. IEEE Access 7:146331–146341

Yessou H, Sumbul G, Demir B (2020) A Comparative study of deep learning loss functions for multi-label remote sensing image classification. IGARSSIEEE international geoscience and remote sensing symposium 2020, pp. 1349–1352

Liu L, Qi H (2017) Learning effective binary descriptors via cross entropy. In: IEEE winter conference on applications of computer vision (WACV) 2017, pp. 1251–1258 (2017)

Riquelme N, Von Lücken C, Baran B (2015) Performance metrics in multi-objective optimization. In: Latin American Computing Conference (CLEI) 2015, pp. 1–11

Dubey A, Rasool A (2020) Clustering-based hybrid approach for multivariate missing data imputation. Int J Adv Comput Sci Appl (IJACSA) 11(11):710–714

Download references

Author information

Authors and affiliations.

Department of Computer Science and Engineering, Maulana Azad National Institute of Technology, Bhopal, India

Parth Vyas, Manish Sharma, Akhtar Rasool & Aditya Dubey

Corresponding author

Correspondence to Aditya Dubey.

Editor information

Editors and affiliations.

Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, Chhattisgarh, India

Pradeep Singh

Deepak Singh

Department of Computer Science and Engineering, International Institute of Information Technology, Naya Raipur, Chhattisgarh, India

Vivek Tiwari

Østfold University College, Halden, Norway

Sanjay Misra

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper.

Vyas, P., Sharma, M., Rasool, A., Dubey, A. (2023). Comparative Study of Loss Functions for Imbalanced Dataset of Online Reviews. In: Singh, P., Singh, D., Tiwari, V., Misra, S. (eds) Machine Learning and Computational Intelligence Techniques for Data Engineering. MISP 2022. Lecture Notes in Electrical Engineering, vol 998. Springer, Singapore. https://doi.org/10.1007/978-981-99-0047-3_11

DOI: https://doi.org/10.1007/978-981-99-0047-3_11

Published: 16 May 2023

Publisher Name: Springer, Singapore

Print ISBN: 978-981-99-0046-6

Online ISBN: 978-981-99-0047-3

eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)

COMMENTS

  1. A Comprehensive Survey of Loss Functions in Machine Learning

    As one of the important research topics in machine learning, loss function plays an important role in the construction of machine learning algorithms and the improvement of their performance, which has been concerned and explored by many researchers. But it still has a big gap to summarize, analyze and compare the classical loss functions. Therefore, this paper summarizes and analyzes 31 ...

  2. Recent advances on loss functions in deep learning for computer vision

    Recently, designing loss functions for deep learning methods has become one of the most challenging problems. This paper provides a comprehensive review of the recent progress and frontiers about loss functions in deep learning, especially for computer vision tasks. Specifically, we discuss the loss functions in three main computer vision tasks ...

  3. A survey and taxonomy of loss functions in machine learning

    Each loss function is given a theoretical backing and we describe where it is best used. This survey aims to provide a reference of the most essential loss functions for both beginner and advanced machine learning practitioners. Subjects: Machine Learning (cs.LG) Cite as: arXiv:2301.05579 [cs.LG] (or arXiv:2301.05579v1 [cs.LG] for this version)

  4. Survey of the loss function in classification models: Comparative study

    The selection of an appropriate classification approach depends heavily on the classification rate, which is the most important factor in achieving the desired decision quality. While researchers have examined the impact of different features on the performance of classification approaches, cost/loss functions have received less attention in the comparative literature review, despite their ...

  5. Recent advances on loss functions in deep learning for computer vision

    This paper provides a comprehensive review of the recent progress and frontiers about loss functions in deep learning, especially for computer vision tasks. Specifically, we discuss the loss functions in three main computer vision tasks, i.e., object detection, face recognition, and image segmentation. Scholars have proposed several novel loss ...

  6. Influence of cost/loss functions on classification rate: A comparative

    In the literature, cost/loss functions based on the type of used function (i.e., Continuous/Semi-continuous/Discrete and Linear/Nonlinear) are categorized into three main categories and six subcategories. ... This section provides a brief review of the literature on cost/loss functions used for classification purposes in various applications ...

  7. PDF A Comprehensive Survey of Loss Functions in Machine Learning

    proposes a new partition criterion of loss functions, then summarizes 31 important loss functions from several perspectives according to the partition criterion, such as formula, image, algorithm and so on. All loss functions in this paper are listed in Table 1. In the rest of this paper, the partition criterion of loss functions in this paper will

  8. [2307.02694] Loss Functions and Metrics in Deep Learning

    Juan Terven, Diana M. Cordova-Esparza, Alfonso Ramirez-Pedraza, Edgar A. Chavez-Urbiola, Julio A. Romero-Gonzalez. View a PDF of the paper titled Loss Functions and Metrics in Deep Learning, by Juan Terven and 4 other authors. When training or evaluating deep learning models, two essential parts are picking the proper loss function and deciding ...

  9. A Comprehensive Survey of Loss Functions in Machine Learning

    Therefore, this paper summarizes and analyzes 31 classical loss functions in machine learning. Specifically, we describe the loss functions from the aspects of traditional machine learning and ...

  10. A Comparative Study of Loss Functions for Deep Neural ...

    This paper is organized into six sections: Sect. 1 introduces the problem of selecting the appropriate loss function for deep neural networks. Section 2 reviews the existing literature on loss functions in regression tasks on time-series datasets. The significance of loss functions and their role in DNN architecture are discussed in detail in Sect. 3.

  11. Loss Functions and Metrics in Deep Learning

    This paper provides a comprehensive overview of the most common loss functions and metrics used across many different types of deep learning tasks, from general tasks such as regression and classification to more specific tasks in Computer Vision and Natural Language Processing. When training or evaluating deep learning models, two essential parts are picking the proper loss function and ...

  12. [PDF] Loss Functions in the Era of Semantic ...

    A novel taxonomy and thorough review of how these loss functions are customized and leveraged in image segmentation, with a systematic categorization emphasizing their significant features and applications is provided. Semantic image segmentation, the process of classifying each pixel in an image into a particular class, plays an important role in many visual understanding systems. As the ...

  13. PDF A Robust Loss Function for Multiclass Classification

    Loss Function has been proposed, applied and evaluated in binary data classification [13]. However, in many real ... We first review some important loss functions for multiclass classification ... literature to extend the binary classifiers to the multiclass case. In particular, a wide class of smooth convex loss

  14. A Comprehensive Survey of Regression Based Loss Functions for Time

    function/loss function which can reduce the set of experiments, converge the model faster, and overall help researchers lead their research in better direction. The rest of the paper is outlined as follows, Section II provides explanation about time series data, and Section III describes the regression based loss functions in detail. Section

  15. Human Pose Estimation Using Deep Learning: A Systematic Literature Review

    A loss function is a mathematical function that trains a model by updating its parameters. The choice of an appropriate loss function significantly impacts the model's performance. In 2D HPE, selecting a suitable loss function depends on several factors, including the type of task (regression or classification), the model architecture ...

  16. Novel loss functions for ensemble-based medical image classification

    Literature review techniques have to be strengthened by including the issues in the current system and how the author proposes to overcome the same. ... This loss function, however, asserts equal learning from all classes, leading to a bias toward the majority class. Although the choice of the loss function impacts model performance, to the ...

  17. A review of small object and movement detection based loss function and

    The first topic covered is the detection of small objects and VOs, as well as a study on current technology. The classification and description of the detection, loss function, and optimization strategies are presented as a comparison table. 2 Literature review. There are various approaches present for small object and movement detection.

  18. PDF Loss functions, utility functions and Bayesian sample size ...

    different types of asymmetric loss functions found in the literature. We also introduce a new bounded asymmetric loss function and obtain SSD under this loss function. In addition, to estimate a parameter following a particular model, we present some theoretical results for the optimum SSD problem under a particular choice of loss function.

  19. A novel loss function to reproduce texture features for deep learning

    CT and MRI images were collected for each patient, and then rigidly registered as pre-procession. We proposed a gray-level co-occurrence matrix (GLCM)-based loss function to improve the reproducibility of texture features. This novel loss function could be embedded into the present deep learning-based framework for image synthesis.

  20. Handling Censoring and Censored Data in Survival Analysis: A Standalone

    A quality literature review follows a four-phase process, including planning, literature selection, data extraction ... observed that the linear exponential loss function had the narrowest credible intervals with respect to the Tierney and Kadane approach as compared to the credible intervals of Bayes using Lindley and the confidence intervals ...

  21. Physics-Informed Neural Network (PINN) Evolution and Beyond: A ...

    This research aims to study and assess state-of-the-art physics-informed neural networks (PINNs) from different researchers' perspectives. The PRISMA framework was used for a systematic literature review, and 120 research articles from the computational sciences and engineering domain were specifically classified through a well-defined keyword search in Scopus and Web of Science databases ...

  22. Comparative Study of Loss Functions for Imbalanced Dataset ...

    At last, the conclusion was to compare the five loss functions and find which will be the best-optimized loss function for sentiment analysis of imbalanced datasets. This segment provides a literature review of the results achieved in this field. For the comparison of loss functions first need was for an imbalanced dataset.

  23. The Impact of Monoclonal Antibody Usage on Hearing Outcomes: A

    Two authors (P.A. and Y.S.) performed the initial review of the literature and a third (A.E.Q.) served as an adjudicator. Full-text articles were reviewed by the same two authors (P.A. and Y.S.) and adjudicated by a third (A.E.Q). Reference lists of included full-text articles were additionally reviewed to identify articles for inclusion.

  24. Financial time series forecasting with deep learning : A systematic

    The inverse of function f (x), called function g (h), produces the reconstruction of output r (W 2 denotes a weight matrix, b 2 denotes a bias vector, and σ 2 is an element-wise sigmoid activation function of the decoder). Eqs. (25), (26) illustrate the simple AE process [77]. Eq. (27) shows

  25. Exploring the Connections Between Project Management Offices and

    Moving in this direction required recognition that the PMO was a socially constructed entity that changed over time. Several theoretical perspectives appeared helpful in exploring the social construction of PMOs, including organizational design theory, interorganizational networks, institutional theory, practice-based and processual theories, contingency theory, and complexity theory, among ...