Image Classification Using CNN

Proceedings of the International Conference on Innovative Computing & Communication (ICICC) 2021

5 Pages Posted: 27 Apr 2021

Atul Sharma

Lovely Professional University

Gurbakash Phonsa

Date Written: April 24, 2021

Content-Based Image Retrieval (CBIR) retrieves images from a database by applying matching algorithms: images are first stored in the database and then retrieved on the basis of different features and techniques, so a user can extract images according to different search criteria. Still, various algorithms are unable to satisfy some specific criteria; a user may simply type a name and expect relevant results, and many of the resulting challenges have been addressed by different algorithms. The algorithms used in CBIR must be optimized for good results as well as high accuracy and recall. Image classification is the technique of assigning images to different classes: given a set of categories, the classifier must place each image in its correct class, so if an image belongs to class A, the algorithm must classify it as a class-A image. The convolutional neural network (CNN) is one technique we can use for image classification. This paper shows how image classification works on the CIFAR-10 dataset. We used a sequential CNN model, implemented the program in a Jupyter notebook, and classified three classes (aeroplane, bird, and car) with a batch size of 64. We obtained 94% accuracy on the three CIFAR-10 classes used.
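The paper's code is not reproduced here; the following is a minimal sketch of the kind of sequential Keras model the abstract describes, with layer sizes and epoch count chosen as assumptions rather than taken from the paper:

```python
# Minimal sketch of the setup the abstract describes: a sequential Keras CNN
# trained on three CIFAR-10 classes (airplane, automobile/car, bird) with
# batch size 64. Layer sizes and the epoch count are assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Keep only CIFAR-10 labels 0 (airplane), 1 (automobile), 2 (bird).
train_keep = np.isin(y_train[:, 0], [0, 1, 2])
test_keep = np.isin(y_test[:, 0], [0, 1, 2])
x_train, y_train = x_train[train_keep] / 255.0, y_train[train_keep]
x_test, y_test = x_test[test_keep] / 255.0, y_test[test_keep]

model = keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),   # three classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=64, epochs=10,
          validation_data=(x_test, y_test))
```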



Classification and Identification of Objects in Images Using CNN

  • Conference paper
  • First Online: 14 December 2022

  • Rajesh Kumar Chatterjee,
  • Md. Amir Khusru Akhtar &
  • Dinesh K. Pradhan (ORCID: orcid.org/0000-0001-9132-9255)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1673)

Included in the following conference series:

  • International Conference on Artificial Intelligence and Data Science


The convolutional neural network (CNN) is a deep learning technique that has recently become the most popular tool for vision-related use cases. In the field of computer vision, classifying a given image and detecting objects within it is extremely difficult and has numerous real-world applications. In recent years, the use of CNNs has increased dramatically in a variety of fields, including image classification, segmentation, and object recognition; AlexNet, GoogLeNet, and ResNet50 are among the most popular CNNs for detecting objects in images. The performance of a CNN depends directly on its hyperparameters: the better those parameters are tuned, the better the results. As a result, how to use CNNs to improve object-detection performance is an important line of study. Many strategies have been explored to optimise the hyperparameters of a CNN architecture, among them gradient descent, backpropagation, genetic algorithms, and Adam optimization. CNN architectures have also been trained using a variety of population-based search and evolutionary computing (EC) methodologies: genetic algorithms, differential evolution, ant colony optimization, and particle swarm optimization, among other population-based techniques, have recently been utilised to tune hyperparameters. In this literature review, we cover the various aspects of CNNs and their architecture, followed by a detailed explanation of optimization strategies that help boost accuracy.
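As a concrete illustration of the kind of hyperparameter tuning this abstract surveys, the sketch below runs a toy random search over a few CNN hyperparameters; the search space, model builder, and trial budget are illustrative assumptions rather than any of the reviewed methods:

```python
# Toy random search over a few CNN hyperparameters, in the spirit of the
# population-based tuning strategies surveyed above. The search space and
# model builder are illustrative assumptions, not the reviewed methods.
import random
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(filters, kernel_size, learning_rate):
    model = keras.Sequential([
        layers.Conv2D(filters, kernel_size, activation="relu",
                      input_shape=(32, 32, 3)),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

SPACE = {"filters": [16, 32, 64], "kernel_size": [3, 5],
         "learning_rate": [1e-2, 1e-3, 1e-4]}

def random_search(x, y, trials=5):
    best_params, best_acc = None, 0.0
    for _ in range(trials):
        params = {k: random.choice(v) for k, v in SPACE.items()}
        hist = build_cnn(**params).fit(x, y, epochs=3,
                                       validation_split=0.2, verbose=0)
        acc = max(hist.history["val_accuracy"])
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc
```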




Author information

Authors and affiliations: Rajesh Kumar Chatterjee and Md. Amir Khusru Akhtar, Usha Martin University, Ranchi, India; Dinesh K. Pradhan, Dr. B. C. Roy Engineering College, Durgapur, India.

Corresponding author: Rajesh Kumar Chatterjee.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper: Chatterjee, R.K., Akhtar, M.A.K., Pradhan, D.K. (2022). Classification and Identification of Objects in Images Using CNN. In: Kumar, A., Fister Jr., I., Gupta, P.K., Debayle, J., Zhang, Z.J., Usman, M. (eds) Artificial Intelligence and Data Science. ICAIDS 2021. Communications in Computer and Information Science, vol 1673. Springer, Cham. https://doi.org/10.1007/978-3-031-21385-4_2

Published: 14 December 2022. Print ISBN: 978-3-031-21384-7. Online ISBN: 978-3-031-21385-4.


Image Classification

4047 papers with code • 151 benchmarks • 250 datasets

Image Classification is a fundamental task in vision recognition that aims to understand and categorize an image as a whole under a specific label. Unlike object detection, which involves classification and location of multiple objects within an image, image classification typically pertains to single-object images. When the classification becomes highly detailed or reaches instance level, it is often referred to as image retrieval, which also involves finding similar images in a large database.

Source: Metamorphic Testing for Object Detection Systems



Most implemented papers

Deep Residual Learning for Image Recognition

Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

Very Deep Convolutional Networks for Large-Scale Image Recognition

In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting.

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

We present a class of efficient models called MobileNets for mobile and embedded vision applications.

MobileNetV2: Inverted Residuals and Linear Bottlenecks

In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes.

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale


While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited.

Densely Connected Convolutional Networks


Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output.

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available.

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

For captioning and VQA, we show that even non-attention based models can localize inputs.

CSPNet: A New Backbone that can Enhance Learning Capability of CNN

Neural networks have enabled state-of-the-art approaches to achieve incredible results on computer vision tasks such as object detection.

Rethinking the Inception Architecture for Computer Vision

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks.

  • Open access
  • Published: 11 February 2019

Research on image classification model based on deep convolution neural network

  • Mingyuan Xin &
  • Yong Wang

EURASIP Journal on Image and Video Processing, volume 2019, Article number: 40 (2019)


Based on an analysis of the error backpropagation algorithm, we propose an innovative training criterion for deep neural networks: the maximum-margin minimum classification error (M3CE). At the same time, cross-entropy and M3CE are analyzed and combined to obtain better results. Finally, we tested the proposed M3CE-CEc on two deep learning standard databases, MNIST and CIFAR-10. The experimental results show that M3CE can enhance cross-entropy and is an effective supplement to the cross-entropy criterion. M3CE-CEc obtains good results on both databases.

1 Introduction

Traditional machine learning methods (such as multilayer perceptrons and support vector machines) mostly use shallow structures to deal with a limited number of samples and computing units. When the target objects have rich meanings, their performance and generalization ability on complex classification problems are clearly insufficient. The convolutional neural network (CNN) developed in recent years has been widely used in the field of image processing because it is good at image classification and recognition problems and has brought great improvements in the accuracy of many machine learning tasks. It has become a powerful and universal deep learning model.

The convolutional neural network (CNN) is a multilayer neural network and the most classical and common deep learning framework. A new reconstruction algorithm based on convolutional neural networks was proposed by Newman et al. [1], and its advantages in speed and performance were demonstrated. Wang et al. [2] discussed three methods: the CNN model with pretraining, the CNN model with fine-tuning, and a hybrid method. The first two pass the image through the network once, while the last uses a patch-based feature extraction scheme. Their survey provides a milestone in modern instance retrieval, reviews a wide selection of different categories of previous work, and provides insights into the link between SIFT-based and CNN-based approaches. After analyzing and comparing the retrieval performance of different categories on several data sets, they discuss a new direction for general and specialized instance retrieval. CNNs have attracted great interest in machine learning and show excellent performance in hyperspectral image classification. Al-Saffar et al. [3] proposed a classification framework called region-based pluralistic CNN, which can encode semantic context-aware representations to obtain promising features. By combining a set of different discriminant appearance factors, the CNN-based representation exhibits the spatial-spectral contextual sensitivity that is essential for accurate pixel classification. The proposed method, which learns contextual interaction features using various region-based inputs, is expected to have more discriminative power. The combined representation, containing rich spectral and spatial information, is then fed to a fully connected network, and the label of each pixel vector is predicted by a softmax layer. Experimental results on widely used hyperspectral image datasets show that the proposed method outperforms traditional deep-learning-based classifiers and other advanced classifiers. Context-based CNNs with deep structure and pixel-based multilayer perceptrons (MLP) with shallow structure are recognized neural network algorithms that represent, respectively, the most advanced deep learning methods and classical non-neural-network algorithms. These two algorithms, with very different behaviors, were integrated in a concise and efficient manner, and a rule-based decision-fusion method was used to classify very fine spatial resolution (VFSR) remote sensing images. The decision-fusion rules, which are mainly based on the CNN classification confidence, reflect the usually complementary patterns of each classifier. The ensemble classifier MLP-CNN proposed by Said et al. [4] therefore acquires complementary results from the CNN, based on deep spatial feature representation, and from the MLP, based on spectral discrimination. At the same time, the CNN constraints resulting from the use of convolution filters, such as uncertain object-boundary segmentation and the loss of useful fine-spatial-resolution detail, are compensated. The validity of the ensemble MLP-CNN classifier was tested in urban and rural areas using aerial photography and additional satellite sensor data sets. The MLP-CNN classifier achieves promising performance and consistently outperforms the pixel-based MLP, the spectral- and texture-based MLP, and the context-based CNN in classification accuracy. This research paves the way for effectively solving the complex problem of VFSR image classification.

Periodic inspection of nuclear power plant components is important to ensure safe operation. However, current practice is time-consuming, tedious, and subjective, involving human technicians examining videos to identify reactor cracks. Some vision-based crack detection methods have been developed for metal surfaces, but they generally perform poorly when used to analyze nuclear inspection videos. Detecting these cracks is a challenging task because of their small size and the presence of noise patterns on the surface of the components. Huang et al. [5] proposed a deep learning framework based on a convolutional neural network and a Naive Bayes data-fusion scheme (called NB-CNN) that can analyze individual video frames for crack detection. A new data-fusion scheme is proposed to aggregate the information extracted from each video frame and enhance the overall performance and robustness of the system: a CNN detects the fissures in each video frame, the data-fusion scheme maintains the temporal and spatial coherence of the cracks across the video, and the Naive Bayes decision effectively discards false positives. The framework achieves a hit rate of 98.3% at 0.1 false positives per frame, significantly higher than the most advanced previous methods. The prediction of visual-attention data from any type of media is valuable to content creators and can be used to drive coding algorithms effectively. With the current trend toward virtual reality (VR), the adaptation of known technologies to this new medium is beginning to gain momentum. Gupta and Bhavsar [6] proposed an extension to any convolutional neural network architecture that fine-tunes traditional 2D saliency prediction to omnidirectional images (ODI) in an end-to-end manner, and showed that each step in their pipeline makes the generated saliency map more accurate with respect to ground-truth data. The CNN is a deep machine learning method derived from the artificial neural network (ANN) and has achieved great success in the field of image recognition in recent years. The training of neural networks is based on the error backpropagation (BP) algorithm, a gradient descent method. However, as the number of neural network layers increases, the number of weight parameters increases sharply, which slows the convergence of the BP algorithm and makes training time too long. The CNN training algorithm is a variant of the BP algorithm: by means of local connections and weight sharing, the network structure becomes more similar to a biological neural network, which not only keeps the deep structure of the network but also greatly reduces the number of network parameters, so that the model generalizes well and is easier to train. This advantage is more obvious when the network input is a multi-dimensional image, because the image can be used directly as the network input, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. Therefore, convolutional neural networks can also be interpreted as multilayer perceptrons designed to recognize two-dimensional shapes that are highly invariant to translation, scaling, tilting, and other forms of deformation [7, 8, 9, 10, 11, 12, 13, 14, 15].

With the rapid development of mobile Internet technology, more and more image information is stored on the Internet, and the image has become another important carrier of network information after text. Against this background, it is very important to use computers to classify and recognize these images intelligently so that they can better serve human beings. In the initial stage of image classification and recognition, people mainly used this technology to meet auxiliary needs; for example, Baidu's star-face function can help users find the most similar celebrity, and OCR technology can extract text and information from images. For graph-based semi-supervised learning methods, it is very important to construct good graphs that capture the intrinsic data structure; such methods are widely used in hyperspectral image (HSI) classification with a small number of labeled samples. Among existing graph-construction methods, sparse representation (SR) shows impressive performance in semi-supervised HSI classification tasks. However, most SR-based algorithms fail to consider the rich spatial information of HSI, which has been proved to be beneficial to classification. Yan et al. [16] proposed a space and class structure regularized sparse representation (SCSSR) graph for semi-supervised HSI classification. Specifically, spatial information is incorporated into the SR model through graph Laplacian regularization, which assumes that spatial neighbors should have similar representation coefficients, so the obtained coefficient matrix can more accurately reflect the similarity between samples. They also incorporate the probabilistic class structure (the probabilistic relationship between each sample and each class) into the SR model to further improve the discriminability of graphs. Results on Hyperion and AVIRIS hyperspectral data show that their method is superior to state-of-the-art methods. The invariances studied by Zhang et al. [17], such as specificity to uniform samples and rotation invariance, are very important for object detection and classification applications. Current research focuses on specific invariances of features, such as rotation invariance. They proposed a new multichannel convolutional neural network (mCNN) to extract invariant features for object classification: multichannel convolutions sharing the same weights are used to reduce the feature variance of sample pairs with different rotations in the same class, so that invariance to both uniform objects and rotation is achieved simultaneously, improving the invariance of the features. More importantly, the proposed mCNN is particularly effective for small training samples. Experimental results on two benchmark handwriting-recognition datasets show that the proposed mCNN is very effective at extracting invariant features from a small number of training samples. With the development of the big data era, convolutional neural networks with more hidden layers have more complex network structures and stronger feature learning and feature expression abilities than traditional machine learning methods. Since the introduction of CNN models trained by deep learning algorithms, significant achievements have been made in many large-scale recognition tasks in the field of computer vision. Chaib et al. [18] first introduced the rise and development of deep learning and convolutional neural networks and summarized the basic model structure, convolutional feature extraction, and pooling operations of CNNs. They then reviewed the research status and development trend of deep-learning-based CNN models in image classification, introducing typical network structures, training methods, and performance. Finally, some problems in current research were briefly summarized and discussed, and new directions of future development were predicted. Computer diagnostic technology has played an important role in medical diagnosis from its beginnings to the present; in particular, image classification technology, from initial theoretical research to clinical diagnosis, has provided effective assistance in the diagnosis of various diseases. The image is the concrete picture formed in the human brain by objective things existing in the natural environment, and it is an important source of information for humans to acquire knowledge of external things. With the continuous development of computer technology, general object-recognition technology for natural scenes is applied more and more in daily life: from simple barcode recognition and text recognition (such as handwritten character recognition and optical character recognition, OCR) to biometric recognition (fingerprint, voice, iris, face, gesture, emotion recognition, etc.), there are many successful applications. Image recognition, especially object-category recognition in natural scenes, is a unique skill of human beings: in a complex natural environment, people can identify a concrete object (such as a teacup or a swallow) or a specific category of objects (household goods, birds, etc.) at a glance. However, there are still many open questions about how human beings do this and how to transfer the related abilities to computers so that they have humanoid intelligence. Therefore, research on image recognition algorithms remains active in the fields of machine vision, machine learning, deep learning, and artificial intelligence [19, 20, 21, 22, 23, 24].

Therefore, this paper applies the advantages of deep convolutional neural networks to image classification, tests the loss function constructed by M3CE on the two deep learning standard databases MNIST and CIFAR-10, and pushes forward a new direction in image classification research.

2 Proposed method

Image classification is one of the hot research directions in the computer vision field, and it is also the basis of image classification systems in other image application fields. Such a system is usually divided into three important parts: image preprocessing, image feature extraction, and the classifier.

2.1 The ZCA whitening process

In this process, we first use PCA to zero the mean value. Let X denote the matrix of image vectors; the mean is [25]: \( \mu =\frac{1}{m}\sum \limits_{j=1}^m{x}_j \)

Next, the covariance matrix of the zero-mean data is calculated, with the following formula:

\( \Sigma =\frac{1}{m}\sum \limits_{j=1}^m\left({x}_j-\mu \right){\left({x}_j-\mu \right)}^T \)

where \( \Sigma \) represents the covariance matrix. \( \Sigma \) is decomposed by SVD [26], and its eigenvalues and corresponding eigenvectors are obtained:

\( \Sigma =US{U}^T \)

where U is the eigenvector matrix of \( \Sigma \), and S is the diagonal eigenvalue matrix of \( \Sigma \). Based on this, x can be whitened by PCA, and the formula is:

\( {x}_{\mathrm{PCAwhiten}}={S}^{-1/2}{U}^Tx \)

So \( {x}_{\mathrm{ZCAwhiten}} \) can be expressed as

\( {x}_{\mathrm{ZCAwhiten}}=U{x}_{\mathrm{PCAwhiten}}=U{S}^{-1/2}{U}^Tx \)

For the data set in this paper, because the training samples and the test samples are not clearly separated [27], a random split is used to avoid the subjective bias of manual partitioning.
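A minimal NumPy sketch of the whitening steps above (the small eps term is a standard numerical-stability addition, not part of the formulas):

```python
# NumPy sketch of the ZCA whitening steps above; eps added to the eigenvalues
# is a standard numerical-stability term, not part of the formulas.
import numpy as np

def zca_whiten(X, eps=1e-5):
    """X holds one sample per column (d x m), matching the formulas above."""
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu                                    # zero-mean the data
    sigma = Xc @ Xc.T / Xc.shape[1]                # covariance matrix Sigma
    U, S, _ = np.linalg.svd(sigma)                 # Sigma = U diag(S) U^T
    X_pca = np.diag(1.0 / np.sqrt(S + eps)) @ U.T @ Xc   # PCA whitening
    return U @ X_pca                               # rotate back: ZCA whitening

# Example: whiten 100 random 64-dimensional vectors.
X_white = zca_whiten(np.random.default_rng(0).normal(size=(64, 100)))
```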

2.2 Image feature extraction based on time-frequency composite weighting

Feature extraction is a concept in computer vision and image processing. It refers to using a computer to extract image information and to determine whether each point of the image belongs to a given image feature. The purpose of feature extraction is to divide the points of the image into different subsets, which are often isolated points, continuous curves, or regions. There are usually many kinds of features that can describe an image, and they can be classified according to different criteria: for example, point features, line features, and regional features according to how they are represented on the image data. According to the region size used in extraction, features can be divided into two categories: global features and local features [24]. The image features used by the feature extraction methods in this paper include color features and texture features, together with an analysis of corner features and edge features.

The time-frequency composite weighting algorithm for multi-frame blurred images processes the blurred image data simultaneously in the frequency domain and the time domain. Based on the weighting characteristic of the algorithm and the extraction of target-image features in the time and frequency domains, the depth extraction technique applies time-frequency composite weighting to night images to extract target information from the depth image. The main steps of the time-frequency composite weighted feature extraction method are as follows:

Step 1: Construct a time-frequency composite weighted signal model for the multiple blurred images, where f(t) is the original signal and \( S=(c-v)/(c+v) \) is the image scale factor. Referred to simply as the scale, it represents the signal scaling applied by the time-frequency composite weighting algorithm, and \( \sqrt{S} \) is its normalization factor.

Step 2: Map the one-dimensional function to a two-dimensional function y(t) of the time scale a and the time shift b, and perform a time-frequency composite weighted transform on the continuous nighttime image using a square-integrable function:

\( {\psi}_{a,b}(t)=\frac{1}{\sqrt{\mid a\mid }}\psi \left(\frac{t-b}{a}\right) \)

where the divisor \( 1/\sqrt{\mid a\mid } \) ensures the energy normalization of the unitary transformation, and ψ_{a,b} is obtained from ψ(t) through the affine-group transformation U(a, b).

Step 3: Substitute the variables of the original image f(t) by a = 1/s and b = τ and rewrite the expression accordingly.

Step 4: Build the multi-frame fuzzy-image time-frequency composite weighted signal form, where rect(t) = 1 for ∣t∣ ≤ 1/2.

Step 5: The frequency-modulation law of the time-frequency composite weighted signal of the multi-frame fuzzy image is a hyperbolic function, with \( K=T{f}_{\max }{f}_{\min }/B \) and \( {t}_0={f}_0T/B \), where f_0 is the arithmetic center frequency and f_max, f_min are the maximum and minimum frequencies, respectively.

Step 6: Use the image-transformation formula of the multi-detector fuzzy-image time-frequency composite weighted signal to apply the time-frequency composite weighting to the image, with \( {b}_a=\left(1-a\right)\left(\frac{1}{a{f}_{\max }}-\frac{T}{2}\right) \), where Ei(•) denotes the exponential integral.

The final output is the time-frequency composite weighted image signal \( {W}_{uu}\left(a,b\right) \). Compared with the traditional time-domain approach, image features can therefore be better extracted by the time-frequency composite weighting algorithm.

2.3 Application of the deep convolutional neural network in image classification

After obtaining the feature vectors from the image, the image can be described as a vector of fixed length, and then a classifier is needed to classify the feature vectors.

In general, a common convolutional neural network consists, from input to output, of an input layer, convolution layers, activation layers, pooling layers, fully connected layers, and a final output layer. The network establishes relationships between computational neural nodes and transfers the input information layer by layer, while the successive convolution-pooling structure decodes, deduces, converges, and maps the feature signals of the original data into a hidden-layer feature space [28]. The subsequent fully connected layer then classifies and outputs according to the extracted features.

2.3.1 Convolutional neural network

Convolution is an important analytical operation in mathematics: a mathematical operator that generates a third function from two functions f and g, representing the area of overlap between f and a flipped and translated copy of g. For discrete signals it is usually defined by the following formula:

\( \left(f\ast g\right)(n)=\sum \limits_{\tau }f\left(\tau \right)g\left(n-\tau \right) \)

Its integral form is the following:

\( \left(f\ast g\right)(t)={\int}_{-\infty }^{\infty }f\left(\tau \right)g\left(t-\tau \right)\, d\tau \)

In image processing, a digital image can be regarded as a discrete function on a two-dimensional space, denoted f(x, y). Assuming the existence of a two-dimensional convolution kernel g(x, y), the output image z(x, y) can be represented by the following formula:

\( z\left(x,y\right)=\sum \limits_i\sum \limits_jf\left(i,j\right)g\left(x-i,y-j\right) \)

In this way, the convolution operation can be used to extract image features. Similarly, in deep learning applications, when the input is a color image containing the three RGB channels, the input is a high-dimensional array of size 3 × image width × image height; accordingly, the kernel (called the "convolution kernel" in the convolutional neural network) defined by the learning algorithm is also a high-dimensional array of computational parameters. When such a multi-channel image is input, the corresponding convolution operation can be expressed by the following formula:

\( z\left(x,y\right)=\sum \limits_c\sum \limits_i\sum \limits_j{f}_c\left(i,j\right){g}_c\left(x-i,y-j\right) \)

The integral form is the following:

\( z\left(x,y\right)=\sum \limits_c\iint {f}_c\left(u,v\right){g}_c\left(x-u,y-v\right)\, du\, dv \)

If a convolution kernel of size m × n is given, there is

\( z\left(x,y\right)=\sum \limits_{i=0}^{m-1}\sum \limits_{j=0}^{n-1}f\left(x+i,y+j\right)g\left(i,j\right) \)

where f denotes the input image and g the convolution kernel of size m × n (this sliding-window correlation form is what CNN frameworks compute in practice). In a computer, convolution is usually realized as a matrix product. Suppose the size of an image is M × M and the size of the convolution kernel is n × n. In the computation, the convolution kernel multiplies each n × n region of the image, which is equivalent to extracting each n × n region and expressing it as a column vector of length n × n. With a stride of 1 and no padding, a total of (M − n + 1) ∗ (M − n + 1) results are obtained; when each of these small image regions is represented as a column vector of length n × n, the original image can be represented by a matrix of size [n ∗ n, (M − n + 1) ∗ (M − n + 1)]. Assuming that the number of convolution kernels is K, the output of the convolution operation on the original image has size K ∗ (M − n + 1) ∗ (M − n + 1), i.e., the number of convolution kernels × the convolved image width × the convolved image height.
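A direct NumPy sketch of the sliding-window computation just described, assuming stride 1 and no padding:

```python
# Direct NumPy implementation of the sliding-window formula above (the
# correlation form that CNN frameworks compute), stride 1 and no padding:
# an M x M image and an n x n kernel give an (M - n + 1) x (M - n + 1) output.
import numpy as np

def conv2d_valid(f, g):
    M = f.shape[0]
    n = g.shape[0]
    out = np.zeros((M - n + 1, M - n + 1))
    for x in range(M - n + 1):
        for y in range(M - n + 1):
            # sum over the n x n window, as in the formula for z(x, y)
            out[x, y] = np.sum(f[x:x + n, y:y + n] * g)
    return out

# Example: a 5 x 5 image with a 3 x 3 averaging kernel yields a 3 x 3 output.
print(conv2d_valid(np.arange(25.0).reshape(5, 5), np.full((3, 3), 1 / 9)))
```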

2.3.2 The M3CE-constructed loss function

In the process of neural network training, the loss function is the evaluation standard for the whole network model. It not only represents the current state of the network parameters but also provides the gradients of the parameters for the gradient descent method, so the loss function is an important part of deep learning training. In this section we introduce the loss function proposed for M3CE. Finally, the combined loss function of M3CE and cross-entropy is obtained through gradient analysis.

According to the definition of MCE, we use the output of the softmax function as the discriminant function. The misclassification measure is then redefined as

\( {d}_k(z)={P}_q(z)-{P}_k(z),\kern1em q=\arg \underset{l\ne k}{\max }{P}_l \)

where k is the label of the sample and q indexes the most confusable class among the softmax outputs. If we use the logistic loss function \( {\ell}_k=\frac{1}{1+{e}^{-\xi {d}_k}} \), we can find the gradient of the loss function with respect to z.

This gradient is used in the backpropagation algorithm to obtain the gradient of the entire network. It is worth noting that if z is misclassified, ℓ_k will be infinitely close to 1, so the factor \( \xi {\ell}_k\left(1-{\ell}_k\right) \) in the gradient will be close to 0. The gradient will then be close to 0, so almost no gradient is propagated back to the previous layers, which is bad for the training process [29].

The sigmoid function was used as the activation function in traditional neural networks, and the same problem occurs during training: when the activation value is large, the backpropagated gradient is very small, which is called saturation. For the shallow neural networks of the past the influence was not very large, but as the number of network layers increases, this situation affects the learning of the whole network. In particular, if a saturated sigmoid function sits at a higher layer, it affects all the earlier low-level gradients. Therefore, in present-day deep neural networks, an unsaturated activation function, the rectified linear unit (ReLU), is used to replace the sigmoid function. When the input value is positive, the gradient of the rectified linear unit is 1, so the gradient of the upper layer can be transmitted back to the lower layer without attenuation. The literature shows that rectified linear units can accelerate the training process and prevent gradient dispersion.

Thus, while a saturating activation function in the middle of the network is not conducive to training a deep network, a saturating function in the top-level loss function still has a great influence on the deep neural network.

We therefore use what we call the max-margin loss, where the margin is defined as \( {\epsilon}_k=-{d}_k(z)={P}_k-{P}_q \) and the loss is \( {\ell}_k=\max \left(0,1+{d}_k\right)=\max \left(0,1-{\epsilon}_k\right) \).

Since P_k is a probability, that is, \( {P}_k\in \left[0,1\right] \), we have \( {d}_k\in \left[-1,1\right] \). As a sample moves from correct classification to misclassification, d_k increases from −1 to 1; compared with the original logistic loss function, the loss still attains its largest value even when the sample is severely misclassified. Because \( 1+{d}_k\ge 0 \) always holds, the loss can be simplified to \( {\ell}_k=1+{d}_k \).

When we need to give a larger loss value to wrongly classified samples, the above formula can be extended to

\( {\ell}_k={\left(1+{d}_k\right)}^{\gamma } \)

where γ is a positive integer. If γ = 2 is set, we get the squared max-margin loss function. To apply this function to training deep neural networks, the gradient needs to be calculated according to the chain rule.

Here, we need to distinguish three cases for a dimension i of z: (1) i is the dimension corresponding to the sample label k, (2) i is the dimension corresponding to the confusable class label q, and (3) i corresponds to neither. Using the softmax derivative \( \partial {P}_j/\partial {z}_i={P}_j\left({\delta}_{ij}-{P}_i\right) \) and the chain rule \( \partial {\ell}_k/\partial {z}_i=\gamma {\left(1+{d}_k\right)}^{\gamma -1}\,\partial {d}_k/\partial {z}_i \), the three cases give \( \partial {d}_k/\partial {z}_i={P}_k\left({P}_k-{P}_q-1\right) \) for i = k, \( {P}_q\left(1+{P}_k-{P}_q\right) \) for i = q, and \( {P}_i\left({P}_k-{P}_q\right) \) otherwise.
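A minimal NumPy sketch of the max-margin loss as reconstructed above:

```python
# NumPy sketch of the max-margin loss reconstructed above: d_k = P_q - P_k with
# q the most confusable wrong class; gamma = 2 gives the squared max-margin loss.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def max_margin_loss(z, k, gamma=2):
    """z: logit vector for one sample; k: index of the true class."""
    p = softmax(z)
    q = np.argmax(np.delete(p, k))   # best wrong class in the reduced array
    q = q if q < k else q + 1        # map back to the original label index
    d_k = p[q] - p[k]                # misclassification measure, in [-1, 1]
    return (1.0 + d_k) ** gamma      # always >= 0, so no max(0, .) is needed

# Example: logits favoring class 0 when the true class is 2.
print(max_margin_loss(np.array([2.0, 0.5, 1.0]), k=2))
```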

3 Experimental results

3.1 Experimental platform and data preprocessing

The MNIST (Mixed National Institute of Standards and Technology) database is a standard database in machine learning. It consists of ten classes of handwritten-digit grayscale images, with 60,000 training images and 10,000 test images at a resolution of 28 × 28.

In this paper, we mainly use ZCA whitening to process the image data: the data are read into arrays and reshaped to the size we need (Figs. 1, 2, 3, 4, and 5). The images of the data set are normalized and whitened, respectively, so that all pixels have the same mean value and variance; this removes the white-noise problem in the images and eliminates the correlation between pixels.

Figure 1: ZCA whitening flow chart.
Figure 2: Sample selection of different fonts and different colors.
Figure 3: Comparison of image feature extraction.
Figure 4: Image classification and modeling based on the deep convolutional neural network.
Figure 5: Comparison of recognition rates among different species.

At the same time, a common way to improve image-training results is to apply random distortions, crops, or sharpening to the training inputs. This has the advantage of extending the effective size of the training data, thanks to all the possible variations of the same image, and it tends to help the network learn to cope with all the distortions that occur in real use of the classifier. Therefore, when the training results are abnormal, the images are deformed randomly to prevent individual abnormal images from interfering strongly with the whole model.
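A sketch of this kind of random-distortion augmentation, using Keras' ImageDataGenerator with assumed distortion parameters:

```python
# Sketch of the random-distortion augmentation described above, using Keras'
# ImageDataGenerator; the specific distortion parameters are assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=10,       # small random rotations
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    zoom_range=0.1,          # random zooming (a form of cropping)
    shear_range=0.1,         # mild shape distortion
)
# augmenter.flow(x_train, y_train, batch_size=64) then yields randomly
# distorted batches, extending the effective size of the training set.
```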

3.2 Build a training network

Classification algorithms form a relatively large class of algorithms, and image classification algorithms are among them. Common classification algorithms include the support vector machine, the k-nearest neighbor algorithm, random forests, and so on. In image classification, the support vector machine (SVM), based on the maximum margin, is the most widely used classification algorithm, especially the SVM with kernel techniques. The SVM is based on VC-dimension theory and structural risk minimization; its main purpose is to find the optimal separating hyperplane in a high-dimensional space so that the classification margin is maximized and the classification error rate is minimized. It is best suited to the case where the feature dimension of the image is small and the amount of data after feature extraction is large; a baseline of this kind is sketched below.
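A minimal scikit-learn sketch of such an SVM baseline, with random stand-in feature vectors:

```python
# Minimal scikit-learn sketch of the maximum-margin SVM baseline described
# above; the random vectors stand in for features extracted from images.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))    # stand-in extracted feature vectors
labels = rng.integers(0, 3, size=200)    # three hypothetical image classes

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # kernelized max-margin classifier
clf.fit(features[:150], labels[:150])
print("test accuracy:", clf.score(features[150:], labels[150:]))
```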

Another commonly used target-recognition approach is the deep learning model, which describes the image by hierarchical feature representations. Mainstream deep learning networks include the restricted Boltzmann machine, the deep belief network, the autoencoder, the convolutional neural network, biological models, and so on. We tested the proposed M3CE-CEc, designing different convolutional neural networks for the different datasets. The experimental settings are as follows: the weight parameters are initialized randomly, the bias parameters are set to constants, the base learning rate is set to 0.01, and the momentum term is set to 0.9. During training, when the error rate no longer decreases, the learning rate is multiplied by 0.1.
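A sketch of this training schedule in Keras, with the plateau patience chosen as an assumption:

```python
# Sketch of the training schedule described above: SGD with base learning rate
# 0.01 and momentum 0.9, multiplying the learning rate by 0.1 when the error
# rate (tracked here as validation accuracy) stops improving. The patience
# value is an assumption; the paper does not state it.
from tensorflow import keras

optimizer = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
plateau = keras.callbacks.ReduceLROnPlateau(
    monitor="val_accuracy",  # error rate no longer decreasing
    mode="max",
    factor=0.1,              # multiply the learning rate by 0.1
    patience=3,              # epochs to wait before reducing
)
# model.compile(optimizer=optimizer, loss="categorical_crossentropy",
#               metrics=["accuracy"])
# model.fit(x_train, y_train, validation_split=0.1, callbacks=[plateau])
```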

3.3 Image classification and modeling based on the deep convolutional neural network

The following is a model for image classification based on deep convolutional neural networks.

Input: the input is a collection of N images, each labeled with one of K classification labels. This set is called the training set.

Learning: the task of this step is to use the training set to learn what each class looks like. This step is generally called training a classifier or learning a model.

Evaluation: the classifier is asked to predict classification labels for images it has never seen before, and we evaluate its quality by comparing the labels it predicts with the true labels of the images. The more predictions that agree with the true classification labels, the better.

3.4 Evaluation index

In this paper, the image recognition effect is evaluated in three parts: overall classification accuracy, classification accuracy for the different categories, and classification time. Assuming that \( {n}_{ij} \) denotes the number of images of category i classified into category j, the overall classification accuracy is:

\( \mathrm{OA}=\frac{\sum \limits_i{n}_{ii}}{\sum \limits_i\sum \limits_j{n}_{ij}} \)

The accuracy of each class is:

\( {A}_i=\frac{n_{ii}}{\sum \limits_j{n}_{ij}} \)

Run time is the average time from reading a picture to obtaining its classification result.
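These two indices can be computed directly from a confusion matrix; a NumPy sketch with a made-up 3 × 3 matrix:

```python
# NumPy implementation of the evaluation indices above, where n[i, j] counts
# images of category i that were classified into category j.
import numpy as np

def overall_accuracy(n):
    return np.trace(n) / n.sum()          # sum of n_ii over all images

def per_class_accuracy(n):
    return np.diag(n) / n.sum(axis=1)     # n_ii over each row total

# Example with a made-up 3 x 3 confusion matrix.
n = np.array([[82, 10,  8],
              [ 9, 84,  7],
              [11,  8, 81]])
print(overall_accuracy(n))     # 0.8233...
print(per_class_accuracy(n))   # [0.82, 0.84, 0.81]
```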

4 Discussion

4.1 Comparison of classification effects of different loss functions

We compare the traditional logistic loss function with our proposed max-margin loss function. It can be clearly seen that the value of the loss function increases with the severity of the misclassification, which indicates that the loss function effectively expresses the degree of classification error.

4.2 Comparison of recognition rates between categories

Classification      Bicycle   Car    Bus    Motor   Flower
Recognition rate    0.82      0.84   0.81   0.80    0.85

4.3 Comparison of recognition rates among different species

As can be seen from the table above, the recognition rate of this method is generally similar across the different categories, reaching more than 80%; the accuracy is relatively high when classifying clearly defined images such as cars. This may be because clearly defined images have greater advantages in feature extraction.

4.4 Time-consumption comparison of the SVM, KNN, BP, and CNN methods

On the premise of extracting features with the same M3CE-constructed loss function, the choice of classifier is the key factor affecting classification accuracy. Therefore, this part discusses the influence of different classifiers on classification accuracy (Table 1). The table summarizes the influence of several common classifiers: the linear-kernel support vector machine (SVM-Linear), the Gaussian-kernel support vector machine (SVM-RBF), Naive Bayes (NB), k-nearest neighbor (KNN), random forest (RF), decision tree (DT), and the gradient boosting decision tree (GBDT).

The experimental results show that the accuracy of the CNN classifier is higher than that of the other classifiers on both the training set and the test set. Although DT is the fastest classifier in the comparison experiment, its accuracy on the test set is only 69.47%, which is unacceptable. From the classifier comparison experiment the following conclusion can be drawn: compared with the other six common classifiers, CNN has the highest accuracy, and its cost of 6 s is acceptable among the seven classifiers compared.

Note also that because each test image would need to be compared with all stored training images, such instance-based methods take up a lot of storage space, consume considerable computing resources, and take a long time to evaluate. In practice, we care far more about testing efficiency than training efficiency. The convolutional neural network reaches the other extreme in this trade-off: although training takes a long time, once training is complete, the classification of new test data is very fast. Such a model meets the requirements of actual use.

5 Conclusions

Deep convolutional neural networks are used to recognize images invariantly under scaling, translation, and other forms of distortion. To avoid explicit feature extraction, the convolutional network uses feature-detection layers to learn implicitly from the training data, and because of the weight-sharing mechanism, neurons on the same feature map share the same weights. The network can therefore extract features in parallel, and its parameter count and computational complexity are clearly smaller than those of a traditional neural network; its layout is also closer to an actual biological neural network. Weight sharing greatly reduces the complexity of the network structure, and the ability to take multi-dimensional images directly as network input effectively avoids the complexity of data reconstruction during feature extraction and image classification. The deep convolutional neural network has incomparable advantages in image feature representation and classification. However, many researchers still regard the deep convolutional neural network as a black-box feature extraction model. To explore the connection between each layer of the deep convolutional neural network and the visual nervous system of the human brain, and to make deep neural networks learn incrementally, as human beings do, compensating through learning and increasing their understanding of the details of target objects, further research is needed.

Abbreviations

ANN: Artificial neural network

BP: Backpropagation

CNN-NB: Convolutional neural network and Naive Bayes

CNN: Convolutional neural network

MLP: Multilayer perceptron

OI: Omnidirectional image

VFSR: Very fine spatial resolution

VR: Virtual reality



Acknowledgements

The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.

About the author

Xin Mingyuan was born in Heihe, Heilongjiang, P.R. China, in 1983. She received her Master's degree from Harbin University of Science and Technology, P.R. China. She now works in the School of Computer and Information Engineering, Heihe University. Her research interests include artificial intelligence, data mining, and information security.

Wang Yong was born in Suihua, Heilongjiang, P.R. China, in 1979. She received her Master's degree from Qiqihar University, P.R. China. She now works at Heihe University. Her research interests include artificial intelligence and education information management.

This work was supported by the University Nursing Program for Young Scholars with Creative Talents in Heilongjiang Province (No. UNPYSCT-2017104) and by the Scientific Research Items of Basic Research Business of Provincial Higher Education Institutions of the Heilongjiang Provincial Department of Education (No. 2017-KYYWF-0353).

Availability of data and materials

Please contact author for data requests.

Author information

Authors and Affiliations

School of Computer and Information Engineering, Heihe University, No. 1 Xueyuan Road education science and technology zone, Heihe, Heilongjiang, China

Mingyuan Xin

Heihe University, No. 1 Xueyuan Road education science and technology zone, Heihe, Heilongjiang, China

Yong Wang

Contributions

All authors took part in the discussion of the work described in this paper. XM wrote the first version of the paper. XM and WY performed part of the experiments. XM and WY revised different versions of the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yong Wang .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article.

Xin, M., Wang, Y. Research on image classification model based on deep convolution neural network. J Image Video Proc. 2019 , 40 (2019). https://doi.org/10.1186/s13640-019-0417-8


Received : 17 October 2018

Accepted : 07 January 2019

Published : 11 February 2019

DOI : https://doi.org/10.1186/s13640-019-0417-8


  • Convolution neural network
  • Image classification



Computer Science > Computer Vision and Pattern Recognition

Title: Image Classification with Classic and Deep Learning Techniques

Abstract: To classify images based on their content is one of the most studied topics in the field of computer vision. Nowadays, this problem can be addressed using modern techniques such as Convolutional Neural Networks (CNN), but over the years different classical methods have been developed. In this report, we implement an image classifier using both classic computer vision and deep learning techniques. Specifically, we study the performance of a Bag of Visual Words classifier using Support Vector Machines, a Multilayer Perceptron, an existing architecture named InceptionV3 and our own CNN, TinyNet, designed from scratch. We evaluate each of the cases in terms of accuracy and loss, and we obtain results that vary between 0.6 and 0.96 depending on the model and configuration used.
Subjects: Computer Vision and Pattern Recognition (cs.CV)


Remote Sensing Image Classification Using CNN-LSTM Model

Manthena Narasimha Raju *  |  Kumaran Natarajan  |  Chandra Sekhar Vasamsetty 

© 2022 IIETA. This article is published by IIETA and is licensed under the CC BY 4.0 license ( http://creativecommons.org/licenses/by/4.0/ ).

OPEN ACCESS

The image classification of remote sensing (RS) plays a significant role in earth observation technology using RS data, extensively used in the military and civic sectors. However, RS image classification confronts substantial scientific and practical difficulties because of features of RS data such as high dimensionality and the relatively limited quantities of labeled examples accessible. In recent years, as new methods of deep learning (DL) have emerged, RS image classification approaches using DL have made significant advances, providing new possibilities for RS image classification research and development. Most researchers use a CNN to classify remote sensing images, but a CNN alone has problems with sequence data processing, which limits what can be drawn from the classification of remote sensing images. To avoid this, in this paper we use a CNN-LSTM model. The experimental results show that the proposed model is effective in classifying remote sensing images.

remote sensing, deep learning, CNN, LSTM, image classification, SIRI-WHU data

Remote sensing scene classification [1] has attracted considerable interest, and data from remote sensing images are crucial for understanding the Earth's surface. The problem arises in a number of areas, such as catastrophe monitoring, vegetation mapping, land resource management, urban planning, and traffic. Many remote sensors are gathering data with unique characteristics due to recent advancements in Earth observation technology, including remote sensing [2-6] technologies. It becomes difficult or perhaps impossible to manually analyze the data after it has been gathered, since it is so vast and complicated. For example, remote sensors frequently deliver data that are multi-source, multi-temporal, and multi-scale in nature.

In contrast, manually exploring the data and extracting valuable information from it would be excessively time-consuming, and performance would suffer. Therefore, the remote sensing research community has concentrated its efforts in recent years on developing efficient techniques for processing remote sensing pictures [7-9] in combination with physics. Many scholars are interested in remote sensing, and there has been considerable development in this area: image processing methods for enhancement, analysis, and comprehension are advancing quickly. It is well known that numerous difficulties remain in remote sensing, which stimulates new efforts and innovations to better comprehend remote sensing pictures via image processing.

Most prior approaches use color and form, or mid-level holistic picture representations [10-11] created by encoding hand-crafted visual characteristics. Computer vision has lately been transformed by deep CNNs (DCNNs), which have made significant advances in domains like picture classification [12], object detection/segmentation [13-15], and action identification [16-19]. DCNNs are neural networks that learn directly from data. The use of deep learning techniques in satellite image analysis, such as aerial scene classification [20] and hyperspectral picture analysis [21-25], has been similarly successful. As a general rule, a DCNN takes a fixed-size picture as input and processes it through a sequence of convolution, local normalization, and pooling operations (termed layers), followed by fully connected (FC) final layers. These in-depth features may be utilized for several vision-related applications [26], including the categorization of remote sensing scenes with a CNN [27]. Deep convolutional neural networks are commonly trained on the vast ImageNet dataset, a collection of RGB images, and most current techniques use these ImageNet-pretrained CNNs for feature extraction when categorizing remote sensing scenes. Studying various color spaces and integrating them for remote sensing scene classification remains an open research question. He et al. [28] investigated the use of several color spaces for vehicle color identification; Tang et al. [29] looked at the usage of YCbCr and RGB color channels for image super-resolution and face recognition; and Tang et al. [30] proposed a collaborative facial color feature learning approach covering a variety of color spaces. This study investigates a variety of color properties within a deep learning framework for the classification of remote sensing scenes.

A lot of research was done before deep learning on the effects of different color features on object recognition and detection. When a convolutional neural network (CNN) is combined with a long short-term memory network (LSTM) [31], a sequential data classification system is created.

Because of this, most remote sensing scene classification techniques utilize a DCNN [32-36] that has previously been trained to identify an image. This approach, however, runs into the built-in problem of generating a high-dimensional final image representation when combining activations from several deep color CNNs. An effective classification system may be created by combining the properties of CNNs with those of remote sensing images. The SIRI-WHU data collection is used throughout this paper. The rest of the article is organized as follows: Section 2 reviews the findings of existing models, Section 3 presents the suggested model, Section 4 details the experiments, and Section 5 gives a summary.

Inspired by the capacity of human vision to identify items based on highlights that draw the viewer's attention towards an object while disregarding the backdrop, salient object identification techniques are used to discover salient things. A salient model must capture attention on the objects to be grasped and complete their segmentation [18]. Top-down and bottom-up techniques are both used for salient object identification. The bottom-up approach focuses on distinguishing between things in the background and those in the foreground of visual scenes. Top-down methods, on the other hand, emphasize items unique to a specific category within visual sceneries.

According to Zhang et al. [19], there are two components to the salient object identification model: a patch-level cue exploration model and an object-level cue exploration model. As an initial stage, the objectness approach is used to identify the coarsely localized positions of the dominant feature of the image. Variance can be used to estimate how compactly the colors are dispersed in a space. The model performed well on photos with a plainer background, but for pictures where a conspicuous object and its surroundings share a similar shade of color, the algorithm does not perform as well. As a result, a method must be capable of extracting the regions and objects that are distinct from one another in the image to enhance salient object detection.

Deep learning was utilized in Ref. [20] to concentrate on a layered skip structure, which was previously unexplored. The authors developed a novel technique based on the holistically-nested edge detection (HED) architecture, investigating short connections in the skip layers for salient object detection. The VGGNet model and the HED model served as the basis for their proposed design. They combined characteristics from the deep side outputs (salient regions) and the shallow side outputs (low-level features) to get the best results. The architecture comprises two interconnected phases: the salient locating stage and the details refining stage. After the salient stage has identified salient areas in the picture, a top-to-bottom technique is introduced in the next step. Creating short connections between the two levels is necessary to better forecast the salient items. This results in an accurate and dense saliency map, since the characteristics of both levels may be utilized to improve the prediction of the salient objects.

In an image, a method known as objectness detection creates many bounding boxes for every possible object without considering the item's category. The goal of objectness is to provide a metric that may be used to generate candidate proposals for consideration; a confidence score determines a proposal's inclusion or exclusion of an item. Two kinds of deep network object identification frameworks exist: region-free and region-based methods. The success of both region-free and region-based techniques [19] led to a methodology [20] that integrates the best features of both. Several factors went into this, but the two most significant were multiscale localization and negative sample mining. In the case of multiscale localization, objects may be discovered at any place in the image, so all locations must be considered while performing object detection; it is therefore recommended to utilize a reverse connection so that objects may be recognized on the appropriate convolutional feature maps. An objectness prior phase is applied to the convolutional feature maps during training, which helps reduce the search time for objects by reducing the object search space. The reverse connection with objectness prior networks architecture may be used to identify objects end-to-end with high precision. Convolutional layers are used to gather semantic information, which is then used in conjunction with reverse connections to create an objectness prior, which serves as a roadmap for searching for objects inside an image. Finally, a multitask loss function is used to complete the optimization process.

Data augmentation and hard negative mining methods have been demonstrated to help increase item detection accuracy [21]. There is a growing need for rapid object identification, such as detecting moving cars, requiring fewer candidate windows so as to avoid exhaustively searching with large sliding windows. One study suggested that the quality of blind proposals must be improved by utilizing Deep Objectness Representation and Local Linear Regression (DORLLR) together with Intersection-over-Union (IoU).

When employed in real-time object detection [22], this may be utilized to evaluate the quality of a sliding window that generates suggestions for the object detection task. Many attempts have been made to enhance the quality of the proposals via blind estimates, and these efforts have proven fruitful. There are two approaches to blind proposal quality evaluation. First, because the foreground regions are believed to carry more information than the background areas, blind proposal quality is regarded as a background and foreground segmentation problem: the segmentation method differentiates the proposal quality of the background from that of the foreground. The second technique looks at the scores and ranks of the window function rather than scores and rankings based on particular visual cues. Based on these factors, a blind proposal quality assessment (BPQA) model has been created to choose a greater number of proposals. Both deep objectness representation and local linear regression are used during training: CNN-based feature extraction is utilized to capture the deep objectness representation, and the local linear regression model is used to estimate the quality of each recommendation.

In research [23], a hierarchical objectness network model was developed, which can perform object identification and proposal creation, among other things. The authors considered the most important aspects of object identification, such as accuracy, multi-scale handling, and computing cost, while developing their model. The model operates in three stages, with the CNN extracting the features from the picture in the first stage. In the last step, the stripe objectness is used to cut down the list of potential recommendations. It predicts a saliency map that may be used to search for objects. The objectness stripes offer border objectness and in-out objectness, both contributing to the model's accuracy. They provide an object border and a confidence score for the suggested placement locations. Vertical and horizontal stripes are added to the proposal to show the probability of the object border, or the object itself, appearing in the vertical and horizontal lines. To get high-level semantic information, it is essential to reverse the sequence of the deep and shallow convolutional layers. As a result of these qualities, a wide variety of resolution information is available for the objects being viewed. Using just one saliency map, less memory is utilized and less computation time is required.

Shen et al. [35] proposed a semi-supervised convolutional long short-term memory neural network (SemiLSTM) for time-series remote sensing pictures, verified on three data sets with various time distributions. By using a limited number of labelled samples in conjunction with a large number of unlabeled samples, it is possible to accomplish accurate and automatic land cover categorization. Aside from that, it is a very reliable classification technique for time-series optical pictures with cloud cover, which minimises the need for cloudless remote sensing images and may be used broadly in places that are often hidden by clouds, such as subtropical areas.

Unnikrishnan et al. [36] proposed a two-band AlexNet architecture with a decreased number of filters to train the model in their study, and high-level features derived from the tested model were able to correctly categorise the various land cover classes available in the dataset. A comparison is made between the suggested architecture and a benchmark, and the outcomes are estimated in terms of accuracy, precision, and the total number of trainable parameters.

Most of the existing literature concentrates on the classification of remote sensing images using different deep learning models. When the CNN model alone is used for classification, it has the drawback of exploiting only a few features; considering all features gives better classification results.

The full method for detecting things in a scene is shown in Figure 1, which is broken into several phases. Raw remote sensing images were submitted to the preprocessing pipeline before they could be processed in the final processing pipeline. Data resizing, shuffling, and normalisation operations were performed on the data in the preprocessing channel. Following that, the preprocessed data set is separated into two parts: a training set and a set of testing instances. Using the training data, we trained the CNN and CNN-LSTM models and estimated the accuracy and loss for each phase of training. The system's performance was evaluated using measures such as sensitivity, accuracy, AUC based on ROC, confusion matrix, and F1-score to establish its effectiveness.


Figure 1. Proposed model architecture
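The preprocessing channel in Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' code: the random arrays stand in for the actual SIRI-WHU images, and the `load_siri_whu_images` loader indicated in a comment is hypothetical.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

def preprocess(images, labels, size=(200, 200)):
    images = tf.image.resize(images, size).numpy()  # resizing
    images = images / 255.0                         # normalisation
    idx = np.random.permutation(len(images))        # shuffling
    return images[idx], labels[idx]

# images, labels = load_siri_whu_images()  # hypothetical loader
images = np.random.randint(0, 256, (100, 256, 256, 3)).astype("float32")
labels = np.random.randint(0, 12, 100)

images, labels = preprocess(images, labels)
x_train, x_test, y_train, y_test = train_test_split(images, labels, test_size=0.2)
```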

3.1 CNN model

Backpropagation is the method by which all the parameters of a convolutional neural network, its weights and biases, are trained. Here is a quick rundown of the algorithm. The cost function for a single training example (e, f) may be expressed as follows:

$J(Q, \theta ; e, f)=\frac{1}{2}\left\|h_{Q, \theta}(e)-f\right\|^{2}$

Now, for the error term A propagated to layer P, the equation is given as:

$A^{(P)}=\left(\left(Q^{(P)}\right)^{B} A^{(P+1)}\right) \cdot i^{\prime}\left(a^{(P)}\right)$

where $A^{(P+1)}$ is the error of the (P+1)-th layer, $J(Q, \theta ; e, f)$ is the cost function, and $i^{\prime}\left(a^{(P)}\right)$ denotes the derivative of the activation function.

$\nabla y^{(P)} J(Q, \theta ; e, f)=A^{(P+1)}\left(j^{(P+1)}\right)^{B}$

$\nabla \theta^{(P)} J(Q, \theta ; e, f)=A^{(P+1)}$

where $i$ denotes the layer information, such that $i^{(1)}$ is the information for the first layer (the actual input) and $i^{(L)}$ is the information for the L-th layer.

The error of the sub-sampling (pooling) layer is calculated as:

$\Delta s^{(P)}=\operatorname{upsample}\left(\left(W_{S}^{(L)}\right)^{B} A_{K}^{(P+1)}\right) \cdot h^{\prime}\left(a_{S}^{(P)}\right)$

In this case, q represents the number of filters in the layer. If mean pooling is utilized, the error must be cascaded back in the opposite direction through the subsampling layer: with mean pooling, upsampling shares the error equally among the preceding input units. Finally, the gradient with respect to the feature maps is:

$\nabla y_{m}^{(P)} J(Q, \theta ; e, f)=\sum_{b}\left(i_{b}^{(P)}\right) * \operatorname{rot90}\left(A_{m}^{(P+1)}, 2\right)$

$\nabla \theta_{m}^{(P)} J(Q, \theta ; e, f)=\sum_{i, j}\left(A_{s}^{(P+1)}\right)_{i, j}$

Backpropagation Algorithm in CNN

  • The weights are initialized to small randomly generated values.
  • The learning rate is set to a small positive value.
  • The value of r is set to 1, and iteration begins.
  • while r < the maximum number of iterations and the cost-function criterion is not met, do
  • for each training example n_1 to n_i, do
  • Propagation is carried forward through the CL, PL, and FCL layers.
  • The cost function is derived for the input.
  • The error term $A^{(P)}$ is computed with respect to the weights of each layer.
  • The error is propagated from one layer to another in the sequence given below:
  • FC layer, where FC = fully connected
  • P layer, where P = pooling
  • C layer, where C = convolution
  • Now, calculate the gradients $\nabla y_{s}^{(P)}$ and $\nabla \theta_{s}^{(P)}$ for the weights and biases, respectively, of each layer.
  • The gradient is calculated in the sequence given below:
  • i. C layer
  • ii. P layer
  • iii. FC layer
  • Now, update the weights
  • $w_{d j}^{(P)} \leftarrow w_{d j}^{(P)}+\nabla w_{d j}^{(P)}$  
  • Update bias
  • $\theta_{d}{ }^{(P)} \leftarrow \theta_{d}^{(P)}+\nabla \theta_{d}^{(P)}$
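As a concrete illustration of these steps, the following is a minimal NumPy sketch of forward propagation, error computation, gradient calculation, and weight updates for a single convolution layer with a sigmoid activation. The input size, kernel size, learning rate, and regression target are all illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))        # input feature map
w = rng.standard_normal((3, 3)) * 0.1  # convolution kernel
b = 0.0
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

f = 0.5 * np.ones((6, 6))  # toy regression target

for step in range(200):
    # Forward propagation: valid cross-correlation plus a sigmoid activation.
    a = correlate2d(x, w, mode="valid") + b
    h = sigmoid(a)

    # Cost J = 0.5 * ||h - f||^2 and its error term A^(P) = dJ/da.
    delta = (h - f) * h * (1.0 - h)

    # Gradients: correlate the input with the error map (the rot90/convolution
    # identity in the text), and sum the error map for the bias.
    grad_w = correlate2d(x, delta, mode="valid")
    grad_b = delta.sum()

    # Gradient-descent updates (the "update the weights" / "update bias" steps).
    w -= lr * grad_w
    b -= lr * grad_b

final_cost = 0.5 * np.sum((sigmoid(correlate2d(x, w, mode="valid") + b) - f) ** 2)
print("final cost:", final_cost)
```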

3.2 LSTM model

The following are the primary components of the LSTM unit:

  • First, the LSTM unit accepts the current input vector, denoted $r_b$, and the output vector $i_{b-1}$ from the previous time step (obtained via the recurrent edges). The weighted inputs are added together and passed through a tanh activation, which results in $a_b$.

$a_{b}=\tanh \left(X^{a} r_{b}+D^{a} i_{b-1}+d_{a}\right)$

  • The input gate reads $r_b$ and $i_{b-1}$, computes the weighted total, and applies a sigmoid activation to it. The result is multiplied by $a_b$, gating the input that streams into the memory.

$j_{b}=\sigma\left(X^{j} r_{b}+D^{j} i_{b-1}+d^{j}\right)$

Through this process, the LSTM learns to reset the contents of its memory when they become obsolete and no longer serve a useful purpose, so the network can start processing a new set of inputs from scratch. The forget gate applies a sigmoid activation to the weighted sum of $r_b$ and $i_{b-1}$; multiplying the result by the cell state of the previous time step allows unnecessary memory content to be deleted.

$v_{b}=\sigma\left(X^{v} r_{b}+D^{v} i_{b-1}+d_{v}\right)$

At the core of the memory cell is the constant error carousel (CEC), a recurrent edge with unit weight. By eliminating the unnecessary information (if any) from the preceding time step and accepting correct information (if any) from the present input, the current cell state $h_b$ can be computed:

$h_{b}=a_{b} \odot j_{b}+h_{b-1} \odot v_{b}$

Output gate: the LSTM unit's output gate takes the weighted sum of $r_b$ and $i_{b-1}$ and uses a sigmoid activation to coordinate the data sent out from the LSTM.

$u_{b}=\sigma\left(X^{u} r_{b}+D^{u} i_{b-1}+d^{u}\right)$

Output: to calculate the output of the LSTM unit ($i_b$), the cell state $h_b$ is passed through a tanh nonlinearity and multiplied by the output gate $u_b$. The operation of the LSTM unit thus ends with the following equation:

$i_{b}=\tanh \left(h_{b}\right) \bigodot u_{b}$
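The gate equations above can be collected into a single time-step function. The following NumPy sketch uses the document's symbols ($r_b$ input, $i_b$ output, $h_b$ cell state, and $a$/$j$/$v$/$u$ for the candidate, input, forget, and output gates); the weight shapes and random initialization are illustrative assumptions.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_step(r_b, i_prev, h_prev, p):
    """One LSTM time step following the equations in the text."""
    a_b = np.tanh(p["Xa"] @ r_b + p["Da"] @ i_prev + p["da"])  # candidate input
    j_b = sigmoid(p["Xj"] @ r_b + p["Dj"] @ i_prev + p["dj"])  # input gate
    v_b = sigmoid(p["Xv"] @ r_b + p["Dv"] @ i_prev + p["dv"])  # forget gate
    u_b = sigmoid(p["Xu"] @ r_b + p["Du"] @ i_prev + p["du"])  # output gate
    h_b = a_b * j_b + h_prev * v_b  # new cell state
    i_b = np.tanh(h_b) * u_b        # new output
    return i_b, h_b

# Tiny demo with random weights: 4-dimensional input, 3-dimensional state.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
p = {}
for g in "ajvu":
    p["X" + g] = rng.standard_normal((n_hid, n_in))
    p["D" + g] = rng.standard_normal((n_hid, n_hid))
    p["d" + g] = np.zeros(n_hid)

i_b, h_b = np.zeros(n_hid), np.zeros(n_hid)
for r_b in rng.standard_normal((5, n_in)):  # a sequence of 5 time steps
    i_b, h_b = lstm_step(r_b, i_b, h_b, p)
print("final output:", i_b)
```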

(1) Input data: the data needed for CNN-LSTM training is entered first.

(2) Data standardization: because the data span a wide range of magnitudes, the z-score standardization technique is used to normalize the input data and improve the model's training performance. The formulas for this method are as follows:

$k_{j}=\frac{r_{j}-\bar{r}}{h}$

$r_{j}=k_{j} h+\bar{r}$

where $k_{j}$ is the standardized value, $r_{j}$ the input datum, $\bar{r}$ the mean of the input data, and $h$ its standard deviation.

(3) The biases and weights of each layer of the CNN-LSTM should be set to their initial values.

(4) A succession of feature extraction layers is applied to the input data before being transferred to the final convolution and pooling layers.

(5) An LSTM then processes the output data of the CNN layers, which is used to determine the output value.

(6) The value computed by the output layer is compared with the actual value of the group data, and the error is determined.

(7) The error is obtained from this comparison between the computed output value and the actual value of the group.

(8) Completion check: if a completion criterion is met (a set number of cycles has been completed, the weight change has fallen below a specific threshold, or the forecasting error rate has fallen below a set point), training is complete, the CNN-LSTM network is updated, and the procedure moves to step 10; otherwise the computed errors are backpropagated and the procedure continues to step 9.

(9) The biases and weights of every layer are updated, then go to step 4 to continue training the network.

(10) The forecasting model is saved.

(11) Input data: enter the data that will be used in the forecasting process.

(12) Standardization of input data: the input data is standardized by formula (8).

(13) To forecast, feed the standardized data into the trained CNN-LSTM model to obtain a predicted value.

(14) The CNN-LSTM model generates a standardized value as output, which is subsequently returned to its original scale using formula (9), where $h$ is the standard deviation and $\bar{r}$ the average value of the input data.

(15) Finalise the forecasting process by presenting the corrected results.
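Putting steps (1) through (10) together, the following is a hedged Keras sketch of the CNN-LSTM training pipeline, assuming each image is split into a sequence of horizontal strips so that a per-step CNN feeds an LSTM. The layer sizes, strip shape, and random stand-in data are illustrative assumptions rather than the authors' configuration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 12

# Stand-in data: 32 samples, each a sequence of 10 strips of shape 20x200x3.
x_train = np.random.rand(32, 10, 20, 200, 3).astype("float32")
y_train = keras.utils.to_categorical(np.random.randint(0, num_classes, 32), num_classes)

# Step (2): z-score standardization of the inputs.
mean, std = x_train.mean(), x_train.std()
x_train = (x_train - mean) / std

# Steps (3)-(5): a CNN feature extractor applied per time step, then an LSTM.
cnn = keras.Sequential([
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
])
model = keras.Sequential([
    layers.TimeDistributed(cnn, input_shape=x_train.shape[1:]),
    layers.LSTM(64),
    layers.Dense(num_classes, activation="softmax"),
])

# Steps (6)-(9): error computation, backpropagation, and weight updates all
# happen inside fit(); step (10) saves the trained model.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=8)
model.save("cnn_lstm_model.keras")
```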

The RSIDEA (Remote Sensing Intelligent Data Retrieval, Interpretation, and Application) group has compiled a collection of Google images of China's major cities.

In all, SIRI-WHU contains 2,400 pictures covering 12 scene classes, with 200 images per class at 200 × 200 pixels and a spatial resolution of 2 m [29]. The 12 land-use classes of the SIRI-WHU Google image dataset are: (a) water; (b) river; (c) residential; (d) pond; (e) park; (f) overpass; (g) meadow; (h) industrial; (i) idle land; (j) harbor [30]; (k) commercial; (l) agriculture. Figure 2 shows sample images from the SIRI-WHU data set.


Figure 2. Images from SIRI-WHU data set

In a binary classification problem, a prediction can be a true positive (e), a true negative (f), a false positive (g), or a false negative (h): e counts cases that are positive and predicted positive, f cases that are negative and predicted negative, g cases that are negative but predicted positive, and h cases that are positive but predicted negative.

The most straightforward way to assess classification performance is accuracy: the ratio of the number of correctly predicted cases to the total number of predicted instances. Using the terminology just introduced, accuracy may be calculated with the equation given below.

Accuracy is easy to understand, and it is used for both binary and multiple-class classification problems. However, accuracy could give an unfair representation of classification performance in imbalanced data sets. For example, in a binary classification problem where 90% of the samples are of the same class, simply assigning all cases to that class would already achieve an accuracy of 90%.

Therefore, we introduce three other metrics to assess per-class classification performance: precision, recall, and the F1-score. Precision indicates how many of the positively predicted cases are correct, and recall expresses the fraction of all positive cases that are correctly predicted. These metrics are combined in the F1 metric, the harmonic mean of precision and recall. The accuracy, sensitivity (recall), specificity, and F1-score are obtained respectively by:

Accuracy= (e+f)/(e+f+g+h)

Sensitivity= e/(e+h)

Specificity=f/(f+g)

F1-score=(2*e)/(2*e+g+h)
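For clarity, the four formulas can be computed directly from the confusion-matrix counts. In this sketch, e, f, g, and h follow the convention above (true positives, true negatives, false positives, false negatives), and the counts are made-up example numbers.

```python
# e, f, g, h = true positives, true negatives, false positives, false negatives.
e, f, g, h = 90, 80, 10, 20  # made-up example counts

accuracy = (e + f) / (e + f + g + h)
sensitivity = e / (e + h)    # recall
specificity = f / (f + g)
f1_score = (2 * e) / (2 * e + g + h)

print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} "
      f"specificity={specificity:.3f} f1={f1_score:.3f}")
```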


Figure 3. Accuracy

Accuracy measures how many data points are correctly predicted out of all the data: the number of true positives and true negatives divided by the total number of cases, which also counts the false positives and false negatives. Figure 3 shows the accuracy of identifying objects in the SIRI-WHU data set, where the proposed CNN-LSTM model and the current models are compared. As Figure 3 shows, the presented model detects the items more accurately than the current ones.


Figure 4. Precision

In pattern recognition (and machine learning generally), precision (also known as positive predictive value) is the fraction of relevant examples among the retrieved instances, while recall (sensitivity) is the fraction of all relevant instances that are retrieved. The precision of objects detected in the SIRI-WHU data collection is shown in Figure 4, where the suggested and existing models are compared. The suggested model outperforms the current ones in terms of precision.


Figure 5. Recall

In statistics, a model's capacity to recognize every relevant instance in a dataset is known as recall. Recall is defined as the number of true positives divided by the sum of true positives and false negatives. Figure 5 shows the recall of identifying objects in the SIRI-WHU data set for the proposed and current models.


Figure 6. F-Score

The F-measure is the harmonic mean of precision and recall: a single score that can be used to assess a model's output while balancing precision and recall. The F1-score for identifying objects in the SIRI-WHU data set is shown in Figure 6, where the proposed and current models are compared.

We investigated the effect of colour in a CNN-LSTM deep learning system used to classify remote sensing images. Combining deep colour features carrying varied levels of information provides more efficient remote sensing scene categorization by expanding the number of categories available. The high dimensionality of deep colour feature fusion was also addressed, so that a dense final picture description was achieved without significant degradation on the five challenging remote sensing scene classification datasets used to evaluate the performance of our technique. We believe the described strategy has proven effective.

[1] Panetta, K. (2017). Gartner top 10 strategic technology trends for 2018 - Smarter with Gartner. Gartner, Inc. https://www.gartner.com.
[2] Zhao, Z.Q., Zheng, P., Xu, S.T., Wu, X.D. (2019). Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems, 30(11): 3212-3232. https://doi.org/10.1109/TNNLS.2018.2876865
[3] Huang, T. (1996). Computer vision: Evolution and promise. 19th CERN Sch. Comput., pp. 21-25.
[4] Kamate, S., Yilmazer, N. (2015). Application of object detection and tracking techniques for unmanned aerial vehicles. Procedia Computer Science, 61: 436-441. https://doi.org/10.1016/j.procs.2015.09.183
[5] Pathak, A.R., Pandey, M., Rautaray, S. (2018). Application of deep learning for object detection. Procedia Computer Science, 132: 1706-1717. https://doi.org/10.1016/j.procs.2018.05.144
[6] Ouyang, W., Wang, X., Zeng, X., et al. (2015). DeepID-Net: Deformable deep convolutional neural networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2403-2412. https://doi.org/10.1109/TPAMI.2016.2587642
[7] Pittaras, N., Markatopoulou, F., Mezaris, V., Patras, I. (2017). Comparison of fine-tuning and extension strategies for deep convolutional neural networks. In International Conference on Multimedia Modeling, pp. 102-114. https://doi.org/10.1007/978-3-319-51811-4_9
[8] TensorFlow. (2018). Available: https://www.tensorflow.org/.
[9] Keras. (2018). Available: https://keras.io/.
[10] Microsoft Cognitive Toolkit. (2018). Available: https://www.microsoft.com/en-us/cognitive-toolkit/.
[11] PyTorch. (2018). Available: https://pytorch.org/about/.
[12] Unsupervised Feature Learning and Deep Learning Tutorial. (2018). Available: http://ufldl.stanford.edu/tutorial/supervised/FeatureExtractionUsingConvolution/.
[13] Liang, M., Hu, X. (2015). Recurrent convolutional neural network for object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3367-3375. https://doi.org/10.1109/CVPR.2015.7298958
[14] Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1440-1448. https://doi.org/10.1109/ICCV.2015.169
[15] LeCun, Y., Bengio, Y., Hinton, G. (2015). Deep learning. Nature, 521(7553): 436-444. https://doi.org/10.1038/nature14539
[16] Goodfellow, I., Bengio, Y., Courville, A. (2016). Deep Learning. MIT Press.
[17] Han, J., Zhang, D., Cheng, G., Liu, N., Xu, D. (2018). Advanced deep-learning techniques for salient and category-specific object detection: A survey. IEEE Signal Processing Magazine, 35(1): 84-100. https://doi.org/10.1109/MSP.2017.2749125
[18] Borji, A., Cheng, M.M., Jiang, H., Li, J. (2015). Salient object detection: A benchmark. IEEE Transactions on Image Processing, 24(12): 5706-5722. https://doi.org/10.1109/TIP.2015.2487833
[19] Zhang, Q., Lin, J., Li, W., Shi, Y., Cao, G. (2018). Salient object detection via compactness and objectness cues. The Visual Computer, 34(4): 473-489. https://doi.org/10.1007/s00371-017-1354-0
[20] Hou, Q., Cheng, M.M., Hu, X., Borji, A., Tu, Z., Torr, P.H. (2017). Deeply supervised salient object detection with short connections. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3203-3212. https://doi.org/10.1109/CVPR.2017.563
[21] Kong, T., Sun, F., Yao, A., Liu, H., Lu, M., Chen, Y. (2017). RON: Reverse connection with objectness prior networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5936-5944. https://doi.org/10.1109/CVPR.2017.557
[22] Wu, Q., Li, H., Meng, F., Ngan, K.N., Xu, L. (2017). Blind proposal quality assessment via deep objectness representation and local linear regression. In 2017 IEEE International Conference on Multimedia and Expo (ICME), pp. 1482-1487. https://doi.org/10.1109/ICME.2017.8019305
[23] Wang, J., Tao, X., Xu, M., Duan, Y., Lu, J. (2018). Hierarchical objectness network for region proposal generation and object detection. Pattern Recognition, 83: 260-272. https://doi.org/10.1016/j.patcog.2018.05.009
[24] Gopi, A.P., Naga Sravana Jyothi, R., Lakshman Narayana, V., Satya Sandeep, K. (2020). Classification of tweets data based on polarity using improved RBF kernel of SVM. International Journal of Information Technology, 1-16. https://doi.org/10.1007/s41870-019-00409-4
[25] Girshick, R., Donahue, J., Darrell, T., Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580-587. https://doi.org/10.1109/CVPR.2014.81
[26] Erhan, D., Szegedy, C., Toshev, A., Anguelov, D. (2014). Scalable object detection using deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2147-2154. https://doi.org/10.1109/CVPR.2014.276
[27] Kuo, W., Hariharan, B., Malik, J. (2015). DeepBox: Learning objectness with convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2479-2487. https://doi.org/10.1109/ICCV.2015.285
[28] He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778.
[29] Tang, Y., Wang, J., Gao, B., Dellandréa, E., Gaizauskas, R., Chen, L. (2016). Large scale semi-supervised object detection using visual and semantic knowledge transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2119-2128. https://doi.org/10.1109/CVPR.2016.233
[30] Tang, Y., Wang, X., Dellandrea, E., Chen, L. (2016). Weakly supervised learning of deformable part-based models for object detection via region proposals. IEEE Transactions on Multimedia, 19(2): 393-407. https://doi.org/10.1109/TMM.2016.2614862
[31] Ren, S., He, K., Girshick, R., Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 39(6): 1137-1149. https://doi.org/10.1109/TPAMI.2016.2577031
[32] Zhou, P., Ni, B., Geng, C., Hu, J., Xu, Y. (2018). Scale-transferrable object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 528-537. https://doi.org/10.1109/CVPR.2018.00062
[33] Chu, W., Cai, D. (2018). Deep feature based contextual model for object detection. Neurocomputing, 275: 1035-1042. https://doi.org/10.1016/j.neucom.2017.09.048
[34] Fu, K., Gu, I.Y.H., Yang, J. (2018). Spectral salient object detection. Neurocomputing, 275: 788-803. https://doi.org/10.1016/j.neucom.2017.09.028
[35] Shen, J., Tao, C., Qi, J., Wang, H. (2021). Semi-supervised convolutional long short-term memory neural networks for time series land cover classification. Remote Sensing, 13(17): 3504. https://doi.org/10.3390/rs13173504
[36] Unnikrishnan, A., Sowmya, V., Soman, K.P. (2018). Deep AlexNet with reduced number of trainable parameters for satellite image classification. Procedia Computer Science, 143: 931-938. https://doi.org/10.1016/j.procs.2018.10.342




Brain Tumor Detection and Classification Using an Optimized Convolutional Neural Network


1. Introduction

Key Contributions of Our Work

  • An optimized CNN hyperparameter model: The paper presents an advanced CNN hyperparameter model that has been carefully developed to optimize critical parameters in diagnosing brain tumors. The activation function, learning rate, batch size, padding, filter size and number, and pooling layers are just a few of the carefully selected parameters that enhance the model's performance and ability to generalize (a minimal tuning sketch follows this list). The objective is to increase the model's overall diagnostic accuracy and dependability by fine-tuning these hyperparameters.
  • Datasets used: In this study, three publicly available brain MRI datasets sourced from Kaggle were utilized to test and validate the proposed model.
  • Outstanding predictions: The proposed approach demonstrates exceptional results in average precision, recall, and f1-score values of 97% and an accuracy of 97.18% for dataset 1. These outcomes indicate the effectiveness of the optimized CNN model in accurately diagnosing brain tumors.
  • Comparative analysis: The study extensively compares our optimized model with established techniques, affirming the strength and reliability of the findings. The proposed method consistently surpasses these approaches, showcasing its superiority in accuracy and reliability when it comes to diagnosing brain tumors.
  • Practical implications: This model offers medical professionals a more accurate and effective tool to aid their decision-making in diagnosing brain tumors. By enhancing diagnostic accuracy and reliability, the model has the potential to advance medical imaging and improve patient care.
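As referenced in the first bullet above, the following is a minimal Keras sketch of what tuning a few of the listed hyperparameters (filter count, learning rate, batch size) might look like. The grid values, network depth, input shape, and random stand-in data are illustrative assumptions, not the authors' actual search space or model; a library such as KerasTuner could perform the same search more systematically.

```python
import itertools
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Stand-in MRI data: 64 training and 16 validation images, 4 tumor classes.
x_train = np.random.rand(64, 128, 128, 1).astype("float32")
y_train = np.random.randint(0, 4, 64)
x_val = np.random.rand(16, 128, 128, 1).astype("float32")
y_val = np.random.randint(0, 4, 16)

def build_model(filters, lr, num_classes=4, input_shape=(128, 128, 1)):
    model = keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(filters, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(filters * 2, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Grid search over a few hyperparameter combinations.
best = None
for filters, lr, batch in itertools.product([16, 32], [1e-3, 1e-4], [16, 32]):
    model = build_model(filters, lr)
    hist = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                     batch_size=batch, epochs=5, verbose=0)
    acc = max(hist.history["val_accuracy"])
    if best is None or acc > best[0]:
        best = (acc, filters, lr, batch)

print("best val_accuracy=%.3f with filters=%d lr=%g batch=%d" % best)
```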

2. Related Work

3. Materials and Methods

3.1. MRI Dataset

3.2. Pre-Processing

3.3. Hyperparameters of CNN for Training

3.4. Hyperparametric Fine-Tuning of CNN

3.5. Working of Hyperparametric CNN

4.1. Evaluation Criteria

4.2. Applied Model Results


5. Discussion

6. Limitations of the Model and Future Work

7. Conclusions

Author Contributions

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

  • Balakrishnan, R.; Hernández, M.D.C.V.; Farrall, A.J. Automatic segmentation of white matter hyperintensities from brain magnetic resonance images in the era of deep learning and big data—A systematic review. Comput. Med. Imaging Graph. 2021 , 88 , 101867. [ Google Scholar ] [ CrossRef ]
  • Piórkowski, A.; Lasek, J. Evaluation of local thresholding algorithms for segmentation of white matter hyperintensities in magnetic resonance images of the brain. In Proceedings of the Applied Informatics: Fourth International Conference, ICAI 2021, Buenos Aires, Argentina, 28–30 October 2021; Proceedings 4. Springer International Publishing: Cham, Swizterland, 2021; pp. 331–345. [ Google Scholar ]
  • Reyes, D.; Sánchez, J. Performance of convolutional neural networks for the classification of brain tumors using magnetic resonance imaging. Heliyon 2024 , 10 , e25468. [ Google Scholar ] [ CrossRef ]
  • Kurdi, S.Z.; Ali, M.H.; Jaber, M.M.; Saba, T.; Rehman, A.; Damaševičius, R. Brain tumor classification using meta-heuristic optimized convolutional neural networks. J. Pers. Med. 2023 , 13 , 181. [ Google Scholar ] [ CrossRef ]
  • Takahashi, S.; Takahashi, M.; Kinoshita, M.; Miyake, M.; Kawaguchi, R.; Shinojima, N.; Mukasa, A.; Saito, K.; Nagane, M.; Otani, R.; et al. Fine-tuning approach for segmentation of gliomas in brain magnetic resonance images with a machine learning method to normalize image differences among facilities. Cancers 2021 , 13 , 1415. [ Google Scholar ] [ CrossRef ]
  • Saba, T.; Mohamed, A.S.; El-Affendi, M.; Amin, J.; Sharif, M. Brain tumor detection using fusion of hand crafted and deep learning features. Cogn. Syst. Res. 2020 , 59 , 221–230. [ Google Scholar ] [ CrossRef ]
  • Amin, J.; Sharif, M.; Raza, M.; Saba, T.; Anjum, M.A. Brain tumor detection using statistical and machine learning method. Comput. Methods Programs Biomed. 2019 , 177 , 69–79. [ Google Scholar ] [ CrossRef ]
  • Amin, J.; Sharif, M.; Yasmin, M.; Fernandes, S.L. Big data analysis for brain tumor detection: Deep convolutional neural networks. Future Gener. Comput. Syst. 2018 , 87 , 290–297. [ Google Scholar ] [ CrossRef ]
  • Amin, J.; Sharif, M.; Yasmin, M.; Fernandes, S.L. A distinctive approach in brain tumor detection and classification using MRI. Pattern Recognit. Lett. 2020 , 139 , 118–127. [ Google Scholar ] [ CrossRef ]
  • Toğaçar, M.; Ergen, B.; Cömert, Z. BrainMRNet: Brain tumor detection using magnetic resonance images with a novel convolutional neural network model. Med. Hypotheses 2020 , 134 , 109531. [ Google Scholar ] [ CrossRef ]
  • Shakeel, P.M.; Tobely, T.E.E.; Al-Feel, H.; Manogaran, G.; Baskar, S. Neural network based brain tumor detection using wireless infrared imaging sensor. IEEE Access 2019 , 7 , 5577–5588. [ Google Scholar ] [ CrossRef ]
  • Hossain, T.; Shishir, F.S.; Ashraf, M.; Al Nasim, M.A.; Shah, F.M. Brain tumor detection using convolutional neural network. In Proceedings of the 2019 1st international conference on advances in science, engineering and robotics technology (ICASERT), Dhaka, Bangladesh, 3–5 May 2019. [ Google Scholar ]
  • Özyurt, F.; Sert, E.; Avci, E.; Dogantekin, E. Brain tumor detection based on Convolutional Neural Network with neutrosophic expert maximum fuzzy sure entropy. Measurement 2019 , 147 , 106830. [ Google Scholar ] [ CrossRef ]
  • Sajid, S.; Hussain, S.; Sarwar, A. Brain tumor detection and segmentation in MR images using deep learning. Arab. J. Sci. Eng. 2019 , 44 , 9249–9261. [ Google Scholar ] [ CrossRef ]
  • Nazir, M.; Shakil, S.; Khurshid, K. Role of deep learning in brain tumor detection and classification (2015 to 2020): A review. Comput. Med. Imaging Graph. 2021 , 91 , 101940. [ Google Scholar ] [ CrossRef ]
  • Woźniak, M.; Siłka, J.; Wieczorek, M. Deep neural network correlation learning mechanism for CT brain tumor detection. Neural Comput. Appl. 2023 , 35 , 14611–14626. [ Google Scholar ] [ CrossRef ]
  • Majib, M.S.; Rahman, M.; Sazzad, T.M.S.; Khan, N.I.; Dey, S.K. Vgg-scnet: A vgg net-based deep learning framework for brain tumor detection on mri images. IEEE Access 2021 , 9 , 116942–116952. [ Google Scholar ] [ CrossRef ]
  • Alnowami, M.; Taha, E.; Alsebaeai, S.; Anwar, S.M.; Alhawsawi, A. MR image normalization dilemma and the accuracy of brain tumor classification model. J. Radiat. Res. Appl. Sci. 2022 , 15 , 33–39. [ Google Scholar ] [ CrossRef ]
  • Shehab, L.H.; Fahmy, O.M.; Gasser, S.M.; El-Mahallawy, M.S. An efficient brain tumor image segmentation based on deep residual networks (ResNets). J. King Saud Univ.-Eng. Sci. 2021 , 33 , 404–412. [ Google Scholar ] [ CrossRef ]
  • Sankareswaran, S.P.; Krishnan, M. Unsupervised end-to-end brain tumor magnetic resonance image registration using RBCNN: Rigid transformation, B-spline transformation and convolutional neural network. Curr. Med. Imaging 2022 , 18 , 387–397. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Shukla, M.; Sharma, K.K. A comparative study to detect tumor in brain MRI images using clustering algorithms. In Proceedings of the 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bangalore, India, 5–7 March 2020. [ Google Scholar ]
  • Minarno, A.E.; Mandiri, M.H.C.; Munarko, Y.; Hariyady, H. Convolutional neural network with hyperparameter tuning for brain tumor classification. Kinetik Game Technol. Inf. Syst. Comput. Netw. Comput. Electron. Control 2021 , 6 . [ Google Scholar ] [ CrossRef ]
  • Latif, G.; Iskandar, D.A.; Alghazo, J.; Butt, M.M. Brain MR Image Classification for Glioma tumor detection using deep convolutional neural network features. Curr. Med. Imaging 2021 , 17 , 56–63. [ Google Scholar ]
  • Grampurohit, S.; Shalavadi, V.; Dhotargavi, V.R.; Kudari, M.; Jolad, S. Brain tumor detection using deep learning models. In Proceedings of the 2020 IEEE India Council International Subsections Conference (INDISCON), Visakhapatnam, India, 3–4 October 2020. [ Google Scholar ]
  • Noreen, N.; Palaniappan, S.; Qayyum, A.; Ahmad, I.; Alassafi, M.O. Brain Tumor Classification Based on Fine-Tuned Models and the Ensemble Method. Comput. Mater. Contin. 2021 , 67 , 3967–3982. [ Google Scholar ] [ CrossRef ]
  • Asiri, A.A.; Aamir, M.; Ali, T.; Shaf, A.; Irfan, M.; Mehdar, K.M.; Alqhtani, S.M.; Alghamdi, A.H.; Alshamrani, A.F.A.; Alshehri, O.M. Next-Gen brain tumor classification: Pioneering with deep learning and fine-tuned conditional generative adversarial networks. PeerJ Comput. Sci. 2023 , 9 , e1667. [ Google Scholar ] [ CrossRef ]
  • Khan, M.A.; Mostafa, R.R.; Zhang, Y.-D.; Baili, J.; Alhaisoni, M.; Tariq, U.; Khan, J.A.; Kim, Y.J.; Cha, J. Deep-Net: Fine-Tuned Deep Neural Network Multi-Features Fusion for Brain Tumor Recognition. Comput. Mater. Contin. 2023 , 76 , 3029–3047. [ Google Scholar ]
Distribution of images across the three datasets:

Dataset 1
Class        Images  Train  Test
Glioma       1621    1321   300
Meningioma   1645    1339   306
Pituitary    1757    1457   300
No Tumor     2000    1595   405

Dataset 2
Class  Images  Train  Test
Yes    155     135    20
No     84      66     18

Dataset 3
Class  Images  Train  Test
Yes    1500    1200   300
No     1500    1200   300
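For concreteness, the following is a minimal sketch of how such a split could be consumed in Keras. The directory layout, file paths, and image size are illustrative assumptions, not details taken from the paper; the batch size of 8 follows the hyperparameter table below.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)   # assumed input resolution; not stated in the extracted tables
BATCH_SIZE = 8          # batch size from the hyperparameter table

# Hypothetical directory layout: one subfolder per class
# (e.g. dataset1/train/glioma, dataset1/train/meningioma, ...).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset1/train",
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    shuffle=True,        # reshuffled every epoch, matching the "Shuffle" setting below
)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset1/test",
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    shuffle=False,       # keep file order fixed so labels align with predictions later
)
```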
Fine-Tuning of CNN Hyperparameters
1. Find the best hyperparameters to train the final model.
2. Develop new model instances for the best hyperparameters.
3. Train the model with the specified parameters.
4. Test and evaluate the CNN model.
5. Find the best performance metrics (e.g., accuracy).
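A minimal, self-contained sketch of this loop is shown below. The two-block architecture, the candidate values, and the input size are illustrative assumptions rather than the paper's exact configuration; the candidate values echo the hyperparameter table that follows, and train_ds/test_ds are the splits from the loading sketch above.

```python
import itertools
import tensorflow as tf
from tensorflow.keras import layers

def build_model(filters, kernel_size, dropout, optimizer, num_classes=4):
    """Create a fresh CNN instance for one hyperparameter combination."""
    model = tf.keras.Sequential([
        layers.Input(shape=(224, 224, 3)),
        layers.Rescaling(1.0 / 255),
        layers.Conv2D(filters, kernel_size, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(filters * 2, kernel_size, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dropout(dropout),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=optimizer,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Steps 1-5: enumerate combinations, train a new instance per combination,
# evaluate on the test split, and keep the best configuration.
space = {"filters": [16, 32, 64], "kernel_size": [3, 5],
         "dropout": [0.2], "optimizer": ["sgd", "adam"]}
best_acc, best_params = 0.0, None
for combo in itertools.product(*space.values()):
    params = dict(zip(space, combo))
    model = build_model(**params)
    model.fit(train_ds, epochs=8, verbose=0)      # Dataset 1 epoch setting
    _, acc = model.evaluate(test_ds, verbose=0)
    if acc > best_acc:
        best_acc, best_params = acc, params
print("best:", best_params, "accuracy:", best_acc)
```

An exhaustive grid like this is the simplest search strategy; random search or Bayesian optimization (as in the cited Ait Amou et al. work) slots into the same loop without changing steps 2 through 5.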
CNN training hyperparameters for the three datasets:

Sr. No  Parameter            Dataset 1         Dataset 2         Dataset 3
1       Batch size           8                 8                 8
2       Epochs               8                 50                50
3       Optimizer            SGD, Adam         SGD, Adam         SGD, Adam
4       Learning rate        -                 -                 -
5       Shuffle              Every epoch       Every epoch       Every epoch
6       Dropout rate         0.2               0.2               0.2
7       Number of filters    16, 32, 64, 128   2, 4, 16, 32, 64  4, 8, 16, 32, 64
8       Filter size          3 × 3, 5 × 5      3 × 3, 5 × 5      3 × 3, 5 × 5
9       Activation function  ReLU              ReLU              ReLU
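Wiring the tabled settings into a training run might look like the sketch below. The learning-rate values did not survive extraction, so the 1e-3 used here is purely an assumption; build_model comes from the tuning sketch above, and train_ds already yields batches of 8 and reshuffles every epoch.

```python
import tensorflow as tf

# Assumed learning rate; the table's actual values are not recoverable.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

model = build_model(filters=16, kernel_size=3, dropout=0.2, optimizer=optimizer)
history = model.fit(
    train_ds,
    validation_data=test_ds,
    epochs=50,        # Dataset 2/3 setting; Dataset 1 used 8 epochs
)
```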
Per-class results of the applied model (Pre = precision, R = recall, F1-S = F1-score):

Dataset 1 (overall accuracy 97.18%)
Class        Pre   R     F1-S
Glioma       0.95  0.97  0.96
Meningioma   0.93  0.94  0.94
No Tumor     1.00  1.00  1.00
Pituitary    1.00  0.97  0.98
Average      0.97  0.97  0.97

Dataset 2 (overall accuracy 0.93)
Class    Pre   R     F1-S
Yes      0.90  1.00  0.95
No       1.00  0.83  0.91
Average  0.95  0.91  0.93

Dataset 3 (overall accuracy 0.96)
Class    Pre   R     F1-S
Yes      0.97  0.96  0.97
No       0.96  0.97  0.96
Average  0.96  0.96  0.96
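Per-class precision, recall, and F1 of this kind can be computed with scikit-learn's classification_report. A sketch under the same assumptions as the snippets above: model and test_ds are carried over, and the class names mirror the assumed Dataset 1 folder layout (alphabetical order).

```python
import numpy as np
from sklearn.metrics import classification_report

# Labels come out in file order because the test split was loaded with shuffle=False.
y_true = np.concatenate([labels.numpy() for _, labels in test_ds])
y_pred = np.argmax(model.predict(test_ds), axis=1)

print(classification_report(
    y_true, y_pred,
    target_names=["glioma", "meningioma", "no_tumor", "pituitary"]))
```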
Comparison with existing methods (the last three rows report the proposed model on the three datasets, matching the per-class averages above; empty brackets mark citations lost in extraction):

Method                              Dataset                Acc   Pre   R     F1-S
Inception-V3 fine-tuned model [ ]   Brain MRI              0.94  0.93  0.95  0.94
MobileNetV2 [ ]                     Brain MRI              0.92  0.93  0.90  0.91
Deep-Net: fine-tuned model [ ]      Brain MRI              0.95  0.93  0.94  0.95
CNN model [ ]                       Brain MRI (Dataset 1)  0.96  0.94  0.94  0.94
CNN model [ ]                       Brain MRI (Dataset 2)  0.88  0.87  0.87  0.87
Proposed model                      Brain MRI (Dataset 1)  0.97  0.97  0.97  0.97
Proposed model                      Brain MRI (Dataset 2)  0.93  0.95  0.91  0.93
Proposed model                      Brain MRI (Dataset 3)  0.96  0.96  0.96  0.96

Share and Cite

Aamir, M.; Namoun, A.; Munir, S.; Aljohani, N.; Alanazi, M.H.; Alsahafi, Y.; Alotibi, F. Brain Tumor Detection and Classification Using an Optimized Convolutional Neural Network. Diagnostics 2024 , 14 , 1714. https://doi.org/10.3390/diagnostics14161714






COMMENTS

  1. An Analysis Of Convolutional Neural Networks For Image Classification

Abstract. This paper presents an empirical analysis of the performance of popular convolutional neural networks (CNNs) for identifying objects in real time video feeds. The most popular convolution neural networks for object detection and object category classification from images are Alex Nets, GoogLeNet, and ResNet50.

  2. (PDF) Image Classification using CNN

Image Classification using CNN. Jatin Kayasth, Yeshiva University - Katz School of Science and Health. Abstract: Image Classification is a fundamental task that attempts ...

  3. Image Classification Using CNN by Atul Sharma, Gurbakash Phonsa

This paper presents a technique to classify images using a convolutional neural network (CNN) and content based image retrieval (CBIR). It shows the results of image classification on the cifar-10 dataset using a Jupyter notebook and the sequential method.

  4. Convolutional neural networks for image classification

    This paper describes a learning approach based on training convolutional neural networks (CNN) for a traffic sign classification system. In addition, it presents the preliminary classification results of applying this CNN to learn features and classify RGB-D images task. To determine the appropriate architecture, we explore the transfer learning technique called "fine tuning technique", of ...

  5. Image Classification using Convolutional Neural Networks

The Convolutional Neural Network, a machine learning algorithm, is being used for image classification. In [4], a deep learning algorithm is used to classify the quality of wood board by using ...

  6. Image Classification based on CNN: Models and Modules

    With the recent development of deep learning techniques, deep learning methods are widely used in image classification tasks, especially for those based on convolutional neural networks (CNN). In this paper, a general overview on the image classification tasks will be presented. Besides, the differences and contributions to essential progress in the image classification tasks of the deep ...

  7. [1905.03288] Advancements in Image Classification using Convolutional

    Convolutional Neural Network (CNN) is the state-of-the-art for image classification task. Here we have briefly discussed different components of CNN. In this paper, We have explained different CNN architectures for image classification. Through this paper, we have shown advancements in CNN from LeNet-5 to latest SENet model. We have discussed the model description and training details of each ...

  8. Advancements in Image Classification using Convolutional Neural Network

... advancement in problems of computer vision, especially in image classification. LeCun et al. introduced the practical model of CNN (ConvNet) [6][7] and developed LeNet-5 [8]. Training by the backpropagation algorithm [9] helped LeNet-5 recognize visual patterns from raw pixels directly without using any separate ...

  9. Image Classification using Convolutional Neural Network

    Machine learning and deep learning techniques are used in image classification. The execution of a classification system is based on the quality of extracted image features. This paper deals with the Convolutional Neural Network for identifying the category of the image. Convolution and pooling operations are explained for classifying the image. Trained dataset (caltech101) is used for the ...

  10. (PDF) Image Classification using CNN

Different CNN Models, contd. Gradient-based learning applied to document recognition: [LeCun et al.], Proceedings of the IEEE, 1998. Larger and deeper than LeNet-5. Total 8 layers: 5 convolutional ...

  11. Image Classification using CNN

Binary neural networks and mobile networks are the most often used approaches for sophisticated deep learning models carrying out a variety of tasks on embedded devices. The deep learning pre-trained model MobileNet for the Single Shot MultiBox Detector (SSD) is used in this research work to provide a method for item identification. This approach is used for real-time detection and to find ...

  12. Image Classification Using Convolutional Neural Networks

Convolutional Neural Network (CNN) is the progressive method for the image classification task. In this paper, the authors briefly describe the different parts and architectures of CNN. They show how CNN advanced from LeNet-5 to the latest SENet model.

  13. Image Classification Using Convolutional Neural Networks With Multi

    Abstract. Convolutional neural networks (CNN) have been widely used in automatic image classification systems. In most cases, features from the top layer of the CNN are utilized for classification; however, those features may not contain enough useful information to predict an image correctly. In some cases, features from the lower layer carry ...

  14. Image Classification Using Convolutional Neural Networks

    Data set augmentation is often used in the field of image classification (commonly performed via use of a CNN [30]), where simple strategies such as the addition of random noise and/or image ...

  15. Classification and Identification of Objects in Images Using CNN

4.1 Image Classification. Image classification [ ] is the field of computer vision where we analyze a given image and ascertain which class it belongs to. Thereby, this process assigns the maximum chance to a given class. Classifying an image [ ] is a very important but complex task in computer vision. In order to achieve that, we label each class and, in a neural network, we train the data with ...

  16. Image Classification

    4041 papers with code • 151 benchmarks • 250 datasets. Image Classification is a fundamental task in vision recognition that aims to understand and categorize an image as a whole under a specific label. Unlike object detection, which involves classification and location of multiple objects within an image, image classification typically ...

  17. Research on image classification model based on deep convolution neural

For the data set in this paper, because the training sample and the test sample are not well distinguished [ ], the random generation method is used to avoid the subjective color of the artificial classification. 2.2 Image feature extraction based on time-frequency composite weighting: feature extraction is a concept in computer vision and image processing.

  18. PDF Image Classification using Convolutional Neural Networks

CONCLUSION. In this paper, we used convolutional neural networks (CNN) for image classification on the handwritten MNIST data set. The data set was used for both training and testing purposes with the CNN, providing an accuracy rate of 98%. The images used for training are small grayscale images.

  19. MRI-based brain tumour image detection using CNN based deep learning

Hence it is important to improve the accuracy of previously proposed methods for the betterment of medical image research. In our paper, the proposed 99.74% accurate CNN-based algorithm will help medical practitioners in their treatment work without manually analyzing the MRI images, so that the speed of treatment can be enhanced.

  20. Remote Sensing

    Convolutional neural networks (CNNs) and graph convolutional networks (GCNs) have made considerable advances in hyperspectral image (HSI) classification. However, most CNN-based methods learn features at a single-scale in HSI data, which may be insufficient for multi-scale feature extraction in complex data scenes. To learn the relations among samples in non-grid data, GCNs are employed and ...

  21. Remote Sensing

    The vision transformer (ViT) has demonstrated performance comparable to that of convolutional neural networks (CNN) in the hyperspectral image classification domain. This is achieved by transforming images into sequence data and mining global spectral-spatial information to establish remote dependencies. Nevertheless, both the ViT and CNNs have their own limitations. For instance, a CNN is ...

  22. Image Classification with Classic and Deep Learning Techniques

    To classify images based on their content is one of the most studied topics in the field of computer vision. Nowadays, this problem can be addressed using modern techniques such as Convolutional Neural Networks (CNN), but over the years different classical methods have been developed. In this report, we implement an image classifier using both classic computer vision and deep learning ...

  23. Remote Sensing Image Classification Using CNN-LSTM Model

To avoid this, in this paper we use the CNN-LSTM model for the classification of remote sensing images; the experimental results show that the proposed model is effective in classifying remote sensing images. Keywords: remote sensing, deep learning, CNN, LSTM, image classification, SIRI-WHU data.

  24. Deep-Learning-Driven Turbidity Level Classification

    This paper implements a convolutional neural network (CNN) to classify water samples based on their turbidity levels. The dataset consisted of images captured under controlled laboratory conditions, with turbidity levels measured using a 2100P Portable Turbidimeter. The CNN achieved a classification accuracy of 97.00% in laboratory settings.

  25. Machine learning and transfer learning techniques for accurate brain

The features of brain MRI images were extracted using a pre-trained CNN network, i.e., GoogleNet, and acceptable results were achieved in tumor classification. In [11], the k-nearest neighbors algorithm, one of the most powerful machine learning algorithms for classification, was used and achieved suitable accuracy, though it was lower than that ...

  26. A CNN model with pseudo dense layers: some case studies on medical

DOI: 10.1007/s13721-024-00474-1. Biswas, M.; Sikdar, R.; Sarkar, R.; Kundu, M. A CNN model with pseudo dense layers: some case studies on medical image classification. Network Modeling Analysis in Health ...

  27. (PDF) Image Classification using SVM and CNN

    Image Classification using SVM and CNN. March 2020. DOI: 10.1109/ICCSEA49143.2020.9132851. Conference: 2020 International Conference on Computer Science, Engineering and Applications (ICCSEA ...

  28. Brain Tumor Detection and Classification Using an Optimized ...

    Brain tumors are a leading cause of death globally, with numerous types varying in malignancy, and only 12% of adults diagnosed with brain cancer survive beyond five years. This research introduces a hyperparametric convolutional neural network (CNN) model to identify brain tumors, with significant practical implications. By fine-tuning the hyperparameters of the CNN model, we optimize feature ...

  29. Research Paper: Detection and Classification of Bone Fractures in X-Ray

    This study investigates the potential of the YOLO (You Only Look Once) deep learning framework to advance the automatic detection and classification of bone fractures in X-ray images. Specifically, it aims to enhance the YOLOv8 model's ability to identify various fracture types by training it on an extensive and diverse set of labelled X-ray ...

  30. Multi-task multi-objective evolutionary network for hyperspectral image

To address this challenge, this paper proposes a multi-task multi-objective evolutionary network (DMOEAD) for joint learning of HC and HP. ... C. Liu, C.-I. Chang, Feedback attention-based dense CNN for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 60 (2021) 1–16 [44].