Machine Learning: Algorithms, Real-World Applications and Research Directions

Affiliations:

  • Swinburne University of Technology, Melbourne, VIC 3122, Australia.
  • Department of Computer Science and Engineering, Chittagong University of Engineering & Technology, 4349 Chattogram, Bangladesh.

PMID: 33778771 · PMCID: PMC7983091 · DOI: 10.1007/s42979-021-00592-x

In the current age of the Fourth Industrial Revolution (4IR or Industry 4.0), the digital world holds a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, health data, and so on. To analyze these data intelligently and develop the corresponding smart and automated applications, knowledge of artificial intelligence (AI), particularly machine learning (ML), is the key. Various types of machine learning algorithms, such as supervised, unsupervised, semi-supervised, and reinforcement learning, exist in the area. In addition, deep learning, which is part of a broader family of machine learning methods, can intelligently analyze data on a large scale. In this paper, we present a comprehensive view of these machine learning algorithms that can be applied to enhance the intelligence and capabilities of an application. Thus, this study's key contribution is explaining the principles of different machine learning techniques and their applicability in various real-world application domains, such as cybersecurity systems, smart cities, healthcare, e-commerce, agriculture, and many more. We also highlight the challenges and potential research directions based on our study. Overall, this paper aims to serve as a reference point for academics, industry professionals, and decision-makers in various real-world situations and application areas, particularly from a technical point of view.
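The learning paradigms named in the abstract differ chiefly in whether labels are available to the learner. A minimal sketch of the supervised/unsupervised distinction, using scikit-learn (which the pages below also reference) on invented toy data:

```python
# Toy illustration of supervised vs. unsupervised learning; the data and
# model choices are invented for this sketch, not taken from the paper.
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# Supervised: the labels y guide the fit.
clf = LogisticRegression().fit(X, y)
print("supervised training accuracy:", round(clf.score(X, y), 2))

# Unsupervised: the same points, but no labels are shown to the model.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", sorted(Counter(km.labels_).values()))
```

Semi-supervised and reinforcement learning extend the same picture: partial labels in the former, reward signals instead of labels in the latter.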

Keywords: Artificial intelligence; Data science; Data-driven decision-making; Deep learning; Intelligent applications; Machine learning; Predictive analytics.

© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2021.


Proceedings of ICRIC 2019, pp. 47–63

Machine Learning: A Review of the Algorithms and Its Applications

  • Devanshi Dhall,
  • Ravinder Kaur &
  • Mamta Juneja
  • Conference paper
  • First Online: 22 November 2019


Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 597)

In today’s world, machine learning has gained much popularity, and its algorithms are employed in fields such as pattern recognition, object detection, text interpretation, and other research areas. Machine learning, a part of artificial intelligence (AI), is used to design algorithms based on recent trends in data. This paper aims to introduce the algorithms of machine learning and their principles, and to highlight the advantages and disadvantages in this field. It also focuses on recent advancements so that current researchers can benefit from them. Based on artificial intelligence, many techniques have been developed, such as perceptron-based and logic-based techniques, and, in statistics, instance-based techniques and Bayesian networks. Overall, this paper surveys the work done in the area of machine learning and its applications and draws attention to the scholars working in this field.
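As one concrete instance of the perceptron-based family mentioned above, a perceptron can be written from scratch in a few lines; the toy data and learning rate below are invented for illustration:

```python
# Minimal perceptron: adjust the weights only on misclassified points.
# Data: two linearly separable classes, labelled +1 and -1 (invented).
data = [((1.0, 1.0), 1), ((2.0, 1.5), 1), ((-1.0, -0.5), -1), ((-2.0, -1.0), -1)]
w, b, lr = [0.0, 0.0], 0.0, 0.1

for _ in range(20):  # epochs
    for (x1, x2), y in data:
        if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # misclassified (or on the boundary)
            w[0] += lr * y * x1
            w[1] += lr * y * x2
            b += lr * y

def predict(x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1

print([predict(x1, x2) for (x1, x2), _ in data])
```

The update rule moves the decision boundary towards each misclassified point; for linearly separable data the loop converges to a separating hyperplane.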



Author information

Authors and affiliations

University Institute of Engineering and Technology, Punjab University, Chandigarh, India

Devanshi Dhall, Ravinder Kaur & Mamta Juneja


Corresponding authors

Correspondence to Ravinder Kaur or Mamta Juneja.

Editor information

Editors and affiliations

Department of Computer Science and Engineering, Jaypee University of Information Technology, Waknaghat, Himachal Pradesh, India

Pradeep Kumar Singh

Indian Institute of Technology Delhi, New Delhi, Delhi, India

Arpan Kumar Kar

Central University of Jammu, Jammu, Jammu and Kashmir, India

Yashwant Singh

Indian Institute of Technology Patna, Patna, Bihar, India

Maheshkumar H. Kolekar

Institute of Technology, Nirma University, Ahmedabad, Gujarat, India

Sudeep Tanwar


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper.

Dhall, D., Kaur, R., Juneja, M. (2020). Machine Learning: A Review of the Algorithms and Its Applications. In: Singh, P., Kar, A., Singh, Y., Kolekar, M., Tanwar, S. (eds) Proceedings of ICRIC 2019. Lecture Notes in Electrical Engineering, vol 597. Springer, Cham. https://doi.org/10.1007/978-3-030-29407-6_5


DOI: https://doi.org/10.1007/978-3-030-29407-6_5

Published: 22 November 2019

Publisher Name: Springer, Cham

Print ISBN: 978-3-030-29406-9

Online ISBN: 978-3-030-29407-6

eBook Packages: Engineering (R0)

  • Open access
  • Published: 17 June 2020

Ethical principles in machine learning and artificial intelligence: cases from the field and possible ways forward

  • Samuele Lo Piano, ORCID: orcid.org/0000-0002-2625-483X

Humanities and Social Sciences Communications, volume 7, Article number: 9 (2020)


  • Science, technology and society

Decision-making on numerous aspects of our daily lives is being outsourced to machine-learning (ML) algorithms and artificial intelligence (AI), motivated by speed and efficiency in the decision process. ML approaches, one of the typologies of algorithms underpinning artificial intelligence, are typically developed as black boxes. The implication is that ML code scripts are rarely scrutinised; interpretability is usually sacrificed in favour of usability and effectiveness. Room for improvement in the practices associated with programme development has also been flagged along other dimensions, including, inter alia, fairness, accuracy, accountability, and transparency. In this contribution, the production of guidelines and dedicated documents around these themes is discussed. The following applications of AI-driven decision-making are outlined: (a) risk assessment in the criminal justice system, and (b) autonomous vehicles, highlighting points of friction across ethical principles. Possible ways forward towards the implementation of governance on AI are finally examined.


Introduction

Artificial intelligence (AI) is the branch of computer science that deals with the simulation of intelligent behaviour in computers as regards their capacity to mimic, and ideally improve, human behaviour. To achieve this, the simulation of human cognition and functions, including learning and problem-solving, is required (Russell, 2010). This simulation may limit itself to some simple predictable features, thus leaving out much of human complexity (Cowls, 2019).

AI became a self-standing discipline in the year 1955 (McCarthy et al., 2006) and has developed significantly over the last decades. AI resorts to ML to implement predictive functioning based on data acquired from a given context. The strength of ML resides in its capacity to learn from data without the need to be explicitly programmed (Samuel, 1959); ML algorithms are autonomous and self-sufficient when performing their learning function, which is why they are ubiquitous in AI developments. Further to this, ML implementations in data science and other applied fields are conceptualised in the context of a final decision-making application, hence their prominence.

Applications in our daily lives encompass fields such as (precision) agriculture (Sennaar, 2019), air combat and military training (Gallagher, 2016; Wong, 2020), education (Sears, 2018), finance (Bahrammirzaee, 2010), health care (Beam and Kohane, 2018), human resources and recruiting (Hmoud and Laszlo, 2019), music composition (Cheng, 2009/09), customer service (Kongthon et al., 2009), reliability engineering and maintenance (Dragicevic et al., 2019), autonomous vehicles and traffic management (Ye, 2018), social-media news feeds (Rader et al., 2018), work scheduling and optimisation (O’Neil, 2016), and several others.

In all these fields, an increasing number of functions are being ceded to algorithms to the detriment of human control, raising concerns about loss of fairness and equitability (Sareen et al., 2020). Furthermore, garbage-in-garbage-out issues (Saltelli and Funtowicz, 2014) are prone to emerge in contexts where external control is entirely removed. This may be further exacerbated by the offer of new auto-ML services (Chin, 2019), in which the entire algorithm-development workflow is automated and the residual human control practically removed.

In the following sections, we will (i) detail a series of research questions around the ethical principles in AI; (ii) take stock of the production of guidelines elaborated in the field; (iii) showcase their prominence in practical examples; and (iv) discuss actions towards the inclusion of these dimensions in the future of AI ethics.

Research questions on the ethical dimensions of artificial intelligence

Critical aspects of AI deployment have already gained traction in mainstream literature and media. For instance, according to O’Neil (2016), a main shortcoming of ML approaches is that they resort to proxies for the trends of interest, such as a person’s ZIP code or language in relation to their capacity to pay back a loan or handle a job, respectively. However, these correlations may be discriminatory, if not illegal.
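The proxy problem can be made concrete with a short simulation. The sketch below is entirely hypothetical (none of the variables or coefficients come from O’Neil’s examples): a protected attribute is withheld from training, yet a correlated proxy lets the model reproduce the disparity anyway.

```python
# Hypothetical simulation of the proxy problem (invented variables and
# coefficients): the protected attribute is never shown to the model,
# but a correlated proxy ("zip_code") lets it reproduce the disparity.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)                 # protected attribute (withheld)
zip_code = group + rng.normal(0, 0.3, n)      # proxy correlated with group
income = rng.normal(0, 1, n)
# Historical labels carry a group-dependent bias:
y = (0.5 * income + 1.5 * group + rng.normal(0, 0.5, n) > 1.0).astype(int)

X = np.column_stack([income, zip_code])       # 'group' itself is excluded
clf = LogisticRegression(max_iter=1000).fit(X, y)
pred = clf.predict(X)

rate0 = pred[group == 0].mean()
rate1 = pred[group == 1].mean()
print(f"positive-prediction rate: group 0 = {rate0:.2f}, group 1 = {rate1:.2f}")
```

Dropping the sensitive column is therefore not enough: the disparity in prediction rates across groups survives through the proxy.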

Potential black swans (Taleb, 2007) in the code should also be considered. These have been documented, for instance, on the Amazon website, where errors such as plain items (often books) being quoted at up to 10,000 dollars have been reported (Smith, 2018). While mistakes about monetary values may be easy to spot, the situation becomes more complex and less intelligible when incommensurable dimensions come into play. That is why a number of guidelines on the topic of ethics in AI have proliferated over the last few years.

While reflections around the ethical implications of machines and automation deployment were already put forth in the ’50s and ’60s (Samuel, 1959 ; Wiener, 1988 ), the increasing use of AI in many fields raises new important questions about its suitability (Yu et al., 2018 ). This stems from the complexity of the aspects undertaken and the plurality of views, stakes, and values at play. A fundamental aspect is how and to what extent the values and the perspectives of the involved stakeholders have been taken care of in the design of the decision-making algorithm (Saltelli, 2020 ). In addition to this ex-ante evaluation, an ex-post evaluation would need to be put in place so as to monitor the consequences of AI-driven decisions in making winners and losers.

To wrap up, it is fundamental to assess how and whether ethical aspects have been included in AI-driven decision-making by asking questions such as:

What are the most prominent ethical concerns raised by large-scale deployment of AI applications?

How are these multiple dimensions interwoven?

What are the actions the involved stakeholders are carrying out to address these concerns?

What are possible ways forward to improve ML and AI development and use over their full life-cycle?

We will firstly examine the production of relevant guidelines in the fields along with academic secondary literature. These aspects will then be discussed in the context of two applied cases: (i) recidivism-risk assessment in the criminal justice system, and (ii) autonomous vehicles.

Guidelines and secondary literature on AI ethics, its dimensions and stakes

The production of dedicated documents has been skyrocketing since 2016 (Jobin et al., 2019). We here report on the most prominent international initiatives; a comprehensive list of national and international AI-strategy documents is provided by the Future of Earth Institute (2020).

France’s Digital Republic Act gives the right to an explanation of decisions about an individual made through the use of administrative algorithms (Edwards and Veale, 2018). This law touches upon several aspects, including:

how and to what extent the algorithmic processing contributed to the decision-making;

which data was processed and its source;

how parameters were treated and weighted;

which operations were carried out in the treatment.

Sensitive governmental areas, such as national security and defence, as well as the private sector (by far the largest user and producer of ML algorithms), are excluded from this law.

An international European initiative is the multi-stakeholder European Union High-Level Expert Group on Artificial Intelligence, composed of 52 experts from academia, civil society, and industry. The group produced a deliverable on the criteria required for AI trustworthiness (Daly, 2019). Articles 21 and 22 of the recent European Union General Data Protection Regulation also include passages functional to AI governance, although further action has recently been demanded by the European Parliament (De Sutter, 2019). In this context, China has also been allocating efforts to privacy and data protection (Roberts, 2019).

As regards secondary literature, Floridi and Cowls (2019) examined a list of statements and declarations elaborated since 2016 by multi-stakeholder organisations. A set of 47 principles was identified, which mapped onto five overarching dimensions (Floridi and Cowls, 2019): beneficence, non-maleficence, autonomy, justice, and explicability. The latter is a new dimension specifically acknowledged in the case of AI, while the others had already been identified in the controversial domain of bioethics.

Jobin et al. (2019) reviewed 84 documents produced by several actors in the field, almost half of them private companies or governmental agencies. The classification proposed by Jobin et al. (2019) is built around a slightly different set of values: transparency, justice and fairness, non-maleficence, responsibility, and privacy. Other potentially relevant dimensions, such as accountability and responsibility, were rarely defined in the studies reviewed by these authors.

Seven of the most prominent value statements from the AI/ML fields were examined in Greene et al. (2019): The Partnership on AI to Benefit People and Society; The Montreal Declaration for a Responsible Development of Artificial Intelligence; The Toronto Declaration Protecting the rights to equality and non-discrimination in machine-learning systems; OpenAI; The Centre for Humane Technology; Fairness, Accountability and Transparency in Machine Learning; Axon’s AI Ethics Board for Public Safety. Greene et al. (2019) found seven common core elements across these documents: (i) design’s moral background (universal concerns, objectively measured); (ii) expert oversight; (iii) values-driven determinism; (iv) design as locus of ethical scrutiny; (v) better building; (vi) stakeholder-driven legitimacy; and (vii) machine translation.

Mittelstadt (2019) critically analysed the current debate and actions in the field of AI ethics and noted that the dimensions addressed are converging towards those of medical ethics. However, this convergence appears problematic due to four main differences between medicine and medical professionals on one side, and AI and its developers on the other. Firstly, the medical profession rests on common aims and fiduciary duties, which AI developers lack. Secondly, a formal profession with a set of clearly defined and governed good-behaviour practices exists in medicine; this is not the case for AI, which also lacks a full understanding of the consequences of the actions enacted by algorithms (Wallach and Allen, 2008). Thirdly, AI faces the difficulty of translating overarching principles into practices; its current orientation towards maximum speed, efficiency, and profit clashes with the resource and time requirements of ethical assessment and/or counselling. Finally, the accountability of professionals or institutions is at this stage mainly theoretical, since the vast majority of these guidelines have been adopted on a merely voluntary basis, with a total lack of sanctions for non-compliance.

Points of friction between ethical dimensions

Higher transparency is a common refrain when discussing the ethics of algorithms, in relation to dimensions such as how an algorithmic decision is arrived at, on which assumptions it rests, and how it could be corrected to incorporate feedback from the involved parties. Rudin (2019) argued that the community of algorithm developers should go beyond explaining black-box models by developing interpretable models in the first place.

On a larger scale, the use of open-source software in ML applications has been advocated for over a decade (Thimbleby, 2003), with an indirect call for tools that support more interpretable and reproducible programming, such as Jupyter Notebooks, available from 2015 onwards. However, publishing scripts exposes their developers to the public scrutiny of professional programmers, who may find shortcomings in the code (Sonnenburg, 2007).

Ananny and Crawford (2018) comment that resorting to full algorithmic transparency may not be an adequate means of addressing the ethical dimensions of algorithms; opening up the black box would not suffice to disclose their modus operandi. Moreover, the developers of an algorithm may not be capable of explaining in plain language how a given tool works and what functional elements it is based on. A more socially relevant understanding would encompass the human/non-human interface (i.e., looking across the system rather than merely inside it). Algorithmic complexity and all its implications unravel at this level, in terms of relationships rather than mere self-standing properties.

Other authors have pointed to possible points of friction between transparency and other relevant ethical dimensions. de Laat (2018) argues that transparency and accountability may even be at odds in the case of algorithms. Hence, he argues against full transparency along four main lines of reasoning: (i) leaking of privacy-sensitive data into the open; (ii) backfiring into an implicit invitation to game the system; (iii) harming the company’s property rights, with negative consequences for its competitiveness (and for the developers’ reputation, as discussed above); and (iv) the inherent opacity of algorithms, whose interpretability may be hard even for experts (see the example below about the code adopted in some models of autonomous vehicles). All these arguments suggest limits to full disclosure of algorithms, though the normative implications behind these objections should be carefully scrutinised.

Raji et al. (2020) suggest that a process of algorithmic auditing within the software-development company could help tackle some of the ethical issues raised. Larger interpretability could in principle be achieved by using simpler algorithms, although this may come at the expense of accuracy. To this end, Watson and Floridi (2019) defined a formal framework for interpretable ML, in which explanatory accuracy can be assessed against algorithmic simplicity and relevance.

Loss in accuracy may be produced by the exclusion of politically critical features (such as gender, race, or age) from the pool of training predictive variables. For instance, Amazon scrapped a gender-biased recruitment algorithm once it realised that, despite excluding gender, the algorithm was resorting to surrogate gender variables to implement its decisions (Dastin, 2018). This points again to a possible political trade-off between fairness, demanded by society, and algorithmic accuracy, demanded by, e.g., a private actor.

Fairness may be further hampered by reinforcement effects. This is the case for credit-scoring algorithms, whose reinforcement effect, proportional to people’s wealth, de facto rules out credit access for people in more socially difficult conditions (O’Neil, 2016).

According to Floridi and Cowls (2019), a prominent role is also played by the autonomy dimension: the possibility of refraining from ceding decision power to AI for overriding reasons (e.g., when the gain in efficacy is not deemed to justify the loss of control over decision-making). In other words, under this meta-autonomy dimension, machine autonomy could be reduced in favour of human autonomy.

Contrasting dimensions in terms of the theoretical framing of the issue also emerged from the review of Jobin et al. ( 2019 ), as regards interpretation of ethical principles, reasons for their importance, ownership and responsibility of their implementation. This also applies to different ethical principles, resulting in the trade-offs previously discussed, difficulties in setting prioritisation strategies, operationalisation and actual compliance with the guidelines. For instance, while private actors demand and try to cultivate trust from their users, this runs counter to the need for society to scrutinise the operation of algorithms in order to maintain developer accountability (Cowls, 2019 ). Attributing responsibilities in complicated projects where many parties and developers may be involved, an issue known as the problem of many hands (Nissenbaum, 1996 ), may indeed be very difficult.

Conflicts may also emerge between the requirements to overcome potential algorithm deficits in accuracy associated with large data bases and the individual rights to privacy and autonomy of decision. Such conflicts may exacerbate tensions, further complicating agreeing on standards and practices.

In the following two sections, the issues and points of friction raised are examined in two practical case studies, criminal justice and autonomous vehicles. These examples have been selected due to their prominence in the public debate on the ethical aspects of AI and ML algorithms.

Machine-learning algorithms in the field of criminal justice

ML algorithms have been widely used to assist juridical deliberation in many states of the USA (Angwin and Larson, 2016). The country has the world’s largest incarcerated population, both in absolute and per-capita terms (Brief, 2020). The COMPAS algorithm, developed by the private company Northpointe, attributes a 2-year recidivism-risk score to arrested people; it also scores the risk of violent recidivism.

The fairness of the algorithm was questioned in an investigative report that examined a pool of cases in which a recidivism score was attributed to more than 18,000 criminal defendants in Broward County, Florida, and flagged a potential racial bias in the application of the algorithm (Angwin and Larson, 2016). According to the authors of the report, the recidivism risk was systematically overestimated for black people: the decile distribution of scores for white defendants was skewed towards the lower end, whereas the decile distribution for black defendants decreased only slightly towards the higher end. The risk of violent recidivism within 2 years followed a similar trend. This analysis was disputed by the company, which, however, refused to disclose the full details of its proprietary code. While the total number of variables amounts to about 140, only the core variables were disclosed (Northpointe, 2012); the race of the subject was not one of them.

Here, a crucial point is how this fairness is to be attained: whether fair treatment matters more across groups of individuals or within the same group. Take the case of gender, where men are overrepresented in prison in comparison with women. To account for this, the algorithm might discount violent priors for men in order to reduce their recidivism-risk score. However, attaining this sort of algorithmic fairness would imply inequality of treatment across genders (Berk et al., 2018).

Fairness could be further hampered by the combined use of this algorithm with others driving decisions on neighbourhood police patrolling. These algorithms may be prone to direct further patrolling towards poor neighbourhoods because of a training bias: crimes occurring in public tend to be more frequently reported (Karppi, 2018). One can easily see how these algorithms may jointly produce a vicious cycle: more patrolling leads to more arrests, which worsens the neighbourhood’s average recidivism-risk score, which in turn triggers more patrolling. All this would exacerbate inequalities, much like the credit scores previously discussed (O’Neil, 2016).
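This feedback loop can be sketched numerically. The simulation below is hypothetical (it is not an actual predictive-policing model, and all rates are invented): two neighbourhoods with identical true crime rates end up with different recorded crime once patrol allocation follows the record itself.

```python
# Two neighbourhoods, A and B, with the same true crime rate; A starts out
# more heavily patrolled, so more of its crime is detected and recorded.
true_rate = {"A": 0.10, "B": 0.10}
detect = {"A": 0.9, "B": 0.5}          # initial patrol-driven detection rates
recorded = {"A": 0.0, "B": 0.0}

for _ in range(10):
    for n in ("A", "B"):
        recorded[n] += true_rate[n] * detect[n]
    total = recorded["A"] + recorded["B"]
    # Next round's patrols (hence detection) follow recorded crime, not true crime.
    detect = {n: 0.4 + 0.5 * recorded[n] / total for n in ("A", "B")}

print({n: round(v, 2) for n, v in recorded.items()})  # A stays ahead of B
```

Even though the underlying rates are equal, the neighbourhood that was patrolled more at the start accumulates a larger crime record, which the allocation rule then treats as evidence for more patrolling.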

A potential point of friction may also emerge between the dimensions of fairness and accuracy. The latter may be defined through the classification error, in terms of the rates of false positives (individuals labelled at risk of recidivism who did not re-offend within 2 years) and false negatives (individuals labelled at low risk of recidivism who did re-offend within the same timeframe) (Loi and Christen, 2019). Different classification accuracy (the fraction of observed outcomes in disagreement with the predictions) and forecasting accuracy (the fraction of predictions in disagreement with the observed outcomes) may exist across different classes of individuals (e.g., black or white defendants). Seeking equal rates of false positives and false negatives across these two pools would imply different forecasting errors (and accuracy), given the different characteristics of the two training pools available to the algorithm. Conversely, imposing the same forecasting accuracy would come at the expense of different classification errors between the pools (Corbett-Davies et al., 2016). Hence, a trade-off exists between these two different shades of fairness, which derives from the very statistical properties of the data population distributions the algorithm has been trained on. The decision then rests on the assumptions the algorithm developers have adopted, e.g., on the relative importance of false positives and false negatives (i.e., the weights attributed to the different typologies of error, and the accuracy sought (Berk, 2019)). On this point, an algorithm developer may decide (or be instructed) to train the algorithm to attribute, e.g., a five, ten, or twenty times higher weight to a false negative (re-offender with a low recidivism-risk score) than to a false positive (non-re-offender with a high recidivism-risk score).
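The incompatibility can be demonstrated numerically. In the hedged sketch below (synthetic scores with assumed Beta distributions, nothing taken from the COMPAS data), two groups share the same calibrated scoring rule and threshold but differ in base rate, and their false-positive and false-negative rates then diverge.

```python
# Calibrated scores: P(y = 1 | score) = score by construction. The two groups
# differ only in their score distribution (hence base rate); the decision
# threshold is the same 0.5 for both.
import numpy as np

rng = np.random.default_rng(1)

def group_rates(alpha, beta, n=200_000):
    s = rng.beta(alpha, beta, n)             # risk scores for this group
    y = (rng.random(n) < s).astype(int)      # outcomes drawn so scores are calibrated
    pred = s > 0.5
    fpr = pred[y == 0].mean()                # false-positive rate
    fnr = (~pred)[y == 1].mean()             # false-negative rate
    return s.mean(), fpr, fnr

for name, (a, b) in {"group A": (2, 4), "group B": (4, 2)}.items():
    base, fpr, fnr = group_rates(a, b)
    print(f"{name}: base rate {base:.2f}, FPR {fpr:.2f}, FNR {fnr:.2f}")
```

The higher-base-rate group ends up with a higher false-positive rate and a lower false-negative rate under the very same rule, which is the trade-off discussed in the text.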

As with all ML, an issue of transparency exists, as no one knows what type of inference is drawn on the variables from which the recidivism-risk score is estimated. Reverse-engineering exercises have been run to understand the key drivers of the observed scores. Rudin (2019) found that the algorithm seemed to behave differently from the intentions of its creators (Northpointe, 2012), with a non-linear dependence on age and a weak correlation with one’s criminal history. These exercises (Rudin, 2019; Angelino et al., 2018) showed that it is possible to implement interpretable classification algorithms that achieve an accuracy similar to COMPAS. Dressel and Farid (2018) achieved this result with a linear predictor, a logistic regressor that made use of only two variables: the subject’s age and total number of previous convictions.
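A sketch in the spirit of the Dressel and Farid result, with the important caveat that the data below is synthetic (the actual study used the Broward County records), so only the form of the model, a two-feature logistic regressor, matches the text:

```python
# Interpretable two-feature recidivism model: logistic regression on
# age and number of priors, trained on synthetic (invented) data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 4000
age = rng.integers(18, 70, n)
priors = rng.poisson(2, n)
# Invented ground truth: younger defendants with more priors re-offend more.
logit = -0.06 * (age - 18) + 0.45 * priors - 0.5
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([age, priors])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# The two coefficients are the entire model, hence easy to inspect.
print("held-out accuracy:", round(clf.score(X_te, y_te), 2))
print("coefficients (age, priors):", np.round(clf.coef_[0], 2))
```

Unlike a proprietary 140-variable score, the fitted model can be read off directly: one signed weight per feature plus an intercept.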

Machine-learning algorithms in the field of autonomous vehicles

The case of autonomous vehicles, also known as self-driving vehicles, poses different challenges, as a continuous stream of decisions must be enacted while the vehicle is moving. It is not a one-off decision, as in the case of the assessment of recidivism risk.

An exercise that helps appreciate the value-ladenness of these decisions is the moral-machine experiment (Massachusetts Institute of Technology, 2019), a serious game in which users are requested to fulfil the function of an autonomous-vehicle decision-making algorithm in a situation of danger. The experiment entails making choices that would prioritise the safety of some categories of road users over others, for instance, choosing among the deaths of car occupants, pedestrians, or occupants of other vehicles. While such extreme situations may be a simplification of reality, one cannot exclude that the algorithms driving an autonomous vehicle may find themselves in circumstances where their decisions may result in harming some of the involved parties (Bonnefon et al., 2019).

In practice, the issue would be framed by the algorithm as a statistical trolley dilemma, in the words of Bonnefon et al. (2019), whereby the risk of harm for some road users is increased. This corresponds by all means to a risk-management situation, with a number of nuances and inherent complexity (Goodall, 2016).

Hence, autonomous vehicles are not bound to play the role of silver bullets that solve once and for all the vexing issue of traffic fatalities (Smith, 2018). Furthermore, the decisions enacted could backfire in complex contexts over which the algorithms had no extrapolative power, an unpredictable issue one has to deal with (Wallach and Allen, 2008; Yurtsever et al., 2020).

Coding algorithms that assure fairness in autonomous vehicles can be very challenging. Contrasting and incommensurable dimensions are likely to emerge (Goodall, 2014) when designing an algorithm to reduce the harm of a given crash, for instance, material damage against human harm. Conflicts may emerge between the interest of the vehicle owner and passengers, on one side, and the collective interest of minimising the overall harm, on the other. Minimising the overall physical harm might be achieved by implementing an algorithm that, in the circumstance of an unavoidable collision, would target the vehicles with the highest safety standards. However, one may question the fairness of targeting those who have invested more in their own and others' safety. The algorithm may also face a dilemma between a low probability of serious harm and a higher probability of mild harm. Unavoidable normative rules will need to be included in the decision-making algorithms to tackle these types of situations.
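The normative character of these rules can be illustrated with a toy expected-harm comparison. The severity weights below are assumptions that designers or regulators must set; nothing in the data dictates them:

```python
# Hypothetical sketch: choosing a maneuver by minimising expected harm.
# The severity weights are normative assumptions to be set upstream by
# designers/regulators; they are not learned from data.

SEVERITY_WEIGHT = {"mild": 1.0, "serious": 10.0, "fatal": 100.0}  # assumed scale

def expected_harm(outcomes):
    """outcomes: list of (probability, severity) pairs for one maneuver."""
    return sum(p * SEVERITY_WEIGHT[s] for p, s in outcomes)

# Two stylised options: a high chance of mild harm vs a low chance of serious harm
maneuvers = {
    "swerve": [(0.9, "mild")],
    "brake": [(0.1, "serious")],
}

best = min(maneuvers, key=lambda m: expected_harm(maneuvers[m]))
print(best, {m: round(expected_harm(o), 2) for m, o in maneuvers.items()})
```

Lowering the assumed weight of a serious harm below 9 flips the choice from "swerve" to "brake", showing that the decision tracks the normative weights as much as the physics.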

Accuracy in the context of autonomous vehicles rests on their capacity to correctly simulate the course of events. While this is based on physics and can be informed by the numerous sensors these vehicles are equipped with, unforeseen events can still play a prominent role and profoundly affect the vehicle's behaviour and reactions (Yurtsever et al., 2020). For instance, fatalities due to autonomous-vehicle malfunctioning have been attributed to the following failures: (i) the incapability of perceiving a pedestrian as such (National Transportation Safety Board, 2018); (ii) the acceleration of the vehicle in a situation where braking was required, due to contrasting instructions from the different algorithms on which the vehicle hinged (Smith, 2018). In this latter case, the complexity of autonomous-vehicle algorithms was witnessed by the millions of lines of code composing their scripts, a universe 'no one fully understands' in the words of The Guardian (Smith, 2018), so that the causality of the decisions made was practically impossible to scrutinise. Hence, no corrective action in the algorithm code may be possible at this stage, with no room for improvement in accuracy.

One should also not forget that these algorithms learn by direct experience, and they may still end up conflicting with the initial set of ethical rules around which they were conceived. Learning may occur through interaction among algorithms, taking place at a higher hierarchical level than the one imagined in the first place (Smith, 2018). This aspect represents a further open issue to be taken into account in their development (Markham et al., 2018). It also poses a further tension between the accuracy a vehicle manufacturer seeks and the capability to keep up the fairness standards agreed upstream of the algorithm development process.

Discussion and conclusions

In this contribution, we have examined the ethical dimensions affected by the application of algorithm-driven decision-making. These are entailed both ex-ante, in terms of the assumptions underpinning the algorithm development, and ex-post, as regards the consequences upon society and the social actors on whom the elaborated decisions are enforced.

Decision-making algorithms rest inevitably on assumptions, even tacit ones, such as the quality of the data the algorithm is trained on (Saltelli and Funtowicz, 2014) or the actual modelling relations adopted (Hoerl, 2019), with all the implied consequences (Saltelli, 2019).

A decision-making algorithm will always be based on a formal system, which is a representation of a real system (Rosen, 2005). As such, it will always be based on a restricted set of relevant relations, causes, and effects. No matter how complicated the algorithm may be (how many relations may be factored in), it will always represent one specific vision of the system being modelled (Laplace, 1902).

Eventually, the set of decision rules underpinning the AI algorithm derives from human-made assumptions, such as where to define the boundary between action and no action, or between different possible choices. This can only take place at the human/non-human interface: the response of the algorithm is driven by these human-made assumptions and selection rules. Even the data on which an algorithm is trained are not an objective truth; they depend upon the context in which they were produced (Neff et al., 2017).

Tools for technically scrutinising the potential behaviour of an algorithm and its uncertainty already exist and could be included in the workflow of algorithm development. For instance, global sensitivity analysis (Saltelli et al., 2008) may help in exploring how the uncertainty in the input parameters and modelling assumptions affects the output. Additionally, a modelling of the modelling process would assist in model transparency and in addressing questions such as: are the results from a particular model more sensitive to changes in the model and the methods used to estimate its parameters, or to changes in the data? (Majone, 1989).
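As a brute-force illustration of the idea (not the efficient estimators described in Saltelli et al., 2008), the first-order sensitivity index of an input can be estimated as the variance of the conditional mean of the output over that input, divided by the total output variance. The toy model below is an arbitrary assumption:

```python
import random
import statistics

# Brute-force sketch of a variance-based (Sobol-style) first-order sensitivity
# index for a toy model. For real use, see the efficient estimators in
# Saltelli et al. (2008).

random.seed(1)

def model(x1, x2):
    # Toy model (an assumption for illustration): x1 should dominate the variance
    return 4.0 * x1 + 1.0 * x2

def first_order_index(which, n_outer=200, n_inner=200):
    """Estimate S_i = Var(E[Y | X_i]) / Var(Y) for independent U(0,1) inputs."""
    conditional_means, all_outputs = [], []
    for _ in range(n_outer):
        fixed = random.random()  # one value of the input under study
        outputs = []
        for _ in range(n_inner):
            other = random.random()  # resample the remaining input
            y = model(fixed, other) if which == 1 else model(other, fixed)
            outputs.append(y)
        conditional_means.append(statistics.fmean(outputs))
        all_outputs.extend(outputs)
    return statistics.pvariance(conditional_means) / statistics.pvariance(all_outputs)

print(round(first_order_index(1), 2), round(first_order_index(2), 2))
```

For this model the analytic values are S1 = 16/17 ≈ 0.94 and S2 = 1/17 ≈ 0.06, so the estimates should show x1 dominating the output variance.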

Tools of post-normal-science inspiration for knowledge and modelling quality assessment could be adapted to the analysis of algorithms, such as the NUSAP (Numeral Unit Spread Assessment Pedigree) notation system for the management and communication of uncertainty (Funtowicz and Ravetz, 1990; Van Der Sluijs et al., 2005) and sensitivity auditing (Saltelli and Funtowicz, 2014), respectively. Ultimately, developers should acknowledge the limits of AI and what its ultimate function should be, in the equivalent of a Hippocratic Oath for ML developers (O'Neil, 2016). An example comes from the field of financial modelling, with a manifesto elaborated in the aftermath of the 2008 financial crisis (Derman and Wilmott, 2009).

To address these dimensions, value statements and guidelines have been elaborated by political and multi-stakeholder organisations. For instance, The Alan Turing Institute released a guide for the responsible design and implementation of AI (Leslie, 2019) that covers the whole life-cycle of design, use, and monitoring. However, the field of AI ethics is just in its infancy, and how AI developments that encompass ethical dimensions could be attained is still to be conceptualised. Some authors are pessimistic, such as Supiot (2017), who speaks of governance by numbers, where quantification is replacing the traditional decision-making system and profoundly affecting the pillar of equality of judgement. Trying to reverse the current state of affairs may expose first movers in the AI field to a competitive disadvantage (Morley et al., 2019). One should also not forget that points of friction across ethical dimensions may emerge, e.g., between transparency and accountability, or accuracy and fairness, as highlighted in the case studies. Hence, the development process of the algorithm cannot be perfect in this setting; one has to be open to negotiation and unavoidably work with imperfections and clumsiness (Ravetz, 1987).

The development of decision-making algorithms remains quite obscure in spite of the concerns raised and the intentions manifested to address them. Attempts to expose the developed algorithms to public scrutiny are as yet scant, as are attempts to make the process more inclusive, with higher participation from all the stakeholders. Identifying a relevant pool of social actors may require an important effort in terms of stakeholder mapping, so as to assure complete, but also effective, governance in terms of number of participants and simplicity of working procedures. The post-normal-science concept of extended peer communities could assist in this endeavour (Funtowicz and Ravetz, 1997). Example-based explanations (Molnar, 2020) may also contribute to an effective engagement of all the parties by helping to bridge technical divides across developers, experts in other fields, and lay people.

An overarching meta-framework for the governance of AI in experimental technologies (e.g., robot use) has also been proposed (Rêgo de Almeida et al., 2020). This initiative stems from the attempt to include all the forms of governance put forth and would rest on an integrated set of feedback and interactions across dimensions and actors. An interesting proposal comes from Berk (2019), who called for the intervention of super partes authorities to define standards of transparency, accuracy, and fairness for algorithm developers, in line with the role of the Food and Drug Administration in the US and other regulatory bodies. A shared regulation could help tackle the potential competitive disadvantage a first mover may suffer. The development pace of new algorithms would necessarily be reduced so as to comply with the standards defined and the required clearance processes. In this setting, seeking algorithm transparency would not be harmful to developers, as scrutiny would be delegated to entrusted intermediate parties and take place behind closed doors (de Laat, 2018).

As noted by a perceptive reviewer, ML systems that keep learning are dangerous and hard to understand because they can change quickly. Could an ML system with real-world consequences therefore be "locked down" to increase transparency? If yes, the algorithm could become defective as it stops adapting. If not, transparency today may not help in understanding what the system does tomorrow. This issue could be tackled by hard-coding the set of rules on the behaviour of the algorithm, once these are agreed upon among the involved stakeholders. This would prevent the algorithm-learning process from conflicting with the agreed standards. Making it mandatory to deposit these algorithms in a database owned and operated by an entrusted super partes body could ease the development of this overall process.

References

Ananny M, Crawford K (2018) Seeing without knowing: limitations of the transparency ideal and its application to algorithmic accountability. New Media Soc 20:973–989

Angelino E, Larus-Stone N, Alabi D, Seltzer M, Rudin C (2018) Learning certifiably optimal rule lists for categorical data. http://arxiv.org/abs/1704.01701

Angwin J, Larson J (2016) Machine bias. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Bahrammirzaee A (2010) A comparative survey of artificial intelligence applications in finance: artificial neural networks, expert system and hybrid intelligent systems. Neural Comput Appl 19:1165–1195

Beam AL, Kohane IS (2018) Big data and machine learning in health care. JAMA 319:1317

Berk R (2019) Machine learning risk assessments in criminal justice settings. Springer International Publishing, Cham

Berk R, Heidari H, Jabbari S, Kearns M, Roth A (2018) Fairness in criminal justice risk assessments: the state of the art. Soc Methods Res 004912411878253

National Transportation Safety Board (2018) Vehicle automation report. Tech. Rep. HWY18MH010, Office of Highway Safety, Washington, D.C.

Bonnefon J-F, Shariff A, Rahwan I (2019) The trolley, the bull bar, and why engineers should care about the ethics of autonomous cars [point of view]. Proc IEEE 107:502–504

World Prison Brief (2020) An online database comprising information on prisons and the use of imprisonment around the world. https://www.prisonstudies.org/

Cheng J (2009) Virtual composer makes beautiful music and stirs controversy. https://arstechnica.com/science/news/2009/09/virtual-composer-makes-beautiful-musicand-stirs-controversy.ars

Chin J (2019) The death of data scientists. https://towardsdatascience.com/the-death-of-data-scientists-c243ae167701

Corbett-Davies S, Pierson E, Feller A, Goel S (2016) A computer program used for bail and sentencing decisions was labeled biased against blacks. It’s actually not that clear. Washington Post. https://www.washingtonpost.com/news/monkey-cage/wp/2016/10/17/can-an-algorithm-be-racist-our-analysis-is-more-cautious-than-propublicas/

Cowls J (2020) Deciding how to decide: six key questions for reducing AI’s democratic deficit. In: Burr C, Milano S (eds) The 2019 Yearbook of the Digital Ethics Lab, Digital ethics lab yearbook. Springer International Publishing, Cham. pp. 101–116. https://doi.org/10.1007/978-3-030-29145-7_7

Daly A et al. (2019) Artificial intelligence, governance and ethics: global perspectives. SSRN Electron J. https://www.ssrn.com/abstract=3414805

Dastin J (2018) Amazon scraps secret AI recruiting tool that showed bias against women. Reuters. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G

De Sutter P (2020) Automated decision-making processes: ensuring consumer protection, and free movement of goods and services. https://www.europarl.europa.eu/meetdocs/2014_2019/plmrep/COMMITTEES/IMCO/DV/2020/01-22/Draft_OQ_Automated_decision-making_EN.pdf

Derman E, Wilmott P (2009) The financial modelers’ manifesto. SSRN Electron J. http://www.ssrn.com/abstract=1324878 .

Dragičević T, Wheeler P, Blaabjerg F (2019) Artificial intelligence aided automated design for reliability of power electronic systems. IEEE Trans Power Electron 34:7161–7171

Dressel J, Farid H (2018) The accuracy, fairness, and limits of predicting recidivism. Sci Adv 4:eaao5580

Edwards L, Veale M (2018) Enslaving the algorithm: from a 'right to an explanation' to a 'right to better decisions'? IEEE Secur Priv 16:46–54

Floridi L, Cowls J (2019) A unified framework of five principles for AI in society. Harvard Data Science Review. https://hdsr.mitpress.mit.edu/pub/l0jsh9d1

Funtowicz SO, Ravetz JR (1990) Uncertainty and quality in science for policy. Springer Science, Business Media, Berlin, Heidelberg

Funtowicz S, Ravetz J (1997) Environmental problems, post-normal science, and extended peer communities. Études et Recherches sur les Systémes Agraires et le Développement. INRA Editions. pp. 169–175

Future of Earth Institute (2020) National and International AI Strategies. https://futureoflife.org/national-international-ai-strategies/

Gallagher S (2016) AI bests Air Force combat tactics experts in simulated dogfights. https://arstechnica.com/information-technology/2016/06/ai-bests-air-force-combat-tactics-experts-in-simulated-dogfights/

Goodall NJ (2014) Ethical decision making during automated vehicle crashes. Transportation Res Rec: J Transportation Res Board 2424:58–65

Goodall NJ (2016) Away from trolley problems and toward risk management. Appl Artif Intell 30:810–821

Greene D, Hoffmann AL, Stark L (2019) Better, nicer, clearer, fairer: a critical assessment of the movement for ethical artificial intelligence and machine learning. In Proceedings of the 52nd Hawaii International Conference on System Sciences

Hmoud B, Laszlo V (2019) Will artificial intelligence take over human-resources recruitment and selection? Netw Intell Stud VII:21–30

Hoerl RW (2019) The integration of big data analytics into a more holistic approach-JMP. Tech. Rep., SAS Institute. https://www.jmp.com/en_us/whitepapers/jmp/integration-of-big-data-analytics-holistic-approach.html

Jobin A, Ienca M, Vayena E (2019) Artificial intelligence: the global landscape of ethics guidelines. Nat Mach Intell 1:389–399

Karppi T (2018) 'The computer said so': on the ethics, effectiveness, and cultural techniques of predictive policing. Soc Media + Soc 4:205630511876829

Kongthon A, Sangkeettrakarn C, Kongyoung S, Haruechaiyasak C (2009) Implementing an online help desk system based on conversational agent. In: Proceedings of the International Conference on Management of Emergent Digital EcoSystems, MEDES ’09, vol. 69. ACM, New York, NY, USA. pp. 450–69:451. Event-place: France. https://doi.org/10.1145/1643823.1643908

de Laat PB (2018) Algorithmic decision-making based on machine learning from big data: can transparency restore accountability? Philos Technol 31:525–541

Laplace PS (1902) A philosophical essay on probabilities. J. Wiley, New York; Chapman, Hall, London. http://archive.org/details/philosophicaless00lapliala

Leslie D (2019) Understanding artificial intelligence ethics and safety. http://arxiv.org/abs/1906.05684

Loi M, Christen M (2019) How to include ethics in machine learning research. https://ercim-news.ercim.eu/en116/r-s/how-to-include-ethics-in-machine-learning-research

Majone G (1989) Evidence, argument, and persuasion in the policy process. Yale University Press, Yale

Markham AN, Tiidenberg K, Herman A (2018) Ethics as methods: doing ethics in the era of big data research-introduction. Soc Media + Soc 4:205630511878450

Massachusetts Institute of Technology (2019) Moral machine. Massachusetts Institute of Technology. http://moralmachine.mit.edu

McCarthy J, Minsky ML, Rochester N, Shannon CE (2006) A proposal for the dartmouth summer research project on artificial intelligence, August 31, 1955. AI Mag 27:12–12

Mittelstadt B (2019) Principles alone cannot guarantee ethical AI. Nat Mach Intell 1:501–507

Molnar C (2020) Interpretable machine learning (2020). https://christophm.github.io/interpretable-ml-book/

Morley J, Floridi L, Kinsey K, Elhalal A (2019) From what to how: an initial review of publicly available AI ethics tools, methods and research to translate principles into practices. Tech Rep. https://arxiv.org/abs/1905.06876

Neff G, Tanweer A, Fiore-Gartland B, Osburn L (2017) Critique and contribute: a practice-based framework for improving critical data studies and data science. Big Data 5:85–97

Nissenbaum H (1996) Accountability in a computerized society. Sci Eng Ethics 2:25–42

Northpointe (2012) Practitioner’s guide to COMPAS. northpointeinc.com/files/technical_documents/FieldGuide2_081412.pdf

O’Neil C (2016) Weapons of math destruction: how big data increases inequality and threatens democracy, 1st edn. Crown, New York

Rader E, Cotter K, Cho J (2018) Explanations as mechanisms for supporting algorithmic transparency. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18). ACM Press, Montreal, QC, Canada. pp. 1–13. http://dl.acm.org/citation.cfm?doid=3173574.3173677

Raji ID et al. (2020) Closing the AI accountability gap: defining an end-to-end framework for internal algorithmic auditing. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. Association for Computing Machinery. pp. 33–44. https://doi.org/10.1145/3351095.3372873

Ravetz JR (1987) Usable knowledge, usable ignorance: incomplete science with policy implications. Knowledge 9:87–116

Rêgo de Almeida PG, Denner dos Santos C, Silva Farias J (2020) Artificial intelligence regulation: a meta-framework for formulation and governance. In: Proceedings of the 53rd Hawaii International Conference on System Sciences (2020). http://hdl.handle.net/10125/64389

Roberts H et al. (2019) The Chinese approach to artificial intelligence: an analysis of policy and regulation. SSRN Electron J. https://www.ssrn.com/abstract=3469784

Rosen R (2005) Life itself: a comprehensive inquiry into the nature, origin, and fabrication of life. Columbia University Press, New York

Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. http://arxiv.org/abs/1811.10154

Russell SJ (2010) Artificial intelligence: a modern approach. Prentice Hall, Upper Saddle River, NJ

Saltelli A et al. (2008) Global sensitivity analysis: the primer. Wiley, Hoboken, NJ

Saltelli A (2019) A short comment on statistical versus mathematical modelling. Nat Commun 10:3870

Saltelli A (2020) Ethics of quantification or quantification of ethics? Futures 116:102509

Saltelli A, Funtowicz S (2014) When all models are wrong. Issues Sci Technol 30:79–85

Samuel AL (1959) Some studies in machine learning using the game of checkers. IBM J Res Dev 3:210–229

Sareen S, Saltelli A, Rommetveit K (2020) Ethics of quantification: illumination, obfuscation and performative legitimation. Palgrave Commun 6:1–5

Sears (2018) The role of artificial intelligence in the classroom. https://elearningindustry.com/artificial-intelligence-in-the-classroom-role

Sennaar K (2019) AI in agriculture-present applications and impact. https://emerj.com/ai-sector-overviews/ai-agriculture-present-applications-impact/

Van Der Sluijs JP et al. (2005) Combining quantitative and qualitative measures of uncertainty in model-based environmental assessment: The NUSAP system. Risk Anal 25:481–492

Smith A (2018) Franken-algorithms: the deadly consequences of unpredictable code. The Guardian. https://www.theguardian.com/technology/2018/aug/29/coding-algorithms-frankenalgos-program-danger

Sonnenburg S et al. (2007) The need for open source software in machine learning. J Mach Learn Res 8:2443–2466

Supiot A (2017) Governance by numbers: the making of a legal model of allegiance. Hart Publishing, Oxford; Portland, Oregon

Taleb NN (2007) The Black Swan: the impact of the highly improbable. Random House Publishing Group, New York, NY

Thimbleby H (2003) Explaining code for publication. Softw: Pract Experience 33:975–1001

Wallach W, Allen C (2008) Moral machines: teaching robots right from wrong. Oxford University Press, Oxford, USA

Watson D, Floridi L (2019) The explanation game: A formal framework for interpretable machine learning. https://papers.ssrn.com/abstract=3509737

Wiener N (1988) The human use of human beings: cybernetics and society. Da Capo Press, New York, N.Y, new edition

Wong YH et al. (2020) Deterrence in the age of thinking machines: product page. RAND Corporation. https://www.rand.org/pubs/research_reports/RR2797.html

Ye H et al. (2018) Machine learning for vehicular networks: recent advances and application examples. IEEE Vehicular Technol Mag 13:94–101

Yu H et al. (2018) Building ethics into artificial intelligence. http://arxiv.org/abs/1812.02953

Yurtsever E, Capito L, Redmill K, Ozguner U (2020) Integrating deep reinforcement learning with model-based path planners for automated driving. http://arxiv.org/abs/2002.00434

Acknowledgements

I would like to thank Kjetil Rommetveit, Andrea Saltelli and Siddharth Sareen for organising the workshop Ethics of Quantification, at which a previous version of this paper was presented, and the Centre for the Study of the Sciences and the Humanities of the University of Bergen for the travel grant. I also thank Thomas Hodgson, Jill Walter Rettberg, Elizabeth Chatterjee, Ragnar Fjelland and Marta Kuc-Czarnecka for their useful comments at this venue. Finally, I thank Stefán Thor Smith and Andrea Saltelli for their suggestions and constructive criticism on a draft version of the present manuscript.

Author information

Authors and affiliations.

School of the Built Environment, University of Reading, Reading, UK

Samuele Lo Piano

Open Evidence, Universitat Oberta de Catalunya, Barcelona, Catalonia, Spain

Corresponding author

Correspondence to Samuele Lo Piano .

Ethics declarations

Competing interests.

The author declares no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Lo Piano, S. Ethical principles in machine learning and artificial intelligence: cases from the field and possible ways forward. Humanit Soc Sci Commun 7, 9 (2020). https://doi.org/10.1057/s41599-020-0501-9

Download citation

Received : 29 January 2020

Accepted : 12 May 2020

Published : 17 June 2020

DOI : https://doi.org/10.1057/s41599-020-0501-9

Healthcare (Basel)

Machine-Learning-Based Disease Diagnosis: A Comprehensive Review

Md Manjurul Ahsan

1 School of Industrial and Systems Engineering, University of Oklahoma, Norman, OK 73019, USA

Shahana Akter Luna

2 Medicine & Surgery, Dhaka Medical College & Hospital, Dhaka 1000, Bangladesh; [email protected]

Zahed Siddique

3 Department of Aerospace and Mechanical Engineering, University of Oklahoma, Norman, OK 73019, USA; zsiddique@ou.edu

Globally, there is a substantial unmet need to diagnose various diseases effectively. The complexity of the different disease mechanisms and the underlying symptoms of the patient population present massive challenges in developing early diagnosis tools and effective treatments. Machine learning (ML), an area of artificial intelligence (AI), enables researchers, physicians, and patients to solve some of these issues. Based on relevant research, this review explains how machine learning (ML) is being used to help in the early identification of numerous diseases. First, a bibliometric analysis of the publications is carried out using data from the Scopus and Web of Science (WOS) databases. The bibliometric study of 1216 publications was undertaken to determine the most prolific authors, nations, organizations, and most cited articles. The review then summarizes the most recent trends and approaches in machine-learning-based disease diagnosis (MLBDD), considering the following factors: algorithm, disease type, data type, application, and evaluation metrics. Finally, we highlight key results and provide insights into future trends and opportunities in the MLBDD area.

1. Introduction

In medical domains, artificial intelligence (AI) primarily focuses on developing algorithms and techniques to determine whether a system's behavior is correct in disease diagnosis. Medical diagnosis identifies the diseases or conditions that explain a person's symptoms and signs. Typically, diagnostic information is gathered from the patient's history and physical examination [ 1 ]. The process is frequently difficult because many signs and symptoms are nonspecific and can only be interpreted by trained health professionals. Therefore, countries that lack enough health professionals for their populations, such as developing countries like Bangladesh and India, find it difficult to provide proper diagnostic procedures for the majority of their patients [ 2 ]. Moreover, diagnostic procedures often require medical tests that low-income people find expensive and difficult to afford.

As humans are prone to error, it is not surprising that patients are frequently overdiagnosed. Overdiagnosis gives rise to problems such as unnecessary treatment, which affect individuals' health and finances [ 3 ]. According to a 2015 report of the National Academies of Sciences, Engineering, and Medicine, the majority of people will encounter at least one diagnostic error during their lifespan [ 4 ]. Various factors may lead to misdiagnosis, including:

  • a lack of proper symptoms, which often go unnoticed
  • the presence of a rare disease
  • the disease being mistakenly omitted from consideration

Machine learning (ML) is used practically everywhere, from cutting-edge technology (such as mobile phones, computers, and robotics) to health care (i.e., disease diagnosis, safety). ML is gaining popularity in various fields, including disease diagnosis in health care. Many researchers and practitioners have illustrated the promise of machine-learning-based disease diagnosis (MLBDD), which is inexpensive and time-efficient [5]. Traditional diagnosis processes are costly, time-consuming, and often require human intervention. While traditional diagnosis techniques are restricted by the abilities of individual clinicians, ML-based systems have no such limitations, and machines do not get exhausted as humans do. As a result, methods may be developed to diagnose diseases even when health care facilities face an unexpectedly large number of patients. To create MLBDD systems, health care data such as images (i.e., X-ray, MRI) and tabular data (i.e., patients' conditions, age, and gender) are employed [6].

Machine learning (ML) is a subset of AI that uses data as an input resource [7]. Applying predetermined mathematical functions to these data yields a result (a classification or regression) that is frequently difficult for humans to obtain. For example, using ML, it is often simpler to locate malignant cells in a microscopic image, a task that is typically challenging to perform just by looking at the images. Furthermore, owing to advances in deep learning (a form of machine learning), recent studies report MLBDD accuracy above 90% [5]. Alzheimer's disease, heart failure, breast cancer, and pneumonia are just a few of the diseases that may be identified with ML. The emergence of ML algorithms in disease diagnosis domains illustrates the technology's utility in medical fields.

Open challenges in applying ML to medical domains, such as imbalanced data, ML interpretability, and ML ethics, are only a few of the many issues that remain difficult to handle [8]. In this paper, we provide a review that highlights novel uses of ML and DL in disease diagnosis and gives an overview of developments in this field, in order to shed some light on the current trends, approaches, and issues connected with ML in disease diagnosis. We begin by outlining several machine learning and deep learning techniques and the particular architectures used for detecting and categorizing various forms of disease.

The purpose of this review is to provide insights to current and future researchers and practitioners regarding machine-learning-based disease diagnosis (MLBDD) that will aid and enable them to choose the most appropriate and superior machine learning/deep learning methods, thereby increasing the likelihood of rapid and reliable disease detection and classification in diagnosis. Additionally, the review aims to identify potential research opportunities related to MLBDD. In general, the scope of this study is to answer the following questions:

  • 1. What are some of the diseases that researchers and practitioners are particularly interested in when evaluating data-driven machine learning approaches?
  • 2. Which MLBDD datasets are the most widely used?
  • 3. Which machine learning and deep learning approaches are presently used in health care to classify various forms of disease?
  • 4. Which architecture of convolutional neural networks (CNNs) is widely employed in disease diagnosis?
  • 5. How is the model’s performance evaluated? Is that sufficient?

In this paper, we summarize the different machine learning (ML) and deep learning (DL) methods utilized in various disease diagnosis applications. The remainder of the paper is structured as follows. In Section 2, we discuss the background and overview of ML and DL, whereas in Section 3, we detail the article selection technique. Section 4 includes the bibliometric analysis. In Section 5, we discuss the use of machine learning in various disease diagnoses, and in Section 6, we identify the most frequently utilized ML methods and data types based on the cited research. In Section 7, we discuss the findings, anticipated trends, and problems. Finally, Section 8 concludes the article.

2. Basics and Background

Machine learning (ML) is an approach that analyzes data samples to draw conclusions using mathematical and statistical methods, allowing machines to learn without explicit programming. The first major advance was recognized in 1959, when Arthur Samuel applied machine learning to games and pattern recognition algorithms that learn from experience. The core principle of ML is to learn from data in order to forecast or make decisions depending on the assigned task [9]. Thanks to ML technology, many time-consuming jobs may now be completed swiftly and with minimal effort. With the exponential expansion of computing power and data capacity, it is becoming simpler to train data-driven ML models to predict outcomes with near-perfect accuracy. Several papers survey the various types of ML approaches [10, 11].

ML algorithms are generally classified into three categories: supervised, unsupervised, and semisupervised [10]. However, ML algorithms can be divided into several subgroups based on different learning approaches, as shown in Figure 1. Some of the popular ML algorithms include linear regression, logistic regression, support vector machines (SVM), random forest (RF), and naïve Bayes (NB) [10].

Figure 1. Different types of machine learning algorithms.

2.1. Machine Learning Algorithms

This section provides a comprehensive review of the most frequently used machine learning algorithms in disease diagnosis.

2.1.1. Decision Tree

The decision tree (DT) algorithm follows divide-and-conquer rules. DT models whose target attribute takes a discrete set of values are known as classification trees; in these trees, leaves represent distinct class labels, whereas branches represent the combinations of features that lead to those labels. DT models whose target variable is continuous are called regression trees. C4.5 and EC4.5 are two famous and widely used DT algorithms [12]. DT is used extensively in the following referenced literature: [13, 14, 15, 16].
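As an illustration of this divide-and-conquer splitting, the sketch below (not from the cited works) implements a single-split "decision stump", the simplest possible tree: it scans one feature for the threshold that best separates two classes.

```python
def best_stump(values, labels):
    """Find the single threshold on one feature that minimizes
    misclassification when predicting class 1 at or above the threshold.

    `values` is one numeric feature per sample; `labels` are 0/1.
    Returns (threshold, number_of_errors).
    """
    best = (None, len(labels) + 1)  # (threshold, error count)
    for t in sorted(set(values)):
        # Predict 1 when value >= t; count disagreements with the labels
        errors = sum((v >= t) != bool(y) for v, y in zip(values, labels))
        if errors < best[1]:
            best = (t, errors)
    return best
```

A full tree would apply this search recursively to each resulting subset, over all features; the stump shows the core split-selection step in isolation.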

2.1.2. Support Vector Machine

Support vector machine (SVM) is a popular ML approach for classification- and regression-related challenges. SVM was introduced by Vapnik in the late twentieth century [17]. Apart from disease diagnosis, SVMs have been extensively employed in various other disciplines, including facial expression recognition, protein fold and remote homology detection, speech recognition, and text classification. Supervised ML algorithms cannot operate on unlabeled data, but by using a hyperplane to find the clustering among the data, SVM variants can categorize unlabeled data as well. However, a standard SVM cannot separate data that are not linearly separable; to overcome this, selecting an appropriate kernel and its parameters are two key factors when applying SVM in data analysis [11].
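The kernel choice mentioned above determines how similarity between samples is measured. A minimal sketch of the widely used Gaussian (RBF) kernel follows; the `gamma` value here is an illustrative default, not taken from any cited study, and in practice it would be tuned along with the SVM's other parameters.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: similarity is 1 for identical points
    and decays exponentially with the squared Euclidean distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

Replacing the plain dot product with such a kernel lets the SVM draw a linear boundary in an implicit higher-dimensional space, which appears nonlinear in the original feature space.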

2.1.3. K-Nearest Neighbor

K-nearest neighbor (KNN) classification is a nonparametric classification technique invented in 1951 by Evelyn Fix and Joseph Hodges. KNN is suitable for classification as well as regression analysis. The outcome of KNN classification is a class membership: an item is assigned to a class by a voting mechanism among its k nearest neighbors, with Euclidean distance typically used to measure the distance between two data samples. In regression analysis, the predicted value is the average of the values of the k nearest neighbors [18].
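A minimal sketch of the voting mechanism described above; the toy samples and labels in the usage note are illustrative, not drawn from the reviewed studies.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors,
    using Euclidean distance to measure closeness."""
    # Distance from the query point to every training sample
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote among the k closest labels
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

For example, with three samples labeled "benign" clustered near (1, 1) and three labeled "malignant" near (8, 8), a query at (1.5, 1.5) is voted "benign" by all three of its nearest neighbors.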

2.1.4. Naïve Bayes

The naïve Bayes (NB) classifier is a Bayesian probabilistic classifier. Given a record or data point, it forecasts a membership probability for each class, and the most probable class is the one with the greatest probability. Rather than a bare prediction, the NB classifier thus projects a likelihood for each class [11].
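A minimal sketch of how NB turns class priors and per-feature likelihoods into the membership probabilities described above; the prior and likelihood tables in the usage note are made-up illustrative numbers, not estimates from any cited dataset.

```python
def nb_posteriors(priors, likelihoods, evidence):
    """Naive Bayes scoring: P(class) * product of P(feature=value | class)
    for each observed feature, normalized so the scores sum to 1."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature, value in evidence.items():
            score *= likelihoods[cls][feature][value]
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}
```

For instance, with priors of 0.1 (disease) and 0.9 (healthy) and fever being four times more likely given disease, observing a fever raises the disease posterior but the healthy class still wins, matching Bayes' rule by hand.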

2.1.5. Logistic Regression

Logistic regression (LR) is an ML approach used to solve classification issues. The LR model has a probabilistic framework, with predicted values ranging from 0 to 1. Examples of LR-based ML include spam email identification, online fraud transaction detection, and malignant tumor detection. LR relies on the sigmoid (logistic) function, which maps every real number to a value between 0 and 1 [19].
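A minimal sketch of the sigmoid mapping and an LR prediction built on it; the weights and bias would normally be learned from data, and here are simply placeholders.

```python
import math

def sigmoid(z):
    """Map any real number into the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Logistic regression: probability that the sample belongs to class 1,
    from a weighted sum of features passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)
```

With all weights at zero the model is maximally uncertain and outputs 0.5; large positive or negative scores are squashed toward 1 or 0, which is what makes the output interpretable as a probability.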

2.1.6. AdaBoost

Yoav Freund and Robert Schapire developed Adaptive Boosting, popularly known as AdaBoost. AdaBoost combines multiple weak classifiers into a single strong classifier. It works by giving greater weight to samples that are harder to classify and less weight to those that are already well categorized. It may be used for classification as well as regression analysis [20].
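A minimal sketch of the re-weighting step described above for one boosting round in the binary case (not tied to any cited implementation): samples the weak learner misclassified gain weight, correctly classified samples lose weight, and alpha is that learner's vote in the final ensemble.

```python
import math

def adaboost_round(weights, correct):
    """One AdaBoost round. `weights` are the current sample weights and
    `correct` marks which samples the weak learner classified correctly.
    Assumes the weighted error is strictly between 0 and 1.
    Returns (alpha, renormalized weights)."""
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - err) / err)  # the learner's ensemble vote
    # Misclassified samples are up-weighted, correct ones down-weighted
    new_w = [w * math.exp(alpha if not ok else -alpha)
             for w, ok in zip(weights, correct)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]
```

Starting from uniform weights over four samples with one mistake, the misclassified sample ends up carrying half of the total weight, so the next weak learner is forced to focus on it.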

2.2. Deep Learning Overview

Deep learning (DL) is a subfield of machine learning (ML) that employs multiple layers to extract both higher- and lower-level features from input data (i.e., images, numerical values, categorical values). The majority of contemporary DL models are built on artificial neural networks (ANN), notably convolutional neural networks (CNN), which may be integrated with other DL models, including generative models, deep belief networks, and Boltzmann machines. Deep learning may be classified into three types: supervised, semisupervised, and unsupervised. Deep neural networks (DNN), deep reinforcement learning, and recurrent neural networks (RNN) are some of the most prominent DL architectures [21].

Each layer in a DL model learns to transform its input into a representation for the succeeding layer while learning distinct data attributes. For example, in image recognition applications the raw input may be a pixel matrix, and the first layers may detect the image's edges. The second layer may then compose and encode parts such as the nose and eyes, and the third layer may recognize the face by merging the information gathered from the previous two layers [6].

In medical fields, DL has enormous promise. Radiology and pathology are two well-known medical fields that have widely used DL in disease diagnosis over the years [ 22 ]. Furthermore, collecting valuable information from molecular state and determining disease progression or therapy sensitivity are practical uses of DL that are frequently unidentified by human investigations [ 23 ].

Convolutional Neural Network

Convolutional neural networks (CNNs) are a subclass of artificial neural networks (ANNs) that are extensively used in image processing. CNNs are widely employed in face identification, text analysis, human organ localization, and biological image detection or recognition [24]. Since the initial development of CNNs in 1989, many CNN variants have been proposed that have performed exceptionally well in disease diagnosis over the last three decades. A CNN architecture comprises three parts: an input layer, hidden layers, and an output layer. The intermediate levels of any feedforward network are known as hidden layers, and the number of hidden layers varies depending on the type of architecture. The hidden layers perform convolutions, which are dot products of the convolution kernel with patches of the input matrix. Each convolutional layer produces feature maps that are used as input by the subsequent layers. The hidden layers are followed by further layers, such as pooling and fully connected layers [21]. Several CNN models have been proposed throughout the years, and the most extensively used and popular CNN models are shown in Figure 2.

Figure 2. Some of the most well-known CNN models, along with their development time frames.

In general, ML and DL have grown substantially throughout the years. The increased computational capability of computers and the enormous amount of available data inspire academics and practitioners to employ ML/DL more efficiently. A schematic overview of machine learning and deep learning algorithms and their development chronology is shown in Figure 3, which may be a helpful resource for future researchers and practitioners.

Figure 3. Illustration of the machine learning and deep learning algorithm development timeline.

2.3. Performance Evaluations

This section describes the performance measures used in the referenced literature. Performance indicators, including accuracy, precision, recall, and F1 score, are widely employed in disease diagnosis. For example, a lung cancer case can be categorized as a true positive (TP) or true negative (TN) if the individual is diagnosed correctly, and as a false positive (FP) or false negative (FN) if misdiagnosed. The most widely used metrics are described below [10].

Accuracy (Acc): Accuracy denotes the fraction of correctly identified instances among all instances and can be calculated using the following formula:

Acc = (TP + TN) / (TP + TN + FP + FN)

Precision (Pn): Precision is measured as the proportion of correctly predicted positives among all predicted positive observations:

Pn = TP / (TP + FP)

Recall (Rc): The proportion of all relevant (actually positive) instances that the algorithm correctly recognizes is referred to as recall:

Rc = TP / (TP + FN)

Sensitivity (Sn): Sensitivity denotes the true positive rate over all actually positive instances and can be measured as follows:

Sn = TP / (TP + FN)

Specificity (Sp): Specificity identifies how many true negatives are correctly identified and is calculated as follows:

Sp = TN / (TN + FP)

F-measure: The F1 score is the harmonic mean of precision and recall. The highest F1 score is 1, indicating perfect precision and recall:

F1 = 2 · (Pn · Rc) / (Pn + Rc)

Area under curve (AUC): The area under the ROC curve summarizes the model's behavior across different decision thresholds. The AUC can be calculated as follows:

AUC = (Σ_i R_i − l_p(l_p + 1)/2) / (l_p · l_n)

where l_p and l_n denote the numbers of positive and negative data samples and R_i is the rank of the i-th positive sample.
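The threshold-based metrics above can be computed directly from confusion-matrix counts; a minimal sketch (function and argument names are illustrative):

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard diagnostic metrics from confusion-matrix counts.
    Assumes each denominator is nonzero."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # identical to sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}
```

Note how, with 8 true positives, 85 true negatives, 5 false positives, and 2 false negatives, accuracy is a high 0.93 while precision is only about 0.62; this gap is exactly why accuracy alone is insufficient on imbalanced medical datasets.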

3. Article Selection

3.1. Identification

The Scopus and Web of Science (WOS) databases were utilized to find original research publications. Owing to their high quality and peer-reviewed paper indexes, Scopus and WOS are prominent databases for article searching, and many academics and scholars use them for systematic reviews [25, 26]. Using keywords along with Boolean operators, the title search was carried out as follows:

"disease" AND ("diagnosis" OR "support vector machine" OR "SVM" OR "KNN" OR "K-nearest neighbor" OR "logistic regression" OR "K-means clustering" OR "random forest" OR "RF" OR "AdaBoost" OR "XGBoost" OR "decision tree" OR "neural network" OR "NN" OR "artificial neural network" OR "ANN" OR "convolutional neural network" OR "CNN" OR "deep neural network" OR "DNN" OR "machine learning" OR "adversarial network" OR "GAN").

The initial search yielded 16,209 and 2129 items, respectively, from Scopus and Web of Science (WOS).

3.2. Screening

Once the search period was narrowed to 2012–2021 and only peer-reviewed English papers were retained, the total number of articles decreased to 9117 for Scopus and 1803 for WOS.

3.3. Eligibility and Inclusion

Publications were chosen for further examination if they were open-access journal articles. There were 1216 such full-text articles (724 from the Scopus database and 492 from WOS). Bibliometric analysis was performed on all 1216 publications. One investigator (Z.S.) imported the metadata of the 1216 articles as an Excel CSV file for further analysis. Excel duplicate-detection functions were used to identify and eliminate duplicates. Two independent reviewers (M.A. and Z.S.) then examined the titles and abstracts of the remaining 1192 publications. Disagreements were settled through discussion. We omitted studies that were relevant to machine learning but not to disease diagnosis, or vice versa.

After screening the titles and abstracts, the full text of 102 papers was examined, and all 102 articles satisfied the inclusion requirements. Factors that led to an article's exclusion during full-text screening include:

  • 1. Inaccessibility of the entire text
  • 2. Nonhuman studies, book chapters, reviews
  • 3. Incomplete information related to test result

Figure 4 shows the flow diagram of the systematic article selection procedure used in this study.

Figure 4. MLBDD article selection procedure used in this study.

4. Bibliometric Analysis

The bibliometric study in this section was carried out using reference literature gathered from the Scopus and WOS databases. The bibliometric study examines publications in terms of the subject area, co-occurrence network, year of publication, journal, citations, countries, and authors.

4.1. Subject Area

Machine-learning-based disease diagnosis has been explored in many research disciplines throughout the years. Figure 5 depicts a schematic representation of machine-learning-based disease detection spread across several research fields. According to the graph, computer science (40%) and engineering (31.2%) are the two dominant fields that have concentrated most vigorously on MLBDD.

Figure 5. Distribution of articles by subject area.

4.2. Co-Occurrence Network

The co-occurrence of keywords provides an overview of how keywords are interconnected or used together by researchers. Figure 6 displays the co-occurrence network of the articles' keywords and their connections, generated with the VOSviewer software. The figure shows that the significant clusters include neural networks (NN), decision trees (DT), machine learning (ML), and logistic regression (LR). Each cluster is also connected with other keywords that fall under that category. For instance, the NN cluster contains support vector machine (SVM), Parkinson's disease, and classification.

Figure 6. Bibliometric map representing the co-occurrence analysis of keywords in network visualization.

4.3. Publication by Year

Exponential growth in journal publications has been observed since 2017. Figure 7 displays the number of publications between 2012 and 2021 based on the Scopus and WOS data. Note that while the figure may not depict the MLBDD's full contribution, it does illustrate the growing influence of MLBDD over time.

Figure 7. Publications on machine-learning-based disease diagnosis (MLBDD) by year.

4.4. Publication by Journal

We investigated the most prolific journals in MLBDD domains based on our referenced literature. The top ten journals and the number of articles they published in the last ten years are depicted in Figure 8. IEEE Access and Scientific Reports are the most productive journals, having published 171 and 133 MLBDD articles, respectively.

Figure 8. Publications by journal.

4.5. Publication by Citations

Citations are one of the primary indicators of an article's impact. Here, we have identified the top ten cited articles using the R Studio tool. Table 1 summarizes the articles that achieved the highest citation counts between 2012 and 2021. Note that Google Scholar and other online databases may have different indexing procedures and times; therefore, citation counts elsewhere may differ from those shown in this study. The table shows that the article by [27] earned the most citations (257), with 51.4 citations per year, followed by Gray [28]'s article, which obtained 218 citations. The authors included in Table 1 may be regarded as among the prominent contributors to MLBDD.

Table 1. Top ten cited papers published in MLBDD between 2012 and 2021, based on the Scopus and WOS databases.

4.6. Publication by Countries

Figure 9 shows that China published the most publications in MLBDD, with a total of 259 articles. The USA and India rank second and third, having published 139 and 103 MLBDD-related papers, respectively. Interestingly, four of the top ten most productive countries are from Asia: China, India, Korea, and Japan.

Figure 9. Top ten countries contributing to the MLBDD literature.

4.7. Publication by Author

According to Table 2, the author Kim J published the most publications (20 out of 1216). Wang Y and Li J ranked second and third, publishing 19 and 18 articles, respectively. As shown in Table 2, the number of papers produced by the top ten authors ranges between 15 and 20.

Table 2. Top ten authors based on total number of publications.

5. Machine Learning Techniques for Different Disease Diagnosis

Many academics and practitioners have used machine learning (ML) approaches in disease diagnosis. This section describes the types of machine-learning-based disease diagnosis (MLBDD) that have received much attention because of their importance and severity. For example, due to the global relevance of COVID-19, several studies have concentrated on COVID-19 detection using ML from 2020 to the present, and such work receives greater priority in our study. Severe diseases such as heart disease, kidney disease, breast cancer, diabetes, Parkinson's, Alzheimer's, and COVID-19 are discussed individually, while other diseases are covered briefly under "other diseases".

5.1. Heart Disease

Many researchers and practitioners have used machine learning (ML) approaches to identify cardiac disease [37, 38]. Ansari et al. (2011), for example, offered an automated coronary heart disease diagnosis system based on neurofuzzy integrated systems that yields around 89% accuracy [37]. One of the study's significant weaknesses is the lack of a clear explanation of how the proposed technique would work in various scenarios, such as multiclass classification, big data analysis, and imbalanced class distributions. Furthermore, there is no explanation of the credibility of the model's accuracy; such explainability has lately been strongly encouraged in medical domains, particularly to help users who are not from those domains understand the approach.

Rubin et al. (2017) used deep-convolutional-neural-network-based approaches to detect irregular cardiac sounds. The authors adjusted the loss function to improve sensitivity and specificity on the training dataset. Their model was tested in the 2016 PhysioNet computing competition, where it finished second with a final specificity of 0.95 and sensitivity of 0.73 [39].

Aside from that, deep-learning (DL)-based algorithms have lately received attention for detecting cardiac disease. Miao and Miao (2018), for example, offered a DL-based technique for diagnosing cardiotocographic fetal health based on multiclass morphologic patterns. Their model differentiates and categorizes the morphologic patterns of individuals suffering from pregnancy complications. Their preliminary computational findings include an accuracy of 88.02%, a precision of 85.01%, and an F-score of 0.85 [40]. In that study, they employed multiple dropout strategies to address overfitting, which ultimately increased training time, a tradeoff they acknowledged for higher accuracy.

Although ML applications have been widely employed in heart disease diagnosis, no research has yet addressed the issues associated with imbalanced data in multiclass classification. Furthermore, the explainability of the models' final predictions is lacking in most cases. Table 3 summarizes some of the cited publications that employed ML and DL approaches in the diagnosis of cardiac disease. Further information about machine-learning-based cardiac disease diagnosis can be found in [5].

Table 3. Referenced literature that considered machine-learning-based heart disease diagnosis.

5.2. Kidney Disease

Kidney disease, often known as renal disease, refers to nephropathy or kidney damage. Patients with kidney disease have decreased kidney function, which can lead to kidney failure if not treated promptly. According to the National Kidney Foundation, 10% of the world's population has chronic kidney disease (CKD), and millions die each year due to insufficient treatment. The recent advancement of ML- and DL-based kidney disease diagnosis may offer a possibility for countries that cannot afford kidney disease diagnostic tests [49]. For instance, Charleonnan et al. (2016) used publicly available datasets to evaluate four ML classifiers, K-nearest neighbors (KNN), support vector machine (SVM), logistic regression (LR), and decision tree, and obtained accuracies of 98.1%, 98.3%, 96.55%, and 94.8%, respectively [50]. Aljaaf et al. (2018) conducted a similar study. The authors tested different ML algorithms, including RPART, SVM, LOGR, and MLP, on a CKD dataset comparable to that used by [50], and found that MLP performed best (98.1%) in identifying chronic kidney disease [51]. To identify chronic kidney disease, Ma et al. (2020) utilized a collection of datasets containing data from many sources [52]. Their suggested heterogeneous modified artificial neural network (HMANN) model obtained an accuracy of 87–99%.

Table 4 summarizes some of the cited publications that employed ML and DL approaches to diagnose kidney disease.

Table 4. Referenced literature that considered machine-learning-based kidney disease diagnosis.

5.3. Breast Cancer

Many scholars in the medical field have proposed machine-learning (ML)-based breast cancer analysis as a potential solution to early-stage diagnosis. Miranda and Felipe (2015), for example, proposed fuzzy-logic-based computer-aided diagnosis systems for breast cancer categorization. The advantage of fuzzy logic over other classic ML techniques is that it can minimize computational complexity while simulating the expert radiologist's reasoning and style. If the user inputs parameters such as contour, shape, and density, the algorithm offers a cancer categorization based on the preferred method [57]. The proposed model had an accuracy of roughly 83.34%. The authors employed an approximately equal ratio of images for the experiment, which resulted in improved accuracy and unbiased performance. However, as the study did not examine the interpretation of the results in an explainable manner, it is difficult to conclude that the overall accuracy reflects the true accuracy for both benign and malignant classifications. Furthermore, no confusion matrix is presented to demonstrate the model's actual predictions for each class.

Zheng et al. (2014) presented hybrid strategies for diagnosing breast cancer utilizing k-means clustering (KMC) and SVM. Their proposed model considerably decreased the dimensionality difficulties and attained an accuracy of 97.38% on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset [58]. That dataset is normally distributed and has 32 features divided into 10 categories. It is difficult to conclude that their suggested model would perform as well on a dataset with an unequal class ratio, which may contain missing values as well.

To determine the best ML model, Asri et al. (2016) applied various ML approaches, such as SVM, DT (C4.5), NB, and KNN, to the Wisconsin Breast Cancer (WBC) dataset. According to their findings, SVM outperformed all other ML algorithms, obtaining an accuracy of 97.13% [59]. However, if the same experiment were repeated on a different database, the results might differ. Furthermore, experimental results accompanied by ground truth values would provide a more precise basis for determining which ML model is best.

Mohammed et al. (2020) conducted a nearly identical study. The authors employed three ML algorithms, DT (J48), NB, and sequential minimal optimization (SMO), to find the best ML method, and the experiment was conducted on two popular datasets: the WBC and breast cancer datasets. One interesting aspect of this research is that it focused on data imbalance issues and minimized the imbalance problem through resampling and data labeling procedures. Their findings showed that the SMO algorithm outperformed the other two classifiers, attaining more than 95% accuracy on both datasets [60]. However, to reduce the imbalance ratio, they applied resampling procedures numerous times, potentially lowering data diversity. As a result, the performance of those three ML methods may suffer on a dataset that is not normally distributed or that remains imbalanced.

Assegie (2021) used the grid search approach to identify the best k-nearest neighbor (KNN) settings. The investigation showed that parameter tuning had a considerable impact on the model's performance: by fine-tuning the settings, it is feasible to reach 94.35% accuracy, whereas the default KNN achieved around 90% accuracy [61].

To detect breast cancer, Bhattacherjee et al. (2020) employed a backpropagation neural network (BNN). The experiment was carried out on the WBC dataset with nine features, and they achieved 99.27% accuracy [62]. Alshayeji et al. (2021) used the WBCD and WDBI datasets to develop a shallow ANN model for classifying breast cancer tumors. The authors demonstrated that the suggested model could classify tumors with up to 99.85% accuracy without feature selection or algorithm tuning [63].

Sultana et al. (2021) detected breast cancer using different ANN architectures on the WBC dataset. They employed a variety of NN architectures, including the multilayer perceptron (MLP) neural network, the Jordan/Elman NN, the modular neural network (MNN), the generalized feedforward neural network (GFFNN), the self-organizing feature map (SOFM), the SVM neural network, the probabilistic neural network (PNN), and the recurrent neural network (RNN). Their final computational results demonstrate that the PNN, with 98.24% accuracy, outperforms the other NN models utilized in that study [64]. However, like many other investigations, this study lacks interpretability, as it does not indicate which features are most important during the prediction phase.

Deep learning (DL) was also used by Ghosh et al. (2021), who trained seven DL models on the WBC dataset: ANN, CNN, GRU, LSTM, MLP, PNN, and RNN. The long short-term memory (LSTM) and gated recurrent unit (GRU) models demonstrated the best performance among all DL models, achieving an accuracy of roughly 99% [65]. Table 5 summarizes some of the referenced literature that used ML and DL techniques in breast cancer diagnosis.

Table 5. Referenced literature that considered machine-learning-based breast cancer disease diagnosis.

5.4. Diabetes

According to the International Diabetes Federation (IDF), over 382 million individuals worldwide currently have diabetes, and that number is anticipated to increase to 629 million by 2045 [71]. Numerous studies have presented ML-based systems for detecting diabetes patients. For example, Kandhasamy and Balamurali (2015) compared ML classifiers (J48 DT, KNN, RF, and SVM) for classifying patients with diabetes mellitus. The experiment was conducted on the UCI Diabetes dataset, and the KNN (K = 1) and RF classifiers obtained near-perfect accuracy [72]. However, one disadvantage of this work is that it used a simplified diabetes dataset with only eight binary-classified parameters; obtaining 100% accuracy on a less difficult dataset is unsurprising. Furthermore, the experiment includes no discussion of how the algorithms influence the final prediction or how the results should be viewed from a nontechnical position.

Yahyaoui et al. (2019) presented a Clinical Decision Support System (CDSS) to aid physicians or practitioners with Diabetes diagnosis. To reach this goal, the study utilized a variety of ML techniques, including SVM, RF, and a deep convolutional neural network (CNN). RF outperformed all other algorithms in their computations, obtaining an accuracy of 83.67%, while DL and SVM scored 76.81% and 65.38% accuracy, respectively [ 73 ].

Naz and Ahuja (2020) employed a variety of ML techniques, including artificial neural networks (ANN), NB, DT, and DL, to analyze the open-source PIMA Diabetes dataset. Their study indicates that DL is the most accurate method for detecting the development of diabetes, with an accuracy of approximately 98.07% [ 71 ]. The PIMA dataset is one of the most thoroughly investigated benchmark datasets, making it easy to apply both conventional and sophisticated ML algorithms; attaining high accuracy on the PIMA Indian dataset is therefore not surprising. Furthermore, the paper makes no mention of interpretability issues, or of how the model would perform on an unbalanced dataset or one with a significant number of missing values. As is widely recognized in healthcare, many types of data are generated that are not labeled, categorized, and preprocessed in the same way as the PIMA Indian dataset. It is therefore critical to examine the fairness, unbiasedness, dependability, and interpretability of the algorithms while developing a CDSS, especially when a considerable amount of information is missing in a multiclass classification dataset.

Ashiquzzaman et al. (2017) developed a deep learning strategy to address the issue of overfitting in diabetes datasets. The experiment was carried out on the PIMA Indian dataset and yielded an accuracy of 88.41%. The authors claimed that performance improved significantly when dropout techniques were utilized, as the overfitting problem was reduced [ 74 ]. Overuse of the dropout approach, on the other hand, lengthens overall training duration. Since they did not address this concern in their study, it is difficult to assess whether their proposed model is optimal in terms of computational time.
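The dropout idea the authors relied on can be sketched in a few lines. The following minimal, illustrative implementation of inverted dropout (not the authors' code) zeroes random units during training and rescales the survivors so the expected activation is unchanged, while leaving inference untouched:

```python
import random

def dropout(activations, rate=0.5, training=True, rng=random):
    """Inverted dropout: randomly zero units during training and rescale
    the survivors so the expected activation stays the same."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

random.seed(0)
layer = [0.8, 0.1, 0.5, 0.9]
print(dropout(layer, rate=0.5))                   # some units zeroed, rest doubled
print(dropout(layer, rate=0.5, training=False))   # inference: unchanged
```

Because each forward pass sees a different thinned network, more passes are needed to converge, which is the training-time cost the paragraph above alludes to.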

Alhassan et al. (2018) introduced the King Abdullah International Research Center for Diabetes (KAIMRCD) dataset, which includes data from 14k people and is the world’s largest diabetic dataset. In that experiment, the authors presented a CDSS architecture based on LSTM- and GRU-based deep neural networks, which obtained up to 97% accuracy [ 75 ]. Table 6 highlights some of the relevant publications that employed ML and DL approaches in the diagnosis of diabetic disease.

Referenced literature that considered machine-learning-based diabetic disease diagnosis.

5.5. Parkinson’s Disease

Parkinson’s disease is one of the conditions that has received a great amount of attention in the ML literature. It is a slow-progressing chronic neurological disorder. When dopamine-producing neurons in certain parts of the brain are harmed or die, people have difficulty speaking, writing, walking, and doing other core activities [ 80 ]. Several ML-based approaches have been proposed. For instance, Sriram et al. (2013) used KNN, SVM, NB, and RF algorithms to develop intelligent Parkinson’s disease diagnosis systems. Their computational results show that, among all the algorithms, RF demonstrates the best performance (90.26% accuracy) and NB the worst (69.23% accuracy) [ 81 ].

Esmaeilzadeh et al. (2018) proposed a deep CNN-based model to diagnose Parkinson’s disease and achieved almost 100% accuracy on the training and test sets [ 82 ]. However, the study made no mention of any overfitting difficulties. Furthermore, the experimental results do not provide a good interpretation of the final classification and regression, which is now widely expected, particularly in CDSS. Grover et al. (2018) also used DL-based approaches on UCI’s Parkinson’s telemonitoring voice dataset. Their experiment using a DNN achieved around 81.67% accuracy in diagnosing patients with Parkinson’s disease symptoms [ 80 ].

Warjurkar and Ridhorkar (2021) conducted a thorough study on the performance of ML-based approaches in decision support systems that can both detect brain tumors and diagnose Parkinson’s patients. Based on their findings, it was clear that boosted logistic regression surpassed all other models, attaining 97.15% accuracy in identifying Parkinson’s disease patients. In tumor segmentation, however, the Markov random field approach performed best, obtaining an accuracy of 97.4% [ 83 ]. Parkinson’s disease diagnosis using ML and DL approaches is summarized in Table 7 , which includes a number of references to the relevant research.

Referenced literature that considered machine-learning-based Parkinson’s disease diagnosis.

5.6. COVID-19

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, also known as COVID-19, has become humanity’s greatest challenge in contemporary history. Although vaccine distribution was accelerated because of the global emergency, vaccines remained unavailable to the majority of people for much of the crisis [ 88 ]. The COVID-19 Omicron strain’s high transmission rates and vaccine resistance add an extra layer of concern. The gold standard for diagnosing COVID-19 infection is currently Real-Time Reverse Transcription-Polymerase Chain Reaction (RT-PCR) [ 89 , 90 ]. Throughout the pandemic, researchers advocated other technologies, such as chest X-rays and Computed Tomography (CT) combined with machine learning and artificial intelligence, to aid in the early detection of people who might be infected. For example, Chen et al. (2020) proposed a UNet++ model employing CT images from 51 COVID-19 and 82 non-COVID-19 patients and achieved an accuracy of 98.5% [ 91 ]. Ardakani et al. (2020) used a small dataset of 108 COVID-19 and 86 non-COVID-19 patients to evaluate ten different DL models and achieved a 99% overall accuracy [ 92 ]. Wang et al. (2020) built an inception-based model with a larger dataset containing 453 CT scan images and achieved 73.1% accuracy; however, the model’s network activity and region of interest were poorly explained [ 93 ]. Li et al. (2020) suggested the COVNet model and obtained 96% accuracy utilizing a large dataset of 4356 chest CT images of Pneumonia patients, 1296 of which were verified COVID-19 cases [ 94 ].

In parallel, several studies investigated and advised screening COVID-19 patients utilizing chest X-ray images, with major contributions in [ 95 , 96 , 97 ]. For example, Hemdan et al. (2020) used a small dataset of only 50 images to identify COVID-19 patients from chest X-ray images, achieving accuracies of 90% and 95% using the VGG19 and ResNet50 models, respectively [ 95 ]. Using a dataset of 100 chest X-ray images, Narin et al. (2021) distinguished COVID-19 patients from those with Pneumonia with 86% accuracy [ 97 ].

In addition, in order to develop more robust screening systems, other studies considered larger datasets. For example, Brunese et al. (2020) employed 6505 images with a data ratio of 1:1.17, with 3003 images classified as showing COVID-19 symptoms and 3520 as “other patients” [ 98 ]. With a dataset of 5941 images, Ghoshal and Tucker (2020) achieved 92.9% accuracy [ 99 ]. However, neither study looked at how their proposed models would work with data that was severely unbalanced and had mismatched class ratios. Apostolopoulos and Mpesiana (2020) employed a CNN-based Xception model on an imbalanced dataset of 284 COVID-19 and 967 non-COVID-19 patient chest X-ray images and achieved 89.6% accuracy [ 100 ].

The following Table 8 summarizes some of the relevant literature that employed ML and DL approaches to diagnose COVID-19 disease.

Referenced literature that considered machine-learning-based COVID-19 disease diagnosis.

5.7. Alzheimer’s Disease

Alzheimer’s disease is a brain illness that often begins slowly but progresses over time, and it affects 60–70% of those diagnosed with dementia [ 103 ]. Alzheimer’s disease symptoms include language problems, confusion, mood changes, and other behavioral disorders. Body functions gradually deteriorate, and the usual life expectancy is three to nine years after diagnosis. Early diagnosis, on the other hand, may help patients take the required actions and enter suitable treatment as soon as possible, which also raises the possibility of longer life expectancy. Machine learning and deep learning have shown promising outcomes in detecting Alzheimer’s disease patients over the years. For instance, Neelaveni and Devasana (2020) proposed a model that can detect Alzheimer’s patients using SVM and DT, achieving accuracies of 85% and 83%, respectively [ 104 ]. Collij et al. (2016) also used SVM for single-subject Alzheimer’s disease and mild cognitive impairment (MCI) prediction and achieved an accuracy of 82% [ 105 ].

Multiple algorithms have been adopted and tested in developing ML-based Alzheimer’s disease diagnosis systems. For example, Vidushi and Shrivastava (2019) experimented with Logistic Regression (LR), SVM, DT, ensemble Random Forest (RF), and AdaBoost boosting and achieved accuracies of 78.95%, 81.58%, 81.58%, 84.21%, and 84.21%, respectively [ 106 ]. Many studies adopted CNN-based approaches to detect Alzheimer’s patients, as CNN demonstrates robust results in image processing compared to other existing algorithms. For instance, Ahmed et al. (2020) proposed a CNN model for earlier diagnosis and classification of Alzheimer’s disease. On a dataset consisting of 6628 MRI images, the proposed model achieved 99% accuracy [ 107 ]. Nawaz et al. (2020) proposed deep feature-based models and achieved an accuracy of 99.12% [ 108 ]. Additionally, studies conducted by Haft-Javaherian et al. (2019) [ 109 ] and Aderghal et al. (2017) [ 110 ] also demonstrate the robustness of CNN-based approaches in Alzheimer’s disease diagnosis. ML and DL approaches employed in the diagnosis of Alzheimer’s disease are summarized in Table 9 .

Referenced literature that considered Machine Learning-based Alzheimer disease diagnosis.

5.8. Other Diseases

Beyond the diseases mentioned above, ML and DL have been used to identify various other diseases. Big data and increasing computer processing power are two key reasons for this increased use. For example, Mao et al. (2020) used Decision Tree (DT) and Random Forest (RF) for disease classification based on eye movement [ 114 ]. Nosseir and Shawky (2019) evaluated KNN and SVM to develop automatic skin disease classification systems; the best performance was observed using KNN, with an accuracy of 98.22% [ 115 ]. Khan et al. (2020) employed CNN-based approaches such as VGG16 and VGG19 to classify multimodal Brain tumors. The experiment was carried out on three publicly available image datasets, BraTs2015, BraTs2017, and BraTs2018, and achieved 97.8%, 96.9%, and 92.5% accuracy, respectively [ 116 ]. Amin et al. (2018) conducted a similar experiment utilizing the RF classifier for tumor segmentation. The authors achieved 98.7%, 98.7%, 98.4%, 90.2%, and 90.2% accuracy using the BRATS 2012, BRATS 2013, BRATS 2014, BRATS 2015, and ISLES 2015 datasets, respectively [ 117 ].

Dai et al. (2019) proposed a CNN-based model for an application to detect Skin cancer. The authors experimented on a publicly available dataset, HAM10000, and achieved 75.2% accuracy [ 118 ]. Daghrir et al. (2020) evaluated KNN, SVM, CNN, and majority voting on the ISIC (International Skin Imaging Collaboration) dataset to detect Melanoma skin cancer. The best result was found using majority voting (88.4% accuracy) [ 119 ]. Table 10 summarizes some of the referenced literature that used ML and DL techniques in various disease diagnoses.

Referenced literature that considered Machine Learning on various disease diagnoses.
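The majority-voting scheme that performed best in the Daghrir et al. study can be conveyed with a small sketch (hypothetical per-sample predictions, not the actual ISIC results): each sample receives the label chosen by most of the base classifiers.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model predictions: for each sample, take the label
    predicted by the most classifiers (ties broken by first-seen order)."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]

# Hypothetical per-sample outputs of three classifiers (1 = melanoma):
knn_preds = [1, 0, 1, 0]
svm_preds = [1, 1, 1, 0]
cnn_preds = [0, 1, 1, 0]
print(majority_vote([knn_preds, svm_preds, cnn_preds]))  # -> [1, 1, 1, 0]
```

The intuition is that uncorrelated errors tend to be outvoted: a sample is misclassified only when a majority of the base models err on it simultaneously.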

6. Algorithm and Dataset Analysis

Most of the referenced literature considered multiple algorithms in MLBDD approaches; here we refer to the use of multiple algorithms as hybrid approaches. For instance, Sun et al. (2021) used hybrid approaches to predict coronary Heart disease using Gaussian Naïve Bayes, Bernoulli Naïve Bayes, and Random Forest (RF) algorithms [ 111 ]. Bemando et al. (2021) adopted CNN and SVM to automate the diagnosis of Alzheimer’s disease and mild cognitive impairment [ 41 ]. Saxena et al. (2019) used KNN and Decision Tree (DT) in Heart disease diagnosis [ 131 ], and Elsalamony (2018) employed Neural Networks (NN) and SVM in detecting Anaemia disease in human red blood cells [ 132 ]. One of the key benefits of the hybrid technique is that it is often more accurate than using a single ML model.

According to the relevant literature, the most extensively utilized individual algorithms in developing MLBDD models are CNN, SVM, and LR. For instance, Kalaiselvi et al. (2020) proposed a CNN-based approach to Brain tumor diagnosis [ 123 ]; Dai et al. (2019) used CNN in developing a device inference app for Skin cancer detection [ 118 ]; Fathi et al. (2020) used SVM to classify liver diseases [ 121 ]; Sing et al. (2019) used SVM to classify patients with Heart disease symptoms [ 43 ]; and Basheer et al. (2019) used Logistic Regression to detect Heart disease [ 133 ].

Figure 10 depicts the most commonly used Machine Learning algorithms in disease diagnosis. Bolder, larger fonts indicate the importance and frequency with which the algorithms are used in MLBDD. From the figure, we can observe that Neural Networks, CNN, SVM, and Logistic Regression are the most commonly employed algorithms by MLBDD researchers.


Word cloud for most frequently used ML algorithms in MLBDD publications.

Most MLBDD researchers utilize publicly accessible datasets, since these do not require permission and provide sufficient information for the entire study. Manually gathering data from patients, on the other hand, is time-consuming; nevertheless, numerous studies utilized privately collected or owned data, either owing to special requirements of their experimental setup or to produce results with real data [ 46 , 55 , 56 , 68 , 70 ]. The Cleveland Heart disease dataset, PIMA dataset, and Parkinson dataset are the most often utilized datasets in disease diagnosis. Table 11 lists publicly available datasets and sources that may be useful to future academics and practitioners.

Most widely used disease diagnosis dataset URL along with the referenced literature (accessed on 16 December 2021).

7. Discussion

In the last 10 years, Machine Learning (ML) and Deep Learning (DL) have grown in prominence in disease diagnosis, a trend reinforced by the literature annotated in this study. The review began with specific research questions and attempted to answer them using the referenced literature. According to the overall research, CNN is one of the most emerging algorithms, outperforming all other ML algorithms due to its solid performance with both image and tabular data [ 94 , 123 , 128 , 137 ]. Transfer learning is also gaining popularity, since it does not necessitate constructing a CNN model from scratch and produces better results than typical ML methods [ 47 , 91 ]. Aside from CNN, the referenced literature lists SVM, RF, and DT as some of the most common algorithms utilized widely in MLBDD. Furthermore, several researchers are emphasizing ensemble techniques in MLBDD [ 127 , 130 ]. Nonetheless, when compared to other ML algorithms, CNN is the most dominant. VGG16, VGG19, ResNet50, and UNet++ are among the most prominent CNN architectures utilized widely in disease diagnosis.

In terms of databases, it was discovered that UCI repository data is the preferred option of academics and practitioners for constructing a Machine Learning-based Disease Diagnosis (MLBDD) model. However, since existing datasets frequently have shortcomings (i.e., imbalanced data, missing data), several researchers have recently relied on additional data acquired from hospitals or clinics. To assist future researchers and practitioners interested in studying MLBDD, we have included a list of some of the most common datasets utilized in the referenced literature in Table 11 , along with the link to each repository.

As previously indicated, there were several inconsistencies in the assessment measures reported in the literature. For example, some studies reported their results with accuracy alone [ 45 ]; others provided accuracy, precision, recall, and F1-score [ 42 ]; while a few studies emphasized sensitivity, specificity, and true positive rate [ 67 ]. As a result, there were no criteria for authors to follow in order to report their findings consistently and genuinely. Nonetheless, of all assessment criteria, accuracy is the most extensively utilized and recognized by academics.
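All of these metrics derive from the same confusion-matrix counts, so reporting any one of them in isolation can mislead. A small illustrative computation (the counts are hypothetical, not taken from the cited studies) shows how a high accuracy can coexist with a modest recall:

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive the commonly reported metrics from confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + fp + fn + tn)
    precision   = tp / (tp + fp) if tp + fp else 0.0
    recall      = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

# Hypothetical screening result: 80 true positives, 10 false positives,
# 20 false negatives, 890 true negatives.
m = classification_metrics(tp=80, fp=10, fn=20, tn=890)
print({k: round(v, 3) for k, v in m.items()})
# Accuracy is 0.97 even though recall is only 0.8: one in five diseased
# patients is missed, which accuracy alone hides on imbalanced medical data.
```

This is why reporting precision, recall (sensitivity), specificity, and F1 alongside accuracy, as some of the cited studies do, gives a far more honest picture.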

With the emergence of COVID-19, MLBDD research shifted mostly to Pneumonia and COVID-19 patient detection beginning in 2020, and COVID-19 remains a popular subject as the globe continues to battle this disease. It is therefore projected that the application of ML and DL in the medical sphere for disease diagnosis will continue to expand significantly in this domain. The progress of ML- and DL-based disease diagnosis has also raised many questions. For example, if a doctor or other health practitioner incorrectly diagnoses a patient, he or she will be held accountable; however, if the machine does, who will be held accountable? Furthermore, fairness is an issue in ML, because most ML models are skewed towards the majority class. As a result, future research should concentrate on ML ethics and fairness.

Surprisingly, model interpretation is absent in nearly all of the investigations. Interpreting machine learning models used to be difficult, but explainable AI (XAI) techniques have made it much easier. Although previous MLBDD studies lacked sufficient interpretation, it is projected that future researchers and practitioners will devote more attention to interpreting machine learning models due to the growing demand for model interpretability.
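One simple, model-agnostic interpretation technique is permutation importance: shuffle one feature and observe how much performance drops. The sketch below (pure Python, with a hypothetical rule-based model standing in for a trained classifier) illustrates the idea:

```python
import random

def permutation_importance(model, X, y, feature, n_repeats=10, rng=random):
    """Model-agnostic importance sketch: shuffle one feature column and
    measure the average drop in the model's accuracy."""
    def accuracy(rows):
        return sum(model(r) == yi for r, yi in zip(rows, y)) / len(y)
    base = accuracy(X)
    drops = []
    for _ in range(n_repeats):
        col = [row[feature] for row in X]
        rng.shuffle(col)
        shuffled = [row[:feature] + (v,) + row[feature + 1:]
                    for row, v in zip(X, col)]
        drops.append(base - accuracy(shuffled))
    return sum(drops) / n_repeats

# Hypothetical rule-based "model" that only looks at feature 0:
model = lambda row: int(row[0] > 0.5)
X = [(0.9, 0.1), (0.2, 0.8), (0.7, 0.3), (0.1, 0.9)]
y = [1, 0, 1, 0]
random.seed(0)
print(permutation_importance(model, X, y, feature=0))  # substantial drop
print(permutation_importance(model, X, y, feature=1))  # 0.0: model ignores it
```

The attraction for MLBDD is that nothing about the model's internals is needed: the same probe applies to a CNN, an SVM, or an ensemble.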

The idea that ML alone will be enough to construct an MLBDD model is a flawed one. To make the MLBDD model more dynamic, the model may need to be developed and stored on a cloud system, as the health care industry generates a lot of data that is typically kept in cloud systems. As a result, adversarial attacks may target patients’ data, which is very sensitive. For future ML-based models, data breach and security challenges must be taken into consideration.

Analyzing data with a large class disparity is a major issue. As ML-based diagnostic models deal with human life, every misdiagnosis is a possible danger to one’s health. However, despite the fact that many studies used imbalanced datasets in their experiments, none of the cited literature highlights issues related to imbalanced data. Thus, future work should demonstrate the validity of any ML models developed with imbalanced data.
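A common first step when training on such data is to reweight the loss inversely to class frequency. The sketch below computes "balanced" class weights using the same heuristic scikit-learn's `class_weight='balanced'` option applies; the label counts are hypothetical:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency:
    weight_c = n_samples / (n_classes * n_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

# Hypothetical diagnosis labels: 90 healthy (0) vs. 10 diseased (1).
labels = [0] * 90 + [1] * 10
print(balanced_class_weights(labels))  # -> {0: 0.555..., 1: 5.0}
```

Each misclassified minority sample then costs nine times as much as a majority one, discouraging the model from simply predicting the majority class.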

Within its scope, this review paper also has some limitations, which can be summarized as follows:

  • 1. The study first searched the Scopus and WOS databases for relevant papers and then examined other papers that were pertinent to this investigation. If other databases like Google Scholar and PubMed were used, the findings might be somewhat different. As a result, our study may provide some insight into MLBDD, but a great deal of information remains outside its scope.
  • 2. ML algorithms, DL algorithms, datasets, disease classifications, and evaluation metrics are highlighted in the review. Though the proposed ML processes are thoroughly examined in the referenced literature, this paper does not go into that level of detail.
  • 3. Only those publications that adhered to a systematic literature review technique were included in the study’s paper selection process. Using a more comprehensive range of keywords, on the other hand, might have yielded more search results. Nonetheless, our SLR approach will provide researchers and practitioners with a thorough understanding of MLBDD.

8. Research Challenges and Future Agenda

While machine learning-based applications have been used extensively in disease diagnosis, researchers and practitioners still face several challenges when deploying them as a practical application in healthcare. In this section, the key challenges associated with ML in disease diagnosis have been summarized as follows:

8.1. Data Related Challenges

  • 1. Data scarcity: Even though much patient data has been recorded by different hospitals and healthcare providers, due to data privacy regulations, real-world data is often unavailable for global research purposes.
  • 2. Noisy data: Clinical data frequently contains noise or missing values; such data therefore takes a considerable amount of time to prepare for training.
  • 3. Adversarial attack: Adversarial attacks are one of the key issues with disease datasets. An adversarial attack is the manipulation of training data, test data, or the machine learning model itself so that the ML system produces incorrect output.
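The classic illustration of such manipulation is the Fast Gradient Sign Method (FGSM). The toy sketch below, using a hypothetical two-feature logistic-regression "diagnostic" model (not drawn from any cited study), flips a correct prediction with a small, structured perturbation of the input:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm_example(x, y, w, b, eps):
    """FGSM on a toy logistic-regression model: nudge each feature by eps
    in the direction that increases the log loss, so a minimal structured
    perturbation pushes the prediction toward the wrong class."""
    # For log loss, dL/dx_i = (sigmoid(w.x + b) - y) * w_i
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * (1 if g > 0 else -1 if g < 0 else 0)
            for xi, g in zip(x, grad)]

w, b = [2.0, -1.0], 0.0          # hypothetical trained weights
x, y = [0.3, 0.4], 1             # correctly classified positive sample
score = lambda v: sigmoid(sum(wi * vi for wi, vi in zip(w, v)) + b)
x_adv = fgsm_example(x, y, w, b, eps=0.3)
print(score(x), score(x_adv))    # prediction pushed below the 0.5 threshold
```

Against a real diagnostic model, an analogous imperceptible perturbation of a medical image could flip a diagnosis, which is why adversarial robustness matters here.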

8.2. Disease Diagnosis-Related Challenges

  • 1. Misclassification: While machine learning models can be developed into disease diagnosis models, any misclassification of a particular disease might bring severe damage. For instance, if a patient with stomach cancer is diagnosed as a non-cancer patient, it will have a huge impact.
  • 2. Wrong image segmentation: One of the key challenges with ML models is that they often identify the wrong region as the infected region. For instance, Ahsan et al. (2020) showed that even though accuracy is around 100% in detecting COVID-19 and non-COVID-19 patients, pre-trained CNN models such as VGG16 and VGG19 often pay attention to the wrong region during the training process [ 2 ]. As a result, this also raises questions about the validity of MLBDD.
  • 3. Confusion: Some diseases, such as COVID-19, pneumonia, and edema in the chest, often present similar symptoms; in these particular cases, many CNN models classify all of the data samples into one class, i.e., COVID-19.

8.3. Algorithm Related Challenges

  • 1. Supervised vs. unsupervised: Most supervised ML models (e.g., linear regression, logistic regression) perform very well with labeled data, but their performance drops significantly when labels are unavailable. On the other hand, the performance of popular algorithms that can work with unlabeled data, such as K-means clustering, also degrades with multidimensional data.
  • 2. Blackbox-related challenges: One of the most widely used ML algorithms is the convolutional neural network. However, one of its key challenges is that it is often hard to interpret how the model adjusts its internal parameters, such as its weights. In healthcare, implementing models based on such algorithms requires proper explanations.

8.4. Future Directions

The challenges addressed in the section above might give some direction to future researchers and practitioners. Here we introduce some of the possible algorithms and approaches that might overcome existing MLBDD challenges.

  • 1. GAN-based approach: The generative adversarial network (GAN) is one of the most popular approaches in deep learning. Using this approach, it is possible to generate synthetic data that closely resembles real data. Therefore, GANs might be a good option for handling data scarcity issues. Moreover, they reduce the dependency on real data and help comply with data privacy regulations.
  • 2. Explainable AI: Explainable AI is a popular domain that is now widely used to explain an algorithm’s behavior during training and prediction. The explainable AI domain still faces many challenges; however, implementing interpretability and explainability clarifies the deployment of ML models in the real world.
  • 3. Ensemble-based approach: With the advancement of modern technology, we can now capture high-resolution and multidimensional data. While a traditional ML approach might not perform well with such data, a combination of several machine learning models might be an excellent option for handling high-dimensional data.

9. Conclusions and Potential Remarks

This study reviewed the papers published between 2012 and 2021 that focused on Machine Learning-based Disease Diagnosis (MLBDD). Researchers are particularly interested in some diseases, such as Heart disease, Breast cancer, Kidney disease, Diabetes, Alzheimer’s, and Parkinson’s disease, which are discussed considering machine learning/deep learning-based techniques. Additionally, some other ML-based disease diagnosis approaches are discussed as well. Prior to that, a bibliometric analysis was performed, taking into account a variety of parameters such as subject area, publication year, journal, and country, and identifying the most prominent contributors in the MLBDD field. According to our bibliometric research, machine learning applications in disease diagnosis have grown at an exponential rate since 2017. In terms of overall number of publications over the years, IEEE Access, Scientific Reports, and the International Journal of Advanced Computer Science and Applications are the three most productive journals. The three most-cited publications on MLBDD are those by Motwani et al. (2017), Gray et al. (2013), and Mohan et al. (2019). In terms of overall publications, China, the United States, and India are the three most productive countries. Kim J, the most influential author, published around 20 publications between 2012 and 2021, followed by Wang Y and Li J, who came in second and third place, respectively. Around 40% of the publications are from computer science domains and around 31% from engineering fields, demonstrating their domination of the MLBDD field.

Finally, we systematically selected 102 papers for in-depth analysis. Our overall findings were highlighted in the discussion section. Our primary conclusion is that deep learning is the most popular method among researchers because of its remarkable performance in constructing robust models. Although deep learning is widely applied in MLBDD fields, the majority of the research lacks sufficient explanation of the final predictions. As a result, future research in MLBDD needs to focus on pre- and post-hoc analysis and model interpretation in order to use ML models in healthcare.

Physical patient services have become increasingly dangerous as a result of the emergence of COVID-19. At the same time, the health-care system must be maintained. While telemedicine and online consultation are becoming more popular, it is still important to consider alternative strategies that can deliver the benefits of in-person health facilities. Many recent studies recommend home-robot service for patient care rather than hospitalization [ 138 ].

Many countries are increasingly worried about the privacy of patients’ data. Many nations have also raised legal concerns about the ethics of AI and ML when used with real-world patient data [ 139 ]. As a result, rather than depending on data gathering and processing, future studies could try producing synthetic data. Generative adversarial networks, ADASYN, SMOTE, and SVM-SMOTE are some of the techniques that future researchers and practitioners may use to produce synthetic data for experiments.
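Of these, SMOTE is the simplest to sketch: it synthesizes a new minority-class sample by interpolating between a real sample and one of its nearest neighbours. The minimal pure-Python version below (with hypothetical two-feature points, and without the full SMOTE machinery of libraries such as imbalanced-learn) conveys the idea:

```python
import math
import random

def smote_like(minority, n_new, k=2, rng=random):
    """SMOTE-style oversampling sketch: create each synthetic sample by
    interpolating between a real minority point and one of its k nearest
    neighbours within the minority class."""
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not x),
            key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

random.seed(42)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]  # hypothetical rare-class points
print(smote_like(minority, n_new=2))
```

Because each synthetic point lies on a segment between two real minority samples, the new data stays inside the region the minority class already occupies, unlike naive random duplication or unconstrained noise.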

Cloud systems are becoming potential attack targets as a result of the data stored in them. Consequently, any ML model that is built must safeguard patient access and address transaction concerns. Many academics have exploited blockchain technology to access and distribute data [ 140 , 141 ]. Blockchain technology paired with deep learning and machine learning might therefore be a promising research direction for constructing safe diagnostic systems.

We anticipate that our review will guide both novice and expert researchers and practitioners in MLBDD. It would be interesting to see research based on the limitations addressed in the discussion and conclusion sections. Additionally, future work in MLBDD might focus on multiclass classification with highly imbalanced data and large amounts of missing data, explanation and interpretation of multiclass data classification using XAI, and optimization over big data containing numerical, categorical, and image data.

Author Contributions

Conceptualization—M.M.A.; methodology—M.M.A. and Z.S.; software—M.M.A.; validation—Z.S. and S.A.L.; formal analysis—S.A.L.; investigation—Z.S.; writing—original draft preparation—M.M.A.; writing—review and editing—M.M.A., S.A.L., Z.S.; supervision—Z.S. All authors have read and agreed to the published version of the manuscript.

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.


  11. Machine Learning

    Machine Learning Algorithms, Models and Applications Edited by Jaydip Sen Edited by Jaydip Sen Recent times are witnessing rapid development in machine learning algorithm systems, especially in reinforcement learning, natural language processing, computer and robot vision, image processing, speech, and emotional processing and understanding.

  12. A comprehensive review on ensemble deep learning ...

    With the increased use of advanced machine learning algorithms, the difficulties of training these learning algorithms have led to an increased interest in meta-learning. ... The paper also illustrated the recent trends in ensemble learning using quantitative analysis of several research papers. Moreover, the paper offered various factors that ...

  13. Ethical principles in machine learning and artificial intelligence

    Decision-making on numerous aspects of our daily lives is being outsourced to machine-learning (ML) algorithms and artificial intelligence (AI), motivated by speed and efficiency in the decision ...

  14. Research on Machine Learning and Its Algorithms and Development

    Abstract. This article analyzes the basic classification of machine learning, including supervised learning, unsupervised learning, and reinforcement learning. It combines analysis on common ...

  15. A Comprehensive Review on Machine Learning in Healthcare Industry

    In this paper, we reviewed several machine learning algorithms in healthcare applications. After a comprehensive overview and investigation of supervised and unsupervised machine learning algorithms, we also demonstrated time series tasks based on past values (along with reviewing their feasibility for both small and large datasets).

  16. The use of Machine Learning Algorithms in Recommender Systems

    Moreover, the development of a recommender system using a machine learning algorithm often has problems and open questions that must be evaluated, so software engineers know where to focus research efforts. This paper presents a systematic review of the literature that analyzes the use of machine learning algorithms in recommender

  17. Applied machine learning in cancer research: A systematic review for

    2. Literature review. The PubMed biomedical repository and the dblp computer science bibliography were selected to perform a literature overview on ML-based studies in cancer towards disease diagnosis, disease outcome prediction and patients' classification. We searched and selected original research journal papers excluding reviews and technical reports between 2016 (January) and 2020 ...

  18. (PDF) Predictive analysis using machine learning: Review of trends and

    Abstract —Artificial Intelligence (AI) has been growing con-. siderably over the last ten years. Machine Learning (ML) is. probably the most popular branch of AI to date. Most systems. that use ...

  19. The Role of Machine Learning in Cybersecurity

    Taken individually, research papers—commonly claiming to outperform previous work—often lead to contradictory results. For instance, Reference shows that deep learning methods outperform "traditional" ML methods, but the ... Typical machine learning algorithms. An algorithm can be "deep" if it relies on neural networks; otherwise ...

  20. Research on machine learning algorithms and feature extraction for time

    This paper aims to use various machine learning algorithms and explore the influence between different algorithms and multi-feature in the time series. The real consumption records constitute the time series as the research object. We extract consumption mark, frequency and other features. Moreover, we utilize support vector machine (SVM), long short-term memory (LSTM) and other algorithms to ...

  21. AutoML: A systematic review on automated machine learning with neural

    In their research, Ziwei Zhang et al. talked about the challenges involved in designing efficient graph-based machine learning models exploring various approaches to automated machine learning on graphs such as graph neural networks, reinforcement learning and evolutionary algorithms [30]. V.

  22. Artificial intelligence, machine learning and deep learning in advanced

    Machine learning is a subset of AI that focuses on training machines to improve their performance on specific tasks by providing them with data and algorithms [124]. Deep learning is a subset of machine learning that involves the use of neural networks to analyze large amounts of data and learn patterns [125]. In the context of robotics taxi ...

  23. A Comparative Analysis of Machine Learning Algorithms to Predict

    A model is a machine learning system that has been trained to identify specific types of patterns using an algorithm in a machine learning system . ... This paper compares different machine learning performances to diagnose Alzheimer's syndrome. ... Journal of Machine Learning Research. 2012; 13:1063-1095. [Google Scholar]

  24. Machine-Learning-Based Disease Diagnosis: A Comprehensive Review

    Machine learning (ML), an area of artificial intelligence (AI), enables researchers, physicians, and patients to solve some of these issues. Based on relevant research, this review explains how machine learning (ML) is being used to help in the early identification of numerous diseases.

  25. PDF A Machine Learning Approach for DeepFake Detection

    The goal of this paper is to develop a computer vision al- ... is a Machine Deep Learning algorithm that can capture an input image, assign weight and bias to various characteristics of an image, differentiate objects, and perform less expensive analysis on ... ratio with the hardware available for the present research. A learning rate of 0. ...

  26. PCOS Disease Prediction Using Machine Learning Algorithms

    The study focuses on the development of a robust and clinically applicable predictive model that can aid healthcare professionals in early identification of individuals at risk of PCOS, and explores various machine learning algorithms, including linear regression, decision tree, and random forests. Polycystic Ovary Syndrome (PCOS) is a prevalent endocrine disorder affecting reproductive-aged ...

  27. Comparative Study of Various Machine Learning Algorithms ...

    This research paper presents a comparative study on various machine learning algorithms for sign language detection. The objective of this study is to find the sign language identification method ...